Get Data Lakehouse Books:
- Apache Iceberg: The Definitive Guide
- Apache Polaris: The Definitive Guide
- Architecting an Apache Iceberg Lakehouse
- The Apache Iceberg Digest: Vol. 1
Lakehouse Community:
- Join the Data Lakehouse Community
- Data Lakehouse Blog Roll
- OSS Community Listings
- Dremio Lakehouse Developer Hub
This week brought major governance milestones, technical proposals for Iceberg V4, and release activity across the Apache data lakehouse ecosystem. Polaris moved closer to top-level project status. Iceberg tackled metadata management and guidelines for AI-generated contributions. Arrow Rust shipped a patch release, and Parquet continued work on a geospatial blog post and its implementation status documentation.
Apache Iceberg
[DISCUSS] metadata.json in V4?
Anton Okolnychyi started a high-signal discussion on February 10 about making the root metadata JSON file optional in Iceberg V4. The problem: writing metadata.json on every commit hurts streaming write performance, especially with HMS or Hadoop catalogs. Anton proposed two paths. One is to let catalogs skip writing the file. The other is to offload parts of it to external files. Yufei Gu raised portability concerns, noting that Spark's static tables and driver still read this file from storage. Prashant Singh pointed to related prior work on optimizing both reads and writes. This discussion will shape how V4 handles fast commits for real-time workloads. (Thread)
[VOTE] Guidelines for AI-Generated Contributions
Junwang Zhao opened a vote on February 10 to formalize guidelines for AI-generated contributions to the Iceberg project. The proposed guidelines give maintainers a clear reference when evaluating PRs and deciding whether to close AI-authored submissions. Gang Wu and Anurag Mantripragada both cast +1 votes. This reflects growing community attention to AI tooling in open source. (Thread)
[VOTE RESULT] Data Access Delegation for registerTable
Alexandre Dutra announced the vote passed on February 9 with 10 +1 votes (6 binding, 4 non-binding) and no objections. The change adds data access delegation support to the registerTable REST endpoint. The related PR is now ready for merging. (Thread)
[DISCUSS] Race Condition in Snapshot Expiration
Krutika Dhananjay raised a concurrency bug involving snapshot expiration and concurrent ref additions. The race window occurs between when ExpireSnapshots computes candidate snapshots and when the commit executes. A client can add a branch ref to a snapshot in that window, and the maintenance job may then remove it. The iceberg-go project already addressed this with a fix in MetadataBuilder.RemoveSnapshots(). Amogh Jahagirdar responded that the existing reachability logic should prevent this, and asked for a reproducible test case. (Thread)
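The reported window is a classic check-then-act pattern, and a toy model makes it easier to see. The sketch below is not Iceberg's code: `TableMetadata`, `compute_expired`, `commit_naive`, and `commit_revalidated` are hypothetical names, and reachability is simplified to "a ref points directly at the snapshot" (real reachability also follows ancestry). It only illustrates the scenario as described in the thread, which Amogh questioned for the Java implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Toy stand-in for table metadata (hypothetical, not Iceberg's class)."""
    snapshots: set = field(default_factory=set)
    refs: dict = field(default_factory=dict)  # ref name -> snapshot id


def compute_expired(meta):
    """Candidates are snapshots that no ref points at right now."""
    return meta.snapshots - set(meta.refs.values())


def commit_naive(meta, candidates):
    """Removes the earlier candidates without re-checking refs."""
    meta.snapshots -= candidates


def commit_revalidated(meta, candidates):
    """Re-checks candidates against the refs present at commit time."""
    still_unreferenced = candidates - set(meta.refs.values())
    meta.snapshots -= still_unreferenced


def run(commit):
    meta = TableMetadata(snapshots={1, 2, 3}, refs={"main": 3})
    candidates = compute_expired(meta)  # snapshots 1 and 2
    meta.refs["audit"] = 2              # concurrent client adds a branch in the window
    commit(meta, candidates)
    return meta


naive = run(commit_naive)
safe = run(commit_revalidated)
print(naive.refs["audit"] in naive.snapshots)  # False: the new branch dangles
print(safe.refs["audit"] in safe.snapshots)    # True: snapshot 2 was kept
```

Whether the real commit path behaves like the naive or the revalidated variant is exactly what the requested reproducible test case would settle.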
[DISCUSS] Refreshing Storage Credentials for Staged Table Creation
Maninder Parmar proposed a new mechanism for refreshing storage credentials during staged (CTAS) table creation. Today, staged tables are invisible to the loadTable and credential endpoints, so credentials cannot be refreshed. Huaxin Gao asked follow-up questions about session expiration and cleanup. (Thread)
Iceberg Index Support Sync
The first dedicated sync meeting for Iceberg's native index support, a key V4 feature, took place on February 11, organized by Huaxin Gao and Steven Wu. The agenda covered the index design doc and implementation planning. (Thread)
Apache Polaris
Graduation Vote and IPMC Discussion
The biggest Polaris news this week: the project is on the verge of becoming a top-level Apache project. Russell Spitzer submitted the formal IPMC graduation vote on February 3. The PPMC consensus vote received 27 +1 votes. Jean-Baptiste Onofré, the proposed PMC Chair, reiterated that the project has reached strong maturity, citing its six releases (0.9 through 1.3.0), 100+ contributors, and 2,819 merged PRs. Dave Fisher's initial vote requested a separate [DISCUSSION] thread on the general@ list, and that request has been addressed. JB reminded IPMC members to weigh in before the formal vote closes. Polaris 1.3.0 shipped on January 9 with generic table GA, improved cloud integration tests, and simplified event hooks. (Vote Thread)
S3 Credential Vending Without STS
A continuing discussion from earlier weeks drew a new reply from a Backblaze engineer. The thread explores options for vending S3-compatible credentials in environments without AWS STS. Two proposals are on the table: vending the same credentials Polaris uses, or managing a separate client credential pair. The Backblaze contributor expressed interest in an S3 signing approach for non-AWS storage. (Thread)
Apache Arrow
Arrow 23.0.0 Release Announcement
Apache Arrow 23.0.0 was announced on January 27 with 336 resolved issues. This major release spans C++, Python, Java, R, and Go bindings. The release blog covers the full changelog. (Thread)
Arrow Rust 57.3.0 Release Vote
Andrew Lamb proposed Arrow Rust 57.3.0-rc1 on February 3 as a patch release. The vote completed and the RC was cleaned up by February 6. (Thread)
IPC Stream Multiplexing Discussion
Rusty Conover continued a discussion on IPC stream multiplexing, pushing back on the suggestion to use QUIC. His use case requires explicit ordering across batches from different logical streams, not just independent delivery. This is a niche but interesting format-level proposal for interleaving Arrow schemas in a single IPC channel. (Thread)
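To make "explicit ordering across logical streams" concrete, here is a minimal sketch of one way to interleave batches on a single channel while preserving a global order. It is an illustration only, not the proposal's wire format and not an Arrow API: `Frame`, `Multiplexer`, and `demultiplex` are hypothetical names, and `payload` stands in for an encoded record batch.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Frame:
    seq: int        # global sequence number across all logical streams
    stream_id: str  # which logical stream the batch belongs to
    payload: bytes  # stand-in for an encoded record batch


class Multiplexer:
    """Tags every batch with one shared, monotonically increasing counter."""

    def __init__(self):
        self._seq = count()
        self.channel = []

    def send(self, stream_id, payload):
        self.channel.append(Frame(next(self._seq), stream_id, payload))


def demultiplex(frames):
    """Restores the global order, then splits the batches per stream."""
    ordered = sorted(frames, key=lambda f: f.seq)
    per_stream = defaultdict(list)
    for frame in ordered:
        per_stream[frame.stream_id].append(frame.payload)
    return ordered, dict(per_stream)


mux = Multiplexer()
mux.send("orders", b"o1")
mux.send("payments", b"p1")
mux.send("orders", b"o2")

# Simulate out-of-order arrival, as independent transport streams might deliver.
ordered, per_stream = demultiplex(list(reversed(mux.channel)))
print([f.payload for f in ordered])  # [b'o1', b'p1', b'o2']
```

With independent delivery alone, a receiver would know that `o1` precedes `o2` but not where `p1` falls between them. The shared sequence number is what carries that cross-stream information in this sketch.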
Security Model Published
The Arrow PMC published a formal security model for the project on February 5. The blog post and documentation clarify how Arrow handles security considerations across its libraries. (Commit)
GSoC Interest
A student named Prasanna expressed interest in contributing to Arrow through Google Summer of Code 2026. The message was redirected to the Arrow dev list from another project. (Thread)
Apache Parquet
Parquet Java 1.17.0 Released
Fokko Driesprong announced the release of Parquet Java 1.17.0 on January 13, following a successful vote that closed in mid-January. This is the latest stable Java release of the format library. (Thread)
Geospatial Blog Post in Progress
Andrew Lamb opened a PR for a geospatial blog post on the Parquet website. A reviewer noted that geospatial column statistics are not yet integrated in all engines, though the Rust Parquet implementation and SedonaDB already handle them. The discussion highlighted the need to update the implementation status page to reflect actual engine support. (Thread)
ALP Encoding Spec Progressing
The ALP (Adaptive Lossless floating-Point) encoding discussion gained momentum. Contributors discussed whether the spec should land as a PR against the parquet-format repo. Julien Le Dem asked who still needs to review the document before finalization. (Thread)
file_path Deprecation Ready to Merge
Micah Kornfield confirmed that the PR deprecating the file_path field in column chunk metadata had several approvals and was ready to merge. The change discourages use of external column references in favor of table-level handling, simplifying the Parquet file scope. (Thread)
New PMC Member: Andrew Lamb
Julien Le Dem announced that Andrew Lamb joined the Parquet PMC on January 21. Lamb has been instrumental in governance discussions and cross-project collaboration efforts. (Thread)
Cross-Project Themes
Two themes dominated this week. The first is governance maturity. Polaris is about to graduate. Iceberg is voting on AI contribution guidelines and recently adopted the SQL UDF spec. Parquet welcomed a new PMC member. These projects are growing not just technically, but organizationally.
The second theme is V4 planning for Iceberg. The metadata.json discussion, column-level update proposals, and index support sync all point toward a table format that is faster for streaming, better for wide-table ML workloads, and less dependent on legacy catalog patterns. Parquet's own format evolution (ALP encoding, geospatial types, file_path deprecation) directly supports this trajectory.
Looking Ahead
Watch for the Polaris graduation vote result this week. The Iceberg index support sync may produce concrete design decisions. Expect continued V4 discussion threads and movement on the AI contribution guidelines vote. On the Parquet side, look for the geospatial and variant blog posts to land on parquet.apache.org soon.