Delta Lake

@DeltaLakeOSS

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture for Spark, Flink, Trino, Hive, Scala, Java, Rust, Python, & more!

San Francisco, CA

Joined April 2019

67 Following

10K Followers

Delta Lake@DeltaLakeOSS · Jul 24

Curious about how Delta Lake structures its tables and achieves both scalability and reliability? In our recent webinar, Scott Haines breaks down the building blocks of a #DeltaLake table: 🔹 Partitioned storage — boosts performance and query efficiency 🔹 Parquet files —…

393

Delta Lake@DeltaLakeOSS · Jul 22

🎥 Now Available: Delta-Kernel-RS — Unparalleled Interoperability Across Query Engines At #DataAISummit, Robert Pack and Zach Schuermann (@Databricks) introduced Delta-Kernel-RS — a new #Rust implementation of the Delta Lake protocol designed for unparalleled interoperability…

415

Delta Lake@DeltaLakeOSS · Jul 21

In this walkthrough, ChanChan Mao with @daftengine shows how #Daft and #DeltaLake can work together by merging complex datasets (image metadata, annotations, categories) and writing them natively to Delta Lake, complete with partitioning and Delta log tracking—no #JVM, no #Spark…

510

Delta Lake@DeltaLakeOSS · Jul 18

In this clip, Ion Koutsouris (Rustacean & Delta Maintainer) explains how @lakeFS garbage collection policy integrates with #DeltaLake to manage unreferenced files and automate cleanups. By leveraging Spark jobs, lakeFS ensures your Delta Lake storage remains lean and organized.…

391

Delta Lake@DeltaLakeOSS · Jul 18

“We cut our streaming ingestion costs over 90% by adopting kafka-delta-ingest, which means we can invest those savings in really interesting large language model products or innovative data processes.” — R. Tyler Croy, Principal Engineer at @Scribd & Maintainer of delta-rs 💬…

596

Delta Lake@DeltaLakeOSS · Jul 17

🚀 From Delta Lake to downstream ML pipelines — without Spark? In this clip, Daniel Beach shares a real production use case where he used Polars inside an Apache Airflow worker to read from a @unitycatalog_io–backed #DeltaLake table, filter job-specific records, and dynamically…

567

Delta Lake@DeltaLakeOSS · Jul 15

"With over three petabytes of processed data and more than 1,200 active users, our Lakehouse platform powered by Delta Lake is at the core of how we drive insights at scale." - Satya Mandavilli, Solutions Architect at @SPGlobal Learn how S&P Global puts Delta Lake at the center…

683

Delta Lake@DeltaLakeOSS · Jul 14

In this blog, R. Tyler Croy (Buoyant Data) shares how re-architecting his data pipeline with Rust and the oxbow architecture for Delta Lake writes reduced resource usage to just 1% of the previous setup: a massive win for both cost and sustainability! 🌱🌐 Academic research and…

3.0K

Delta Lake@DeltaLakeOSS · Jul 9

In this clip, @YoussefMrini explains how deletion vectors in #DeltaLake help you avoid rewriting #Parquet files for every update or delete. Instead, a bitmap marks which rows are deleted, so files are only rewritten when necessary. This approach minimizes repeated file rewrites…

840
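The bitmap idea described in the deletion-vector clip above can be sketched in a few lines of Python. This is a toy model only, not Delta Lake's implementation: the `DataFile` class, its method names, and the use of a plain Python integer as the bitmap are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    """Hypothetical stand-in for an immutable data file plus its deletion vector."""
    rows: list
    deleted: int = 0  # bitmap: bit i set means row i is logically deleted

    def delete_where(self, predicate):
        # Mark matching rows in the bitmap; the stored rows are not touched.
        for i, row in enumerate(self.rows):
            if predicate(row):
                self.deleted |= 1 << i

    def live_rows(self):
        # Readers skip any row whose bit is set.
        return [r for i, r in enumerate(self.rows) if not (self.deleted >> i) & 1]

    def compact(self):
        # The deferred rewrite: produce a new file holding only the live rows.
        return DataFile(rows=self.live_rows())


data_file = DataFile(rows=[{"id": i} for i in range(6)])
data_file.delete_where(lambda r: r["id"] % 2 == 0)  # sets bits 0, 2 and 4
compacted = data_file.compact()
```

In this sketch the delete only flips bits, so the original rows stay in place until `compact()` is called. That mirrors the point made in the clip: the file is rewritten only when necessary, not on every update or delete.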

Delta Lake@DeltaLakeOSS · Jul 7

Ever wondered how @lakeFS integrates with Delta Lake? In this short clip, Ion Koutsouris walks through the process. ✅ With lakeFS and Delta Lake, your data changes are managed safely—even when multiple people are working at the same time. Conflicts are handled automatically, so…

764

Delta Lake@DeltaLakeOSS · Jul 2

Join the conversation on Delta Lake GitHub Discussions! 💬 Have questions, ideas, or updates about Delta Lake? Curious about community decisions? 🤔 GitHub Discussions is the place for open conversations, Q&A, and community-driven collaboration around Delta Lake. ✅ Ask and…

541

Delta Lake Retweeted

Daft@daftengine · Jun 30

Thank you all for your love and support after last week’s funding announcement, it means everything to us. This week, let’s highlight ways you can interact with and use Daft. Starting off with @DeltaLakeOSS, Daft internally uses the deltalake Python package to fetch metadata…

499

Delta Lake@DeltaLakeOSS · Jun 30

Announcing: Row Tracking Write Support in Delta Kernel Java 🚀 The latest release of Delta Kernel Java introduces support for writing to row tracking-enabled tables! ✅ Track individual rows: Accurately identify which rows have been inserted, updated, or deleted in your tables.…

991

Delta Lake@DeltaLakeOSS · Jun 25

Delta Lake 4.0 introduces Delta Connect, bringing Delta Lake support to Spark Connect for Apache Spark! 🚀 With Delta Connect, you can run all #DeltaLake operations remotely from lightweight clients, thanks to a decoupled client-server…

826

Delta Lake@DeltaLakeOSS · Jun 23

⚡ Supercharging Customer-Facing Analytics with Delta Kernel + @StarRocksLabs Delta Lake’s Delta Kernel offers a robust set of Engine APIs to make data access faster and more efficient. StarRocks taps into these APIs to power smart caching techniques that: 🔹 Eliminate redundant…

950

Delta Lake@DeltaLakeOSS · Jun 18

DNB achieved a 90% cost reduction by adopting a serverless pipeline using Delta Lake, @DuckDB, @ApacheArrow, kafka-delta-ingest (github.com/delta-io/kafka…), and Azure Container App Jobs. 🙌⭐ This new approach leverages Delta’s transaction identifiers for efficient, stateful…

1.0K