Jared Sulzdorf

@j_sulz

I like pretty things, functional things, funny things, food things, and computer things. Not necessarily in that order. Making things go fast @huggingface

Seattle, WA

Joined September 2008

285 Following

382 Followers

Jared Sulzdorf Retweeted

Quentin Lhoest 🤗@lhoestq · Jul 25

New blog post 🚨 Every data engineer should read it: @kszucs_ (@ApacheArrow PMC) announces how to drastically speed up Parquet file uploads and downloads. Yes, it can easily outspeed S3. Best part: the feature enabling this is open source. Link in 🧵

Jared Sulzdorf Retweeted

Quentin Lhoest 🤗@lhoestq · Jul 21

A new Pandas feature landed 3 days ago and no one noticed. Upload ONLY THE NEW DATA to dedupe-based storage like @huggingface (Xet). Data that already exists in other files doesn't need to be uploaded. Possible thanks to the recent addition of Content Defined Chunking for Parquet.
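
For readers who have not met content-defined chunking before, here is a rough, standard-library-only sketch of the idea behind the tweet above. It is not the PyArrow writer or the Xet implementation: the gear-style rolling hash, the constants `WINDOW_BITS` and `BOUNDARY_BITS`, and the helpers `cdc_chunks`, `fixed_chunks`, and `new_chunks` are all hypothetical names chosen for illustration. The sketch cuts a byte stream wherever a hash of the most recent bytes matches a pattern, then counts how many chunks of an edited copy are not already in a chunk store built from the original.

```python
import hashlib
import random

# Hypothetical parameters for this sketch (not taken from PyArrow or Xet).
WINDOW_BITS = 64      # the 64-bit rolling hash only "remembers" the last 64 bytes
BOUNDARY_BITS = 6     # cut when the top 6 bits are zero (about 1 byte in 64)
_gear_rng = random.Random(42)
GEAR = [_gear_rng.getrandbits(64) for _ in range(256)]  # one random value per byte


def cdc_chunks(data):
    """Split data wherever a rolling hash of the recent bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: each step shifts older bytes toward the top,
        # so a byte stops influencing the hash after 64 steps.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        if h >> (WINDOW_BITS - BOUNDARY_BITS) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


def fixed_chunks(data, size=64):
    """Baseline for comparison: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_chunks(old, new, chunker):
    """Count chunks of `new` whose digest is not already stored for `old`."""
    stored = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(hashlib.sha256(c).digest() not in stored for c in chunker(new))


# Synthetic stand-in for a file, plus a copy with a small insertion.
data_rng = random.Random(0)
original = bytes(data_rng.getrandbits(8) for _ in range(20_000))
edited = original[:5_000] + b"a few inserted rows" + original[5_000:]

cdc_new = new_chunks(original, edited, cdc_chunks)
fixed_new = new_chunks(original, edited, fixed_chunks)
print(f"CDC: {cdc_new} new chunks, fixed-size: {fixed_new} new chunks")
```

Because each cut depends only on the bytes just before it, boundaries after the insertion line up with the original ones again, and only the chunks around the edit need uploading. With fixed-size chunks, everything after the insertion shifts and looks new. Real implementations typically add minimum and maximum chunk sizes, which this sketch omits.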

Jared Sulzdorf Retweeted

Quentin Lhoest 🤗@lhoestq · May 27

Xet is now the default storage for new builders on @huggingface! What it means for 🤗 Datasets:
- Deduplicated downloads and uploads for speed ⚡
- Works with the new Parquet CDC writer, robust to insert/delete/edits 💪
@ApacheParquet has a bright future on HF :)

Jared Sulzdorf Retweeted

Quentin Lhoest 🤗@lhoestq · May 16

CDC Parquet writer is out in PyArrow nightlies 🔥🔥
$ pip install \
    -i pypi.anaconda.org/scientific-pyt… \
    "pyarrow>=21.0.0.dev0"
it's changing the way I view data versioning 👇
