Tom Nicholas
@TEGNicholasCode
OSS for science, #python, @pangeo_data, @xarray_dev team. Ex-fusion, oceanography, now @_cworthy 🦋 https://bsky.app/profile/tegnicholas.bsky.social
At AGU I talked to NASA people about how agencies could better support open-source tools they rely on. I argued that our recent collaboration between Xarray and NASA ESDIS on xarray.DataTree was a good model to copy - read about how it happened here! xarray.dev/blog/datatree
That said, it isn't 100% clear that NASA's best move is to immediately convert 10,000+ datasets into cutting-edge ARCO formats. Kerchunk and Virtual Zarr offer the benefits of ARCO while keeping data in its native formats.
I'll also be there if you want to join me working on @xarray_dev, DataTree, or VirtualiZarr!
Are you heading to #AGU24 next month? Consider joining us for a bonus day of hacking on @pangeo_data. I'll be there representing @EarthmoverHQ and helping folks work with #icechunk and @zarr_dev. Details and signup here: discourse.pangeo.io/t/post-agu-pan…
Science needs a social network for sharing big data hackmd.io/@TomNicholas/H… by @TEGNicholasCode
We're moving over to BlueSky and LinkedIn for all our future announcements. Follow us at bsky.app/profile/pangeo… to find out more about tomorrow's showcase 😉 (p.s., it's on Xpublish at Scale at 4 PM EST 🚀) Connect with us on LinkedIn at linkedin.com/company/pangeo…
Our friends over at @zarr_dev made a big release today! Xarray v2025.01.1 was also released today with full support for Zarr-Python 3 🚀
🎉 Zarr-Python 3 is here! 🎉 - Full support for Zarr v3 spec - Chunk-sharding for more efficient data storage - Major performance boosts with async I/O & parallel compression 💻 pip install --upgrade zarr Blog post: zarr.dev/blog/zarr-pyth…
🌤️ #AMS2025 is just around the corner! We are taking AMS by storm with an exhibitor booth (booth 353), two talks from @_jhamman and @rabernat, and a @pangeo_data Community Happy Hour (register here: lu.ma/ddtba5f5)!
Completely agree - "in theory" we have the simple scalability of the cloud, but in practice it's often a headache, for no good reason, which prevents adoption by most users (including many scientists)
New Post: Cloud Computing is Broken matthewrocklin.com/cloud-is-broke… Investor asks: "What's next for Data/Cloud Infrastructure?" My answer: "Boring stuff. People struggle with basics." Cloud feels like MP3 players before iPod. In theory everything is good. In practice adoption is low
Come learn about recent @xarray_dev GroupBy improvements at tomorrow's (Wed, Nov 13) Pangeo Showcase! discourse.pangeo.io/t/pangeo-showc…
We've talked a lot about #Icechunk's performance this week 🚀. But the Zarr-Python 3 results are also very encouraging! We're a few weeks away from the 3.0 launch but what this chart shows is that the new AsyncIO + multi-threading functionality in Zarr is going to be really good.
ALSO this release is the first to be compatible with the much-anticipated v3 implementation of zarr-python! (still on its beta branch right now) This brings (a) big performance benefits when reading @zarr_dev data on S3 via async I/O and (b) compatibility with @EarthmoverHQ's Icechunk.
⚡️ Icechunk is fast! What does this mean for users? Reduced cost for all data-intensive compute jobs and enhanced productivity for the data scientists who work with data all day long. Icechunk, @EarthmoverHQ's new transactional cloud-native storage engine for array / tensor…
🎉 @source_coop is now open source! The web application - github.com/source-coopera… - and the data proxy - github.com/source-coopera… - have been opened up & updated with documentation on how to get it running locally. More documentation coming soon + tasks for new developers!
We’re hosting a webinar on Tuesday, October 22 from 12-1 PM EST to discuss what Icechunk means for the scientific data community and answer questions from attendees. Register here: share.hsforms.com/1SCOFqe2kTjipo…
🚀 We are thrilled to announce the release of the Icechunk storage engine, a new open-source library and specification for the storage of multidimensional array (a.k.a. tensor) data in cloud object storage. Read our blog post about Icechunk here: earthmover.io/blog/icechunk
Great opportunity to work with @BalwadaDhruv, one of the most innovative physical oceanographers in the world, at @LamontEarth in NYC!
We are looking to hire a postdoctoral scholar at Lamont Doherty Earth Observatory to work on submesoscale and mesoscale ocean turbulence using observations and machine learning: academic.careers.columbia.edu/#!/149390