Bartosz Konieczny

@waitingforcode

Freelance Data Engineer and trainer, enjoy solving data problems with #ApacheSpark #AWS #GCP #Azure 👨‍🏭 | [email protected]

remote

Joined January 2011

79 Following

2K Followers

Pinned

Bartosz Konieczny@waitingforcode · Oct 5, 2023

Last week I spent some time to understand the #PySpark applyInPandasWithState. This week I'm refactoring the code, hoping to still understand it 2 months later ;) 👉 waitingforcode.com/apache-spark-s…

Pinned

Bartosz Konieczny@waitingforcode · Sep 6, 2023

It's not a rebranding but more a regrouping 😉 All my additional #dataengineering content is now available from there waitingforcode.com/better (planning to add some stream processing materials soon)

Bartosz Konieczny Retweeted

Shroff Publishers@shroffpub · Apr 24

Releasing Soon! Pre-order now shroffpublishers.com/books/97893680… Data Engineering Design Patterns By Bartosz Konieczny @waitingforcode. with @OReillyMedia Focusing on various aspects of data engineering, including data ingestion, data quality, idempotency, and more. #dataengineering

Bartosz Konieczny Retweeted

Jack Vanlightly@vanlightly · Oct 14

If you want to understand the consistency models of the mentioned table formats of the paper, I've written about it extensively and written formal models. * jack-vanlightly.com/analyses/2024/… * jack-vanlightly.com/analyses/2024/… * jack-vanlightly.com/analyses/2024/… * github.com/Vanlightly/tab…

Bartosz Konieczny Retweeted

Leanpub@leanpub · Jun 30, 2024

Data Engineering patterns on the cloud by Bartosz Konieczny is on sale on Leanpub! Its suggested price is $39.00; get it for $24.65 with this coupon: leanpub.com/sh/ygsnqbRD @waitingforcode #CloudComputing #AmazonWebServices #GoogleCloudPlatform #MicrosoftAzure

Bartosz Konieczny Retweeted

Delta Lake@DeltaLakeOSS · Mar 11, 2024

Join @newfront and @waitingforcode and learn all about streaming Delta Lake tables with Apache Spark Structured Streaming! 🦀 🗓 March 21st 🕝 9:00AM PT / 12:00PM ET 💻 Join this webinar via LinkedIn, YouTube, or Zoom! Learn more: linkedin.com/events/streami… #deltalake #streaming

Bartosz Konieczny Retweeted

Jim Dowling@jim_dowling · Mar 1, 2024

I have been busy the last few months writing a book for O'Reilly about how to build ML systems (batch, real-time, and LLMs), distilling much of what I have learnt from both working with customers as well as students. Why could the book interest you? * Data Scientists - transition…

Bartosz Konieczny@waitingforcode · Dec 28, 2023

I don't want to start a flame war here, but IMO it is a mistake to jump straight to distributed databases (and 90% of the content below is distributed databases) without first learning fundamentals on single node databases. Here's my 10 things to understand about databases:…

Kaivalya Apte - The Geek Narrator@thegeeknarrator · Dec 27, 2023

Ten things to understand about your database: 1) High level Architecture 2) How writes work? (Replication, data distribution, internal organisation etc) 3) How reads work? (Consistency guarantees, tuning options, etc) 4) CAP theorem, ex. CP or AP 5) Transactions and Concurrency…

Bartosz Konieczny Retweeted

Leanpub@leanpub · Dec 6, 2023

Data Engineering patterns on the cloud by Bartosz Konieczny is on sale on Leanpub! Its suggested price is $39.00; get it for $26.10 with this coupon: leanpub.com/sh/1T4q5Z81 @waitingforcode #CloudComputing #AmazonWebServices #GoogleCloudPlatform #MicrosoftAzure

Bartosz Konieczny Retweeted

Jack Vanlightly@vanlightly · Nov 21, 2023

Chapter 4 of The Architecture of Serverless Data Systems: CockroachDB (serverless). jack-vanlightly.com/analyses/2023/…

Bartosz Konieczny Retweeted

Delta Lake@DeltaLakeOSS · Nov 12, 2023

The early release of Delta Lake: The Definitive Guide is here! 🎉 The latest edition includes the addition of Chapter 12: Performance Tuning. Download here ➡️ bit.ly/472DVY7 Authors @dennylee, Prashanth Babu, Tristen Wentling, & @newfront #opensource #deltalake #oss

Bartosz Konieczny Retweeted

Leanpub@leanpub · Nov 1, 2023

Data Engineering patterns on the cloud: How to solve common data engineering problems with cloud services? leanpub.com/data-engineeri… by Bartosz Konieczny is the featured book on the Leanpub homepage! leanpub.com @waitingforcode #CloudComputing #AmazonWebServices

Bartosz Konieczny@waitingforcode · Sep 29, 2023

In the previous release #PySpark has got an interesting streaming feature -> the arbitrary stateful processing. It has a different API than the Scala version but is more adapted to the Python world. More 👉 waitingforcode.com/apache-spark-s…

Bartosz Konieczny Retweeted

Antón@antonmry · May 31, 2023

A list of articles I share again and again when developers ask me about Kafka 🧵

Bartosz Konieczny Retweeted

Apache Spark@ApacheSpark · Sep 17, 2023

[ANNOUNCEMENT] Congrats to the Apache Spark community and all the contributors! The Apache Spark 3.5.0 release is here. Try it out! spark.apache.org/releases/spark…

Bartosz Konieczny@waitingforcode · Sep 4, 2023

If Delta Lake implemented the commits only, I could stop exploring this transactional part after the previous article. But as for RDBMS, #DeltaLake implements other ACID-related concepts, such as isolation levels 👉 waitingforcode.com/delta-lake/tab…

Bartosz Konieczny@waitingforcode · Aug 30, 2023

One of the great features of table file formats is the ability to handle write conflicts. It wouldn't be possible without commits that are the topic of my #DeltaLake blog post. waitingforcode.com/delta-lake/tab…

Bartosz Konieczny@waitingforcode · Aug 21, 2023

Surprises may be hidden elsewhere, even in the provider-managed libraries. I got punished once for relying on them without verifying the ins and outs before. Lessons learned 👉 waitingforcode.com/data-engineeri…

Bartosz Konieczny@waitingforcode · Aug 13, 2023

OOM problems in #ApacheSpark Structured Streaming were often due to the infinitely growing metadata layer. There were a few workarounds but it's also possible to use a proper configuration, at least for file sink 👉 waitingforcode.com/apache-spark-s…

Bartosz Konieczny@waitingforcode · Aug 3, 2023

If you rely on the watermark for the state expiration in #ApacheSpark arbitrary stateful processing, be careful. The first micro-batch doesn't contain the watermark yet! You can find some possible workarounds in the new blog post 👉 waitingforcode.com/apache-spark-s…
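
To illustrate the watermark remark above, here is a minimal plain-Python sketch, not Spark code. It assumes a simplified model in which the watermark is the maximum event time seen so far minus an allowed delay, is recomputed at the end of each micro-batch, and only becomes visible in the next one. The function name `simulate_watermarks`, the `EPOCH` constant, and the sample timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed starting value: before any data is seen, the watermark sits at the epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def simulate_watermarks(batches, delay):
    """Return the watermark visible inside each micro-batch.

    batches: list of lists of timezone-aware event times, one list per micro-batch.
    delay:   allowed lateness as a timedelta.
    """
    visible = []
    watermark = EPOCH
    max_event_time = None
    for events in batches:
        # The batch runs with the watermark computed from earlier batches only.
        visible.append(watermark)
        for event_time in events:
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
        # The new watermark takes effect from the next micro-batch onwards.
        if max_event_time is not None:
            watermark = max(watermark, max_event_time - delay)
    return visible


def at(hour, minute):
    """Hypothetical helper building sample event times on a single day."""
    return datetime(2023, 8, 3, hour, minute, tzinfo=timezone.utc)


sample_batches = [[at(10, 0), at(10, 5)], [at(10, 20)], []]
for number, value in enumerate(simulate_watermarks(sample_batches, timedelta(minutes=10))):
    print(f"micro-batch {number}: watermark = {value.isoformat()}")
```

Under this model the first micro-batch still sees the epoch value, so an expiration time derived from the watermark in that batch would already be in the past once a real watermark arrives.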
