I've been deep in Kafka + DuckDB development the past few weeks, but here are some late-night...
- Created a Kafka topic with 8 partitions on my M3 MacBook Air (one per core)
- Inserted 50M messages from DuckDB (47 bytes each) in 11.62s (4.3 million msgs/sec)
- Scanned those messages directly from Kafka in DuckDB in 17.54s (2.85 million msgs/sec)
- Inserted them into a local DuckDB table in 29.63s (1.69 million msgs/sec)
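The reported rates follow directly from the message counts and timings above. A quick sanity check (run names are illustrative, not from the benchmark harness):

```python
# Recompute throughput from the counts and timings reported above.
runs = {
    "insert into Kafka from DuckDB": (50_000_000, 11.62),
    "scan from Kafka in DuckDB": (50_000_000, 17.54),
    "insert into a local DuckDB table": (50_000_000, 29.63),
}

for name, (messages, seconds) in runs.items():
    rate = messages / seconds
    print(f"{name}: {rate / 1e6:.2f} million msgs/sec")
# The three rates round to 4.30, 2.85, and 1.69 million msgs/sec,
# matching the figures quoted in the post.
```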
More CPU cores and more Kafka partitions make it even faster.
A lot of this speed comes from DuckDB's ability to do zero-copy data sharing with the Kafka client library: instead of copying messages around, they are attached to DuckDB vectors as auxiliary data.
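As an analogy only (this is not DuckDB's actual API), Python's `memoryview` shows the difference between copying a payload and referencing it in place, which is the spirit of attaching Kafka message buffers to vectors as auxiliary data:

```python
# Illustrative analogy: slicing bytes copies; slicing a memoryview
# just references the underlying buffer without copying it.
payload = b"x" * 47 * 1000  # pretend batch of 47-byte messages

copied = payload[:47]            # allocates a new 47-byte copy
view = memoryview(payload)[:47]  # references the original storage

assert view.obj is payload       # the view shares payload's buffer
assert bytes(view) == copied     # same contents, materialized on demand
print(len(copied), len(view))
```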
DuckDB + Kafka is fast, and this is just the beginning!
Next up: benchmarking Avro, Protobuf, and JSON Schema registry support. The code is written; it's been quite the adventure.
This will be Query.Farm's first commercially licensed DuckDB extension. The new website is launching soon.
https://query.farm - Meet with us to get a demo.
Originally posted on LinkedIn.