by ambrood on 8/15/24, 5:16 PM with 31 comments
Hey HN,
We’d like to showcase a very early version of our embeddable stream processing engine called Denormalized. The rise of DuckDB has made it abundantly clear that even for many terabyte-scale workloads, a single-node system outshines the distributed query engines of the previous generation, such as Spark and Snowflake, in both performance and cost.
Many of the workloads DuckDB is now used for were considered “big data” in the previous generation, but no more. In the context of streaming especially, this problem is more acute. A streaming system is designed to incrementally process large amounts of data over a period of time. Even at the upper end of scale, production use cases of stream processing rarely perform compute on more than tens of gigabytes of data at a given time.
Even so, standard stream processing solutions such as Flink involve spinning up a distributed JVM cluster to compute against even the simplest of event streams. To that end, we’re building Denormalized, which is designed to be embeddable in your applications and to scale up to hundreds of thousands of events per second with a Flink-like dataflow API. While we currently only support Rust, we plan to add Python and TypeScript bindings soon.
We’re built atop the DataFusion and Arrow ecosystems and currently support streaming joins as well as windowed aggregations on Kafka topics.
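To make "windowed aggregation" concrete, here is a minimal sketch in plain Rust using only the standard library. It is not Denormalized's API and makes no claim about how the engine is implemented: `Event`, `Agg`, and `tumbling_window_agg` are hypothetical names for illustration, and the input is an in-memory slice rather than a Kafka topic. It shows the core idea of a tumbling window: each event is assigned to a fixed-size time bucket by its timestamp, and a running aggregate is kept per (window, key) pair.

```rust
use std::collections::BTreeMap;

/// Hypothetical event type: a keyed reading with an event-time timestamp (ms).
#[derive(Debug, Clone)]
struct Event {
    timestamp_ms: u64,
    key: String,
    value: f64,
}

/// Running aggregate for one (window, key) pair.
#[derive(Debug, Clone, PartialEq)]
struct Agg {
    count: u64,
    sum: f64,
}

/// Assign each event to a fixed-size (tumbling) window and aggregate per key.
/// Returns a map from (window_start_ms, key) to the aggregate for that bucket.
/// Assumption: this sketch ignores late data and watermarks entirely.
fn tumbling_window_agg(events: &[Event], window_ms: u64) -> BTreeMap<(u64, String), Agg> {
    assert!(window_ms > 0, "window size must be positive");
    let mut out: BTreeMap<(u64, String), Agg> = BTreeMap::new();
    for e in events {
        // Round the timestamp down to the start of its window.
        let window_start = e.timestamp_ms - (e.timestamp_ms % window_ms);
        let agg = out
            .entry((window_start, e.key.clone()))
            .or_insert(Agg { count: 0, sum: 0.0 });
        agg.count += 1;
        agg.sum += e.value;
    }
    out
}
```

Calling `tumbling_window_agg(&events, 1000)` groups events into one-second windows; the average for a bucket is `sum / count as f64`. A real streaming engine would update these buckets incrementally as events arrive and emit a window once it is considered complete, which this batch-style sketch does not attempt.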
Please check out our repo at: https://github.com/probably-nothing-labs/denormalized
We’d love to hear your feedback.
by dman on 8/15/24, 7:12 PM
by emgeee on 8/15/24, 7:34 PM
by theLiminator on 8/15/24, 11:05 PM
Ideally, you'd support an API similar to Polars (which I have found to be the nicest thus far).
It'd also be important/useful to support Python UDFs (think numpy/jax/etc.).
It'd be very cool if you could collaborate with or even tap into the Polars frontend. If you could execute Polars logical plans but with a streaming source, that would be huge.
by j-pb on 8/16/24, 6:33 AM
by ethegwo on 8/15/24, 5:54 PM
by shrisukhani on 8/15/24, 8:42 PM
by stereosky on 8/16/24, 8:45 AM
by eXpl0it3r on 8/16/24, 12:09 PM
All the descriptions of Denormalized use the term, so if you don't know it, it's kind of impossible to understand what Denormalized is or what it's trying to solve.
by nonlogical on 8/16/24, 8:58 AM
Bookmarked for future projects!
by ztratar on 8/15/24, 7:28 PM
Will reach out! Congrats on the ship.
by drawnwren on 8/15/24, 8:36 PM
by franciscojarceo on 8/15/24, 7:11 PM
by lhnz on 8/15/24, 9:38 PM
by akshay2881 on 8/15/24, 8:41 PM
by rNULLED on 8/16/24, 4:53 AM