by sewen on 6/12/24, 3:25 PM with 109 comments
https://github.com/restatedev/ https://restate.dev/
Restate is free and open: the SDKs are MIT-licensed, and the runtime uses a permissive BSL (basically just the minimal Amazon defense). We've been working on it for a bit over a year. A few points I think are worth mentioning:
- Restate's runtime is a single, self-contained binary with no dependencies aside from a durable disk. It essentially contains a lightweight, integrated version of a durable log, a workflow state machine, state storage, etc. That makes it very compact and easy to run on both a laptop and a server.
- Restate implements durable execution not only for workflows; its core building block is durable RPC handlers (or event handlers). It adds a few concepts on top of durable execution, such as virtual objects (which turn RPC handlers into virtual actors), durable communication, and durable promises. More details here: https://restate.dev/programming-model
- A core design goal for the APIs was to keep a familiar style: an app developer should look at Restate examples and say "hey, that looks quite familiar". Let us know if that worked out.
- Basically every operation (handler invocation, step, ...) goes through a consensus layer, for a high degree of resilience and consistency.
- The lightweight, log-centric architecture still gives Restate good latencies: for example, around 50 ms round trip (invoke to result) for a 3-step durable workflow handler (Restate on EBS with fsync for every step).
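To make the durable-execution idea from the points above concrete, here is a minimal in-memory sketch of journal-based replay, assuming a handler whose step results are recorded in order. The names `DurableContext`, `run`, and `threeStepHandler` are hypothetical and are not Restate's SDK API; per the points above, Restate keeps this state in a durable, consensus-backed log rather than an array, and real steps are asynchronous.

```typescript
// Minimal in-memory sketch of journal-based durable execution.
// Hypothetical names; not Restate's SDK API. Steps are synchronous for brevity.

type Journal = unknown[]; // completed step results, in execution order

class DurableContext {
  private cursor = 0;
  private readonly journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  // Execute a step once; on a retry, return its recorded result instead.
  run<T>(step: () => T): T {
    if (this.cursor < this.journal.length) {
      return this.journal[this.cursor++] as T; // replay
    }
    const result = step();
    // A real runtime would durably persist (and replicate) this entry before continuing.
    this.journal.push(result);
    this.cursor++;
    return result;
  }
}

interface Counter {
  executions: number; // how many times a side effect actually ran
}

function threeStepHandler(ctx: DurableContext, counter: Counter, crashAfterStep?: number): string {
  const a = ctx.run(() => { counter.executions++; return "a"; });
  if (crashAfterStep === 1) throw new Error("simulated crash");
  const b = ctx.run(() => { counter.executions++; return "b"; });
  if (crashAfterStep === 2) throw new Error("simulated crash");
  const c = ctx.run(() => { counter.executions++; return "c"; });
  return a + b + c;
}

// First attempt crashes after step 2; the retry reuses the same journal.
function runWithCrashAndRetry(): { result: string; executions: number } {
  const journal: Journal = [];
  const counter: Counter = { executions: 0 };
  try {
    threeStepHandler(new DurableContext(journal), counter, 2);
  } catch {
    // Steps 1 and 2 completed and are recorded in the journal.
  }
  const result = threeStepHandler(new DurableContext(journal), counter);
  return { result, executions: counter.executions };
}

console.log(runWithCrashAndRetry()); // { result: 'abc', executions: 3 }
```

Across the crash and the retry, each side effect runs exactly once: the second attempt replays the two recorded results and only executes the third step. The sketch assumes the handler is deterministic between journaled steps, since replay matches entries by position.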
We'd love to hear what you think of it!
by BenoitP on 6/12/24, 6:06 PM
Question for OP: I'd bet Flink's Stateful Functions (StateFun) figures into Restate's story. Could you please comment on this? Maybe StateFun was sort of a plugin, and you wanted to rebase onto the core of a distributed function?
by sewen on 6/12/24, 4:22 PM
- Blog post with an overview of Restate 1.0: https://restate.dev/blog/announcing-restate-1.0-restate-clou...
- Restate docs: https://docs.restate.dev/
- Discord, for anyone who wants to chat interactively: https://discord.com/invite/skW3AZ6uGd
by yaj54 on 6/12/24, 4:02 PM
by senorrib on 6/13/24, 12:50 AM
by hintymad on 6/12/24, 7:19 PM
by bilalq on 6/12/24, 5:19 PM
1. Max execution duration of a workflow
2. Max input/output payload size in bytes for a service invocation
3. Max timeout for a service invocation
4. Max number of allowed state transitions in a workflow
5. Max Journal history retention time
by bilalq on 6/12/24, 4:38 PM
One big hangup for me is that there's only a single-node orchestrator available as a CDK construct. An HA setup would be a must for business-critical flows.
I stumbled on Restate a few months ago and left the following message on their discord.
> I was considering writing a framework that would let you author AWS Step Functions workflows as code in a typesafe way when I stumbled on Restate. This looks really interesting and the blog posts show that the team really understands the problem space.
> My own background in this domain was as an early user of AWS SWF internally at AWS many, many years ago. We were incredibly frustrated by the AWS Flow framework built on top of SWF, so I ended up creating a meta Java framework that let you express workflows as code with true type-safety, arrow-function-based step delegation, and Either/Maybe/Promise and other monads for expressiveness. The DX was leaps and bounds better than anything else available at the time. This was back around 2015, I think.
> Fast-forward to today: I'm now running a startup that uses AWS Step Functions. It has some benefits, the most notable being that it's fully serverless. However, the lack of type-safety is incredibly frustrating. An innocent-looking change can easily result in States.Runtime errors that cannot be caught and that bypass all your catch-error logic. Then, of course, there's how ridiculous it feels to write logic in JSON, or in a JSON builder using CDK. As if that weren't bad enough, the pricing is also quite steep: $25 for every million state transitions feels like a lot when you need so many extra state transitions for common patterns like sagas, choice branches, etc.
> I'm looking forward to seeing how Restate matures!
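The pricing point in the quoted message is simple arithmetic: at $25 per million state transitions, cost grows linearly with transitions per execution, so extra states for sagas and choice branches multiply the bill directly. The sketch below uses the price as quoted in the comment and hypothetical workload numbers; it ignores any free tier and other AWS charges.

```typescript
// Price as quoted in the comment above.
const USD_PER_MILLION_TRANSITIONS = 25;

// Monthly cost of state transitions only; ignores free tier and other charges.
function monthlyTransitionCostUsd(executionsPerMonth: number, transitionsPerExecution: number): number {
  const transitions = executionsPerMonth * transitionsPerExecution;
  return (transitions / 1_000_000) * USD_PER_MILLION_TRANSITIONS;
}

// Hypothetical workload: 500,000 executions per month.
// 8 transitions on the "happy path" vs. 20 once saga compensations and choice states are added.
console.log(monthlyTransitionCostUsd(500_000, 8));  // 100
console.log(monthlyTransitionCostUsd(500_000, 20)); // 250
```

In this made-up example, the pattern overhead alone (12 extra transitions per execution) accounts for 60% of the transition bill.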
by aleksiy123 on 6/12/24, 4:28 PM
Also something about this area always makes me excited. I guess it must be the thought of having all these tasks just working in the background without having to explicitly manage them.
One question I have: does anyone have experience building data pipelines in this type of architecture?
Does it make sense to fan out into lots of small tasks, or is it better to batch things into bigger tasks to reduce the overhead?
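One way to reason about fan-out vs. batching is a rough cost model: each task pays a fixed orchestration overhead (for example, durable journaling and a round trip), and parallel workers process tasks in waves. The model, the helper names, and all numbers below are assumptions for illustration, not measurements of Restate or any other engine.

```typescript
// Split items into batches of at most `size` elements (the last batch may be smaller).
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Rough wall-clock estimate under a simple, assumed cost model:
// every task costs a fixed overhead plus per-item work, and `workers`
// tasks run in parallel per wave. Treats every batch as full, so it
// slightly overestimates when the last batch is partial.
function estimateWallClockMs(
  itemCount: number,
  batchSize: number,
  perTaskOverheadMs: number,
  perItemMs: number,
  workers: number,
): number {
  const tasks = Math.ceil(itemCount / batchSize);
  const perTaskMs = perTaskOverheadMs + batchSize * perItemMs;
  const waves = Math.ceil(tasks / workers);
  return waves * perTaskMs;
}

// Hypothetical: 10,000 items, 50 ms overhead per task, 1 ms per item, 100 workers.
for (const batchSize of [1, 100, 10_000]) {
  console.log(batchSize, estimateWallClockMs(10_000, batchSize, 50, 1, 100));
}
// 1 5100
// 100 150
// 10000 10050
```

Under these assumptions, one task per item is dominated by per-task overhead, while one giant batch loses all parallelism; batch sizes that still leave at least as many tasks as workers land in between. Smaller tasks also mean less work is redone when a single task fails and is retried.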
by netvarun on 6/12/24, 4:53 PM
by mikelnrd on 6/12/24, 6:45 PM
by hamandcheese on 6/12/24, 3:39 PM
by azmy on 6/12/24, 3:51 PM
by magnio on 6/12/24, 4:09 PM
by akbirkhan on 6/12/24, 3:39 PM
Question, though: when will you have Python support? I'm an ML researcher, and I can tell you that most of my work is now pipelines between different services, e.g. chaining multiple LLM services. A big bottleneck is when one service returns an error and crashes the full chain.
Big fan of this work nevertheless. I just think you have alpha on the table.
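For the failure mode described above (one service erroring and crashing the whole chain), a common first step is retrying each call with exponential backoff, sketched below with hypothetical names (`withRetry`, `runChain`, `Step`). This only covers transient errors within one process: if the process itself dies, the chain restarts from the beginning, which is the gap durable execution (persisting each completed step) is meant to close.

```typescript
interface RetryOptions {
  maxAttempts: number;    // total attempts, including the first
  initialDelayMs: number; // delay before the first retry
  factor: number;         // multiplier applied to the delay after each wait
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Retry an async call with exponential backoff; rethrow the last error once attempts run out.
async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let delayMs = opts.initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= opts.maxAttempts) throw err;
      await sleep(delayMs);
      delayMs *= opts.factor;
    }
  }
}

// A pipeline step, e.g. a call to one LLM service (hypothetical signature).
type Step = (input: string) => Promise<string>;

// Run steps in order, feeding each output into the next. Each step is retried
// independently, so a transient failure in step 3 does not re-run steps 1 and 2.
async function runChain(input: string, steps: Step[], opts: RetryOptions): Promise<string> {
  let value = input;
  for (const step of steps) {
    const current = value;
    value = await withRetry(() => step(current), opts);
  }
  return value;
}
```

Blind retries are only safe when calls are idempotent or cheap to repeat; non-retryable errors (for example, invalid input) should usually fail fast instead.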
by p10jkle on 6/12/24, 3:29 PM
by jamifsud on 6/12/24, 4:29 PM
by rubyfan on 6/12/24, 4:14 PM
by mnahkies on 6/12/24, 9:45 PM
I'm particularly interested in the scaling characteristics, and in how your approach to durable storage differs (it seems no external database is required?).
by sharkdoodoo on 6/12/24, 4:53 PM
by whoiskatrin on 6/12/24, 3:31 PM
by dovys on 6/12/24, 3:39 PM
by AhmedSoliman on 6/12/24, 3:38 PM
by jiehong on 6/12/24, 9:34 PM
I couldn't find an equivalent of Temporal's codec server, which basically encrypts all data in the event log. Is there something similar?
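The codec idea asked about here amounts to encoding payloads on the client side, so that the engine only ever stores ciphertext in its log. A minimal sketch using Node's built-in `crypto` module with AES-256-GCM follows; the `PayloadCodec` interface and `AesGcmCodec` class are hypothetical and are not a Restate or Temporal API, and key management (storage, rotation) is out of scope.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical codec interface: the engine would only ever see encoded bytes.
interface PayloadCodec {
  encode(plaintext: Uint8Array): Uint8Array;
  decode(encoded: Uint8Array): Uint8Array;
}

const IV_BYTES = 12;  // recommended nonce length for GCM
const TAG_BYTES = 16; // default GCM auth tag length

// Encoded layout: IV | auth tag | ciphertext.
class AesGcmCodec implements PayloadCodec {
  private readonly key: Buffer;

  constructor(key: Buffer) {
    if (key.length !== 32) throw new RangeError("AES-256 requires a 32-byte key");
    this.key = key;
  }

  encode(plaintext: Uint8Array): Uint8Array {
    const iv = randomBytes(IV_BYTES); // fresh nonce per payload; never reuse one with the same key
    const cipher = createCipheriv("aes-256-gcm", this.key, iv);
    const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
    return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
  }

  decode(encoded: Uint8Array): Uint8Array {
    const buf = Buffer.from(encoded);
    const iv = buf.subarray(0, IV_BYTES);
    const tag = buf.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
    const ciphertext = buf.subarray(IV_BYTES + TAG_BYTES);
    const decipher = createDecipheriv("aes-256-gcm", this.key, iv);
    decipher.setAuthTag(tag);
    // final() throws if the ciphertext or tag was tampered with.
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  }
}
```

Because GCM is authenticated, a modified log entry fails to decode instead of silently yielding corrupted data.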
by ko_pivot on 6/12/24, 3:38 PM
by qwertyuiop_ on 6/12/24, 3:56 PM
by swyx on 6/12/24, 4:15 PM
by sharkdoodoo on 6/12/24, 4:51 PM
by johtso on 6/12/24, 4:33 PM
by exabrial on 6/15/24, 1:31 PM
by _1tan on 6/12/24, 7:47 PM