by tekacs on 8/15/23, 5:54 PM with 355 comments
by RomanPushkin on 8/15/23, 8:28 PM
I wish I hadn't seen this comparison, which is not fair at all. Everyone in their right mind understands that the number of features is much smaller; that's why you have 10k lines.
Add large-scale distributed live video support on top of that, and you won't get anywhere close to 10k lines. It's only one of many, many examples. I really wish you would compare Mastodon to Twitter 0.1 and not do false advertising.
> 100M bots posting 3,500 times per second... to demonstrate its scale
I'm wondering why 100M bots post only 3,500 times per second. Is it 3,500 per second for each bot? It seems not, since HTTPS termination would consume most of the resources in that case. So I'm afraid it's just not enough.
When I worked at Statuspage, we supported 50-100k requests per second, because this is how it works: you have spikes, and traffic is not evenly distributed. TBH, if it's only 3,500 per second total, then I have to admit it is not enough.
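A quick back-of-the-envelope check of the quoted figures may help here. It assumes the 3,500 posts per second is the total across all 100M bots, which is one reading of the quote and not something the post confirms:

```python
# Back-of-the-envelope check. ASSUMPTION: 3,500 posts/s is the total
# across all bots, not a per-bot rate.
TOTAL_POSTS_PER_SECOND = 3_500
BOT_COUNT = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def posts_per_bot_per_day(total_per_second: float, bots: int) -> float:
    """Average number of posts each bot makes per day."""
    return total_per_second * SECONDS_PER_DAY / bots


per_bot = posts_per_bot_per_day(TOTAL_POSTS_PER_SECOND, BOT_COUNT)
print(f"{per_bot:.2f} posts per bot per day")
```

Under that assumption each bot posts roughly three times a day, so the figure describes aggregate ingest rather than per-client load. It says nothing about spikes or read traffic.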
by dataangel on 8/15/23, 9:14 PM
I'm not saying there's nothing here, but I am adjacent to your core audience and I have no idea whether there is after reading your post. I think you are strongly assuming a shared basis where everybody has worked on the same kind of large-scale web app before; I would find it much more useful to have an overview of, "This is what you would usually do, here are the problems with it, here is what we do instead" with a side-by-side code comparison of Rama vs what a newbie is likely to hack together with single-instance Postgres.
by buro9 on 8/15/23, 7:04 PM
Updates per second delivered to the end users who follow those 7K tweets per second seems like the more realistic metric: it's the timelines and notifications that hurt, not the tweets per second at the top of ingest, prior to the fan-out. And then, of course, it's whether you can do that continuously so as not to back up on it.
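To make the difference between ingest and fan-out concrete, here is a minimal fan-out-on-write sketch. The follower graph, the account names, and the `fan_out` helper are all hypothetical, and nothing here reflects how Twitter, Mastodon, or Rama actually implement timelines:

```python
from collections import defaultdict


def fan_out(posts, followers):
    """Fan-out on write: copy each post into every follower's timeline.

    Returns (timelines, number_of_timeline_writes).
    """
    timelines = defaultdict(list)
    writes = 0
    for author, text in posts:
        for follower in followers.get(author, ()):
            timelines[follower].append((author, text))
            writes += 1
    return timelines, writes


# Hypothetical follower graph: author -> accounts that follow them.
followers = {"alice": ["bob", "carol", "dave"], "bob": ["alice"]}
posts = [("alice", "hello"), ("bob", "hi"), ("alice", "again")]

timelines, writes = fan_out(posts, followers)
print(len(posts), "posts ingested ->", writes, "timeline writes")
```

The number of timeline writes grows with posts times followers per author, which is why a posts-per-second figure alone understates the work done after ingest.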
by mping on 8/15/23, 6:51 PM
Now, the really hard part becomes selling. If companies start using your product to get ahead, that will be the real proof; otherwise it's "just" tech that is good on paper.
On a side note, did you guys get any inspiration from Clojure? I see lots of interesting projects popping up from Clojure people...
Best of luck!
by Pxtl on 8/15/23, 6:24 PM
This write-up is very detailed but I couldn't find that explanation.
by softwaredoug on 8/15/23, 6:31 PM
It’s hard enough trusting that Google's or Amazon's cloud offerings won’t change.
It seems that’s what they’re proposing right? What am I missing?
by afro88 on 8/15/23, 7:28 PM
Still super impressive. Reminds me of when I discovered Elixir while building a social-ish music discovery app. Switching the backend from Rails to Elixir felt like putting on clothes that actually fit after wearing old sweats. Rama looks like a similar jump, but another layer up, encompassing system architecture.
by sharms on 8/15/23, 6:13 PM
I think they have thought a lot about typical hard problems, such as having the timeline processing happen alongside the pipeline, taking network / storage etc. out of the picture. Nice work!
by jitl on 8/15/23, 8:04 PM
The way Ignite works overall is similar. You make a cluster of JVM processes, your data is partitioned and replicated across the cluster, and you upload some JARs of business logic to the cluster to do things. Your business logic can specify locality so it runs on the same nodes as the relevant data, which ideally makes things a lot faster compared to systems where you need to pull all your data across the wire from a DB. Like Rama, Ignite uses a Java API for everything, including serializing and storing plain ol' Java objects.
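The locality idea can be illustrated with a small in-process sketch. This is not Ignite's or Rama's API: the `Node` and `Cluster` classes and the `run_local` method are hypothetical names, and the "cluster" is just a list of Python objects. It only shows the shape of the technique, which is to hash a key to its owning partition and run the logic there instead of pulling the data to the caller.

```python
import zlib


class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}  # this node's partition of the key-value data


class Cluster:
    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def owner(self, key):
        # Stable hash, so the same key always maps to the same node.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).data[key] = value

    def run_local(self, key, fn):
        """Run fn on the node that owns key, next to the data."""
        node = self.owner(key)
        return node.name, fn(node.data[key])


cluster = Cluster(["n0", "n1", "n2"])
cluster.put("user:42", [3, 5, 8])
node_name, total = cluster.run_local("user:42", sum)
print(node_name, total)
```

In a real data grid the function would be serialized and shipped over the network, and only its (usually small) result would travel back.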
Ignite's architecture isn't focused on "ETL" into "PStates". Instead it's more about distributed "caches" of data. It does have streaming for ingestion (https://ignite.apache.org/docs/latest/data-streaming), but you can transactionally update the datastore directly (https://ignite.apache.org/docs/latest/key-value-api/transact...). It also has a "continuous query" feature for those reactive queries to retrieve data (https://ignite.apache.org/docs/latest/key-value-api/continuo...).
Rama's data-structure oriented PState index seems easier to work with than building indexes yourself on top of Ignite's KV cache, but Ignite also offers an SQL language, so you can insert your data into the KV cache however, add some custom SQL functions, and then accept more flexible SQL querying of your data compared to the very purpose-built PCache things, but still be able to do lower-level or more performance-oriented logic with data locality.
Anyways, if you like some of this stuff but want to use an existing, already battle-tested open source project, you can look for these "in-memory data grid", "distributed cache", kind of projects. There's a few more out there that have similar JVM cluster computing models.
by clusterhacks on 8/15/23, 7:16 PM
It's not so much that I think the comment is wrong or anything, but rather that it seems so similar to what I have heard in the past from power-lisp (or Clojure in this case) super-smart engineers.
I feel like we have reached a point in software development where "better" paradigms don't necessarily gain much adoption. But if Rama wins in the marketplace, that will be interesting. And I am quite excited to see what a smart tech leader and good team have been able to grind out given a years-long timeframe in this programming platform space . . .
by ThinkBeat on 8/15/23, 9:43 PM
This is meant to be hyped to sell your Rama platform/product/framework? That you have spent 10 years building in secret? During that time you have built a datastore and a Kafka competitor and ?
Should not those 10 years be factored into the time it took to develop this technical demo?
Is it 100x less code including every LOC in all of Rama?
I mean, I am sure you picked a use case that is well suited to creating a Twitterish architecture implementation.
If I went off and wrote a ThinkBeat platform for creating Twitterish systems and then created a Twitterish implementation on top of it, it's really easy to reach low LOCs.
by skybrian on 8/15/23, 6:39 PM
by failuser on 8/15/23, 7:18 PM
by miki123211 on 8/15/23, 10:22 PM
Their "variables" have names that you have to keep as Java strings and pass to random functions. If you want composable code, you don't declare a function, you call .macro(). For control flow and loops, you don't use if and for, but a weird abstraction of theirs.
I feel like this code could have been a lot simpler if it was written in a specialized language (or a mainstream language with a specialized transpiler and/or Macro capabilities.)
I'd quote the old adage about every big program containing a slow and buggy implementation of Common Lisp, but considering that this thing is written in Clojure, the authors have probably heard it before.
by kyle-rb on 8/15/23, 8:35 PM
I've been digging around for a while and haven't found any posts with more than 20 faves. The accounts I've found with ~1 million followers have little to no engagement. I want to see how a post with a million faves holds up to the promises of "fast constant time".
I'm especially curious about these queries — fave-count and has-user-faved — since a couple years ago Twitter stopped checking has-user-faved when rendering posts more than a month or so old, so I imagine it was expensive at scale.
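For reference, the two queries are constant time on average when the favers of a post are held as a hash set in memory. The sketch below is only that baseline: `FaveIndex` and its method names are hypothetical, and it says nothing about how the demo, Rama, or Twitter store faves durably across machines.

```python
from collections import defaultdict


class FaveIndex:
    """In-memory index: post_id -> set of user_ids who faved it."""

    def __init__(self):
        self._favers = defaultdict(set)

    def fave(self, post_id, user_id):
        # Adding to a set is idempotent, so a repeated fave is not double-counted.
        self._favers[post_id].add(user_id)

    def fave_count(self, post_id):
        # len() of a set is O(1); it does not scan the members.
        return len(self._favers.get(post_id, ()))

    def has_user_faved(self, post_id, user_id):
        # Hash-set membership is O(1) on average.
        return user_id in self._favers.get(post_id, ())


index = FaveIndex()
for user in ("bob", "carol", "bob"):
    index.fave("post-1", user)
print(index.fave_count("post-1"), index.has_user_faved("post-1", "dave"))
```

The open question in the comment is whether that cost profile survives once the set has a million members and lives in a partitioned, replicated store.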
by NoraCodes on 8/15/23, 6:08 PM
by gfodor on 8/15/23, 6:41 PM
by yayitswei on 8/15/23, 9:15 PM
by duped on 8/15/23, 6:19 PM
FWIW, why hype at all? Why "We'll share more in a week. Then more in two weeks"? Show the code today!
by jvans on 8/16/23, 3:24 PM
I will make one minor suggestion that I hope is constructive. I found the post difficult to read, largely because you rapid fire introduce a bunch of completely new concepts and propose a solution to many problems at once. You make a passing comparison to "just event sourcing and materialized views", although this was the easiest way for me to understand what you are doing. Starting from event sourcing and materialized views puts the reader on a ground they already understand, and moving on from there to why rama is better/what it adds on top, would be an easier transition.
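As a starting point of the kind suggested here, event sourcing with a materialized view fits in a few lines. The names `EventLog` and `FollowerCounts` and the event shape are hypothetical, and this is a single-process sketch, not Rama's API:

```python
class FollowerCounts:
    """Materialized view: account -> current follower count."""

    def __init__(self):
        self.counts = {}

    def apply(self, event):
        kind, _follower, target = event
        delta = 1 if kind == "follow" else -1
        self.counts[target] = self.counts.get(target, 0) + delta


class EventLog:
    """Append-only log that keeps subscribed views up to date."""

    def __init__(self):
        self.events = []
        self.views = []

    def subscribe(self, view):
        for event in self.events:  # replay history into the new view
            view.apply(event)
        self.views.append(view)

    def append(self, event):
        self.events.append(event)
        for view in self.views:  # incremental update, no recomputation
            view.apply(event)


log = EventLog()
live = FollowerCounts()
log.subscribe(live)
log.append(("follow", "bob", "alice"))
log.append(("follow", "carol", "alice"))
log.append(("unfollow", "bob", "alice"))

late = FollowerCounts()  # a view added later is rebuilt by replay
log.subscribe(late)
print(live.counts, late.counts)
```

The log is the source of truth and every view is derived from it, so a new index can be added at any time by replaying the events.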
by ltr_ on 8/15/23, 8:39 PM
i mean if you think about this as public services not as a business, profit is secondary, and first is just to make the thing better and better for the users, no need for spying, no advertisement, no need for a rich piece of shit somewhere getting a piece of the money paid in your city for every taxi drive, food delivery or to give up privacy to a soulless/faceless entity just because you want to say something publicly or keep in touch with people. there is no disruption from their part, it's just an old thing put on the internet, they are just in the middle of everyone's life, just sucking everything they can. is the actual state of affairs "efficient"?
there must be fed up engineers and tech people everywhere with the sad state of IT industry.
by endisneigh on 8/15/23, 9:06 PM
Saying this is 100 or even a million times cheaper is like saying that taking a picture of the Sistine Chapel and printing out copies is a trillion times cheaper than making it originally.
Many of us on this site could make a number of products very efficiently and cheaply given a static and fixed set of requirements as well as an existing implementation for reference.
That being said it was a very detailed post, so kudos for that, but it’s far too vague to be actionable. Why not just release the code and post simultaneously instead of just bragging about how little code was required?
by sixo on 8/15/23, 6:28 PM
by beefnugs on 8/16/23, 6:32 AM
by rubiquity on 8/15/23, 7:09 PM
by elisbce on 8/16/23, 1:11 AM
Any attempt to build a simplified version of the ecosystem will face the same fundamental distributed system tradeoffs like consistency/reliability/flexibility/... For example, one of the simplifications may be mixing storage/serving/ETL workloads on the same node. The consequence is that, without a certain level of performance isolation, an expensive ETL workload could hurt serving latency.
For Rama to be adopted successfully, I think it is important to identify areas where it has the most strengths, and low LOCs might not be the only thing that matters. For example, demonstrating why it is much better/easier than setting up Kafka/Spark and a database and building a Twitter clone on top of that, while providing similar/better performance/reliability/extensibility/maintainability/..., is a much stronger argument.
by jauntywundrkind on 8/15/23, 6:49 PM
Mastodon has to send messages to each instance with a recipient. That server can then fan out to all its subscribers. The way this point is worded makes me think all the bits are on just a single instance, meaning all the fan-out can be dealt with internally without having to do any server-to-server at all.
That is a fair comparison to Twitter, which is single instance. But it sounds like a much reduced ambition versus the task Mastodon has to do.
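The extra step a federated server has can be sketched as grouping followers by their home instance. This assumes one delivery per remote instance, as described above, and uses a hypothetical `plan_deliveries` helper with made-up `user@host` addresses; it is not Mastodon's code:

```python
from collections import defaultdict


def plan_deliveries(author_host, followers):
    """Split follower addresses ("user@host") into local and remote.

    Returns (local_followers, {remote_host: [followers on that host]}).
    """
    local = []
    remote = defaultdict(list)
    for address in followers:
        _user, host = address.rsplit("@", 1)
        if host == author_host:
            local.append(address)
        else:
            remote[host].append(address)
    return local, dict(remote)


followers = [
    "a@one.example",
    "b@two.example",
    "c@two.example",
    "d@three.example",
]
local, remote = plan_deliveries("one.example", followers)
print(len(local), "local follower(s),", len(remote), "server-to-server deliveries")
```

On a single instance the `remote` map is always empty, which is the simplification the comment is pointing at.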
by cduzz on 8/16/23, 12:14 AM
by raverbashing on 8/15/23, 6:14 PM
Of course, for actual production use, there are probably a lot of things still missing, but this is very nice work nonetheless.
by gexla on 8/16/23, 2:10 AM
One question, why Google Groups rather than something like Discord? Not sure I would trust Google Groups to be around long.
by primitivesuave on 8/15/23, 10:06 PM
by NoahTheDuke on 8/15/23, 8:40 PM
Are there any plans for exposing a Clojure API? Given that it's implemented in Clojure, seems like it would be a natural fit. Interop with Java is nice but can be cumbersome compared to the more natural calling conventions and idioms (threading macros instead of `..` builder patterns, etc).
by donavanm on 8/16/23, 4:47 AM
Second, what's the product/business angle on customer confidence, technical novelty, and your business core competency? A dated example, but I'm thinking of somewhere like Basho with Riak. Super cool tool, takes some mental adjustment to “get”, challenges selling hosting vs software vs pro services.
by nevi-me on 8/16/23, 3:20 AM
by zubairq on 8/16/23, 5:53 AM
However, I do not see Rama's initial market being startups, since they just want the simplest way possible to build UI + backend and want to iterate super fast with tech that their developers already know in the initial stages.
by doublepg23 on 8/15/23, 10:27 PM
by runeks on 8/16/23, 6:16 AM
Is Twitter's 7k tweets per second the average? If so, what’s the peak rate, and have you tested your system under this load?
by wink on 8/16/23, 8:01 AM
That's a bit like starting an Oracle clone now and summing up what they spent on developer salaries in the last 40 years. You basically can't not "reduce costs".
And no "the original consumer product" is not a real cop-out, you probably still have tons of people building iterations.
by beders on 8/16/23, 12:11 AM
That said: You need better advisors. Your investors and/or the board gave you bad advice on how to publish these accomplishments and talk about them.
I hope your go-to-market strategy works out a little better. Hyperbole is fine, but at least on hacker news, the audience is a bit careful with regards to grandiose statements.
What might work well on an investor presentation might backfire when you target engineers as audience.
by debadyutirc on 8/21/23, 3:05 PM
I saw the Twitter post first and the blog next. The premise is compelling, but it's a promise that has been made to the data and software world for decades.
The architecture and the core primitives are something that we agree with a lot. Use cases and business value are a whole different ballgame.
We have invested the past 5 years at InfinyOn building Fluvio, our open source Rust implementation of core event streaming primitives, which implements this architecture to orchestrate data as efficiently as is computationally possible today. I am happy to see this project as an effort in the same direction.
by _dwt on 8/15/23, 6:17 PM
by prepor on 8/16/23, 12:12 PM
Could you please compare Rama with Kafka Streams, especially from the point of view of trying to reimplement the Rama API on top of Kafka Streams? What fundamental difficulties would I face?
by dustingetz on 8/15/23, 6:33 PM
What is it? Build web-scale reactive backends with an expressive Java dataflow API. Instead of a database you develop your own custom app-specific indexes, which are reactive, distributed and durable. It's like event sourcing and materialized views but integrated in a linearly scalable way.
> I cannot emphasize enough how much interacting with indexes as regular data structures instead of magical “data models” liberates backend programming
> It allows for true incremental reactivity from the backend up through the frontend. ... enable UI frameworks to be fully incremental instead of doing expensive diffs to find out what changed.
Ok, so in my mind I am positioning this against Materialize / differential dataflow, whose key primitive is an efficient streaming incremental join that works across very large relational tables. Materialize makes SQL reactive; Rama gives you a Java dataflow DSL for developing purpose-built reactive database indexes.
How does it work? 4 concepts: Depot, ETLs, PState, Query
Depots: "distributed, durable, and replicated logs of data." [Event streams?] "like Kafka except integrated" "All data coming into Rama comes in through depot appends."
ETLs: data arrives via depots, and is ETLed to PStates via "a Java dataflow API for coding topologies that is extremely expressive". "Most of the time spent programming Rama is spent making ETLs."
PStates seem like reactive data structures that are also durable/replicated, these are meant to supersede your database and indexes, letting you build custom purpose-built indexes that contain 100M elements:
> “partitioned states” are how data is indexed in Rama ... Unlike existing databases, which have rigid indexing models (e.g. “key-value”, “relational”, “column-oriented”, “document”, “graph”, etc.), PStates have a flexible indexing model. In fact, they have an indexing model already familiar to every programmer: data structures. A PState is an arbitrary combination of data structures. ... nested data structures can efficiently contain hundreds of millions of elements. For example, a “map of maps” is equivalent to a “document database”, and a “map of subindexed sorted maps” is equivalent to a “column-oriented database”. Any [composition] is valid – e.g. you can have a “map of lists of subindexed maps of lists of subindexed sets”.
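The "indexes as data structures" idea in this quote can be pictured with ordinary in-memory collections. The sketch below is not Rama's PState API and has no durability, partitioning, or subindexing; `SortedMap` is a hypothetical helper built on the standard `bisect` module, and the sample data is made up.

```python
import bisect

# "Map of maps": user_id -> {field: value}, similar in shape to a document store.
profiles = {
    "alice": {"display_name": "Alice", "bio": "hello"},
}


class SortedMap:
    """Minimal sorted map: keys are kept in order to allow range scans."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range(self, lo, hi):
        """Items with lo <= key < hi, in key order."""
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        return [(k, self._values[k]) for k in self._keys[start:end]]


# "Map of sorted maps": user_id -> (timestamp -> post text).
posts_by_user = {"alice": SortedMap()}
posts_by_user["alice"].put(30, "third")
posts_by_user["alice"].put(10, "first")
posts_by_user["alice"].put(20, "second")

recent = posts_by_user["alice"].range(15, 31)
print(profiles["alice"]["display_name"], recent)
```

The shape of the nesting is chosen per use case: point lookups by user use the outer map, and time-range scans use the inner sorted map.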
Query: once you develop PStates to aggregate relevant data into a custom index of the right ... shape?, query seems sorta like GraphQL selectors over your custom index:
> Queries in Rama take advantage of the data structure orientation of PStates with a “path-based” API that allows you to concisely fetch and aggregate data from a single partition
> “query topologies” ... real-time distributed querying and aggregation over an arbitrary collection of PStates. These are the analogue of “predefined queries” in traditional databases, except programmed via the same Java API as used to program ETLs and far more capable.
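A rough idea of what a "path-based" query over nested data structures looks like, written as a generic sketch. The `select` function and the `ALL` wildcard are hypothetical and are not Rama's path API; they only show navigation by a list of steps followed by aggregation.

```python
ALL = object()  # hypothetical wildcard: visit every value at this level


def select(data, path):
    """Return every value reached by following path through nested dicts/lists."""
    if not path:
        return [data]
    step, rest = path[0], path[1:]
    if step is ALL:
        children = list(data.values()) if isinstance(data, dict) else list(data)
    elif isinstance(data, dict) and step in data:
        children = [data[step]]
    else:
        children = []  # missing key: this branch contributes nothing
    results = []
    for child in children:
        results.extend(select(child, rest))
    return results


# Made-up index: user -> post_id -> post fields.
posts = {
    "alice": {"p1": {"faves": 3}, "p2": {"faves": 5}},
    "bob": {"p3": {"faves": 1}},
}

alice_faves = select(posts, ["alice", ALL, "faves"])
print(alice_faves, sum(alice_faves))
```

A path names a route into the structure instead of a table and a predicate, which is the contrast with SQL-style querying that the quote draws.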
by mariusor on 8/16/23, 11:27 AM
It's something else that maybe speaks the Mastodon API and/or ActivityPub, but we don't know since it doesn't really federate with anyone.
I commend the effort to try to make a non-open fediverse service happen, but appropriating the Mastodon name is just wrong. You should know better.
by alexcpn on 8/16/23, 6:25 AM
by Huhuhn on 8/15/23, 8:12 PM
by samsquire on 8/16/23, 11:55 AM
I am actually really impressed. Well done! Good work!
There's lots of interesting lessons and knowledge in the design of this platform.
I also like how you've decided to use Java as your API rather than Clojure.
I hope you're not discouraged by HN's reaction to your hard work. Don't be discouraged!
by whateverman23 on 8/15/23, 6:13 PM
ctrl+f "monetization"
ctrl+f "moderation"
ctrl+f "existing infrastructure"
ctrl+f "personalization"
etc etc
Yeah about what I expect from a "we rebuilt twitter for cheap" post. There's no point to the comparisons with the Twitter codebase size/cost. It completely distracts from what is probably a perfectly fine project.
by FridgeSeal on 8/15/23, 9:17 PM
by jonstewart on 8/16/23, 2:05 AM
I see “microbatching” in the diagram and, maybe this isn’t a fair take, but it feels more 2013 than 2023.
by Ryan_HD on 8/19/23, 2:46 PM
I guess most people can't accept that some things are fundamentally harder in such an architecture than in normal ones.
by j45 on 8/15/23, 7:27 PM
Clever architecture can help as much if not more than clever coding especially when keeping it simple but scalable is needed.
by DigitalSea on 8/15/23, 10:33 PM
by RHSman2 on 8/16/23, 6:47 AM
by romgrk on 8/16/23, 9:24 PM
But I won't ever consider investing in it unless it's some form of open-source. It's too much of a risk to have a closed-source core.
by elwell on 8/16/23, 4:47 PM
EDIT: Oh, I see in comments: "The customer API in Java, and the implementation of that API is in Clojure"
by runeks on 8/16/23, 7:51 AM
Is this the case — ie. would a TodoMVC app implemented in Rama also be much simpler than a traditional frontend/backend/database CRUD implementation?
by chiefalchemist on 8/15/23, 9:19 PM
X years from now "We reduced the cost of building _____ at Mastodon-scale by 1000x".
It's certainly interesting, certainly an accomplishment, but it's also the nature of the game. The present eating the past, to be eaten by the future. Rinse. Repeat.
by stuaxo on 8/15/23, 8:11 PM
by crenwick on 8/16/23, 12:28 AM
Is there any rough infrastructure cost comparison?
Excluding the cost of engineering effort, which I understand is the major pitch.
by 2Gkashmiri on 8/15/23, 7:05 PM
is it baremetal?
vps?
how about doing a comparison on a consumer-grade vps, like a 1 vcpu/4GB ram setup, between your product and mastodon or pleroma for example?
i mean sure you can build a twitter scale product but federation means people can do that on their own and with your tech, they don't have to worry about scaling issues.
by boredumb on 8/15/23, 6:18 PM
by kennydude on 8/16/23, 9:55 AM
You built a Mastodon-compatible clone in Spring/Reactor.
by freecodyx on 8/16/23, 12:28 AM
by ketang on 8/16/23, 2:10 PM
by rugina on 8/15/23, 11:25 PM
by lionkor on 8/15/23, 7:26 PM
Why Java?
by throwaway892238 on 8/16/23, 12:08 AM
by mlindner on 8/16/23, 2:43 AM
by say_it_as_it_is on 8/15/23, 8:51 PM
by itissid on 8/15/23, 9:19 PM
by ceejayoz on 8/15/23, 6:12 PM
Article: "building Mastodon at sub-Twitter-scale"
by phillipcarter on 8/15/23, 6:28 PM
by polishdude20 on 8/15/23, 6:43 PM
Nono, you can't say that when later on you say it's built on top of Rama. You literally spent 10 years building the framework to even make this.
And yes, you built this in 10k lines of code but how many lines of code is Rama? This seems disingenuous.
by sandGorgon on 8/15/23, 6:32 PM
by reset2023 on 8/16/23, 12:22 AM
by sourcecodeplz on 8/15/23, 7:20 PM
by LeifCarrotson on 8/15/23, 6:17 PM
> You can begin to understand this by starting with a simple observation: you can describe Mastodon (or Twitter, Reddit, Slack, Gmail, Uber, etc.) in total detail in a matter of hours. It has profiles, follows, timelines, statuses, replies, boosts, hashtags, search, follow suggestions, and so on. It doesn’t take that long to describe all the actions you can take on Mastodon and what those actions do. So the real question you should be asking is: given that software is entirely abstraction and automation, why does it take so long to build something you can describe in hours?
> At its core Rama is a coherent set of abstractions...
This conclusion is alarming to read from a company that's trying to sell a new platform. The vast majority of the work in building Twitter or Reddit is not about building a coherent set of abstractions, it's working with an often incoherent reality, dealing with a myriad of laws that describe, as if your web app were a human clerk at a post office, how to handle PII and credit cards and CSAM filters and audits and copyright claims and on and on...
I'm honestly shocked that the technical implementation of a simplified, coherent platform took a full 9 person-months. That shouldn't be the hard part. What I'd want to know as a prospective customer is how you handle exceptions to your beautiful, idealized architecture, when some foreign country requires that you only store comments posted by their citizens within their borders or something like that.
by riffic on 8/15/23, 6:09 PM
https://joinmastodon.org/trademark
removed part about the mastodon subreddit since this is clearly not about the Mastodon software per se.
by throwaway7382 on 8/15/23, 6:32 PM
Move along, nothing to see here.
by trollied on 8/15/23, 6:18 PM
+ the time spent creating Rama, the platform that enables it.
Very dishonest leaving that out.
by MisterBastahrd on 8/15/23, 6:22 PM
I work in marketing automation, and I guess I have in one way or another my entire career. The clients who need to use the platform to communicate with their own clients over social networking may never touch our print delivery system, but that doesn't mean that print delivery doesn't exist or isn't important.
If you are unwilling to recreate the totality of the application in terms of functionality, then you are lying if you say that you have recreated it.