from Hacker News

The Ingredients of a Productive Monorepo

by mifydev on 5/25/25, 10:49 AM with 261 comments

  • by bob1029 on 5/28/25, 9:16 AM

    This thread is reminding me of a prior one about complexity merchants. I am seeing a lot of sentiment that there is somehow a technical sacrifice by moving to a monorepo.

    This is absolutely ludicrous unless you fail to grasp the power of a hierarchical file system. I don't see how a big mess like CI/CD is made easier by spreading it out to more points of configuration.

    To me the whole point of a monorepo is atomic commits for the whole org. Contrary to many claims, the power of this is really hard to overstate when you are trying to orchestrate the efforts of lots of developers. Rebasing in one repo and having one big meeting is a hell of a lot easier than doing it N times.

    Even if the people on the team hate each other and refuse to collaborate directly, I still don't see a reason not to monorepo. In this scenario, the monorepo becomes a useful management and HR tool.

  • by lxe on 5/28/25, 5:12 AM

    So there are 2 kinds of big tech monorepos.

    One is the kind described in the article here: "THE" monorepo of (mostly) the entire codebase, requiring custom VCS, custom CI, and a team of 200 engineers supporting this whole thing. Uber and Meta and I guess Google do it this way now. It takes years of pain to reach this point. It usually starts with the other kind of "monorepo":

    The other kind is the "multirepo monorepo" where individual teams decide to start clustering their projects in monorepos loosely organized around orgs. The frontend folks want to use Turborepo and they hate Bazel. The Java people want to use Bazel and don't know that anything else really exists. The Python people do whatever Python people do these days after giving up on Poetry, etc... Eventually these might coalesce into larger monorepos.

    Either approach costs millions of dollars and millions of hours of developers' time and effort. The effort is largely defensible to the business leaders by skillful technology VPs, and the resulting state is mostly supported by the developers who chose to forget the horror that they had to endure to actually reach it.

  • by AlotOfReading on 5/28/25, 5:35 AM

    One thing I don't usually see discussed in monorepo vs multi repo discussions is there's an inverse Conway's law that happens: choosing one or the other will affect the structure of your organization and the way it solves problems. Monorepos tend to invite individual heroics among common infrastructure teams, for example. Because there are so many changes going in at once, anything touching a common area has a huge number of potential breakages, so effort to deliver even a single "feature" skyrockets. Doing the same thing in a multi-repo may require coordinating several PRs over a couple of weeks and some internal politics, but that might also be split among different developers who aren't even on a dedicated build team.

  • by Flux159 on 5/28/25, 4:00 AM

    This definitely tracks with my experience in big tech - managing large-scale build systems ends up taking a team that works on the build system itself. The underlying repo technology itself needs to work at scale - and that was with a virtual file system that downloaded source files on demand when you needed to access them.

    One thing that this article didn't mention is that most development was done either on your development server running in a datacenter (think ~50-100 cores) - or on an "on demand" machine that was like a short-lived container that generally stayed up to date with known-good commits every few hours. The IDE was integrated with the devservers/machines, and language servers and other services were generally prewarmed or automatically set up via Chef/Ansible, etc. Rarely would you want to run the larger monorepos on your laptop client (exceptions would generally be mobile apps, macOS apps, etc.).

  • by bittermandel on 5/28/25, 9:55 AM

    I firmly believe that going for a strict monorepo built with Bazel has been paramount to us at Molnett (serverless cloud) being able to build the platform with a small team of ~1.5 full-time engineers.

    We can start the entire platform, Kubernetes operators and all, locally on our laptops using Tilt + Bazel + Kind. This works on both Mac and Linux. This means we can validate essentially all functionality, even our Bottlerocket-based OS with Firecracker, locally, without requiring a personal development cluster or the like.

    We have built a tool layer which means that if I run `go` or `kubectl` while in our repo, the tool is built and provided by Bazel itself. This means that all of us are always on the same versions of tools, and we never have to maintain local installations.
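
    A minimal sketch of one way such a shim can work - this is an illustration, not Molnett's actual setup, and the `//tools:go` target name is hypothetical. The idea: a tiny wrapper installed on your PATH as `go` forwards every invocation to a Bazel-provided binary, so everyone runs the pinned version.

        // toolshim.go - hedged sketch: assumes a hypothetical Bazel target
        // //tools:go that wraps the pinned Go toolchain. Routing invocations
        // through `bazel run` keeps every developer on the same tool version.
        package main

        import (
            "fmt"
            "os"
            "os/exec"
        )

        func main() {
            // Forward all CLI arguments to the Bazel-provided tool.
            args := append([]string{"run", "//tools:go", "--"}, os.Args[1:]...)
            cmd := exec.Command("bazel", args...)
            cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
            if err := cmd.Run(); err != nil {
                // Propagate the tool's own exit code if it ran but failed.
                if exitErr, ok := err.(*exec.ExitError); ok {
                    os.Exit(exitErr.ExitCode())
                }
                fmt.Fprintln(os.Stderr, err)
                os.Exit(1)
            }
        }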

    It's been a HUGE blessing. It has taken some effort and will take continuous effort, and to be fair it has been crucial to have an ex-Google SRE on the team. I would never want to work any other way in the future.

    EDIT: To clarify, our repo is essentially only Golang, Bash and Rust.

  • by rwieruch on 5/28/25, 9:51 AM

    Over the past four years, I’ve set up three monorepos for different companies as contract work. The experience was positive, but it’s essential to know your tools.

    Since our monorepos were used exclusively for frontend applications, we could rely entirely on the JavaScript/TypeScript ecosystem, which kept things manageable.

    What I learned is that a good monorepo often behaves like a “polyrepo in disguise.” Each project within it can be developed, hosted, and even deployed independently, yet they all coexist in the same codebase. The key benefit: all projects can share code (like UI components) to ensure a consistent look and feel across the entire product suite.

    If you're looking for a more practical guide, check out [0].

    [0] https://www.robinwieruch.de/javascript-monorepos/

  • by yc-kraln on 5/28/25, 8:39 AM

    The answer, of course, is "it depends".

    We have something like ~40 repos in our private GitLab, and each one has its own CI pipeline, which compiles, runs tests, builds packages for distribution, etc. Then there's a CI task which integrates a file system image from those ~40 repos' packages, runs integration tasks, etc.

    Many of those components communicate with each other via flatbuffers-defined messages, which of course are themselves in a submodule. Luckily, flatbuffers allows for progressive enhancement, but I digress -- essentially, these components have interdependencies which, at the absolute latest, surface at the integration phase.

    Is this actually a multi-repo, or is it just a mono-repo with lots of submodules? Would we have benefits if we moved to a mono-repo (the current round-trip CI time for full integration is ~35 minutes, while many of the components compile and test in under 10s)? Maybe.

    Everything is a tradeoff. Anything can work, it's about what kinds of frustrations you're willing to put up with.

  • by lihaoyi on 5/28/25, 6:51 AM

    I wrote a bit about monorepo tooling in this blog post. It covers many of the same points in the OP, but in a lot more detail.

    - https://mill-build.org/blog/2-monorepo-build-tool.html

    People like to rave about monorepos, and they are great if set up correctly, but there are a lot of intricacies that go on behind the scenes to make a monorepo successful, and they're easy to overlook since usually some "other" team (devops team, devtools team, etc.) is shouldering all that burden. Still worth it, but it must be approached with caution.

  • by tayo42 on 5/28/25, 4:32 AM

    Working with a well-maintained monorepo is so nice that any other workflow just sucks to go back to. Working with a "let's do a monorepo" monorepo, where whoever set it up didn't understand the points in this article (and more), is a nightmare.

    I think this is a business opportunity: someone could sell the polished monorepo experience and tools to companies that have engineering organizations but can't pull off a successful "we need to fork git" project to support their developers.

  • by spankalee on 5/28/25, 9:00 AM

    I love monorepos, but in large organizations they create a counter-intuitive incentive for teams to _not_ allow other teams to depend on them, which can _reduce_ code reuse - the opposite of what some adopters want.

    The issue is that users of a library can put almost infinite friction on the library. If the library team wants to make a change, they have to update all the use sites, and Hyrum's Law will get you, because users will do the damndest things.

    So for the top organization, it's good if many other teams can utilize a great team's battle-tested library, but for the library team it's just liability (unless making common code is their job). In a place like Google you either end up with internal copies and forks, strict access control lists, or libraries that are slow as molasses to change.

  • by gorgoiler on 5/28/25, 9:34 AM

    An unspoken truth of a monorepo is that everyone is committed to developing on trunk, and trunk is never allowed to be broken. The consequence of this is that execution must be configurable at runtime: feature flags and configuration options with old and new code alongside each other.
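
    Concretely, "old and new code alongside each other" tends to look something like this minimal sketch (a hedged illustration; the flag name and functions are hypothetical):

        package main

        import (
            "flag"
            "fmt"
        )

        // Runtime flag: trunk always contains both code paths, and the flag
        // decides which one executes, so every commit stays releasable.
        var useNewCheckout = flag.Bool("use_new_checkout", false,
            "route orders through the rewritten checkout flow")

        func checkout(items int) string {
            if *useNewCheckout {
                return checkoutV2(items) // new code, dark-launched behind the flag
            }
            return checkoutV1(items) // old code keeps v1 behavior available
        }

        func checkoutV1(items int) string { return fmt.Sprintf("v1 checkout: %d items", items) }
        func checkoutV2(items int) string { return fmt.Sprintf("v2 checkout: %d items", items) }

        func main() {
            flag.Parse()
            fmt.Println(checkout(3))
        }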

    You can have a monorepo and still fail if every team works on their own branch and then attempts to integrate into trunk the week before your quarterly release process begins.

    You can fail if a core team builds a brand new version of the product on master with all new tests such that everything is green on every commit, but your code is unreleasable because customers aren’t ready for v2 and you need to keep that v1 compatibility around.

  • by jonthepirate on 5/28/25, 3:36 PM

    I'm on the build team at DoorDash. We're in year 1 of our Bazel monorepo journey. We are heavy into Go, already have remote execution and caching working, and are looking to add support for Python & C++ soon.

    If this sort of stuff happens to be something you might want to work on, our team has multiple openings... if you search for "bazel" on our careers page, you'll find them.

  • by jph on 5/28/25, 3:57 AM

    Good practical article, thank you. I've added the link to my monorepo-vs-polyrepo guide here: https://github.com/joelparkerhenderson/monorepo-vs-polyrepo/

  • by KaiserPro on 5/28/25, 11:31 AM

    One of the things not covered here is how to deal with versioning.

    By default a monorepo will give you $current and nothing else.

    A monorepo is not a bad idea, but you should think about either preventing breaking changes in some dependency from killing the build globally, or having some sort of artefact store that allows versioned libraries (both have problems; you'll need to work out which is better for you).

  • by ecoffey on 5/28/25, 3:53 PM

    Monorepo is one of the few things I’ve drunk the koolaid on. I joke that the only thing worse than being in a monorepo is not being in one.

  • by chrismatic on 5/28/25, 10:07 AM

    The point about trying to stick with single-language build tooling really cannot be stressed enough. It is what prompted me to write a simplified version of Bazel - a generic "target determinator" with caching capabilities, if you will. I call it "Grog", the monorepo build tool for the grug-brained developer.

    https://grog.build/why-grog/

  • by cormacrelf on 5/28/25, 9:41 AM

    > Meta has a sophisticated implementation of a target determinator on top of buck2, but I don’t believe it is open-source.

    It is: https://github.com/facebookincubator/buck2-change-detector

    > Some tools such as bazel and buck2 discourage you from checking in generated code and instead run the code generator as part of the build. A downside of this approach is that IDE tools will be unable to resolve any code references to these generated files, since you have to perform a build for them to be generated at all in the first place

    Not an issue I have experienced. It's pretty difficult to get into a situation where your IDE is looking in buck-out/v2/gen/781c3091ee3/... for something but not finding it, because the only way it knows about those paths is by the build system building them. Seeing this issue would have to involve stale caches in a still-running IDE after cleaning the output folder, which is a problem any size repo can have. In general, if an IDE can index generated code with the language's own build system, then it's not a stretch to have it index generated code from another one.

    The problem is more hooking up IDEs to use your build system in the first place. It's a real slog to support many IDEs.

    Buck recently introduced an MSBuild project generator where all build commands shell out to buck2. I have seen references to an Xcode one, and I think there's something for Android as well. The rust-analyzer support works pretty well, but I do run a fork of it. Those are just a few. There is a need (somewhat like LSP, but not quite) for a degree of standardization. There is a Cambrian explosion of different build systems, and each company that maintains one of them only uses one or two IDEs and integrates with those. If you want to use a build system with an IDE they don't support, you are going to have a tough time. Last I checked, the best effort by a language server implementation at being build-system agnostic is gopls with its "gopackagesdriver" protocol, but even then I don't think anyone but Bazel has integrated with it: https://github.com/bazel-contrib/rules_go/wiki/Editor-and-to...

  • by zvr on 5/28/25, 9:06 AM

    Genuine question, because I've never worked somewhere with a monorepo infrastructure: is it really "one repo for all code in the organization" or "one repo for everything related"?

    In my organization we have around 70k internal git repos (and an order of magnitude fewer public ones), but of course not everything is related to everything else; we produce many distinct software products. I can understand "collect everything of a product into a single repo"; I can even understand going to "if there is a function call, that code has to be in the same repo". But putting everything into a single place... What are the benefits?

  • by spankalee on 5/28/25, 8:52 AM

    For those of you working in Node and npm, npm has pretty good built-in support for monorepos now with the workspaces feature. The big missing thing is incremental builds, which I highly recommend looking at Google's Wireit project for: https://github.com/google/wireit/

    Wireit is the smallest change from plain npm that gets you a real dependency graph of scripts, caching (with GitHub Actions support), incremental script running, and services.

  • by cloogshicer on 5/28/25, 8:31 AM

    Here's what I never got about monorepos:

    Imagine you have an internal library and also two consumers of that library in the repo. Then you make breaking changes to the library, but you only have time to update one of the consumers. Now how can the other consumer still use the old version of that library?
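
    For what it's worth, the usual answer in HEAD-only monorepos is that the other consumer can't keep using the old version - instead the library carries a compatibility shim until the last consumer migrates. A hedged sketch (all names hypothetical):

        package main

        import "fmt"

        // The breaking change: Render now requires an explicit format.
        func Render(data, format string) string {
            return fmt.Sprintf("[%s] %s", format, data)
        }

        // Deprecated: the old signature, kept as a thin wrapper over the new
        // API so the consumer that hasn't migrated yet still builds at HEAD.
        func RenderLegacy(data string) string {
            return Render(data, "text")
        }

        func main() {
            fmt.Println(Render("report", "html")) // consumer A, already migrated
            fmt.Println(RenderLegacy("report"))   // consumer B, not migrated yet
        }

    The alternative is duplicating the package (a frozen lib/v1 next to a moving lib/v2), which is closer to the versioned-artifact approach other comments mention.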

  • by bigbuppo on 5/28/25, 3:09 PM

    It's kind of weird that both Microsoft and Google were using Perforce. What did Perforce do that worked well at those companies for so long, and what caused them to dump it? Did they just get tired of the licensing cost?

    I think what I'm getting at is that maybe the real missing feature isn't whatever it is that allows you to make stupidly large monorepos; maybe we should instead add Perforce's client workspace model as a git extension?

  • by woile on 5/28/25, 6:56 AM

    I've been very happy with Nix. I've been using Nix in the reciperium.com monorepo - granted, it's only me, but I'm quite happy with having everything there: from docs, to the infra with Terraform, to frontend and backend. The procedure for the CI is quite straightforward (nix build .#project), and caching the dependencies in the CI works quite okay. Even the secrets are there, encrypted using age (might not be the best, but good enough).

  • by vinnymac on 5/28/25, 6:06 AM

    I established monorepos for the last two large projects I operated. I’ve never heard such nice compliments from contributors in my whole career. It seems it's not only a productivity booster; people genuinely love it when things are easy to grok and painless.

    Multiple large monorepos in an organization are highly valuable imo, and should become more of a thing over time.

  • by boxed on 5/28/25, 10:30 AM

    The article links to a site with this definition:

    > A monorepo is a single repository containing multiple distinct projects, with well-defined relationships.

    It would be better if there were terms that delineated "one repo for the company" from "one repo per project" from "many repos for a single project".

  • by slippy on 5/28/25, 8:36 AM

    It's also worth noting that in systems that get as large as Google's, you end up with commits landing around the clock. It gets to the point where it's impossible to test everything for an individual commit, so you have a second kind of test run that launches all tests for all branches and monitors their status. At Google, we called this the Test Automation Platform (TAP). One cool thing was that it continuously started a new testing run of all testable builds every so often -- say, every 15 minutes -- and then your team had a status, based on flaky vs. solid test failures, showing whether anyone in any dependency broke your code.

    So if your code is testing fine, and someone makes a major refactor across the main codebase, and then your code fails, you have narrowed the commit window to only 15 minutes of changes to sort through. As a result, people who commit changes that break more things than their pre-commit testing could feasibly cover can validate their commits after the fact.

    There's always some amount of uncertainty with any change, but the test it all methodology helps raise confidence in a timely fashion. Also decent coding practices include: Don't submit your code at the end of the day right before becoming unavailable for your commute...

  • by scrubs on 5/31/25, 7:06 PM

    The OP had a point to make, then made it. It's refreshing. And, moreover, I'm smarter for it. Well done. Thank you for posting it. As readers may see from my recent comments elsewhere, there's a ton of junk out there. But when the good stuff arrives, one likewise stops and says so.

  • by s17n on 5/28/25, 7:39 PM

    If you've got fewer than 100 engineers, you aren't going to hit any of the scalability issues, and there's literally no downside to a monorepo.

  • by nc0 on 5/28/25, 7:50 PM

    For people interested in a good VCS to achieve such monorepos, have a look at Ark [0]. It works really well for huge codebases, it is really fast - faster than Perforce Helix - and it has an ethical and respectful pricing scheme with a self-hosting mentality. Also, it's indie, which is typically better than greedy corporate.

    [0]: https://ark-vcs.com

  • by nssnsjsjsjs on 5/28/25, 8:44 AM

    > Any operation over your repository that needs to be fast must be O(change) and not O(repo).

    This is a good thought! It actually needs to be O(1/commit rate) though (i.e., the operation must finish faster than new commits arrive), so that the monorepo doesn't create long queues of commits.

    Or have some process batch passing, ready-to-merge PRs into a combined PR and try to merge that, with a best guess at the failing PR if it fails.
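
    A hedged sketch of that batching idea (names and structure hypothetical): run CI once over the combined batch, and only bisect when the batch fails, assuming deterministic tests and a single culprit.

        package main

        import "fmt"

        type PR struct {
            ID     int
            broken bool // stand-in for "this PR would fail CI"
        }

        // testBatch stands in for one combined CI run over a set of PRs.
        func testBatch(batch []PR) bool {
            for _, pr := range batch {
                if pr.broken {
                    return false
                }
            }
            return true
        }

        // findCulprit bisects a failing batch: about log2(n) CI runs instead
        // of one run per PR. Assumes exactly one culprit for simplicity.
        func findCulprit(batch []PR) PR {
            for len(batch) > 1 {
                mid := len(batch) / 2
                if !testBatch(batch[:mid]) {
                    batch = batch[:mid] // culprit is in the first half
                } else {
                    batch = batch[mid:] // first half is clean; look in the second
                }
            }
            return batch[0]
        }

        func main() {
            batch := []PR{{1, false}, {2, false}, {3, true}, {4, false}}
            if testBatch(batch) {
                fmt.Println("merge the whole batch as one combined change")
                return
            }
            fmt.Printf("batch failed; best guess at culprit: PR #%d\n", findCulprit(batch).ID)
        }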

  • by baq on 5/28/25, 5:29 AM

    Perfect write-up. Rarely do I nod and murmur 'yes' and 'finally someone has written about it' alternately on each paragraph.

  • by ianpurton on 5/28/25, 5:46 AM

    I've never worked on a monorepo that has the whole organization's code in it.

    What are the advantages vs. having a monorepo per team?

  • by l5870uoo9y on 5/28/25, 6:53 PM

    Separating out the database layer into its own monorepo package was the best architectural decision I made this year. Now it's my default, because at some point you either want to rebuild the existing app entirely or separate out services, such as public API access, that all need access to the same database.
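
    A sketch of that shape (a hedged illustration; the package layout and schema are hypothetical): the database layer lives in one package that owns the queries, and both the app and the public API service import it instead of talking to the database ad hoc.

        // Package db is the single shared database layer:
        //
        //   monorepo/
        //     packages/db/        <- schema, migrations, typed queries (this package)
        //     apps/web/           <- imports packages/db
        //     services/publicapi/ <- imports packages/db
        package db

        import "database/sql"

        // Store wraps the shared database handle behind a typed API so every
        // consumer runs the same queries against the same schema.
        type Store struct {
            DB *sql.DB
        }

        // UserCount is one example query exposed to all consumers.
        func (s *Store) UserCount() (int, error) {
            var n int
            err := s.DB.QueryRow(`SELECT COUNT(*) FROM users`).Scan(&n)
            return n, err
        }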

  • by kfkdjajgjic on 5/28/25, 6:53 AM

    The article doesn’t bring it up, but I’ve seen several places where repos were cut along company silos: application code in one monorepo for all teams, IaC in another monorepo for all teams, and ops in a third monorepo for all teams. It was not good at all.

  • by cousin_it on 5/28/25, 12:21 PM

    I've worked for a company with a large monorepo. At first I was a fan, but now I'm not so sure. The web of dependencies was too much. Now I think teams should be allowed to reuse other teams' code only as libraries or APIs with actual release cycles. There shouldn't be any "oh let's depend on the HEAD of this random build target somewhere else in the monorepo". There should be only "let's depend on a released version of such-and-such library or API".

    If you adopt this discipline, you basically don't need a monorepo. Every team can have its own repo and depend on other stuff as third party. This adds some friction, but removes some other kinds of friction, and overall I think it's a better compromise.

  • by calvinmorrison on 5/28/25, 1:10 PM

    One to look at historically was KDE using SVN.

    For all the downsides of SVN, the partial checkout was great for a repo containing practically the entire KDE source tree.

  • by joaonmatos on 5/29/25, 1:01 PM

    As an Amazon employee, this is the kind of discussion that makes me glad we have the Brazil build system.
  • by jbverschoor on 5/28/25, 6:19 AM

    Is there a way to set permissions on certain directories / force partial clones?

    Not just a sparse clone.

  • by pawanjswal on 5/28/25, 5:52 AM

    It felt like a pep talk and a reality check rolled into one.

  • by v3ss0n on 5/28/25, 4:19 PM

    A monorepo in the AI-driven development world is a disaster. The context consumption is gonna be through the roof.

  • by countWSS on 5/28/25, 11:58 AM

    From the viewpoint of security and separation of concerns, giving everyone unlimited access to everything by virtue of "everything" being stored in one giant repo sounds exceptionally short-sighted. A single rogue actor would be able to insert code into any component of their choice, instead of working in an isolated repo with people who specifically know it and approve the code. The monorepo becomes a "big ball of mud" with vague shared responsibility that defers to the people who worked on "specific parts" even though they lack any authority or control, and auditing the entire codebase doesn't scale.