by jdorfman on 3/18/24, 8:00 PM with 62 comments
by conjecTech on 3/18/24, 9:08 PM
by airstrike on 3/18/24, 8:49 PM
At some point, I don't know, maybe when you cross the 100 repos mark, you've gotta ask yourself "maybe we could try a different approach?"
It's not like reddit has been known for its wonderful stability over the years.
I'm sure the scale here is completely unlike anything I've ever worked on, but how hard can it be to write a sane implementation of a message board?
I'd be curious how much of this problem is caused by the junk that is "new reddit". I've been there since 2007... The day old.reddit.com goes away is the day I abandon it for good.
by mebazaa on 3/18/24, 8:49 PM
by tayo42 on 3/18/24, 8:34 PM
I know they did a lot with git to make it manageable. Hopefully whatever they did makes it to the open source world eventually so we can all avoid these crazy thousand-repo worlds.
by heads on 3/18/24, 10:31 PM
A lot of the old guard have left the company though and our main product moved from four repos to just one. The threat from the legal team to have enforced OWNERS files — essentially replicating the divisive politics of the old repos but in the monorepo — thankfully withered on the vine. We still audit what goes into each release but it’s no longer part of any active permissions thing. We trust our developers but verify, for legal reasons, that nothing went wrong.
You either want one engineering team to act in unison behind your company’s mission, or you want to live a divisive narrative that you are actually multiple teams “working” together with none of the advantages of living under one roof and all the disadvantages of hard repository boundaries crisscrossing your intellectual property.
So many factors threaten to curdle your team dynamic: multiple offices, multiple floors, work-from-home hermits, bad management, etc. It’s simply org entropy and it takes much effort to keep the weeds out of the garden. Multiple repositories are one more way to shoot yourself in the foot while fighting all the other battles that threaten to turn your team from 1990s Sun Microsystems into 2010 Sun Microsystems.
by IshKebab on 3/18/24, 9:18 PM
Though I always wonder - how do Google, Microsoft, Facebook etc. deal with developing code near the root of their dependency tree? Utility libraries for example. Technically you're going to have every change you make there building all the code and running all the tests, which is obviously unworkable. What do they do?
by miduil on 3/18/24, 8:25 PM
by nolist_policy on 3/18/24, 9:40 PM
by ivanjermakov on 3/18/24, 9:12 PM
by ydnaclementine on 3/18/24, 8:48 PM
by sethammons on 3/18/24, 9:07 PM
In our monorepo, everyone passes around Django ORM objects and boundaries are practically non-existent. N+1 queries abound. Tests are full of patching and mocking and are _slow_. Our build takes over an hour to run tests. Someone on team A can and absolutely will mess up what someone on team B is doing. We are now having to spend quarter upon quarter defining and enforcing domain boundaries within the Python code base. It is all bolted-on checks. Tests are getting worse and people are actively trying to figure out ways around the testing system because it sucks.
Compare that to my last gig. We had several hundred production repos. Each repo started from a template with its own build pipeline. All production repos were gated so that any PR had to pass tests before it could merge. Any merge had to pass tests before it could be deployed. As the base build processes matured, teams could, at their leisure, pull their services up to the latest and greatest. We even migrated from Jenkins to Buildkite; yeah, it took N pulls into N repos. Not a big deal. Most projects' tests and builds could get code out to production in under 10 minutes, including all those checks. Due to the network boundary, you couldn't accidentally get around someone's abstraction. And if one team blew up their build doing something dumb? No problem, it only affected that one team.
The argument is "gah, managing all those services!" Keep data behind APIs. Keep APIs backwards compatible. Keep dependencies acyclic. This is _possible_ with monorepos, but you have to do extra work compared to networked services. And yes, when any particular team/service can deploy in minutes due to low build system complexity, you are winning. Can you get that wrong and make strange cyclic dependencies and introduce performance issues due to network hops? Yeah, of course. However, we were processing, literally, tens of billions of API requests on this system and teams could work untethered from one another. The new gig does eerily similar software, but is several orders of magnitude slower in its ability to process data and its ability to move new features.
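For what it's worth, the "keep dependencies acyclic" rule is cheap to check mechanically. A minimal sketch, assuming you maintain some map of service to the services it calls; the `SERVICE_DEPS` map and the `find_cycle` helper are made up for illustration, not anything from either gig. It leans on `graphlib` from the Python standard library (3.9+):

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: service -> services it depends on (illustrative names only).
SERVICE_DEPS = {
    "web": {"accounts", "billing"},
    "billing": {"accounts"},
    "accounts": set(),
}


def find_cycle(deps):
    """Return a list of nodes forming a cycle, or None if the graph is acyclic."""
    try:
        # prepare() raises CycleError if the graph contains any cycle.
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        # The second element of args is the detected cycle; its first and
        # last nodes are the same.
        return err.args[1]
    return None
```

Something like this could run as a CI step in either model; a non-`None` result means somebody introduced a cycle.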
Yes, yes, you could have networked services and a monorepo, and you can leverage tooling like Pants to limit testing to what the changed files affect. It is just fighting what I have found to be a better model. Keep things separate. Keep things fast to change.
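To make the "only test what changed" idea concrete, here is a rough sketch of the general technique: invert the dependency graph and walk outward from the changed targets. This is not how Pants actually implements it; the `affected_targets` function and the target names are assumptions for illustration.

```python
from collections import defaultdict, deque


def affected_targets(deps, changed):
    """Return the changed targets plus everything that transitively depends on them.

    deps maps each target to the set of targets it depends on.
    """
    # Invert the graph: for each target, which targets depend on it?
    dependents = defaultdict(set)
    for target, target_deps in deps.items():
        for dep in target_deps:
            dependents[dep].add(target)

    # Breadth-first walk from the changed targets through their dependents.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical monorepo layout (illustrative names only).
DEPS = {
    "web": {"billing", "utils"},
    "reports": {"billing"},
    "billing": {"utils"},
    "utils": set(),
    "docs": set(),
}
```

With this layout, a change to `billing` selects `billing`, `web`, and `reports`, while a change to `utils` selects everything except `docs`. That second case is exactly the pain point: the closer a change is to the root of the tree, the less the selection saves you.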
by ZephyrBlu on 3/18/24, 8:33 PM
by MilStdJunkie on 3/18/24, 8:38 PM
I don't think someone knows what "repository" means.
At least they're bringing in Sourcegraph. That tool's helped me make sense of some chaos. Not 2000 repos' worth of chaos, but still, some chaos.
by hackmiester on 3/18/24, 8:37 PM