by jamesfisher on 1/30/23, 10:04 AM with 213 comments
Lots of brain cycles are spent on "programming language theory". We've roughly figured out the primitives required to express real-world computation.
In contrast, we apparently have no "package management theory". We have not figured out the primitives required to express dependencies. As a result, we keep building new variants and features, until we end up with <script>, require(), import, npm, yarn, pnpm, (py)?(v|virtual|pip)?env, (ana)?conda, easy_install, eggs and wheels ...
Is it just a "law of software" that this must happen to any successful language? Or are there examples of where this it has not happened, and what can we learn from them? Is there a "theory of package management", or a "lambda calculus of package management" out there?
by tazjin on 1/30/23, 11:08 AM
We have a good hunch. The basic theory behind Nix definitely goes in the right direction, and if we look away from all the surface-level nonsense going on in Nix, it's conceptually capable (e.g. [0]) of being a first-class language dependency manager.
For this to work at scale we'd need to overcome a couple of large problems though (in ascending order of complexity):
1. A better technical implementation of the model (working on it [1]).
2. A mindset shift to make people understand that "binary distribution" is not a goal, but a side-effect of a reasonable software addressing and caching model; a sketch of this idea follows at the end of this comment. Without this conceptual connection, everything is 10x harder (which is why e.g. Debian packaging is completely incomprehensible - their fundamental model is wrong).
3. A mindset shift to make people understand that their pet programming language is not actually a special snowflake. No matter the size of your compilation units, whether you call modules "modules", "classes" or "gorboodles", or whether you allow odd features like mutually recursive dependencies and arbitrary code execution at build time: your language fits into the same model as every other language. You don't have to NIH a package manager.
This last one is basically impossible at the current stage. Maybe it will change somewhere down the line, if we manage to establish such a model successfully in a handful of languages and people see the results for themselves; for now we just have to hold out.
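A minimal sketch of the addressing-and-caching idea behind point 2, in Python (the names and hashing scheme are illustrative, not Nix's actual store format): the address of a build output is derived from everything that went into producing it, so "binary distribution" is nothing more than looking that address up in a shared cache.

```python
import hashlib
import json

def derivation_hash(builder: str, sources: dict[str, str], dep_hashes: list[str]) -> str:
    """Hash everything that can influence the build output."""
    payload = json.dumps(
        {"builder": builder, "sources": sources, "deps": sorted(dep_hashes)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

def realise(drv_hash: str, cache: dict[str, bytes], build) -> bytes:
    """Binary 'distribution' falls out of the addressing scheme:
    if someone already built this exact derivation, reuse their output."""
    if drv_hash in cache:
        return cache[drv_hash]   # substitute a prebuilt binary
    output = build()             # otherwise build locally
    cache[drv_hash] = output     # and publish for everyone else
    return output
```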
by jakewins on 1/30/23, 10:44 AM
My experience is that the older-generation languages you mention had to invent package management as they went, made lots of understandable mistakes, and are now stuck in a backwards-compatibility hellscape.
Rust and Go built their packaging story with the benefit of lessons learned from those other systems, and in my experience the difference is night and day.
by ChrisRackauckas on 1/30/23, 10:41 AM
Back when those languages were designed, you'd manually download the few modules you needed, if you downloaded any packages at all. In C you'd normally build your own world, since it came before the www times, and C++ kind of inherited that. But languages which came out later decided that we now live in a world where most of the code that is executed is packages, most likely packages which live on GitHub. So Julia and Rust built this into the language. Julia in particular has Project.toml and Manifest.toml for fully reproducible environments, its package manager simply uses git and lets you grab the full repository with `]dev packagename`, and its package registry system lets you add private registries alongside the public one.
I think the issue is that dependencies are central: you can never remove an old package system, because if that's where the old (and rarely updated) dependencies live, you need to keep it around. But for dependencies to work well, you need all of them to be resolved by the same package system. So package systems don't tend to move very fast in any language; whatever you had early has too much momentum.
by derefr on 1/30/23, 4:38 PM
Have you tried to package a random Python/Ruby/etc. CLI program, for Debian? Or how about for Homebrew? Each one involves a cacophony of PL scripts calling shell-scripts calling PL scripts, and internal/undocumented packager subcommands calling other internal/undocumented packager subcommands. It takes ~forever to do a checked build of a package in these systems, and 99% of it is just because of how spread out the implementation of such checks is over 100 different components implemented at different times by different people. It could all be vastly simplified by reimplementing all the checks and transforms in a single pass that gradually builds up an in-memory state, making assertions about it and transforming it as it goes, and then emitting it if everything works out. You know — like a compiler.
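For contrast, a rough sketch of that single-pass shape in Python (entirely hypothetical names and checks, not any real packager's code): one in-memory state, one ordered list of checks and transforms, one emit step at the end.

```python
from dataclasses import dataclass, field

@dataclass
class PackageState:
    """The one in-memory value every check and transform operates on."""
    name: str
    version: str
    files: dict[str, bytes] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

def normalize_paths(state: PackageState) -> PackageState:
    state.files = {p.replace("\\", "/"): data for p, data in state.files.items()}
    return state

def check_has_license(state: PackageState) -> PackageState:
    if not any(p.lower().startswith("license") for p in state.files):
        state.errors.append("missing LICENSE file")
    return state

PASSES = [normalize_paths, check_has_license]  # one ordered pipeline, not 100 scripts

def build(state: PackageState) -> PackageState:
    for do_pass in PASSES:
        state = do_pass(state)
    if state.errors:
        raise SystemExit("\n".join(state.errors))
    return state  # emit: write the .deb/.whl/... out from this final state
```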
by denton-scratch on 1/30/23, 10:57 AM
There's no reason why a package-management system needs to be language-specific; dependencies are often cross-language. Hell, even some blocks of code contain more than one language.
The package-management system is responsible for deploying packages. The way a package is deployed should depend on the operating environment, not on the language. These language-specific packaging arrangements typically deploy into some private part of the file-system, organized in its own idiosyncratic way.
Using GitHub as a package repository is just nuts. GitHub is privately owned, and stuffed with all kinds of unaudited junk. You can't audit everything you install by hand; so these systems force you to install unaudited code, or simply not install.
I've been using Debian derivatives for years. I appreciate having an audited repository, and an installation system that deploys code to more-or-less predictable filesystem locations.
by perrygeo on 1/30/23, 5:25 PM
We self-organize into communities of practice and select the package management strategy that works best. Reaching across communities to develop a grand centralized strategy that fits everyone's needs would be __possible__, but involves significant communication and coordination overhead. So instead we fracture, and the tooling ecosystem reflects ad-hoc organizational units within the community.
Ecosystems like Rust's cargo that have batteries included from the start have an advantage: virtually all Rust developers have a single, obvious path to package management because of this emergent social organization.
Ecosystems like Python's seem like the wild west: there is deep fracturing (particularly between data science and software engineering) and no consensus on the requirements or even the problems. So Python fractures further, ironically in search of something that can eventually unify the community. And Python users feel the strain of this additional overhead every day, needing to side with a team just to get work done.
I'd argue both of these cases are driven by consequences easily predictable from Conway's Law.
by neilv on 1/30/23, 11:09 AM
But to fully appreciate it, it helps to understand syntax transformation in Racket. Once the rigorous phase system forces non-kludgy static rules about when things are evaluated, your syntax transformers and the code on which they depend could cause a mess of shuffling code among multiple files to solve dependency problems... until you use submodules with the small set of visibility rules, and then suddenly your whole tricky package once again fits in a single file cleanly.
I leveraged this for some of my embedded doc and test experiments, without modifying the Racket core. (I really, really like single-source-file modules that embed doc, test, and package metadata all in the same file, in logical places.)
by jrockway on 1/30/23, 6:21 PM
Python and Node both need a way to compile the code down to a single statically-linked binary like more modern languages (Go, Rust), solving the distribution problem once and for all.
There are module systems that aren't insane, like Go's module system. It uses semantic versioning to mediate version conflicts. Programs can import multiple major versions of the same module. The module requirements files ensure that every checkout of the code gets the exact same bytes of the code's dependencies. The compiler is aware of modules and can fetch them, so on a fresh workstation, "go test ./..." or "go install ./cmd/cool-thing" in a module-aware project works without running any other command first. It is actually so pleasant to use that I always think twice "do I want to deal with modules" before using a language like Javascript or Python, and usually decide "no".
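The "exact same bytes" guarantee is the part that generalizes beyond Go; conceptually it is just a lockfile that records a content hash per dependency and refuses anything that does not match. A rough sketch of that idea in Python (not Go's actual go.sum format or hashing algorithm):

```python
import hashlib

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_download(lockfile: dict[str, str], name: str, version: str, data: bytes) -> bytes:
    """Accept a downloaded dependency only if it matches the pinned hash."""
    key = f"{name}@{version}"
    expected = lockfile.get(key)
    if expected is None:
        raise RuntimeError(f"{key} is not in the lockfile; run the update tool first")
    if content_hash(data) != expected:
        raise RuntimeError(f"{key} does not match the recorded hash; refusing to build")
    return data
```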
npm and pip are the DARK AGES. That's why you're struggling. The community has been unable to fix these fundamental flaws for decades, despite trying in 100 incompatible ways. I personally have given up on the languages as a result. The standard library is never enough. You HAVE to solve the module problem on day 1.
by rpigab on 1/30/23, 11:04 AM
Sometimes languages are released with just a spec and don't want to force any choice of tool or way of doing things on you, so you manage that yourself or create third-party tools. That's where it gets wild in every direction, but it also creates room for new ideas and innovation, which later get used in "official" modern package managers built into the language tooling.
And yeah, nowadays I use Rust even for scripting stuff that would be easier to do in Python, just because I don't want to create a thousandth virtualenv, find a lib that does what I want but is only for Python<3.6 or Python 2, etc., so in the end it's easier in Rust even though some new libs require nightly builds of the toolchain.
by rklaehn on 1/30/23, 5:16 PM
In the long term, I think the whole notion of a "package" is obsolete. In Unison https://www.unison-lang.org/ , code is just a content-addressed Merkle tree. So dependencies are both more precise and more fine-grained than packages.
E.g. if there is a new version of a library, but the part of the library that you actually use is unchanged, unison will detect this and not trigger a full rebuild.
The whole notion of a package becomes less important.
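A toy sketch of that fine-grained content addressing in Python (Unison actually hashes the syntax tree, and these helper names are made up): each definition's hash covers its own code plus the hashes of the definitions it depends on, so a part of a "new version" that didn't change keeps its old identity and nothing built on it needs rebuilding.

```python
import hashlib

def definition_hash(source: str, dep_hashes: list[str]) -> str:
    """Merkle-style: a definition is identified by its own code plus
    the identities of everything it depends on."""
    h = hashlib.sha256(source.encode())
    for dep in sorted(dep_hashes):
        h.update(dep.encode())
    return h.hexdigest()[:12]

# toy 'library': helper() changed in v2, but stable() did not
v1_helper = definition_hash("def helper(): return 1", [])
v2_helper = definition_hash("def helper(): return 2", [])
stable_v1 = definition_hash("def stable(): return 40", [])
stable_v2 = definition_hash("def stable(): return 40", [])

assert stable_v1 == stable_v2   # the part you use is unchanged: same hash, no rebuild
assert v1_helper != v2_helper   # only code that actually changed gets a new identity
```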
by jcarrano on 1/30/23, 3:50 PM
Eventually the shortcomings of the package manager will become more evident, but at first it will seem to work and solve some pain points programmers have.
Something similar goes for build systems.
by namelosw on 1/30/23, 4:17 PM
JavaScript, C/C++, and Python are very old, and their package systems were pretty much bolted on. And of course a lot of historical factors played a big role.
In the old days, people usually started minimal. By the time they got burned, they also found they were locked in - thus the bolt-ons.
> we apparently have no "package management theory"
People are definitely standing on each other's shoulders. Node.js's npm actually felt a lot like Ruby's package tools, but first-party (they also cut some corners and made some design choices - some worked well and some didn't).
E.g. Hex for Elixir feels like nothing new, but it avoided most of the common pitfalls others have experienced. I believe most language designers nowadays can do the same, provided they play it safe and don't attempt anything too novel.
by choeger on 1/30/23, 10:46 AM
It's no coincidence that Rust (which has copied Haskell's type system to a large degree) and Julia (which is essentially a Lisp in disguise) have somewhat sound packaging systems. C/C++ does not even have a tidy core language, let alone proper modules (yes, yes, C++ modules might change that, but they are still a complex mix of templates and normal code).
by Akronymus on 1/30/23, 10:49 AM
Dotnet has NuGet, and it is quite pleasant to use in my experience. I think the big deal is having some standardized way of managing packages as a layer on top. In the dotnet case, it is having the solution pull in the NuGet packages, rather than individual files, along with having a standardized way to index the available packages from each source (in this case, the NuGet JSON index).
by thenerdhead on 1/30/23, 5:27 PM
The author does a great job pointing out the problems, theory, and ecosystem change that makes it a Rube Goldberg machine.
By definition, a package manager is always "incomplete": it cannot catch all the security vulnerabilities, guarantee binary compatibility, or know all the dependencies - let alone their interactions - ahead of time. Thus it can be unsuccessful at managing dependencies - its primary job.
Source: Also work on a notable package manager.
by pastage on 1/30/23, 11:08 AM
I think the world is complicated and you will in the end need to support compilations/installations on AmigaOS 2.05 on m68k running in kubernetes native.
by warrenm on 1/30/23, 5:43 PM
It all boils down to NIHS - "not invented here syndrome"
Between developers being suspicious of anyone else's work, and their innate love of building things they'll use (your wheel is cool, I guess...but MY wheel does THIS!) ...you get a gajillion libraries that overlap 95% of the time (but never on the 5% you "care" about)
Eons ago David Pogue wrote in his review of Word (98, I think) for the Mac, "MacWrite fit on one floppy disk and had 95% of all the features I've ever needed in a word processor. Word takes a CD, and has 96% of all the features I've ever needed in a word processor."
Why is Word so big? Is it because a 'word processor' needs to be that big?
Why are there so many libraries with incredibly weird interdependencies (version 1.9.2a of this lib needs 2.1.x2 of that one, not 2.1.x3 or 2.1.g7; but if you have version 1.9.3 of this lib, you can use 2.1.x2 up through 2.1.x9 of the other one)?
Same thing - somebody somewhere sometime somewhy decided to use that library in their own stuff, and is now forever and always bound to it (...until they refactor or rewrite a perfectly good tool that was in Scala into Haskell)
by tsukikage on 1/30/23, 11:40 AM
Step 1: look at all the existing package+module systems, and realise they are all horrible Rube Goldberg machines, way too bloated for the simple thing you want to do
Step 2: write your own package+module system! It's lean, mean and just does the simple obvious thing your simple one-man project needs.
Step 3: your system hits the real world! Your system is used with projects that involve more than one person and grow organically, and that spread to more and more esoteric environments. The Rube Goldberg projects have Rube Goldberg aims and their target ecosystems have Rube Goldberg requirements while goalposts shift continuously with tight deadlines. You add features and toggles to your system to support these things.
Step 4: congratulations! You are the proud inventor and owner of yet another Rube Goldberg machine. It is now time to move on to better things; the system, meanwhile, is handed over to some committee and will only grow in complexity from here.
by tikkabhuna on 1/30/23, 10:46 AM
I feel like you've picked the 3 worst ecosystems as an example.
Java has a wonderful ecosystem. You can use Maven/Gradle/Bazel as a dependency manager/build tool. All support the Maven repository format. Packages are published as jars and easily consumed. Of course you can still end up in dependency hell and you should take steps to mitigate those issues, but I'm quite happy with it.
> As a result, we keep building new variants and features, until we end up with <script>, require(), import, npm, yarn, pnpm, (py)?(v|virtual|pip)?env, (ana)?conda, easy_install, eggs and wheels ...
IMO the problem here is that there isn't a monopoly and it's easy to swap in a replacement. JavaScript/Python developers are happy (or willing) to try something new, so new alternatives are created. Convincing a Java developer to try a new build system is difficult, so changes are made to existing systems rather than creating new ones.
Go and Rust are two other systems that work well and are (almost?) universally adopted. They come with the language and there is little reason to look around.
by qbasic_forever on 1/30/23, 3:50 PM
The pain and fragmentation you're mentioning comes from everyone having different opinions about how they want to configure/bundle/organize code and package metadata. That's really the only core difference between deb/rpm, npm/yarn, pip/poetry/conda/<a million other python tools>, handmade makefile/cmake/autotools/etc. People just had different ideas about how they wanted to do things at the periphery. At their core all of those tools are just simple DAG walkers.
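"Simple DAG walker" really is the core. A minimal sketch in Python (illustrative, not how any particular tool is implemented): a depth-first, post-order walk over the dependency graph yields an install order in which every package comes after its dependencies.

```python
def install_order(graph: dict[str, list[str]], targets: list[str]) -> list[str]:
    """graph maps a package to the packages it depends on."""
    order: list[str] = []
    visiting: set[str] = set()
    done: set[str] = set()

    def visit(pkg: str) -> None:
        if pkg in done:
            return
        if pkg in visiting:
            raise ValueError(f"dependency cycle through {pkg}")
        visiting.add(pkg)
        for dep in graph.get(pkg, []):
            visit(dep)
        visiting.remove(pkg)
        done.add(pkg)
        order.append(pkg)   # post-order: dependencies first

    for target in targets:
        visit(target)
    return order

# e.g. install_order({"app": ["requests"], "requests": ["urllib3"]}, ["app"])
# -> ['urllib3', 'requests', 'app']
```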
Basically, you're thinking it's a technical problem when in reality it's just a social issue. Some people prefer something one way, others prefer it the other way--there is no consensus (and likely never will be, people will always be building tools to do things their way).
by emmelaich on 1/30/23, 11:02 AM
Honestly, I'd like to have a lowest common denominator of "just dump the contents of the tar/zip archive" here.
The only check required would be to ensure you're not overwriting anything.
Dependencies and PATH management etc could be done out-of-band. That way you would not have to re-download stuff at least.
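A sketch of that lowest common denominator in Python (a hypothetical helper, zip-only, no dependency handling): unpack the archive verbatim and refuse to run if anything would be overwritten.

```python
import zipfile
from pathlib import Path

def dump_archive(archive: str, dest: str) -> None:
    """'Install' = unpack the archive; the only check is no overwriting."""
    dest_dir = Path(dest)
    with zipfile.ZipFile(archive) as zf:
        clashes = [n for n in zf.namelist() if (dest_dir / n).exists()]
        if clashes:
            raise FileExistsError(f"refusing to overwrite: {clashes[:5]}")
        # (a real tool would also reject '..' paths before extracting)
        zf.extractall(dest_dir)  # dependencies, PATH, etc. handled out-of-band
```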
by jonfw on 1/30/23, 2:45 PM
A declarative file (go.mod) that installs the same dependencies on every build, a tool to update that lock file (go mod tidy), and an indication in the source of how each dependency is being used (import).
by karel_3d on 1/30/23, 11:10 AM
The proxy thing and the drama with sr.ht is annoying, but maybe it was solved while I didn't look.
by chubot on 1/30/23, 3:17 PM
Big pyramids of dependencies are inherently fragile (whether dynamically or statically typed)
Dependency inversion can make it so that your dependency tree is exactly 1 deep, and it's dynamically wired together in main()
But most people don't write code in that style. It takes extra effort.
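A small sketch of that style in Python (hypothetical names): application code depends only on an interface it declares, and main() is the one place where concrete implementations are chosen and wired in, so the dependency tree your code sees is exactly one level deep.

```python
from typing import Protocol

class Mailer(Protocol):
    def send(self, to: str, body: str) -> None: ...

class Signup:
    """Depends on the Mailer interface, not on any particular mail library."""
    def __init__(self, mailer: Mailer) -> None:
        self.mailer = mailer

    def register(self, email: str) -> None:
        self.mailer.send(email, "welcome!")

class ConsoleMailer:
    def send(self, to: str, body: str) -> None:
        print(f"to={to}: {body}")

def main() -> None:
    # the only place concrete dependencies are chosen and wired together
    signup = Signup(mailer=ConsoleMailer())
    signup.register("user@example.com")

if __name__ == "__main__":
    main()
```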
by gwbas1c on 1/30/23, 2:56 PM
Not really "package management," but it was straightforward and worked very well... (Until you had to juggle 32/64 bit native dlls.)
by DemocracyFTW2 on 1/30/23, 1:26 PM
[1](https://medium.com/weekly-webtips/how-to-install-multiple-ve...)
by Someone on 1/30/23, 7:20 PM
Foo may not be updated for Bar 2.0 while the Baz you use states it needs it; Bar 3.0 may incorrectly declare itself a 100% stand-in for anything between Bar 1.5 and Bar 3.0; Baz and Qux may contain similar functions that would better be part of their shared dependency Quux; etc.
Tooling can make navigating that mess more or less difficult, but the mess still is there.
by maximumcomfy on 1/31/23, 4:39 AM
1) Every developer, no matter how new, wants to contribute something. 2) There is no centralized authority in most languages/package systems to stop them from doing so. 3) Imagine all the bad code you write in your first 2 years, except now you can't delete it because 100 other developers rely on it and 100 more rely on them.
The Rube Goldberg machine exists because of the matryoshka dolls that are the dependency tree. Sometimes rewriting things is actually easier than trying to untangle the wires of dependencies.
by samsquire on 1/30/23, 11:13 PM
What is harder regarding computation - arranging for a computation or doing the computation?
What I mean is that the computation of an addition or subtraction is easy but the arranging of instructions of code into an algorithm is hard. Everything has to be in the right place for the algorithm to work right.
This is why packaging is so difficult. Each algorithm and piece of code expects things to be in a certain place. ABIs, and packaging for them, are tedious.
by PaulHoule on 1/30/23, 3:45 PM
The worst thing that happened to Python was that distributions like Red Hat not only shipped Python as an rpm but shipped system scripts in Python. This made it practically impossible to upgrade the system Python without breaking scripts and is fundamentally incompatible with how Python packaging works. (You are just one 'sudo pip install' from breaking your system)
Like people who are impressed with ChatGPT (wow that answer seemed so confident even if it is wrong), Pythoners systematically mistake footguns for solutions. For instance, use 'pip install --user' and now the package you've included is visible when you type 'python', 'python3', use a venv, conda, whatever. I've had people tell me that having a 'python3' is the best idea anyone's had since Solomon proposed cutting a baby in half, but if you stop and think you realize that pretty soon you have 'python3', 'python3.5' (broken), 'python3.6' (maybe broken), 'python3.7', 'python3.8', 'python3.9', ...
I would say venvs really solve the problem, with the exception that you can currently trash all your venvs with 'pip install --user'.
The trouble is that people who are bothered by this stop using Python; the people who are still using Python are people who have difficulty perceiving the nature of the problem, never mind that there is a problem.
The algorithm used by pip to resolve packages is not sound. It goes off trying to install things and can only find out late in the game that it installed a package which is not compatible with another package. This can be solidly blamed on the egg package format which can only determine the dependencies of a package when it tries to install it by running a setup.py. This is nice because it can adapt to the environment dynamically (say add a timezone db on windows) but it is not like Java's maven that knows it has a consistent solution before it downloads the JARs.
I'm pretty sure you could make a Python package manager that works like Maven if it only installed wheels because you can read the directory of a wheel (a ZIP file) with one or two http range requests and then read the dependency data with another http range request. Poetry doesn't quite do this. Poetry does a lot better than pip does, but whatever it does takes a really, really, really, long time.
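The property that makes this possible is that a wheel's dependency list is static data rather than the output of running setup.py. A minimal local sketch in Python (reading a wheel on disk instead of via HTTP range requests):

```python
import zipfile

def wheel_requirements(path: str) -> list[str]:
    """Read Requires-Dist lines from a wheel's METADATA without installing it."""
    with zipfile.ZipFile(path) as whl:
        metadata_name = next(
            n for n in whl.namelist()
            if n.endswith(".dist-info/METADATA")
        )
        metadata = whl.read(metadata_name).decode()
    return [
        line.split(":", 1)[1].strip()
        for line in metadata.splitlines()
        if line.startswith("Requires-Dist:")
    ]

# usage: wheel_requirements("some_package-1.0-py3-none-any.whl")
# returns the declared dependency specifiers, no setup.py execution needed
```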
Poetry might be the best thing going but it has the fundamental flaw of trying to be a 95% or 98% solution which, to the Pythoner, sounds like a good idea. The trouble is that writing programs that are 98% correct is to computer science what Bigfoot is to zoology or ESP is to psychology. The gap between that and 100% is not 2% but more like 200%, because figuring out exactly what is wrong with the conceptual model and working around it in a hard case is at best like pushing a bubble under a rug. Imagine how much trouble you could get into with a 98% correct sort or binary search algorithm... It's a place where professional developers just don't want to go.
Java has the advantage of extreme xenophobia and bad feelings all around, which mean that: (1) people don't expect to link very much C code into Java, and (2) people don't feel entitled to use the Java bundled with their Linux system and have it really work. Because of (1), Java code tends to be self-sufficient and avoids a whole sector of 'dependency hell' involving .so and .dll files. Because of (2), people feel both responsible and empowered to install a JDK that works, unlike the typical Pythoner, who doesn't.
Don't think that Docker helps in any way; what I found was that Docker accelerates the ability of Pythoners to find Pythons that are broken in odd ways, configured with strange default character sets and such.
Javascript has the nice feature that file A can import from package version B and file C can import from package version D, so the 'diamond dependency' problems that are so terrible in Java (like that bug in Guava post-13 that broke HDFS) and still problematic in Python rarely cause trouble. (You only have problems if an object from A migrates to C and gets used in an incompatible way; contrast that with Java, where the classloader probably blows up.)
I was wishing over the weekend that Python had something like that because once more I have been building and packaging machine learning models for inference in Python and it would be really sweet to pack up a model that uses scikit-learn or pytorch or whatever into a Python package and then load it into a web server or a batch job. It's totally practical to do that, using joblib to unpickle a few MB of code. Once you end up needing two different versions of pytorch though, you are really screwed.
by drewcoo on 1/30/23, 11:11 AM
Or was that open source software? Or was that any private corporate software ecosystem? If I squint the problems all start to look the same. I must need glasses, considering my squint frequency.
by ttyprintk on 1/30/23, 10:41 AM
- Cross-platform
- Wraps another packaging format
- Lowest disk space
- Dependency version solving
- An OS can make changes to its own low-level state using its own tools
- User can configure when and what gets installed
Even Nix, which has its own language and isolation model, can’t meet all these.
by conor_f on 1/30/23, 10:43 AM
Following from this, I think that most back-end applications should try to solve their messy runtime environment issues with some containerization. Java/C(++) however...
by goodpoint on 1/30/23, 10:46 AM
It's a form of not-invented-here syndrome, and you can see it in all the comments from people defending their favorite language.
> we apparently have no "package management theory".
Apparently. In reality there's been plenty of research - and for decades - that is not being used.
by jerf on 1/30/23, 5:29 PM
The problem is a classic "looks simple until you really examine it".
by tbarone on 1/30/23, 10:56 AM
I do agree it is an issue. Doing away with node_modules completely on the last rails 7 app I worked on was the most cathartic experience of my life.
by intrasight on 1/30/23, 5:40 PM
My experience as a .Net developer has not been that it's a Rube Goldberg machine. I isolate all my projects and minimize 3rd party dependencies.
by Falkon1313 on 1/30/23, 11:27 AM
No 3rd party Rube Goldberg contraption needed. No package managers or dependency resolvers, no special build tooling, no lock files. No constantly having to run updates and figure out what they broke by way of transitive dependencies. Just get what you need and use it.
I haven't, however, developed enough professionally with it to know how that plays out long-term in practice. But I'd have to imagine it would compare favorably to just about everything else nowadays. The things I've been using for the last couple of decades are all pretty abysmal in comparison.
I'd say they were a solution in search of a problem, but people have managed to create a problem to justify the solution, which really still isn't very good. At least, not nearly as good as the simple get, add, use pattern.
by badpun on 1/30/23, 1:08 PM
This is a huge oxymoron. Additionally, C++ does not even have a package manager.