by quotemstr on 5/2/18, 7:13 PM
I've always been disappointed by how large software projects, both FOSS and commercial, lose their "can do" spirit with age. Long-time contributors become very quick with a "no". They dismiss longstanding problems as illegitimate use cases and reject patches with vague and impervious arguments about "maintainability" or "complexity". In some specific cases these concerns might be justified, but when everything garners this reaction, the overall effect is that progress stalls, crystallized at the moment the last bit of technical boldness flowed away.
You can see this attitude of "no" on this very HN thread. Read the comments! Instead of talking about ways we can make Python startup faster, we're seeing arguments that Python shouldn't be fast, we shouldn't try to make it faster, and that programs (and, by implication, programmers) who want Python startup to be fast are somehow illegitimate. It's a dismal perspective. We should be exercising our creativity as a way to solve problems, not finding creative ways to convince ourselves to accept mediocrity.
by rossdavidh on 5/2/18, 6:19 PM
I have to say that my first reaction was: "maybe you shouldn't use python for this, then". If you are using a language in a way that gets worse in subsequent versions, that's a good sign that they're optimizing for something other than what you care about.
The programming language R does not, as I understand it, optimize for speed, because they are optimizing for ease of exploratory data analysis. R is growing quite rapidly. So is python, actually. It doesn't mean that either one is good at everything, and it's probably the case that both are growing because they don't try to be good at everything. A good toolbox is better than a multi-tool.
by deaps on 5/2/18, 5:36 PM
I totally understand that milliseconds matter in the use case described in the article.
For me, personally, I use python to automate tasks - or to quickly parse through loads and loads of data. To me, startup speed is somewhat irrelevant.
I built a micro-framework that is completely unorthodox in nature, but very effective for what I needed - that being a suite of tools available from an 'internet' server, available to me (and my coworkers) over port 80 or 443.
My internet server, which runs python on the backend (and uses apache to actually serve the GET / POST) literally spits out pages in 0.012 seconds. Some of the 'tools' run processes on the system, reach out to other resources, and spit the results out in under 0.03 seconds (much of that being network / internet RTT). To me, that's good enough - adding 30 or even 300 milliseconds to any of that just wouldn't matter.
I totally get that if Python wants to be a big (read: bigger) player, then startup time matters more... but for my personal use cases, I'm not concerned with the current startup time one bit.
by stinos on 5/2/18, 6:34 PM
Sort of related story: we needed a scripting language able to run on an x86 RTOS-type architecture compiled with MSVC, and looked into CPython because, well, Python is after all quite a nice language. After spending a considerable amount of time getting it compiled (sorry, I don't recall all the issues, but the main one was that the source code assumed msvc == windows, which I know is true in 99% of cases, but I didn't expect a huge project like CPython to trip over it), it would segfault at startup.
During step-by-step debugging it was astonishing how much code got executed before doing any actual interpreting/REPL work. Now I get there might not be a way around some initialization, but it still simply looked like too much to me, and perhaps not overly clean either. Moreover it included a bunch of registry access (again, because it saw MSVC being used) which the RTOS didn't fully support, hence the segfault.
Anyway, we looked further and thankfully found MicroPython, which took less time to port than the time spent getting CPython even to compile. While not a complete Python implementation, it does the job for us, and it gets away with startup/init code of just something like 100 LOC (including argument parsing etc.). Yes, I know it's not a fair comparison, but still, the difference is big enough to, at least for me, indicate that CPython might just be doing too much at startup, and/or possibly spends time on features which aren't used by many users, and/or possibly drags along some old cruft. Not sure, just guessing.
by faho on 5/2/18, 5:38 PM
Mercurial's startup time is the reason why, for fish, I've implemented code to figure out if something might be a hg repo myself.
Just calling `hg root` takes 200ms with a hot cache. The equivalent code in fish-script takes about 3ms, which enables us to turn on hg integration in the prompt by default.
The equivalent `git rev-parse` call takes about 8ms.
by std_throwaway on 5/2/18, 5:25 PM
This is truly a problem. Even more so if you host your application on a network directory: loading all the small files takes ages. I really wish there were a good way to compile the whole application with all its modules into one package once you're ready to release. I really wish the creators of Python had given such use cases more consideration.
Edit: I'm aware that there are solutions that put everything a program touches into a kind of executable archive — a single file several hundred megabytes in size. I've tested it. It doesn't really pre-compile the modules; the startup time was exactly the same.
by marshray on 5/2/18, 5:45 PM
Here's what has worked for me:
1. Don't do that. Either write the driving app in Python or write the subprocesses in an ahead-of-time compiled language. Python's a great language but it's not the right tool for everything.
2. Be parsimonious with the modules you import. During development, measure the performance after adding new imports. E.g., one graph library I tried had all its many graph-algorithm implementations separated into modules, and it loaded every single one of them even if all you wanted was to create a data structure and do some simple operations on it. We just wrote our own minimal class.
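Measuring import cost is straightforward with the stdlib; here is a minimal sketch, where `timed_import` is an illustrative helper (not a standard API):

```python
import importlib
import time

def timed_import(name):
    # Measure the wall-clock time it takes to import a module.
    # (Repeated calls are near-instant: modules are cached in sys.modules.)
    start = time.perf_counter()
    importlib.import_module(name)
    return (time.perf_counter() - start) * 1000  # milliseconds

print(f"json: {timed_import('json'):.2f} ms")
```

On Python 3.7+, `python -X importtime -c "import json"` gives a per-module breakdown with no helper code at all.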
by the_mitsuhiko on 5/2/18, 5:30 PM
The slow startup, combined with the Python ecosystem's general lack of interest in finding a solution for distributing self-contained applications, was the biggest reason we ended up writing our CLI tool in something else, even though we are a Python shop.
I'm really curious why there hasn't been much of a desire to change this, and it even got worse as time progressed, which is odd.
by avar on 5/2/18, 7:37 PM
Best out of 5 times on my Debian testing laptop for a "hello world", in order of worst to best:
ruby2.5: 83ms (-e 'puts "hi"')
python3.6: 35ms (-c 'print("hi")')
python2.7: 24ms (-c 'print("hi")')
perl5.26.2: 8ms (-e 'print "hi"')
C (GCC 7.3): 2ms (int main(void) { puts("hi"); })
by oneweekwonder on 5/2/18, 7:21 PM
in the temple of tmux
for the cult of vi
we sit and wait
for venv to activate
by _0w8t on 5/2/18, 5:29 PM
Given how slow Python is known to be at starting up, I am puzzled why Mozilla continues to use it in build scripts. Perl is just as portable but starts up something like 10 times faster.
by falcolas on 5/3/18, 1:57 PM
Naive question: If the startup time matters because you're imposing that startup time hundreds or thousands of times - why not remove the startup time?
I'm saying, use the emacs model. Start hg with a flag so it simply keeps running in the background while listening on a port. Run a bare-bones nc script to pipe commands to hg over a port and have it execute your commands.
This isn't a new problem, nor is it even a new solution. No complete re-write of the interpreter or the tool required.
Anyways, that's my 2¢
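A minimal sketch of that daemon model, assuming nothing about Mercurial's internals — a long-lived process pays the import cost once and then answers short commands over a local socket (the `dispatch` function and response format are made up for illustration):

```python
import socket
import threading

def start_daemon(host="127.0.0.1"):
    # The daemon pays startup/import cost once, then serves commands.
    srv = socket.socket()
    srv.bind((host, 0))   # bind to a free port
    srv.listen(1)

    def dispatch(cmd):
        # Stand-in for real command handling (e.g. hg subcommands).
        return f"ran: {cmd}"

    def loop():
        conn, _ = srv.accept()
        cmd = conn.recv(1024).decode()
        conn.sendall(dispatch(cmd).encode())
        conn.close()

    threading.Thread(target=loop, daemon=True).start()
    return srv.getsockname()[1]   # port for clients

def send_command(port, cmd, host="127.0.0.1"):
    # Cheap client: roughly what a bare-bones nc invocation would do.
    with socket.create_connection((host, port)) as c:
        c.sendall(cmd.encode())
        return c.recv(1024).decode()
```

The client side costs almost nothing to start, which is the whole point; Mercurial's own `chg` (mentioned elsewhere in this thread) works on the same principle.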
by agumonkey on 5/2/18, 6:25 PM
I hate to admit it but it's partly why I don't use clojure (pardon the side-topic) more. I can't bear the boot process and the overall cost.
Python is free to tinker, and all similar interpreters are joyful to use. Anything else is probably better for heavy duty jobs environments.
by bayesian_horse on 5/2/18, 7:22 PM
Knock, Knock, who's there? ---- Long Pause --- Java!
by zwieback on 5/2/18, 6:34 PM
Python is great for prototyping or even real apps if performance isn't so critical. However, more than once I've found myself in the situation where I wrote a bunch of Python code and then end up starting that code up from another app, just like the thread discusses and I immediately feel like this is an anti-pattern.
What's even more annoying is that my Python code usually calls a whole lot of C libraries (OpenCV, numpy, etc.). So it's like this: app -> OS process -> python interpreter -> my python code -> C libraries. That just feels really wrong, so I'd like two things:
1) better/easier path to embed python scripts into my app e.g. resident interpreter
2) some way of passing scripts to python without starting a new process — this may already exist and I'm unaware of it
by SZJX on 5/3/18, 4:21 PM
Startup time has also been the biggest gripe I have with Julia so far; otherwise it's a truly fantastic language to work in. I wasn't able to put `__precompile__()` to good use, it seems — the time it takes to execute my program didn't change at all for some reason. Or maybe it's not actually the startup time that caused the problem, but the time it took to perform file IO. Anyway, my program now takes much longer to start than the Python equivalent (though it runs much faster once started), which is a real disappointment.
by area_man on 5/3/18, 1:33 AM
Truly solving this problem is difficult, but you can hack around it with a zygote process to remove a substantial amount of overhead, in exchange for RAM. While this is generally more of a win for server processes, you can see it applied in a CLI proof of concept:
https://github.com/msolo/pyzy
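The zygote trick, roughly: do the expensive interpreter startup and imports once in a parent process, then fork a pre-warmed child per request. A POSIX-only sketch with illustrative names (not how pyzy is structured):

```python
import os

def run_in_prewarmed_child(task):
    # The parent has already done the expensive imports; fork() gives
    # the child a copy of that warm state essentially for free.
    pid = os.fork()
    if pid == 0:              # child process
        task()
        os._exit(0)           # skip normal interpreter teardown
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Each child starts with every module the parent loaded already in memory, which is where the startup savings come from — at the cost of keeping the parent resident.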
by NelsonMinar on 5/2/18, 10:11 PM
I agree Python's startup time is too slow. But one trick you can use to improve it somewhat is the "-S" flag, which skips site-specific customizations. On my Ubuntu system it brings Python 3.6 startup time down from 36ms to 18ms for me; still not great, but it helps.
The drawback is that this may screw up your Python environment; I'm not sure how easy it is to work around if it does.
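The difference is easy to measure yourself; a quick sketch using only the stdlib (the numbers will of course vary by machine and installation):

```python
import subprocess
import sys
import time

def startup_ms(extra_args=()):
    # Time a bare interpreter start; -S skips the site module's
    # per-installation customizations (site-packages path setup etc.).
    start = time.perf_counter()
    subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
    return (time.perf_counter() - start) * 1000

print(f"default: {startup_ms():.0f} ms")
print(f"with -S: {startup_ms(['-S']):.0f} ms")
```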
by pjc50 on 5/2/18, 8:27 PM
Proposed solution: steal undump from emacs.
https://news.ycombinator.com/item?id=13073566
Perhaps it would be possible to read in the source files, compile them, and preserve an image of the state immediately before reading input or the command line.
by makecheck on 5/2/18, 10:45 PM
I was kind of amazed how penalized a script could be by collecting all its “import” statements at the top. Once, somebody’s command couldn’t even print “--help” output in under 2 seconds; after measuring the script, I told them to move all their imports later, and the docs appeared instantly.
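The pattern is simply moving imports from module scope into the functions that need them, so cheap code paths like `--help` never pay for them (the imported modules below are just stand-ins for whatever is slow in practice):

```python
def main(argv):
    if "--help" in argv:
        # Fast path: prints immediately, no heavy imports needed.
        print("usage: tool [--help] FILE")
        return 0
    # Heavy dependencies are imported only on the path that uses them.
    import csv        # stand-in for a slow third-party import
    import sqlite3    # likewise
    print("doing real work...")
    return 1
```

The trade-off is that the first call into the slow path pays the import cost then, and import errors surface later than they would at module load.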
by kelvin0 on 5/2/18, 8:07 PM
I'm a long time python user, but never really peeked under the hood. However, I have a few ideas.
Optimized module loading: maybe loading one larger 'super' module would be faster than several smaller ones? For example, a Python program could be analyzed to find its dependent modules, which would then all be packed into a 'super' module.
Once the Python program executes, it would load the single 'super' module and hopefully bypass all the dynamic code each module runs when imported.
As mentioned previously, this is just off the top of my head and would certainly warrant more investigation/profiling to confirm my hypothesis.
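There is an existing mechanism somewhat along these lines: the stdlib `zipapp` module bundles an application and its modules into a single `.pyz` archive, so startup opens one file instead of many. A sketch (the directory layout and `app:main` entry point are hypothetical):

```python
import zipapp

def pack(src_dir, target):
    # Bundle src_dir (assumed to contain app.py defining main())
    # into a single runnable .pyz archive.
    zipapp.create_archive(src_dir, target, main="app:main")
```

Note this mostly addresses the many-small-files cost; it does not by itself skip the module-level code each import still executes, which is the harder part of the idea above.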
by bgongfu on 5/2/18, 10:57 PM
I'm pretty sure it's too late by now for Python, but I've had some success with compiling C-based interpreters [0] to C; that is, generating the actual C code that the interpreter would execute to run the program. That way you can reuse much of the interpreter, keep the dynamic behavior and still get nimble native executables.
[0] https://github.com/basic-gongfu/cixl#compiling
by crb002 on 5/3/18, 7:24 PM
Should be able to hot boot the VM with the right tooling. You can reuse HPC "checkpoint" code from supercomputing environments as a generic hammer for Python/Ruby/JVM. Some Russians figured out how to do it in userspace without a kernel mod:
https://criu.org/Main_Page
by beiller on 5/3/18, 1:57 AM
People here comment about how Python is slow, but even fast/slow is ill-defined, in my opinion. You don't see people (generally) rewriting TensorFlow code in native languages to speed it up; they just enable CUDA. I imagine the definition of "fast" here is limited to massively parallel server workloads with IO.
by est on 5/3/18, 1:28 AM
Reminds me of buildout. It's an awful piece of software. We used it in a previous Flask project, and a simple flask shell took 3 minutes to start. If you typed `import` in the CPython shell, it would literally freeze for a few seconds — because buildout injects one sys.path entry for each package specified!
by Murrawhip on 5/2/18, 6:19 PM
I'm just curious why more people don't make use of chg to avoid the mercurial startup time. It seemed to solve it for me - are there drawbacks?
by YesThatTom2 on 5/3/18, 4:52 AM
A recent article in ACM Queue included an off-hand remark that Go's compile time is often faster than Python's startup time. Just sayin'
by 2RTZZSro on 5/2/18, 5:25 PM
Would it be feasible to keep a set of Python interpreters around at all times, use a round-robin approach to feed commands to each already-running interpreter, and then perform interpreter-environment cleanup out-of-band after a task completes?
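The stdlib already offers a close approximation: `concurrent.futures.ProcessPoolExecutor` keeps a pool of warm worker processes and reuses them across tasks, amortizing interpreter startup over the batch (the out-of-band cleanup described above would still be on you):

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

def run_tasks(values):
    # Worker interpreters are started once and reused for every task,
    # so per-task cost is just IPC, not a fresh interpreter start.
    with ProcessPoolExecutor(max_workers=2) as pool:
        return list(pool.map(square, values))
```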
by jahvo on 5/2/18, 6:27 PM
Slowness is the elephant in the room in Python land. It's like everybody has decided to cover their eyes in front of this massive pachyderm. A massive delusion.
by dingo_bat on 5/3/18, 1:16 AM
It's weird to see someone make this pitch when C systems software development regularly requires us to try and shave off microseconds. Millisecond delays mean you've already fucked up.
by peterkelly on 5/2/18, 5:15 PM
For use cases where performance is important, using an interpreted (implementation of a) language is a bad idea.
There are many great reasons to use Python, but execution speed is not one of them.
by strkek on 5/2/18, 6:04 PM
IIRC CPython devs reject performance-related patches if they cause the code to become "less readable".
>> I believe Mercurial is, finally, slowly porting to Python 3.
I just gave up on Mercurial since it didn't let me push either to BitBucket or to an Ubuntu VPS via SSH.
For better or worse, Git just works.