by chuckgreenman on 5/1/21, 2:10 PM with 79 comments
All of them have been in interpreted languages like PHP, Python and Ruby. Their builds and tests took between 30 and 45 minutes. As for project size and complexity, these were projects built and maintained by four-person teams over the course of 3-5 years, so it's not like they were massive services with hundreds of developers.
I'm still kind of new, I worked at a couple internships and I've been working full time for a year so I might be totally wrong but I feel like these CI pipelines could be optimized to run faster.
by jart on 5/1/21, 2:45 PM
on travis for a repository that builds 14,479 objects, 67 libraries, and 456 static executables, 284 of which are test executables which are run too. If I want to run all the test binaries on freebsd, openbsd, netbsd, rhel7, rhel5, xnu, win7, and win10 too, then it takes 15 additional seconds. On a real PC, building and testing everything from scratch takes 34 seconds instead of two minutes.
by jasonpeacock on 5/1/21, 3:49 PM
What matters is the development process - local build & test should be fast.
Otherwise, with CI/CD, it's a continually-moving release train where changes get pushed, built, tested, and deployed non-stop and automatically without human intervention. Once you remove humans from the process, and you have guard rails (quality) built into the process, it doesn't matter if your release process for a single change takes 1min, 1hour, or 1day.
Even if it takes 1 day to release commit A, that's OK b/c 10min later commit B has been released (because it was pushed 10min after commit A).
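A quick worked example of that arithmetic, as a sketch only. It assumes a hypothetical pipeline where every commit takes the same fixed 24 hours to release and commits never block each other; the push times are made up.

```python
from datetime import datetime, timedelta

# Assumption for illustration: every commit takes the same fixed time
# to travel through the pipeline, and commits do not block each other.
PIPELINE_LATENCY = timedelta(days=1)


def release_time(pushed_at: datetime) -> datetime:
    """Return when a commit pushed at `pushed_at` reaches production."""
    return pushed_at + PIPELINE_LATENCY


commit_a = datetime(2021, 5, 1, 9, 0)        # hypothetical push time
commit_b = commit_a + timedelta(minutes=10)  # pushed 10 minutes later

# The gap between releases equals the gap between pushes, not the latency.
gap = release_time(commit_b) - release_time(commit_a)
print(gap)  # 0:10:00
```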
I've seen pipelines that take 2 weeks to complete because they are deploying to regions all over the world - the first region deploys within an hour, and the next 2weeks are spent serially (and automatically) rolling out to the remaining regions at a measured pace.
If any deployment fails (either directly, or indirectly as measured by metrics) then it's rolled back and the pipeline is stopped until the issue is fixed.
[1] Yes, even for fixing production issues. You should have a fast rollback process for fixing bad pushes and not rely on pushing new patches.
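A minimal sketch of the "roll out region by region, roll back and stop on failure" idea described above. The `deploy`, `is_healthy`, and `rollback` callables and the region names are hypothetical placeholders, not any particular tool's API.

```python
from typing import Callable, List


def staged_rollout(
    regions: List[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> List[str]:
    """Deploy to regions one at a time; stop at the first unhealthy one.

    Returns the regions that were deployed and stayed healthy.
    """
    done: List[str] = []
    for region in regions:
        deploy(region)
        if not is_healthy(region):
            # Roll back the bad region and halt; later regions are untouched.
            rollback(region)
            break
        done.append(region)
    return done


# Usage with stand-in callables: the second region reports unhealthy.
rolled_back: List[str] = []
result = staged_rollout(
    ["us-east", "eu-west", "ap-south"],
    deploy=lambda region: None,
    is_healthy=lambda region: region != "eu-west",
    rollback=rolled_back.append,
)
print(result, rolled_back)  # ['us-east'] ['eu-west']
```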
by ilmiont on 5/1/21, 2:44 PM
Anyway as the question is "How Long Is Your CI Process", here we go!
I have two main types of pipelines, both running on a self-hosted GitLab instance which runs on an 8th-gen i3 Intel NUC. No project is particularly massive.
1. PHP Projects. Run PHPStan + unit tests on each branch. Most projects take 1-5 mins. On master, run PHPStan + unit tests, build a Docker image, and use Helm to deploy to managed Kubernetes on DigitalOcean. This takes 5-10 mins.
2. React Projects, again not massively huge, but sizable. Biggest time is to run ESLint on every branch. About 5 mins (due to very poor caching which I keep meaning to fix). On master, run ESLint, create a Docker image, and deploy to managed Kubernetes. 5-10 mins.
There are opportunities to improve this by fixing/optimising caching. Overall I'm reasonably happy with the pipeline performance. I'm also sure that upgrading the hardware would make a big difference, probably more so than fixing the caching; an i3 isn't really ideal but this machine does well overall for my small team.
by dmoy on 5/1/21, 3:41 PM
30-45 minutes just for a simple test suite, even if it's PHP, Python, and Ruby - that sounds long. But without any details on exactly what's being tested, it's hard to say.
by wwwigham on 5/1/21, 10:39 PM
by glacials on 5/1/21, 3:50 PM
I joined a company last year that's trying to solve this [1] by tracing tests so it can skip any whose dependencies (functions, environment variables, etc.) haven't changed. It's amazing what "what if we don't run tests we know will pass?" can do to a CI pipeline.
[1]: https://yourbase.io
by nickjj on 5/1/21, 3:27 PM
Most of that time is spent building the Docker image.
The CI pipeline does:
- Build Docker images for the project
- Run the project
- Run Shellcheck on any shell scripts
- Run flake8 to lint the code base
- Run black in check mode to ensure proper formatting
- Reset and initialize the DB
- Run test suite
That's a baseline. At this point any increase to the ~2 min is a result of running more tests but it's usually possible to run about 100 assorted tests in ~10 seconds (testing models, views, etc.).
An example of the above is here: https://github.com/nickjj/docker-flask-example
A similar pipeline with comparable tools for Rails takes ~4-5 minutes and Phoenix takes ~4-5 minutes too. You can replace "flask" with "rails" and "phoenix" in the above URL to see those example apps too, complete with GH Action logs and CI scripts. These mainly take longer due to the build process for installing package dependencies, plus Phoenix has a compile phase too.
by wyc on 5/1/21, 2:48 PM
by pydry on 5/1/21, 3:09 PM
The only times it was long enough to be painful were when there was a stage that couldn't be debugged without running the build. That's invariably what I actually preferred to fix, not the total lead time.
A 45 minute sanity check to verify nothing is fucked before releasing is fine. A 45 minute debugging feedback loop is a nightmare.
Faster CI builds are typically a nice-to-have rather than a critical improvement (& doing too many nice to haves has killed many a project).
by pyrophane on 5/1/21, 3:15 PM
Our pipeline typically takes 10-30 minutes, depending on what jobs run and where cache gets used.
The longest job, at a consistent 12 minutes, is our backend test job. There’s not a lot we can do to speed this up any further because a lot of the tests run against a test db so we can’t easily run them in parallel. Perhaps if we wanted to be really clever we could use multiple test dbs.
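One possible shape for the "multiple test dbs" idea, as a sketch only. It assumes tests are run with pytest-xdist, which sets the `PYTEST_XDIST_WORKER` environment variable (e.g. `gw0`, `gw1`) in each worker process; the `app_test` base name is made up, and creating and migrating each database is left out.

```python
import os


def worker_db_name(base: str = "app_test") -> str:
    """Pick a database name unique to the current test worker.

    pytest-xdist exposes the worker id (e.g. "gw0", "gw1") in the
    PYTEST_XDIST_WORKER environment variable; without xdist it is unset,
    so we fall back to a single shared name.
    """
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    return f"{base}_{worker}"
```

Each worker would then connect to its own database, so tests in different workers can't trample each other's rows.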
The build for our containers is usually very quick (a few minutes) unless we modify our package requirements.txt. That happens infrequently but it triggers an install step that will increase the overall time for the job to 10-12 minutes.
The deploy phase is very quick.
We spent a bit of time optimizing this and it came down mostly to:
1. Using cache where we can.
2. Ensuring we had enough resources allocated so that jobs were not waiting or getting slowed down by lack of available cpu.
3. Making sure that each command we run is executing optimally for performance. Some commands have flags that can speed things up, or there are alternate utilities that do the same thing faster. One example of the latter is that we were using pytype as our type checker, but it often took about 15 minutes to run. We swapped it out for pyright, which takes under 5.
by tomduncalf on 5/1/21, 2:32 PM
Our current CI takes an hour because it has to build quite a complex app on iOS and Android, this happens in parallel but the Azure build nodes we use are pretty slow. Ideally it would be faster but it’s not too huge an issue in practice, we have the lint/unit tests etc. run first so the build will fail early for any glaring errors.
by evantahler on 5/1/21, 3:00 PM
* A "complex" library (node-resque). In CI (CircleCI) we install deps, compile Typescript to JS, test on 3 versions of node, and build docs. 4 min w/ some parallelization https://app.circleci.com/pipelines/github/actionhero/node-re...
* A web server framework (actionhero): In CI (Github Actions) we install deps, compile Typescript to JS, test on 3 versions of node, and build docs. 7 min w/ some parallelization https://github.com/actionhero/actionhero/actions/runs/801273...
* A Monorepo (Grouparoo): In CI (CircleCI) we install deps, compile Typescript to JS, run migrations, check licenses, test UIs, CLI tools, Plugins, and try out a few different databases. 5 minutes with rather extreme parallelization https://app.circleci.com/pipelines/github/grouparoo/grouparo...
In my experience, the biggest wins in CI speed improvements come from parallelization. You can parallelize by either running multiple processes/containers or by running tests in parallel on the same container (jest, parallel_tests, etc)
by adamcharnock on 5/1/21, 9:59 PM
Build and test steps take about equal time. We build from a common docker image which has most of the time consuming work already done.
It can take longer if the Python deps have changed and therefore the ‘poetry install’ step cannot be pulled from the cache.
Also, we deploy multiple individual Django projects, rather than one huge monolithic project. That probably gives some speed up. It means that changes to common code can trigger 5-15 pipelines, but they all take a similar amount of time.
30-45 minutes seems like a really long time to me. Maybe you have a lot of slow tests, but I’d also look at the build process too. If you’re doing docker images you may find you can extract a lot of the time consuming work to a common base image. You can also get plugins that help docker pull already-built layers from a cache.
If it is the tests then you could always try running tests in parallel. One worker per CPU or some such.
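A small sketch of "one worker per CPU": split the test list into as many groups as there are cores and hand each group to its own worker. How the groups get executed depends on your test runner; the test names here are placeholders.

```python
import os
from typing import List


def shard(tests: List[str], workers: int) -> List[List[str]]:
    """Split tests round-robin into `workers` roughly equal groups."""
    return [tests[i::workers] for i in range(workers)]


# One worker per CPU; os.cpu_count() can return None, so fall back to 1.
workers = os.cpu_count() or 1

groups = shard(["a", "b", "c", "d", "e"], 2)
print(groups)  # [['a', 'c', 'e'], ['b', 'd']]
```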
FWIW - I find that these long feedback loops can really kill productivity and morale. 10 mins for a deploy is about my limit.
by bastijn on 5/1/21, 3:21 PM
* multi-million Loc
* number of projects > 50
* languages C#, C, C++, typescript
* Frameworks: .NET Framework, .NET core, .NET standard, Angular, React
* Quality tools in build: TICS, Coverity, Roslyn, custom tools (>10)
* Tests running in build: nunit, msvstestv2, jest, karma
* number of tests running in build > 5000
* package managers used: Nuget, npm
* number of packages (private and public) > 500
Still a lot I forgot now.
It all runs in approximately 45 mins for stage1 builds, stage2-4 run nightly and weekly and take much longer (>2 hours to >24 hours for long duration stage 4). Increasing stages run longer test suites, up to approx 50k or so for stage 3 and 4, more quality checks, etc.
P.s. We spend countless hours reducing our build times. In addition we have setups to split build pipelines for those who do not need the entire archive build for their dev purposes etc. Yet, CI server always runs single-core and cold builds.
by john-tells-all on 5/1/21, 4:43 PM
None of the above really matters, the important bit is that USERS actually see the work! Everything else is necessary, of course, but doesn't create value in itself.
So, the question is, how does each system create VALUE for its audience, and what's the latency (LAG)? CI is often for 4-10 developers, and takes ~10-20 minutes for smallish web shops. The value the business gets is that devs can check they didn't forget to "git add" a file :)
Devs and the business always complain about the slowness of CICD, but rarely invest the modest effort to make it faster. Here are some ways to improve the development cycle:
Speed up databases. Move from "install database and sample data interactively every time" to having a pre-baked Docker image with the database and seed data. Much faster: you get lower LAG and the same VALUE for the team.
Run fewer tests. Running tests creates business value -- confidence a deployment will give features to users -- but takes time (LAG). However, for 90% of the cases Devs get value by running a subset of the tests. Thus, much faster: less LAG, same VALUE. Run all the tests before a real deploy, or run the full suite nightly. Devs get the value of a full test without having to wait for it.
Simplify. CICD should just run things Devs can run locally. That is, Devs can run fast local test subsets to get rapid feedback (low LAG), and get focused VALUE. When CICD tests fail, it's very easy for Devs to figure out what went wrong, because CICD and local environments are nearly identical.
CICD creates a lot of value for several audiences. Plot out each one, and see what you, the business, want to improve upon!
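The "run fewer tests" point above could look something like this sketch: pick test suites from the files a change touched, and fall back to everything for nightly or pre-deploy runs. The directory-to-suite mapping is entirely hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical mapping from source directories to the suites covering them.
SUITES_BY_PREFIX: Dict[str, str] = {
    "billing/": "tests/billing",
    "accounts/": "tests/accounts",
}
FULL_SUITE = "tests"


def select_suites(changed_files: List[str], full_run: bool = False) -> List[str]:
    """Pick test suites for a change; run everything when unsure."""
    if full_run:
        # Nightly or pre-deploy: always run the whole thing.
        return [FULL_SUITE]
    selected: Set[str] = set()
    for path in changed_files:
        matches = [s for p, s in SUITES_BY_PREFIX.items() if path.startswith(p)]
        if not matches:
            # Unknown file: be conservative and run the full suite.
            return [FULL_SUITE]
        selected.update(matches)
    return sorted(selected)


print(select_suites(["billing/invoice.py"]))          # ['tests/billing']
print(select_suites(["README.md"]))                   # ['tests']
print(select_suites(["billing/x.py"], full_run=True)) # ['tests']
```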
by thinkafterbef on 5/1/21, 3:42 PM
After some talk we decided to build a CI service based on this premise, i.e. desktop CPUs outperform cloud CPUs for the CI use case. After some months we managed to create BuildJet.
I would say it at minimum cuts the build time in half, and the best part is that it plugs right into Github Actions: you just need to change one line in your Github Actions configuration.
If it sounds useful for you, check it out: https://buildjet.com
by other_herbert on 5/1/21, 3:36 PM
Then in the main build that hopefully is deploying to a qa environment that can do more testing, bundle artifacts for whatever dependencies need them, all that kind of stuff...
That’s how ours is set up... we use Jenkins with parallel parts where possible (like build the ui while tests that hit the db are run) it’s a process that takes time to get right and time to optimize...
We’re at about 5 mins for the quick part and 8 or so for the slower part
Both of those will probably get worse as we are planning to include full ui testing on the deployed environment too
by safeerm on 5/1/21, 9:35 PM
It's really interesting how many companies these days have a primary pricing model of build minutes.
If you are looking for a DIY solution for your CI, check out https://tinystacks.com. We have the fastest way to launch and operate your Docker app on AWS. In one click, we set up infra and an automated pipeline on your AWS. Uses ECS with Fargate. All set up for you with a control center for logs, env vars and scaling. No config nightmare.
Email me safeer at tinystacks.com and I can get you onboarded.
by eqvinox on 5/1/21, 3:10 PM
by erikpukinskis on 5/2/21, 3:12 PM
The slowest part of the previous CI process was our integration tests on Selenium. And the new stack doesn’t have any of those (it just does unit tests in Karma).
And frankly, I think I’d take the 15 minutes with the extra security of knowing the whole stack is functioning together, over the speedup to my dev cycle.
But I feel a bit crazy saying that. In the end, the site doesn’t seem to go down due to the lack of integration tests. Maybe because we complement with manual testing. I never deploy without opening up the site in a browser anymore.
by lacker on 5/1/21, 2:52 PM
by rubyn00bie on 5/1/21, 3:40 PM
by house9-2 on 5/1/21, 4:18 PM
What makes the build so slow is that the database is involved, if you want fast builds decouple your unit tests from the database. With rails including the database access in tests makes everything easier and you get closer to real-life execution but slow ...
by detaro on 5/1/21, 2:32 PM
What is applicable depends on the specific project, as does how much effort each improvement is worth. To a degree, of course, throwing more resources at the problem helps (faster build workers, parallelized tests, ...), but that isn't always easily implemented on a chosen platform, and it costs money.
In projects I worked on, it varied greatly. From just a few minutes to cases where the full process took 6 hours (which then was only done as a nightly job, and individual merge requests only ran a subset of steps). I really would want <15 mins as the normal case, but it's often difficult to get the ability to do so.
by some_developer on 5/2/21, 6:28 PM
The frontend/TS stuff takes longer, usually 10-11 minutes, where it's "truly building" and we can hardly parallelize this one. Or we lack the expertise to fix it probably.
At the moment though this is non-container environment; once we add building/deploying into the mix, I'd assume the time will go up a bit.
by innocentoldguy on 5/2/21, 1:35 AM
by systematical on 5/1/21, 3:18 PM
Our pipeline runs in Jenkins and builds a docker image that runs composer installs, application copy, and that sort of thing. We also run phpunit, phpstan, phpmd, and phpcs in our pipeline. Finally the image gets pushed up to ECR.
I think that's all pretty standard stuff. TBH I'd like us to move to github actions and optimize for more staged builds in our docker images, but we have higher priorities at the moment.
by carlmr on 5/1/21, 3:31 PM
It really depends on what you're trying to achieve and how big the project is.
by mhh__ on 5/1/21, 10:11 PM
There is absolutely a huge amount of room for performance in areas like this. With Python especially it's very common to think "Ah yes, but numpy" when it comes to performance, and that is true in steady state where you are just number crunching, but there are mind-numbingly large amounts of performance left on the table vs. even a debug build with a compiler. Testing in particular is lots of new code running for a short amount of time, so it's slow when interpreted.
by wiredfool on 5/1/21, 3:46 PM
A full GHA CI run is ~30 minutes, but that involves 3 platforms, a (short) ci-fuzz run, and running the full test suite through valgrind.
by nojvek on 5/2/21, 6:54 AM
Basically cache + parallelize.
Once PR passes, merge to master deploys in a minute. If something is wrong we can revert within a minute.
It’s joyful to build things when your tools are fast and reliable.
by maccard on 5/1/21, 9:59 PM
by tpxl on 5/1/21, 9:30 PM
by duped on 5/1/21, 3:09 PM
CI tech debt is very difficult to pay down, and imho not worth it unless the dollar costs are becoming excessive and you have a dedicated release or DevOps engineer who can own it as an internal product.
by gravypod on 5/1/21, 3:51 PM
The main blockers I've seen to CI performance are:
1. Caching: Most build systems are intended to run on a developers laptop and do not cache things correctly. Because of this most CIs completely chuck all of your state out of the window. The only CI that I've found that lets you work around this is Gitlab CI (this is my secret for getting <1 min build/test CI pipeline)
2. What you do in CI: If you want to run end-to-end integration tests, it's going to be slow. Any time you're accessing a disk, accessing the network, anything that doesn't touch memory, it's slow. Make sure your unit tests are written to use Mocks/Fakes/Stubs instead of real implementations of DBs like sqlite or postgres or something.
3. The usage pattern: If you don't have developers utilizing your CI machines 100% of the time you are "wasting" those resources. People will often say "let's autoscale these nodes" and, when you do, you'll notice they scale down to 1 node when everyone is asleep, everyone starts work and pushes code, then the CI grinds to a halt. You can make a very efficient CI just by having the correct number of runners available at the correct time.
Another thing to consider: anything you can make asynchronous doesn't need to be fast. If you set up a bot to automatically rebase and merge your code after code review then you don't really need to think about how fast the CI is.
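For point 2 above, a minimal sketch of a unit test that uses a fake instead of a real database. The `UserStore` interface, `FakeUserStore`, and `greeting` are invented for illustration; the point is only that the code under test depends on an interface, so the test never touches disk or network.

```python
from typing import Dict, Optional, Protocol


class UserStore(Protocol):
    """The interface the code under test depends on."""

    def get_email(self, user_id: int) -> Optional[str]: ...


class FakeUserStore:
    """In-memory stand-in for a database-backed store."""

    def __init__(self, emails: Dict[int, str]) -> None:
        self._emails = emails

    def get_email(self, user_id: int) -> Optional[str]:
        return self._emails.get(user_id)


def greeting(store: UserStore, user_id: int) -> str:
    """Code under test: knows nothing about the real database."""
    email = store.get_email(user_id)
    return f"Hello, {email}" if email else "Hello, guest"


def test_greeting() -> None:
    store = FakeUserStore({1: "a@example.com"})
    assert greeting(store, 1) == "Hello, a@example.com"
    assert greeting(store, 2) == "Hello, guest"
```

The real, database-backed store would still want a few integration tests of its own; those are the slow ones you keep few of.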
by bluGill on 5/1/21, 3:59 PM
Note that half of the tests on the fast build are regression that can't possibly fail based on my changes... we run them anyway because about once a month something has a completely unexpected interaction and so a test fails that the developer didn't think to test.
by sjburt on 5/1/21, 9:41 PM
I think we could get it down to 3 minutes or so if we changed some things, but 10 minutes vs 3 minutes doesn't really change the workflow for us.
by jareds on 5/1/21, 2:29 PM
by davewasthere on 5/2/21, 12:10 PM
Deployment takes a little under a minute in total.
Worst one was probably a big Sharepoint application at one client's site. But that still only took about 12 minutes in total.
by thiht on 5/1/21, 4:41 PM
The pipeline is: build, unit test, lint in parallel, then package and save the relevant artifacts, then build a Docker image, then run the integration tests, and finally deploy (staging, dev or prod depending on the branch).
We also have end to end tests that run periodically and are a bit longer, but they're not on the path to prod.
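The shape of that pipeline (a parallel first stage, then serial stages, stopping on the first failure) can be sketched roughly as below. The steps are stand-in callables rather than real build commands, and the step names are only labels.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Step = Callable[[], bool]  # returns True on success


def run_pipeline(parallel_steps: List[Step], serial_steps: List[Step]) -> bool:
    """Run the first stage concurrently, then the rest in order.

    Returns False as soon as a stage fails; later stages are skipped.
    """
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda step: step(), parallel_steps))
    if not all(results):
        return False
    for step in serial_steps:
        if not step():
            return False
    return True


log: List[str] = []


def step(name: str, ok: bool = True) -> Step:
    """Build a stand-in step that records its name and reports `ok`."""
    def run() -> bool:
        log.append(name)
        return ok
    return run


passed = run_pipeline(
    [step("build"), step("unit test"), step("lint")],
    [step("package"), step("docker image"), step("integration"), step("deploy")],
)
```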
by megous on 5/1/21, 3:37 PM
Kinda feels like a waste of time, especially if your code is well componentized and there are not many central points of failure. (and those are pretty easy to see with cursory manual testing)
by 1_player on 5/2/21, 8:23 AM
Between 5 and 10 minutes from push to staging deploy for our Elixir and Node apps. And most of it is spent compiling Javascript assets.
by fmiras on 5/1/21, 2:59 PM
by jdlshore on 5/2/21, 1:23 AM
It’s fast because the code has very few end-to-end tests... only eight or so. They take six seconds. The rest of the tests average about 200/sec, including narrow integration tests.
by rcxdude on 5/1/21, 3:54 PM
by nitwit005 on 5/1/21, 3:54 PM
by formerly_proven on 5/1/21, 3:07 PM
by jiux on 5/1/21, 3:54 PM
Currently experimenting with Travis-CI, but man it sure does take awhile at 45-60 minutes roughly in my personal case. Have heard a dedicated Mac of some kind to leave at the office may help. Overall, I am all ears to any advice.
by throwaway189262 on 5/1/21, 10:30 PM
We run mostly Java backend and JS frontend, same story.
Tons of opportunities for optimization but company doesn't want to spend the time and devs appreciate the extra fuckoff time
by djxfade on 5/1/21, 2:48 PM
by Graffur on 5/2/21, 6:21 PM
by nevinera on 5/1/21, 10:38 PM
The truth is that most slow pipelines "could" be optimized to run wildly faster, but that it is costly to do so. You may be able to find low-hanging fruit that affects the build time significantly, but most of the optimizations to be done are very large projects, like updating thousands of tests to be isolated from the database.
by TeeMassive on 5/1/21, 9:01 PM
by thiago_fm on 5/4/21, 9:40 PM
by gardnr on 5/1/21, 3:03 PM
by sparker72678 on 5/2/21, 1:24 AM
by jimmyvalmer on 5/1/21, 2:35 PM