from Hacker News

Common Infrastructure Errors I've Made

by eandre on 12/3/21, 8:13 PM with 155 comments

  • by travisd on 12/3/21, 9:40 PM

    > Don't write internal cli tools in python

    100%. I stopped writing anything that had to be deployed (basically everything except Jupyter notebooks for data stuff) in Python because it’s truly a nightmare. Go and goreleaser are great for writing a CLI (and if it’s public, it can auto-generate binaries and upload them to GitHub, create a Homebrew/Scoop bucket, etc.).

  • by torton on 12/3/21, 9:54 PM

    > Don't write internal cli tools in python

    A lot of the advice is good but I take issue with this one. With poetry and docker, packaging Python apps for easy consumption is a non-issue. Same for Ruby. If you can get your team to standardize on poetry, you might not even need containers -- but, honestly, running these tools from CI or automation (anywhere) is so useful that you probably want container versions anyway.

    Golang is not a good fit for exploratory CLIs that work with complex data structures and are written for one-off, low-CPU consumption purposes -- not for scale-up API services. Just having an interactive shell (or `pry` in Ruby, those two are identical for the purpose) saved me probably weeks of time. Trying to unit test any moderately complex API surface brings me to tears when I compare it to trivial object mocking in something like Ruby.

    Python and Ruby are ideal for this and have excellent frameworks for CLI tools.

  • by dijit on 12/3/21, 8:49 PM

    Regarding being cloud provider agnostic: it’s not always for fault tolerance, there can be a couple different reasons.

    1) it gives your company a stronger bargaining position with the cloud provider.

    Granted, my companies tend to have extremely high spend, but being able to shave a dozen or so percent off your bill is enough to hire another 50 engineers in my org.

    2) you may end up hitting some kind of unarguable problem.

    These could be business driven (my CEO doesn’t like yours!), technical (GCP is not supported by $vendor) or political (you need to make a China version of your product, no GCP in China!)

    Everything is trade-offs. AWS never worked for us because their hypervisor implementation did not pin VMs to the machine's CPU cores, meaning you often compete with other VMs for memory bandwidth. But AWS works in China (kinda), so my solutions support GCP, with AWS as a slightly less supported backup.

  • by klodolph on 12/3/21, 9:24 PM

    > Don't migrate an application from the datacenter to the cloud

    Reading the actual text of this one I get a different impression, but I'm still not sure I agree with this one. Applications can be radically different from each other in terms of how they are run.

    At one company, we ran a simple application as SaaS for our customers or gave them packages to run on-prem. We'd stack something like seven SaaS customers on a single set of hardware (front-ends and DB servers). The cloud offering was a no-brainer, you can just migrate customers one by one to AWS or whatever, or spin up a new customer on AWS instead of in our colocation center.

    Applications have a very wide range of operational complexity. Some applications are total beasts--you ask a new engineer to set up a test environment as part of on-boarding and it takes them a week. Some applications are very svelte, like a single JAR file + PostgreSQL database. The operational complexity (complexity of running the software) doesn't always correspond to the complexity of the code itself or its featureset.

  • by raffraffraff on 12/4/21, 8:25 AM

    Especially the alerts thing. I think every company I've ever worked for made the mistake of ignoring alert spam. If an alert doesn't require human action, then it should be a log or a metric. And by all means plot it on a graph (the metric that triggered the alarm, or in the case of a boolean test result, the frequency of the failure). Look at the graph during real incidents if you want. Talk about it at the monthly meeting. But don't generate an alert that people should ignore. You're playing Russian Roulette.
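
    A minimal sketch of that rule, assuming each signal carries a flag saying whether a human must act. The names `Signal`, `Route`, `route`, and `failure_rate` are hypothetical and not taken from any monitoring product; real alerting stacks express this in their own rule formats.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    PAGE = "page"      # interrupts a human right now
    METRIC = "metric"  # recorded and graphed, reviewed later


@dataclass
class Signal:
    name: str
    requires_human_action: bool


def route(signal: Signal) -> Route:
    """Page only when someone has to act; everything else becomes a metric."""
    return Route.PAGE if signal.requires_human_action else Route.METRIC


def failure_rate(check_results: list[bool]) -> float:
    """Fraction of failed checks (False = failed), suitable for plotting."""
    if not check_results:
        return 0.0
    return check_results.count(False) / len(check_results)
```

    The point of the split is that a boolean check that fails now and then still shows up on a graph as a failure frequency, without paging anyone.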

  • by hdjjhhvvhga on 12/4/21, 11:08 AM

    > If you are in AWS, don't pretend that there is a real need for your applications to be deployable to multiple clouds. If AWS disappeared tomorrow, yes you would need to migrate your applications. But the probability of AWS outliving your company is high

    Well, it's not about AWS shutting down at all! It's about them having complete control over your infrastructure, so they dictate the terms. This has many consequences: (1) they can raise prices and you can do absolutely nothing about it, (2) since you chose AWS with its dynamic pricing instead of flat-rate dedicated servers, each expansion (traffic, new services) is a cost for you. This means at some point you will realize you will save sick amounts of money if you switch to bare metal (as several notable companies have done). Except that at this point it's really difficult because you have to basically start from zero so the inertia basically pulls you into continuing this vicious cycle.

    So this is just a straw-man argument. Really, I haven't heard anyone saying "but Amazon can go out of business", it's just ridiculous.

  • by grafelic on 12/3/21, 10:27 PM

    > You spring back to the present day, almost bolting out of your chair to object, "Don't do X!". Your colleagues are startled by your intense reaction, but they haven't seen the horrors you have.

    They may be startled, but they almost certainly won't listen. The purgatory nature of IT work culture ensures this repetitive pattern.

  • by worik on 12/4/21, 12:57 AM

    > Don't Design for Multiple Cloud Providers

    Designing for portability is important. Otherwise you expose yourself to dreadful uncertainties.

    "AWS will not disappear". That is probably true. The average business can take this risk (and if you are huge you are not listening to me). But AWS might raise its prices to the point where they take all your profit. Do you trust Amazon? Really? The particular AWS feature you depend on, given the tight coupling that "Don't Design for Multiple Cloud Providers" implies, may get deprecated. What then?

    This is as old as the hills: Design in layers. Have an AWS layer. If AWS goes away, quadruples their fees, deprecates your services, or you are hit with USA sanctions then there is a layer that has to be rewritten.
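
    One way to read "have an AWS layer", sketched under assumptions: the application depends on a small interface, and only one class behind it would talk to the provider. `BlobStore`, `InMemoryBlobStore`, and `save_report` are hypothetical names for illustration; an S3-backed class would expose the same two methods and be the only code that needs rewriting.

```python
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical storage interface the application depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in backend; a provider-specific class would match this shape."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_report(store: BlobStore, name: str, body: str) -> str:
    """Application code: knows about BlobStore, not about any cloud SDK."""
    key = f"reports/{name}.txt"
    store.put(key, body.encode("utf-8"))
    return key
```

    A side benefit is that tests can pass the in-memory store instead of mocking a cloud SDK.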

    Old wisdom. Use it.

  • by timwis on 12/3/21, 9:40 PM

    Bummer about Python :/ It’s my go-to for CLI tools, but I’ve seen that problem too. pipenv helps, but I wonder if there’s a better way to package them so they’re more future-proof. Or do I really need to learn Go?

  • by odiroot on 12/4/21, 11:46 AM

    The post has really great advice but...

    > Don't write internal cli tools in python

    Disagree completely with this. This has been probably the biggest overall boost for both engineers and operators at a few companies I worked at.

    You deliver fast, it's easy to debug, and it requires no compilation -- which is usually a bigger hassle than any Python-specific problem. It gets really important if you have operators on Linux/Windows/Mac.

  • by iechoz6H on 12/3/21, 10:19 PM

    > Don't run your own Kubernetes cluster

    If we ran our cluster in the cloud we'd be on the hook for hundreds of thousands of dollars of additional costs due to the high throughput of our service. There are always exceptions to any list of rules.

  • by SkipperCat on 12/3/21, 11:08 PM

    Not sure if I agree about the Python jab. I've seen "pip install ...." run flawlessly more times than I've had breakfast cereal, and I eat a lot of cereal.

    I kinda agree on his first point about migrating stuff to the cloud, but if you've done your deployments on like-for-like platforms (on-prem containers to cloud containers) it's not that bad.

  • by zebraflask on 12/3/21, 11:01 PM

    Nobody wants to mention "don't roll your own security"? That's a 101 kind of question - very easy to feel clever when you try it as an amateur, nightmarish when (really not if, when) you get it wrong.

    That is one area where I think you want to outsource that to specialists.

  • by nickjj on 12/3/21, 11:44 PM

    I think Python still has a place for CLI tools, both internal and external.

    If you can get away with a zero dependency Python script then there's no struggle. You can download the single Python file and run it, that's it. It works without any ceremony and just about every major system has Python 3.x installed by default. I'd say it's even easier than a compiled Go binary because you don't need to worry about building it for a specific OS or CPU architecture and then instructing users on which one to download.

    Argparse (part of the Python standard library) is also quite good for making quick work out of setting up CLI commands, flags, validation, etc.
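
    For illustration, a small standard-library-only sketch of a subcommand with flags and validation. The tool name `deploytool`, the `restart` command, and its options are made up for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # "deploytool" and its options are hypothetical names for this sketch.
    parser = argparse.ArgumentParser(
        prog="deploytool", description="Hypothetical internal tool."
    )
    sub = parser.add_subparsers(dest="command", required=True)

    restart = sub.add_parser("restart", help="Restart a service.")
    restart.add_argument("service")
    # choices= and type= give validation for free: bad values exit with usage.
    restart.add_argument("--env", choices=["staging", "prod"], default="staging")
    restart.add_argument("--replicas", type=int, default=1)
    restart.add_argument("--dry-run", action="store_true")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    action = "would restart" if args.dry_run else "restarting"
    print(f"{action} {args.service} in {args.env} with {args.replicas} replica(s)")
    return 0
```

    Calling `main(["restart", "api", "--env", "prod"])` exercises it without a shell; passing `--env qa` makes argparse print usage and exit.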

    There are a number of tasks where using Python instead of Bash is easier. I tend to switch between both based on what I'm doing.

  • by betaby on 12/3/21, 8:56 PM

    0) Don't write software - outsource it to someone smarter.

  • by gumby on 12/4/21, 3:04 AM

    > Don't migrate an application from the datacenter to the cloud

    Eh, the salesman told me it would be seamless while we were watching the football game from his company’s box. And they are the experts: it’s their cloud!

    I’m gonna tell the team to do it this way when I get back to the office. I think they just like running hardware and aren’t thinking of our balance sheet.

  • by gorgoiler on 12/5/21, 4:36 AM

    Late to the party, but here’s a solution to “Python packaging” that works fantastically for all my stuff and requires four lines of setup and (only) one magical invocation — pip:

      $ cat dogclock/__init__.py
      import arrow  # Example dependency
      def bark():
          print(*(["woof"] * arrow.get().hour))
    
      $ cat scripts/dogclock
      #!/usr/bin/env python3
      import dogclock
      dogclock.bark()
    
      $ cat setup.py
      from setuptools import setup
      setup(
          install_requires=['arrow'],
          packages=['dogclock'],
          scripts=['scripts/dogclock'])
    
      $ pip3 install .
      …
    
      $ dogclock
      woof woof woof woof woof woof

  • by maleldil on 12/4/21, 3:31 AM

    > Don't write internal cli tools in python

    What if your team is Python-based? Why would I write a CLI tool to be used by other Python programmers in Go or Rust, when some of them know neither?

    It doesn't matter that you know Go and can generate all possible binaries; eventually, someone else will have to make a change in your tool. It will already be difficult for them to understand a new codebase, so you don't need to make it harder by also exposing them to another language.

  • by privacyonsec on 12/4/21, 6:26 AM

    > Nobody knows how to correctly install and package Python apps

    Has anybody tried PyInstaller? It packages the whole Python project, dependencies included, into a single executable binary.

  • by jl6 on 12/5/21, 8:11 AM

    > Don't Design for Multiple Cloud Providers

    This has its own sub-antipattern: “Just put your application in a container, then it will run anywhere!”

  • by raz32dust on 12/3/21, 11:20 PM

    > If you are in AWS, don't pretend that there is a real need for your applications to be deployable to multiple clouds.

    Isn't the reason people do this to make sure they have leverage in case AWS increases prices in the future? I can see how cloud providers have probably made it extremely difficult to design for multiple clouds and so this effort might not be worth it but the reason at least seems justifiable.

  • by tommiegannert on 12/4/21, 8:04 AM

    > Soon I was auditing new services for "multi-cloud compatibility", ensuring that instead of using the premade SDKs from AWS, we maintained our own.

    I wonder if a useful middle-ground is to have lint checker rules to enforce using a blessed subset of a cloud provider's SDK. So that some thought/effort must be put into using a new feature?
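
    A rough sketch of what such a lint rule could look like for Python code using boto3, assuming the "blessed subset" is just a set of allowed service names. `ALLOWED_SERVICES` and `find_unblessed_clients` are hypothetical; the check only inspects source text and catches the literal `boto3.client("...")` / `boto3.resource("...")` form, so aliased imports or dynamically built names would slip through.

```python
import ast

# Assumption: the blessed subset is expressed as allowed service names.
ALLOWED_SERVICES = {"s3", "sqs"}


def find_unblessed_clients(source: str) -> list[tuple[int, str]]:
    """Return (line, service) for boto3.client/resource calls outside the allowlist."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_boto3_factory = (
            isinstance(func, ast.Attribute)
            and func.attr in {"client", "resource"}
            and isinstance(func.value, ast.Name)
            and func.value.id == "boto3"
        )
        if not is_boto3_factory or not node.args:
            continue
        first = node.args[0]
        # Only string literals can be checked statically.
        if isinstance(first, ast.Constant) and isinstance(first.value, str):
            if first.value not in ALLOWED_SERVICES:
                violations.append((node.lineno, first.value))
    return sorted(violations)
```

    Adding a service to the allowlist then becomes a reviewed change, which is where the "some thought/effort" would come in.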

  • by cloudengineer94 on 12/4/21, 10:13 AM

    I work with Azure and my issue is always the Azure Key Vault. On personal deployments it's always passwords: I forget to save them because I'm juggling tons of things at once, lol.

  • by tex0 on 12/4/21, 6:18 AM

    That's a seriously good and honest list.

    Thank you.

  • by streetcat1 on 12/3/21, 11:38 PM

    Hopefully you will have enough money to pay AWS once they raise their prices.

    I am not sure what will happen if you don't.