from Hacker News

Ask HN: What are the biggest databases you've worked with?

by barelyusable on 5/1/17, 5:23 AM with 16 comments

Originally I was thinking about SQL databases such as PostgreSQL/MySQL, but I'd be interested in anything. I'm wondering about the number of queries/transactions per second, and how you handled that scale.
  • by kevrone on 5/1/17, 2:46 PM

    At Timehop we currently work with a single-instance AWS Aurora (MySQL-ish) database with over 40TB of data (plus a read-only replica on a smaller instance). Some stats: 1.5MB/sec receive throughput, 10MB/sec transmit throughput, commit latency around 3-4ms (with very occasional spikes to 10-20ms), select & commit counts are about 300/s, and select latency hovers around 35ms (we do about a dozen unions per query, though).

    All in all it's the easiest relational database I've ever worked with in terms of stability, speed, and scalability. I know this sounds like an ad for Aurora, but I just really like it.
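
    A rough sketch of what those union-heavy reads might look like. The per-month memories_* tables, the column names, and the pymysql driver are all my own assumptions for illustration, not anything Timehop has described:

      # Hypothetical: one logical read stitched together from per-month
      # tables with UNION ALL, so a single query touches ~a dozen tables.
      import pymysql

      MONTHS = ["2017_01", "2017_02", "2017_03"]  # ... up to a dozen

      def build_union_query() -> str:
          parts = [
              f"SELECT created_at, payload FROM memories_{m} WHERE user_id = %s"
              for m in MONTHS
          ]
          return " UNION ALL ".join(parts)

      conn = pymysql.connect(host="aurora-endpoint", user="app",
                             password="secret", database="timehop")
      with conn.cursor() as cur:
          # one bound parameter per UNION branch
          cur.execute(build_union_query(), [42] * len(MONTHS))
          rows = cur.fetchall()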

  • by tupshin on 5/1/17, 9:40 AM

    Having worked with Cassandra for many years, I've been involved with:

    * 1000+ node clusters

    * Petabyte scale data

    * 10s of millions of reads and writes per second

    Given my preface, the "how" is scale-out on top of Cassandra, of course. Not SQL, and hard to do if you have a highly relational model, but there are many success stories at that kind of scale.
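
    To make the "hard if you have a highly relational model" point concrete, here is a minimal sketch of the one-table-per-query, denormalized modeling Cassandra pushes you toward. The keyspace, table, and columns are invented, the keyspace is assumed to already exist, and the driver is the DataStax cassandra-driver for Python:

      # Denormalized Cassandra model: the table is keyed so that each read
      # hits exactly one partition. There are no joins to fall back on.
      from cassandra.cluster import Cluster

      session = Cluster(["10.0.0.1"]).connect()
      session.execute("""
          CREATE TABLE IF NOT EXISTS app.events_by_user (
              user_id   bigint,
              event_ts  timeuuid,
              payload   text,
              PRIMARY KEY ((user_id), event_ts)
          ) WITH CLUSTERING ORDER BY (event_ts DESC)
      """)

      # Each read addresses a single partition, which is what keeps reads
      # cheap even on a 1000+ node cluster:
      rows = session.execute(
          "SELECT payload FROM app.events_by_user WHERE user_id = %s LIMIT 50",
          [42],
      )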

  • by spthorn60 on 5/1/17, 7:19 PM

    MLB's Statcast collects 7TB/game, or 17 petabytes of raw data annually. http://fortune.com/2015/09/04/mlb-statcast-data/
  • by thinkMOAR on 5/1/17, 10:23 AM

    Would be nice if people answering included:

    - hot vs cold data ratio of the total size

    - read vs write data ratio

    - if reads/writes are split

    - how partitioning, if used, is done

    - total machine(s) resources disk/ram

    - average (read) query response time

    - how machine/node failure is handled

  • by CyberFonic on 5/1/17, 10:16 AM

    My biggest installation was an accounting system for a multi-national corporation: 30 Oracle instances running on a 256-core Sun cluster with 192 GB RAM and a 40 TB EMC SAN. Typical enterprise-system overkill. One of the Big 6 consulting firms designed it and deployed PeopleSoft on the completed system. I was just the lowly engineer who configured the hardware, the SAN, and the Oracle instances. As with Rolls-Royce cars, the performance was "adequate".
  • by abalashov on 5/1/17, 9:36 AM

    I've handled ~1 TB DBs in Postgres with about 2000 read queries/sec. Technically these were stored-function invocations, so they wrapped a considerably larger number of queries inside.

    This didn't seem to be a problem. It was the simultaneous write operations, hammering the disks/disk controller, that created the real limits.
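
    A minimal sketch of that stored-function pattern, assuming psycopg2 and an invented get_dashboard() function. The point is that one client-side invocation fans out into several queries server-side, so 2000 invocations/sec understates the true query volume:

      # Hypothetical: one stored function wrapping three SELECTs, so each
      # client call counts as one "query" but runs several.
      import psycopg2

      conn = psycopg2.connect("dbname=app user=app")
      with conn, conn.cursor() as cur:
          cur.execute("""
              CREATE OR REPLACE FUNCTION get_dashboard(uid bigint)
              RETURNS TABLE (metric text, value bigint) AS $$
              BEGIN
                  RETURN QUERY SELECT 'orders'::text, count(*) FROM orders WHERE user_id = uid;
                  RETURN QUERY SELECT 'visits'::text, count(*) FROM visits WHERE user_id = uid;
                  RETURN QUERY SELECT 'errors'::text, count(*) FROM errors WHERE user_id = uid;
              END;
              $$ LANGUAGE plpgsql;
          """)
          cur.execute("SELECT * FROM get_dashboard(%s)", (42,))
          rows = cur.fetchall()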

  • by ohstopitu on 5/1/17, 12:38 PM

    I worked with ~2 TB of data in CouchDB (a few thousand endpoint calls/sec) for my capstone project at university and thought I had experience, but reading these comments, I realize how little experience I really have.
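
    For context on "endpoint calls": CouchDB is queried over plain HTTP/JSON, so those calls are literally HTTP requests. A minimal sketch, with an invented database and document, using the requests library:

      # CouchDB's API is just HTTP: GET a document, modify it, PUT it back
      # (the _rev field fetched with the doc is what makes the update valid).
      import requests

      BASE = "http://localhost:5984/capstone"

      doc = requests.get(f"{BASE}/user:42").json()
      requests.put(f"{BASE}/user:42",
                   json={**doc, "visits": doc.get("visits", 0) + 1})
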
  • by avitzurel on 5/4/17, 5:49 PM

    12 TB of MongoDB spread across 9 shards (2 replicas per shard), and 4 TB of MySQL with some tables in the 400 GB range.

    MongoDB handled about 13K ops/sec at peak times with around 5-8K of these being writes.

    MySQL was probably around 2-3K ops/sec.
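
    A hypothetical sketch of how a 9-shard layout like that is declared through pymongo, run against a mongos router; the database name, collection, and shard key below are invented:

      # Sharding is configured via admin commands: mark the database as
      # sharded, then split one collection across shards by a hashed key
      # so writes spread evenly.
      from pymongo import MongoClient

      client = MongoClient("mongodb://mongos-host:27017")
      client.admin.command("enableSharding", "app")
      client.admin.command("shardCollection", "app.events",
                           key={"user_id": "hashed"})

    The "2 replicas per shard" part isn't set here; that lives in each shard's replica-set configuration.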

  • by Clownshoesms on 5/2/17, 5:54 AM

    2 GB, but it was a Caché database, in a hospital. Probably the worst job of my life.