from Hacker News

Amazon Timestream – Fast, scalable, fully managed time series database

by irs on 11/28/18, 5:16 PM with 124 comments

  • by citilife on 11/28/18, 5:47 PM

    At my day job, I build a lot of machine learning systems that require data to be fed in a time series manner[1].

    Often this means building systems to analyze terabytes of logs [semi]-realtime. All I have to say is - thank god! This is going to make my job a lot easier, and likely empower us to remove our current infrastructure setup.

    I know at one point we actually considered building our own time series database. Instead, we ended up using a Kafka queue with a SQL-based backend after we parsed and pared down the data, because that was the only setup fast enough for our queries.

    Should make a lot of the modeling I've worked on a bit easier[1].

    [1] https://medium.com/capital-one-tech/batch-and-streaming-in-t...

  • by sciurus on 11/28/18, 7:23 PM

    This is not cheap for the "DevOps" use case.

    Imagine you have 1000 servers submitting data to 100 timeseries each minute. That's 100,000 writes a minute (unless they support batch writes across series). At $0.50 per million writes, that's $72 a day or $26k a year.

    Now imagine you want to alert on that data. Say you have 100 monitors that each evaluate 1GB of data once a minute. At $10 per TB of data scanned, that's $1,440 a day or $525k a year!
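    A quick sketch checking the arithmetic in the estimate above (all prices are taken from the comment, not from an official rate card):

    ```python
    MINUTES_PER_DAY = 24 * 60

    # Writes: 1000 servers x 100 series, one write per series per minute,
    # at $0.50 per million 1KB writes.
    writes_per_day = 1_000 * 100 * MINUTES_PER_DAY
    write_cost_per_day = writes_per_day / 1_000_000 * 0.50

    # Queries: 100 monitors each scanning 1 GB once a minute,
    # at $10 per TB scanned.
    tb_scanned_per_day = 100 * 1 * MINUTES_PER_DAY / 1_000
    query_cost_per_day = tb_scanned_per_day * 10.0

    print(f"writes:  ${write_cost_per_day:.0f}/day, ${write_cost_per_day * 365:,.0f}/yr")
    print(f"queries: ${query_cost_per_day:.0f}/day, ${query_cost_per_day * 365:,.0f}/yr")
    ```

    Both figures come out as stated: $72/day (~$26k/yr) for writes and $1,440/day (~$525k/yr) for the monitor queries.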

  • by willlll on 11/28/18, 5:52 PM

    I'm actually impressed at how incredibly expensive they made this: $0.50 per million 1KB writes, which is 20x what Aurora charges, since Aurora allows 8KB writes. And Aurora is already expensive if you actually read/write to it.
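    The "20x" above checks out under one assumption: that Aurora I/O is billed at $0.20 per million requests with up to 8KB per write request (an assumption here, not a figure from the thread):

    ```python
    # Per-KB write cost: Timestream at $0.50 per million 1KB writes vs
    # Aurora at an assumed $0.20 per million I/O requests (8KB each).
    timestream_per_kb = 0.50 / 1_000_000 / 1   # $/KB written
    aurora_per_kb = 0.20 / 1_000_000 / 8       # $/KB written

    ratio = timestream_per_kb / aurora_per_kb
    print(f"Timestream is {ratio:.0f}x Aurora per KB written")  # 20x
    ```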
  • by Tehnix on 11/28/18, 9:22 PM

    Quite excited for this! We have currently been experimenting with using DynamoDB, and managing our own rollups of our incoming data (previously on an RDS, which is not a good choice for this kind of data).

    ---

    I've seen a lot of people complain about pricing, so I thought I'd share a little why we are excited about this:

    We have approximately 280 devices deployed, monitoring production lines and sending aggregated data every 5 seconds via MQTT to AWS IoT. We see around ~2 million published messages a day (equipment is often turned off when not producing). The packets are very small and highly compressible, each below 1KB, but let's just call it 1KB.

    We then funnel this data into Lambda, which processes it, puts it into DynamoDB, and handles rollups. The cost of that whole pipeline is approximately $20 a day (IoT, DynamoDB, Lambda and X-Ray), with Lambda+DynamoDB making up $17 of that.

    Finally, our users look at this data, live, on dashboards, usually looking at the last 8 hours of data for a specific device. Let's assume there will be 10,000 queries each day, each looking at one device's data for the day (2 GB/day / 280 devices = 0.007142857 GB/device/day).

    ---

    Now, running the same numbers on the AWS Timestream pricing[0] (daily cost):

    - Writes: 2 million * $0.50/million = $1

    - Memory store: 2 GB * $0.036 = $0.072

    - SSD store: (2 GB/day * 7 days retained = 14 GB) * $0.01 (GB/day) = $0.14

    - Magnetic store: (2 GB/day * 30 days retained = 60 GB) * $0.03 (GB/month) = $1.80/month ≈ $0.06/day

    - Query: 10,000 queries * 0.007142857 GB/query ≈ 71 GB/day, so roughly 1 TB scanned every 14 days at $10/TB, i.e. about $20 a month (≈ $0.67/day)

    Giving us: $1 + $0.072 + $0.14 + $0.06 + $0.67 ≈ $1.94/day.

    From these (very) quick calculations, this means we could lower our cost from ~$20/day to ~$2/day. And that's not even taking into account that it removes our need to create/maintain our own custom solution.

    I am probably missing some details, but it does look bright!

    [0] https://aws.amazon.com/timestream/pricing/
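    The estimate above can be sketched in a few lines, with two slips corrected: the SSD line shouldn't multiply by the 7-day retention twice, and the magnetic figure is per month, not per day. Prices are as quoted in the comment, not an official rate card:

    ```python
    DAILY_GB = 2.0   # ~2 million ~1KB messages per day
    DEVICES = 280

    writes = 2_000_000 / 1_000_000 * 0.50        # $0.50 per million 1KB writes
    memory = DAILY_GB * 0.036                    # memory store, as quoted
    ssd = (DAILY_GB * 7) * 0.01                  # 14 GB retained at $0.01/GB/day
    magnetic = (DAILY_GB * 30) * 0.03 / 30       # 60 GB at $0.03/GB/month, per day
    gb_per_query = DAILY_GB / DEVICES            # ~0.00714 GB scanned per query
    query = 10_000 * gb_per_query / 1_000 * 10.0 # ~71 GB/day at $10/TB scanned

    total = writes + memory + ssd + magnetic + query
    print(f"~${total:.2f}/day")  # roughly $2/day, vs ~$20/day on the current stack
    ```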

  • by sciurus on 11/28/18, 5:55 PM

    It's got to be a rough day for the team at https://www.influxdata.com/ . This could become serious competition for their InfluxCloud hosted offering.
  • by addisonj on 11/28/18, 5:31 PM

    Nice to see, this has felt like a gap in cloud offerings for a while... and the open source options have difficulties.

    From the little that was said, going to guess this uses something like Beringei (https://code.fb.com/core-data/beringei-a-high-performance-ti...) under the hood

  • by plasma on 11/28/18, 8:02 PM

    The read cost of this database makes it practically unusable for customer-facing dashboards. Disappointing.
  • by axus on 11/28/18, 5:32 PM

    A place to put the timestamped data they download from yesterday's Amazon Ground Station.
  • by brootstrap on 11/28/18, 8:34 PM

    Been searching for years for a good alternative to postgres for storing gobs of weather timeseries data. We've been running a postgres system in production for many years and have hired multiple contractors to implement a 'real timeseries solution', all of which have been utter shit and complete failures. The AWS services are expensive as all hell. With a little bit of imagination we created a unique schema for timeseries data that doesn't require terabytes of space, processes billions of data points a day, and has blazing fast queries into said data.
  • by samstave on 11/28/18, 5:45 PM

    So how will this compare to Boundary, SignalFx, Stackdriver, and similar earlier services?

    I'll have to go look into this, because AWS's historic pricing for any large-volume stream quickly becomes untenable.

    It's very easy to accumulate gobs and gobs of time series points... AWS might make using this way too expensive for anything at relative scale for a small startup?

  • by brian_herman__ on 11/28/18, 5:24 PM

    I wonder how this compares to KDB
  • by probdist on 11/28/18, 5:32 PM

    Seems positioned to compete with Azure Data Explorer (MSFT's log/time-series-optimized service). I know Azure runs a lot of services on top of Data Explorer (previously called Kusto); I wonder if this is a true, internally battle-tested product or a me-too offering.
  • by erikcw on 11/28/18, 5:35 PM

    Seems like this could be a great remote storage backend for Prometheus.
  • by temuze on 11/28/18, 5:41 PM

    Honest question: when dealing with time-series data, do you actually need every data point? Is that level of granularity really necessary?

    IMO, it makes way more sense to decide the aggregations you want ahead of time (e.g. "SELECT customer, sum(value) FROM purchases GROUP BY customer"). That way, you deal with substantially less data and everything becomes a whole lot simpler.
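    The pre-aggregation idea above can be sketched as a streaming rollup: instead of storing every raw point, fold each incoming event into the aggregates you already know you'll query (a minimal illustration; the `ingest` helper and event shape are made up for this example):

    ```python
    from collections import defaultdict

    # Pre-decided aggregation: per-customer sum of purchase values.
    rollup = defaultdict(float)

    def ingest(event):
        # event: (customer, value, timestamp); only the aggregate is kept,
        # so storage stays O(customers) instead of O(events).
        customer, value, _ts = event
        rollup[customer] += value

    for e in [("acme", 10.0, 1), ("acme", 5.0, 2), ("globex", 7.5, 3)]:
        ingest(e)

    print(dict(rollup))  # {'acme': 15.0, 'globex': 7.5}
    ```

    The trade-off, of course, is that any aggregation you didn't decide on ahead of time can't be recovered from the rollup.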

  • by MagicPropmaker on 11/28/18, 6:21 PM

    We had applications where we were tracking guests in a venue through various means. We tried a number of queuing systems to manage the flood of events, but they'd all fall over. I'd love to run my old "venue simulator" through this and see if it can stand up to actual guest load as they walk around, ride, and purchase things.
  • by coredog64 on 11/28/18, 7:04 PM

    I'm wondering if this shares any technology with the CloudWatch metrics backend. They've been making improvements there all year, and most of them generally align with what's announced here.

    CloudWatch metrics are also very expensive for what you get, so that's another similarity to Timestream ;)

  • by taf2 on 11/28/18, 5:53 PM

    I couldn’t tell from the page: is this SQL-based, similar to Timescale, or more similar to InfluxDB?
  • by mharroun on 11/28/18, 11:57 PM

    This is looking like a managed druid... that would be very nice to have.
  • by tjholowaychuk on 12/12/18, 12:33 PM

    Anyone know if this is what CloudWatch Insights uses? If so, it doesn't even come close to competing with Elasticsearch performance (even with a tiny cluster); it seemed quite slow.
  • by inoiox on 11/28/18, 6:01 PM

    There have been a lot of amazon links this week
  • by jopsen on 11/28/18, 5:32 PM

    Where are the docs?
  • by booleandilemma on 11/28/18, 5:36 PM

    There are at least 7 Amazon-related stories on the HN front page right now, what’s going on?
  • by superkuh on 11/28/18, 6:08 PM

    A quick look at the Hacker News frontpage shows a bit of a problem,

        1.  Amazon Timestream (amazon.com)
        3.  Amazon Quantum Ledger Database (amazon.com)
        8.  Amazon FSx for Lustre (amazon.com)
        13. AWS DynamoDB On-Demand (amazon.com)
        14. Amazon's homegrown Graviton processor was very nearly an AMD Arm CPU (theregister.co.uk)
        21. Building an Alexa-Powered Electric Blanket (shkspr.mobi)
        30. Amazon FSx for Windows File Server (amazon.com)
  • by nimbius on 11/28/18, 5:44 PM

    jesus christ six amazon articles in a day? AWS is undeniably the body of christ for HN but am i missing something? FSX, blockchain, timestream, Graviton, ground station, and cloudwatch... all of these articles are advertisements for mundane shit.
  • by mLuby on 11/28/18, 6:30 PM

    I count 7 separate Amazon posts on the front of HN. Is this some conspiracy? #NotAmused #ShouldBeBundled