from Hacker News

NASA to launch 247 petabytes of data into AWS, but forgot about egress costs

by nobita on 3/19/20, 10:15 AM with 173 comments

  • by slowhand09 on 3/19/20, 2:13 PM

    Wow! I worked on EOSDIS in '93-'96. We estimated 16 petabytes, which at the time would have made it one of the world's largest databases. We changed horses midstream, moving our user interfaces from X Windows/Motif to the WWW, and built a very early Oracle DB accessible via the WWW. There was no cloud then, except for the missions studying atmospheric water vapor. When this was originally designed there were to be several (6-7) DAACs - Distributed Active Archive Centers (https://earthdata.nasa.gov/eosdis/daacs) - to store data near where it was needed or captured. Now they have 12 and are storing on AWS. Amazon didn't exist when this was originally built.
  • by anthonylukach on 3/19/20, 6:35 PM

    This article seems short-sighted.

    1. Using the AWS cost calculator is pointless; naturally, an entity the size of NASA would get heavily discounted rates.

    2. As data volume grows, the complexity of working with that data expands. NASA appears to be embracing a cloud paradigm where scientists push computation to where the data rests rather than downloading the data [1], [2], [3], thereby paying egress only on the higher-order data products.

    3. The report notes that NASA has tooling to rate-limit and throttle access to data. This, in itself, proves that NASA didn't "[forget] about eye-watering cloudy egress costs before lift-off".

    People may scream about vendor lock-in, which is a fair complaint, but acting like NASA just didn't think about egress is misleading.

    NASA is ultimately a science institution; I think diverting effort away from infrastructure management and towards studying data is likely a wise decision.

    [1: https://www.hec.nasa.gov/news/features/2018/cloud_computing_...] [2: https://link.springer.com/article/10.1007/s10712-019-09541-z] [3: https://ui.adsabs.harvard.edu/abs/2017AGUFMIN21F..02P/abstra...]

  • by Dunedan on 3/19/20, 1:11 PM

    > “However, when end users download data from Earthdata Cloud, the agency, not the user, will be charged every time data is egressed.”

    Not necessarily, depending on how the users access the data. If users access the data through their own AWS accounts, NASA could leverage S3's "Requester Pays" feature [1] to let the user pay for downloading the data.

    1: https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPay...
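
    A minimal sketch of what that could look like with boto3, assuming NASA owns the bucket and the end user downloads through their own AWS account (the bucket and key names here are made up):

      import boto3

      s3 = boto3.client("s3")

      # The bucket owner (NASA) flips the bucket to Requester Pays once:
      s3.put_bucket_request_payment(
          Bucket="nasa-earthdata-example",  # hypothetical bucket name
          RequestPaymentConfiguration={"Payer": "Requester"},
      )

      # End users must then opt in explicitly; the download is billed to their account:
      s3.get_object(
          Bucket="nasa-earthdata-example",
          Key="granules/example-granule.h5",  # hypothetical object key
          RequestPayer="requester",
      )

    Note this only moves the bill around: the bytes still leave S3, so total egress charges don't shrink, they just stop landing on NASA.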

  • by djrogers on 3/19/20, 1:26 PM

    I'm not saying this won't be a financial cluster - it likely will cost many times more than planned - but the headline here is just a flat-out lie.

    TFA says:

    "a March audit report [PDF] from NASA's Inspector General noticed EOSDIS hadn’t properly modeled what data egress charges would do to its cloudy plan."

    'Hadn't properly modeled' is very different from 'forgot about'. And if you actually read the linked report, it says things like:

    "ESDIS officials said they plan to educate end users on accessing data stored in the cloud, including providing tools to enable them to process the data in the cloud to avoid egress charges." and "To mitigate the challenges associated with potential high egress costs when end-users access data, ESDIS plans to monitor such access and “throttle” back access to the data"

    Neither of those statements would be in the audit if the entire topic had been a surprise.

  • by unhammer on 3/19/20, 4:42 PM

        YOU ARE NOT AFRAID?
        'Not yet. But, er...which way to the egress, please?'
        There was a pause. Then Death said, in a puzzled voice: ISN'T THAT A FEMALE EAGLE?
    
    I've been reading A Hat Full of Sky to my daughter these days, and there's a running joke that "supposedly intelligent people" don't know the meaning of the word "egress", mixing it up with things like egret, ogress or eagles.

    (See also the inspiration for the joke: https://unrealfacts.com/pt-barnum-would-trick-people-with-a-... )

  • by ghostpepper on 3/19/20, 2:57 PM

    There's a joke around here somewhere about AWS pricing being too difficult even for rocket scientists.
  • by movedx on 3/19/20, 10:25 PM

    It's The Register, people. Don't take it seriously. It's practically The Onion of the IT industry, especially the comments sections.

    I've written two articles for them and the comments are a joke. They're all anti-Cloud, anti-progressive. Try selling them on Kubernetes as a solution to their problems: they'll think you've come to steal their children. I know, I've tried.

    In short: this never happened. NASA didn't forget anything. It does, however, make for a great eye catching headline!

    Sorry to be bitter about this, but publications like The Register serve little purpose these days. They cater to a specific kind of IT personality that can't let go of their physical tin and thinks the public Cloud has no place or use at all. Again, I know; I've tried convincing these people otherwise.

  • by pixelbath on 3/19/20, 7:32 PM

    Unless my numbers are way off, I got around $15.5 million per year using Backblaze's calculator: https://www.backblaze.com/b2/cloud-storage-pricing.html

    Numbers used:

      Initial upload:   258998272 GB (1024*1024*247)
      Monthly upload:   100 GB (default)
      Monthly delete:   5 GB (default)
      Monthly download: 1048576 GB (1 PB)
    
      Period of Time:   12 months (default)
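
    For reference, that figure falls out of B2's list prices at the time, assuming roughly $0.005/GB-month for storage and $0.01/GB for download (the default upload/delete amounts are negligible):

      GB_PER_PB = 1024 ** 2
      stored_gb = 247 * GB_PER_PB            # 258,998,272 GB
      download_gb_per_month = 1 * GB_PER_PB  # 1 PB/month of egress

      storage_per_month = stored_gb * 0.005              # ~$1.29M/month
      download_per_month = download_gb_per_month * 0.01  # ~$10.5K/month

      annual = 12 * (storage_per_month + download_per_month)
      print(f"${annual:,.0f} per year")       # ~$15.7M, the same ballpark as above
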
  • by ackbar03 on 3/19/20, 12:48 PM

    Oh, but AWS didn't forget. AWS never forgets.
  • by NikolaeVarius on 3/19/20, 1:08 PM

    Senator Shelby should get AWS to launch a new region in Alabama for NASA at this rate.
  • by OzzyB on 3/19/20, 1:53 PM

    Looks like even the big boys get bitten by the Cloud Meme when they forget about bandwidth costs; glad I'm not the only one.
  • by 7777fps on 3/19/20, 1:12 PM

    I assume data access follows a heavily skewed Pareto distribution.

    Given that, it may still be cheaper to build their own serving/caching layer in front to save on egress costs than it would have been to construct the whole storage solution themselves.

  • by knorker on 3/19/20, 4:00 PM

    This was surely entirely known to AWS, who were rubbing their hands at the fact that every user of this data has to process it on EC2, in place.

    This is Cloud lock-in using data location.

  • by tehalex on 3/19/20, 3:39 PM

    I wonder if this estimate already includes Direct Connect, or if they could use it? [1]

    Cloud data transfers are too expensive; personally, I assume it costs more to measure and bill for bandwidth than the bandwidth itself costs...

    1: https://aws.amazon.com/directconnect/

  • by toomuchtodo on 3/19/20, 12:37 PM

    Cue the cloud apologists insisting that “it's better to use the cloud than to build and manage your own infra”.

    This is why you build and run your own storage, similar to Backblaze (who is almost entirely bootstrapped except for one reasonable round of investment).

  • by yosito on 3/20/20, 11:16 PM

    > You don't need to be a rocket scientist to learn about and understand data egress costs. Which left The Register wondering how an agency capable of sending stuff into orbit or making marvelously long-lived Mars rovers could also make such a dumb mistake.

    I used to work very closely with this department at NASA. Without saying too much, the short answer is "tenured government employees more concerned about job security than the success of the project" is how an agency could make such dumb mistakes.

  • by jka on 3/19/20, 8:30 PM

    What's the opposite of AWS Snowmobile[0]?

    [0] - https://aws.amazon.com/snowmobile/

  • by Spooky23 on 3/19/20, 3:10 PM

    Using AWS for this type of use case is dumb for an org as large as NASA, if cost savings is a goal. It's cheaper to just land capacity at a datacenter.
  • by julienchastang on 3/19/20, 7:18 PM

    This article is misleading. The entire point is to not move data out of the cloud. Instead bring your computing (analysis, visualization) to the data and pay for compute cycles on AWS. If your workflows are short/bursty, you will come out ahead. Moreover, you will be able to do big data-style computations that you cannot do in a local computing environment. This is bad journalism, IMO.
  • by chx on 3/19/20, 9:42 PM

    If you are facing similar problems, you should know that traffic from B2 via Cloudflare is free. I'm not 100% sure CF would be happy if NASA picked the CF free tier, but their quote would probably still be orders of magnitude lower than Amazon's.
  • by X6S1x6Okd1st on 3/19/20, 5:41 PM

    > NASA also knows that a torrent of petabytes is on the way.

    Oh that sounds like a potential solution.

    /s

  • by gigatexal on 3/19/20, 1:09 PM

    Might be cheaper to spin up virtual workstations on AWS and use the data there.
  • by Havoc on 3/19/20, 4:15 PM

    Can't they just use the current DAACs as a caching layer? Seems like the least ugly way out of this mess.

    Also - can't they use torrent tech? I wouldn't mind helping out a bit on space & data

  • by CKN23-ARIN on 3/20/20, 1:40 AM

    Putting a dataset into AWS is a lot like putting a satellite into orbit. You still need to pay later to get it down, or to safely destroy it.
  • by Wheaties466 on 3/19/20, 2:19 PM

    At that point, why not just use a P2P-based system?
  • by szczepano on 3/19/20, 10:57 PM

    To sum up: no matter how big the hard drives or data centers we build, we will always have a problem with storage capacity.
  • by pontifier on 3/20/20, 3:01 PM

    Cloud egress costs killed the business I'm now trying to save. I won't fall into that trap.
  • by ralusek on 3/19/20, 1:50 PM

    I wonder why they wouldn't use Wasabi:

    https://wasabi.com/cloud-storage-pricing/

    Looks like egress is free.

    Maybe because it's comparatively untested? Does anyone here have any experience with it?

  • by api on 3/19/20, 5:50 PM

    This is exactly why the costs are set up that way. The first time I saw AWS pricing I chuckled and thought "roach motel." Data goes in but it doesn't come out. It's one of many soft lock-in mechanisms cloud hosts use.
  • by tzm on 3/19/20, 7:14 PM

    $5,439,526.92 per month
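
    That figure appears to be exactly S3 Standard's tiered storage list price for 247 PB, assuming the us-east-1 rates at the time ($0.023/GB-month for the first 50 TB, $0.022 for the next 450 TB, $0.021 beyond that) and ignoring egress and request charges:

      total_gb = 247 * 1024 ** 2                         # 258,998,272 GB

      tier1 = 50 * 1024 * 0.023                          # first 50 TB
      tier2 = 450 * 1024 * 0.022                         # next 450 TB
      tier3 = (total_gb - 500 * 1024) * 0.021            # everything above 500 TB

      print(f"${tier1 + tier2 + tier3:,.2f} per month")  # $5,439,526.91, within a cent
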
  • by turdnagel on 3/19/20, 10:33 PM

    Requester pays!
  • by Mave83 on 3/19/20, 5:51 PM

    Just build your own storage and save an incredible amount.

    You might think it's hard, but it's not. croit.io provides all you need to deploy a scalable cluster, even across multiple geographic regions.

    The price for a 1 PB cluster, including everything from rack to hardware to license to labor, comes in below €3/TB/month - roughly the Amazon Glacier price tag, but with S3-IA-style access.

  • by oh_hello on 3/19/20, 6:36 PM

    "The audit, meanwhile, suggests an increased cloud spend of around $30m a year by 2025"

    Isn't this a rounding error for NASA?

  • by mensetmanusman on 3/19/20, 6:25 PM

    This seems like a good use of torrenting?
  • by beastman82 on 3/19/20, 1:52 PM

    Torrent FTW
  • by vnchr on 3/19/20, 12:58 PM

    Cloud VERSUS Space. Who will come out on top?
  • by ph2082 on 3/19/20, 1:19 PM

    1 terabyte of hard disk costs ~$50.

    247 petabytes ≈ 247,000 terabytes, so on the order of $12 million in raw disk.

    Network cards, bandwidth, electricity: I can't guess.

    A couple of good engineers (hardware and software), which they definitely have.

    Maybe they could have built their own cloud for ~$10-15 million. And that wouldn't be a recurring cost.

    Maybe they missed the article about Bank of America saving ~$2 billion by building their own cloud.
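
    For what it's worth, the raw-disk arithmetic under that assumed $50/TB price (ignoring replication, servers, networking, power, and staff, all of which push the real number up considerably):

      tb_total = 247 * 1000                  # 247 PB ≈ 247,000 TB
      usd_per_tb = 50                        # assumed raw-disk price

      print(f"${tb_total * usd_per_tb:,}")   # $12,350,000 - the ~$10-15M ballpark above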