by usrme on 3/27/24, 8:47 AM with 39 comments
by ot on 3/28/24, 3:37 PM
Usually the codes used for erasure coding are in systematic form: there are k "preferential" parts out of M that are just literal fragments of the original blob, so if you get those you can just concatenate them to get the original data. If you get any other k-subset, you need to perform expensive reconstruction.
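A minimal illustration of the systematic property, assuming the simplest possible construction: k literal data fragments plus a single XOR parity, so M = k + 1. The fast path just concatenates the k data fragments; any other k-subset has to reconstruct the missing one (a real Reed-Solomon code does more expensive arithmetic at that step). All names below are made up for illustration.

    from functools import reduce
    from operator import xor

    def encode(blob: bytes, k: int) -> list[bytes]:
        frag_len = -(-len(blob) // k)                      # ceil(len / k)
        padded = blob.ljust(frag_len * k, b"\0")
        frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
        return frags + [bytes(reduce(xor, col) for col in zip(*frags))]   # append XOR parity

    def decode(frags: dict[int, bytes], k: int, blob_len: int) -> bytes:
        if all(i in frags for i in range(k)):
            # Fast path: the first k fragments are literal slices of the blob.
            return b"".join(frags[i] for i in range(k))[:blob_len]
        # Reconstruction path: with a single parity, XOR the k fragments we do
        # have to rebuild the one missing data fragment.
        missing = next(i for i in range(k) if i not in frags)
        frags[missing] = bytes(reduce(xor, col) for col in zip(*frags.values()))
        return b"".join(frags[i] for i in range(k))[:blob_len]

    blob = b"the original data blob"
    frags = dict(enumerate(encode(blob, k=4)))
    frags.pop(2)                                           # lose one data fragment
    assert decode(frags, k=4, blob_len=len(blob)) == blob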
by dmw_ng on 3/28/24, 3:28 PM
Say one chunk lives in each of Germany, Ireland, and the US. The client races GETs to all 3 regions and cancels the request to the slowest to respond (which may also be down); see the sketch after this comment. Final client latency is equivalent to that of the 2nd-slowest region, with substantially better availability due to the ability to tolerate any single region being down.
Still wouldn't recommend using E2 for anything important, but ^ was one potential approach to dealing with its terribleness. It still doesn't address the reality that when E2 regions go down, it is often for days and reportedly sometimes weeks at a time, so reliable writes in this scenario would necessitate some kind of queue with capacity for weeks of storage.
There are variants of this scheme where you could potentially balance the horribly unreliable storage with some expensive reliable storage as part of the same system, but I never got that far in thinking about how it would work.
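A rough asyncio sketch of the cross-region race described above, assuming a 2-of-3 code with one fragment per region: fan out all three GETs, return as soon as any two fragments arrive, and cancel the straggler. Region names, latencies, and the fetch stub are illustrative stand-ins, not a real storage client API.

    import asyncio, random

    async def fetch_fragment(region: str, key: str) -> bytes:
        # Stand-in for the real GET: random latency per region.
        await asyncio.sleep(random.uniform(0.01, 0.3))
        return f"<fragment of {key} from {region}>".encode()

    async def racing_get(key: str, regions=("eu-de", "eu-ie", "us-east-1")) -> list[bytes]:
        pending = {asyncio.create_task(fetch_fragment(r, key)) for r in regions}
        fragments: list[bytes] = []
        while len(fragments) < 2 and pending:              # any 2 of the 3 fragments suffice
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            fragments += [t.result() for t in done if not t.exception()]
        for task in pending:                               # cancel the slowest (or down) region
            task.cancel()
        return fragments[:2]                               # hand these to the erasure decoder

    print(asyncio.run(racing_get("user:123")))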
by sujayakar on 3/28/24, 4:00 PM
one followup I was thinking of is whether this can generalize to queries other than key-value point lookups. if I'm understanding correctly, the article is suggesting to take a key-value store and, for every `(key, value)` in the system, split `value` into fragments that are stored on different shards with some `k`-of-`M` code. then at query time, we can split a query for `key` into `k` subqueries that we send to the relevant shards and reassemble the query results into `value` (a toy sketch of that point-lookup path follows below).
so, if we were to do the same business for an ordered map with range queries, we'd need to find a way to turn a query for `interval: [start, end]` into some number of subqueries that we could send to the different shards and reassemble into the final result. any ideas?
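For the point-lookup case summarized above, a toy single-process sketch (not the article's implementation): a systematic K-of-(K+1) code built from K literal slices plus one XOR parity, an in-memory dict standing in for each shard, and a made-up placement rule that puts fragment i of a key on shard (crc32(key) + i) mod N.

    from functools import reduce
    from operator import xor
    from zlib import crc32

    K, N_SHARDS = 3, 8
    shards = [dict() for _ in range(N_SHARDS)]             # in-memory stand-ins for shard servers

    def placement(key: str) -> list[int]:
        # K + 1 distinct shard ids: K data fragments plus one parity fragment.
        return [(crc32(key.encode()) + i) % N_SHARDS for i in range(K + 1)]

    def put(key: str, value: bytes) -> None:
        frag_len = -(-len(value) // K)                     # ceil(len / K)
        padded = value.ljust(frag_len * K, b"\0")
        frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(K)]
        frags.append(bytes(reduce(xor, col) for col in zip(*frags)))      # XOR parity
        for i, shard in enumerate(placement(key)):         # K + 1 subwrites, one per shard
            shards[shard][key, i] = (frags[i], len(value))

    def get(key: str) -> bytes:
        # Fan out subqueries; any K of the K + 1 fragments are enough.
        got = {}
        for i, shard in enumerate(placement(key)):
            if (key, i) in shards[shard]:
                got[i] = shards[shard][key, i]
            if len(got) == K:
                break
        length = next(iter(got.values()))[1]
        frags = {i: f for i, (f, _) in got.items()}
        missing = [i for i in range(K) if i not in frags]
        if missing:                                        # rebuild the one missing data slice
            frags[missing[0]] = bytes(reduce(xor, col) for col in zip(*frags.values()))
        return b"".join(frags[i] for i in range(K))[:length]

    put("user:123", b"some value bytes")
    del shards[placement("user:123")[1]][("user:123", 1)]  # simulate losing one shard's fragment
    assert get("user:123") == b"some value bytes"

Each put issues K + 1 subwrites and each get needs any K subreads, which matches the k-of-M fan-out described in the comment above.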
by loeg on 3/28/24, 3:49 PM
by benlivengood on 3/28/24, 5:43 PM
by siscia on 3/28/24, 3:23 PM
AWS is such a big place that even after a bit of tenure you still have places to look to find interesting technical approaches, and when I was introduced to this scheme for Lambda storage I was surprised.
As Marc mentions, it is such a simple and powerful idea, yet it is definitely not mentioned enough.
by ghusbands on 3/29/24, 1:31 PM
by jeffbee on 3/28/24, 4:22 PM