by chton on 12/10/14, 2:53 PM with 81 comments
by ChuckMcM on 12/10/14, 7:55 PM
I wish I were an investor in them.
[1] http://www.statisticbrain.com/wal-mart-company-statistics/
by hendzen on 12/10/14, 7:09 PM
See this tweet by @aphyr: https://twitter.com/aphyr/status/542755074380791809
(All credit for the idea in this comment is due to @aphyr)
Basically, because the transactions modified keys drawn from a uniform distribution, the probability of contention was extremely low. In other words, this workload is essentially a data-parallel problem, which somewhat lessens the impressiveness of the high throughput. It would be interesting to see it run with a Zipfian key distribution (or, even better, a Biebermark [0]); a rough sketch of the difference follows the link below.
[0] - http://smalldatum.blogspot.co.il/2014/04/biebermarks.html
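To make the contention point concrete, here is a rough sketch (in Go; nothing here comes from FoundationDB's actual benchmark, and the keyspace size, batch size, and Zipf skew parameters are made up purely for illustration) of how often transactions in the same batch collide on a key under uniform versus Zipfian key selection:

    package main

    import (
        "fmt"
        "math/rand"
    )

    // countConflicts simulates batches of concurrent single-key writes and
    // counts how often two writes in the same batch hit the same key --
    // a crude proxy for write-write contention.
    func countConflicts(draw func() uint64, batches, txPerBatch int) int {
        conflicts := 0
        for b := 0; b < batches; b++ {
            seen := make(map[uint64]bool, txPerBatch)
            for t := 0; t < txPerBatch; t++ {
                k := draw()
                if seen[k] {
                    conflicts++
                }
                seen[k] = true
            }
        }
        return conflicts
    }

    func main() {
        const keyspace = 1 << 20 // hypothetical number of distinct keys
        r := rand.New(rand.NewSource(1))

        uniform := func() uint64 { return uint64(r.Intn(keyspace)) }

        // s=1.1, v=1 are arbitrary skew parameters chosen for illustration.
        z := rand.NewZipf(r, 1.1, 1, keyspace-1)
        zipfian := func() uint64 { return z.Uint64() }

        fmt.Println("uniform conflicts:", countConflicts(uniform, 1000, 100))
        fmt.Println("zipfian conflicts:", countConflicts(zipfian, 1000, 100))
    }

With a large keyspace the uniform draw almost never collides, while the skewed draw piles writes onto a few hot keys, and a hot-key workload is exactly what stresses a transaction conflict detector.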
by jrallison on 12/10/14, 6:06 PM
We continue to use it for more and more data access patterns which require strong consistency guarantees.
We currently store ~2 terabytes of data in a 12 node FDB cluster. It's rock solid and comes out of the box with great tooling.
Excited about this release! My only regret is I didn't find it sooner :)
by bsaul on 12/10/14, 7:34 PM
Is it really the first distributed DB project to have built a simulator?
Because frankly, if that's the case, it seems revolutionary to me. Intuitively, it feels like it brings the same kind of quality improvement to distributed databases that unit testing brought to regular software development.
PS: I should add that this talk is one of the best I've seen this year. The guy is extremely smart, passionate, and clear. (I just loved the Hurst exponent part.)
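For anyone who hasn't watched the talk yet: the core idea, as I understand it, is to route every source of nondeterminism (message ordering, delays, faults) through a single seeded PRNG, so that a failing run can be replayed exactly from its seed. Here is a toy sketch of that idea in Go; the Message/Node types and the 10% drop rate are invented for this example and have nothing to do with FoundationDB's real simulator:

    package main

    import (
        "fmt"
        "math/rand"
    )

    // Message and Node are stand-ins for a real protocol's state.
    type Message struct {
        To      int
        Payload string
    }

    type Node struct {
        ID       int
        Received []string
    }

    // simulate delivers messages in an order chosen by a seeded PRNG and
    // deterministically injects faults, so the same seed always reproduces
    // the same execution.
    func simulate(seed int64, nodes int, msgs []Message) []Node {
        rng := rand.New(rand.NewSource(seed))
        cluster := make([]Node, nodes)
        for i := range cluster {
            cluster[i].ID = i
        }

        queue := append([]Message(nil), msgs...)
        for len(queue) > 0 {
            // Pick which in-flight message is delivered next.
            i := rng.Intn(len(queue))
            m := queue[i]
            queue = append(queue[:i], queue[i+1:]...)

            // Fault injection: drop roughly 10% of messages.
            if rng.Float64() < 0.10 {
                continue
            }
            cluster[m.To].Received = append(cluster[m.To].Received, m.Payload)
        }
        return cluster
    }

    func main() {
        msgs := []Message{{0, "a"}, {1, "b"}, {2, "c"}, {1, "d"}}
        // Same seed => identical run, which is what makes bugs found in
        // simulation reproducible.
        fmt.Println(simulate(42, 3, msgs))
        fmt.Println(simulate(42, 3, msgs))
    }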
by dchichkov on 12/10/14, 7:34 PM
I remember evaluating a few low-latency key-value storage solutions, and one of them was Stanford's RAMCloud, which is supposed to give 4-5 microsecond reads and 15 microsecond writes, scale up to 10,000 boxes, and provide data durability. https://ramcloud.atlassian.net/wiki/display/RAM/RAMCloud Seems like that would be "Databases at 2000 MHz".
I've actually studied the code that handles the network, and it was written pretty nicely; as far as I know, it should work over both 10GbE and InfiniBand with similar latencies. I'm not at all surprised they could get a pretty clean-looking 4-5 us latency distribution with code like that.
How does it compare with FoundationDB? Is it a completely different technology?
by felixgallo on 12/10/14, 6:44 PM
One of the links leads to an interesting C++ actor preprocessor called 'Flow'. That page includes a table listing the time to send a message around a ring for a given number of processes (N) and messages (M), in which Flow appears to be the fastest at 0.075 sec for N=1000 and M=1000, compared with, e.g., Erlang at 1.09 seconds.
My curiosity was piqued, so I threw together a quick microbenchmark in Erlang. On a moderately loaded 2013 MacBook Air (2-core i7) and Erlang 17.1, with 1000 iterations of M=1000 and N=1000, it averaged 34 microseconds per run, which compares pretty favorably with Flow's claimed 75000 microseconds. The Flow paper appears to be from around 2010, so it would be interesting to know how it's doing in 2014.
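For reference, here is roughly what such a ring/process microbenchmark looks like. This sketch is in Go rather than Erlang or Flow, and the N=1000, M=1000 values simply mirror the figures quoted above, so its timings aren't directly comparable to either result:

    package main

    import (
        "fmt"
        "time"
    )

    // ringTrip spawns n forwarding goroutines connected in a chain and pushes
    // a token through the whole chain m times, i.e. roughly n*m messages.
    func ringTrip(n, m int) time.Duration {
        head := make(chan int)
        in := head
        for i := 0; i < n; i++ {
            out := make(chan int)
            go func(in <-chan int, out chan<- int) {
                for v := range in {
                    out <- v
                }
                close(out)
            }(in, out)
            in = out
        }
        tail := in

        start := time.Now()
        for lap := 0; lap < m; lap++ {
            head <- lap // inject the token at the head of the ring
            <-tail      // wait for it to come out the other end
        }
        close(head)
        return time.Since(start)
    }

    func main() {
        // N=1000 processes, M=1000 messages, matching the figures above.
        fmt.Println(ringTrip(1000, 1000))
    }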
by shortstuffsushi on 12/10/14, 7:39 PM
by w8rbt on 12/10/14, 10:27 PM
by maliki on 12/12/14, 2:10 PM
The best source for DB benchmarking I know of is http://www.tpc.org/. The methodology is more complicated there, but the top results are around 8 million transactions per minute on $5 million systems. This FoundationDB result is more like 900 million transactions per minute on a system that costs $1.5 million a year to rent (so, approx $5 million to buy?).
The USD/transactions-per-minute metric is clear, but without a standard test suite (schema, queries, client count, etc.), comparing claims of database performance makes my head hurt.
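Taking those figures at face value (and both "$5 million" price tags are rough approximations), the back-of-the-envelope dollars-per-tpm comparison works out roughly like this:

    package main

    import "fmt"

    func main() {
        // Approximate figures quoted above; the FoundationDB purchase price
        // is only an estimate derived from the quoted rental cost.
        tpcTPM, tpcCost := 8e6, 5e6   // ~8M transactions/min on a ~$5M system
        fdbTPM, fdbCost := 900e6, 5e6 // ~900M transactions/min, ~$5M to buy

        fmt.Printf("TPC top results: $%.3f per tpm\n", tpcCost/tpcTPM) // ~$0.625
        fmt.Printf("FoundationDB:    $%.4f per tpm\n", fdbCost/fdbTPM) // ~$0.0056
    }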
by illumen on 12/11/14, 5:43 PM
However, I think there's still plenty of room to grow.
320,000 concurrent sessions isn't that much by modern standards. You can get 12 million concurrent connections on one Linux machine and push a gigabit per second of data.
Also, 167 megabytes per second (116 B * 14.4 million) is not pushing the limits of what one machine can do. I've been able to process 680 megabytes per second of data into a custom video database, plus write it to disk, on a single 2010 machine, and that was while doing heavy processing on the video with plenty of CPU to spare.
PCIe over fibre can carry an enormous number of messages per second. You can fit machines with 2 TB of memory (and more) in 1U.
Since this is a memory-first, eventually-dumped-to-disk database, I think there is still a lot of room to grow.
by mariusz79 on 12/10/14, 6:33 PM
by tuyguntn on 12/11/14, 7:24 AM
by oconnor663 on 12/11/14, 6:06 PM
by lttlrck on 12/10/14, 5:49 PM
Sorry, I don't like that at all.
by imanaccount247 on 12/10/14, 8:34 PM