by sprachspiel on 6/20/21, 5:39 PM with 158 comments
by bob1029 on 6/20/21, 8:13 PM
If you want to go fast & save NAND lifetime, use append-only log structures.
If you want to go even faster & save even more NAND lifetime, batch your writes in software (e.g. a ring buffer with a natural back-pressure mechanism) and then serialize them with a single writer into an append-only log structure. Many newer devices have something like this at the hardware level, but your block size is still a constraint when working in hardware. If you batch in software, you can hypothetically write multiple logical business transactions per block I/O. When your physical block size is 4k and your logical transactions average 512 bytes of data, you would otherwise be leaving a lot of throughput on the table.
Going down one level of abstraction seems important if you want to extract the most performance from an SSD. Unsurprisingly, the above ideas also make ordinary magnetic disk drives faster & potentially longer-lived.
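The batching idea above can be sketched in a few lines. This is a minimal illustration, not the commenter's actual design: a bounded queue provides the back-pressure, a single writer thread packs many small records into block-sized appends, and the `BLOCK_SIZE` of 4096 is an assumption matching the example in the comment.

```python
import os
import queue
import threading

BLOCK_SIZE = 4096  # assumed physical block size, per the comment's example


class LogBatcher:
    """Single writer drains a bounded queue (natural back-pressure)
    and packs many small transactions into block-sized appends."""

    def __init__(self, path, maxsize=1024):
        self.q = queue.Queue(maxsize=maxsize)  # blocks producers when full
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.writer = threading.Thread(target=self._drain, daemon=True)
        self.writer.start()

    def submit(self, record: bytes):
        # Back-pressure: this blocks if the writer falls behind.
        self.q.put(record)

    def _drain(self):
        buf = bytearray()
        while True:
            rec = self.q.get()
            if rec is None:  # shutdown sentinel
                break
            # Length-prefix each record so the log can be replayed later.
            buf += len(rec).to_bytes(4, "little") + rec
            # Flush only whole blocks: many logical transactions per block I/O.
            while len(buf) >= BLOCK_SIZE:
                os.write(self.fd, bytes(buf[:BLOCK_SIZE]))
                del buf[:BLOCK_SIZE]
        if buf:  # pad the tail out to a full block on shutdown
            buf += b"\0" * (BLOCK_SIZE - len(buf))
            os.write(self.fd, bytes(buf))
        os.close(self.fd)

    def close(self):
        self.q.put(None)
        self.writer.join()
```

With 512-byte transactions, roughly seven fit in each 4k block instead of one per write, which is exactly the throughput the comment says is otherwise left on the table.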
by jedberg on 6/20/21, 7:46 PM
I've always been told, "just treat SSDs like slow, permanent memory".
by klodolph on 6/20/21, 9:18 PM
https://www.usenix.org/system/files/conference/inflow14/infl...
> Log-structured applications and file systems have been used to achieve high write throughput by sequentializing writes. Flash-based storage systems, due to flash memory’s out-of-place update characteristic, have also relied on log-structured approaches. Our work investigates the impacts to performance and endurance in flash when multiple layers of log-structured applications and file systems are layered on top of a log-structured flash device. We show that multiple log layers affects sequentiality and increases write pressure to flash devices through randomization of workloads, unaligned segment sizes, and uncoordinated multi-log garbage collection. All of these effects can combine to negate the intended positive affects of using a log. In this paper we characterize the interactions between multiple levels of independent logs, identify issues that must be considered, and describe design choices to mitigate negative behaviors in multi-log configurations.
by andrewmcwatters on 6/20/21, 7:40 PM
This is pure speculation, but there must have been a period during the mass transition to SSDs when engineers asked: how do we make this hardware compatible with software that, for the most part, expects hard disk drives, and just have it behave like a really fast HDD?
So, there's almost certainly some non-zero amount of code out there in the wild that is or was running some very specific write-optimized routine that one day just started performing 10 to 100 times faster, and maybe, given the nature of software, is still out there today doing that same routine.
I don't know what that would look like, but my guess would be that it has something to do with average-sized write caches, and those caches look entirely different today, or something.
And today, there's probably some SSD specific code doing something out there now, too.
by rossdavidh on 6/20/21, 9:36 PM
But, fun to read and think about.
by dang on 6/20/21, 6:48 PM
What every programmer should know about solid-state drives - https://news.ycombinator.com/item?id=9049630 - Feb 2015 (31 comments)
by FpUser on 6/20/21, 7:59 PM
So I think that unless this "every programmer" is a database storage engine developer (not too many of them, I guess), their main concern would mostly be: how close is my SSD to that magical point where it has to be cloned and replaced before shit hits the fan?
by rabuse on 6/20/21, 8:33 PM
by kortilla on 6/20/21, 7:52 PM
These are all reasons SSDs are much more pleasant to work with than old platter disks.
by teddyh on 6/20/21, 7:52 PM
by dataflow on 6/20/21, 7:04 PM
by riobard on 6/21/21, 1:56 AM
> A drive can be over-provisioned simply by formatting it to a logical partition capacity smaller than the maximum physical capacity. The remaining space, invisible to the user, will still be visible and used by the SSD controller.
Does the controller read the partition table to decide that the space beyond the logical partition is safe to use as spare area?
by dan-robertson on 6/20/21, 8:58 PM
by Agentlien on 6/21/21, 7:20 AM
Near the beginning they talk about how targeting the PlayStation 5, which has an SSD, drastically changed how they went about making the game.
In short, the quick data transfer meant they were CPU bound rather than disk bound and could afford to have a lot of uncompressed data streamed directly into memory with no extra processing before use.
by 1_player on 6/20/21, 7:35 PM
by 2OEH8eoCRo0 on 6/21/21, 11:15 AM
And where did the word "drive" come from? I thought it referred to motors that spin the media, which SSDs also do not have.
by DrNuke on 6/20/21, 8:15 PM
by personjerry on 6/20/21, 7:00 PM
by mikewarot on 6/21/21, 4:18 AM
by ropeladder on 6/21/21, 1:19 AM
by rectang on 6/20/21, 8:40 PM
by CoolGuySteve on 6/20/21, 7:18 PM
However, random read performance is still only somewhere between a third and half as fast as sequential reads, versus a magnetic disk, where random reads are often a tenth as fast.
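The ratio the comment describes is easy to measure yourself. Below is a rough sketch (not a rigorous benchmark: the OS page cache will inflate numbers unless the file is much larger than RAM or the cache is dropped first). It times block-sized reads at sequential versus random offsets; `os.pread` is POSIX-only.

```python
import os
import random
import time


def read_throughput(path, block=4096, n=2048, sequential=True):
    """Time n block-sized reads at sequential or random aligned offsets.

    Returns approximate bytes per second. POSIX-only (uses os.pread).
    """
    size = os.path.getsize(path)
    if sequential:
        offsets = [i * block for i in range(n)]
    else:
        # Random block-aligned offsets within the file.
        offsets = [random.randrange(0, size - block) // block * block
                   for _ in range(n)]
    fd = os.open(path, os.O_RDONLY)
    start = time.perf_counter()
    for off in offsets:
        os.pread(fd, block, off)
    elapsed = time.perf_counter() - start
    os.close(fd)
    return n * block / elapsed


# Usage sketch ("/tmp/bigfile" is a placeholder for a large existing file):
# seq = read_throughput("/tmp/bigfile")
# rnd = read_throughput("/tmp/bigfile", sequential=False)
# print(f"random/sequential ratio: {rnd / seq:.2f}")
```

On a typical SSD the ratio lands well above what a spinning disk manages, since there is no seek penalty, only the loss of readahead and internal parallelism.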
by wly_cdgr on 6/20/21, 10:37 PM
by BatteryMountain on 6/21/21, 9:23 AM
The numbers involved were insane, and I played with various scenarios: with/without compression (MessagePack feature), with/without the typeless serializer (MessagePack feature), with/without async, and the difference between sync vs async and forcing disk flushes. I also weighed the difference between writing one fat file (append only) versus millions of small files, and the difference between using .NET streams versus File.WriteAllBytes (a C# feature; an all-in-memory operation, good for small writes, bad for bigger files or async serialization + writing). I also varied the number of objects involved (100K, 1M, 10M, 50M).
I cannot remember all the numbers involved, but I still have the code for all of it somewhere, so maybe I can write a blog post about it. But I do remember being utterly stunned by how fast it actually was to freeze my application state to disk and to thaw it again (the class name was Freezer :p).
The whole reason was that I started using ZFS and read up a bit on how it works. I also have some idea of how SSDs work, how serialization and writing to disk work (streams, etc.), and a rough idea of how MySQL, Postgres, and SQL Server save their data files to disk and what kinds of compromises they make.
So one day, frustrated with my data access layers, it dawned on me to try building my own storage engine for fun. I started by generating millions of objects in memory, which I then serialized with MessagePack using a Parallel.ForEach (C# feature) to a Samsung 970 EVO Plus to see how fast it would be. It blew my mind, and I still don't trust that code enough to use it in production, but it does work.
Another reason I tried it was that at work we have some Postgres tables with 60M+ rows that are getting slow, and I'm convinced we have a bad data model plus too many indexes, and that 60M rows is not too much. (Since then we've partitioned the hell out of it in multiple ways, but that is a nightmare of its own, since I still think we sliced the data the wrong way, according to my intuition about where the data has natural boundaries; time will tell who was right.)
So I do believe there is a space in the industry where SSDs, paired with certain file systems and certain file sizes and chunking, will completely leave SQL databases in the dust, purely because of how those things work together. I haven't put my code out in public yet and have only told one other dev about it, mostly because it is basically sacrilege to go against the grain in our community, and saying "I'm going to write my own database engine" sounds nuts even to me.
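The freeze/thaw idea the commenter describes is in C# with MessagePack; as a hedged illustration only, here is the same shape in Python with the standard library's `pickle` standing in for MessagePack, and the class name `Freezer` borrowed from the comment. One fat append-only file, one sequential write pass, which is the SSD-friendly pattern the thread keeps coming back to.

```python
import pickle


class Freezer:
    """Toy sketch of the comment's 'Freezer': dump application state to
    one fat file sequentially and load it back. pickle stands in for
    MessagePack; this is an illustration, not the commenter's code."""

    def __init__(self, path):
        self.path = path

    def freeze(self, objects):
        # One sequential write pass with a large buffer: the kind of
        # append-only, big-chunk I/O that SSDs (and ZFS) handle well.
        with open(self.path, "wb", buffering=1024 * 1024) as f:
            for obj in objects:
                pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

    def thaw(self):
        # Stream the objects back one at a time.
        with open(self.path, "rb", buffering=1024 * 1024) as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    break
```

A real engine would need checksums, crash recovery, and an index, which is roughly where the compromises the commenter mentions (MySQL/Postgres/SQL Server) come from.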
by BrissyCoder on 6/21/21, 12:00 AM