by munchor on 8/21/23, 4:17 PM with 60 comments
by LAC-Tech on 8/22/23, 12:31 AM
- Designing Data-Intensive Applications. Great overview of... basically everything, and every chapter has dozens of references. Can't recommend it enough.
- Read papers. I've had lots of aha moments going to Wikipedia and looking up the oldest paper on a topic (wtf was in the water in Massachusetts in the 70s?). Yes, they're challenging; no, they're not impossible if you have the equivalent of an undergrad compsci education.
- Try and build toy systems. I built some small, trivial implementations of CRDTs here https://lewiscampbell.tech/sync.html, mainly by reading the papers. They're subtle, but they're not rocket science: mere mortals can do this if they apply themselves!
- Follow cool people in the field. TigerBeetle stands out to me, despite sitting in the opposite corner of the consistency/availability tradeoff from where I've made my nest. They really are poring over applied distributed systems papers and implementing them. I joke that Joran is a dangerous man to listen to, because his talks can send you down rabbit holes and you begin to think maybe he isn't insane for writing his own storage layer.
- Did I mention read papers? Seriously, the research of the smartest people on planet Earth is on the internet, available for your consumption, for free. Take a moment to reflect on how incredible that is. Anyone anywhere on planet Earth can git gud if they apply themselves.
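To give a feel for the "toy CRDT" suggestion, here is a minimal sketch of one of the simplest state-based CRDTs, a grow-only counter (G-Counter). It is not the implementation from the linked page; the class and method names are hypothetical. Each replica increments only its own slot, and merging takes the per-replica maximum, which is commutative, associative and idempotent, so replicas converge no matter how often or in what order states are exchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """State-based grow-only counter: one monotonically increasing slot per replica."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        # A replica only ever writes to its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over every replica's slot.
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Pointwise max: safe to apply repeatedly and in any order.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```

A decrementable counter (PN-Counter) is usually built from two of these, one for increments and one for decrements.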
by xnx on 8/21/23, 11:59 PM
by richieartoul on 8/22/23, 3:20 AM
by ibgeek on 8/22/23, 3:45 AM
by pradeepchhetri on 8/22/23, 7:15 AM
by TuringNYC on 8/22/23, 2:44 AM
by samsquire on 8/22/23, 8:25 AM
RocksDB is an example of that.
I am playing around with SIMD, multithreaded queues and barriers (not on the same problem).
I haven't read the DDIA book.
I used Michael Nielsen's consistent hashing code for distributing SQL database rows between shards.
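For readers who haven't seen consistent hashing: shards and row keys are hashed onto the same ring, and each key belongs to the first shard point clockwise from it, so adding or removing a shard only moves the keys in that shard's arcs. This is a generic sketch with virtual nodes, not Nielsen's code; `HashRing`, `ring_hash`, the shard names and the choice of MD5 are assumptions for illustration.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    # MD5 only serves to spread keys evenly over a 128-bit ring; not for security.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes (vnodes) per shard."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, shard) pairs
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        # Several points per shard smooth out how much of the ring each shard owns.
        for i in range(self.vnodes):
            bisect.insort(self._points, (ring_hash(f"{shard}#{i}"), shard))

    def remove_shard(self, shard):
        self._points = [p for p in self._points if p[1] != shard]

    def shard_for(self, row_key):
        # Owner = first point at or after the key's hash, wrapping to the start.
        h = ring_hash(row_key)
        i = bisect.bisect_left(self._points, (h, ""))
        return self._points[i % len(self._points)][1]


ring = HashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"user:{n}" for n in range(1000)]
before = {k: ring.shard_for(k) for k in keys}

ring.add_shard("shard-d")
moved = [k for k in keys if ring.shard_for(k) != before[k]]
# Only keys now owned by the new shard move; every other row stays where it was.
print(len(moved), all(ring.shard_for(k) == "shard-d" for k in moved))
```

With naive `hash(key) % n_shards`, going from three shards to four would reassign most rows; here roughly the new shard's share moves.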
I have an eventually consistent protocol that is not linearizable.
I am currently investigating how to efficiently schedule system events, such as a TCP socket becoming ready for reading (EPOLLIN) or ready for writing (EPOLLOUT), rather than data events.
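Readiness-based scheduling looks roughly like this: you register interest in "readable" or "writable" per file descriptor, and the loop dispatches whatever callback is attached when the kernel reports readiness. This sketch uses Python's `selectors` module, whose `EVENT_READ`/`EVENT_WRITE` map to `EPOLLIN`/`EPOLLOUT` when it picks epoll on Linux; the socket pair and callback names are illustrative assumptions, not the commenter's design.

```python
import selectors
import socket

# DefaultSelector picks the best mechanism available: epoll on Linux,
# kqueue on BSD/macOS, falling back to poll/select elsewhere.
sel = selectors.DefaultSelector()
left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

outbox = [b"hello"]
received = []


def on_writable(sock):
    # Write only once the kernel reports buffer space, then switch to read
    # interest so an idle, always-writable socket doesn't wake the loop forever.
    if outbox:
        sock.send(outbox.pop(0))
    sel.modify(sock, selectors.EVENT_READ, on_readable)


def on_readable(sock):
    data = sock.recv(4096)
    if data:
        received.append(data)


# The callback stored as `data` at registration is what the loop dispatches to.
sel.register(left, selectors.EVENT_WRITE, on_writable)
sel.register(right, selectors.EVENT_READ, on_readable)

for _ in range(10):
    for key, _events in sel.select(timeout=1.0):
        key.data(key.fileobj)
    if received:
        break

sel.close()
left.close()
right.close()
print(received)  # [b'hello']
```

The key design point is toggling write interest: sockets are writable almost all the time, so leaving `EVENT_WRITE` registered turns the loop into a busy spin.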
I want very flexible styles of scheduling control flow. I'm looking at barriers right now.
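As a reminder of what a barrier buys you: no thread proceeds past `wait()` until every participating thread has reached it, which turns free-running threads into lockstep phases. A minimal sketch with Python's `threading.Barrier` (which is reusable across phases); the worker count and phase loop are arbitrary choices for illustration.

```python
import threading

N_WORKERS = 4
barrier = threading.Barrier(N_WORKERS)
lock = threading.Lock()
events = []  # (phase, worker_id) in the order work was recorded


def worker(worker_id):
    for phase in range(3):
        with lock:
            events.append((phase, worker_id))
        # Nobody starts phase + 1 until all N_WORKERS threads finish this phase.
        barrier.wait()


threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [phase for phase, _ in events]
print(phases == sorted(phases))  # True: phases never interleave
```

Within a phase the worker order is arbitrary; only the phase boundaries are guaranteed.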
I am thinking about how to respond to events with low latency, across threads.
I'm playing with some coroutines in assembly by Marce Coll and looking at algebraic effects.
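The core idea behind those assembly coroutines (suspend a task, resume it later from the same point) can be shown at a much higher level with Python generators and a cooperative round-robin scheduler. This is only an analogy: it shows suspend/resume, not how the assembly version swaps stacks and registers, and it is not an implementation of algebraic effects. `run` and `worker` are hypothetical names.

```python
from collections import deque


def run(tasks):
    """Cooperative round-robin scheduler over generator-based coroutines."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            # Resume the coroutine until its next yield.
            step = next(task)
        except StopIteration:
            continue  # the task returned; drop it
        trace.append(step)
        ready.append(task)  # requeue so other tasks get a turn
    return trace


def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"  # suspend here and hand control back to run()


print(run([worker("a", 2), worker("b", 3)]))
# ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```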
by avrionov on 8/22/23, 1:44 AM
> Another example is figuring out the right tradeoffs between using local SSD disks and block-storage services (AWS EBS and others).
Local disks on AWS (instance store) are not appropriate for long-term storage, because the data is lost when an instance is stopped or terminated, or when the underlying disk fails. AWS also doesn't offer huge amounts of local storage.
by tayo42 on 8/22/23, 4:32 AM
Amazon, Google, MS: these companies print money and have built up massive engineering cultures to run reliable storage. I just don't see the value in trusting data to some VC-funded group over proven engineering work.
I worked on one of these in-house storage systems; all we did was look at how the cloud providers already did things, for inspiration. Might as well just use those. IDK, maybe someone can convince me of the value?
by betaby on 8/22/23, 2:30 AM
Can someone please elaborate on that? What does it mean to combine S3 with a DB? I know how traditional DBs work (PostgreSQL and MySQL). I know how S3 works (open-source implementations like MinIO). But S3 is not a random-access file on block storage, which is a prerequisite for PostgreSQL and MySQL. How is that solved for S3-based DBs? Can someone point me to the docs, or even better, an open-source implementation?
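One common way around the "no random-access writes" problem is to never modify objects in place: write immutable, sorted files (segments, as in LSM-trees) as whole objects, and read only the bytes you need with ranged GETs, which S3 supports through the HTTP `Range` header. The sketch below fakes the object store with a dict; `FakeObjectStore`, `write_segment`, `read_row` and the segment layout are all hypothetical and not how any particular product does it.

```python
import bisect
import json
import struct


class FakeObjectStore:
    """Stand-in for S3: whole-object PUTs and byte-range GETs, no in-place edits."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get_range(self, key, start, end):
        # Inclusive end, like an HTTP "Range: bytes=start-end" request.
        return self._objects[key][start:end + 1]

    def size(self, key):
        return len(self._objects[key])


def write_segment(store, key, rows):
    """Write rows sorted by key as one immutable object: values | JSON index | 8-byte index length."""
    body = bytearray()
    index = []  # [row_key, offset, length] per value
    for row_key, value in sorted(rows.items()):
        encoded = value.encode("utf-8")
        index.append((row_key, len(body), len(encoded)))
        body += encoded
    index_bytes = json.dumps(index).encode("utf-8")
    body += index_bytes
    body += struct.pack(">Q", len(index_bytes))
    store.put(key, body)


def read_row(store, key, row_key):
    """Fetch one row via a size lookup plus three small ranged reads."""
    size = store.size(key)
    (index_len,) = struct.unpack(">Q", store.get_range(key, size - 8, size - 1))
    index = json.loads(store.get_range(key, size - 8 - index_len, size - 9))
    keys = [entry[0] for entry in index]
    i = bisect.bisect_left(keys, row_key)
    if i == len(keys) or keys[i] != row_key:
        return None
    _, offset, length = index[i]
    return store.get_range(key, offset, offset + length - 1).decode("utf-8")


store = FakeObjectStore()
write_segment(store, "segments/000001.seg", {"user:2": "bob", "user:1": "alice"})
print(read_row(store, "segments/000001.seg", "user:2"))  # bob
print(read_row(store, "segments/000001.seg", "user:9"))  # None
```

Updates in this style go into new segments, with reads checking newer segments first and a background compaction merging old ones; that part is omitted here. A real system would also cache indexes locally instead of re-fetching them per lookup.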
by jollyllama on 8/22/23, 12:15 PM