throwawaypls 5 months ago

I read this book titled "Designing Data-Intensive Applications", which covers this and a lot of other stuff about designing applications in general. https://www.amazon.com/Designing-Data-Intensive-Applications...

  • _nrvs 5 months ago

    Just finished reading DDIA, can't recommend it enough! I learned a lot of new info in every single chapter even for topics I thought I had a firm grasp on. Great job Martin Kleppmann!

  • pbowyer 5 months ago

    It's the best written computer-related book I've read. On a par with Friedl's "Mastering Regular Expressions".

    Very highly recommended.

    • _jal 5 months ago

      > On a par with Friedl's "Mastering Regular Expressions".

      That comparison sold me. You deserve a commission.

      • pbowyer 5 months ago

        Thank you!

        Be warned, the O'Reilly print quality is miles apart between the two. Their print-on-demand text quality and the binding are real let-downs.

        • _jal 5 months ago

          That's too bad. The MRE book was great - the special typography for zero-width indicators and whatnot was a really nice touch. Progress.

    • sah2ed 5 months ago

      Currently reading the book, and I agree with your recommendation: it's very well written, perhaps with an intuitive learner in mind.

  • drej 5 months ago

    Too bad I can't upvote this multiple times.

  • acidtrucks 5 months ago

    Thanks, I literally just came here to ask for book recommendations on this topic.

    Are there any other suggestions?

    • commandlinefan 5 months ago

      I know this sounds like a cliche at this point, but volume 3 of Donald Knuth's "The Art of Computer Programming" goes into more depth on the theory that underpins these algorithms than anything else I've ever seen/read/heard of (in fairness, I haven't read OP's suggestion yet, though).

  • Joeri 5 months ago

    This is one of the best technology books I've ever read. I spent a few years diving into big data architecture. I thought I had a reasonably good handle on it, then I read this book. So many insights.

  • rollulus 5 months ago

    That book is a must-read for anyone dealing with data.

    • tw1010 5 months ago

      The simultaneity of your and mkandas89's comments is what I'd call a canonical example of spooky action at a distance.

  • mkandas89 5 months ago

    It's a must-read for anyone dealing with data.

  • polymathemagics 5 months ago

    Completely agree, fantastic book. Does anyone know of any similarly wonderful technical books?

JelteF 5 months ago

There's one important detail that is never fully articulated in this article. It says B-trees are read-optimized. This is true, but the big difference only shows up for random reads, not sequential reads. Seeking a key in an LSM tree is relatively expensive because it has to consult multiple indexes, but once that seek is done, reading a lot of keys and values from that point on is cheap (or at least not much more expensive than in a B-tree). The reason for this is mentioned in the article: the SSTables. They allow quick scanning through the keys because they are stored on disk in sorted order.
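
To make this concrete, here's a toy sketch (made-up names; a single in-memory list standing in for an SSTable file on disk): the initial seek is the expensive part, and everything after it is a cheap forward scan because the entries are already in sorted order.

  import bisect

  class SSTable:
      # Toy SSTable: (key, value) pairs kept in sorted key order,
      # mirroring the physical on-disk layout.
      def __init__(self, entries):
          self.entries = sorted(entries)
          self.keys = [k for k, _ in self.entries]

      def range_scan(self, start_key, end_key):
          # The "expensive" part: one seek to find where the range begins.
          i = bisect.bisect_left(self.keys, start_key)
          # The "cheap" part: read forward; the data is already sorted.
          while i < len(self.entries) and self.entries[i][0] < end_key:
              yield self.entries[i]
              i += 1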

  • dhd415 5 months ago

    LSMs do not guarantee that sequential key/value pairs are stored in the same SSTable, so reading "n" k/v pairs could, in the worst case, require seeking through "n" different SSTables. There are datastores built on LSMs, such as HBase, that provide guarantees on top of them to facilitate efficient range reads, and of course many LSM compaction algorithms improve range reads, but the basic LSM is optimized only for single k/v reads. That's the reason, for example, why Cassandra doesn't even offer range reads such as "SELECT * FROM mytable WHERE key > x AND key < (x + k)".
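
    A hedged sketch of that worst case (reusing the toy SSTable from the parent comment, and ignoring newest-wins conflict resolution and tombstones): a range read has to open an iterator on every table that might overlap the range and merge them by key, so all "n" tables can end up being touched.

      import heapq

      def range_scan_all(sstables, start_key, end_key):
          # Each SSTable may hold keys in the requested range, so the scan
          # merges one sorted iterator per table.
          iters = [t.range_scan(start_key, end_key) for t in sstables]
          for key, value in heapq.merge(*iters, key=lambda kv: kv[0]):
              yield key, value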

    • misframer 5 months ago

      > That's the reason, for example, why Cassandra doesn't even offer range reads

      This is not true. The reason why Cassandra doesn't support that is because of hashing of keys across the cluster -- you'd have to query all shards and merge the results. That has nothing to do with the LSM storage.

koverstreet 5 months ago

cough bcachefs's b+ tree can push about a million random updates per second (almost half a million single threaded, last I checked).

I haven't heard of anything faster...

edit: found the benchmarks I ran https://www.patreon.com/posts/more-bcachefs-16647914

  • espadrine 5 months ago

    Congrats! That is Redis-on-Xeon territory! https://redis.io/topics/benchmarks

    Though I assume that, like Redis, a reboot would lose acked writes?

    An SSD can't save a write in a microsecond; its latency is at least an order of magnitude higher, right?

    • FullyFunctional 5 months ago

      Enterprise drives usually have ultracapacitors with enough energy to power the writeback of the write buffer to stable storage in case of power loss. Your write latency is therefore just the time it takes to hit the memory buffer, typically dominated by the PCIe transfer, i.e. a few µs (unless, of course, the buffer has filled, but that means you're hitting peak write bandwidth).

      Consumer drives typically don't pay for this, but instead good drives buffer writes in SLC flash which has lower latency than MLC.

      ADDENDUM: the funny thing is that you have to actively manage the energy in the caps; the write buffer must never hold more data than you have energy to write back. This becomes especially important on power-up, when the empty caps are still charging.
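
      A back-of-the-envelope sketch of that invariant (every name and number here is made up for illustration): a write is only admitted to the buffer if the caps still hold enough energy to flush everything already buffered plus the new data.

        def can_buffer(buffered_bytes, incoming_bytes,
                       cap_energy_joules, flush_joules_per_byte):
            # Invariant: the energy needed to write back the whole buffer
            # must never exceed the energy currently stored in the caps.
            needed = (buffered_bytes + incoming_bytes) * flush_joules_per_byte
            return needed <= cap_energy_joules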

    • koverstreet 5 months ago

      Yes, unless you did a journal flush. The default journal flush interval is 1 second because of hard drives, but on a good SSD it can be dropped to around 10 ms and still maintain those numbers.

    • surajrmal 5 months ago

      NAND program operations are on the order of 100 µs to 1 ms.

      • surajrmal 5 months ago

        I see how you made that mistake now. SSDs typically have many (e.g. tens to hundreds of) program operations happening in parallel. They also pack writes into units of around 16 KB or larger, so it's possible for multiple B+ tree ops to land in each program operation.

riskneutral 5 months ago

Isn’t this pretty traditional stuff? What is modern about it?

Does any of this map to a GPU for column-oriented analytical data processing? Basically, machines are only good at reading and writing large, contiguous chunks of data. As these machines evolve, the optimal width of those chunks keeps getting larger. The volume of data available is growing. And the types of operations being done on data are becoming more "analytical" (meaning column-oriented and streaming access, rather than row-oriented random access). I would therefore expect "modern storage" algorithms to be cache-friendly, column-oriented, and to take the modern in-memory storage hierarchy into account (from on-chip registers, to high-bandwidth GPU-style parallel devices, to NVRAM system memory).
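
For a toy illustration of the row- vs column-oriented point (a made-up two-column table, pure Python standing in for a real columnar engine):

  from array import array

  # Row-oriented: one tuple per record; aggregating a single column
  # strides across every record (poor locality for analytical scans).
  rows = [(i, i * 0.5) for i in range(1_000_000)]
  row_total = sum(price for _, price in rows)

  # Column-oriented: each column is one contiguous buffer of primitives,
  # so the same aggregate streams through one dense chunk of memory.
  prices = array("d", (i * 0.5 for i in range(1_000_000)))
  col_total = sum(prices)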

This article comes off to me like a CS101 intro doing Big-O asymptotic analysis on linked lists, without even mentioning the existence and effects of memory caches.

akeck 5 months ago

This reminds me of RAID 6’s use of abstract algebra.

  • akalin 5 months ago

    I wrote a blog post a while back that might be of interest: https://www.akalin.com/intro-erasure-codes . It's similar to Igor's but fills in a few more details.
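
    The core trick, as a minimal sketch (a small prime field and Lagrange interpolation for brevity; real implementations, including the ones discussed in this thread and RAID 6's Q parity, typically work in GF(2^8) with Vandermonde or Cauchy matrices): k data symbols define a polynomial, the parity shares are extra evaluations of it, and any k surviving shares reconstruct the data.

      P = 257  # small prime field, just for illustration

      def lagrange_eval(points, x):
          # Evaluate, at x, the unique degree < k polynomial through
          # `points` (a list of (xi, yi) pairs); all arithmetic is mod P.
          total = 0
          for i, (xi, yi) in enumerate(points):
              num, den = 1, 1
              for j, (xj, _) in enumerate(points):
                  if i != j:
                      num = num * (x - xj) % P
                      den = den * (xi - xj) % P
              total = (total + yi * num * pow(den, P - 2, P)) % P
          return total

      def encode(data, m):
          # Systematic encoding: data symbols sit at x = 0..k-1; the m
          # parity symbols are the same polynomial evaluated at x = k..k+m-1.
          k = len(data)
          points = list(enumerate(data))
          return points + [(x, lagrange_eval(points, x)) for x in range(k, k + m)]

      def recover(shares, k):
          # Any k surviving shares reconstruct the original data symbols.
          pts = shares[:k]
          return [lagrange_eval(pts, x) for x in range(k)]

    For example, encode([10, 20, 30], 2) yields five shares, and recover(shares[2:], 3) reconstructs [10, 20, 30] from just the last three of them.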

    • jorangreef 5 months ago

      That's a fantastic introduction (although it's far more than an introduction!). I wrote a native addon for Node.js that does Reed-Solomon, also using optimized Cauchy matrices (MIT license): https://github.com/ronomon/reed-solomon

      • akalin 5 months ago

        Very nice! I think I stumbled on your implementation when researching my post. :)

        • jorangreef 5 months ago

          Thanks. If that was in November last year, then it's changed a lot since then (and now a few times faster). Back then it was using a Vandermonde matrix as it was based on Backblaze's Java implementation, but a month ago everything was rewritten to use optimized Cauchy matrices.

  • hinkley 5 months ago

    Reed-Solomon coding.

    Linear algebra. It’s so hot right now.

    • vanderZwan 5 months ago

      Until recently I worked for a molecular neurobiology research group that, among other things, used Single-molecule RNA Fluorescence in situ Hybridization (smFISH) to measure gene expression in tissue[0].

      Basically: design viruses with fluorescent molecules attached to them such that they attach to specific sections of RNA in the cell that are associated with particular genes. Soak tissue in viruses. Look at tissue through a microscope and count individual fluorescent dots, each of which represents one RNA molecule (I find this absolutely mind-blowing). Wash off the viruses with a specific type of chemical, and repeat with a new one for a different gene.

      You can only use a few colours at a time because otherwise the microscope cannot discern them. But that would severely limit the throughput - there are a lot of genes we want to check, but the tissue will also degenerate after repeated washing. So, what can we do? Well, as I've understood it, scientists using smFISH and similar techniques now use multiplexing with Hamming codes to get around that.

      So yeah, linear algebra is definitely so, so hot right now.

      [0] http://linnarssonlab.org/osmFISH/

  • bogomipz 5 months ago

    Doesn't RAID 6 just use Reed-Solomon/Hamming codes? Is that based on abstract algebra?

jacksmith21006 5 months ago

The most interesting thing I have seen for database indexes is this paper.

https://www.arxiv-vanity.com/papers/1712.01208v1/

It uses a neural network in place of a B-tree. What is interesting is that a NN can use many processors at the same time, whereas you cannot with a B-tree.

In the end it comes down to power, as in the joules it takes to get something done.

  • koverstreet 5 months ago

    It's more about space efficiency. If a model can fit in only one or two cache lines and get you "close enough", you may end up touching a lot less memory than if you just did a binary search.
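
    Roughly that idea as a toy sketch (a single linear model standing in for the paper's recursive model index; everything here is made up for illustration): the model predicts where a key should sit in a sorted array, and a bounded local search corrects the guess. The whole "index" is two floats plus an error bound.

      import bisect

      class LearnedIndex:
          def __init__(self, sorted_keys):
              self.keys = sorted_keys
              n = len(sorted_keys)
              # Least-squares fit of position ~= slope * key + intercept.
              mean_k = sum(sorted_keys) / n
              mean_p = (n - 1) / 2
              var = sum((k - mean_k) ** 2 for k in sorted_keys)
              cov = sum((k - mean_k) * (i - mean_p)
                        for i, k in enumerate(sorted_keys))
              self.slope = cov / var if var else 0.0
              self.intercept = mean_p - self.slope * mean_k
              # Worst prediction error over the data bounds the local search.
              self.err = max(abs(self._predict(k) - i)
                             for i, k in enumerate(sorted_keys))

          def _predict(self, key):
              return int(self.slope * key + self.intercept)

          def lookup(self, key):
              guess = self._predict(key)
              lo = max(0, guess - self.err)
              hi = min(len(self.keys), guess + self.err + 1)
              i = bisect.bisect_left(self.keys, key, lo, hi)
              if i < len(self.keys) and self.keys[i] == key:
                  return i
              return None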

KirinDave 5 months ago

This was a bad post and shouldn't be preserved. And I can still edit it. So I did. Please see below, it's a better post (even if it's bad).

  • truncate 5 months ago

    >> LSM-trees are great, but better stuff exists.

    Sure. My knowledge is pretty much limited to what this article talks about, so I'm interested to know what else is out there.

    • willvarfar 5 months ago

      Fractal Trees as used in TokuDB (now owned by Percona).

      The big thing in practice with TokuDB vs. InnoDB is how it handles large datasets.

      However, I don't know where the very latest MyRocks stands vs TokuDB.

  • sanderjd 5 months ago

    This would be an interesting comment if it had some keywords to search for or (better) links. As is, I really don't see the point of it.

    • KirinDave 5 months ago

      I don't feel like it tonight. Drinking whiskey and trying not to think about how effed up the Israel footage and the North Korea situation are. Maybe that's why it's not the best post.

      I don't mean to crap on the work presented. Great article, good summary, and the tech is solid. It's just older than what excites me; a lot of progress has been made in 20 years and the majority of it hasn't found commercial applications.

      But here is a citation root for a lot of amazing work in this space:

      https://www.dcc.uchile.cl/~gnavarro/ps/alenex10.pdf

      My absolute hero Edward Kmett gave a talk stitching a lot of this work together a long time ago: https://www.youtube.com/watch?v=uA0Z7_4J7u8 . I have no idea if he's pursued it; it's just one of his many talks that left an incredibly lasting impression on me.

      Variants of this technique work for arbitrary documents and structures, work better at very high volume, have cache-oblivious properties, and support transactions. Universal indexes that are reasonably good for all queries (as opposed to specific queries) are also possible. Coupled with discrimination-based techniques for O(n) table joins, there's probably a whole startup in there.

      Sorry I can't do better right now.

      • ovao 5 months ago

        I'm curious what you mean by "cache oblivious". How can a data structure be "cache oblivious"?

        • KirinDave 5 months ago

          If you're optimal for a cache without knowing how big the cache is, you're optimal for all caches. It's tough to do, but it happens.

          This property probably doesn't get the respect it deserves in this super weird world where you can't really say how many caches are between you and your data.
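
          The textbook toy example of the property (an algorithm rather than an index structure, but it shows the trick) is a divide-and-conquer matrix transpose: keep splitting the larger dimension, and every sub-block eventually fits in whatever caches happen to exist, without the code ever knowing their sizes.

            def transpose(a, b, r0, r1, c0, c1):
                # Write the transpose of a[r0:r1][c0:c1] into b, where b is
                # pre-allocated with as many rows as a has columns.
                rows, cols = r1 - r0, c1 - c0
                if rows <= 16 and cols <= 16:
                    # Base case: a block small enough to copy directly.
                    for i in range(r0, r1):
                        for j in range(c0, c1):
                            b[j][i] = a[i][j]
                elif rows >= cols:
                    mid = r0 + rows // 2   # split the row range
                    transpose(a, b, r0, mid, c0, c1)
                    transpose(a, b, mid, r1, c0, c1)
                else:
                    mid = c0 + cols // 2   # split the column range
                    transpose(a, b, r0, r1, c0, mid)
                    transpose(a, b, r0, r1, mid, c1)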

      • pasbesoin 5 months ago

        Thank you. That YouTube presentation was quite interesting.

      • ddorian43 5 months ago

        Are you connected somehow to the Israel and North Korea situations? (Meaning more than a normal person.)

      • sanderjd 5 months ago

        Great response, thanks!