Check out MIT's 6.824 Distributed Systems lectures: https://www.youtube.com/@6.824
The course has labs in Go: MapReduce, Raft, Fault-tolerant Key/Value, Sharded Key/Value
Spring 2024 course webpage: https://pdos.csail.mit.edu/6.824/
Can't recommend this enough. The website contains almost all the course's materials, including exams and even answers. The only things that are missing are TA office hours and the credit which they obviously can't offer for free.
If you go through all the lectures and the labs, other materials posted here (including Martin Kleppmann's book) will feel easy and straightforward. Of course this does not prepare you for building a production-ready system, but you can go very far with what you learn from the course.
I've audited this course and it's truly excellent! While you're taking it, try and have lunch with SREs in your organization or go to SRE conventions (SRECon, KubeCon, DevOps Days, Percona). There's no substitute for war stories from the trenches.
This book by Martin Kleppmann is really good for learning distributed systems foundations [1]. Couple this with any OS textbook, I think you will be loaded for the bear.
[1] Designing Data-Intensive Applications:
https://www.oreilly.com/library/view/designing-data-intensiv...
Also from Martin Kleppmann is this lecture series and accompanying notes: https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_H...
I also have the book, and can recommend both!
Anything from Martin is good; he bridges the academic and industry worlds, having worked extensively in both. He’s a big part of “Automerge” if you’re interested in Conflict-free Replicated Data Types (CRDTs), which are a relatively new development in the distributed space. The video on Google Spanner is also extremely interesting as a look at a modern NewSQL-style database.
I knew about him from his work on CRDTs, had no idea he was the same person who wrote Designing Data-Intensive Applications. Impressive dude.
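For anyone who hasn't seen a CRDT before, here is a minimal sketch of the simplest textbook one, a grow-only counter (G-Counter). It is not how Automerge works internally; the class and method names are my own, chosen for illustration. Each replica only bumps its own slot, and merging takes the per-replica maximum, so replicas converge no matter how often or in what order they exchange state.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever increases its own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # which is what lets replicas converge without coordination.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


def demo() -> tuple:
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)
    b.increment(3)
    a.merge(b)
    b.merge(a)
    b.merge(a)  # merging the same state again changes nothing
    return a.value(), b.value()
```

Calling `demo()` returns `(5, 5)`: both replicas end up with the same total even though each one only saw its own increments locally.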
As an aside, the audiobook version of this is really well done. You'll probably get more out of the actual book, but I really enjoyed having this on my long runs.
I have never considered an audio book for something technical like this. My interest is piqued.
Yep. I distinctly remember the few times I spent washing the car listening to this.
Martin Kleppmann is an advisor to the Bluesky distributed social network, which has an open protocol and open source code that one can browse, run, and connect to as a supplement to the book if one desires: https://github.com/bluesky-social
Which OS book would you recommend? Anything similar to the DDIA style (not too academic)?
Operating Systems: Three Easy Pieces is very good
> I think you will be loaded for the bear
I'm misunderstanding this, please explain!
It takes a big gun/bullet to hunt a bear, so you're well prepared aka "have the big guns out".
Typically just written as “loaded for bear”, implying readiness for anything smaller than bear and perhaps even multiple bears! :)
The "Foundations of Blockchain" lecture series by Tim Roughgarden (CS prof at Columbia & head of research at a16z) covers some of these topics (state machine replication, possibility & impossibility results such as FLP) very well in the first lectures. Then it goes into PoW and PoS more specifically, but the foundations are excellent:
https://www.youtube.com/playlist?list=PLEGCF-WLh2RLOHv_xUGLq...
(Note that I still think that crypto and (permissionless) blockchain are inefficient deleterious nonsense...)
It is a good post, but I'd suggest it is missing a Step 0: find a distributed computing problem and spend some time playing with it. If you don't have access to an industrial-scale problem, write a few toy services and coordinate them through a text file instead of a database.
A lot of this stuff makes a lot more sense to someone who has a bit of "why we care" context - I'd suggest that for most people transactions look much more exciting after realising how much pain is involved in systems that don't have them. Conversely, things like the CAP theorem are a lot easier to contextualise as "ah ok, so there are failure cases I can't recover from theoretically; I have to make sure they are rare in practice" - it helps to know how large these sorts of theoretical limits actually are.
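A rough sketch of what such a Step 0 toy could look like, assuming a local POSIX filesystem: several workers share a counter stored in a text file and coordinate through a lock file created with `O_CREAT | O_EXCL`. The file names and the helpers `with_lock`, `increment`, and `run_workers` are hypothetical, and threads stand in for separate services to keep the example self-contained.

```python
import os
import tempfile
import threading
import time


def with_lock(lock_path: str, action, retry_delay: float = 0.001):
    """Run action() while holding an exclusive lock file."""
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one worker at a time can create the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(retry_delay)
    try:
        return action()
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock


def increment(counter_path: str) -> None:
    # Read-modify-write: unsafe unless the caller holds the lock.
    with open(counter_path) as f:
        n = int(f.read() or "0")
    with open(counter_path, "w") as f:
        f.write(str(n + 1))


def run_workers(workers: int = 4, rounds: int = 25) -> int:
    with tempfile.TemporaryDirectory() as d:
        counter = os.path.join(d, "counter.txt")
        lock = os.path.join(d, "counter.lock")
        with open(counter, "w") as f:
            f.write("0")

        def worker() -> None:
            for _ in range(rounds):
                with_lock(lock, lambda: increment(counter))

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(counter) as f:
            return int(f.read())
```

The interesting part is what this does not handle: a worker that crashes while holding the lock leaves `counter.lock` behind and blocks everyone else forever. Poking at failure cases like that is a quick way to get the "why we care" context.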
I think the book 'Introduction to Reliable and Secure Distributed Programming' by Cachin et al gives a better and more rigorous overview of distributed systems theory than anything else I've seen mentioned on this thread: www.distributedprogramming.net
I always liked patio11's take on this: just play Factorio.
If I built distributed systems like I play Factorio, I'd be understandably blacklisted from the industry.
And when you are ready to go a bit lower level, try Zachtronics games:
Raft is a lot easier to digest than Paxos: https://raft.github.io/raft.pdf
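To give a flavour of why Raft reads so easily, here is a minimal sketch of the voting rule a server applies when it receives a RequestVote call, as I understand it from the paper. The `Follower` class and its field names are my own simplification: there is no networking, no persistence, and no timers, and the log is reduced to the term and index of its last entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Follower:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0   # term of this server's last log entry
    last_log_index: int = 0  # index of this server's last log entry

    def handle_request_vote(self, term: int, candidate_id: str,
                            last_log_term: int, last_log_index: int) -> bool:
        # Reject candidates from a stale term.
        if term < self.current_term:
            return False
        # A newer term resets our vote for that term.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # The candidate's log must be at least as up to date as ours:
        # compare last terms first, then last indexes.
        log_ok = (last_log_term, last_log_index) >= (
            self.last_log_term, self.last_log_index)
        # At most one vote per term, but repeated requests from the
        # same candidate get the same answer.
        if self.voted_for in (None, candidate_id) and log_ok:
            self.voted_for = candidate_id
            return True
        return False
```

A candidate that collects votes from a majority of servers becomes leader for that term; the one-vote-per-term rule is what prevents two leaders from being elected in the same term.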
Andrew Tanenbaum's Distributed Systems is the gold standard textbook on the subject. https://www.distributed-systems.net/index.php/books/ds4/
As far as I know, Maarten van Steen mostly wrote that. Tanenbaum is a co-author, but Maarten is the driving force behind that one.
Is the industry (or academic) answer to distributed systems that you build it all yourself?
Is there no PostgreSQL or SQLite for distributed systems, or an off-the-shelf standard design template? (See LAMP, Linux/Docker/Postgres/Jenkins, Rails, or Django.)
There are various options: production-grade systems like Flink and Spark use Akka (iiuc) for the distributed building blocks. But of course those are only building blocks; they still have to build up the system.
Imho one of the best guides by far is "Distributed Systems: Principles and Paradigms" by Andrew S. Tanenbaum and Maarten van Steen. I learned the deep fundamentals of distributed systems there.
I have been picking through that book. So far I have to say that the first quarter of the book probably could have been cut; for me, the academic exercise of categorizing different hypothetical distributed systems into a complex hierarchy is not really useful. The latter parts of the book are much more interesting and actually give practical examples and guidance.
Byzantine Generals Problem and the CAP theorem should cover most of it.
https://book.mixu.net/distsys/ (Distributed systems: for fun and profit) You should definitely start nowhere but here. It is meant to be a getting-started doc and does a great job at that.
There's a nice-looking series of exercises from fly.io: https://fly.io/dist-sys/
(I haven't actually done them myself yet, but they look great. Not a standalone resource, but good for practice)
I'd recommend Alex Xu's System Design Interview books, volumes 1 and 2, then Designing Data-Intensive Applications. It really gets to the point. There is also the added benefit of being ready for MAANG if that opportunity ever comes up.
Would also recommend Kleppmann's book. After reading this you should be able to read some foundational papers. [1] is a good list for that.
I also learned a ton by reading the Jepsen analyses [2]. [3] is also helpful; it was recently put "behind a paywall" via an O'Reilly online book, but the Internet Archive should still have all the content ([4]).
[1] https://dancres.github.io/Pages/
[2] https://jepsen.io/analyses
[3] https://martinfowler.com/articles/patterns-of-distributed-sy...
[4] https://web.archive.org/web/20230628001937/https://martinfow...
How do I rationalize this article with the casual observation that so much software is distributed software today, and generally it works? It’s an interesting state of affairs.
Just like in any other deep area of CS (databases, compilers, kernels, crypto, …), most people use a higher-level system, abstraction, or library. You can build a distributed system on top of one of these things, but that doesn’t qualify you to implement (well) the core primitives. And I’d add that, compared to 10 years ago, a lot more people know the core stuff too.
As Sun would put it, The Network is the Computer. :)
You can also start from classic problems (Dining Philosophers, Byzantine Generals, the CAP theorem, etc.) and then move toward more concrete examples and implementations.
Besides books and courses, doing stuff on the BEAM VM. Pick a language you find attractive or interesting, hack away.
State machine replication and protocols are also fundamental.
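The core idea of state machine replication fits in a few lines, so here is a minimal sketch under my own assumptions: a tiny key/value command set (`set`, `incr`, `delete`) and hypothetical helpers `apply` and `replay`. If every replica starts from the same state and applies the same commands deterministically in the same order, every replica ends in the same state. Agreeing on that order is the job of a consensus protocol such as Raft or Paxos, which this sketch leaves out entirely.

```python
def apply(state: dict, command: tuple) -> dict:
    """Deterministically apply one command and return the new state."""
    op, key, *rest = command
    new_state = dict(state)
    if op == "set":
        new_state[key] = rest[0]
    elif op == "incr":
        new_state[key] = new_state.get(key, 0) + rest[0]
    elif op == "delete":
        new_state.pop(key, None)
    else:
        raise ValueError(f"unknown op {op!r}")
    return new_state


def replay(log: list) -> dict:
    """Rebuild a replica's state by applying the log from the start."""
    state: dict = {}
    for command in log:
        state = apply(state, command)
    return state


# The replicated log that every replica is assumed to have agreed on.
LOG = [("set", "x", 1), ("incr", "x", 4), ("set", "y", 7), ("delete", "y")]

replica_a = replay(LOG)
replica_b = replay(list(LOG))       # same commands, same order
out_of_order = replay(LOG[::-1])    # same commands, different order
```

Here `replica_a` and `replica_b` both end up as `{"x": 5}`, while `out_of_order` ends up as `{"y": 7, "x": 1}`, which is why the ordering protocol matters as much as the state machine itself.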
The starting point for me is that I don't need it, until I clearly do.