Learning about distributed systems: where to start? (2020)

muratbuffalo.blogspot.com

198 points by udev4096 15 days ago

Check out MIT's 6.824 Distributed Systems lectures: https://www.youtube.com/@6.824

The course has labs in Go: MapReduce, Raft, Fault-tolerant Key/Value, Sharded Key/Value

Spring 2024 course webpage: https://pdos.csail.mit.edu/6.824/

pquki4 14 days ago

Can't recommend this enough. The website contains almost all the course's materials, including exams and even answers. The only things that are missing are TA office hours and the credit which they obviously can't offer for free.
If you go through all the lectures and the labs, other materials posted here (including Martin Kleppmann's book) will feel easy and straightforward. Of course this does not prepare you for building a production-ready system, but you can go very far with what you learn from the course.
grepLeigh 14 days ago

I've audited this course and it's truly excellent! While you're taking it, try and have lunch with SREs in your organization or go to SRE conventions (SRECon, KubeCon, DevOps Days, Percona). There's no substitute for war stories from the trenches.

This book by Martin Kleppmann is really good for learning distributed systems foundations [1]. Couple this with any OS textbook and I think you will be loaded for the bear.

[1] Designing Data-Intensive Applications:

https://www.oreilly.com/library/view/designing-data-intensiv...

lll-o-lll 14 days ago

Also from Martin Kleppmann is this lecture series and accompanying notes: https://www.youtube.com/playlist?list=PLeKd45zvjcDFUEv_ohr_H...
I also have the book, and can recommend both!
Anything from Martin is good; he bridges the academic and industry worlds, having worked extensively in both. He’s a big part of “Automerge” if you’re interested in Conflict-free Replicated Data Types (CRDTs), which are a relatively new development in the distributed space. The video on Google Spanner is also extremely interesting for a modern NewSQL-style database.
- danielvaughn 14 days ago
  
  I knew about him from his work on CRDTs, had no idea he was the same person who wrote Designing Data-Intensive Applications. Impressive dude.
PNewling 14 days ago

As an aside, the audiobook version of this is really well done. You'll probably get more out of the actual book, but I really enjoyed having this on my long runs.
- xyzzy_plugh 14 days ago
  
  I have never considered an audio book for something technical like this. My interest is piqued.
- cqqxo4zV46cp 14 days ago
  
  Yep. I distinctly remember the few times I spent washing the car listening to this.
akgerber 14 days ago

Martin Kleppmann is an advisor to the Bluesky distributed social network, which has an open protocol and open source code that one can browse, run, and connect to as a supplement to the book if one desires: https://github.com/bluesky-social
abhijeetpbodas 14 days ago

Which OS book would you recommend? Anything similar to the DDIA style (not too academic)?
- dondraper36 14 days ago
  
  Operating Systems: Three Easy Pieces is very good
jimmcslim 14 days ago

> I think you will be loaded for the bear
I'm misunderstanding this, please explain!
- aaronax 14 days ago
  
  It takes a big gun/bullet to hunt a bear, so you're well prepared aka "have the big guns out".
  
  photonthug 14 days ago
  
  Typically just written as “loaded for bear”, implying readiness for anything smaller than bear and perhaps even multiple bears! :)

FabHK 15 days ago

The "Foundations of Blockchain" lecture series by Tim Roughgarden (CS prof at Columbia & head of research at a16z) covers some of these topics (state machine replication, possibility & impossibility results such as FLP) very well in the first lectures. Then it goes into PoW and PoS more specifically, but the foundations are excellent:

https://www.youtube.com/playlist?list=PLEGCF-WLh2RLOHv_xUGLq...

(Note that I still think that crypto and (permissionless) blockchain are inefficient deleterious nonsense...)

roenxi 14 days ago

It is a good post, but I'd suggest it is missing a Step 0: find a distributed computing problem and spend some time playing with it. If you don't have access to an industrial-level problem, even writing a few toy services that coordinate through a text file instead of a database will do.

A lot of this stuff makes a lot more sense to someone who has a bit of "why we care" context. I'd suggest that for most people transactions look much more exciting after realising how much pain is involved in systems that don't have them. Likewise, things like the CAP theorem are a lot easier to contextualise as "ah ok, so there are failure cases I can't recover from theoretically; I have to make sure they are rare in practice". It helps to know the magnitude of these sorts of theoretical limits.
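
The text-file exercise suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the "services" are threads in one process, and the file names `counter.txt` and `counter.lock` and every function name are made up for the example. Each worker takes a lock by atomically creating a lock file, then does a read-modify-write on the shared counter file.

```python
import os
import tempfile
import threading
import time


def acquire(lock_path, retry_delay=0.001):
    """Spin until we atomically create the lock file (O_EXCL fails if it exists)."""
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            time.sleep(retry_delay)


def release(lock_path):
    """Drop the lock by deleting the lock file."""
    os.remove(lock_path)


def increment(counter_path, lock_path):
    """One 'service' step: read-modify-write the shared text file under the lock."""
    acquire(lock_path)
    try:
        with open(counter_path) as f:
            value = int(f.read().strip() or 0)
        with open(counter_path, "w") as f:
            f.write(str(value + 1))
    finally:
        release(lock_path)


def run_workers(workdir, n_workers=4, steps=25):
    """Start toy workers (hypothetical stand-ins for services) and return the final count."""
    counter_path = os.path.join(workdir, "counter.txt")
    lock_path = os.path.join(workdir, "counter.lock")
    with open(counter_path, "w") as f:
        f.write("0")

    def worker():
        for _ in range(steps):
            increment(counter_path, lock_path)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(counter_path) as f:
        return int(f.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_workers(d))
```

Removing the `acquire`/`release` calls makes lost updates likely, and a worker that dies while holding the lock leaves a stale lock file that blocks everyone else. That is presumably the kind of pain that makes transactions and leases look attractive afterwards.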

anonymousDan 14 days ago

I think the book 'Introduction to Reliable and Secure Distributed Programming' by Cachin et al gives a better and more rigorous overview of distributed systems theory than anything else I've seen mentioned on this thread: www.distributedprogramming.net

alexpotato 14 days ago

I always liked patio11's take on this: just play Factorio.

bhaney 14 days ago

If I built distributed systems like I play Factorio, I'd be understandably blacklisted from the industry.
metalrain 14 days ago
And when you are ready to go a bit lower level, try the Zachtronics games:
```
   - Opus Magnum
   - Exapunks
   - TIS-100
```

jvans 14 days ago

Raft is a lot easier to digest than Paxos: https://raft.github.io/raft.pdf

denkmoon 14 days ago

Andrew Tanenbaum's Distributed Systems is the gold standard textbook on the subject. https://www.distributed-systems.net/index.php/books/ds4/

mettamage 14 days ago

As far as I know, Maarten van Steen mostly wrote that. Tanenbaum is a co-author, but Maarten is the driving force behind that one.

samsquire 14 days ago

Is the industry (or academic) answer to distributed systems that you build it yourself?

Is there no PostgreSQL or SQLite for distributed systems, or an off-the-shelf standard template design? (Compare LAMP, Linux/Docker/Postgres/Jenkins, Rails, or Django.)

lacksconfidence 14 days ago

There are various options: production-grade systems like Flink and Spark use Akka (iiuc) for the distributed building blocks. But of course those are only building blocks; they still have to build up the system.

adif_sgaid 14 days ago

Imho one of the best guides by far is "Distributed Systems: Principles and Paradigms" by Andrew S. Tanenbaum and Maarten van Steen. I learned the deep fundamentals of distributed systems there.

__turbobrew__ 14 days ago

I have been picking through that book. So far I have to say that the first quarter of the book probably could have been cut; for me, the academic exercise of categorizing different hypothetical distributed systems into a complex hierarchy is not really useful. The latter parts of the book are much more interesting and actually give practical examples and guidance.

SJC_Hacker 14 days ago

Byzantine Generals Problem and the CAP theorem should cover most of it.

luciusdomitius 14 days ago

https://book.mixu.net/distsys/ (Distributed systems: for fun and profit). You should definitely start nowhere but here. It is meant to be a getting-started doc and does a great job at that.

mark_h 14 days ago

There's a nice-looking series of exercises from fly.io: https://fly.io/dist-sys/

(I haven't actually done them myself yet, but they look great. Not a standalone resource, but good for practice)

kungfupawnda 14 days ago

I'd recommend Alex Xu's System Design Interview books, volumes 1 and 2, then Designing Data-Intensive Applications. It really gets to the point. There is also the added benefit of being ready for MAANG if that opportunity ever comes up.

FiberBundle 14 days ago

Would also recommend Kleppmann's book. After reading this you should be able to read some foundational papers. [1] is a good list for that.

I also learned a ton by reading the Jepsen analyses [2]. [3] is also helpful; it was recently put "behind a paywall" via an O'Reilly online book, but the Internet Archive should still have all the content ([4]).

[1] https://dancres.github.io/Pages/ [2] https://jepsen.io/analyses [3] https://martinfowler.com/articles/patterns-of-distributed-sy... [4] https://web.archive.org/web/20230628001937/https://martinfow...

mlhpdx 14 days ago

How do I reconcile this article with the casual observation that so much software is distributed software today, and generally it works? It’s an interesting state of affairs.

johngossman 14 days ago

Just like any other deep area of CS (databases, compilers, kernels, crypto, …), most people use a higher-level system, abstraction, or library. You can build a distributed system on top of one of these things, but that doesn’t qualify you to implement (well) the core primitives. And I’d add that, compared to 10 years ago, a lot more people know the core stuff too.
pjmlp 14 days ago

As Sun would put it, The Network is the Computer. :)

metalrain 14 days ago

You can also start from classic problems: Dining philosophers, Byzantine Generals, CAP theorem etc. then go towards more concrete examples and implementations.

cess11 13 days ago

Besides books and courses, doing stuff on the BEAM VM. Pick a language you find attractive or interesting, hack away.

mehulashah 14 days ago

State machine replication and protocols are also fundamental.

a_c 14 days ago

The starting point for me is "I don't need it", until that clearly isn't true.