Ask HN: What open source project, in your opinion, has the highest code quality?

458 points by chefqual 6 years ago

akavel 6 years ago

I hold the source code of Go standard library & base distribution (i.e. compiler, etc.) in very high regard. Especially the standard library is, in my opinion, stunningly easy to read, explore and understand, while at the same time being well thought through, easy to use (great and astonishingly well documented APIs), of very good performance, and with huge amounts of (also well readable!) tests. The compiler (including the runtime library) is noticeably harder to read and understand (especially because of sparse comments and somewhat idiosyncratic naming conventions; that's partly explained by it being constantly in flux). But still doable for a human being, and I guess probably significantly easier than in most modern compilers. (Though I'd love to be proven wrong on this account!)

At the same time, the apparent simplicity should not be mistaken for lack of effort; on the contrary, I feel every line oozes with purpose, practicality, and to-the-point-ness, like a well sharpened knife, or a great piece of art where it's not about that you cannot add more, but that you cannot remove more.

013a 6 years ago

This is one of the great things about many Go libraries; the language is so simple its difficult to overcomplicate a Go project. This makes reading any Go source code, projects, libraries, the stdlib, a joy. The only times I've found Go libraries to be a PITA to read is when they get autogenerated from some other language (protobuf compilations, parts of the compiler that came from C, AWS/GoogleCloud/Azure libraries, etc), but that's to be expected in every language.
Kubernetes is another great example of a project that is so unbelievably complex in its function, it should be completely impenetrable to anyone who isn't a language expert. But, go check it out; its certainly complex and huge, but actually grokable.
- ben_jones 6 years ago
  
  I would argue while Kubernetes is a great piece of software, and its definitely practical to go in with relatively little experience and tweak a single line or function, Kubernetes is not easy to grok or reproduce in its entirety for example it has its own implementation of generics and a custom type system [1].
  [1]: https://medium.com/@arschles/go-experience-report-generics-i...
eptcyka 6 years ago

I'd agree, but only as far as aesthetics go. When you have to understand the time complexity and runtime characteristics of the standard library sorting algorithms, I think Go does a very bad job - the standard `sort.Sort(data sort.Interface)` will run poorly if the data is already mostly sorted. I expect these kinds of things to be documented properly.
- kragen 6 years ago
  
  Golang's `sort.Sort(data sort.Interface)` will sort mostly-sorted data in nearly its fastest possible time, because it basically uses median-of-three quicksort, falling back to insertion sort for small partitions. Median-of-three on sorted or nearly-sorted data picks the optimal or nearly optimal partitioning element for quicksort. The code is simple, readable, and well-commented. Moreover, its average and worst-case complexity is documented in the godoc.
  In short, your comment is wrong from beginning to end. What led you to believe that anything in it was true?
- pjscott 6 years ago
  
  It's guaranteed to run in O(n log n) time. Currently it uses quicksort with heapsort as a fallback to prevent quicksort's quadratic worst-case time.
  https://golang.org/src/sort/sort.go?s=5414:5439#L206
- jehlakj 6 years ago
  
  I was pretty certain most libraries shuffle then quicksort. No need for documentation. Does go not do this?
  
  kazagistar 6 years ago
  
  JVM and Python use timsort, which is O(N) on mostly sorted data of all kinds.
  
  sova 6 years ago
  
  timsort is full of win, and such a clever approach.
  
  lorenzhs 6 years ago
  
  I'm sorry, but Timsort is a bit of a hack. It's a "this seems to work well" algorithm, and it shows. It took 13 years until its claimed running time was finally proven in 2015. The four (originally three) rules for merging sequences from the stack are rather arbitrary. Multiple issues were found well after it was already widely deployed.
  Recently, it was also shown that Timsort doesn't optimally use the information it has about runs. As an alternative, powersort was proposed, which seems to outperform Timsort both on randomly ordered inputs as well as inputs with long runs: https://arxiv.org/pdf/1805.04154.pdf
  
  sova 6 years ago
  
  Totally! And there are much better algorithms for the internet routing protocols, very well documented and tested and all, still just sitting on desks under paperweights...
  
  xapata 6 years ago
  
  > a bit of a hack
  Isn't that most of computing?
  
  xapata 6 years ago
  
  Ever heard of Timsort?
  
  dgacmu 6 years ago
  
  People were still finding bugs in common implementations of timsort as of 3 years ago. It's not unreasonable to stick with a somewhat more conservative choice for a core library function until there's more reason to have confidence in the implementations of timsort.
  
  arayh 6 years ago
  
  This comment got me looking into timsort bugs. This was a really interesting read: http://envisage-project.eu/proving-android-java-and-python-s...
  
  IronBacon 6 years ago
  
  You know what, I'm not surprised at all.
  Didn't one of the most simple algorithms, binary search, suffered of a bug in a standard library (was it Java?) a few years ago? If IIRC it was a corner case, I should check it because I don't recall the details, but it looked robust code.
  Edit: I think it's this one https://ai.googleblog.com/2006/06/extra-extra-read-all-about...
  
  xapata 6 years ago
  
  If I recall correctly, the bug only applied for arrays/lists of length greater than 2 to the more-than-astronomical. Not something that anyone ever encountered in the wild, because current hardware doesn't have enough memory.
  Edit: The article linked in the other comment says the Java dev team didn't even bother to implement the "proper" fix, but merely adjusted how much space is allocated.
  
  NicoJuicy 6 years ago
  
  Timsort always reminds me of monkeysort ( http://leonid-shevtsov.github.io/monkeysort/ )
  
  eptcyka 6 years ago
  
  Being pretty certain isn't the same as being certain. I really don't care what most libraries do as long as they document what exactly they have chosen to do. Go's `sort.Sort(data Interface)` definitely does not shuffle.
rusk 6 years ago

> standard library is, in my opinion, stunningly easy to read
Reading this brought to mind the JDK. All well structured, neatly formatted and well documented. I’ll often just click thru to the source to get the nitty-gritty on a function, I rarely need to consult the actual docs!
- rick22 6 years ago
  
  not c++ stl. Very obscure.
  
  rusk 6 years ago
  
  Some would consider that a virtue
  
  rick22 6 years ago
  
  How is hard to read code a virtue. ?
unstuckdev 6 years ago

The fact that almost everyone uses the same style and standards through the go tools has made learning easy. I can dip into the most advanced package and make sense of what's going on quickly.
abtinf 6 years ago

I think one of the things that makes Go library code so easy to read is the lack of generics. Everything you need to understand the code is right in front of you, without the barrier of having to learn new sets of complicated abstractions or worrying that some obscure code in some other file impacts/is invisibly called by the function. With large code bases written in other programming languages, I have to spend an inordinate amount of time studying the code base and object relations before making changes.
For me, code readability is such a high value that, on these grounds alone, I oppose the introduction of generics and hope the current proposals ultimately fail.
cube2222 6 years ago

As far as I know, parts of the compiler are still code automatically translated from C, so this may be part of the reason.
- colek42 6 years ago
  
  It is not, the compiler has been pure Go since, I think, v1.4.
  
  akavel 6 years ago
  
  That doesn't invalidate the parent comment :) The code is pure Go, but parts of it originate in C by means of automatic translation during development of Go 1.4 (or whichever version it was).
  
  cube2222 6 years ago
  
  Exactly, here's the tool that's been used: https://github.com/rsc/c2go

tzury 6 years ago

SQLite.

and for this reason alone!

https://www.sqlite.org/testing.html

    As of version 3.23.0 (2018-04-02), the SQLite library consists of approximately 
    128.9 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in 
    other words, lines of code excluding blank lines and comments.) 

    By comparison, the project has 711 times as much test code and test scripts - 
    91772.0 KSLOC.

danmaz74 6 years ago

Automated testing is useful and good. But I really feel it's reached a lever of fetishisation that is quite concerning.
Testing code is code which needs to be written, read, maintained, refactored. Very often nowadays I have to wade through tests which test nothing useful, except syntax. Even worse, with developers who adopt the mock-everything approach, I often find tests which only verify that the implementation is exactly the one they wrote, which is even worse: it makes refactoring a pain, because, even if you rewrote a method in a better way which produces exactly the results you wanted, the test will fail.
So, the ratio of testing code vs implementation code is a completely wrong proxy for code quality.
EDIT: I'm not criticising SQLite and their code quality - which I never studie - but the idea that you can judge code quality for a project just by the ratio of test code vs implementation code.
- faitswulff 6 years ago
  
  They actually have to test to that degree to follow aviation standards (DO-178b [0]) because they're used in aviation equipment.
  Dr. Hipp said he started really following it when Android came out and included SQLite and suddenly there were 200M mobile SQLite users finding edge cases: https://youtu.be/Jib2AmRb_rk?t=3413
  Lightly edited transcript here:
  > It made a huge difference. That that was when Android was just kicking off. In fact Android might not have been publicly announced, but we had been called in to help with getting Android going with SQLite. [Actually], they had been publicly announced and there were a bunch of Android phones out and we were getting flooded with problems coming in from Android.
  > I mean it worked great in the lab it worked great in all the testing and then [...] you give it to 200 million people and let them start clicking on their phone all day and suddenly bugs come up. And this is a big problem for us.
  > So I started doing following this DO-178b process and it took a good solid year to get us there. Good solid year of 12 hour days, six days a week, I mean we really really pushed but we got it there. And you know, once we got SQLite to the point where it was at that DO-178b level, standard, we still get bugs but you know they're very manageable. They're infrequent and they don't affect nearly as many people.
  > So it's been a huge huge thing. If you're writing an application deal ones, you know a website, a DO-178b/a is way overkill, okay? It's just because it's very expensive and very time-consuming, but if you're running an infrastructure thing like SQL, it's the only way to do it.
  [0]: https://youtu.be/Jib2AmRb_rk?t=677 "SQLite: The Database at the Edge of the Network with Dr. Richard Hipp"
  
  Nokinside 6 years ago
  
  SQlite is very high quality software, but they use DO-178b "inspired" testing process. As far as I know they don't have version of software that is or can be used in safety critical parts despite their boasting.
  They say in their site that:
  > Airbus confirms that SQLite is being used in the flight software for the A350 XWB family of aircraft.
  Flight software does not imply safety critical parts of avionics. It can be the entertainment system or some logging that is not critical.
  
  SQLite 6 years ago
  
  Correct. The key word is "inspired". Multiple companies have run a DO-178B cert on SQLite, I am told, but the core developers did not get to participate, and I think the result was level-C or -D.
  While all that was happening 10+ years ago, I learned about DO-178B. I have a copy of the DO-178B spec within arms reach. And I found that, unlike most other "quality" standards I have encountered, DO-178B is actually useful for improving quality.
  I originally developed the TH3 test suite for SQLite with the idea that I could sell it to companies interested in using SQLite in safety-critical applications, and thereby help pay for the open-source side of SQLite. That plan didn't work out as nobody ever bought it. But TH3 and the discipline of 100% MC/DC testing was and continues to be enormously helpful in keeping bugs out of SQLite, and so TH3 and all the other DO-178B-inspired testing and refactoring of SQLite has turned out to be well worth the thousands of hours of effort invested.
  The SQLite project is not 100% DO-178B compliant. We have gotten slack on some of the more mundane paperwork aspects. Also, we aggressively optimize the SQLite code base for performance, whereas in a real safety-critical application the focus would be on extreme simplicity at the cost of reduced performance.
  However, if some company does call us tomorrow and says that they want to purchase a complete set of DO-178B/C Level-A certification artifacts from us, I think we could deliver that with a few months of focused effort.
  
  moron4hire 6 years ago
  
  I just bought a copy of DO-178C after reading these posts here and the Wikipedia article on it. $290, but if it's good, it should be worth it, right?
  
  SQLite 6 years ago
  
  I haven't seen -C only DO-178B, though I'm told there isn't much difference. It is not a page-turner. It took me about a year to really understand it.
  
  sbradford26 6 years ago
  
  Yeah DO-178B gives several levels for software from DALA (highest) to DALE (lowest). If DALA software fails the results are catastrophic if DALE fails there is no effect on the aircraft. Since DALE is usually just test equipment and such they might be at a DALD level. So still requires a lot of testing but not nearly to the level that DALA requires.
  https://en.wikipedia.org/wiki/DO-178B
  
  Nokinside 6 years ago
  
  I think it's possible that parts of SQLite, for example file format in read only mode and few constant queries are certified as part of some safety critical software.
  Hipp's Hwaci consulting company would probably help to do the work, but it has no relation to the SQLite as a library.
  
  faitswulff 6 years ago
  
  Good point. The video I linked to merely says that he was contacted by someone in the aviation space about the standard, which I took to mean that it was used in avionics.
  
  redleggedfrog 6 years ago
  
  "They actually have to test to that degree to follow aviation standards..."
  I didn't know that, and that's very cool.
  Makes me think that the name SQLite is misnomer.
- lolc 6 years ago
  
  While I agree in general I disagree here. If you read about the Sqlite tests you will find that they do test sensibly.
  One suite I'm particularly impressed with will run tests from zero bytes with slowly increasing available memory until the program passes. The tests verify that at no point the DB is corrupted by an OOM event.
  
  danmaz74 6 years ago
  
  Just to clarify, I wasn't criticising sqlite, I was criticising the idea of judging their code quality "for this reason alone!" - ie that they have so much test code vs implementation.
  
  lolc 6 years ago
  
  Sure! Can't judge a book by its size.
  As a heuristic the code versus test-code ratio serves well as an indicator of quality. Just like consistent indentation does. You don't know whether a well-indented program is good. But if the indentation is inconsistent you'll expect worse.
- forgotmypw 6 years ago
  
  You may not be able to judge code quality by the presence of test code, but you can by its absence.
- rb808 6 years ago
  
  Agreed on the testing fetish and usually they're not valuable. Uncle Bob wrote about this last year and its something I wish everyone would read https://blog.cleancoder.com/uncle-bob/2017/10/03/TestContrav...
  
  arcticbull 6 years ago
  
  Yes, bad tests are bad. Yes, mocks are bad. Good tests, however, are good.
  To expound on that, designing for testability allows you to sidestep the need for mocks almost entirely, and forces you into easier, more reliable and more consistent code. Then when you choose to test it, the tests are simple, straightforward and valuable.
- jim_bailie 6 years ago
  
  Oh boy, I would give your comment an infinite number of up-votes. Yes, testing has reached fetish-like levels.
  Some of the test code I've encountered recently has been more voluminous, complex and has taken more man hours to develop and maintain than the application or library it's assigned to.
  For the love of God, develop the damned software! It's either going to work or it's not.
  
  kyberias 6 years ago
  
  > has taken more man hours to develop and maintain than the application
  But that is perfectly normal when developing quality code.
  There is no rule that says that test code development should take LESS time.
  Certainly different applications have different quality requirements. Perhaps the software you are developing doesn't have that high requirements?
  
  jim_bailie 6 years ago
  
  Yeah, yeah. Just venting. The products I work on aren't the most important, but they certainly are quite important and most of the testing infrastructure that has been built thus far has a lot of goofy sh@t in it.
  I just don't have a high tolerance for needless complexity and gee-whiz-look-what-I-can-do while the clock is running.
nathell 6 years ago

And this:
https://www.sqlite.org/malloc.html
"SQLite can be configured so that, subject to certain usage constraints detailed below, it is guaranteed to never fail a memory allocation or fragment the heap."
gameswithgo 6 years ago

A better comparison would be some sort of defect rate. Does SQLLite manage less defects per line of code per month (or whatever) than PostgreSQL with that test suite?
Is there a distinction between the best codebase and the best test suite? Probably.
TeMPOraL 6 years ago

Tangent: I always thought SLOC means "significant lines of code". Since "code" is a shortand for "source code", expanding SLOC as "source lines of [source] code" makes little sense, IMO.
- elviejo 6 years ago
  
  "Source lines" includes lines with comments, empty lines and code. Source lines of code... Specifies only lines with code, no comments
- dx87 6 years ago
  
  I always thought it meant "single lines of code", like if line breaks added for formatting had been removed so every statement had its own line.
gardaani 6 years ago

And all that is public domain: https://www.sqlite.org/copyright.html
- anewhnaccount2 6 years ago
  
  Not all of the test suite is public domain. That's how the authors maintain a monopoly on improvements to the code.
  
  anewhnaccount2 6 years ago
  
  As a point of information, I wasn't using monopoly in a particularly pejorative sense, but more just to explain the situation. Feel free to replace with a less loaded word in your head.
IshKebab 6 years ago

That doesn't mean their code is necessarily nice.
ttty 6 years ago

So you measure good code by the number of test lines?
- wbkang 6 years ago
  
  No but it could be a good proxy.
  
  13of40 6 years ago
  
  It's like code coverage, though. Zero might indicate a problem, but lots doesn't necessarily mean "good".
Daviey 6 years ago

Inversely, this is counteracted by the logic to not use Git:
https://sqlite.org/whynotgit.html
- raverbashing 6 years ago
  
  But to be honest, this is mostly fair criticism. Not sure why they need some of those things, but if they need it, then it's fine.
  
  deanishe 6 years ago
  
  It's explained in more depth here: https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...
  Essentially, git is designed for the "bazaar" development model, while Fossil is designed for the "cathedral" one, which is what SQLite uses, being developed by just three guys working very closely together.
  
  Drdrdrq 6 years ago
  
  Thos is not the reason. The real reasons are explained quite nicely here [0] and I agree with all of them - though I still use Git because of network effects. My alternative of choice would be hg.
  [0] https://sqlite.org/whynotgit.html
- utahcon 6 years ago
  
  The most interesting thing (IMO) left out of this page is the fact that D. Richard Hipp is the author of Fossil. This has a particular smell factor to it, not sure if it is a good or bad smell.
- vnorilo 6 years ago
  
  how would using a specific vcs improve the code quality?
  
  Daviey 6 years ago
  
  This was a response to test coverage used as a benchmark. I see them both as an indirect project quality benchmark. Neither is directly representing the actual code quality but does indicate if it is a project that matches my view of a mature project. Sure, others will disagree - but the whole idea of quality in this context is subjective anyway.

moviuro 6 years ago

OpenBSD and Co. (OpenSSH, etc.)

* https://cvsweb.openbsd.org/src/

* https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/

* https://www.libressl.org/

* https://cvsweb.openbsd.org/src/usr.sbin/ntpd/

NelsonMinar 6 years ago

There's a lot to admire in OpenBSD but sometimes the peripheral tools they deliver don't work well. The example I know best is OpenNTPD (last link in your comment). It has a bunch of problems, including relatively poor clock discipline compared to other NTP implementations. And it doesn't even try to handle leap seconds. That causes problems on the machine itself which may or may not matter to you, but it's catastrophic if that OpenNTPD serves time to other servers. Unfortunately there's a bunch of OpenNTPD servers in the NTP Pool actively providing bad time. Some details from the 2016 leap second: https://community.ntppool.org/t/leap-second-2017-status/59/1...
Again I mostly admire OpenBSD. But OpenNTPD is not the best example of their work.
- eikenberry 6 years ago
  
  Have you had a chance to look at systemd-timesyncd? I'm curious how it stacks up as a NTP client. Was thinking about switching my systems to it (from chrony) as I don't need the NTP server functionality.
- moviuro 6 years ago
  
  This is seriously bad, but I do get why they thought OpenNTPd was necessary (bad/perfectible code in the other implementations).
  Maybe I'll check that at home (where I replaced FreeBSD's ntpd with OpenNTPd).
  
  NelsonMinar 6 years ago
  
  Yeah the stock old ntpd had a lot of unused code and various security problems over the years. It makes sense OpenBSD would replace it. Just a shame they didn't do it completely. I think that describes a lot of OpenBSD tools; you're trading off some functionality for very good security.
  There are better NTP implementations now. Chrony is great, it's the default in Ubuntu now. NTPsec is coming along although I haven't tried to use it myself. Also good ol' ntpd is greatly improved.
  https://chrony.tuxfamily.org/ https://www.ntpsec.org/
basementcat 6 years ago

Just wanted to +1 this.
Once had to make some changes to OpenSSH for an internal project and it was surprisingly easy to find the relevant code and make the necessary changes. One of the few times my code worked on the first compile.
rurban 6 years ago

for sure not. OpenBSD makes no attempt to use proper performance which is critical for a kernel. there are so many naive ad-hoc data structures and algos, it's a shame to walkthrough.
- moviuro 6 years ago
  
  Meh. Given the recent Spectre, Meltdown, etc. maybe we shouldn't focus on performance first, but security.
  https://www.openbsd.org/faq/pf/perf.html
  https://www.openbsd.org/crypto.html
  https://www.openbsd.org/security.html
- barrow-rider 6 years ago
  
  OpenBSD's niche is security -- that's the point.
- codemusings 6 years ago
  
  How is performance related to code quality. That makes no sense. If anything, if you had to inline ASM for example, the code would suffer from readability.
  
  rurban 6 years ago
  
  using comma seperated string splitting options over normal bits or'ed together in their public API looses all credibility in their engineering abilities. it would not survive any professional code review.
  
  iainmerrick 6 years ago
  
  Shouldn’t good performance be one of the goals of good code?
  
  moviuro 6 years ago
  
  Depends on the design goals. If you want secure code, you'll make it readable. Here's true(1):
  :
  A single "noop" in a 755 file.
  A C true would be: https://cvsweb.openbsd.org/src/usr.bin/true/true.c?rev=1.1&c...
  Here's a much faster true(1) if you need it: https://github.com/coreutils/coreutils/blob/master/src/true....
  
  hultner 6 years ago
  
  Is the gnu-coreutils true much faster? I have a hard time seeing how return 0 can be so slow.
  
  iainmerrick 6 years ago
  
  I did say “one of the goals”.
  I don’t see how those examples are relevant. Why would that last one be faster?
  I agree that the OpenBSD code here is good, no more and no less than needed.
  I assumed the grandparent was referring to cases where an O(n) algorithm is used where it might be O(log n) or O(1) with just a little more effort. It’s a tradeoff, sure, and in some cases linear searches can work surprisingly well, but in general I think this kind of thing should always be considered in good code.
  Micro-optimizations like inline assembly for inner loops may or may not be a good idea, depending on the application. All else being equal, I’d certainly agree that good clean code would not use assembly.
  
  josefx 6 years ago
  
  How is the coreutils true faster?
  I would expect the openbsd true to be the fastest, it doesn't need to spawn a subshell and it doesn't do more than the posix specification requires (afaik --help/--version should be ignored).
  
  moviuro 6 years ago
  
  Experience shows it's faster. It's just weird, but it's like that.
  time { for i in $(seq 1 10000); do /path/to/true; done; }
  
  iainmerrick 6 years ago
  
  What are you comparing against what there? Two C executables with the same compiler flags on the same OS?

kristopolous 6 years ago

NetBSD.

Why? I was able to do substantial changes to the kernel when I was a teenager (late 90s), mostly on my first try. There was no giant wall of abstraction I had to climb over or some huge swath of mutually interacting code I had to comprehend. There was also nothing that required fancy code navigation and the creation of something like the ctags database in order to find out what on earth was happening.

No action at a distance or lasagna style dereferencing or mysterious type names that are just typedef'd and #define'd around dozens of times back to something basic like char. No fancy obscure GNU preprocessor extensions or exotic programming patterns.

Nothing had obtuse documentation that tried my patience or required much more than enthusiasm and basic C knowledge.

I did things like got a wireless card working from code written for one with a similar chipset and got various other things like the IrDA transmitter on my laptop at the time to do a slattach and thus work as a primitive wireless network - all in the late 90s.

I likely had no idea what, say, the difference between network byte order and host byte order was at the time or how the 802.11b protocol worked or what a radiotap header was or any of that. The separation of concerns was so good however, that none of that knowledge was actually needed.

Compare that to say, the Qualcomm compatible WWAN I just dealt with over the past few weeks where I needed to have in-depth knowledge of an exhaustive number of things (very specific chipset and network details) to get a basic ipv4 address working. Then I needed to read up on GNSS technology and NMEA data to debug codes over USBmon to get the GPS from the wwan working. Then after I had the qmi kernel modules doing what I wanted and the qmi userland toolsets, I had to write some python scripts to talk to dbus to get the data from the modemmanager that I needed in order to log the GPS. All the maintainers of these pieces were very nice and helpful and I have nothing negative to say. This is just how it usually is these days.

Back then however, I wasn't a good programmer, I was likely pretty terrible in fact but with the NetBSD codebase I was able to knockout whatever I wanted every time, fast, on a 486.

I miss those days.

floren 6 years ago

> No action at a distance or lasagna style dereferencing or mysterious type names that are just typedef'd and #define'd around dozens of times back to something basic like char. No fancy obscure GNU preprocessor extensions or exotic programming patterns.
Ah, I see you've also looked at the Linux kernel code.
akavel 6 years ago

What's your relation with it nowadays? I'm very curious about NetBSD but never tried it yet. I sincerely wonder what's your opinion on it now, and why you speak about the situation only as "those days" now? :)
- kristopolous 6 years ago
  
  I have no idea, haven't kept up with it. I'd recommend 1.x (<=4) any day though, simply for the education alone.
  I don't really use it these days because I need systems that future cheap devs can maintain and once you enter userland it takes commitment and time I simply don't have to stay with netbsd.
  Debian permits me to usually not have to care and that's pretty invaluable

hyperman1 6 years ago

Its older than some HN posters, but the GPLed DOOM source code was one I liked.

The performance reached by the game was considered impossible until Carmack did show us otherwise. So I expected lots of ASM and weird hacks, especially as compiler optimization wasnt as good as it is today.

Surprise, surprise, the thing was easy to read, easy to get going, easy to port, reasonablye documented . It has shown me what a goog balance between nice code and usable code is.

If you want tho browse: https://github.com/id-Software/DOOM

NoSirRah 6 years ago

FYI, "Surprise, surprise" is generally meant sarcastically. I think you mean "Surprisingly".
https://www.merriam-webster.com/dictionary/surprise,%20surpr...
- hyperman1 6 years ago
  
  I can speak English! I learn it from a book!
  Though not always very good. Thanks for the bugfix.
- jay_kyburz 6 years ago
  
  Actually, Hypeman is using this correctly in my opinion. We all knew the code would be good, it was only him that doubted it. So its not surprising to us, the reader, that the code is good.
iso-8859-1 6 years ago
You are reading cleaned up source code that only compiles and runs on Linux. That's why it looks nice.
```
  Many thanks to Bernd Kreimeier for taking the time to clean up the
  project and make sure that it actually works.  Projects tends to rot if
  you leave it alone for a few years, and it takes effort for someone to
  deal with it again.
```
- hyperman1 6 years ago
  
  I didnt have interrnet at the time so I didn't check github 20 years ago ;-)
  On the more serous side, i wanted to say something about the TODOs as example of the balance, but couldnt find any. I thought i was confusing with quake, but the cleanup might explain it better.
GNi33 6 years ago

These 666 forks can't be a coincidence, right?
- thecatspaw 6 years ago
  
  it seems to be, judging by the fact that its now 667 forks
- blackbeard334 6 years ago
  
  667
  +1
de_watcher 6 years ago

The DOOM code is so straightforward. You don't ever experience that feeling of having zero understanding of the code when you look into a file.

bsandert 6 years ago

This is not necessarily about the code, but I've been really impressed for a while by the lodash project and its maintainer's dedication to constantly keep the number of open issues at 0. Any issues get dealt with at record speed, it's quite a sight to see.

https://github.com/lodash/lodash/issues

oldmanhorton 6 years ago

JDD, the maintainer, is also incredibly devoted and overall a nice guy to talk to. He has something like 5 years (and counting) of making a commit every single day, including weekends and holidays and sick days. They may not always be world-changing commits, but it still shows an incredible amount of dedication
rootlocus 6 years ago

Not necessarily relevant, but 15% of the issues are labeled "wontfix".
- adimitrov 6 years ago
  
  With such a big project, being quick to hand out wontfix isn't necessarily a bad thing. To be honest, seeing as this project is used by a huge part of the… rather diverse JS crowd, 15% wontfix is astoundingly low.
- fergie 6 years ago
  
  As an open source maintainer myself, that seems like a pretty low percentage.
xfs 6 years ago

It's not always a good thing. In the haste of fixing things introspection of root causes may be neglected...
dilipray 6 years ago

Never noticed this.

curtisz 6 years ago

Strictly talking about code quality, I will nominate RCP100, which is a small, virtually unknown, now-abandoned routing software written in C [0]. I started programming with C way back in the 90s, and this is one of only two projects I can recall being immediately struck by the beauty of the code (Redis being the other). I know almost nothing about the author but he seems not to want to be known by name. You can browse the source on Github [1], which I uploaded myself, since you can only get a tarball from sourceforge. Anyway, as someone else mentions, C is usually a mess, but RCP100 struck me as beautiful.

[0] http://rcp100.sourceforge.net/

[1] https://github.com/curtiszimmerman/rcp100

carliton 6 years ago

Hi Curtiz,
Thanks for uploading RCP100. Your comment is a timely one. I wanted to learn how a router works and is built and was looking for a simpler implementation.
Can you recommend any resources from which I could learn more about network programming, so that I could understand RCP100 code better?
Thanks!
hansoolo 6 years ago

Maybe you just send the guys an email ;)
- curtisz 6 years ago
  
  I actually did send fan mail to the author, heh, thinly-disguised as a courtesy to let them know that I mirrored their project on Github.

zubairlk 6 years ago

I'm surprised no one has mentioned the Linux Kernel!

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

It is quite clean when you consider the task that it accomplishes.

Being able to compile across multiple architectures/endian-ness,32/64-bit/scale up/down from server/desktop/router/phone while accepting contributions from thousands of people..

aortega 6 years ago

It's not clean at all. Thousands of different styles, no single convention on function-naming, etc.
Want a clean kernel, go look at the BSDs.
elihu 6 years ago

One thing the Linux kernel has going for it is that there are a lot of books that describe how the various parts work and how to use the various internal interfaces. I can't think of any other open source project that has multiple books on how to contribute.
(Sadly, most of those good kernel books were written in the 90's and early 2000's. I don't know if there are any recent kernel hacking books.)

brentjanderson 6 years ago

Going to throw Elixir Lang into the mix.

- The tooling is excellent.

- The code is well-documented and readable.

- The core team committed to never needing to introduce breaking changes.

The Elixir community tends to produce work that is actually considered "Done". An elixir package is not stale when it hasn't seen a commit in a few months. Instead, the feeling is: "It's feature complete and only needs maintenance from here on out."

https://github.com/elixir-lang/elixir

flaque 6 years ago

> The core team committed to never needing to introduce breaking changes.
Is this why Elixir seems to have many different ways of doing the same thing though?
- JamesUtah07 6 years ago
  
  I think that's one reason. The other is that classic erlang (Elixir is built on top of the erlang beam vm) sometimes does things one way but elixir has a more elegant way of doing the same thing, however, in elixir you can still call into erlang libraries to achieve the same thing if that's more familiar to you.

binalpatel 6 years ago

Scikit Learn comes to mind - not just because I can dig into the source code and immediately know what's happening, but also for the stellar documentation that goes above and beyond telling your what the functions do.

For example their Cross Validation documentation is amazing:

http://scikit-learn.org/stable/modules/cross_validation.html...

andygrunwald 6 years ago

When we take the language into consideration, unwound like to mention Redis.

Often codebases written in C are a a mess to understand, a mess to read. The Redis Source Code is understandable even without deep knowledge of C

christophilus 6 years ago

Yep. I was going to say Redis and SQLite. Both are really well commented. They almost read like a manual.
- andoma 6 years ago
  
  Although still in beta, I'd like to add BearSSL to the mix of well written and documented C libraries. In particular compared to the OpenSSL "documentation". It's also nice to see an TLS implementation without any memory allocations at all.
travmatt 6 years ago

> Redis Source Code is understandable even without deep knowledge of C
Came here to say exactly this - Redis is very cleanly written.

maaaats 6 years ago

I'd prefer if people said why they consider the code good, instead of throwing out a bunch of random projects.

aaaaaaass 6 years ago

Who cares what you prefer?

miguendes 6 years ago

Python: I really like requests, scikit-learn, the Path module from the stardard library, Keras, Django.

C: Redis, SQLite, LUA.

Java: Joda Time, Guava

potta_coffee 6 years ago

+1 for requests, I forgot about that one but it's quite good.
chillydawg 6 years ago

Joda Time is one of my all time favourite libraries.
After struggling with JVM stdlib time nonsense, JodaTime was a breath of fresh air and actually made programming with time fun.
- hellofunk 6 years ago
  
  Java 8 time module is now considered the replacement for JodaTime for new projects. It is separate from the older Java time libraries, and fixes many of the problems in Joda. Give it a try!

guidovranken 6 years ago

ARM mbed TLS [1], Amazon S2N [2], nginx [3] have a super consistent code style throughout and are prime examples of how C application programming should be done (in my opinion).

[1] https://github.com/ARMmbed/mbedtls

[2] https://github.com/awslabs/s2n

[3] https://github.com/nginx/nginx

ac 6 years ago

+1 for s2n. It's one of the select few C codebases that is actually a pleasure to read.

_the_inflator 6 years ago

There are too many in very different domains and languages.

However, I opt for jQuery here. It is one of the greatest examples of how constant refactoring and thoughful usage of design pattern get you a very long way.

If you are designing JavaScript libraries, pls have a look at jQuery. So many great design decisions aka great code quality.

AmericanChopper 6 years ago

Pushing all dom manipulation through global evals seems like the exact opposite of thoughtful design to me. I have a long list of places where I want to implement strict CSPs, but can’t purely for minor use of jQuery.

jaequery 6 years ago

Sequel, a database ORM for Ruby: https://github.com/jeremyevans/sequel

The quality of the code is amazing, it's simple to use and even simpler to look through the docs to reason about.

I also want to praise the author of the library (Jeremy Evans), his support through the IRC is second to none, you can talk directly with him pretty much on a daily basis.

And even after 8+ years, the project is still constantly being updated (last commit 4 days ago). I haven't seen too many project of this calibre especially when it is ran mostly by a single person.

ChrisRackauckas 6 years ago

Julia. Julia / Julialang is so pedantically tested and the names are pretty meticulously chosen. The algorithms in Base are almost all generic and handle a very wide variety of inputs without catering to them. If you want to learn Julia, along with good software engineering, looking at the Base library is quite recommended.

tlamponi 6 years ago

Did not look to much into it but at least a packager from Alpine Linux does not think Julia's compiler ecosystem is clean/easy to work with: http://lists.alpinelinux.org/alpine-devel/6248.html
But as said, I did not really checked this claim for validness myself...
- ChrisRackauckas 6 years ago
  
  Julia requires patched versions of things like LLVM in order for all tests to pass because upstreaming bugfixes take time. This has given some Linux package managers an issue since they try to build using system LLVM/OpenBLAS/etc. with the known bugs. I agree this does cause some distribution problems, but as a scientist and mathematician I do like that the standard distribution of Julia uses the most numerically correct versions (as of current knowledge) of the dependencies as it can, and has a test to identify known potential issues. To me this is good practice.
  But anyways, I was talking about the Julia Base library and its numerical routines. I just look at the Julia code and don't touch the build systems.

nazri1 6 years ago

Does assembly count? Prince of Persia's source code (not really open source...): https://github.com/jmechner/Prince-of-Persia-Apple-II

One look at any of the assembly files and you can get a sense of how properly organized the source code is.

bovermyer 6 years ago

Thanks for that! I love looking at historical game code.

mpasternacki 6 years ago

I'm a bit surprised nobody mentioned qmail yet: https://cr.yp.to/qmail.html

pvarangot 6 years ago

qmail cheats a bit because it's so simple, that most people end up using something with messy code on top. Not that I don't think it's a sound engineering decision but when comparing it's code cleanliness with other SMTP stacks it needs to be mentioned.
harryh 6 years ago

Or djbdns!
djb is a legend
informatimago 6 years ago

I don’t know about qmail, but postfix sources are really nice.
- tptacek 6 years ago
  
  They're better than most C software of the era, but not better than qmail --- qmail has a better vulnerability record than Postfix does (perhaps because it does less, but that's beside the point).

potta_coffee 6 years ago

Granted I haven't read much open source code but when I was working in Flask, I found the source code to be awesomely clear and well-documented. I actually learned quite a bit about Python by reading Flask code. Also, no-one could explain "g" in a way that made sense, but the source code made it obvious. Would recommend reading it if you're into Python at all.

83457 6 years ago

g?
- jxub 6 years ago
  
  The global request object if I recall properly.

dredmorbius 6 years ago

I'd like to suggest;

1. Don't simply list projects.

2. Give some notion of why you're nominating code.

3. A sense of what you consider to be quality.

Enough to spark discussion, inquiry, or comparison. Doesn't have to be much.

This is rudimentay. But affords purchase; https://news.ycombinator.com/item?id=18037815

This does not: https://news.ycombinator.com/item?id=18038047

(Both reference the same project.)

philliphaydon 6 years ago

Both Vue and PostgreSQL. Both have great code base. Amazing documentation. And amazing communities.

chiefalchemist 6 years ago

Nice to see you include / mention docs and community. I believe a code-based product has a UX. That UX is the code (with comments), documentation and community. That UX is your (i.e., a dev / engineer) end to end experience with "the product." It's not simply the code.
Put another way, there's more to a product that's easy and sensible to work with than code quality.

graki 6 years ago

I'm suprised nobody cited TeX from Knuth. It's an absolute standard in quality of implementation, documentation and computer science background. Perhaps unsurpassed.

jacques_chester 6 years ago

I definitely admired PostgreSQL's code when I first looked at it.

Projects written in C require a fair amount of care and discipline to be scaled up to larger codebases and teams. PostgreSQL is such a codebase.

I've also seen various parts of Spring's codebase and found all of it to be consistently solid and careful. They take a lot of care to structure carefully and comment immaculately.

Disclosure: I work for Pivotal, which sponsors Spring. Which is why Spring is highly visible in my working life.

nojvek 6 years ago

Typescript

Even though it’s a fairly complex transpiler, the authors did a good job modularizing and leaving lots of contextual comments on what each part does.

Also typescript baseline tests are a simple but very effective way to get lots of coverage on the compiler.

I’ve read source code for Babel, typescript, coffeescript and flow. Typescript architecture stands out.

Typescript not only does fascinating things like magical code completion abilities and great tooling for IDEs but their codebase has been an inspiration for me to build better front end code.

I may be a bit biased since I’ve worked at Microsoft before.

ioddly 6 years ago

I found the TypeScript type checker pretty hard to read through, though it may be my lack of, well, almost any knowledge about type theory. I didn't dig much into the other parts of the codebase however. What parts of it do you enjoy reading?
- nojvek 6 years ago
  
  While submitting a PR, the parser, Lexor and emitter were fairly easy to understand.

daniel-levin 6 years ago

LLVM and associated projects such as clang. Bazel is good too. OkHttp and Retrofit by Square.

tom_mellior 6 years ago

I've been working with LLVM for a few years and I still find the code difficult to navigate and badly documented. And every single function's argument list is a random jumble of pointers and references (almost all arguments should be references, but many aren't).
- anarazel 6 years ago
  
  Indeed. And it's not just medium to low-level stuff that's not well documented, it's the high-level stuff too. I personally don't mind that much if I have to spend a few minutes to understand something on a a very local scope, but if the bigger picture is unclear, that's quite bad. For LLVM one largely has to grep for a bunch of other users and try to figure it out from that.
  While I think it has some clear deficiencies, I found a lot of e.g. the optimization passes in GCC a lot easier to read. It's probably above par, but e.g. https://github.com/gcc-mirror/gcc/blob/master/gcc/gimple-ssa... is really well explained imo.
vnorilo 6 years ago

LLVM is remarkable; the domain is both difficult and critical. Still, the code is consistent enough that I can often guess how things work based on what I think would be reasonable!
- glandium 6 years ago
  
  Don't look how inline assembly is handled between clang and llvm.
xfs 6 years ago

Yet it contains monsters like this one https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X...
McP 6 years ago

The coding standard for variables in LLVM drives me nuts. Both class names and variables names must be upper camel case so if you're lucky the code looks like this:
Analyzer TheAnalyzer;
but more commonly:
Analyzer A;
with A being utterly unhelpful to read many lines later.

helium 6 years ago

The requests codebase is really well written and it has a beautiful api

https://github.com/requests/requests

Drdrdrq 6 years ago

I like Kenneth's comments: https://github.com/requests/requests/blob/master/requests/pa...

kbr2000 6 years ago

Tcl! See https://www.tcl.tk/doc/engManual.pdf to start to understand why. (For code written in Tcl itself there's also some proposed conventions: https://www.tcl.tk/doc/styleGuide.pdf)

a-saleh 6 years ago

I really liked the clojure core, I read it quite a lot when learning the language.

I have heard good things about sqlite, and some day, I plan to read it :-)

unixhero 6 years ago

Dolphin Emulator

https://dolphin-emu.org/

delroth 6 years ago

We try to keep up, but the truth is that it's a 15 years old C++ codebase implementing some weird hardware in even weirder ways. We're far from where we'd want to be code quality wise -- close to no automating testing infrastructure, code is full of module-level globals, inconsistent conventions, etc.
- swsieber 6 years ago
  
  How would you even test an emulator except manually? It seems like automated website testing, but even worse. I guess screenshots + scripted input?
  That seems like it'd be terrible to try to get running reliably.
stevekemp 6 years ago

I've never installed it, or read the code, but their progress-report writeups are fascinating to me.
e.g. The most recent https://dolphin-emu.org/blog/2018/09/01/dolphin-progress-rep...

zengid 6 years ago

The JUCE C++ library is very nice: https://github.com/WeAreROLI/JUCE

garyclarke27 6 years ago

I’m no C expert so I’m somewhat guessing, to me, PostgreSQL source looks remarkably clean, well structured and nicely commented.

itsoggy 6 years ago

The Quake 3 source was fairly good...

charlchi 6 years ago

As a C beginner getting into writing larger projects, especially in that sort of context, the quake source has been my reference on how to structure my code.
archi42 6 years ago

Oh, this +1. I ported it to another C dialect (test case for the compiler) and found those parts I touched well structured and easy to understand.

batteryhorse 6 years ago

I was going to say the GNU version of /bin/false and /bin/true, but I actually took a look at the source and it is terrible.

panic 6 years ago

The original /bin/true is probably the highest quality code ever written, but unfortunately I don’t think the license is OSI approved: http://trillian.mit.edu/~jc/%3B-)/ATT_Copyright_true.html
- hyperpallium 6 years ago
  
  Gosh, it seems like copyright lawyers will stop at nothing.
floren 6 years ago

The GNU coding style does not help with readability, in my opinion (he said, donning flame-proof underwear)

mixedbit 6 years ago

Python core libraries have great code. You can open pretty much any module and be able to understand the source without much context.

akvadrako 6 years ago

I don't know how you can say this. The standard lib isn't even very pythonic, let alone "great" along other dimensions.
- blattimwind 6 years ago
  
  Agreed. Almost every time I've looked deeply into stdlib code I was surprised by how hard to follow it is and how frequently antipatterns are employed. Doubly so for anything near a C module.
  I consider the Python stdlib in a similar vein as the C++ stdlib or Boost: Yes, some useful bits in there, but (1) lots of rot (2) you don't want to have your code look anything like it.
falsedan 6 years ago

The only core library code I needed to look at was namedtuple, which is pretty incomprehensible even with context.
Walkman 6 years ago

You obviously did not dig in :D There are absolutely terrible parts, would not recommend!
xtreak29 6 years ago

Though core has some bad API due to maintaining backwards compatibility a lot of the third party libraries like requests, Flask have great focus on API design and code quality.
The authors have good quality repos :
https://github.com/kennethreitz
https://github.com/mitsuhiko
- dyeray 6 years ago
  
  I agree with Flask, much more readable code than Django for example. I would also add Django Rest Framework (and Tom Christie) to the list.
  
  eloycoto 6 years ago
  
  I think that django-rest-framwork is one of the best software implementations out there!
dyeray 6 years ago

Agreed with the rest, I've ended up reading pypy's implementation of some functions sometimes to see how it works after trying CPython first. From the few I've read I'd say pypy looks nice by the way (I'm talking about standard library).

noir_lord 6 years ago

In PHP land where I spend time for work.

Hands down Symfony.

drosan 6 years ago

ew

utahcon 6 years ago

Golang and Kubernetes have been highly regarded as high quality. I particularly found the Golang code for Kubernetes to be well documented and well architected.

jimhefferon 6 years ago

Knuth did a good job on TeX, and it has been closely examined for many years since so there are very few bugs.

informatimago 6 years ago

The TeX language itself, and the logs and error messages of TeX are so bad, that I would hardly believe it.
- svat 6 years ago
  
  Are you sure you aren't thinking of LaTeX?
  TeX (plain TeX, not LaTeX) has phenomenally good logging and error messages IMO — everything you need is there, each error message comes in a “formal” and “informal” form and points you to exactly the place the error happened, and TeX lets you fix things on-the-fly without restarting the program. All this of course assumes you use TeX the way it is described in the manual (The TeXbook). The experience is opposite with LaTeX, so I find it worth giving up all the convenience of LaTeX just for the wonderful experience with TeX.
  As for “the TeX language”, there is no such thing. As Knuth has said many times, TeX is designed for typesetting, not programming. Sure it has macros to save some typing, but if you're writing elaborate programs in it (as is nearly inevitable if you're using LaTeX) you're doing something wrong. Knuth said:
  > When I put in the calculation of prime numbers into the TeX manual I was not thinking of this as the way to use TeX. I was thinking, “Oh, by the way, look at this: dogs can stand on their hind legs and TeX can calculate prime numbers.”
  But of course LaTeX does every such thing imaginable :-)
  More on TeX not being a programming language: https://cstheory.stackexchange.com/a/40282/115
  On the TeX error experience: https://news.ycombinator.com/item?id=15734980
  
  yesenadam 6 years ago
  
  Plain TeX is different to TeX:
  ..."virgin" TeX...knows just primitive commands, no macros. Plain TeX is the set of macros (developed by Knuth) which makes TeX usable in everyday life of a typist. ... The available commands can be classified into primitive commands and macros. ... The "virgin" TeX knows only the primitive commands. ... Formats (plain TeX, LaTeX, etc.) extend TeX's vocabulary by defining macros. ...For example, plain TeX defines macros \item, \rm, \newdimen, \loop, etc. Plain TeX defines about 600 macros.
  https://tex.stackexchange.com/questions/97520/what-is-plain-...
  
  svat 6 years ago
  
  Yes of course; see this answer I wrote about typesetting with “virgin” TeX: https://tex.stackexchange.com/a/388360/48 (it's not easy). “Virgin” TeX is never (and was never) used by typical users, and is used only by the system administrator (or these days, the people behind the TeX distributions) to pre-load formats (like plain or LaTeX).
  Knuth wrote both the TeX program and the “plain” set of macros; when you start `tex` it is with `plain` that it starts up, and The TeXbook describes both the TeX program and the plain format without being careful to distinguish what comes from where (you have to look at Appendix B to see the proper definition of plain.tex), so when we speak of TeX as Knuth intended/imagined it to be used, it is plain TeX that is meant.

begriffs 6 years ago

OpenBSD

https://www.openbsd.org/goals.html

DanWaterworth 6 years ago

SQLite

zevv 6 years ago

Lua: https://www.lua.org/

cellover 6 years ago

To me that would be Appleseed rendering engine.

https://github.com/appleseedhq/appleseed

Even though I can't code C++, I can read it here and understand most of it (besides the maths).

pa7ch 6 years ago

I particularly like reading code from Upspin (upspin.io). Its probably partially because I think the project design is interesting and write go. Regardless, its a great ground up Go project by some of the original Go authors and contributors.

Very well organized code and it feels like they got the project off the ground, fixed bugs for a few months, and now have largely trailed off from maintaining it largely because it just works (I use it) which lends some credibility to their coding style. Of course, I'd like to see the project evolve conceptually, but, right now it does what it says it does reliably for a project that hasn't even cut a single release.

silur 6 years ago

radare2 - https://github.com/radare/radare2/ More GNU than actual GNU sources, more UNIX than the linux kernel. Huge codebase but extremely easy to get involved with, orthogonal design with no compromise on speed. Best codebase I ever encountered

gameswithgo 6 years ago

PostgreSQL and Quake3 are good candidates. Both are C codebases which are surprisingly readable even by relative novices.

gorb314 6 years ago

I think musl libc [1] has good quality code. If anything their build system is great. It makes the code much easier to navigate.

[1] https://www.musl-libc.org

izabera 6 years ago

https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn....
https://git.musl-libc.org/cgit/musl/tree/src/stdio/vfprintf....
ah yes, good quality code
- AndyKelley 6 years ago
  
  I still think musl overall is quite readable, but my goodness, that switch statement in your second example. What a monster. I didn't think it was possible to be this confusing without the preprocessor.
- stonogo 6 years ago
  
  What's the problem, out loud?

ckorhonen 6 years ago

On the JavaScript side I've enjoyed reading the code for Backbone and Underscore, helped also by the awesome in-line documentation. Very easy to see what is going on.

Also big fan of Sidekiq for similar reasons.

tmilard 6 years ago

BabylonJs is a wonderful clean code. Made to write 3D on the Web. https://www.babylonjs.com/

epynonymous 6 years ago

most people are talking about clean code, good design constructs, but i feel that many are missing the point, we’re talking about code quality here, design is the grit and grind that all developers go through to develop great software, certainly there are better designed software projects out there that leaves them more maintainable and prone to less bugs, but the fact of the matter is that for complicated code, designs go through many iterations and refactorings over time e.g. linux kernel, all software projects have bugs, even well designed or well tested software. but the significance of good testing and good processes are not being highlighted here, unit testing, code coverage, functional testing, end to end testing，scale testing, performance testing, code review, fault injection, debuggability, test automation, static code analysis, etc, i am shocked not to see lots of discussion on these things (aside from the sqlite mention) and testing techniques. probably a more developer friendly crowd here at hn, but testing is a significant and game changing part of what separates developers from great developers.

TekMol 6 years ago

I like the Laravel framework. It has a clean style to it.

jedberg 6 years ago

Postgres.

sgt 6 years ago

Definitely agree with this. Both the documentation and code are of excellent quality. Others that come to mind are sqlite and zeromq.

bsaul 6 years ago

my first experience with high quality code was with tge quake2 engine.

i was both amazed by the simplicity of the architecture (a huge single event loop), and the attention to code presentation and indentation.

SmellyGeekBoy 6 years ago

Interesting to see so many John Carmack projects in this thread. He's a good candidate for "best programmer of all time", if there were such a thing.

i_feel_great 6 years ago

Gambit, Chicken, Racket, Chez and Guile Schemes

jMyles 6 years ago

Twisted. Not only highly organized and sensibly delineated, but also a lot of fun to read - borderline comical at places.

iso-8859-1 6 years ago

How do you think the asyncio (formerly Tulip) sources compare?
- jMyles 6 years ago
  
  asyncio is more modern, more stylish, and more concrete.
  Twisted is more timeless, more patterned, and more self-aware.
  I can imagine Twisted's asyncio reactor becoming its default (and the Twisted flow control slowly declining in importance), but Twisted's protocols, control structures, and execution models becoming more popular.
  Twisted has undergone a great resurgence in quality engineering since asyncio became more viable - this was surprising to me, but is actually probably reasonably consistent with the way the historical influence of the standard library.
  Overall, I think that Twisted is a great project; I almost always reach for it when my python codebase becomes mature enough to need more thoughtful abstractions around network I/O.

otakucode 6 years ago

Does 'Physically Based Rendering' count? It's a book... which is also source. It was written as only the 2nd work of true 'Literate Programming' that I know of. I believe Knuth wrote a book about TeX which was the first example. But basically it is prose interleaved with source, readable as a book.

fapjacks 6 years ago

Actually, I think early versions (like from pre-1.0 through maybe 1.5 or so) of Docker had some very high quality code and was also very pleasing to look at. It was very clean and super approachable and readable, and I felt sort of like how the NetBSD commenter felt as described in their comment.

mehrdadn 6 years ago

"Highest" I don't know, but "code whose quality I look up to", then:

For C: Process Hacker and some similar code that is designed like and written around Windows kernel APIs: https://github.com/processhacker/processhacker/blob/master/p...

For C++: Some of the Boost code, and stuff like it, such as P-Stade Oven: https://github.com/himura/p-stade/blob/master/pstade/pstade/...

For others: (need to look later, I forget)

robbick 6 years ago

Can't say I've seen enough to be confident on the best library but redux (https://github.com/reduxjs/redux) is just so simple, and has great, readable/understandable code.

icc97 6 years ago

In Dan Abramov's excellent egghead redux course [0] he implements the `createStore` from scratch which is the core of redux, it's simple enough to post here:

  const createStore = (reducer) => {
    let state;
    let listeners = [];

    const getState = () => state;

    const dispatch = (action) => {
        state = reducer(state, action);
        listeners.map(listener => listener());
    };

    const subscribe = (listener) => {
        listeners.push(listener);
        // unsubscribe
        return () => {
            listeners = listeners.filter(l => l !== listener)
        };
    };

    // populate initial state
    dispatch({});

    return { getState, dispatch, subscribe };
  };

[0]: https://egghead.io/lessons/react-redux-implementing-store-fr...

coldnose 6 years ago

After spending about a month of concerted effort pouring through the zlib sources, looking for vulnerabilities, I can say that zlib is the most astonishingly bug-free code I've ever seen. But in the conventional understanding of "code quality", it's pretty bad.

wrasee 6 years ago

Julian Storer (JUCE library) did a talk on code quality using zlib as an example. Might be interesting to you if you've not seen it already.
https://www.youtube.com/watch?v=SIAAvv1O7Gg

beefhash 6 years ago

I'd have to go with Monocypher. It makes very tasteful use of comments, functions and macros to maximize readability and clarity.

https://github.com/LoupVaillant/Monocypher

rileyraver57 6 years ago

Toybox by Landley(https://github.com/landley/toybox) is probably the best example of a modern c implementation I have ever seen. Surprised no one has mentioned it yet.

ezequiel-garzon 6 years ago

I admit I don’t have the knowledge to make my own assessment, but I’ve read some downright poetic praise on djb’s work [1], and more than once.

[1] https://cr.yp.to/

Dowwie 6 years ago

For Rust, many say that the regex crate sets a high standard for excellence: https://github.com/rust-lang/regex

markpapadakis 6 years ago

I study codebases as a hobby. I highly recomend Seastar, Folly, Aeron and Disruptor, SQLite, PostgresSQL, LMDB, Tensorflow, Hashicorp’s vault, and the Linux Kernel projects as prime examples of high quality codebases.

chubot 6 years ago

For clean C++, I like leveldb (a key-value DB library) and re2 (a regex engine). Random files from each of them:

https://github.com/google/leveldb/blob/master/table/table_bu...

https://github.com/google/re2/blob/master/re2/nfa.cc

macco 6 years ago

I really like the source code of prosemirror:

https://github.com/ProseMirror/

It's not typical js, but very good none the less

Theodores 6 years ago

The open source code I know from web development has to be fixed with various hacks - PHP and the frontend javascript that goes with it. Therefore the code I know is not 'highest code quality'. If it was 'highest code quality' then I would not know the code.

Therefore the highest code quality is likely to be in projects where I do not have to go under the hood, e.g. the Chromium project where all contributors are vastly more educated and capable than myself.

eloycoto 6 years ago

Libuv is one of my favourites https://github.com/libuv/libuv

partycoder 6 years ago

Good code bases that inspired larger projects: MINIX, KHTML

schaefer 6 years ago

with respect to the C++ Language: there was a book published in 1996. Large-Scale C++ Software Design by John Lakos. He's about to publish the second edition of the book while also expanding it's reach to span two volumes.

Anyhow, while we await the publication of that book, John has been working at bloomberg. some of the code written there has been published to github[1]. He's also done a five hour lecture series [2] available on safari-online (paid service) that cover the topics of his book, and introduce the open source bloomberg repo as an example of code written in that style.

I can't offer you a review as I've just found this all myself, but I'll be eagerly studying it along with some of the other items mentioned here.

[1] https://github.com/bloomberg/bde [2]https://www.safaribooksonline.com/videos/large-scale-c-livel...

epynonymous 6 years ago

linux kernel, purely the reasoning being that it’s probably one of the most used pieces of software out there, along those lines, probably the kernel libraries and user libraries like libstdc that are a part of it. i dont know how the linux kernel is tested, but i know that production testing of the kernel on different platforms, at large scale is probably the most used open source in the market.

TangoTrotFox 6 years ago

I would not judge things on aesthetic quality, but simply on results. In general code faces difficulties that grow exponentially with with time, size, and the number of contributors. Millions of lines code, thousands of contributors, decades of development and it's still at the top of its game? In spite of its complete lack of aesthetic appeal, that's the Linux kernel.

sparkling 6 years ago

High quality code and one of the best APIs i ever used: https://github.com/requests/requests

Best source code layout, architecture, maintainability: https://github.com/rg3/youtube-dl

agentultra 6 years ago

As far as C++ code goes, the Lean Prover is really well maintained: https://github.com/leanprover/lean

I'd also say GHC is quite good.

And Pandoc as well.

I don't think I can compute enough variables to consider the "highest" though... so the aforementioned are only examples of what I think are good.

nojvek 6 years ago

Redis. I have to say antirez not only is an amazing engineer but from the way the code is written, you can see he is a very clear thinker.

I hold Redis codebase as an example of what good C code should be. On the other hand opencv codebase as an example of what C could should not be. Opencv codebase is really inconsistent with quite a bit of unreadable spaghetti sauce.

moneysconcerned 6 years ago

CVEdetails.com lists the number of (reported) vulnerabilities by year for software projects that have a CVE identifier.

Here's bitcoind: https://www.cvedetails.com/product/22744/Bitcoin-Bitcoind.ht...

rataata_jr 6 years ago

XMonad window manager written in Haskell.

iso-8859-1 6 years ago

What do you think about the GHC sources in comparison?

dhuramas 6 years ago

I am surprised no one mentioned SycallaDB(https://github.com/scylladb/scylla) . Redis and SycallaDB have often been pointed out as examples of good codebases to look at for C/C++ Devs.

Dawny33 6 years ago

Gensim : https://github.com/RaRe-Technologies/gensim

[Can't speak for the 'highest' part of the qn, but Gensim upholds very high code quality standards]

kostarelo 6 years ago

I like Spectrum for both their architecture and code quality. Node.js/JavaScript.

https://github.com/withspectrum/spectrum Https://spectrum.chat

nunobrito 6 years ago

Referred by thousands and available since 2004 without one single bug reported in the last decade: http://users.telenet.be/AphexSoft/

It is not yet on Github.

sv12l 6 years ago

Pretty sure PostgreSQL will have a place at the top quarter of this page.

numeromancer 6 years ago

Pari:

http://pari.math.u-bordeaux.fr/git/pari.git

I prefer the early versions, before it was softened up for the vulgo.

cantagi 6 years ago

GTKmm. GTK uses GObject to implement inheritance between C structs and it's easy to go wrong when extending. GTKmm wraps GTK in C++. It's a joy to use and is safer.

SoylentOrange 6 years ago

I like the design philosophy behind BoringSSL:

https://boringssl.googlesource.com/boringssl

If some portion of the library is overly complex, look into the use case and delete it wherever possible. It maintains a long-term bound on code complexity, which I quite like.

Edit: a nice explanation on the design philosophy here https://www.imperialviolet.org/2015/10/17/boringssl.html

anuraaga 6 years ago

I am very lucky that there are too many great open source libraries out there to label one with the "highest" quality.

hysan 6 years ago

Any React and React Native suggestions?

hmsync 6 years ago

Spring Framework

1. Elegant structure 2. Strict code style 3. Project size is not too large 4. Have detailed documentation

jankotek 6 years ago

For Java I would say H2 SQL DB. It is small, compact, packed with features and good abstraction.

tom-jh 6 years ago

Nobody mentions Android. Any examples good quality code on Android?

winkdinkerson 6 years ago

I guess my vote for Matt's Script Archive is going nowhere..

rzvme 6 years ago

I would suggest Laravel!

cmarschner 6 years ago

Torch has the best code of DNN libraries I have seen so far.

ddtaylor 6 years ago

A lot of the KDE source code is well written and maintained.

_pmf_ 6 years ago

Qt sources too (which has a lot of overlap in people and mindshare with KDE). Mostly.

praveenster 6 years ago

zeromq. Both the code and documentation are very good.

halayli 6 years ago

Postgresql, llvm, Python, sqlite are pretty up there.

anticensor 6 years ago

Debian is the best with its rigid QA procedures.

aloukissas 6 years ago

My nomination would go to the chromium project.

mikkelam 6 years ago

Any iOS swift/objc releated projects?

joelbirchler 6 years ago

Kubernetes is extremely well designed.

novaRom 6 years ago

Python (official cpython)

vfinn 6 years ago

There's a nice introductory lecture series on CPython internals on Youtube that tries to cover how the interpreter works and how the python code maps to bytecode by going trough the cpython source: https://www.youtube.com/watch?v=LhadeL7_EIU

unixhero 6 years ago

LibreOffice

charlysl 6 years ago

xv6

se7entime 6 years ago

Linux

lowry 6 years ago

Lua. It's has everything a good C project should have: small size, simple build system, portability by using the simplest constructs and not ifdefs, a clear and well define scope that none dares trespassing.

Quenty 6 years ago

You can see a mirror of Lua here! https://github.com/lua/lua
javaJake 6 years ago

When I used this library, I was impressed with how their design not only kept their own code clean, but made it incredibly intuitive and fun to write clean code on top of their API. Coworkers also looked at that code years later and went out of their way to give positive reviews of Lua.

edoo 6 years ago

The Linux kernel of course. In userland I have to say lib QT. I've used a lot of APIs and QT is always a pleasure to work with.

SmellyGeekBoy 6 years ago

I'm a Linux fanboy myself but come on - we're talking about nearly 30 years' worth of commits from thousands (tens of thousands?) of developers.
The only thing I can say is that with this in mind it's actually a lot better than I'd expect - testament to Linus's iron fist, perhaps.

auslander 6 years ago

openbsd

qualawhat 6 years ago

Start by defining quality.

moneysconcerned 6 years ago

Software quality: https://en.wikipedia.org/wiki/Software_quality
Software metric: https://en.wikipedia.org/wiki/Software_metric
''' Common software measurements include:
- Balanced scorecard - Bugs per line of code - Code coverage - Cohesion - Comment density[1] - Connascent software components - Constructive Cost Model - Coupling - Cyclomatic complexity (McCabe's complexity) - DSQI (design structure quality index) - Function Points and Automated Function Points, an Object Management Group standard[2] - Halstead Complexity - Instruction path length - Maintainability index - Number of classes and interfaces[citation needed] - Number of lines of code - Number of lines of customer requirements[citation needed] - Program execution time - Program load time - Program size (binary) - Weighted Micro Function Points - CISQ automated quality characteristics measures '''
Category:Software metrics https://en.wikipedia.org/wiki/Category:Software_metrics
greg7mdp 6 years ago

It is like porn, you know it when you see it.

gupi 6 years ago

well, if trolling is permitted, I would say that "Hello World" example has the most exquisite code.

in most cases "Hello World" is open-source, but I still don't know if can be named "project"

theboywho 6 years ago

It's funny to see nobody is even questioning the question.

What does it even hold as a value to be the project of the highest code quality in the world ? How can it exist as a consensus if we can't even agree on best practices ?

If it's for learning purposes, why even look for the ONE project with the HIGHEST quality ? Just go by any GOOD ENOUGH project.

I see this all the time: what's the best editor, the best color scheme, the best font, etc.

How about we just start saying: what's a good enough X for my purpose ?

erpellan 6 years ago

Sometimes you need a recipe book, other times you want to lose yourself in a masterpiece.
dredmorbius 6 years ago

Popular opinion is a poor test of truth. The rationales offered can be illuminating, however.
I'd actually considered making a similar comment on seing the question.

hyperpallium 6 years ago

Just wanted to mention some bias in successful open source projects: they are often structured as a number of similar plug-in pieces, like youtube-dl for different video publishers.

This is great for open source, because you can easily discover and navigate to the part you want, and change it. You might need to understand the plugin interface - or you might not. This flat architecture makes it easy for people to contribute, an important aspect of a successful open source project.

But it's not the ideal architecture for every project. In some cases, a cleverer, harder to understand approach is more elegant, shorter, more efficient, simpler.

Of course... one might argue that ease of understanding is more important than anything else.

leetcrew 6 years ago

the only thing more important than understanding is shipping on time. but how are you going to ship on time if you can't understand it?