LukeShu 5 years ago

I was surprised to read that some versions of cat (apparently BSD Net/2 and derivatives) have special code for sockets. What does cat do with sockets!?

Well, AF_UNIX sockets are sockets with paths in the filesystem. You must either connect() to them or bind() to them instead of open()ing them. Apparently, BSD-derived versions of cat will try to connect() to a file if open() fails.

With GNU cat, if you try to cat a socket, it will go like this:

    $ ls -l test.sock
    srwxr-xr-x 1 luke users 0 Nov 12 21:07 test.sock
    $ cat test.sock 
    cat: test.sock: No such device or address

but BSD-derived cats will successfully open the socket for reading. That behavior can be accomplished on other systems by using socat instead; BSD cat behaves somewhat like:

    $ socat UNIX:test.sock STDOUT
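
For illustration, the open()-then-connect() fallback can be sketched in Python. This is a rough sketch under stated assumptions, not the BSD source: the function name `read_file_or_socket` is hypothetical, and it assumes a stream (SOCK_STREAM) socket whose peer closes the connection when it is done sending.

```python
import os
import socket
import stat


def read_file_or_socket(path, bufsize=65536):
    """Return all bytes readable from `path`, connecting if it is a socket.

    Hypothetical helper for illustration only.
    """
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError:
        # open() fails on a socket file (ENXIO on Linux, which is the
        # "No such device or address" message). Fall back to connect()
        # only when the path really is a socket; otherwise re-raise.
        if not stat.S_ISSOCK(os.stat(path).st_mode):
            raise

    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(path)
        while True:
            data = s.recv(bufsize)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `read_file_or_socket("test.sock")` would then return whatever the listening process writes before closing, while ordinary files are read as usual.
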
  • loeg 5 years ago

    Hah, I learned something about cat today. Thanks.

    Amusingly, the BSD socket behavior can be disabled with the compiler macro -DNO_UDOM_SUPPORT, but as far as I can tell it has been neither documented nor hooked into the rest of the build system in any way since its introduction in 2001:

    https://svnweb.freebsd.org/base?view=revision&revision=83482

zeveb 5 years ago

> But, if you pull up the manual page for something like grep, you will see that it has not been updated since 2010 (at least on MacOS).

Well, GNU grep was last released 16 months ago, and the last change to its master branch was 4 weeks ago: http://git.savannah.gnu.org/cgit/grep.git

FreeBSD's grep was last updated back in August: https://github.com/freebsd/freebsd/tree/master/usr.bin/grep

OpenBSD's grep was last updated 11 months ago: http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/usr.bin/grep/

Oddly, it looks like the Darwin grep was last updated in 2012: https://opensource.apple.com/source/text_cmds/text_cmds-99/g...

Strange that Apple would be shipping such an ancient grep.

  • setr 5 years ago

    Iirc, Apple stopped updating but continued shipping all gnu utilities since gplv3 was attached to them

    • LukeShu 5 years ago

      I don't believe that macOS grep was ever GNU grep. I believe that macOS always used a BSD variant of grep.

      • yesenadam 5 years ago

        Using OS X 10.4.11 here, the grep file is dated Jan 2006, the end of the grep man pages says "2002/01/22".

          $ uname -v
          Darwin Kernel Version 8.11.1: Wed Oct 10 18:23:28 PDT 2007; root:xnu-792.25.20~1/RELEASE_I386
          $ grep --version
          grep (GNU grep) 2.5.1
        
        Other man pages: ed says 1993, sed says BSD 2004, cat says 3rd Berkeley Distribution 1995.
        • LukeShu 5 years ago

          Interesting. What does `type grep` say? Is it possible that it's /usr/local/bin/grep from homebrew/macports/…, and that /usr/bin/grep is BSD grep?

          I found a comment claiming that prior to 10.8 (2012, Mountain Lion) it used GNU grep, but nothing I'd feel comfortable citing.

          • yesenadam 5 years ago

              $ type grep
              grep is hashed (/usr/bin/grep)
            
            It does seem to be the original grep for this machine (it's a Mac Mini) - it has the same Jan 2006 date as most of the files in /usr/bin, and nothing has an earlier date. There's no other file called grep elsewhere.
      • privong 5 years ago

        For what it's worth, on a not-too-old Mac:

          $ uname -v
          Darwin Kernel Version 13.4.0: Mon Jan 11 18:17:34 PST 2016; root:xnu-2422.115.15~1/RELEASE_X86_64
          $ grep --version
          grep (BSD grep) 2.5.1-FreeBSD
        
        I don't have historical information, but that's at least consistent.
  • skissane 5 years ago

    > Strange that Apple would be shipping such an ancient grep.

    I don't think it is that strange. Command line tools such as grep don't appear to be a development priority for Apple. Their focus appears to be on features visible to the average user, who uses the GUI instead of the command line.

    Command line tools are mainly used by developers and power users, and the existing tools are generally good enough for most purposes, and people who want something better can always install the GNU versions using Homebrew/MacPorts/etc. There isn't much market demand for improvements in this area, so it makes sense Apple wouldn't invest in it.

  • JdeBP 5 years ago

    Be aware if you are going to delve into history that grep is the source of much confusion, in part because exactly which program was grep on some systems has changed over the years. On FreeBSD, for example, some years ago grep was the GNU tool and the BSD tool was named "bsdgrep". They would both identify as the same version number.

    • loeg 5 years ago

      > On FreeBSD, for example, some years ago grep was the GNU tool and the BSD tool was named "bsdgrep". They would both identify as the same version number.

      Neither of these statements is true. grep on FreeBSD is still GNU grep, and it has a distinct version text from bsdgrep:

          $ grep -V
          grep (GNU grep) 2.5.1-FreeBSD
          
          Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
          This is free software; see the source for copying conditions. There is NO
          warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
          
          $ bsdgrep -V
          bsdgrep (BSD grep) 2.6.0-FreeBSD
      
          $ uname -rK
          13.0-CURRENT 1300003
      • JdeBP 5 years ago

        Tut-tut! So easily demonstrable otherwise.

        MacOS:

        * https://unix.stackexchange.com/questions/352977/

        * https://unix.stackexchange.com/a/398249/5132

        The very version of FreeBSD from some years ago:

           % bsdgrep --version
           bsdgrep (BSD grep) 2.5.1-FreeBSD
           % grep --version
           grep (GNU grep) 2.5.1-FreeBSD
        
           Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
           This is free software; see the source for copying conditions. There is NO
           warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
           
           %
        
        More on that:

        * https://unix.stackexchange.com/a/65609/5132

        Kyle Evans and others on making bsdgrep into grep:

        * https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201650

        • loeg 5 years ago

          MacOS isn't FreeBSD. They're free to do whatever they want with the BSD-licensed software. Your comments made claims about FreeBSD that weren't factual. I would also emphasize that:

              strcmp("bsdgrep (BSD grep) 2.5.1-FreeBSD", "grep (GNU grep) 2.5.1-FreeBSD") != 0
          
          Also, I'm personally in contact with Kyle Evans and am familiar with the general interest in making bsdgrep grep. But I also know that it hasn't happened yet.
  • dylan604 5 years ago

    > Strange that Apple would be shipping such an ancient grep.

    Maybe they ceded this part of the OS to Homebrew? I know I never try to update anything stock in the OS. It's so much easier/faster to `brew install xxxxxx` than mess with the OS which might get overwritten with an official update anyway.

Isamu 5 years ago

Really nice history! I want to applaud the author on this loving treatment.

Also I want to point readers to the commentary of some of the Unix authors:

“Old programs have become encrusted with dubious features. Newer programs are not always written with attention to proper separation of function and design for interconnection.”

http://harmful.cat-v.org/cat-v/unix_prog_design.pdf

My point being: Unix (and derivatives) encompass a set of people who disagree about what constitutes Unix philosophy.

  • loeg 5 years ago

    > My point being: Unix (and derivatives) encompass a set of people who disagree about what constitutes Unix philosophy.

    That's certainly a Unix truism! It seems everyone has their own subjective beliefs about what Unix should be and decides their own beliefs constitute "the" Unix philosophy.

akkartik 5 years ago

Interesting to think what a different conclusion the article would have arrived at if he'd chosen to look at GNU cat on Linux. A few sample points:

* 2002: 833 LoC (http://landley.net/aboriginal/history.html)

* 2013: 36kLoC, 2/3rds of them .h files (https://news.ycombinator.com/item?id=11340510#11341175)

* 2018: 37kLoC of .c file dependencies going into libcoreutils.a and some LoC of .h files (coreutils has 60kLoC of .h files)

The methodology for counting lines likely isn't consistent across those data points. But the trend is still unmistakable. Maybe I'll tree-shake all the dead code out and come up with an accurate line count one of these days.
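
One way to keep the counting consistent across data points is to script it. The following is a minimal sketch in Python, with assumptions: the helper names (`count_lines`, `direct_headers`, `total_loc`) are made up for illustration, only quoted `#include "..."` directives are followed (angle-bracket includes and headers included by other headers are ignored), and lines are counted the way `wc -l` counts them.

```python
import re
from pathlib import Path

# Matches quoted includes such as: #include "quotearg.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def count_lines(path):
    """Count newline characters in a file, like `wc -l`."""
    return Path(path).read_bytes().count(b"\n")


def direct_headers(sources, include_dirs):
    """Return the set of local headers directly included by `sources`."""
    found = set()
    for src in sources:
        text = Path(src).read_text(errors="replace")
        for name in INCLUDE_RE.findall(text):
            # Take the first include directory that contains the header.
            for d in include_dirs:
                candidate = Path(d) / name
                if candidate.is_file():
                    found.add(candidate.resolve())
                    break
    return found


def total_loc(sources, include_dirs):
    """Return (lines in sources, lines in directly included headers)."""
    c_lines = sum(count_lines(s) for s in sources)
    h_lines = sum(count_lines(h) for h in direct_headers(sources, include_dirs))
    return c_lines, h_lines
```

Something like `total_loc(["src/cat.c", "lib/quotearg.c"], [".", "lib", "src"])`, run from the top of a source tree, would give one pair of numbers per release. The absolute values are still only as meaningful as the choice of file list.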

  • akkartik 5 years ago

    I just performed an ad hoc file-level tree-shaking for 'src/cat.c' in GNU coreutils 8.30, starting with `gcc src/cat.c` and gradually adding arguments until I got it to build. Here's the command I ended up with.

        gcc -I. -I./lib \
          src/version.c \
          lib/progname.c \
          lib/safe-read.c \
          lib/safe-write.c \
          lib/quotearg.c \
          lib/xmalloc.c \
          lib/localcharset.c \
          lib/c-strcasecmp.c \
          lib/mbrtowc.c \
          lib/xalloc-die.c \
          lib/c-ctype.c \
          lib/hard-locale.c \
          lib/exitfail.c \
          lib/closeout.c \
          lib/close-stream.c \
          lib/fclose.c \
          lib/fflush.c \
          lib/fseeko.c \
          lib/version-etc.c \
          lib/xbinary-io.c \
          lib/version-etc-fsf.c \
          lib/binary-io.c \
          lib/fadvise.c \
          lib/full-write.c \
          src/cat.c
    
    Those .c files add up to 5021 lines.

    The .c files include 44 header files:

        lib/binary-io.h
        lib/c-ctype.h
        lib/closeout.h
        lib/close-stream.h
        lib/config.h
        lib/c-strcaseeq.h
        lib/c-strcase.h
        lib/ctype.h
        lib/error.h
        lib/exitfail.h
        lib/fadvise.h
        lib/fcntl.h
        lib/fpending.h
        lib/freading.h
        lib/full-write.h
        lib/gettext.h
        lib/hard-locale.h
        lib/ignore-value.h
        lib/limits.h
        lib/localcharset.h
        lib/locale.h
        lib/minmax.h
        lib/progname.h
        lib/quotearg.h
        lib/quote.h
        lib/safe-read.h
        lib/stdio.h
        lib/stdio-impl.h
        lib/stdlib.h
        lib/string.h
        lib/sys/ioctl.h
        lib/sys-limits.h
        lib/sys/types.h
        lib/unistd.h
        lib/unused-parameter.h
        lib/verify.h
        lib/version-etc.h
        lib/wchar.h
        lib/wctype.h
        lib/xalloc.h
        lib/xbinary-io.h
        src/die.h
        src/ioblksize.h
        src/system.h
    
    The header files add up to 19.7k lines.

    So the total line count for files GNU cat actually needs to build is at least ~25k.

    (I didn't bother checking for headers including other headers.)

    Next step: do this for various versions of GNU coreutils.

    • rurban 5 years ago

      Much more code for much less functionality than the BSD cat which can do sockets. Not surprised at all.

    • rain1 5 years ago

      Thanks for taking the time to do this counting, very interesting result.

  • arminiusreturns 5 years ago

    This is why I think the movement of the future will be about going back and stripping cruft out of old codebases. We've seen the weaknesses of the bazaar/ many eyes, and the main one IMHO is code complexity, which is often easiest to measure in loc.

saagarjha 5 years ago

Strangely, it seems that many versions of macOS on opensource.apple.com are missing grep. It used to be its own project until 10.7 Lion, after which it disappeared and then reappeared under text_cmds in 10.12 Sierra.

  • LukeShu 5 years ago

    Apparently, the 10.7→10.8 update is when macOS switched from GNU grep to FreeBSD grep.

kazinator 5 years ago

> My aunt and cousin thought of computer technology as a series of increasingly elaborate sand castles supplanting one another after each high tide clears the beach.

They are basically right though.

The counterexample of some Unix utilities means nothing. You're not getting a CS degree in order to develop the next version of cat, are you?

We have some things with a long history and they are easy to identify. It is just hindsight being 20/20.

For every one of those things, there are countless that can't be seen or felt. They aren't here; they got washed away.

Who uses the Michigan Terminal System?

Or a web framework from ten years ago?

  • Beldin 5 years ago

    > They are basically right though.

    They are only right in the same way that a physics major is obsoleted by advances in physics: lhc, discovery of dark matter & energy, increasing expansion of the universe, etc.

    A CS major isn't about learning the latest Angular framework derivative. A CS major is about learning fundamental aspects of computer science.

  • escape_goat 5 years ago

    I am not sure that computer technology would have become powerful, inexpensive, and ubiquitous to the extent that it has become today were his aunt and his cousin correct.

    The aunt and the cousin are thinking that 'computer technology' exists at the level of abstraction of the sandcastles in the metaphor. To some extent it does, but the vastly greater part of it is at the level of abstraction of the knowledge and theory of building sand castles, as gained over the course of many iterations.

    One of the most common themes one hears, when reading what people write about computer science, is how few new ideas in computer science are actually involved in nearly anything anyone does on a computer (or teaches at the undergraduate level).

    • kazinator 5 years ago

      The people implementing those ideas often believe they are new, though.

  • taneq 5 years ago

    When I first came here, this was all swamp. Everyone said I was daft to build an operating system on a swamp, but I built it all the same, just to show them. It sank into the swamp. So I built a second one. And that one sank into the swamp. So I built a third. That burned down, fell over, and then sank into the swamp. But the fourth one stayed up. And that’s what you’re going to get, Son, the strongest OS in all of England.

  • sebazzz 5 years ago

    > Or a web framework from ten years ago?

    Well, I still write ASP.NET Web Forms on a regular basis. Ten years is not that old, or is it? Though it is harder and harder to find developers for it, the young people simply don't start with Web Forms.

  • tr352 5 years ago

    I think we have to distinguish computer science from engineering here. Computer science is a branch of mathematics, where theories are developed and results are obtained that in principle remain valid for ever. Think of theories of computation and complexity theory, but also logic, probability and so on.

    Indeed, the observation that some Unix utilities have their roots in the seventies misses the point in this regard. I'd say this is a testament to the success of the unix approach or whatever you want to call it. It's not really about computer science.

pjmlp 5 years ago

If you like to have insights into how some UNIXes got built, these books are quite interesting.

"The Design and Implementation of the 4.4 BSD Operating System"

"The Design and Implementation of the FreeBSD Operating System"

"Mac OS X Internals: A Systems Approach"

"Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture"

"HP-UX 11i Internals"

"IA-64 Linux Kernel: Design and Implementation"

  • JdeBP 5 years ago

    How could you omit Bach and Comer? (-:

    • pjmlp 5 years ago

      I never read it, thus cannot express my opinion about its contents.

      Xenix manuals and later Stevens' books were my introduction to the UNIX world.

      • JdeBP 5 years ago

        Not it, them.

        * http://jdebp.eu./FGA/operating-system-books.html

        One of these days I shall get around to expressing my opinions, which as you can see are still missing. Indeed, the list itself is a decade out of date. (-:

        I have some SCO UNIX manuals on the other side of the room as I type this.

        • pjmlp 5 years ago

          Very nice list.

  • masonic 5 years ago

    I just found a set of Solaris 4 manuals still in the wrapper yesterday.

rain1 5 years ago

I think that code bloat, especially in GNU, is a huge problem in our software because it makes programs difficult to maintain, to understand and modify. I feel like most people I interacted with online (present company excepted) don't care about it and don't see it as a problem. I can get that it doesn't affect them because they only use these projects as black boxes and don't maintain them, so it isn't relevant to their work.

I created a wiki page to measure the number of lines of code* of various types of software https://softwarecrisis.miraheze.org/wiki/Linecount - LOC is a very very rough proxy for what I actually want to measure, but the results are so stunning that even an inaccurate indirect measurement tells a lot. You can see that for 2 projects that do essentially the same thing there might be a 1000x difference in LOC.

It's fascinating what can happen to such a simple program like 'cat'. The same effect is amplified further when you look at projects like gcc. I tried to ask the question on a couple sites like stackexchange and reddit why does gcc take half an hour to build instead of a fraction of a second but this question was not taken well. I got a lot of resistance to it, X-Y answers, deleted etc. I don't think that the common software engineer wants to take the idea seriously that the day to day tools we use have a million fold inefficiency built into them by accident. I also noticed that 'make' has no profiler, nobody has even really done a breakdown of what takes how long to build in the gcc tree.

There are a lot of brilliant engineers who understand this problem and want to solve it though. We see that in Alan Kay's STEPS project, aligrudi's work, musl, toybox, maybe sbase and many of the independent bootstrapping projects that have popped up. There's a lot of inertia and weight to the standard GNU toolkit to push back against but I believe these problems are all solvable and by solving them we can create programming languages and tools with leverage far beyond what currently exists. I just hope such projects can be integrated rather than be forgotten.

fanbelt 5 years ago

I worked with some version of Unix in 1984 that had a program called dog. It would silently wait for a <CR> to be pressed after each screen of output. I've never seen it anywhere else.

koyote 5 years ago

> [...] but it seems that many people still get most excited about the six months of work he put into rewriting cat [...]

Is it me or does 6 months seem like an awfully long time for re-writing such a small and simple program?

  • eridius 5 years ago

    I would guess that it wasn’t his sole project for those 6 months, but rather something he kept incrementally improving until there was nothing left to improve.

  • rauhl 5 years ago

    It was a different era, one in which computers were a lot slower, source control was a lot more primitive, a lot of basic stuff was still being invented, but … yeah, I feel a lot better about my own productivity now!

    • ekun 5 years ago

      While I agree about productivity now (although rewriting a source for cat that is used decades later seems very productive), I think the above commenter has it correct that it was probably a side project that he worked on and released after 6 months and not so much the speed of the CPUs.

cestith 5 years ago

The cat utility is among the simplest, but once upon a time true was about the simplest possible Unix utility.

    #!/bin/sh

Yes, that's really it. Fire up the shell, get it to exit with 0, which is taken as success. That's all that's really necessary for its spec.

GNU's is around 29 KiB compiled, and it uses some of that to support --version and --help flags. MacOS's is around 17 KiB compiled and ignores flags.
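
The "exit with 0" behavior is easy to check. Here is a small sketch in Python, with one simplifying assumption: the script text is handed to /bin/sh with `-c` instead of being written to an executable file, and the helper name `shell_exit_status` is invented for the example.

```python
import subprocess


def shell_exit_status(script_text):
    """Run `script_text` with /bin/sh and return the shell's exit status."""
    return subprocess.run(["/bin/sh", "-c", script_text]).returncode


# The entire "program" is the shebang line, which the shell reads as a comment.
true_status = shell_exit_status("#!/bin/sh\n")

# For contrast, a script that asks for a non-zero status.
other_status = shell_exit_status("exit 3")
```

Here `true_status` comes back as 0 because the shell ran no command that failed, and that is all `true` has to guarantee.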

  • rain1 5 years ago

    it used to be even simpler, a blank file

rawoke083600 5 years ago

Cat is awesome :) There is also 'tac' (reverse of cat) installed on most systems

  • jillesvangurp 5 years ago

    I came across something called bat recently. It's a rust clone of cat with a lot of nice features integrated. This seems to be a thing lately in the Rust community to put out vastly improved versions of tools we haven't really touched in ages. Loving it.

    • kungtotte 5 years ago

      I'm a fan of exa as an ls-replacement :)

      exa -l --git will list N/M git status flags in the output and:

      exa --git-ignore will obey .gitignore when you're listing files :)

      Works like a charm in my experience.

nuclx 5 years ago

That capital C in the title weirds me out.

ccannon 5 years ago

I always wondered where the name cat came from which the article doesn’t address. Any ideas?

mitchtbaum 5 years ago

I only read the beginning and end, and I very much like the closing message here.

A tldr of the middle would be cool. Maybe there was a pattern.

I'd like to add another OS not mentioned that will hopefully become a well-appreciated artifact soon too, from Redox OS: https://gitlab.redox-os.org/redox-os/coreutils/blob/master/s...

I can't find it quickly now, but jackpot51 also has a very answer somewhere on Reddit about how their networking stack's DNS query command departs from a commonly deployed C program for Windows and Unix, iirc. fascinating