Monadic I/O and Unix shell programming (2001)

thinkpad20 6 years ago

The analogy has a couple of problems. One is that programs can run in parallel, with one program feeding the other its output in real time. The article covers this one, although I don’t completely buy its explanation.

For another example, ‘cmd | cat’ is often not the same thing as ‘cmd’, because a command can inspect whether stdout is a tty or not and alter its behavior. This is why ‘ls’ normally lays its output out in columns on a terminal, but switches to one entry per line when its output goes to a pipe (e.g. into grep).
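
A minimal sketch of the tty check described above, assuming a POSIX shell. `describe_stdout` is a hypothetical helper made up for illustration; `[ -t 1 ]` is the standard test for whether file descriptor 1 (stdout) is a terminal.

```shell
# describe_stdout: hypothetical helper that reports where stdout points.
describe_stdout() {
    if [ -t 1 ]; then
        echo "stdout is a tty"
    else
        echo "stdout is not a tty"
    fi
}

# Run directly from an interactive terminal, this reports a tty.
describe_stdout

# Piped through cat, stdout is a pipe, so the same command answers differently.
describe_stdout | cat
```

So appending `| cat` changes what the command can observe about its environment, which is why it is not a true identity.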

The other issue that comes to mind is that bash provides multiple concepts of “monadic” sequencing; not just with pipes but with the && and || operators, if logic, for/while loops, etc, which form a more traditional “imperative programming” monad. Perhaps bash could be considered then to be a nested monad (monad transformer?) but I’m not sure how precisely you could model its semantics while keeping things elegant.
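
To make the different kinds of sequencing concrete, here is a small sketch, assuming a POSIX shell. `sequencing_demo` is a hypothetical wrapper, used only to group the examples.

```shell
sequencing_demo() {
    # Pipe: the right-hand command consumes the left-hand command's output.
    echo "hello" | tr 'a-z' 'A-Z'

    # ';' runs both commands in order; the second ignores the first's output.
    echo "first"; echo "second"

    # '&&' runs the second command only if the first succeeded (status 0).
    false && echo "not printed"
    true  && echo "after success"

    # '||' runs the second command only if the first failed (non-zero status).
    false || echo "after failure"
    true  || echo "not printed"
}

sequencing_demo
```

Pipes thread data from one command to the next, while `&&` and `||` thread only the exit status, which is one way to see why a single monad does not obviously cover both.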

But yes, in general you can view the shell as a monad, and there are libraries in Haskell which do exactly this (Shelly and Turtle come to mind). In general, the divergences from a purely monadic model come from the fiddly bits of real-world programming which can cause edge cases.

yanowitz 6 years ago

This is a fun read (fair warning: you should have some understanding of monoids and monads before reading) and further validates my hypothesis that understanding category theory would make me a more effective engineer.

If you've had the joy of constructing useful shell pipelines, part of that joy is the ease of knowing how they will behave. This is a nice formal statement that the underlying reasons WHY it just “feels right” come from category theory, which means if I can just learn category theory, I’ll have a new tool for approaching and factoring problems that will be, technically speaking, hella useful.

I suppose I should start (re)working my way through https://www.youtube.com/watch?v=I8LbkfSSR58&list=PLbgaMIhjbm...

Guthur 6 years ago

Sorry. I don't feel it would actually validate that at all. It may echo your sentiment, that is all. I feel confident in saying there are many highly productive mathematicians and software engineers who do not have a firm grasp of category theory, which by your reasoning would validate the opposite.

yesenadam 6 years ago

I had a look (at the original 2001 article) - seems you have to know Haskell to understand anything.
- mikekchar 6 years ago
  
  I think if I demystify it a bit for you, you will be able to appreciate it without the Haskell syntax. As the OP stated, you will need to understand what a monad is, though.
  In the first section of code, the author is showing some Haskell operations and showing equivalent operations in the shell. The most important ones are return, >>=, and >>. You can think of return as a kind of constructor for the monad (It also has the name "unit" and I think the cool kids are calling it "pure" these days). If you have a value, x, "return x" basically gives you x contained in the monad. >>= (also called "bind") is the "monadic operator". It just means to run a function with the contents of the monad as the parameter. The function has to return a monad containing the return value (usually by running "return", which is why it was called "return"). The monadic operator is useful for chaining functions. It basically is used when you want to run a series of functions, using the output from one as the input for the next (basically the same as the shell pipe). >> is similar to >>=, but you don't use the output from the previous function as the input to the next function. Hopefully with that explanation, you can see how it matches up.
  In the next code section the author is stating the three monadic laws. They are writing the laws in Haskell and the laws in shell commands. You can ignore the Haskell unless you just want to compare it to the Wikipedia article on monads. The shell commands are the interesting bits.
  After that, there's no Haskell at all. It's quite an interesting observation, but it's not nearly as complicated as it might seem.
- denis1 6 years ago
  
  Edit: The OP expanded that the post was about the linked article... now my comment doesn't make too much sense.
  If you say that about Milewski's talks then I don't agree. He does use Haskell notation a lot, but gives C examples as well and most of his arguments are based in category theory/general programming.
  Anyway, I think that the series of talks linked in the GP are a must watch. Even for people that don't fully agree with the viewpoint of Category Theory being a "holy grail", the videos do provide a different point of understanding.
  
  yesenadam 6 years ago
  
  "Milewski's talks"? Don't know who he is or what that is. Nope, I was just trying to say, the linked article "Monadic I/O and Unix shell programming" was about Haskell (and Unix pipelines), and maybe I should have known that from the title, but didn't. And I had to stop reading after 2 lines because it assumed Haskell familiarity.
  edit: Didn't see edited comment you mentioned until after I wrote that. Thanks.
  
  tempodox 6 years ago
  
  Bartosz Milewski: https://bartoszmilewski.com
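
For readers who skipped the Haskell, here is a rough shell-only sketch of the correspondence mikekchar walks through above. The mapping used (`echo` for return, `|` for >>=, `;` for >>, `cat` as the identity filter) is an assumption based on the descriptions in this thread, and the article's exact formulation may differ. `shout` and `exclaim` are hypothetical filters.

```shell
# Hypothetical filters: each reads stdin and writes a transformed stream.
shout()   { tr 'a-z' 'A-Z'; }
exclaim() { sed 's/$/!/'; }

# return x  ~  echo x : wrap a plain value as a command's output.
echo "hello"

# m >>= f   ~  m | f : feed the output of m to f.
echo "hello" | shout

# m >> n    ~  m ; n : run both in order; n does not read m's output.
echo "one"; echo "two"

# Right identity: m >>= return  ~  m | cat, which prints the same as m alone.
echo "hello" | shout | cat

# Associativity: grouping the pipeline either way gives the same output.
{ echo "hello" | shout; } | exclaim
echo "hello" | { shout | exclaim; }
```

The last two lines both print `HELLO!`. The laws only hold for well-behaved filters, which is the caveat several comments in this thread raise.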

ridiculous_fish 6 years ago

> Operator | (a pipe), "forces" a command, which is a promise of output, on its left producing a stream, an argument for a command on the right-hand side of the pipe. A shell pipe thus is an equivalent of a monadic >>= operator.

Why is this monadic and not just function composition? Why not just identify `a|b` with `b.a` and `a;b` with `seq a b`?

peteretep 6 years ago

What?

> From shell's point of view, the filters -- echo and cat -- are pure, referentially-transparent functions of their arguments

Yes, in this somewhat bizarre, contrived, tautological use of `cat`, maybe. But `cat` is more commonly used not as an identity function (the explicit identification of which would seem to strengthen the author's case, but isn't made), but as a way of pulling in data external to the command-line input, making it neither pure nor referentially transparent.
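
A small sketch of that distinction, assuming a POSIX shell with `mktemp` available. The scratch file and its contents are made up for illustration.

```shell
# Scratch file standing in for "data external to the command-line input".
tmpfile=$(mktemp)
echo "from the file" > "$tmpfile"

# With no operands, cat copies stdin to stdout, acting as an identity filter.
echo "from the pipe" | cat

# With operands, the output also depends on the filesystem:
# '-' means stdin, followed by the contents of the scratch file.
echo "from the pipe" | cat - "$tmpfile"

# Same stdin, different file contents, different output.
echo "changed on disk" > "$tmpfile"
echo "from the pipe" | cat - "$tmpfile"

rm -f "$tmpfile"
```

The second and third invocations receive identical input on stdin yet print different results, so in that usage `cat` is not a function of its input alone.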

> Operator | (a pipe), "forces" a command, which is a promise of output, on its left producing a stream, an argument for a command on the right-hand side of the pipe

That seems to be more simply a description of function composition, rather than `>>=`, which concatenates functions that have multiple output modes.

Indeed:

> UNIX filter is expected to sequentially consume its input and produce a single output stream

is suggestive that filters on the command line have no side-effects, which is not a typical usage of the term.