okdana 6 years ago

>Simple trick behind this technique is that when using shell wildcards, especially asterisk (...), Unix shell will interpret files beginning with hyphen (-) character as command line arguments to executed command/program.

To be clear, the shell isn't really 'interpreting' anything here — it knows absolutely nothing about the argument conventions of whatever program you're running[0]. All it's doing is passing a list of arguments to an executable; how the executable deals with that is up to it.
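To make that concrete, here's a minimal sketch (scratch directory and filenames are made up) showing that glob expansion is just argv construction:

```shell
# Work in a scratch directory so the glob result is predictable.
dir=$(mktemp -d) && cd "$dir"
touch -- -l regular.txt

# printf shows exactly the argv the glob produced; the shell has no idea
# that '-l' would mean anything special to ls.
printf '<%s>\n' *
# <-l>
# <regular.txt>
```

Running `ls *` here executes `ls -l regular.txt`; it's ls, not the shell, that decides `-l` is an option.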

And this isn't only a shell problem — it's an issue when you pass arbitrary arguments to ANY external utility in ANY language. For example, maybe you have some kind of tool that calls grep:

    # Perl
    exec '/usr/bin/grep', '-R', $input, 'mydir';

    # PHP (with Symfony Process)
    (new Process(['/usr/bin/grep', '-R', $input, 'mydir']))->run();

    # Python
    subprocess.run(['/usr/bin/grep', '-R', input, 'mydir'])

    # C
    execv("/usr/bin/grep", (char *[]) {"grep", "-R", input, "mydir", NULL});
All of these are safe in the sense that there's no risk of shell command injection (in fact, only the PHP one even calls the shell), but NONE of them are safe against this problem, where the input value might begin with a hyphen. In this grep scenario it's probably not a security issue, but it can produce confusing results for the user if nothing else.

As others mentioned, you must ALWAYS use '--' to mark the end of option processing when you do something like this.
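Applied to the grep examples above, the fix is one extra argument (the directory and input value here are made up):

```shell
# Hostile-looking input that begins with a hyphen:
input='-f'
mkdir -p mydir && printf 'hello -f world\n' > mydir/sample.txt

# Without '--', grep would parse "$input" as its -f (pattern-file) option.
# With '--', option processing stops, so "$input" can only be the pattern:
grep -R -- "$input" mydir
# mydir/sample.txt:hello -f world
```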

[0] Technically the shell knows about the conventions of its built-ins, but even those mostly handle arguments in a similar fashion to external utilities, where an arbitrary argv is fed to a function and then parsed using some getopts-like method.

  • schoen 6 years ago

    > In this grep scenario it's probably not a security issue, but it can produce confusing results for the user if nothing else.

    I tried to think of an example where it could create a problem and the best I could do so far is injecting -f /sensitive-file/that-the-user/otherwise-cannot-read, supposing that mydir contains instances of the sensitive things, or that the contents of mydir are under the user's control.

    So for example, suppose that the user has SFTP access to upload stuff to mydir, and suppose that there's a file /etc/spynames containing a list of spies (but it's mode 660, so the user normally can't read it). Maybe the grep command gets run with user-provided input and with the same group as /etc/spynames, so the user could SFTP-upload a file containing suspected spies, and then inject input causing the execution of

    /usr/bin/grep -R -f /etc/spynames mydir

    The output will be lines that are also present in /etc/spynames.

    Edit: apparently -f/etc/spynames will be parsed as -f /etc/spynames, and so it isn't even necessary to be able to inject a space. I bet some form of this attack works on hundreds of deployed Internet services.
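That attached-argument parsing is easy to verify (all filenames below are stand-ins for the scenario above):

```shell
# Stand-ins for /etc/spynames and the user's uploads:
printf 'mallory\n' > patterns.txt
mkdir -p uploads && printf 'mallory was here\nnothing here\n' > uploads/list.txt

# The single injected token '-fpatterns.txt' parses as '-f patterns.txt':
input='-fpatterns.txt'
grep -R "$input" uploads
# uploads/list.txt:mallory was here
```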

  • userbinator 6 years ago

    As others mentioned, you must ALWAYS use '--' to mark the end of option processing when you do something like this.

    Relatedly, GNU getopt() (the most common one in use) has a dubious and non-standard "feature" --- enabled by default --- where it will automatically reorder the option arguments to come before the non-option ones. In other words, if you use it with something like

        foo -a -b somearg $input
    
    according to POSIX there cannot possibly be any more options after somearg, and thus you can't inject any options via $input, but GNU will reorder them and pick up options from there too. Fortunately it respects --, but this feature has personally caused me some grief in the past, and while it sounds intuitive at first, it actually complicates things because it violates the "left-to-right"-ness of (mentally or otherwise) parsing a command line.
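A quick demonstration with GNU grep (both the permutation and the POSIXLY_CORRECT escape hatch are GNU getopt behaviors):

```shell
printf 'Hello\n' > demo.txt

# GNU getopt permutes argv: -i is picked up even after the operands.
grep hello demo.txt -i
# Hello

# POSIXLY_CORRECT disables the permutation, so -i becomes a (missing)
# filename operand and the case-sensitive search finds nothing:
POSIXLY_CORRECT=1 grep hello demo.txt -i
# grep: -i: No such file or directory
```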
  • jwilk 6 years ago

    You shouldn't assume grep is in /usr/bin. On many systems it's in /bin.

    Perl's exec and Python's subprocess already search for executables in PATH, so you don't need to provide the full path.

    In C, you can use execvp() instead of execv().

    (No idea about PHP…)

    • okdana 6 years ago

      Right, it was just a random example that popped into my head. You would probably also want to think twice about using -R (at least with GNU grep), about hard-coding the search directory, &c.

      All of PHP's process-execution methods (except one, which is unusable on many systems) go through the shell, so by virtue of that they also search the PATH.

  • alkonaut 6 years ago

    It's a shell problem in the sense that the shell itself expands wildcards. It's cumbersome for every Windows utility to have to do that itself (via a library call), so in that sense centralizing expansion in the shell is an obvious benefit.

Pete_D 6 years ago

I didn't see this mentioned in the article, but you can protect against this in your own scripts by including a -- before user-supplied parameters, which prevents them from being treated as flags.

    $ rm -- -rf
    rm: cannot remove '-rf': No such file or directory
(see Guideline 10, http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_...)
  • Spivak 6 years ago

    It should be noted that you will run plenty of programs in the wild that don't respect this convention.

  • jwilk 6 years ago

    Alternatively:

      $ rm ./-rf
    • zouhair 6 years ago

      That if you know such file exist in said folder.

derefr 6 years ago

This, plus the problem of dealing with files with spaces, newlines, tabs, etc. in their names, are what draw the line for me between "things I am willing to write as a shell script" and "things that will make me use a scripting language† with its own file-manipulation system-call builtins."

A three-line script that takes one file as an argument? Sure, write that in bash.

A script that processes N files passed as arguments? Or that processes a listing of files piped in from stdin? And uses branching logic on the files themselves? Bash goes out the window.

Sure, it's possible to write shell scripts to do that. But why? Unless you're targeting a PDP11 or the Busybox-linked /bin/sh in your initramfs, is shell scripting really the best option?

---

† Personally, I reach for Ruby in these cases; its stdlib Pathname and FileUtils classes are pretty solid for this use-case, and Ruby interpreters are now roughly everywhere I want to deploy.

Ruby is kind of clumsy for doing all the command piping parts of scripting, though; especially since there's no easy way short of wrapping Kernel.system in your own function to make a Ruby script crash on a nonzero subshell exit status.

I'd love to know if there was a language that was like Ruby in terms of the syntax and semantics around strings, arrays, and files; but then was "just like writing a shell script" in the sense of being able to just use binary names to make "function calls" that are really synchronous subshell spawns and exit-code checks in one (but which still use the variable-passing semantics of the language where the resulting exec(2) ARGV is concerned.) Probably a Ruby DSL is the simplest way to get that—but if it's not in the stdlib, I'm not going to be able to use it for DevOps scripting :/

  • LukeShu 6 years ago

    > A script that processes N files passed as arguments?

        for filename in "$@"; do
            ...
        done
    
    > Or that processes a listing of files piped in from stdin?

    That has the tricky bit of "what about filenames that contain newlines", which would come up in an implementation in any language. Since the one byte not allowed in filenames is '\0', let's say that the listing uses '\0' to separate filenames.

        while read -d '' -r filename; do
            ...
        done
    
    There are things that are hard in Bash... but the things you mentioned aren't them.
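Here's a runnable version of that loop (bash; the scratch filenames, one with a space and one with an embedded newline, are made up), fed by find -print0:

```shell
# Create files whose names would break a newline-delimited listing:
dir=$(mktemp -d)
touch "$dir/a b.txt" "$dir/has
newline.txt"

# NUL-delimited listing in, NUL-delimited read out:
find "$dir" -type f -print0 |
while IFS= read -r -d '' filename; do
    printf 'got one file\n'
done
# got one file
# got one file
```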

    > but which lets you strictly control the ARGV and env of those function calls without the whole thing passing through a TCL-like stringification and re-tokenization.

    I'm pretty sure that's just Bash, if you remember to wrap everything in double-quotes.
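For what it's worth, the difference the double quotes make is directly observable (bash; run in an empty scratch directory so globbing is predictable):

```shell
dir=$(mktemp -d) && cd "$dir"
args=('a b' 'c*')

# Quoted: exactly two argv entries, no splitting, no globbing.
printf '<%s>\n' "${args[@]}"
# <a b>
# <c*>

# Unquoted: word-split (and subject to glob expansion).
printf '<%s>\n' ${args[@]}
# <a>
# <b>
# <c*>
```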

    • derefr 6 years ago

      That second example is write-only code, and becomes more so the more things you have to do with the filename. What if you have to reassemble a few of these into an array, filter that array by matching tokens within it, and then feed those arguments through as NUL-separated stdin to another binary, with error checking all along the way to make sure you never call the final command with invalid arguments?

      In bash, that's a 50-line monstrosity. In Ruby (or Python, or even Perl), it's ~5 clean, readable lines.

      > I'm pretty sure that's just Bash, if you remember to wrap everything in double-quotes.

      Now try composing a command-line in Bash out of those "simply double-quoted values" to pass to ssh as a command to start an ssh exec channel with, where the command also contains a `su -c` stanza.

      Again, in another language, that's just a repeated application of a shellescape function as you're composing things. In bash... I have no idea.

      ---

      These are both examples of the type of code you have to write all the time in Bash when dealing with file manipulation, because any file could have any byte-sequence (except NUL) in its name, and you're dealing with the results of calling commands that want to pipe those filenames out to you as raw delimited data (hopefully NUL-delimited.)

      Let's say you're writing a script that copies a bunch of files into a dir, and then archives them into a tar file. Are you sure your command reads and recreates both relative and absolute symlinks correctly, in a way that they'll unpack correctly from the tar file? In most scripting languages, you've got functions like Pathname#relative_path_from(dir) and Pathname#cleanpath. To get the "correct" symlink in bash, you've got `readlink -f` and a comparison and then a bunch of string manipulations using `basename` and `dirname`.

      Or how about a "recursively collapse empty or pseudo-empty directory" function? Where, by "pseudo-empty", I mean "empty except for .DS_Store; Thumbs.db; Desktop.ini; those resource-fork files on FAT volumes/SMB shares that start with ._; or other empty or pseudo-empty directories". That's not a function anything ships with, but it's ~10 lines of Ruby. Try to implement it in Bash! (Yes, you can decide whether to collapse the whole hierarchy in one pass with `find`. But the mission here is to remove all the empty and pseudo-empty dirs found in the hierarchy, like an enhanced version of `find -empty`.)

      (Notice that these are both examples of things where a Bash programmer would be likely to say "why not write a new binary in C that does this thing, and then call it from your script"? Well, because the whole reason I'm stuck writing in a scripting language is that this code needs to be portable to unknown target CPU ISAs and OS ABIs, silly.)

      There is a reason that the default kind of cloud-init script is a YAML file, rather than a Bash script. There is a reason that Vagrantfiles aren't Bash; a reason that Ansible is less popular than Chef or Puppet or Fabric; a reason why every /bin/init that isn't sysvinit has soundly beaten sysvinit. And it's why, even in Ansible, or autotools, or any package manager's pre-install and post-install hooks, the parts where you do file manipulation are abstracted out, either being done through YAML (Ansible), or through domain-specific file-manipulation binaries (dpkg) or through pre-written and thoroughly-checked file-manipulation Bash scripts (autotools).

      And that reason is that doing file manipulation directly in Bash is not worth the time and effort, and especially not worth the maintenance burden in a codebase shared with other programmers who are probably not Bash experts.

      • LukeShu 6 years ago

        > That second example is write-only code,

        Tacking on the `-d ''` flag to `read` makes it write-only!?

        > In bash, that's a 50-line monstrosity. In Ruby (or Python, or even Perl), it's ~5 clean, readable lines.

        As an example, let's take the input, batch it into groups of 5, filter out files that don't contain "token1" or "token2", and pass those as a NUL-separated list to stdin of another binary.

        I wrote that up. It came out to 10 lines in either Ruby or Bash. https://pastebin.com/EksRxmVJ It's certainly not a "50-line monstrosity".

        > Again, in another language, that's just a repeated application of a shellescape function as you're composing things. In bash... I have no idea.

        In Bash, "shellescape" is pronounced "printf %q" (and shelljoin is pronounced "printf '%q '"). In Bash, that's similarly just repeated application of printf %q:

            base_cmd=('whatever' 'arg with space')
            nested_cmd=(sh -c "$(printf '%q ' "${base_cmd[@]}")")
            ssh user@hostname "$(printf '%q ' "${nested_cmd[@]}")"
        
        For direct comparison with Ruby:

            escaped=$(printf '%q ' "${base_cmd[@]}")
            escaped=base_cmd.map{|a|shellescape(a)}.join(" ")
            escaped=shelljoin(base_cmd)
        
        Or, Bash 4.4 gained @Q as a shorthand for that:

            base_cmd=('whatever' 'arg with space')
            nested_cmd=(sh -c "${base_cmd[*]@Q}")
            ssh user@hostname "${nested_cmd[*]@Q}"
        
        > In most scripting languages, you've got functions like Pathname#relative_path_from(dir) and Pathname#clean_path.

        You mean $(realpath --relative-to=DIR) and $(realpath)?

        > Or how about a "recursively collapse empty or pseudo-empty directory" function? ... it's ~10 lines of Ruby. Try to implement it in Bash!...

        And it's 6 lines of Bash?

            find \( -not -type d \
                    -not -name .DS_Store \
                    -not -name Thumbs.db \
                    -not -name Desktop.ini \
                    -not -name '._*' \
                 \) -printf '%h\0%p\0' | sort -uz
        
        Or were you looking for a listing of pseudo-empty directories?

            comm -z23 <(find -type d -print0 | sort -z) \
                      <(find \( -not -type d \
                                -not -name .DS_Store \
                                -not -name Thumbs.db \
                                -not -name Desktop.ini \
                                -not -name '._*' \
                             \) -printf '%h\0%p\0' | sort -uz)
        
        
        Like I said, there are things that are hard to do in Bash, but I'm not certain you've pointed any of them out.

        > There is a reason that the default kind of cloud-init script is a YAML file, rather than a Bash script.

        And there's also a reason why a "GNU/Linux distro" is essentially a giant pile of shell scripts (maybe embedded into Makefiles).

        > a reason why every /bin/init that isn't sysvinit has soundly beaten sysvinit.

        I mean, I'm happy with systemd on my systems, but: OpenRC is a giant pile of shell scripts.

        > the parts where you do file manipulation are abstracted out, ... through domain-specific file-manipulation binaries (dpkg)

        Great example, because the core of Debian is the "rules" file for each package, which is... a Makefile, containing shell.

        Or, look at Arch Linux, the core of which is the "PKGBUILD" file for each package, which is... a Bash script.

        > or through pre-written and thoroughly-checked file-manipulation Bash scripts (autotools).

        Autoconf is targeting shells that aren't Bash. It's pre-written and thoroughly-checked shell that contains wonky workarounds, in order to be compatible with buggy shells from 25 years ago.

        If you're assuming "modern shell / utilities", file manipulation in Bash isn't hard.

        • jwilk 6 years ago

          > You mean $(realpath --relative-to=DIR) and $(realpath)?

          For historical reasons, some old Debian-based distros (such as Ubuntu 14.04 LTS "Trusty Tahr") don't ship coreutils' realpath. Instead, they have a separate, somewhat incompatible realpath package, which doesn't have --relative-to.

          Also, beware that $() strips trailing newlines.

          • LukeShu 6 years ago

            Good mention! I knew that realpath is a relatively new addition to GNU coreutils, but I didn't realize that older Debians had an incompatible implementation of it.

            With a single $(realpath ...), chomping the trailing newline is fine, as realpath will insert a trailing newline after the filename; but definitely something to be aware of with other commands.

            • jwilk 6 years ago

              The newline in realpath's output (and that of many other commands) is mostly for aesthetics. It doesn't help with taming $(), because $() strips all trailing newlines, including ones that were actually part of the pathname. You would have to resort to something as ugly as:

                 f=$(realpath ... && printf 'x')
                 f=${f%x}
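A small demonstration of both the stripping and the workaround (the 'name' string stands in for a realpath result):

```shell
# $() strips ALL trailing newlines, even ones that are part of the data:
f=$(printf 'name\n\n\n')
printf '<%s>\n' "$f"
# <name>

# The printf-'x' sentinel preserves them:
f=$(printf 'name\n\n\n'; printf 'x')
f=${f%x}
printf '%s' "$f" | wc -c
# 7
```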
        • etatoby 6 years ago

          I've been writing Bash scripts (among other things) professionally for 15 years and I didn't know (and had never seen used) most of those tricks: read -d'', printf %q, @Q, all those -z options, and even realpath.

          Bash and other shell scripting makes simple data manipulation (filter/map/reduce/groupBy...) a royal pain in the butt and something incredibly hard to get right, without introducing horrible bugs under the surface of your script.

          The underlying reason is that shells were born as human interfaces for the basic UNIX calling convention for processes (ARGV and return code) and their input/output ports (byte-based stdin/stdout) both of which are hopelessly inadequate to represent even the most basic data structures, and thus need convoluted rules in place to be of any use (see the getopt conventions, including the double dash, and all the CSV-like input/output formats, including newline- and zero-terminated arrays of strings.)

          Take the example of command arguments: as the article points out, mixing options and argument arrays into the same flat array of strings (ARGV) is a recipe for disaster. Compare it with the calling convention in any modern programming language, where an array of filenames is a single well-defined item, and a boolean flag or other option is a completely different item.

          Couple this shaky foundation with the complex escaping and interpolation rules added by shells to make it useable by people (single and double quotes, backslashes, dollars, globs... nest and repeat, nest and repeat...) and you get a house of cards built on sand.

          The only redeeming quality of Bash and UNIX in general is that it's very easy to write something that works most of the time. Whether you want to write something flaky like that to "get the job done," or something that will work reliably all the time, hopefully with no security flaws or other bugs lurking under the surface, is up to you.

          See Worse is Better and the New Jersey approach that spawned C, UNIX and ultimately Bash: https://www.jwz.org/doc/worse-is-better.html

          Personally, I'll be phasing out my Bash usage more and more, in favor of strongly typed languages, even to do "scripting" tasks. I will also be looking more at PowerShell, which was designed to solve this very problem and which I have ignored for too long.

      • pastage 6 years ago

        There are issues with handling files in any language; I think Bash is just used too often in environments where filenames are treated as controlled inputs for this to be seen as an issue. Having Bash bail on bad filenames would be crude but enough for me: hook into readdir and check what subprocesses read.

        Not checking filenames is a valid security concern in any language.

  • woodruffw 6 years ago

    > Ruby is kind of clumsy for doing all the command piping parts of scripting, though; especially since there's no easy way short of wrapping Kernel.system in your own function to make a Ruby script crash on a nonzero subshell exit status.

    You can do this by checking `$?`, and wrapping that check in a small function:

        def safer_system(*args)
          system *args
          raise "command failed with status: #{$?.exitstatus}" unless $?.success?
        end
    
    (It might be small consolation, but Ruby 2.6 is also adding `exception: true` to `Kernel#system`[1].)

    [1]: https://bugs.ruby-lang.org/issues/14386

  • jolmg 6 years ago

    > Sure, it's possible to write shell scripts to do that. But why? Unless you're targeting a PDP11 or the Busybox-linked /bin/sh in your initramfs, is shell scripting really the best option?

    It's still the best option when you want absolute portability and minimal dependencies, as in an installation script. Another benefit of the shell is that a random user of your script is more likely to know the shell language and be able to read the script than if you used something like Python, since they use it, at least at a basic level, in their everyday use or administration of their system.

pixelbeat__ 6 years ago

This is one of the reasons GNU ls now quotes names containing unsafe characters by default: the common case of copying and pasting output from ls into commands is safe, and if you're typing the argument yourself, there's a visual indication of a potential issue.

If you want the older unsafe defaults, you can add -N to your ls alias.
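A sketch of the difference, assuming GNU coreutils ls (forcing the quoting style explicitly, since the quoted default only applies to terminal output):

```shell
dir=$(mktemp -d) && cd "$dir"
touch -- 'file with space' '-rf'

# Quoted output is safe to copy and paste back into a command line:
ls --quoting-style=shell-escape

# Literal names (the old behavior, as restored by -N): unsafe to paste.
ls -N
```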

jwilk 6 years ago

Is there a more comprehensive list of command-line options that can be abused in argument-injection attacks?

  • jacquesm 6 years ago

    One nasty trick is web pages that show a benign code snippet that expands to something evil upon being cut-and-pasted.

    • craftyguy 6 years ago

      or my personal favorite are folks who recommend this crap:

          curl https://<whatever> | bash
      • Spivak 6 years ago

        This really isn't as bad as people make it seem. If you're trusting the vendor and aren't going to read the code anyway, it's no less safe than installing a package.

        I don't like installations like this because they're messy, hard to track, harder to clean up, don't lend themselves to mass deployments, don't provide any integrity verification, the list goes on, but they're not particularly unsafe.

        These things exist because there aren't good package managers that don't require root. Maybe we'll get there eventually.

        • craftyguy 6 years ago

          > If you're trusting the vendor

          Yea, because URLs and websites never change! And HTTP can never be compromised! (nope)

          > but they're not particularly unsafe.

          Arbitrary execution is never safe!

          > These things exist because there aren't good package that don't require root. Maybe we'll get there eventually.

          Well, depending on the language, there are (e.g. pip crap). But maybe folks should produce packages that maintainers can, well, maintain.