Bucephalus355 5 years ago

HTTP headers can do amazing things; it's unfortunate they are so overlooked. In security, where I work, I would say good headers are the backbone of good web application security.

They are:

X-XSS-Protection

X-Frame-Options

Content-Security-Policy

Strict-Transport-Security (HSTS)

Expect-CT

Feature-Policy

All of these can be set very quickly in your Apache2.conf.
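
For example, with mod_headers enabled, a minimal sketch of that in Apache config could look something like this (the values are illustrative placeholders, not a vetted policy for any particular site):

  Header always set X-XSS-Protection "1; mode=block"
  Header always set X-Frame-Options "DENY"
  Header always set Content-Security-Policy "default-src 'self'"
  Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
  Header always set Expect-CT "max-age=86400, enforce"
  Header always set Feature-Policy "geolocation 'none'; microphone 'none'"

The "always" condition makes mod_headers add them to error responses as well, not just 2xx ones.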

userbinator 5 years ago

As much as I like using netcat and friends to interact with HTTP servers, I've always found the text-based protocols to be needlessly verbose and oriented towards a minority use-case. Something like ASN.1, which is very widely used in the telecommunications industry for protocols like GSM and such, does not have the same bloat problem yet remains extensible.

  • jimktrains2 5 years ago

    Isn't it notoriously difficult to write a sane, non-exploitable ASN.1 parser? OpenSSH, for example, didn't want to use X.509 certs because they considered ASN.1 too risky and difficult, IIRC.

    • tialaramex 5 years ago

      It is notoriously difficult, but to be fair a big part of that notoriety arose in the era when you'd have written this in C, as indeed OpenSSH is written in C. You obviously should not write such software in a language as unsafe as C today.

      The Distinguished Encoding Rules (DER) used for X.509 force everything to be unambiguous, so in principle, in a language where you can translate that lack of ambiguity into code and not inadvertently have it execute the raw data when you have an off-by-one error, this seems no more dangerous than the present use of ASCII. It's just that C has fewer obvious sharp corners when it comes to parsing ASCII, and programmers who work mostly in C knew where they were.

      On the other hand, X.509 is also a scary choice because it's a thicket of options. One of the biggest pieces of work done in the Web PKI has been telling Certificate Authorities not to issue stuff with random options filled out, because that's yet more code surface area to either cause interoperability issues or, worse, security problems.

      For the former, an example is Let's Encrypt. You may notice today your Let's Encrypt intermediate is named X3. What happened to X1 and X2? Well, they are created in pairs, so that's why there were more, but it doesn't explain why we're not still using X1. The answer there is that X1 contains a feature called a negative (prohibitive) Name Constraint. This is a way to say in X.509 that the issuer (in this case "DST Root CA X3", operated by Identrust) does not permit this intermediate CA to issue for specific names. Let's Encrypt had a constraint forbidding the US military TLD .mil.

      But Windows XP doesn't understand this optional X.509 feature: if it sees any name constraint that it doesn't understand, it concludes that all names are forbidden. So having this constraint meant Let's Encrypt was useless for Win XP clients. Identrust agreed to have Let's Encrypt simply obey this as a policy (which they subsequently abandoned) to enable XP compatibility.

      So, avoiding X.509 might have been a sensible choice in OpenSSH, but that doesn't mean using ASN.1 DER would necessarily be a bad choice in new systems today. I don't see any reason to use BER or other encodings of ASN.1 in new systems.

      • blattimwind 5 years ago

        With ASN.1 there is also [the relatively new] OER, whose encoding rules seem significantly simpler than the previous ones (though it may require transmitting a few extra bits here and there).

        W.r.t. parser security, I don't think that has been a success, historically, regardless of format. Few if any parsers for moderately complex formats have had zero vulnerabilities. If you think of a web server, or an XML library, or something similar, chances are pretty good it had at the very least one critical vulnerability related to parsing.

    • userbinator 5 years ago

      ASN.1 DER (the variety used for X.509) is IMHO quite easy to parse. It's a TLV (tag-length-value) format, so you know the lengths of every value before reading the value itself, and you don't even need to know the length to check the tag and fail if it's unexpected. Very much unlike text-based protocols that require skipping arbitrary amounts of whitespace and reading until you see a delimiter (or an arbitrary limit is reached), dealing with case-insensitivity, comments, etc.
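
      A minimal sketch of that in Python (illustrative only: it handles the single-byte tags and definite lengths typical of DER and simply rejects anything else rather than guessing):

        def read_tlv(data: bytes, offset: int = 0):
            """Read one DER TLV element; return (tag, value, next_offset)."""
            tag = data[offset]
            if tag & 0x1F == 0x1F:
                raise ValueError("multi-byte tags not handled in this sketch")
            length = data[offset + 1]
            header = 2
            if length & 0x80:  # long form: low 7 bits give the number of length bytes
                n = length & 0x7F
                length = int.from_bytes(data[offset + 2:offset + 2 + n], "big")
                header = 2 + n
            value = data[offset + header:offset + header + length]
            if len(value) != length:
                raise ValueError("truncated element")
            return tag, value, offset + header + length

        # Example: a SEQUENCE (0x30) containing one INTEGER (0x02) with value 5
        tag, value, _ = read_tlv(bytes([0x30, 0x03, 0x02, 0x01, 0x05]))

      You know the length before touching the value, and an unexpected tag can be rejected immediately.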

  • mcguire 5 years ago

    BER, DER, PER, CER, or WhiR?

  • wbl 5 years ago

    ASN.1 is not free

    • blattimwind 5 years ago

      Most specifications are nowadays available free of charge from the ITU.

      • wbl 5 years ago

        Oh exciting! Link please?

        • lmz 5 years ago
          • pdimitar 5 years ago

            Can one of you please clarify what "free" means in this context?

            If I want my API to alternatively use ASN.1/DER instead of HTTP for encoding/decoding its traffic, are there any licensing / copyright dangers in doing so?

          • wbl 5 years ago

            So I looked and it doesn't seem that DER is included in the free download.

danesparza 5 years ago

Am I the only one that thinks 1k of HTTP headers per request is absolutely bananas? As somebody who traces requests regularly (as part of AJAX or REST API debugging) I would be hard pressed to see even half that on a request.

Please let me know if there's some spec or framework that I'm not thinking of that passes that much data in the HTTP headers.

  • detaro 5 years ago

    Open dev tools and browse around a bit and look at the request sizes. For me, on the request side, User-Agent and Accept-* headers already are ~200 bytes. Add a long referrer, another 100 bytes. Cache-control/ETag, cookies, ... get it up to around 1k fairly easily.

    • lmkg 5 years ago

        > cookies
      
      This is the real culprit right here. On this very page of HN that I'm reading right now, the cookie field is 415 bytes, and there's really not much there. The New York Times homepage has 988 bytes, and I don't even visit that site. Reddit? 2055. Boom, that's twice your 1k right there in one field.

thresh 5 years ago

It's all fun and games until we realize our web designers put a four megabyte jpeg on the index page.

  • Sami_Lehtinen 5 years ago

    What, JPEG? I thought real designers prefer high quality content and use 20 megabyte PNGs. (been there, done that)

  • benhoyt 5 years ago

    I think this is a really good point. While I like the article's focus on depth, and thinking about the small stuff is important, saving a few bytes on headers is going to be blown out of the water by all the images, JS, CSS, and 3rd-party pixels being loaded. "Profile before optimizing" applies here too.

    • mcguire 5 years ago

      1. Images are transferred once, not with every request.

      2. Most requests and many responses are all or mostly headers.

      3. Intermediaries.

      4. Header compression is at the protocol level; if it's done wrong, the best js/css/whatever policy can't do anything to improve it.

    • youngtaff 5 years ago

      But I bet you most people will never profile this.

      We have a tendency to focus on page size but what really matters is the user experience.

      For all the advice about making sure the first 15KB of a page counts, inlining critical styles etc., if that response has 4KB of headers (e.g. large cookies), then all of a sudden you've only got 11KB!

  • udp 5 years ago

    The 4 MB JPEG would be downloaded, though, while the headers would be uploaded. Consumer internet has terrible upload speeds in general.

  • dcbadacd 5 years ago

    Given today's screen resolution, anything less would probably look horrible.

    • seanwilson 5 years ago

      Use SVG. Infinite detail. Doesn't work for photos though.

londons_explore 5 years ago

I think it's really sad that we have to use sub-par compression because we can't trust TLS to keep our data secure when we use good compression, since TLS leaks the compressed size to attackers, and when compressed, the size can depend on the content.

I don't have a solution to that issue, but it seems really fundamental, and something we should all be on the lookout for ways to solve.

  • tialaramex 5 years ago

    > I don't have a solution to that issue

    You don't have a solution because there isn't a solution.

    It's your definition of "good compression" which is leaking the data, not TLS. If you're willing to let Bill guess what your phone number is, while you agree to just tell him which digits he gets wrong, Bill can guess your phone number in no more than ten tries (the tenth will be correct if none of the others were).

    TLS doesn't actually leak the compressed size; it's just that in practice you will stop the TLS session after transmitting the compressed data, because to do otherwise is wasteful, and if you didn't care about waste you would not use compression. If you want, you can run TLS with padding to always hit, say, a multiple of 4 kilobytes per transaction. Now say your "compression" took you from 3.84kB to 2.16kB and then you padded it to 4kB anyway: oh wait, this was worse, so why did we bother with compression?

    If you have a system with an explicit range of sizes and can tolerate always transmitting the maximum size, TLS absolutely will mask out the actual size with padding.
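
    A toy illustration of that trade-off in Python (zlib plus a length prefix standing in for what a padded record layer would do; the 4 kB bucket is arbitrary):

      import zlib

      BUCKET = 4096  # every transmission is exactly this many bytes on the wire

      def pad_to_bucket(payload: bytes) -> bytes:
          compressed = zlib.compress(payload)
          body = len(compressed).to_bytes(2, "big") + compressed
          if len(body) > BUCKET:
              raise ValueError("doesn't fit in one bucket in this sketch")
          return body + b"\x00" * (BUCKET - len(body))

      # Whether compression got the payload down to 2.16kB or left it at 3.84kB,
      # an observer sees 4096 bytes either way -- which is also why the
      # compression stopped buying you anything.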

  • cipherboy 5 years ago

    This is an issue with compression not TLS.

    The threat model is this: Mallory controls some number of bytes in a web page. Can she exfiltrate data about Alice's session (e.g., a bank account number) by only knowing the size of the compressed payload and modifying her portion of the content?

    For this purpose, TLS is a fixed-sized modification:

    |TLS(m)| = |m| + k

    for a constant k. So it doesn't meaningfully impact this attack: you can do it with just compression and no encryption.

    Compression is useful when the cost of sending bytes exceeds the cost to decompress. If you start padding compressed output back to the original payload length, you get no benefit from compressing in the first place. If instead you introduce a random amount of padding, this adds overhead both to decompression and to transfer, at a marginal benefit to security (the attacker needs more requests -- either to recover a larger portion of the secret's context, or to control for the random noise).

    This is also a cryptopals exercise:

    https://cryptopals.com/sets/7/challenges/51
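
    A toy version of that oracle in Python, in the spirit of the linked exercise (the cookie, names and page format are made up, and this greedy variant can stall on ties, which the exercise discusses how to break):

      import zlib

      SECRET = "sessionid=TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE="  # made-up cookie

      def oracle(attacker_bytes: str) -> int:
          # Mallory's bytes and the secret cookie are compressed together;
          # all she ever observes is the resulting length.
          page = "Cookie: " + SECRET + "\n" + attacker_bytes
          return len(zlib.compress(page.encode()))

      ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
      recovered = "sessionid="
      for _ in range(20):
          # A guess that matches the next byte of the cookie tends to compress best.
          recovered += min(ALPHABET, key=lambda c: oracle("Cookie: " + recovered + c))
      print(recovered)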

  • blattimwind 5 years ago

    This is not a problem with TLS. This is a problem with encrypting information from different contexts together. In other words, this is a problem of HTTP and the web.

    It is also a problem of people thinking that you can have a magic encrypted pipe from A to B (e.g. TLS) that protects your information perfectly no matter how negligent you are: that's a wrong idea about an abstraction.