CompCert – A formally verified C compiler

compcert.inria.fr

185 points by cjg 5 years ago

ttul 5 years ago

So when you implement a buffer overflow in your C program, you can be assured it will overflow in the manner specified.

tom_mellior 5 years ago

That's both funny and true, but note that the same CompCert semantics is also the basis of a verified static analysis tool that can be used to prove the absence of such errors in C programs: http://compcert.inria.fr/verasco/
If you have a program that runs successfully through both the analyzer and the compiler, you can have quite a bit of confidence in the result.
pslam 5 years ago

It's just one part of the tooling, and doesn't solve program-correctness on its own. A formally verified toolchain is only really useful for projects which have pervasive mitigations against these kinds of errors. In other words, if you have a code base which has been verified to a high standard, you would also want a toolchain verified to a high standard.
In an ideal world you'd have a formally verified toolchain for a language without as many deficiencies as C, but here we are.
- pjmlp 5 years ago
  
  Yep, where all C compilers would be Frama-C compliant, integrated into the CI/CD, and all OSes would be targeting CPUs with memory tagging like Solaris on SPARC ADI, but oh well.
thekingofh 5 years ago

And that specified manner is 'Undefined'. Exactly to spec!
Retra 5 years ago

I know you're being facetious, but even if your compiler is verifying that your overflows are happening correctly, it is not a stretch to see that it can report that such overflows are occurring at all and what assumptions are causing them.

als0 5 years ago

One of the really nice outcomes of this approach is that it allows you to use different optimisation levels.

Other approaches try to bypass the compiler by verifying the generated binary against the semantics of the source code (treating the compiler itself as a black box). The major drawback is that you had to completely disable all optimisation.

monocasa 5 years ago

> Other approaches try to bypass the compiler by verifying the generated binary against the semantics of the source code (treating the compiler itself as a black box). The major drawback is that you had to completely disable all optimisation.
sel4 verfies the binary against the specification, and lets you use optimization. Once you've extracted the flow graphs from the binary, following the optimizations isn't difficult.
- sanxiyn 5 years ago
  
  Yes, AutoCorres works, but it is somewhat user hostile, even for these tools, or compared to CompCert. CompCert looks and feels like a compiler, AutoCorres isn't.
  https://github.com/seL4/l4v/tree/master/tools/autocorres

herodotus 5 years ago

In 1983, Ken Thompson gave a Turing Award talk in which he shows how to embed a backdoor into a compiler in such a way that it would not be visible, even if you had access to the compiler source. (http://delivery.acm.org/10.1145/1290000/1283940/a1983-thomps...) I wonder if the verified compiler could be altered in the same way? (A snippet from the paper so you get the idea:

>We compile the modified source with the normal C compiler to produce a bugged binary. We install this binary as the official C compiler. We can now remove the bugs from the source of the compiler and the new binary will reinsert the bugs whenever it is compiled. Of course, the login command will remain bugged with no trace in source anywhere.)

jpfr 5 years ago

The above assumes that the same C compiler is used everywhere. Say you suspect your gcc to be infected. If a second compiler, say clang, can compile the gcc sources and this gcc then recompiles itself bit-identical to the original, then gcc and clang would both have to be (mutually compatible) infected.
Another way is to bootstrap up to a C compiler from machine-language upwards. So that every "level upgrade" in the bootstrap chain can be verified independently.
http://bootstrappable.org/
Here is one bootstrapping effort:
https://github.com/oriansj/stage0
fmap 5 years ago

There is actually a neat way around this by using typed assembly language/proof carrying code. If you had a type preserving compiler down to machine code you could use a separate proof checker (which you presumably wrote by hand in machine code) to convince yourself that the resulting binary implements its spec. :)
The spec of a compiler is also relatively simple, so there isn't a lot of room for backdoors there.
fulafel 5 years ago

As long as the compiler system is self-bootstrapping. I don't know if this is the case with CompCert C. Probably not becuse of the ML compiler tech involved, which usually doesn't bootstrap on C.
But aside from these incidental properties, yes the same thing could be done.
(This still leaves a large class of related, less recursively symmetric integrity attacks that could be made without the language fully bootstrapping itself, of course)
- nickpsecurity 5 years ago
  
  There's a new extractor that can output C from Coq. There's also multiple compilers for each language you can extract to. My solution was to do diverse compilation where multiple compilers, OS's, and ISA's compiled the compiler. Then you just run it through itself on each one. Use end result that most agree on.
  
  unboxed_type 5 years ago
  
  Are you talking about CertiCoq project? It is not ready yet and not clear when will be.
  [1] https://www.cs.princeton.edu/~appel/certicoq/
  
  nickpsecurity 5 years ago
  
  Talking about this:
  https://staff.aist.go.jp/tanaka-akira/succinct/slide-pro114....
  They extract from Coq to C. Like in my Brute-Force Assurance concept, that would let us confirm and supplement the formal verification by hitting the C output with every V&V tool for C programs that we have on hand.
- iaabtpbtpnn 5 years ago
  
  The ML compiler tech is OCaml which is bootstrapped on C. (It wasn't always so, but has been since Caml Light.)
fao_ 5 years ago

> I wonder if the verified compiler could be altered in the same way?
How do you know it hasn't already been compromised =)
yesenadam 5 years ago

i.e. Reflections on Trusting Trust.
Here's a link that worked for me: http://www.cs.cmu.edu/~dga/15-712/F14/papers/p761-thompson.p...

fuklief 5 years ago

Does anyone in the industry use it ? Does anyone knows for sure that Airbus uses it ?

jjrh 5 years ago

https://www.absint.com/ is the company that sells licenses/support and they advertise:
> For two decades now, Airbus France has been using our tools in the development of safety-critical avionics software for several airplane types, including the flight control software of the A380, the world’s largest passenger aircraft.
https://www.absint.com/success.htm More customers listed here.
- throwaway0qw234 5 years ago
  
  >> Does anyone in the industry use it ? Does anyone knows for sure that Airbus uses it ?
  I've heard Xavier Leroy say a year or two ago that Airbus was still evaluating CompCert and was thinking of using it in production for the aircraft generation after the next one, or something like that. And that CompCert sales have been very slow in general.
  But here is a paper about an actual industrial user: https://hal.inria.fr/hal-01643290
  > https://www.absint.com/success.htm More customers listed here.
  AbsInt has been selling timing analysis tools for decades, and Airbus and the other customers are probably almost exclusively customers for those tools. That doesn't mean that those customers also buy CompCert. It seems like they don't.
- jng 5 years ago
  
  I have no idea about this specific case, but this kind of claim can be very misleading, while remaining true. Maybe they used the tooling once for an offline script of some sort, and that is all.
  
  fmap 5 years ago
  
  It's not misleading in this case. It's used in practice.
  You have to understand that CompCert's main competition in this space was an ancient version of GCC without any optimizations. This is mostly an issue of certification. CompCert got the same certification and is already a great improvement just by virtue of having a register allocator...
  
  jng 5 years ago
  
  Good to know, thanks for explaining.
  A register allocator! :)

kazinator 5 years ago

> Such verified compilers come with a mathematical, machine-checked proof that the generated executable code behaves exactly as prescribed by the semantics of the source program.

Which semantics? The ISO C semantics is rather lacking; "undefined behavior" lurks around every corner.

A C compiler that correctly implements all ISO C requirements is a fine thing, and certainly better than one which gets some of that semantics wrong, but doesn't achieve all that much in the big picture.

If the verified compiler actually provides extended semantics beyond ISO C that eliminates much of the undefined behavior, such that programmers using this dialect can rely on a safer language, then we're talking.

chrisseaton 5 years ago

> doesn't achieve all that much in the big picture
Bizarre way to talk about award-winning breakthrough research in applying verification to large practical programs.
- kazinator 5 years ago
  
  Don't mean to say that the technique isn't valuable; just the particular application (verified C compiler) not as much as other potential applications (verified ... almost anything else).
ychen306 5 years ago

Eliminating undefined behavior from the language is a language design issue orthogonal to the problem CompCert is trying to solve.
kurtisc 5 years ago

You're describing a problem already solved by MISRA C, etc., for which there are static analysis tools to allow you to ensure programmatically that you're conforming to the standard.

User23 5 years ago

Sadly even a proved correct C compiler can produce incorrect behavior in a real system: https://blog.regehr.org/archives/482

jacquesm 5 years ago

How does a project like this deal with the bootstrap problem?

dranov 5 years ago

CompCert doesn't do this, but CakeML, a verified compiler for a variant of ML bootstraps itself in the logic:
> A unique feature of the CakeML compiler is that it is bootstrapped “in the logic” – essentially, an application of the compiler function with the compiler’s source implementation as argument is evaluated via logical inference. This bootstrapping method produces a machine code implementation of the compiler and also proves a theorem stating functional correctness of that machine code. Bootstrapping removes the need to trust an unverified code generation process. By contrast, CompCert first relies on Coq’s extraction mechanism to produce OCaml, and then relies on the OCaml compiler to produce machine code.
For details, see section 11 "Compiler Bootstrapping" in http://www.cs.cmu.edu/~yongkiat/files/cakeml-jfp.pdf
- sansnomme 5 years ago
  
  Does this guard against trusting trust attacks?
  
  dranov 5 years ago
  
  I might be mistaken, but my understanding is that it does not. You still need to trust the implementation of the logic. But if you don't trust that, you wouldn't trust the compiler correctness proof anyway, so bootstrapping in the logic does give a benefit.
  
  sansnomme 5 years ago
  
  So Coq basically have to be correct and trusted for this to be secure?
  
  sanxiyn 5 years ago
  
  Yes, but large part of Coq is designed to be untrusted. Roughly, Coq searches proof and checks found proof. Search part is larger, it can be incorrect, can have bugs, etc. without affecting the proof. Check part is trusted, and is available as a separate binary (coqchk), which consumes proof found by search (coqc).
  For CompCert, the more worrying is the correctness of "extraction". This is somewhat technical, you should be able to search if you are curious. CakeML solves extraction problem.
  For HOL, Proof Technologies Ltd developed HOL Zero, which is solely dedicated to trustworthiness without bells and whistles. Bells and whistles are absolutely necessary for proof development. The idea is to use HOL Zero at the end, after proof is finished. It's like coqchk idea on steroid. http://proof-technologies.com/holzero/
  Summary: CompCert trusts coqchk and extraction. CakeML trusts HOL, hopefully something like HOL Zero can be used in the future.
  
  smallnamespace 5 years ago
  
  Note that Godel's Incompleteness Theorems tell us that formal systems with sufficient power cannot 'prove themselves' in some sense, and we see that rather convincingly:
  1. Implementations can be wrong, so we want a formal proof system
  2. The proof system can be wrong, so we want to apply the proof system to itself
  3. But the proof system may be wrong (either in implementation or specification) in a way such that its self-proof is actually invalid
  At some point, we want to assert some property (completeness, correctness) of the system as a whole, but whenever we augment the system to allow this, that augmented system's own properties now come into question.
  
  ChrisLomont 5 years ago
  
  Logic proving systems need never hit Godel incompleteness. Many systems weaker than Godel are proven consistent within themselves, and are strong enough to verify code.
  Godel’s Proofs rely on statement encoding using properties of integers not required for these logic systems.
  Presburger Arithmetic, for example, avoids Godel incompleteness, yet provides a decent system for a lot of work.
  
  cmrx64 5 years ago
  
  not coq but HOL4 and Poly/ML. there are verified HOL kernels but you’d need to do substantial manual validation of the exported theorems to make sure they didn’t get subverted during proof export.
fmap 5 years ago

There's a separate project that solves this problem (CertiCoq https://www.cs.princeton.edu/~appel/certicoq/). It's making progress, but these things take time. :)
amir734jj 5 years ago

Verifiable compiler means given a C source code their compiler-generated assembly code or IL code is verifiable that it functions the same as the source code.
- snazz 5 years ago
  
  The GP was asking how the compiler deals with the fact that it itself must be bootstrapped with another, unverified compiler before it can compile its own code.
  
  kiriakasis 5 years ago
  
  In this case the bootstrapping problem is slightly different. What you are bootstrapping is not the compiler but the verification, in coq this would be trusting that the ocaml extracted code is correct by the coq proofs.
  But I might be wrong, there are a lot of nuances and I do not know them all...
aaaaaaaaaab 5 years ago

Fully Countering Trusting Trust through Diverse Double-Compiling
https://dwheeler.com/trusting-trust/
syrak 5 years ago

It does not.

m4r35n357 5 years ago

See limitations in the docs.

marinintim 5 years ago

Going to great lengths to avoid writing Rust.

wk_end 5 years ago

This is about verifying the compiler, not proving any properties of the generated code. Totally orthogonal to Rust's aims, unless the Rust compiler has been formally verified (I don't believe it has).
- strangecasts 5 years ago
  
  There has been work to verify a subset of Rust's logic - https://plv.mpi-sws.org/rustbelt/popl18/ - but the compiler itself is not verified, no.
- steveklabnik 5 years ago
  
  It has not.
stcredzero 5 years ago

Would the world be a better place if Linux, the BSD kernel behind OS X, and the Windows kernel were all rewritten in Rust? (I don't think it would be dramatically better overnight.)
- sanxiyn 5 years ago
  
  If we could do this by waving a wand without downsides (e.g. all existing maintainers magically acquire Rust knowledge, etc.), yes it would be better. No question.
  Seriously, people, you can find kernel bugs at will with kernel fuzzers like syzkaller from Google. Look at https://github.com/google/syzkaller/blob/master/docs/found_b... and weep.
- AnimalMuppet 5 years ago
  
  If the Linux kernel writers all instantly knew Rust as well as they know C, then in 10 years they might produce something that is as functional as the Linux kernel currently is. In the process, they would lose everything that they could do over the next ten years to improve the Linux kernel.
  You could claim that eventually the payoff would pay back the lost 10 years. But "eventually" can be a very long time...
- c1yd3i 5 years ago
  
  It would be better overnight. But it would be even better if we had one NT like kernel and a new OS built from scratch with a browser (webkit) based ui.
  
  stcredzero 5 years ago
  
  a new OS built from scratch with a browser (webkit) based ui
  So basically, ChromeOS rewritten in Rust? Why not a Servo based UI?
  
  cjmaria 5 years ago
  
  The hard part is the NT-like kernel in Rust, not the web-based UI. ChromeOS uses the linux kernel.