373 points by stevekemp 4 months ago
Linux hasn’t used CR0.TS for some time. I removed it a while back because manipulating TS is very very slow.
(I am not part of any process with respect to this, embargoed or otherwise.)
Edit: the upstream commit is 58122bf1d856a4ea9581d62a07c557d997d46a19, called “x86/fpu: Default eagerfpu=on on all CPUs”, and it landed in early 2016. Greg K-H just submitted backports to all older supported kernels.
AFAIK the entire idea dates back to the days of the 80286/80287 (and 80386/80387) when accessing the coprocessor was slow, and a TSS switch (Linux used it until the late 1990s) set that bit for you.
Wasn't 212b02125f3 ("x86, fpu: enable eagerfpu by default for xsaveopt") sufficient protection for everyone on a modern CPU?
That commit disables the lazy mode by default on processors supporting the XSAVEOPT instruction. But it's possible that the default has been overridden with the eagerfpu= kernel command-line option, or that a hypervisor has masked out support for that instruction even on hardware with XSAVEOPT support.
The point is: although relatively unlikely, it is still _possible_ that you need some mitigation even if you have newer hardware (Sandy Bridge or newer is where XSAVEOPT first showed up, I believe).
Disclaimer: I work on Linux at Intel.
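For anyone who wants to check a box by hand, here is a rough sketch of the logic described above. It only models kernels where eager mode is the default when XSAVEOPT is visible (the 212b02125f3 behaviour); kernels with the later "default eagerfpu=on" commit, or with lazy mode removed entirely, don't need it. The function names are made up for illustration, and the "last eagerfpu= token wins" rule is my assumption, not something I checked against the kernel's parameter parser.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def eagerfpu_override(cmdline_text):
    """Return the value of an eagerfpu= boot parameter, or None if it is absent."""
    value = None
    for token in cmdline_text.split():
        if token.startswith("eagerfpu="):
            # Assumption for this sketch: the last occurrence wins.
            value = token.split("=", 1)[1]
    return value


def lazy_fpu_possible(cpuinfo_text, cmdline_text):
    """Heuristic: could this (older) kernel be running with lazy FPU switching?"""
    override = eagerfpu_override(cmdline_text)
    if override == "on":
        return False
    if override == "off":
        return True
    # No explicit override (or "auto"): eager only if XSAVEOPT is visible,
    # which it will not be if a hypervisor masks the feature out.
    return "xsaveopt" not in cpu_flags(cpuinfo_text)


# Hypothetical sample inputs, not taken from a real machine.
sample_cpuinfo = "processor\t: 0\nflags\t\t: fpu sse2 xsave xsaveopt avx\n"
print(lazy_fpu_possible(sample_cpuinfo, "ro quiet"))
print(lazy_fpu_possible(sample_cpuinfo, "ro quiet eagerfpu=off"))
```

On a real system you would pass in the contents of /proc/cpuinfo and /proc/cmdline instead of the sample strings.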
Eagerfpu seems to mitigate it, but some confirmation would be good here.
The money quote is this: (OpenBSD):
"3) post-Spectre rumors suggest that the %cr0 TS flag might not block
speculation, permitting leaking of information about FPU state
(AES keys?) across protection boundaries."
AES-NI is part of the vector/FP units and uses those registers as well.
I wonder if these "bugs" will create a market for security dongles that perform AES, RSA, etc? That way they aren't black boxes like CPUs that literally have minds of their own these days (IME). I would like to own a USB dongle that took files in and outputted them in encrypted form. Bonus if they were an open spec so you could have various vendors or open source FPGA versions. Bonus if the key load would be airgapped from the PC side, say via QRcode, hex buttons, microSD, Bluetooth with hardware disable switch, or even rfid.
Yes that does create some new attack vectors, but these "bugs" make me think that the whole architecture is a rooted, burning trash fire.
Well, yes. There is already a large market for these "security dongles", and many libraries and protocols for interacting with them. They're called HSMs and examples of libraries include PKCS#11, JCE, MCE, and protocols like KMIP. Widely used in the financial sectors, CAs (of course), revenue collection such as tolls, government functions such as passport issuance, and some kinds of industrial control segments, among others.
It's long been the case that side-channel attacks can extract key materials out of conventional CPUs. Power analysis alone has been now a decades long science and not going away any time soon, made all the more exciting by the prevalence of RF and the advancement of antennas. Spectre and the like is just another wake-up for those not paying attention e.g. in cloud services. Consider yourself one of the enlightened when it comes to crypto material handling.
Well, I have worked with one of the proprietary security tokens before. Nothing to be proud of: unpatched software/firmware bugs, zero responsibility taken by the manufacturer, and a usability mess. The thing is, it's not only the cryptography hardware and software itself that should be safe; the whole system should be up to date and have no weak links, which is hard in practice, and few want to pay for it.
Makes me wonder whether there is any incentive to do crypto properly, or whether security theater will always prevail?
I've had the unfortunate experience of integrating a Gemalto network HSM, and the broken state of the documentation alone is enough to make you question any engineering inside.
It's security through obscurity.
HSM devices, as I understand it, are designed mostly to protect secrets (keys) and perform asymmetric crypto operations safely but perhaps slowly. The Intel AES-NI instructions, on the other hand, are designed for high-speed symmetric crypto, with nothing kept secret from the user of the instructions.
There’s already a TPM in most computers, and the TPM can do this for you.
Be careful, though: there are real TPMs (actual chips built by companies that, in theory, know how to build secure-ish hardware) and there are firmware-emulated TPMs. I think that Intel and AMD both offer the latter. Intel calls its offering “platform trust technology”. I take that to mean “please trust our platform even though there is sometimes a web server running on the same chip as the so-called TPM”.
Using a TPM for generic crypto operations is mostly outside of what they were designed to do. Real hardware TPMs tend to be very slow so they're not really useful for encrypting/decrypting network traffic, they're really supposed to be used to sign messages and verify checksums. They are also supposed to support full on trusted computing, but since it was designed by crypto nerds the trusted computing stuff is almost entirely useless. It completely blocks the CPU while running and is very slow so the practical use cases are very hard to find.
> I think that Intel and AMD both offer the latter.
Have no clue on Intel, but for AMD there is basically a separate ARM core within the CPU, so it has TrustZone built in. Is it just a web server too? I'm truly curious.
I don't think they meant that the TPM itself runs a webserver. I believe what they meant was that the emulated version runs on the CPU itself, which means that if the system is also running a webserver then any CPU vulnerabilities would compromise the security of the emulated TPM.
No, more like the former. As I understand it, the emulated TPM runs on the Intel ME, which can also run a webserver.
These guys are the leaders in small USB crypto keys. Yes their devices offload various crypto routines; and they are cheap.
> "offload various crypto routines"
This is a misconception. These crypto keys are only designed to protect RSA and ECC private keys, and to encrypt symmetric keys rather than actual data, for good reasons. The actual symmetric encryption is still performed on the host computer, so the actual AES key can still be stolen via a CPU side channel.
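A toy sketch of that split, to make the point concrete. Everything here is a stand-in: `ToyToken` uses an HMAC-derived pad instead of a real RSA/ECC operation, and `toy_stream_xor` is a SHA-256 counter keystream instead of AES, so none of it is real cryptography or any vendor's API. The only thing it is meant to show is where the session key lives.

```python
import hashlib
import hmac
import secrets


class ToyToken:
    """Stand-in for a hardware token: the device secret never leaves this object."""

    def __init__(self):
        self._device_secret = secrets.token_bytes(32)

    def _pad(self, nonce):
        return hmac.new(self._device_secret, nonce, hashlib.sha256).digest()

    def wrap(self, session_key):
        """Protect a 32-byte session key; the bulk data is never sent here."""
        nonce = secrets.token_bytes(16)
        pad = self._pad(nonce)
        return nonce, bytes(a ^ b for a, b in zip(session_key, pad))

    def unwrap(self, nonce, wrapped):
        pad = self._pad(nonce)
        return bytes(a ^ b for a, b in zip(wrapped, pad))


def toy_stream_xor(key, data):
    """Toy keystream cipher (illustration only, NOT AES); runs on the host CPU."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + counter).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], keystream))
    return bytes(out)


token = ToyToken()
session_key = secrets.token_bytes(32)        # lives in host memory and registers
ciphertext = toy_stream_xor(session_key, b"bulk data encrypted on the host")
nonce, wrapped_key = token.wrap(session_key)  # only the key goes to the token

# Decryption: the token hands the session key back to the host again.
recovered_key = token.unwrap(nonce, wrapped_key)
plaintext = toy_stream_xor(recovered_key, ciphertext)
```

In both directions `session_key` sits in host memory while the bulk cipher runs, which is exactly the part a CPU side channel can reach.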
Are these tokens any good? Yes, they guard your private key. Is it enough to protect you from Spectre? No.
They are not cheap, as they are not even close to usable outside of second-factor auth flows. Actual hardware accelerators for servers and real-world loads are much more expensive, but you get much more from them. Also, the interface to YubiKeys is a USB HID, which is much more trivial to exploit than the article's issue.
Well, firstly, they have HSM-grade hardware available as well. Secondly, they have crypto processors that let you use PGP or PKCS11 certificates with the private key and certificate operations happening on the device, directly integrated into native system utilities.
Also, source on them being "much more trivial to exploit than the article's issue"? The only issue I've heard with Yubikey's certificate operations was https://www.yubico.com/keycheck/ where they also provided anyone affected with a replacement key at no charge.
> less secure
You posted on a thread about one process stealing memory from another using CPU delays.
A YubiKey exposes a device (or types as a USB keyboard), which every single user process has access to.
> A YubiKey exposes a device (or types as a USB keyboard), which every single user process has access to.
So? Are there any actual exploits you'd like to share that take advantage of either of these? Or are you just speaking in hypotheticals? Because in that case, basically everything you do on any computer that isn't airgapped (and even that can be exploited) is going to theoretically be exploitable.
The Pi Zero can do this as a USB client device for a host PC, including the QRCodes with the Pi camera, buttons/switches, and a small touchscreen for host-isolated verification and pin entry. Beaglebone Black (possibly also Pocket Beagle) can do it too, and the am335x does have some minimal crypto acceleration.
They're not high performance, they aren't "security focused" hardware, nor are they a perfect fit for the task, but they're reasonably well understood and broadly available. The Pi Zero does have a closed "GPU" firmware, but there is an effort to run open firmware on it.
However if you expect the host to attack the device, USB OTG (i.e. Linux USB gadget drivers) may not be a good choice, you may want to access it via network instead and that opens up more choices (though most will not be as small).
The other alternatives are basically going to be micro-controllers, for example FST-01/Gnuk, and FPGAs which are still more of a black box than CPUs at the moment.
>>but these "bugs" make me think that the whole architecture is a rooted, burning trash fire.
The next architecture should/will have dedicated on-chip space for things like encryption. That we are mixing essential and trivial data in the same space, and expecting neither to leak into the other, is the root of the problem. I wouldn't be surprised if in ten years we are talking about L1 through L3 cache, with a separate "LX" cache for the important security stuff.
Well, I doubt it. If anything, Spectre points us in the direction of dedicated, simple, low-power cores that do no speculation for security-sensitive tasks. Shared resources are the root of all side-channel leaks, so my prediction is we will see, or at least should see, IMO, systems with multiple, separate physical chips, and maybe separate physical RAM, to run untrusted or sensitive code.
I don't think a separate chip for non-trusted code would work. When I play minecraft I don't want it to be relegated to some sub-chip because it isn't trusted. Conversely, the bits of code running my disk encryption shouldn't be sharing resources with minecraft. So I think the more practical route is dedicated space/chips/resources for security-related stuff, and the big chip for all the less important stuff.
Now on a server, with a far greater proportion of security-related tasks, then we may need greater allocation to security. A split between security-specialized with lots of separate protected chips, and general-use CPUs with one bigger chip may be likely.
> When I play minecraft I don't want it to be relegated to some sub-chip because it isn't trusted.
Why not relegate the trusted stuff to a sub-chip then? (i.e. TPMs in desktop PCs, or iOS' "secure enclave")
Because drawing lines between trusted and non-trusted code isn't easy. Sometimes you need/want to dedicate all available processing power to a bit of trusted code (e.g. booting into Windows). But other times you want to do the same for non-trusted code (e.g. Minecraft). Separate chips for each means you've basically halved your peak available horsepower (assuming both are of equal power). Rather than draw the line between trusted and not, I'd draw the line between security-related stuff (encryption keys) and everything else. Then the separate secure chip can be relatively small while the big chip remains available for everything else.
(I say "chip" but I think it is more likely to be a separate 'core' on a chip, one with its own cache and RAM. It wouldn't need much of either.)
Yes, I agree with you. What you are describing is exactly how the examples I gave above are designed. I didn't mean to imply that the whole OS should be considered "trusted", but rather just the security critical components.
See the T2 processor in iMac Pro, rumored to be coming to Macbooks.
Ah, but where do you draw the line between 'security sensitive tasks' and the rest? Is the keyboard driver security sensitive? What about pointer drivers (mouse, touch, etc)? Voice input, is that security sensitive? Video? Draw the line too tight and performance will really suffer, make it too loose and all that effort is for naught.
Absolutely, that is the future, you are right on. Witness ARM's TrustZone and Intel's SGX; chip vendors all over the place are making secure boot standard even for the lowest-end micros. TPMs have been standard in laptops for almost a decade now. It's no longer optional to do key protection.
Had the same with HSM vendors - poorly patched versions of OpenSSL to talk to it (no upstream patches), functionality missing, months between vulns in OpenSSL and their version getting fixed - hopeless industry tbf.
No idea if it's good or not, but that sounds very similar to Nitrokey:
Crypto was never supposed to be done on the main CPU. We're all using what is essentially the dev test implementation of pkcs#11 etc.
It's one of the use cases for usbarmory, which has been around for a while now.
Isn't this what smartcards were supposed to do for us?
So Intel tried to shut *BSD out of the process again (like they did for the original Spectre/Meltdown) so they didn't feel they had to respect any embargo?
> So Intel tried to shut *BSD out of the process again (like they did for the original Spectre/Meltdown) so they didn't feel they had to respect any embargo?
Yes and no... It's really important that this be viewed in the context of the discussion opened by Theo in the video from the previous HN post (provided in this thread by codewriter23).
Here's My TL;DW from the irritatingly poor quality video:
Yes, they are pissed that they are being excluded (rumour is Amazon and Google have been implementing fixes).
However, they are not necessarily "not-respecting" the embargo according to the proposed methodology Theo outlines in the video: to (speculatively) exclude _any_ potential source of speculative execution vulnerabilities to ensure they are safe without giving weight to any one rumour. And then gradually prune back the precautions as they become publicly disclosed.
Apparently they used a similar strategy previously to provide patches for sshd before they were allowed to publicly disclose the vulnerability... prevent the bug from being reachable without revealing exactly what is broken in the commits by never touching the offending code. In this case the idea is to be non-specific, disable a whole class of things even though it might not be necessary (because in this case they really don't know where the problem is exactly).
Disclaimer: The above is not my opinion, it was my interpretation of the relevant context from the video. I do not know if it matches their actions.
It seems possible the commenter on the oss-security mailing list is not aware of this strategy and is giving more weight to OpenBSD's patch than it deserves (and perhaps wrongly implying OpenBSD have disrespected the embargo as a side effect).
However these patches are way beyond me so I cannot tell.
The breaking-the-embargo part is about the FPU issue that they published a patch for a few days before Theo gave the talk.
The part you're referencing is Theo speculating about the next bug. He suspects fixing it requires flushing a cache line but he doesn't know which one (because he doesn't know where the bug is), so he proposes flushing all of them until the bug is published and then removing the flushes that aren't necessary.
He then mentions the last serious OpenSSH bug. Instead of publishing a fix for the bug (and thus disclosing the bug) they decided to publish a patch that moved a bunch of code around and just happened to also make the buggy code unreachable. Then they told everybody to upgrade and once that happened they could safely disclose the bug and publish a fix for it. No embargo necessary and everybody got the fix at the same time. (I assume that's why he brought it up.)
Ah, thanks for pointing that out. But I wonder if there is actually a demonstrable exploit for that patch? Or is it the same preemptive approach? I guess I'm arguing that the patches didn't necessarily wait for Theo's talk to reveal why they are doing what they are doing.
Can someone with deep enough knowledge of the patch tell if it implicitly demonstrates the flaw (and therefore effectively breaks public disclosure), or is it purely speculative? Oh god, the puns are killing me.
> It seems possible the commenter on the oss-security mailing list is not aware of this strategy and is giving more weight to OpenBSD's patch than it deserves (and perhaps wrongly implying OpenBSD have disrespected the embargo as a side effect).
This sort of thing regularly happens. I remember an incident relatively recently when someone inaccurately "pointed out" that Arch Linux had "broken" an embargo by packaging an upstream release.
Eh? An embargo is where you share information with someone on the grounds they don't release it until a certain date. If you come by the information some other way then clearly you're not a party to the embargo.
You can respect an embargo even if you got your information elsewhere.
The existence of the embargo is usually a secret as well. Hard to play the game when you're not told the rules.
why would you?
The language being used makes it sound like OpenBSD is somehow breaking agreements. Considering this appears to be the result of the false perception that OpenBSD breaks embargoes they are a party to, it's important to fight this loose usage of words.
In the vast majority of cases it is the prudent way to go about it, and not doing it (with intent) is often reckless and a dick move well deserving of criticism.
This/these cases however might be an exception.
I fully agree that one should be careful of propagating such false perceptions about OpenBSD (or any other entity).
I disagree. As long as the embargo is purely related to this or that company profiting over another, as opposed to being potentially a matter of safety (see the UK D Notice system, for example), it's laughable to describe breaking something covered by someone else's optional embargo as a "dick move". On the contrary, it's generally highly amusing, and at the very least informative.
I agree and that is exactly what I meant.
However these circumstances can also be a matter of safety. For instance, an easily exploitable SSH vulnerability can incur serious damage to lots of institutions.
Further, the embargo isn't/shouldn't be about protecting Intel - it's about protecting everyone that uses Intel CPUs (sometimes those goals are aligned, sometimes not). How you go about that is one thing and if you intentionally disrespect that embargo (whether you were in on it or not) means that the assumptions and motivations for the embargo are invalidated and the consequences could be huge.
Now, you don't necessarily have to agree with the embargo, but if you don't know the consequences (in this case it looks like they were likely to be known), you take it upon yourself to decide that you (most likely with very limited information) can identify the consequences of such a disclosure.
It's the same problem as making an irresponsible disclosure of a major vulnerability. Most do consider that to be a dick move.
Assuming this were true, then they wouldn't have known about the embargo, since we are assuming they were not part of the process.
Posted on HN 3 days ago: Theo de Raadt speculates about the FP bug and discusses being frozen out by Intel.
Video. Contains profanity.
This is a legitimately related impromptu talk given at BSDCan last week.
Better quality video should surface eventually..
Don’t know why you’re being voted down. Thread and video are pertinent, language is not the commenter’s fault. Here’s my token upvote.
The technical speculation begins at 7:53 in the referenced video [so, https://www.youtube.com/watch?v=UaQpvXSa4X8&t=473]
It's worth noting that TdR is not the bringer of profanity in the video.
I wasn't impressed by whoever dropped the f-bomb.
Seems to be Warner Losh, previously one of the FreeBSD core team members.
I was in the room at the time. Warner had the initial outburst. There is a second speaker, and you can hear the voice change, responsible for the profanity. Warner's delivery was much scarier in person though.
the most important information shared in this post, downvoted.
Rumors circulate, people talk, leaks happen.
An embargo is state imposed. Not corporation imposed.
The usage of "embargo" to mean a generic, non-governmental impediment or prohibition is at least 200 years old.
but if you don't agree to it, it has no force, and therefore is no embargo..
You're being pedantic.
Dragonfly went a little further with this as well.. very precarious future ahead for us, I think..
I find it amazing how clean the diff for something like this is within the BSD source tree: http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/9474...
:) Some people go to great lengths to make commit history produce nice clean diffs... although I doubt anyone ever reads mine. Rebase, squash and split, your future self will thank you!
Here's the Linux patch for this: https://patchwork.kernel.org/patch/10202865/
It was done in February for Spectre ("variant 2"). Strange that Dragonfly is only starting now to clear registers.
I wonder if even this will be enough, it seems that modern x86 has hundreds/thousands of non externally visible hidden registers it uses for fast store/load (rather than a call to cache/ram).
Once the architectural registers are zeroed in a path that can't be speculatively skipped, the other values in the register file are dead and can't be exposed.
Sure but these bugs are about side effects of speculation that happens before any kind of "deadening" of untaken branches occurs. What if the processor is speculating using these hidden registers, couldn't similar information be exposed?
The code in question unconditionally zeroes the architecturally-visible registers on entry to the syscall path.
If any mis-speculation happens, it has to happen after this point, at which point any user-supplied data left over in the register file isn't accessible to speculated code, because even speculated code can only use architectural names to address registers.
Can anyone breakdown the impact and severity of this in more digestible terms for those of us not that deeply technical?
Unpatched systems can leak SIMD/FP state between privilege levels. Pretty fucking high severity since that's where we stick private keys these days.
The cost is more expensive context switches currently since we'll have to fully unload and reload all SIMD/FP state. I'm sure Intel will fix this one in a couple gens.
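To make the lazy/eager difference concrete, here is a toy model (not kernel code, and all names are invented for the illustration). `ts_flag` stands in for CR0.TS, and `_touch_fpu` stands in for the #NM "device not available" trap that fires on the first FPU instruction after a lazy switch. A single integer stands in for the whole SIMD/FP register file.

```python
class ToyCpu:
    """Toy model of lazy vs. eager FPU context switching."""

    def __init__(self, eager):
        self.eager = eager
        self.fpu_regs = None     # what is physically sitting in the FPU
        self.fpu_owner = None    # task whose state is in fpu_regs
        self.ts_flag = False     # models CR0.TS
        self.current = None
        self.saved = {}          # per-task FPU state saved in kernel memory
        self.full_switches = 0   # number of save/restore pairs performed

    def _swap_fpu_to(self, task):
        if self.fpu_owner is not None:
            self.saved[self.fpu_owner] = self.fpu_regs
        self.fpu_regs = self.saved.get(task, 0)
        self.fpu_owner = task
        self.full_switches += 1

    def context_switch(self, task):
        self.current = task
        if self.eager:
            self._swap_fpu_to(task)   # pay the cost on every switch
            self.ts_flag = False
        else:
            # Lazy: leave the old task's registers in place, arm the trap.
            self.ts_flag = self.fpu_owner != task

    def _touch_fpu(self):
        if self.ts_flag:              # models the #NM trap handler
            self._swap_fpu_to(self.current)
            self.ts_flag = False

    def fpu_write(self, value):
        self._touch_fpu()
        self.fpu_regs = value

    def fpu_read(self):
        self._touch_fpu()
        return self.fpu_regs


lazy = ToyCpu(eager=False)
lazy.context_switch("victim")
lazy.fpu_write(0xC0FFEE)
lazy.context_switch("attacker")
# The victim's value is still physically in the FPU; only ts_flag guards it.
print(hex(lazy.fpu_regs), lazy.ts_flag, lazy.full_switches)
```

In the lazy model the architectural read (`fpu_read`) is safe because the trap swaps state first; the rumoured problem is that speculation may get at `fpu_regs` before the trap takes effect. The eager model never leaves the old value there, at the price of a full swap per switch.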
The processor has XSAVE (the mechanism that we use to save/restore FPU state and more these days) optimizations internal to the processor that keep it from having to fully reload the FPU state. OSes like Linux have not been doing lazy FPU switching on processors with these optimizations for a long time.
See information about XSAVEOPT and the "Init and Modified Optimizations" in the SDM: intel.com/sdm.
As @luto said above, recent versions of Linux ripped out the lazy handling entirely.
Unpatched meaning systems without the Spectre/Meltdown mitigations enabled? Or is this something unrelated to the previous bugs?
This is unrelated and requires new patches. Somewhere else in the thread here, someone is saying that Linux isn't vulnerable, but I don't know for sure.
Thanks for clearing that up for me. Wooo boy, another one.
In the words of Bruce Schneier, "attacks only get better over time".
It's basically another aspect of the branch-speculation bugs, not fixed by the original fixes.
AFAICT it's not related to OOO execution, but rather to not cleaning normally accessible state after normal execution. No timers or tricky memory reads needed.
Yeah, probably, if I'm understanding it right. This one seems less correlated with OoO than Meltdown or Spectre. It looks like you're just issuing a load based on FP register state you shouldn't touch. The system eventually notices that it should have raised an exception on the access to the FP register and eventually does quash the load within the core, but the load has already gone out to the memory hierarchy, where you can see its effects on cache levels even though it never completes. Larger cores are harder to stop all at once than smaller cores, so being OoO should correlate with being vulnerable, but basically any pipelined processor that can throw an interrupt on register access could theoretically be vulnerable to something like this.
I think you probably do need to execute the should-fault FP access in a not-executed speculatively executed branch (à la Meltdown), so that the exception doesn't actually fire and the kernel doesn't reload your own FP state.
(Since you can only learn a small part of the state each time, you need to have the other process's state remain in the FPU while you repeat the process to learn the entire AES key or whatever.)
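A toy simulation of the mechanism being speculated about here, one byte per round. This is only a model of the guesswork in this subthread, not the actual (still unpublished) issue and not an exploit: the "cache" is a Python set, `transient_gadget` stands in for the load that is issued and later squashed, and `is_cached` stands in for timing a reload.

```python
class ToyCache:
    """Tracks which probe lines are cached; stands in for timing measurements."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def touch(self, line):
        self.lines.add(line)

    def is_cached(self, line):
        # On real hardware this would be "the reload was fast".
        return line in self.lines


def transient_gadget(cache, stale_fpu_bytes, byte_index):
    """Models the squashed load: its address depends on one byte of stale FPU state."""
    # One probe line per possible byte value; the load leaves a cache footprint
    # even though its architectural result is thrown away.
    cache.touch(stale_fpu_bytes[byte_index])


def recover(stale_fpu_bytes):
    """Repeat flush, transient access, probe once per byte of the stale state."""
    cache = ToyCache()
    recovered = bytearray()
    for i in range(len(stale_fpu_bytes)):
        cache.flush()
        transient_gadget(cache, stale_fpu_bytes, i)
        guess = next(v for v in range(256) if cache.is_cached(v))
        recovered.append(guess)
    return bytes(recovered)


# Hypothetical 16 bytes left in the FPU by another process.
leftover = b"toy AES key 0123"
print(recover(leftover) == leftover)
```

The loop in `recover` is why the other process's state has to stay put: each round only learns one byte, so the state must survive across all sixteen rounds.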
It might be that the check of who can access the FP registers versus what the identity of the current process is takes a few clock cycles due to communication across the core and Intel didn't want to slow down the critical path register load for this since they figured they could just squash any improper execution later. But it might also be as you say.
If it was public, we might be able to. This is just speculation, and there's a good deal of speculation that this isn't the full issue.
Does this impact AMD as well? If not, might this bring further performance parity between the chipmakers?
Linux appears to have patched things to avoid this problem on Intel and AMD more than 2 years ago, with the reasoning being that modern CPUs are fast in "eager" mode and "lazy" is not needed. So no, no performance difference for AMD as a result of this issue.
It's not just that modern CPUs are fast in eager mode; also str* and mem* functions nowadays use the FPU (via SSE/AVX) and the dynamic loader (ld.so) uses them. Therefore Linux's heuristics switched to preloading the FPU anyway for most processes, even before the program code started running.
an empire built on sand...
Literally and figuratively.
Hah, I get it. Because silicon is made of a particular kind of sand.
I am not a security expert, but I am still able to understand the low level of how software works.
Yet I have been always mesmerized how hard it is to understand security stuff. Maybe it is because I don't find it interesting, as I am more interested in creative stuff like gaming.
Honestly, it has been maybe 10 years since I abandoned the idea of caring about security. I just do the minimum: passwords, avoiding sketchy websites, not keeping sensitive files, using trustworthy software, etc.
Security is just too hard now. Maybe manufacturers like Intel are to blame, and obviously there MUST be some political will to make sure that most electronics are insecure to give an advantage to intelligence agencies.
Because ultimately, when I first heard about the Sony rootkit, and lately about the HD firmware worm, I really felt powerless and outdated. I really think that for a guy like me, who can write software, to not be able to protect myself efficiently against those attacks, and to have to tell non-programmers that "no, I cannot hack people's computers", is starting to make me feel like an idiot.
As years go by it seems that electronics seem more and more vulnerable, and I still feel completely unable to defend myself against it. Even politically I'm sure that designing a completely secure computer would be a taboo subject because people would argue that it could help the bad guys, and I'm sure that politically, one could not design such a device with success.
The whole "I don't care since I have nothing to hide" is really a fair excuse to show that I'm not capable of defending myself, and I will let others go at their cyber wars without me caring at all. For now the security of individuals seems to be lost, and I fear that one day it won't only be state actors that use security for policing, it will be petty criminals. If cyber chaos ensues nobody will use computers anymore, and they might even become banned from possession.