WHY RASPBERRY PI ISN'T VULNERABLE TO SPECTRE OR MELTDOWN
5th Jan 2018 at 5:01 pm
89 Comments
Over the last couple of days, there has been a lot of discussion about a pair of
security vulnerabilities nicknamed Spectre and Meltdown. These affect all modern
Intel processors, and (in the case of Spectre) many AMD processors and ARM
cores. Spectre allows an attacker to bypass software checks to read data from
arbitrary locations in the current address space; Meltdown allows an attacker to
read data from arbitrary locations in the operating system kernel's address space
(which should normally be inaccessible to user programs).
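The sections below refer to a running example of six simple additions; the listing is reconstructed here from the reordered versions that appear later in the post, with arbitrary placeholder inputs:

```python
# Arbitrary placeholder inputs; any numbers would do.
a, b, c, d, e, f, g, h, i, j, k = range(11)

t = a+b
u = c+d
v = e+f
w = v+g   # depends on v, computed on the line above
x = h+i
y = j+k
```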
While the processor in your computer doesn’t execute Python directly, the
statements here are simple enough that they roughly correspond to a single
machine instruction. We’re going to gloss over some details (notably pipelining and
register renaming) which are very important to processor designers, but which
aren’t necessary to understand how Spectre and Meltdown work.
The simplest sort of modern processor executes one instruction per cycle; we call
this a scalar processor. Our example above will execute in six cycles on a scalar
processor.
Examples of scalar processors include the Intel 486 and the ARM1176 core used in
Raspberry Pi 1 and Raspberry Pi Zero.
The obvious way to make a scalar processor (or indeed any processor) run faster is
to increase its clock speed. However, we soon reach limits of how fast the logic
gates inside the processor can be made to run; processor designers therefore
began to look for ways to do several things at once.
A superscalar processor examines adjacent instructions and, where they are
independent, executes them at the same time. A two-way superscalar processor
might try to pair up our example like this:
t, u = a+b, c+d
v, w = e+f, v+g
x, y = h+i, j+k
But this doesn’t make sense: we have to compute v before we can compute w,
so the third and fourth instructions can’t be executed at the same time. Our two-
way superscalar processor won’t actually be able to find anything to pair with the
third instruction, so our example will execute in four cycles:
t, u = a+b, c+d
v = e+f # second pipe does nothing here
w, x = v+g, h+i
y = j+k
Examples of superscalar processors include the Intel Pentium, and the ARM
Cortex-A7 and Cortex-A53 cores used in Raspberry Pi 2 and Raspberry Pi 3
respectively. Raspberry Pi 3 has only a 33% higher clock speed than Raspberry Pi 2,
but has roughly double the performance: the extra performance is partly a result of
Cortex-A53’s ability to dual-issue a broader range of instructions than Cortex-A7.
An out-of-order processor can shuffle the order in which it executes instructions,
provided it respects the dependencies between them. Moving the fifth instruction
up gives every instruction an independent partner:
t = a+b
u = c+d
v = e+f
x = h+i
w = v+g
y = j+k
allowing our two-way superscalar processor to execute the example in three
cycles:
t, u = a+b, c+d
v, x = e+f, h+i
w, y = v+g, j+k
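The reordering above can be derived mechanically. Here is a sketch (illustrative only — real cores do this in hardware with scoreboards and register renaming, not like this) that greedily packs instructions onto a two-wide machine, issuing an instruction only once its inputs are ready:

```python
# Each instruction is (destination, (source1, source2)); our six additions.
program = [
    ("t", ("a", "b")), ("u", ("c", "d")), ("v", ("e", "f")),
    ("w", ("v", "g")), ("x", ("h", "i")), ("y", ("j", "k")),
]

def schedule(instrs, width=2):
    """Greedy toy scheduler: issue up to `width` ready instructions per cycle."""
    ready = set("abcdefghijk")          # program inputs are always ready
    pending = list(instrs)
    cycles = []
    while pending:
        # Pick the first `width` instructions whose sources are all ready.
        issued = [ins for ins in pending
                  if all(src in ready for src in ins[1])][:width]
        pending = [ins for ins in pending if ins not in issued]
        ready |= {dest for dest, _ in issued}
        cycles.append([dest for dest, _ in issued])
    return cycles

print(schedule(program))   # three cycles: t,u then v,x then w,y
```

The schedule it finds matches the hand-reordered version above: t and u together, then v and x, then w and y.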
Our example above is a straight-line piece of code. Real programs aren’t like this of
course: they also contain both forward branches (used to implement conditional
operations like if statements), and backward branches (used to implement
loops). Branches may be unconditional (always taken), or conditional (taken or not,
depending on a computed value).
Modern branch predictors are extremely sophisticated, and can generate very
accurate predictions. Raspberry Pi 3’s extra performance is partly a result of
improvements in branch prediction between Cortex-A7 and Cortex-A53. However,
by executing a crafted series of branches, an attacker can mis-train a branch
predictor to make poor predictions.
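To make "mis-training" concrete, here is a toy two-bit saturating-counter predictor — the classic textbook scheme, far simpler than the predictors in real cores, and used purely as an illustration. An attacker-controlled run of taken branches trains it to predict "taken", so a victim's subsequent not-taken branch is mispredicted:

```python
class TwoBitPredictor:
    """Toy saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""
    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2          # True means "predict taken"

    def update(self, taken):
        # Move toward 3 on taken branches, toward 0 on not-taken ones.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
for _ in range(10):          # attacker-controlled warm-up: always taken
    p.update(True)

print(p.predict())           # True: the predictor now expects "taken"...
# ...so a following not-taken branch is mispredicted, and the processor
# speculatively executes down the wrong path.
```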
What is speculation?
Reordering sequential instructions is a powerful way to recover more instruction-
level parallelism, but as processors become wider (able to triple- or quadruple-issue
instructions) it becomes harder to keep all those pipes busy. Modern processors
have therefore grown the ability to speculate. Speculative execution lets us issue
instructions which might turn out not to be required (because they may be
branched over): this keeps a pipe busy (use it or lose it!), and if it turns out that the
instruction isn’t executed, we can just throw the result away.
t = a+b
u = t+c
v = u+d
if v:
w = e+f
x = w+g
y = x+h
If the branch predictor indicates that the body of the if statement is likely to
execute, speculation effectively shuffles the program like this:
t = a+b
u = t+c
v = u+d
w_ = e+f
x_ = w_+g
y_ = x_+h
if v:
w, x, y = w_, x_, y_
so we now have additional instruction level parallelism to keep our pipes busy:
t, w_ = a+b, e+f
u, x_ = t+c, w_+g
v, y_ = u+d, x_+h
if v:
w, x, y = w_, x_, y_
What is a cache?
In the good old days*, the speed of processors was well matched with the speed of
memory access. My BBC Micro, with its 2MHz 6502, could execute an instruction
roughly every 2µs (microseconds), and had a memory cycle time of 0.25µs. Over
the ensuing 35 years, processors have become very much faster, but memory only
modestly so: a single Cortex-A53 in a Raspberry Pi 3 can execute an instruction
roughly every 0.5ns (nanoseconds), but can take up to 100ns to access main
memory.
At first glance, this sounds like a disaster: every time we access memory, we’ll end
up waiting for 100ns to get the result back. In this case, this example:
a = mem[0]
b = mem[1]
would take 200ns to run. A cache is a small on-chip memory, close to the processor,
which stores copies of the contents of recently used locations (and their
neighbours), so that they are quickly available on subsequent accesses. With
caching, the example above will execute in a little over 100ns: the first access
misses and takes the full 100ns, but it also brings the neighbouring location into
the cache, so the second access hits and completes in a few cycles.
From the point of view of Spectre and Meltdown, the important point is that if you
can time how long a memory access takes, you can determine whether the
address you accessed was in the cache (short time) or not (long time).
Spectre and Meltdown are side-channel attacks which deduce the contents of a
memory location which should not normally be accessible by using timing to
observe whether another, accessible, location is present in the cache.
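A toy model of that timing side channel (the latencies are invented: 1 "ns" for a hit, 100 for a miss) shows how access time alone reveals whether an address was cached:

```python
class ToyCache:
    HIT_NS, MISS_NS = 1, 100      # illustrative latencies only

    def __init__(self):
        self.lines = set()

    def access(self, addr):
        """Return a simulated access time, caching the line as a side effect."""
        if addr in self.lines:
            return self.HIT_NS
        self.lines.add(addr)
        return self.MISS_NS

cache = ToyCache()
cache.access(0x100)               # bring address 0x100 into the cache

# An observer who only sees timings can tell which address was touched:
print(cache.access(0x100))        # 1   -> was in the cache
print(cache.access(0x200))        # 100 -> was not
```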
Now, consider this extension of our example, which tries to read from a kernel
address under the cover of a branch:
t = a+b
u = t+c
v = u+d
if v:
w = kern_mem[address] # if we get here, fault
x = w&0x100
y = user_mem[x]
Now, provided we can train the branch predictor to believe that v is likely to be
non-zero, our out-of-order two-way superscalar processor shuffles the program like
this:
t, w_ = a+b, kern_mem[address]
u, x_ = t+c, w_&0x100
v, y_ = u+d, user_mem[x_]
if v:
# fault
w, x, y = w_, x_, y_ # we never get here
Even though the processor always speculatively reads from the kernel address, it
must defer the resulting fault until it knows that v was non-zero. On the face of it,
this feels safe because either v is zero, in which case the result of the illegal read
is discarded and never committed to w, or v is non-zero, in which case the
program faults before we can use the result.
However, suppose we flush our cache before executing the code, and arrange a,
b, c, and d so that v is actually zero. Now, the speculative read in the third
cycle:
v, y_ = u+d, user_mem[x_]
will access either userland address 0x000 or address 0x100 depending on the
eighth bit of the result of the illegal read, loading that address and its neighbours
into the cache. Because v is zero, the results of the speculative instructions will
be discarded, and execution will continue. If we time a subsequent access to one of
those addresses, we can determine which address is in the cache. Congratulations:
you’ve just read a single bit from the kernel’s address space!
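Putting the pieces together, here is a toy simulation of the attack in ordinary Python. Nothing here touches real memory or real timing: kern_mem is a dict standing in for the secret, the addresses and hit/miss latencies are invented, and the "speculative" path is just a function call — it only illustrates how the cache footprint of discarded work leaks a bit:

```python
kern_mem = {0x40: 0x167}        # pretend secret kernel word at a made-up address

cache = set()                   # set of currently cached addresses
def access(addr):
    """Simulated load: returns an invented latency and fills the cache."""
    hit = addr in cache
    cache.add(addr)
    return 1 if hit else 100    # 1 'ns' on a hit, 100 on a miss

def speculative_victim(address):
    # What the processor does speculatively, before the fault is taken:
    w = kern_mem[address]       # illegal read (would fault architecturally)
    x = w & 0x100               # 0x000 or 0x100, depending on bit 8
    access(0x1000 + x)          # user_mem[x]: leaves a cache footprint
    # The fault is then raised and the results are discarded --
    # but the cache line stays resident.

cache.clear()                   # attacker step 1: flush the cache
speculative_victim(0x40)        # attacker step 2: trigger the speculation
# Attacker step 3: probe. A fast access means user_mem[0x100] was cached.
bit = 1 if access(0x1000 + 0x100) == 1 else 0
print(bit)                      # 1 -- bit 8 of the secret 0x167
```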
The real Meltdown exploit is substantially more complex than this (notably, to avoid
having to mis-train the branch predictor, the authors prefer to execute the illegal
read unconditionally and handle the resulting exception), but the principle is the
same. Spectre uses a similar approach to subvert software array bounds checks.
Conclusion
Modern processors go to great lengths to preserve the abstraction that they are in-
order scalar machines that access memory directly, while in fact using a host of
techniques including caching, instruction reordering, and speculation to deliver
much higher performance than a simple processor could hope to achieve.
Meltdown and Spectre are examples of what happens when we reason about
security in the context of that abstraction, and then encounter minor discrepancies
between the abstraction and reality.
The lack of speculation in the ARM1176, Cortex-A7, and Cortex-A53 cores used in
Raspberry Pi renders us immune to attacks of this sort.
meltdown spectre
89 comments
Reply
Having read this, I feel smarter. Didn’t really understand it, but I feel smarter.
Reply
Thanks, great explanation, and I feel confident Raspberry Pis are tough little computers.
If I am using Raspbian x86 desktop on Oracle Virtual Box, does the Virtual CPU have the
exploit or is it protected by the Host OS. I am assuming the host hardware may have the
exploit patched.
Reply
Niall says: 6th Jan 2018 at 2:22 pm
If you’re using an emulated CPU, I imagine you’re safe — after all, implementing
complicated parallelism in software will only serve to slow the program down, and
hardware parallelism is intended to make software run faster.
HOWEVER…
I don’t believe Virtualbox does any processor emulation at all — it simply mediates
between the host operating system and the guest environments, but passes through
x86 commands to the host x86 processor.
The big issue with Spectre and Meltdown is that they can actually break out of a virtual
machine and access system memory for the host.
Reply
That is an *excellent* summary of how Spectre and Meltdown work. (The fact that the Pi
is immune is just a bonus).
Reply
Wow. Silver lining related to Meltdown and Spectre: I’m learning a whole lot more about
how processors work.
Reply
Eben, could you please clarify how this A53 CPU feature is different from the one that is
exploitable by Spectre?
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500g/CHDGIAHH.html
Reply
Why don’t speculative instruction (and data) fetches introduce a vulnerability? Because
unlike speculative execution they don’t lead to a separation between a read instruction
and the process (whether a hardware page fault or a software bounds check) that
determines whether that read instruction is allowed.
Reply
+1
A well written and easy to understand introduction to some aspects of modern CPU
design (add some more about instruction fusion and the crucial register renaming),
and it should be permanently published in the education section.
Reply
Shane Johnson says: 5th Jan 2018 at 6:08 pm
Reply
I’m still a little unclear on how knowing what memory address is in processor cache gets
us actual data from memory.
Reply
Imagine the value at the kernel address, which gets loaded into _w, was 0xabde3167.
Then the value of _x is 0x100, and address user_mem[0x100] will end up in the cache. A
subsequent load of user_mem[0x100] will be fast.
Now imagine the value at the kernel address, which gets loaded into _w, was
0xabde3067. Then the value of _x is 0x000, and address user_mem[0x000] will end up in
the cache. A subsequent load of user_mem[0x100] will be slow.
So we can use the speed of a read from user_mem[0x100] to discriminate between the
two options. Information has leaked, via a side channel, from kernel to user.
Reply
I still don’t get the *depending on the eighth bit of the result of the illegal read* & *you’ve
just read a single bit from the kernel’s address space* part of this and other articles.
Why the 8th bit? Is that the privilege bit in L1$? How does this process leak just 1 bit and
not a byte/word/etc.?
Reply
The “8th bit” comes from the x_ = w_&0x100 instruction. This is a mask-instruction:
– if the 8th bit in w_ is 1, then x_ = 100.
– if the 8th bit in w_ is 0, then x_ = 000.
The subsequent read of user_mem[x_] causes either address 100 or address 000 to be
brought into the cache, depending on whether the 8th bit in w_ is 1 or 0. By reading
address 100 again and measuring how long it takes, you can determine whether 100 or
000 was brought into the cache.
Reply
In this particular example, you know whether the eighth bit of a particular kernel address
is 1 or 0. You can use the exact same principle to leak any other bit of an address, so
you can do this eight times with different operands to & to get an entire byte. Do it
another eight times and you can read the entire byte at the next address, and so on. It’s
slow, but you can eventually read out the entire kernel address space that way, which
would potentially allow you to compromise the operating system.
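The bit-by-bit loop described above can be sketched with a toy model (SECRET, the probe addresses, and the leak_bit helper are all invented for illustration; a real attack would recover each bit from access timing rather than set membership):

```python
SECRET = 0xA7                   # stand-in for one byte at a kernel address

def leak_bit(n):
    """Toy model of leaking bit n of SECRET via a cache footprint."""
    cache = set()               # attacker flushes the cache
    # 'Speculative' dependent load: touches user address 0x1000 or 0x1100
    # depending on bit n of the secret; the work is then discarded, but
    # the cache line remains.
    probe = 0x1000 + (((SECRET >> n) & 1) << 8)
    cache.add(probe)
    # Probe phase: which address was left in the cache reveals the bit.
    return 1 if (0x1000 + 0x100) in cache else 0

byte = 0
for n in range(8):              # leak bits 0..7 and reassemble the byte
    byte |= leak_bit(n) << n

print(hex(byte))                # 0xa7 -- recovered without a direct read
```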
Reply
It might be easier to see Eben’s example in binary instead of hex (and we’ll use a 16-bit
architecture to make it easier to see):
Reply
In `w&0x100`
Reply
@Pete: The Raspberry Pi computer itself is invulnerable to the bug. The Intel or AMD-
based computer you are running Raspbian on is not.
So if you are running Raspbian x86 you will need to install patches in both your Raspbian
guest operating system and your host operating system to be safe from the
vulnerability.
Reply
It’s not wrong at all – he’s responding to the comment that specifically asked about
virtualized Raspbian – aka Raspbian running on x86.
Reply
Eben .. thanks so much for this … I read through it once but will reread to hopefully
understand it better .. It is nice to get education instead of hysteria! If you were willing to
pay with performance to get security, could you simply turn off speculation? Is that what
the news was referring to when they say the fix will cause a 30% degradation in
performance?
Thanks again!
-don
Reply
> If you were willing to pay with performance to get security could you simply turn off
speculation? Is that what the news was referring to when they say the fix will cause a
30% degradation in performance?
From what I understand (so I could be wrong!), the 30% degradation comes from
additional checks added at the operating system level to make sure there are no security
leaks. This particularly surrounds programs that read and write a lot of files to the disk.
This context-switching (from the program to the operating system and back again) is
computationally expensive, so modern processors have–at a very low level–blended the
two contexts. From what I’ve gathered, the “fix” for this is to have the OS perform extra
checks to make sure no cached data is being leaked. For some programs, it’s a
negligible difference (Apple is claiming no noticeable difference for most of their
customers); other programs like databases, however, will probably see all of that 30%
drop.
Reply
Turning off “speculation” is not possible in software. Maybe Intel could implement that in
microcode and issue an update, but that is an even more complicated discussion.
The performance hit comes from the Linux kernel mapping and unmapping the kernel.
Currently, process memory is divided in two: the low half is process space (unique to
each process), the upper half is kernel space (shared with all processes). The processor
is supposed to protect the kernel memory, but these hardware bugs break that
protection.
The fix is therefore to map the kernel space on entry to a kernel call and unmap it upon
return to the process. This can be a time-consuming set of operations.
Reply
One almost wishes that they’d stuck with the original name for the KPTI patchset:
Forcefully Unmap Complete Kernel With Interrupt Trampolines.
https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw/
Reply
That is hilarious!!!!
I believe the performance degradation projections (which are on the order of 5% for
most real benchmarks) are based on the cost of adding Kernel Page Table Isolation to
the Linux kernel.
Reply
This is the best “tutorial” I have seen on this subject. The side effect of this attack has
been a better awareness of modern processor architecture. It is unfortunate that this
had to happen to get folks to draw back the curtain on this, instead of keeping the
pretense of everything being scalar and in order. It does matter in many more instances
than people think.
Reply
Thanks for this. Even though you mention that the real exploit is more complex, this
gives the context. I feel like I could read more and more on this topic.
Happy New Year*
Reply
BLOG DOWNLOADS COMMUNITY HELP FORUMS EDUCATION
Reply
And how can you reliably “time a subsequent access to one of those addresses”?
Reply
You really need to be down at the machine-language level to manipulate this (and to be
able to do unchecked pointer arithmetic).
I’m not particularly au fait with Intel high-performance timing. If you’re in ring 0 (the
kernel) you can probably use the performance counters:
https://www.blackhat.com/docs/us-15/materials/us-15-Herath-These-Are-Not-Your-Grand-Daddys-CPU-Performance-Counters-CPU-Hardware-Performance-Counters-For-Security.pdf
I can imagine that in userland you may need to loop the attack to get enough signal-to-
noise. There’s some discussion in the Spectre paper:
https://spectreattack.com/spectre.pdf
Reply
That said, I’m sure Eben’s right about needing to be closer to machine code. The
overhead of the CPython interpreter and the GC are probably sufficient to make it either
outright impossible, or at least extremely difficult, to implement in pure Python (i.e.
without resorting to some externally compiled module).
Reply
Thank you for the incredible post! I understand so much more now.
Reply
This is fantastically friendly and clear. Thanks so much for the accessible explanation!
I’m much less confused than I was before.
Reply
Rohit says: 5th Jan 2018 at 7:21 pm
Thanks for the wonderfully explained post, Eben. You’ve explained a complex concept in
a really simple manner.
Reply
I defected to the Amiga: a shop-soiled A600, for £200 just after Christmas 1992. Couldn’t
afford an Archimedes, though I drooled over the single unit my school could afford.
Feels good to hold the record for shipping the largest number of units of Archimedes-
compatible hardware.
Reply
For Christ’s sake, why should a userspace program ever “flush the cache”?
Reply
Are you asking why a userspace program should be allowed access to a cache flush
primitive?
Reply
This might be good except for the fact that Arm themselves stated that those devices
are affected by the “bug”, if one could really call it that.
Reply
[citation needed]
Reply
Arm’s statement lists the processors affected, which don’t include those used in
Raspberry Pis. As that statement says, “[o]nly affected cores are listed, all other Arm
cores are NOT affected”.
Reply
You are a mind reader! I was thinking about this problem earlier and I was about to ask it
on the forums then this long and helpful post pops up.
Reply
Perhaps we should look at far simpler CPU designs more seriously; as they say,
“complexity kills”. SUBLEQ, anyone? :-)
All this talk of scalar vs superscalar takes me back to the day I got my 68060 (a
superscalar CPU) expansion board for my Amiga 1200 and overclocked it from 50MHz
to 66MHz by simply soldering on a different clock crystal! :-)
PS: Love the Raspberry Pi, it’s really put the fun back into computing, keep up the great
work!
Reply
I was a 68000 junkie for three years in the early 90s. Beautiful architecture: in a more
just world it, or its descendants, would have won out.
Reply
I learnt assembly on a 68000 (Amiga) in the early 90’s, imagine my horror when I moved
to x86! :-)
Reply
Sadly I do not need to use my imagination, having followed a similar road myself.
Reply
The Cortex A53 boasts an “Advanced Branch Predictor” which I assumed to mean it
supports speculative execution. If the processor isn’t using the branch prediction to pre-
execute instructions is it using it for instruction re-ordering? What’s the point of branch
prediction without speculative execution of the predicted branch?
Reply
A branch predictor, and branch target buffer, are useful even without speculative
execution because they give you a hint about which instructions to admit to the pipeline
next while you wait for the branch condition to resolve.
Cortex-A53 isn’t capable of “real” speculative execution because it can’t stash the results
of instructions which are started speculatively. This means that the pipeline bogs down
quite fast if resolution of the branch condition is significantly delayed, and critically the
chained dependent memory accesses that both attacks rely on to modify cache state
can never happen.
Perhaps I do need to write about register renaming: I’d been hoping to avoid that.
Reply
Then why do Cortex-A53 and Cortex-A7 implement PMU event 0x10? It counts the
number of “mispredicted or not predicted branches speculatively executed”. I doubt
ARM implemented it to always return zero.
Reply
Dan Huby says: 5th Jan 2018 at 9:47 pm
I read the CPU technical manual and the only section on branch prediction I could find
referred to preemptively loading the set of instructions in the branch, but not executing
them.
Reply
The branch predictor’s job is to keep the instruction pipelines in an in-order core full by
guessing the most likely instruction flow after a branch instruction. It does this by
storing and comparing the results of previous branch instructions and by using certain
architectural hints, like predicting a forwards branch to be not-taken and a backwards
branch to be taken.
The branch predictor in an in-order core only affects the instruction cache, by predicting
and speculatively fetching what instructions need to be in the Icache ahead of time. The
vast majority of modern processors (ARM1176 included) have split instruction and data
caches at the innermost level, so a data cache timing attack will not reveal anything
about the direction the branch predictor took. Additionally, fooling a branch predictor into
speculatively fetching something that is not an instruction will not work – page table
structures have dedicated bits that specify whether a particular memory page contains
instructions or data (see the NX bit for x86), and fetching instructions from data pages
will almost certainly result in an access violation.
Reply
I hope the Pi that will be released in 2019 (speculation?) also continues using the ARM A-53 :)
Reply
:)
Reply
solar3000 says: 5th Jan 2018 at 9:54 pm
Reply
Woah. I understood that and was able to follow it to the end of the article!
Reply
From what I read, AMD seems to reject that their chips are affected by Meltdown. Does
this mean that AMD chips don’t implement speculative execution? Can’t imagine that
however..
Reply
It’s perfectly possible to implement an out-of-order core with speculation that isn’t
vulnerable to Meltdown. For example, of ARM’s out-of-order cores, only Cortex-A75 is
vulnerable. Intel cores are vulnerable because of a design choice not to prevent
speculative loads from illegal addresses, but instead to rely on a delayed fault (or
instruction non-retire) to suppress the result.
Reply
Ah! This is exactly what I was wondering about while reading the article. (“But why is the
illegal fetch allowed at all in the first place?”) It seems to me like a reasonable thing to do
to fault if someone has written code with an illegal instruction *even if in practice the
branch with that instruction is never officially executed*.
Reply
You can’t fault just because a speculative instruction is invalid. Think of this simple
pattern that’s used everywhere in C/C++ code:
if (pointer != NULL)
pointer->data = value;
Check if you have a valid memory address, and if so, do something with it. If you throw a
fault based on speculative instructions, you’ll be faulting constantly on code like that.
(Implementation details: NULL is zero. Memory addresses at or close to zero are always
marked invalid in the page table, and trigger a fault when accessed. This is done so that
a lot of bad code will crash immediately instead of writing garbage over real data.)
Reply
Okay, so:
> Intel cores are vulnerable because of a design choice not to prevent speculative loads
from illegal addresses
If the other design choice had been taken, what would it have looked like?
SpeculaArrg says: 6th Jan 2018 at 1:37 pm
Reply
You have an if that will equate to false that tries to read from kernel memory. (if you did
read this memory, it’ll raise an exception)
You ensure the cache is flushed so that when the CPU speculatively executes the read
from kernel memory the value will be in the cache.
The if is then checked and is found to be false and so an exception is not fired as the
CPU pretends it was not executed.
Now the memory read from the kernel is in the cache (because of the speculative
execution) and is in the same place that our user space memory would have been
because of how the cache is aliased against the whole address space. And this is what
allows you to read it????
Reply
It’s a bit more complex than that. The memory from the kernel is not loaded into the
cache. However, a section of (legal) user memory is loaded into the cache whose
address is based in part on a tiny piece of the (illegal) kernel memory, in an operation
that officially never happened but whose cache fetch has been left as a side-effect. By
attempting to read that legal memory in a subsequent legal operation, and timing how
long it takes, you can reason backwards to what that tiny piece of kernel memory was
that you were never supposed to have been able to access. You can’t read it directly, but
you can infer its value from the side-effects of the phantom operation (the speculative
fetch).
Reply
Nearly: you compute a memory address based on a single bit in the hidden value and
then access that memory address, all within the branch that will ultimately be thrown
away. However, you can still determine the hidden value by timing how long it then takes
to access that memory address, because if it’s super fast, then you know it must be in
the CPU’s cache and not main memory (as it has been used before in the branch that got
thrown away). Well done, you’ve just discovered a single bit of memory that you were
never meant to see… now repeat for the rest… :-)
Reply
You do not have to directly read data from within the cache. Your cache was flushed,
so either array[0] or array[4] is not cached. Then, after execution, a timed read access of
one of these two values will leak whether your particular value is cached or not. The two
cases can be told apart by the delay: a short time means the data is cached, the
opposite when it is read directly from memory… which is exactly one bit of information ;)
Reply
S. Rose says: 6th Jan 2018 at 1:53 am
This is extremely well-written, and if a reader has the patience to read it through
carefully, that reader comes away with an understanding not just of how a kernel-
reading exploit could be constructed, but also twenty years of advances in CPU design. I
am in awe, and reassured to have people like Eben Upton on the users’ side.
Reply
:)
Reply
What is not discussed is that you have to have a program running that is doing these
timings. That, alone, would skew the numbers as control is taken from one thread and
given to another. Or… If the pipeline is stuck waiting on something, where is this program
going to run? I guess it could run on an additional core. Then it would have to be running
really tight code to get these timings. In fact, it seems like the program would have to
run faster than the cycle time of the core to be able to watch what is happening (timing
wise) in another core or memory or cache or whatever it is exactly watching. This seems
on the face of it to be impossible since you have to run faster than what you are timing
for timing to be usable. Am I missing something here, or was it just left out? So far, all of
this seems theoretical. It seems like you would need another, faster processor to time
the decision processes of the other, slower processor. How can all this actually run on
the same CPU, even with multiple cores? Maybe the timing program can run faster after
its memory is all in cache. But, then, it has to collect and eventually send this data out so
it is subject to the same speed restrictions on the internal bus(es) as the program it is
watching. Seems all very theoretical and not particularly practical. Where is this wrong?
My speculations must be wrong if this can actually be done.
Reply
This is a great article in explaining the processor issues and the operation of the current
software fixes. In terms of future processor architecture design, how easy will it be to
design this out for ARM and Intel, and will it be possible to do so without suffering a
significant performance hit in future processor designs? Are there any designs in the
pipeline that take a different approach to speculation and parallel pipelines from the
current generation of processor architectures?
Reply
And all these years, ARM has been ignored by major players. At least this should make
people think about wider adoption of ARM processors.
Reply
Terry Coles says: 6th Jan 2018 at 9:58 am
Eben,
Long before I retired, I worked for some years as a Technical Author; an experience that
makes me super-critical of so-called technical journalists and authors who don’t really
understand their topic.
I have to say that this is the best bit of technical journalism that I’ve read for years. After
technical authorship I worked as an engineer in the test industry, but 30 years of that did
not equip me to understand the intricacies of CPU architecture. Your posting has
impressed me most because it doesn’t assume that the reader knows anything except a
slight grounding in electronics engineering and computer science, but to me anyway, is
incredibly readable.
Thanks for this and for everything else that you’ve done for education.
Reply
Thanks Terry – that means a lot. These days I don’t often get a continuous block of time
required to write this sort of thing, but this felt worth spending a day on. I ran out of time
before getting to the detail of Spectre, but I’ve started adding some relevant material
(e.g. branch prediction) to the post, and hope to get to it this week.
Reply
Thanks for explaining this complex matter in an easy to read and understandable way
for everybody.
Reply
Hi Eben. A really helpful and informative post. It made way more sense than the flurry of
pseudo-tech commentaries I’d been reading before today. Thanks for the update and
pleased the Pi is spared.
Geoff
Reply
Nice article that triggered an equally nice discussion in comments. Sad to see Wikipedia
description incorrectly pin side channel attacks to crypto systems when in fact such
attacks are widely used and not unique to crypto systems. Thanks for taking time to
write, and for providing a useful product to the general public.
Reply
Reading the white papers, the only (known) way to deploy the Spectre attack would be to
have a kernel with the Berkeley Packet Filter JIT compiled in, which is in Ring 0. What if it’s
more of a flaw in the BPF or gcc toolchain, and not necessarily a flaw with any particular
processor?
Have we seen it deployed as anything other than taking advantage of any other
methods?
Either way, the fix in the Arm white paper for the issue isn’t computationally expensive.
Reply
This excellent explanation is a great primer on how (most) modern microprocessors and
compilers attempt to maximize performance, as well as the clearest explanation of the
fundamentals of Meltdown and Spectre. Great job!
Reply
RASPBERRY PI FOUNDATION
UK REGISTERED CHARITY 1129409