
BLOG DOWNLOADS COMMUNITY HELP FORUMS EDUCATION

WHY RASPBERRY PI ISN’T VULNERABLE TO SPECTRE OR MELTDOWN

Posted by Eben Upton, Raspberry Pi Founder
5th Jan 2018 at 5:01 pm
89 Comments

Over the last couple of days, there has been a lot of discussion about a pair of
security vulnerabilities nicknamed Spectre and Meltdown. These affect all modern
Intel processors, and (in the case of Spectre) many AMD processors and ARM
cores. Spectre allows an attacker to bypass software checks to read data from
arbitrary locations in the current address space; Meltdown allows an attacker to
read data from arbitrary locations in the operating system kernel’s address space
(which should normally be inaccessible to user programs).

Both vulnerabilities exploit performance features (caching and speculative


execution) common to many modern processors to leak data via a so-called
side-channel attack. Happily, the Raspberry Pi isn’t susceptible to these
vulnerabilities, because of the particular ARM cores that we use.

To help us understand why, here’s a little primer on some concepts in modern
processor design. We’ll illustrate these concepts using simple programs in Python
syntax like this one:


t = a+b
u = c+d
v = e+f
w = v+g
x = h+i
y = j+k

While the processor in your computer doesn’t execute Python directly, the
statements here are simple enough that they roughly correspond to a single
machine instruction. We’re going to gloss over some details (notably pipelining and
register renaming) which are very important to processor designers, but which
aren’t necessary to understand how Spectre and Meltdown work.

For a comprehensive description of processor design, and other aspects of


modern computer architecture, you can’t do better than Hennessy and Patterson’s
classic Computer Architecture: A Quantitative Approach.

What is a scalar processor?

The simplest sort of modern processor executes one instruction per cycle; we call
this a scalar processor. Our example above will execute in six cycles on a scalar
processor.

Examples of scalar processors include the Intel 486 and the ARM1176 core used in
Raspberry Pi 1 and Raspberry Pi Zero.

What is a superscalar processor?

The obvious way to make a scalar processor (or indeed any processor) run faster is
to increase its clock speed. However, we soon reach limits of how fast the logic
gates inside the processor can be made to run; processor designers therefore
began to look for ways to do several things at once.

An in-order superscalar processor examines the incoming stream of instructions


and tries to execute more than one at once, in one of several pipelines (pipes for
short), subject to dependencies between the instructions. Dependencies are
important: you might think that a two-way superscalar processor could just pair up
(or dual-issue) the six instructions in our example like this:

t, u = a+b, c+d
v, w = e+f, v+g
x, y = h+i, j+k

But this doesn’t make sense: we have to compute v before we can compute w ,
so the third and fourth instructions can’t be executed at the same time. Our two-
way superscalar processor won’t actually be able to find anything to pair with the
third instruction, so our example will execute in four cycles:

t, u = a+b, c+d
v = e+f # second pipe does nothing here
w, x = v+g, h+i
y = j+k

Examples of superscalar processors include the Intel Pentium, and the ARM
Cortex-A7 and Cortex-A53 cores used in Raspberry Pi 2 and Raspberry Pi 3
respectively. Raspberry Pi 3 has only a 33% higher clock speed than Raspberry Pi 2,
but has roughly double the performance: the extra performance is partly a result of
Cortex-A53’s ability to dual-issue a broader range of instructions than Cortex-A7.
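The pairing rule is easy to model in a few lines of Python. This is a toy sketch (the function name and the (dest, src1, src2) tuple encoding are my own invention, and a real core checks far more hazards than this): each cycle we issue one instruction, plus the following one if it doesn’t read the first one’s result.

```python
# Toy in-order dual-issue model: instructions are (dest, src1, src2)
# tuples, each corresponding roughly to one machine instruction.
def dual_issue_cycles(program):
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        if i + 1 < len(program):
            dest, _, _ = program[i]
            _, s1, s2 = program[i + 1]
            if dest not in (s1, s2):   # no dependency: dual-issue the pair
                i += 2
                continue
        i += 1                          # dependency (or last instruction)
    return cycles

# The six-instruction example from the text: w depends on v.
example = [('t', 'a', 'b'), ('u', 'c', 'd'), ('v', 'e', 'f'),
           ('w', 'v', 'g'), ('x', 'h', 'i'), ('y', 'j', 'k')]
```

With that program, `dual_issue_cycles(example)` comes out at four cycles, matching the schedule above: the third instruction issues alone because the fourth needs its result.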

What is an out-of-order processor?


Going back to our example, we can see that, although we have a dependency
between v and w , we have other independent instructions later in the program
that we could potentially have used to fill the empty pipe during the second cycle.
An out-of-order superscalar processor has the ability to shuffle the order of
incoming instructions (again subject to dependencies) in order to keep its pipes
busy.

An out-of-order processor might effectively swap the definitions of w and x in


our example like this:

t = a+b
u = c+d
v = e+f
x = h+i
w = v+g
y = j+k

allowing it to execute in three cycles:

t, u = a+b, c+d
v, x = e+f, h+i
w, y = v+g, j+k

Examples of out-of-order processors include the Intel Pentium 2 (and most


subsequent Intel and AMD x86 processors with the exception of some Atom and
Quark devices), and many recent ARM cores, including Cortex-A9, -A15, -A17, and -
A57.
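The same toy encoding can model out-of-order issue as a greedy scheduler that, each cycle, picks up to two pending instructions whose inputs are already available. Again this is a sketch under simplifying assumptions of my own (single-cycle latency, unlimited reorder window, invented names):

```python
# Toy out-of-order model: each cycle, issue up to `width` pending
# instructions whose source values are available; their results
# become available on the following cycle.
def out_of_order_cycles(program, width=2):
    dests = {d for d, _, _ in program}
    # values the program never writes (a, b, c, ...) are inputs,
    # available from the start
    ready = {s for _, s1, s2 in program for s in (s1, s2)} - dests
    pending = list(program)
    cycles = 0
    while pending:
        cycles += 1
        issued = [ins for ins in pending
                  if ins[1] in ready and ins[2] in ready][:width]
        if not issued:
            raise ValueError("circular dependency")
        for ins in issued:
            pending.remove(ins)
        ready |= {ins[0] for ins in issued}   # results usable next cycle
    return cycles

example = [('t', 'a', 'b'), ('u', 'c', 'd'), ('v', 'e', 'f'),
           ('w', 'v', 'g'), ('x', 'h', 'i'), ('y', 'j', 'k')]
```

This recovers the three-cycle schedule above: t and u issue first, then v and x, then w and y.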
What is a branch predictor?

Our example above is a straight-line piece of code. Real programs aren’t like this of
course: they also contain both forward branches (used to implement conditional
operations like if statements), and backward branches (used to implement
loops). Branches may be unconditional (always taken), or conditional (taken or not,
depending on a computed value).

While fetching instructions, a processor may encounter a conditional branch which


depends on a value which has yet to be computed. To avoid a stall, it must guess
which instruction to fetch next: the next one in memory order (corresponding to an
untaken branch), or the one at the branch target (corresponding to a taken branch).
A branch predictor helps the processor make an intelligent guess about whether a
branch will be taken or not. It does this by gathering statistics about how often
particular branches have been taken in the past.

Modern branch predictors are extremely sophisticated, and can generate very
accurate predictions. Raspberry Pi 3’s extra performance is partly a result of
improvements in branch prediction between Cortex-A7 and Cortex-A53. However,
by executing a crafted series of branches, an attacker can mis-train a branch
predictor to make poor predictions.
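As an illustration, here is a minimal sketch of the classic two-bit saturating-counter predictor (a common textbook design; the class name and training counts are my own choices, not anything specific to a real ARM core). A branch that is consistently taken trains it to predict “taken”, and a crafted run of not-taken branches mis-trains it:

```python
# Two-bit saturating counter: states 0-1 predict "not taken",
# states 2-3 predict "taken"; each observed outcome moves one step.
class TwoBitPredictor:
    def __init__(self):
        self.state = 1                    # start weakly not-taken

    def predict(self):
        return self.state >= 2            # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
for _ in range(10):                       # a loop branch, taken every time
    p.update(True)
trained = p.predict()                     # now predicts taken

for _ in range(10):                       # attacker-crafted not-taken runs
    p.update(False)
mistrained = p.predict()                  # now predicts not taken
```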

What is speculation?
Reordering sequential instructions is a powerful way to recover more instruction-
level parallelism, but as processors become wider (able to triple- or quadruple-issue
instructions) it becomes harder to keep all those pipes busy. Modern processors
have therefore grown the ability to speculate. Speculative execution lets us issue
instructions which might turn out not to be required (because they may be
branched over): this keeps a pipe busy (use it or lose it!), and if it turns out that the
instruction isn’t executed, we can just throw the result away.

Speculatively executing unnecessary instructions (and the infrastructure required


to support speculation and reordering) consumes extra energy, but in many cases
this is considered a worthwhile tradeoff to obtain extra single-threaded
performance. The branch predictor is used to choose the most likely path through
the program, maximising the chance that the speculation will pay off.

To demonstrate the benefits of speculation, let’s look at another example:

t = a+b
u = t+c
v = u+d
if v:
w = e+f
x = w+g
y = x+h

Now we have dependencies from t to u to v , and from w to x to y , so


a two-way out-of-order processor without speculation won’t ever be able to fill its
second pipe. It spends three cycles computing t , u , and v , after which it
knows whether the body of the if statement will execute, in which case it then
spends three cycles computing w , x , and y . Assuming the if
(implemented by a branch instruction) takes one cycle, our example takes either
four cycles (if v turns out to be zero) or seven cycles (if v is non-zero).

If the branch predictor indicates that the body of the if statement is likely to
execute, speculation effectively shuffles the program like this:
t = a+b
u = t+c
v = u+d
w_ = e+f
x_ = w_+g
y_ = x_+h
if v:
w, x, y = w_, x_, y_

so we now have additional instruction level parallelism to keep our pipes busy:

t, w_ = a+b, e+f
u, x_ = t+c, w_+g
v, y_ = u+d, x_+h
if v:
w, x, y = w_, x_, y_

Cycle counting becomes less well defined in speculative out-of-order processors,


but the branch and conditional update of w , x , and y are (approximately)
free, so our example executes in (approximately) three cycles.

What is a cache?
In the good old days*, the speed of processors was well matched with the speed of
memory access. My BBC Micro, with its 2MHz 6502, could execute an instruction
roughly every 2µs (microseconds), and had a memory cycle time of 0.25µs. Over
the ensuing 35 years, processors have become very much faster, but memory only
modestly so: a single Cortex-A53 in a Raspberry Pi 3 can execute an instruction
roughly every 0.5ns (nanoseconds), but can take up to 100ns to access main
memory.

At first glance, this sounds like a disaster: every time we access memory, we’ll end
up waiting for 100ns to get the result back. In this case, this example:

a = mem[0]
b = mem[1]

would take 200ns.

However, in practice, programs tend to access memory in relatively predictable


ways, exhibiting both temporal locality (if I access a location, I’m likely to access it
again soon) and spatial locality (if I access a location, I’m likely to access a nearby
location soon). Caching takes advantage of these properties to reduce the average
cost of access to memory.

A cache is a small on-chip memory, close to the processor, which stores copies of
the contents of recently used locations (and their neighbours), so that they are
quickly available on subsequent accesses. With caching, the example above will
execute in a little over 100ns:

a = mem[0] # 100ns delay, copies mem[0:15] into cache


b = mem[1] # mem[1] is in the cache

From the point of view of Spectre and Meltdown, the important point is that if you
can time how long a memory access takes, you can determine whether the
address you accessed was in the cache (short time) or not (long time).
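A toy model makes the timing observation concrete (the class name, the 1 ns / 100 ns costs, and the 16-entry line size are illustrative choices of mine, not measurements of real hardware):

```python
LINE = 16                   # entries pulled in per cache line (illustrative)

class ToyMemory:
    """Memory whose accesses cost 100 on a miss and 1 on a hit."""
    def __init__(self):
        self.cache = set()  # set of cached line numbers

    def access(self, addr):
        line = addr // LINE
        if line in self.cache:
            return 1        # hit: the data is already on-chip
        self.cache.add(line)
        return 100          # miss: fetch the whole line from main memory

mem = ToyMemory()
first = mem.access(0)       # miss: loads the line holding mem[0]
second = mem.access(1)      # hit: same line
```

Comparing first (100) against second (1) is exactly the hit-or-miss observation the attacks rely on.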

What is a side channel?


From Wikipedia:

“… a side-channel attack is any attack based on information gained from the


physical implementation of a cryptosystem, rather than brute force or theoretical
weaknesses in the algorithms (compare cryptanalysis). For example, timing
information, power consumption, electromagnetic leaks or even sound can provide
an extra source of information, which can be exploited to break the system.”

Spectre and Meltdown are side-channel attacks which deduce the contents of a
memory location which should not normally be accessible by using timing to
observe whether another, accessible, location is present in the cache.

Putting it all together


Now let’s look at how speculation and caching combine to permit a Meltdown-like
attack on our processor. Consider the following example, which is a user program
that sometimes reads from an illegal (kernel) address, resulting in a fault (crash):

t = a+b
u = t+c
v = u+d
if v:
w = kern_mem[address] # if we get here, fault
x = w&0x100
y = user_mem[x]

Now, provided we can train the branch predictor to believe that v is likely to be
non-zero, our out-of-order two-way superscalar processor shuffles the program like
this:

t, w_ = a+b, kern_mem[address]
u, x_ = t+c, w_&0x100
v, y_ = u+d, user_mem[x_]

if v:
# fault
w, x, y = w_, x_, y_ # we never get here

Even though the processor always speculatively reads from the kernel address, it
must defer the resulting fault until it knows that v was non-zero. On the face of it,
this feels safe because either:

v is zero, so the result of the illegal read isn’t committed to w


v is non-zero, but the fault occurs before the read is committed to w

However, suppose we flush our cache before executing the code, and arrange a ,
b , c , and d so that v is actually zero. Now, the speculative read in the third
cycle:

v, y_ = u+d, user_mem[x_]

will access either userland address 0x000 or address 0x100 depending on the
eighth bit of the result of the illegal read, loading that address and its neighbours
into the cache. Because v is zero, the results of the speculative instructions will
be discarded, and execution will continue. If we time a subsequent access to one of
those addresses, we can determine which address is in the cache. Congratulations:
you’ve just read a single bit from the kernel’s address space!
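The whole sequence condenses into a toy Python simulation (names are my own; this models the cache side effect of the speculative read rather than performing one, so it is a sketch of the bookkeeping, not an exploit):

```python
LINE = 16

def leak_bit8(secret):
    """Recover bit 8 of `secret` via the cache side channel (toy model)."""
    cache = set()                        # cache flushed before the attack

    def touch(addr):                     # a load's side effect on the cache
        cache.add(addr // LINE)

    def timed(addr):                     # 1 on a hit, 100 on a miss
        return 1 if addr // LINE in cache else 100

    # --- speculatively executed, results later discarded ---
    x = secret & 0x100                   # either 0x000 or 0x100
    touch(x)                             # user_mem[x] pulled into the cache
    # --- architectural state rolled back, cache state survives ---

    # Time the two candidate addresses to see which one is cached:
    return 1 if timed(0x100) < timed(0x000) else 0
```

For instance, leak_bit8(0x3167) returns 1 and leak_bit8(0x3067) returns 0, matching bit eight of each value.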
The real Meltdown exploit is substantially more complex than this (notably, to avoid
having to mis-train the branch predictor, the authors prefer to execute the illegal
read unconditionally and handle the resulting exception), but the principle is the
same. Spectre uses a similar approach to subvert software array bounds checks.

Conclusion
Modern processors go to great lengths to preserve the abstraction that they are in-
order scalar machines that access memory directly, while in fact using a host of
techniques including caching, instruction reordering, and speculation to deliver
much higher performance than a simple processor could hope to achieve.
Meltdown and Spectre are examples of what happens when we reason about
security in the context of that abstraction, and then encounter minor discrepancies
between the abstraction and reality.

The lack of speculation in the ARM1176, Cortex-A7, and Cortex-A53 cores used in
Raspberry Pi renders us immune to attacks of this sort.

* days may not be that old, or that good


89 comments

Tim says: 5th Jan 2018 at 5:14 pm

Great news, thank you!

Reply

Tobias Huebner says: 5th Jan 2018 at 5:36 pm

Having read this, I feel smarter. Didn’t really understand it, but I feel smarter.

Reply

Pete says: 5th Jan 2018 at 5:41 pm

Thanks, great explanation and I feel confident Raspberry Pis are tough little computers.

If I am using Raspbian x86 desktop on Oracle Virtual Box, does the Virtual CPU have the
exploit or is it protected by the Host OS. I am assuming the host hardware may have the
exploit patched.

Reply

MW says: 5th Jan 2018 at 6:29 pm

That is not relevant to ARM CPU of Raspberry Pi

Reply

Joshua Barretto says: 5th Jan 2018 at 11:07 pm

A virtual Raspberry Pi is vulnerable because the problem is dependent on the underlying


chip. Besides, the emulator likely doesn’t emulate the RPi’s CPU down to that level of
detail: it doesn’t need to.

Reply
Niall says: 6th Jan 2018 at 2:22 pm

If you’re using an emulated CPU, I imagine you’re safe — after all, implementing
complicated parallelism in software will only serve to slow the program down, and
hardware parallelism is intended to make software run faster.

HOWEVER…

I don’t believe Virtualbox does any processor emulation at all — it simply mediates
between the host operating system and the guest environments, but passes through
x86 commands to the host x86 processor.

The big issue with Spectre and Meltdown is that they can actually break out of a virtual
machine and access system memory for the host.

Reply

Martin Bonner says: 5th Jan 2018 at 5:51 pm

That is an *excellent* summary of how Spectre and Meltdown work. (The fact that the Pi
is immune is just a bonus).

Reply

Shelley Powers says: 5th Jan 2018 at 6:03 pm

Wow. Silver lining related to Meltdown and Spectre: I’m learning a whole lot more about
how processors work.

Reply

Antz says: 5th Jan 2018 at 6:05 pm

Eben, could you please clarify how this A53 CPU feature is different from the one that is
exploitable by Spectre?

http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500g/CHDGIAHH.html

Reply

Eben Upton says: 5th Jan 2018 at 6:40 pm

That link refers to speculative fetches of instructions, as opposed to speculative


execution. The former is much more common than the latter, as without it the processor
will frequently stall waiting for instructions from memory, crippling performance.

Why don’t speculative instruction (and data) fetches introduce a vulnerability? Because
unlike speculative execution they don’t lead to a separation between a read instruction
and the process (whether a hardware page fault or a software bounds check) that
determines whether that read instruction is allowed.

Reply

Antz says: 5th Jan 2018 at 6:56 pm

Awesome, thank you!

Reply

Jeremy says: 5th Jan 2018 at 6:06 pm

+1

A well written and easy to understand introduction to some aspects of modern CPU
design (add some more about instruction fusion and the crucial register renaming) and
it should be permanently published in the education section.

Reply
Shane Johnson says: 5th Jan 2018 at 6:08 pm

Excellent write-up, Eben. Thank you.

Reply

Jeff Schoby says: 5th Jan 2018 at 6:10 pm

I’m still a little unclear on how knowing what memory address is in processor cache gets
us actual data from memory.

Reply

Eben Upton says: 5th Jan 2018 at 6:25 pm

Imagine the value at the kernel address, which gets loaded into _w, was 0xabde3167.
Then the value of _x is 0x100, and address user_mem[0x100] will end up in the cache. A
subsequent load of user_mem[0x100] will be fast.

Now imagine the value at the kernel address, which gets loaded into _w, was
0xabde3067 . Then the value of _x is 0x000, and address user_mem[0x000] will end up in
the cache. A subsequent load of user_mem[0x100] will be slow.

So we can use the speed of a read from user_mem[0x100] to discriminate between the
two options. Information has leaked, via a side channel, from kernel to user.

Reply

Sam says: 5th Jan 2018 at 7:41 pm

I still don’t get the *depending on the eighth bit of the result of the illegal read* & *you’ve
just read a single bit from the kernel’s address space* part of this and other articles.
Why 8th bit ? Is that the privilege bit in L1$ ? How does this process leak just 1 bit and
not a byte/word/etc ?

Reply

Martijn says: 5th Jan 2018 at 8:08 pm

The “8th bit” comes from the x_ = w_&0x100 instruction. This is a mask-instruction:
– if the 8th bit in w_ is 1, then x_ = 100.
– if the 8th bit in w_ is 0, then x_ = 000.

The subsequent read of user_mem[x_] causes either address 100 or address 000 to be
brought into the cache, depending on whether the 8th bit in w_ is 1 or 0. By reading
address 100 again and measuring how long it takes, you can determine whether 100 or
000 was brought into the cache.

Reply

Evan says: 5th Jan 2018 at 8:13 pm

In this particular example, you know whether the eighth bit of a particular kernel address
is 1 or 0. You can use the exact same principle to leak any other bit of an address, so
you can do this eight times with different operands to & to get an entire byte. Do it
another eight times and you can read the entire byte at the next address, and so on. It’s
slow, but you can eventually read out the entire kernel address space that way, which
would potentially allow you to compromise the operating system.

Reply
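Evan’s bit-at-a-time procedure can be sketched as a toy simulation (names invented; the “cache” here is a plain set of addresses rather than lines, and the “secret” is just a local constant, so this models the bookkeeping rather than a working exploit):

```python
SECRET = 0xA7                        # toy stand-in for a kernel byte

def leak_bit(mask):
    cache = set()                    # flush the cache before each round
    probe = SECRET & mask            # address touched "speculatively"
    cache.add(probe)                 # the side effect survives the squash
    # timed probe: address `mask` is fast only if it was just touched
    return 1 if mask in cache else 0

recovered = 0
for bit in range(8):                 # one leaked bit per mask
    recovered |= leak_bit(1 << bit) << bit
```

After eight rounds, recovered equals 0xA7: the whole byte, one leaked bit at a time.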

Naveen Michaud-Agrawal says: 5th Jan 2018 at 8:46 pm

It might be easier to see Eben’s example in binary instead of hex (and we’ll use a 16-bit
architecture to make it easier to see):

value in _w: 0x3167 in binary is 0b0011000101100111


0x0100 in binary is 0b0000000100000000
So if you AND them together you get: 0b0000000100000000, which tells you the 8th bit.
And then in subsequent code, you AND against other single bit values, thereby being
able to read out arbitrary amounts of kernel memory.

Reply

Janghou says: 5th Jan 2018 at 7:55 pm

In `w&0x100`

0x100 is a hexadecimal integer literal, and & is the `bitwise and` operator:

w & 0x100

if I’m not mistaken.

Reply

Patches says: 5th Jan 2018 at 6:12 pm

@Pete: The Raspberry Pi computer itself is invulnerable to the bug. The Intel or AMD-
based computer you are running Raspbian on is not.

So if you are running Raspbian x86 you will need to install patches in both your Raspbian
guest operating system and your host operating system to be safe from the
vulnerability.

Reply

MW says: 5th Jan 2018 at 6:32 pm

That is wrong: the Raspbian operating system only runs on ARM CPUs.

The x86 version is Debian x86-32….

This post is about the ARM CPU of the Raspberry Pis

Reply

Peter Dolkens says: 6th Jan 2018 at 2:56 am

It’s not wrong at all – he’s responding to the comment that specifically asked about
virtualized Raspbian – aka Raspbian running on x86.

Reply

don isenstadt says: 5th Jan 2018 at 6:17 pm

Eben .. thanks so much for this … I read through it once but will reread to hopefully
understand it better .. It is nice to get education instead of hysteria! If you were willing to
pay with performance to get security could you simply turn off speculation? Is that what
the news was referring to when they say the fix will cause a 30% degradation in
performance?

Maybe we should have raspberry pi terminals communicating to IBM Z mainframes!

Thanks again!
-don

Reply

Evan Hildreth says: 5th Jan 2018 at 6:36 pm

> If you were willing to pay with performance to get security could you simply turn off
speculation? Is that what the news was referring to when they say the fix will cause a
30% degradation in performance?

From what I understand (so I could be wrong!), the 30% degradation comes from
additional checks added at the operating system level to make sure there are no security
leaks. This particularly surrounds programs that read and write a lot of files to the disk.

Normally, this works by:


– Program asks OS for file
– OS reads file into memory
– Program reads file from memory

This context-switching (from the program to the operating system and back again) is
computationally expensive, so modern processors have–at a very low level–blended the
two contexts. From what I’ve gathered, the “fix” for this is to have the OS perform extra
checks to make sure no cached data is being leaked. For some programs, it’s a
negligible difference (Apple is claiming no noticeable difference for most of their
customers); other programs like databases, however, will probably see all of that 30%
drop.

I hope this helps. I also hope this was right!

Reply

Mark Woodward says: 5th Jan 2018 at 7:57 pm

Turning off “speculation” is not possible in software. Maybe intel could implement that in
microcode and issue an update, but that is an even far more complicated discussion.

The performance hit comes from the Linux kernel mapping and unmapping the kernel.
Currently, process memory is divided in two: the low half is process space (unique to
each process), the upper half is kernel space (shared with all processes). The processor
is supposed to protect the kernel memory, but these hardware bugs break that
protection.

The fix is therefore to map the kernel space on entry to a kernel call and unmap it upon
return to the process. This can be a time consuming set of operations.

Reply

Eben Upton says: 5th Jan 2018 at 8:16 pm

One almost wishes that they’d stuck with the original name for the KPTI patchset:
Forcefully Unmap Complete Kernel With Interrupt Trampolines.

https://www.theregister.co.uk/2018/01/02/intel_cpu_design_flaw/

Reply

Bill Stephenson says: 6th Jan 2018 at 2:30 am

That is hilarious!!!!

Eben Upton says: 5th Jan 2018 at 6:44 pm

I believe the performance degradation projections (which are on the order of 5% for
most real benchmarks) are based on the cost of adding Kernel Page Table Isolation to
the Linux kernel.

Disabling speculation, even if possible, would have a much larger impact.

Reply

Shannon says: 5th Jan 2018 at 6:20 pm

This is the best “tutorial” I have seen on this subject. The side effect of this attack has
been a better awareness of modern processor architecture. It is unfortunate that this
had to happen to get folks to draw back the curtain on this, instead of keeping the
pretense of everything being scalar and in order. It does matter in many more instances
than people think.

Thank you for the awesome lesson in processor technology.

Reply

Luyanda Gcabo says: 5th Jan 2018 at 6:25 pm

Thanks for this. Even though you mention that the real exploit is more complex, this
gives the context. I feel like I could read more and more on this topic.
Happy New Year*
Reply

Eben Upton says: 5th Jan 2018 at 6:40 pm

Happy New Year to you too.

Reply

styfle says: 5th Jan 2018 at 7:04 pm

Thanks for the excellent explanation!

I wonder, can a python program actually exploit this bug?

And how can you reliably “time a subsequent access to one of those addresses”?

Reply

Eben Upton says: 5th Jan 2018 at 7:48 pm

You really need to be down at the machine-language level to manipulate this (and to be
able to do unchecked pointer arithmetic).

I’m not particularly au fait with Intel high-performance timing. If you’re in ring 0 (the
kernel) you can probably use the performance counters:
kernel) you can probably use the performance counters:

https://www.blackhat.com/docs/us-15/materials/us-15-Herath-These-Are-Not-Your-
Grand-Daddys-CPU-Performance-Counters-CPU-Hardware-Performance-Counters-For-
Security.pdf

I can imagine that in userland you may need to loop the attack to get enough signal-to-
noise. There’s some discussion in the Spectre paper:

https://spectreattack.com/spectre.pdf

about incrementing a counter on another thread to generate a sufficiently accurate time


reference.

Reply

Dave Jones says: 5th Jan 2018 at 11:17 pm

As a point of interest Python 3.3+ does have time.perf_counter() which is meant to be


high resolution. Whether that actually queries HPET or not (on a PC) I can’t recall but the
info’s probably buried somewhere in PEP-418 (https://www.python.org/dev/peps/pep-
0418/). Also unchecked integer arithmetic is possible by abusing certain things (e.g.
ctypes).

That said, I’m sure Eben’s right about needing to be closer to machine code. The
overhead of the CPython interpreter and the GC are probably sufficient to make it either
outright impossible, or at least extremely difficult, to implement in pure Python (i.e.
without resorting to some externally compiled module).

Reply

Caroline says: 5th Jan 2018 at 7:16 pm

Thank you for the incredible post! I understand so much more now.

Reply

Allison Reinheimer Moore says: 5th Jan 2018 at 7:18 pm

This is fantastically friendly and clear. Thanks so much for the accessible explanation!
I’m much less confused than I was before.

Reply
Rohit says: 5th Jan 2018 at 7:21 pm

Thanks for the wonderfully explained post, Eben. You’ve explained a complex concept in
a really simple manner.

Reply

Jamie says: 5th Jan 2018 at 7:23 pm

So did you have an Archimedes too, or did you defect to Amiga? =)

Reply

Eben Upton says: 5th Jan 2018 at 7:40 pm

I defected to the Amiga: a shop-soiled A600, for £200 just after Christmas 1992. Couldn’t
afford an Archimedes, though I drooled over the single unit my school could afford.

Feels good to hold the record for shipping the largest number of units of Archimedes-
compatible hardware.

Reply

z says: 5th Jan 2018 at 7:46 pm

For Christ’s sake, why should a userspace program ever “flush the cache”?

Reply

Eben Upton says: 5th Jan 2018 at 7:49 pm

Are you asking why a userspace program should be allowed access to a cache flush
primitive?

Reply

James T. Carver says: 5th Jan 2018 at 8:04 pm

This might be good except for the fact that Arm themselves stated that those devices
are affected by the “bug”, if one could really call it that.

Reply

Eben Upton says: 5th Jan 2018 at 8:12 pm

[citation needed]

Reply

Helen Lynn says: 5th Jan 2018 at 8:18 pm

Arm’s statement lists the processors affected, which don’t include those used in
Raspberry Pis. As that statement says, “[o]nly affected cores are listed, all other Arm
cores are NOT affected“.

Reply

Louis Parkerson says: 5th Jan 2018 at 8:10 pm

You are a mind reader! I was thinking about this problem earlier and I was about to ask it
on the forums then this long and helpful post pops up.

Reply

James Wright says: 5th Jan 2018 at 8:28 pm


Thank you for a fantastic explanation, it should be preserved somewhere for educational
purposes!

Perhaps we should look at far simpler CPU designs more seriously as they say,
“complexity kills”. SUBLEQ anyone? :-)

All this talk of scalar vs superscalar takes me back to the day I got my 68060 (a
superscalar CPU) expansion board for my Amiga 1200 and overclocked it from 50MHz
to 66MHz by simply soldering on a different clock crystal! :-)

PS: Love the Raspberry Pi, it’s really put the fun back into computing, keep up the great
work!

Reply

Eben Upton says: 5th Jan 2018 at 9:47 pm

I was a 68000 junkie for three years in the early 90s. Beautiful architecture: in a more
just world it, or its descendants, would have won out.

Reply

James Wright says: 5th Jan 2018 at 9:57 pm

I learnt assembly on a 68000 (Amiga) in the early 90’s, imagine my horror when I moved
to x86! :-)

Reply

Eben Upton says: 5th Jan 2018 at 11:29 pm

Sadly I do not need to use my imagination, having followed a similar road myself.

Reply

Travis Johnson says: 5th Jan 2018 at 8:42 pm

Thanks for the article!

The Cortex A53 boasts an “Advanced Branch Predictor” which I assumed to mean it
supports speculative execution. If the processor isn’t using the branch prediction to pre-
execute instructions is it using it for instruction re-ordering? What’s the point of branch
prediction without speculative execution of the predicted branch?

Reply

Eben Upton says: 5th Jan 2018 at 9:45 pm

A branch predictor, and branch target buffer, are useful even without speculative
execution because they give you a hint about which instructions to admit to the pipeline
next while you wait for the branch condition to resolve.

Cortex-A53 isn’t capable of “real” speculative execution because it can’t stash the results
of instructions which are started speculatively. This means that the pipeline bogs down
quite fast if resolution of the branch condition is significantly delayed, and critically the
chained dependent memory accesses that both attacks rely on to modify cache state
can never happen.

Perhaps I do need to write about register renaming: I’d been hoping to avoid that.

Reply

Daniel says: 5th Jan 2018 at 11:06 pm

Then why do Cortex-A53 and Cortex-A7 implement PMU event 0x10? It counts the
number of “mispredicted or not predicted branches speculatively executed”. I doubt
ARM implemented it to always return zero.

Reply
Dan Huby says: 5th Jan 2018 at 9:47 pm

I read the CPU technical manual and the only section on branch prediction I could find
referred to preemptively loading the set of instructions in the branch, but not executing
them.

Reply

Dan Huby says: 5th Jan 2018 at 9:48 pm

Sorry, it looks like Eben replied while I was typing!

Reply

solar3000 says: 5th Jan 2018 at 9:55 pm

Eben doesn’t type. He has a pi glued to his brain via GPIO.

Reply

jdb says: 5th Jan 2018 at 9:55 pm

There’s an important difference between branch prediction and speculative execution.

Branch prediction guesses what *instructions* are likely to be executed next.


Speculative execution precomputes the *results* of the instructions on both sides of the
branch, before deciding the path that the branch took and discarding (retiring) the
results of the non-executed instructions.

The branch predictor’s job is to keep the instruction pipelines in an in-order core full by
guessing the most likely instruction flow after a branch instruction. It does this by
storing and comparing the results of previous branch instructions and by using certain
architectural hints, like predicting a forwards branch to be not-taken and a backwards
branch to be taken.

The branch predictor in an in-order core only affects the instruction cache, by predicting
and speculatively fetching what instructions need to be in the Icache ahead of time. The
vast majority of modern processors (ARM1176 included) have split instruction and data
caches at the innermost level, so a data cache timing attack will not reveal anything
about the direction the branch predictor took. Additionally, fooling a branch predictor into
speculatively fetching something that is not an instruction will not work – page table
structures have dedicated bits that specify whether a particular memory page contains
instructions or data (see the NX bit for x86), and fetching instructions from data pages
will almost certainly result in an access violation.
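The history-based guessing described above can be sketched as a toy model in Python. This is purely illustrative — real predictors are hardware tables indexed by branch address, and the 2-bit saturating counter shown here is just one classic scheme:

```python
# Toy 2-bit saturating-counter branch predictor (illustrative only;
# real predictors are hardware tables indexed by branch address).
# States 0-1 predict "not taken"; states 2-3 predict "taken".
class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start in "weakly taken"

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Nudge the counter one step toward the actual outcome.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch: taken nine times, then falls through once.
predictor = TwoBitPredictor()
hits = 0
for actual in [True] * 9 + [False]:
    if predictor.predict() == actual:
        hits += 1
    predictor.update(actual)
# hits == 9: only the final fall-through is mispredicted.
```

The appeal of this scheme is that a single surprising outcome (one early loop exit) doesn't flip the prediction for the next run of the loop — the counter has to be wrong twice in a row before its guess changes.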

Reply

Travis Johnson says: 5th Jan 2018 at 10:21 pm

Thanks for the replies. Makes total sense.

Reply

Tomm says: 5th Jan 2018 at 9:28 pm

I hope the Pi that will be released in 2019 (speculation?) also continues using the ARM Cortex-A53 :)

Reply

Eben Upton says: 5th Jan 2018 at 9:45 pm

:)

Reply
solar3000 says: 5th Jan 2018 at 9:54 pm

Holy! Eben is awesome! Met him at a Maker Faire in NYC.


And Liz too!

Reply

Beckalooo says: 5th Jan 2018 at 9:59 pm

Woah. I understood that and was able to follow it to the end of the article!

Thanks, Eben, you are a fine writer.

Reply

LH says: 5th Jan 2018 at 11:01 pm

From what I read, AMD seems to reject the claim that their chips are affected by Meltdown. Does
this mean that AMD chips don’t implement speculative execution? Can’t imagine that,
however…

Reply

Eben Upton says: 5th Jan 2018 at 11:19 pm

It’s perfectly possible to implement an out-of-order core with speculation that isn’t
vulnerable to Meltdown. For example, of ARM’s out-of-order cores, only Cortex-A75 is
vulnerable. Intel cores are vulnerable because of a design choice not to prevent
speculative loads from illegal addresses, but instead to rely on a delayed fault (or
instruction non-retire) to suppress the result.

Reply

Kaitain says: 6th Jan 2018 at 12:11 am

Ah! This is exactly what I was wondering about while reading the article. (“But why is the
illegal fetch allowed at all in the first place?”) It seems to me like a reasonable thing
to fault if someone has written code with an illegal access *even if in practice the
branch with that access is never officially executed*.

Reply

Ed says: 6th Jan 2018 at 5:03 am

You can’t fault just because a speculative instruction is invalid. Think of this simple
pattern that’s used everywhere in C/C++ code:

if (pointer != NULL)
pointer->data = value;

Check if you have a valid memory address, and if so, do something with it. If you throw a
fault based on speculative instructions, you’ll be faulting constantly on code like that.

(Implementation details: NULL is zero. Memory addresses at or close to zero are always
marked invalid in the page table, and trigger a fault when accessed. This is done so that
a lot of bad code will crash immediately instead of writing garbage over real data.)

Reply

Kaitain says: 6th Jan 2018 at 12:19 pm

Okay, so:

> Intel cores are vulnerable because of a design choice not to prevent speculative loads
from illegal addresses

If the other design choice had been taken, what would it have looked like?
SpeculaArrg says: 6th Jan 2018 at 1:37 pm

Shouldn’t the susceptibility to Meltdown be implementation-specific as well as model-specific?


Or is validating permissions on memory access before committing, rather than before
loading, part of the specified A-75 micro-architecture?

Reply

pjt says: 5th Jan 2018 at 11:16 pm

Good writing. Thanks!

Reply

Richard Collins says: 5th Jan 2018 at 11:29 pm

I think I understand. Please correct if this is wrong.


So you’re saying:

You have an if that will evaluate to false but that tries to read from kernel memory. (If you
did read this memory directly, it would raise an exception.)

You ensure the cache is flushed so that when the CPU speculatively executes the read
from kernel memory the value will be in the cache.

The if is then checked and is found to be false and so an exception is not fired as the
CPU pretends it was not executed.

Now the memory read from the kernel is in the cache (because of the speculative
execution) and is in the same place that our user space memory would have been
because of how the cache is aliased against the whole address space. And this is what
allows you to read it????

Reply

Kaitain says: 6th Jan 2018 at 12:22 am

It’s a bit more complex than that. The memory from the kernel is not loaded into the
cache. However, a section of (legal) user memory is loaded into the cache whose
address is based in part on a tiny piece of the (illegal) kernel memory, in an operation
that officially never happened but whose cache fetch has been left as a side-effect. By
attempting to read that legal memory in a subsequent legal operation, and timing how
long it takes, you can reason backwards to what that tiny piece of kernel memory was
that you were never supposed to have been able to access. You can’t read it directly, but
you can infer its value from the side-effects of the phantom operation (the speculative
fetch).
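The phantom-footprint idea described above can be sketched in a few lines of Python. This is a pure simulation — a Python set stands in for "cache lines that are resident", the latencies are invented numbers, and nothing here touches real hardware:

```python
# Simulation of the Meltdown/Spectre cache side channel (nothing here
# touches real hardware; the cache and latencies are both made up).
CACHE_LINE = 256      # one probe-array slot per possible byte value
cache = set()

def load(addr):
    """Return a fake latency: fast if addr was cached, slow otherwise."""
    latency = 10 if addr in cache else 200
    cache.add(addr)   # any load brings the line into the cache
    return latency

secret = 0x2A         # the kernel byte the attacker cannot read directly

# Phase 1: flush the cache, then the phantom (speculative) access uses
# the secret as an index, leaving a footprint without reporting a value.
cache.clear()
load(secret * CACHE_LINE)

# Phase 2: legally time a load of every probe slot; the one fast access
# reveals which index the phantom load must have used.
timings = {v: load(v * CACHE_LINE) for v in range(256)}
recovered = min(timings, key=timings.get)
# recovered == 0x2A: the secret, inferred purely from timing.
```

A real attack replaces the set with actual cache state and the fake latencies with a cycle-accurate timer, but the inference step is exactly this: find the one probe slot that comes back fast.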

Reply

James Wright says: 6th Jan 2018 at 12:55 am

Nearly: you compute a memory address based on a single bit in the hidden value and
then access that memory address, all within the branch that will ultimately be thrown
away. However, you can still determine the hidden value by timing how long it then takes
to access that memory address, because if it’s super fast, then you know it must be in
the CPU’s cache and not main memory (as it has been used before in the branch that got
thrown away). Well done, you’ve just discovered a single bit of memory that you were
never meant to see… now repeat for the rest… :-)

Reply

Cenek says: 6th Jan 2018 at 12:20 am

You do not have to directly read data from within the cache. Your cache was flushed,
so either array[0] or array[4] is not cached. Then, after execution, a timed read of one
of these two values will leak whether your particular value is cached or not. The two
cases can be told apart by the delay: a short time means the data was cached, a longer
one means it was read directly from memory… which is exactly one bit of information ;)

Reply
S. Rose says: 6th Jan 2018 at 1:53 am
BLOG DOWNLOADS COMMUNITY HELP FORUMS EDUCATION

This is extremely well-written, and if a reader has the patience to read it through
carefully, that reader comes away with an understanding not just of how a kernel-
reading exploit could be constructed, but also twenty years of advances in CPU design. I
am in awe, and reassured to have people like Eben Upton on the users’ side.

Reply

Badger says: 6th Jan 2018 at 2:38 am

Exactly what i was thinking.

(say about 1% of readers)

:)

Reply

Aung Thu Htun says: 6th Jan 2018 at 7:31 am

pardon my noobie question,


does that mean smartphones made with Cortex-A53 processor are not affected by
meltdown & spectre?
(e.g Xiaomi’s Honor 7X Smartphone with CPU : Octa-core (4×2.36 GHz Cortex-A53 &
4×1.7 GHz Cortex-A53) OS : Android 7.0 (Nougat)-)
million thanks….

Reply

Mike Morrow says: 6th Jan 2018 at 9:00 am

What is not discussed is that you have to have a program running that is doing these
timings. That, alone, would skew the numbers as control is taken from one thread and
given to another. Or… if the pipeline is stuck waiting on something, where is this program
going to run? I guess it could run on an additional core. Then it would have to be running
really tight code to get these timings.

In fact, it seems like the program would have to run faster than the cycle time of the core
to be able to watch what is happening (timing wise) in another core or memory or cache or
whatever it is exactly watching. This seems on the face of it to be impossible, since you
have to run faster than what you are timing for timing to be usable. Am I missing
something here, or was it just left out?

So far, all of this seems theoretical. It seems like you would need another, faster
processor to time the decision processes of the other, slower processor. How can all this
actually run on the same CPU, even with multiple cores? Maybe the timing program can run
faster after its memory is all in cache. But then it has to collect and eventually send
this data out, so it is subject to the same speed restrictions on the internal bus(es) as
the program it is watching. Seems all very theoretical and not particularly practical.
Where is this wrong? My speculations must be wrong if this can actually be done.

Reply

John Read says: 6th Jan 2018 at 9:14 am

This is a great article explaining the processor issues and the operation of the current
software fixes. In terms of future processor architecture design, how easy will it be to
design this out for ARM and Intel, and will it be possible to do so without suffering a
significant performance hit in future processor designs? Are there any designs in the
pipeline that take a different approach to speculation and parallel pipelines from the
current generation of processor architectures?

Reply

It's FOSS says: 6th Jan 2018 at 9:24 am

And all these years, ARM has been ignored by major players. At least this should make
people think about wider adoption of ARM processors.

Reply
Terry Coles says: 6th Jan 2018 at 9:58 am

Eben,

Long before I retired, I worked for some years as a Technical Author; an experience that
makes me super-critical of so-called technical journalists and authors who don’t really
understand their topic.

I have to say that this is the best bit of technical journalism that I’ve read for years. After
technical authorship I worked as an engineer in the test industry, but 30 years of that did
not equip me to understand the intricacies of CPU architecture. Your posting has
impressed me most because it doesn’t assume that the reader knows anything except a
slight grounding in electronics engineering and computer science, but to me anyway, is
incredibly readable.

Thanks for this and for everything else that you’ve done for education.

Reply

Eben Upton says: 6th Jan 2018 at 10:33 am

Thanks Terry – that means a lot. These days I don’t often get a continuous block of time
required to write this sort of thing, but this felt worth spending a day on. I ran out of time
before getting to the detail of Spectre, but I’ve started adding some relevant material
(e.g. branch prediction) to the post, and hope to get to it this week.

Reply

Barnaby says: 6th Jan 2018 at 10:17 am

Amazingly clear writeup; more content like this please.

Reply

Taivas says: 6th Jan 2018 at 10:37 am

Very good article!

Thanks for explaining this complex matter in an easy to read and understandable way
for everybody.

Keep on writing technical stories like this.

Reply

Martin Whitfield says: 6th Jan 2018 at 10:39 am

I see what you do Eben.


Raspberry Pi is actually Battlestar Galactica.
:-)
Nice explanation. I now have something to point folks to.

Reply

Geoff S says: 6th Jan 2018 at 11:15 am

Hi Eben. A really helpful and informative post. It made way more sense than the flurry of
pseudo-tech commentaries I’d been reading before today. Thanks for the update and
pleased the Pi is spared.

Say hi to Liz too

Geoff

Reply

Jack Cole says: 6th Jan 2018 at 12:04 pm

Nice article that triggered an equally nice discussion in the comments. Sad to see the
Wikipedia description incorrectly pin side-channel attacks to crypto systems when in fact
such attacks are widely used and not unique to crypto systems. Thanks for taking the time
to write, and for providing a useful product to the general public.

Reply

Cat says: 6th Jan 2018 at 1:24 pm

Reading the white papers, the only (known) way to deploy the Spectre attack would be to
have a kernel with the Berkeley Packet Filter JIT compiled in, which runs in Ring 0. What
if it’s more of a flaw in the BPF or gcc toolchain, and not necessarily a flaw in any
particular processor?

Have we seen it deployed as anything other than taking advantage of any other
methods?

Either way, the fix in the ARM white paper for the issue isn’t computationally expensive.

Reply

Alex says: 6th Jan 2018 at 1:56 pm

Incredibly interesting. Thank you for the write up.

Reply

Jay Boisseau says: 6th Jan 2018 at 3:23 pm

This excellent explanation is a great primer on how (most) modern microprocessors and
compilers attempt to maximize performance, as well as the clearest explanation of the
fundamentals of Meltdown and Spectre. Great job!

Reply

