The Complete Friday Q&A: Volume III
By Mike Ash, Landon Fuller, Matthew Elton and Gwynne Raskind
About The Complete Friday Q&A: Volume III
Friday Q&A is a biweekly series on Mac programming. It can be found online at https://mikeash.com/pyblog/. Volume III is a full archive of all posts from December 2012 to April 2016.
The author gratefully acknowledges all of the topic and comment contributions to Friday Q&A from its readers.
The Complete Friday Q&A: Volume III Copyright © 2012-2017 by Michael Ash
Mike Ash
mike@mikeash.com
https://mikeash.com/
Introduction
It's been a long time since the release of The Complete Friday Q&A: Volume I, and the result is a massive backlog of articles. The contents of this volume were originally intended to be part of Volume II, but the result would have been too large. Instead, I'm releasing Volumes II and III simultaneously, which together contain the articles in question.
Like Volume II, Volume III contains some articles from guest authors. Landon Fuller, Matthew Elton, and Gwynne Raskind contributed articles for this volume, and I'm delighted to present their articles next to mine. Their articles are indicated by bylines under the article title. Articles without bylines are my own.
I hope you enjoy the unusual and occasionally absurd programming content collected here. As always, if you have an idea for a topic that you'd like to see covered in Friday Q&A, send it in!
Acknowledgements
Special thanks go to Landon Fuller, Matthew Elton, and Gwynne Raskind, who contributed articles included in this book. Letting someone write for your blog is like letting them stay in your house and borrow your car, and they lived up to the trust I placed in them and then some.
I would like to thank my reviewers, whose valuable input dramatically improved this book. They are: Harry Jordan, Steven Vandeweghe, Matthias Neeracher, Phil Holland, Alex Blewitt, Landon Fuller, Joshua Pokotilow, and Cédric Luthi.
I would also like to thank everyone who contributed the topic ideas used throughout this book. Their names can be found at the beginning of each chapter.
Finally, I would like to thank everyone who has commented on one of my posts, e-mailed about Friday Q&A, or merely read it. No matter what your contribution, it is appreciated.
Dedication
The Complete Friday Q&A Volumes II and III are dedicated to the memory of my friend and fellow glider club member Steve Zaboji. Steve was killed in a plane crash the same afternoon that I received the proofs of these books. He was a central part of the club and will be deeply missed.
Friday Q&A 2012-12-14:
Objective-C Pitfalls
Related Articles
Windows and Window Controllers
Proper Use of Asserts
Let's Build stringWithFormat:
Swifty Target/Action
Performance Comparisons of Common Operations, 2016 Edition
Objective-C is a powerful and extremely useful language, but it's also a bit dangerous. For today's article, my colleague Chris Denter suggested that I talk about pitfalls in Objective-C and Cocoa, inspired by Cay S. Horstmann's article on C++ pitfalls.
Introduction
I'll use the same definition as Horstmann: a pitfall is code that compiles, links, and runs, but doesn't do what you might expect it to. He provides this example, which is just as problematic in Objective-C as it is in C++:
    if(-0.5 <= x <= 0.5) return 0;
A naive reading of this code would be that it checks to see whether x is in the range [-0.5, 0.5]. However, that's not the case. Instead, the comparison gets evaluated like this:
    if((-0.5 <= x) <= 0.5)
In C, the value of a comparison expression is an int, either 0 or 1, a legacy from when C had no built-in boolean type. It is that 0 or 1, not the value of x, that is compared with 0.5. In effect, the second comparison works as an extremely weirdly phrased negation operator, such that the if statement's body will execute if and only if x is less than -0.5.
Nil Comparison
Objective-C is highly unusual in that sending messages to nil does nothing and simply returns 0. In nearly every other language you're likely to encounter, the equivalent is either prohibited by the type system or produces a runtime error. This can be both good and bad. Given the subject of the article, we'll concentrate on the bad.
First, let's look at equality testing:
    [nil isEqual: @"string"]
Messaging nil returns 0, which in this case is equivalent to NO. That happens to be the correct answer here, so we're off to a good start! However, consider this:
    [nil isEqual: nil]
This also returns NO. It doesn't matter that the argument is the exact same value. The argument's value doesn't matter at all, because messages to nil always return 0 no matter what. So going by isEqual:, nil never equals anything, including itself. Mostly right, but not always.
Finally, consider one more permutation with nil:
    [@"string" isEqual: nil]
What does this do? Well, we can't be sure. It may return NO. It may throw an exception. It may simply crash. Passing nil to a method that doesn't explicitly say it's allowed is a bad idea, and isEqual: doesn't say that it accepts nil.
Many Cocoa classes also include a compare: method. This takes another object of the same class and returns NSOrderedAscending, NSOrderedSame, or NSOrderedDescending, to indicate less than, equal to, or greater than.
What happens if we compare with nil?
    [nil compare: nil]
This returns 0, which happens to be equal to NSOrderedSame. Unlike isEqual:, compare: thinks nil equals nil. Handy! However:
    [nil compare: @"string"]
This also returns NSOrderedSame, which is definitely the wrong answer. compare: will consider nil to be equal to anything and everything.
Finally, like isEqual:, passing nil as the parameter is a bad idea:
    [@"string" compare: nil]
In short, be careful with nil and comparisons. It really doesn't work right. If there's any chance your code will encounter nil, you must check for and handle it separately before you start doing isEqual: or compare:.
Hashing
You write a little class to contain some data. You have multiple equivalent instances of this class, so you implement isEqual: so that those instances will be treated as equal. Then you start adding your objects to an NSSet and things start behaving strangely. The set claims to hold multiple objects after you just added one. It can't find stuff you just added. It may even crash or corrupt memory.
This can happen if you implement isEqual: but don't implement hash. A lot of Cocoa code requires that if two objects compare as equal, they will also have the same hash. If you only override isEqual:, you violate that requirement. Any time you override isEqual:, always override hash at the same time. For more information, see my article on Implementing Equality and Hashing.
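As an illustrative sketch (the Pair class and its properties are hypothetical, and the code assumes non-nil values), a class that keeps the isEqual:/hash contract might look like this:

    @interface Pair : NSObject
    @property (copy) NSString *key;
    @property (copy) NSString *value;
    @end

    @implementation Pair

    - (BOOL)isEqual: (id)other
    {
        if(![other isKindOfClass: [Pair class]])
            return NO;
        Pair *otherPair = other;
        return [self.key isEqual: otherPair.key] && [self.value isEqual: otherPair.value];
    }

    - (NSUInteger)hash
    {
        // Derive the hash from the same values that isEqual: compares,
        // so equal objects always produce equal hashes.
        return [self.key hash] ^ [self.value hash];
    }

    @end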
Macros
Imagine you're writing some unit tests. You have a method that's supposed to return an array containing a single object, so you write a test to verify that:
    STAssertEqualObjects([obj method], @[ @"expected" ], @"Didn't get the expected array");
This uses the new literals syntax to keep things short. Nice, right?
Now we have another method that returns two objects, so we write a test for that:
    STAssertEqualObjects([obj methodTwo], @[ @"expected1", @"expected2" ], @"Didn't get the expected array");
Suddenly, the code fails to compile and produces completely bizarre errors. What's going on?
What's going on is that STAssertEqualObjects is a macro. Macros are expanded by the preprocessor, and the preprocessor is an ancient and fairly dumb program that doesn't know anything about modern Objective-C syntax, or for that matter modern C syntax. The preprocessor splits macro arguments on commas. It's smart enough to know that parentheses can nest, so this is seen as three arguments:
    Macro(a, (b, c), d)
Where the first argument is a, the second is (b, c), and the third is d. However, the preprocessor has no idea that it should do the same thing for [] and {}. With the above macro, the preprocessor sees four arguments:
    [obj methodTwo]
    @[ @"expected1"
    @"expected2" ]
    @"Didn't get the expected array"
This results in completely mangled code that not only doesn't compile, but confuses the compiler so much that it can't provide understandable diagnostics. The solution is easy, once you know what the problem is. Parenthesize the literal so the preprocessor treats it as one argument:
    STAssertEqualObjects([obj methodTwo], (@[ @"expected1", @"expected2" ]), @"Didn't get the expected array");
Unit tests are where I've run into this most frequently, but it can pop up any time there's a macro. Objective-C literals will fall victim, as will C compound literals. Blocks can also be problematic if you use the comma operator within them, which is rare but legal. You can see that Apple thought about this problem with their Block_copy and Block_release macros in /usr/include/Block.h:
    #define Block_copy(...) ((__typeof(__VA_ARGS__))_Block_copy((const void *)(__VA_ARGS__)))
    #define Block_release(...) _Block_release((const void *)(__VA_ARGS__))
These macros conceptually take a single argument, but they're declared to take variable arguments to avoid this problem. By taking ... and using __VA_ARGS__ to refer to the argument, multiple "arguments" with commas are reproduced in the macro's output. You can take the same approach to make your own macros safe from this problem, although it only works on the last argument of a multi-argument macro.
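For example, here's a hypothetical macro of my own devising (not from any Apple header) written the same way; without the variadic declaration, the second call below would be split into two macro arguments and fail to compile:

    // Hypothetical single-argument macro declared as variadic so that
    // commas inside [] literals pass through as part of one argument.
    #define MY_DESCRIBE(...) NSLog(@"%@", __VA_ARGS__)

    MY_DESCRIBE(@"just one thing");
    MY_DESCRIBE(@[ @"one", @"two" ]); // works, thanks to __VA_ARGS__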
Property Synthesis
Take the following class:
    @interface MyClass : NSObject
    {
        NSString *_myIvar;
    }

    @property (copy) NSString *myIvar;

    @end

    @implementation MyClass

    @synthesize myIvar;

    @end
Nothing wrong with this, right? The ivar declaration and @synthesize are a little redundant in this modern age, but do no harm.
Unfortunately, this code will silently ignore _myIvar and synthesize a new variable called myIvar, without the leading underscore. If you have code that uses the ivar directly, it will see a different value from code that uses the property. Confusion!
The rules for @synthesize variable names are a little weird. If you specify a variable name with @synthesize myIvar = _myIvar;, then of course it uses whatever you specify. If you leave out the variable name, then it synthesizes a variable with the same name as the property. If you leave out @synthesize altogether, then it synthesizes a variable with the same name as the property, but with a leading underscore.
Unless you need to support 32-bit Mac, your best bet these days is to avoid explicitly declaring backing ivars for properties. Let @synthesize create the variable, and if you get the name wrong, you'll get a nice compiler error instead of mysterious behavior.
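If you do need an explicit ivar, the safe pattern is to name it in the @synthesize directive; a brief sketch:

    // Explicitly naming the backing variable keeps the property and the
    // declared _myIvar ivar in sync.
    @synthesize myIvar = _myIvar;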
Interrupted System Calls
Cocoa code usually sticks to higher-level constructs, but sometimes it's useful to drop down a bit and do some POSIX. For example, this code will write some data to a file descriptor:
    int fd;
    NSData *data = ...;

    const char *cursor = [data bytes];
    NSUInteger remaining = [data length];
    while(remaining > 0)
    {
        ssize_t result = write(fd, cursor, remaining);
        if(result < 0)
        {
            NSLog(@"Failed to write data: %s (%d)", strerror(errno), errno);
            return;
        }
        remaining -= result;
        cursor += result;
    }
However, this can fail, and it will fail strangely and intermittently. POSIX calls like this can be interrupted by signals. Even harmless signals handled elsewhere in the app, like SIGCHLD or SIGINFO, can cause this. SIGCHLD can occur if you're using NSTask or are otherwise working with subprocesses. When write is interrupted by a signal, it returns -1 and sets errno to EINTR to indicate that the call was interrupted. The above code treats all errors as fatal and will bail out, even though the call just needs to be tried again. The correct code checks for that separately and retries the call:
    while(remaining > 0)
    {
        ssize_t result = write(fd, cursor, remaining);
        if(result < 0 && errno == EINTR)
        {
            continue;
        }
        else if(result < 0)
        {
            NSLog(@"Failed to write data: %s (%d)", strerror(errno), errno);
            return;
        }
        remaining -= result;
        cursor += result;
    }
String Lengths
The same string, represented differently, can have different lengths. This is a relatively common but incorrect pattern:
    write(fd, [string UTF8String], [string length]);
The problem is that NSString computes length in terms of UTF-16 code units, while write wants a count of bytes. While the two numbers are equal when the string only contains ASCII (which is why people so frequently get away with writing this incorrect code), they're no longer equal once the string contains non-ASCII characters such as accented characters. Always compute the length of the same representation you're manipulating:
    const char *cStr = [string UTF8String];
    write(fd, cStr, strlen(cStr));
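If you'd rather stay at the Cocoa level, an equivalent approach (a sketch, not the only correct form) is to let NSData carry the byte count:

    // NSData's length is a byte count, so it matches what write expects.
    NSData *utf8Data = [string dataUsingEncoding: NSUTF8StringEncoding];
    write(fd, [utf8Data bytes], [utf8Data length]);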
Casting to BOOL
Take this bit of code that checks to see whether an object pointer is nil:
    - (BOOL)hasObject
    {
        return (BOOL)_object;
    }
This works... usually. However, roughly 6% of the time, it will return NO even though _object is not nil. What gives?
The BOOL type is, unfortunately, not a boolean. Here's how it's defined:
    typedef signed char BOOL;
This is another bit of legacy from the days when C had no boolean type. Cocoa predates C99's _Bool, so it defines its boolean type as a signed char, which is an 8-bit integer. When you cast a pointer to an integer, you get the numeric value of that pointer. When you cast a pointer to a small integer, you just get the numeric value of the lower bits of that pointer. When the pointer looks like this:
    ...110011001110000
The BOOL gets this:
    01110000
This is not 0, meaning that it evaluates as true, so what's the problem? The problem is when the pointer looks like this:
    ...110011000000000
Then the BOOL gets this:
    00000000
This is 0, also known as NO, even though the pointer wasn't nil. Oops!
How often does this happen? There are 256 possible values in the BOOL, only one of which is NO, so we'd naively expect it to happen about 1/256 of the time. However, Objective-C objects are placed on aligned addresses, normally aligned to 16 bytes. This means that the bottom four bits of the pointer are always zero (something that tagged pointers take advantage of) and there are only four bits of freedom in the resulting BOOL. The odds of getting all zeroes there are about 1/16, or about 6%.
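To make the failure concrete, here's a contrived sketch; the address is hand-picked so that its low byte is zero:

    // A non-nil pointer whose bottom 8 bits are all zero truncates to 0
    // when cast to the 8-bit BOOL type.
    void *ptr = (void *)0x100200300;
    BOOL flag = (BOOL)ptr; // flag is NO, even though ptr is not NULL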
To safely implement this method, perform an explicit comparison against nil:
    - (BOOL)hasObject
    {
        return _object != nil;
    }
If you want to get clever and unreadable, you can also use the ! operator twice. This !! construct is sometimes referred to as C's "convert to boolean" operator, although it's built from parts:
    - (BOOL)hasObject
    {
        return !!_object;
    }
The first ! produces 1 or 0 depending on whether _object is nil, but backwards. The second ! then puts it right, resulting in 1 if _object is not nil, and 0 if it is.
You should probably stick to the != nil version.
Missing Method Argument
Let's say you're implementing a table view data source. You add this to your class's methods:
    - (id)tableView: (NSTableView *)objectValueForTableColumn: (NSTableColumn *)aTableColumn row: (NSInteger)rowIndex
    {
        return [dataArray objectAtIndex: rowIndex];
    }
Then you run your app and NSTableView complains that you haven't implemented this method. But it's right there!
As usual, the computer is correct. The computer is your friend.
Look closer. The first parameter is missing. Why does this even compile?
It turns out that Objective-C allows empty selector segments. The above does not declare a method named tableView:objectValueForTableColumn:row: with a missing argument name. It declares a method named tableView::row:, and the first argument is named objectValueForTableColumn. This is a particularly nasty way to typo the name of a method, and if you do it in a context where the compiler can't warn you about the missing method, you may be trying to debug it for a long time.
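For reference, here is the method with the missing first parameter restored, so the selector is the expected tableView:objectValueForTableColumn:row::

    - (id)tableView: (NSTableView *)aTableView objectValueForTableColumn: (NSTableColumn *)aTableColumn row: (NSInteger)rowIndex
    {
        return [dataArray objectAtIndex: rowIndex];
    }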
Conclusion
Objective-C and Cocoa have plenty of pitfalls ready to trap the unwary programmer. The above is just a sampling. However, it's a good list of things to be careful of.
Friday Q&A 2012-12-28:
What Happens When You Load a Byte of Memory
Related Articles
ARM64 and You
Why Registers Are Fast and RAM Is Slow
When an Autorelease Isn't
A Heartbleed-Inspired Paranoid Memory Allocator
Let's Build NSZombie
Swift Struct Storage
The hardware and software that our apps run on are almost frighteningly complicated, and there's no better place to see that than in the contortions that the system goes through when we load data from memory. What exactly happens when we load a byte of memory? Reader and friend of the blog Guy English suggested I dedicate an article to answering that question.
Code
Let's start with the code that loads the byte of memory. In C, it would look something like this:
    char *addr = ...;
    char value = *addr;
On x86-64, this compiles to something like:
    movsbl (%rdi), %eax
This instructs the CPU to load the byte located at the address stored in %rdi into the %eax register. On ARM, the compiler produces:
    ldrsb.w r0, [r0]
Although the instruction name is different, the effect is the same. It loads the byte located at the address stored in r0, and puts the value into r0. (The compiler is reusing r0 here, since the address isn't needed anymore.)
Now that the CPU has its instruction, the software is done. Well, maybe.
Instruction Decoding and Execution
I don't want to go too in depth with how the CPU actually executes code in general. In short, the CPU loads the above instruction from memory and decodes it to figure out the opcode and operands. Once the CPU sees that the incoming instruction is a load, it issues the memory load for the appropriate address.
Virtual Memory
On most hardware you're likely to program for today, and on any Apple platform from the past couple of decades, the system uses virtual memory. In short, virtual memory disconnects the memory addresses seen by your program from the physical memory addresses of the actual RAM in your computer. In other words, when your program accesses address 42, that might actually access the physical RAM address 977305.
This mapping is done by page. Each page is a 4kB chunk of memory. The overhead of tracking virtual address mappings for every byte in memory would be far too great, so pages are mapped instead. They're small enough to provide decent granularity, but large enough to not incur too much overhead in maintaining the mapping.
Modern virtual memory systems also have the ability to set permissions on a page. A page may be readable, writable, or executable, or some combination thereof. If the program tries to do something with a page that it isn't allowed to, or tries to access a page that has no mapping at all, the program is suspended and a fault is raised with the operating system. The OS can then take further action, such as killing the program and generating a crash report, which is what happens when you experience the common EXC_BAD_ACCESS error.
The hardware that handles this work is called the Memory Management Unit, or MMU. The MMU intercepts all memory accesses and remaps the address according to the current page mappings.
The first thing that happens when the CPU loads a byte of memory is to hand the address to the MMU for translation. (This is not always true. On some CPUs, there is a layer of cache that comes before the MMU. However, the overall principle remains.)
The first thing the MMU does with the address is slice off the bottom 12 bits, leaving a plain page address. 2¹² equals 4096, so the bottom 12 bits describe the address's location within its page. Once the rest of the address is remapped, the bottom 12 bits can be added on to generate the full physical address.
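In C terms, the split works something like this sketch (the address value here is arbitrary; the 12-bit mask corresponds to 4kB pages):

    // Splitting a virtual address into page number and page offset.
    uintptr_t address = 0x7fff5fbff8a4;
    uintptr_t offsetInPage = address & 0xfff;    // bottom 12 bits, added back after translation
    uintptr_t virtualPageNumber = address >> 12; // the part the MMU translates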
With the page address in hand, the MMU consults the Translation Lookaside Buffer, or TLB. The TLB is a cache for page mappings. If the page in question has been accessed recently, the TLB will remember the mapping, and quickly return the physical page address, at which point the MMU's work is done.
When the TLB does not contain an entry for the given page, this is called a TLB miss, and the entry must be found by searching the entire page table. The page table is a chunk of memory that describes every page mapping in the current process. Most commonly, the page table is laid out in memory by the OS in a special format that the MMU can understand directly. Following a TLB miss, the MMU searches the page table for the appropriate entry. If it finds one, it loads it into the TLB and performs the remapping.
On some architectures, the page table mapping is left entirely up to the OS. When a TLB miss occurs, the CPU passes control to the OS, which is then responsible for looking up the mapping and filling the TLB with it. This is more flexible but much slower, and isn't found much in modern hardware.
If no entry is found in the page table, that means the given address doesn't exist in RAM at all. The CPU informs the OS, which then decides how to handle the situation. If the OS doesn't think that address is valid, it terminates the program and you get an EXC_BAD_ACCESS. In some cases, the OS does think the address is valid, but just doesn't have the data in RAM. This can happen if the data has been swapped out to disk, is part of a memory mapped file, or is freshly allocated with backing memory being provided on demand. In these cases, the OS loads the appropriate data into RAM, adds an entry to the page table, and then lets the MMU translate the virtual address into a physical address now that the backing data is available.
Cache
With the address in hand, the CPU consults its memory cache. In days of yore, the CPU would talk directly to RAM. However, CPU speeds have increased faster than memory speeds, and that's no longer practical. If a modern CPU had to talk directly to modern RAM for every memory access, our computers would slow to a relative crawl.
The cache is a hardware map from a set of memory addresses to memory contents. Caches are organized into cache lines, which are typically in the region of 32-128 bytes each. Each entry in the cache holds an address and a single cache line corresponding to that address. When loading data from the cache, it checks to see if the requested address exists in the cache, and if so, returns the appropriate data from that address's cache line.
There are typically several levels of cache. Due to hardware design constraints, larger caches are necessarily slower. By having multiple levels, a small, fast cache can be checked first, with slower, larger caches used later to avoid the cost of fetching from RAM. The CPU first checks with the L1 cache, which is the first level. This cache is small, typically around 16-64kB. If it contains the data in question, then the memory load is complete! Since that's boring, we'll assume the caches don't contain the data being loaded here.
Next up is the L2 cache. This is bigger, generally anywhere from 256kB to several megabytes. In some CPUs, the L2 cache is the last level, and these typically have larger L2 caches. Other CPUs have an L3 cache as well, in which case the L2 is usually smaller, and it's supplemented by a large L3 cache, usually several megabytes, with some high performance chips having up to 20MB of L3 cache.
Once all levels of cache have been tried, if none of them contain the necessary data, it's time to try main memory. Because caches work with entire cache lines, the entire cache line is loaded from main memory at once, even though we're only loading a single byte. This greatly increases efficiency in the common case of accessing other nearby memory, since subsequent nearby loads can come from cache, at the cost of wasting time when memory use is scattered.
Memory
It's finally time to start querying RAM. The CPU has been waiting quite a while by this point, and will have to wait a long time more before it gets the data it wants.
The load is handed off to the memory controller, which is the bit of hardware that actually knows how to talk to RAM. On a lot of modern hardware, the memory controller is integrated directly into the CPU, while on some systems it's part of a separate chip called the northbridge.
The memory controller then starts loading data from RAM. Modern SDRAM transfers 64 bits of data at a time, so several transfers have to be done to fill the entire cache line being requested.
The memory controller places the load address on the address pins of the RAM and waits for the data to be returned. Internally, the RAM uses the values on the address pins to activate a row of memory cells, whose contents are then exposed on the RAM's output pins.
RAM is not instantaneous, and there's an appreciable delay between when the memory controller requests an address and when the data is available, on the order of 10 nanoseconds in current hardware. It takes more time to perform the subsequent loads needed for the cache line, but the loads can be pipelined, so total transfer time is maybe 50% more.
As the memory controller obtains data from RAM, it hands that data back to the caches, which store it in case other data from the same cache line is needed soon. Finally, the requested byte is handed to the CPU, which places the data into the register requested by the instruction. At last, after all of this work, the CPU can get on with running the code that needed that byte of data.
Consequences
There are a lot of practical consequences that result from how all of this stuff works. In particular, memory access is slow, relatively speaking. It's amazing that your computer can do all of the above work literally tens of millions of times per second, but it can do other things literally billions of times per second. Everything is relative.
The total time required for all of this, assuming a TLB hit (the fast case for the MMU) is a couple of dozen nanoseconds. On a 2GHz CPU, that could mean something like 50 clock cycles with the potential to execute perhaps 150 instructions in that time. That's a lot. A TLB miss may double or triple this latency number.
Modern CPUs are pipelined and parallelized. This means that they will likely see the need for the memory read ahead of time and initiate the load at that point, softening the blow. Parallel execution means that the CPU will probably be able to continue executing some code after the load instruction while waiting for the load, especially code that doesn't depend on the loaded value. However, this stuff has limits, and finding 150 instructions that can be executed while waiting for RAM is a tall order. You're almost certain to hit a point where program execution has to stop and wait for the memory load to complete.
Incidentally, this is where hyperthreading gains its advantage. Instead of having an entire CPU core just idle while waiting for RAM, hyperthreading lets it switch over to a completely different thread of execution and run code from that instead, so that it can still get useful work done while it waits.
Access patterns are key to performance. Discussions about micro-optimization tend to center on using some instructions rather than others, avoiding divisions, etc. Relatively few talk about memory access patterns. However, it doesn't matter how optimized your individual instructions are if they're operating on memory that's loaded in a way that isn't kind to the memory system. Saving a few cycles here and there is meaningless if you're waiting dozens of cycles for every new piece of data to load. For example, this is why, although it's the more natural way to express it, you should never write loops to access image data like this:
    for(int x = 0; x < width; x++)
        for(int y = 0; y < height; y++)
            // use the pixel at x, y
Images are typically laid out in contiguous rows, and this loop does not take advantage of that fact. It accesses columns, only coming back to the next pixel in the first row after loading the entire first column. This causes cache and TLB misses. This loop will be vastly slower than if you iterate over rows first, then columns:
    for(int y = 0; y < height; y++)
        for(int x = 0; x < width; x++)
            // use the pixel at x, y
In many cases, the top loop with fast code in the loop body will be massively outperformed by the bottom loop with slow code in the loop body, simply because memory access delays can be so punishing.
To make things even worse, profilers, such as Apple's Time Profiler in Instruments, aren't good at showing these delays. They'll tell you what instructions took time, but because of the pipelined, parallel nature of modern CPUs, the instruction that takes the hit of the memory load may not be the actual load instruction. The CPU will hit the load instruction, mark its destination register as not having its data yet, and move on. When the CPU hits an instruction that actually needs that register's value, then it will stop and wait. The clue here is when the first instruction in a sequence of manipulations on the same value takes far longer than the rest, and far longer than it should. For example, if you have code that does load, add, mul, add, and the profiler says that the first add takes the vast majority of the time, this is likely to be a memory delay, not actually a slow add.
Conclusion
Modern computers operate on time scales that are difficult to envision. To a human, the time required for a single CPU cycle and the time required to perform a hard disk seek are both indistinguishably instantaneous, yet they vary by many orders of magnitude. The computer is an incredibly complicated system that requires a huge number of things to happen in order to load a single chunk of data from memory. Knowing what goes on in the hardware when this happens is fascinating and can even help write better code. It's even more incredible once you think that this complicated set of operations happens literally millions of times every second in the computer you're using to read this.
Friday Q&A 2013-01-11:
Mach Exception Handlers
by Landon Fuller
Related Articles
Swift Name Mangling
Preprocessor Abuse and Optional Parentheses
This is my first guest Friday Q&A article, dear readers, and I hope it will withstand your scrutiny. Today's topic is on Mach exception handlers, something I've recently spent some time exploring on Mac OS X and iOS for the purpose of crash reporting. While there is surprisingly little documentation available about Mach exception handlers, and they're considered by some to be a mystical source of mystery and power, the fact is that they're actually pretty simple to understand at a high level - something I hope to elucidate here. Unfortunately, they're also partially private API on iOS, despite being used in a number of new crash reporting solutions - something I'll touch on in the conclusion.
Signals vs. Exceptions
On most UNIX systems, the only mechanism available for handling crashes (such as dereferencing NULL, or writing to an unwritable page) is the standard UNIX signal handler. When a fatal machine exception is generated, it is caught by the kernel, which then executes a user-space trampoline within the failing process, calling any function previously registered by that process via sigaction(2) or signal(3).
On OS X, however, a much more versatile API exists: Mach exceptions. Dating back to Avie Tevanian's work on the Mach OS (yes, that Avie Tevanian), Mach exceptions build on Mach IPC/RPC to provide an alternative to the UNIX signal handler API. The original design of the Mach exception handling facility was first described, as far as I'm aware, in a 1988 paper authored by Avie Tevanian, among others. It remains fairly accurate to this day, and I'd recommend reading it for more details (after finishing this post, of course).
Mach exceptions differ from UNIX signals in three significant ways:
Exception information is delivered as a Mach message via a Mach IPC port, rather than by the kernel calling into a userspace trampoline.
Exception handlers may be registered by any process that has the appropriate mach port rights for the target process.
Exception handlers may be registered for a specific thread, a specific task (process), or for the entire host. The kernel will search for handlers in that order.
These differences introduce a number of properties that can be useful when implementing debuggers and crash reporters, and are what make the Mach API interesting as an alternative to BSD signals.
Exceptions are Messages
The Mach exception API is based on Mach RPC (which is, in itself, based on Mach IPC). There's a lot of confusion around Mach IPC, but at a high-level, it's not too dissimilar to UNIX sockets or other well-known IPC mechanisms that allow one to read/write messages between processes. Mach IPC communication occurs over mach ports, rather than via socket or other traditional UNIX mechanism; mach ports have unique names, and can be shared with other processes. They can be used to send and receive messages containing arbitrary data. There's a bit more complexity involved in their actual use, but conceptually, that's about all you need to know.
To write a Mach exception handler using raw Mach IPC, you would need to wait for a new exception message by calling mach_msg() on a Mach port previously registered as an exception handler (how to do this is covered below). The call to mach_msg() will block until an exception message is received, or the thread is interrupted. Once a message is received, you are free to introspect it for the state of the thread that generated the exception. You can even correct the cause of the crash and restart the failing thread, if you feel like hacking register state at runtime.
Since exceptions are provided as messages, rather than by calling a local function, exception messages can be forwarded to the previously registered Mach exception handler, even if that existing handler is completely out-of-process. This means that you can insert an exception handler without disturbing an existing one, whether it's the debugger or Apple's crash reporter. To forward the message to an existing handler, you also use mach_msg() to send the original message to the previously registered handler's Mach port, using the MACH_SEND_MSG flag.
However, if you wish to respond to the Mach RPC request yourself, rather than forwarding it, you would need to reply to the message, informing the sender whether or not you handled the exception. Mach considers an exception handled if the crashing thread's state has been corrected such that its execution can be resumed. In this case, the kernel does not attempt to find any other exception handler, and considers the matter settled. However, if you reply to the RPC request informing the sender (usually the kernel) that the exception has not been handled, the sender will then try to find the next applicable Mach exception handler. Remember that the kernel attempts to send exceptions to thread-specific, task-specific, and host-global exception handlers, in that order.
The fact that a reply is expected from the exception request can be used for interesting purposes. For example, if a debugger has its exception handler called when a breakpoint is hit, it can simply wait to reply to the Mach exception message until (and only if) you request that the debugger continue execution.
Mach RPC, not IPC
While above I described how one might implement Mach exception handling with raw Mach IPC, the fact is that this is not how the interfaces are defined in Mach. Instead, Mach RPC uses an interface description language (called Matchmaker in the original paper) to describe the format of Mach RPC requests and their replies, and to automatically generate code to handle received messages and generate a reply.
On OS X, the Mach RPC interface descriptions for exception handling - mach_exc.defs and exc.defs - are available via /usr/include/mach. If you include these files in your Xcode project, it will automatically run the mig(1) tool (Mach Interface Generator), generating the headers and C source files necessary to receive and handle Mach exception messages. The exc.defs file provides an API for working with 32-bit exceptions, whereas the mach_exc.defs file provides an API for working with 64-bit exceptions. Unfortunately, the Mach RPC defs are not provided on iOS, and only a subset of the necessary generated headers are provided. As a result, it's not possible to implement a fully correct Mach exception handler on iOS without relying on undocumented functionality.
The code generated by MIG handles two things:

Interpreting incoming RPC messages and calling out to an existing handler function with the decoded data.
Initializing a response to the RPC messages using the return values from the handler function.

The generated code does not handle registering a Mach exception handler, receiving the Mach message, or actually sending the reply. That is the implementor's responsibility. In addition, there are multiple supported exception "behaviors" that provide different sets of information about an exception; it is the implementor's responsibility to provide callback functions for all of them.
This is best illustrated in the following 64-bit safe code, intended to work with RPC code generated by mach_exc.defs (I've left out error handling for simplicity):
    // Handle EXCEPTION_DEFAULT behavior
    kern_return_t catch_mach_exception_raise (mach_port_t exception_port,
                                              mach_port_t thread,
                                              mach_port_t task,
                                              exception_type_t exception,
                                              mach_exception_data_t code,
                                              mach_msg_type_number_t codeCnt)
    {
        // Do smart stuff here.
        fprintf(stderr, "My exception handler was called by exception_raise()\n");

        // Inform the kernel that we haven't handled the exception, and the
        // next handler should be called.
        return KERN_FAILURE;
    }

    extern boolean_t mach_exc_server (mach_msg_header_t *msg, mach_msg_header_t *reply);

    static void exception_server (mach_port_t exceptionPort)
    {
        mach_msg_return_t rt;
        mach_msg_header_t *msg;
        mach_msg_header_t *reply;

        msg = malloc(sizeof(union __RequestUnion__mach_exc_subsystem));
        reply = malloc(sizeof(union __ReplyUnion__mach_exc_subsystem));

        while (1)
        {
            rt = mach_msg(msg, MACH_RCV_MSG, 0, sizeof(union __RequestUnion__mach_exc_subsystem), exceptionPort, 0, MACH_PORT_NULL);
            assert(rt == MACH_MSG_SUCCESS);

            // Call out to the mach_exc_server generated by mig and mach_exc.defs.
            // This will in turn invoke one of:
            // catch_mach_exception_raise()
            // catch_mach_exception_raise_state()
            // catch_mach_exception_raise_state_identity()
            // .. depending on the behavior specified when registering the Mach exception port.
            mach_exc_server(msg, reply);

            // Send the now-initialized reply
            rt = mach_msg(reply, MACH_SEND_MSG, reply->msgh_size, 0, MACH_PORT_NULL, 0, MACH_PORT_NULL);
            assert(rt == MACH_MSG_SUCCESS);
        }
    }
You'll note from the example code that our exception handler is called a server. In Mach RPC parlance, the kernel would be the client: it issues RPC requests to our exception server, and waits for our reply.
Exception Behaviors
As described above, exception messages come in multiple formats, containing varying types of data. It's the implementor's responsibility to register for the correct behavior; the mig-generated RPC code will interpret the messages and hand them off to a user-defined function for the specific type. There are three basic behaviors defined by the Mach exception API:
EXCEPTION_DEFAULT: Exception messages will contain a reference to the thread that triggered the exception. Handled by catch_exception_raise().
EXCEPTION_STATE: Exception messages will contain the register state of the triggering thread, but not a reference to the thread itself. Handled by catch_exception_raise_state().
EXCEPTION_STATE_IDENTITY: Exception messages will contain the register state of the triggering thread, as well as a reference to the triggering thread. Handled by catch_exception_raise_state_identity().
In addition to the above behaviors, an additional variant was added in later OS X releases to support 64-bit safety. The MACH_EXCEPTION_CODES flag may be set by OR'ing it with any of the listed behaviors, in which case 64-bit safe exception messages will be provided. This flag is used by LLDB/GDB even when targeting 32-bit processes. When using the MACH_EXCEPTION_CODES flag, one must also use the RPC functions generated by mach_exc.defs; these use the mach_ prefix for all functions and types.
Generally speaking, EXCEPTION_DEFAULT or EXCEPTION_STATE_IDENTITY is sufficient for most purposes. Since EXCEPTION_DEFAULT behavior provides a reference to the triggering thread, you can also fetch the thread state that would normally be provided via EXCEPTION_STATE_IDENTITY by calling the Mach thread_get_state() API.
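For example, given the thread port delivered with an EXCEPTION_DEFAULT message, a sketch of fetching the thread's registers on x86-64 might look like this (the flavor constant and state struct differ per architecture):

    // Fetch the register state of the thread that raised the exception,
    // using the thread port from the exception message.
    x86_thread_state64_t state;
    mach_msg_type_number_t stateCount = x86_THREAD_STATE64_COUNT;
    kern_return_t kr = thread_get_state(thread, x86_THREAD_STATE64, (thread_state_t)&state, &stateCount);
    assert(kr == KERN_SUCCESS);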
When registering your exception handler, you are responsible for requesting the MACH_EXCEPTION_CODES behavior that matches the RPC implementation (exc.defs or mach_exc.defs) that you intend to use.
Putting it Together
It's time to get down to brass tacks: actually registering a Mach port to receive exception messages. As noted above, handlers can be registered for threads, tasks, and the host, and there is a nearly identical set of APIs for each:

(thread|task|host)_get_exception_ports: Returns the currently registered set of exception ports.
(thread|task|host)_set_exception_ports: Sets the exception port that will be used for all future exceptions.
(thread|task|host)_swap_exception_ports: Atomically sets a new exception port, and returns the current ports. This can be used to avoid race conditions that could otherwise occur if multiple handlers are registered concurrently.
To register your handler, you'll need to first allocate a Mach port to receive the messages, insert a send right to permit sending responses, and then call one of the exception port set() or swap() functions to register it as a receiver of exception messages.
For example (error handling again elided for conciseness):
    mach_port_t server_port;
    kern_return_t kr = mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &server_port);
    assert(kr == KERN_SUCCESS);

    // mach_port_insert_right() takes port names by value, not by address.
    kr = mach_port_insert_right(mach_task_self(), server_port, server_port, MACH_MSG_TYPE_MAKE_SEND);
    assert(kr == KERN_SUCCESS);

    kr = task_set_exception_ports(task, EXC_MASK_BAD_ACCESS, server_port, EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES, THREAD_STATE_NONE);
If you wish to preserve the previous exception handlers, task_swap_exception_ports() should be used in place of task_set_exception_ports(), as sketched below.
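A sketch of the swap variant, saving whatever handlers were registered before so that exceptions can later be forwarded to them (the arrays are sized for the maximum number of exception types):

    // Swap in our port while saving the old handlers for forwarding.
    exception_mask_t masks[EXC_TYPES_COUNT];
    mach_msg_type_number_t maskCount = 0;
    mach_port_t oldPorts[EXC_TYPES_COUNT];
    exception_behavior_t oldBehaviors[EXC_TYPES_COUNT];
    thread_state_flavor_t oldFlavors[EXC_TYPES_COUNT];

    kr = task_swap_exception_ports(task, EXC_MASK_BAD_ACCESS, server_port,
                                   EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES, THREAD_STATE_NONE,
                                   masks, &maskCount, oldPorts, oldBehaviors, oldFlavors);
    assert(kr == KERN_SUCCESS);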
Conclusion
Mach exception handlers are a very useful tool, and using them involves a fair number of moving pieces, but hopefully they don't seem dauntingly complex. At the end of the day, Mach exceptions are just a simple exception message, coupled with a reply, sent over Mach ports.
There are some significant advantages to the Mach API over signal handlers, including the ability to forward exceptions out-of-process, and to handle all exceptions on a completely different stack - something that can be useful when handling an exception triggered by a stack overflow on the target thread.
If you plan on implementing your own mach exception handler, there are certainly more details worth further investigation:
When forwarding Mach exceptions, you need to send an exception message that matches the previously registered handler's exception flavor. This may mean populating a new Mach exception message with additional thread state.
It's not strictly necessary to use the MIG-generated exc_server() or mach_exc_server() functions for interpreting Mach messages (though it is probably a good idea). Since mig(1) generates structures that may be used to directly interpret the Mach exception messages, you can do so directly.
If you forward exception messages for exceptions that occur in your own process, you need to be sure that the target for the reply is not also your own process. Single-stepping debuggers will only resume the thread they wish to step; that means that they won't resume your exception handler's thread, you'll never receive the reply, and the interrupted thread will never resume.
Lastly, I should highlight that the headers and Mach interfaces required to implement a correct Mach exception handler on iOS are not available (though they are available and public on Mac OS X). I filed a radar requesting their addition (rdar://12939497), as well as an Apple DTS support incident to clarify the situation. The radar is still open, but DTS provided the following guidance:
Our engineers have reviewed your request and have determined that this would be best handled as a bug report, which you have already filed. There is no documented way of accomplishing this, nor is there a workaround possible.
In the meantime, as far as I can determine through my own work, and as per DTS's feedback, it's not possible to implement Mach exception handling on iOS using only public API. Hopefully this will be resolved in a future release of iOS, such that we can safely adopt Mach exceptions.
Friday Q&A 2013-01-25:
Let's Build NSObject
Related Articles
Let's Build Key-Value Coding
Let's Build UITableView
Let's Build NSInvocation, Part I
Let's Build NSInvocation, Part II
Let's Build stringWithFormat:
Let's Build Dispatch Groups
Let's Build Swift Notifications
Let's Build @synchronized
Let's Build Swift.Array
Let's Build dispatch_queue
The NSObject class lies at the root of (almost) all classes we build and use as part of Cocoa programming. What does it actually do, though, and how does it do it? Today, I'm going to rebuild NSObject from scratch, as suggested by friend of the blog and occasional guest author Gwynne Raskind.
Components of a Root Class
What exactly does a root class do? In terms of Objective-C itself, there is precisely one requirement: the root class's first instance variable must be isa, which is a pointer to the object's class. The isa is used to figure out what class an object is when dispatching messages. That's all there has to be, from a strict language standpoint.

A root class that only provides that wouldn't be very useful, of course. NSObject provides a lot more. The functionality it provides can be broken down into three categories:
Memory management: standard memory management methods like retain and release are implemented in NSObject. The alloc and dealloc methods are also implemented there.
Introspection: NSObject provides a bunch of methods that are essentially wrappers around Objective-C runtime functionality, such as class, respondsToSelector:, and isKindOfClass:.
Default implementations of miscellaneous methods: there are a bunch of methods that we count on every object implementing, such as isEqual: and description. In order to ensure that every object has an implementation, NSObject provides a default implementation that every subclass gets if it doesn't bring its own.
Code
I'll be reimplementing NSObject functionality as MAObject. I've posted the full code for this article on GitHub:

    https://github.com/mikeash/MAObject
Note that this code is built without ARC. Although ARC is great and should be used whenever possible, it really gets in the way when implementing a root class, because a root class needs to implement memory management and ARC prefers that you leave memory management up to the compiler.
Instance Variables
MAObject has two instance variables. The first is the isa pointer. The second is the object's reference count:
    @implementation MAObject
    {
        Class isa;
        volatile int32_t retainCount;
    }
The reference count will be managed using functions from OSAtomic.h to ensure thread safety, which is why it has a somewhat unusual definition rather than using NSUInteger or similar.
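As a rough sketch of what that looks like (see the GitHub repository for the actual code), retain and release built on the OSAtomic functions come out something like this:

    // Thread-safe reference counting using OSAtomic primitives.
    - (id)retain
    {
        OSAtomicIncrement32(&retainCount);
        return self;
    }

    - (oneway void)release
    {
        // Destroy the object once the last reference goes away.
        if(OSAtomicDecrement32(&retainCount) == 0)
            [self dealloc];
    }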
NSObject actually holds reference counts externally. There's a global table which maps an object's address to its reference count. This saves memory, because most objects have a reference count of 1, which the