
This is CS50

Harvard Extension School


Spring 2019

WEEK 1
Lecture 1

Last Week
C
CS50 Sandbox
More examples
More problems

Last Week

We learned that, thanks to many layers of abstraction and those who came before us, we can easily
write programs that are ultimately just binary, 0s and 1s.
Problem solving can be described as taking some inputs (a problem) and using an algorithm to find
some outputs (a solution).
Computers represent inputs and outputs with lots of bits, binary digits, 0s and 1s, that are on or off.
And with enough of those bits, we can represent not only larger numbers, but text, images, and video.
And there can be different algorithms that can solve the same problem, but with different running
times.
We can write down algorithms more precisely with pseudocode, and along the way use concepts like
functions, loops, and conditions.
With the help of volunteers from the audience, we made peanut butter and jelly sandwiches from
ingredients, though each of us interpreted the instructions differently!
It turns out, we (as humans) naturally make assumptions and abstractions when following instructions
or even pseudocode. But as we saw in Scratch, and as we will see in C, we won’t be able to do that
anymore, and will have to think more carefully about the steps and cases that our programs will need
to handle.
With Scratch, we were able to leverage the work done by the folks at MIT, who created the blocks
and sprites, to make programs of our own. And we too made custom blocks, like the cough
function, which were a layer of abstraction of our own.
C

We’ll use a new language, C, that’s purely text, which comes with some cryptic keywords and
punctuation:

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
}

This is equivalent to the “when green flag clicked” and “say (hello, world)” block:

We can compare a lot of the constructs in C, to blocks we’ve already seen and used in Scratch. The
syntax is far less important than the principles, which we’ve already been introduced to.
The “say (hello, world)” block is a function, and maps to printf("hello, world\n"); . In C, the
function to print something to the screen is printf , where f stands for “format”, meaning we can
format the string in different ways. Then, we use parentheses to pass in what we want to print. We use
double quotes to surround our text, or string, and add a \n which indicates a new line on the screen.
(Then, the next time we call printf , our text will be on a new line.) Finally, we add a semicolon ; to
end this line of code in C.
The “set [counter] to (0)” block is creating a variable, and in C we would say int counter = 0; ,
where int specifies that the type of our variable is an integer:

“change [counter] by (1)” is counter = counter + 1; in C. (In C, the = isn’t like an equation, where
we are saying counter is the same as counter + 1 . Instead, = means “copy the value on the
right, into the value on the left”.) We can also say counter += 1; or counter++; both of which are
“syntactic sugar”, or shortcuts that have the same effect with fewer characters to type.
A condition would map to:

if (x < y)
{
    printf("x is less than y\n");
}

Notice that in C, we use { and } (as well as indentation) to indicate how lines of code should
be nested.
We can also have if-else conditions:

if (x < y)
{
    printf("x is less than y\n");
}
else
{
    printf("x is not less than y\n");
}

As another aside, whitespace (the spaces, new lines, and indentation) is generally not
syntactically important in C, i.e. it won’t change how our program ultimately runs, but
following conventions and having good “style” is important for our code to be readable by
humans.
And even else if :

if (x < y)
{
    printf("x is less than y\n");
}
else if (x > y)
{
    printf("x is greater than y\n");
}
else if (x == y)
{
    printf("x is equal to y\n");
}

Notice that, to compare two values in C, we use == , two equals signs.


And, logically, we don’t need the if (x == y) in the final condition, since that’s the only case
remaining, and we can just say else .
Loops can be written like the following:

while (true)
{
    printf("hello, world\n");
}

The while keyword also requires a condition, so we use true as the Boolean expression to
ensure that our loop will run forever. Our program will check whether the expression evaluates to
true (which it always will in this case), and then run the lines inside the curly braces. Then it
will repeat that until the expression isn’t true anymore (which won’t change in this case).

for (int i = 0; i < 50; i++)
{
    printf("hello, world\n");
}

To write a loop that runs a specific number of times, we use the for keyword, and first, we create
a variable named i and set it to 0. i is a conventional name for a variable that keeps track of
how many iterations of the loop we’ve already done. Then, we check that i < 50 every time we
reach the top of the loop, before we run any of the code inside. If that expression is true, then we
run the code inside. Finally, after we run the code inside, we use i++ to add one to i , and the
loop repeats.
We can also get input from the user:

string answer = get_string("What's your name?\n");
printf("%s\n", answer);

In Scratch, the response will be stored in a variable called “answer”, but in C we can specify the
name of the variable. We’ll choose “answer” too, and the type of this variable is string , which is
just a sequence of characters.
And we’ll use printf to print the string, but we need to specify how. We first pass in "%s\n" , the
format string we want to print, which happens to be just %s followed by a new line. And %s is a
placeholder, into which printf will substitute the value of the string we pass in next, which we
specify as answer .

And we need this structure because now we can combine our own text with the user’s answer:

string answer = get_string("What's your name?\n");
printf("hello, %s\n", answer);

CS50 Sandbox

The CS50 Sandbox is a cloud-based, virtual environment where we’ve installed the right libraries and
settings so that we can all start writing and running code the same way. At the top, there is a simple
code editor, where we can type text. Below, we have a terminal window, into which we can type
commands:

We’ll type our code from earlier into the top:

Notice that our code is colorized, so that certain things are more visible.
And we write our code and save it into a file, with a name like hello.c to indicate that it is
written in C.
Once we save the code that we wrote, which is called source code, we need to convert it to machine
code, binary instructions that the computer understands more directly.
We use a program called a compiler to compile our source code into machine code.
To do this, we use the Terminal panel. The $ at the left is a prompt, into which we can type
commands.
We type clang hello.c (where clang stands for “C languages”) and … nothing happens. We see
another $ , waiting for another command. We can click the folder icon on the top left of CS50
Sandbox, and see that we have another file now, called a.out . Now, we can type ./a.out in the
terminal prompt, and see hello, world . We just wrote, compiled, and ran our first program!
We can change the name of our program from a.out to something else. We can pass command-line
arguments to programs in the terminal, if they accept them. For example, we can type clang -o
hello hello.c , and -o hello is telling the program clang to save the compiled output as just
hello . Then, we can just run ./hello . (The . means the current folder.)
We can even abstract this away and just type make hello . We see that, by default (in the CS50
Sandbox), make uses clang to compile our code from hello.c into hello , with other special
features.
Now, let’s try to get input from the user.

#include <stdio.h>

int main(void)
{
    string name = get_string("What is your name?\n");
    printf("hello, name\n");
}

If we run make hello , we get lots and lots of errors now. But, in cases like this, we should scroll
up to the top, and see what that error is, since the first one might have led to all the others.
We see that the first error is hello.c:5:5: error: use of undeclared identifier
'string' ... . This tells us that, on line 5, character 5, of the file hello.c , the compiler
encountered something called string that it didn’t recognize. In fact, the language C doesn’t
have a type called string .
To simplify things (at least for the beginning), we’ll include a library, or set of code, from CS50. The
library provides us with the string variable type, the get_string function, and more. We just have
to write a line at the top to include the file cs50.h :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    string name = get_string("What is your name?\n");
    printf("hello, name\n");
}

And stdio.h is a library that comes with C, whose name stands for “standard input/output”, and which
includes the printf function that prints to the screen.
Now, if we try to compile that code, our first error is hello.c:6:12: error: unused variable
'name' ... . It turns out, we didn’t do anything with the name variable after we created it. To do
that, we need to change the next line:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    string name = get_string("What is your name?\n");
    printf("hello, %s\n", name);
}
We’re passing in two arguments, or parameters, to printf . The first is the string we want to
print, with our %s placeholder, and the second is the variable name that we want to substitute
in.
If we change our code, we need to save our file and run make hello again. And, if we wanted to stop
our program before it finishes, we just need to press control-C.
Functions, like get_string or printf , can take arguments. They can also have return values, and
get_string returns something of the type string .

More examples

The CS50 library has other functions, getting input of various types:
get_char

get_double

get_float

get_int

get_long

get_string


And there are corresponding types in C and ways to print them with printf :
bool

char , %c

double

float , %f

int , %i

long , %li

string , %s

The CS50 Sandbox has various languages we can choose from, as well as a file name we can get
started with.
In fact, for each of these examples, you can click on the sandbox links on the curriculum to run and
edit your own copies of them.
In int.c , we get and print an integer:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    int i = get_int("Integer: ");
    printf("hello, %i\n", i);
}
Notice that we use %i to print an integer.
int main(void) is the equivalent of “when green flag clicked”, and we’ll learn more about that
in the coming weeks.
We can now run make int and run our program with ./int .
In float.c , we can get decimal numbers (called floating-point values in computers, because the
decimal point can “float” between the digits, depending on the number):

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    float f = get_float("Float: ");
    printf("hello, %f\n", f);
}

Now, if we compile and run our program, we see something like hello, 42.000000 , even if we
just typed in 42 at the prompt.
With ints.c , we can do some math:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Prompt user for x
    int x = get_int("x: ");

    // Prompt user for y
    int y = get_int("y: ");

    // Perform arithmetic
    printf("x + y = %i\n", x + y);
    printf("x - y = %i\n", x - y);
    printf("x * y = %i\n", x * y);
    printf("x / y = %i\n", x / y);
    printf("x mod y = %i\n", x % y);
}

First, we get two integers, x and y . Then, we print out what we want to do, like x + y =
%i\n , and pass in the value we want, x + y . * is used for multiplication, and / for division.
% on its own, between two variables, is the modulo operator.

Interestingly, when we pass in 2 for x and 10 for y , we get … x / y = 0 . It turns out,
since the two variables are integers, the result is an integer, and since 2 divided by 10 is less than
1, all we have left is the 0.
With floats.c , we can see what happens when we use floats:

#include <cs50.h>
#include <stdio.h>
int main(void)
{
    // Prompt user for x
    float x = get_float("x: ");

    // Prompt user for y
    float y = get_float("y: ");

    // Perform division
    printf("x / y = %.50f\n", x / y);
}

With %.50f , we can specify the number of decimal places displayed (here, 50).


Hmm, now we get …

x: 2
y: 10
x / y = 0.20000000298023223876953125000000000000000000000000

Our computer has memory, in hardware chips called RAM, random-access memory. Our programs use
that RAM to store data as they run, but that memory is finite. So with a finite number of bits, we can’t
represent all possible numbers (of which there are infinitely many). So our computer has a
certain number of bits for each float, and has to round to the nearest value it can represent.
And these imprecisions can be problematic in finance, rockets, or scientific applications. But we can
get around this problem, by specifying the number of decimal places we will be precise to, and
allocate the right number of bits to represent that many decimal places.
A float in C, on most computers, uses 4 bytes, or 32 bits. Another type, called a double, uses twice as
many bits, or 8 bytes.
If we run doubles.c , which is floats.c but with the double type for variables, we see that we
have many more decimal digits of precision. And the tradeoff for the additional precision is that we
now have to use more memory space.
Let’s look at parity.c :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Prompt user for integer
    int n = get_int("n: ");

    // Check parity of integer
    if (n % 2 == 0)
    {
        printf("even\n");
    }
    else
    {
        printf("odd\n");
    }
}

By taking the remainder after we divide n by 2, we can tell whether n is even or odd.
In conditions.c , we turn the snippet from before into a program:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Prompt user for x
    int x = get_int("x: ");

    // Prompt user for y
    int y = get_int("y: ");

    // Compare x and y
    if (x < y)
    {
        printf("x is less than y\n");
    }
    else if (x > y)
    {
        printf("x is greater than y\n");
    }
    else
    {
        printf("x is equal to y\n");
    }
}

In answer.c , we get text from the user:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
    // Prompt user for answer
    char c = get_char("Answer: ");

    // Check answer
    if (c == 'Y' || c == 'y')
    {
        printf("yes\n");
    }
    else if (c == 'N' || c == 'n')
    {
        printf("no\n");
    }
}

Here, we use get_char and the char data type to get a single character from the user.
Notice that we use a || to indicate an “or” in our Boolean expression. (A logical “and” would be
&& .)

In Scratch, we were able to create our own block, that we called “cough”. We can do the same in C, by
creating our own function.
If we wanted to print “cough” 3 times, we could use a for loop:

#include <stdio.h>

int main(void)
{
    for (int i = 0; i < 3; i++)
    {
        printf("cough\n");
    }
}

We can move the printf line to its own function:

#include <stdio.h>

void cough(void);

int main(void)
{
    for (int i = 0; i < 3; i++)
    {
        cough();
    }
}

// Cough once
void cough(void)
{
    printf("cough\n");
}

Notice that we need to declare that the cough function exists, so we need the prototype, void
cough(void); , before our main function calls it. The C compiler reads our code from top to
bottom, so we need to tell it that the cough function exists, before we use it. And we want to
keep our main function close to the top, so the actual implementation of cough will still be
below it.
In fact, cs50.h and stdio.h are both header files, containing prototypes for functions like
get_string and printf that we can then use. The actual implementations of those functions are in
cs50.c and stdio.c as source code, and compiled to files elsewhere on the system.

And our cough function doesn’t take any inputs, so we have cough(void) , and the function
also doesn’t return anything, so we have void in front of cough as well. (Our main function is
supposed to return an int , and by default it will return 0 if nothing goes wrong.)
We can abstract cough further:
#include <stdio.h>

void cough(int n);

int main(void)
{
    cough(3);
}

// Cough some number of times
void cough(int n)
{
    for (int i = 0; i < n; i++)
    {
        printf("cough\n");
    }
}

Now, when we want to print “cough” some number of times, we can just call that same function.
Notice that, with cough(int n) , we indicate that the cough function takes as input an int ,
which we refer to as n . And inside cough , we use n in our for loop to print “cough” the right
number of times.
Let’s look at positive.c :

#include <cs50.h>
#include <stdio.h>

int get_positive_int(string prompt);

int main(void)
{
    int i = get_positive_int("Positive integer: ");
    printf("%i\n", i);
}

// Prompt user for positive integer
int get_positive_int(string prompt)
{
    int n;
    do
    {
        n = get_int("%s", prompt);
    }
    while (n < 1);
    return n;
}

The CS50 library doesn’t have a get_positive_int function, but we can write one ourselves. In
our function, we declare a variable, int n , but don’t assign a value to it yet. Then, we have a
new construct, do ... while , which does something first, then checks a condition, and repeats
until the condition is no longer true.
Then, once we have an n that is not < 1 , we can return it with the return keyword. And back
in our main function, we can set int i to that value.
In C, variables also have scope, which generally means that they only exist within the curly braces
in which they were declared. For example, if we had int n = get_int(...) within the do-while
loop, we wouldn’t be able to return it, since that line would be outside of the scope of n .
(Similarly, our main function can’t directly see any variables inside get_positive_int , since
each function has its own set of curly braces and thus different scopes for variables declared
inside them.)
In Scratch, you might have noticed that you could make a variable available to one sprite, or all sprites.
And in C, we have both local and global variables. All variables we’ve seen thus far are local, though
eventually we’ll see global variables, which we’ll be able to use anywhere in our program.

More problems

We’ve already seen an example of floating-point imprecision, but we can also have problems with
integers.
If, for example, we had a number like 129, to which we added a 1, we wouldn’t have 1210, where the
last digit went from 9 to 10. Instead, we carry the 1, such that the number we have is 130. And if we
had a number like 999, we would carry the 1 a few times, until we got the number 1000.
But if we only had space to write down 3 digits, we would end up with 000. And this problem is called
overflow, where the number we are trying to store is too big for the amount of space we have
allocated.
In binary, if we had the number 111 , and added 1, we would carry that 1 until we got 1000 . And
similarly, if we only had 3 bits, we would have 000 .
In the Lego Star Wars game, there is a set maximum of 4 billion coins that the player can collect, since
presumably there are only 32 bits used to store that count (and 2 to the power of 32 is slightly over 4
billion).
We can see this in overflow.c :

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    // Iteratively double i
    for (int i = 1; ; i *= 2)
    {
        printf("%i\n", i);
        sleep(1);
    }
}

Notice that here, we have a line that starts with // , which indicates a comment. A comment is a
note to ourselves or future readers, that the compiler will ignore.
In our for loop, we set i to 1 , and double it with *= 2 . (And we’ll keep doing this forever,
so there’s no condition we check.)
We also use the sleep function from unistd.h to let our program pause each time.
Now, when we run this program, we see the number getting bigger and bigger, until:

1073741824
overflow.c:9:31: runtime error: signed integer overflow: 1073741824 * 2 cannot
-2147483648
0
0
...

It turns out, our program recognized that a signed integer (an integer with a positive or negative
sign) couldn’t store that next value, and printed an error. Then, since it tried to double it anyways,
i became a negative number, and then 0.

The Y2K problem arose because many programs stored the calendar year with just two digits, like 98
for 1998, and 99 for 1999. But when the year 2000 approached, the programs would have stored 00,
leading to confusion between the years 1900 and 2000.
A Boeing 787 airplane also had a bug where a counter in the generator overflows after a certain
number of days of continuous operation, since the number of seconds it has been running could no
longer be stored in that counter.
In an older version of Civilization, integer underflow leads to one of the characters, Gandhi, becoming
much more aggressive since his “aggression” value, already low, becomes large when too much is
subtracted from it. For example, if we had 00000001 stored, and subtract 1 from it, we would have
00000000 . But if we were to subtract 2, we actually roll backwards to 11111111 , which is the
largest positive value!
So, we’ve seen a few problems that can happen, and hopefully we now also understand why they
happen and how to prevent them.
With this week’s problem set, we’ll use the CS50 Lab, built on top of the CS50 Sandbox, to write some
programs with walkthroughs to guide us.
WEEK 2
ARRAYS

Lecture 2

Compiling
Debugging
Memory
Arrays
Strings
Command-line arguments
Encryption
Exit codes
Sorting

Compiling

We started the course with Scratch, and then learned C.


Recall that we write our source code in C, but needed to compile it to machine code, in binary, before
our computers could run it.
clang is the compiler we learned to use, and make is a utility that helps us run clang without
having to indicate all the options manually.
If we wanted to use CS50’s library, via #include <cs50.h> , and use clang instead of make ,
we also have to add a flag: clang hello.c -lcs50 . The -l flag links the cs50 library, which
was installed into the CS50 Sandbox.
“Compiling” source code into machine code is actually made up of smaller steps:
preprocessing
compiling
assembling
linking
Preprocessing involves looking at lines that start with a # , like #include , before everything else. For
example, #include <cs50.h> will tell clang to look for that header file first, since it contains
content that we want to include in our program. Then, clang will essentially copy the contents of
those header files into our program:

...
string get_string(string prompt);
int printf(const char *format, ...);
...
int main(void)
{
    string name = get_string("Name: ");
    printf("hello, %s\n", name);
}

Compiling takes our source code, in C, and converts it to assembly code, which looks like this:

...
main: # @main
.cfi_startproc
# BB#0:
pushq %rbp
.Ltmp0:
.cfi_def_cfa_offset 16
.Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
xorl %eax, %eax
movl %eax, %edi
movabsq $.L.str, %rsi
movb $0, %al
callq get_string
movabsq $.L.str.1, %rdi
movq %rax, -8(%rbp)
movq -8(%rbp), %rsi
movb $0, %al
callq printf
...

These instructions are lower-level and can be understood by the CPU more directly, and generally
operate on bytes themselves, as opposed to abstractions like variable names.
The next step is to take the assembly code and translate it to instructions in binary by assembling it.
Now, the final step is linking, where the contents of linked libraries, like cs50.c , are actually included
in our program as binary.

Debugging

Let’s say we wrote this program, buggy0 :


int main(void)
{
    printf("hello, world\n")
}

We see an error, when we try to make this program, telling us that we’re missing a header file.
We can also run help50 make buggy0 , which will tell us, at the end, that we should #include
<stdio.h> , which contains printf .

We do that, and see another error, and realize we’re missing a semicolon at the end of our line.
Let’s look at another program:

#include <stdio.h>

int main(void)
{
    for (int i = 0; i <= 10; i++)
    {
        printf("#\n");
    }
}

Hmm, we intended to only see 10 # s, but there are 11. If we didn’t know what the problem is
(since our program is working as we wrote it), we could add another print line to help us:

#include <stdio.h>

int main(void)
{
    for (int i = 0; i <= 10; i++)
    {
        printf("i is %i\n", i);
        printf("#\n");
    }
}

Now, we see that i started at 0 and continued until it was 10, so the loop ran 11 times. We
should have it stop once i reaches 10, by changing the condition to i < 10 .
If we wrote our program without any whitespace, like the below, it would still be correct:

#include <stdio.h>

int main(void)
{
for (int i = 0; i < 10; i++)
{
printf("i is %i\n", i);
printf("#\n");
}
}
But, our program is much harder to read, and so it’s poorly styled. With indentation for our loops,
it’ll be easier to see the nesting of our lines of code.
We can run style50 buggy2.c , and see suggestions for what we should change.
So to recap, we have three tools to help us improve our code:
help50

printf

style50

Memory

Inside our computers, we have chips called RAM, random-access memory, that store data for short-term
use. We might save a file to our hard drive (or SSD) for long-term storage, but when we open it
and start making changes, it gets copied to RAM. Though RAM is much smaller, and temporary (until
the power is turned off), it is much faster.
We can think of bytes, stored in RAM, as though they were in a grid:

In reality, there are millions or billions of bytes per chip.


In C, when we create a variable of type char , which will be sized one byte, it will physically be stored
in one of those boxes in RAM. An integer, with 4 bytes, will take up four of those boxes.

Arrays

In memory, we can store variables one after another, back-to-back. And in C, a list of variables stored,
one after another in a contiguous chunk of memory, is called an array.
It turns out, we can do interesting things with just arrays.
Let’s look at scores0.c :

#include <cs50.h>
#include <stdio.h>
int main(void)
{
    // Get scores from user
    int score1 = get_int("Score 1: ");
    int score2 = get_int("Score 2: ");
    int score3 = get_int("Score 3: ");

    // Generate first bar
    printf("Score 1: ");
    for (int i = 0; i < score1; i++)
    {
        printf("#");
    }
    printf("\n");

    // Generate second bar
    printf("Score 2: ");
    for (int i = 0; i < score2; i++)
    {
        printf("#");
    }
    printf("\n");

    // Generate third bar
    printf("Score 3: ");
    for (int i = 0; i < score3; i++)
    {
        printf("#");
    }
    printf("\n");
}

We get 3 scores from the user, and print bars for each score.
Our 3 integers, score1 , score2 , and score3 will be stored somewhere in memory.
We could use a loop, but first we can start factoring out pieces:

#include <cs50.h>
#include <stdio.h>

void chart(int score);

int main(void)
{
    // Get scores from user
    int score1 = get_int("Score 1: ");
    int score2 = get_int("Score 2: ");
    int score3 = get_int("Score 3: ");

    // Chart first score
    printf("Score 1: ");
    chart(score1);

    // Chart second score
    printf("Score 2: ");
    chart(score2);

    // Chart third score
    printf("Score 3: ");
    chart(score3);
}

// Generate bar
void chart(int score)
{
    // Output one hash per point
    for (int i = 0; i < score; i++)
    {
        printf("#");
    }
    printf("\n");
}

Now, we have a chart function that can print each score.


Remember that we need our prototype, void chart(int score); , to be at the top. We could
also have the entire chart function at the top, before we use it, but eventually our main
function would be pushed down too far, and be harder and harder to find.
With an array, we can collect our scores in a loop, and access them later in a loop, too:

// Generates a bar chart of three scores using an array

#include <cs50.h>
#include <stdio.h>

void chart(int score);

int main(void)
{
    // Get scores from user
    int scores[3];
    for (int i = 0; i < 3; i++)
    {
        scores[i] = get_int("Score %i: ", i + 1);
    }

    // Chart scores
    for (int i = 0; i < 3; i++)
    {
        printf("Score %i: ", i + 1);
        chart(scores[i]);
    }
}

// Generate bar
void chart(int score)
{
    // Output one hash per point
    for (int i = 0; i < score; i++)
    {
        printf("#");
    }
    printf("\n");
}

Notice that we use int scores[3] to declare an array of 3 integers. Then, we use scores[i]
= ... to store values into that array, using some index i that goes from 0 to 2 (since there
are 3 elements).
Then, we use scores[i] to access the values stored, at each index.
We repeat the value 3 a few times, so we can factor that out to a constant, or a number we can
specify and use globally:

#include <cs50.h>
#include <stdio.h>

const int COUNT = 3;

void chart(int score);

int main(void)
{
    // Get scores from user
    int scores[COUNT];
    for (int i = 0; i < COUNT; i++)
    {
        scores[i] = get_int("Score %i: ", i + 1);
    }

    // Chart scores
    for (int i = 0; i < COUNT; i++)
    {
        printf("Score %i: ", i + 1);
        chart(scores[i]);
    }
}

// Generate bar
void chart(int score)
{
    // Output one hash per point
    for (int i = 0; i < score; i++)
    {
        printf("#");
    }
    printf("\n");
}

At the top, we use the const keyword to indicate that this value shouldn’t change. And we can
use this throughout our code, so if we wanted this value to change, we only need to change it
once. Finally, COUNT is in all capital letters, to indicate that it’s a constant (by convention).
We can have our chart function print the entire chart, not just one bar at a time:

#include <cs50.h>
#include <math.h>
#include <stdio.h>

const int COUNT = 3;

void chart(int count, int scores[]);

int main(void)
{
    // Get scores from user
    int scores[COUNT];
    for (int i = 0; i < COUNT; i++)
    {
        scores[i] = get_int("Score %i: ", i + 1);
    }

    // Chart scores
    chart(COUNT, scores);
}

// Generate bars
void chart(int count, int scores[])
{
    // Output one hash per point
    for (int i = 0; i < count; i++)
    {
        for (int j = 0; j < scores[i]; j++)
        {
            printf("#");
        }
        printf("\n");
    }
}

By passing in the entire scores array, as well as the count of scores we want to print, we can
have the chart function iterate over scores . In fact, chart doesn’t know how big the
scores array actually is, so we necessarily have to pass in a count .

Strings

Strings are actually just arrays of characters. We can see this with string0.c :

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    string s = get_string("Input: ");
    printf("Output: ");
    for (int i = 0; i < strlen(s); i++)
    {
        printf("%c\n", s[i]);
    }
}

First, we need a new library, string.h , for strlen , which tells us the length of a string. Then,
we use the same syntax to access elements in arrays, s[i] , to print each individual character of
the string s .
We can improve the design of our program. string0 was a bit inefficient, since we check the length
of the string, after each character is printed, in our condition. But since the length of the string doesn’t
change, we can check the length of the string once:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    string s = get_string("Input: ");
    printf("Output:\n");
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c\n", s[i]);
    }
}

Now, at the start of our loop, we initialize both an i and n variable, and remember the length
of our string in n . Then, we can check the values each time, without having to actually calculate
the length of the string.
n will only be accessible in the scope of the for loop, though we could initialize it outside of
the loop, if we wanted to reuse it later.
When a string is stored in memory, each character is placed into one byte in the grid of bytes.
Somewhere, for example, Zamyla is stored in 6 bytes. But one more byte is needed, to indicate the
end of the string:

The byte in memory where the first character of the string, Z , is stored, is labeled s , since we
called our string s in the code above. Then, after the last character, a , we have one byte with
all 0 s, to indicate the end of the string. And the byte of all 0 s is called a null character, which
we can also write as \0 .
If we wanted to write our own version of strlen , for example, we would need to know this:
#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Prompt for user's name
string s = get_string("Name: ");

// Count number of characters up until '\0' (aka NUL)
int n = 0;
while (s[n] != '\0')
{
n++;
}
printf("%i\n", n);
}

Here, we iterate over each character of the string s with the syntax we use to access elements in
arrays, and we increment a counter, n , as long as the character isn’t the null character, \0 . If it
is, we’re at the end of the string, and can print out the value of n .
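The same loop can be wrapped in a function of our own. The following is a minimal sketch, not the library's actual implementation: string_length is a hypothetical name, and const char * is how C itself spells what the CS50 Library calls string , so the sketch doesn't need cs50.h .

```c
#include <stddef.h>

// Count the characters in s up to, but not including, the terminating '\0'.
// string_length is a hypothetical name; the standard function is strlen.
size_t string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

Calling string_length("Zamyla") would give 6, since the loop stops at the null character without counting it.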
And, since we know that each character has a numeric, ASCII value, we can even print that:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
string s = get_string("String: ");
for (int i = 0; i < strlen(s); i++)
{
int c = (int) s[i];
printf("%c %i\n", s[i], c);
}
}

With (int) s[i] , we take the value of s[i] and convert that character type to an integer
type. Then, we can print out both the character and its numeric value.
Technically, we can even do printf("%c %i\n", s[i], s[i]); , and printf will interpret
the value of s[i] as an integer.
We can now combine what we’ve seen, to write a program that can capitalize letters:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
string s = get_string("Before: ");
printf("After: ");
for (int i = 0, n = strlen(s); i < n; i++)
{
if (s[i] >= 'a' && s[i] <= 'z')
{
printf("%c", s[i] - ('a' - 'A'));
}
else
{
printf("%c", s[i]);
}
}
printf("\n");
}

First, we get a string s . Then, for each character in the string, if it’s lowercase (its value is
between that of a and z ), we convert it to uppercase. Otherwise, we just print it.
We can convert a lowercase letter to its uppercase equivalent, by subtracting the difference
between a lowercase a and an uppercase A . (We know that lowercase letters have a higher
value than uppercase letters, and so we can subtract that difference to get an uppercase letter
from a lowercase letter.)
But there are library functions that we can use, to accomplish the same thing:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
string s = get_string("Before: ");
printf("After: ");
for (int i = 0, n = strlen(s); i < n; i++)
{
if (islower(s[i]))
{
printf("%c", toupper(s[i]));
}
else
{
printf("%c", s[i]);
}
}
printf("\n");
}

islower() and toupper() are two functions, among others, from a library called ctype , that
we can use. (And we would only know this from reading the documentation, for that library, that
other people wrote.)
We can use a command-line program, man , to read manual information for other programs, if it
exists. For example, we can run man toupper to see some documentation about that function.
Then, we’ll see that toupper will return the character as-is, if it’s not a lowercase letter, and so
we can simply have:

for (int i = 0, n = strlen(s); i < n; i++)
{
printf("%c", toupper(s[i]));
}

Command-line arguments

We’ve used programs like make and clang , which take in extra words after their name in the
command line. It turns out that programs of our own can also take in command-line arguments.
In argv0.c , we change what our main function looks like:

#include <cs50.h>
#include <stdio.h>

int main(int argc, string argv[])
{
if (argc == 2)
{
printf("hello, %s\n", argv[1]);
}
else
{
printf("hello, world\n");
}
}

argc and argv are two variables that our main function will now get, when our program is
run from the command line. argc is the argument count, or number of arguments, and argv is
an array of strings that are the arguments. And the first argument, argv[0] , will be the name of
our program (the first word typed, like ./hello ). In this example, we’ll check if we have two
arguments, and print out the second one if so.
We can print every argument, one at a time:

#include <cs50.h>
#include <stdio.h>

int main(int argc, string argv[])
{
for (int i = 0; i < argc; i++)
{
printf("%s\n", argv[i]);
}
}

We can print out each character of each argument, too:


#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(int argc, string argv[])
{
for (int i = 0; i < argc; i++)
{
for (int j = 0, n = strlen(argv[i]); j < n; j++)
{
printf("%c\n", argv[i][j]);
}
printf("\n");
}
}

With argv[i] , we get the current argument from the array of arguments, and with
argv[i][j] , we get a character from that string.

Encryption

If we wanted to send a message to someone, we might want to encrypt, or somehow scramble that
message so that it would be hard for others to read. The original message is called plaintext, and the
encrypted message is called ciphertext.
A message like HI! could be converted to ASCII, 72 73 33 . But anyone would be able to convert
that back to letters.
We look at examples of historical codes, from World War I to a poem about Paul Revere’s ride.
Encryption generally requires another input, in addition to the plaintext. A key is needed, and
sometimes it is simply a number that is kept secret. With the key, plaintext can be converted, via some
algorithm, to ciphertext, and vice versa.
For example, if we wanted to send a message like I L O V E Y O U , we can first convert it to ASCII:
73 76 79 86 69 89 79 85 . Then, we can encrypt it with a key of just 1 and a simple algorithm,
where we just add the key to each value: 74 77 80 87 70 90 80 86 . Then, someone converting
that ASCII back to text will see J M P W F Z P V . To decrypt this, someone might have to guess the
value of each letter, through trial-and-error, but they wouldn’t be sure, without knowing the key. In
fact, this algorithm is known as a Caesar cipher.
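A minimal sketch of that algorithm in C might look like the following. The function name caesar_encrypt is hypothetical, and two details are assumptions beyond the example above: only uppercase letters are shifted, and a letter shifted past Z wraps around to A .

```c
#include <ctype.h>

// Shift each uppercase letter in text by key positions, in place.
// Assumptions for this sketch: other characters are left unchanged,
// and shifting past 'Z' (or before 'A') wraps around the alphabet.
void caesar_encrypt(char *text, int key)
{
    for (int i = 0; text[i] != '\0'; i++)
    {
        if (isupper((unsigned char) text[i]))
        {
            // Position in the alphabet (0-25), shifted and wrapped
            int shifted = ((text[i] - 'A' + key) % 26 + 26) % 26;
            text[i] = (char) ('A' + shifted);
        }
    }
}
```

With a key of 1, ILOVEYOU becomes JMPWFZPV, matching the values above, and calling the function again with a key of -1 restores the plaintext.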

Exit codes

It turns out that we can indicate errors in our program, by returning a value from our main function:

#include <cs50.h>
#include <stdio.h>

int main(int argc, string argv[])
{
if (argc != 2)
{
printf("missing command-line argument\n");
return 1;
}
printf("hello, %s\n", argv[1]);
return 0;
}

The return value of main in our program is called an exit code, and we can actually see this in
our command line. If we run this program with ./exit , we can then type echo $? , which will
print the last program’s return value.
As we write more complex programs, error codes like this will help us determine what went
wrong, even if it’s not visible or meaningful to the user.

Sorting

With arrays, we can solve more interesting problems than before. We can think of arrays like lockers,
where, behind the doors of each locker, is some value, like an integer or character. Indeed, computers
can only look at one locker, or value at a time.
If we had a list of numbers, and we wanted to find a number in that list, the best we could do is look
through it, one at a time, or randomly.
But if we knew the list was sorted, we could look in the middle first, and move left or right accordingly.
With some volunteers, we demonstrate how we might sort a list.
Our volunteers start in the following random order:

6 5 1 3 7 8 4 2

We look at the first two numbers, and swap them so they are in order:

5 6 1 3 7 8 4 2

Then we look at the next pair, 6 and 1 , and swap them:

5 1 6 3 7 8 4 2

We repeat this, until, after our first pass, the largest number ended up furthest on the right:

5 1 6 3 7 4 2 8

(In the lecture, the 1 accidentally moved a spot too far!)


We repeat this, and every time we make a pass, the next-largest number ends up next-furthest to the
right:

1 5 3 6 4 2 7 8

Eventually, our list becomes fully sorted. The first time, we compared 7 pairs of numbers. The second
time, we compared 6 pairs.
We shuffle our numbers again:

2 4 8 5 7 1 3 6

And this time, we look for the smallest number each time, as we go down the list, and put that to the
far left:

1 4 8 5 7 2 3 6
1 2 8 5 7 4 3 6
1 2 3 5 7 4 8 6

Each time, we select the smallest number and swap it with the number that’s in the furthest left
part of the unsorted part of the list.
With this algorithm, we still pass through the list n - 1 times, since there are n people, and we do have
to compare each number with the smallest number we’ve seen thus far.
Let’s try to figure this out a little more formally. The first algorithm, bubble sort, involved comparing
pairs of numbers next to each other, until the largest bubbled up to the right. We might write that in
pseudocode as:

repeat until no swaps
    for i from 0 to n-2
        if i'th and i+1'th elements out of order
            swap them
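Translated into C, the bubble sort pseudocode above might look like this sketch. The function name bubble_sort and the swapped flag are our own choices for illustration:

```c
#include <stdbool.h>

// Sort values[0..n-1] in ascending order with bubble sort.
void bubble_sort(int values[], int n)
{
    bool swapped = true;

    // "repeat until no swaps"
    while (swapped)
    {
        swapped = false;

        // i runs from 0 to n-2, so that i + 1 stays inside the array
        for (int i = 0; i < n - 1; i++)
        {
            if (values[i] > values[i + 1])
            {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
                swapped = true;
            }
        }
    }
}
```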

And selection sort might be as follows:

for i from 0 to n-1
    find smallest element between i'th and n-1'th
    swap smallest with i'th element
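The selection sort pseudocode could be sketched in C along these lines. The function name selection_sort and the variable smallest are assumptions made for illustration:

```c
// Sort values[0..n-1] in ascending order with selection sort.
void selection_sort(int values[], int n)
{
    for (int i = 0; i < n; i++)
    {
        // Find the index of the smallest element between i'th and n-1'th
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            if (values[j] < values[smallest])
            {
                smallest = j;
            }
        }

        // Swap the smallest element with the i'th element
        int tmp = values[i];
        values[i] = values[smallest];
        values[smallest] = tmp;
    }
}
```

On the last pass, when i is n-1, there is nothing left to compare, so the element is swapped with itself.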

For the first pass, we needed to make n - 1 comparisons, to find the smallest number. Then, in each
of the following passes, we made one less comparison, since we had already moved some numbers to
the left:

(n - 1) + (n - 2) + ... + 1
n(n - 1)/2
(n^2 - n)/2
n^2/2 - n/2

Each line simplifies to the next, and eventually, we get n^2/2 - n/2 as the number of
comparisons we need to make. In computer science, we can use big O notation to simplify that
further, and say that our algorithm takes O(n^2) steps, “on the order of n squared”. This is because,
as n gets bigger and bigger, only the n^2 term matters.
For example, if n were 1,000,000, we would get:

n^2/2 - n/2
1,000,000^2/2 - 1,000,000/2
500,000,000,000 - 500,000
499,999,500,000

which is on the same order of magnitude as n^2.
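To check the arithmetic, we can add up the comparisons one pass at a time and compare the total with the formula. This is a small sketch: count_comparisons is a hypothetical helper, and it uses long long because the totals quickly outgrow an int .

```c
// Add up the comparisons selection sort makes on n elements:
// (n - 1) on the first pass, then (n - 2), and so on down to 1.
long long count_comparisons(long long n)
{
    long long total = 0;
    for (long long pass = n - 1; pass >= 1; pass--)
    {
        total += pass;
    }
    return total;
}
```

For n = 8, the loop adds 7 + 6 + ... + 1 = 28, the same as 8(8 - 1)/2, and for n = 1,000,000 it gives 499,999,500,000.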


It turns out, there are other common orders of magnitude:
O(n^2)
O(n log n)
O(n)
O(log n)
O(1)
Searching through a phone book, one page at a time, has O(n) running time, since we need one step for
every page. Using binary search would have O(log n) running time, since we divide the problem in
half each time.
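Binary search on a sorted array might be sketched as follows. The function name binary_search and the convention of returning -1 when the value is absent are assumptions made for illustration:

```c
// Return the index of target in the sorted array values[0..n-1],
// or -1 if target is not present.
int binary_search(const int values[], int n, int target)
{
    int low = 0;
    int high = n - 1;

    while (low <= high)
    {
        // Look in the middle of the part that remains
        int middle = low + (high - low) / 2;

        if (values[middle] == target)
        {
            return middle;
        }
        else if (values[middle] < target)
        {
            low = middle + 1;   // Discard the left half
        }
        else
        {
            high = middle - 1;  // Discard the right half
        }
    }
    return -1;
}
```

Each pass through the loop discards half of the remaining elements, which is where the O(log n) running time comes from.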
Let’s take another array of numbers, but this time, use an empty array of the same size as our working
space:

4 2 7 5 6 8 3 1
_ _ _ _ _ _ _ _

Since we have 8 numbers, let’s look at the first half, the first 4. We’ll sort that recursively, and look at
just the left half of that. With 2 numbers, 4 2 , we look at the left half of that (sorted), and the right
half of that (sorted), and combine them by sorting them, 2 4 . We’ll move them to our second array:

_ _ 7 5 6 8 3 1
2 4 _ _ _ _ _ _

We repeat this, for the right half of the original half:

_ _ | _ _ 6 8 3 1
2 4 | 5 7 _ _ _ _

Then, we merge those halves, to get a sorted left half:

_ _ _ _ 6 8 3 1
_ _ _ _ _ _ _ _
2 4 5 7 _ _ _ _

We repeat, for the right half:

_ _ _ _ | _ _ _ _
_ _ _ _ | _ _ _ _
2 4 5 7 | 1 3 6 8

And now, we can merge both halves:


_ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _
1 2 3 4 5 6 7 8

Each number had to move 3 times, since we divided 8 by 2 three times, or log n times. So this
algorithm takes O(n log n) to sort a list.
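The steps above can be sketched in C as follows. This is one possible implementation under stated assumptions, not the code from lecture: the names merge_sort , sort_range , and merge are hypothetical, and the caller provides the second array ( scratch ) as working space. Unlike the walkthrough, this version copies each merged run back into the original array, so the sorted result ends up in values .

```c
#include <string.h>

// Merge the sorted runs values[low..middle-1] and values[middle..high-1],
// using scratch as working space, then copy the result back into values.
static void merge(int values[], int scratch[], int low, int middle, int high)
{
    int left = low;
    int right = middle;
    int out = low;

    // Repeatedly take the smaller of the two front elements
    while (left < middle && right < high)
    {
        if (values[left] <= values[right])
        {
            scratch[out++] = values[left++];
        }
        else
        {
            scratch[out++] = values[right++];
        }
    }

    // Copy whatever remains in either run
    while (left < middle)
    {
        scratch[out++] = values[left++];
    }
    while (right < high)
    {
        scratch[out++] = values[right++];
    }

    memcpy(&values[low], &scratch[low], (size_t) (high - low) * sizeof(int));
}

// Sort values[low..high-1]: sort the left half, sort the right half, merge.
static void sort_range(int values[], int scratch[], int low, int high)
{
    if (high - low < 2)
    {
        return; // Zero or one element is already sorted
    }
    int middle = low + (high - low) / 2;
    sort_range(values, scratch, low, middle);
    sort_range(values, scratch, middle, high);
    merge(values, scratch, low, middle, high);
}

// Sort values[0..n-1]; scratch must have room for at least n ints.
void merge_sort(int values[], int scratch[], int n)
{
    sort_range(values, scratch, 0, n);
}
```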
We look at demos like Sorting Algorithms Animations and What different sorting algorithms sound like
to conclude.
WEEK 3
MEMORY
Lecture 3

Enhance
Last time
CS50 IDE
Tools
Strings
Memory
Memory layout
Structs
Enhance?

Enhance

We watch a clip, CSI Zoom Enhance, where the main characters zoom in further and further into an
image, revealing more and more details. Today, we’ll see how that works (or doesn’t) in reality.

Last time

We talked about the details of compiling, which is actually made of four steps:
First, our source code is preprocessed, so any header files like stdio.h that we include, are
actually included.
Then, our code is compiled into assembly code, lower-level instructions that correspond closely to
what our CPU can run.
Then, that assembly code is further assembled into binary, the 0s and 1s that match those assembly
instructions.
Finally, the compiled library files that we wanted to include, such as cs50.c or printf.c , are
linked, or merged with our program.
We discovered some helpful tools:
help50 , which might help us understand error messages
printf , which can help us understand our program as it runs

style50 , which checks the style of our code so it’s more readable and consistent

CS50 IDE

CS50 IDE is like the CS50 Sandbox, but with more features. It is an online development environment,
with a code editor and a terminal window, but also tools for debugging and collaborating:

Once we log in, we’ll see a workspace that looks similar to that of CS50 Sandbox, but now our
workspace will be saved to our account.
We can create a new file with File > New File (or the green plus sign), and use File > Save to save it as
hello.c in the folder ~/workspace/ . Now we’ll write our simple program:

#include <stdio.h>

int main(void)
{
printf("hello, world\n");
}

And we’ll need to manually save, with File > Save or the keyboard shortcut.
Now, in the terminal window below, we can type make hello and ./hello to see our program run.
The folder icon at the top left will show us all our files in a directory (folder) called ~/workspace/ ,
and we can create folders and files inside. The ~ symbol refers to our home directory in this
environment, which is just the set of all the files related to our account, and workspace is a folder
inside ~ that we can use. (The ~ directory also has other configuration files for our account, but we
won’t need to worry about them.)
In the terminal, we see ~/workspace/ $ . The $ part of the prompt is the same as before, after
which we can type a command, but the first part of the prompt tells us the directory our terminal is in.
For example, we can type ls , and we’ll see a textual version of the workspace directory. And
./hello refers to a file called hello in . , which is the current folder.

We can change our directory with cd , and if we type something like cd src3 (assuming we have a
folder already named src3 ), we’ll see our prompt change to ~/workspace/src3/ $ .
We can delete files and folders with the graphical file tree, right-clicking them as we might be familiar
with already. But we can do the same in the command line, with rm hello , which will remove files.
The command will ask us for a confirmation, and we can type yes or y (or n , if we’ve changed our
minds).
We can create directories with mkdir test , and remove them with rmdir .

Tools

In the CS50 IDE, we’ve also added another tool, check50 . Like style50 , we wrote this tool to
automatically check the correctness of your programs, by passing in inputs and looking at their
outputs.
After we write a program from a problem set, and have tested it ourselves with a few inputs, we can
type check50 cs50/2018/fall/hello . The cs50/2018/fall/hello is an indicator for the
program specification that check50 should check, and once we run that command, we’ll see
check50 uploading our code and checking it.

We can also now use a tool called a debugger, built into the CS50 IDE.
After we compile our code, we can run debug50 ./hello , which will tell us to set a breakpoint first.
A breakpoint indicates a line of code where the debugger should pause our program, until we choose to
continue it. For example, we can click to the left of a line of our code, and a red circle will appear:
Now, if we run debug50 ./hello again, we’ll see the debugger panel open on the right:

We see that the variable we made, name , is under the Local Variables section, and see that
there’s a value of 0x0 (which is null ), and a type of string , as we expected.
Our breakpoint has paused our program before line 6, so to continue, we have a few controls in the
debugger panel. The blue triangle will continue our program until we reach another breakpoint. The
curved arrow to its right will “step over” the line, running it and pausing our program again
immediately after. The arrow pointing downward will “step into” the line, if there is a function being
called. And the arrow pointing up and to the right will “step out” of a function, if we are in one.
So, we’ll use the curved arrow to run the next line, and see what changes after. After we type in our
name, we’ll see that the name variable is also updated in the debugger.
We can save lots of time in the future by investing a little bit now to learn how the debugger works!

Strings

We’ve been using helpful functions from the CS50 Library, like get_int or get_string , to get input
of a specific type from the user. These functions are generally tricky to write, because we want to
prompt the user over and over again, if the input they give us isn’t actually valid.
Today, we’ll look into the string type. As we learned last week, a string is just an array of characters,
stored back-to-back. But let’s investigate what a string variable actually is.
Let’s open compare0.c :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Get two integers
int i = get_int("i: ");
int j = get_int("j: ");
// Compare integers
if (i == j)
{
printf("same\n");
}
else
{
printf("different\n");
}
}

As expected, if we provide the same values for i and j , we see that they’re the same.
In compare1.c , we’ll try to do the same with strings:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Get two strings
string s = get_string("s: ");
string t = get_string("t: ");

if (s == t)
{
printf("same\n");
}
else
{
printf("different\n");
}
}

Hmm, no matter what we type in for our strings, our program thinks they are different.
It turns out, string is not actually a data type in C. The word “string” is common in computer science,
but C has no built-in type with that name. Instead, we defined that type in the CS50 Library.
Recall that strings are just arrays of characters, so when we ran our compare1 program, we got two
strings as input from the user, and those might be stored in memory as the following:

Each character is in one byte, and somewhere we have bytes in memory containing the values for
each string.
It turns out, each byte in memory has a numeric location, or address. For example, the character B
might have the address 100, and V might have ended up in 900 (depending on what parts of
memory were available, or free):

Notice that, since each string is an array of characters, each character within the array has
consecutive addresses, since they are stored next to each other in memory. But the strings
themselves might have very different addresses.
So, get_string actually returns just the address of the first character of the string. (We can tell
where it ends by looking for the null character, \0 .) Now, we can infer that comparing two “strings”
actually just compares two addresses (which will always be different, since get_string stores the
input in a new place each time), even if the characters stored at those addresses are the same.
Other data types in C, such as int s or float s, are generally passed and stored as their values, since
they are always a fixed number of bytes. Strings, on the other hand, are passed as their addresses,
since they could be really long.
If we do want to compare two strings, it seems like what we need to do is compare each character one
at a time:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

bool compare_strings(string a, string b);

int main(void)
{
// Get two strings
string s = get_string("s: ");
string t = get_string("t: ");

// Compare strings for equality
if (compare_strings(s, t))
{
printf("same\n");
}
else
{
printf("different\n");
}
}

bool compare_strings(string a, string b)
{
// Compare strings' lengths
if (strlen(a) != strlen(b))
{
return false;
}

// Compare strings character by character
for (int i = 0, n = strlen(a); i < n; i++)
{
// Different
if (a[i] != b[i])
{
return false;
}
}

// Same
return true;
}

We write a function called compare_strings , which takes in two strings as arguments, and
returns a bool , or Boolean value.
First, we compare the strings’ lengths, and return false if they are not the same. Then, we can
check each character, and return false if we get to any that are different.
We also need to remember to add the prototype, bool compare_strings(string a, string
b); to the top.

A string is actually a synonym for a char * . The * in C (which also means multiplication,
depending on the context), means that the data type is an address. So a char * is an address to a
char . And such a variable type is called, more formally, a pointer.

Now, we can use char * where we’ve been using string:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

bool compare_strings(char *a, char *b);

int main(void)
{
// Get two strings
char *s = get_string("s: ");
char *t = get_string("t: ");

// Compare strings for equality
if (compare_strings(s, t))
{
printf("same\n");
}
else
{
printf("different\n");
}
}

bool compare_strings(char *a, char *b)
{
// Compare strings' lengths
if (strlen(a) != strlen(b))
{
return false;
}

// Compare strings character by character
for (int i = 0, n = strlen(a); i < n; i++)
{
// Different
if (a[i] != b[i])
{
return false;
}
}

// Same
return true;
}

It turns out, there’s a library function in string.h , written by others many years ago, called strcmp ,
which compares strings for us:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
// Get two strings
char *s = get_string("s: ");
char *t = get_string("t: ");

// Compare strings for equality
if (strcmp(s, t) == 0)
{
printf("same\n");
}
else
{
printf("different\n");
}
}

The return value for strcmp , based on looking at documentation like CS50 Reference, will be 0
if the strings are equal, or some other value if they are different.
We should also be checking for other errors, that we haven’t paid attention to before.
get_string is supposed to return the address to the first byte of a string, but sometimes it may
return NULL , an invalid address that indicates something went wrong. (And that address has the value
of 0 , which is a special address that isn’t used to store anything.)
To check for errors, we might do this:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
// Get a string
char *s = get_string("s: ");
if (s == NULL)
{
return 1;
}

// Get another string
char *t = get_string("t: ");
if (t == NULL)
{
return 1;
}

// Compare strings for equality
if (strcmp(s, t) == 0)
{
printf("same\n");
}
else
{
printf("different\n");
}
return 0;
}

If, for some reason, get_string doesn’t return a valid address, we ourselves will return an exit
code of 1 , to indicate some error has occurred. If we continued, we might see a segmentation
fault, which means that we tried to access memory that we aren’t able to (such as at the NULL
address).
We can simplify the condition to just if (!s) , since NULL has the value 0 , and !s is
true exactly when s is 0 .
Now, let’s try to copy a string:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
// Get a string
string s = get_string("s: ");
// Copy string's address
string t = s;

// Capitalize first letter in string
if (strlen(t) > 0)
{
t[0] = toupper(t[0]);
}

// Print string twice
printf("s: %s\n", s);
printf("t: %s\n", t);
}

We get a string s , and copy the value of s into t . Then, we capitalize the first letter in t .
But when we run our program, we see that both s and t are now capitalized.
Since we set s and t to the same values, they’re actually pointers to the same character, and so
we capitalized the same character:

To actually make a copy of a string, we have to do a little more work:

#include <cs50.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
// Get a string
char *s = get_string("s: ");
if (!s)
{
return 1;
}

// Allocate memory for another string
char *t = malloc((strlen(s) + 1) * sizeof(char));
if (!t)
{
return 1;
}

// Copy string into memory
for (int i = 0, n = strlen(s); i <= n; i++)
{
t[i] = s[i];
}

// Capitalize first letter in copy
if (strlen(t) > 0)
{
t[0] = toupper(t[0]);
}

// Print strings
printf("s: %s\n", s);
printf("t: %s\n", t);

// Free memory
free(t);
return 0;
}

We create a new variable, t , of the type char * , with char *t . Now, we want to point it to a
new chunk of memory that’s large enough to store the copy of the string. With malloc , we can
allocate some number of bytes in memory (that aren’t already used to store other values), and we
pass in the number of bytes we’d like. We already know the length of s , so we add 1 to that for
the terminating null character, and we multiply that by sizeof(char) (which gets us the
number of bytes for each character) to be sure that we have enough memory. So, our final line of
code is char *t = malloc((strlen(s) + 1) * sizeof(char)); .
Then, we copy each character, one at a time, and now we can capitalize just the first letter of t .
And we use i <= n , since we actually want to go up to one past n , to ensure we copy the
terminating character in the string. Finally, after we’re done, we call free(t) , which tells our
computer that those bytes are no longer useful to our program, and so those bytes in memory can
be reused again.
We can actually also use the strcpy library function, which we can learn about through reading
documentation, to copy a string.
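As a rough sketch of how strcpy could replace the loop above, the hypothetical helper copy_string below allocates enough memory for a copy, including the null character, and lets strcpy copy the bytes:

```c
#include <stdlib.h>
#include <string.h>

// Return a newly allocated copy of s, or NULL if memory could not be allocated.
// copy_string is a hypothetical name chosen for this sketch.
char *copy_string(const char *s)
{
    // One extra byte for the terminating '\0'
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // strcpy copies every character of s, including the '\0'
    strcpy(t, s);
    return t;
}
```

The caller still has to call free on the returned pointer when done with the copy.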
A memory leak happens when we allocate more and more memory for our program to use, but we don’t
free that memory. Then, our computer gets slower and slower (since it has to compensate for less and
less memory).
Let’s look at why it might be hard to get input from a user:

#include <stdio.h>

int main(void)
{
int x;
printf("x: ");
scanf("%i", &x);
printf("x: %i\n", x);
}
scanf is a function that gets input from the user, according to a particular format. We pass in
%i to indicate that we’re looking for an integer, and we use &x to get the address of x , so
scanf can put the value into the right place in memory.

But now let’s try to get a string:

#include <stdio.h>

int main(void)
{
char *s;
printf("s: ");
scanf("%s", s);
printf("s: %s\n", s);
}

Since we didn’t allocate any memory for the actual bytes of the string, scanf had nowhere to
store the input.
We can allocate some number of bytes as an array of characters:

#include <stdio.h>

int main(void)
{
char s[5];
printf("s: ");
scanf("%s", s);
printf("s: %s\n", s);
}

Now, we have 5 bytes in memory into which we can store input: up to 4 characters, plus the
terminating null character.
Notice that we can pass in s as an address, since arrays can be treated like pointers to the first
element in the array.
But if we were to type in a much longer string, we eventually get a “segmentation fault”, where we
tried to access a segment of memory we couldn’t or shouldn’t. It turns out that scanf doesn’t
know how much memory is allocated, so it keeps writing to memory, starting at the address s ,
for as much input as is passed in, even though we might not have allocated as much.
get_string handles this for us, and allocates memory as needed. (And if you’re super interested,
the source code for the CS50 Library is available!)
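One way to guard against this, short of using get_string , is to give %s a maximum field width, so that at most that many characters are stored. The sketch below uses sscanf , which reads from a string instead of the keyboard, so the behavior is easy to try; the same %4s format works with scanf . The helper name read_word is hypothetical.

```c
#include <stdio.h>

// Read at most 4 non-whitespace characters from input into s, which has
// room for 5 bytes (4 characters plus the terminating '\0').
// Returns 1 if a word was read, and 0 otherwise.
int read_word(const char *input, char s[5])
{
    return sscanf(input, "%4s", s) == 1;
}
```

Given the input hello , only hell is stored, followed by the null character, so nothing is written past the end of the array.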

Memory

To tie this all together, recall that we have physical chips of RAM in our computers, that store all the
bytes we have. And each byte has an address. We can see this with addresses.c :

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Get two strings
string s = get_string("s: ");
string t = get_string("t: ");

// Print strings' addresses
printf("s: %p\n", s);
printf("t: %p\n", t);
}

Here, we tell printf to treat s and t as pointers with %p , so we see addresses like
0x2331010 and 0x2331050 .

The values are super big (because there are lots of locations in memory), and they’re usually written in a
system called hexadecimal. Like binary and decimal, hexadecimal is a way to represent numbers, and it
has 16 possible values per digit, 0-9 and A-F. (It just happens that the addresses for s and t had no
alphabetical characters.) And a value in hexadecimal will conventionally start with 0x , to indicate
that.
Earlier, we saw 0x0 in the debugger panel for the name variable, and then a different value once we
inputted a string, and that was the address of our string.
We can look at an example of converting three bytes from decimal, to binary, and to hexadecimal:

255        216        255
11111111   11011000   11111111
f   f      d   8      f   f

Since each digit in hexadecimal has 16 possible values, that maps to 4 binary digits, and so each
byte can be expressed as 2 hexadecimal digits, like 0xff and 0xd8 . Four 1 s in binary is 15 in
decimal, and f in hexadecimal.
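Since each hexadecimal digit stands for 4 bits, we can convert a byte by splitting it into its upper and lower 4 bits. A minimal sketch, with byte_to_hex as a hypothetical helper name:

```c
// Write the two-digit hexadecimal form of byte into out, which must have
// room for 3 bytes (2 digits plus the terminating '\0').
void byte_to_hex(unsigned char byte, char out[3])
{
    static const char digits[] = "0123456789abcdef";

    out[0] = digits[byte >> 4];    // Upper 4 bits
    out[1] = digits[byte & 0x0f];  // Lower 4 bits
    out[2] = '\0';
}
```

For the bytes above, 255 becomes ff and 216 becomes d8 .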
We have two drinks, milk and orange juice, each of which is in a cup. We want to swap the drinks
between the two cups, but we can’t do that without a third cup to pour one of the drinks into first.
Now, let’s say we wanted to swap the values of two integers.

void swap(int a, int b)
{
int tmp = a;
a = b;
b = tmp;
}

With a third variable to use as temporary storage space, we can do this pretty easily.
But, if we tried to use that function in a program, we don’t see any changes:

#include <stdio.h>

void swap(int a, int b);

int main(void)
{
int x = 1;
int y = 2;

printf("x is %i, y is %i\n", x, y);
swap(x, y);
printf("x is %i, y is %i\n", x, y);
}

void swap(int a, int b)
{
int tmp = a;
a = b;
b = tmp;
}

It turns out that, when x and y are passed in, the swap function gets its own variables, a and b ,
that are copies of x and y , and so changing those copies doesn’t change x and y in the main
function.
By passing in the address of x and y , our swap function can actually work:

#include <stdio.h>

void swap(int *a, int *b);

int main(void)
{
int x = 1;
int y = 2;

printf("x is %i, y is %i\n", x, y);
swap(&x, &y);
printf("x is %i, y is %i\n", x, y);
}

void swap(int *a, int *b)
{
int tmp = *a;
*a = *b;
*b = tmp;
}

The addresses of x and y are passed in from main to swap , and we use the *a syntax to
follow (or dereference) a pointer and get the value stored there. We save that to tmp , and then
take the value at b and store that as the value of a . Finally, we store the value of tmp as the
value of b , and we’re done.
We’ll click to the left of the line int x = 1 to set a breakpoint with the red icon, and run
debug50 ./swap again, to step through our program one line at a time. We can use the “step
into” button now, to go into our swap function and see how it works.

Memory layout

Within our computer’s memory, the different types of data that need to be stored for our program are
organized into different sections:

The text section is our compiled program’s binary code. When we run our program, that code is
loaded into the “top” of memory.
The heap section is an open area where malloc can get free memory from, for our program to
use.
The stack section is used by functions in our program as they are called. For example, our main
function is at the very bottom of the stack, and has the variables x and y . The swap function,
when it’s called, has some memory that’s on top of main , with the variables a , b , and tmp :

Once the function swap returns, the memory it was using is freed for the next function call,
and we lose anything we did, other than the return values.
So by passing in the addresses of x and y from main to swap , we could actually change
the values of x and y .
Global variables are in the initialized data and uninitialized data sections, and environment
variables from the command-line are also stored in a section.
Let’s look at a buggy section of code:

int main(void)
{
int *x;
int *y;

x = malloc(sizeof(int));

*x = 42;
*y = 13;

y = x;

*y = 13;
}

Here, we declare two pointers called x and y . We allocate memory for an integer for x , but
not y , so trying to store the value 13 into *y might lead to a segmentation fault.
But if we set y to be the same as x , pointing to the same address, we can successfully store the
value 13 to that location.
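A corrected version might look like the following sketch, wrapped in a hypothetical function, store_through_pointers , so that the result can be checked. It only stores a value through y after y points to allocated memory, and it frees that memory when done:

```c
#include <stdlib.h>

// Returns the value stored at the shared address (13),
// or -1 if memory could not be allocated.
int store_through_pointers(void)
{
    // Allocate memory for one int and point x at it
    int *x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    *x = 42;

    // Point y at the same address before storing anything through it
    int *y = x;
    *y = 13;

    // x and y refer to the same int, so *x is now 13 as well
    int result = *x;

    free(x);
    return result;
}
```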
We watch another clip, Pointer Fun with Binky.
We might have used the website StackOverflow, a Q&A site commonly used for programming
questions. Now, we can understand that the name of the site comes from a reference to the stack
overflowing, or having too many function calls to fit in our computer’s memory.

Structs

We can create variables of our own type with a concept called structs.
For example, if we wanted to store both names and dorms of individual students, we might have arrays
for each:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Space for students
int enrollment = get_int("Enrollment: ");
string names[enrollment];
string dorms[enrollment];

// Prompt for students' names and dorms
for (int i = 0; i < enrollment; i++)
{
names[i] = get_string("Name: ");
dorms[i] = get_string("Dorm: ");
}

// Print students' names and dorms


for (int i = 0; i < enrollment; i++)
{
printf("%s is in %s.\n", names[i], dorms[i]);
}
}

But we might want to start having other pieces of data, and we have to make sure that all the arrays
are the right length, and have the data for the same person at the same index, and so on. Instead, we
can use structs, with a struct.h file containing:

typedef struct
{
char *name;
char *dorm;
}
student;

And a struct.c file containing:

#include <cs50.h>
#include <stdio.h>

#include "struct.h"

int main(void)
{
// Space for students
int enrollment = get_int("Enrollment: ");
student students[enrollment];

// Prompt for students' names and dorms


for (int i = 0; i < enrollment; i++)
{
students[i].name = get_string("Name: ");
students[i].dorm = get_string("Dorm: ");
}

// Print students' names and dorms


for (int i = 0; i < enrollment; i++)
{
printf("%s is in %s.\n", students[i].name, students[i].dorm);
}
}

Now, a student is our own variable type, that itself contains two variables, name and dorm ,
that we can access with .name and .dorm later on.
We can even open and save files with a snippet of code like:

FILE *file = fopen("students.csv", "w");


if (file)
{
for (int i = 0; i < enrollment; i++)
{
fprintf(file, "%s,%s\n", students[i].name, students[i].dorm);
}
fclose(file);
}

This is just a sneak preview of what we’ll learn to use in the next problem set!

Enhance?

Now, if we try to zoom in on an image, we’ll eventually see the pixels that it’s made of. But since
images are represented as a finite number of bytes, we can’t possibly see details that aren’t already
captured.
Images can be represented as a bitmap, or map of bits:

Each 1 maps to a white pixel, and a 0 to a black pixel.


An image with color will use more than one bit per pixel.
And an image file will also include special data values, at the beginning of the file, so that programs
can open them correctly. In the problem set, we’ll learn about one such image file format, .bmp , for
bitmaps. And we’ll learn to tweak images digitally, resizing or filtering them as we’d like.
We end on a clip of a realistic example from the TV show Futurama, Let’s Enhance.
WEEK 4
DATA STRUCTURES

Lecture 4
Tools for debugging
valgrind
Rubber duck debugging
More from last week
Structs
Linked Lists
More data structures

Tools for debugging

We took a look at CS50 IDE, a new web-based programming environment. We learned to use:
check50 to check our work automatically

debug50 and the debugger to step through our code

help50 to make compilation errors more understandable

printf to print out helpful information as our program runs

style50 to check the style of our code

We also looked at how memory is used by our programs in C, and how we can allocate and free
memory.

valgrind

valgrind is a command-line tool that we can use to run our program and see if it has any memory
leaks.
Recall that a memory leak happens when we call malloc to ask the operating system for
memory for our program, but we don’t call free to mark it as free. Then, our program will use
more and more memory as it runs, eventually slowing down or even crashing our entire computer.
Let’s write a program called memory.c to experiment a little bit:

#include <stdlib.h>

void f(void)
{
int *x = malloc(10 * sizeof(int));
x[10] = 0;
}

int main(void)
{
f();
return 0;
}

We have a function, f, that allocates memory for 10 integers, and uses x as a pointer to the
address of the start of that memory. Then, we try to access the 11th integer with x[10]
(remember that pointers can act like arrays, since the memory we get is contiguous, or
back-to-back).
When we run this program, we can see that it still works. It turns out, malloc sometimes (but
not always) gives us back a little more memory than we ask for, and we might get lucky and be
able to access and use memory beyond the bounds of what we should have.
We can run our program with valgrind ./memory , and we see a lot printed out. But we do see one
error message in particular, Invalid write of size 4 (since x[10] = 0 tries to store a 4-byte
integer), followed by some pointers in hexadecimal.
We see f (memory.c:15) in the line immediately after, which tells us that this happened in line 15
of memory.c in the function f .
We can also use help50 valgrind ./memory , which will distill the output for us, one error message
at a time, and add additional clues to guide us.
If we fix our program to access x[9] , then valgrind only has 1 error message for us: 40 bytes
in 1 blocks are definitely lost... . And help50 tells us that we forgot to free memory we
allocated with malloc , so we should call free(x) after we’re done using it. Now, we have 0 errors.
Recall that we have some finite number of bytes in memory, and our operating system keeps track of
which bytes are used by which program, which bytes are free, and indicates segmentation faults when
we try to access memory that isn’t allocated to our program.

Rubber duck debugging

Rubber duck debugging is the process of explaining our code, step-by-step, to a rubber duck (or some
other inanimate object), so we ourselves can understand it better and hopefully realize where we
might have a bug or opportunity for improvement. Hopefully, this will be another simple, but powerful,
tool for us to use.

More from last week


We learned, last week, that a string is a synonym for char * , a pointer to a character.
We also learned that memory is laid out in a certain way for our program, where different regions are
used to store different types of data, such as:
the text segment, where the machine code for our program is loaded when we start it
the heap, where dynamically allocated memory (memory we allocate while the program is
running) is stored
the stack, where local variables and functions, including our main function, live when our
program is running
We saw a swap function that didn’t work, when values were passed in directly, since it got its own
copies of those values. Then, we saw a swap function that took in the addresses of two variables, so
those values could actually be swapped.
Binky, from the short clip we saw, also demonstrated how we could dereference pointers and use them
correctly (and incorrectly). We need to make sure that our pointers have a valid address, before we try
to dereference them.
Finally, we saw an introduction to structs, where we can build our own data structures, with variables
of our choice.

Structs

Let’s make a file called struct.h :

typedef struct
{
char *name;
char *dorm;
}
student;

This is a header file, which we can share among various .c files.


In struct0.c , we import the header file:

#include <cs50.h>
#include <stdio.h>
#include <string.h>

#include "struct.h"

int main(void)
{
// Allocate space for students
int enrollment = get_int("Enrollment: ");
student students[enrollment];

// Prompt for students' names and dorms


for (int i = 0; i < enrollment; i++)
{
students[i].name = get_string("Name: ");
students[i].dorm = get_string("Dorm: ");
}

// Print students' names and dorms


for (int i = 0; i < enrollment; i++)
{
printf("%s is in %s.\n", students[i].name, students[i].dorm);
}
}

Notice now that we have a student type, and an array called students with structs of that
type.
Then, we can access variables inside each student struct with the . notation.
A student is like an abstraction, where we encapsulate some variables together.

Linked Lists

So far, arrays had to be a fixed size at the time of initialization. If we wanted to add to our array, we
would have to initialize a bigger array, and copy the values from the original array. But the running
time of resizing an array is now O(n), where n is the size of the original array.
We can use a function called realloc , which reallocates memory. We can pass in the address of our
memory and the new amount we want, and we will get back an address (possibly a new one), where we
have that much contiguous memory. If the memory had to move, realloc will also copy the array for us
to the new location. But this operation can cost a linear amount of time, too, depending on how big
our array is.
We can do the opposite, and ask for just enough memory for one element, like one integer, at a time.
But they might be stored anywhere in the heap, so we need a way to link each element to the next, via
a stored pointer.
With this data structure, called a linked list, we lose the ability to randomly access elements. For
example, we can no longer access the 5th element of the list by calculating where it is, in constant
time. (Since we know arrays store elements back-to-back, we can add 1, or 4, or the size of our
element, to calculate addresses.) Instead, we have to follow each element, one at a time.
And we create a linked list by allocating, for each element, enough memory for both the value, and a
pointer to the next element. We’ll call these nodes:
We have three nodes at various addresses in memory, 100 , 150 , and 475 . Each node has the
value we want to store, and also a pointer to the next node. The final node has a pointer of
NULL , indicating the end of our linked list.

In code, we might create our own struct called node , with an int and a pointer to the next node
called next :

typedef struct node
{
    int n;
    struct node *next;
}
node;

We start this struct with typedef struct node so that we can refer to a node inside our
struct.
With some volunteers, we demonstrate how a linked list works. To store 3 values, we need 3 nodes,
and a pointer, that we labeled “first”, pointing to the first node. Each node holds a value, along with a
pointer to the next node. And to add a node, we would allocate memory for a new node, and change
our pointers carefully. First, we need to find the node that will follow the new node (if we want to
keep our linked list sorted). Then, we point our new node to that next node, and change the node
before it to point to the new node. And to find the right place for inserting a new node, we have to
start with our “first” pointer, and look at the values of each node as we follow the pointers in them.
The running time of inserting a node, now, is O(n), since we have to follow each node to check their
values. There’s more logic and running time, but we don’t need to decide on a fixed size for our list
now. And if we were to insert nodes in unsorted order, the running time would be O(1), since we can
just add it to the front of the list. We can also keep an additional pointer to the last node, calling it
“last”, or we can even have each node store two pointers, one to the previous node and one to the next
node, so we can move forwards and backwards.
Let’s see how we might do this in code. First, we can store a fixed number of integers in an array:
#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Prompt for number of numbers
int capacity;
do
{
capacity = get_int("Capacity: ");
}
while (capacity < 1);

// Memory for numbers


int numbers[capacity];

// Prompt for numbers


int size = 0;
while (size < capacity)
{
// Prompt for number
int number = get_int("Number: ");

// Add to list
numbers[size] = number;
size++;
}

// Print numbers
for (int i = 0; i < size; i++)
{
printf("%i\n", numbers[i]);
}
}

We get a capacity from the user, and create an array of size capacity . Then, we keep adding
numbers to the array, until we reach the capacity. Then, we print each number in the array.
But our program is limited to a capacity we choose initially.
We can size an array dynamically:

#include <cs50.h>
#include <stdio.h>

int main(void)
{
// Memory for numbers
int *numbers = NULL;
int capacity = 0;

// Prompt for numbers (until EOF)


int size = 0;
while (true)
{
// Prompt for number
int number = get_int("Number: ");

// Check for EOF


if (number == INT_MAX)
{
break;
}

// Check whether enough space for number


if (size == capacity)
{
// Allocate space for number
int *tmp = realloc(numbers, sizeof(int) * (size + 1));
if (!tmp)
{
if (numbers)
{
free(numbers);
}
return 1;
}
numbers = tmp;
capacity++;
}

// Add number to list


numbers[size] = number;
size++;
}

// Print numbers
printf("\n");
for (int i = 0; i < size; i++)
{
printf("%i\n", numbers[i]);
}

// Free memory
if (numbers)
{
free(numbers);
}
}

First, we declare a pointer called numbers , and set it to NULL since we haven’t allocated any
memory for it yet. We track the capacity of our array, as well as the size of the array so far.

Then, we get one number at a time from the user. get_int will return INT_MAX if we indicate
EOF, or “end of file” as the end to our input (control + d in the terminal), so if that happens, we can
break out of the loop.
If we’ve reached our capacity for the numbers array, we use realloc to reallocate enough
space for an additional integer in the array. We check whether realloc returned a null pointer,
and if it did, we free the existing numbers array if we have one, and return 1 . If we do get
enough space, then we can add the new number to the array.
Finally, we can print each number in the array, and free the array. If we forget to free it, running
valgrind ./list1 will show us an error.

Now let’s write the same program, using a linked list:

#include <cs50.h>
#include <stdio.h>

typedef struct node


{
int number;
struct node *next;
}
node;

int main(void)
{
// Memory for numbers
node *numbers = NULL;

// Prompt for numbers (until EOF)


while (true)
{
// Prompt for number
int number = get_int("number: ");

// Check for EOF


if (number == INT_MAX)
{
break;
}

The beginning of our program is essentially the same, though we define node at the top of this
program.

// Allocate space for number


node *n = malloc(sizeof(node));
if (!n)
{
return 1;
}

// Add number to list


n->number = number;
n->next = NULL;
if (numbers)
{
for (node *ptr = numbers; ptr != NULL; ptr = ptr->next)
{
if (!ptr->next)
{
ptr->next = n;
break;
}
}
}
else
{
numbers = n;
}
}

Now, we allocate enough memory for a new node and point to it with a pointer n . If n was null
after we called malloc , then we exit with an error. With the -> syntax, we can follow a pointer
to get a variable in a struct, so we store the new number into the node n points to, along with
NULL for the next pointer. (If n was a node and not a pointer, we would use the n.number
syntax.)
Then, if numbers is not NULL (i.e. our list already has at least one node), we create a temporary
pointer ptr to follow our linked list. We
start with ptr = numbers . Inside our loop, if ptr doesn’t have a next pointer (i.e. it’s the last
node in our linked list), we set the next pointer to n and break. Otherwise, our loop continues,
and our temporary pointer ptr becomes ptr->next , i.e. we look at the next node.
If numbers was NULL (i.e. our list was empty), we can just set it to n , which becomes the start
of our new list.

// Print numbers
printf("\n");
for (node *ptr = numbers; ptr != NULL; ptr = ptr->next)
{
printf("%i\n", ptr->number);
}

// Free memory
node *ptr = numbers;
while (ptr != NULL)
{
node *next = ptr->next;
free(ptr);
ptr = next;
}
}

Finally, we print the numbers by following the linked list in the same way, and we also free each
node as we follow its next pointer.

More data structures

If we had an unsorted array or linked list storing names, we would have to look through each value,
one at a time.
We, as humans, might make smaller lists where each person whose name starts with “A” will be in one
list, “B” in another, and so on. We can represent this concept by passing each value to be
stored through a hash function. The resulting hash might be a number, and in this case might be 0
for a string that starts with A , 1 for a string that starts with B , and so on, but the important part is
that we can use that number to index into some array. The array, in turn, will have a linked list for each
letter of the alphabet (or more generally, a linked list for each bucket), and this data structure is
called a hash table.
Now, each linked list (in our example of strings) will only be, on average, 1/26th the size of a list
with all the strings together. In the worst case, all the strings will end up in the same bucket (if
they happen to start with the same letter), and we would have O(n) running time, like an unsorted
array. We can also use a different hash function, which might distribute our elements more evenly.
But in the real world, our running time is likely to be much lower with a hash table. And we can
even have more buckets in our hash table, so each list is an even smaller proportion.
A tree is another data structure where each node points to other nodes. In a binary search tree, each
node points to two others, one to the left (with a smaller value) and one to the right (with a larger
value):

Now, we can easily do binary search, and since each node is pointing to another, we can also
insert nodes into the tree without moving all of them around as we would have to in an array.
Recursively searching this tree would look something like:

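typedef struct node
{
    int n;
    struct node *left;
    struct node *right;
}
node;
...
// Returns true if n is somewhere in the tree, and false otherwise
bool search(int n, node *tree)
{
    // An empty tree (or subtree) can't contain n
    if (tree == NULL)
    {
        return false;
    }
    // Smaller values can only be in the left subtree
    else if (n < tree->n)
    {
        return search(n, tree->left);
    }
    // Larger values can only be in the right subtree
    else if (n > tree->n)
    {
        return search(n, tree->right);
    }
    else
    {
        return true;
    }
}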

We can use another data structure called a trie (pronounced like “try”, and is short for “retrieval”):

Imagine we want to store a dictionary of words efficiently, and be able to access each one in
constant time. A trie is like a tree, but each node is an array. Each array will have each letter, A-Z,
stored. For each word, the first letter will point to an array, where the next valid letter will point
to another array, and so on, until we reach something indicating the end of a valid word. If our
word isn’t in the trie, then one of the arrays won’t have a pointer or terminating character for our
word.
In our upcoming problem set, we’ll use what we’ve learned about pointers and data structures to
implement a spell-checking program, and gain an understanding of how something like that might work at
a low level.
WEEK 5
THE INTERNET

Lecture 5
Networking
HTTP
HTML
Forms
CSS
JavaScript

Networking

Today we’ll transition from building command-line programs in C to web applications, and though
we’ll see new languages, many ideas and concepts will stay the same.
TCP/IP (Transmission Control Protocol and Internet Protocol) are two protocols, or rules that specify
how computers can communicate with each other. The modern internet relies on these protocols to
work.
We might have sent handwritten letters in the mail in the past. On the outside of the envelope, we
need to write an address, including information like a name, street, and city. We also write our own
name and address as the return address.
Each address, too, should uniquely identify a building or place.
Our computers also have addresses that uniquely identify them on the internet, called IP addresses. In
IPv4, or version 4 of the protocol, these addresses are numbers in the format #.#.#.# , four numbers
between 0 and 255 separated by dots. And to represent each number (with 256 possible values), we
need exactly 8 bits, and so each IP address is made of 32 bits. But with 32 bits, we can only represent
2^32, or about 4 billion, values. And since there are more than 4 billion devices connected to the internet, we have a
newer version of the protocol, IPv6, which has 128-bit addresses, that the world is starting to
transition to.
A server, which is just a computer connected to the internet that can listen for and respond to
messages, might provide many services, such as a web site or email. To specify that a message is
intended for a particular service, such as web browsing, another number called the port number is
added to the address. For example, HTTP, for browsing websites, is usually communicated with port
80. So an envelope with a message might have 1.2.3.4:80 as the destination address, and
5.6.7.8 as the return address. And there are other complexities, but that’s the basics of how
computers can communicate over a network.
Let’s say we wanted to visit a URL, Uniform Resource Locator, like http://www.example.com/ . It
turns out that there’s another technology called DNS, Domain Name System, that many internet
providers and organizations maintain, which converts domain names (like example.com ) into IP
addresses.
There are actually now hundreds of TLDs, or top-level domains, in addition to .com , such as .net ,
.org , .us , .uk , and more.

The www in front of a domain name is actually a subdomain, and there might be many of them
created, each of which points to a different server or set of servers. It’s not required, and www is
only used by convention. For example, MIT uses web.mit.edu for their main website’s address.
The / at the end implies that we’re asking for the root page of the site, which is conventionally
index.html , where .html indicates that the file is written in HTML, a language we’ll soon look
at.
When we type that URL in a browser, our browser first uses DNS to look up the IP address for that
domain, and then sends a request (in a virtual envelope) to the right IP address for the website. And
when the server at that address responds, it will send us the content of the website in a virtual
envelope with our address as the destination.

HTTP

HTTP, Hypertext Transfer Protocol, is another set of rules and conventions for communicating. For
example, humans might have the convention of shaking hands when meeting for the first (or
subsequent) times. When our browser communicates to web servers through HTTP, too, both
computers follow a protocol for making requests and responses.
A request for a webpage will look like this:

GET / HTTP/1.1
Host: www.example.com
...

GET is an HTTP verb that indicates we want to fetch some resource. The / indicates we’re
looking for the default page, and HTTP/1.1 indicates the version of HTTP our browser is using.
Then, Host: www.example.com is included, since the same server might be listening for and
responding to requests for multiple websites. There are also other pieces of information included
in the ... , to help the server respond to us appropriately.
The response from the server might look like this:

HTTP/1.1 200 OK
Content-Type: text/html
...

First, we get back the version of HTTP, HTTP/1.1 . Then, 200 is a numeric code that means OK ,
that the server was able to understand and respond to the request.
Content-Type: text/html indicates that the content of the response is in the language called
HTML, in text format.
We can open a browser like Chrome, and open the Developer Tools with View > Developer > Developer
Tools. A panel will open:

We can click the Network tab, and if we type harvard.edu into the address bar and press enter,
a lot will happen very quickly. We can scroll to the very top, click the first request for
harvard.edu , and see in the right panel, under “Request Headers”, that the browser indeed
sends a request that starts with what we expected:
We can scroll in the same panel and see that the response headers are slightly different:

The response code, 301 , seems to say “Moved Permanently”. And if we look down to “Location:”,
we see that the new location is https://www.harvard.edu . There’s a www , and also a different
protocol, HTTPS, which will encrypt our communication more securely.
Another HTTP code, 404 , is “Not Found”, and we get that back if we’re trying to get some URL that
the server can’t find. These are some interesting ones:
200 OK

301 Moved Permanently

302 Found

304 Not Modified

401 Unauthorized

403 Forbidden

404 Not Found

418 I'm a Teapot

500 Internal Server Error

...

HTML

Now that our computers can communicate, we can start thinking about creating the content that
websites are made of.
HTML, Hypertext Markup Language, is a standard with which webpages are written. It’s interpreted by
browsers from top to bottom, and each line might have some text, image, or styling instructions.
In our browser, we can click View > Developer > View Source on a website to see the HTML that drives
websites:
We can see that this is just text, and the first line, <!DOCTYPE html> , indicates to browsers that
the page is written in HTML.
Then, we see a pattern of lines and indentations, and many tags that start with < and end with
> . First, we have the <html> tag, and nested inside is a <head> tag, which will include
information about the webpage, that might not necessarily appear.
Then, we eventually see a <body> tag, which will have the content of the webpage.
We can look at a simple example:

<!DOCTYPE html>
<html lang="en">
<head>
<title>
hello, title
</title>
</head>
<body>
hello, body
</body>
</html>

Inside the <head> of the webpage, we have a <title> tag that indicates the title of our
webpage, “hello, title”. And then, we have a line with </title> , which is a closing tag that
indicates the end of the title.
Notice that the indentation and opening and closing tags are symmetric. Like in C, the whitespace
is not necessary, but stylistically important.
The content of this page is just “hello, body”.
With the text editor in CS50 IDE, we can create and save a file called index.html with our example
code. The CS50 IDE is web-based, and it can run a web server, which is a program that can listen for
and respond to web requests.
We can run a server in the terminal, called http-server , a free and open-source package. If we run
that command, we’ll see some information:
./ is the current directory, and in this case we are in our ~/workspace/ folder.

Then, we see a URL to our IDE’s web server, and since we want to serve these files separately from
the IDE itself, the URL ends in :8080 , indicating that we’re using port number 8080.
If we click that link, we’ll see a page that says Index of / with the files in our workspace. We
can click on index.html and see our page. We can also change the code in our editor, save, and
refresh to see our changes. Since HTML is interpreted by our browser, we don’t need to compile it.
Let’s take a look at examples of other tags:

<img src="cat.jpg">

Images can be included with the <img> tag, and src is an attribute on the tag that modifies it.
In this case, it will specify the source of the image, and the value can be a file or other URL. (In
the CS50 IDE, we should upload a file called cat.jpg in our workspace folder for this to work.)
Finally, we don’t close image tags (and other “empty tags”), since there’s nothing else inside the
element.
We can also add another attribute, alt , to add alternative text for the image. So our image
will look like this: <img alt="photo of cat" src="cat.jpg">
We can add links with something like Visit <a
href="https://www.harvard.edu/">Harvard</a>. in our body. The Visit and Harvard pieces
are just text, but the <a> tag surrounding Harvard is an anchor tag, which specifies a link with the
href attribute. In fact, we can phish, or trick, people, into clicking a link to a site that isn’t really what
they expect. A bad actor could even copy the HTML of some site, and create a site of their own that
appears to be the same. (Though, they won’t have access to the code and data stored on the server.)
We can wrap text with the <strong> tag to tell browsers to make it bolder.
There’s also the <p> tag for paragraphs:

<!DOCTYPE html>

<html lang="en">
<head>
<title>paragraphs</title>
</head>
<body>
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam in tin
</p>
<p>
Ut tempus rutrum arcu eget condimentum. Morbi elit ipsum, gravida fauc
</p>
<p>
Mauris eget erat arcu. Maecenas ac ante vel ipsum bibendum varius. Nun
</p>
</body>
</html>

Without the <p> tags, all of these lines would be displayed together on the page, since HTML
ignores whitespace like new lines, and instead combines them to at most one space.
We look at a few more tags from HTML like headings ( <h1> through <h6> indicating the level of
heading) and tables ( <table> , <tr> for rows, <td> for cells), but through practice and
documentation, we can learn to use them fully. Once we understand the pattern of tags and attributes,
we can write our own HTML.
We can use tools like the W3C Markup Validator to check that our HTML is valid.

Forms

On Google, if we search for something, we get redirected to a long URL. It turns out that the URL has
our search term in it, and going to a link like https://www.google.com/search?q=cats will bring
us directly to the results page for a search for “cats”.
The page is called search , and that goes to code on their servers that generates a response for
that page dynamically and programmatically.
The ? in the URL adds additional input for the page, and q=cats is telling the server that we
are passing in “cats” for the input (search box in this case) with the name “q”, which probably
stands for “query”.
We can write the HTML for a form that takes us to the Google search results for some user input:

<!DOCTYPE html>

<html lang="en">
<head>
<title>search</title>
</head>
<body>
<form action="https://www.google.com/search" method="get">
<input name="q" type="text">
<input type="submit" value="Search">
</form>
</body>
</html>

With the form tag, we can create a form. The action attribute tells the browser where the
form’s inputs should be sent, and the method attribute indicates how to send them.
The first input tag is a text box, which we will name q so that it can be sent to Google
correctly, and the second input tag is a submit button that we’ll label “Search”.

CSS
While HTML is used for layout and structure, CSS, Cascading Style Sheets, is another language we can
use to style, or change the aesthetics, of our webpages.
Let’s take a look at css0.html :

<!DOCTYPE html>

<html lang="en">
<head>
<title>css0</title>
</head>
<body>
<header style="font-size: large; text-align: center;">
John Harvard
</header>
<main style="font-size: medium; text-align: center;">
Welcome to my home page!
</main>
<footer style="font-size: small; text-align: center;">
Copyright &#169; John Harvard
</footer>
</body>
</html>

Here, for each of these tags, we’ve added a style attribute and some set of key-value pairs as
the value that will apply to just those elements. These pairs, like font-size: large; , are
setting CSS properties and can change many aesthetic aspects of elements.
Notice that we have semantic, or meaningful, tags like <header> , <main> , and <footer> that
separate our page into sections.
Since CSS is inherited by nested elements in HTML, we can factor out the common styles:

<!DOCTYPE html>

<html lang="en">
<head>
<title>css1</title>
</head>
<body style="text-align: center;">
<header style="font-size: large;">
John Harvard
</header>
<main style="font-size: medium;">
Welcome to my home page!
</main>
<footer style="font-size: small;">
Copyright &#169; John Harvard
</footer>
</body>
</html>

Here, the text-align: center; style is applied to the <body> element, so it will cascade, or
be inherited by each element inside <body> .
We can factor out CSS into the <head> , with CSS classes:

<!DOCTYPE html>

<html lang="en">
<head>
<style>

.centered
{
text-align: center;
}

.large
{
font-size: large;
}

.medium
{
font-size: medium;
}

.small
{
font-size: small;
}

</style>
<title>css2</title>
</head>
<body class="centered">
<header class="large">
John Harvard
</header>
<main class="medium">
Welcome to my home page!
</main>
<footer class="small">
Copyright &#169; John Harvard
</footer>
</body>
</html>

Now, the HTML in the <body> specifies a class for each element, but all the CSS for the
styling has been moved to the <head> , so we can compartmentalize it more easily. And in CSS,
we use .something to apply properties to elements with a class of something . Each class, too,
can have many CSS properties, not just one.
We could even apply CSS to all elements of a certain type, using CSS selectors:

<!DOCTYPE html>

<html lang="en">
<head>
<style>

body
{
text-align: center;
}

header
{
font-size: large;
}

main
{
font-size: medium;
}

footer
{
font-size: small;
}

</style>
<title>css3</title>
</head>
<body>
<header>
John Harvard
</header>
<main>
Welcome to my home page!
</main>
<footer>
Copyright &#169; John Harvard
</footer>
</body>
</html>

Notice that now we can use body and header to select those elements, without attaching a
class to them in the HTML.
Finally, we can include external stylesheets, or CSS in separate files, that multiple HTML pages can
include and share:

<!DOCTYPE html>

<html lang="en">
<head>
<link href="css4.css" rel="stylesheet">
<title>css4</title>
</head>
<body>
<header>
John Harvard
</header>
<main>
Welcome to my home page!
</main>
<footer>
Copyright &#169; John Harvard
</footer>
</body>
</html>

We need to create a file called css4.css , and place our CSS code inside that, for this to work.
But now we can use the <link> tag to include it.
There are tradeoffs, too, to having separated CSS files, since a simple webpage may not need the
additional complexity and overhead of a linked stylesheet. But having separation of concerns
allows for easier collaboration and clearer organization of code.
Phew, we covered lots of concepts here! But, now that we’re familiar with some of these patterns, we
can learn to use additional features by reading examples and documentation online.

JavaScript

JavaScript, a programming language, can be used on our webpages to make them more dynamic. The
user’s browser runs the JavaScript code we write, to make changes to the page.
JavaScript is similar to C, and is interpreted by a browser from top to bottom.
Many of the programming elements are the same:

let counter = 0;

We use the let keyword in JavaScript to initialize a variable, and we don’t need to specify what
the type of the variable will be.
Adding 1 to a variable has the exact same syntax as it does in C.

counter = counter + 1;
counter += 1;
counter++;

Conditions and loops, too, are the same.

if (x < y)
{

}
else if (x > y)
{

}
else
{

}

while (true)
{

}

for (let i = 0; i < 50; i++)
{

}

Our example webpage can be represented by a tree, in what’s called the DOM, Document Object
Model:

Notice that each node is an element on the page, and nested nodes show as child nodes. A
browser, when it loads a webpage, automatically builds a tree in memory with elements from the
HTML.
With JavaScript, we can add or change any of these nodes in the DOM.
We can make an interactive page like the following:

<!DOCTYPE html>

<html lang="en">
<head>
<script>

function greet()
{
alert('hello, ' + document.querySelector('#name').value);
}

</script>
<title>hello1</title>
</head>
<body>
<form onsubmit="greet(); return false;">
<input autocomplete="off" autofocus id="name" placeholder="Name" type=
<input type="submit">
</form>
</body>
</html>

We have a form element in the <body> with a text input and a submit button. But when the form
is submitted, we want our browser to call a greet() function, and with return false; , we
tell the browser to do nothing else with the form. So we put that into the onsubmit attribute of
the form. Notice that we also have id="name" for the text input element. The
autocomplete="off" attribute turns off the autocomplete in the browser, and autofocus
selects the input box when the page is loaded so the user can start typing into it right away.
The greet() function is defined in the <head> of our page, inside a <script> tag that allows
us to write our own JavaScript. In JavaScript, we can define a function with the function
keyword, and if it takes no inputs, we can simply use () . And this function in turn calls the
alert() function, which is built into browsers, to create an alert box.

The content of the alert box will be hello, plus the value of the element in the webpage
(called document ) with the ID name . The querySelector function is attached to the object
that represents the webpage, so we call it with document.querySelector() . Then, the element
that gets selected also has a property called value that we can access with .value .
We look at another example, that can change the style of a webpage:

<!DOCTYPE html>

<html lang="en">
<head>
<title>background</title>
</head>
<body>
<button id="red">R</button>
<button id="green">G</button>
<button id="blue">B</button>
<script>

let body = document.querySelector('body');


document.querySelector('#red').onclick = function() {
body.style.backgroundColor = 'red';
};
document.querySelector('#green').onclick = function() {
body.style.backgroundColor = 'green';
};
document.querySelector('#blue').onclick = function() {
body.style.backgroundColor = 'blue';
};

</script>
</body>
</html>

It turns out that we can attach JavaScript functions to events in the browser, like the following:
blur

change

click

drag

focus

keypress

load

mousedown

mouseover

mouseup

submit

touchmove

unload

...

We can add code called event listeners to elements like document.querySelector('#red') . The
onclick value of each element can be a function that is automatically called by the browser, when
the element is clicked. And the function attached doesn’t have a name, but is defined with
function() {} .

With body.style.backgroundColor , we can access the style of the body , and set its
backgroundColor value.

We can change the font size, too:

<!DOCTYPE html>

<html lang="en">
<head>
<title>size</title>
</head>
<body>
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam in tin
</p>
<select>
<option value="xx-large">xx-large</option>
<option value="x-large">x-large</option>
<option value="large">large</option>
<option selected value="initial">initial</option>
<option value="small">small</option>
<option value="x-small">x-small</option>
<option value="xx-small">xx-small</option>
</select>
<script>

document.querySelector('select').onchange = function() {
document.querySelector('body').style.fontSize = this.value;
};

</script>
</body>
</html>

We have a set of option elements in a select (a dropdown menu in HTML that we can look
up the documentation for) and now, whenever the select element is changed, we set the
fontSize of the style of the body element. We set that value to this.value , and this
refers to the select element when the function is called, since the function is called from that
element.
We can write a page with an element that blinks, or appears and disappears repeatedly:

<!DOCTYPE html>

<html lang="en">
<head>
<script>

// Toggles visibility of greeting


function blink()
{
let body = document.querySelector('body');
if (body.style.visibility == 'hidden')
{
body.style.visibility = 'visible';
}
else
{
body.style.visibility = 'hidden';
}
}

// Blink every 500ms


window.setInterval(blink, 500);

</script>
<title>blink</title>
</head>
<body>
hello, world
</body>
</html>

We use the visibility style property to make the body visible or hidden, and
window.setInterval to call the blink function every 500 milliseconds.

Browsers also have a geolocation function, which we can call to get the user’s current location:

<!DOCTYPE html>

<html lang="en">
<head>
<title>geolocation</title>
</head>
<body>
<script>

navigator.geolocation.getCurrentPosition(function(position) {
document.write(position.coords.latitude + ", " + position.coords.l
});

</script>
</body>
</html>

navigator refers to the user’s browser, and the geolocation.getCurrentPosition function
takes a function that the browser will call with a position object once the location is known. In
that function, we write the latitude and longitude values to the document.
WEEK 6
Menu
PYTHON

Lecture 6
Last Time
Python
Data types in Python
Programming in Python

Last Time

We learned some basics about the internet, and technologies like:


TCP/IP, protocols by which computers can send each other messages across a network of many
computers, using IP addresses and port numbers.
HTTP, a protocol by which browsers, and other programs, can make a request for a webpage (or
other content) from a server.
URLs, including a domain name and parameters like ?q=cats , to pass along additional inputs to
a server.
HTTP status codes, like 404 Not Found, which shows us an error page, and 301 Moved
Permanently, which redirects us to the right URL if a website has moved.
HTML and CSS, languages by which we can format and stylize webpages.
JavaScript and the DOM, Document Object Model, by which we can change nodes in a tree
representation of an HTML page, thereby changing the page itself.

Python

Python is another programming language, but it is interpreted (run top to bottom by an interpreter, like
JavaScript) and higher-level (including features and libraries that are more powerful).
For example, we can implement the entire resize program in just a few lines with Python:

import sys
from PIL import Image

if len(sys.argv) != 4:
    sys.exit("Usage: python resize.py n infile outfile")

n = int(sys.argv[1])
infile = sys.argv[2]
outfile = sys.argv[3]

inimage = Image.open(infile)
width, height = inimage.size
outimage = inimage.resize((width * n, height * n))

outimage.save(outfile)

First, we import (like include ) a sys library (for command-line arguments) and an Image
library.
We check that there are the right number of command-line arguments with len(sys.argv) , and
then create some variables n , infile , and outfile , without having to specify their types.
Then, we use the Image library to open the input image, getting its width and height, resizing it
with a resize function, and finally saving it to an output file.
Let’s take a look at some new syntax. In Python, we can create variables with just counter = 0 . To
increment a variable, we can use counter = counter + 1 or counter += 1 .
Conditions look like:

if x < y:
    something
elif x > y:
    something
else:
    something

Unlike in C and JavaScript (whereby braces { } are used for blocks of code), the exact
indentation of each line is what determines the level of nesting in Python.
Boolean expressions are slightly different, too:

while True:
    something

Loops can be created with another function, range , that, in the example below, returns a range of
numbers from 0, up to but not including 50:

for i in range(50):
    something
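
As a small sketch of how range behaves, we can turn a few ranges into lists and look at them. The start and step arguments shown here are not used in the lecture, but they are part of Python’s built-in range:

```python
# range(stop) counts from 0 up to, but not including, stop
first_five = list(range(5))

# range(start, stop) and range(start, stop, step) are also available
from_two = list(range(2, 5))
evens = list(range(0, 10, 2))

print(first_five)  # [0, 1, 2, 3, 4]
print(from_two)    # [2, 3, 4]
print(evens)       # [0, 2, 4, 6, 8]
```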

In Python, we’ll start by looking at just a few data types:


bool , True or False
float , real numbers
int , integers
str , strings
dict , a dictionary of key-value pairs, that acts like a hash table
list , like arrays, but can automatically resize
range , range of values
set , a collection of unique things
tuple , an ordered, unchangeable group of things

In Python, we can also include the CS50 library, but our syntax will be:

from cs50 import get_float, get_int, get_string

Notice that we specify the functions we want to use.


In Python, we can run our program without compiling it with python hello.py (or whatever the
name of our file is).
python is the name of the program that we’re actually running at the command line, and it is an
interpreter which can read our source code (written in the language Python) and run it, one line at
a time. (Technically, there is a compiler that turns our source code into something called bytecode
that the interpreter actually runs, but that is abstracted away for us.)

Data types in Python

Our first hello.py program is just:

print("hello, world")

Notice that we didn’t need a main function, or anything that we needed to import for the
print function. The print function in Python also adds a new line for us automatically.

Now we can run it with python hello.py .


We can get strings from a user:

from cs50 import get_string

s = get_string("Name: ")
print("hello,", s)

We create a variable called s , without specifying the type, and we can pass in multiple variables
into the print function, which will print them for us on the same line, separated by a space
automatically.
To avoid the extra spaces, we can put variables inside a string similar to how they are included in
C: print(f"hello, {s}") . Here, we’re saying that the string hello, {s} is a formatted
string, with the f in front of the string, and so the variable s will be substituted in the string.
And we don’t need to worry about the variable type; we can just include them inside strings.
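
A short sketch of formatted strings follows, using made-up values for name , count , and ratio . It only shows that values of different types can be substituted the same way, and that a format specifier after a colon (a standard f-string feature) controls how a value is shown:

```python
# Example values, chosen only for illustration
name = "David"
count = 3
ratio = 1 / 3

# Any expression can go inside the braces, whatever its type
greeting = f"hello, {name}"
summary = f"{name} coughed {count} times"

# A format specifier after a colon controls how a value is shown
rounded = f"{ratio:.2f}"

print(greeting)
print(summary)
print(rounded)
```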
We can do some math, too:

from cs50 import get_int

x = get_int("x: ")

y = get_int("y: ")

print(f"x + y = {x + y}")
print(f"x - y = {x - y}")
print(f"x * y = {x * y}")
print(f"x / y = {x / y}")
print(f"x mod y = {x % y}")

Notice that expressions like {x + y} will be evaluated, or calculated, before they’re substituted into
the string to be printed.
By running this program, we see that everything works as we might expect, even dividing two
integers to get a floating-point value. (To keep the old behavior of always returning an integer
with division, there is the // operator, which rounds down to the nearest integer.)
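
To make the difference between the two division operators concrete, here is a minimal sketch with values of our own choosing. It also shows one case worth knowing about: with a negative operand, // rounds down rather than toward zero, so it does not always match C’s integer division.

```python
x = 7
y = 2

true_quotient = x / y     # 3.5, always a float
floor_quotient = x // y   # 3, rounded down to an integer
remainder = x % y         # 1

# With a negative operand, // rounds toward negative infinity,
# whereas int() truncates toward zero as C's integer division does
floor_negative = -7 // 2          # -4
truncated_negative = int(-7 / 2)  # -3

print(true_quotient, floor_quotient, remainder)
print(floor_negative, truncated_negative)
```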
We can experiment with floating-point values:

from cs50 import get_float

x = get_float("x: ")

y = get_float("y: ")

z = x / y

print(f"x / y = {z}")

We see the following when we run this program:

$ python floats.py
x: 1
y: 10
x / y = 0.1

We can print more decimal places with syntax like print(f"x / y = {z:.50f}") :

x / y = 0.10000000000000000555111512312578270211815834045410

It turns out that Python still has floating-point imprecision by default, but there are some
libraries that will use more memory to store decimal values more precisely.
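
The lecture doesn’t name a particular library, but one option we can assume here is the decimal module from Python’s standard library. This sketch compares ordinary floats with Decimal values built from strings:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly
float_sum = 0.1 + 0.2
float_matches = (float_sum == 0.3)

# Decimal values built from strings keep their exact decimal digits
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_matches = (decimal_sum == Decimal("0.3"))

print(float_sum, float_matches)
print(decimal_sum, decimal_matches)
```

Notice that we build each Decimal from a string; building one from a float would carry the float’s imprecision along with it.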
We can see if Python has integer overflow:

from time import sleep


i = 1
while True:
    print(i)
    i *= 2
    sleep(1)

We use the sleep function to pause our program for one second, but double i over and over.
And it turns out that integers in Python can be as big as memory allows, so we won’t experience
overflow for a much longer time.
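
We can check this without waiting, with a small sketch. The comparison to a 32-bit int assumes a typical C environment, where the largest int is 2**31 - 1:

```python
# On typical systems, a 32-bit int in C cannot hold anything above 2**31 - 1
c_int_max = 2**31 - 1

# A Python int simply grows to hold the result
bigger = c_int_max + 1
much_bigger = 2**100

print(bigger)
print(much_bigger)

# bit_length reports how many bits the value needs
print(much_bigger.bit_length())
```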

Programming in Python

Let’s take a closer look at conditions:

from cs50 import get_int

# Get x from user
x = get_int("x: ")

# Get y from user
y = get_int("y: ")

# Compare x and y
if x < y:
    print("x is less than y")
elif x > y:
    print("x is greater than y")
else:
    print("x is equal to y")

Notice that we use consistent indentation, but we don’t need parentheses or braces for our
conditions.
Comments, too, start with just a single # character.
We can compare strings the way we might expect:

from cs50 import get_string

# Prompt user for answer
c = get_string("Answer: ")

# Check answer
if c == "Y" or c == "y":
    print("yes")
elif c == "N" or c == "n":
    print("no")

Strings can be compared directly, and Boolean expressions can include the words and and or .
We can write functions in Python like this:

def main():
    for i in range(3):
        cough()

def cough():
    """Cough once"""
    print("cough")

if __name__ == "__main__":
    main()

We use the def keyword to define a function cough , indicating that it takes no parameters, or
inputs, by using just () , and call it from our main function. Notice that all the code for each
function is indented additionally, instead of surrounded by braces.
Then, at the bottom, we use a special line if __name__ == "__main__": to call our main
function when our program is run. This way, the interpreter will know about the cough function
by the time main actually calls it. We could also call cough directly, instead of main , though
that would be unconventional in Python. (Instead, we want to try to be “Pythonic”, or following the
styles and patterns encouraged by the language and its community.)
We can add parameters and loops to our cough function, too:

def main():
    cough(3)

def cough(n):
    for i in range(n):
        print("cough")

if __name__ == "__main__":
    main()

n is a variable that can be passed into cough , which we can also pass into range . And notice
that we don’t specify types in Python, so n can be of any data type (and can even be assigned to
have a value of another type). It’s up to us, the programmer, to use this great power with great
responsibility.
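
As a sketch of what that responsibility means in practice, here is a variant of cough that returns its coughs in a list instead of printing them (a change we make only so the result is easy to inspect). Passing a string instead of an int is allowed by the language, and the mistake only shows up when the code runs:

```python
def cough(n):
    """Return a list of n coughs (returned rather than printed, for easy checking)"""
    coughs = []
    for i in range(n):
        coughs.append("cough")
    return coughs

three = cough(3)

# Nothing stops us from passing a string, but range() rejects it at run time
try:
    cough("3")
    error = None
except TypeError as e:
    error = e

print(three)
print(type(error).__name__)
```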
We can define a function to get a positive integer:

from cs50 import get_int

def main():
    i = get_positive_int("Positive integer: ")
    print(i)

def get_positive_int(prompt):
    while True:
        n = get_int(prompt)
        if n > 0:
            break
    return n

if __name__ == "__main__":
    main()

Since there is no do-while loop in Python as there is in C, we have a while loop that will go on
infinitely, but we use break to end the loop if n > 0 . Then, our function will just return n .
Notice that variables in Python have function scope by default, meaning that n can be initialized
within a loop, but still be accessible later in the function.
We can print each character in a string and capitalize them:

from cs50 import get_string

s = get_string("Input: ")
for c in s:
    print(c.upper(), end="")
print()

Notice that we can easily iterate over characters in a string with something like for c in s ,
and we print the uppercase version of each character with c.upper() . Strings in Python are
objects, like a data structure with both the value it stores, as well as built-in functions like
.upper() that we can call.

Finally, we pass in another argument to the print function, end="" , to prevent a new line from
being printed each time. Python has named arguments, where we can name arguments that we
can pass in, in addition to positional arguments, based on the position they are in the list. With
named arguments, we can pass in arguments in different orders, and omit optional arguments
entirely. Notice that this example is labeled with end , indicating the string that we want to end
each printed line with. By passing in an empty string, "" , nothing will be printed after each
character. Before, when we called print without the end argument, the function used \n as
the default for end , which is how we got new lines automatically.
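
Here is a minimal sketch of named arguments with print . Besides end , it uses two other named arguments that print accepts but the lecture doesn’t cover: sep , the string placed between arguments, and file , where the output goes. We write into an in-memory buffer only so that we can inspect exactly what was printed:

```python
import io

buffer = io.StringIO()

# sep is placed between arguments, end is written after the last one
print("a", "b", "c", sep="-", end="!\n", file=buffer)

# Named arguments can be given in any order
print("x", "y", end="", sep="", file=buffer)

output = buffer.getvalue()
print(repr(output))
```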
We can get the length of the string with the len() function.

from cs50 import get_string

s = get_string("Name: ")
print(len(s))

We’ll be using version 3 of Python, which the world is starting to use more and more, so when
searching for documentation, we want to be sure that it’s for the right version.
We can take command-line arguments with:

from sys import argv

if len(argv) == 2:
    print(f"hello, {argv[1]}")
else:
    print("hello, world")

We check the number of arguments by looking at the length of argv , a list of arguments, and if
there are 2, we print the second one. Like in C, the first command-line argument is the name of the
program we wrote, rather than the word python , which is technically the name of the program
we run at the command-line.
We can print each argument in the list:

from sys import argv

for s in argv:
    print(s)

This will iterate over each element in the list argv , allowing us to use it as s .
And we can iterate over each character, of each argument:

from sys import argv

for s in argv:
    for c in s:
        print(c)
    print()

We can swap two variables in Python just by reversing their orders:

x = 1
y = 2

print(f"x is {x}, y is {y}")


x, y = y, x
print(f"x is {x}, y is {y}")

Here, we’re using x, y = y, x to set x to y at the same time as setting y to x .
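
One way to picture what happens, as a sketch rather than a description of the interpreter’s internals: the values on the right are gathered into a tuple first, and then unpacked into the names on the left. The variable pair is our own addition to make that step visible:

```python
x = 1
y = 2

# The right-hand side is packed into a tuple first, then unpacked
pair = (y, x)
x, y = pair

# The same unpacking works for more than two names
a, b, c = 1, 2, 3
a, b, c = c, a, b

print(x, y)
print(a, b, c)
```

This is the same unpacking we saw in width, height = inimage.size in resize.py .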


We can create a list and add to it:

from cs50 import get_int

numbers = []

# Prompt for numbers (until EOF)
while True:

    # Prompt for number
    number = get_int("number: ")

    # Check for EOF (get_int returns None; a plain "not number" would also stop on 0)
    if number is None:
        break

    # Check whether number is already in list
    if number not in numbers:

        # Add number to list
        numbers.append(number)

# Print numbers
print()
for number in numbers:
    print(number)

Here, we create an empty list called numbers with numbers = [] , and we get a number from
the user. If that number is not already in our list, we add it to our list. We can use not in to
check if a value is (not) in a list, and append to add a value to the end of a list.
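
The same pattern can be sketched without user input. The helper name unique is our own, and the input list is made up for illustration:

```python
def unique(values):
    """Return the values in their original order, without repeats"""
    seen = []
    for value in values:
        # Only add a value the first time we see it
        if value not in seen:
            seen.append(value)
    return seen

numbers = unique([3, 1, 3, 2, 1])

print(numbers)
print(2 in numbers, 5 in numbers)
```

Checking in on a list looks at the elements one at a time, which is fine for short lists like this one.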
We can create our own data structures, objects:

from cs50 import get_string

# Space for students
students = []

# Prompt for students' names and dorms
for i in range(3):
    name = get_string("name: ")
    dorm = get_string("dorm: ")
    students.append({"name": name, "dorm": dorm})

# Print students' names and dorms
for student in students:
    print(f"{student['name']} is in {student['dorm']}.")

We create a list called students , and after we get some input from the user, we append a
dictionary of key-value pairs, {"name": name, "dorm": dorm} , to that list. Here, "name" and
"dorm" are the keys, and we want their values to be the variables we gathered as input. Then,
we can later access each object’s values with student['name'] or student['dorm'] to print
them out. In Python, we can index into dictionaries with words or strings, as opposed to just
numeric indexes in lists.
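
Here is a small sketch of the same structure with hard-coded data, where the student names are hypothetical. It also shows the dictionary method get , which the lecture doesn’t use here: indexing with a missing key raises a KeyError , while get returns a default value instead.

```python
# Hypothetical students, hard-coded instead of prompted for
students = [
    {"name": "Alice", "dorm": "Canaday"},
    {"name": "Bob", "dorm": "Weld"},
]

# Index into each dictionary with a key
lines = [f"{student['name']} is in {student['dorm']}." for student in students]

# With a missing key, [] raises KeyError, while .get returns a default instead
missing = students[0].get("house", "unknown")

print(lines)
print(missing)
```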
Let’s print four question marks, one at a time:

for i in range(4):
    print("?", end="")
print()

We can print a vertical bar of hash marks, too:

for i in range(3):
    print("#")

And we can print a square with a nested loop:

for i in range(3):
    for j in range(3):
        print("#", end="")
    print()

Now we can revisit resize.py , and it might make more sense to us now:

from PIL import Image
from sys import argv, exit

if len(argv) != 4:
    exit("Usage: python resize.py n infile outfile")

n = int(argv[1])
infile = argv[2]
outfile = argv[3]

inimage = Image.open(infile)
width, height = inimage.size
outimage = inimage.resize((width * n, height * n))

outimage.save(outfile)

We import the Image library from something called PIL, a free open-source library that we can
download and install (which doesn’t come with Python by default).
Then, we import argv from the system library, and we check our arguments, storing them as n ,
infile , and outfile , converting the string input for n into an int as we do so.

By reading the documentation for Python and the Image library, we can open files as an image,
getting its size and calling a resize function on it to get another image, which we can then
save to another file.

Let’s look at another example, a spell-checker in Python:

# Words in dictionary
words = set()

def check(word):
    """Return true if word is in dictionary else false"""
    return word.lower() in words

def load(dictionary):
    """Load dictionary into memory, returning true if successful else false"""
    file = open(dictionary, "r")
    for line in file:
        words.add(line.rstrip("\n"))
    file.close()
    return True

def size():
    """Returns number of words in dictionary if loaded else 0 if not yet loaded"""
    return len(words)

def unload():
    """Unloads dictionary from memory, returning true if successful else false"""
    return True

The functions for dictionary.py are pretty straightforward, since all we need is a set() , a
collection into which we can load unique values. In load , we open the dictionary file, and
add each line in the file as a word (without the newline character).
For check , we can just return whether word is in words , and for size , we can just return the
length of words . Finally, we don’t need to do anything to unload , since Python manages
memory for us.
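
To see the set doing its work, here is a simplified sketch of load and check . As an assumption made only so the example can run on its own, load takes a list of lines instead of the name of a dictionary file:

```python
words = set()

def load(lines):
    """Add each line to the set of words, without its trailing newline"""
    for line in lines:
        words.add(line.rstrip("\n"))
    return True

def check(word):
    """Return True if word is in the set, ignoring case"""
    return word.lower() in words

# A list of lines stands in for a dictionary file here
load(["cat\n", "dog\n", "cat\n"])

# The repeated word is stored only once
print(len(words))
print(check("Cat"), check("bird"))
```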
By having used C first, we have an understanding (and appreciation!) for the abstractions that a higher-
level language like Python provides us. Indeed, if we run some tests for performance, a speller
implementation in Python might be 1.5x slower, and so depending on the application, this may or may
not be important enough to justify the human time it might take to write a program in a lower-level
language like C, which might run much faster or require less memory.
WEEK 7
Menu
WEB DEVELOPMENTS

Lecture 7
Last Time
Flask
Words

Last Time

Last time, we learned about Python, a programming language that comes with many features and
libraries. Today, we’ll use Python to generate HTML for webpages, and see how separation of concerns
might be applied.
A few weeks ago, we learned about web requests in HTTP, which might look like this:

GET / HTTP/1.1
Host: www.example.com
...

Hopefully, a server responds with something like:

HTTP/1.1 200 OK
Content-Type: text/html
...

The ... is the actual HTML of the page.

Flask

Today, we’ll use Flask, a microframework, or a set of code that allows us to build programs without
writing shared or repeated code over and over. (Bootstrap, for example, is a framework for CSS.)
Flask is written in Python and is a set of libraries of code that we can use to write a web server in
Python.
One methodology for organizing web server code is MVC, or Model-View-Controller:

Thus far, the programs we’ve written have all been in the Controller category, whereby we have
logic and algorithms that solve some problem and print output to the terminal. But with web
programming, we also want to add formatting and aesthetics (the View component), and also
access data in a more organized way (the Model component). When we start writing our web
server’s code in Python, most of the logic will be in the controllers.
By organizing our program this way, we can have separation of concerns.
Today, we’ll build a website where students can fill out a form to register for Frosh IMs, freshman year
intramural sports.
We can start by opening the CS50 IDE, and write some Python code that is a simple web server
program, serve.py :

from http.server import BaseHTTPRequestHandler, HTTPServer

class HTTPServer_RequestHandler(BaseHTTPRequestHandler):

    def do_GET(self):

        self.send_response(200)

        self.send_header("Content-type", "text/html")
        self.end_headers()

        self.wfile.write(b"<!DOCTYPE html>")
        self.wfile.write(b"<html lang='en'>")
        self.wfile.write(b"<head>")
        self.wfile.write(b"<title>hello, title</title>")
        self.wfile.write(b"</head>")
        self.wfile.write(b"<body>")
        self.wfile.write(b"hello, body")
        self.wfile.write(b"</body>")
        self.wfile.write(b"</html>")

port = 8080
server_address = ("0.0.0.0", port)
httpd = HTTPServer(server_address, HTTPServer_RequestHandler)

httpd.serve_forever()

We already know how to write a hello, world HTML page, but now we’re writing a program in
Python to actually generate and return an HTML page.
Most of this code is based on the http library that we can import that handles the HTTP layer,
but we have written our own do_GET function that will be called every time we receive a GET
request. As usual, we need to look at the documentation for the library to get a sense of what we
should write, and what we have available for us. First, we send a 200 status code, and send the
HTTP header indicating that this is an HTML page. Then, we write (as ASCII bytes) some HTML,
line by line, into the response.
Notice that we set the server to use port 8080 (since the IDE itself is using port 80), and actually
create and start the server (based on documentation we found online).
Now, if we run python serve.py , we can click CS50 IDE > Web Server, which will open our IDE’s
web server in another tab for us, and we’ll see the hello, world page we just wrote.
We can see that reimplementing many common functions of a web server can get tedious, even with
an HTTP library, so a framework like Flask helps a lot in providing abstractions and shortcuts that we
can reuse.
With Flask, we can write the following in an application.py file:

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    return "hello, world"

With app = Flask(__name__) , we initialize a Flask application for our application.py file.
Then, we use the @app.route("/") syntax to indicate that the function below will respond to
any requests for / , or the root page of our site. We call that function index by convention, and
it will just return “hello, world” as the response, without any HTML.
Now, we can call flask run from the terminal in the same folder as our application.py , and
the resulting URL will show a page that reads “hello, world” (which our browser displays even
without HTML).
We can change the index function to return a template, or a file that has HTML that we’ve written,
that acts as the View.

return render_template("index.html")

In a templates folder, we’ll have an index.html file with the following:


<!DOCTYPE html>

<html lang="en">
<head>
<meta name="viewport" content="initial-scale=1, width=device-width">
<title>hello</title>
</head>
<body>
hello, {{ name }}
</body>
</html>

We see a new feature, {{ name }} , like a placeholder. So we’ll go back and change the logic of index , our
controller, to check for parameters in the URL and pass them to the view:

return render_template("index.html", name=request.args.get("name", "world"))

We use request.args.get to get a parameter from the request’s URL called name . (The
second argument, world , will be the default value that’s returned if one wasn’t set.) Now,
we can visit /?name=David to see “hello, David” on the page. Now, we can generate an
infinite number of webpages, even though we’ve only written a few lines of code.
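
Flask does the work of parsing the URL for us, but we can sketch roughly what getting a parameter with a default involves, using only Python’s standard library. The helper get_arg is hypothetical and is not part of Flask; it only illustrates the idea behind request.args.get :

```python
from urllib.parse import urlsplit, parse_qs

def get_arg(url, key, default=None):
    """Return the first value for key in the URL's query string, else default"""
    # Everything after the ? in the URL is the query string
    query = urlsplit(url).query

    # parse_qs maps each parameter name to a list of its values
    values = parse_qs(query).get(key)
    return values[0] if values else default

print(get_arg("/?name=David", "name", "world"))
print(get_arg("/", "name", "world"))
```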
In froshims0 , we can write an application.py that can receive and respond to a POST request
from a form:

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/register", methods=["POST"])
def register():
    if not request.form.get("name") or not request.form.get("dorm"):
        return render_template("failure.html")
    return render_template("success.html")

For the default page, we’ll return an index.html that contains a form:

{% extends "layout.html" %}

{% block body %}
<h1>Register for Frosh IMs</h1>
<form action="/register" method="post">
<input autocomplete="off" autofocus name="name" placeholder="Name" type
<select name="dorm">
<option disabled selected value="">Dorm</option>
<option value="Apley Court">Apley Court</option>
<option value="Canaday">Canaday</option>
<option value="Grays">Grays</option>
<option value="Greenough">Greenough</option>
<option value="Hollis">Hollis</option>
<option value="Holworthy">Holworthy</option>
<option value="Hurlbut">Hurlbut</option>
<option value="Lionel">Lionel</option>
<option value="Matthews">Matthews</option>
<option value="Mower">Mower</option>
<option value="Pennypacker">Pennypacker</option>
<option value="Stoughton">Stoughton</option>
<option value="Straus">Straus</option>
<option value="Thayer">Thayer</option>
<option value="Weld">Weld</option>
<option value="Wigglesworth">Wigglesworth</option>
</select>
<input type="submit" value="Register">
</form>
{% endblock %}

We have an HTML form, with an input tag for a student to type in their name, and a
select tag to create a dropdown list for them to select a dorm. Our form will be submitted
to a route we call /register , and we’ll use the POST method to send the form’s
information.
Notice that our template is now using a new feature, extends , so that the blocks it defines will be
substituted into another file, layout.html :

<!DOCTYPE html>

<html lang="en">
<head>
<meta name="viewport" content="initial-scale=1, width=device-width"
<title>froshims0</title>
</head>
<body>
{% block body %}{% endblock %}
</body>
</html>

Now, if we have other pages on our site, they can easily share the common markup we
would want on every page. The {% block body %}{% endblock %} syntax is a
placeholder block in Flask, where other pages, like index.html , can provide HTML that
will be substituted into that block.
In our register function, we’ll indicate that we’re listening for a POST request, and inside
the function, just make sure that we got a value for both name and dorm . request.form
is an abstraction provided by Flask, such that we can access the arguments, or parameters,
from the request’s POST data.
When we run our application with flask run , and visit the URL, sometimes we might see an Internal
Server Error. And if we come back to our terminal, where our Flask server is running, we’ll see an error
message that provides us clues to what went wrong. We can press Control+C to stop our web server,
make changes that will hopefully fix our error, and start our web server again. And even if nothing is
broken but we made a change, sometimes we need to quit Flask and start it again, for it to notice
those changes.
We also need a success.html and failure.html in our templates directory, which might look
like:

{% extends "layout.html" %}

{% block body %}
You are registered! (Well, not really.)
{% endblock %}

Our register function will return that, with the template fully rendered, if we provided both a
name and dorm in the form.
With layout.html , we didn’t need to copy and paste the same <head> and other shared
markup, making it easier for us to make changes across all the pages we have at once.
The failure page, too, will share the same layout but send a different message:

{% extends "layout.html" %}

{% block body %}
You must provide your name and dorm!
{% endblock %}
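
The check that register performs can be sketched without Flask. In this minimal sketch, choose_template is a hypothetical helper (it is not part of the lecture's code), and a plain dictionary stands in for request.form ; it returns the name of the template the route would render:

```python
def choose_template(form):
    """Return the template to render for a submitted form (hypothetical helper)."""
    name = form.get("name")
    dorm = form.get("dorm")
    # A missing key gives None and an empty field gives "", and both are falsy
    if not name or not dorm:
        return "failure.html"
    return "success.html"


print(choose_template({"name": "Brian", "dorm": "Pennypacker"}))
print(choose_template({"name": "", "dorm": "Weld"}))
```

Because .get returns None for a missing key instead of raising an error, the same check covers both a field that was left blank and a field that was never sent.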

The {% %} syntax is actually called Jinja, a templating language that Flask is able to understand
and put together.
And all of this Python code lives on our server in the CS50 IDE, generating a completed HTML page
each time and sending it to the browser as a response. We can see that by right-clicking the page in
Chrome, clicking View Source, and seeing the full HTML that users will get.
Now let’s actually do something with the submitted form information. In
froshims1/application.py , we’ll create a list to store all the registered students:

from flask import Flask, redirect, render_template, request

# Configure app
app = Flask(__name__)

# Registered students
students = []

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/registrants")
def registrants():
    return render_template("registered.html", students=students)

@app.route("/register", methods=["POST"])
def register():
    name = request.form.get("name")
    dorm = request.form.get("dorm")
    if not name or not dorm:
        return render_template("failure.html")
    students.append(f"{name} from {dorm}")
    return redirect("/registrants")

We create an empty list, students = [] , and when we get a name and dorm in register ,
we’ll use students.append(f"{name} from {dorm}") to add a formatted string with that
name and dorm, to the students list.
In the registrants function, we’ll pass in our students list to the template of
registered.html :

{% extends "layout.html" %}

{% block body %}
<ul>
{% for student in students %}
<li>{{ student }}</li>
{% endfor %}
</ul>
{% endblock %}

Notice that, with Jinja, we can have simple concepts like a for loop to generate HTML
based on variables passed into the template. (We need an endfor since, in HTML,
indentation is only needed for stylistic purposes, so we need to specify when a loop ends.)
Here, we’re creating an <li> for each student , or string, in the students variable that
was passed in by the controller, application.py . And notice that the markup, or formatting
of the list, is in this template, or view.
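
To see what the template's loop produces, we can write roughly the same thing in plain Python. This is only a sketch of the idea, not how Jinja is implemented; render_students is a hypothetical name, and the escaping step is our own addition to keep submitted text from being read as markup:

```python
from html import escape


def render_students(students):
    """Build the same <ul> markup that the template's for loop generates."""
    items = []
    for student in students:
        # Escape each value so characters like < or & can't be read as markup
        items.append(f"<li>{escape(student)}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_students(["Brian from Pennypacker", "Emma from Weld"]))
```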
If we stop our server, and restart it, we’ll have lost all of the data we’ve collected, since the students
variable is only created and stored as long as our program is running.
In froshims2/application.py , we use a new library:

import os
import smtplib
from flask import Flask, render_template, request

# Configure app
app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/register", methods=["POST"])
def register():
    name = request.form.get("name")
    email = request.form.get("email")
    dorm = request.form.get("dorm")
    if not name or not email or not dorm:
        return render_template("failure.html")
    message = "You are registered!"
    server = smtplib.SMTP("smtp.gmail.com", 587)
    server.starttls()
    server.login("jharvard@cs50.net", os.getenv("PASSWORD"))
    server.sendmail("jharvard@cs50.net", email, message)
    return render_template("success.html")

The SMTP (Simple Mail Transfer Protocol) library allows us to use abstractions for sending email,
and here, every time we get a valid form, we’ll send an email. By reading the documentation for
smtplib and for Gmail, we can figure out the lines of code needed to log in to Gmail’s server
programmatically, and send an email to the email address from our form.
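
The string passed to sendmail above has no headers, so the email would arrive without a subject line. As a sketch of one alternative, Python's standard email.message module can build a message with explicit headers. The subject text and the recipient address below are made up for illustration, and the send function is defined but not called, since it would need real credentials:

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient):
    """Compose a registration email with explicit headers."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Frosh IMs registration"  # hypothetical subject line
    message.set_content("You are registered!")
    return message


def send(message, password):
    """Send the message through Gmail's SMTP server (needs real credentials)."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(message["From"], password)
        server.send_message(message)


message = build_message("jharvard@cs50.net", "student@example.com")
print(message)
```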
We can also save the registration data to a CSV on our server, which can then be opened even after our
server is stopped:

import csv

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/register", methods=["POST"])
def register():
    if not request.form.get("name") or not request.form.get("dorm"):
        return render_template("failure.html")
    file = open("registered.csv", "a")
    writer = csv.writer(file)
    writer.writerow((request.form.get("name"), request.form.get("dorm")))
    file.close()
    return render_template("success.html")

@app.route("/registered")
def registered():
    file = open("registered.csv", "r")
    reader = csv.reader(file)
    students = list(reader)
    file.close()
    return render_template("registered.html", students=students)

We import the csv library, and open a file called registered.csv to append or read from. If
we received a form in the register route, we’ll open the file with a , to append. Then, we
create a csv.writer (based on the documentation for the library), and use the writerow
function to write the name and dorm to the file. Finally, we’ll close the file.
The registered route will open the file for reading, and create a list of lists based on the file.
Then, in registered.html , we can iterate over each list in the list (each row), and print the first
item (the name) and the second item (the dorm):

{% extends "layout.html" %}

{% block body %}
<h1>Registered</h1>
<ul>
{% for student in students %}
<li>{{ student[0] }} from {{ student[1] }}</li>
{% endfor %}
</ul>
{% endblock %}
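
The append-then-read cycle can be tried outside of Flask. The sketch below uses the same csv functions, but opens the file with Python's with statement so that it is closed automatically; the temporary directory and the second student are assumptions made just for this demonstration:

```python
import csv
import os
import tempfile


def register(path, name, dorm):
    """Append one registrant to the CSV file as a new row."""
    # newline="" lets the csv module control line endings itself
    with open(path, "a", newline="") as file:
        csv.writer(file).writerow((name, dorm))


def registered(path):
    """Read every row back as a list of [name, dorm] lists."""
    with open(path, "r", newline="") as file:
        return list(csv.reader(file))


# Use a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "registered.csv")
    register(path, "Brian", "Pennypacker")
    register(path, "Emma", "Apley Court")
    students = registered(path)

for student in students:
    print(f"{student[0]} from {student[1]}")
```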

With a language we’ll look at next week, SQL, we’ll be able to work with data more easily than we can
with a CSV file.
In froshims6/templates/index.html , we use JavaScript in our template to check the input
immediately:

{% extends "layout.html" %}

{% block body %}
<h1>Register for Frosh IMs</h1>
<form action="/register" method="post">
<input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
<select name="dorm">
<option disabled selected value="">Dorm</option>
<option value="Apley Court">Apley Court</option>
<option value="Canaday">Canaday</option>
<option value="Grays">Grays</option>
<option value="Greenough">Greenough</option>
<option value="Hollis">Hollis</option>
<option value="Holworthy">Holworthy</option>
<option value="Hurlbut">Hurlbut</option>
<option value="Lionel">Lionel</option>
<option value="Matthews">Matthews</option>
<option value="Mower">Mower</option>
<option value="Pennypacker">Pennypacker</option>
<option value="Stoughton">Stoughton</option>
<option value="Straus">Straus</option>
<option value="Thayer">Thayer</option>
<option value="Weld">Weld</option>
<option value="Wigglesworth">Wigglesworth</option>
</select>
<input type="submit" value="Register">
</form>

<script>

document.querySelector('form').onsubmit = function() {
    if (!document.querySelector('input').value) {
        alert('You must provide your name!');
        return false;
    }
    else if (!document.querySelector('select').value) {
        alert('You must provide your dorm!');
        return false;
    }
    return true;
};

</script>

{% endblock %}

With JavaScript on the page, the user can get feedback immediately since it runs in the browser.
And we should still validate the input on our server, since someone might disable JavaScript or try
to send bad requests programmatically. With libraries like Bootstrap, we can make validation
pretty and really improve a user’s experience, or UX.
In this example, we have a function that will be called when the form on the page is submitted,
and checks that there’s a value for both the input and the select . If there is no value for one
of them, we’ll create an alert and return false to stop the form from being submitted.
Otherwise, our function will return true if both are present, allowing the form to be submitted
by the browser.
We could also factor out the JavaScript code into a .js file and include it, but since we don’t
have very many lines of code yet, we can make a design decision to include our JavaScript code
directly in our template. Frameworks like React will organize view code, like the HTML and
JavaScript, in particular ways, so that we can maintain consistent patterns in more complicated
web applications.

Words

Let’s create a website where someone can search for words that start with some string, much like how
we might want to have autocomplete. We’ll need a file called large that’s a list of dictionary words,
and in words0/application.py we’ll have:

from flask import Flask, render_template, request

app = Flask(__name__)

WORDS = []
with open("large", "r") as file:
    for line in file.readlines():
        WORDS.append(line.rstrip())

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/search")
def search():
    words = [word for word in WORDS if word.startswith(request.args.get("q"))]
    return render_template("search.html", words=words)

When our server starts, we’ll create a WORDS list from reading in each line of the large file,
removing the new line with rstrip , and storing that in our list.
In our index function, we’ll render index.html , which is just a form:

{% extends "layout.html" %}

{% block body %}
<form action="/search" method="get">
<input autocomplete="off" autofocus name="q" placeholder="Query" type="text">
<input type="submit" value="Search">
</form>
{% endblock %}

Our form will use the get method, since we want the query to be in the URL.
In our search route, we create a list, words , which is a list of every word in our global
WORDS list (that we read in earlier) that start with the value of the parameter q . It’s equivalent
to:

words = []
q = request.args.get("q")
for word in WORDS:
    if word.startswith(q):
        words.append(word)

Once we have a list of words that match, we’ll pass it to our template, search.html , which
will display each one with markup.
We can run our server with flask run , and when we visit the URL, we see a form that we can
type some input into. If we type in the letter a or b , we can click submit and be taken to a page
with all the words in our dictionary that start with a or b . And we notice that our route is
something like /search?q=a , though we could have changed q (for query) to anything we’d
like. We can even change the URL with some other value for q , and see our results displayed.
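
The prefix search can also be tried on its own. In this sketch, a five-word list stands in for the large file, and search takes q as an argument instead of reading it from the URL. The guard for a missing q is our own addition, since calling startswith with None would raise a TypeError:

```python
WORDS = ["a", "aardvark", "apple", "b", "banana"]  # tiny stand-in for the large file


def search(q):
    """Return every word that starts with q, or nothing if q is missing or empty."""
    if not q:
        # request.args.get("q") returns None when the URL has no q parameter,
        # and startswith(None) would raise a TypeError
        return []
    return [word for word in WORDS if word.startswith(q)]


print(search("a"))
print(search("ban"))
print(search(None))
```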
In words1 , we’ll get the results list immediately with JavaScript. And we can infer how that example
works, before looking at the code, by running it in the IDE. We can visit the URL, and use the Network
tab in Developer Tools by right-clicking the page in Chrome:

We see that our browser is making a request every time we type into the input box, and if we click
on the request and then Response, we can see that our browser got some fragment of HTML with
our results.
We can click on View Source on the page, and see that our page has a bit of JavaScript after the HTML:

<input autocomplete="off" autofocus placeholder="Query" type="text">

<ul></ul>

<script src="https://code.jquery.com/jquery-3.3.1.min.js"></script>
<script>

let input = document.querySelector('input');

input.onkeyup = function() {
    $.get('/search?q=' + input.value, function(data) {
        document.querySelector('ul').innerHTML = data;
    });
};

</script>

Here, we’re using a JavaScript library called jQuery, which provides us with some abstractions.
We’re selecting the input element, and every time the keyup event occurs, we want to change
the page. The keyup event will happen when we press a key in the input box, and let go. We use
jQuery’s $.get function to make a GET request to our server at the /search?q= route, with the
value of the input box appended. When we get some data back, the $.get function will call an
anonymous function (a callback) to set the innerHTML of the ul on our page to that data .
And notice that we provided an empty opened and closed <ul> element in our template, but
we’ll change the HTML inside with what our server responds with.
In our server-side code, our search route is mostly the same as before, but the template,
search.html , will only have <li> elements, one for each matching word:

{% for word in words %}
    <li>{{ word }}</li>
{% endfor %}

Since we don’t extend a layout.html , this route will only return an incomplete fragment of
HTML. But that still works because our JavaScript code is putting it inside a complete page, our
index.html .

With words2 , we have our server return data more efficiently, in a format called JSON, JavaScript
Object Notation, which here is just a list of the matching words without any markup.
Then, in our JavaScript code on the page, we’ll write each word as an <li> , generating the
markup in the browser instead of on our server.
The Python code in application.py uses a jsonify function to return a list as JSON:

@app.route("/search")
def search():
    q = request.args.get("q")
    words = [word for word in WORDS if q and word.startswith(q)]
    return jsonify(words)
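
To get a feel for what the browser receives, we can serialize a list with Python's standard json module. This is a sketch of the format only; jsonify builds a full HTTP response around text like this, and its exact spacing may differ:

```python
import json

words = ["a", "aaa", "aaas"]

# Serialize the list into JSON text; jsonify also sets response headers, not shown here
body = json.dumps(words)
print(body)

# The receiving side parses the text back into a list
print(json.loads(body) == words)
```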

And our index.html has the JavaScript to append each word as an <li> element:

let input = document.querySelector('input');
input.onkeyup = function() {
    $.get('/search?q=' + input.value, function(data) {
        let html = '';
        for (let word of data) {
            html += '<li>' + word + '</li>';
        }
        document.querySelector('ul').innerHTML = html;
    });
};

In fact, since the browser can run JavaScript that can search a list, we can write all of this in JavaScript,
without making a request to a server:

let input = document.querySelector('input');

input.onkeyup = function() {
    let html = '';
    if (input.value) {
        for (let word of WORDS) {
            if (word.startsWith(input.value)) {
                html += '<li>' + word + '</li>';
            }
        }
    }
    document.querySelector('ul').innerHTML = html;
};

When we get input from the user, we’ll just iterate over a WORDS array and append any word
string that starts with the input’s value to the page as an <li> element.
We’ll also have to include a large.js file that creates that global variable, WORDS , which starts
with the following:

let WORDS = [
"a",
"aaa",
"aaas",
"aachen",
"aalborg",
"aalesund",
"aardvark",
...

Even with a relatively simple example, we see how there can be a few different approaches to solving
the same problem. With version 0, our server sent back entire, complete pages on every search. With
version 1, we used JavaScript to make requests without navigating to another page, getting back data
with markup from the server. With version 2, we used JavaScript, but only got back data from the
server, that we then marked up in the browser. Finally, with version 3, we used JavaScript and the word
list to accomplish the same results, but all within the browser. Each approach has pros and cons, so
depending on what tradeoffs we value, one solution might be better than the rest.

WEEK 8
MySQL
Lecture 8

Menu
Last time (and next times)
Logging in
Databases
SQLite
SQL
lecture.db
Problems

Last time (and next times)

The CS50 Hackathon is coming up, an overnight event where students and TFs will work together on
final projects. See the Muppet Hackathon short video!
The CS50 Fair will come after, where students will demo their final projects!
We’ve been introduced to web programming, where we’ve learned to use Flask, a framework written in
the language of Python, to build dynamic web-based applications.
The internal structure of our applications has followed a paradigm, or methodology, called MVC,
Model-View-Controller, where code used for different functions is organized in different files and
folders that interact with each other in predictable ways.
But until now, we haven’t had much code in our Model layer. We’ve used CSVs to read and write data,
but those rows of text are a bit clunky to work with.
This week’s problem set, CS50 Finance, will use a database language, SQL, to work with data more
efficiently. The problem set will also use a real third-party API, application programming interface, to
get real-time data on stock prices, allowing users to “buy” and “sell” virtual stocks.

Logging in
When we log in to a website, with our username and password, we’re not prompted to log in again for
each page we visit after, usually until we explicitly log out.
It turns out that there is another web technology called cookies, small pieces of data that a website
can ask a browser to store on a user’s computer. Then, when the browser visits that website again, it
will automatically send that cookie back, like a virtual handstamp that identifies us to the server,
without having to enter our login information again. The cookie might store a long random string, to
prevent adversaries from easily guessing it, and the server will remember that it corresponds to our
account.
When we visit a site like Gmail for the first time, our browser will send HTTP headers like this:

GET / HTTP/1.1
Host: gmail.com
...

Then, Gmail’s server will reply with the login page. After we successfully log in, Gmail’s server will then
reply with headers like this:

HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: session=value
...

The Set-Cookie header asks our browser to save the session and value key-value pair to
our computer; value will be a long random string or number that identifies us to the server.
If we, as the user, set our browser to not save cookies, we’ll have to log in for every page we visit.
But cookies might also identify us to advertisers, who can then track us across different sites we
visit, if those sites include embedded images or scripts that are from the same third-party
advertising service. So political entities like the EU have passed laws to help ensure companies
behind websites are explicit to users about the purpose of cookies they want to store.
When we visit Gmail again later, our browser will send the same value back as part of the Cookie
header:

GET / HTTP/1.1
Host: gmail.com
Cookie: session=value
...

And cookies can be set to expire by the server, which is why after some number of days, we might
be asked to log in again.
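
The exchange of headers above can be simulated with Python's standard http.cookies module. This is a sketch under simplifying assumptions: no real HTTP requests are made, the browser's part is imitated with a string, and the 16-byte length of the random value is an arbitrary choice:

```python
import secrets
from http.cookies import SimpleCookie

# Server side: pick a long random value and ask the browser to store it
session_id = secrets.token_hex(16)
response_cookie = SimpleCookie()
response_cookie["session"] = session_id
set_cookie_header = response_cookie.output()
print(set_cookie_header)

# Browser side (simulated): send the same pair back on the next request
cookie_header = f"session={session_id}"

# Server side again: parse the Cookie header and recover the value
request_cookie = SimpleCookie(cookie_header)
print(request_cookie["session"].value == session_id)
```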
In today’s source code directory in the CS50 IDE, we’ll first look at the example called store . We’ll
cd into the store directory, and call flask run in our terminal to start our IDE’s web server.
Then, we can visit the link to see a simple “store”:

We can change the quantity of each item, and click “Purchase” to see them added to a virtual cart
that tells us the count of each item we have.
We can keep shopping, but even if we close the window and reopen it, we see that our cart still
saved the number of each item we added before.
With cookies, we can implement sessions on our server. A session is an abstraction of saved state for
each user’s visit to our website; our server might give me a cookie with session=12345 and you a
cookie with session=78910 , and store some data for each user who visits, based on that session
value.
With Flask, we only need a few lines of code to use this abstraction:

...
from flask_session import Session
...
app.config["SESSION_PERMANENT"] = False
app.config["SESSION_TYPE"] = "filesystem"
Session(app)
...
@app.route("/update", methods=["POST"])
def update():
    for item in request.form:
        session[item] = int(request.form.get(item))
    return redirect("/cart")

We set up our Flask app to use the Session library, and in each of our routes, we’ll have access to a
session dictionary, in which we can store data for each specific user, kept in our server’s
memory or filesystem.
We’ll introduce this library in a bit more detail in this week’s problem set.
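
The idea behind a session can be sketched with an ordinary dictionary that maps each cookie value to that visitor's data. This is not how the Session library is implemented; SESSIONS , get_session , and the item name are hypothetical, and everything here lives only in memory:

```python
import secrets

SESSIONS = {}  # session value -> that visitor's data, kept in the server's memory


def get_session(session_id=None):
    """Return (session_id, data), creating a new session for unknown visitors."""
    if session_id not in SESSIONS:
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {}
    return session_id, SESSIONS[session_id]


# First visit: no cookie yet, so the server creates a session
my_id, my_cart = get_session()
my_cart["foo"] = 2  # hypothetical item name

# Another visitor gets a separate session
your_id, your_cart = get_session()

# A later request that sends my cookie back finds the same cart
_, cart_again = get_session(my_id)
print(cart_again)
print(your_cart)
```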

Databases

So far, we’ve seen how we can store data in CSV files. But finding data requires linear search, and we
have to open, read, change, and save the entire file if we want to make a change.
Databases are a set of data, usually organized and managed by some software for us. Database
management software such as MySQL and Postgres commonly allow for selecting, inserting, updating,
and deleting data with a language called SQL, Structured Query Language.
A spreadsheet with rows and columns is like a simple database. Each column is a specific field of data,
and each row contains values for one entry in the database.
For example, we might store information about students, with a column for an ID, a column for a
name, and so on.
We might have different sheets, or tabs, within the same spreadsheet, and in databases these would be
called tables.
With Google Sheets, we can create a spreadsheet called “university”, and create two sheets within that,
one called “students” and one called “faculty”:

With a database, we can represent data in the same way, and SQL also requires us to specify the type
of data we want to store in each field. By specifying the type, our database software can store and
optimize our data more efficiently.
There are different variants, or dialects, of SQL, depending on what database program we’re using. In
SQLite, a popular, lightweight software we’ll be using, data types include:
BLOB
INTEGER
NUMERIC
REAL
TEXT
And within the INTEGER type, we might specify the size as a smallint , integer , or bigint ,
each of which might be stored with a different number of bytes. We might be tempted to use a
bigint , for example, but that might use unnecessary space and be more and more costly as we have
more and more rows to store.
REAL numbers can be a real type for a floating-point number, or double precision , with more
bytes allocated.
The NUMERIC type represents other types of numbers, such as a boolean , date , datetime ,
numeric(precision, scale) (for decimal numbers with a specific number of digits), time , and
timestamp .

A TEXT field can be a char(n) field, a fixed number of characters; a varchar(n) , a variable
number of characters up to n, or a larger text field with no specified maximum. And we can infer,
from our experience with arrays in C, that a fixed number of characters for each row would be faster to
index into, since we can calculate exactly where each value will be. Our database software will provide
this abstraction, and use the right data structures and algorithms for storing and accessing our data,
faster than something we might be able to implement ourselves.

SQLite

We can open the CS50 IDE and use the terminal to explore SQLite, a popular database management
software. SQLite is a technology for storing data on a server’s disk as a binary file, so it doesn’t have a
server or other software to set up. (Other technologies, like Postgres and MySQL, use a running
program that acts as a database server, which has better performance but requires some configuration
and memory.) Instead, we’ll use the sqlite3 program on our IDE as a human interface to a database
file, and in our code, we’ll use abstractions that can open and work with an SQLite database file.
We’ll start by typing sqlite3 froshims3.db , and we’ll be able to create a table in that database
with a command like CREATE TABLE 'registrants' ('id' integer, 'name' varchar(255),
'dorm' varchar(255)); . We specify the name of our new table, and for each column or field, the
type of data. And by convention, we use 255 for our varchar fields, since that used to be the
maximum for many older databases, and is probably enough for all realistic possibilities, without
being too excessive.
Nothing happens at our command line after, but we can type .schema and see the schema, or
description, of our table:

We can add a row to our table with INSERT INTO registrants (id, name, dorm) VALUES(1,
'Brian', 'Pennypacker'); , and conventionally the uppercased words are SQL keywords, while the
rest are words specific to our data.
We can see our table with SELECT * FROM registrants; , and see our table printed out.
And we can easily filter our data with SELECT * FROM registrants WHERE dorm = 'Matthews'; .
We can specify just the fields we want to get back, too, with something like SELECT name FROM
registrants WHERE dorm = 'Matthews'; .

We can change rows with something like UPDATE registrants SET dorm = 'Canaday' WHERE id
= 1; .

We can delete rows with something like DELETE FROM registrants WHERE id = 1; .
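
The same statements can be run from Python with the standard sqlite3 module, as a way to experiment without the sqlite3 command-line program. This sketch uses an in-memory database instead of froshims3.db , and the second row is made up so that the WHERE clause has something to match:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # an in-memory database, so no file is created
db.execute("CREATE TABLE registrants (id INTEGER, name VARCHAR(255), dorm VARCHAR(255))")

db.execute("INSERT INTO registrants (id, name, dorm) VALUES (1, 'Brian', 'Pennypacker')")
db.execute("INSERT INTO registrants (id, name, dorm) VALUES (2, 'Emma', 'Matthews')")  # made-up row

matthews = db.execute("SELECT name FROM registrants WHERE dorm = 'Matthews'").fetchall()
print(matthews)

db.execute("UPDATE registrants SET dorm = 'Canaday' WHERE id = 1")
db.execute("DELETE FROM registrants WHERE id = 2")

remaining = db.execute("SELECT * FROM registrants").fetchall()
print(remaining)
```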
The CS50 IDE also has a graphical program, phpLiteAdmin, which can open SQLite files too. We can
double-click froshims3 in our files list on the IDE, and be able to browse rows. We can try to insert a
row, too, by clicking on the name of the table and the Insert link, and see the SQL that phpLiteAdmin
runs:

We can start over by deleting the file froshims3.db , and creating a blank file with the same name.
Now we can double-click it, and phpLiteAdmin will let us create a new table. We’ll create 7 fields this
time, and we’ll have more options:

For our first field, id , we’ll make that an integer , and we can use the Primary Key option to
indicate to our database that this column will be the one used to uniquely identify each record.
We’ll use autoincrement so our database will automatically provide the next value for id each
time we add a record, and the Not Null option ensures that there is a value for that field for each
record.
Then we’ll have a name field that is a varchar with a maximum length of 255, and make that
Not NULL.
We’ll have a dorm field, varchar with a maximum length of 255, and a phone field that we’ll
set to a fixed char of 10 characters. If we were to use a numeric field, phone numbers that start
with 0 would lose those leading zeroes, and we might consider needing more characters if we
wanted to support international phone numbers.
We’ll have an email field as a varchar with maximum length 255, and store a birthdate as
a date . For sports , there might be more information we need someday, so we’ll have that as a
varchar with a maximum length of 1024, an even power of 2.
We can add fields to the table later, too.
We can click the SQL tab to insert rows manually, or use the Insert tab, but writing code to execute
queries will lead to the most organized data, since we’ll be able to set everything consistently.
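
The effect of the Primary Key, autoincrement, and Not Null options can be seen in a short sketch with Python's sqlite3 module. Only four of the fields are included to keep it brief, the phone numbers are invented, and the ? placeholders are the sqlite3 module's way of passing values into a query:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE registrants (
        id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
        name VARCHAR(255) NOT NULL,
        dorm VARCHAR(255),
        phone CHAR(10)
    )
    """
)

# No id is supplied, so the database assigns the next one
db.execute("INSERT INTO registrants (name, dorm, phone) VALUES (?, ?, ?)",
           ("Brian", "Pennypacker", "0123456789"))
db.execute("INSERT INTO registrants (name, dorm, phone) VALUES (?, ?, ?)",
           ("Emma", "Weld", "6175550100"))
rows = db.execute("SELECT id, name, phone FROM registrants").fetchall()
print(rows)

# Leaving out name violates the NOT NULL constraint
try:
    db.execute("INSERT INTO registrants (dorm) VALUES (?)", ("Grays",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Since phone is a character field, the first number keeps its leading zero.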

SQL

We can import the CS50 SQL library to execute queries easily, with lecture.py :

from cs50 import SQL

db = SQL("sqlite:///froshims.db")

rows = db.execute("SELECT * FROM registrants")

for row in rows:
    print(f"{row['name']} registered")

Here, we’re opening a file called froshims.db in the same directory and calling it db , using
SQL to open it. We call db.execute to run a query, and save the results into the rows list.
Then, we can iterate over each row and print out any fields we’d like.
We can run python lecture.py , and the CS50 SQL library will also helpfully print a debug line
showing what we sent to the database.
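
For comparison, here is a sketch of the same loop with Python's standard sqlite3 module instead of the CS50 library. It assumes that turning each row into a dictionary gives roughly the shape of rows that the loop above expects, and it builds a one-row table in memory instead of opening froshims.db :

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be indexed by column name
db.execute("CREATE TABLE registrants (id INTEGER, name TEXT, dorm TEXT)")
db.execute("INSERT INTO registrants VALUES (1, 'Brian', 'Pennypacker')")

# Convert each row into a plain dictionary keyed by column name
rows = [dict(row) for row in db.execute("SELECT * FROM registrants")]
for row in rows:
    print(f"{row['name']} registered")
```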
Since we can query a database in Python, we can also integrate that into a Flask application. We can
create a Flask application.py file:

from flask import Flask, render_template, request

from cs50 import SQL

app = Flask(__name__)

db = SQL("sqlite:///lecture.db")

@app.route("/")
def index():
    rows = db.execute("SELECT * FROM registrants")
    return render_template("index.html", rows=rows)

We select everything from the registrants table and store that in a variable called rows .
Then, we pass the results, rows , to the template.
In our template, templates/index.html , we’ll iterate over a list of rows that are passed in,
displaying each one’s name field:

{% extends "layout.html" %}

{% block body %}

<ul>
{% for row in rows %}

<li>{{ row["name"] }} registered</li>

{% endfor %}
</ul>

{% endblock %}

We can implement a search functionality too, by adding a q URL parameter:

...
@app.route("/")
def index():
    q = request.args.get("q")
    rows = db.execute(f"SELECT * FROM registrants WHERE name = '{q}'")
    return render_template("index.html", rows=rows)

Now, only rows that have a matching name will be returned to our template to display.

lecture.db

A sample database of music metadata, lecture.db , is in this week’s source directory. Importantly, it
demonstrates how we can relate data in different tables.
With our database of students, we might have noticed that the dorm field will have the same strings
repeated over and over again. Instead of storing the same data, we can store a reference to some other
table of dorms, using fewer bytes to represent the same data.
In the sample database, we have a table for Albums, Artists, and Tracks. In the Album table, each row
has an AlbumId, Title, and ArtistId. The Artist table has an ArtistId and Name for each row, so by
joining the two tables together, we can figure out the artist’s name. And since each artist has multiple
albums, we’re saving space. If a row in the Artist table has more data, we can update it just once, rather
than in every row of the Album table, if that data was repeated there.
We can use phpLiteAdmin in the CS50 IDE, as before, to look at the tables and rows in lecture.db ,
and run a query like SELECT * FROM Album WHERE ArtistId = 1; to see all the albums by the
artist with ID 1. We can use SQL to join tables, getting the artist’s name too, with SELECT * FROM
Album, Artist WHERE Album.ArtistId = Artist.ArtistId; . The name from the Artist table
will also be selected (since we said SELECT * ), and matched to each row in Album where the
ArtistId field is the same. Another way to express the same idea would be SELECT * FROM
Artist JOIN Album ON Artist.ArtistId = Album.ArtistId; .
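
A join can be tried on a miniature version of these two tables. The sketch below uses Python's sqlite3 module with an in-memory database; the artists and albums are made up, and only the columns named above are included:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
db.execute("CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER)")

# Made-up rows: each artist's name is stored once, however many albums they have
db.executemany("INSERT INTO Artist VALUES (?, ?)", [(1, "Artist A"), (2, "Artist B")])
db.executemany("INSERT INTO Album VALUES (?, ?, ?)",
               [(1, "First Album", 1), (2, "Second Album", 1), (3, "Other Album", 2)])

joined = db.execute(
    """
    SELECT Album.Title, Artist.Name
    FROM Artist JOIN Album ON Artist.ArtistId = Album.ArtistId
    ORDER BY Album.AlbumId
    """
).fetchall()
print(joined)
```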

Normalizing our database is this method of storing redundant data once, and using a reference to
another table as needed to save space, with a slight cost to performance and simplicity.
In SQL, we can add a UNIQUE constraint to a field, without using it as a primary key. We can also
indicate that a field should be an INDEX , where the database should build an index (with a tree, hash
table, or some other data structure) for looking up fields more quickly. Finally a FOREIGN KEY is what
we would call a field that refers to a row in some other table; for example, the ArtistId field in the
Album table is a foreign key to the Artist table.
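
These three features can be sketched with Python's sqlite3 module. The UNIQUE constraint on Name and the index name AlbumTitle are assumptions made for illustration (the real lecture.db may be defined differently), and SQLite only enforces foreign keys after the PRAGMA shown:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
db.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT UNIQUE)")
db.execute(
    """
    CREATE TABLE Album (
        AlbumId INTEGER PRIMARY KEY,
        Title TEXT,
        ArtistId INTEGER,
        FOREIGN KEY (ArtistId) REFERENCES Artist (ArtistId)
    )
    """
)
db.execute("CREATE INDEX AlbumTitle ON Album (Title)")  # hypothetical index name

db.execute("INSERT INTO Artist VALUES (1, 'Artist A')")


def is_rejected(query):
    """Report whether the database refuses the query because of a constraint."""
    try:
        db.execute(query)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_name = is_rejected("INSERT INTO Artist VALUES (2, 'Artist A')")
missing_artist = is_rejected("INSERT INTO Album VALUES (1, 'Some Album', 99)")
print(duplicate_name, missing_artist)
```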

SQL also has functions for numeric operations like AVG , COUNT , MAX , MIN , and SUM .
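
As a brief sketch of those functions, we can run all five over a small made-up table. The Seconds column and its values are hypothetical and are not taken from lecture.db :

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT, Seconds INTEGER)")
db.executemany("INSERT INTO Track (Name, Seconds) VALUES (?, ?)",
               [("One", 180), ("Two", 240), ("Three", 300)])

# Each function collapses the whole column into a single value
summary = db.execute(
    "SELECT COUNT(*), MIN(Seconds), MAX(Seconds), SUM(Seconds), AVG(Seconds) FROM Track"
).fetchone()
print(summary)
```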

Problems

One problem with databases is race conditions, where the timing of two actions or events causes
unexpected behavior.
For example, when we sign up for a new account on a website, it might ask us for a username we’d
like. If the username is taken already, we’ll see a message that tells us so, and if not, we’re able to start
creating an account with that username. And if we take our time with putting in the rest of our
information, that username might be taken by someone else by the time we actually submit the form.
But if the web server didn’t check again, there would be a problem where the same username is now
reassigned to us!
If two people, or web server threads, are checking the state of a variable at the exact same time, and
then make a change based on that, after some amount of time, then there is a race condition in that
window of time.
Another example is two people withdrawing money from two different ATMs at the exact same time,
with the same account information. If the account has $100, and both people try to withdraw $100 at
the same time, each ATM might check the account balance, see there is a balance of $100, dispense
$100, and save $0 back into the account. But each ATM did this, so a total of $200 would have been
withdrawn!
Another example involves two roommates and a fridge. The first roommate comes home, and sees that
there is no milk in the fridge. So the first roommate leaves to the store to buy milk, and while they are
at the store, the second roommate comes home, sees that there is no milk, and leaves for another store
to get milk. Later, there will be two jugs of milk in the fridge. By leaving a note, we can solve this
problem. We can even lock the fridge so that our roommate can’t check whether there is milk, until
we’ve gotten back.
In the database world, we can also lock rows and tables. We can use transactions, where a set of
actions is guaranteed to happen together. That property is called atomicity, where, for example, we
can check the value of a row and change it, without anything else being able to read or change that
value.
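
For the ATM example, one way to keep the check and the change together is to put both in a single UPDATE statement inside a transaction. The sketch below uses Python's sqlite3 module with a made-up accounts table; it runs on one connection, so it shows the shape of the approach without demonstrating real concurrency:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES (1, 100)")
db.commit()


def withdraw(account_id, amount):
    """Withdraw only if the balance covers it; check and change happen together."""
    # "with db" wraps the statement in a transaction: commit on success, roll back on error
    with db:
        cursor = db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    # rowcount is 1 if the row matched (enough money), 0 otherwise
    return cursor.rowcount == 1


first = withdraw(1, 100)
second = withdraw(1, 100)
balance = db.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(first, second, balance)
```

The second withdrawal is refused because the WHERE clause no longer matches once the balance is 0, so the account never goes negative.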
We might have also seen some websites, like for airline tickets or hotel rooms, provide a window of
time after we add something to our cart, that reserves it for us, letting us fill out our information
without someone else purchasing it in the meantime.
Another problem in SQL is called a SQL injection attack, where an adversary can execute their own
commands on our database. Earlier, we passed in the q URL parameter to filter registrants based on
their name, with rows = db.execute(f"SELECT * FROM registrants WHERE name = '{q}'") .
But if q had the value of Brian'; DELETE FROM registrants WHERE name = 'Brian , it would
end our previous statement and run another statement.
To guard against this, we can sanitize user data, or escape characters like semicolons and single
quotes, such that they are interpreted as part of the string, rather than special characters that end
strings or commands.
The CS50 SQL Library allows us to escape user input with the execute function, and we can write
rows = db.execute("SELECT * FROM registrants WHERE name = :name", name=q) where we
use a special placeholder, :name , that will be escaped before it is substituted into the string.
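The same idea can be sketched with Python's built-in `sqlite3` module, used here only as a stand-in for the CS50 library. One difference worth noting: `sqlite3` passes the value to the database separately from the query text instead of escaping it into the string, but the effect on the malicious input is the same. The table contents are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE registrants (name TEXT)")
conn.executemany(
    "INSERT INTO registrants (name) VALUES (?)", [("Brian",), ("Emma",)]
)

# The malicious value from above, as it might arrive in the q URL parameter.
q = "Brian'; DELETE FROM registrants WHERE name = 'Brian"

# With a :name placeholder, the whole value is treated as one string to
# compare against, so the quote and semicolon have no special meaning.
rows = conn.execute(
    "SELECT * FROM registrants WHERE name = :name", {"name": q}
).fetchall()

remaining = conn.execute("SELECT COUNT(*) FROM registrants").fetchone()[0]
print(rows, remaining)  # [] 2
```

No registrant has that odd name, so nothing is selected, and nothing is deleted either.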
Another example would be typing in ' OR '1' = '1 in a password field; if the query is
db.execute(f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'") ,
then substituting that password would get us
db.execute("SELECT * FROM users WHERE username = 'me@examplemailprovider.com' AND password = '' OR '1' = '1'") ,
and that would select that user since '1' = '1' is always true.

On the other hand, the escaped input would be substituted as
db.execute("SELECT * FROM users WHERE username = 'me@examplemailprovider.com' AND password = '\' OR \'1\' = \'1'") ,
preventing the intention of our command from being changed since the single quotes are escaped.
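The login bypass can be reproduced in a small sketch, again assuming Python's built-in `sqlite3` module in place of the CS50 library. The `users` table, its rows, and the plain-text passwords are made up purely to mirror the example; a real site would not store passwords this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO users (username, password) VALUES (?, ?)",
    [
        ("me@examplemailprovider.com", "secret"),
        ("someone@examplemailprovider.com", "hunter2"),
    ],
)

username = "me@examplemailprovider.com"
password = "' OR '1' = '1"  # typed into the password field

# Unsafe: the input is pasted into the query text. AND binds more tightly
# than OR, so the trailing OR '1' = '1' makes the condition true for every row.
unsafe = conn.execute(
    f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'"
).fetchall()

# Safer: placeholders keep the input as plain data, so it is only ever
# compared against the stored password.
safe = conn.execute(
    "SELECT * FROM users WHERE username = :username AND password = :password",
    {"username": username, "password": password},
).fetchall()

print(len(unsafe), len(safe))  # 2 0
```

In this sketch the unsafe query matches not just the targeted user but every row in the table, while the placeholder version matches nothing.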
Week 10
Menu
THE END
Lecture 9
Last times
Next times
Jeopardy

Last times

In week 0, we said:
what ultimately matters in this course is not so much where you end up relative to your
classmates but where you, in Week 10, end up relative to yourself in Week 0
And indeed, looking back now, we’ve learned new technologies and concepts every week. Even though
we might not have ever felt like we were completely comfortable, we’ve certainly gained experience
and knowledge in more and more areas.
In our first problem set, we probably found it difficult to print out a pyramid of # marks. But now,
we’re able to build an entire web application for CS50 Finance, with a back-end and a front-end, using
a number of languages and frameworks.
One of your classmates wrote, “Beside learning the new material, I felt that I started to be more
‘comfortable with being uncomfortable’ and being okay with working towards building a skill rather
than just getting the right answer…”, which beautifully encapsulates one of the goals of this course.
Indeed, even in the real world, languages and technologies evolve over time, but we’ve come to
appreciate principles of abstractions and tradeoffs and algorithms and data structures that will guide
us in solving problems with computer science.
In fact, David himself had to learn Python a few years ago! And he too had to figure out how to
build the Frosh IMs site years ago, since web development wasn’t part of the CS curriculum back
then.
A big thank you to all the staff, including the production team, the TFs and CAs, and course heads,
without whom this course would not be possible.
In particular: Cheng, who has been writing these notes; Veronica, who has been translating the
course’s materials into Spanish; and Brian, who has been an extraordinary head TF these past few
years.
In week 0, we learned that algorithms are a series of steps to solve a problem, taking some inputs and
producing some outputs:

And to represent those inputs and outputs in computers, we needed to understand bits, binary
digits, atop which we discovered abstractions like ASCII for text characters, Unicode for emoji, and
RGB for pixels in images and videos.
We wrote some algorithms down in pseudocode, and were already able to discuss running time of
algorithms in terms of steps required to solve a problem of size n:

Finally, we were introduced to Scratch, where we practiced using these new programming
concepts.
Then, in week 1, we transitioned to learning C, where we used only text to write source code, which
was then compiled into machine code, binary that our computer could actually run.
In week 2, we learned more programming features like functions and arguments, and discussed memory,
where we could change bytes on the hardware of our computer to store and work with data:

In week 3, we learned about strings in C, which led us to discover pointers. In week 4, we used pointers
to build data structures like a trie:

In week 5, we learned about (many) web technologies like HTTP, DNS, IP, HTML, CSS, and JavaScript.
We looked at request headers and response codes in HTTP, and we looked at HTML’s Document Object
Model (DOM) that represents a webpage in the browser:

And as an aside, in one year’s Harvard-Yale football game, some Yale students actually managed
to prank much of the Harvard audience into holding signs that read “WE SUCK”!
In week 6, we learned Python, a higher-level language that includes built-in data structures like lists
and dictionaries (hash tables), so we don’t need to implement them ourselves before we start writing
more useful programs.
In week 7, we looked at one particular paradigm of web programming, Model-View-Controller, which is
just one popular pattern for organizing our code:

Finally, in week 8, we looked at SQL and databases more generally, with which we can query data
efficiently and cleanly.

Next times

In a few weeks, we’re hosting the CS50 Hackathon, an evening at which students and staff will
collaborate on final projects.
Then, we’ll have the CS50 Fair, where we’ll demo student projects for the entire campus to check out.
We watch a short video of a Muppet submitting a final project just before the deadline!
In the real world, we might not use the CS50 IDE, but command-line tools for macOS and Windows
like:
Xcode
Windows Subsystem for Linux
We should probably learn about and use Git, version-control software that lets us track and revert
changes to our source code, among other useful features.
As students, we can sign up for free private repositories on GitHub, a service where we can store our
code in the cloud.
There are lots of text editors, too, like:
Atom
VS Code
Vim
We can host static websites with free services like:
GitHub Pages
Netlify

And web apps with services like:
Heroku
Wix Code

We can look into bigger providers, who have more sophisticated services, like:
AWS
Azure
Google Cloud
There are lots of sites where we can learn more about technology:
https://www.reddit.com/r/programming
https://news.ycombinator.com/
https://techcrunch.com/
https://stackoverflow.com/
https://serverfault.com/
https://www.google.com/

CS50, too, has a presence online in various communities:
https://www.reddit.com/r/cs50
https://discord.gg/QYZQfZ6
https://www.facebook.com/groups/cs50
https://gitter.im/cs50/x
https://www.instagram.com/cs50/
https://www.linkedin.com/groups/7437240/
https://www.quora.com/topic/Harvard-CS50
https://cs50x.slack.com/
https://www.snapchat.com/add/cs50
http://cs50.stackexchange.com/
https://twitter.com/cs50
In fact, after this class, we are now qualified to be staff for CS50, since we’ll (hopefully!) take more CS
courses in the spring and become even more knowledgeable by next fall.

Jeopardy

We take a few staff and student volunteers, and play Jeopardy! These questions and answers were
submitted by students as suggested quiz questions.
The first question: What institution developed Scratch?
Answer: MIT
Question: What color is the cat?
Answer: Orange
Question: What is the running time of merge sort?
Answer: O(n log n)
Question: What is the tool you can run if your program has memory leaks?
Answer: Valgrind
Question: What is rubber duck debugging?
Answer: When you read your code to a rubber duck and try to find the mistake.
Question: Is C the absolute worst?
Answer: Yes
Question: What is the name of an animal you don’t want to find under your pillow? (That shares a
name with a programming language.)
Answer: Python
Question: Who is your friend?
Answer: Bootstrap
Question: What is the (HTTP response) code for OK?
Answer: 200
Question: What do you call the situation where a structure refers to itself in its definition?
Answer: Recursion
Question: In what state, besides Massachusetts, does Google sometimes think we reside?
Answer: Kansas
