Statistics and probability are branches of mathematics that deal with data collection and analysis. Probability is the study of chance and is a fundamental subject that we apply in everyday living, while statistics is more concerned with how we handle data using different analysis techniques and collection methods. These two subjects always go hand in hand, and thus you can't study one without studying the other.
Introduction to Statistics
This section introduces the concept of statistics and its relevance to everyday life. Data is defined, and the various methods of data collection, such as sampling, are also introduced.
Averages
We have all used the term 'average' in some form or another at some point in our lives. Statistical averages are introduced and defined in this section. Mean, median, mode and range are discussed both at an introductory level and at a more advanced level, as with the concept of the assumed mean.
Frequency and related ideas like cumulative frequency are also discussed.
Probability
This section serves as an introduction to the concept of Probability, including definitions of the different terminology and the fundamental method of calculating
Probability.
Different concepts, like the dependence and independence of events, are discussed, including the methods of dealing with such concepts.
Probability Distributions
This section sets the stage for a more advanced view of probability by introducing the idea of a random variable and the meaning and types of probability distributions,
including discrete and continuous probability distributions.
Joint probability distributions are also discussed. This entire section is fundamental to understanding how probability and statistics interact.
Introduction to Statistics
Statistics is a branch of mathematics that deals with the collection, analysis and interpretation of data.
Data can be defined as groups of information that represent the qualitative or quantitative attributes of a variable or set of variables. In layman's terms, data in
statistics can be any set of information that describes a given entity. An example of data can be the ages of the students in a given class. When you collect those ages,
that becomes your data.
A set in statistics is referred to as a population. Though this term is commonly used to refer to the number of people in a given place, in statistics, a population refers to
any entire set from which you collect data.
Census data collection is a method of collecting data whereby all the data from each and every member of the population is collected.
For example, when you collect the ages of all the students in a given class, you are using the census data collection method since you are including all the members of
the population (which is the class in this case).
This method of data collection is very expensive (tedious, time consuming and costly) if the number of elements (population size) is very large. To understand the scope
of how expensive it is, think of trying to count all the ten year old boys in the country. That would take a lot of time and resources, which you may not have.
Sample data collection, which is commonly just referred to as sampling, is a method which collects data from only a chosen portion of the population.
Sampling assumes that the portion that is chosen to be sampled is a good estimate of the entire population. Thus one can save resources and time by only collecting
data from a small part of the population. But this raises the question of whether sampling is accurate or not. The answer is that for the most part, sampling is
approximately accurate. This is only true if you choose your sample carefully to be able to closely approximate what the true population consists of.
Sampling is used commonly in everyday life, for example in the various research polls that are conducted before elections. Pollsters don't ask all the people in a given state who they'll vote for; they choose a small sample and assume that these people represent how the entire population of the state is likely to vote. History has shown that these polls are almost always close to accurate, and as such sampling is a very powerful tool in statistics.
Experimental data collection involves performing an experiment and then collecting the resulting data for further analysis. Experiments involve tests, and the results of these tests are your data.
An example of experimental data collection is rolling a die one hundred times while recording the outcomes. Your data would be the results you get in each roll. The
experiment could involve rolling the die in different ways and recording the results for each of those different ways.
Experimental data collection is useful in testing theories and different products and is a very fundamental aspect of mathematics and all science as a whole.
The observational data collection method involves observing the population without influencing it at all, rather than carrying out an experiment. Observational data collection is popular for studying trends and behaviors in society where, for example, the lives of a group of people are observed and data is collected on the different aspects of their lives.
Data
Data can be defined as groups of information that represent the qualitative or quantitative attributes of a variable or set of variables, which is the same as saying that
data can be any set of information that describes a given entity. Data in statistics can be classified into grouped data and ungrouped data.
Any data that you first gather is ungrouped data. Ungrouped data is data in the raw. An example of ungrouped data is any list of numbers that you can think of.
Grouped Data
Grouped data is data that has been organized into groups known as classes. Grouped data has been 'classified' and thus some level of data analysis has taken place,
which means that the data is no longer raw.
A data class is a group of data related by some user-defined property. For example, if you were collecting the ages of the people you met as you walked down the street, you could group them into classes such as those in their teens, twenties, thirties, forties and so on. Each of those groups is called a class.
Each of those classes has a certain width, referred to as the class interval or class size. The class interval is very important when it comes to drawing histograms and frequency diagrams. All the classes may have the same class size, or they may have different class sizes, depending on how you group your data. The class interval is always a whole number.
Below is an example of grouped data where the classes have the same class interval.

Age (years)   Frequency
0 - 9         12
10 - 19       30
20 - 29       18
30 - 39       12
40 - 49
50 - 59
60 - 69
Below is an example of grouped data where the classes have different class intervals.

Age (years)   Frequency   Class Interval
0 - 9         15          10
10 - 19       18          10
20 - 29       17          10
30 - 49       35          20
50 - 79       20          30
The first step is to determine how many classes you want to have. Next, you subtract the lowest value in the data set from the highest value in the data set, and then you divide by the number of classes that you want to have:

class interval = (highest value - lowest value) / number of classes
Example 1:
Solution:
The class interval should always be a whole number, yet in this case we have a decimal number. The solution to this problem is to round up to the nearest whole number.
In this example, 2.8 gets rounded up to 3. So now our class width will be 3, meaning that we group the above data into classes of width 3, as in the table below.
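The class-width rule above can be sketched in a few lines of Python. The data set here is hypothetical, since the example's original numbers are not shown; rounding up with `ceil` guarantees the classes cover the full range:

```python
import math

def class_width(data, num_classes):
    # (highest value - lowest value) / number of classes,
    # rounded up to the next whole number.
    return math.ceil((max(data) - min(data)) / num_classes)

# Hypothetical data ranging from 1 to 29, grouped into 10 classes:
# (29 - 1) / 10 = 2.8, which rounds up to a class width of 3.
width = class_width(list(range(1, 30)), 10)
```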
Number    Frequency
1 - 3
4 - 6
7 - 9
10 - 12
13 - 15
16 - 18
19 - 21
22 - 24
25 - 27
28 - 30
On the other hand, class boundaries are not always shown in the frequency table. Class boundaries give the true class interval and, similar to class limits, are also divided into lower and upper class boundaries.
The relationship between the class boundaries and the class limits is given as follows:

lower class boundary = lower class limit - 0.5
upper class boundary = upper class limit + 0.5

As a result of the above, the lower class boundary of one class is equal to the upper class boundary of the previous class.
Class limits and class boundaries play separate roles when it comes to representing statistical data diagrammatically, as we shall see in a moment.
Sampling
Sampling is a fundamental aspect of statistics, but unlike the other methods of data collection, sampling involves choosing a method of sampling, which further influences the data that you end up with. There are two major categories of sampling: probability and non-probability sampling.
Probability Sampling
Under probability sampling, each element of a given population has a chance of being picked to be part of the sample. In other words, no single element of the population has a zero chance of being picked.
The odds (probability) of picking any element are known or can be calculated. This is possible when we know the total number of elements in the entire population, so that we are able to determine the odds of picking any one element.
Probability sampling involves random picking of elements from a population, and that is the reason why no element has a zero chance of being picked to be part of a sample.
1. Random Sampling
Random sampling is the method that most closely defines probability sampling. Each element of the sample is picked at random from the given population, such that the probability of picking any element can be calculated by dividing the frequency of that element by the total number of elements in the population. In this method, all elements with the same frequency are equally likely to be picked.

2. Systematic Sampling
Systematic sampling involves arranging the population in a given order and then picking every n-th element from the ordered list of all the elements in the population. The probability of picking any given element can be calculated, but it is not likely to be the same for all elements in the population, even for elements with the same frequency.

3. Stratified Sampling
Stratified sampling involves dividing the population into groups and then sampling from those different groups according to a certain set of criteria. For example, you might divide the population of a certain class into boys and girls and then, from those two groups, pick those who fall into the specific category that you intend to study with your sample.

4. Cluster Sampling
Cluster sampling involves dividing up the population into clusters and assigning each element to one and only one cluster; in other words, an element can't appear in more than one cluster.

5. Multistage Sampling
Multistage sampling involves the use of more than one probability sampling method and more than one stage of sampling: for example, using the stratified sampling method in the first stage and then the random sampling method in the second stage, and so on, until you achieve the sample that you want.

6. Probability Proportional to Size Sampling
Under probability proportional to size sampling, the sample is chosen in proportion to the total size of the population. It is a form of multistage sampling where in stage one you cluster the entire population and then in stage two you randomly select elements from the different clusters, but the number of elements that you select from each cluster is proportional to the size of the population of that cluster.
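As a rough illustration, the first three of these methods can be sketched in Python. The population and the two strata below are made up purely for the example:

```python
import random

population = list(range(1, 101))  # a hypothetical population of 100 elements

# Random sampling: every element has the same chance of being picked.
random_sample = random.sample(population, 10)

# Systematic sampling: order the population, then pick every n-th element.
n = 10
systematic_sample = population[::n]

# Stratified sampling: divide the population into groups (strata),
# then sample from each group.
strata = {
    "low":  [x for x in population if x <= 50],
    "high": [x for x in population if x > 50],
}
stratified_sample = [random.choice(group) for group in strata.values()]
```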
Non-Probability Sampling
Unlike probability sampling, under non-probability sampling certain elements of the population might have a zero chance of being picked. This is because we can't accurately determine the probability of picking a given element, so we do not know whether the odds of picking that element are zero or greater than zero.
Non-probability sampling is not always a consequence of the sampler's ignorance of the total number of elements in the population; it may also be a result of the sampler's bias in the way he chooses the sample by excluding some elements.
1. Quota Sampling
Quota sampling is similar to stratified sampling, except that in this case, after the population is divided into groups, the elements are sampled from each group using the sampler's judgement; as a consequence the method loses any aspect of being random and can be extremely biased.

2. Accidental Sampling
Accidental sampling is a method of sampling whereby the sampler picks the sample based on the fact that the elements he/she picks are conveniently close at the moment. For example, if you walked down the street and sampled the first ten people you met, the fact that they happened to be there is convenient for you but accidental for them, which leads to the name of the method.

3. Purposive or Judgemental Sampling
Purposive or judgemental sampling is a method of sampling whereby the sampler picks the sample from the entire population solely based on his/her judgement. The sampler controls, to a very large extent, which elements have a chance of being selected to be in the sample and which ones don't.

4. Voluntary Sampling
Voluntary sampling, as the name suggests, involves picking the sample based on which elements of the population volunteer to participate in the sample. This is the most common method used in research polls.

5. Snowball Sampling
Snowball sampling is a method of sampling that relies on referrals from previously selected elements to pick other elements that will participate in the sample.
Averages
In statistics, an average is defined as the number that measures the central tendency of a given set of numbers. There are a number of different averages including but
not limited to: mean, median, mode and range.
Mean
Mean is what most people commonly refer to as an average. The mean is the number you obtain when you sum up a given set of numbers and then divide this sum by the total count of numbers in the set. The mean is also referred to, more correctly, as the arithmetic mean.
For a set of numbers a1, a2, ..., an, the mean is found by adding up all the a's and then dividing by the total number, n:

mean = (a1 + a2 + ... + an) / n
Solution

The first step is to count how many numbers there are in the set, which we shall call n. The next step is to add up all the numbers in the set to obtain their sum. The last step is to find the actual mean by dividing the sum by n.
Mean can also be found for grouped data, but before we see an example on that, let us first define frequency.
Frequency in statistics means the same as in everyday use of the word. The frequency of an element in a set refers to how many of that element there are in the set. The frequency can be anywhere from 0 upwards. If you're told that the frequency of an element a is 3, that means that there are three a's in the set.
Example 2
Age (years)   Frequency
10
11
12
13
14
Solution
The first step is to find the total number of ages, which we shall call n. Since it would be tedious to count all the ages individually, we can find n by adding up the frequencies:
Next we need to find the sum of all the ages. We can do this in two ways: we can add up each individual age, which will be a long and tedious process; or we can use the
frequency to make things faster.
Since we know that the frequency represents how many of that particular age there are, we can just multiply each age by its frequency, and then add up all these
products.
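This frequency-weighted calculation can be sketched in Python. The ages and frequencies below are hypothetical, since the example's table values are not shown:

```python
ages        = [10, 11, 12, 13, 14]
frequencies = [ 4,  6,  3,  5,  2]   # hypothetical frequencies

# n is the total number of ages: the sum of the frequencies.
n = sum(frequencies)

# Multiply each age by its frequency, then add up the products.
total = sum(age * f for age, f in zip(ages, frequencies))

mean = total / n
```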
In statistics there are two kinds of means: the population mean and the sample mean. The population mean is the true mean of the entire population of the data set, while the sample mean is the mean of a sample drawn from that population. These different means appear frequently in both statistics and probability and should not be confused with each other.
The population mean is represented by the Greek letter μ (pronounced "mu"), while the sample mean is represented by x̄ (pronounced "x bar"). The total number of elements in a population is represented by N, while the number of elements in a sample is represented by n. This leads to an adjustment in the formula we gave above for calculating the mean:

μ = (x1 + x2 + ... + xN) / N    (population mean)
x̄ = (x1 + x2 + ... + xn) / n    (sample mean)
The sample mean is commonly used to estimate the population mean when the population mean is unknown. This is because they have the same expected value.
Median
The median is defined as the number in the middle of a given set of numbers arranged in order of increasing magnitude: when you arrange the numbers from the lowest to the highest, the median is the number positioned in the exact middle of the list. The median is another measure of average, and in higher-level statistics it also feeds into measures of dispersion such as the interquartile range. The median is important because it describes the behavior of the entire set of numbers.
Example 3
Solution
From the definition of median, we should be able to tell that the first step is to rearrange the given set of numbers in order of increasing magnitude, i.e. from the lowest
to the highest
Then we inspect the set to find that number which lies in the exact middle.
Let's try another example to illustrate something interesting that often occurs when solving for the median.
Example 4
Solution
As in the previous example, we start off by rearranging the data in order from the smallest to the largest.
Next we inspect the data to find the number that lies in the exact middle.
We can see from the above that we end up with two numbers (4 and 5) in the middle. We can solve for the median by finding the mean of these two numbers as follows:

median = (4 + 5) / 2 = 4.5
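The whole procedure, including the even-count case where the two middle values are averaged, can be sketched as:

```python
def median(values):
    # Sort the values, then take the middle one; for an even
    # count, take the mean of the two middle values.
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2
```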
Mode
The mode is defined as the element that appears most frequently in a given set of elements. Using the definition of frequency given above, mode can also be defined
as the element with the largest frequency in a given data set.
For a given data set, there can be more than one mode. As long as those elements all have the same frequency and that frequency is the highest, they are all the modal
elements of the data set.
Example 5
Solution
Mode = 3 and 15
As we saw in the section on data, grouped data is divided into classes. We have defined mode as the element which has the highest frequency in a given data set. In grouped data, we can find two kinds of mode: the modal class, which is the class with the highest frequency, and the mode itself, which we calculate from the modal class using the formula below:

Mode = L + ((f1 - f0) / (2 f1 - f0 - f2)) × h

where
L is the lower boundary of the modal class
f1 is the frequency of the modal class
f0 is the frequency of the class before the modal class in the frequency table
f2 is the frequency of the class after the modal class in the frequency table
h is the class interval
Example 6
Find the modal class and the actual mode of the data set below
Number    Frequency
1 - 3
4 - 6
7 - 9     4
10 - 12   9
13 - 15   2
16 - 18
19 - 21
22 - 24
25 - 27
28 - 30
Solution

Modal class = 10 - 12

where
L = 10
f1 = 9
f0 = 4
f2 = 2
h = 3

therefore,

Mode = 10 + ((9 - 4) / (2 × 9 - 4 - 2)) × 3 = 10 + (5 / 12) × 3 = 11.25
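The calculation can be checked with a short sketch of the grouped-data mode formula:

```python
def grouped_mode(L, f1, f0, f2, h):
    # L  = lower boundary of the modal class
    # f1 = frequency of the modal class
    # f0 = frequency of the class before it
    # f2 = frequency of the class after it
    # h  = class interval
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * h

# Values from the example: modal class 10 - 12.
mode = grouped_mode(L=10, f1=9, f0=4, f2=2, h=3)
```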
Range
The range is defined as the difference between the highest and lowest number in a given data set.
Example 7
Solution
Assumed Mean
In the section on averages, we learned how to calculate the mean for a given set of data. The data we looked at was ungrouped data and the total number of elements
in the data set was not that large. That method is not always a realistic approach especially if you're dealing with grouped data.
The assumed mean, as the name suggests, is a guess or an assumption of the mean. It is most commonly denoted by the letter a. It doesn't need to be correct or even close to the actual mean, and the choice of the assumed mean is at your discretion, except where a question explicitly asks you to use a certain assumed mean value.
The assumed mean is used to calculate the actual mean, as well as the variance and standard deviation, as we'll see later. The actual mean is obtained from the assumed mean a by the formula

mean = a + ((Σ fi ui) / (Σ fi)) × h

It's very important to remember that the above formula only applies to grouped data with equal class intervals.
fi is the frequency of each class; we find the total frequency of all the classes in the data set (Σ fi) by adding up all the fi's.
h is the class interval, and each di is the difference between the mid-point xi of a class and the assumed mean:

di = xi - a

Therefore ui becomes

ui = di / h
Let's try an example to see how to apply the assumed mean method for finding mean.
Example 1
The student body of a certain school was polled to find out what their hobbies were. The number of hobbies each student had was then recorded, and the data obtained was grouped into the classes shown in the table below. Using an assumed mean of 17, find the mean number of hobbies of the students in the school.

Number of hobbies   Frequency
0 - 4               45
5 - 9               58
10 - 14             27
15 - 19             30
20 - 24             19
25 - 29             11
30 - 34
35 - 40
Solution
We have been given the assumed mean a as 17, and we know the formula for finding the mean from the assumed mean.
We can find the class interval h by using the class limits: consecutive lower class limits differ by 5 (0, 5, 10, ...), so

h = 5

We now have one component we need, and we're one step closer to finding the mean.
We can solve the rest of this problem using a table whereby we find each remaining component of the formula and then substitute at the end:
Hobbies   Frequency fi   xi   di = xi - a   ui = di / h   fi ui
0 - 4     45             2    -15           -3            -135
5 - 9     58             7    -10           -2            -116
10 - 14   27             12   -5            -1            -27
15 - 19   30             17   0             0             0
20 - 24   19             22   5             1             19
25 - 29   11             27   10            2             22
30 - 34                  32   15            3             24
35 - 40                  37   20            4

Σ fi = 200,  Σ fi ui = -202

Substituting:

mean = 17 + (-202 / 200) × 5 = 17 - 5.05 = 11.95
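The assumed-mean substitution can be checked in Python, using the totals from the worked table:

```python
a = 17         # assumed mean, given in the problem
h = 5          # class interval
sum_f = 200    # total frequency (sum of all f_i), from the table
sum_fu = -202  # sum of all f_i * u_i, from the table

# mean = a + (sum of f_i * u_i / sum of f_i) * h
mean = a + (sum_fu / sum_f) * h

# Spot-check one row: for the class 0-4 the midpoint is 2,
# so d = 2 - 17 = -15 and u = d / h = -3, matching the table.
u = (2 - a) / h
```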
Cumulative Frequency
Cumulative frequency is defined as a running total of frequencies. The frequency of an element in a set refers to how many of that element there are in the set.
Cumulative frequency can also be defined as the sum of all previous frequencies up to the current point.
The cumulative frequency is important when analyzing data, where the value of the cumulative frequency indicates the number of elements in the data set that lie
below the current value. The cumulative frequency is also useful when representing data using diagrams like histograms.
The cumulative frequency is usually observed by constructing a cumulative frequency table. The cumulative frequency table takes the form as in the example below.
Example 1
The set of data below shows the ages of participants in a certain summer camp. Draw a cumulative frequency table for the data.
Age (years)   Frequency
10            3
11            18
12            13
13            12
14            7
15            27
Solution:
The cumulative frequency at a certain point is found by adding the frequency at the present point to the cumulative frequency of the previous point.
The cumulative frequency for the first data point is the same as its frequency since there is no cumulative frequency before it.
Age (years)   Frequency   Cumulative Frequency
10            3           3
11            18          3 + 18 = 21
12            13          21 + 13 = 34
13            12          34 + 12 = 46
14            7           46 + 7 = 53
15            27          53 + 27 = 80
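The running total can be reproduced with `itertools.accumulate`, using the frequencies from this example:

```python
from itertools import accumulate

ages        = [10, 11, 12, 13, 14, 15]
frequencies = [ 3, 18, 13, 12,  7, 27]

# Each entry is the sum of all frequencies up to that point.
cumulative = list(accumulate(frequencies))
```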
Example 2
Plot the cumulative frequency curve for the data set below
Age (years)   Frequency
10            5
11            10
12            27
13            18
14            6
15            16
16            38
17            9
Solution:
Age (years)   Frequency   Cumulative Frequency
10            5           5
11            10          5 + 10 = 15
12            27          15 + 27 = 42
13            18          42 + 18 = 60
14            6           60 + 6 = 66
15            16          66 + 16 = 82
16            38          82 + 38 = 120
17            9           120 + 9 = 129
Percentiles
A percentile is a certain percentage of a set of data. Percentiles are used to observe how much of a given set of data falls within a certain percentage range; for example, the thirtieth percentile is the value below which 30% of the entire data set lies.
Calculating Percentiles
Let us designate a percentile as Pm, where m represents the percentile we're finding; for example, for the tenth percentile, m would be 10. Given that the total number of elements in the data set is N, the position of Pm in the ordered data set is given by

position of Pm = (m / 100) × N
Quartiles
The term quartile is derived from the word quarter, which means one fourth of something. Thus a quartile is a certain fourth of a data set. When you arrange a data set in increasing order from the lowest to the highest and then divide this data into groups of four, you end up with quartiles. There are three quartiles that are studied in statistics.

When you arrange a data set in increasing order from the lowest to the highest and then divide this data into four groups, the data value at the lower fourth (1/4) mark of the data is referred to as the First Quartile. The First Quartile is equal to the data value at the 25th percentile. The first quartile can also be obtained from the Ogive: you section off the curve into four parts, and the data value that lies at the first quarter mark is the first quartile.

When you arrange a given data set in increasing order from the lowest to the highest and then divide this data into four groups, the data value at the second fourth (2/4) mark of the data is referred to as the Second Quartile. This is equivalent to the data value at the halfway point of all the data, and is also equal to the data value at the 50th percentile. The Second Quartile can similarly be obtained from an Ogive by sectioning off the curve into four; the data value that lies at the second quarter mark is the second quartile. In other words, the data value at the halfway line of the cumulative frequency curve is the second quartile. The second quartile is also equal to the median.

When you arrange a given data set in increasing order from the lowest to the highest and then divide this data into four groups, the data value at the third fourth (3/4) mark of the data is referred to as the Third Quartile. This is equivalent to the data value at the 75th percentile. The third quartile can be obtained from an Ogive by dividing the curve into four and considering the data value that lies at the 3/4 mark.
The different quartiles can be calculated using the same method as with the median.
First Quartile
The first quartile can be calculated by first arranging the data in an ordered list and then dividing the data into two groups. If the
total number of elements in the data set is odd, you exclude the median (the element in the middle).
After this you only look at the lower half of the data and then find the median for this new subset of data using the method for finding
median described in the section on averages.
Second Quartile
The second quartile is the same as the median and can thus be found using the same methods for finding median described in the
section on averages.
Third Quartile
The third quartile is found in a similar manner to the first quartile. The difference here is that after dividing the data into two groups,
instead of considering the data in the lower half, you consider the data in the upper half and then you proceed to find the Median of this
subset of data using the methods described in the section on Averages.
As mentioned above, we can obtain the different quartiles from the Ogive, which means that we use the cumulative frequency to calculate the quartile.
Given that the cumulative frequency for the last element in the data set is fc, the positions of the quartiles can be calculated as follows:

position of Q1 = fc / 4
position of Q2 = fc / 2
position of Q3 = 3 fc / 4

The quartile is then located by matching up which element has the cumulative frequency corresponding to the position obtained above.
Example 3
Find the First, Second and Third Quartiles of the data set below using the cumulative frequency curve.
Age (years)   Frequency
10            5
11            10
12            27
13            18
14            6
15            16
16            38
17            9
Solution:
Age (years)   Frequency   Cumulative Frequency
10            5           5
11            10          15
12            27          42
13            18          60
14            6           66
15            16          82
16            38          120
17            9           129
From the Ogive, we can see the positions where the quartiles lie and can approximate them as follows. Here fc = 129, so Q1 lies at position 129/4 ≈ 32, giving Q1 ≈ 12; Q2 lies at position 129/2 ≈ 65, giving Q2 ≈ 14; and Q3 lies at position 3 × 129/4 ≈ 97, giving Q3 ≈ 16.
Interquartile Range
The interquartile range is the difference between the third quartile and the first quartile.
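Using the cumulative frequencies from the example above, the quartile positions and the interquartile range can be sketched as:

```python
def quartile(values, cumulative, position):
    # The quartile is the first value whose cumulative
    # frequency reaches the given position.
    for v, cf in zip(values, cumulative):
        if cf >= position:
            return v

ages       = [10, 11, 12, 13, 14, 15, 16, 17]
cumulative = [ 5, 15, 42, 60, 66, 82, 120, 129]

fc = cumulative[-1]                           # final cumulative frequency
q1 = quartile(ages, cumulative, fc / 4)       # position 32.25
q2 = quartile(ages, cumulative, fc / 2)       # position 64.5
q3 = quartile(ages, cumulative, 3 * fc / 4)   # position 96.75
iqr = q3 - q1
```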
Absolute Deviation
Absolute deviation for a given data set is defined as the average of the absolute differences between the elements of the set and the mean (average deviation) or the median element (median absolute deviation). The average deviation is given by

average deviation = (|x1 - mean| + |x2 - mean| + ... + |xn - mean|) / n

which means that the average deviation is the average of the absolute differences between each element of the data set and the mean.
Example 1

The heights of a group of 10 students randomly selected from a given school are as follows (in ft):
5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5
Find (a) the absolute deviation from the mean and (b) the absolute deviation from the median.
Solution
a) To find the absolute deviation from the mean, we need to first find the mean of the heights:

mean = (5.5 + 3.5 + 4.6 + 6.1 + 5.7 + 5.11 + 4.9 + 5.0 + 5.0 + 5.5) / 10 = 50.91 / 10 = 5.091

The deviation from the mean for each of the elements in the data set is obtained by subtracting the mean from that element. For 5.5:

|5.5 - 5.091| = 0.409

We find all the deviations and then take their average (remember that we only consider their absolute values):
b) To find the absolute deviation from the median, we need to first find the median height for the data set.
We know that to find the median value, we arrange the elements in the data set in ascending or descending order and then find the element that lies in the middle.
Since we have an even number of elements in the data set, we're unable to obtain a median by canceling out corresponding elements from each end.
We're left with two elements, 5.0 and 5.11, and so we find their mean, which then becomes our median:

median = (5.0 + 5.11) / 2 = 5.055

Having obtained our median as 5.055, we can proceed to find the average deviation from the median using the same steps as in the previous part.
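Both deviations can be computed directly from the data; a minimal sketch:

```python
heights = [5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5]
n = len(heights)

# (a) Average (mean absolute) deviation: the mean of |x - mean|.
mean = sum(heights) / n
mean_abs_dev = sum(abs(x - mean) for x in heights) / n

# (b) Median absolute deviation: with an even count, the median
# is the mean of the two middle values of the sorted list.
s = sorted(heights)
median = (s[n // 2 - 1] + s[n // 2]) / 2
median_abs_dev = sum(abs(x - median) for x in heights) / n
```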
Population variance is the variance of the entire population and is denoted by σ², while sample variance is the variance of a sample of the population and is denoted by S²:

σ² = Σ (xi - μ)² / N
S² = Σ (xi - x̄)² / (n - 1)

Standard deviation is the square root of variance. Standard deviation is a measure of how precise the mean of a population or sample is. It is used to indicate trends in the elements of a given data set with respect to the mean, i.e. the spread of these elements about the mean.
Just as we have a population and sample variance, we also have a population and sample standard deviation. Population standard deviation is denoted by σ, while the sample standard deviation is denoted by S.
Although absolute deviation is also a measure of dispersion, variance and standard deviation are better measures because of the way they're calculated. Calculating variance involves squaring the differences (deviations) between each element and the mean, which makes the larger differences larger still; this squaring acts as a weighting factor that makes trends easier to spot.
Standard deviation is simply the square root of variance, so we can calculate it by taking the square root of the above variance formulae:

σ = √( Σ (xi - μ)² / N )
S = √( Σ (xi - x̄)² / (n - 1) )

The difference in calculating σ² and S² is that for σ² the average is found using the number of elements in the population, N. By contrast, we divide by one less than the sample size, n - 1, for S². The reason for this is that by using n - 1 we ensure that S² is an unbiased estimator of σ².
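The N versus n - 1 divisors can be sketched as follows (the data set below is hypothetical):

```python
import math

def population_variance(data):
    # sigma^2: the average squared deviation, dividing by N.
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    # S^2: divide by n - 1, so S^2 is an unbiased estimator of sigma^2.
    xbar = sum(data) / len(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data set, mean = 5
sigma2 = population_variance(data)     # 32 / 8 = 4.0
s2 = sample_variance(data)             # 32 / 7
sigma = math.sqrt(sigma2)              # population standard deviation = 2.0
```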
Probability
Probability is the branch of mathematics that deals with the study of chance. Probability deals with the study of experiments and their outcomes.
Experiment
An experiment in probability is a test to see what will happen in case you do something. A simple example is flipping a coin. When you flip a coin, you are performing an experiment to see which side of the coin you'll end up with.
Outcome
An outcome in probability refers to a single (one) result of an experiment. In the example of an experiment above, one outcome would
be heads and the other would be tails.
Event
An event in probability is a set of different outcomes of an experiment grouped together. Suppose you flip a coin multiple times; an example of an event would be getting a certain number of heads.
Sample Space
A sample space in probability is the set of all the different possible outcomes of a given experiment. If you flipped a coin once, the sample space S would be given by:

S = {heads, tails}

If you flipped the coin multiple times, all the different combinations of heads and tails would make up the sample space. A sample space is also defined as a Universal Set for the outcomes of a given experiment.
Notation of Probability
The probability that a certain event will happen when an experiment is performed can, in layman's terms, be described as the chance that something will happen.
Suppose that our experiment involves rolling a die. There are 6 possible outcomes in the sample space, as shown below:

S = {1, 2, 3, 4, 5, 6}

The size of the sample space is often denoted by N, while the number of outcomes in an event is denoted by n. The probability of an event is then given by

P(event) = n / N

For the sample space given above, if the event is rolling a 2, there is only one 2 in the sample space, so n = 1 and N = 6, giving P(2) = 1/6.
When an event has a probability of one, we say that the event must happen, and when the probability is zero we say that the event is impossible.
The total of all the probabilities of the events in a sample space adds up to one.
Events with the same probability have the same likelihood of occurring. For example, when you flip a fair coin, you are just as likely to get a head as a tail. This is
because these two outcomes have the same probability, i.e.

P(Heads) = P(Tails) = 1/2
Events can be divided into two major categories: dependent or independent events.
Independent Events
When two events are said to be independent of each other, what this means is that the probability that one event occurs in no way affects the probability of the other
event occurring. An example of two independent events is as follows; say you rolled a die and flipped a coin. The probability of getting any number face on the die in no
way influences the probability of getting a head or a tail on the coin.
Dependent Events
When two events are said to be dependent, the probability of one event occurring influences the likelihood of the other event.
For example, suppose you draw two cards from a deck of 52 cards. The probability of drawing an ace on the first draw is:

P(ace on first draw) = 4/52 = 1/13

If on your first draw you had an ace and you put that card aside, the probability of drawing an ace on
the second draw is changed because you drew an ace the first time. If we don't return this card into the deck, the probability of drawing an ace on the second pick is given by:

P(ace on second draw) = 3/51 = 1/17
As you can clearly see, the above two probabilities are different, so we say that the two events are dependent. The likelihood of the second event depends on what
happens in the first event.
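The two draw probabilities above can be checked with exact fractions:

```python
from fractions import Fraction

# Probability of drawing an ace on the first draw from a full deck of 52
p_first = Fraction(4, 52)

# With that ace set aside, only 3 aces remain among 51 cards
p_second_given_first = Fraction(3, 51)

print(p_first)               # 1/13
print(p_second_given_first)  # 1/17
```

The two fractions differ, which is exactly what makes the draws dependent events.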
Conditional Probability
We have already defined dependent and independent events and seen how probability of one event relates to the probability of the other event.
Conditional probability deals with further defining dependence of events by looking at probability of an event given that some other event first occurs.
This is written P(B|A), read as the probability that B occurs given that A has already occurred.
Given two events A and B that are both part of a sample space S, this sample space can be represented as a set as in the diagram below.
The different regions of the set S can be explained using the rules of probability.
Rules of Probability
When dealing with more than one event, there are certain rules that we must follow when studying probability of these events. These rules depend greatly on whether
the events we are looking at are Independent or dependent on each other.
This region is referred to as 'A intersection B', written A ∩ B, and in probability this region refers to the event that both A and B happen. When we use the word 'and' we are referring
to multiplication, thus A and B can be thought of as A × B or (using the dot notation, which is more popular in probability) A·B.
If A and B are dependent events, the probability of this event happening can be calculated as shown below:

P(A ∩ B) = P(A) × P(B|A)
If A and B are independent events, the probability of this event happening can be calculated as shown below:

P(A ∩ B) = P(A) × P(B)
Conditional probability for two independent events can be redefined using the relationship above to become:

P(B|A) = P(B)
The above is consistent with the definition of independent events, the occurrence of event A in no way influences the occurrence of event B, and so the probability that
event B occurs given that event A has occurred is the same as the probability of event B.
In probability we refer to the addition operator (+) as 'or'. Thus when we want to define some event such that the event can be A or B, the probability
of that event is:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

The region A ∪ B ('A union B') covers everything that is in A or in B or in both; we subtract P(A ∩ B) because otherwise the intersection would be counted twice.
Mutual Exclusivity
Certain special pairs of events have a unique relationship referred to as mutual exclusivity.
Two events are said to be mutually exclusive if they can't occur at the same time. For a given outcome, it's either one or the other but not both. As a consequence,
mutually exclusive events have their intersection defined as follows:

P(A ∩ B) = 0

An example of mutually exclusive events are the outcomes of a fair coin flip. When you flip a fair coin, you either get a head or a tail but not both. These two outcomes also happen to cover the entire sample space, so their probabilities add up to one:

P(Heads) + P(Tails) = 1/2 + 1/2 = 1

Note that summing to one is a property of complementary events (mutually exclusive events that together cover the whole sample space); mutually exclusive events in general need not have probabilities that sum to one.
Multiplication Rule
From the definition of mutually exclusive events, we can quickly conclude the following:

P(A ∩ B) = 0
Addition Rule
As we noted above, for mutually exclusive events the addition rule simplifies to:

P(A ∪ B) = P(A) + P(B)
Subtraction Rule
From the addition rule above, we can conclude that the subtraction rule for mutually exclusive events takes the form:

P(A) = P(A ∪ B) − P(B)

hence

P(B) = P(A ∪ B) − P(A)
Below is a Venn diagram of a set containing two mutually exclusive events A and B.
Random Variables

A random variable is a variable whose possible values are the outcomes of a random experiment. In other words, a random variable is a generalization of the outcomes or events in a given sample space. This is possible since the random variable by definition can
change, so we can use the same variable to refer to different situations. Random variables make working with probabilities much neater and easier.
A random variable in probability is most commonly denoted by capital X, and the small letter x is then used to ascribe a value to the random variable.
For example, given that you flip a coin twice, the sample space for the possible outcomes is given by the following:

S = {HH, HT, TH, TT}
There are four possible outcomes as listed in the sample space above; where H stands for heads and T stands for tails.
To find the probability of one of those outcomes we denote that question as:

P(X = x)

which means the probability that the random variable X is equal to some real number x.
Let X be a random variable defined as the number of heads obtained when two coins are tossed. Find the probability that you obtain two heads.

So now we've been told what X is and that x = 2, so we write the above information as:

P(X = 2)

Since we already have the sample space, we know that there is only one outcome with two heads, so we find the probability as:

P(X = 2) = 1/4
From this example, you should be able to see that the random variable X refers to any of the elements in a given sample space.
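The example above can be sketched by enumerating the sample space directly:

```python
from itertools import product

# Sample space for two coin flips: HH, HT, TH, TT
sample_space = [''.join(flips) for flips in product('HT', repeat=2)]

# X = number of heads in a given outcome
def X(outcome):
    return outcome.count('H')

# P(X = 2): outcomes with two heads over the size of the sample space
p_two_heads = sum(1 for s in sample_space if X(s) == 2) / len(sample_space)
print(p_two_heads)  # 0.25
```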
There are two types of random variables: discrete variables and continuous random variables.
A quick example is the number of heads in any number of coin flips: the outcomes will always be integer values; you'll never have half a head or a quarter of a tail. Such a
random variable is referred to as discrete. Discrete random variables give rise to discrete probability distributions.
Continuous is the opposite of discrete. Continuous random variables are those that take on any value including fractions and decimals. Continuous random variables
give rise to continuous probability distributions.
Probability Distributions
A probability distribution is a mapping of all the possible values of a random variable to their corresponding probabilities for a given sample space.
The probability distribution can also be referred to as a set of ordered pairs of outcomes and their probabilities. This is known as the probability function f(x).
The Cumulative Distribution Function (CDF) is defined as the probability that a random variable X with a given probability distribution f(x) will be found at a value less
than x. The cumulative distribution function is a cumulative sum of the probabilities up to a given point.
For example, the distribution of a fair die assigns probability 1/6 to each face:

x           1     2     3     4     5     6
P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6
For a discrete probability distribution, the set of ordered pairs (x, f(x)), where x is each outcome in a given sample space and f(x) is its probability, must satisfy the
following:

P(X = x) = f(x)

f(x) ≥ 0

Σx f(x) = 1
In other words, to get the cumulative distribution function, you sum up the probabilities of all the outcomes less than or equal to the given value:

F(x) = P(X ≤ x) = Σt≤x f(t)
For example, given a random variable X defined as the face that you obtain when you toss a fair die, find F(3):

F(3) = P(X ≤ 3) = f(1) + f(2) + f(3) = 1/6 + 1/6 + 1/6 = 1/2
The probability function can also be found from the cumulative distribution function, for example:

f(x) = F(x) − F(x − 1)

given that you know the full table of the cumulative distribution function over the sample space.
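The die's cumulative distribution function can be sketched in a few lines:

```python
from fractions import Fraction

# f(x) = 1/6 for each face of a fair die
f = {x: Fraction(1, 6) for x in range(1, 7)}

# F(x): cumulative sum of the probabilities of all outcomes t <= x
def F(x):
    return sum(p for t, p in f.items() if t <= x)

print(F(3))  # 1/2
print(F(6))  # 1
```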
For a continuous random variable, the probability of obtaining any single exact value is zero:

P(X = x) = 0

This is because the random variable X is continuous and as such its range can be infinitely divided into smaller parts, so the probability of selecting one exact value x is
zero.
While a discrete probability distribution is characterized by its probability function (also known as the probability mass function), continuous probability distributions
are characterized by their probability density functions.
Since we look at regions in which a given outcome is likely to occur, we define the Probability Density Function (PDF) as a function that describes the relative likelihood that
the random variable takes a value near a given point; probabilities are obtained by integrating it over a region.
For a continuous probability distribution, the density function f(x) of the random variable must satisfy the
following:

f(x) ≥ 0

∫ f(x) dx = 1, where the integral is taken over the entire range of X

and

P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx
From the above, we can see that to find the probability density function f(x) when given the cumulative distribution function F(x), we differentiate:

f(x) = dF(x)/dx

A density is often specified only on an interval a ≤ x ≤ b, meaning that the probability density function f(x) exists within that region but takes on the value of zero anywhere else.
For example, let X be a random variable with probability density function

f(x) = 1/x² for x ≥ 1, and f(x) = 0 elsewhere.

Find:

1. P(X ≤ 4)
2. P(X < 1)
3. P(2 ≤ X ≤ 3)
4. P(X > 1)
5. F(2)
Solutions:
1. P(X ≤ 4)

Since we're finding the probability that the random variable is less than or equal to 4, we integrate the density function from the given lower limit (1) to the value we're
testing for (4):

P(X ≤ 4) = ∫ from 1 to 4 of x⁻² dx = [−1/x] from 1 to 4 = 1 − 1/4 = 3/4
We need not concern ourselves with the 'zero elsewhere' part of the density function, as all it indicates is that the function only exists within the given region and the probability of the
random variable landing anywhere outside of that region will always be zero.
2. P(X < 1)
P(X < 1) = 0, since the density function f(x) is zero outside of the given boundary (it is only nonzero for x ≥ 1).
3. P(2 ≤ X ≤ 3)

Since the region we're given lies within the boundary for which x is defined, we solve this problem as follows:

P(2 ≤ X ≤ 3) = ∫ from 2 to 3 of x⁻² dx = 1/2 − 1/3 = 1/6
4. P(X > 1)

The above problem is asking us to find the probability that the random variable lies at any point between 1 and positive infinity. We can solve it as follows:

P(X > 1) = ∫ from 1 to ∞ of x⁻² dx = [−1/x] from 1 to ∞ = 0 − (−1) = 1

remembering that 1/x tends to zero as x tends to infinity.

The above is our expected result, since we already defined f(x) as lying within that region, hence the random variable will always be picked from there.
5. F(2)

F(2) = P(X ≤ 2) = ∫ from 1 to 2 of x⁻² dx = 1 − 1/2 = 1/2
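Assuming the density f(x) = 1/x² for x ≥ 1, which is consistent with the limits and results worked out above, the five answers can be checked with the closed-form CDF:

```python
# Sketch assuming f(x) = 1/x**2 for x >= 1 (0 elsewhere).
# Its CDF is F(x) = integral of t**-2 from 1 to x = 1 - 1/x.
def cdf(x):
    return 0.0 if x < 1 else 1.0 - 1.0 / x

print(cdf(4))           # P(X <= 4) = 0.75
print(cdf(1))           # P(X < 1) contributes nothing below 1: F(1) = 0
print(cdf(3) - cdf(2))  # P(2 <= X <= 3) ≈ 1/6
print(1.0 - cdf(1))     # P(X > 1) = 1.0
print(cdf(2))           # F(2) = 0.5
```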
Joint Probability Distributions

Since all random variables are divided into discrete and continuous random variables, we end up having both discrete and continuous joint probability
distributions. These distributions are not so different from the one-variable distributions we just looked at, but understanding some concepts might require
knowledge of multivariable calculus.
Essentially, joint probability distributions describe situations where both outcomes, represented by a pair of random variables, occur together. While we previously used only X to represent the
random variable, we now have X and Y as the pair of random variables. The joint probability function is written:

f(x, y) = P(X = x, Y = y)

where the above represents the probability that the events X = x and Y = y occur at the same time.
The Cumulative Distribution Function (CDF) for a joint probability distribution is given by:

F(x, y) = P(X ≤ x, Y ≤ y)
The table below represents the joint probability distribution obtained for the outcomes when a die is rolled and a coin is flipped.
f(x,y)          x = 1   x = 2   x = 3   x = 4   x = 5   x = 6   Row Totals
Heads             a       b       c       d       e       f        α
Tails             g       h       i       j       k       l        β
Column Totals     γ       δ       ε       ζ       η       θ        1

In the table above, x = 1, 2, 3, 4, 5, 6 are the outcomes when the die is rolled, while y = Heads, Tails are the outcomes when the coin is flipped. The letters a through l represent
the joint probabilities of the different events formed from the combinations of x and y, while the Greek letters represent the row and column totals, which together sum to 1. The row
sums and column sums are referred to as the marginal probability distribution functions (marginal PDFs).
We shall see in a moment how to obtain the different probabilities but first let us define the probability mass function for a joint discrete probability distribution.
The probability function, also known as the probability mass function, for a joint probability distribution f(x,y) is defined such that:

f(x, y) ≥ 0 for all (x, y)

which means that the joint probability should always be greater than or equal to zero, as dictated by the fundamental rule of probability, and:

Σx Σy f(x, y) = 1

which means that the sum of all the joint probabilities should equal one for a given sample space.
The probability mass function f(x,y) can be calculated in a number of different ways depending on the relationship between the random variables X and Y.
As we saw in the section on probability concepts, these two variables can be either independent or dependent.
In the example we gave above, rolling a die and flipping a coin are independent random variables: the outcome of one event does not in any way affect the
outcome of the other. Assuming that the coin and die are both fair, the probabilities given by a through l can be obtained by multiplying the probabilities of
the different x and y combinations:

f(x, y) = P(X = x) × P(Y = y) = 1/6 × 1/2 = 1/12

Since we claimed that the coin and the die are fair, the probabilities a through l are all the same.
The marginal PDFs, represented by the Greek letters, are the probabilities you expect when you obtain each of the outcomes separately: each row total is 1/2 (the probability of that coin face) and each column total is 1/6 (the probability of that die face). For example, the full table becomes:

f(x,y)          x = 1   x = 2   x = 3   x = 4   x = 5   x = 6   Row Totals
Heads            1/12    1/12    1/12    1/12    1/12    1/12      1/2
Tails            1/12    1/12    1/12    1/12    1/12    1/12      1/2
Column Totals    1/6     1/6     1/6     1/6     1/6     1/6        1
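Because the die and coin are independent, the whole joint table can be generated by multiplying the marginals:

```python
from fractions import Fraction
from itertools import product

# Independent events: each joint entry is the product of the two marginals
die_faces = range(1, 7)
coin_sides = ['Heads', 'Tails']

joint = {(x, y): Fraction(1, 6) * Fraction(1, 2)
         for x, y in product(die_faces, coin_sides)}

print(joint[(1, 'Heads')])  # 1/12
print(sum(joint.values()))  # 1
```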
If X and Y are dependent variables, their joint probabilities are calculated using their different relationships as in the example below.
Given a bag containing 3 black balls, 2 blue balls and 3 green balls, a random sample of 4 balls is selected. Given that X is the number of black balls and Y is the number
of blue balls, find the joint probability distribution of X and Y.
Solution:
The random variables X and Y are dependent since they are picked from the same sample space such that if any one of them is picked, the probability of picking the
other is affected. So we solve this problem by using combinations.
There are 4 possible outcomes of X, i.e. {0, 1, 2, 3}, whereby you can pick none, one, two or three black balls; similarly for Y there are 3 possible
outcomes, {0, 1, 2}, i.e. none, one or two blue balls.
f(x,y)          x = 0   x = 1   x = 2   x = 3   Row Totals
y = 0
y = 1
y = 2
Column Totals
To fill out the table, we need to calculate the different entries. We know the total number of black balls to be 3, the total number of blue balls to be 2, the sample
size to be 4, and the total number of balls in the bag to be 3 + 2 + 3 = 8.
We find the joint probability mass function f(x,y) using combinations as:

f(x, y) = [C(3, x) × C(2, y) × C(3, 4 − x − y)] / C(8, 4)

What the above represents is the number of ways we can pick each of the required balls: x black balls from 3, y blue balls from 2, and the remaining 4 − x − y green balls from 3, out of the C(8, 4) = 70 ways of choosing any 4 balls from 8. We substitute for the different values of x (0, 1, 2, 3) and y
(0, 1, 2) and solve.
f(0,0) is a special case. We don't need to calculate it: the probability of obtaining zero black balls and zero blue balls is zero. This is because of the
size of the population relative to the sample. We need 4 balls from a bag of 8 balls; in order to pick neither black nor blue balls, we would need there to be at
least 4 green balls. But we only have 3 green balls, so as a rule the sample must contain at least one black or blue ball.
f(x,y)          x = 0   x = 1   x = 2   x = 3   Row Totals
y = 0             0      3/70    9/70    3/70     15/70
y = 1            2/70   18/70   18/70    2/70     40/70
y = 2            3/70    9/70    3/70     0       15/70
Column Totals    5/70   30/70   30/70    5/70       1
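The combination formula above can be sketched with `math.comb`:

```python
from fractions import Fraction
from math import comb

# f(x, y) = C(3, x) * C(2, y) * C(3, 4 - x - y) / C(8, 4)
def f(x, y):
    green_needed = 4 - x - y
    if green_needed < 0 or green_needed > 3:
        return Fraction(0)  # impossible combination
    return Fraction(comb(3, x) * comb(2, y) * comb(3, green_needed), comb(8, 4))

print(f(0, 0))  # 0 -- only 3 green balls, so a sample of 4 can't avoid black and blue
print(f(1, 1))  # 9/35 (i.e. 18/70)
total = sum(f(x, y) for x in range(4) for y in range(3))
print(total)    # 1
```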
Continuous joint probability distributions are characterized by the joint density function f(x,y), which is similar to that of the single-variable case, except that it is in two
dimensions. The joint density must satisfy:

f(x, y) ≥ 0

∫∫ f(x, y) dx dy = 1, where the integral is taken over the entire plane
The probability distribution of the random variable Y alone, known as its marginal PDF, is given by:

h(y) = ∫ f(x, y) dx

where the integral is taken over all values of x.
Example:
A certain farm produces two kinds of eggs on any given day: organic and non-organic. Let these two kinds of eggs be represented by the random variables X and Y
respectively, given that the joint probability density function of these variables is given by f(x,y).
Solution:
c) P(X ≤ 1/2, Y ≤ 1/2)
Conditional Probability Distributions

Whether X and Y are continuous or discrete random variables, we write g(x) for the marginal PDF of X and h(y) for the marginal PDF of Y.
Conditional probability distributions can be discrete or continuous, but they follow the same notation, i.e.:

f(y|x) = f(x, y) / g(x), provided g(x) > 0

The conditional probability distribution for a discrete set of random variables can be found from:

P(a < X < b | Y = y) = Σa<x<b f(x|y)

where the above is the probability that X lies between a and b given that Y = y.
For a set of continuous random variables, the above probability is given as:

P(a < X < b | Y = y) = ∫ from a to b of f(x|y) dx

Two random variables are said to be statistically independent if their joint distribution factors into the marginals, so that the conditional distribution reduces to:

f(x|y) = g(x), equivalently f(x, y) = g(x) h(y)

where g(x) is the marginal PDF of X and h(y) is the marginal PDF of Y.
Expected Value

The mean of a random variable is more commonly referred to as its Expected Value, i.e. the value you expect to obtain should you carry out some experiment whose
outcomes are represented by the random variable.
Given that the random variable X is discrete and has a probability distribution f(x), the expected value of the random variable is given by:

E[X] = Σx x f(x)

Given that the random variable X is continuous and has a probability density function f(x), the expected value of the random variable is given by:

E[X] = ∫ x f(x) dx
Example 1:
The probability distribution of X, the number of red cars John meets on his way to work each morning, is given by the following table:

x       0      1      2      3      4
f(x)   0.41   0.37   0.16   0.05   0.01
Find the number of red cars that John expects to run into each morning on his way to work.
Solution:
This question is asking us to find the average number of red cars that John runs into on his way to work. What makes this different from an ordinary mean question is
that the odds (probabilities) of running into a given number of cars are not all the same:

E[X] = 0(0.41) + 1(0.37) + 2(0.16) + 3(0.05) + 4(0.01) = 0.88

Although you wouldn't expect to run into 0.88 cars on any single morning, multiply the above by 100 mornings and John expects to come across about 88 red cars in total.
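The same weighted sum in code:

```python
# Expected value of a discrete random variable: E[X] = sum of x * f(x)
x_values = [0, 1, 2, 3, 4]
probabilities = [0.41, 0.37, 0.16, 0.05, 0.01]

expected = sum(x * p for x, p in zip(x_values, probabilities))
print(round(expected, 2))  # 0.88
```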
Example 2:
A certain software company uses a tool to check for errors in any of the programs it builds, and then discards a program if the errors found exceed a
certain number. The number of errors found is represented by a random variable X whose density function is given by
Find the average number of errors the company expects to find in a given program.
Solution:
The random variable X is given as a continuous random variable, thus its expected value can be found by evaluating E[X] = ∫ x f(x) dx with the given density.
In some cases, an event is represented by a function of the random variable, which we refer to as g(X). To find the expected value of this event, we substitute the
function for the variable in the expectation formula, i.e. we compute E[g(X)] rather than E[X].
Example 3:
Given a discrete random variable X with the following probability distribution, find the expected value of a given function g(X) of the random variable:

x      −3     6     9
f(x)   1/6   1/2   1/3
Solution:
For a discrete random variable X, the expected value of an arbitrary function g(X) is given by:

E[g(X)] = Σx g(x) f(x)
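As an illustration with the distribution from Example 3 (the specific function g is not given in the example, so g(X) = X² here is only a stand-in):

```python
from fractions import Fraction

# Distribution from Example 3, entered as exact fractions
dist = {-3: Fraction(1, 6), 6: Fraction(1, 2), 9: Fraction(1, 3)}

# E[g(X)] = sum of g(x) * f(x) over the distribution
def expected(g):
    return sum(g(x) * p for x, p in dist.items())

print(expected(lambda x: x))      # E[X]   = 11/2
print(expected(lambda x: x * x))  # E[X^2] = 93/2, using g(X) = X^2 as a stand-in
```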
Example 4:
Solution:
For a continuous random variable, the expected value of an arbitrary function of the random variable g(X) is given by:

E[g(X)] = ∫ g(x) f(x) dx
Example 5:
Given a pair of discrete random variables X and Y whose joint probability distribution function is given by the table below;
f(x,y)   x = 1   x = 2   x = 3
y = 2     0.10    0.20    0.10
y = 4     0.15    0.30    0.15
Solution:
For a pair of discrete random variables, the expected value of a function g(X,Y) is given by:

E[g(X,Y)] = Σx Σy g(x, y) f(x, y)
Example 6:
Given the random variables X and Y and the function g(X,Y) = XY, find E[g(X,Y)] if the joint density function is given by;
Solution:
The Variance of a random variable X is denoted by σ², but can sometimes be written as Var(X).
Variance of a random variable can be defined as the expected value of the square of the difference between the random variable and the mean.
Given that the random variable X has mean μ, the variance is expressed as:

σ² = E[(X − μ)²]
In the previous section on Expected value of a random variable, we saw that the method/formula for calculating the expected value varied depending on whether the
random variable was discrete or continuous. As a consequence, we have two different methods for calculating the variance of a random variable. For a discrete random variable:

σ² = Σx (x − μ)² f(x)

For a continuous random variable:

σ² = ∫ (x − μ)² f(x) dx
The Standard Deviation in both cases can be found by taking the square root of the variance: σ = √(σ²).
Example 1
A software engineering company tested a new product of theirs and found that the number of errors per 100 CDs of the new software had the following probability
distribution:

x       0      1      2      3      4
f(x)   0.01   0.25   0.4    0.3    0.04

Find the variance of X.
Solution
The probability distribution given is discrete, and so we can find the variance from the following:

σ² = Σx (x − μ)² f(x), where μ = Σx x f(x)
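A sketch of the discrete variance computation (the x values 0 through 4 are an assumed reading of the table above):

```python
# Var(X) = E[(X - mu)**2] for a discrete distribution.
# The x values 0..4 are an assumption; only the probabilities are given.
x_values = [0, 1, 2, 3, 4]
probabilities = [0.01, 0.25, 0.4, 0.3, 0.04]

mu = sum(x * p for x, p in zip(x_values, probabilities))
variance = sum((x - mu) ** 2 * p for x, p in zip(x_values, probabilities))

print(round(mu, 2))        # 2.11
print(round(variance, 4))  # ≈ 0.7379
```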
Example 2
Find the Standard Deviation of a random variable X whose probability density function is given by f(x) where:
Solution
Since the random variable X is continuous, we use the following formula to calculate the variance:

σ² = ∫ (x − μ)² f(x) dx

We can attempt to simplify this formula by expanding the quadratic inside the expectation:

σ² = E[(X − μ)²] = E[X² − 2μX + μ²]

We shall see in the next section that the expected value of a linear combination behaves as follows:

E[aX + b] = aE[X] + b

so that

E[X² − 2μX + μ²] = E[X²] − 2μE[X] + E[μ²]

Remember that after you've calculated the mean μ, the result is a constant, and the expected value of a constant is that same constant, so E[μ²] = μ²; but E[X] = μ, hence

σ² = E[X²] − 2μ² + μ² = E[X²] − μ²

We can also derive the above for a discrete random variable as follows:

σ² = Σx (x − μ)² f(x) = Σx x² f(x) − 2μ Σx x f(x) + μ² Σx f(x)

and since Σx x f(x) = μ and Σx f(x) = 1,

σ² = Σx x² f(x) − μ² = E[X²] − μ²
The variance of this function g(X) is denoted σ²g(X) and can be found as follows:

σ²g(X) = E[(g(X) − μg(X))²]
Covariance
In the section on probability distributions, we saw that at times we might have to deal with more than one random variable at a time, hence the need to study Joint
Probability Distributions.
Just as we can find the expected value of a joint pair of random variables X and Y, we can also measure how the two variables vary together, and this is what we refer to as the
Covariance, denoted Cov(X,Y) or σXY:

Cov(X, Y) = E[(X − μX)(Y − μY)] = E[XY] − μX μY
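Using the joint table from Example 5, the covariance can be computed as E[XY] − E[X]E[Y]:

```python
from fractions import Fraction

# Joint table from Example 5, entered as exact fractions
joint = {(1, 2): Fraction(1, 10), (2, 2): Fraction(1, 5),  (3, 2): Fraction(1, 10),
         (1, 4): Fraction(3, 20), (2, 4): Fraction(3, 10), (3, 4): Fraction(3, 20)}

# Cov(X, Y) = E[XY] - E[X]E[Y]
ex  = sum(x * p for (x, y), p in joint.items())
ey  = sum(y * p for (x, y), p in joint.items())
exy = sum(x * y * p for (x, y), p in joint.items())
cov = exy - ex * ey
print(cov)  # 0 -- each entry factors into the marginals, so X and Y are independent
```

Independent random variables always have zero covariance, though the converse does not hold in general.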