
# Centre for Computer Technology

Week 2

## Statistics and Frequency Distribution

Objectives

- Review of Week 1
- Measures of Central Tendency
- Measures of Dispersion
- Sample Statistics
- Frequency Distribution
- Mean and Variance from Frequency Table

## Copyright Box Hill Institute

## Set: Introduction

A set is a well-defined list, collection or class of objects.

The objects could be anything: numbers, names, people, cities. These objects are called the elements or members of the set.

- Example 1: The numbers 1, 3, 5, 7, 9, 11, 13, ...
- Example 2: The solutions of the equation $x^2 - 4x + 3 = 0$
- Example 3: The rivers in Australia

## Set Notation

Sets are usually denoted by capital letters A, B, P, X, ... The elements are usually represented by lowercase letters a, b, p, x, ...

There are two forms for presenting a set:

- Tabular form: A = {1, 3, 5, 7, 9, 11, ...}
- Set-builder form: A = {x | x is odd}
March 20, 2012

## Subsets

If every element in a set A is also a member of a set B, then A is called a subset of B. In other words, if $x \in A \Rightarrow x \in B$ for all x, then A is a subset of B. It is written as $A \subseteq B$ or $B \supseteq A$.

A is called a proper subset of B if $A \subseteq B$ and A is not equal to B.

## Venn Diagram to represent sets

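U is the universal set. A and B are disjoint sets. R is a subset of S.

[Figure: two Venn diagrams. In the first, the disjoint sets A and B lie inside U. In the second, R lies inside S, which lies inside U.]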

## Set Operations

Let A and B represent two sets. The definitions, in compact form, are:

1. $A \cup B = \{\, x \mid x \in A \text{ or } x \in B \text{ or both} \,\}$
2. $A \cap B = \{\, x \mid x \in A \text{ and } x \in B \,\}$
3. $A - B = \{\, x \mid x \in A \text{ and } x \notin B \,\}$
4. $A' = \{\, x \mid x \notin A \,\}$
5. $A \,\triangle\, B = \{\, x \mid x \in A \text{ or } x \in B \text{ but not both} \,\}$
6. $\#A$ = number of elements in set A
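The six set operations can be tried out with Python's built-in `set` type. This is a minimal sketch. The sets `U`, `A` and `B` are hypothetical examples chosen for illustration and do not come from the slides.

```python
# Hypothetical example sets; U plays the role of the universal set.
U = set(range(1, 11))
A = {1, 3, 5, 7, 9}
B = {3, 4, 5, 6}

union = A | B                  # elements in A or in B or in both
intersection = A & B           # elements in both A and B
difference = A - B             # elements in A but not in B
complement_A = U - A           # elements of U that are not in A
symmetric_difference = A ^ B   # elements in A or in B but not in both
cardinality_A = len(A)         # number of elements in A

print(union, intersection, difference)
print(complement_A, symmetric_difference, cardinality_A)
```

The complement is only meaningful relative to a universal set, which is why it is written as `U - A` here.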

## Statistics and Frequency Distribution

Introduction

Statistics is the means of describing the center, spread and shape of a data set. It has two components:

- Gathering of information or scientific data
- Inferential statistics (statistical methods)

Statistical methods are employed to make judgements in the face of uncertainty and variation.

## Measures of Central Tendency

Measures of central tendency are single values that act as a representative of the data. The three main measures are the mean, the median and the mode.

## Mean

For a given set of n numbers $x_1, x_2, x_3, \dots, x_n$, the mean, denoted by $\mu$, is

$$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

Example: Consider the following set of numbers

S = {1, 2, 3, 4, 5, 6, 7, 8, 9}

The mean of the set S is

$$\frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9}{9} = \frac{45}{9} = 5$$
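The same calculation can be written as a short Python sketch. The function name `mean` is an illustrative choice. The data is the set S from the example.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

S = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(mean(S))
```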

## Median

For a given set of n numbers $x_1, x_2, x_3, \dots, x_n$, the median is a value such that half of the numbers are larger than it and the other half are smaller. In other words, the median is the middlemost number of the ordered data.

Example: Consider the following set of numbers

S = {1, 6, 3, 8, 2, 4, 9}

To find the median, we need to order the list:

S = {1, 2, 3, 4, 6, 8, 9}

The middlemost number is 4, which is the median of the set.

What happens when we have to find the median of a set with an even number of elements? For example, find the median of S = {1, 6, 3, 8, 2, 12, 4, 9}.

## Some More Concepts

For a set of n ordered data points:

- If n is odd, the median is found at location (n+1)/2 of the set.
- If n is even, the median is the average of the two middle terms, which are found at locations n/2 and n/2 + 1.
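The odd and even rules can be sketched in Python as follows. The function name `median` is an illustrative choice. The locations in the rule are 1-based, so the code subtracts 1 to get Python's 0-based indices. The two data sets are the ones used in the median examples.

```python
def median(values):
    """Median of a list, following the location rule for odd and even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the value at 1-based location (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values at 1-based locations n/2 and n/2 + 1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median([1, 6, 3, 8, 2, 4, 9]))      # odd number of elements
print(median([1, 6, 3, 8, 2, 12, 4, 9]))  # even number of elements
```

For the eight-element set, the ordered list is {1, 2, 3, 4, 6, 8, 9, 12}, so the two middle terms are 4 and 6.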

## Mode

The mode of a data set is the value that occurs most often. If two, three or more values tie for the highest frequency, the data is bimodal, trimodal or multimodal.

Example: R = {2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2}

The number that appears most often is 2 (four times), so 2 is the mode of R.
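A small sketch of finding the mode by counting, using `collections.Counter` from the Python standard library. The helper name `modes` is an illustrative choice. It returns a list so that bimodal or multimodal data is also handled. The second data set is a hypothetical bimodal example.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

R = [2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2]
print(modes(R))                 # a single mode
print(modes([1, 1, 2, 2, 3]))   # hypothetical bimodal data
```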

## Measures of Dispersion

Consider two sets

S = {5, 5, 5, 5, 5, 5}

R = {0, 0, 0, 10, 10, 10}

For both sets, the mean is 5, yet they are clearly two different data sets. Is it good practice to describe them using only the mean, median or mode?

We use another descriptive statistic to evaluate the data, called a measure of dispersion. It measures the scatter of the data about the mean.

What happens to the measure of dispersion

- if the values are concentrated near the mean?
- if the values are distributed far from the mean?

If the values are concentrated near the mean of the data set, the measure is small. If they are distributed far from the mean of the data set, the measure will be large.

There are two main measures of dispersion:

- Variance
- Standard Deviation

## Variance and Standard Deviation

For a given set of n numbers $x_1, x_2, x_3, \dots, x_n$ with mean $\mu$, the variance, denoted by $\sigma^2$, is given by

$$\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \dots + (x_n - \mu)^2}{n}$$

## Variance (method 2)

Variance = mean of squares minus square of mean

$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

where $\sum x = x_1 + x_2 + x_3 + \dots + x_n$ and $\sum x^2 = x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2$.
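As a quick check that method 2 agrees with the average squared deviation from the mean, the sketch below computes the variance both ways. The function names and the data list are hypothetical and chosen only for illustration.

```python
import math

def variance_from_deviations(values):
    """Average of the squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def variance_method2(values):
    """Mean of squares minus square of mean."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return mean_of_squares - square_of_mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set
print(variance_from_deviations(data), variance_method2(data))
```

Both functions divide by n. With floating-point numbers the two results can differ in the last few digits, so compare them with a tolerance (for example `math.isclose`) rather than with `==`.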

## Variance and Standard Deviation

The variance is a non-negative number. The positive square root of the variance is the standard deviation. The simplest measure of variability is the sample range, $x_{max} - x_{min}$.
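Returning to the two sets S and R from the discussion of dispersion, the sketch below shows that they share the same mean but differ in variance, standard deviation and range. It uses `mean`, `pvariance` and `pstdev` from Python's `statistics` module, which divide by n. Treating S and R as complete populations is an assumption made here.

```python
from statistics import mean, pvariance, pstdev

S = [5, 5, 5, 5, 5, 5]
R = [0, 0, 0, 10, 10, 10]

for name, data in (("S", S), ("R", R)):
    data_range = max(data) - min(data)   # largest value minus smallest value
    print(name, mean(data), pvariance(data), pstdev(data), data_range)
```

S has no scatter at all, while every value of R lies 5 units away from the mean.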

## Variance and Standard Deviation

Example: Find the variance and standard deviation for the following set of test scores:

T = {75, 80, 82, 87, 96}

The mean of the set T is

$$\frac{75 + 80 + 82 + 87 + 96}{5} = \frac{420}{5} = 84$$

Using the mean, we get the variance as

$$\sigma^2 = \frac{(75-84)^2 + (80-84)^2 + (82-84)^2 + (87-84)^2 + (96-84)^2}{5} = \frac{254}{5} = 50.8$$

The standard deviation is the positive square root of the variance, $\sigma = \sqrt{50.8} \approx 7.13$.
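The test-score example can be reproduced step by step in Python. This is a minimal sketch that divides by n, as the worked example does. The variable names are illustrative.

```python
import math

T = [75, 80, 82, 87, 96]
n = len(T)

mean_T = sum(T) / n                                  # 420 / 5
squared_deviations = [(x - mean_T) ** 2 for x in T]  # 81, 16, 4, 9, 144
variance_T = sum(squared_deviations) / n             # 254 / 5
std_dev_T = math.sqrt(variance_T)                    # positive square root

print(mean_T, variance_T, round(std_dev_T, 2))
```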

## Sample Space

- The set of all possible outcomes of a statistical experiment is called a sample space or sample.
- Each outcome is called an element, a member or a sample point.
- A group of samples is called a population.

## Sample Statistics

Any quantity obtained from a sample for the purpose of estimating a population parameter is called a sample statistic.

A sample, together with inferential statistics, allows us to draw conclusions about a population, with inferential statistics making explicit use of elements of probability.

## Sample Mean

For a given sample of n numbers $x_1, x_2, x_3, \dots, x_n$, the sample mean, denoted by $\bar{X}$, is

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

## Weighted Mean

For a given set of data $X = \{x_1, x_2, \dots, x_n\}$ and corresponding non-negative weights $W = \{w_1, w_2, \dots, w_n\}$, the weighted mean (weighted average) is given by

$$\bar{X} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \dots + w_n x_n}{w_1 + w_2 + w_3 + \dots + w_n}$$
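A minimal sketch of the weighted mean in Python. The function name `weighted_mean` and the scores and weights are hypothetical, for example three assessment marks where the last two count double.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not all be zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

scores = [70, 80, 90]   # hypothetical data
weights = [1, 2, 2]     # hypothetical non-negative weights
print(weighted_mean(scores, weights))
```

When all weights are equal, the weighted mean reduces to the ordinary mean.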

## Sample Variance

For a given sample of n numbers $x_1, x_2, x_3, \dots, x_n$, the sample variance is denoted by $S^2$.
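Two conventions for the sample variance are in common use: dividing the sum of squared deviations by n, or by n - 1. Which one this course intends is not stated here, so the sketch below simply shows both using Python's `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1. The data is the test-score set T used earlier.

```python
from statistics import pvariance, variance

T = [75, 80, 82, 87, 96]

divide_by_n = pvariance(T)          # sum of squared deviations / n
divide_by_n_minus_1 = variance(T)   # sum of squared deviations / (n - 1)

print(divide_by_n, divide_by_n_minus_1)
```

The two values move closer together as the sample size grows, since n and n - 1 differ less and less in relative terms.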

## Frequency Distributions

For large samples (or populations) it is difficult to observe various characteristics or to compute statistics directly. It is therefore useful to organize or group the raw data. The data is arranged in intervals of equal width.

The intervals are called classes or categories. The number of individuals or elements in each class is determined and is called the class frequency. The resulting arrangement is called a frequency distribution or frequency table.

Example: Height of students in XYZ university (frequency table)

| Height (cm) | Number of Students |
|---|---|
| 155-159 | 5 |
| 160-164 | 18 |
| 165-169 | 42 |
| 170-174 | 27 |
| 175-179 | 8 |
| Total | 100 |

In the previous example, the first category, 155-159, is called a class interval. The corresponding class frequency is 5. The midpoint of the class interval is called the class mark; for 155-159 it is (155 + 159)/2 = 157.
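The class marks for the height table can be computed as the midpoint of each interval's lower and upper limits. A small sketch with illustrative variable names:

```python
# (lower limit, upper limit) of each class interval in the height table
classes = [(155, 159), (160, 164), (165, 169), (170, 174), (175, 179)]

# Class mark = midpoint of the class interval
class_marks = [(lower + upper) / 2 for lower, upper in classes]

for (lower, upper), mark in zip(classes, class_marks):
    print(f"{lower}-{upper}: class mark {mark}")
```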

## Frequency Histogram

[Figure: histogram of the height frequency table, with the number of students (0 to 45) on the vertical axis and height (cm) on the horizontal axis.]

## Frequency Polygon

[Figure: frequency polygon for the same height frequency table.]

## Frequency Graphs

- In a histogram, the sum of the rectangular areas is 100.
- A frequency polygon is a graph connecting the midpoints of the tops of the histogram rectangles.
- In a bar graph, the sum of the ordinates is 1.

## Relative Frequency

In a relative frequency distribution, each class frequency is expressed as a percentage of the total rather than as a count. In the histogram, the vertical axis then shows relative frequency instead of frequency.
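For the height table, the relative frequencies can be obtained by dividing each class frequency by the total. A minimal sketch, with illustrative variable names:

```python
# Class frequencies from the height table
frequencies = {
    "155-159": 5,
    "160-164": 18,
    "165-169": 42,
    "170-174": 27,
    "175-179": 8,
}

total = sum(frequencies.values())

# Relative frequency of each class, expressed as a percentage of the total
relative = {label: 100 * f / total for label, f in frequencies.items()}

for label, percent in relative.items():
    print(f"{label}: {percent}%")
```

Because the total here is exactly 100 students, each percentage happens to equal the class frequency. With any other total the two would differ.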

In the previous example, what happens if we have a student with a height of 159.7 cm? The classes 155-159, 160-164, 165-169, 170-174 and 175-179 leave a gap between 159 and 160, so this height does not fall into any class of the table.

## Continuous Frequency Distribution

The class intervals are chosen such that they are continuous, as shown:

Height (cm): 154.5-159.4, 159.5-164.4, 164.5-169.4, 169.5-174.4, 174.5-179.4
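One way to assign a measured height to a class is sketched below. It assumes half-open classes with boundaries 154.5, 159.5, ..., 179.5, so each class includes its lower boundary and excludes its upper one. That is an assumption: the slide lists upper limits such as 159.4, which would still leave a small gap. The function names are illustrative, and `bisect_right` comes from the Python standard library.

```python
from bisect import bisect_right

# Assumed class boundaries; class i covers boundaries[i] <= height < boundaries[i + 1]
boundaries = [154.5, 159.5, 164.5, 169.5, 174.5, 179.5]

def class_index(height):
    """Return the 0-based index of the class that contains the height."""
    if not boundaries[0] <= height < boundaries[-1]:
        raise ValueError(f"{height} cm lies outside the table")
    return bisect_right(boundaries, height) - 1

def class_label(height):
    """Return the class containing the height as a 'lower-upper' label."""
    i = class_index(height)
    return f"{boundaries[i]}-{boundaries[i + 1]}"

print(class_label(159.7))
```

With these boundaries every height between 154.5 cm and 179.5 cm belongs to exactly one class, including the 159.7 cm student.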

## Mean and Variance from Frequency Table

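| Interval | Midpoint (x) | Frequency (f) | f·x | f·x² |
|---|---|---|---|---|
| $a_0 - a_1$ | $x_1$ | $f_1$ | $f_1 x_1$ | $f_1 x_1^2$ |
| $a_1 - a_2$ | $x_2$ | $f_2$ | $f_2 x_2$ | $f_2 x_2^2$ |
| ... | ... | ... | ... | ... |
| $a_{n-1} - a_n$ | $x_n$ | $f_n$ | $f_n x_n$ | $f_n x_n^2$ |
| All | | Total f | Total f·x | Total f·x² |

$$\text{Mean} = \frac{\text{total } (f \cdot x)}{\text{total } f}$$

$$\text{Variance} = \frac{\text{total } (f \cdot x^2)}{\text{total } f} - (\text{mean})^2$$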
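The two formulas can be wrapped in a small Python function. This is a sketch with illustrative names (`grouped_mean_and_variance`, `midpoints`, `frequencies`). It is applied here to the height table, using the class marks 157, 162, 167, 172 and 177 as the midpoints.

```python
def grouped_mean_and_variance(midpoints, frequencies):
    """Mean and variance of grouped data from class midpoints and frequencies."""
    total_f = sum(frequencies)
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    total_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))
    mean = total_fx / total_f
    variance = total_fx2 / total_f - mean ** 2
    return mean, variance

# Height table: class marks and numbers of students
midpoints = [157.0, 162.0, 167.0, 172.0, 177.0]
frequencies = [5, 18, 42, 27, 8]

mean, variance = grouped_mean_and_variance(midpoints, frequencies)
print(mean, variance)
```

Every observation in a class is treated as if it sat at the class midpoint, so the results approximate the mean and variance of the raw data.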

## Example : Mean and Variance from Frequency Table

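| Class interval | Frequency, f |
|---|---|
| 1.5 - 1.9 | 2 |
| 2.0 - 2.4 | 1 |
| 2.5 - 2.9 | 4 |
| 3.0 - 3.4 | 15 |
| 3.5 - 3.9 | 10 |
| 4.0 - 4.4 | 5 |
| 4.5 - 4.9 | 3 |

Adding the class midpoints and the products f·x and f·x² gives:

| Class interval | Class midpoint, x | Frequency, f | f·x | f·x² |
|---|---|---|---|---|
| 1.5 - 1.9 | 1.7 | 2 | 3.4 | 5.78 |
| 2.0 - 2.4 | 2.2 | 1 | 2.2 | 4.84 |
| 2.5 - 2.9 | 2.7 | 4 | 10.8 | 29.16 |
| 3.0 - 3.4 | 3.2 | 15 | 48.0 | 153.60 |
| 3.5 - 3.9 | 3.7 | 10 | 37.0 | 136.90 |
| 4.0 - 4.4 | 4.2 | 5 | 21.0 | 88.20 |
| 4.5 - 4.9 | 4.7 | 3 | 14.1 | 66.27 |
| Total | | 40 | 136.5 | 484.75 |

Mean = total (f·x) / total f = 136.5 / 40 = 3.4125

Variance = [total (f·x²) / total f] - (mean)² = 484.75 / 40 - (3.4125)² = 12.1188 - 11.6452 = 0.4736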
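The totals in this example can be checked by tabulating the f·x and f·x² columns in Python. This is a minimal sketch. The variable names are illustrative, and each midpoint is rounded to one decimal place to match the table.

```python
# (lower limit, upper limit, frequency) for each class in the example
table = [
    (1.5, 1.9, 2), (2.0, 2.4, 1), (2.5, 2.9, 4), (3.0, 3.4, 15),
    (3.5, 3.9, 10), (4.0, 4.4, 5), (4.5, 4.9, 3),
]

total_f = 0
total_fx = 0.0
total_fx2 = 0.0
for lower, upper, f in table:
    x = round((lower + upper) / 2, 1)   # class midpoint
    total_f += f
    total_fx += f * x
    total_fx2 += f * x * x
    print(f"{lower}-{upper}  x={x}  f={f}  f.x={f * x:.2f}  f.x.x={f * x * x:.2f}")

mean = total_fx / total_f
variance = total_fx2 / total_f - mean ** 2
print(f"total f = {total_f}, mean = {mean:.4f}, variance = {variance:.4f}")
```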

## Summary

- There are three main measures of central tendency: mean, mode and median.
- There are two main measures of dispersion: variance and standard deviation.
- The organization or grouping of raw data in a table is called a frequency distribution.

References

M R Spiegel : Theory and Problems of Statistics, Schaum's Outline Series, McGraw Hill. http://mathworld.wolfram.com