
# Centre for Computer Technology

Week2

## Statistics and Frequency Distribution

## Objectives

- Review Week 1
- Measures of Central Tendency
- Measures of Dispersion
- Sample Statistics
- Frequency Distribution
- Mean and Variance from Frequency Table

## March 20, 2012

## Set: Introduction

A set is a well-defined list, collection or class of objects.

The objects can be anything: numbers, names, people, cities. These objects are called the elements or members of the set.

- Example 1: The numbers 1, 3, 5, 7, 9, 11, 13, ...
- Example 2: The solutions of the equation $x^2 - 4x + 3 = 0$
- Example 3: The rivers in Australia

## Set Notation

Sets are usually denoted by capital letters $A, B, P, X, \ldots$ and their elements by lowercase letters $a, b, p, x, \ldots$

There are two forms for presenting a set:

- Tabular form: $A = \{1, 3, 5, 7, 9, 11, \ldots\}$
- Set-builder form: $A = \{x \mid x \text{ is odd}\}$

## Subsets

If every element of a set $A$ is also a member of a set $B$, then $A$ is called a subset of $B$. In other words, if $x \in A \Rightarrow x \in B$ for all $x$, then $A$ is a subset of $B$.

This is written $A \subseteq B$ or $B \supseteq A$.

$A$ is called a proper subset of $B$ if $A \subseteq B$ and $A \neq B$.

## Venn Diagram to represent sets

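- $U$ is the universal set.
- $A$ and $B$ are disjoint sets.
- $R$ is a subset of $S$.

[Figure: two Venn diagrams, each drawn inside the universal set U. The first shows A and B as disjoint sets; the second shows R inside S.]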

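## Set Operations

Let $A$ and $B$ be two sets. The operations are defined compactly as follows:

1. Union: $A \cup B = \{x \mid x \in A \text{ or } x \in B \text{ or } x \in \text{both}\}$
2. Intersection: $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$
3. Difference: $A - B = \{x \mid x \in A \text{ and } x \notin B\}$
4. Complement: $A' = \{x \mid x \notin A\}$
5. Symmetric difference: $A \,\triangle\, B = \{x \mid x \in A \text{ or } x \in B \text{ but } x \notin \text{both}\}$
6. Cardinality: $\#A$ = number of elements in set $A$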
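The set operations listed above map directly onto Python's built-in `set` type. The following is a minimal sketch; the sets `U`, `A` and `B` are illustrative values chosen for this example and do not come from the slides.

```python
# Illustrative sets (assumed for this sketch, not taken from the slides).
U = set(range(1, 11))   # universal set {1, 2, ..., 10}
A = {1, 3, 5, 7, 9}
B = {3, 4, 5, 6}

union = A | B                  # elements in A or B (or both)
intersection = A & B           # elements in both A and B
difference = A - B             # elements in A but not in B
complement_A = U - A           # elements of U that are not in A
symmetric_difference = A ^ B   # elements in A or B, but not in both
cardinality_A = len(A)         # number of elements in A

print(sorted(union))                 # [1, 3, 4, 5, 6, 7, 9]
print(sorted(intersection))          # [3, 5]
print(sorted(difference))            # [1, 7, 9]
print(sorted(complement_A))          # [2, 4, 6, 8, 10]
print(sorted(symmetric_difference))  # [1, 4, 6, 7, 9]
print(cardinality_A)                 # 5
```

The complement is only meaningful relative to a universal set, which is why it is computed here as `U - A`.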

## Statistics and Frequency Distribution

Introduction
Statistics is the medium used to describe the center, spread and shape of a data set. It has two components:

- Gathering of information or scientific data
- Inferential statistics / statistical methods

Statistical methods are employed to make judgements in the face of uncertainty and variation.

## Measures of Central Tendency

Measures of central tendency are single values that act as representatives of the data. The three main measures are the mean, the median and the mode.

## Mean

For a given set of $n$ numbers $x_1, x_2, x_3, \ldots, x_n$, the mean, denoted by $\mu$, is

$$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

Example: Consider the set of numbers $S = \{1, 2, 3, 4, 5, 6, 7, 8, 9\}$. The mean of $S$ is

$$\mu = \frac{1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9}{9} = \frac{45}{9} = 5$$

## Median

For a given set of $n$ numbers $x_1, x_2, x_3, \ldots, x_n$, the median is a value such that half of the numbers are larger than it and the other half are smaller. In other words, the median is the middlemost number of the ordered list.

Example: Consider the set of numbers $S = \{1, 6, 3, 8, 2, 4, 9\}$. To find the median, first order the list: $S = \{1, 2, 3, 4, 6, 8, 9\}$. The middlemost number is 4, which is the median of the set.

What happens when we have to find the median of a set with an even number of elements? For example, find the median of $S = \{1, 6, 3, 8, 2, 12, 4, 9\}$.

## Some More Concepts

For a set of $n$ ordered data points:

- If $n$ is odd, the median is the term at position $(n+1)/2$.
- If $n$ is even, the median is the average of the two middle terms, found at positions $n/2$ and $n/2 + 1$.
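The position rule for odd and even $n$ can be sketched in Python as follows. The function name `median` is chosen for this illustration; the two data sets are the ones used in the median examples of this deck.

```python
def median(values):
    """Median via the position rule: sort, then pick the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # Position (n + 1) / 2 is 1-based; subtract 1 for Python's 0-based index.
        return ordered[(n + 1) // 2 - 1]
    # Positions n/2 and n/2 + 1 (1-based) are indices n/2 - 1 and n/2.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([1, 6, 3, 8, 2, 4, 9]))      # 4
print(median([1, 6, 3, 8, 2, 12, 4, 9]))  # 5.0
```

For the even-sized set, the ordered list is 1, 2, 3, 4, 6, 8, 9, 12, so the two middle terms are 4 and 6 and the median is their average, 5.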

## Mode

The mode of a data set is the value that occurs most often. If two, three or more values share the highest frequency, the data is called bimodal, trimodal or multimodal.

Example: $R = \{2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2\}$. The number that appears most often is 2 (four times), so 2 is the mode of $R$.
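One way to find the mode, including the bimodal and multimodal cases, is to count occurrences and keep every value that reaches the highest count. This is a minimal sketch using `collections.Counter`; the helper name `modes` and the second data set are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())  # raises ValueError on empty input
    return sorted(v for v, c in counts.items() if c == highest)


R = [2, 8, 1, 9, 5, 2, 7, 2, 7, 9, 4, 7, 1, 5, 2]
print(modes(R))                # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2], a bimodal data set
```

Returning a list rather than a single value keeps the multimodal case explicit instead of silently picking one of the tied values.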

## Measures of Dispersion

Consider two sets, $S = \{5, 5, 5, 5, 5, 5\}$ and $R = \{0, 0, 0, 10, 10, 10\}$. Both have mean 5, yet they are clearly different data sets. Is it good practice to describe them using only the mean, median or mode?

We use another descriptive statistic to evaluate the data, called a measure of dispersion. It measures the scatter of the data about the mean.

What happens to a measure of dispersion if the values are concentrated near the mean? What if they are distributed far from the mean?

- If the values are concentrated near the mean of the data set, the measure is small.
- If they are distributed far from the mean of the data set, the measure is large.

## Variance and Standard Deviation

For a given set of $n$ numbers $x_1, x_2, x_3, \ldots, x_n$ with mean $\mu$, the variance, denoted by $\sigma^2$, is given by

$$\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n}$$

## Variance (method 2)

Variance = mean of squares minus square of mean:

$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$

where $\sum x = x_1 + x_2 + x_3 + \cdots + x_n$.
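The slides state method 2 without proof. A short derivation, sketched here with $\mu = \frac{1}{n}\sum x_i$ as the mean and $\sigma^2$ as the variance, shows why it agrees with the definition of variance as the average squared deviation from the mean:

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; 2\mu\cdot\frac{1}{n}\sum_{i=1}^{n} x_i \;+\; \frac{1}{n}\cdot n\mu^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; 2\mu^2 + \mu^2 \\
         &= \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2
\end{aligned}
```

The middle step uses $\frac{1}{n}\sum x_i = \mu$, which turns the cross term into $-2\mu^2$.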

## Variance and Standard Deviation

- The variance is a non-negative number.
- The positive square root of the variance is the standard deviation.
- The simplest measure of variability is the sample range, $x_{\max} - x_{\min}$.
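To see how these measures separate data sets that share the same mean, the sketch below compares the two sets $S$ and $R$ from the dispersion discussion. It uses `pvariance` and `pstdev` from Python's standard `statistics` module, which divide by $n$; the helper name `describe` is an assumption made for this example.

```python
import statistics

S = [5, 5, 5, 5, 5, 5]
R = [0, 0, 0, 10, 10, 10]


def describe(data):
    """Collect the mean and three measures of dispersion for a data set."""
    return {
        "mean": statistics.mean(data),
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data),  # population variance (divide by n)
        "std_dev": statistics.pstdev(data),      # positive square root of the variance
    }


for name, data in (("S", S), ("R", R)):
    print(name, describe(data))
```

Both sets have mean 5, but $S$ has range 0 and variance 0, while $R$ has range 10, variance 25 and standard deviation 5.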

## Variance and Standard Deviation

Example: Find the variance and standard deviation for the following set of test scores: $T = \{75, 80, 82, 87, 96\}$.

The mean of the set $T$ is

$$\mu = \frac{75 + 80 + 82 + 87 + 96}{5} = \frac{420}{5} = 84$$

Using the mean, we get the variance as

$$\sigma^2 = \frac{(75-84)^2 + (80-84)^2 + (82-84)^2 + (87-84)^2 + (96-84)^2}{5} = \frac{254}{5} = 50.8$$

Standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{50.8} \approx 7.1274$
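The worked example for the test scores $T$ can be checked numerically. The sketch below computes the variance both from the definition and by method 2 (mean of squares minus square of mean); the variable names are chosen for this illustration.

```python
import math

T = [75, 80, 82, 87, 96]
n = len(T)

mean = sum(T) / n

# Definition: average squared deviation from the mean.
variance_definition = sum((x - mean) ** 2 for x in T) / n

# Method 2: mean of squares minus square of mean.
variance_method2 = sum(x * x for x in T) / n - mean ** 2

std_dev = math.sqrt(variance_definition)

print(mean)                                           # 84.0
print(round(variance_definition, 4))                  # 50.8
print(math.isclose(variance_definition, variance_method2))  # True
print(round(std_dev, 4))                              # 7.1274
```

The two variances are compared with `math.isclose` rather than `==` because method 2 subtracts two large floating-point numbers and can differ from the definition in the last few digits.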

## Sample Space

- The set of all possible outcomes of a statistical experiment is called a sample space or sample.
- Each outcome is called an element, a member or a sample point.
- A group of samples is called a population.

Sample Statistics

Any quantity obtained from a sample for the purpose of estimating a population parameter is called a sample statistic

A sample, together with inferential statistics, allows us to draw conclusions about the population. Inferential statistics makes explicit use of elements of probability.

## Sample Mean

For a given sample of $n$ numbers $x_1, x_2, x_3, \ldots, x_n$, the sample mean, denoted by $\bar{X}$, is

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

## Weighted Mean

For a given set of data $X = \{x_1, x_2, \ldots, x_n\}$ and corresponding non-negative weights $W = \{w_1, w_2, \ldots, w_n\}$, the weighted mean (weighted average) is given by

$$\bar{X} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$
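A minimal Python sketch of the weighted mean follows. The function name `weighted_mean` and the example marks and weights are hypothetical, introduced only to illustrate the formula.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical example: an assignment marked 80 with weight 1
# and an exam marked 90 with weight 3.
print(weighted_mean([80, 90], [1, 3]))        # 87.5

# With equal weights the weighted mean reduces to the ordinary mean.
print(weighted_mean([1, 2, 3], [1, 1, 1]))    # 2.0
```

The checks on the weights mirror the requirement in the definition that the weights are non-negative; the extra check for a zero total avoids a division by zero.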

## Sample Variance

For a given sample of $n$ numbers $x_1, x_2, x_3, \ldots, x_n$ with sample mean $\bar{X}$, the sample variance, denoted by $S^2$, is given by

$$S^2 = \frac{(x_1 - \bar{X})^2 + (x_2 - \bar{X})^2 + \cdots + (x_n - \bar{X})^2}{n - 1}$$
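The only difference between the sample variance and the variance defined earlier is the divisor, $n - 1$ instead of $n$. The sketch below illustrates this on the test scores $T = \{75, 80, 82, 87, 96\}$, treated here as a sample purely for the sake of the comparison. The function name `sample_variance` is an assumption; `statistics.variance` and `statistics.pvariance` are the standard-library counterparts.

```python
import statistics


def sample_variance(data):
    """Sample variance: squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two data points")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


T = [75, 80, 82, 87, 96]

print(sample_variance(T))        # 63.5  (254 / 4)
print(statistics.variance(T))    # sample variance from the standard library
print(statistics.pvariance(T))   # population variance, 254 / 5
```

For the same data, the sample variance (63.5) is larger than the variance computed with divisor $n$ (50.8), because the sum of squared deviations, 254, is divided by 4 rather than 5.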

## Frequency Distributions

For large samples (or populations), it is difficult to observe the various characteristics or to compute statistics directly from the raw data. It is therefore useful to organize or group the raw data. The data is arranged in intervals of equal width.

- The intervals are called classes or categories.
- The number of individuals or elements in each class is determined; this is called the class frequency.
- The resulting arrangement is called a frequency distribution or frequency table.

## Frequency Distribution

Example: Height of students in XYZ university (frequency table)

| Height (cm) | Number of Students |
|---|---|
| 155-159 | 5 |
| 160-164 | 18 |
| 165-169 | 42 |
| 170-174 | 27 |
| 175-179 | 8 |
| Total | 100 |

In this example:

- The first category, 155-159, is called a class interval.
- The corresponding class frequency is 5.
- The midpoint of a class interval is called the class mark; for 155-159 it is 157.

## Frequency Histogram

[Figure: frequency histogram of the student height data. Horizontal axis: Height (cm), with classes 155-159, 160-164, 165-169, 170-174 and 175-179. Vertical axis: Number of Students, from 0 to 45.]

## Frequency Polygon

[Figure: frequency polygon of the student height data, with the number of students (0 to 45) plotted against the class marks of the height classes.]

## Frequency Graphs

- In a histogram, the sum of the rectangular areas is 100.
- A frequency polygon is a graph connecting the midpoints of the tops of the histogram rectangles.
- In a bar graph, the sum of the ordinates is 1.

## Relative Frequency

In a relative frequency distribution, each class frequency is expressed as a percentage of the total rather than as a count. In the histogram, the vertical axis then shows relative frequency instead of frequency.
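As an illustration, the relative frequencies and class marks for the student height table can be computed as follows. This is a minimal sketch; the dictionary layout and variable names are assumptions made for the example.

```python
# Class limits (lower, upper) in cm mapped to class frequencies,
# taken from the student height frequency table.
height_table = {
    (155, 159): 5,
    (160, 164): 18,
    (165, 169): 42,
    (170, 174): 27,
    (175, 179): 8,
}

total = sum(height_table.values())

# Class mark: midpoint of each class interval.
class_marks = [(lower + upper) / 2 for lower, upper in height_table]

# Relative frequency: class frequency as a percentage of the total.
relative_frequency = [100 * f / total for f in height_table.values()]

for (lower, upper), mark, rel in zip(height_table, class_marks, relative_frequency):
    print(f"{lower}-{upper}  class mark {mark}  relative frequency {rel}%")
```

Because this table happens to contain exactly 100 students, each percentage equals the class frequency; with any other total the two columns would differ.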

In the previous example, what happens if we have a student with a height of 159.7 cm? This value lies between the classes 155-159 and 160-164, so it belongs to neither.

## Continuous Frequency Distribution

The class intervals are chosen such that they are continuous. For the height data (in cm), the classes become:

- 154.5-159.4
- 159.5-164.4
- 164.5-169.4
- 169.5-174.4
- 174.5-179.4

A height of 159.7 cm now falls in the class 159.5-164.4.
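Assigning a measurement to one of the continuous classes can be sketched with a binary search over the class boundaries. The sketch assumes that each class is half-open, that is, it contains its lower boundary but not its upper one (so 159.5 belongs to the second class); that convention, the name `class_index` and the use of `bisect` are choices made for this example, not something stated in the slides.

```python
from bisect import bisect_right

# Lower boundaries of the five continuous classes, plus the upper end of the last one.
BOUNDARIES = [154.5, 159.5, 164.5, 169.5, 174.5, 179.5]


def class_index(height_cm):
    """Return the 0-based index of the class containing height_cm.

    Assumption: classes are half-open intervals [lower, upper).
    """
    if not BOUNDARIES[0] <= height_cm < BOUNDARIES[-1]:
        raise ValueError(f"{height_cm} cm lies outside all classes")
    # bisect_right counts how many boundaries are <= height_cm.
    return bisect_right(BOUNDARIES, height_cm) - 1


print(class_index(159.7))  # 1, the second class (159.5 up to 164.5)
print(class_index(157.0))  # 0, the first class (154.5 up to 159.5)
```

With continuous boundaries every height in the covered range belongs to exactly one class, which resolves the question about the 159.7 cm student.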

## Mean and Variance from Frequency Table

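| Interval | Midpoint ($x$) | Frequency ($f$) | $f \cdot x$ | $f \cdot x^2$ |
|---|---|---|---|---|
| $a_0 - a_1$ | $x_1$ | $f_1$ | $f_1 x_1$ | $f_1 x_1^2$ |
| $a_1 - a_2$ | $x_2$ | $f_2$ | $f_2 x_2$ | $f_2 x_2^2$ |
| ... | ... | ... | ... | ... |
| $a_{n-1} - a_n$ | $x_n$ | $f_n$ | $f_n x_n$ | $f_n x_n^2$ |
| All | | $\sum f$ | $\sum f x$ | $\sum f x^2$ |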

## Example : Mean and Variance from Frequency Table

| Class interval | Class midpoint, $x$ | Frequency, $f$ | $f \cdot x$ | $f \cdot x^2$ |
|---|---|---|---|---|
| 1.5-1.9 | 1.7 | 2 | 3.4 | 5.78 |
| 2.0-2.4 | 2.2 | 1 | 2.2 | 4.84 |
| 2.5-2.9 | 2.7 | 4 | 10.8 | 29.16 |
| 3.0-3.4 | 3.2 | 15 | 48.0 | 153.60 |
| 3.5-3.9 | 3.7 | 10 | 37.0 | 136.90 |
| 4.0-4.4 | 4.2 | 5 | 21.0 | 88.20 |
| 4.5-4.9 | 4.7 | 3 | 14.1 | 66.27 |
| Total | | 40 | 136.5 | 484.75 |
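The totals $\sum f x = 136.5$ and $\sum f x^2 = 484.75$ can be reproduced, and the mean and variance derived from them, with a short script. The extracted slides show only the totals, so the final formulas used here are an assumption: the grouped mean is taken as $\sum f x / \sum f$ and the grouped variance as $\sum f x^2 / \sum f$ minus the square of the mean, following the "mean of squares minus square of mean" method with divisor $\sum f$.

```python
# Class midpoints and frequencies from the example frequency table.
midpoints = [1.7, 2.2, 2.7, 3.2, 3.7, 4.2, 4.7]
frequencies = [2, 1, 4, 15, 10, 5, 3]

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))

# Assumed grouped-data formulas (divisor is the total frequency).
grouped_mean = sum_fx / sum_f
grouped_variance = sum_fx2 / sum_f - grouped_mean ** 2

print(sum_f)                       # 40
print(round(sum_fx, 2))            # 136.5
print(round(sum_fx2, 2))           # 484.75
print(round(grouped_mean, 4))      # 3.4125
print(round(grouped_variance, 4))  # 0.4736
```

Every observation in a class is treated as if it sat at the class midpoint, so these values approximate the mean and variance of the underlying raw data rather than reproduce them exactly.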

Summary
There are three main measures of central tendency : Mean, Mode and Median. There are two main measures of dispersion : Variance and Standard Deviation. The organization or grouping of raw data in a table is called Frequency distribution.