
Project for Business Statistics

27th September, 2011

by

Anand Srivatsa
EGMP-22, IIM-Bangalore
# 22107

Table of Contents:
BACKGROUND & OBJECTIVE
STAGES OF PROJECT
   STAGE 1
   STAGE 2
   STAGE 3
DATA SOURCE

BACKGROUND & OBJECTIVE


This is the project submission of Anand Srivatsa, EGMP Batch 22, for Business Statistics. The data considered here is the Human Sex Ratio of various countries. The human sex ratio is defined as the number of males for each female. For example, Switzerland's overall sex ratio is 0.97, which means that there are 970 males for every 1000 females. A ratio of 1 means there are equal numbers of females and males. A total of 175 countries are considered for this analysis. Additionally, for each of these 175 countries the Gross Domestic Product (Purchasing Power Parity) per capita [GDP (PPP) per capita] is also considered.
The objective of the project is to demonstrate the use of various statistical techniques on this data set.

STAGES OF PROJECT
This section of the document is divided into the three stages of the project, described as follows:
Stage 1: Summary statistics, pictorial representation and summary of the data set
Stage 2: Confidence interval and hypothesis test of the mean
Stage 3: Linear regression analysis

STAGE 1
a) Summary Statistics:
Measures of Central Tendency
Mean      1.0029143
Median    0.99
Mode      0.99

Measures of Dispersion (if the data is treated as a sample or as a population)
Measure       Sample        Population
Variance      0.01966445    0.01955208
St. Dev.      0.14022998    0.13982875
Range         1.35
IQR           0.05

Skewness and Kurtosis (if the data is treated as a sample or as a population)
Measure                Sample        Population
Skewness               6.11799094    6.06542565
(Relative) Kurtosis    44.7057136    43.4044937
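As a cross-check on the tables above, the same measures can be computed with Python's standard `statistics` module. This is a minimal sketch, assuming the spreadsheet conventions the reported figures appear to follow: sample variance and standard deviation use the n - 1 divisor, population versions use n, and "(Relative) Kurtosis" is excess kurtosis (kurtosis minus 3). Because only the ten-value data snapshot is reproduced in this report, the sketch runs on those values (the name `describe` is illustrative), so its output will differ from the 175-country results above.

```python
import statistics

# ASSUMPTION: the ten-value data snapshot from this report stands in for the
# full 175-country data set, which is only available in the embedded file.
snapshot = [0.99, 0.99, 0.98, 0.91, 0.98, 0.98, 1.00, 0.94, 1.00, 0.97]


def describe(xs):
    """Central tendency, dispersion, skewness and excess kurtosis of xs."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)        # sample standard deviation (divisor n - 1)
    sigma = statistics.pstdev(xs)   # population standard deviation (divisor n)
    z_s = [(x - mean) / s for x in xs]
    z_p = [(x - mean) / sigma for x in xs]
    # Bias-adjusted sample skewness (the formula behind spreadsheet SKEW)
    skew_sample = n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in z_s)
    # Bias-adjusted sample excess kurtosis (the formula behind spreadsheet KURT)
    kurt_sample = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z ** 4 for z in z_s)
                   - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean,
        "median": statistics.median(xs),
        "mode": statistics.mode(xs),
        "range": max(xs) - min(xs),
        "var_sample": statistics.variance(xs),
        "var_population": statistics.pvariance(xs),
        "sd_sample": s,
        "sd_population": sigma,
        "skew_sample": skew_sample,
        "skew_population": sum(z ** 3 for z in z_p) / n,       # moment skewness
        "kurt_sample": kurt_sample,
        "kurt_population": sum(z ** 4 for z in z_p) / n - 3,   # moment excess kurtosis
    }


for name, value in describe(snapshot).items():
    print(f"{name:>16}: {value:.6f}")
```

The ratio of sample to population variance is always n / (n - 1), which is why the two variance columns above differ only slightly for n = 175.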

Percentile and Percentile Rank Calculations
x     x-th Percentile    y       Percentile rank of y
50    0.99               1.0     49
80    1.02               1.02    79
90    1.04               1.04    89

Quartiles
1st Quartile    0.96
Median          0.99
3rd Quartile    1.01
IQR             0.05
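The percentile and quartile figures can be reproduced with a short sketch. It assumes the inclusive convention used by spreadsheet PERCENTILE and QUARTILE functions (linear interpolation at rank (n - 1)p) and a simplified percentile rank equal to the number of values strictly below y divided by n - 1; both are assumptions about how the template computed these figures. `percentile_inc` and `percent_rank` are hypothetical helper names, and the ten snapshot values again stand in for the full data.

```python
import math
import statistics

# ASSUMPTION: the ten-value data snapshot stands in for the full data set.
snapshot = [0.99, 0.99, 0.98, 0.91, 0.98, 0.98, 1.00, 0.94, 1.00, 0.97]


def percentile_inc(xs, p):
    """p-th percentile (0 <= p <= 1) by linear interpolation at rank (n - 1) * p."""
    s = sorted(xs)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def percent_rank(xs, y):
    """Number of observations strictly below y divided by n - 1 (y must occur in xs)."""
    if y not in xs:
        raise ValueError("this simplified sketch only ranks values that occur in the data")
    below = sum(1 for x in xs if x < y)
    return below / (len(xs) - 1)


# statistics.quantiles with method="inclusive" uses the same (n - 1) * p interpolation
q1, median, q3 = statistics.quantiles(snapshot, n=4, method="inclusive")
print("Q1, median, Q3:", q1, median, q3, "IQR:", q3 - q1)
print("90th percentile:", percentile_inc(snapshot, 0.90))
print("percentile rank of 0.98:", percent_rank(snapshot, 0.98))
```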

b) Pictorial Representation:
Histogram:
Interval      Freq.
<=0.9         9
(0.9, 1]      120
(1, 1.1]      39
(1.1, 1.2]    1
(1.2, 1.3]    2
(1.3, 1.4]    0
(1.4, 1.5]    1
>1.5          3
Total         175

Start: 0.9    Interval Width: 0.1    End: 1.5

[Figure: Histogram of frequency by interval, from <=0.9 to >1.5]
The histogram above shows the spread of the data from 0.9 to 1.5 in intervals of width 0.1, with open-ended classes below 0.9 and above 1.5. The highest frequency of the Human Sex Ratio occurs in the interval (0.9, 1]: 120 out of 175 countries.
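The frequency counts can be produced by assigning each value to a right-closed interval of width 0.1 between the start (0.9) and the end (1.5), with open-ended classes at both ends. This is a minimal sketch, assuming a value lying exactly on an edge belongs to the interval that ends there, as the (a, b] labels indicate; `interval_of` and `frequency_table` are hypothetical names, and the ten snapshot values stand in for the 175 data points.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges of the right-closed intervals used in the histogram table
EDGES = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
LABELS = ["<=0.9", "(0.9, 1]", "(1, 1.1]", "(1.1, 1.2]",
          "(1.2, 1.3]", "(1.3, 1.4]", "(1.4, 1.5]", ">1.5"]


def interval_of(x):
    """Return the histogram interval label for one sex-ratio value."""
    # bisect_left places a value equal to an edge in the interval ending at
    # that edge, which gives right-closed intervals such as (0.9, 1].
    return LABELS[bisect_left(EDGES, x)]


def frequency_table(xs):
    """List of (interval label, frequency), including empty intervals."""
    counts = Counter(interval_of(x) for x in xs)
    return [(label, counts.get(label, 0)) for label in LABELS]


snapshot = [0.99, 0.99, 0.98, 0.91, 0.98, 0.98, 1.00, 0.94, 1.00, 0.97]
for label, freq in frequency_table(snapshot):
    print(f"{label:>11} {freq}")
```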

Polygon and Ogive:
[Figures: Frequency Polygon (frequency by interval), Relative Frequency Polygon (relative frequency by interval) and Ogive (cumulative relative frequency, 0 to 1), each plotted over the intervals <=0.9 to >1.5]
The figures above show the frequency polygon, the relative frequency polygon and the ogive of the data.
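The relative frequency polygon plots each interval's frequency divided by the total, and the ogive plots the running total of those relative frequencies. A minimal sketch of that arithmetic, using the frequencies from the histogram table, so the ogive necessarily ends at 1.0:

```python
from itertools import accumulate

# Interval frequencies copied from the Stage 1 histogram table (n = 175)
LABELS = ["<=0.9", "(0.9, 1]", "(1, 1.1]", "(1.1, 1.2]",
          "(1.2, 1.3]", "(1.3, 1.4]", "(1.4, 1.5]", ">1.5"]
FREQ = [9, 120, 39, 1, 2, 0, 1, 3]

total = sum(FREQ)
relative = [f / total for f in FREQ]        # heights of the relative frequency polygon
cumulative = list(accumulate(relative))     # heights of the ogive

for label, f, rf, cf in zip(LABELS, FREQ, relative, cumulative):
    print(f"{label:>11} {f:4d} {rf:7.4f} {cf:7.4f}")
```

For example, the ogive reaches 168/175 = 0.96 by the end of the (1, 1.1] interval.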

Data: This is a snapshot of the data. The detailed data and all associated analysis are in the embedded file.
#     Data
1     0.99
2     0.99
3     0.98
4     0.91
5     0.98
6     0.98
7     1
8     0.94
9     1
10    0.97

c) Inferences
- Most of the data set lies between 0.9 and 1: 120 of the 175 countries fall in the interval (0.9, 1].
- While the data set gives the overall Human Sex Ratio of each country, it would be interesting to examine the ratio across age groups (age slabs).
- If data by age group were available, the mortality rates of women vis-à-vis men could be ascertained. A pattern could then be derived to predict whether women or men live longer.
- Female infanticide could also be inferred if the Human Sex Ratio at birth were available alongside the ratio at 10 years of age. Female mortality could then be analysed with respect to the type of economy of the country; countries with high GDP (PPP) would typically be expected to have a better girl-child ratio than developing countries.
- The Human Sex Ratio is fairly high for Middle Eastern countries such as Qatar, the UAE and Kuwait. It would be interesting to analyse the religious or regional influences on the data collected for the Human Sex Ratio.

STAGE 2
Note:
1. This section computes confidence intervals for the mean, both for the case where the population standard deviation is known and for the case where it is unknown.
2. The sample data (n = 30) was drawn by random sampling from the 175 data points.
Confidence Interval of Mean

Population standard deviation known
Population Stdev. (σ)    0.139829
Sample Size (n)          30
Sample Mean (x-bar)      0.968333

(1 - α)    x-bar       +/-        Lower bound   Upper bound
99%        0.968333    0.06576    0.902575      1.034092
95%        0.968333    0.05004    0.918297      1.018369
90%        0.968333    0.04199    0.926342      1.010325
80%        0.968333    0.03272    0.935616      1.00105
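The known-σ intervals follow x-bar ± z(α/2) × σ/√n, with z taken from the standard normal distribution. A short sketch using Python's `statistics.NormalDist` should reproduce the margins in the table to the precision shown; the function name `z_interval` is illustrative.

```python
from statistics import NormalDist

# Values from the table above: a random sample of n = 30 of the 175 countries
x_bar = 0.968333
sigma = 0.139829   # population standard deviation of all 175 ratios
n = 30


def z_interval(x_bar, sigma, n, conf):
    """Two-sided (1 - alpha) confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # e.g. about 1.96 for 95%
    margin = z * sigma / n ** 0.5
    return margin, x_bar - margin, x_bar + margin


for conf in (0.99, 0.95, 0.90, 0.80):
    margin, lower, upper = z_interval(x_bar, sigma, n, conf)
    print(f"{conf:.0%}: {x_bar} +/- {margin:.5f} -> ({lower:.6f}, {upper:.6f})")
```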

Population standard deviation unknown
Population Normal?       Yes
Sample Size (n)          30
Sample Mean (x-bar)      0.968333
Sample Stdev. (s)        0.061705

(1 - α)    x-bar       +/-        Lower bound   Upper bound
99%        0.968333    0.03105    0.937281      0.999386
95%        0.968333    0.02304    0.945292      0.991374
90%        0.968333    0.01914    0.949192      0.987475
80%        0.968333    0.01477    0.953559      0.983108
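When σ is unknown and the population is assumed normal, the interval is x-bar ± t(α/2, n - 1) × s/√n. Python's standard library has no t distribution, so this sketch assumes a small hand-rolled one: the density is integrated with Simpson's rule and the quantile is found by bisection. In practice a statistics library (for example SciPy's `scipy.stats.t`) would normally be used instead; all helper names here are illustrative.

```python
import math

# Values from the table above (sigma unknown, population assumed normal)
x_bar = 0.968333
s = 0.061705
n = 30


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=1000):
    """P(T <= t): integrate the density over [0, |t|] with Simpson's rule (steps is even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile for 0.5 <= p < 1 by bisection (assumes the answer is below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(x_bar, s, n, conf):
    """Two-sided (1 - alpha) confidence interval for the mean when sigma is unknown."""
    t = t_ppf(1 - (1 - conf) / 2, n - 1)
    margin = t * s / math.sqrt(n)
    return margin, x_bar - margin, x_bar + margin


for conf in (0.99, 0.95, 0.90, 0.80):
    margin, lower, upper = t_interval(x_bar, s, n, conf)
    print(f"{conf:.0%}: {x_bar} +/- {margin:.5f} -> ({lower:.6f}, {upper:.6f})")
```

The t intervals are narrower than the known-σ ones here only because this sample's s (0.061705) is much smaller than the population σ (0.139829); for equal spread, t intervals are wider.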

Hypothesis Testing
Here, the null hypothesis is that the population mean (μ) of the Human Sex Ratio is 0.98. The test below is carried out at a 5% significance level (α = 0.05): a null hypothesis is rejected only if its p-value is below 0.05. Not rejecting H0 means the sample gives no significant evidence against μ = 0.98; it does not prove that μ equals 0.98.

Evidence
Sample size (n)          30
Sample Mean (x-bar)      0.96833
Sample Stdev. (s)        0.0617
Population σ unknown; population assumed Normal

Test Statistic (t)       -1.0356

Null Hypothesis       p-value
H0: μ = 0.98          0.3090
H0: μ ≥ 0.98          0.1545
H0: μ ≤ 0.98          0.8455

At an α of 5%, all three p-values exceed 0.05, so none of the null hypotheses is rejected.
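The test statistic is t = (x-bar - μ0) / (s/√n) with n - 1 = 29 degrees of freedom; the two-sided p-value doubles the tail area beyond |t|, and each one-sided p-value takes a single tail. This sketch assumes a hand-rolled t distribution (Simpson's-rule integration of the density), because Python's standard library has none; the helper names are illustrative.

```python
import math

# Evidence from the table above
x_bar, s, n = 0.968333, 0.061705, 30
mu0 = 0.98      # hypothesised population mean
alpha = 0.05


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=1000):
    """P(T <= t): integrate the density over [0, |t|] with Simpson's rule (steps is even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


df = n - 1
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_two_sided = 2 * (1 - t_cdf(abs(t_stat), df))   # H0: mu = 0.98
p_lower = t_cdf(t_stat, df)                        # H0: mu >= 0.98 (small t is evidence against it)
p_upper = 1 - t_cdf(t_stat, df)                    # H0: mu <= 0.98 (large t is evidence against it)

for name, p in (("mu = 0.98", p_two_sided), ("mu >= 0.98", p_lower), ("mu <= 0.98", p_upper)):
    decision = "reject" if p < alpha else "do not reject"
    print(f"H0: {name:<10} p = {p:.4f} -> {decision} at alpha = {alpha}")
```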


Sample Data: This is a snapshot of the sample data. The detailed data and all associated analysis are in the embedded file.
Sample data (snapshot): 0.95, 0.89, 1.01, 0.9, 0.95, 0.91, 1.02, 0.86, 0.96, 1

STAGE 3
Regression Analysis:
Independent variable (X): GDP (PPP) per capita. Dependent variable (Y): Human Sex Ratio.

Scatter Plot, Regression Line and Regression Equation
[Figure: scatter plot of the Human Sex Ratio against X with the fitted regression line, y = 7E-06x + 0.9723]

Normal Probability Plot of Residuals
[Figure: normal probability plot of the regression residuals]

Confidence Interval for Slope
(1 - α) C.I. for β1 at 95%:          2.9E-06 +/- 1.3E-06

Confidence Interval for Intercept
(1 - α) C.I. for β0 at 95%:          0.96174 +/- 0.02696

Coefficient of Determination (r²)    0.1031
Coefficient of Correlation (r)       0.3211
Standard Error of Slope, s(b1)       6.6E-07
t statistic for slope                4.45912
p-value                              0.0000
Standard Error of Intercept, s(b0)   0.01366
Standard Error of prediction (s)     0.13319

Prediction Interval for Y
(1 - α) C.I. for Y given X = 4000 at 95%:   0.97352 +/- 0.26396
Description: When GDP (PPP) per capita (X) is 4000, we can say with 95% confidence that the Human Sex Ratio of that country would lie in the range 0.97352 +/- 0.26396, i.e. from 0.70956 to 1.23748. The standard error of estimate is 0.13319.
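The regression output above (slope, intercept, r², standard errors and the prediction interval for Y at a given X) follows the standard least-squares formulas. This is a minimal sketch, assuming hypothetical toy (x, y) values, because the 175 paired observations are only in the embedded file. The t quantile comes from a small hand-rolled routine (Simpson's-rule integration plus bisection), an assumption made because Python's standard library has no t distribution; `simple_regression` and the other helpers are illustrative names.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=1000):
    """P(T <= t) by Simpson's rule over [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_ppf(p, df):
    """Quantile for 0.5 <= p < 1 by bisection (assumes the answer is below 50)."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simple_regression(xs, ys, x0, conf=0.95):
    """Least-squares fit y = b0 + b1*x plus the summary measures reported above."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                      # standard error of estimate
    s_b1 = s / math.sqrt(sxx)                         # standard error of slope
    s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)    # standard error of intercept
    t_crit = t_ppf(1 - (1 - conf) / 2, n - 2)
    return {
        "b1": b1, "b1_margin": t_crit * s_b1,
        "b0": b0, "b0_margin": t_crit * s_b0,
        "r2": 1 - sse / syy,
        "t_slope": b1 / s_b1,
        "s": s,
        "y_hat": b0 + b1 * x0,
        # prediction interval half-width for a single new Y at X = x0
        "pi_margin": t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx),
    }


# Hypothetical toy data, for illustration only (not from the report)
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
fit = simple_regression(xs, ys, x0=6)
print(f"y = {fit['b1']:.4f}x + {fit['b0']:.4f}, r2 = {fit['r2']:.4f}")
print(f"95% PI at x = 6: {fit['y_hat']:.3f} +/- {fit['pi_margin']:.3f}")
```

With the report's data, xs would hold GDP (PPP) per capita, ys the Human Sex Ratio and x0 = 4000, for which the template reports 0.97352 +/- 0.26396. The prediction interval is much wider than the intercept interval because it covers a single new country, not the mean response.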

DATA SOURCE

http://en.wikipedia.org/wiki/List_of_countries_by_sex_ratio
http://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28PPP%29
Embedded Data: Data and Analysis
