by
Anand Srivatsa
EGMP-22, IIM-Bangalore
# 22107
Table of Contents:
BACKGROUND & OBJECTIVE
STAGES OF PROJECT
  STAGE 1
  STAGE 2
  STAGE 3
DATA SOURCE
STAGES OF PROJECT
This section is divided into the three stages of the project:
Stage 1: Summary statistics, pictorial representation and summary of the data set
Stage 2: Confidence intervals and hypothesis test for the mean
Stage 3: Linear regression analysis
STAGE 1
a) Summary Statistics:

Measures of central tendency
Mean      1.0029143
Median    0.99
Mode      0.99

Measures of dispersion (if the data is of a sample / population)
                        Sample        Population
Variance                0.01966445    0.01955208
St. Dev.                0.14022998    0.13982875
Skewness                6.11799094    6.06542565
(Relative) kurtosis     44.7057136    43.4044937
Range                   1.35
IQR                     0.05

Percentiles
x     x-th percentile    y       Percentile rank of y
50    0.99               1.0     49
80    1.02               1.02    79
90    1.04               1.04    89

Quartiles
1st quartile    0.96
Median          0.99
3rd quartile    1.01
IQR             0.05
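Python's standard `statistics` module can reproduce the statistics in this table. This is a sketch, not the author's spreadsheet. It assumes that the sample columns use the bias-adjusted skewness and kurtosis formulas (the style of Excel's `SKEW`/`KURT`) and that the population columns divide by n. That assumption was checked only against the ratios between the reported sample and population values. Quartiles use the inclusive method, and the helper names are hypothetical. The example runs on the ten-value snapshot from the data table, not on all 175 points, so its output will not match the figures above.

```python
import statistics


def skewness(data, sample=True):
    """Skewness; sample=True applies the small-sample adjustment (assumed Excel SKEW style)."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    total = sum(((x - mean) / sd) ** 3 for x in data)
    if sample:
        return n / ((n - 1) * (n - 2)) * total
    return total / n


def excess_kurtosis(data, sample=True):
    """Kurtosis relative to the normal distribution (a normal distribution gives 0)."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    total = sum(((x - mean) / sd) ** 4 for x in data)
    if sample:
        return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return total / n - 3


def summarize(data):
    """Hypothetical helper collecting the measures shown in the table."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance_sample": statistics.variance(data),
        "variance_population": statistics.pvariance(data),
        "stdev_sample": statistics.stdev(data),
        "stdev_population": statistics.pstdev(data),
        "skew_sample": skewness(data, sample=True),
        "skew_population": skewness(data, sample=False),
        "kurt_sample": excess_kurtosis(data, sample=True),
        "kurt_population": excess_kurtosis(data, sample=False),
        "range": max(data) - min(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Snapshot of the first ten observations only, not the full data set.
snapshot = [0.99, 0.99, 0.98, 0.91, 0.98, 0.98, 1, 0.94, 1, 0.97]
for name, value in summarize(snapshot).items():
    print(f"{name:>20}: {value:.6f}")
```

Applying `summarize` to the full 175-value list from the embedded file would give the figures in the table, provided the assumed conventions are correct.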
b) Pictorial Representation:
Histogram:
Interval      Freq.
<=0.9         9
(0.9, 1]      120
(1, 1.1]      39
(1.1, 1.2]    1
(1.2, 1.3]    2
(1.3, 1.4]    0
(1.4, 1.5]    1
>1.5          3
Total         175

Start: 0.9    End: 1.5    Interval width: 0.1
[Figure: Histogram of frequency by interval, <=0.9 to >1.5 (frequency axis 0 to 140)]
The histogram above shows the spread of the data from 0.9 to 1.5 in intervals of width 0.1, with open-ended classes below 0.9 and above 1.5. The Human Sex Ratio occurs most often in the interval (0.9, 1], which holds 120 of the 175 observations.
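The frequency table uses right-closed classes of width 0.1 between 0.9 and 1.5, plus an open-ended class at each end. Below is a minimal Python sketch of that binning. The class edges come from the table, and `bin_counts` is a hypothetical helper. It assumes that a value lying exactly on a boundary is counted as the interval notation implies; the original spreadsheet's rule is not stated.

```python
from bisect import bisect_left

# Upper edges of the right-closed classes <=0.9, (0.9, 1], ..., (1.4, 1.5];
# anything above the last edge falls into the open-ended ">1.5" class.
EDGES = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
LABELS = ["<=0.9", "(0.9, 1]", "(1, 1.1]", "(1.1, 1.2]",
          "(1.2, 1.3]", "(1.3, 1.4]", "(1.4, 1.5]", ">1.5"]


def bin_counts(data):
    """Count how many values fall into each class (hypothetical helper)."""
    counts = [0] * len(LABELS)
    for x in data:
        # bisect_left returns the index of the first edge >= x, so a value equal
        # to an edge is counted in the class that edge closes (right-closed).
        counts[bisect_left(EDGES, x)] += 1
    return dict(zip(LABELS, counts))


snapshot = [0.99, 0.99, 0.98, 0.91, 0.98, 0.98, 1, 0.94, 1, 0.97]
print(bin_counts(snapshot))
```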
[Figure: Frequency polygon by interval, <=0.9 to >1.5 (left axis: frequency, 0 to 140; right axis: relative frequency, 0 to 0.8)]
[Figure: Ogive by interval, <=0.9 to >1.5 (cumulative relative frequency, 0 to 1)]
The two charts above show the frequency polygon and the ogive (cumulative relative frequency) for the same intervals.
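The polygon plots the frequency of each class. The ogive plots the running total of those frequencies divided by the number of observations. Starting from the published frequencies in the histogram table, the short sketch below computes both series. The variable names are illustrative.

```python
from itertools import accumulate

labels = ["<=0.9", "(0.9, 1]", "(1, 1.1]", "(1.1, 1.2]",
          "(1.2, 1.3]", "(1.3, 1.4]", "(1.4, 1.5]", ">1.5"]
freq = [9, 120, 39, 1, 2, 0, 1, 3]   # frequencies from the histogram table

total = sum(freq)                                     # number of observations
relative = [f / total for f in freq]                  # relative frequency per class
cumulative = [c / total for c in accumulate(freq)]    # ogive values, ending at 1.0

for label, rel, cum in zip(labels, relative, cumulative):
    print(f"{label:>11}  relative={rel:.4f}  cumulative={cum:.4f}")
```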
Data: The table below is a snapshot of the data (the first ten observations). The full data set and all associated analysis are in the embedded file.
No.   Human Sex Ratio
1     0.99
2     0.99
3     0.98
4     0.91
5     0.98
6     0.98
7     1
8     0.94
9     1
10    0.97
c) Inferences
- Most of the data set (120 of 175 observations) lies in the interval (0.9, 1].
- The data set gives the overall Human Sex Ratio of each country. It would be interesting to look at the same ratio across age groups.
- If data by age group were available, the mortality rates of women vis-à-vis men could be compared. From the resulting pattern we should be able to predict whether women or men live longer.
- Female infanticide could also be inferred if the Human Sex Ratio were available at birth vis-à-vis at 10 years of age. Female mortality could then be analysed by the type of economy. Countries with high GDP (PPP) would typically have a better girl-child ratio than developing countries.
- The Human Sex Ratio is fairly high for Middle Eastern countries such as Qatar, the UAE and Kuwait. It would be interesting to analyse how religious or regional factors influence the collection of Human Sex Ratio data.
STAGE 2
Note:
1. This section computes the confidence interval for the mean in two cases: population standard deviation known and population standard deviation unknown.
2. The sample (n = 30) was drawn by random sampling from the 175 data points.
Confidence Interval of Mean: Population Standard Deviation Known
Population stdev. (σ)    0.139829
Sample size (n)          30
Sample mean (x-bar)      0.968333

(1 - α)    x-bar       + or -     Lower       Upper
99%        0.968333    0.06576    0.902575    1.034092
95%        0.968333    0.05004    0.918297    1.018369
90%        0.968333    0.04199    0.926342    1.010325
80%        0.968333    0.03272    0.935616    1.00105
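When σ is known, each interval is x-bar +/- z(α/2)·σ/√n. The short sketch below uses `statistics.NormalDist` from the Python standard library to reproduce the half-widths in the table from σ = 0.139829, n = 30 and x-bar = 0.968333. `z_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width


for conf in (0.99, 0.95, 0.90, 0.80):
    low, high, hw = z_interval(0.968333, 0.139829, 30, conf)
    print(f"{conf:.0%}: 0.968333 +/- {hw:.5f} -> ({low:.6f}, {high:.6f})")
```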
Confidence Interval of Mean: Population Standard Deviation Unknown
Population normal?       Yes
Sample size (n)          30
Sample mean (x-bar)      0.968333
Sample stdev. (s)        0.061705

(1 - α)    x-bar       + or -     Lower       Upper
99%        0.968333    0.03105    0.937281    0.999386
95%        0.968333    0.02304    0.945292    0.991374
90%        0.968333    0.01914    0.949192    0.987475
80%        0.968333    0.01477    0.953559    0.983108
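When σ is unknown and the population is assumed normal, the interval becomes x-bar +/- t(α/2, n-1)·s/√n. The Python standard library has no Student t quantile function. This sketch therefore hard-codes two-sided critical values for 29 degrees of freedom, taken from a standard t table and rounded to three decimals. As a result, the last printed digit can differ slightly from the table above.

```python
import math

# Two-sided critical values t(alpha/2, 29) from a standard t table (df = n - 1 = 29).
T_CRIT_DF29 = {0.99: 2.756, 0.95: 2.045, 0.90: 1.699, 0.80: 1.311}


def t_interval(mean, s, n, t_crit):
    """Two-sided confidence interval for the mean when sigma is estimated by s."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width, half_width


for conf, t_crit in T_CRIT_DF29.items():
    low, high, hw = t_interval(0.968333, 0.061705, 30, t_crit)
    print(f"{conf:.0%}: 0.968333 +/- {hw:.5f} -> ({low:.6f}, {high:.6f})")
```

If SciPy is installed, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the critical value directly and replaces the table lookup.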
Hypothesis Testing
Here, the null hypothesis is that the population mean μ is 0.98. The calculation below tests this at a significance level α of 5%. A p-value below 0.05 would lead us to reject the null hypothesis. A larger p-value means the sample does not give significant evidence that the population mean differs from 0.98.
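Evidence
Sample size (n)        30
Sample mean (x-bar)    0.96833
Sample stdev. (s)      0.0617
Test statistic t       -1.0356

At an α of 5%:
Null hypothesis     p-value
H0: μ = 0.98        0.3090
H0: μ ≥ 0.98        0.1545
H0: μ ≤ 0.98        0.8455

All three p-values exceed 0.05, so none of the null hypotheses is rejected at the 5% level.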
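The test statistic is t = (x-bar - 0.98)/(s/√n), with n - 1 = 29 degrees of freedom. To get the p-values without external libraries, this sketch evaluates the Student t distribution with the closed-form finite series for integer degrees of freedom given in Abramowitz and Stegun, section 26.7. The helper names are hypothetical. In practice, SciPy's `scipy.stats.t` would be the usual choice.

```python
import math


def t_abs_cdf(t, df):
    """P(|T| <= |t|) for Student's t with a positive integer df (finite series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    cos2 = math.cos(theta) ** 2
    total = 0.0
    if df % 2 == 1:
        # odd df: (2/pi) * (theta + sin(theta) * [cos + (2/3)cos^3 + ...])
        term = math.cos(theta)
        for k in range(1, (df - 1) // 2 + 1):
            total += term
            term *= (2 * k) / (2 * k + 1) * cos2
        return 2 / math.pi * (theta + math.sin(theta) * total)
    # even df: sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
    term = 1.0
    for k in range(df // 2):
        total += term
        term *= (2 * k + 1) / (2 * k + 2) * cos2
    return math.sin(theta) * total


def t_test_mean(mean, s, n, mu0):
    """One-sample t test of the population mean against mu0 (hypothetical helper)."""
    t = (mean - mu0) / (s / math.sqrt(n))
    p_two_sided = 1 - t_abs_cdf(t, n - 1)
    p_left = p_two_sided / 2 if t < 0 else 1 - p_two_sided / 2   # P(T <= t)
    return t, {
        "H0: mu = mu0": p_two_sided,
        "H0: mu >= mu0": p_left,       # rejected only if the sample mean is well below mu0
        "H0: mu <= mu0": 1 - p_left,
    }


t_stat, p_values = t_test_mean(0.968333, 0.061705, 30, 0.98)
print(f"t = {t_stat:.4f}")
for hypothesis, p in p_values.items():
    print(f"{hypothesis}: p = {p:.4f}")
```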
STAGE 3
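Regression Analysis:
[Figure: Scatter plot of Human Sex Ratio (y) against GDP (x), with fitted trend line y = 7E-06x + 0.9723]

Slope b1, 95% confidence interval           2.9E-06 + or - 1.3E-06
s(b1)                                       6.6E-07
Intercept b0, 95% confidence interval       b0 + or - 0.02696
s(b0)                                       0.01366
Coefficient of determination (r2)           0.1031
Coefficient of correlation (r)              0.3211
t statistic for the slope                   4.45912 (p-value 0.0000)
Standard error of estimate (s)              0.13319
95% prediction interval at X = 4000         + or - 0.26396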
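A few lines of arithmetic show that the reported regression statistics are consistent with each other. r2 is the square of r, and the slope t statistic equals r·√(n-2)/√(1-r²). Each confidence half-width is a t critical value times the corresponding standard error. This sketch assumes n = 175 (every country in the data set) and a critical value of about 1.974 for 173 degrees of freedom. The output above states neither value.

```python
import math

r = 0.3211          # coefficient of correlation (from the table)
n = 175             # assumption: every country in the data set was used
t_crit = 1.974      # assumption: approximate t(0.025, 173) from a t table

r_squared = r ** 2
t_slope = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
half_b0 = t_crit * 0.01366      # s(b0) from the table
half_b1 = t_crit * 6.6e-07      # s(b1) from the table

print(f"r^2 = {r_squared:.4f}")
print(f"t for slope = {t_slope:.3f}")
print(f"95% half-width for b0 = {half_b0:.5f}")
print(f"95% half-width for b1 = {half_b1:.1e}")
```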
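Description: When the GDP is 4000 (X), we can say with 95% confidence that the Human Sex Ratio of a country at that GDP lies in the range 0.97352 +/- 0.26396. The standard error of estimate is 0.13319. Because r2 = 0.1031, GDP accounts for only about 10% of the variation in the Human Sex Ratio, so the prediction interval is wide.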
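The prediction interval in the description has half-width t·s·√(1 + 1/n + (X0 - x-bar)²/Sxx), where Sxx is the sum of squared deviations of X from their mean. The underlying GDP data exists only in the embedded file, so this least-squares sketch runs on a hypothetical four-point data set. The function names and the toy data are illustrative, and t(0.025, 2) = 4.303 is a table value.

```python
import math
from statistics import fmean


def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = b0 + b1*x with the usual standard errors."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                 # standard error of estimate
    return {
        "b0": b0, "b1": b1, "r2": 1 - sse / syy, "s": s,
        "s_b1": s / math.sqrt(sxx),
        "s_b0": s * math.sqrt(1 / n + xbar ** 2 / sxx),
        "n": n, "xbar": xbar, "sxx": sxx,
    }


def prediction_interval(fit, x0, t_crit):
    """Point prediction at x0 and the half-width of the prediction interval for one new Y."""
    y_hat = fit["b0"] + fit["b1"] * x0
    half = t_crit * fit["s"] * math.sqrt(1 + 1 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"])
    return y_hat, half


# Hypothetical toy data, not the GDP / Human Sex Ratio data set.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
fit = fit_simple_regression(x, y)
y_hat, half = prediction_interval(fit, x0=2.5, t_crit=4.303)   # t(0.025, 2) from a t table
print(f"y = {fit['b1']:.4f}x + {fit['b0']:.4f}, r^2 = {fit['r2']:.4f}")
print(f"95% prediction interval at x0 = 2.5: {y_hat:.4f} +/- {half:.4f}")
```

With the 175-country data, x0 = 4000 and the appropriate t value, the same function should return the 0.97352 +/- 0.26396 interval quoted above, assuming the template used the standard formula.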
DATA SOURCE
http://en.wikipedia.org/wiki/List_of_countries_by_sex_ratio
http://en.wikipedia.org/wiki/List_of_countries_by_GDP_%28PPP%29