
Do LASA Students Tend to Overestimate Their Height? Does this
Change When Looked at by Gender or Grade Level?

Neel Rao
Dual Credit Statistics
Period 0B
5/18/16
2015-2016 School Year

Discrepancies between self-reported height and actual height, especially
among men, have been reported in many studies, and can lead to flawed
medical predictions. There is not adequate data to say that this holds
true among LASA students, however. The difference between the mean
stated height and mean actual height was not significant.

Table of Contents
Introduction and hypothesis
Population and methods
Holistic
  Dependent t-test
  Standard deviation
  Residuals and R^2
By Gender
  Independent t-test
By Grade Level
  ANOVA test
Conclusion, sources of error, and possible causes
Works Cited
Appendices

While the question of whether people overestimate their height seems like merely
a curiosity, the answer in fact has significance in the medical field. It is
the subject of numerous studies and books, and has a far-reaching impact on biological
research. Numerous researchers have found that self-reported height data are
often inaccurate (Niedhammer). The discrepancies are more pronounced in men
than in women (Grodner), in white women than in women of other races
(Needhima), and in the elderly (Weinstein). In addition, education level in men has a
positive correlation with overstatement of height, whereas education level in women has
a negative correlation with overstatement of height (Murray). This inaccuracy in stated
height becomes problematic when trying to correlate other variables with height using
reported heights (Hitti). I wanted to see whether these findings would also manifest in the
LASA student population. Going in, I believed that an overestimation of height would be
visible among LASA students. It seems like human nature to want to be taller, and it
would appear to the casual observer to be a harmless white lie to say one is taller than one
really is.
My population for this analysis is the student body of LASA High School, 1011
students between the ages of 14 and 18. The sampling frame was all of the
students who came to school on either May 10 or May 11 and whose names are listed in
the student roster. My sample contained 16 females (3 freshmen, 5 sophomores, 4 juniors,
and 4 seniors) and 14 males (5 freshmen, 2 sophomores, 5 juniors, and 2 seniors). The
sample was a simple random sample: I acquired a numbered roster of all of the students at
LASA, randomly generated numbers, and used the students whose roster numbers matched
the generated numbers. This sample depended on the accuracy and recency of the roster. If
any students came to the school after the roster was created, they would not be in my
sampling frame, so my sample may not be representative of the entire LASA student body.
Once I had my subjects, I tracked them down and asked them how tall they were. I made
sure that they did not know I was going to measure them afterwards, and I made sure that
each subject was not visible to any other subject while I was collecting data, to ensure
independence. After hearing their answer, I used a tape measure to measure them.
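
The roster-based selection described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the report does not say which
random number generator was used, and the function name draw_sample and the seed value
are hypothetical. The roster size (1011) and sample size (30) come from the report.

```python
import random

def draw_sample(roster_size, sample_size, seed=None):
    """Pick distinct roster numbers (1..roster_size) without replacement.

    Every subset of the given size is equally likely, which is what makes
    this a simple random sample. The seed is only there for repeatability.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, roster_size + 1), sample_size))

# Hypothetical seed; 1011 students on the roster, 30 subjects drawn.
chosen = draw_sample(1011, 30, seed=2016)
print(chosen)
```

Sampling without replacement guarantees that no student is selected twice, so the 30
numbers map to 30 different people on the roster.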
The first test I ran on the data was a one-tailed dependent (paired) t-test on the whole
sample, with 29 degrees of freedom, to find out whether the LASA student body
overstates its height. This analysis tests whether the average difference between stated
and actual height is significantly greater than 0. The assumptions for this test are
normality, pairedness, and randomness, and they are all met. Normality is reasonably met
because the sample size is 30, which is large enough for the sampling distribution of the
mean difference to be approximately normal; pairedness is met because each stated height
corresponds to the measured height of the same person; and randomness is met because
the data were collected from a random sample, as discussed previously. The average
difference was .133 in., giving a p-value of .442, well above the alpha value of .05. I
therefore do not have sufficient evidence to say that LASA students as a whole overstate
their height.
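
As a rough check on the arithmetic of a paired t-test, the statistic can be rebuilt from
summary numbers alone. The sketch below assumes the mean difference of .133 in. and the
standard deviation of the differences of .937 that the report gives, with n = 30. The
helper names are hypothetical, and the p-value is approximated by numerically
integrating the t density rather than by whatever software the report used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def paired_t_from_summary(mean_diff, sd_diff, n):
    """t statistic and degrees of freedom for H0: mean difference = 0."""
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Summary values taken from the report; everything else is illustrative.
t_stat, df = paired_t_from_summary(0.133, 0.937, 30)
p_one_sided = t_upper_tail(t_stat, df)
p_two_sided = 2 * t_upper_tail(abs(t_stat), df)
print(f"t = {t_stat:.3f}, df = {df}")
print(f"one-sided p = {p_one_sided:.3f}, two-sided p = {p_two_sided:.3f}")
```

With these inputs the two-sided p-value lands close to the reported .442 and the
one-sided value is about half of that. Both are well above .05, so the conclusion is the
same either way.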
To get a rough idea of how far LASA students' stated heights were from their actual
heights, I can look at the standard deviation of the differences. Unlike the previous
test, this statistic does not let overestimates and underestimates cancel out. For
example, a student who is two inches shorter than their stated height will not cancel out
one who is two inches taller than their stated height. It therefore reflects how accurate
students were in stating their height in either direction, rather than being one-tailed.
Strictly, the standard deviation measures how far each difference is from the mean
difference, not from zero, but because the mean itself is close to zero I am using it to
get a general idea. The standard deviation is .937 in., which is 31.2% of the range. This
means that the differences are fairly spread out in relation to their total spread, so
the actual heights are fairly different from the stated heights. However, this is only
relative: the range itself is only three inches and I collected my data in whole inches,
so I cannot say anything conclusively. A confidence interval for the standard deviation
of the population is (0.74623, 1.25962), or (24.9%, 42.0%) of the range. This interval
cannot serve as a significance test against zero, because it can never include zero: the
only way for the standard deviation to be zero is if all of the differences were exactly
the same, and I know they are not. It does, however, give a feel for how large the
standard deviation is. To test the significance of the size of the variance (also a
spread statistic, namely the square of the standard deviation) I would usually run a
chi-square test on variance, but with a hypothesized value of zero that test is not
meaningful, because I already know that the variance cannot be zero, since the
differences are not all the same.
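
The confidence interval for the standard deviation can be reproduced with the usual
chi-square formula, sqrt((n-1)s^2 / chi-square critical value). The sketch below assumes
a 95% confidence level, which the report does not state but which is consistent with
the interval it gives, and it hard-codes the two chi-square critical values for 29
degrees of freedom from a standard table. The function name is hypothetical.

```python
import math

def sd_confidence_interval(s, n, chi2_low, chi2_high):
    """Confidence interval for a population standard deviation.

    chi2_low and chi2_high are the lower and upper chi-square critical
    values for n - 1 degrees of freedom at the chosen confidence level.
    """
    df = n - 1
    lower = math.sqrt(df * s ** 2 / chi2_high)
    upper = math.sqrt(df * s ** 2 / chi2_low)
    return lower, upper

# Table values for 29 df (2.5th and 97.5th percentiles); 95% level is assumed.
CHI2_LOW, CHI2_HIGH = 16.047, 45.722

low, high = sd_confidence_interval(0.937, 30, CHI2_LOW, CHI2_HIGH)
print(f"({low:.4f}, {high:.4f})")
```

Because both endpoints are square roots of positive quantities, the interval is always
strictly above zero, which is why it cannot be used to test a hypothesized standard
deviation of zero.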
Another way of thinking about how far the students' stated heights are from
their actual heights is to impose the line y=x on a graph of actual heights vs. stated
heights and look at the residuals from that line. (I can also do this color coded by
gender or grade level to see how that affects the data.) These graphs can be found in the
appendices. The residuals are quite small; only two are more than 1 inch away from zero.
The R^2 value of this line is a measure of how large the residuals are: the closer it is
to 1, the smaller the residuals. The R^2 value calculated from the residuals is
.991, meaning that stated and actual heights are very close to each other. The
residual plot shows no pattern, meaning that the difference between stated and actual
height does not change between taller and shorter people.
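
To make the R^2 calculation concrete, here is a minimal sketch of residuals measured
from the fixed line y=x rather than from a fitted regression line. The six height pairs
are invented for illustration and are not the study's data. Treating actual height as x
and stated height as y is also an assumption, since the report does not say which
variable is on which axis.

```python
def residuals_from_identity(actual, stated):
    """Residual of each point from the line y = x (stated minus actual)."""
    return [s - a for a, s in zip(actual, stated)]

def r_squared_identity(actual, stated):
    """R^2 of the line y = x: 1 - SS_residual / SS_total."""
    res = residuals_from_identity(actual, stated)
    ss_res = sum(r ** 2 for r in res)
    mean_stated = sum(stated) / len(stated)
    ss_tot = sum((s - mean_stated) ** 2 for s in stated)
    return 1 - ss_res / ss_tot

# Hypothetical heights in inches, NOT the data collected for this report.
actual = [62, 64, 66, 67, 69, 72]
stated = [62, 65, 66, 67, 70, 71]

residuals = residuals_from_identity(actual, stated)
r2 = r_squared_identity(actual, stated)
print(residuals, round(r2, 4))
```

A positive residual here means the student stated a height above their measured height,
so a residual plot with no trend across actual height suggests that tall and short
students misreport by similar amounts.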
After analyzing the sample as a whole, I decided to see whether these stated
vs. actual height differences changed based on gender. To do this I ran an independent
two-sample t-test to see whether the mean differences were significantly different from
each other. They were not. The assumptions for this test are normality, randomness, and
independence both within and between groups. Randomness is met because it was a
random sample; independence within groups is met because of the data collection methods
described earlier; and independence between groups is satisfied because no one in
my study is both male and female. Normality would be satisfied if I had at least 15
subjects in each group and the histograms were approximately normal, but one group has
only 14 subjects, so I must proceed with caution. With 27.92 degrees of freedom, the
p-value is .96, meaning that if the means actually are equal, I would have a 96% chance of
observing a difference at least this large, which is much higher than my 5% threshold.
Therefore I cannot say that the differences in stated versus actual height differed
between genders.
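
The fractional 27.92 degrees of freedom suggest the unequal-variance (Welch) form of the
two-sample t-test, although the report does not name it. As a sketch under that
assumption, the statistic and its Welch-Satterthwaite degrees of freedom can be computed
as follows. The two small samples are invented for illustration and are not the study's
data.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group.
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t_stat, df

# Hypothetical stated-minus-actual differences in inches.
female_diffs = [0, 1, -1, 0, 2, 0]
male_diffs = [1, 0, 0, -1, 1]

t_stat, df = welch_t(female_diffs, male_diffs)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

The degrees of freedom from this formula are generally not a whole number and never
exceed n1 + n2 - 2, which for groups of 16 and 14 would be 28.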
Thirdly, to see whether the mean differences between stated and actual height
varied between grade levels, I ran an ANOVA test. The data did not support the
conclusion that the average differences varied by grade level. The assumptions for such a
test are normality, similar variance, independence, and randomness. Similar variance,
independence, and randomness are met, as shown by the histograms and spread of each
group and by the sampling methods. However, only the juniors' height differences are
approximately normal, so I must proceed with caution. After running the ANOVA test with
29 total degrees of freedom I got a p-value of .632, once again well over the threshold
of .05.
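
A one-way ANOVA compares the variation between group means with the variation inside
the groups. The sketch below computes the F statistic and its degrees of freedom for
four grade-level groups. The values are invented for illustration and are not the
study's data, and the function name is hypothetical.

```python
from statistics import mean

def one_way_anova(groups):
    """F statistic with between-group and within-group degrees of freedom."""
    all_values = [x for g in groups.values() for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of each value around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical stated-minus-actual differences in inches, by grade level.
diffs_by_grade = {
    "freshman": [0, 1, 0, -1],
    "sophomore": [1, 0, 0],
    "junior": [0, -1, 1, 0],
    "senior": [0, 2, 0],
}

f_stat, df_between, df_within = one_way_anova(diffs_by_grade)
print(f"F = {f_stat:.4f}, df = ({df_between}, {df_within})")
```

With 30 subjects in 4 groups, as in the report, the same formulas give 3 between-group
and 26 within-group degrees of freedom, which add up to the 29 total degrees of freedom
mentioned above.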
After running these analyses I cannot say that LASA students overstate their
height. In this they differ from what the studies cited earlier found in the wider
population. Both genders at LASA exhibit similar (minimal) differences between actual and
stated heights, as do all four grade levels. I can only speculate as to why this is the
case. Two factors that make LASA different from the world's population are that LASA is
made up of teenagers and that LASA is preselected to contain only the very brightest
students, and these factors may play a role in explaining the discrepancy. Perhaps
teenagers are less confident than adults that they can get away with lying about their
height, or perhaps they go to the doctor more often and therefore get measured more
frequently. Perhaps LASA kids are stereotypically nerds and therefore less likely to care
about other people's judgments of them (although this contradicts the earlier findings
about the impact of education level on inaccuracies in height reporting). The lack of
results could also be due to error in the survey. One possible source is surveyor error;
my tape measure could have been slanted when I measured some people. In addition, the use
of whole inches rather than centimeters or a smaller unit means that the measurements are
less precise and there is more room for error. If someone is off by a quarter inch, it
will be lost in the rounding of the data. From the results I gathered, though, it turns
out that you can trust a LASA kid when they tell you their height.

Works Cited
Grodner, Michele, Sara Long Roth, and Bonnie C. Walkingshaw. Nutritional
Foundations and Clinical Applications: A Nursing Approach. St. Louis,
MO: Mosby/Elsevier, 2012. Print.

Hitti, Miranda. "Do Men Tell Tall Tales About Their Height?" WebMD. WebMD,
23 Aug. 2005. Web. 19 May 2016.
<http://www.webmd.com/men/news/20050823/do-men-tell-tall-tales-about-their-height>.

Murray, Christopher J. L. Summary Measures of Population Health: Concepts,
Ethics, Measurement, and Applications. Geneva: World Health
Organization, 2002. Print.

Needhima, Nancy. "Majority Overstate Their Height and Weight in
Surveys." MedIndia. MedIndia, 30 Jan. 2012. Web. 21 May 2016.

Niedhammer, Bugel, Bonenfant, Goldberg, and Leclerc. "Validity of
Self-reported Weight and Height in the French GAZEL Cohort." International
Journal of Obesity. International Journal of Obesity, Sept. 2000. Web. 19
May 2016.
<http://www.nature.com/ijo/journal/v24/n9/full/0801375a.html>.

Weinstein, Maxine, James W. Vaupel, and Kenneth W. Wachter. Biosocial
Surveys. Washington, D.C.: National Academies, 2008. Print.

Appendices
http://neelsstatsproject.weebly.com

Blue=male
Pink=female

green=freshman
blue=sophomores
red=juniors
black=seniors

Inquiry Pitch:
Do LASA Students Tend to
Overestimate Their Height? Does this
Change When Looked at by Gender or
Grade Level?
Neel Rao
Dual Credit Statistics
0B
5/23/16
2015-2016 School Year

While the question of whether people overestimate their height seems like merely
a curiosity, the answer in fact has significance in the medical field. It is
the subject of numerous studies and books, and has a far-reaching impact on biological
research. Numerous researchers have found that self-reported height data are
often inaccurate (Niedhammer). The discrepancies are more pronounced in men
than in women (Grodner), in white women than in women of other races
(Needhima), and in the elderly (Weinstein). In addition, education level in men has a
positive correlation with overstatement of height, whereas education level in women has
a negative correlation with overstatement of height (Murray). This inaccuracy in stated
height becomes problematic when trying to correlate other variables with height using
reported heights (Hitti). I wanted to see whether these findings would also manifest in the
LASA student population. Going in, I believed that an overestimation of height would be
visible among LASA students. It seems like human nature to want to be taller, and it
would appear to the casual observer to be a harmless white lie to say one is taller than one
really is.
The variables in question are stated height in inches and actual height in
inches. I expect stated height to be about an inch greater than actual height, which I
expect to be around 67 inches.
My population for this analysis is the student body of LASA High School, 1011
students between the ages of 14 and 18. The sampling frame is all of the
students who come to school on either May 10 or May 11 and whose names are listed in
the student roster. The sample will be chosen using a simple random sample. I acquired a
numbered roster of all of the students at LASA and randomly generated numbers; I will use
the students whose roster numbers match the generated numbers. Once I have my subjects, I
will track them down and ask them how tall they are. I will make sure that they do not
know I am going to measure them afterwards, and I will make sure that each subject is not
visible to any other subject while I am collecting data, to ensure independence. After
hearing their answer, I will use a tape measure to measure them.
On this data I will run a dependent t-test to see how the means of stated and actual
height differ, an independent t-test to see how the mean differences between
stated and actual height differ between genders, and an ANOVA test to see how the mean
differences between stated and actual height differ among grade levels.
I want to see whether LASA students can accurately gauge their height, and
whether this ability changes based on gender or grade level. The sample will be dependent
on the accuracy and recency of the roster. If there are students who came to the school
after the roster was created, they will not be in my sampling frame, so my sample may
not be representative of the entire LASA student body. In addition, I may
not be able to track down everyone on my list to ask them their height and measure
them.
