What is Data?
Collection of facts
e.g. measurements, traits, outcomes
Start by collecting data (Ch. 2)
Basic Terminology
Observational Study:
Investigator's role is passive.
A process or phenomenon is watched and data are recorded, but there is no intervention on the part of the person conducting the study
Examples:
survey studies, economic studies, many social science studies, sports statistics
A researcher keeps track of how many cars drive on a certain stretch of road over
a one-hour period to study why so many accidents occur there.
Population
The entire group of objects about which one wishes to gather information in a statistical study.
e.g. all 33,241 students at ISU
Sample
The group of objects on which one actually gathers data.
e.g. 100 randomly chosen ISU students
Example: Of interest is the overall satisfaction of ISU students with the bus system. It
may be costly or impossible to survey all 33,241 students at ISU, so instead, a group of 100
students is randomly chosen to participate in the study.
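The sampling step in this example can be sketched in Python. This is only an illustration: representing students by ID numbers 1 to 33,241, the fixed seed, and the function name draw_simple_random_sample are assumptions, not part of the original example.

```python
import random

POPULATION_SIZE = 33_241  # all ISU students (the population)
SAMPLE_SIZE = 100         # students actually surveyed (the sample)

def draw_simple_random_sample(population_size, sample_size, seed=305):
    """Return sample_size distinct student IDs, each equally likely to be chosen."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    population = range(1, population_size + 1)  # hypothetical student IDs
    return rng.sample(population, sample_size)

sample_ids = draw_simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE)
print(len(sample_ids))                 # sample size
print(SAMPLE_SIZE / POPULATION_SIZE)   # fraction of the population surveyed
```

Sampling without replacement ensures that no student is surveyed twice.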
Sample Size
Number of objects, people, etc. in the sample.
In a perfect world we would always have access to the entire population of data, but that
is almost never the case.
Census: a study using the entire population
There is always uncertainty involved in statistics. We make guesses about the entire
population based on only a sample.
The larger the sample, the better the guess. Usually, due to constraints such as money,
time, etc., we cannot have as large a sample as we would like.
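A small simulation can make the link between sample size and the quality of the guess concrete. The sketch below is illustrative only: the satisfaction scores are made-up random integers from 1 to 10, and the sample sizes, seed, and function name typical_error are assumptions.

```python
import random
import statistics

rng = random.Random(305)  # fixed seed only so the sketch is repeatable

# Hypothetical population: one satisfaction score (1-10) per student.
population = [rng.randint(1, 10) for _ in range(33_241)]
true_mean = statistics.mean(population)

def typical_error(sample_size, repetitions=200):
    """Average absolute gap between a sample mean and the population mean."""
    gaps = []
    for _ in range(repetitions):
        sample = rng.sample(population, sample_size)
        gaps.append(abs(statistics.mean(sample) - true_mean))
    return statistics.mean(gaps)

errors = {n: typical_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, round(err, 3))  # the typical error shrinks as n grows
```

Under these assumptions, the typical error falls as the sample size increases, though each tenfold increase in sample size buys far less than a tenfold improvement.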
Enumerative Study
A study (experiment) for which there is a particular, well-defined, finite group of
objects under study.
Data are collected on some or all of these objects, and conclusions are intended to
apply only to these objects.
e.g. Gas mileage of all 2015 Ford Taurus automobiles; Strength of 200 2 x 4 boards
to be used to build a specific house.
Analytical Study
A study (experiment) in which a process or phenomenon is investigated at one point in
space and time with the hope that the data collected will be representative of system
behavior at other places and times under similar conditions.
There is rarely, if ever, a particular well-defined group of objects to which conclusions
are thought to be limited.
Most engineering studies are of this type.
e.g. Gas mileage of all Ford mid-size vehicles; Smoothness of all 2 x 4 boards cut
by the primary supplier of Lowe's.
Categorical Data
Non-numerical characteristics associated with items in a sample.
Must be aggregated and counted to produce numerical values.
e.g. Eye color (blue, brown, green, etc.); Engine status (working, not working &
fixable, not working & not fixable)
Can't average eye color.
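Aggregating and counting categorical data might look like the following sketch. The list of eye colors is made up for illustration.

```python
from collections import Counter

# Hypothetical eye colors observed for six sampled people.
eye_colors = ["blue", "brown", "brown", "green", "blue", "brown"]

# Categories cannot be averaged, but they can be counted.
counts = Counter(eye_colors)

# Counts can then be turned into proportions of the sample.
proportions = {color: count / len(eye_colors) for color, count in counts.items()}

print(counts["brown"])        # 3
print(proportions["brown"])   # 0.5
```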
Quantitative Data (numerical)
Numerical characteristics associated with items in a sample.
Typically counts of occurrences of a phenomenon of interest or measurements of some
physical property.
Can be further broken down into discrete (countable) and continuous (uncountable) data.
Discrete values can be enumerated as a set, e.g. {..., -1, 0, 1, ...}
Continuous values fall in an interval, e.g. (-1, 1); [-1, 1); [-1, 1]
Examples:
1. # of heads in 10 flips of a coin.
4. Total number of bolts machined by employee A that did not meet tolerance specifications
Univariate Data
Arise when only a single characteristic of each sampled item is observed.
e.g. measure height of students in stat 305.
Multivariate Data
Arise when observations are made on more than one characteristic of each sampled
item.
e.g. measure height, weight and observe eye color of students in stat 305.
When exactly 2 characteristics are measured, we call it Bivariate Data.
e.g. measure height and weight of students in stat 305.
Paired
Bivariate data where both variables are attempting to quantify the same thing
e.g. Before and After studies: Metal specimen hardness before and after treating;
Pharmaceutical study on a new drug (pain level with/without drug)
Measurements of the same quantity made with different instruments/systems
e.g. Measure the weight of students in stat 305 using 2 different scales
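Because paired data measure the same thing twice on each item, a natural first step is to look at the difference within each pair. The sketch below uses made-up hardness readings for five metal specimens; the numbers and variable names are assumptions.

```python
import statistics

# Hypothetical hardness readings for the same five specimens.
before = [61.2, 58.9, 63.4, 60.1, 59.7]  # before treating
after = [64.0, 60.3, 66.1, 62.8, 61.5]   # after treating

# Pairing matters: subtract within each specimen, not across specimens.
differences = [a - b for a, b in zip(after, before)]
mean_difference = statistics.mean(differences)

print([round(d, 1) for d in differences])
print(round(mean_difference, 2))  # about 2.28
```

Working with the differences reduces the bivariate data to a single column of numbers, one per specimen.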
If factor A has a levels, factor B has b levels, etc., then we talk about a full a x b (etc.) factorial.
Example: A with 2 levels, B with 3 levels; 2 x 3 = 6 combinations
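The 2 x 3 example can be enumerated directly. In this sketch the level labels (a1, a2, b1, b2, b3) are placeholders chosen for illustration.

```python
from itertools import product

levels_a = ["a1", "a2"]        # factor A: 2 levels (placeholder labels)
levels_b = ["b1", "b2", "b3"]  # factor B: 3 levels (placeholder labels)

# A full factorial uses every combination of one level from each factor.
combinations = list(product(levels_a, levels_b))

for combo in combinations:
    print(combo)
print(len(combinations))  # 6
```

Adding a third factor with c levels would just mean passing a third list to product, giving a x b x c combinations.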
B: + + + +
C: + + + +
D: + + + -
The big question: Where is observed variation coming from? Within a day? Between days? Between weeks?
Measurement
Validity: Faithfully representing the aspect of interest; i.e. the measurement usefully or appropriately represents the feature of an object or system.
Precision: Small variation in repeat measurements.
Accuracy (unbiasedness): Producing the true value on average
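Precision and accuracy can be told apart numerically when the true value is known. The following sketch uses made-up repeat readings of a reference weight from two hypothetical scales; the data, the names scale_1 and scale_2, and the function summarize are all assumptions.

```python
import statistics

TRUE_VALUE = 100.0  # hypothetical known reference weight

# Hypothetical repeat measurements of the same reference weight.
readings = {
    "scale_1": [102.1, 101.9, 102.0, 102.2, 101.8],  # tight but offset
    "scale_2": [97.0, 103.5, 99.0, 101.5, 99.0],     # centered but scattered
}

def summarize(values, true_value):
    """Return (bias, spread) for a list of repeat measurements."""
    bias = statistics.mean(values) - true_value  # accuracy: near 0 means unbiased
    spread = statistics.stdev(values)            # precision: small means precise
    return bias, spread

for name, values in readings.items():
    bias, spread = summarize(values, TRUE_VALUE)
    print(name, round(bias, 2), round(spread, 2))
```

With these made-up numbers, scale_1 is precise but not accurate, while scale_2 is accurate on average but not precise.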