Shawn Mankad
4316 VMH
smankad@rhsmith.umd.edu
1 / 48
Welcome!
Computers are incredibly fast, accurate, and stupid. Human beings are
incredibly slow, inaccurate, and brilliant. Together they are powerful
beyond imagination.
-Albert Einstein
This course is the foundation for all data analytics and hence is the most
important Statistics course you will take at UMD!
Faculty: Shawn Mankad
My background
Ph.D. in Statistics
2nd year at UMD and 2nd time teaching this exact course
Syllabus
Does everyone have one?
Faculty and Staff
Lectures
Let $y_i$ represent the sales of store $i$.
Main idea in all of statistics and data mining: with enough data, the sample
mean $\bar{y}$ is very close to $E(Y)$, i.e., $\bar{y} \approx E(Y)$.
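To see this idea concretely, here is a small simulation sketch. It assumes, purely for illustration, that store sales are normally distributed with a hypothetical true mean $E(Y) = 100$ and standard deviation 20; these numbers and the helper name `sample_mean` are our own, not part of the course. As $n$ grows, $\bar{y}$ should settle near $E(Y)$.

```python
import random
import statistics


def sample_mean(n: int, mu: float = 100.0, sigma: float = 20.0, seed: int = 1) -> float:
    """Average of n simulated sales figures drawn from a hypothetical N(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))


for n in (10, 1_000, 100_000):
    ybar = sample_mean(n)
    print(f"n = {n:>7}: ybar = {ybar:.3f}, |ybar - E(Y)| = {abs(ybar - 100.0):.3f}")
```

The gap between $\bar{y}$ and $E(Y)$ typically shrinks roughly like $\sigma/\sqrt{n}$, which is why "lots of data" matters.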
Example:
$$E(Y|X) = \mu_{Y|X} = \beta_0 + \beta_1 X.$$
Concept Review
Conditional Expectations E(Y|X)
Example:
If Y = 3 + 5X, then what is E(Y|X = 5)?
Concept Review
Conditional Expectations E(Y|X)
Example:
E(Y|X) is the true average sales over all Walmart locations with
advertising expenses equal to X.
E(Y|X = 100) is the true average sales over all Walmart locations
with advertising expenses equal to $100.
The regression model says the true average sales over all Walmart
locations with advertising expenses equal to $100 is
$$E(Y|X = 100) = \beta_0 + \beta_1 \cdot 100.$$
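The regression function is simply a line evaluated at a chosen value of $X$. A minimal sketch follows; the helper name `conditional_mean` is hypothetical. The true $\beta_0$ and $\beta_1$ for the Walmart example are unknown population quantities, so the usage line plugs in the $Y = 3 + 5X$ exercise from the earlier slide instead.

```python
def conditional_mean(x: float, beta0: float, beta1: float) -> float:
    """E(Y | X = x) under the linear regression model beta0 + beta1 * x."""
    return beta0 + beta1 * x


# Exercise: Y = 3 + 5X, so E(Y | X = 5) = 3 + 5 * 5.
print(conditional_mean(5, beta0=3, beta1=5))  # 28
```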
Concept Review
Variance and Standard Deviation
Variance measures how spread out a collection of numbers is.
A small variance indicates that the data points tend to be close together;
a large variance indicates that the data points are spread out.
$\sigma_X^2$ is used to denote the true, population-level variance of random
variable $X$.
$\sigma_X$ is used to denote the true standard deviation of $X$.
These values are almost always unknown to us, so the goal is to estimate
them from data.
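Sample Statistics:
$$\hat{\sigma}_X^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \hat{\sigma}_X = s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2},$$
where $\bar{x}$ is the sample mean.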
You will need to know how to use the tables in the back of your book.
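As a quick check of the $n - 1$ formula, the sketch below computes $s^2$ and $s$ by hand and compares them with Python's standard `statistics` module. The data are the component lifetimes from the confidence-interval example in these slides; the helper name `sample_variance` is our own.

```python
import math
import statistics


def sample_variance(xs: list[float]) -> float:
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


lifetimes = [92, 110, 115, 103, 98]  # component lifetimes (hours)
s2 = sample_variance(lifetimes)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))  # 84.3 9.18

# statistics.variance uses the same n - 1 denominator.
assert math.isclose(s2, statistics.variance(lifetimes))
```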
Concept Review
Example
The stock price for a large retailer in the 4th quarter is assumed to be
normally distributed with mean $\mu = 45$ and standard deviation $\sigma = 5$.
What is the probability that the stock price in the 4th quarter exceeds 50?
Concept Review
Example
Let X represent the stock price.
Then we are interested in calculating P(X > 50).
$$P(X > 50) = P\left(\frac{X - 45}{5} > \frac{50 - 45}{5}\right)$$
$$= P(Z > 1) = 0.5 - 0.3413 \quad \text{(numbers come from Table B.1)}$$
$$= 0.1587.$$
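The table lookup can be cross-checked with Python's standard-library `statistics.NormalDist`. This sketch computes the same probability twice: once after standardizing the cutoff to $z = 1$, and once directly from $N(45, 5^2)$ without standardizing. Both should agree with the table answer.

```python
from statistics import NormalDist

mu, sigma = 45, 5      # stock price X ~ N(45, 5^2)
z = (50 - mu) / sigma  # standardized cutoff: z = 1.0

p_standardized = 1 - NormalDist().cdf(z)                # P(Z > 1)
p_direct = 1 - NormalDist(mu=mu, sigma=sigma).cdf(50)  # P(X > 50)
print(round(p_standardized, 4), round(p_direct, 4))    # 0.1587 0.1587
```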
Concept Review
Example
We will use this trick throughout the semester (see textbook p. 19-23):
If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
$$\left(\bar{y} - t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}},\ \bar{y} + t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}\right).$$
Concept Review
Confidence Intervals
Notice that we are using the t distribution, which always requires knowing
the degrees of freedom ($n - 1$). In shorthand, we write $t(n - 1)$.
$t_{(\alpha/2,\,n-1)}$ defines the critical value: the value on the $t(n - 1)$
distribution whose area under the curve to its right equals $\alpha/2$.
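In class the critical value is read from the t table in the back of the book. As a hedged cross-check, the sketch below approximates it using only the Python standard library: it integrates the t density numerically (Simpson's rule, our choice of method) and bisects for the point with upper-tail area $\alpha/2$. The helpers `t_pdf`, `t_cdf`, and `t_critical` are hypothetical names; in practice a statistics library such as SciPy's `scipy.stats.t` would normally be used instead.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha: float, df: int) -> float:
    """t_(alpha/2, df): the point with area alpha/2 to its right."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection works because t_cdf increases in x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 4), 3))  # 2.776
```

Changing `df` shows how the critical value shrinks toward the normal value 1.96 as the degrees of freedom grow.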
Concept Review
Example:
A manufacturer wants to estimate the average life span of an expensive
electrical component. Because the test to be used destroys the
component, a small sample is desired. The lifetimes in hours of five
randomly selected components are
92, 110, 115, 103, 98.
Find a point estimate and 95% confidence interval estimate of the
population average lifetime of the components.
Concept Review
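The interval to be used is
$$\left(\bar{y} - t_{(0.025,\,4)}\frac{s}{\sqrt{n}},\ \bar{y} + t_{(0.025,\,4)}\frac{s}{\sqrt{n}}\right).$$
So, we need
$\bar{y} = 103.6$
$s = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}} = 9.18$
$t_{(0.025,\,4)} = 2.776$.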
After plugging all the numbers into the formula, we get (92.2, 115.0).
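The same arithmetic as a short Python sketch, using the five lifetimes and the tabled critical value $t_{(0.025,4)} = 2.776$ (hard-coded here, as it would be read from the t table).

```python
import math
import statistics

lifetimes = [92, 110, 115, 103, 98]
n = len(lifetimes)
ybar = statistics.fmean(lifetimes)  # point estimate: 103.6
s = statistics.stdev(lifetimes)     # sample SD with n - 1 denominator
t_crit = 2.776                      # t_(0.025, 4) from the t table

margin = t_crit * s / math.sqrt(n)
ci = (ybar - margin, ybar + margin)
print(round(ybar, 1), tuple(round(b, 1) for b in ci))  # 103.6 (92.2, 115.0)
```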
Concept Review
Hypothesis Testing
Hypothesis testing is extremely important in this class. Here are the key
definitions.
Null hypothesis $H_0$: states the hypothesis to be tested.
Alternative hypothesis $H_a$: includes values of the population
parameter not in the null hypothesis.
$$t = \frac{\bar{y} - 110}{s/\sqrt{n}} = -1.56.$$
Plugging in values, we get
$\bar{y} = 103.6$
$s = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1}} = 9.18$
$t = -1.56$.
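A sketch of the test statistic for the lifetime data, assuming the hypothesized mean $\mu_0 = 110$ that appears in the decision rules for this example.

```python
import math
import statistics

lifetimes = [92, 110, 115, 103, 98]
mu0 = 110  # hypothesized population mean under H0
n = len(lifetimes)
ybar = statistics.fmean(lifetimes)
s = statistics.stdev(lifetimes)

t_stat = (ybar - mu0) / (s / math.sqrt(n))  # t = (ybar - mu0) / (s / sqrt(n))
print(round(t_stat, 2))  # -1.56
```

The statistic is negative because the sample mean (103.6) falls below 110.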
Concept Review
Now we use the following decision rule (see p. 35 for more rules):
If $H_a: \mu \neq 110$, then p-value $= P(t < -1.56 \text{ or } t > 1.56)$.
If $H_a: \mu < 110$, then p-value $= P(t < -1.56)$.
If $H_a: \mu > 110$, then p-value $= P(t > -1.56)$.
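The three rules differ only in which tail areas are added up. The sketch below evaluates them for $t = -1.56$ with $n - 1 = 4$ degrees of freedom. It uses our own numerical approximation of the t CDF (Simpson's rule on the density, stdlib only) rather than a table; `t_pdf`, `t_cdf`, and `p_values` are hypothetical helper names.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def p_values(t_obs: float, df: int) -> dict[str, float]:
    """p-value of the observed t statistic under each alternative hypothesis."""
    return {
        "two-sided": 2 * (1 - t_cdf(abs(t_obs), df)),  # Ha: mu != mu0
        "less": t_cdf(t_obs, df),                       # Ha: mu < mu0
        "greater": 1 - t_cdf(t_obs, df),                # Ha: mu > mu0
    }


for alternative, p in p_values(-1.56, df=4).items():
    print(f"{alternative}: p-value = {p:.4f}")
```

With these inputs the two-sided p-value comes out near 0.194, the "less" p-value near 0.097, and the "greater" p-value near 0.903.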
Concept Review
If the p-value is 0.079, do you reject or fail to reject the null hypothesis at
the 10% significance level?
At the 5% significance level?
At the 1% significance level?
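The decision rule compares the p-value with the significance level $\alpha$. A minimal sketch, assuming the common convention of rejecting when the p-value is at most $\alpha$ (the helper name `decide` is hypothetical):

```python
def decide(p_value: float, alpha: float) -> str:
    """Reject H0 when the p-value is at or below the significance level alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(0.079, alpha)}")
# alpha = 0.10: reject H0
# alpha = 0.05: fail to reject H0
# alpha = 0.01: fail to reject H0
```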
Concept Review
If the 95% confidence interval is (92.2, 115.0), how would you evaluate
the following hypotheses at the 5% significance level?
$H_0: \mu = 92.0; \quad H_a: \mu \neq 92.0$
Concept Review
Since the 95% confidence interval is (92.2, 115.0), how would you evaluate
the following hypotheses at the 5% significance level?
$H_0: \mu = 92.0; \quad H_a: \mu \neq 92.0$
Reject!! Because 92.0 is not contained in the 95% interval.
For a two-sided test, a 5% significance level corresponds to a (100 - 5)% = 95% confidence interval.
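The interval-based rule can be written as a one-line check. This is a sketch for the two-sided test only; the helper name `reject_by_interval` is hypothetical. The second call uses 110, the hypothesized mean from the earlier t-test example.

```python
def reject_by_interval(mu0: float, ci: tuple[float, float]) -> bool:
    """Two-sided test of H0: mu = mu0: reject when mu0 lies outside the CI."""
    lower, upper = ci
    return not (lower <= mu0 <= upper)


ci_95 = (92.2, 115.0)
print(reject_by_interval(92.0, ci_95))   # True: 92.0 is below the interval
print(reject_by_interval(110.0, ci_95))  # False: 110 is inside the interval
```

The second result agrees with the two-sided t-test on the same data, which does not reject $\mu = 110$ at the 5% level.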