Prepare For STAT170 Exam

Basic assumptions about you
Many elementary concepts have been skipped. At this stage, it is assumed that you should know them well. In particular, you MUST know how to do HATPC for each of the 8 hypothesis tests. Only important things, or those that inter-connect several topics together, are elaborated here. You have ABSOLUTELY NO hope of passing STAT170 if you do not know the 8 HATPCs. This PP file will NOT push you from F to P. The contents of this file will only help the P or above students, given the presumed basic knowledge.
Binding things together
Review of:
5 types of graphics
5 types of research questions
8 statistical tests
8 or MORE types of reports
Displaying Data: 5 types of graphics

[Diagram: DATA split into categorical and numerical variables, leading to the five graphics listed below]
Combination of variable(s) => Graphic
One categorical (Lecture 2, 11) => bar chart or pie chart
One numerical (Lecture 2, 7) => histogram or stem-and-leaf plot
Two categorical (Lecture 2, 11, 12) => clustered bar chart
Two numerical (Lectures 2, 9 & 10) => scatter plot
One categorical and one numerical (Lecture 2, 8) => comparative box plots
5 types of graphics
STAT170 is restricted to only 5 combinations of variables, 5 types of graphics, and 5 possible research questions. The most important step is correctly identifying the types of variables: NUMERICAL vs CATEGORICAL. Surprisingly, many students have difficulty with this very first step. Identifying the variables correctly (or wrongly) leads you to the correct (or wrong):
Type of graphic
Research question, and
Statistical test.
How to comment on graphics:

1. Comments on a single bar chart (seldom asked)
The comment depends on whether the variable is ordinal or nominal.
Ordinal: comment as you would for a histogram.
Nominal: comment on which categories have the highest and lowest frequencies.
[Figure: bar chart of frequency (0 to 400) by diet: meat, vegetarian, vegan]
"Skewed to the right." This doesn't make any sense for a nominal variable such as diet!
2. Comments on a single histogram (or stem-and-leaf plot)

1. Comment on shape (skewed left/right, normal)
2. Range from xxxx to xxxx
3. Majority (high frequencies) of data about xxxx
4. Comment on outliers (if present)
5. Comment on any unusual features (if present)
[Figure: histogram of Assessment, values 0 to 30, frequencies 0 to 500]
Example:
U-shaped, high frequencies near both ends, lowest frequencies near the centre
U-shaped, but slightly skewed left
Range from 0 to 12
[Figure: histogram of Individual Days, values 0 to 12, frequencies 0 to 100]
3. Comments on comparative boxplots

Compare medians
Compare spread (IQR)
Compare outliers (even when there are no outliers, say "no outliers")
[Figures: comparative boxplots by Class]
4. Comments on scatter plot

Comment on linear/curved?
Positive or negative slope?
Comment on amount of scatter (big or small?)
Comment on outliers, if any
Comment on residuals: symmetric on both sides of the line/normal? Constant SD?
[Figures: example plots; recoverable axis labels are UAI (day, evening), Age, Birth Rate, Median Age, age at marriage, husband age and GPA]
5. Comments on clustered bar charts

Compare the shapes of the clusters, NOT the sizes.
Shapes (not sizes) similar => the 2 variables are independent (ie have no association), since the % are the same.
Shapes (not sizes) not similar => the 2 variables are not independent (ie have an association), because the % are not the same.

Comments on clustered bar charts: explanation

Never compare the actual frequencies (sizes). Only compare % (or proportions) (shapes). Since the proportions are almost the same, ie about 1/3 and 2/3 for smokers and non-smokers, smoking status is independent of Activity Level (no association).
[Figure: clustered bar chart, clusters similar in shape (although different sizes)]

Comments on clustered bar charts: explanation

Never compare the actual frequencies (sizes). Only compare % (or proportions) (shapes). Since the percentages of smokers and non-smokers are obviously different for males and females, there is an association between smoking status and gender.
[Figure: clustered bar chart, clusters different in shape (although same size)]
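The "compare shapes, not sizes" advice amounts to converting each cluster to percentages before comparing. A minimal sketch, using the Sex by Smoking Status counts that appear elsewhere in these slides; the function name `row_percentages` is only illustrative:

```python
# Counts taken from the Sex by Smoking Status table used elsewhere in these slides.
counts = {
    "Male": {"Smoker": 4, "Non-smoker": 11},
    "Female": {"Smoker": 5, "Non-smoker": 8},
}

def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {cat: 100 * n / total for cat, n in row.items()}
    return result

percentages = row_percentages(counts)
for group, row in percentages.items():
    # Compare these percentages (shapes), never the raw counts (sizes).
    print(group, {cat: round(pct, 1) for cat, pct in row.items()})
```

Whether the two sets of percentages are "similar enough" is still a judgement on the graph; the formal answer comes from the chi sq test of association.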
The 8 hypothesis tests in STAT170

[Diagram: DATA split into categorical and numerical variables, leading to the graphics and tests listed below]
One categorical => bar chart => z-test of proportion or chi sq test of proportions
One numerical => histogram => 1-sample z or t test
Two categorical => clustered bar chart => chi sq test of association + OR
Two numerical => scatter plot => t-test of slope
One categorical and one numerical => comparative boxplots => 2-sample t test
Note: 7 tests above + paired t-test + OR = 8 tests in STAT170

Determining numerical vs. categorical

You only need to be able to distinguish between numerical and categorical. There is no need to classify further into continuous or discrete (=integer), nor into nominal or ordinal. If you cannot distinguish between nominal and ordinal, you'll only lose a few marks in Q.1. But how about numerical vs categorical? See next slide.
Example: Numerical vs Categorical

Age: age in years => Numerical (continuous) => histogram / stem-and-leaf => z-test or t-test
Age: 0-12 child (1), 13-18 teenager (2), > 18 adult (3) => Categorical (ordinal) => bar chart / pie chart => chi sq test of proportions (GOF test)
A mistake will cost you at least 6 marks in HATPC, plus other marks in subsequent parts of the question. The key is to look at the definition of the variable, not the meaning we use in daily language. Read the question! The results are unchanged if we use the names ABC or XYZ instead of AGE.
No one can help you

How many such mistakes can you afford to make in the exam? 3 such mistakes => you'll fail STAT170. You have absolutely no hope of passing STAT170 if you cannot distinguish between numerical and categorical variables, since the whole philosophy of STAT170 is based on classifying variables as categorical or numerical. (This is unlike other 1st-year stat courses in other universities.)
Absolute bottom line:

1. HOW MANY variables?
2. Are the variables numerical or categorical?
Answering these 2 questions correctly will lead you to one of the 5 cases, and almost certainly to the correct test. The HATPC is then, hopefully, bookwork.
How students fail?

But many students already have trouble with the first question: how many variables are there?
For example: how many variables are there here, 3 or 1?
[Figure: bar chart of frequency (0 to 400) by response (same sex, opposite sex, either) to the survey question "Who do you find it easier to make friends with?"]
Think of the survey. How many questions? 3 or 1? How many columns do you need to store the data? 3 or 1? You are doomed if you choose 3 variables. In fact there is no test in STAT170 that involves 3 variables.

How students fail?

Another example: how many variables are there? 1, 2 or 4?
          Smoker   Non-smoker
Male      4        11
Female    5        8
You are doomed if you choose 4 variables.

Getting a pass in STAT170

You need to be able to do ALL of the following:
1. Count how many variables
2. Identify the variables as numerical or categorical
3. Do ALL 8 hypothesis tests
You will fail STAT170 if you cannot do just ONE of them! (In fact, if you can do ALL of them well, a Cr is guaranteed.)
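The "how many columns do you need to store the data?" check can be made concrete. The sketch below rebuilds hypothetical raw data behind the 2x2 smoking table (one row per person); the counts come from the table, but the row-per-person layout is an assumption about how such data would be stored:

```python
from collections import Counter

# Hypothetical raw survey data: one row per person, one column per variable.
# Two questions were asked (sex, smoking status), so there are TWO variables.
rows = (
    [("Male", "Smoker")] * 4
    + [("Male", "Non-smoker")] * 11
    + [("Female", "Smoker")] * 5
    + [("Female", "Non-smoker")] * 8
)

n_variables = len(rows[0])   # number of columns needed to store the data
table = Counter(rows)        # the 2x2 table is only a summary of those 2 columns

print("variables:", n_variables)
print("cells in summary table:", len(table))
print("Male smokers:", table[("Male", "Smoker")])
```

The four cells of the table are counts of combinations of two variables, not four variables.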
How to determine the appropriate test

One categorical
Graphics: bar chart, pie chart
Research question (e.g.): Is the proportion of smokers equal to 0.3? Are the proportions of meat-eaters, vegetarians & vegans equal to 0.8, 0.15 & 0.05?
Formal stat test: z-test of proportion (Lect 7), 2 categories only; chi sq test of proportions (GOF) (Lect 11), 2 or more categories

One numerical
Graphics: histogram, stem-and-leaf, boxplot
Research question (e.g.): Is the mean equal to ...?
Formal stat test: z and t-tests of mean (Lect 7)

Two categorical
Graphics: clustered bar chart
Research question (e.g.): Is there an association between ... and ...?
Formal stat test: chi sq test of association (Lect 11, 12) or odds ratio

Two numerical
Graphics: scatter plot
Research question (e.g.): Is there a relation between ... and ...?
Formal stat test: regression analysis: test of slope (Lect 9, 10)

One categorical (binary) & one numerical
Graphics: comparative boxplots
Research question (e.g.): Is there a difference in heights between males and females?
Formal stat test: 2-sample t-test (Lect 8)

Note:
1. There is the paired t-test, which doesn't fit in any of the 5 cases above; perhaps it fits best in the 2nd case (one-sample t-test).
2. 7 tests above + paired t-test = 8 hypothesis tests in STAT170

Beware of the paired t-test

The paired t-test may be mistaken for:
2-sample t-test
Regression
Read the given research question.
If you see relation or predict => regression
If you see difference => 2-sample t or paired t. Then think!
Eg: Weight loss program? Y1 = weight before, Y2 = weight after
How to determine the appropriate test

Method 1 (100% certain)
The ONLY SURE way to determine the correct test is to identify the variable types correctly!
Method 2
IF you cannot do (1), then you may look for keywords in the research question. But be warned: it is NOT 100% fool-proof.
association => chi-sq test of association
relation, predict => regression (with t-test on slope)
difference => 2-sample t-test, or paired t-test
proportion (singular!), percentage => z-test of proportion
proportions (plural), percentages => chi-sq test of proportions (GoF)
mean, average => one-sample z-test or t-test
See the underlined keywords in the previous slide. NOT 100% fool-proof! Eg: Are the proportions of smokers the same for males and females? => chi-sq test of association

How to determine the appropriate test (continued)

Method 3 (Easiest for you)
Look at the given graphic, then deduce the appropriate test. This is almost certain, but many questions do NOT show graphs!
ONE histogram/stem-and-leaf => z-test or t-test or paired t
Bar chart/pie chart => chi-sq test of proportions (GOF) (if binary, GOF or z-test of proportion)
Clustered bar chart => chi-sq test of association
Scatter plot => regression: test of slope
TWO histograms/stem-and-leafs and/or comparative box plots => 2-sample t
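Method 1 is essentially a lookup from the variable types to the test. A minimal sketch of that lookup follows; the dictionary `TESTS` and the function `choose_test` are illustrative names, and the mapping only restates the five cases in these slides:

```python
# Keys are (number of categorical variables, number of numerical variables).
TESTS = {
    (1, 0): "z-test of proportion (2 categories) or chi-sq test of proportions (GOF)",
    (0, 1): "one-sample z-test or t-test of the mean (or paired t-test on differences)",
    (2, 0): "chi-sq test of association",
    (0, 2): "regression: t-test of slope",
    (1, 1): "2-sample t-test",
}

def choose_test(variable_types):
    """Return the candidate test for a list such as ["categorical", "numerical"]."""
    n_cat = variable_types.count("categorical")
    n_num = variable_types.count("numerical")
    if n_cat + n_num != len(variable_types):
        raise ValueError("each variable must be 'categorical' or 'numerical'")
    if (n_cat, n_num) not in TESTS:
        raise ValueError("no STAT170 test for this combination of variables")
    return TESTS[(n_cat, n_num)]

print(choose_test(["categorical", "numerical"]))
```

The lookup cannot decide between z and t, or between 1-sample t and paired t; those choices still need the research question.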
3 types of statistical tests involving categorical data

In the conclusion: if Ho is NOT rejected, copy Ho + "could be"; if Ho is rejected, write the opposite of Ho + "is higher/lower".

z-test of proportion
Keywords in research question: proportion, %
Ho: $\pi = \pi_0$
Assumptions: $n\pi_0 \ge 5$, $n(1-\pi_0) \ge 5$
Test statistic: $z = \dfrac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$
95% C.I.: $p \pm 1.96 \sqrt{\dfrac{p(1-p)}{n}}$
Conclusion (NOT reject Ho): Proportion could be equal to $\pi_0$.
Conclusion (reject Ho): Proportion is higher/lower than $\pi_0$.

Chi sq goodness of fit (chi sq test of proportions)
Keywords in research question: proportions, percentages (plural)
Ho: $\pi_1 =$ ..., $\pi_2 =$ ..., $\pi_3 =$ ...
Assumptions: $E_j = n\pi_j \ge 5$
Test statistic: $\chi^2 = \sum_j \dfrac{(O_j - E_j)^2}{E_j}$, df = c-1
95% C.I.: .........
Conclusion (NOT reject Ho): The proportions $\pi_1 =$ ..., $\pi_2 =$ ..., $\pi_3 =$ ... COULD be correct.
Conclusion (reject Ho): The proportions $\pi_1 =$ ..., $\pi_2 =$ ..., $\pi_3 =$ ... are NOT correct.

Chi sq test of independence (no association)
Keywords in research question: association, independent, proportions
Ho: X and Y are independent
Assumptions: $E_{ij} = \dfrac{\text{row total} \times \text{col total}}{\text{grand total}} \ge 5$
Test statistic: $\chi^2 = \sum_{i,j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ (read from computer output), df = (r-1)(c-1)
95% C.I.: .........
Conclusion (NOT reject Ho): X and Y COULD be independent (not associated).
Conclusion (reject Ho): X and Y are dependent (associated).
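The first two tests can be checked numerically. The sketch below is an illustration only: the sample counts (40 smokers out of 100; 150, 40 and 10 people in three diet groups) are hypothetical, the function names are my own, and the two-tailed p-value is an assumption about how the test is set up. The hypothesised proportions 0.3 and 0.8, 0.15, 0.05 are the ones quoted in the research questions earlier.

```python
import math
from statistics import NormalDist

def z_test_proportion(successes, n, pi0):
    """z-test of Ho: pi = pi0; returns (z, two-tailed p-value, 95% CI)."""
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("assumption n*pi0 >= 5 and n*(1-pi0) >= 5 is violated")
    p = successes / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)   # pi0 goes with pi0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)   # p goes with p
    return z, p_value, (p - half_width, p + half_width)

def chi_sq_gof(observed, proportions):
    """Chi-sq goodness of fit; returns (chi-sq statistic, df)."""
    n = sum(observed)
    expected = [n * pi for pi in proportions]
    if min(expected) < 5:
        raise ValueError("assumption E_j >= 5 is violated")
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_sq, len(observed) - 1

# Hypothetical data: 40 smokers in a sample of 100, testing Ho: pi = 0.3.
z, p_value, ci = z_test_proportion(40, 100, 0.3)
# Hypothetical diet counts tested against 0.8, 0.15 and 0.05.
chi_sq, df = chi_sq_gof([150, 40, 10], [0.8, 0.15, 0.05])
print(round(z, 2), round(chi_sq, 2), df)
```

The chi-sq p-value is left out on purpose: in the exam it is read from the table or from computer output.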
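5 types of statistical tests involving continuous data

In the conclusion: if Ho is NOT rejected, copy Ho + "could be"; if Ho is rejected, write the opposite of Ho + "is higher/lower".

1-sample z-test of mean ($\sigma$ known)
Keywords in research question: mean, average
Ho: $\mu = \mu_0$
Assumptions: normal population, or $n \ge 25$ (CLT)
Test statistic: $z = \dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$
95% CI: $\bar{y} \pm 1.96\,\dfrac{\sigma}{\sqrt{n}}$
Conclusion (NOT reject Ho): Ave xxx COULD be equal to $\mu_0$.
Conclusion (reject Ho): Ave xxx is higher/lower than $\mu_0$.

1-sample t-test of mean ($\sigma$ unknown)
Keywords in research question: mean, average
Ho: $\mu = \mu_0$
Assumptions: normal population, or $n \ge 25$ (CLT)
Test statistic: $t = \dfrac{\bar{y} - \mu_0}{s/\sqrt{n}}$, df = n-1
95% CI: $\bar{y} \pm t_{n-1}\,\dfrac{s}{\sqrt{n}}$
Conclusion (NOT reject Ho): Ave xxx COULD be equal to $\mu_0$.
Conclusion (reject Ho): Ave xxx is higher/lower than $\mu_0$.

Paired t-test
Keywords in research question: difference
Ho: $\mu_d = 0$
Assumptions: differences from a normal popn, or $n \ge 25$ (CLT)
Test statistic: $t = \dfrac{\bar{y}_d - \mu_d}{s_d/\sqrt{n}}$, df = n-1
95% CI: $\bar{y}_d \pm t_{n-1}\,\dfrac{s_d}{\sqrt{n}}$
Conclusion (NOT reject Ho): The difference COULD be 0 on ave.
Conclusion (reject Ho): The difference is higher/lower than 0 on ave.

2-sample t-test
Keywords in research question: difference
Ho: $\mu_1 = \mu_2$
Assumptions: both groups from normal popns, same SD
Test statistic: $t = \dfrac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, df = n1+n2-2
95% CI: $(\bar{y}_1 - \bar{y}_2) \pm t\, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Conclusion (NOT reject Ho): There COULD be no difference between ave xxx and ave xxx.
Conclusion (reject Ho): Ave xxx is higher/lower than ave xxx.

Test of linear relation between 2 variables
Keywords in research question: relation, predict
Ho: $\beta = 0$
Assumptions: linear; residuals normal; residuals have constant SD
Test statistic: $t = b / SE_b$, df = n-2
95% CI: $b \pm t_{n-2}\, SE_b$
Conclusion (NOT reject Ho): There COULD be no relation between X & Y.
Conclusion (reject Ho): There is a positive/negative relation.

In ALL hypothesis tests, include CI in the conclusion.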
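The t statistics for the 1-sample and 2-sample tests can be computed directly from raw data. This is a sketch with made-up numbers; the pooled SD uses the standard formula $s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$, which is an assumption here because the slide does not spell it out.

```python
import math
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t, df) for Ho: mu = mu0 using t = (ybar - mu0) / (s / sqrt(n))."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))
    return t, n - 1

def two_sample_t(group1, group2):
    """Return (t, df) for Ho: mu1 = mu2 using the pooled SD (same-SD assumption)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    # Standard pooled SD; assumed, since the slide only names s_p.
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = (mean(group1) - mean(group2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical data, for illustration only.
t1, df1 = one_sample_t([12, 15, 11, 14, 13], mu0=10)
t2, df2 = two_sample_t([5, 7, 9], [1, 3, 5])
print(round(t1, 2), df1, round(t2, 2), df2)
```

The t in the 95% CI is a different number: it is read from the t-table, not calculated.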
Examples of the 8 HATPCs?

It is assumed that you know them well at this stage. There are tons of examples of EACH in the Lecture and Tutorial notes. You have absolutely no hope of passing STAT170 if you cannot do the 8 HATPCs, since hypothesis tests, and related questions, span more than 60% of the exam material.

8 types of Simple Reports (reports that involve only 1 hypothesis test)

1. One-sample t-test (See Tutorial 8)
2. One-sample z-test
3. Paired t-test
4. 2-sample t-test
5. Z-test of proportion
6. Regression
7. Chi-sq test of proportions
8. Chi-sq test of independence (See Lect 13)
Key points to write in the Simple report (check list), 1 hypothesis test only

Introduction
* What this study is about, and why this study, if known
* Research question (any wording is OK)
* Target population
Method
* How the sample was collected (why random and representative)
* Define variables
* Statistical method used
* Null hypothesis
* Justify assumptions [put under Method or Result, depending on the type of test]
Results (NO HATPC; NO calculations)
* Test statistic
* P-val, decision (reject/not reject null)
Conclusion
* Decision in words: There is evidence/no evidence [Check that the research question is answered.]
* Your conclusion should be almost the same (several sentences) as the conclusion you have in the proper hypothesis test (HATPC), e.g. 95% CI if appropriate.
Note: It is most important that you identify the correct statistical method used (how???). For example, if it is a chi-sq test and you mention a t-test, then the rest does not make sense, and you'll lose most of the marks and your time!
Complex Reports: Involve several hypothesis tests

Reports involving hypothesis tests of the same type:
SIBT 2008B, 2009A: regressions
MQC 2009A, 2009C, 2010B, 2010C: regressions
SIBT 2009C, MQC 2010A: chi squares
University 2007, Term 2: 2-sample t
Reports involving hypothesis tests of different types:
SIBT 2008C, 2009B: 2-sample t & chi squares
MQC 2009B, 2011A, 2011B: regressions & 2-sample t
Note: No matter how complicated it may appear (many Xs), there should only be ONE Y. (Several Ys would bring you to post-graduate level!)
Since so many (at least 5) cases are possible, it is stupid to copy a sample report (eg the one in Tute 8) onto your crib sheet, since there are
8 possible simple reports
at least 5 complex reports
1st Example: SIBT 2008B exam (report question)

(I do not have a copy of the exam paper.)
Given 6 regressions (6 tables and 6 scatter plots): Y vs X1, Y vs X2, ..., Y vs X6
Research question: Which of the variables X1, ..., X6 are significant predictors for Y, and which best predicts Y?

2nd Example: SIBT 2008C exam

Research question: Which of the variables X1, X2, X3 and X4 affect Y?
[Figures: four plots, Y and X1, Y and X2, Y and X3, Y and X4]
1st General Rule for COMPLEX report

Discard the bad variables:
those where assumptions are violated (not valid);
those whose p-val > 0.05 (ie those where Ho is NOT rejected, because the null hypothesis represents no effect) (eg no difference in 2-sample t, no relation in regression, no association in chi-sq test).

Variable   P-val   Significant variable? (Reject Ho?)   Result
X1         0.01    Yes (Reject Ho)                       Keep X1
X2         0.08    No (Not reject Ho)                    Discard X2
X3         0.02    Yes (Reject Ho)                       Keep X3

1st General rule for COMPLEX report

Warning: Common mistake:
P-val<0.05 => reject Ho => "reject the variable X" is WRONG. Correct: Keep X.
P-val>0.05 => not reject Ho => "not reject variable X" is WRONG. Correct: Discard X.
Golden rule: You may avoid mentioning Ho!
p-val<0.05 => Keep X (small prob (<5%), alarm bell rings)
p-val>0.05 => Discard X
Warning: If you misunderstand the above, the conclusion of your report will be exactly the opposite of what it should be, and you will lose MANY marks!
2nd General rule for complex report

Sometimes the question may ask for the BEST variable that determines Y. Choose the best one within each group. Do NOT compare the p-val of one type of graph with the p-val of another type of graph. (Compare an apple with an apple; compare an orange with an orange.)
Regressions => choose best X
2-sample t's => choose best X
Chi squares => choose best X
What is the best X and how to choose it? In EACH set, choose the variable with the smallest p-val (ie the one that most strongly rejects Ho), EXCEPT for regression. For regression, choose the largest r2, not the smallest p-val.
Example of choosing/discarding variables

An example on regression to illustrate the 1st general rule. (The r2 column is needed for choosing the BEST variable(s).)

Variable   Assumptions satisfied?   P-val   r2      Significant variable? (Reject Ho?)   Result
X1         No                       -----   -----   -----                                 -----
X2         Yes                      0.006   0.53    Yes
X3         Yes                      0.000   0.67    Yes                                   Best
X4         No                       -----   -----   -----                                 -----
X5         Yes                      0.07    -----   No (p-val>5%)                         -----

Hence only X2 and X3 are significant (important) variables affecting Y. And X3 is the best predictor for Y.
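The two general rules can be applied mechanically to the regression example. The sketch below copies the values of X1 to X5 from that example; the field names and function names are illustrative, and `None` stands for a cell shown as "-----".

```python
# Values copied from the regression example; None marks a cell shown as "-----".
variables = [
    {"name": "X1", "assumptions_ok": False, "p_val": None, "r2": None},
    {"name": "X2", "assumptions_ok": True, "p_val": 0.006, "r2": 0.53},
    {"name": "X3", "assumptions_ok": True, "p_val": 0.000, "r2": 0.67},
    {"name": "X4", "assumptions_ok": False, "p_val": None, "r2": None},
    {"name": "X5", "assumptions_ok": True, "p_val": 0.07, "r2": None},
]

def keep_significant(candidates, alpha=0.05):
    """1st rule: discard X if assumptions fail or p-val > alpha; keep the rest."""
    return [v for v in candidates if v["assumptions_ok"] and v["p_val"] < alpha]

def best_regression_predictor(candidates):
    """2nd rule (regression only): among kept variables, pick the largest r2."""
    return max(candidates, key=lambda v: v["r2"])

kept = keep_significant(variables)
best = best_regression_predictor(kept)
print([v["name"] for v in kept], best["name"])
```

Note that a small p-val means KEEP the variable; the code never needs to mention Ho at all.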
[Figures: plots of Y vs Wt and Y vs Starts]
Compare:
Y vs WT: p-val = 0.00055
Y vs STARTS: p-val = 0.0012
Both p-val < 0.05 => both Wt and Starts affect Y, but Wt has a stronger effect (because of the smaller p-val).
[Figures: plots of Y vs WIN and Y vs Payout]
Compare:
Y vs WIN: p-val = 0.5641
Y vs PAYOUT: p-val = 0.0000
Hence WIN has no effect on Y. Payout has an effect.
Key points to write in the COMPLEX report (No rigid rules!)

Introduction
* Research question
* Target population
Method
* How the sample was collected (why random and representative)
* Define the Y and X variable(s)
* List ALL statistical methods used
* Check assumptions [put under Method or Result, depending on the type of test] in EACH case. (But AVOID lengthy, repetitive checking of the assumptions one by one.)
Results (NO HATPC; NO calculations)
* Discard poor ones (assumptions violated, or p-val>0.05). (AVOID lengthy, repetitive checking of p-vals one by one.)
* IF required by the question, pick the best one within each group.
Conclusion
Answer the research question!

BTW, what is the research question like? Two possibilities:
Which of the variables X1, X2, ... affect variable Y?
Which of the variables X1, X2, ... BEST affect variable Y?
Hints and Tips: normal tables

1. 2-tailed normal table vs. 1-tailed normal table:
1-tailed => probability calculations
2-tailed => hypothesis testing
Suggestion: The FIRST thing you should do in the exam, before you start writing anything, is (on the two z-tables): (This applies to the HD students as well!)
Hints and Tips: t and tcrit

2. The t statistic and the t in the C.I. (This applies to ALL t tests: 1-sample t, 2-sample t, t in regression slope, paired t.)
The t-statistic is calculated (not read from tables).
The t in the 95% CI is read from the table (row df and column 0.05).
The SECOND thing you should do in the exam, before you start writing anything, is (on the t-table): (This applies to the HD students as well!)
Hints and Tips: chi sq table

3. You should only use the top few rows of the chi sq table.

Hints and Tips: y and y-bar

4. $z = \dfrac{y - \mu}{\sigma}$ vs. $z = \dfrac{\bar{y} - \mu}{\sigma/\sqrt{n}}$ in probability calculations:
Look for the keyword mean or average => y-bar.
Note that there are NO such formulas:
$z = \dfrac{\bar{y} - \mu}{\sigma}$ vs. $z = \dfrac{y - \mu}{\sigma/\sqrt{n}}$
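The difference between the two valid formulas shows up clearly in a probability calculation. A minimal sketch, assuming the IQ population used elsewhere in these slides (mean 100, SD 15); the cut-off 106 and the sample size 25 are hypothetical:

```python
import math
from statistics import NormalDist

MU, SIGMA = 100, 15   # IQ example: mean 100, SD 15

def z_single(y):
    """z for ONE individual value y: (y - mu) / sigma."""
    return (y - MU) / SIGMA

def z_mean(y_bar, n):
    """z for the MEAN of a sample of size n: (ybar - mu) / (sigma / sqrt(n))."""
    return (y_bar - MU) / (SIGMA / math.sqrt(n))

# Hypothetical question: what is the chance of exceeding 106?
p_individual = 1 - NormalDist().cdf(z_single(106))      # one person
p_sample_mean = 1 - NormalDist().cdf(z_mean(106, 25))   # mean of 25 people
print(round(p_individual, 4), round(p_sample_mean, 4))
```

The same cut-off gives a much smaller probability for the sample mean, because y-bar varies far less than a single y.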
Hints and Tips: 2-sample t and paired t

5. Paired t-test vs. 2-sample t-test
No rules!
1st clue: Different n1 & n2 => CANNOT be paired t-test; must be 2-sample t-test. If n1=n2 => either test is possible.
2nd clue: Ask yourself: Can I move the values of one variable without moving the corresponding values of the other variable?
Can move values of one variable => independent data => 2-sample t
Cannot move values of one variable (need to move BOTH variables) => dependent data => paired t
2nd clue, example from Lect 13: age difference between husband and wife.

Baby ID   Mother's age   Father's age
21        28             33
22        34             40
23        24             26
24        34             45
25        32             35
26        24             27
27        30             39
28        29             27
29        37             34
30        41             46

Can we swap the fathers of ages 33 and 46 WITHOUT moving the wives and the babies?
Move alone => independent => 2-sample
Move pairwise together => paired t

Hints and Tips: z and t tests

6. Z-test vs. t-test
Know the population standard deviation $\sigma$ => z-test
Do NOT know $\sigma$ => t-test
Clues:
* "It is known that SD=xxx" => likely $\sigma$ => z-test
* Given a numerical summary of data (MUST be a sample): n xxxx, mean xxxx, StDev xxxx. The SD from a data set (sample) MUST be s, never $\sigma$ => t-test
* Do watch out if both $\sigma$ and s are given. Once we have $\sigma$, s is useless => use z-test.
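For paired data, the whole analysis is done on the column of differences. The sketch below uses the ten couples from the baby table (Baby ID 21 to 30); taking the difference as father minus mother is my choice of direction, not something the slides specify.

```python
import math
from statistics import mean, stdev

# Ages from the baby table (Baby ID 21 to 30); each position is one couple.
mothers = [28, 34, 24, 34, 32, 24, 30, 29, 37, 41]
fathers = [33, 40, 26, 45, 35, 27, 39, 27, 34, 46]

# Paired data: work with the within-couple differences, not two separate samples.
differences = [f - m for f, m in zip(fathers, mothers)]
n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)
t = (d_bar - 0) / (s_d / math.sqrt(n))   # Ho: mu_d = 0
df = n - 1
print(f"mean difference = {d_bar}, t = {t:.2f}, df = {df}")
```

Shuffling the fathers' column on its own would change every difference, which is exactly why these data cannot be treated as two independent samples.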
Hints and Tips: CLT

7. This is a common mistake: "When sample size is large ($n \ge 25$), the sample is approximately normally distributed." The statement means that if we make a histogram of the sample ($n \ge 25$), then the histogram should be approximately bell-shaped. This is NOT CLT; it is WRONG! We know that as the sample size n becomes larger and larger, the sample histogram looks more and more like the population, which could be anything.
The correct statement of CLT is: "When sample size is large ($n \ge 25$), the sample mean (y-bar) is approximately normally distributed." This applies to the one-sample z or t test, the 2-sample t and the paired t-tests.
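A small simulation can make the correct statement of CLT concrete. This is a sketch under my own assumptions: the population is exponential with mean 1 (strongly right-skewed), samples have n = 25, and the seed and number of repeats are arbitrary.

```python
import random
from statistics import mean, stdev

random.seed(170)  # arbitrary seed so the simulation is repeatable

def sample_means(n, repeats):
    """Draw `repeats` samples of size n from a skewed population; return their means."""
    # Exponential population with mean 1 and SD 1: strongly skewed to the right.
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]

means = sample_means(n=25, repeats=2000)
print("mean of the sample means:", round(mean(means), 3))   # should be close to 1
print("SD of the sample means:", round(stdev(means), 3))    # should be close to 1/sqrt(25) = 0.2
```

Each individual sample of 25 still looks skewed like its population; it is the collection of y-bar values that is approximately normal, centred on the population mean with SD $\sigma/\sqrt{n}$.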
Tips and Hints: Which condition? $n\pi \ge 5$ and $n(1-\pi) \ge 5$, or $np \ge 5$ and $n(1-p) \ge 5$?
8. Lect 5 (prob calculation on p) or Lect 7 (z-test on $\pi$): $z = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$. Check $n\pi \ge 5$ and $n(1-\pi) \ge 5$.
Lect 6: CI for $\pi$: $p \pm 1.96\sqrt{\dfrac{p(1-p)}{n}}$. Check $np \ge 5$ and $n(1-p) \ge 5$.
Rule: p goes with p, $\pi$ goes with $\pi$; p NEVER goes together with $\pi$.
Note that although the above 2 formulas are in the formula sheet, the 2 corresponding conditions are not. You have to know which one is the correct condition to check.
Tips and Hints: pth percentile (including LQ, UQ)

9. Find pth percentile
(a) Given ANY sample of size n, use the formula n*p/100 (Lect 2). Then check whether the result is an integer or a non-integer etc.
Eg AGE: 12, 17, 28, 32, 33, 40, 40, 67 (MUST be sorted first!)
(b) Given a population (of infinite size) of known (given) $\mu$ and $\sigma$:
(i) Given normal: find z from the given area p (1-tailed), then find $y = \mu + \sigma z$.
Eg: It is known that IQ is normally distributed with mean 100 and SD 15. What is the 10th percentile? What is the LQ?
[Figure: normal curve with $\mu = 100$, $\sigma = 15$ and the pth percentile marked]
(ii) Non-normal (or unknown distribution): CANNOT do it!
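Case (b)(i) can be checked with Python's standard library, which plays the role of the 1-tailed normal table here. A minimal sketch for the IQ example; the function name `normal_percentile` is illustrative:

```python
from statistics import NormalDist

def normal_percentile(p, mu, sigma):
    """pth percentile of a normal population: y = mu + sigma * z."""
    z = NormalDist().inv_cdf(p / 100)   # z with area p% to its left (1-tailed)
    return mu + sigma * z

tenth = normal_percentile(10, mu=100, sigma=15)
lower_quartile = normal_percentile(25, mu=100, sigma=15)
print(round(tenth, 1), round(lower_quartile, 1))
```

Both z values are negative because the 10th and 25th percentiles lie below the mean; answers read from a printed table may differ slightly through rounding of z.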
Hints and Tips: Association

          Smoker   Non-smoker
Male      4        11
Female    5        8

10. WRONG: "No association/association between males and females." WRONG: "No association/association between smokers and non-smokers." (In fact, males, females, smokers and non-smokers are NOT variables.)
It should be: "Could be no association/There is an association between Sex and Smoking Status."

Hints and Tips: Writing conclusion when Ho is NOT rejected

11. There are many versions, hence students are confused. Eg in a 2-sample t-test:
(1) There could be no difference ... (there is no strong evidence to indicate otherwise.)
(2) There is probably no difference ...
(3) There is no significant difference ...
(4) There is no evidence to indicate a difference ...
All of the above are correct! Please stick to (1), which is easiest! (3) and (4) are double negatives, in which you may make mistakes, with (3) being terrible. Keep things simple!
Note that in (2) or (3), if you leave out "probably" or "significant", then "There is no difference" is wrong (accepting the null hypothesis).
Try the chi sq test of association:
Ho: There is NO association between X and Y
Suppose we do NOT reject Ho. Conclusion:
(1) There could be no association ... (there is no strong evidence to indicate otherwise.)
(2) There is probably no association ...
(3) There is no significant association ...
(4) There is no evidence of an association ...
Again, all of them are correct.

Writing conclusion in HATPC: the rules:

Eg 2-sample t: Ho: There is no difference in exam marks on average for boys and girls.
Eg chi sq test of association: Ho: There is NO association between X and Y
Eg regression: Ho: $\beta = 0$ (No relation between X and Y)
P-val<0.05 => Reject Ho
Negate (make opposite) Ho.
Be certain: use the verb "is".
Also give further info: "is greater/less than", "is longer/shorter" (eg one-sample or 2-sample t), except for chi sq.
P-val>0.05 => Do not reject Ho
Copy Ho.
Change the verb "is" to "could be".
Writing conclusion in HATPC: Example 1

Eg 2-sample t: Ho: There is no difference in exam marks on average between boys and girls.
P-val<0.05 => Reject Ho: There is a difference in average exam marks between boys and girls, with girls having a higher average than boys.
P-val>0.05 => Do not reject Ho: There could be no difference in average exam marks between boys and girls.
Writing conclusion in HATPC: Example 2

Eg chi sq test of association: Ho: There is NO association between sex and smoking status
P-val<0.05 => Reject Ho: There is an association between sex and smoking status.
P-val>0.05 => Do not reject Ho: There could be no association between sex and smoking status.
Hints and Tips: Symbols, their writings and meanings

12. Last, but not least, MANY students have lots of problems here. Surprisingly, it is not much more difficult than Primary 1!!!
(a) Confusion of symbols of similar meanings:
1st yr Uni, STAT170: p = sample proportion, $\pi$ = population proportion; $\bar{y}$ and $\mu$; s and $\sigma$. Confusing p with $\pi$, $\bar{y}$ with $\mu$, or s with $\sigma$ will cost you dearly in the exam.
Primary 1: This is my book; this is your book; this is Mary's book. My book, your book and Mary's book are not the same. You will be in big trouble if you regard Mary's money as the same as yours.

Hints and Tips: Symbols, their writings and meanings

(b) Confusion of look-alike symbols:
Primary 1: i, j; g, p, q; a, o, e, c; d, b; h, k; m, n; l, 1; u, v; z, 2
1st yr Uni, STAT170: $\mu$ and u; _ and _; $\beta$ and B; _ and w; $\Sigma$ and E
Which is more difficult? Surprisingly, students find the symbols in STAT170 more difficult than the 26 English letters in Primary 1. If you have problems in the left column, you will be in big trouble. You will NOT lose just a few marks, but many!
Predicting the future

The following happened in past semesters without exception, and WILL likewise occur this semester (prob=0.99999):
1. Someone will write u instead of $\mu$.
2. Someone will copy a sample report (from past exam papers or Tute 8) onto the crib (pink) sheet.
3. Someone will leave the whole page blank on the hypothesis test on slope in regression, which is the easiest HATPC.
4. Someone will not know the meaning of r2.
5. Someone will write "There is an association between males and females." This makes no sense at all.

Predicting the future

6. Someone will write $z = \dfrac{\bar{y} - \mu}{\sigma}$ and $z = \dfrac{y - \mu}{\sigma/\sqrt{n}}$.
7. Someone will use this formula for 2-sample t or paired t: $t = \dfrac{\bar{y}_d - \mu_d}{s_d/\sqrt{n}} = \dfrac{(\bar{y}_1 - \bar{y}_2) - 0}{(s_1 - s_2)/\sqrt{n}}$
Ask yourself
How many hours did I spend on STAT170 each week, on average? Macquarie University recommends (minimum): 3 credit points * 3 hours = 9 hours = 4 hours in class + 5 hours on your own at home Every WEEK.

Profile of students who fail: Failure check list

The following are common characteristics of those who fail:
Low class attendance
Did not study on a weekly basis
No/few attempts at online quizzes
# Can do at most one hypothesis test in exam
# Cannot do t-test on regression
# Cannot count how many variables
# Cannot distinguish between categorical and numerical
Do not know parameters vs statistics
Do not know the symbols
Mix up p and $\pi$, y-bar and $\mu$, s and $\sigma$

Failure check list (continued!)

Did not do the exercises on the tutorial sheets
Gave up assignment(s)
Do not know how to use a calculator to find SD
Low marks in Practical Test
Copy past exam solutions, word by word, onto crib sheet
Copy report(s), word by word, onto crib sheet
Do not read past exam papers

How many ticks do you have in the above list? ____
Unfortunately, even just ONE tick, eg "Can do at most one hypothesis test", can (and will) mean a fail!
Note: # = fatal
