Applied Biostatistics Project

APPLIED BIOSTATISTICS PROJECT
ASSIGNMENT# 1 (SEMESTER SPRING-2019)
Submission Date (APRIL 4, 2019)
BY
MAIMOONA KANWAL
ROLL # 17111714-010
ZOO-531 (Applied Biostatistics)
MPhil Zoology (A)
Submitted To
Mr. Zaheer Abbas
Department of Zoology
UNIVERSITY OF GUJRAT

The first dataset was collected from different hospitals of district Gujrat. It consists of four
nominal, two ordinal and three scale variables. It records the area of medicine, the rank of doctors
in that area and their monthly income. It also records how many times they preferred
operations and how many of those operations led to death.
1: Univariate analysis:
Univariate analysis uses only one variable, and all statistical measures are computed on that variable alone.
Qualitative data
Qualitative data describes the quality of something, such as color. It is a categorical
measurement expressed not in terms of numbers, but by means of a natural language
description.
Using frequency to study nominal data:
In univariate analysis, a single nominal variable is selected. I selected the various
departments in different hospitals and checked which department is visited more than the others
by patients in a month.
For qualitative data, frequency tables are formed.
Departments
                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Cardiotherapy        4        13.3        13.3              13.3
        Gynaecology         14        46.7        46.7              60.0
        Neurology            3        10.0        10.0              70.0
        Oncology             1         3.3         3.3              73.3
        Urology              8        26.7        26.7             100.0
        Total               30       100.0       100.0
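The percentage columns of the table above can be recomputed from the counts alone. The following is a minimal sketch in Python (not part of the original SPSS analysis); the variable names are chosen here for illustration only.

```python
# Department counts taken from the frequency table above.
counts = {
    "Cardiotherapy": 4,
    "Gynaecology": 14,
    "Neurology": 3,
    "Oncology": 1,
    "Urology": 8,
}

total = sum(counts.values())

rows = []
running = 0
for department, frequency in counts.items():
    running += frequency
    # Percent = share of this category; cumulative = share of all rows so far.
    percent = round(100 * frequency / total, 1)
    cumulative = round(100 * running / total, 1)
    rows.append((department, frequency, percent, cumulative))

for department, frequency, percent, cumulative in rows:
    print(f"{department:<14}{frequency:>4}{percent:>8.1f}{cumulative:>8.1f}")
```

Because there are no missing values, the valid percent equals the percent, so it is not computed separately.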
Interpretation:
The Frequency column shows how many observations are present in each category. It
indicates that the sample consists of 30 observations and that the gynaecology department is visited
most by patients, while the oncology department shows the lowest value. The Percent
column gives the percentage of observations in each category out of all observations; the
gynaecology department has the highest percentage (46.7%). The Valid Percent column
displays the percentage of observations in each category out of the total number of nonmissing
responses. As our data has no missing values, it is the same as the Percent column. You can verify
the proportion for each group by dividing its count in the "Frequency" column by the value of
"Total" that appears after the last valid category.
Pie chart
Interpretation:
The largest portion of the pie chart is covered by the gynaecology department, so the highest
number of patients visited this department. It is followed by the urology department, while the smallest portion
is covered by the oncology department.
Ordinal data:
To study ordinal data, the patients' visit response is selected, which is an ordinal variable. A bar
graph is used to display qualitative data.
Bar Graph
The bar graph shows that patients are highly satisfied with the hospitals.
Scale data
In this data, income from medicines sold in the last week is a scale variable. No frequency table
is made for scale data; it is shown by a histogram, which is used for quantitative data. While
studying quantitative data, the mean, median, variance, standard deviation and skewness are also
studied.
Statistics
Preference for operation in last month
N                         Valid      30
                          Missing    0
Mean                                 4.90
Median                               5.00
Std. Deviation                       1.242
Variance                             1.541
Skewness                             -.030
Std. Error of Skewness               .427
Kurtosis                             -.879
Std. Error of Kurtosis               .833
Minimum                              3
Maximum                              7
Percentiles               25         4.00
                          50         5.00
                          75         6.00
Interpretation:
The statistics table shows that there are 30 valid values and no missing values. The center of the distribution can be
approximated by the median (or second quartile), 5, which means that half of the values lie at or above
5 and half lie at or below 5. The maximum value is 7 while the minimum value is 3. The mean (4.90)
is close to the median, suggesting that the distribution is symmetric, which is confirmed by the
small skewness value (-0.030). The kurtosis (-0.879) lies within about one standard error (0.833) of zero, so it does not indicate a departure from normality.
The histogram shows a roughly bell-shaped curve, which is consistent with normally distributed data.
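The standard errors in the statistics table depend only on the sample size, so they can be checked by hand. The sketch below uses the usual large-sample formulas for the standard error of skewness and kurtosis, and the ratio of each statistic to its standard error as a rough normality check. The cut-off of 1.96 is a common rule of thumb and is an assumption here, not something stated in the original output.

```python
import math


def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 30
skewness, kurtosis = -0.030, -0.879  # values reported in the table above

z_skew = skewness / se_skewness(n)
z_kurt = kurtosis / se_kurtosis(n)

print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))
print(round(z_skew, 2), round(z_kurt, 2))
# Rule of thumb (assumption): |z| < 1.96 gives no evidence against normality.
print(abs(z_skew) < 1.96 and abs(z_kurt) < 1.96)
```

Both ratios fall inside the range of -1.96 to 1.96, which agrees with the interpretation above.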
2: Bivariate analysis:
This type of analysis consists of two variables.
Crosstab:
A crosstab is also called a contingency table. It shows the joint frequency distribution of two categorical variables, from which an association between them can be assessed.
Gender * Smoking ratio Crosstabulation
Count
                    Smoking ratio      Total
                    Yes       No
Gender   Male        19        5        24
         Female       9       15        24
Total                28       20        48
There are 48 observations in total, equally divided into males and females. Gender is
taken as the row variable because it is independent in nature, while smoking is taken as the
dependent variable. The data show that the smoking ratio is higher in male respondents:
19 of the 24 males are smokers, while only 9 of the 24 females are smokers.
Chi square test
It is used to check for an association between two categorical variables measured on a random sample.
Hypothesis:
Ho: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.
Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             8.571a    1    .003
Continuity Correctionb         6.943     1    .008
Likelihood Ratio               8.884     1    .003
Fisher's Exact Test                                                   .008                   .004
Linear-by-Linear Association   8.393     1    .004
N of Valid Cases               48
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.00.
b. Computed only for a 2x2 table
Interpretation:
The value of the test statistic is 8.571. The footnote for this statistic pertains to the expected cell
count assumption (i.e., expected cell counts are all greater than 5): no cells had an expected
count less than 5, so this assumption was met. Because the test statistic is based on a 2x2
crosstabulation table, the degrees of freedom (df) for the test statistic is 1. The corresponding
p-value of the test statistic is p = 0.003. Since the p-value is smaller than our chosen significance
level (α = 0.05), we reject the null hypothesis and conclude that there is enough
evidence to suggest an association between gender and smoking behavior.
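The Pearson and continuity-corrected statistics can be reproduced from the four cell counts of the Gender * Smoking crosstab. This is a minimal sketch using only the Python standard library; it relies on the fact that, for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(x / 2)). The function name `p_value_df1` is introduced here for illustration.

```python
import math

# Observed counts: rows = Male, Female; columns = smoker Yes, No.
observed = [[19, 5], [9, 15]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence = row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [(i, j) for i in range(2) for j in range(2)]
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j] for i, j in cells
)
# Yates continuity correction (SPSS reports it only for 2x2 tables).
yates = sum(
    (abs(observed[i][j] - expected[i][j]) - 0.5) ** 2 / expected[i][j]
    for i, j in cells
)


def p_value_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


print(round(chi_square, 3), round(p_value_df1(chi_square), 3))
print(round(yates, 3), round(p_value_df1(yates), 3))
print(min(min(row) for row in expected))
```

The smallest expected count is 10, which matches footnote a of the table.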
Relative risk
Risk ratio (RR) or relative risk is the ratio of the probability of an outcome in an exposed
group to the probability of an outcome in an unexposed group.
Gender * Drinktaken Crosstabulation
Count
                   Drinktaken     Total
                   yes     No
Gender   Male      1       1      2
         Female    1       1      2
Total              2       2      4
Risk Estimate
                                          Value    95% Confidence Interval
                                                   Lower    Upper
Odds Ratio for Gender (Male / Female)     1.000    .020     50.397
For cohort Drinktaken = yes               1.000    .141     7.099
For cohort Drinktaken = no                1.000    .141     7.099
N of Valid Cases                          4
Interpretation:
As the risk ratio is 1, it suggests no difference in risk (the incidence in each
group is the same). With only four cases, the confidence interval (.141 to 7.099) is very wide, so no firm conclusion can be drawn.
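The values in the Risk Estimate table can be checked with the standard log-scale (Wald) confidence intervals for an odds ratio and a risk ratio. The sketch below assumes this is the method behind the reported limits; the function names and the cell labels a, b, c, d are introduced here for illustration.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def odds_ratio_ci(a, b, c, d):
    """a, b: exposed with / without outcome; c, d: unexposed with / without."""
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = estimate * math.exp(-Z_95 * se_log)
    high = estimate * math.exp(Z_95 * se_log)
    return estimate, low, high


def risk_ratio_ci(a, b, c, d):
    """Risk in the exposed row divided by risk in the unexposed row."""
    estimate = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    low = estimate * math.exp(-Z_95 * se_log)
    high = estimate * math.exp(Z_95 * se_log)
    return estimate, low, high


# Gender * Drinktaken table: every cell holds one observation.
odds = odds_ratio_ci(1, 1, 1, 1)
risk = risk_ratio_ci(1, 1, 1, 1)
print([round(v, 3) for v in odds])
print([round(v, 3) for v in risk])
```

With one observation per cell the standard errors are large, which is why both intervals are so wide.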

Hormone use * Status Crosstabulation
Count
                     Status                Total
                     uninfected    light
Hormone use   yes    1             1       2
              no     1             1       2
Total                2             2       4
Risk Estimate
                                          Value    95% Confidence Interval
                                                   Lower    Upper
Odds Ratio for Hormone use (yes / no)     1.000    .020     50.397
For cohort Status = uninfected            1.000    .141     7.099
For cohort Status = light                 1.000    .141     7.099
N of Valid Cases                          4
Interpretation:
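OR = 1: exposure (hormone use) does not affect the odds of the outcome. The confidence interval (.020 to 50.397) is very wide because there are only four cases.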
3: Correlation
Assumptions:
Independence of cases: cases should be independent of each other.
Linear relationship: the two variables should be linearly related to each other. This can be
assessed with a scatterplot: plot the values of the variables on a scatter diagram, and check whether the plot
yields a relatively straight line.
Homoscedasticity: the residuals scatterplot should be roughly rectangular-shaped.
Hypothesis:
Ho: There is no linear correlation between Variable A and Variable B (ρ = 0).
Ha: There is a linear correlation between Variable A and Variable B (ρ ≠ 0).
Correlations
                                   height    deadspace
height      Pearson Correlation    1         .846**
            Sig. (2-tailed)                  .000
            N                      15        15
deadspace   Pearson Correlation    .846**    1
            Sig. (2-tailed)        .000
            N                      15        15
**. Correlation is significant at the 0.01 level (2-tailed).
Interpretation:
The table shows that the Pearson correlation between height and deadspace is 0.846. This
number is close to 1, so there is a strong positive linear relationship between the two variables:
changes in one variable are strongly associated with changes in the other. The two-tailed
significance value is .000 (p < 0.001), so the null hypothesis of no correlation is rejected.
However, we cannot conclude from this number alone that one variable causes the
other.
The scatter plot indicates that there is a positive correlation in our data: as one variable increases,
the other also increases.
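The significance of a Pearson correlation can be checked from r and n alone through the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), which has n - 2 degrees of freedom. The sketch below uses the reported r = 0.846 and n = 15; the critical value 3.012 is the tabulated two-tailed 0.01 value for 13 degrees of freedom, and the names are chosen here for illustration.

```python
import math


def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r, n = 0.846, 15
t_stat = correlation_t(r, n)
r_squared = r * r  # share of variance the two variables have in common

T_CRIT_01_DF13 = 3.012  # two-tailed 0.01 critical value for df = 13 (t table)
significant_at_01 = abs(t_stat) > T_CRIT_01_DF13

print(round(t_stat, 2), round(r_squared, 3), significant_at_01)
```

The t statistic is well above the critical value, in line with the ** flag in the SPSS table.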
4: Linear regression
Simple linear regression is a statistical method that allows us to summarize and study the
relationship between two continuous (quantitative) variables. Here systolic bp is the dependent variable and age is the predictor.
Descriptive Statistics
              Mean      Std. Deviation   N
systolic bp   106.438   3.7589           16
Age           13.500    2.3664           16
Model Summaryb
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .693a   .481       .444                2.8040                       .481              12.955     1     14    .003
a. Predictors: (Constant), Age
b. Dependent Variable: systolic bp
ANOVAa
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   101.860          1    101.860       12.955   .003b
    Residual     110.077          14   7.863
    Total        211.938          15
a. Dependent Variable: systolic bp
b. Predictors: (Constant), Age
Interpretation:
R-squared is a statistical measure of how close the data are to the fitted regression line. The R square
value of 0.481 indicates that nearly half of the variation in systolic bp is explained by
age.
The ANOVA table gives a significance value of 0.003, which is less than 0.05, so we can reject the null hypothesis.
A histogram of the residuals is used to check normality. Its bell shape suggests that the residuals are
approximately normally distributed.
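The figures in the Model Summary follow directly from the sums of squares in the ANOVA table. The sketch below recomputes them; the function name `regression_summary` is introduced here for illustration and is not part of SPSS.

```python
import math


def regression_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Derive the model summary values from an ANOVA decomposition."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "r_squared": ss_regression / ss_total,
        # Adjusted R square compares mean squares instead of sums of squares.
        "adj_r_squared": 1 - ms_residual / (ss_total / df_total),
        "f": ms_regression / ms_residual,
        "std_error": math.sqrt(ms_residual),
    }


# Systolic bp regressed on age: values from the ANOVA table above.
simple = regression_summary(101.860, 110.077, 1, 14)
for name, value in simple.items():
    print(f"{name}: {value:.4f}")
```

The results agree with the reported R Square (.481), Adjusted R Square (.444), F (12.955) and standard error of the estimate (2.8040) up to rounding.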
Multiple linear regression:
Several variables, denoted x1, x2, ..., are regarded as the predictor, explanatory, or independent
variables.
The other variable, denoted y, is regarded as the response, outcome, or dependent variable. Here SBP is the response and Age, Weight, Height and DBP are the predictors.
Model Summaryb
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .975a   .950       .932                .9821                        1.473
a. Predictors: (Constant), Age, Weight, Height, DBP
b. Dependent Variable: SBP
The R square value is 0.950, which means that our model explains nearly all (95%) of the variation in
SBP.
ANOVAa
Model            Sum of Squares   df   Mean Square   F        Sig.
1   Regression   201.329          4    50.332        52.189   .000b
    Residual     10.609           11   .964
    Total        211.938          15
a. Dependent Variable: SBP
b. Predictors: (Constant), Age, Weight, Height, DBP
The ANOVA table gives a significance value of 0.000, which is less than 0.05, so
we reject our null hypothesis and accept the alternate one.
Normality test:
5: Comparison of variables:
To compare the means of two groups we applied the t test. In our analysis, we applied the
independent samples t test.
Independent sample t test:

For the independent samples t test we compare systolic blood pressure and kidney
function. For this purpose we take blood pressure as the grouping variable and hypertension
as the test variable.
Formulation of hypothesis:
Following are the null and alternative hypotheses of our test.
H0: there is no difference in the mean of the test variable between the two groups.
H1: there is a difference in the mean of the test variable between the two groups.
The decision depends on the p value: if it is less than
0.05 the null hypothesis is rejected; otherwise we fail to reject it.
Assumptions:
Normal distribution
Independence
No outliers
H0: µ1 = µ2 (“the two population means are equal”)
H1: µ1 ≠ µ2 (“the two population means are not equal”)
Independent Samples Test
Columns F and Sig. belong to Levene's Test for Equality of Variances; the remaining columns belong to the t-test for Equality of Means.
                                             F       Sig.   t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
heart attack   Equal variances assumed       3.276   .095   2.097   12       .058              19.750            9.418                   -.771          40.271
               Equal variances not assumed                  2.999   11.998   .011              19.750            6.586                   5.400          34.100
Interpretation:
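Levene's test gives p = 0.095, which is greater than 0.05, so equal variances can be assumed and the first row of the table is read. For that row t = 2.097 with 12 degrees of freedom and p = 0.058, which is greater than 0.05, so we fail to reject H0: the difference between the two group means (19.750) is not
statistically significant. The 95% confidence interval (-0.771 to 40.271) includes zero, which agrees with this conclusion.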
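The reading order of the Independent Samples Test table (Levene's test first, then the matching t-test row) can be written as a short decision rule. This sketch uses the values reported in the table; the critical value 2.179 is the tabulated two-tailed 5% value for 12 degrees of freedom, and the variable names are illustrative.

```python
ALPHA = 0.05
T_CRIT_DF12 = 2.179  # two-tailed 5% critical value for df = 12 (t table)

# Values reported in the Independent Samples Test table.
levene_p = 0.095
mean_difference = 19.750
se_equal_var = 9.418
p_equal_var = 0.058
p_unequal_var = 0.011

# Step 1: Levene's test decides which row of the table to read.
equal_variances = levene_p > ALPHA
p_value = p_equal_var if equal_variances else p_unequal_var

# Step 2: recompute t and the 95% interval for the equal-variance row.
t_stat = mean_difference / se_equal_var
ci_low = mean_difference - T_CRIT_DF12 * se_equal_var
ci_high = mean_difference + T_CRIT_DF12 * se_equal_var

# Step 3: decision about H0 (equal population means).
reject_null = p_value < ALPHA

print(equal_variances, round(t_stat, 3), reject_null)
print(round(ci_low, 2), round(ci_high, 2))
```

Because the interval contains zero, the decision from the interval and the decision from the p value agree.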
Statistics Project After mid
Introduction to some terms
Experimental Unit:
It is the smallest unit of experimental material to which a treatment is applied independently of the other units.
For example, if we are going to check the effect of a growth hormone on animals, each animal will be an
experimental unit.
Factor, level and treatments:
A factor is an explanatory variable in the experiment that is controlled by the researcher. It
is divided into two or more levels, and a combination of factor levels is termed a treatment. For
example, urea is a fertilizer; its various concentrations are termed levels, and if we combine them
with the concentrations of some other fertilizer, each combination is termed a treatment.
Experimental error:
It is the difference between the measured value and the true value that is due to random
factors. It is described by accuracy and precision. Accuracy is the closeness of a
measured value to the true value. Precision is the closeness of repeated measurements to
each other.
CRD (Completely Randomized design)
Definition:
A design in which treatments are allocated completely at random to experimental units that are
homogeneous in nature, without any grouping (blocking) of the units.
Problem Statement
In this experiment, different types of growth hormones are used to check the growth
rate of fruit plants. It is compared with control group to choose the most effective
growth hormone.
Worksheet column     Description                                        Variable Type
Growth Hormones      Auxin, Gibberellins, Cytokines, Ethylene, None     Factor
Plant growth rate    Growth rate of fruit plants                        Response
HYPOTHESIS
• Null hypothesis: All treatment means are equal.
• Alternative hypothesis: At least one mean is different from the others.
The decision depends on the p-value: if the p-value
is less than 0.05 the null hypothesis is rejected, and if the p-value is greater than 0.05
we fail to reject the null hypothesis.
EXPERIMENTAL MATERIAL
1. Experimental unit- plants
2. Sample size = 200 (50 plants for each of the four hormones, as shown in the output)
3. Treatment- levels of the growth hormone factor
4. Response- plant growth rate
LOCAL CONTROL
• Local control means the control of all factors except the ones which we
are investigating.
• Here we are checking the effect of growth hormones on plant growth rate, so
all other factors that can affect it, such as temperature, are kept controlled.
ANALYSIS
• Stat – ANOVA – One-Way ANOVA – Response – Factor – Confidence level – Graphs
(check box plots and four in one) – Click OK
Among the four-in-one plots, the normal probability plot and the histogram show that the
residuals are normally distributed: in the normal probability plot the residuals must
follow a straight line, which they do for our data. In the versus fits plot, roughly the
same number of residuals lie on both sides of 0 with a similar spread, so it is reasonable
to say that the residuals have constant variance. The last graph, the versus order plot,
shows no pattern, so the data are
without autocorrelation.
Results Interpretation:
Factor             Levels   Values
Growth hormones    4        Auxins, Cytokines, Ethylene, Gibberellins
Source             DF    Adj SS   Adj MS   F-Value   P-Value
Growth hormones    3     1321     440.26   17.75     0.000
Error              196   4861     24.80
Total              199   6182
The p value is 0.000, which is below our significance level of 0.05, so the null hypothesis is rejected and we
accept the alternate hypothesis, which states that at least one treatment mean differs from the others.
Model Summary
S R-sq R-sq(adj) R-sq(pred)
4.98011 21.37% 20.16% 18.12%
R Square:
R square values lie between 0 and 100%. R square tells how much of the variation in the response the model
explains. Its value indicates that our model explains 21.37% of the variation in plant growth rate.
Pred. R square
It tells how well the model predicts new observations; in our case it explains about 18.12% of the variation in new observations.
Means
Growth hormones N Mean StDev 95% CI
Auxins 50 18.420 5.091 (17.031, 19.809)
Cytokines 50 25.160 4.287 (23.771, 26.549)
Ethylene 50 19.540 5.023 (18.151, 20.929)
Gibberellins 50 20.380 5.447 (18.991, 21.769)
A 95% confidence interval means that if we repeated our experiment many times and computed an interval each time, about 95% of
those intervals would contain the true mean. For auxins the interval for the mean growth rate is (17.031, 19.809), for
cytokines it is (23.771, 26.549), for ethylene it is (18.151, 20.929) and for gibberellins it is
(18.991, 21.769). The cytokines interval does not overlap any of the other three.
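The one-way ANOVA table and the confidence intervals can be rebuilt from the group sizes, means and standard deviations in the Means table. The following is a minimal sketch under the assumption that the intervals use the pooled standard deviation, as the identical interval widths suggest. The critical value 1.972 is the approximate two-sided 5% t value for 196 degrees of freedom, and all names are illustrative.

```python
import math

# (n, mean, standard deviation) for each hormone, from the Means table.
groups = {
    "Auxins": (50, 18.420, 5.091),
    "Cytokines": (50, 25.160, 4.287),
    "Ethylene": (50, 19.540, 5.023),
    "Gibberellins": (50, 20.380, 5.447),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-group and within-group sums of squares.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
ms_within = ss_within / df_within
f_value = (ss_between / df_between) / ms_within
pooled_sd = math.sqrt(ms_within)

T_CRIT_DF196 = 1.972  # approximate two-sided 5% critical value, df = 196


def confidence_interval(n, mean):
    """95% interval for a group mean based on the pooled standard deviation."""
    half_width = T_CRIT_DF196 * pooled_sd / math.sqrt(n)
    return mean - half_width, mean + half_width


print(round(ss_between), round(ss_within), round(f_value, 2))
for name, (n, mean, _) in groups.items():
    low, high = confidence_interval(n, mean)
    print(f"{name:<13}({low:.3f}, {high:.3f})")
```

Small differences in the last decimal place come from the rounding of the means and standard deviations in the table.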
Tukey Pairwise Comparisons

Grouping Information Using the Tukey Method and 95% Confidence
Growth hormones N Mean Grouping

Cytokines 50 25.160 A
Gibberellins 50 20.380 B
Ethylene 50 19.540 B
Auxins 50 18.420 B
Means that do not share a letter are significantly different.

It means that the mean of cytokines is significantly different from all others, while the means of gibberellins, ethylene and auxins do not differ significantly from each other.
RCBD (RANDOMIZED COMPLETE BLOCK DESIGN)
In this example, a person is checking the effect of various fertilizers on plant growth. He chooses a
field on a mountain slope stepping down towards flat land, which has a high level of nitrogen on the mountain side
and a low level of nitrogen on the land side, so nitrogen level can act as the blocking factor.
Fertilizers (treatment variable):
4 different types of fertilizers; A, B, C & D were used.
Nitrogen levels (blocking variable):
Total 16 observations, eight from each block (Low level of N and high level of N)
Plants height (response variable)
HYPOTHESIS
1. Null hypothesis: All treatment means are equal.
1. Alternative hypothesis: At least one treatment mean is different.
2. Null hypothesis: All block means are equal.
2. Alternative hypothesis: The block means are not equal.
3. Null hypothesis: Treatments and blocks do not interact.
3. Alternative hypothesis: Treatments and blocks interact.
The decision depends on the p-value: if the p-value is less than 0.05 the
null hypothesis is rejected, and if the p-value is greater than 0.05 we fail to reject the null hypothesis.
EXPERIMENTAL MATERIAL
1. Experimental unit = Plants
2. Sample size = 16
3. Treatment- Fertilizers (A, B, C, D)
   1 A
   2 B
   3 C
   4 D
4. Blocking- Nitrogen level
   1 High N level
   2 Low N level
5. Response- Plant height.
Local Control
• Here we checked the effect of different fertilizers on plant height. Only the type of fertilizer is
changed; all other factors, such as pH, temperature and oxygen concentration, are kept constant.
Analysis
• Stat – ANOVA – General Linear Model – Fit General Linear Model – Response (height) –
Factors (block, fertilizer) – Model (select the factors & click
ADD) – Options & Stepwise (by default) – Graphs (check four in one) – Storage (check fits and
residuals) – Click OK
General Linear Model: height versus block, fertilizer
Method
Factor coding   (-1, 0, +1)
Factor Information
Factor   Type    Levels   Values
block    Fixed   2        high N, Low N
ferti    Fixed   4        0, 1, 2, 3
There is one block factor with two levels (low and high), while the fertilizer factor has four levels.
Analysis of Variance
Source         DF   Adj SS    Adj MS    F-Value   P-Value
block           1   11.560    11.5600   11.56     0.009
ferti           3   158.942   52.9808   52.98     0.000
block*ferti     3   0.235     0.0783    0.08      0.970
Error           8   8.000     1.0000
Total          15   178.737
The p value for the block factor (0.009) is smaller than 0.05, so the null hypothesis of equal block means
is rejected. For fertilizer the p value is 0.000, so the null hypothesis is also rejected and
at least one fertilizer mean differs from the others. The block*ferti interaction is not significant (p = 0.970), so we fail to reject the null hypothesis that treatments and blocks do not interact.
Model Summary
S   R-sq     R-sq(adj)   R-sq(pred)
1   95.52%   91.61%      82.10%
The R-sq value is close to 100%, which means our model explains nearly all (95.52%) of the variation in our data.
Coefficients
Term            Coef     SE Coef   T-Value   P-Value   VIF
Constant        25.187   0.250     100.75    0.000
Block
  high N        -0.850   0.250     -3.40     0.009     1.00
Ferti
  0             -4.937   0.433     -11.40    0.000     1.50
  1             2.087    0.433     4.82      0.001     1.50
  2             -0.438   0.433     -1.01     0.342     1.50
block*ferti
  high N 0      0.100    0.433     0.23      0.823     1.50
  high N 1      -0.175   0.433     -0.40     0.697     1.50
  high N 2      -0.050   0.433     -0.12     0.911     1.50
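The coefficients table lists one coefficient fewer than the number of levels of each factor. Under (-1, 0, +1) factor coding the coefficients of a factor sum to zero, so the missing level can be recovered and fitted cell means can be rebuilt. The sketch below illustrates this with the reported coefficients; the dictionary layout and the function name `fitted_mean` are assumptions made for the example.

```python
constant = 25.187
block_coef = {"high N": -0.850}
ferti_coef = {"0": -4.937, "1": 2.087, "2": -0.438}
interaction_coef = {
    ("high N", "0"): 0.100,
    ("high N", "1"): -0.175,
    ("high N", "2"): -0.050,
}

# With (-1, 0, +1) coding, the omitted level is minus the sum of the others.
block_coef["Low N"] = -sum(block_coef.values())
ferti_coef["3"] = -sum(ferti_coef.values())

# Interaction coefficients sum to zero over each factor as well.
interaction_coef[("high N", "3")] = -sum(interaction_coef.values())
for level in ferti_coef:
    interaction_coef[("Low N", level)] = -interaction_coef[("high N", level)]


def fitted_mean(block, fertilizer):
    """Fitted plant height for one block and fertilizer combination."""
    return (
        constant
        + block_coef[block]
        + ferti_coef[fertilizer]
        + interaction_coef[(block, fertilizer)]
    )


print(round(ferti_coef["3"], 3))
print(round(fitted_mean("high N", "0"), 3))
print(round(fitted_mean("Low N", "3"), 3))
```

The constant is the overall mean, and each coefficient is the deviation of a level (or cell) from it.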
Design of experiments
2^k factorial
These designs are created to study a large number of factors, with each factor having the minimal
number of levels, just two. The levels are termed high and low, coded +1 and -1, to describe each
factor. The factors may be qualitative or quantitative in nature.
Description
In this example, I chose two factors, each having two levels:
Blood pressure: Low, High (75, 130)
Glucose Uptake: Low, High (5, 10)
Response:
Diabetes value as affected by these two factors
Hypothesis:
Null hypothesis:
There is no significant effect of blood pressure and glucose uptake on the diabetes value.
Alternate hypothesis:
There is a significant effect of blood pressure and glucose uptake on the diabetes value.
Pathway:
STAT-----DOE---Factorial-------Create Factorial-----Create design by selecting 2K----Add response
values--------Stat--------DOE------Factorial-----Analyze factorial--------select all graphs----click OK
Factorial Regression: diabetes versus blood pressure, glucose uptake
Source                              DF   Adj SS    Adj MS    F-Value   P-Value
Model                                3   6370.38   2123.46   112.50    0.000
  Linear                             2   6160.25   3080.13   163.19    0.000
    blood pressure                   1   5565.13   5565.13   294.84    0.000
    glucose uptake                   1   595.13    595.13    31.53     0.005
  2-Way Interactions                 1   210.13    210.13    11.13     0.029
    blood pressure*glucose uptake    1   210.13    210.13    11.13     0.029
Error                                4   75.50     18.88
Total                                7   6445.88
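The p value for the model is 0.000, which is below 0.05, so we reject the null hypothesis and accept the alternate hypothesis. Both main effects (p = 0.000 and p = 0.005) and their interaction (p = 0.029) are significant.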
Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
4.34454   98.83%   97.95%      95.31%
The model explains 98.83% of the variation in the data, as shown by R-sq.

Coded Coefficients
Term Effect Coef SE Coef T-Value P-Value VIF
Constant 202.38 1.54 131.75 0.000
blood pressure 52.75 26.38 1.54 17.17 0.000 1.00
glucose uptake 17.25 8.62 1.54 5.62 0.005 1.00
blood pressure*glucose uptake 10.25 5.13 1.54 3.34 0.029 1.00
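In a two-level design each Effect is twice the coded coefficient, and predictions are made after converting factor values to the coded -1 to +1 scale. The sketch below illustrates both points with the reported coefficients and the factor levels given in the description; the names `to_coded` and `predict` are introduced here for illustration.

```python
# Low and high settings of each factor, from the design description.
LEVELS = {"blood pressure": (75, 130), "glucose uptake": (5, 10)}

# Coded coefficients from the table above.
COEF = {
    "constant": 202.38,
    "blood pressure": 26.38,
    "glucose uptake": 8.62,
    "interaction": 5.13,
}


def to_coded(factor, value):
    """Map an uncoded factor value to the -1 ... +1 coded scale."""
    low, high = LEVELS[factor]
    centre = (low + high) / 2
    half_range = (high - low) / 2
    return (value - centre) / half_range


def predict(blood_pressure, glucose_uptake):
    """Predicted diabetes value from the coded regression equation."""
    x1 = to_coded("blood pressure", blood_pressure)
    x2 = to_coded("glucose uptake", glucose_uptake)
    return (
        COEF["constant"]
        + COEF["blood pressure"] * x1
        + COEF["glucose uptake"] * x2
        + COEF["interaction"] * x1 * x2
    )


# Effect = change in response when a factor moves from low (-1) to high (+1).
effects = {name: 2 * coef for name, coef in COEF.items() if name != "constant"}

print({name: round(value, 2) for name, value in effects.items()})
print(round(predict(130, 10), 2), round(predict(75, 5), 2))
```

The doubled coefficients differ from the reported effects only in the second decimal place, because the coefficients in the table are rounded.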
Fractional Factorial Design
There are seven factors, each having two levels, and the response is the yield of
plants. A full factorial would need 2^7 = 128 runs, so I decided to apply a 1/16 fractional factorial design, which needs only 8 runs per replicate.
A Temperature
B pressure
C Humidity
D Soil fertility
E water level
F wind level
G fertilizer concentration
Hypothesis:
Null hypothesis:
There is no significant effect of the seven factors on the yield of plants.
Alternate hypothesis:
There is a significant effect of the seven factors on the yield of plants.
Pathway
DOE---Factorial--- Create Factorial design---- Select fractional factorial 1/16----click ok
Add yield as response and again repeat the steps by choosing analyze factorial design.
Results:
Factors: 7 Base Design: 7, 8 Resolution: III

Runs: 16 Replicates: 2 Fraction: 1/16
Blocks: 1 Center pts (total): 0
Design Generators: D = AB, E = AC, F = BC, G = ABC
Alias Structure
I + ABD + ACE + AFG + BCF + BEG + CDG + DEF + ABCG + ABEF + ACDF + ADEG + BCDE + BDFG + CEFG + ABCDEFG
A + BD + CE + FG + BCG + BEF + CDF + DEG + ABCF + ABEG + ACDG + ADEF + ABCDE + ABDFG + ACEFG + BCDEFG
B + AD + CF + EG + ACG + AEF + CDE + DFG + ABCE + ABFG + BCDG + BDEF + ABCDF + ABDEG + BCEFG + ACDEFG
C + AE + BF + DG + ABG + ADF + BDE + EFG + ABCD + ACFG + BCEG + CDEF + ABCEF + ACDEG + BCDFG + ABDEFG
D + AB + CG + EF + ACF + AEG + BCE + BFG + ACDE + ADFG + BCDF + BDEG + ABCDG + ABDEF + CDEFG + ABCEFG
E + AC + BG + DF + ABF + ADG + BCD + CFG + ABDE + AEFG + BCEF + CDEG + ABCEG + ACDEF + BDEFG + ABCDFG
F + AG + BC + DE + ABE + ACD + BDG + CEG + ABDF + ACEF + BEFG + CDFG + ABCFG + ADEFG + BCDEF + ABCDEG
G + AF + BE + CD + ABC + ADE + BDF + CEF + ABDG + ACEG + BCFG + DEFG + ABEFG + ACDFG + BCDEG + ABCDEF
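The alias structure above follows mechanically from the four design generators. Effects are multiplied letter by letter, and any letter that appears twice cancels. The sketch below rebuilds the defining relation, the aliases of a main effect and the design resolution; the function names are chosen for this example and are not Minitab commands.

```python
from itertools import combinations

# Design generators reported by Minitab: D = AB, E = AC, F = BC, G = ABC.
GENERATORS = {"D": "AB", "E": "AC", "F": "BC", "G": "ABC"}


def multiply(*words):
    """Multiply effects: letters appearing an even number of times cancel."""
    product = frozenset()
    for word in words:
        product = product ^ frozenset(word)
    return product


def label(word):
    """Write an effect as sorted letters, or I for the identity."""
    return "".join(sorted(word)) or "I"


# D = AB implies I = ABD, and likewise for the other generators.
generator_words = [multiply(new, base) for new, base in GENERATORS.items()]

# The defining relation holds every product of one or more generator words.
defining_words = {
    multiply(*combo)
    for size in range(1, len(generator_words) + 1)
    for combo in combinations(generator_words, size)
}


def aliases(effect):
    """All effects confounded with the given effect."""
    names = (label(multiply(effect, word)) for word in defining_words)
    return sorted(names, key=lambda name: (len(name), name))


# Resolution = length of the shortest word in the defining relation.
resolution = min(len(word) for word in defining_words)

print(len(defining_words), resolution)
print("A + " + " + ".join(aliases("A")))
```

A resolution of III means that main effects are aliased with two-factor interactions, so a significant main effect in this design cannot be separated from those interactions.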
Factorial Regression: yield versus A, B, C, D, E, F, G
Source      DF   Adj SS    Adj MS    F-Value   P-Value
Model        7   2419.00   345.57    2.00      0.175
  Linear     7   2419.00   345.57    2.00      0.175
    A        1   9.00      9.00      0.05      0.825
    B        1   64.00     64.00     0.37      0.559
    C        1   1521.00   1521.00   8.82      0.018
    D        1   196.00    196.00    1.14      0.318
    E        1   36.00     36.00     0.21      0.660
    F        1   529.00    529.00    3.07      0.118
    G        1   64.00     64.00     0.37      0.559
Error        8   1380.00   172.50
Total       15   3799.00
The p value for the overall model is 0.175, which is more than 0.05, so we fail to reject the null hypothesis for the model as a whole.
Only factor C (humidity) gives a significant result, with a p value of 0.018, which is less than 0.05.
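Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
13.1339   63.67%   31.89%      56.00%
The model explains 63.67% of the variation in the data, as shown by R-sq. The adjusted value is much lower (31.89%) because it penalizes the seven terms in the model.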
Coded Coefficients
Term       Effect   Coef    SE Coef   T-Value   P-Value   VIF
Constant            64.75   3.28      19.72     0.000
A          1.50     0.75    3.28      0.23      0.825     1.00
B          4.00     2.00    3.28      0.61      0.559     1.00
C          -19.50   -9.75   3.28      -2.97     0.018     1.00
D          -7.00    -3.50   3.28      -1.07     0.318     1.00
E          3.00     1.50    3.28      0.46      0.660     1.00
F          11.50    5.75    3.28      1.75      0.118     1.00
G          -4.00    -2.00   3.28      -0.61     0.559     1.00
Full Factorial Split-Plot Design

Description
There are four factors in total: temperature, pressure, catalyst charge and concentration.
Among them, temperature is the one hard-to-change factor. We want to check their effect on the yield
of a chemical reaction.
Null hypothesis:
There is no significant effect of the four factors on the yield of the chemical reaction.
Alternate hypothesis:
There is a significant effect of the four factors on the yield of the chemical reaction.
Results
Factors: 4 Whole plots: 4
Hard-to-change: 1 Runs per whole plot: 8
Runs: 32 Whole-plot replicates: 2
Blocks: 1 Subplot replicates: 1
Hard-to-change factors: A
Whole Plot Generators: A
All terms are free from aliasing
Split-Plot Factorial Regression: yield versus A [HTC], B, C, D
Source          DF   Adj SS    Adj MS    F-Value   P-Value
A[HTC]           1   1.853     1.8528    1.02      0.005
WP Error         2   149.958   74.9791   2.50      0.018
B                1   4.883     4.8828    3.16      0.007
C                1   35.490    35.4903   1.18      0.002
D                1   24.325    24.3253   1.81      0.003
A[HTC]*B         1   38.940    38.9403   1.30      0.274
A[HTC]*C         1   42.090    42.0903   1.40      0.256
A[HTC]*D         1   2.258     2.2578    0.08      0.788
B*C              1   1.758     1.7578    0.06      0.812
B*D              1   37.628    37.6278   1.25      0.282
C*D              1   1.015     1.0153    0.03      0.857
A[HTC]*B*C       1   1.665     1.6653    0.06      0.817
A[HTC]*B*D       1   39.383    39.3828   1.31      0.271
A[HTC]*C*D       1   10.013    10.0128   0.33      0.573
B*C*D            1   0.690     0.6903    0.02      0.882
A[HTC]*B*C*D     1   5.040     5.0403    0.17      0.688
SP Error        14   419.897   29.9926
Total           31
The p values are significant (smaller than 0.05) only for the main effects. They are larger than 0.05 for
all two-way and higher-order interactions, so we accept the alternate hypothesis for the main effects and fail to reject the null hypothesis for the interactions.
Model Summary
S (SP)    R-sq(SP)   S(WP)     R-sq(WP)
5.47655   36.86%     2.37135   1.22%
The model explains 36.86% of the variation at the subplot level and 1.22% at the whole-plot level, as shown by the R-sq values.
Coded Coefficients
Term            Effect   Coef     SE Coef   T-Value   P-Value   VIF
Constant                 61.59    1.53      40.24     0.001
A[HTC]          0.48     0.24     1.53      0.16      0.005     *
B               -0.781   -0.391   0.968     -0.40     0.018     1.00
C               -2.106   -1.053   0.968     -1.09     0.007     1.00
D               -1.744   -0.872   0.968     -0.90     0.003     1.00
A[HTC]*B        -2.206   -1.103   0.968     -1.14     0.274     1.00
A[HTC]*C        2.294    1.147    0.968     1.18      0.256     1.00
A[HTC]*D        0.531    0.266    0.968     0.27      0.788     1.00
B*C             -0.469   -0.234   0.968     -0.24     0.812     1.00
B*D             2.169    1.084    0.968     1.12      0.282     1.00
C*D             -0.356   -0.178   0.968     -0.18     0.857     1.00
A[HTC]*B*C      0.456    0.228    0.968     0.24      0.817     1.00
A[HTC]*B*D      2.219    1.109    0.968     1.15      0.271     1.00
A[HTC]*C*D      1.119    0.559    0.968     0.58      0.573     1.00
B*C*D           -0.294   -0.147   0.968     -0.15     0.882     1.00
A[HTC]*B*C*D    -0.794   -0.397   0.968     -0.41     0.688     1.00
Nested ANOVA
Description
Content of drug samples manufactured at two sites (3 randomly chosen batches at each site, 5
randomly chosen pills from each of the 6 batches)
Null hypothesis:
All the group means are the same.
Alternate hypothesis:
All the group means are not the same.
Pathway
File---open worksheet- Nested ANOVA data---ANOVA----Nested ANOVA----OK

RESULTS:
Nested ANOVA: response versus Site, Batch
Analysis of Variance for response

Source DF SS MS F P
Site 1 0.0183 0.0183 0.161 0.709
Batch 4 0.4540 0.1135 9.387 0.000
Error 24 0.2902 0.0121
Total 29 0.7625
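The p value for Site (0.709) exceeds our significance level of 0.05, so the site means do not differ significantly.
The p value for Batch (0.000) is below 0.05, so the batch means within sites are not all the same.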
S = 0.109962 R-Sq = 61.94% R-Sq(adj) = 54.01%
Variance Components
Source   Var Comp.   % of Total   StDev
Site     -0.006*     0.00         0.000
Batch    0.020       62.65        0.142
Error    0.012       37.35        0.110
Total    0.032                    0.180
* Value is negative, and is estimated by zero.
Expected Mean Squares
1 Site    1.00(3) + 5.00(2) + 15.00(1)
2 Batch   1.00(3) + 5.00(2)
3 Error   1.00(3)
Interpretation
The ANOVA table shows no significant Site effect. However, there is a highly significant Batch effect,
and some investigation into how to produce more uniform batches may be in order. Notice that Site is
"tested against" Batch and that Batch is tested against Error. If a variance component estimate is less than
zero, Minitab displays the estimate but sets it to zero when calculating the percent of total
variability.
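The F ratios and the variance components can be derived from the three mean squares with the expected mean squares listed above (5 pills per batch, 3 batches per site). The following is a minimal sketch using the rounded mean squares from the ANOVA table, so the last digits differ slightly from the Minitab output; the variable names are illustrative.

```python
# Mean squares from the nested ANOVA table.
ms_site, ms_batch, ms_error = 0.0183, 0.1135, 0.0121
pills_per_batch = 5
batches_per_site = 3

# Site is tested against Batch; Batch is tested against Error.
f_site = ms_site / ms_batch
f_batch = ms_batch / ms_error

# Solve the expected mean squares for each variance component.
raw_components = {
    "Site": (ms_site - ms_batch) / (pills_per_batch * batches_per_site),
    "Batch": (ms_batch - ms_error) / pills_per_batch,
    "Error": ms_error,
}

# A negative estimate is replaced by zero before computing percentages.
components = {name: max(value, 0.0) for name, value in raw_components.items()}
total = sum(components.values())
percent = {name: 100 * value / total for name, value in components.items()}

print(round(f_site, 3), round(f_batch, 2))
for name in components:
    print(f"{name:<6}{raw_components[name]:>8.3f}{percent[name]:>8.2f}")
```

Most of the variability comes from differences between batches, which supports the interpretation above.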
Response Surface methodology

Data description:
There are two factors with one replicate.
1st factor: Temperature
2nd factor: Time
Response: Chemical yield
Pathway: Stat-----------DOE-----Response Surface-------Create Design-------Click Ok-----Same
steps-----Now select analyze design-----add response------click ok
Results:
Factors: 2 Replicates: 1
Base runs: 13 Total runs: 13
Base blocks: 1 Total blocks: 1
Two-level factorial: Full factorial
Cube points: 4
Center points in cube: 5
Axial points: 4
Center points in axial: 0
α: 1.41421
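The 13 runs in the design summary are the points of a two-factor central composite design: 4 cube points, 4 axial points at distance α, and 5 center points. The sketch below generates them in coded units, assuming the rotatable choice of α equal to the fourth root of the number of cube points, which gives the reported 1.41421. The variable names are illustrative.

```python
from itertools import product

n_factors = 2
n_center = 5

# Cube (factorial) points: every combination of -1 and +1.
cube_points = [
    tuple(float(v) for v in point) for point in product((-1, 1), repeat=n_factors)
]

# Rotatable design: alpha is the fourth root of the number of cube points.
alpha = len(cube_points) ** 0.25

# Axial points: one factor at -alpha or +alpha, the others at 0.
axial_points = []
for axis in range(n_factors):
    for sign in (-1, 1):
        point = [0.0] * n_factors
        point[axis] = sign * alpha
        axial_points.append(tuple(point))

center_points = [(0.0,) * n_factors] * n_center

design = cube_points + axial_points + center_points

print(round(alpha, 5), len(design))
for temperature, time in design:
    print(f"{temperature:>9.5f} {time:>9.5f}")
```

The coded values are converted to actual temperature and time settings in the same way as for a two-level factorial design.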
Response Surface Regression: response versus A, B
Source                 DF   Adj SS    Adj MS   F-Value   P-Value
Model                   5   5038.0    1007.6   0.37      0.006
  Linear                2   1963.5    981.8    0.36      0.001
    A                   1   135.0     135.0    0.05      0.031
    B                   1   1828.6    1828.6   0.67      0.041
  Square                2   1668.2    834.1    0.30      0.127
    A*A                 1   296.2     296.2    0.11      0.342
    B*B                 1   1518.6    1518.6   0.55      0.281
  2-Way Interaction     1   1406.3    1406.3   0.51      0.497
    A*B                 1   1406.3    1406.3   0.51      0.297
Error                   7   19176.8   2739.5
  Lack-of-Fit           3   8766.0    2922.0   1.12      0.239
  Pure Error            4   10410.8   2602.7
Total                  12   24214.8
The p value for the model (0.006) is significant because it is less than 0.05, so we accept the alternate hypothesis
by rejecting the null hypothesis.
Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
52.3406   69.81%   60.00%      67.00%
