Table of Contents
Summary
Introduction
Data
Methods
Results
Discussion
Conclusion
Appendix I
Appendix II
Summary
Different enzymes improve textile washing, and it is of interest to know how they perform under various conditions. The performance of five enzymes was measured as the amount of protein removed from a surface when exposed to the factors hardness and detergent. We construct a model using analysis of variance that accounts for about 90% of the variance in the data. We show that hardness of water has no significant influence on enzyme performance, and that addition of detergent yields the highest protein removal. Adding more enzyme to the samples also has an effect on protein removal, but the effect is saturating. Using the 15 nM enzyme concentration with detergent added, we show that enzyme A removes proteins from surfaces significantly better than the other enzymes included. We conclude that addition of detergent, concentration of enzymes and enzyme type are of significant importance in stain removal processes.
Introduction
Efficient protein removal is of vital importance in the textile cleaning industry. Enzymatic activity enhances this process by means of catalysis. It is known that factors such as detergent and water hardness affect the rate of catalysis, and it is therefore of interest to quantify these effects. These factors are some of the conditions that appear in normal laundry wash processes.

To optimize the removal of certain stains from textile surfaces, the effect is analyzed by means of a laboratory experiment. The experiment measures how much protein is removed from a surface with Surface Plasmon Resonance (SPR) technology, a biosensor that measures a resonance signal on a gold surface and translates it into a protein removal response. The experiment is conducted using 5 different enzymes under different conditions. The conditions are addition of detergent, hard and soft water, and a range of enzyme concentrations. A replication of each experiment yields a grand total of 160 different samples.

We wish to use statistical methods to examine differences and trends in the samples. The aim of the analysis is to describe and compare the performance of the enzymes and how different conditions affect enzyme performance. This analysis includes factorial interactions, determining whether the day-to-day measurements are reliable, and testing the robustness of the experimental set-up.
Figure 1: A) Boxplot of the protein response with detergent (Det+) and without detergent (Det0). Adding detergent alone seems to have a positive effect on the response. B) The response by hardness of water. There does not seem to be a correlation between hardness and response.
Data
The dataset consists of 160 samples, with 80 different experimental combinations. The enzymes were labeled A, B, C, D and E and analyzed one at a time, with the experiment running for 2 consecutive days. No two enzymes were run on the same day, so each specific enzyme is labeled with a single date. After the experiment has run, the result is sampled randomly into a variable called cycle. The output is the response, which is given as the amount of protein removed in RU (10^-6 g m^-2).

The factors included in the data are hardness of water and addition of detergent. Both these variables are binary and denoted Det+ and Det0 for detergent, and Ca+ and Ca0 for calcium. Figure 1 shows the log10(Response) together with detergent and hardness.

The enzyme concentrations were tested at 4 different levels: 0 nM, 2.5 nM, 7.5 nM and 15 nM. Increasing the concentration of enzymes in the samples also increases the response. There is some indication in the data that the enzyme catalysis becomes saturated at the high concentrations (Figure 2). This may potentially yield two problems: fitting the data with a linear model gives less precise results at high concentrations, and extrapolation beyond existing data points is very uncertain. A summary of all data included in our analysis can be seen in Table 1.

Figure 2: Response vs concentration of the 5 different enzymes. Black = 0 nM, Red = 2.5 nM, Green = 7.5 nM and Blue = 15 nM. Increasing concentration increases the response, but there is some saturation towards the high concentrations.
Table 1: Table of variables and response variable in the model. With Detergent (Det+) or without (Det0). Hardness: with (Ca+) or without (Ca0)
Type        Variable Name          Levels   Mean
Variables   Run Date               10       -
            Cycle                  34       -
            Enzyme Type            5        -
            Enzyme Concentration   -        6.3
            Detergent              2        -
            Hardness               2        -
Output      Response               -        434.3
Methods
Our first approach was to assume that the level of response is given as a function of the various factors and enzymes. Initially we assumed that there was no bias in the response measurements for either cycle or day. Concentration was used as a continuous variable, as this allows us to interpolate between levels if this is needed in the future. An ANCOVA of the data was conducted using two-way interactions. The equation for ANCOVA is
$$Y_{ij} = \alpha_i + \beta X_j + \varepsilon_{ij} \qquad (1)$$

where $Y$ is the response variable, $\alpha_i$ is the intercept of the model for level $i$ of the factor, $\beta$ is the slope in the model, $X_j$ are the observations in the data, and $\varepsilon_{ij}$ is noise. When subsetting the data and testing only one factorial variable, we use a regular ANOVA:

$$Y_{ij} = \alpha_i + \varepsilon_{ij} \qquad (2)$$
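The model in equation (1), extended to several factors and their two-way interactions, can be sketched in R as follows. This is only an illustration on simulated data, not the analysis script in Appendix II: the data frame `sim`, its column names and all effect sizes are assumptions made for the example, and hardness is deliberately simulated with no effect.

```r
# Minimal ANCOVA sketch on simulated data (all names and values are hypothetical).
set.seed(1)
sim <- expand.grid(
  Enzyme     = factor(c("A", "B", "C", "D", "E")),
  EnzymeConc = c(0, 2.5, 7.5, 15),
  Detergent  = factor(c("Det0", "Det+")),
  Hardness   = factor(c("Ca0", "Ca+")),
  Replicate  = 1:2
)

# Simulated log10 response: one intercept per enzyme, a common slope in
# concentration, a shift when detergent is added, and Gaussian noise.
enzyme_shift <- c(A = 0.3, B = 0.0, C = 0.1, D = -0.1, E = 0.2)
sim$lResponse <- 2 +
  unname(enzyme_shift[as.character(sim$Enzyme)]) +
  0.04 * sim$EnzymeConc +
  0.6 * (sim$Detergent == "Det+") +
  rnorm(nrow(sim), sd = 0.1)

# ANCOVA: continuous concentration plus three factors and all two-way interactions.
fit <- lm(lResponse ~ (Enzyme + EnzymeConc + Detergent + Hardness)^2, data = sim)
tab <- anova(fit)
print(tab)
```

The sequential ANOVA table lists one row per main effect and interaction, so a term such as hardness can be judged by its p-value before it is dropped from the model.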
The assumptions for these models are that the residuals are independent, have constant variance and are normally distributed. To verify this we inspected the fit with residual plots, a normal QQ plot, and Cook's distance for the observations (Appendix I). These figures revealed some inconsistencies in variance. To deal with this we log-transformed the response variable before fitting the model. This gave acceptable distributions, with the exception of a single outlier: the sample with Enzyme B, concentration = 0, Det0 and Ca0 looked biased in the response, and was thus removed from the data set. All statistical analysis was performed using R 2.14.1; scripts and analysis are documented in Appendix II.
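The effect of the log transformation on the variance, and the diagnostics mentioned above, can be illustrated with a short sketch. The data are simulated under the assumption of multiplicative noise, and the helper `spread_ratio` is a hypothetical name introduced only for this example.

```r
# Sketch: why a log transform can stabilise the variance (simulated data).
set.seed(2)
conc <- rep(c(0, 2.5, 7.5, 15), each = 20)
# Multiplicative noise: the spread of the raw response grows with its mean.
response <- 10^(2 + 0.05 * conc + rnorm(length(conc), sd = 0.1))

raw_fit <- lm(response ~ conc)
log_fit <- lm(log10(response) ~ conc)

# Ratio of the largest to the smallest residual SD across concentration levels.
# A ratio near 1 is what the constant-variance assumption asks for.
spread_ratio <- function(fit) {
  s <- tapply(resid(fit), conc, sd)
  max(s) / min(s)
}
raw_ratio <- spread_ratio(raw_fit)
log_ratio <- spread_ratio(log_fit)
print(c(raw = raw_ratio, log = log_ratio))

# Cook's distance, one value per observation, flags influential samples.
cd <- cooks.distance(log_fit)

# Residuals vs fitted (1), normal QQ plot (2) and Cook's distance (4).
if (interactive()) {
  par(mfrow = c(1, 3))
  plot(log_fit, which = c(1, 2, 4))
}
```

In this simulated setting the spread ratio is expected to be clearly larger for the raw response than for the log10 response.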
Results
We tested a multi-way ANCOVA model with log-response as a function of the continuous variable concentration, the 3 categorical variables Enzyme, Detergent and Hardness, and all their two-way interactions. The model showed that Hardness had no significant effect (p > 0.29) on the response. After removing Hardness as a factor, the model showed that the two-way interaction between enzyme type and enzyme concentration had no effect either (p > 0.30). We conclude that detergent, enzyme type and concentration significantly influence protein removal. The sums of squares show that the most important factor for protein removal is detergent. The final model explains 89% of the variation in the data. P-values and a summary of the model are shown in Table 2.
Table 2: Results from the ANCOVA model including Enzyme type, Enzyme Concentration and Detergent. The interactions are denoted with a colon (:). Significance levels: *** p < 0.001, ** p < 0.01, * p < 0.05.
Term                      Degrees of freedom   Sum of squares   P-value
Enzyme                    4                    3.932            1.33E-14***
Concentration             1                    15.611           2.33E-27***
Detergent                 1                    31.327           1.31E-44***
Enzyme:Detergent          4                    0.711            3.54E-03**
Concentration:Detergent   1                    2.768            3.61E-13***
Secondly, we wanted to determine whether or not the interactions were the same for all enzymes included in the model. As we had already concluded that detergent significantly influenced the data, we used a more graphical approach. Figure 3 shows that some of the enzymes react slightly differently to the addition of detergent. The differences between the curves are not the same with and without detergent, which implies that not all the enzymes interact in the same way with this factor. Enzyme B is apparently a little less efficient relative to the others when detergent is added.
Figure 3: log10 response vs enzyme concentration for the 5 different enzymes. A) with detergent added to the sample, B) without detergent added. Structural differences between the two plots suggest stronger interactions between some enzymes with detergent than others.
The next step was to investigate whether one enzyme removed proteins significantly better than the others. To do so we restricted the data set to the optimal conditions for protein removal: detergent added and 15 nM enzyme added. With an ANOVA test we were able to determine that enzyme A was the best enzyme, with a mean protein removal of 1527.3 RU (95% confidence interval: 1391.7 to 1676.2 RU). In comparison, the worst enzyme was D, with a mean protein removal of 663.5 RU. The summary of all the enzymes can be seen in Table 3, which also states the p-values compared to enzyme A.
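Because the model is fitted to the log10 response, means and confidence limits have to be transformed back to RU. The reported interval for enzyme A is close to symmetric on the log10 scale, which is consistent with this approach. The sketch below shows the idea on simulated data; the vector `true_log_mean` and all numbers in it are assumptions for illustration, with 4 samples per enzyme as in the 15 nM, Det+ subset.

```r
# Sketch: per-enzyme means with 95% confidence intervals, back-transformed to RU.
set.seed(3)
enzyme <- factor(rep(c("A", "B", "C", "D", "E"), each = 4))
true_log_mean <- c(A = 3.2, B = 3.0, C = 3.05, D = 2.8, E = 3.1)  # hypothetical
lresp <- unname(true_log_mean[as.character(enzyme)]) +
  rnorm(length(enzyme), sd = 0.05)

# "- 1" drops the common intercept, so each coefficient is the mean
# log10 response of one enzyme.
fit <- lm(lresp ~ enzyme - 1)

est_log <- cbind(mean = coef(fit), confint(fit))  # log10 scale
est_ru  <- 10^est_log                             # back-transformed to RU
print(round(est_ru, 1))
```

Note that the back-transformed mean is a geometric mean of the raw responses, and that the back-transformed interval is asymmetric around it.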
Table 3: The 5 enzymes compared to enzyme A with 15 nM added. ANOVA shows that all the other enzymes are significantly different from A. The mean and the 95% confidence interval are also given. Significance levels: *** p < 0.001, ** p < 0.01, * p < 0.05.
Analysis of experimental bias
To investigate whether there was any systematic error in the data sampling, we also tested the response against the random variable cycle. Furthermore, we tested day-to-day variation by testing the samples that have 0 nM concentration of enzymes against one another.
To test whether cycle sampling influenced the response, we made an ANOVA with cycle as explanatory variable. As all cycles are sampled randomly, there should be no significant difference between the cycles. This turns out to be true, as the difference between cycles is insignificant (p > 0.47). The day-to-day runs were explored by an ANCOVA using hardness, enzyme type and detergent as explanatory variables, restricted to the samples with concentration 0. The analysis shows that the enzyme types are significantly different (p < 0.0013). As the samples should be equal (no enzyme added), this strongly indicates experimental bias in the setup, or a sampling error on the day enzyme B was analyzed.

Using concentration as factor instead of continuous variable
Enzyme concentration was given at four different levels, which made it possible to use the variable either as a factor or as a continuous variable. In the previous sections we used it as a continuous variable, in order to be able to interpolate between levels. Fitting a model with concentration as a factor and a model with concentration as a continuous variable, and comparing the two with an ANOVA, shows that using concentration as a factor gives a significantly better fit (p < 1.31E-21). The implications and possible solutions of this are evaluated in the discussion.
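The comparison between the two ways of coding concentration is an F-test between nested models: a straight line in concentration is a special case of one free mean per concentration level. A minimal sketch on simulated data follows; the saturating curve and all parameter values are assumptions chosen for the example.

```r
# Sketch: concentration as a continuous variable vs as a factor (simulated data).
set.seed(4)
conc <- rep(c(0, 2.5, 7.5, 15), each = 20)
# Hypothetical saturating relationship on the log10 scale, plus noise.
lresp <- 2 + 1.0 * conc / (2 + conc) + rnorm(length(conc), sd = 0.1)

fit_lin <- lm(lresp ~ conc)          # straight line: 2 parameters
fit_fac <- lm(lresp ~ factor(conc))  # one mean per level: 4 parameters

# F-test of the extra 2 parameters; a small p-value means the straight
# line misses structure that the factor model picks up.
cmp <- anova(fit_lin, fit_fac)
print(cmp)
```

A significant result here says that the linear model lacks fit, not that the factor model is the better choice overall, since the factor model cannot interpolate between the tested levels.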
Discussion
The few data points for each group give large statistical uncertainty in the validation of the data, and therefore in the credibility of the results. More data would have given us more samples for the statistical analysis and for checking for errors within the different groups. Notably, the sample or experimental bias on the day enzyme B was analyzed yields high uncertainty. With the data set available, it would be reasonable to adjust the data to an expected mean with 0 enzyme added. The addition of reference samples would also be helpful for this particular problem. In order for our analysis to be valid in terms of hardness and detergent in the water, we assume that the concentrations of these two factors are the same in all observations. Concentration was used as a continuous variable instead of a factor, in spite of the fact that using it as a factor gave a significantly better model. This was done in order to be able to interpolate between experiments with different concentrations. To obtain a better model, one should consider a model more appropriate for enzyme kinetics than a linear fit, as the data apparently look more like a Michaelis-Menten saturation model. If the main purpose of the experiment is to determine the catalytic capabilities of enzymes, the addition of less detergent would be appropriate, as detergent turns out to be the main factor explaining protein removal from surfaces.
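A saturation model of the Michaelis-Menten type could be fitted with non-linear least squares. The sketch below is one possible formulation on simulated raw responses and was not part of our analysis; the baseline term `base`, the parameter names `Vmax` and `Km`, the starting values and all simulated numbers are assumptions.

```r
# Sketch: fitting a Michaelis-Menten type saturation curve with nls (simulated data).
set.seed(5)
conc <- rep(c(0, 2.5, 7.5, 15), each = 10)
# Hypothetical response in RU: baseline removal plus a saturating enzyme effect.
resp <- 100 + 1500 * conc / (4 + conc) + rnorm(length(conc), sd = 40)

# base: response with no enzyme, Vmax: maximal extra removal,
# Km: concentration giving half of Vmax.
fit <- nls(resp ~ base + Vmax * conc / (Km + conc),
           start = list(base = 50, Vmax = 1000, Km = 2))
est <- coef(fit)
print(est)

# Unlike the factor model, the fitted curve can be evaluated between tested levels.
pred_5nM <- predict(fit, newdata = data.frame(conc = 5))
```

With only four concentration levels, the three parameters are barely identifiable, so more levels would be needed before such a model could be trusted on real data.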
Future work with the data would include a more thorough analysis of interactions with the individual enzymes. All five types of enzyme have a significant reaction with detergent, as well as with concentration, but to what degree is important knowledge.
Conclusion
We conclude that hardness does not have an effect on protein removal in our experiment. In contrast, detergent, enzyme concentration and enzyme type have a highly significant positive effect. Their interactions are also significant, except for the interaction between concentration and enzyme type, which was omitted from the model. We also conclude that enzyme A has the highest catalytic capabilities under optimal conditions, and that enzyme D is the worst enzyme for removing proteins.
Appendix I
Figure 1 shows the residuals of the log-transformed analysis.
Appendix II
R-code used to calculate statistics in the report.
# Script for case 1 detergent

#### Load and view data ####
setwd("~/My Dropbox/02441 Applied Statistics and Statistical Software/Case_1")  # Set working directory
graphics.off()

#### Manipulate data for cleaner analysis ####
mydata$Cycle <- factor(mydata$Cycle, levels = 1:34)  # Put cycles consecutively
mydata$lResponse <- log10(mydata[, 3])               # Log response for equal variance
mydata <- mydata[-14, ]                              # Remove outlier 14
# Make enzyme concentration a factor for extra model determination
mydata$EnzymeConcF <- factor(mydata$EnzymeConc, levels = c('15', '7.5', '2.5', '0'))

#### Make a linear model of all variables ####
fm <- lm(lResponse ~ (Enzyme + EnzymeConc + DetStock + CaStock)^2, data = mydata)
m <- summary(fm)  # See and assign summary
anova(fm)         # ANCOVA table

#### Leave calcium out of the final model, also the interaction between EnzymeConc and Enzyme ####
fm1 <- lm(lResponse ~ (Enzyme + EnzymeConc + DetStock)^2 - Enzyme:EnzymeConc, data = mydata)
summary(fm1)
an <- anova(fm1)
an

#### Use concentration as a factor to compare the two models ####
fmfac <- lm(lResponse ~ (Enzyme + EnzymeConcF + DetStock)^2 - Enzyme:EnzymeConcF, data = mydata)
summary(fmfac)
aF <- anova(fmfac)
aF
anf <- anova(fm1, fmfac, test = 'F')
anf
#### Using concentration as a factor describes data significantly better ####

#### Show graphically that the data have equal variance and are normally distributed ####
#### See the response as a function of enzyme concentration ####
graphics.off()
cols <- 1:5
plot(lResponse ~ EnzymeConc, data = mydata, type = 'n')
points(lResponse ~ EnzymeConc, data = mydata, subset = mydata[, 4] == 'A', col = cols[1])
# abline(afm[1], afm[2])
points(lResponse ~ EnzymeConc, data = mydata, subset = mydata[, 4] == 'B', col = cols[2])
points(lResponse ~ EnzymeConc, data = mydata, subset = mydata[, 4] == 'C', col = cols[3])
points(lResponse ~ EnzymeConc, data = mydata, subset = mydata[, 4] == 'D', col = cols[4])
points(lResponse ~ EnzymeConc, data = mydata, subset = mydata[, 4] == 'E', col = cols[5])
#### Looks like a saturating function ####

#### Test if the days differ with 0 enzyme added ####
fm0 <- lm(lResponse ~ (Enzyme + CaStock + DetStock)^2, data = mydata, subset = mydata[, 5] == '0')
summary(fm0)
anova(fm0)
#### There are significant differences between the days with 0 enzyme ####

#### See if the enzymes differ under optimal conditions ####
mydatadet <- mydata[mydata[, 6] == 'Det+', ]  # Find all the data with Det+
# Linear model with one intercept per enzyme (remove "- 1" for a more usable model); subset to 15 nM
fmHigh <- lm(lResponse ~ Enzyme - 1, data = mydatadet, subset = mydatadet[, 5] == '15')
p <- summary(fmHigh)     # Put summary into a variable
ints <- confint(fmHigh)  # Calculate the 95% confidence intervals
anova(fmHigh)            # There is a difference among enzymes
mens <- coef(fmHigh)     # Retrieve means
#### Do the enzymes interact the same way between effects ####
enzA <- mydata[mydata[, 4] == 'A', ]
fmA <- lm(lResponse ~ (DetStock + EnzymeConc)^2, data = enzA)
summary(fmA)
anova(fmA)

enzB <- mydata[mydata[, 4] == 'B', ]
fmB <- lm(lResponse ~ (DetStock + EnzymeConc)^2, data = enzB)
summary(fmB)
anova(fmB)

enzC <- mydata[mydata[, 4] == 'C', ]
fmC <- lm(lResponse ~ (DetStock + EnzymeConc)^2, data = enzC)
summary(fmC)
anova(fmC)

enzD <- mydata[mydata[, 4] == 'D', ]
fmD <- lm(lResponse ~ (DetStock + EnzymeConc)^2, data = enzD)
summary(fmD)
anova(fmD)

enzE <- mydata[mydata[, 4] == 'E', ]
fmE <- lm(lResponse ~ (DetStock + EnzymeConc)^2, data = enzE)
summary(fmE)
anova(fmE)
#### They are all significantly correlated with both effects ####

graphics.off()
cols <- 1:4
windows(width = 3.5, height = 2)
par(mar = c(2.5, 4.5, 0.8, 0.5))
dat <- tapply(mydata$lResponse, list(mydata$EnzymeConc, mydata$Enzyme), mean)
barplot(dat, beside = TRUE, xlab = 'Enzyme', ylab = 'log10 Response', col = cols, ylim = c(0, 3))
abline(0, 0)  # Bar plot of concentration vs response for all 5 enzymes

#### Detergent figures ####
# Mean log response per concentration level (rows) and enzyme (columns), with detergent
mydatadet <- mydata[mydata[, 6] == 'Det+', ]
lvl <- levels(mydata$EnzymeConcF)
enZ <- levels(mydata$Enzyme)
Detline <- matrix(0, 4, 5)
for (i in 1:4) {
  for (j in 1:5) {
    sub1 <- mydatadet[mydatadet[, 5] == lvl[i], ]
    Detline[i, j] <- mean(subset(sub1$lResponse, sub1[, 4] == enZ[j]))
  }
}
rownames(Detline) <- lvl
colnames(Detline) <- enZ

# Same table of means, without detergent
mydatadet <- mydata[mydata[, 6] == 'Det0', ]
Detline2 <- matrix(0, 4, 5)
for (i in 1:4) {
  for (j in 1:5) {
    sub1 <- mydatadet[mydatadet[, 5] == lvl[i], ]
    Detline2[i, j] <- mean(subset(sub1$lResponse, sub1[, 4] == enZ[j]))
  }
}
rownames(Detline2) <- lvl
colnames(Detline2) <- enZ
graphics.off()
windows(width = 7, height = 3.5)
par(mfrow = c(1, 2))
par(mar = c(4, 4, 2, 1))
plot(lResponse ~ EnzymeConc, data = mydatadet, type = 'n', main = 'With detergent',
     ylim = c(0.5, 3.4), xlab = 'Enzyme Concentration (nM)', ylab = 'log10 Response (RU)')
for (i in 1:5) {
  # lvl holds the factor levels as text, so convert to numbers for the x axis
  points(as.numeric(lvl), Detline[, i], type = 'o', col = i, pch = 16)
}
legend('bottomright', enZ, fill = 1:5, horiz = TRUE, cex = 0.8, title = 'Enzyme')
text(0.2, 3.3, labels = 'A')

plot(lResponse ~ EnzymeConc, data = mydatadet, type = 'n', main = 'Without detergent',
     ylim = c(0.5, 3.4), xlab = 'Enzyme Concentration (nM)', ylab = '')
for (i in 1:5) {
  points(as.numeric(lvl), Detline2[, i], type = 'o', col = i, pch = 16)
}
text(0.2, 3.3, labels = 'B')
## Detergent plots!

#### Make a box plot of response vs addition of detergent and hardness ####