deviation.
And here I'm writing this as the standard
deviation sub pooled. Okay.
And there are lots of ways to compute a
value for this notion of pooled.
One simple way is to just take the
standard deviation of the control group.
And the argument here is, hey, the
variation's supposed to be the same,
because the two groups are drawn from the
same population.
Okay, another way is to compute the
actual pooled standard deviation,
which is an expression that looks like
this.
So, we're not going to be going into much
more detail than that.
But just to point out that the overall
notion is pretty simple, and that it's not
unreasonable to use a straightforward
definition of the standard deviation.
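To make the two options concrete, here is a minimal sketch of both. The
exact expression shown on the lecture slide is not in the transcript, so
the pooled formula below is the common textbook form (sample variances
weighted by degrees of freedom), used here as an assumption. The function
names and the example numbers are made up for illustration.

```python
import math
import statistics

def pooled_sd(a, b):
    """Pooled standard deviation: sample variances weighted by degrees of freedom.

    Assumption: this is the common textbook form, not necessarily the
    expression shown on the lecture slide.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    return math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))

def effect_size(treatment, control, use_pooled=True):
    """Difference in means divided by a standard deviation."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    if use_pooled:
        sd = pooled_sd(treatment, control)
    else:
        # The simple option: just use the control group's standard deviation.
        sd = statistics.stdev(control)
    return diff / sd

# Hypothetical measurements, for illustration only.
treatment = [5.1, 6.0, 5.8, 6.4, 5.5]
control = [4.9, 5.2, 5.0, 5.6, 4.8]
print(effect_size(treatment, control, use_pooled=True))
print(effect_size(treatment, control, use_pooled=False))
```

When the two groups really do have similar spread, the two versions give
similar answers, which is the argument for the simpler one.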
Okay, so what's a big effect size?
Well, because it is standardized, you
can actually reason about what might be
big and what might be small.
So this is, again, one of these cases
where it's sort of made up on the fly, but
one heuristic, by Jacob Cohen, is that a
small effect size is 0.2, a medium effect
size is 0.5, and a large effect size is
0.8.
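As a rough sketch, Cohen's heuristic can be written as a lookup. Treating
0.2, 0.5 and 0.8 as lower bounds of each band, and calling anything below
0.2 "negligible", are assumptions made here for illustration; the lecture
only gives the three reference values.

```python
def cohen_label(d):
    """Label an effect size using Cohen's rule-of-thumb values.

    Assumption: each value is treated as the lower bound of its band,
    and "negligible" is a label added for values below 0.2.
    """
    size = abs(d)  # the sign only says which group's mean is larger
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"

print(cohen_label(0.35))
```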
Remember, this is dividing by the
standard deviation, so this is sort of,
for every bit of difference in mean,
how much variance are you accounting for?
Okay.
Finally, I'm not going into too much
detail about confidence intervals, but I
want to mention the confidence
interval of the effect size, just to give
you the intuition for this.
An advantage here is that these are
maybe easier to interpret in terms of
actual decision making.
So what does a 95% confidence interval of
the effect size mean?
Well, it means that if we repeated the
experiment 100 times and computed the
interval each time, we would expect about
95 of those 100 intervals to include the
true effect size.
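The lecture does not say how the interval is computed. One common way to
get an approximate one is a percentile bootstrap, sketched below as an
assumption rather than as the lecture's method. The function names, the
simulated data and the number of resamples are all illustrative.

```python
import random
import statistics

def cohens_d(treatment, control):
    # Uses the control group's standard deviation, the simple option.
    diff = statistics.mean(treatment) - statistics.mean(control)
    return diff / statistics.stdev(control)

def bootstrap_ci(treatment, control, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap interval for the effect size (a sketch)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        t = rng.choices(treatment, k=len(treatment))
        c = rng.choices(control, k=len(control))
        if len(set(c)) < 2:
            continue  # zero spread in the resample, effect size undefined
        estimates.append(cohens_d(t, c))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * len(estimates))
    hi_idx = int((1 + level) / 2 * len(estimates)) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Simulated data, for illustration only.
data_rng = random.Random(1)
control = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
treatment = [data_rng.gauss(1.0, 1.0) for _ in range(50)]
low, high = bootstrap_ci(treatment, control)
print(low, high)
```

Reading the result is then a decision-making question: is the whole
interval above the effect size you would care about, or not?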
A corollary of this is that if the 95%
confidence interval includes 0.0, that's
equivalent [...]
heteroskedasticity, okay.
And so this is when the variance itself
is not constant.
Alright, so here, as an example, the
variance is high over here, low in
the middle, and high again over here.
Now, this is not necessarily a problem.
There are ways to correct for this.
But it's not necessarily a problem
because the estimates that you'll
generate are still unbiased.
Okay.
But it can increase overall error
estimates, leading to a reduction in
statistical power.
Right?
So, you end up with these really high
error numbers because all these guys
count against you.
Right?
Even though you're actually doing a
pretty good job in predictions.
So, to say how this plot was
generated: this was, again, simulated
data, where I intentionally
varied the variance along the
x axis.
Okay.
So we chose some x values,
and then sampled y values
according to some distribution that
varies in this way.
And here, I just took the exact same x
values, but repeated the sampling of the
y values many, many, many times.
So it gave me this clearer spread.
But you can see these solid bars
are there because the same x values were
used across all the experiments.
Okay?
And the point here is that, drawing the
regression line over and over again, it
didn't change all that much.
Okay?
Right, we didn't get anything that looked
like this for example.
So again, the problem here is
that you might increase the error and you
might lose statistical power, and so
overlook a real effect.
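The simulation described above can be sketched roughly as follows. The
true line, the noise function (high at both ends of the x range, low in
the middle), the sample size and the number of repetitions are all
assumptions chosen for illustration; the transcript does not give the
values used for the plot.

```python
import random
import statistics

def noise_sd(x):
    # Assumed shape: high variance at both ends of [0, 10], low in the middle.
    return 0.2 + 2.0 * abs(x - 5.0) / 5.0

def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

rng = random.Random(0)
TRUE_SLOPE, TRUE_INTERCEPT = 1.5, 2.0  # hypothetical "real effect"

# The x values are chosen once and reused in every repetition.
xs = [rng.uniform(0.0, 10.0) for _ in range(50)]

slopes = []
for _ in range(500):
    # Resample only the y values, with noise that depends on x.
    ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0.0, noise_sd(x)) for x in xs]
    slope, _intercept = fit_line(xs, ys)
    slopes.append(slope)

# The average fitted slope stays close to the true one (unbiased),
# and the spread across repetitions shows how much the line moves.
print(statistics.mean(slopes), statistics.stdev(slopes))
```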