

journal homepage: www.elsevier.com/locate/psychsport

Review

To adjust or not adjust: Nonparametric effect sizes, confidence intervals, and real-world meaning

Andreas Ivarsson a,*, Mark B. Andersen b, Urban Johnson a, Magnus Lindwall c,d

a Halmstad University, Sweden
b School of Sport and Exercise Science and the Institute of Sport, Exercise and Active Living, Victoria University, Melbourne, Australia
c Department of Food and Nutrition, and Sport Science, University of Gothenburg, Sweden
d Department of Psychology, University of Gothenburg, Sweden

Article history: Received 26 April 2012; received in revised form 23 July 2012; accepted 24 July 2012; available online 1 September 2012.

Abstract

Objectives: The main objectives of this article are to: (a) investigate whether there are any meaningful differences between adjusted and unadjusted effect sizes, (b) compare the outcomes from parametric and nonparametric effect sizes to determine whether the potential differences might influence the interpretation of results, (c) discuss the importance of reporting confidence intervals in research, and (d) discuss how to interpret effect sizes in terms of practical real-world meaning.

Design: Review.

Method: A review of how to estimate and interpret various effect sizes was conducted. Hypothetical

examples were then used to exemplify the issues stated in the objectives.

Results: The results from the hypothetical research designs showed that: (a) there is a substantial

difference between adjusted and non-adjusted effect sizes, especially in studies with small sample sizes,

and (b) there are differences in outcomes between the parametric and non-parametric effect size

formulas that may affect interpretations of results.

Conclusions: The different hypothetical examples in this article clearly demonstrate the importance of

treating data in ways that minimize potential biases and the central issues of how to discuss the

meaningfulness of effect sizes in research.

© 2012 Elsevier Ltd. All rights reserved.

Keywords:

Adjusted effect size

Practical significance

Statistical interpretation

Null hypothesis significance testing (NHST) is a convention that has been firmly established in research

for many years (Thompson, 2002a). Even if NHST is one of the most

common methods used in sport and exercise psychology to evaluate the impact of, for example, an intervention, several

researchers (Andersen, McCullagh, & Wilson, 2007; Andersen &

Stoové, 1998) have discussed the problems of using only NHST,

because p levels may have little, if anything, to do with real-world

meaning and practical value.

Nakagawa and Cuthill (2007) suggested two shortcomings of using NHST: (a) this approach does not provide an estimate of the magnitude of an effect, and (b) there is no indication of the precision of that estimate. Nakagawa and Cuthill's point means that a statistically significant result, by itself, has little to say about the clinical or practical significance of the effect (Fröhlich, Emrich, Pieter, & Stark, 2009; Jacobson & Truax, 1991).

* Corresponding author. P.O. Box 823, 301 18 Halmstad, Sweden. Tel.: +46 (0)35 16 74 48; fax: +46 (0)35 16 72 64.

E-mail address: Andreas.Ivarsson@hh.se (A. Ivarsson).

1469-0292/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.

http://dx.doi.org/10.1016/j.psychsport.2012.07.007

Another criticism of NHST is that, for some studies, results may be a reflection of the power of the research design, which can be easily manipulated by changing sample sizes (Kirk, 1996). Henson (2006) presented an example where the p value in a randomized intervention study was .051. By adding just one person (under the condition that the M and SD stayed the same) into each group, the p value would decrease to .049. In this case, if one judged the intervention effect based on just the p value, then the intervention is effective in the larger study (N = 18) and not effective in the smaller study (N = 16).

This issue of sample-size influences on p values in experimental and correlational designs, and how there may be potential biases when discussing the value of research findings, has also been explored in sport and exercise science (e.g., Andersen & Stoové, 1998; Edvardsson, Ivarsson, & Johnson, 2012). That p values are sensitive to sample size also influences review and meta-analytic studies that are based on NHST results. For example, Hedges and Olkin (1980) highlighted vote-counting methods as one problematic approach because they often only count the number of studies that report statistically significant results when comparing means for experimental and control groups.

To address questions about the clinical, meaningful, or practical significance of results, many researchers (e.g., Fritz, Morris, & Richler, 2012; Thompson, 2002a) have suggested the need for reporting and interpreting effect sizes. Also, the Publication Manual of the American Psychological Association (6th ed.; APA, 2010) mandates reporting effect sizes in quantitative research articles to enhance the interpretability of results.

In response to these recommendations, an increased number of

journals are requiring that effect sizes be reported in quantitative

studies. In sport and exercise psychology journals, Andersen et al.

(2007) found that the reporting of effect sizes had increased in

recent years. Specifically, of 54 experimental and correlational

studies examined, 44 included effect sizes in the results sections.

Even though the reporting of effect sizes has increased, Andersen

et al. also found that few of the studies with reported effect sizes

(only 7 out of 44; 16%) had interpretations of what those effects

might suggest in terms of real-world meaning.

There are a number of different effect-size indicators (Cohen, 1992) that are based on, for example, correlations, shared variance, standardized differences between means, or degree of overlap of distributions (Grissom & Kim, 2012). In general, effect sizes can be classified into two large categories (Rosenthal, 1991). One of the categories, which contains r and R² effect sizes, is based on correlation coefficients (e.g., correlations, regression, structural equation modeling, multilevel modeling), whereas the other category, containing Cohen's d, Glass's Δ, and Hedges's g effect sizes, is based on the standardized mean differences between two groups (e.g., t tests, ANOVAs; Keef & Roberts, 2004). From the correlation coefficient category, the bivariate r effect size (Pearson's r) is probably the most widely used. The r effect size indicates the magnitude of the correlation between two variables (Ferguson, 2009). The formula for calculating the bivariate r effect size is (Tabachnick & Fidell, 2007):

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}$$
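As a minimal illustration (our sketch, not part of the original article), the raw-score formula translates directly into code; in practice, a library routine such as scipy.stats.pearsonr would return the same value:

```python
import math

def pearson_r(x, y):
    """Bivariate r via the raw-score formula (Tabachnick & Fidell, 2007)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator
```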

When discussing the bivariate r effect size, it could also be useful to highlight two other effect sizes that could be used in correlational research. These two effect sizes are: the point-biserial correlation (rpb), which should be used if one of the variables in the correlation is dichotomous, and the phi coefficient (φ or rφ), which is used in correlations with two dichotomous variables (Bonett, 2007). For calculation formulas for these other correlations, see Fritz et al. (2012).

Probably the most common effect size, when comparing the standardized mean differences between two groups, is Cohen's d for independent means (Thomas, Nelson, & Silverman, 2005), which is the difference between the means of two groups divided by the pooled standard deviation of the two groups (Cohen, 1988; Nakagawa & Cuthill, 2007). This effect-size indicator is used when the aim is to compare the magnitude of difference between two conditions. One formula for calculating Cohen's d, when the distributions of both groups meet the criteria for using parametric tests and group ns are equal (Rosenthal & Rubin, 2003), is:

$$d = \frac{M_{1} - M_{2}}{\sqrt{\dfrac{SD_{1}^{2} + SD_{2}^{2}}{2}}}$$

In the formula, M1 and M2 are the group means, and SD1 and SD2 are the groups' standard deviations. Bivariate r and Cohen's d can be converted into each other with the following formulas (Ferguson, 2009):

$$r = \sqrt{\frac{d^{2}}{d^{2} + 4}}, \qquad d = \frac{2r}{\sqrt{1 - r^{2}}}$$
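A small sketch of these formulas (ours, for illustration only):

```python
import math

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d for two groups with equal ns, per the formula above."""
    return (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

def d_to_r(d):
    """Convert d to bivariate r (Ferguson, 2009); returns the magnitude."""
    return math.sqrt(d ** 2 / (d ** 2 + 4))

def r_to_d(r):
    """Convert bivariate r to d (Ferguson, 2009)."""
    return 2 * r / math.sqrt(1 - r ** 2)
```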

An additional group of effect sizes that has also been used is based on odds ratios (OR; odds ratio, relative risk, risk difference). The OR group of effect sizes is used to compare the relative likelihood or risk of a specific outcome between two or more groups (Ferguson, 2009). The odds ratio is calculated by the formula OR = AD/BC, where the letters A, B, C, and D represent observed cell frequencies when a study has two groups and two possible outcomes (A = group 1/outcome 1; B = group 1/outcome 2; C = group 2/outcome 1; D = group 2/outcome 2; Nakagawa & Cuthill, 2007). The OR effect size can be converted into r by using Pearson's (1900) formula:

$$r = \cos\left(\frac{\pi}{1 + \sqrt{OR}}\right)$$
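For example (a sketch under the 2 × 2 layout just described):

```python
import math

def odds_ratio(a, b, c, d):
    """OR = AD/BC from the 2x2 cell frequencies A, B, C, D defined above."""
    return (a * d) / (b * c)

def or_to_r(odds):
    """Pearson's (1900) approximation: r = cos(pi / (1 + sqrt(OR)))."""
    return math.cos(math.pi / (1 + math.sqrt(odds)))

# An OR of 1 (no association) gives r = cos(pi/2) = 0, as expected.
```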

Our intention with this article is to highlight and discuss a few

practical issues that might occur when using and interpreting effect

sizes. The main purposes of this article are to: (a) investigate if there

are any meaningful differences between adjusted and unadjusted

effect sizes, (b) compare outcomes from parametric and nonparametric effect sizes that may affect interpretations of results,

(c) discuss the importance of reporting confidence intervals for

effect sizes in research, and explore how to interpret effect sizes in

terms of practical real-world meaning.

Adjusted vs. unadjusted effect sizes

One issue that is rarely discussed in the sport and exercise

psychology literature is that any effect size can be in one of two

forms, adjusted and unadjusted. The difference between these two

conditions is that in an adjusted effect size the magnitude of the

effect is "corrected" to allow for generalization to the population.

The unadjusted effect size is sample specic and tends to be an

overestimation of the population effect size (Thompson, 2006).

Thompson (2002a) listed three different design issues that will

affect the potential sampling variance: (a) sample size, (b) number

of variables measured, and (c) population effect size. In order to

adjust for the potential sampling variance, several formulas have been suggested. What all the formulas have in common is that it is the R² value that is adjusted. Probably the most well-known formula for adjusted R² was developed by Ezekiel (1930).¹ Wang and Thompson (2007) found that Ezekiel's formula, in comparison with other suggested formulas (e.g., Claudy's, 1978), provides a better and more reliable result (in most cases). Ezekiel's formula allows an adjusted standardized difference (d*) to be calculated from the unadjusted standardized difference (d). The steps for this calculation are: (a) converting the d into an r, using the formula r = d/√(d² + 4); (b) squaring the r; (c) using Ezekiel's formula to calculate the adjusted effect r²* (the formula is r²* = r² − (1 − r²)·v/(N − v − 1), where v is the number of predictor variables); and (d) taking the square root of r²* (i.e., r*) and then using this value to calculate d* with the formula d* = 2r*/(1 − r²*)^½ (Thompson, 2002a). In addition, Ezekiel's formula can be used to correct bivariate r effect sizes (Wang & Thompson, 2007).

¹ Ezekiel's (1930) correction formula is used in SPSS to calculate the adjusted R².
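The four steps above can be collected into one small function (our illustrative sketch; v defaults to one predictor, and the function flags the negative-r²* failure mode discussed below):

```python
import math

def adjusted_d(d, n, v=1):
    """Adjust Cohen's d via Ezekiel's (1930) formula (Thompson, 2002a).

    d: unadjusted Cohen's d; n: total sample size; v: number of predictors.
    """
    r = d / math.sqrt(d ** 2 + 4)                 # step (a): d -> r
    r2 = r ** 2                                   # step (b)
    r2_adj = r2 - (1 - r2) * v / (n - v - 1)      # step (c): Ezekiel correction
    if r2_adj < 0:
        raise ValueError("negative adjusted r^2; d* is undefined (see below)")
    r_adj = math.sqrt(r2_adj)                     # step (d)
    return 2 * r_adj / math.sqrt(1 - r2_adj)

# adjusted_d(0.50, 20) -> ~0.16, the first row of Table 1.
```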

Table 1
Unadjusted vs. adjusted effect sizes (d) for hypothesized conditions.

Sample size   Unadj. d   Adj. d*   Difference (%)
N = 20        .50        .16       68%
N = 40        .50        .38       24%
N = 80        .50        .44       12%
N = 160       .50        .47       6%
N = 20        .80        .65       18.75%
N = 40        .80        .71       11.25%
N = 80        .80        .76       5%
N = 160       .80        .77       3.75%

Researchers disagree about the value of using adjusted effect sizes instead of unadjusted ones. Ezekiel (1930), Thompson (2002a), and Wang and Thompson (2007) highlighted that it is important to use the adjusted ones, especially in studies with small sample sizes. On the other hand, Roberts and Henson (2002) stated that the differences between adjusted and unadjusted effect sizes are so close to zero that they have no practical value. With small samples and small effects, Ezekiel's equations fall apart and sometimes produce negative r²s, whose square roots are not real numbers and are impossible to use to calculate d*. As an example, let us say that we have a study with a sample of 16 participants. The Cohen's d effect size was .20, and the r² was therefore .0099. Using Ezekiel's correction formula to adjust R², the calculation is .0099 − (1 − .0099)(1/(16 − 1 − 1)) = −.06. This correction formula is not really any correction at all. A negative R² is a fundamental problem because it means that the predictor variable could explain less than 0% of the variance, which is nonsensical. The problem with negative R² values has been discussed by Vacha-Haase and Thompson (2004), and Leach and Henson (2007) suggested replacing the negative Ezekiel values with zeros in the calculation formula (equal to Cohen's d = 0), but this suggestion is not particularly helpful. To illustrate whether there are any meaningful differences between adjusted and unadjusted ds, we have, in Table 1, used medium (.50) and large (.80) unadjusted ds to illustrate the patterns of changes in adjusted d*s as sample size increases across Ns of 20, 40, 80, and 160.

In Table 1, the results from the fictional cases show that conditions with few participants have more biased effect sizes than studies with more participants. As an example, we can imagine a study aimed at investigating a cognitive behavioral therapy (CBT)-based intervention's effects on self-confidence in ice hockey. For this study, we have 20 players, divided into two groups: experimental and control. Before the intervention started, the players were asked to complete a self-confidence questionnaire. After 10 weeks of CBT interventions, the players were asked to complete the same questionnaire as they did before the intervention started. The calculated unadjusted Cohen's d effect size for the post-intervention scores is .50, indicating that the intervention positively influenced the players' self-confidence. But if we use Ezekiel's formula to correct for sampling variance, the adjusted Cohen's d is .16, a difference of 68%. The adjusted Cohen's d for the intervention effect is small and probably would not be considered as having any practical significance. If we had done the same study, with the same Cohen's d, but with 400 participants in each group (experimental and control), the unadjusted and adjusted ds would be .50 and .49, respectively (a 2% difference). This example clearly shows the importance of adjusting for sampling biases when the sample size is small (as it often is in sport psychology studies).

Effect sizes for parametric vs. nonparametric tests

Another "adjustment" to effect size reporting would be to

determine when to use parametric versus nonparametric formulas.


Effect sizes are used to answer the question, "How big is it?" (i.e.,

what is the magnitude of the effect?; Nakagawa & Cuthill, 2007).

As stated previously in this article, there are several different types

of effect sizes. Most effect size estimates have the assumption that

the data are reasonably normally distributed. For differences

between two independent groups when nonparametric tests have

been performed, the value of the z distribution could be used to

calculate the effect size (Fritz et al., 2012). To calculate the effect

rpb) for some

size (in this case the point-biserial correlation

p

nonparametric tests, the formula rpb z= N could be used. In the

formula, z is the z value that would be obtained from performing

a ManneWhitney or Wilcoxon test, or it could be calculated by

hand, and N is the sample size in the study. To use the effect size

estimate rpb to calculate the Cohens d value, the formula

q

2 is used (Fritz et al., 2012). In looking through

d 2r= 1 rpb

the literature, only a few studies have used this formula for

nonparametric tests. Considering the low numbers of articles using

the formula for nonparametric tests, there might be signicant

underestimations of effect sizes and the subsequent interpretations of their practical signicance.
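A sketch of that route from raw data to d (ours, not from the article; the z value is recovered from the Mann-Whitney U statistic via its normal approximation, ignoring tie corrections, and a recent SciPy is assumed):

```python
import math
from scipy.stats import mannwhitneyu

def nonparametric_d(group1, group2):
    """d from rpb = z / sqrt(N), then d = 2r / sqrt(1 - r^2) (Fritz et al., 2012)."""
    m, n = len(group1), len(group2)
    u = mannwhitneyu(group1, group2).statistic
    # Normal approximation to U (no tie correction).
    z = (u - m * n / 2) / math.sqrt(m * n * (m + n + 1) / 12)
    r_pb = abs(z) / math.sqrt(m + n)
    return 2 * r_pb / math.sqrt(1 - r_pb ** 2)
```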

To illustrate with an example, we present a fictional study with the aim of testing a preventive intervention for lowering sport injury occurrence. In the study design, two groups with equal ns exist, one intervention and one control, and the outcome variable is the number of injuries per person (e.g., 0, 1, 2). The data are substantially skewed, with more than half the participants in the intervention group receiving scores of "0" injuries. The mean and SD for the intervention group are M = .40 and SD = .737, and for the control group, M = .93 and SD = .799. Using these values results in a Cohen's d of .69:

$$d = \frac{0.93 - 0.40}{\sqrt{\dfrac{0.799^{2} + 0.737^{2}}{2}}} = .69$$

If we instead use the z value from a Mann-Whitney U test (with the same data) to calculate the Cohen's d with the formula Fritz et al. (2012) suggested, the effect size will be .79.

This gives a difference of .1 in the Cohen's d effect size between the two formulas. That difference may or may not be meaningful, but in the realm of injuries, with their personal, financial, and performance costs, it may have practical significance. In this case, using the parametric instead of the nonparametric effect size will result in an underestimation of the intervention effect (and possibly an underestimation of the real-world value of the intervention). The same potential bias exemplified above could also be present in correlational studies (Ferguson, 2009). Therefore, it is of equal importance to use the appropriate correlational analysis (Pearson's r or Spearman's rho, rs) in order not to violate the assumptions of parametric and nonparametric effect sizes. Spearman's rs coefficient is the ordinal-level-of-measurement (ranks) equivalent of Pearson's r (Rupinski & Dunlap, 1996), but the coefficient is, in general when computed on the same data, smaller than its r counterpart. Bishara and Hittner (2012) emphasized the importance of using rs for studies with small sample sizes and/or with data that are substantially non-normally distributed. As an illustrative example, we suggest a study aimed at investigating the relationship between physical self-perception and physical fitness. The sample is 109 moderately active adults between 25 and 60 years of age. The calculated mean for self-perception is 13.89 (SD = 4.31, skew = .42, kurtosis = .91), and for physical fitness the mean is 8.94 (SD = 4.24, skew = 7.69, kurtosis = 3.21). The fitness data are not normally distributed. Correlational analyses, using both Pearson's r and rs, resulted in an r of .21 and an rs of .18. To estimate a Pearson's r from the rs coefficient, the following formula could be used (Rupinski & Dunlap, 1996):

$$r = 2\sin\left(\frac{\pi r_{s}}{6}\right)$$

The result from this calculation, using an rs of .18, is that the estimated Pearson's r is approximately .188. The result shows a difference between the two Pearson's r coefficients, where the coefficient calculated from rs is smaller than the effect size that was directly calculated from the Pearson's r formula. In this case, however, the difference is not that large (.21 − .188 = .021).
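In code, the conversion is one line (our sketch):

```python
import math

def spearman_to_pearson(rs):
    """Estimate Pearson's r from Spearman's rs (Rupinski & Dunlap, 1996)."""
    return 2 * math.sin(math.pi * rs / 6)

# spearman_to_pearson(0.18) -> ~0.188, as in the worked example above.
```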

Interpretations of effect sizes: what is meaningful?

Even though the reporting of effect sizes has increased in the

sport and exercise psychology literature, Andersen et al. (2007)

have suggested that real-world interpretation of what effect sizes

mean is still not common practice. To supply researchers with

conventions for how to interpret effect sizes for differences

between groups, Cohen (1988) suggested three categories: small

(d = .20, r = .10, OR = 1.50), medium (d = .50, r = .24, OR = 3.50), and large (d = .80, r = .37, OR = 5.10). But Cohen's conventions are just that: conventions (rules of thumb), not hard criteria. Kraemer et al. (2003) recommended that

researchers should not only use these suggested categories when

discussing the practical value of the study but also consider what

might be a clinically or meaningfully significant effect in real-world

terms. A small effect, by Cohens conventions, might translate to

outcomes that may have large effects in terms of costs and benefits

for the population in question. In discussing the practical value of

an effect size, Vacha-Haase and Thompson (2004) recommended

considering what variables are being measured as well as the

context of the study.

To help researchers interpret whether a result is meaningful (e.g., clinical significance; see Thompson, 2002a), several different statistics have been developed (Fritz et al., 2012; Kraemer et al., 2003). Three examples are: confidence intervals for effect sizes (CI; Thompson, 2002b), probability of superiority (PS; Fritz et al., 2012; Grissom, 1994) combined with the common language effect size (Dunlap, 1994; McGraw & Wong, 1992), and number needed to treat (NNT; Nuovo, Melnikow, & Chang, 2002). A CI describes the interval where most (90% or 95%) of the participants in a study are located for a specific variable (Thompson, 2002b) and could be used to compare the results from one study with results from other studies (Thompson, 2002b; Wilkinson & The Task Force on Statistical Inference, 1999). CIs for effect sizes such as Cohen's d, at the 95% level, can be calculated with the formula 95% CI = ES − 1.96se to ES + 1.96se (Nakagawa & Cuthill, 2007), where se is the asymptotic standard error of the effect size (to calculate the 90% CI, change 1.96se to 1.645se). The se value, based on the Cohen's d effect size, can be calculated with the formula:

$$se_{d} = \sqrt{\left(\frac{n_{1} + n_{2} - 1}{n_{1} + n_{2} - 3}\right)\left(\frac{4}{n_{1} + n_{2}}\right)\left(1 + \frac{d^{2}}{8}\right)}$$

For the bivariate r effect size, the corresponding value is:

$$se_{r} = \frac{1}{\sqrt{N - 3}}$$
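Putting the pieces together for d (a sketch of ours; the critical values 1.96 and 1.645 are hard-coded for the 95% and 90% levels):

```python
import math

def ci_for_d(d, n1, n2, level=0.95):
    """CI for Cohen's d using the asymptotic se above (Nakagawa & Cuthill, 2007)."""
    se = math.sqrt((n1 + n2 - 1) / (n1 + n2 - 3)
                   * (4 / (n1 + n2))
                   * (1 + d ** 2 / 8))
    z = 1.96 if level == 0.95 else 1.645  # 95% or 90% interval
    return d - z * se, d + z * se
```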

If the 95% or the 90% CI for the ES does not include .0 or a negative number, then one can be fairly confident that some effect has taken place. This interpretation reflects NHST in that the effect is significant at p < .05 (for the 95% confidence interval) or p < .10 (for the 90% confidence interval). Another interpretation is that there is a 95% or a 90% chance that the true population effect is between the lower and upper scores in the CI (Finch & Cumming, 2009). In discussing CIs, it is important to state that CIs are sensitive to violations of normality (Thompson, 2002b) as well as to sample sizes and standard deviations (Finch & Cumming, 2009). This sensitivity might lead to inaccurate interpretations of results. It is important not to use too-small or too-heterogeneous samples when calculating the CI.

The second statistic we have chosen to present is the probability of superiority index (PS; Fritz et al., 2012). PS was developed to give the percentage of occasions on which a randomly chosen participant from the group with the higher mean will have a higher score than a randomly chosen participant from the other group (in a two-group design; Fritz et al., 2012; Grissom, 1994). The PS could, when raw data are not available, be calculated from sample means and variances using the estimator that McGraw and Wong called the "common language effect size indicator" (CL; Grissom & Kim, 2012). The formula for calculating the CL, based on a z score, is (McGraw & Wong, 1992):

$$Z_{CL} = \frac{\bar{X}_{1} - \bar{X}_{2}}{\sqrt{S_{1}^{2} + S_{2}^{2}}}$$

The proportion of the area under the normal curve that falls below ZCL is the CL statistic that one could use to estimate the PS (Grissom, 1994; Grissom & Kim, 2012). As an example, one could have a study from which the calculated ZCL is 1.0. For that value, the proportion of the area under the normal curve is estimated to be .84. If the researcher has the raw data available, the calculation formula for PS is PS = U/(mn), where U is the Mann-Whitney statistic and m and n are the sample sizes of the groups.
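Both routes to PS can be sketched as follows (ours, not from the article; note that recent SciPy versions report the U statistic for the first sample, so pass the higher-mean group first):

```python
import math
from scipy.stats import mannwhitneyu, norm

def cl_from_summaries(m1, s1, m2, s2):
    """CL estimate of PS from means and SDs (McGraw & Wong, 1992)."""
    z_cl = (m1 - m2) / math.sqrt(s1 ** 2 + s2 ** 2)
    return norm.cdf(z_cl)  # area under the normal curve below Z_CL

def ps_from_raw(higher_mean_group, other_group):
    """PS = U / (m * n) from raw data (Grissom, 1994)."""
    u = mannwhitneyu(higher_mean_group, other_group).statistic
    return u / (len(higher_mean_group) * len(other_group))
```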

As one example of how to use the PS, let us say that we have conducted a study with the aim of increasing participants' general subjective well-being by using a 4-week stress management intervention. The results of the study show an increase in well-being for the intervention group compared with no change for the control group, with a Cohen's d of .75, which is equal to a PS score of approximately 70 (for the formulas for the PS calculation for different estimates of effect size, see Grissom, 1994; Ruscio, 2008). A PS score of 70 states that if participants were sampled randomly, one from each of the groups, the one from the condition with the higher mean (in this example the intervention group) should have the higher score of the pair for 70% of the pairs. Given that subjective well-being is an important variable, and that in 70% of such pairs the experimental-group member reported a higher level of well-being than the control-group member at the end of the study, these results point to the potential practical value of the intervention.

A third example of an indicator developed to clarify the practical value of effect sizes is the number needed to treat (NNT). The NNT score is the number of participants who must be treated to obtain one more success (or one less failure) as an outcome of an intervention. The NNT effect size indicator is primarily used in research with one dichotomous outcome variable (Ferguson, 2009). To calculate the NNT indicator, the proportion of failure cases (in decimal form) in the experimental group is subtracted from the proportion of failure cases in the control group. The score from this calculation is referred to as the risk difference (RD). One formula for calculating the NNT, using the RD score, is 1/RD. In this formula, a result of 1 is the best possible NNT score, indicating that the treatment is perfect (i.e., all participants in the experimental group have improved whereas no participants in the control group have; Kraemer et al., 2003). In order to illustrate the use of NNT, we will present a fictional study of an intervention designed to prevent sport injuries. The study design involves one experimental group

and one control group, and the outcome variable for the study is injury or no injury during the competitive season. The calculated effect size (Cohen's d) for this fictional study is .38 (smallish). The percentage of injured athletes in the experimental group was 46%, whereas 80% of the participants in the control group experienced at least one injury during the competitive season. The calculated RD is .80 − .46 = .34, and the NNT is 1/.34 = 2.94. The calculated NNT score indicates that approximately one person out of three had a beneficial outcome due to the intervention. So, is the result from this example of practical value, even if it showed only a smallish effect size? To decide, we have to consider that injuries are a common problem in sports and that there are substantial health, financial, performance, and happiness costs associated with them. If we are able to help one out of three athletes who take part in the intervention, we would argue that it has meaningful practical value even if the study showed a smallish effect size. This example also shows the importance of taking the context into consideration when discussing the meaningfulness of effect sizes and not simply using Cohen's conventions of small, medium, and large effects.
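The NNT arithmetic from this example, as a one-line helper (our sketch):

```python
def nnt(failure_rate_control, failure_rate_treatment):
    """NNT = 1 / RD, with RD the difference in failure proportions."""
    rd = failure_rate_control - failure_rate_treatment
    return 1 / rd

# The fictional injury study: nnt(0.80, 0.46) -> ~2.94, i.e., about one
# beneficial outcome for every three athletes treated.
```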

Summary

The overall aim of this article was to highlight and discuss some important issues around reporting effect sizes in sport and exercise psychology research. The different fictional research examples in this article clearly demonstrate the importance of treating data in proper ways to minimize potential biases, and also how to discuss the practical value of effect sizes in research. The fictional examples illustrate three major points. First, our examples suggest that it is important to use adjusted effect sizes, especially for studies with small samples and large effect sizes, to avoid overestimation. Second, using parametric effect size formulas for nonparametric data will often result in possibly misleading effect sizes, with either underestimation of the effects (e.g., for data with one categorical variable) or overestimation of the effects (e.g., for correlational data). Given that our hypothetical examples showed differences both between parametric and nonparametric and between adjusted and unadjusted effect sizes, choosing the proper formula is important for interpreting results (e.g., results from a meta-analysis). For example, meta-analyses are performed to determine the mean effect size across studies (Iaffaldano & Muchinsky, 1985), and the results could be biased if the studies integrated in the analysis had positively (or negatively) biased effect sizes. For researchers conducting meta-analyses, if there is enough information to adjust unadjusted effect sizes or to use nonparametric calculations to transform what may be biased effect sizes, then perhaps the researchers could do their own adjusting of results before effect sizes are entered into meta-analyses. Third, the article highlights three indicators (i.e., CI, PS, NNT) that have been developed to assess effect sizes, and researchers may want to consider the advantages of reporting these statistics as complements in their discussions of how to interpret research findings.

References

American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: Author.

Andersen, M. B., McCullagh, P., & Wilson, G. (2007). But what do the numbers really tell us? Arbitrary metrics and effect size reporting in sport psychology research. Journal of Sport & Exercise Psychology, 29, 664-672.

Andersen, M. B., & Stoové, M. A. (1998). The sanctity of p < .05 obfuscates good stuff: A comment on Kerr and Goss. Journal of Applied Sport Psychology, 10, 168-173. http://dx.doi.org/10.1080/10413209808406384

Bishara, A. J., & Hittner, J. B. (2012). Testing the significance of a correlation with nonnormal data: Comparison of Pearson, Spearman, transformation, and resampling approaches. Psychological Methods. Advance online publication. http://dx.doi.org/10.1037/a0028087

Bonett, D. G. (2007). Transforming odds ratios into correlations for meta-analytic research. American Psychologist, 62, 254-255.

Claudy, J. G. (1978). Multiple regression and validity estimation in one sample. Applied Psychological Measurement, 2, 595-607. http://dx.doi.org/10.1177/014662167800200414

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159. http://dx.doi.org/10.1037/0033-2909.112.1.155

Dunlap, W. P. (1994). Generalizing the common language effect size indicator to bivariate normal correlations. Psychological Bulletin, 116, 509-511. http://dx.doi.org/10.1037/0033-2909.116.3.509

Edvardsson, A., Ivarsson, A., & Johnson, U. (2012). Is a cognitive-behavioural biofeedback intervention useful to reduce injury risk in junior football players? Journal of Sport Science and Medicine, 11, 331-338.

Ezekiel, M. (1930). The sampling variability of linear and curvilinear regressions: A first approximation to the reliability of the results secured by the graphic "successive approximation" method. The Annals of Mathematical Statistics, 1, 275-315, 317-333. http://dx.doi.org/10.1214/aoms/1177733062

Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40, 532-538. http://dx.doi.org/10.1037/a0015808

Finch, S., & Cumming, G. (2009). Putting research in context: Understanding confidence intervals from one or more studies. Journal of Pediatric Psychology, 34, 903-916. http://dx.doi.org/10.1093/jpepsy/jsn118

Fritz, C. O., Morris, P. E., & Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology, 141, 2-18. http://dx.doi.org/10.1037/a0024338

Fröhlich, M., Emrich, E., Pieter, A., & Stark, R. (2009). Outcome effects and effect sizes in sport sciences. International Journal of Sports Science and Engineering, 3, 175-179.

Grissom, R. J. (1994). Probability of superior outcome of one treatment over another. Journal of Applied Psychology, 79, 314-316. http://dx.doi.org/10.1037/0021-9010.79.2.314

Grissom, R. J., & Kim, J. J. (2012). Effect sizes for research: Univariate and multivariate applications. New York, NY: Taylor & Francis.

Hedges, L. V., & Olkin, I. (1980). Vote-counting methods in research synthesis. Psychological Bulletin, 88, 359-369. http://dx.doi.org/10.1037/0033-2909.88.2.359

Henson, R. K. (2006). Effect-size measures and meta-analytic thinking in counseling psychology research. The Counseling Psychologist, 34, 601-629. http://dx.doi.org/10.1177/0011000005283558

Iaffaldano, M., & Muchinsky, P. M. (1985). Job satisfaction and job performance: A meta-analysis. Psychological Bulletin, 97, 251-273.

Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.

Keef, S. P., & Roberts, L. A. (2004). The meta-analysis of partial effect sizes. British Journal of Mathematical and Statistical Psychology, 57, 97-129. http://dx.doi.org/10.1348/000711004849303

Kirk, R. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759. http://dx.doi.org/10.1177/0013164496056005002

Kraemer, H. C., Morgan, G. A., Leech, N. L., Gliner, J. A., Vaske, J. J., & Harmon, R. J. (2003). Measures of clinical significance. Journal of the American Academy of Child and Adolescent Psychiatry, 42, 1524-1529. http://dx.doi.org/10.1097/01.chi.0000091507.46853.d1

Leach, L. F., & Henson, R. K. (2007). The use and impact of adjusted R² effects in published regression research. Multiple Linear Regression Viewpoints, 33, 1-11.

McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361-365. http://dx.doi.org/10.1037/0033-2909.111.2.361

Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82, 591-605. http://dx.doi.org/10.1111/j.1469-185X.2007.00027.x

Nuovo, J., Melnikow, J., & Chang, D. (2002). Reporting number needed to treat and absolute risk reduction in randomized controlled trials. Journal of the American Medical Association, 287, 2813-2814. http://dx.doi.org/10.1001/jama.287.21.2813

Pearson, K. (1900). Mathematical contributions to the theory of evolution. VII. On the correlation of characters not quantitatively measurable. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 195, 1-47.

Roberts, K. J., & Henson, R. K. (2002). Correction for bias in estimating effect sizes. Educational and Psychological Measurement, 62, 241-252. http://dx.doi.org/10.1177/0013164402062002003

Rosenthal, R. (1991). Meta-analytic procedures for social research. Newbury Park, CA: Sage.

Rosenthal, R., & Rubin, D. B. (2003). r_equivalent: A simple effect size indicator. Psychological Methods, 8, 492-496. http://dx.doi.org/10.1037/1082-989X.8.4.492

Rupinski, M. T., & Dunlap, W. P. (1996). Approximating Pearson product-moment correlations from Kendall's tau and Spearman's rho. Educational and Psychological Measurement, 56, 419-429. http://dx.doi.org/10.1177/0013164496056003004

Ruscio, J. (2008). A probability-based measure of effect size: Robustness to base rates and other factors. Psychological Methods, 13, 19-30. http://dx.doi.org/10.1037/1082-989X.13.1.19

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Pearson Education.

Thomas, J. R., Nelson, J. K., & Silverman, S. J. (2005). Research methods in physical activity. Champaign, IL: Human Kinetics.

Thompson, B. (2002a). "Statistical," "practical," and "clinical": How many kinds of significance do counselors need to consider? Journal of Counseling and Development, 80, 64-71. http://dx.doi.org/10.1002/j.1556-6678.2002.tb00167.x

Thompson, B. (2002b). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31, 25-32. http://dx.doi.org/10.3102/0013189X031003025

Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach. New York, NY: Guilford Press.

Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473-481. http://dx.doi.org/10.1037/0022-0167.51.4.473

Wang, Z., & Thompson, B. (2007). Is the Pearson r² biased, and if so, what is the best correction formula? Journal of Experimental Education, 75, 109-125. http://dx.doi.org/10.3200/JEXE.75.2.109-125

Wilkinson, L., & The Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604. http://dx.doi.org/10.1037/0003-066X.54.8.594
