
The Balanced Inventory of Desirable Responding (BIDR): A Reliability Generalization Study
Andrew Li
University of Arizona

Educational and Psychological Measurement, Volume 67, Number 3, June 2007, 525-544. © 2007 Sage Publications. DOI: 10.1177/0013164406292087. Hosted at http://online.sagepub.com

Jessica Bagger
California State University, Sacramento
The Balanced Inventory of Desirable Responding (BIDR) is one of the most widely used social desirability scales. The authors conducted a reliability generalization study to examine the typical reliability coefficients of BIDR scores and explored factors that explained the variability of reliability estimates across studies. The results indicated that the overall BIDR scale produced scores that were adequately reliable. The mean score reliability estimates for the two subscales, Self-Deception Enhancement and Impression Management, were not satisfactory. In addition, although a number of study characteristics were statistically significantly related to reliability estimates, they accounted for only a small portion of the overall variability in reliability estimates. These findings and their implications are discussed.

Keywords: BIDR; reliability generalization; impression management; self-deception

An emerging trend in employment selection is the increasing reliance on personality inventories to screen job applicants. The popularity of personality inventories as a selection tool is in part the result of a number of large-scale studies and meta-analyses that have presented impressive evidence supporting the utility of personality constructs in predicting work performance (Barrick & Mount, 1991; Hough, Eaton, Dunnette, Kamp, & McCloy, 1990; Salgado, 1997; Tett, Jackson, & Rothstein, 1991). Despite this widespread acceptance, employers and human resources practitioners frequently express concerns that job applicants' responses to personality inventories may be tempered by intentional distortion or socially desirable responses (Hogan & Hogan, 1992).

Authors' Note: This article was presented at the 2005 Annual Conference of the Society for Industrial and Organizational Psychology in Los Angeles. We thank Tammi Vacha-Haase, Russell Cropanzano, Judith Rein, Joel Levin, and participants of the Management & Policy Brownbag for their many helpful comments on earlier versions of this article. Correspondence concerning this article should be addressed to Andrew Li, University of Arizona, Department of Management and Policy, Tucson, AZ 85721-0108; e-mail: andrew@eller.arizona.edu.

Social desirability has been defined as "the tendency to endorse items in response to social or normative pressures instead of providing veridical self-reports" (Ellingson, Smith, & Sackett, 2001, p. 122). Response distortion to personality inventories has been reported in both laboratory and field studies (Hough, 1998), with most of the laboratory studies using either within- or between-participants designs (Ones & Viswesvaran, 1998). In a within-participants design, participants are instructed to complete the same personality inventory twice. In the first administration, participants are instructed to respond to the measures honestly, whereas in the second administration, they are asked to provide socially desirable responses. The responses of the same individual are then compared. In a between-participants design, one group of participants is instructed to provide honest responses to a personality measure, whereas another group of participants is instructed to provide socially desirable responses to the same measure. The responses of these two groups are then compared. In a recent meta-analysis, Viswesvaran and Ones (1999) found that for within-participants experimental designs, fake-good participants (i.e., participants who were instructed to present themselves favorably) raised their scores on personality inventories by an average of three quarters of a standard deviation. For between-participants designs, the personality scores of fake-good participants were 0.5 standard deviations higher than those of participants in the honest condition (i.e., participants who were instructed to respond honestly). In addition to laboratory studies, the intentional distortion of responses to personality inventories has also been identified among samples of actual job applicants (Barrick & Mount, 1996; Elliot, 1981; Hough, 1998). To circumvent this problem, researchers have evaluated individual differences in social desirability to infer the credibility of responses to personality inventories (Ellingson, Sackett, & Hough, 1999; Hough, 1998; Ones, Viswesvaran, & Reiss, 1996; Piedmont, McCrae, Riemann, & Angleitner, 2000). High correlations between responses to personality inventories and scores on social desirability scales tend to be interpreted as a sign of response distortion (Ellingson et al., 1999). Over the years, a large number of social desirability scales have been developed, such as the Edwards Social Desirability Scale (Edwards, 1957), the Eysenck Lie Scale (Eysenck & Eysenck, 1964), and the Marlowe-Crowne Social Desirability Scale (MCSDS; Crowne & Marlowe, 1964). More recently, the Balanced Inventory of Desirable Responding (BIDR; Paulhus, 1991) has been gaining recognition. The popularity of the BIDR corroborates the recent reconceptualization of the construct of social desirability. Unlike previous research that treated social desirability as a unitary construct, Paulhus (1984, 1991, 2002) maintained that social desirability can be broken down into two components: self-deception and impression management. Self-deception is an unintentional propensity to portray oneself in a favorable light, manifested in positively biased but honestly believed self-descriptions. Research has shown that individuals high in self-deception tend to be well adjusted, ignore minor criticisms, and have high confidence in themselves (Paulhus, 1991).

Impression management, in contrast, indicates a tendency to intentionally distort one's self-image in order to be perceived favorably by others. Although myriad scales exist to detect response distortion, most of them were not designed to separately assess self-deception and impression management. The BIDR is one exception (Paulhus, 1984, 1991, 2002). The BIDR includes two subscales, with the Impression Management (IM) subscale tapping the impression management dimension and the Self-Deception Enhancement (SDE) subscale tapping the self-deception dimension. Both subscales consist of 20 items. Users of the BIDR indicate their agreement with the 40 statements about themselves on a 7-point (or 5-point) scale, with 1 denoting "not true" and 7 denoting "very true." Sample items of the IM subscale include "I have received too much change from a salesperson without telling him or her" and "I have some pretty awful habits." Sample items of the SDE subscale include "I have not always been honest with myself" and "I never regret my decisions." The scale is counterbalanced so that there are equal numbers of positively and negatively keyed items. According to Paulhus (1991), there are two alternative methods of scoring the BIDR items, namely, dichotomous and continuous scoring. With dichotomous scoring, responses of 6 or 7 are scored 1, and responses of 1 to 5 are scored 0. With continuous scoring, the raw score is used in the subsequent statistical analysis. According to Paulhus (1991), the discriminant and convergent validity of these two subscales can be discerned by examining their correlations with other social desirability scales. For example, the SDE subscale, which measures an unintentional promotion of self-image, was found to correlate more highly with traditional measures of defense and coping, such as the Edwards Social Desirability Scale (Edwards, 1957), Byrne's Repression-Sensitization Scale, and Gur's Self-Deception Scale. The IM subscale, which measures a deliberate distortion of one's public image, was more closely related to the Minnesota Multiphasic Personality Inventory Lie subscale, Wiggins's Social Desirability Scale, and Gur's Other-Deception Scale (all of which are traditional measures of response dissimulation). In addition, the low to medium correlations between these two subscales provide further evidence of their discriminant validity (Paulhus, 1991).
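
To make the two scoring rules concrete, the following minimal Python sketch applies them to a handful of hypothetical responses. The items and the reverse-keying pattern are invented for illustration; this is a sketch of the rules just described, not Paulhus's published scoring key.

```python
# Illustrative sketch of the two BIDR scoring rules described above.
# The reverse-keyed indices here are hypothetical, not the actual key.

def score_bidr(responses, reverse_keyed, dichotomous=True):
    """Score 7-point BIDR responses (a list of ints from 1 to 7)."""
    total = 0
    for i, r in enumerate(responses):
        if i in reverse_keyed:
            r = 8 - r  # reflect negatively keyed items first
        if dichotomous:
            total += 1 if r >= 6 else 0  # only extreme answers (6 or 7) count
        else:
            total += r  # continuous scoring keeps the raw (reflected) value
    return total

answers = [7, 2, 6, 1, 5]                              # five hypothetical items
print(score_bidr(answers, {1, 3}))                     # dichotomous: 4
print(score_bidr(answers, {1, 3}, dichotomous=False))  # continuous: 31
```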

Reliability Generalization
The ability of the BIDR to detect and assess the extent of response distortion to personality inventories is contingent on the degree of reliability of test scores. Internal consistency score reliability is "the degree of self-consistency among the scores earned by an individual" (Ghiselli, Campbell, & Zedeck, 1981, p. 184). When a test yields low score reliability, it is often an indication that the test's scores are plagued by random errors and not reflective of the construct intended to be measured (Thompson, 2003). Unreliable test scores are of little value because they do not provide consistent information about the construct of interest. Moreover, score unreliability attenuates effect sizes, thereby compromising the ability of empirical studies to facilitate meaningful comparisons among individuals and effectively predict behaviors (Ghiselli et al., 1981; Henson, 2001; Hunter & Schmidt, 1990; Thompson & Vacha-Haase, 2000).

It is important to note that reliability is a function of test scores, not a property of the assessment instrument. As Nitko (2004) stated, "as with validity, reliability refers to the students' assessment results or scores, not to the assessment instrument itself" (p. 60). The same test may produce different reliability estimates for its scores when it is administered to different samples. In other words, the fact that a test yields reliable scores with one sample does not guarantee that the same level of reliability can be obtained when a different sample is administered the same test (Rowley, 1976). One factor that influences the variability of reliability coefficients across studies is sample heterogeneity (Henson, 2001; Nunnally & Bernstein, 1994). Generally speaking, more heterogeneous samples tend to yield higher estimates of score reliability. Given that studies vary widely in sample composition, it is imperative that researchers report the reliability coefficients for their data.

Unfortunately, as observed by several commentators, the reporting of reliability coefficients is at best sporadic in most empirical studies (Thompson, 2003; Vacha-Haase, 1998; Vacha-Haase, Henson, & Caruso, 2002). For example, Yin and Fan (2000) identified over 1,200 studies that used the Beck Depression Inventory. Among them, only 90 (less than 8%) reported sample reliability coefficients. The vast majority of the studies did not even mention the issue of reliability. The underreporting of score reliabilities has also been found when other test instruments have been examined (e.g., Beretvas, Meyers, & Leite, 2002; Caruso, 2000; Kieffer & Reese, 2002; Shields & Caruso, 2003; Vacha-Haase, 1998). The low frequency of reliability reporting likely has its root, at least in part, in the erroneous belief that a test (instead of test scores) is reliable. When a test is deemed reliable, it is believed that the same level of reliability coefficient can be obtained independent of the sample used.

A number of researchers have discussed the deleterious impact on research studies of failing to report reliability coefficients for the data (Caruso, 2000; Thompson & Vacha-Haase, 2000). Effect sizes that might otherwise be statistically significant may be attenuated by score unreliability. If researchers conceive of reliability as an inherent property of a test and assume that the test (rather than test scores) they use is reliable, they may fail to consider reliability when interpreting effects or to consider correcting the effect size for score unreliability. In light of this, Thompson and Vacha-Haase (2000) stated that "the better [reliability reporting] practice would be to estimate score reliability for one's own data. . . . This is the acid test of sample-to-normative-sample match as regards reliability" (pp. 190-191).

To evaluate the extent of reliability variability across studies, Vacha-Haase (1998) introduced a procedure called reliability generalization.
Reliability generalization studies apply the same premise as validity generalization studies (Hunter & Schmidt, 1990). In essence, the purpose of a reliability generalization study is to organize a voluminous body of studies that report score reliabilities on a particular test or construct and to explore the sources of variability present in the reliability coefficients across studies. A reliability generalization study provides information about the typical level of score reliability on a certain test. In addition, various characteristics of the studies (such as age and gender of the samples, test length, and sample size) can be coded and used as predictors to explain variation in reliability estimates.
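
All of the reliability estimates meta-analyzed in a study of this kind are internal consistency coefficients (coefficient α). As a reference point, the following minimal sketch computes α from an item-response matrix; the data here are random, so the items share no variance and α comes out near zero.

```python
# Minimal illustration of coefficient alpha on made-up data.
import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
data = rng.integers(1, 8, size=(50, 20))  # 50 respondents, 20 seven-point items
print(round(cronbach_alpha(data), 2))     # near 0: random items share no variance
```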

Purpose
The purpose of this study was to conduct a reliability generalization study for the BIDR (Paulhus, 1991). A common assumption is that low correlations between personality scores and BIDR scores indicate that responses to personality inventories are free from the contamination of social desirability (Ellingson et al., 1999). However, this assumption may be problematic, because the low correlations may be caused by low score reliability (Henson, 2001; Thompson, 2003). A reliability generalization study for the BIDR may provide important information pertaining to the typical score reliability coefficient across samples. In addition, by examining characteristics that may potentially influence reliability variability, our analysis may help researchers determine the appropriate use of this scale in their substantive research.

Method
Instrument
A number of revisions to the BIDR have been undertaken since it was first developed in 1984. These changes resulted in the construction of two subscales to measure self-deception, namely, SDE and Self-Deception Denial (Paulhus, 1991). The Self-Deception Denial scale has not been used extensively, in part because of its reported high correlation with the IM scale (Paulhus, 1991; Peterson et al., 2003), making it somewhat redundant. Among the several versions of the BIDR, Version 6 (1991) has been the most widely used and was therefore the focus of the present reliability generalization (Stober, Dette, & Musch, 2002).

Procedure and Sample of Articles


Both computer-based and manual searches of studies using the BIDR were conducted. The computer-based search included three steps. First, we conducted a search in the Social Science Citation Index for studies that referenced the book chapter by Paulhus (1991) in which the scale was published. Second, we conducted a search in the PsycInfo and ERIC databases using the search term "Balanced Inventory of Desirable Responding" or "BIDR." In the third step, we searched for unpublished dissertations in Dissertation Abstracts. Our manual search focused on the reference lists of two comprehensive meta-analyses (Ones et al., 1996; Richman, Kiesler, Weisband, & Drasgow, 1999).

The search yielded 236 articles that used the BIDR (Version 6). These articles covered the time span between 1991 and 2004 and consisted of 206 published studies and 30 dissertations. Of these 236 articles, 86 (36%) did not mention the issue of reliability, 40 (17%) cited the reliability coefficients of other published works, and 110 (47%) reported the reliability estimates of the sample data. The present reliability generalization was based on the 110 articles in which the reliability estimates of test scores were reported. Because some of these articles reported the reliability coefficients of more than one sample, the number of reliability estimates totaled 215. All estimates were measures of internal consistency (coefficient α).

To understand the sources of variability in reliability estimates, we examined each article and coded the characteristics of the sample. Following the recommendations of Henson and Thompson (2002) and other published works (Beretvas et al., 2002; Caruso, 2000; Kieffer & Reese, 2002; Shields & Caruso, 2003; Vacha-Haase, 1998), the sample characteristics of age, gender, sample type, nationality, sample size, and language were included in the analysis. Age was coded as a continuous variable representing the mean age of the sample. Gender was coded as a continuous variable representing the percentage of female participants in each sample. Because most of the study participants were undergraduate students, sample type was coded 1 for undergraduate students and 0 for others. A noticeable number of studies in our database were conducted in countries other than the United States; therefore, nationality was also included as a sample characteristic. Studies conducted in the United States were coded 1, and all others were coded 0. The language of the BIDR was also coded, with 1 representing the English version and 0 representing versions translated into other languages. In addition to sample characteristics, we also included three test-related predictors, namely, test length, publication status, and scoring method. Test length was coded as a dichotomous variable, with 1 representing the full-length scale and 0 representing a shortened version of the scale. Publication status was coded dichotomously as either published (1) or unpublished (0). Finally, a dichotomous variable was created to represent the dichotomous (1) and continuous (0) scoring methods.
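
As an illustration of this coding scheme, the sketch below builds a few hypothetical rows (the values are invented, not drawn from the actual database) and computes one zero-order correlation of the kind reported in the Results section.

```python
# Hypothetical illustration of the coding scheme described above.
import numpy as np

# One row per reliability estimate: [alpha, mean_age, pct_female,
#  undergrad, usa, english, sample_n, full_length, published, dichotomous]
coded = np.array([
    [0.76, 20.1, 0.58, 1, 1, 1, 210, 1, 1, 1],
    [0.63, 34.5, 0.47, 0, 0, 0,  95, 0, 0, 0],
    [0.71, 19.8, 0.61, 1, 1, 1, 143, 1, 1, 0],
    [0.68, 28.2, 0.52, 0, 1, 1,  77, 0, 1, 1],
])

alphas = coded[:, 0]
full_length = coded[:, 7]
# Zero-order (bivariate) correlation between alpha and one coded predictor
print(round(np.corrcoef(alphas, full_length)[0, 1], 2))
```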

Results
Table 1 presents the means, standard deviations, and ranges of the reliability estimates for the IM, SDE, and overall BIDR scores, along with the skewness and kurtosis of the distribution of reliability estimates. The mean score reliability coefficient of the IM scale was .74 (SD = .09, n = 107). The mean reliability coefficients were .68 (SD = .09, n = 90) for SDE scores and .80 (SD = .04, n = 18) for overall BIDR scores.

Table 1
Descriptive Statistics for Reliability Coefficients (α)

Scale           M     Median   SD    n     Range         Skewness   Kurtosis
IM              .74   .76      .09   107   .32 to .88    1.34       3.70
SDE             .68   .69      .09   90    .27 to .92    1.24       4.96
Overall BIDR    .80   .80      .04   18    .68 to .86    0.99       2.13

Note: IM = Impression Management; SDE = Self-Deception Enhancement; BIDR = Balanced Inventory of Desirable Responding.

We also compared the mean reliability estimates of IM and SDE scores. Because of concerns about violations of the assumption of independence, we did not conduct an independent-samples t test. Instead, we conducted a paired-samples t test on the basis of 76 pairs of reliability estimates of both IM and SDE scores. The results of the paired-samples t test indicated that the score reliability for the IM scale was statistically significantly higher than for the SDE scale (t = 6.74, p = .001, d = .93).
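
A minimal sketch of this paired comparison on simulated data follows; the α values are generated for illustration, so the resulting t and d will not reproduce the figures above.

```python
# Paired-samples t test and paired Cohen's d on simulated reliability pairs.
import numpy as np

rng = np.random.default_rng(1)
im_alphas = rng.normal(0.74, 0.09, size=76)   # hypothetical IM estimates
sde_alphas = rng.normal(0.68, 0.09, size=76)  # hypothetical paired SDE estimates

diff = im_alphas - sde_alphas
t = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))  # paired t statistic
d = diff.mean() / diff.std(ddof=1)                         # d for paired data
print(round(t, 2), round(d, 2))
```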

The second purpose of the present study was to explore study characteristics that predict the variability of reliability estimates. The overall BIDR scale was excluded from this analysis because only 18 studies reported its score reliability coefficients. Thus, our subsequent analyses focused only on the IM and SDE scales. Consistent with previous research (Henson & Thompson, 2002; Reese, Kieffer, & Briggs, 2002; Yin & Fan, 2000), we used bivariate correlations to examine the relationship between study characteristics and reliability estimates. Multiple regression was used to estimate the amount of reliability coefficient variance explained by the study characteristics.

Table 2 presents the results of the zero-order correlations between study characteristics and the reported reliability coefficients of IM scores. Test length was the only characteristic that was statistically significantly correlated with reliability coefficients (r = .22, p = .03), with the full-length test versions associated with higher reliability estimates. This is not particularly surprising, because other researchers have also suggested that test length influences reliability estimates (Nunnally & Bernstein, 1994). All things being equal, the longer the scale, the higher the reliability estimates, owing to the correction factor for test length in the α formula (Henson, 2001).

Table 2
Correlation Coefficients and Multiple Regression Coefficients for Prediction of Impression Management Scale Scores

Predictor            n     r      b
Sample size          107   .06    .01
Age                  65    .09
Language             107   .19    .17
Country              107   .13    .11
Gender               101   .09    .16
Sample type          107   .11    .27
Publication status   107   .04    .12
Test length          107   .22*   .14
Scoring method       103   .15    .26

Note: Regression analyses were based on a sample size of 97. Age was not included in the multiple regression analyses because of missing data.
*p < .05.

We also conducted multiple regression analyses, with reliability coefficients entered as criteria and study characteristics (sample size, language, country, gender, sample type, publication status, test length, and scoring method) as predictors. Given that the regression equations included multiple independent variables, we computed variance inflation factors to detect multicollinearity. The results indicated that all of the variance inflation factors were greater than .5, which indicated that multicollinearity did not bias the standard errors (Afifi & Clark, 1996). The resulting equation was statistically significant, F(8, 88) = 2.82, p = .008, and accounted for 20.4% (R²) of the overall variation associated with the reliability estimates. Following the advice of Courville and Thompson (2001) that both regression coefficients and bivariate correlation coefficients (or structure coefficients) should be considered when interpreting predictor-criterion relationships, we also report the multiple regression coefficients (b weights) of all the study characteristics in Table 2.
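
The collinearity check described above can be sketched as follows: each predictor is regressed on the remaining ones, and VIF_j = 1 / (1 − R_j²). The predictor matrix here is random, so the VIFs come out near 1 (no collinearity); only the shape (97 cases, 8 predictors) mirrors our IM analysis.

```python
# Variance inflation factors via the regress-each-predictor-on-the-rest rule.
import numpy as np

def vifs(X):
    """X: (n, p) predictor matrix; returns one VIF per column."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()   # R^2 from regressing x_j on the rest
        out.append(1 / (1 - r2))
    return out

X = np.random.default_rng(2).normal(size=(97, 8))  # simulated predictors
print([round(v, 2) for v in vifs(X)])              # values near 1
```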

Table 3 presents the results of the correlations between study characteristics and the reported reliability coefficients of SDE scores. Both language (r = .33, p = .002) and country (r = .28, p = .009) were statistically significantly correlated with reliability coefficients, indicating that the SDE scale tended to have higher score reliability estimates when it was administered in English and in the United States. In addition, publication status was statistically significantly correlated with reliability estimates (r = .21, p = .04), with higher reliability estimates reported in published studies. Multiple regression analyses indicated that the equation was statistically significant, F(8, 73) = 2.25, p = .03. In all, these study characteristics explained 19.8% (R²) of the overall reliability variability. Regression coefficients of all the study characteristics are presented in Table 3. It is worth noting that none of the three predictors (language, country, and publication status) were statistically significant predictors of reliability estimates in the multiple regression equation. Indeed, it is not uncommon for a predictor to be statistically significantly correlated with the dependent variable but to have a nonsignificant b weight in the multiple regression equation. It is possible that other predictors in the same regression equation are statistically significantly correlated with the targeted predictor and thus account for some of its shared explanatory variance (see Courville & Thompson, 2001).

Table 3
Correlation Coefficients and Multiple Regression Coefficients for Prediction of Self-Deception Enhancement Scale Scores

Predictor            n    r      b
Sample size          90   .09    .08
Age                  50   .03
Language             90   .33*   .23
Country              90   .28*   .19
Gender               87   .03    .03
Sample type          90   .09    .02
Publication status   90   .21*   .21
Test length          90   .18    .08
Scoring method       83   .03    .06

Note: Regression analyses were based on a sample size of 82. Age was not included in the multiple regression analyses because of missing data.
*p < .05.

Because most of the studies included both the IM and the SDE scales, a multivariate reliability analysis was expected to provide additional insight into the relationship between scale scores and their correlates (Vacha-Haase, 1998). Our database included studies that reported the reliability estimates of scores on only one scale (either the IM scale or the SDE scale); these studies were excluded from the multivariate analysis, resulting in a sample of 76 pairs of reliability estimates of both IM and SDE scores (although because of missing data on some of the predictor variables, only 69 pairs were used in the canonical correlation analyses). Canonical correlation analyses were conducted to examine the relationship between the reliability coefficients of IM and SDE scores and the study characteristics described above (Sherry & Henson, 2005). Two canonical variates were generated. The full model that included both variates was statistically significant, Wilks's Λ = .59, F(16, 118) = 2.26, p < .001, which indicates that the canonical variates accounted for 41% of the variance (with 59% of the variance unaccounted for). The second model, which included only the second variate, was not statistically significant, Wilks's Λ = .83, F(7, 60) = 1.80, p > .10. On the basis of these analyses, we retained the largest canonical variate, with a squared canonical correlation of .29, which suggests that 29% of the variance in the linear combination of reliability scores can be accounted for by the linear combination of predictors, and vice versa. Table 4 presents the standardized canonical coefficients and structure coefficients for each of the variables.

Table 4
Standardized Canonical Correlation Coefficients (n = 69)

Variable             Standardized Coefficient   Structure Coefficient
Sample size          .13                        .24
Sample type          .31*                       .11
Language             .56*                       .74
Country              .13                        .51
Gender               .19                        .11
Publication status   .30*                       .15
Test length          .57*                       .61
Scoring method       .26                        .17
IM reliability       .71*                       .88
SDE reliability      .51*                       .74

Note: Standardized coefficients greater than or equal to .30 are marked with an asterisk. IM = Impression Management; SDE = Self-Deception Enhancement.

On the basis of the cutoff of .30 recommended by Lambert and Durand (1975) as an acceptable minimum coefficient value, the canonical correlation primarily suggested that language, publication status, and test length were positively correlated, and sample type negatively correlated, with the reliability coefficients of IM and SDE scores.
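
For readers less familiar with the mechanics, the following compact sketch runs a canonical correlation analysis of the same shape (eight coded characteristics against the two reliability columns) on simulated data and forms Wilks's Λ for the full model from the canonical correlations. It is a bare-bones illustration, not a full statistical-package implementation.

```python
# Canonical correlations via whitening + SVD, plus Wilks's lambda.
import numpy as np

def canonical_correlations(X, Y):
    """Return the canonical correlations between column sets X and Y."""
    X = (X - X.mean(0)) / X.std(0, ddof=1)   # standardize each variable
    Y = (Y - Y.mean(0)) / Y.std(0, ddof=1)
    n = len(X)
    Sxx, Syy = X.T @ X / (n - 1), Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)
    A = np.linalg.inv(np.linalg.cholesky(Sxx))  # whitening transforms
    B = np.linalg.inv(np.linalg.cholesky(Syy))
    return np.linalg.svd(A @ Sxy @ B.T, compute_uv=False)

rng = np.random.default_rng(3)
X = rng.normal(size=(69, 8))   # 8 simulated study characteristics
Y = rng.normal(size=(69, 2))   # simulated IM and SDE reliability estimates
rho = canonical_correlations(X, Y)
wilks = np.prod(1 - rho**2)    # Wilks's lambda for the full model
print(np.round(rho, 2), round(wilks, 2))
```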

Discussion
The primary purposes of the present study were to conduct a reliability generalization analysis to examine the typical level of reliability estimates of BIDR scores across samples and to identify study characteristics that explain the variation of reliability coefficients. Our results suggest that the mean reliability estimates were .74 for IM scores and .80 for overall BIDR scores. Nunnally and Bernstein (1994) recommended a minimum reliability cutoff value of .70 at the early stage of measurement development, a value of .80 for basic research purposes, and a minimum of .90 for high-stakes decisions. As an established scale that has been used extensively in both applied settings and basic research, the BIDR should produce overall and subscale scores whose reliability estimates exceed the .80 or .90 cutoff. On the basis of this standard, the results reported in the present study suggest that the overall BIDR is acceptable for basic research but not for important applied decisions. The IM scale is not sufficient for basic research, much less for important applied decisions. Comparatively, the SDE scale had an even lower mean score reliability estimate (.68). Inspection of the studies that reported reliability coefficients of SDE scores revealed that more than half of the reliability estimates (53%) were below .70. Moreover, the mean score reliability coefficient for the IM scale was statistically significantly higher than for the SDE scale.

The lower reliability estimates of SDE scores compared with IM scores are consistent with what has been reported in the test manual (Paulhus, 1991). It is likely that the low score reliability is caused by low communality among the items (Nunnally & Bernstein, 1994). Alternatively, it is possible that the items in the SDE scale evoke varying needs for self-defense and self-enhancement. Nevertheless, the low reliability of SDE scores is quite troubling and warrants future research. Given the low reliability coefficients of SDE scores reported in the present study and in previous research, it is recommended that users of this scale exercise caution in interpreting findings based on the scale scores. It is worth noting that Beretvas et al. (2002) conducted a reliability generalization study of the MCSDS, the most widely used social desirability scale. They reported that the mean internal consistency reliability coefficient of MCSDS scores was .73, which is close to the reliability estimate of IM scores reported in the present study.

We also examined study characteristics that influence the variability of reliability estimates across studies. Consistent with previous research (Nunnally & Bernstein, 1994), test length was statistically significantly correlated with reliability estimates of IM scores (but not SDE scores). Reliability estimates of SDE scores appeared to be higher when the scale was administered in the United States and when the article using the scale had been published. Our finding of higher reliability estimates in published studies confirms prior speculation that, because higher quality studies may be more likely to be published, journal editors tend to be less receptive to low-quality studies, such as those reporting low reliability coefficients (Caruso, 2000; Shields & Caruso, 2004). Canonical correlation between the reliability estimates of IM and SDE scores and study characteristics revealed that language, publication status, and test length were positively correlated, and sample type negatively correlated, with the reliability coefficients of IM and SDE scores. These results mirror what was found in the univariate analyses, although some variations exist. One explanation for these variations is that a smaller set of studies (69) was used in the canonical correlation analysis than in the univariate analyses (107 for the IM scale and 90 for the SDE scale). It is worth noting that scoring method was not statistically significantly related to reliability coefficients of test scores. Stober et al. (2002) found that continuous scoring of the BIDR yielded higher score reliability coefficients than the dichotomous scoring method. The divergence of findings may be because Stober et al. (2002) used a German version of the BIDR, whereas most of the studies included in this reliability generalization were in English.

It is somewhat disappointing that the study characteristics included in this reliability generalization accounted for only a modest fraction of the variance (20%), leaving the rest of the variance unaccounted for. Future research should examine other variables that may contribute to the variation of score reliabilities across studies. For example, previous research has indicated that the standard deviation of measurement scores (score variability) tends to be a good predictor of reliability variation (Shields & Caruso, 2003). Unfortunately, only 47% of the studies included in the present reliability generalization provided the standard deviation of measurement scores; as a result, we did not include it as a predictor. Future research should include the standard deviation as one of the predictors of variability in reliability estimates.

This finding also underscores the need to report reliability coefficients for one's own data, given that a substantial amount of the variability associated with reliability estimates has yet to be explained. Unfortunately, only 47% of the articles using the BIDR reported reliabilities for their data. The majority of the studies either made no mention of reliability or simply referred to the results of other published work to substantiate the use of the BIDR. As stated previously, the underreporting of reliability coefficients may lead researchers to believe that reliability is an inherent property of the test, thus obviating the perceived need to correct effect sizes for score unreliability (Caruso, 2000; Thompson & Vacha-Haase, 2000). This is particularly problematic for the use of the BIDR. A low correlation between the BIDR and personality constructs is often seen as a sign of a lack of bias on the part of personality inventories. However, the low correlation may be caused by unreliability of BIDR scores that attenuates the effect size. In view of this, we agree with the conclusion of Wilkinson and the APA Task Force on Statistical Inference (1999) that "interpreting the size of observed effects requires an assessment of the reliability of the scores" (p. 596).
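
The attenuation at issue follows the classical correction formula (discussed in, e.g., Hunter & Schmidt, 1990). A worked illustration using the mean SDE estimate from this study follows; the reliability of .80 for the personality measure and the observed r of .20 are assumed values chosen for the example.

```latex
% Classical disattenuation of an observed correlation:
r^{\mathrm{true}}_{xy} = \frac{r^{\mathrm{obs}}_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
% Example: r_obs = .20, r_xx (SDE scores) = .68, r_yy (personality scores) = .80
% r_true = .20 / \sqrt{.68 \times .80} = .20 / .74 \approx .27
```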

As is true of all other studies, ours is not without limitations. First, although we went to great lengths to locate unpublished studies, our search focused mainly on dissertations. It is possible that other unpublished studies that used this scale were not included in the present analysis. As noted previously, unpublished studies tend to report lower reliability estimates than published studies because of publication bias (Rosenthal, 1979). Therefore, the inclusion of more unpublished studies might further depress the mean reliability estimates of BIDR scores. Second, as we alluded to earlier, a majority of the researchers using the BIDR failed to report reliability coefficients for their data. Although it is certainly possible that these studies obtained desirable reliability estimates of BIDR scores, the reverse seems more likely. As Caruso (2000) suggested, authors who find low reliability coefficients of scale scores in their own studies may opt to report findings from previous research instead. This being the case, the inclusion of those studies that did not report reliability coefficients might have lowered the mean reliability estimates of BIDR scores.

In conclusion, the results suggest that, on the whole, the overall BIDR scale produced scores that were adequately reliable, whereas the reliability estimates of SDE and IM scores were unsatisfactory. Although a number of study characteristics were statistically significantly related to reliability estimates, they accounted for only a small amount of the overall reliability variability. These findings, along with other reliability generalization studies, provide further evidence that reliability is not a property of a test. It is our sincere hope that future researchers will always examine and report reliability coefficients for their data instead of simply reporting previous reliability results.

References

References marked with an asterisk indicate studies included in the meta-analysis.

Abildgaard, A. M. (1999). Level of anonymity and its relationship to socially desirable responding (impression management). Unpublished doctoral dissertation, California School of Professional Psychology, Los Angeles.
Abrams, D., Viki, G. T., Masser, B., & Bohner, G. (2003). Perceptions of stranger and acquaintance rape: The role of benevolent and hostile sexism in victim blame and rape proclivity. Journal of Personality and Social Psychology, 84, 111-125.
Afifi, A. A., & Clark, V. (1996). Computer-aided multivariate analysis. Boca Raton, FL: Chapman & Hall/CRC.
Antons, C. M., Dilla, B. L., & Fultz, M. L. (1997, May). Assessing student attitudes: Computer versus pencil. Paper presented at the Annual Forum of the Association for Institutional Research, Orlando, FL.
Aquino, K., & Reed, A. (2002). The self-importance of moral identity. Journal of Personality and Social Psychology, 83, 1423-1440.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.
Barrick, M. R., & Mount, M. K. (1996). Effects of impression management and self-deception on the predictive validity of personality constructs. Journal of Applied Psychology, 81, 261-272.
Bearden, W. O., Hardesty, D. M., & Rose, R. L. (2001). Consumer self-confidence: Refinements in conceptualization and measurement. Journal of Consumer Research, 28, 121-134.
Becker, T. E., & Martin, S. L. (1995). Trying to look bad at work: Methods and motives for managing poor impressions in organizations. Academy of Management Journal, 38, 174-199.
Begun, A. L., Murphy, C., Bolt, D., Weinstein, B., Strodthoff, T., Short, L., et al. (2003). Characteristics of the Safe at Home instrument for assessing readiness to change intimate partner violence. Research on Social Work Practice, 13, 80-107.
Beretvas, S. N., Meyers, J. L., & Leite, W. L. (2002). A reliability generalization study of the Marlowe-Crowne Social Desirability Scale. Educational and Psychological Measurement, 62, 570-589.
Booth-Kewley, S., Edwards, J. E., & Rosenfeld, P. (1992). Impression management, social desirability, and computer administration of attitude questionnaires: Does the computer make a difference? Journal of Applied Psychology, 77, 562-566.
Bourgeois, A. E., Loss, R., Meyers, M. C., & LeUnes, A. D. (2003). The Athletic Coping Skills Inventory: Relationship with impression management and self-deception aspects of socially desirable responding. Psychology of Sport and Exercise, 4, 71-79.
Bowes-Sperry, L., & Powell, G. N. (1999). Observers' reactions to social-sexual behavior at work: An ethical decision making perspective. Journal of Management, 25, 779-802.
Burris, C. T., & Tarpley, W. R. (1998). Religion as being: Preliminary validation of the Immanence scale. Journal of Research in Personality, 32, 55-79.
Calsyn, R. J., & Klinkenberg, W. D. (1995). Response bias in needs assessment studies. Evaluation Review, 19, 217-225.
Calsyn, R. J., & Winter, J. P. (1999). Understanding and controlling response bias in needs assessment studies. Evaluation Review, 23, 399-417.
Caruso, J. C. (2000). Reliability generalization of the NEO Personality Scales. Educational and Psychological Measurement, 60, 236-254.
Choi, S. J. (2000). Acculturation and method bias in reporting of psychological symptoms among first-generation Korean Americans. Unpublished doctoral dissertation, St. John's University, New York.
Christensen, C. H. (2001). Therapist cultural sensitivity and premature termination rates with ethnic minority adolescents. Unpublished doctoral dissertation, University of Akron, Akron, Ohio.
Ciaravino, J. G. (1991). Two factor theory of social desirability responding and bias reduction in children's self-reports. Unpublished doctoral dissertation, Hofstra University, New York.
Compton, W. C., Smith, M. L., Cornish, K. A., & Qualls, D. L. (1996). Factor structure of mental health measures. Journal of Personality and Social Psychology, 71, 406-413.
Courville, T., & Thompson, B. (2001). Use of structure coefficients in published multiple regression articles: β is not enough. Educational and Psychological Measurement, 61, 229-248.
Crowne, D. P., & Marlowe, D. (1964). The approval motive. New York: John Wiley.
Cunningham, M. R., Wong, D. T., & Barbee, A. P. (1994). Self-presentation dynamics on overt integrity tests: Experimental studies of the Reid Report. Journal of Applied Psychology, 79, 643-658.
Day, E. A., Radosevich, D. J., & Chasteen, C. S. (2003). Construct- and criterion-related validity of four commonly used goal orientation instruments. Contemporary Educational Psychology, 28, 434-464.
DeYoung, C. G., Peterson, J. B., & Higgins, D. M. (2002). Higher-order factors of the Big Five predict conformity: Are there neuroses of health? Personality and Individual Differences, 33, 533-552.
Dillon, F., & Worthington, R. L. (2003). The Lesbian, Gay and Bisexual Affirmative Counseling Self-Efficacy Inventory (LGB-CSI): Development, validation, and training implications. Journal of Counseling Psychology, 50, 235-251.
Edwards, A. L. (1957). The social desirability variable in personality assessment and research. New York: Dryden.
Ellingson, J. E., Sackett, P. R., & Hough, L. M. (1999). Social desirability corrections in personality measurement: Issues of applicant comparison and construct validity. Journal of Applied Psychology, 84, 155-166.
Ellingson, J. E., Smith, D. B., & Sackett, P. R. (2001). Investigating the influence of social desirability on personality factor structure. Journal of Applied Psychology, 86, 122-133.
Elliot, A. G. P. (1981). Some implications of lie scale scores in real-life selection. Journal of Occupational Psychology, 54, 9-16.
Elliot, A. J., & Reis, H. T. (2003). Attachment and exploration in adulthood. Journal of Personality and Social Psychology, 85, 317-331.
Elliot, A. J., & Thrash, T. M. (2002). Approach-avoidance motivation in personality: Approach and avoidance temperaments and goals. Journal of Personality and Social Psychology, 82, 804-818.
Eysenck, H. J., & Eysenck, S. B. G. (1964). The manual of the Eysenck Personality Inventory. London: University of London Press.
Fischer, A. R., Tokar, D. M., Mergl, M. M., Good, G. E., Hill, M. S., & Blum, S. A. (2000). Assessing women's feminist identity development: Studies of convergent, discriminant, and structural validity. Psychology of Women Quarterly, 24, 15-29.
Flannery, B. L., & May, D. R. (2000). Environmental ethical decision making in the U.S. metal-finishing industry. Academy of Management Journal, 43, 642-662.
Fossum, T. A., & Barrett, L. F. (2000). Distinguishing evaluation from description in the personality-emotion relationship. Personality and Social Psychology Bulletin, 26, 669-678.
Fox, S., & Schwartz, D. (2002). Social desirability and controllability in computerized and paper-and-pencil personality questionnaires. Computers in Human Behavior, 18, 389-410.
Furnham, A., Petrides, K. V., & Spencer-Bowdage, S. (2002). The effects of different types of social desirability on the identification of repressors. Personality and Individual Differences, 33, 119-130.
Gaither, G. A., & Sellbom, M. (2003). The Sexual Sensation Seeking Scale: Reliability and validity within a heterosexual college student sample. Journal of Personality Assessment, 81, 157-167.
Gara, M. A., Woolfolk, R. L., & Allen, L. A. (2002). Social cognitive complexity and depression: Cognitive complexity moderates the correlation between depression self-ratings and global self-evaluation. Journal of Nervous and Mental Disease, 190, 670-676.
Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. San Francisco: Freeman.
Gopinath, C., & Becker, T. E. (2000). Communication, procedural justice, and employee attitudes: Relationships under conditions of divestiture. Journal of Management, 26, 63-83.
Graziano, W. G., & Tobin, R. M. (2002). Agreeableness: Dimension of personality or social desirability artifact? Journal of Personality, 70, 695-727.
Greenberger, E., Chen, C., Dmitrieva, J., & Farruggia, S. P. (2003). Item-wording and the dimensionality of the Rosenberg Self-Esteem Scale: Do they matter? Personality and Individual Differences, 35, 1241-1254.
Hamburger, M. E., Hogben, M., McGowan, S., & Dawson, L. J. (1996). Assessing hypergender ideologies: Development and initial validation of a gender-neutral measure of adherence to extreme gender-role beliefs. Journal of Research in Personality, 30, 157-178.
Haraburda, E. M. (1998). The relationship of indecisiveness to the five factor personality model and psychological symptomology. Unpublished doctoral dissertation, The Ohio State University, Columbus.
Hemphill, J. F., & Howell, A. J. (2000). Adolescent offenders and stages of change. Psychological Assessment, 12, 371-381.
Henson, R. K. (2001). Understanding internal consistency reliability estimates: A conceptual primer on coefficient alpha. Measurement and Evaluation in Counseling and Development, 34, 177-189.
Henson, R. K., & Thompson, B. (2002). Characterizing measurement error in scores across studies: Some recommendations for conducting reliability generalization studies. Measurement and Evaluation in Counseling and Development, 35, 113-127.
Hill, M. S., & Fischer, A. R. (2001). Does entitlement mediate the link between masculinity and rape-related variables? Journal of Counseling Psychology, 48, 39-50.
Hirschfeld, R. R., Feild, H. S., & Bedeian, A. G. (2000). Work alienation as an individual-difference construct for predicting workplace adjustment: A test in two samples. Journal of Applied Social Psychology, 30, 1880-1902.
Hogan, R., & Hogan, J. (1992). Hogan Personality Inventory manual. Tulsa, OK: Hogan Assessment Systems.
Hough, L. M. (1998). Effects of intentional distortion in personality measurement and evaluation of suggested palliatives. Human Performance, 11, 209-244.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581-595.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.
Inman, A. G. (1999). Development and validation of the South Asian women's cultural values conflicts scale. Unpublished doctoral dissertation, Temple University, Philadelphia.
Jacobson, B. S. (2003). The relation between a man's sexual orientation and how he is perceived by others. Unpublished doctoral dissertation, Rutgers University, New Brunswick, NJ.
Johnson, M. E. (2000). Reliability and validity of the Leadership Self-Efficacy Scale. Unpublished doctoral dissertation, Pennsylvania State University, University Park.
Johnston, C., Scoular, D. J., & Ohan, J. L. (2004). Mothers' reports of parenting in families of children with symptoms of Attention-Deficit/Hyperactivity Disorder: Relations to impression management. Child & Family Behavior Therapy, 26, 45-61.
Judge, T. A., & Martocchio, J. J. (1996). Dispositional influences on attributions concerning absenteeism. Journal of Management, 22, 837-861.
Kasof, J. (2001). Eveningness and bulimic behavior. Personality and Individual Differences, 31, 361-369.
Keffala, V. J., & Stone, G. L. (1999). Role of HIV serostatus, relationship status of the patient, homophobia, and social desirability of the psychologist on decisions regarding confidentiality. Psychology and Health, 14, 567-584.
Kieffer, K. M., & Reese, R. J. (2002). A reliability generalization study of the Geriatric Depression Scale. Educational and Psychological Measurement, 62, 969-994.
King, W. C., & Miles, E. W. (1996). A quasi-experimental assessment of the effect of computerizing noncognitive paper-and-pencil measurements: A test of measurement equivalence. Journal of Applied Psychology, 80, 643-651.
Klein, W. N. (1993). An examination of the interactive effects of monetary consequences and degree of identity disclosure on socially desirable responding. Unpublished doctoral dissertation, Hofstra University, Hempstead, NY.
Lambert, Z. V., & Durand, R. M. (1975). Some precautions in using canonical analysis. Journal of Marketing Research, 12, 468-475.
Lastovicka, J. L., Bettencourt, L. A., Hughner, R. S., & Kuntze, R. J. (1999). Lifestyle of the tight and frugal: Theory and measurement. Journal of Consumer Research, 26, 85-98.
Lay, C., Fairlie, P., Jackson, S., Ricci, T., Eisenberg, J., Sato, T., et al. (1998). Domain-specific allocentrism-idiocentrism: A measure of family connectedness. Journal of Cross-Cultural Psychology, 29, 434-460.
Lee, K., Gizzarone, M., & Ashton, M. C. (2003). Personality and the likelihood to sexually harass. Sex Roles, 49, 59-69.
Lee, S., & Klein, H. J. (2002). Relationship between conscientiousness, self-efficacy, self-deception, and learning over time. Journal of Applied Psychology, 87, 1175-1182.
Lyn, T. S. (2001). Adult attachment and sexual offender status. Unpublished doctoral dissertation, University of Michigan, Ann Arbor.
Madia, B. P., & Lutz, C. J. (2004). Perceived similarity, expectation-reality discrepancies, and mentors' expressed intention to remain in Big Brothers/Big Sisters programs. Journal of Applied Social Psychology, 34, 598-623.
Martocchio, J. J., & Judge, T. A. (1997). Relationship between conscientiousness and learning in employee training: Mediating influences of self-deception and self-efficacy. Journal of Applied Psychology, 82, 764-773.
McFarland, L. (2003). Warning against faking on a personality test: Effects on applicant reactions and personality test scores. International Journal of Selection and Assessment, 11, 265-276.
McHoskey, J. W., Worzel, W., & Szyarto, C. (1998). Machiavellianism and psychopathy. Journal of Personality and Social Psychology, 74, 192-210.
Mersman, J. L., & Shultz, K. S. (1998). Individual differences in the ability to fake on personality measures. Personality and Individual Differences, 24, 217-227.
Meston, C. M., Heiman, J. R., Trapnell, P. D., & Carlin, A. S. (1999). Ethnicity, desirable responding, and self-reports of abuse: A comparison of European- and Asian-ancestry undergraduates. Journal of Consulting and Clinical Psychology, 67, 139-144.
Meston, C. M., Heiman, J. R., Trapnell, P. D., & Paulhus, D. L. (1998). Socially desirable responding and sexuality self-reports. Journal of Sex Research, 35, 148-157.
Mick, D. G. (1996). Are studies of dark side variables confounded by socially desirable responding? The case of materialism. Journal of Consumer Research, 23, 106-119.
Miles, E. W., & King, W. C. J. (1998). Gender and administration mode effects when pencil-and-paper personality tests are computerized. Educational and Psychological Measurement, 58, 68-76.
Mills, J. F., & Kroner, D. G. (2003). Anger as a predictor of institutional misconduct and recidivism in a sample of violent offenders. Journal of Interpersonal Violence, 18, 282-294.
Moorman, R. H., & Podsakoff, P. M. (1992). A meta-analytic review and empirical test of the potential confounding effects of social desirability response sets in organizational behavior research. Journal of Occupational and Organizational Psychology, 65, 131-149.
Moradi, B., & Mezydlo Subich, L. (2002). Perceived sexist events and feminist identity development attitudes: Links to women's psychological distress. Counseling Psychologist, 30, 44-65.
Moradi, B., & Mezydlo Subich, L. (2002). Feminist identity development measures: Comparing the psychometrics of three instruments. Counseling Psychologist, 30, 66-86.
Nitko, A. J. (2004). Educational assessment of students (4th ed.). Upper Saddle River, NJ: Pearson.
Nolen-Hoeksema, S., & Jackson, B. (2001). Mediators of the gender difference in rumination. Psychology of Women Quarterly, 25, 37-47.
Nunnally, J., & Bernstein, I. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Ones, D. S., & Viswesvaran, C. (1998). The effects of social desirability and faking on personality and integrity assessment for personnel selection. Human Performance, 11(2-3), 245-269.
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660-679.
O'Rourke, N. (2004). Cognitive adaptation and women's adjustment to conjugal bereavement. Journal of Women and Aging, 16, 87-104.
O'Rourke, N., & Wenaus, C. A. (1998). Marital aggrandizement as a mediator of burden among spouses of suspected dementia patients. Canadian Journal on Aging, 17, 384-400.
O'Shea, W. A. (2000). Exploring horizontal individualism and vertical individualism in the United States: Implicit and explicit measurement strategies. Unpublished doctoral dissertation, University of Mississippi, University.
Oswald, F. L., Schmitt, N., Kim, B. H., Ramsay, L. J., & Gillespie, M. A. (2004). Developing a biodata measure and situational judgment inventory as predictors of college student performance. Journal of Applied Psychology, 89, 187-207.
Patry, A. L., & Pelletier, L. G. (2001). Extraterrestrial beliefs and experiences: An application of the theory of reasoned action. Journal of Social Psychology, 141, 199-217.
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46, 598-609.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17-59). New York: Academic Press.
Paulhus, D. L. (1998). Interpersonal and intrapsychic adaptiveness of trait self-enhancement: A mixed blessing? Journal of Personality and Social Psychology, 74, 1197-1208.
Paulhus, D. L. (2002). Socially desirable responding: The evolution of a construct. In H. I. Braun & D. N. Jackson (Eds.), Role of constructs in psychological and educational measurement (pp. 49-69). Mahwah, NJ: Lawrence Erlbaum.
Pauls, C. A., & Stemmler, G. (2003). Substance and bias in social desirability responding. Personality and Individual Differences, 35, 263-275.
Peterson, J. B., DeYoung, C. G., Driver-Linn, E., Seguin, J. R., Higgins, D. M., Arseneault, L., et al. (2003). Self-deception and failure to modulate responses despite accruing evidence of error. Journal of Research in Personality, 37, 205-223.
Piedmont, R. L., McCrae, R. R., Riemann, R., & Angleitner, A. (2000). On the invalidity of validity scales: Evidence from self-reports and observer ratings in volunteer samples. Journal of Personality and Social Psychology, 78, 582-593.
Pinsent, C. (2000). Power exertion strategies in couples: A Q-methodological investigation of self- and partner-perceived frameworks. Unpublished doctoral dissertation, University of Ottawa, Canada.
Porter, L. S., Phillips, C., Dickens, S., & Kiyak, H. A. (2000). Social desirability in patients seeking surgical treatment for dentofacial disharmony: Associations with psychological distress and motivation for treatment. Journal of Clinical Psychology in Medical Settings, 7, 99-106. Potosky, D., & Bobko, P. (1997). Computer versus paper-and-pencil administration mode and response distortion in noncognitive selection tests. Journal of Applied Psychology, 82, 293-299. Randall, D. M., & Fernandes, M. F. (1991). The social desirability response bias in ethics research. Journal of Business Ethics, 10, 805-817. Reese, R. J., Kieffer, K. M., & Briggs, B. K. (2002). A reliability generalization study of select measures of adult attachment style. Educational and Psychological Measurement, 62, 619-646. Reid-Seiser, H. L., & Fritzsche, B. A. (2001). The usefulness of the NEO PI-R positive presentation management scale for detecting response distortion in employment contexts. Personality and Individual Differences, 31, 639-650. Richman, W. L., Kiesler, S., Weisband, S., & Drasgow, F. (1999). A meta-analytic study of social desirability distortion in computer-administered questionnaires, traditional questionnaires, and interviews. Journal of Applied Psychology, 84, 754-775. Riskind, J. H., Williams, N. L., Gessner, T. L., Chrosniak, L. D., & Cortina, J. M. (2000). The looming maladaptive style: Anxiety, danger, and schematic processing. Journal of Personality and Social Psychology, 79, 837-852. Robie, C., Curtin, P. J., Foster, C., Phillips, H. L., Zbylut, M., & Tetrick, L. E. (2000). The effect of coaching on the utility of response latencies in detecting fakers on a personality measure. Canadian Journal of Behavioural Science, 32, 226-233. Robins, R. W., Hendin, H. M., & Trzesniewski, K. H. (2001). Measuring global self-esteem: Construct validation of a single-item measure and the Rosenberg Self-Esteem Scale. Personality and Social Psychology Bulletin, 27, 151-161. Rogelberg, S. G., Fisher, G. G., Maynard, D. C., Hakel, M. D., & Horvath, M. (2001). Attitudes toward surveys: Development of a measure and its relationship to respondent behavior. Organizational Research Methods, 4, 3-25. Rose, P. (2002). The happy and unhappy faces of narcissism. Personality and Individual Differences, 33, 379-392. Rosenfeld, P., Booth-Kewley, S., & Edwards, J. E. (1996). Responses on computer surveys: Impression management, social desirability, and the Big Brother syndrome. Computers in Human Behavior, 12, 263-274. Rosenthal, R. (1979). The le drawer problem and tolerance for null results. Psychological Bulletin, 86, 638-641. Rosse, J. G., Stecher, M. D., Miller, J. L., & Levin, R. A. (1998). The impact of response distortion on preemployment personality testing and hiring decisions. Journal of Applied Psychology, 83, 634-644. Rowatt, W. C., & Schmitt, D. P. (2003). Associations between religious orientation and varieties of sexual experience. Journal for the Scientic Study of Religion, 42, 455-465. Rowatt, W. C., Ottenbreit, A., Nesselroade, K.P.J., & Cunningham, P. A. (2002). On being holier-thanthou or humbler-than-thee: A social-psychological perspective on religiousness and humility. Journal for the Scientic Study of Religion, 41, 227-237. Rowley, G. L. (1976). The reliability of observational measures. American Educational Research Journal, 13, 51-59. Salgado, J. F. (1997). The ve factor model of personality and job performance in the European Community. Journal of Applied Psychology, 82, 30-43. 
Schmitt, N., Oswald, F. L., Kim, B. H., Gillespie, M. A., Ramsay, L. J., & Yoo, T. Y. (2003). Impact of elaboration on socially desirable responding and the validity of biodata measures. Journal of Applied Psychology, 88, 979-988.

Sherry, A., & Henson, R. K. (2005). Conducting and interpreting canonical correlation analysis in personality research: A user-friendly primer. Journal of Personality Assessment, 84, 37-48.
Shields, A. L., & Caruso, J. C. (2003). Reliability generalization of the Alcohol Use Disorders Identification Test. Educational and Psychological Measurement, 63, 404-413.
Shields, A. L., & Caruso, J. C. (2004). A reliability induction and reliability generalization study of the CAGE Questionnaire. Educational and Psychological Measurement, 64, 254-270.
Sinha, R. R., & Krueger, J. (1998). Idiographic self-evaluation and bias. Journal of Research in Personality, 32, 131-155.
Sonstroem, R. J., & Potts, S. A. (1996). Life adjustment correlates of physical self-concepts. Medicine and Science in Sports and Exercise, 28, 619-625.
Sosik, J. J., Avolio, B. J., & Jung, D. I. (2002). Beneath the mask: Examining the relationship of self-presentation attributes and impression management to charismatic leadership. Leadership Quarterly, 13, 217-242.
Stober, J. (2001). The Social Desirability Scale-17 (SDS-17): Convergent validity, discriminant validity, and relationship with age. European Journal of Psychological Assessment, 17, 222-232.
Stober, J., Dette, D. E., & Musch, J. (2002). Comparing continuous and dichotomous scoring of the Balanced Inventory of Desirable Responding. Journal of Personality Assessment, 78, 370-389.
Stocker, C. M., Lanthier, R. P., & Furman, W. (1997). Sibling relationships in early adulthood. Journal of Family Psychology, 11, 210-221.
Tangney, J. P., Baumeister, R. F., & Boone, A. L. (2004). High self-control predicts good adjustment, less pathology, better grades, and interpersonal success. Journal of Personality, 72, 271-322.
Taylor, S. E., Lerner, J. S., Sherman, D. K., Sage, R. M., & McDowell, N. K. (2003). Portrait of the self-enhancer: Well adjusted and well liked or maladjusted and friendless? Journal of Personality and Social Psychology, 84, 165-176.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A meta-analytic review. Personnel Psychology, 44, 703-742.
Thompson, B. (2003). Score reliability: Contemporary thinking on reliability issues. Thousand Oaks, CA: Sage.
Thompson, B., & Vacha-Haase, T. (2000). Psychometrics is datametrics: The test is not reliable. Educational and Psychological Measurement, 60, 174-195.
Tylka, T. L., & Subich, L. M. (1999). Exploring the construct validity of the eating disorder continuum. Journal of Counseling Psychology, 46, 268-276.
Tylka, T. L., & Subich, L. M. (2002). Exploring young women's perceptions of the effectiveness and safety of maladaptive weight control techniques. Journal of Counseling & Development, 80, 101-110.
Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20.
Vacha-Haase, T., Henson, R. K., & Caruso, J. (2002). Reliability generalization: Moving toward improved understanding and use of score reliability. Educational and Psychological Measurement, 62, 562-569.
Van Lange, P. A. M., Rusbult, C. E., Drigotas, S. M., Arriaga, X. B., Witcher, B. S., & Cox, C. L. (1997). Willingness to sacrifice in close relationships. Journal of Personality and Social Psychology, 72, 1373-1395.
Vargas, P. T., von Hippel, W., & Petty, R. E. (2004). Using partially structured attitude measures to enhance the attitude-behavior relationship. Personality and Social Psychology Bulletin, 30, 197-211.
Viki, G. T., & Abrams, D. (2002). But she was unfaithful: Benevolent sexism and reactions to rape victims who violate traditional gender role expectations. Sex Roles, 47, 289-293.
Vispoel, W. P., & Forte Fast, E. E. (2000). Response biases and their relation to sex differences in multiple domains of self-concept. Applied Measurement in Education, 13, 79-97.

Viswesvaran, C., & Ones, D. S. (1999). Meta-analyses of fakability estimates: Implications for personality measurement. Educational and Psychological Measurement, 59, 197-210.
Wallbridge, H. R. (1995). Self-enhancing illusion and mental health: A longitudinal panel latent-variable structural equation model. Unpublished doctoral dissertation, University of Manitoba, Canada.
Wang, Y. W., Davidson, M. M., Yakushko, O. F., Savoy, H. B., Tan, J. A., & Bleier, J. K. (2003). The Scale of Ethnocultural Empathy: Development, validation, and reliability. Journal of Counseling Psychology, 50, 221-234.
Watley, L. D., & May, D. R. (2004). Enhancing moral intensity: The role of personal and consequential information in ethical decision-making. Journal of Business Ethics, 50, 105-126.
Way, I. F. (1999). Adolescent sexual offenders: The role of cognitive and emotional victim empathy in the victim-to-victimizer process. Unpublished doctoral dissertation, Washington University, St. Louis, MO.
Wilkerson, J. M., Nagao, D. H., & Martin, C. L. (2002). Socially desirable responding in computerized questionnaires: When questionnaire purpose matters more than the mode. Journal of Applied Social Psychology, 32, 544-559.
Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.
Williams, S. S., & Payne, G. H. (2002). Perceptions of own sexual lies influenced by characteristics of liar, sex partner, and lie itself. Journal of Sex & Marital Therapy, 28, 257-267.
Worrell, F. C., & Cross, W. E. (2004). The reliability and validity of Big Five Inventory scores with African American college students. Journal of Multicultural Counseling and Development, 32, 18-32.
Yin, P., & Fan, X. (2000). Assessing the reliability of Beck Depression Inventory scores: Reliability generalization across studies. Educational and Psychological Measurement, 60, 201-223.
