Evaluating the Scientific Veracity of Publications by Dr. Jens Förster

Carel F.W. Peeters, Chris A.J. Klaassen, Mark A. van de Wiel
May 15, 2015
Carel F.W. Peeters, cf.peeters@vumc.nl
Chris A.J. Klaassen, c.a.j.klaassen@uva.nl
Mark A. van de Wiel, mark.vdwiel@vumc.nl
Note that the authors do not represent any departmental or institutional affiliation with this report.
Typeset in LaTeX 2ε by the authors using Springer-Verlag's svmono.cls document class.
Executive Summary
At the request of the board of the University of Amsterdam we have investigated the scientific
veracity of 24 publications (co)authored by prof. dr. Jens Förster. These 24 publications are
empirical and were produced during dr. Förster's affiliation with the University of Amsterdam.
The results of our investigation are presented in this report.
Several psychological experiments reported in these publications have a rather rare, linear
relation as their outcome. It would be quite surprising if the population that is supposed to be
represented in such an experiment exhibited such a linear relation. Moreover, even under the
hypothesis that such a linear relation holds within the population, the linearity seen in the
outcomes is often too good to be true and conflicts with the unavoidable randomness these
outcomes should have. We have quantified the extent of this conflict. Too strong a conflict
between linearity and randomness undermines the scientific veracity of the investigated experiment.
Our investigation has resulted in Tables 17.1 through 17.4, which can be found in Chapter
17. Table 17.1 lists 8 publications that show strong evidence for low scientific veracity, Table 17.2
lists 3 publications that show inconclusive evidence for low scientific veracity, and Table 17.3
lists 4 publications that show no evidence for low scientific veracity. The cumulative evidence of
these tables renders a coincidence hypothesis extremely unlikely. Table 17.4 lists the 9 remaining
publications that could not be scrutinized with our present methods.
Contents

Executive Summary

1 Introduction
1.1 Preamble
1.2 Background to the Förster Case
1.3 Terms of Reference
1.4 Procedure
1.5 Methods
1.6 Disclaimer
1.7 Employing the Methods to Reference Publications
1.8 Overview

Part I Publications as Sole Author

2 JF11.JEPG
2.1 Synopsis
2.2 Results
2.2.1 Participant Scores
2.2.2 Expert Ratings: Global vs Local Descriptions
2.2.3 Expert Ratings: Local Descriptions
2.3 Remarks

3 JF10.EJSP
3.1 Synopsis
3.2 Results
3.2.1 Experiment 1: Word recognition
3.2.2 Experiment 1: Face recognition
3.3 Remarks

4 JF09.JEPG
4.1 Synopsis
4.2 Results
4.2.1 First set of independent samples
4.2.2 Secondary dependent variables
4.3 Remarks

Part II Publications as First Author

5 JF.D12.JESP
5.1 Synopsis
5.2 Results
5.2.1 Exemplar 1, liking ratings
5.2.4 Study 1, Exemplar 1, typicality ratings
5.2.7 Study 1, Exemplar 1, reaction times
5.3 Remarks

6 JF.D12.SPPS
6.1 Synopsis
6.2 Results
6.2.1 Participant scores
6.2.2 Expert ratings (on creativity)
6.3 Remarks

7 JF.EO09.PSPB
7.1 Synopsis
7.2 Results
7.2.1 Analytic task
7.2.2 Creative task
7.2.3 Global\local processing task
7.3 Remarks

8 JF.LS09.JEPG
8.1 Synopsis
8.2 Results
8.2.1 First set of independent samples
8.2.2 Second set of independent samples
8.2.3 Remaining dependent variables Experiment 1b
8.2.4 Follow-up: Collapsing atypical exemplar ratings of Experiment 4a over valence
8.3 Remarks

9 JF.LK08.JPSP
9.1 Synopsis
9.2 Results
9.2.1 Analysis first set of independent samples
9.2.2 Analysis second set of independent samples
9.2.3 Analysis third dependent variable Experiment 5
9.2.4 Analysis reported Pooled results
9.3 Remarks

Part III Publications as Co-author: Data Collected (Partly) in Amsterdam

10 WCY.JF11.JESP
10.1 Synopsis
10.2 Results
10.3 Remarks

11 L.JF09.JPSP
11.1 Synopsis
11.2 Results
11.2.1 Analysis independent samples
11.2.2 Analysis secondary set independent samples Study 4
11.3 Remarks

Part IV Publications as Co-author: Data Collected (Partly) in Bremen or Würzburg

12 D.JF.LR10.PSPB
12.1 Synopsis
12.2 Results
12.2.3 Analysis reaction times Experiment 2
12.2.4 Analysis control question ratings Experiment 3
12.3 Remarks

13 K.JF.D10.SPPS
13.1 Synopsis
13.2 Results
13.2.3 Analysis remaining samples Experiment 3 (reaction times)
13.3 Remarks

14 D.JF.L09.JESP
14.1 Synopsis
14.2 Results
14.2.1 Reaction times Experiment 1
14.2.2 Reaction times Experiment 2
14.2.3 Behavioral aggression measure Experiment 2
14.3 Remarks

15 L.JF09.CS
15.1 Synopsis
15.2 Results
15.3 Remarks

Part V Publications as Co-author: Data Indicated as Collected by Other Authors

16 FG.JF12.MP
16.1 Synopsis
16.2 Results
16.3 Remarks

Part VI Concluding Remarks

17 Concluding Remarks
17.1 Classification of investigated publications
17.2 Cumulative evidence

Appendix: Some Technical Details on the Methods Employed
A.1 The ΔF Test and Fisher's Method
A.1.1 Assumptions
A.1.2 Measure
A.1.3 Interpretation
A.2 The Evidential Value V
A.2.1 Assumptions
A.2.2 Measure
A.2.3 Interpretation

References
1 Introduction
1.1 Preamble
Untruthful publications that pretend to present scientifically justified results undermine the building of scientific knowledge. Moreover, they erode public trust in scientific research and may blacken
an institute as well as complete fields of study. Furthermore, such publications hamper the cumulative nature of the scientific endeavor, as studies that build on untruthful results are useless.
From an economic standpoint, untruthful research may be viewed as a waste of public
funds. From these perspectives, eradicating the tumors that untruthful publications constitute is of
the utmost importance.
In September 2012 a report (Whistleblower Report, dated September 7, 2012) was filed with
the University of Amsterdam (UvA) regarding the suspicion of misconduct by social psychologist
dr. Jens Förster (JF). At the time, JF was affiliated with the UvA. This whistleblower (WB)
report investigated three publications by JF and argued that the reported results are much too
nice to be true, i.e., that it is very unlikely that they were generated by real-life data. The
current report evaluates the evidence for scientific irregularities in all publications (co)authored
by JF while affiliated with the University of Amsterdam. It does so from a statistical perspective,
meaning that it quantifies the likelihood that the reported results are based on unnatural data.
1.2 Background to the Förster Case
After receiving the WB report on September 11, 2012, the board of the University of Amsterdam
installed an ad hoc committee on September 29, 2012, that investigated the publications analyzed in
the WB report. This committee expressed concern about these publications, but stated that there
was insufficient evidence to conclude that JF had acted against the mores of scientific integrity. In
2013 the Dutch National Board for Research Integrity (LOWI) accepted the conclusion by the ad
hoc committee. Subsequently, the WB filed a complaint at LOWI against this decision on July 31,
2013, spurring an official investigation into the Förster and Denzler (2012) publication included in
the original WB report. On March 28, 2014, the LOWI finished its new investigation, concluding
that the data underlying the results in Förster and Denzler (2012) must have been manipulated, for
which JF is to be held responsible (LOWI, 2014). To date, JF denies the allegations of misconduct
(see the blogroll at http://www.socolab.de/main.php?id=66 for JF's personal defense).
1.3 Terms of Reference

In October 2014 the Rector of the UvA invited the authors of the current report to investigate
all JF papers published under UvA affiliation. Ideally, the investigation should shed light on the
status of these publications in terms of judgments on their veracity. This report provides the
statistical evidence for such judgments. Table 1.3 in Section 1.8 lists all publications that have
been investigated.
1.4 Procedure
The procedure applied in the present report may be characterized by the following statements:
- Evaluation will focus on individual publications. The Whistleblower Report (2012) presents,
  next to results on 3 investigated publications individually, also summary measures (regarding
  the deviance of the results reported by JF) over this complete batch of investigated publications.
  Here the focus lies with judgments regarding the veracity of individual manuscripts.
- Evaluation will be of a statistical nature only. The weight of evidence lies solely with anomalies
  in the reported data and results.
- Two methods are applied to each publication to assess its veracity. These methods are (a)
  ΔF testing paired with Fisher's method and (b) the evidential value V. They are described in
  Section 1.5.
- These methods are geared towards (anomalies in reported patterns for) one-way ANOVA-type
  designs with 3 factor levels. Such designs are a staple of the study setup in JF publications.
- Input for these methods are the results reported in the individual JF publications, i.e., only
  summary measures (means and standard deviations) are available.
- Results from the methods employed form the basis for qualitative judgments regarding the
  veracity of a publication. These qualitative judgments take the following form (note that each
  JF publication is composed of multiple (sets of) independent (sub)experiments):
  - Strong evidence: The evidence for low scientific veracity is strong when the ΔF test paired
    with Fisher's method gives left-tail probabilities of at least 0.999 and/or when the number
    of substantial evidential values in relation to the number of constituent (sub)experiments
    in a publication meets one of the following thresholds:

      no. of constituent (sub)experiments   no. of substantial Vs
      1                                     1
      2-5                                   2
      6-11                                  3
      12-21                                 4

    An evidential value is deemed substantial when it is greater than or equal to 6. For example,
    when a publication has four constituent studies and at least two of those studies yield an
    evidential value of at least 6, then the evidence for low scientific veracity is considered
    strong.
  - Inconclusive evidence: When there is no strong evidence for low scientific veracity (according to
    the judgment above), but there are multiple constituent (sub)experiments with a substantial
    evidential value, then the evidence for low scientific veracity of a publication is considered
    inconclusive.
  - No evidence: When there is neither strong nor inconclusive evidence for low scientific
    veracity (according to the judgments above), then the publication is considered to show no
    evidence for low scientific veracity.
  These guidelines should be applied with care.
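The classification guidelines above can be sketched in a few lines of Python. This is only an illustration of the decision rule as stated in this section, not the authors' own tooling: the function and constant names are hypothetical, "multiple" substantial evidential values is read as "at least two" (an assumption), and publications with more than 21 (sub)experiments are rejected because the threshold table does not cover them.

```python
from typing import Optional, Sequence

SUBSTANTIAL_V = 6.0        # an evidential value >= 6 counts as substantial
LEFT_TAIL_CUTOFF = 0.999   # cutoff on the Fisher left-tail probability

# (largest number of (sub)experiments in the row, required number of substantial V's)
THRESHOLDS = [(1, 1), (5, 2), (11, 3), (21, 4)]


def required_substantial(n_experiments: int) -> int:
    """Look up how many substantial V's the threshold table asks for."""
    for upper, required in THRESHOLDS:
        if n_experiments <= upper:
            return required
    raise ValueError("the threshold table covers at most 21 (sub)experiments")


def classify(evidential_values: Sequence[float],
             left_tail: Optional[float] = None) -> str:
    """Return 'strong', 'inconclusive' or 'none' for one publication."""
    n = len(evidential_values)
    if n < 1:
        raise ValueError("at least one (sub)experiment is needed")
    n_substantial = sum(v >= SUBSTANTIAL_V for v in evidential_values)
    # Strong: Fisher left-tail probability criterion and/or the V-count criterion.
    if left_tail is not None and left_tail >= LEFT_TAIL_CUTOFF:
        return "strong"
    if n_substantial >= required_substantial(n):
        return "strong"
    # Inconclusive: not strong, but multiple (assumed: >= 2) substantial V's.
    if n_substantial >= 2:
        return "inconclusive"
    return "none"
```

With four constituent studies of which two have V of at least 6, `classify([7.0, 6.0, 1.2, 1.0])` gives `"strong"`, matching the worked example in the text. As the text stresses, such a mechanical rule should be applied with care.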
1.5 Methods
Most publications by JF have many (independent) constituent (sub)experiments. The Whistleblower
Report (2012) drew attention, for some of these publications, to the linearity of the trend across
experimental conditions in one-way ANOVA-type designs with 3 factor levels (a setup used in
many JF papers). The effects deviate, given the reported standard deviations and sample sizes,
too little from linearity across studies, even under the assumption of perfect linearity in the population. Focus here lies with the evaluation, from multiple angles, of such anomalous trends in
comparable study designs in JF publications. Using different methods may counter the critique
that outcomes are (partially) driven by the chosen approach towards the evaluation of the veracity
of the publication. In addition, when multiple methods yield evidence for low veracity, the basis for
qualitative judgments regarding untruthfulness is strengthened. A non-technical description of the
methods employed follows below. Technical details on the methods can be found in Appendix
A.
1. ΔF testing paired with Fisher's method: The first method is taken from the Whistleblower
Report (2012) and pairs nested F-testing with the Fisher method. The ANOVA model for
one-way factorial designs with 3 levels of an experimental factor has 2 regression parameters. A
linear regression between the low and high levels of the experimental factor has only 1 regression
parameter. This linear regression can be viewed as a reduced model that is nested within the
ANOVA model with 2 regression parameters. One can then perform a nested F-test (ΔF test)
to assess whether the more complex model significantly improves model fit. The null hypothesis
in this situation states that the means for the factor levels have a perfect linear relation. If the
empirical results approach linearity, the p-value for the ΔF test approaches 1. When
the null hypothesis is true, i.e., when perfect linearity holds in the population, the p-values
for the ΔF test are distributed uniformly between 0 and 1. When the null hypothesis does
not hold, i.e., when linearity does not hold in the population, the p-values for the ΔF test
tend to take values close to 0. Observing p-values that consistently creep towards 1 then raises
suspicion. The deviance of consistently high p(ΔF)-values can then be formalized with the
Fisher method. Combining results on independent samples and using left-tail probabilities
then indicates how strongly the accumulation of tests favors the shared null. This enables
probabilistic judgments regarding the extremity of the observed consistency w.r.t. linearity
under the assumption that the null hypothesis is true. This setup is of use when a publication
contains many independent studies.
2. The evidential value V: This method can be found in Klaassen (2015). It is based on the
premise that humans tend to underestimate variation due to randomness when fabricating
data. Within the framework of the ANOVA model this is incorporated by allowing for dependence between the measurement errors of the respective factor levels. The evidential value then
assesses the hypothesis of a dependence structure in the underlying data, which indicates
low scientific veracity, versus the hypothesis of independence, which is the ANOVA model
assumption (Klaassen, 2015). The evidential value can take values ranging from 1 to infinity.
Honest experiments can be expected to have a V near 1, while experiments with unnatural
data will yield higher values for V. A V of at least 6 is deemed substantial and thus indicative
of a dependence structure that proper experiments should not exhibit. The evidential value
may thus be used to assess individual constituent (sub)experiments within a publication. When
multiple independent (sub)experiments are available within a publication, an overall evidential value can be obtained by multiplication. The probability, under independence of the
participants, that an experiment yields a substantial evidential value V equals at most 0.0809, approximately. The maximum probability of 0.0809 is attained only if exact linearity of the means
holds in the population; see Section A.2.2 in the Appendix. The results of any experiment with
a substantial evidential value should be handled with caution.
Assuming that the results of all (sub)experiments within a publication are independent,
quod non, and under independence of the participants within all experiments, we may bound
the probability of strong evidence as follows:

  no. of constituent (sub)experiments   no. of substantial Vs   probability of strong evidence
  1                                     1                       0.08094
  2-5                                   2                       0.05554
  6-11                                  3                       0.05344
  12-21                                 4                       0.08475

In judging the p(ΔF)-values and the evidential values in order to classify the publications, it
makes sense to choose simplifying thresholds. Of course, information is lost in this way, but it
enables one to compute probabilities as in the table above. Note that the actual probabilities
for this table will be much smaller than the values given, since these probabilities have been
computed under the assumption that exact linearity of the means holds in all (sub)experiments
involved.
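The bounds in the table above are consistent with a simple binomial calculation, sketched below as a plausibility check rather than as the authors' own derivation. The sketch assumes that each (sub)experiment independently yields a substantial V with probability 0.08094 (the first-row value, the rounded form of the "at most 0.0809" bound) and that the bound for a row is the probability of reaching the required count at the largest number of (sub)experiments in that row. The function names are hypothetical.

```python
from math import comb

# Assumed per-experiment upper bound on P(V >= 6), taken from the table's first row.
P_SUBSTANTIAL = 0.08094


def prob_at_least(k: int, n: int, p: float = P_SUBSTANTIAL) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k substantial V's in n experiments."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def bound_for_row(n_min: int, n_max: int, k: int) -> float:
    """Largest probability of 'strong evidence' over all n in the row's range."""
    return max(prob_at_least(k, n) for n in range(n_min, n_max + 1))


if __name__ == "__main__":
    rows = [(1, 1, 1), (2, 5, 2), (6, 11, 3), (12, 21, 4)]
    for n_min, n_max, k in rows:
        print(f"{n_min}-{n_max} experiments, >= {k} substantial: "
              f"{bound_for_row(n_min, n_max, k):.5f}")
```

Because the probability of reaching a fixed count grows with the number of experiments, the maximum within each row is attained at the row's upper end (1, 5, 11 and 21 experiments), and the results agree with the tabulated values to the precision shown.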
Note that left-tail probabilities and overall evidential values are allowed to grow more extreme
when the number of independent samples increases. When the number of independent samples is
low, the weight of evidence shifts towards evidential values for individual experiments. The ΔF
testing approach paired with Fisher's method gives a frequentist perspective. The evidential value
is based on a forensic/Bayesian perspective.
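As a rough illustration of the first method, the sketch below computes a linearity p-value from summary statistics (cell sizes, means, standard deviations) and combines several such p-values with Fisher's statistic. It rests on assumptions not spelled out in this section: the three factor levels are taken to be equally spaced, so that linearity corresponds to the contrast m1 - 2*m2 + m3 = 0; the error variance is pooled over the three cells; and the exact tail convention the report uses for Fisher's method is defined in its Appendix A, which is not reproduced here, so both tails are returned. All function names are hypothetical and only the Python standard library is used.

```python
import math


def f_sf_df1(f_value, df2, steps=2000):
    """P(F >= f_value) for F(1, df2), using F = t^2 with t ~ Student t(df2)."""
    t = math.sqrt(f_value)
    if t == 0.0:
        return 1.0
    log_c = (math.lgamma((df2 + 1) / 2) - math.lgamma(df2 / 2)
             - 0.5 * math.log(df2 * math.pi))

    def pdf(u):
        return math.exp(log_c - (df2 + 1) / 2 * math.log1p(u * u / df2))

    # Composite Simpson's rule for the t density on [0, t]; steps must be even.
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def delta_f_pvalue(ns, means, sds):
    """p-value of the 1-df test of linearity (contrast 1, -2, 1) in a 3-cell design."""
    df2 = sum(ns) - 3
    mse = sum((n - 1) * s * s for n, s in zip(ns, sds)) / df2  # pooled variance
    contrast = means[0] - 2 * means[1] + means[2]
    scale = 1 / ns[0] + 4 / ns[1] + 1 / ns[2]
    return f_sf_df1(contrast**2 / (mse * scale), df2)


def chi2_sf_even_df(x, k):
    """P(chi-square with 2k degrees of freedom >= x), closed form for even df."""
    half = x / 2
    term = total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combination(p_values):
    """Return (statistic, lower-tail prob, upper-tail prob); p-values must be > 0."""
    stat = -2 * sum(math.log(p) for p in p_values)
    upper = chi2_sf_even_df(stat, len(p_values))
    return stat, 1 - upper, upper


if __name__ == "__main__":
    # Hypothetical summary statistics, for illustration only.
    p = delta_f_pvalue((20, 20, 20), (2.1, 3.0, 4.0), (1.0, 1.1, 0.9))
    print(p, fisher_combination([p, 0.8, 0.9]))
```

When the p-values cluster near 1, the statistic -2 Σ ln p is small, so its upper-tail probability is close to 1 and its lower-tail probability close to 0. The sketch leaves open which of these the report's "left-tail probability" refers to.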
1.6 Disclaimer
Note that the methods employed cannot distinguish witting practices (such as fraud and manipulation) from unwitting practices (such as erroneous or questionable research practices) leading to
low veracity of the reported data. The question is whether the veracity of the data on which a given publication is based can be deemed sufficient. If the data patterns are, from a statistical standpoint,
extremely unlikely, the veracity of the reported data is in doubt. Whether such data patterns are
due to witting or unwitting practices is then of secondary importance: what matters most is that the
data are to be met with distrust, calling into question the scientific value of the publication.
Importantly, it must be emphasized that the empirical trustworthiness of publications by JF is
under scrutiny, not the integrity of his co-authors. The report does not imply, nor does it intend
to imply, that the collaborators of JF were involved in problematic or dubious practices.
1.7 Employing the Methods to Reference Publications

For purposes of comparison and interpretation of the numerical results obtained by applying the
described methods to the JF publications, it is useful to apply them also to a collection
of similar publications in the same field of study (social psychology). Such a collection of similar
publications then serves as a reference or control group. Table 1.1 lists the digital object identifiers (DOIs) of ten publications bearing 21 independent samples. These are the same publications
and samples that served as the reference group in the WB report (Whistleblower Report, 2012)
and in Klaassen (2015). We refer to the Whistleblower Report (2012) for information
on how these reference publications were obtained.
Table 1.2 lists the necessary data (cell sizes, cell means and corresponding standard deviations)
as well as the corresponding results on the ΔF test and the evidential value for the samples from
the control publications. Figures 1.1 and 1.2 depict the corresponding trend lines. We note that
(a) the results of the ΔF test comply with those expected under the null hypothesis of linearity,
and (b) the majority of evidential values V are below 2 (and close to unity). These results may
serve as a reference in evaluating the analogous quantities in the JF publications.
1.8 Overview
Table 1.3 contains all JF publications under his UvA affiliation that carry empirical results (original
list provided by UvA). The articles are grouped according to type: Publications as sole author,
publications as first author, publications as co-author with data collected in either Amsterdam
or Bremen/Würzburg, publications as co-author with data collected in online experiments, and
publications as co-author with data indicated as having been collected by authors other than
JF. Within these groups, the publications are ordered in descending fashion according to year
of publication. The listed order is the order in which the publications will be evaluated in the
remainder of this report. Publications marked by an asterisk are not included in the report. All
Table 1.1. DOIs of the control/reference publications. There are ten publications carrying 21 independent
samples. Source: Whistleblower Report, 2012

Sample       DOI
Hagtvedt 1   10.1177/0146167211415631
Hagtvedt 2   10.1177/0146167211415631
Hunt         10.1002/acp.1352
Jia          10.1016/j.jesp.2009.05.015
Kanten 1     10.1016/j.jesp.2011.04.005
Kanten 2     10.1016/j.jesp.2011.04.005
Lerouge 1    10.1086/599047
Lerouge 2    10.1086/599047
Lerouge 3    10.1086/599047
Lerouge 4    10.1086/599047
Malkoc       10.1016/j.obhdp.2010.07.003
Polman       10.1177/0146167211398362
Rook 1       10.1080/10400419.2011.621844
Rook 2       10.1080/10400419.2011.621844
Smith 1      10.1037/0022-3514.90.4.578
Smith 2      10.1037/0022-3514.90.4.578
Smith 3      10.1037/0022-3514.90.4.578
Smith 4      10.1037/0022-3514.90.4.578
Smith 5      10.1037/0022-3514.90.4.578
Smith 6      10.1037/0022-3514.90.4.578
Smith 7      10.1016/j.jesp.2006.12.005
Table 1.2. Results for F and V on the reference/control publications. The number of observations per
cell is indicated by n, p(F) denotes the p-value of the F test, and SD = standard deviation.

                   means                       SDs
sample     n       low/high  medium  high/low  low/high  medium  high/low  F       p(F)    V
Hagtvedt1  141/6   4.39      3.97    3.84      0.76      1.26    1.14      0.2852  0.5951  1.3955
Hagtvedt2  141/6   3.22      3.84    4.11      0.98      1.02    1.46      0.3483  0.5570  1.1741
Hunt       75/3    1.48      1.04    1.04      0.82      0.68    0.68      1.5152  0.2224  1.0000
Jia        132/3   1.09      0.70    0.59      0.89      0.69    0.62      1.0437  0.3089  1.0000
Kanten1    269/6   3.29      3.14    2.66      1.11      0.94    0.71      0.9318  0.3362  1.0014
Kanten2    269/6   3.02      2.99    2.85      0.80      0.84    0.70      0.1478  0.7013  1.7535
Lerouge1   63/3    4.24      2.48    2.14      1.51      2.16    2.13      1.8439  0.1796  1.0000
Lerouge2   63/3    2.95      2.81    2.62      2.44      1.81    2.25      0.0018  0.9660  12.2265–13.0148
Lerouge3   54/3    4.90      3.31    2.79      2.22      2.09    1.66      0.8550  0.3595  1.0094
Lerouge4   54/3    3.69      2.67    2.50      2.78      2.51    1.66      0.3874  0.5364  1.2055
Malkoc     521/3   4.72      5.36    6.19      4.96      9.08    10.58     0.0143  0.9048  5.2558–5.2663
Polman     65/3    4.69      3.50    2.91      2.37      2.09    2.42      0.2462  0.6215  1.3369
Rook1      168/6   6.22      6.13    4.73      3.05      2.19    1.95      1.3421  0.2501  1.0000
Rook2      168/6   5.39      5.22    4.61      2.14      2.58    2.28      0.1649  0.6857  1.6933
Smith1     73/3    4.38      4.26    3.55      1.53      1.36    1.07      0.7938  0.3760  1.0146
Smith2     76/3    14.83     12.69   11.88     4.62      4.95    4.75      0.3275  0.5689  1.2640
Smith3     113/3   0.42      0.53    0.56      0.20      0.19    0.19      1.0743  0.3023  1.0000
Smith4     140/3   4.70      7.90    11.80     7.40      11.40   20.40     0.0190  0.8905  4.0388
Smith5     125/3   14.52     13.43   12.85     2.81      3.27    3.94      0.1588  0.6909  1.6268
Smith6     97/3    10.85     8.64    8.32      5.07      3.61    4.17      1.0289  0.3130  1.0000
Smith7     144/3   4.64      4.84    5.49      1.30      1.56    1.28      0.8435  0.3600  1.0200
Fig. 1.1. Trend lines for the Hagtvedt1 to Polman samples from the control publications. The error bars
represent one standard deviation from the cell mean.
experiments studied in such a publication have designs that fall outside the scope of our methods.
We do not have the means to assess these publications formally at present.
Each following chapter considers a single JF publication. Each chapter then elaborates on the
specific design of the studies employed, reports on the results obtained with the methods discussed
Fig. 1.2. Trend lines for the Rook1 to Smith7 samples from the control publications. The error bars
represent one standard deviation from the cell mean.
in Section 1.5 according to the procedure stated in Section 1.4. Also, the results obtained are
evaluated in light of the results on the control papers of Section 1.7. Chapter 17 contains an
overview of the investigated publications that, according to the statistical evidence, appear to be
scientifically compromised.
We note that the terms "study" and "experiment" are not used consistently throughout the
JF papers, but that we will use them in accordance with the individual publications evaluated.
We also note that each chapter is self-contained (in conjunction with this introduction and the
Appendix), implying that there is some redundancy in presentation. This text is accompanied
by two R scripts: DataVeracity.R and Analysis.R. The former contains functions implementing
the methods of Section 1.5. The latter script then contains the annotated code for obtaining the
presented results.
Table 1.3. The 24 JF publications under UvA affiliation that carry empirical results. The abbreviations
are used as a shorthand to denote the respective papers either in this text or in the accompanying R code.
The 9 publications marked with an asterisk (*) were not assessed formally with the methods described in
Section 1.5.

Abbreviation       Publication

As sole author:
JF11.JEPG          Förster, J. (2011). Journal of Experimental Psychology: General, 140: 364-389
JF10.EJSP          Förster, J. (2010). European Journal of Social Psychology, 40: 524-535
JF09.JEPG          Förster, J. (2009).
JF09.JESP*         Förster, J. (2009). Journal of Experimental Social Psychology, 45: 444-447

As first author among co-authors:
JF.B12.EJSP*       Förster, J. and Becker, D. (2012).
JF.D12.JESP        Förster, J. and Denzler, M. (2012).
JF.D12.SPPS        Förster, J. and Denzler, M. (2012). Social Psychological and Personality Science, 3: 108-117
JF.OE10.JESP*      Förster, J., Özelsel, A., and Epstude, K. (2010).
JF.EO09.PSPB       Förster, J., Epstude, K., and Özelsel, A. (2009). Personality and Social Psychology Bulletin, 35: 1479-1491
JF.LS09.JEPG       Förster, J., Liberman, N., and Shapira, O. (2009).
JF.LK08.JPSP       Förster, J., Liberman, N., and Kuschel, S. (2008). Journal of Personality and Social Psychology, 94: 579-599

As co-author, data collected (partly) in Amsterdam:
WCY.JF11.JESP      Woltin, K.-A., Corneille, O., Yzerbyt, V.Y., and Förster, J. (2011).
L.JF09.JPSP        Liberman, N. and Förster, J. (2009).

As co-author, data collected (partly) in Bremen or Würzburg:
D.JF.LR10.PSPB     Denzler, M., Förster, J., Liberman, N., and Rozenman, M. (2010).
K.JF.D10.SPPS      Kuschel, S., Förster, J., and Denzler, M. (2010).
D.JF.L09.JESP      Denzler, M., Förster, J., and Liberman, N. (2009).
L.JF09.CS          Liberman, N. and Förster, J. (2009). Cognitive Science, 33: 1330-1341
L.JF08.SC*         Liberman, N. and Förster, J. (2008). Social Cognition, 26: 515-533
S.JF08.PACA*       Schimmel, K. and Förster, J. (2008). Psychology of Aesthetics, Creativity, and the Arts, 2: 53-60
W.JF07.JASP*       Werth, L. and Förster, J. (2007). Journal of Applied Social Psychology, 37: 2764-2787

As co-author, data collected in online experiment:
VE.JF08.HR*        Voelpel, S.C., Eckhoff, R.A., and Förster, J. (2008). Human Relations, 61: 271-295

As co-author, data indicated as having been collected by other authors:
GV.JF.MS12.EJSP*   Gervais, S.J., Vescio, T.K., Förster, J., Maass, A., and Suitner, C. (2012).
FG.JF12.MP         Friedman, R.S., Gordis, E., and Förster, J. (2012). Media Psychology, 15: 249-266
DH.JF11.PSPB*      Denzler, M., Häfner, M., and Förster, J. (2011).
Part I
Publications as Sole Author
2 JF11.JEPG
Publication Investigated
Förster, J. (2011). Local and global cross-modal influences between vision and hearing, tasting,
smelling, or touching. Journal of Experimental Psychology: General, 140: 364-389.
2.1 Synopsis
This publication was also included in the Whistleblower Report (2012). It features 16 studies. Studies 5A to 5D feature participant scores as well as expert ratings. The expert ratings actually imply
nested data (participants rated by experts), however, they will be evaluated from the perspective
(as in the publication investigated) of a between factorial design. The participant scores and expert
ratings are treated separately in the construction of sets of independent samples. Tables 2.1 and
2.2 provide an overview of the design of the studies regarding participant scores and expert ratings,
respectively. The publication reports that in each study the participants were assigned randomly
to a local, control, or global condition. Studies 2C, 3C, and 4C feature 2 factor levels and are not
analyzed here.
Table 2.1. Design studies regarding participant scores.

Study  Design     Dependent variables
1A     3 between  1
1B     3 between  1
1C     3 between  1
2A     3 between  1
2B     3 between  1
2C     2 between  1
3A     3 between  1
3B     3 between  1
3C     2 between  1
4A     3 between  1
4B     3 between  1
4C     2 between  1
5A     3 between  1
5B     3 between  1
5C     3 between  1
5D     3 between  1
Table 2.2. Design studies regarding expert ratings.

Study  Design     Dependent variables
5A     3 between  1
5B     3 between  2
5C     3 between  1
5D     3 between  1
2.2 Results
2.2.1 Participant Scores
Trend lines for the independent samples regarding participant scores can be found in Figure 2.1.
The trend lines indicate very consistent linear effects. They may hint that (at least for the sample
sizes reported) the variation in group means may deviate too little from linearity given the spread
reported for the respective conditions. Table 2.3 lists the corresponding data (cell sizes, cell means
and corresponding standard deviations) as well as the corresponding results on the F test and
the evidential value.
Table 2.3. Results on the independent samples regarding participant scores. The number of observations
per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, NaN =
not a number.

                means                  SDs
Study    n      low    control  high   low    control  high   F       p(F)    V
study1A  48/3   27.00  31.00    38.00  11.00  13.00    10.00  0.1846  0.6695  1.6403
study1B  58/3   19.00  27.00    37.00  13.00  14.00    12.00  0.0760  0.7839  2.3664
study1C  57/3   20.00  28.00    39.00  12.00  11.00    10.00  0.2342  0.6303  1.4074
study2A  61/3   16.00  25.00    33.00  14.00  15.00    13.00  0.0172  0.8960  4.8226
study2B  45/3   19.00  31.00    40.00  15.00  9.00     10.00  0.1663  0.6855  1.4756
study3A  44/3   22.00  30.00    36.00  14.00  13.00    14.00  0.0523  0.8203  2.6599
study3B  44/3   21.00  30.00    39.00  13.00  11.00    9.00   0.0000  1.0000  NaN
study4A  44/3   22.00  29.00    37.00  15.00  11.00    13.00  0.0142  0.9056  4.7322
study4B  43/3   21.00  30.00    39.00  13.00  9.00     10.00  0.0000  1.0000  15.2314–NaN
study5A  42/3   6.90   9.74     12.38  3.06   3.71     3.23   0.0083  0.9277  6.1719–7.0386
study5B  42/3   2.79   3.79     4.86   1.31   1.19     1.51   0.0063  0.9370  6.4644–7.2233
study5C  42/3   3.00   5.05     7.00   1.20   2.22     3.61   0.0036  0.9524  9.4973
study5D  42/3   2.96   6.14     9.50   1.26   3.80     5.96   0.0044  0.9475  8.7926
The p-values for the F test are consistently high (most are above .8), while under the null
hypothesis of perfect linearity in the population these p-values (by definition) are expected to be
uniformly distributed between 0 and 1. Employing Fisher's method in combining these p-values
gives a left-tail probability of 1 - 4.255229e-7 ≈ .9999996. Thus, the accumulation of tests on
the similar null hypotheses of linearity very strongly favors the shared null. Or, roughly speaking,
under the assumption of perfect linearity in the population, the probability of finding results at
least as consistent w.r.t. linearity amounts to 1 in 2,350,050.
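The combination step can be retraced with a short script. The sketch below is illustrative only and is not taken from the accompanying R scripts. It assumes that Fisher's statistic is -2 times the sum of ln p over the k p-values of Table 2.3, and that this statistic is referred to a chi-square distribution with 2k degrees of freedom. The function name fisher_lower_tail is hypothetical. Under these assumptions the lower-tail probability of the statistic comes out near 4.25e-7, whose complement and reciprocal correspond to the two figures quoted above. Small differences stem from the four-decimal rounding of the tabulated p-values.

```python
import math


def fisher_lower_tail(p_values):
    """Lower-tail probability of Fisher's statistic -2*sum(ln p) under chi-square(2k).

    Assumption: the p-values are independent and uniform under the shared null.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # Fisher's statistic divided by 2
    # For even degrees of freedom 2k:
    #   P(X <= x) = exp(-x/2) * sum_{i >= k} (x/2)^i / i!
    # Summing this tail directly keeps precision when the probability is tiny.
    term = math.exp(-half_stat) * half_stat ** k / math.factorial(k)
    total = 0.0
    i = k
    while term > 1e-300 and i < k + 500:
        total += term
        i += 1
        term *= half_stat / i
    return total


# p(F) column of Table 2.3 (participant scores), in table order.
p_table_2_3 = [0.6695, 0.7839, 0.6303, 0.8960, 0.6855, 0.8203, 1.0000,
               0.9056, 1.0000, 0.9277, 0.9370, 0.9524, 0.9475]

lower = fisher_lower_tail(p_table_2_3)
print(f"lower tail of Fisher's statistic: {lower:.3e}")
print(f"complement: {1 - lower:.7f}")
print(f"roughly 1 in {round(1 / lower):,}")
```

The closed-form series is used instead of a statistics library so that the sketch runs on a plain Python installation; any chi-square distribution function would serve equally well.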
The instances of NaN for the evidential value in Table 2.3 are due to divisions by 0 (in its
calculation). In a sense, one could conceive of as being a lower-bound to NaN in these instances.
Many evidential values in Table 2.3 lie above 6. The overall V is found to have a lower-bound
(when leaving out V for Study 3B) of 24,833,154.
2.2.2 Expert Ratings: Global vs Local Descriptions

Trend lines for the independent samples regarding expert ratings can be found in Figure 2.2. These
trend lines also display very consistent linear effects. Table 2.4 lists the corresponding data and
results on the F test and the evidential value. Again, consistently high p-values for the F test
and substantial (ranges for) evidential values are found. Fisher's method gives a left-tail probability
of 1 - .0004790825 = .9995209, giving a probability of finding results at least as consistent w.r.t.
linearity of 1 in 2,087. The overall V has a lower-bound of 294.4692.
Table 2.4. Results on the independent samples regarding expert ratings. The number of observations per
cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

                 means                SDs
Study      n     low   control  high  low   control  high  F       p(F)    V
study5A.2  42/3  2.75  3.91     4.89  1.38  1.10     1.33  0.0464  0.8305  2.6986
study5B.2  42/3  2.21  3.54     4.90  1.37  2.07     1.60  0.0007  0.9787  3.9530–25.1064
study5C.2  42/3  2.55  4.14     5.55  1.87  2.04     1.73  0.0213  0.8847  4.3744
study5D.2  42/3  2.33  3.71     5.12  1.34  1.93     1.82  0.0007  0.9788  6.3103–24.1770
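The tabulated F values can be retraced from the summary statistics alone. The sketch below is a reconstruction under stated assumptions, not the code of DataVeracity.R. It takes the F test to be the one-degree-of-freedom contrast test of low - 2*control + high = 0 against the pooled within-cell variance (Section 1.5 holds the authoritative definition). It obtains the p-value by numerically integrating the Student t density, using the fact that an F(1, v) variable is the square of a t(v) variable. Both function names are hypothetical. Applied to the Study 5A row of Table 2.4 with 14 observations per cell, it returns F close to 0.0464 and p close to 0.83, in line with the table.

```python
import math


def f_upper_tail_1df(f_stat, df):
    """P(F(1, df) > f_stat), computed as P(|t_df| > sqrt(f_stat)) with Simpson's rule."""
    t = math.sqrt(f_stat)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        # Student t density with df degrees of freedom.
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    steps = 2000  # even number of Simpson intervals on [0, t]
    h = t / steps
    area = density(0.0) + density(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    area *= h / 3
    return 1.0 - 2.0 * area


def linearity_f_test(means, sds, ns):
    """F statistic and p-value for departure from linearity across three cells.

    Assumption: contrast coefficients (1, -2, 1) and a pooled error variance.
    """
    coefs = (1.0, -2.0, 1.0)
    df_error = sum(ns) - 3
    mse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / df_error
    contrast = sum(c * m for c, m in zip(coefs, means))
    f_stat = contrast ** 2 / (mse * sum(c ** 2 / n for c, n in zip(coefs, ns)))
    return f_stat, f_upper_tail_1df(f_stat, df_error)


# Study 5A, expert ratings (Table 2.4): 42 participants over 3 cells.
f_stat, p_value = linearity_f_test(means=(2.75, 3.91, 4.89),
                                   sds=(1.38, 1.10, 1.33),
                                   ns=(14, 14, 14))
print(f"F = {f_stat:.4f}, p = {p_value:.4f}")
```

Where the total sample size is not divisible by the number of cells (for example 58/3 in Table 2.3), the individual cell sizes are not given in the tables, so any per-cell n passed to such a function would itself be an assumption.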
2.2.3 Expert Ratings: Local Descriptions

A second dependent variable is reported for the expert ratings regarding Study 5B: local descriptions. The trend line can be found in Figure 2.3 while Table 2.5 lists the corresponding data and
results. Again, a high p-value for the F test is found as well as a substantial range for the
evidential value.
Table 2.5. Results on the expert ratings regarding local descriptions. The number of observations per cell
is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

Study      n     means             SDs               F       p(F)    V
study5B.3  42/3  2.36  3.57  4.71  1.38  1.93  1.17  0.0049  0.9445  3.1958–9.8906
2.3 Remarks
Note that the JF11.JEPG publication was also included in the Whistleblower Report (2012).
The results on the F test reported above concur with this report. They convey that the linear
pattern seems too consistent. In addition, the evidential values imply the presence of a dependence
structure between test persons. Comparison to the results obtained on the control publications (see
Table 1.2) may further strengthen the notion of deviance of the results reported in JF11.JEPG.
The evidence for low scientific veracity of this publication is considered strong according to the
criterion of Section 1.4.
Fig. 2.1. Trend lines for the independent samples regarding participant scores. The error bars represent
one standard deviation from the cell mean.
Fig. 2.2. Trend lines for the independent samples regarding expert ratings. The error bars represent one
standard deviation from the cell mean.
[Figure: Study 5B: Local descriptions (expert ratings); conditions: global, control, local]
Fig. 2.3. Trend line for the expert ratings regarding local descriptions. The error bars represent one
standard deviation from the cell mean.
3 JF10.EJSP
Förster, J. (2010). How love and sex can influence recognition of faces and words: A processing
model account. European Journal of Social Psychology, 40: 524-535.
3.1 Synopsis
This publication features 2 experiments. Table 3.1 provides an overview of their design. Experiment
2 features 4 levels for the between-factor and is not analyzed here. Experiment 1 can be analyzed
as a one-way design with 2 dependent variables (word recognition and face recognition).
Table 3.1. Design experiments.

Experiment  Design                Dependent variables
1           3 between × 2 within  1
2           4 between × 2 within  1
3.2 Results
3.2.1 Experiment 1: Word recognition
The trend line for the word-recognition part of Experiment 1 can be found in Figure 3.1. It conveys
a perfect linear effect for the experimental condition (supraliminal priming). Table 3.2 lists the
corresponding data (cell size, cell means and corresponding standard deviations) as well as the
corresponding results on the F test and the evidential value.
The p-value for the F test amounts to unity (1) due to the perfect linearity. Also, there is
a substantial lower-bound for the evidential value (10.4326). The upper-bound for the evidential
value may be termed extreme.
Table 3.2. Results on Experiment 1: Word recognition. The number of observations per cell is indicated
by n, p(F) denotes the p-value of the F test, and SD = standard deviation.

                  means                SDs
Study       n     low   control  high  low   control  high  F       p(F)    V
Exp1.words  45/3  0.14  0.29     0.44  0.23  0.22     0.16  0.0000  1.0000  10.4326–1.4716e+15
[Figure: Experiment 1: Words recognition; conditions: love, control, sex]
Fig. 3.1. Trend line for Experiment 1: Word recognition. The error bars represent one standard deviation
from the cell mean.
3.2.2 Experiment 1: Face recognition

The trend line for the face-recognition part of Experiment 1 can be found in Figure 3.2. It conveys
a very linear effect for the experimental condition. Table 3.3 lists the corresponding data as well
as the corresponding results on the F test and the evidential value. Again, a high p-value for the
F test (0.9307) and a substantial evidential value (6.2330) are found.
Table 3.3. Results on Experiment 1: Face recognition. The number of observations per cell is indicated
by n, p(F) denotes the p-value of the F test, and SD = standard deviation.

Study       n     means             SDs               F       p(F)    V
Exp1.faces  45/3  0.38  0.51  0.65  0.23  0.14  0.16  0.0076  0.9307  6.2330
[Figure: Experiment 1: Faces recognition; conditions: sex, control, love]
Fig. 3.2. Trend line for Experiment 1: Face recognition. The error bars represent one standard deviation
from the cell mean.
3.3 Remarks
Due to the low number of independent samples the weight of evidence lies, as indicated in Chapter
1, with the evidential value. The obtained evidential values imply the presence of a dependence
structure between test persons. Indeed, the evidence for low scientific veracity of this publication
is considered strong according to the criterion of Section 1.4.
4 JF09.JEPG
Förster, J. (2009). Relations between perceptual and conceptual scope: How global versus local processing fits a focus on similarity versus dissimilarity. Journal of Experimental Psychology: General,
138: 88-111.
4.1 Synopsis
This publication was also included in the Whistleblower Report (2012). It features 12 experiments.
Study 4 features expert evaluations, implying nested data (participants evaluated by experts).
Study 4 will however be analyzed without regard to this hierarchical structure (as in the publication
investigated). Experiment 8a features 2 factor-levels and is not analyzed here. Table 4.1 provides
an overview of the design of the experiments. Note that most experiments have a second (between
or within) factor. This means that different sets of independent samples can be constructed (see
also Section 4.3 below).
Table 4.1. Design Experiments.

Experiment  Design                 Dependent variables
1           3 between × 2 within   1
2           3 between × 2 between  1
3a          3 between × 2 between  1
3b          3 between × 2 between  1
4           3 between × 2 within   1
5           3 between × 2 within   1
6           3 between              1
7a          3 between × 2 between  1
7b          3 between × 2 within   1
8a          2 between              1
8b          3 between × 2 between  1
9           3 between × 2 between  1
4.2 Results
4.2.1 First set of independent samples
Trend lines for the first set of independent samples can be found in Figure 4.1. The trend lines
indicate very consistent linear effects. They may hint that (at least for the sample sizes reported)
the variation in group means may deviate too little from linearity given the spread reported for the
respective conditions. Table 4.2 lists the corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the corresponding results on the F test and the evidential
value.
Table 4.2. Results on the first set of independent samples. The number of observations per cell is indicated
by n, p(F) denotes the p-value of the F test, SD = standard deviation, NaN = not a number, sim =
similarities, dis = dissimilarities, glob = global.

                  means                     SDs
Study      n      low     control  high     low    control  high    F       p(F)    V
exp1.sim   54/3   4.67    6.56     8.67     2.35   2.53     2.25    0.0256  0.8734  3.9565
exp2.sim   88/6   5.43    6.60     7.71     1.83   3.16     3.93    0.0009  0.9760  12.5863–20.2388
exp2.dis   88/6   5.00    6.31     7.36     2.08   2.77     1.86    0.0321  0.8588  3.2358–3.8275
exp3a.sim  75/6   3.00    4.00     5.00     1.29   1.54     0.71    0.0000  1.0000  3.1610–NaN
exp3a.dis  75/6   3.75    5.23     7.00     2.18   1.83     0.95    0.0584  0.8105  2.6542
exp3b.sim  71/6   4.72    6.42     8.00     1.42   1.88     2.49    0.0073  0.9327  6.9735
exp3b.dis  71/6   2.46    3.64     5.50     1.56   1.43     1.62    0.3852  0.5392  1.1600
exp4.sim   55/3   0.83    1.17     1.79     1.29   1.04     1.44    0.1491  0.7010  1.5706
exp5.glob  50/3   594.00  689.00   759.00   88.00  91.00    138.00  0.1485  0.7017  1.5867
exp6       42/3   7.10    8.00     8.93     1.14   1.62     0.83    0.0014  0.9707  2.7729–19.1029
exp7a.sim  101/6  4.76    6.76     8.59     2.39   2.46     2.09    0.0151  0.9028  5.1282
exp7a.dis  101/6  6.24    7.00     7.35     3.56   2.80     3.14    0.0466  0.8300  2.7174
exp7b.sim  60/3   8.20    9.90     11.00    2.84   2.63     2.37    0.1748  0.6775  1.5858
exp8b.sim  45/6   5.00    7.40     8.53     2.00   2.32     1.73    0.4887  0.4927  1.1513
exp8b.dis  45/6   6.27    6.73     7.13     2.02   1.87     2.20    0.0011  0.9740  9.4094–17.6775
exp9.sim   90/6   4.87    7.00     8.67     2.17   2.23     2.06    0.1140  0.7374  1.9318
exp9.dis   90/6   5.67    6.67     7.67     2.97   2.47     3.11    0.0000  1.0000  10.2393–NaN
The p-values for the F test are consistently high, while under the null hypothesis of perfect
linearity in the population these p-values (by definition) are expected to be uniformly distributed
between 0 and 1. Employing Fisher's method in combining these p-values gives a left-tail probability
of 1 - 3.504679e-7 ≈ .9999996. Thus, the accumulation of tests on the similar null hypotheses of
linearity very strongly favors the shared null. Or, roughly speaking, under the assumption of perfect
linearity in the population, the probability of finding results at least as consistent w.r.t. linearity
amounts to 1 in 2,853,328.
Many experiments are represented by a substantial (lower-bound for the) evidential value. The
overall V is found to have a lower-bound of 357,847,863.
4.2.2 Secondary dependent variables
The experiments that have a second within-factor can be viewed as carrying a secondary dependent
variable. Trend lines for these secondary dependent variables can be found in Figure 4.2. These
trend lines also display very consistent linear effects. Table 4.3 lists the corresponding data and
results on the F test and the evidential value. Again, consistently high p-values for the F
test and substantial (lower-bounds for) evidential values are found. Fisher's method gives a left-tail
probability of 1 - .0005404186 = .9994596, giving a probability of finding results at least as
consistent w.r.t. linearity of 1 in 1,850. The overall V has a lower-bound of 1,506.134.
Fig. 4.1. Trend lines for the first set of independent samples. The error bars represent one standard
deviation from the cell mean.
Fig. 4.2. Trend lines for the secondary dependent variables. The error bars represent one standard deviation from the cell mean.
Table 4.3. Results on the secondary dependent variables. The number of observations per cell is indicated
by n, p(F) denotes the p-value of the F test, SD = standard deviation, dis = dissimilarities, loc =
local.

                 means                    SDs
Study      n     low     control  high    low    control  high   F       p(F)    V
exp1.dis   54/3  6.17    6.72     7.33    3.54   2.74     3.22   0.0011  0.9741  9.7998–17.3454
exp4.dis   55/3  0.42    1.50     2.56    0.90   1.15     2.36   0.0005  0.9827  14.0841–24.2029
exp5.loc   50/3  675.00  735.00   786.00  63.00  63.00    86.00  0.0440  0.8347  2.7929
exp7b.dis  60/3  8.00    9.15     10.05   1.95   3.01     3.25   0.0267  0.8708  3.9071
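The overall lower-bound can be checked against the per-sample values. The sketch below assumes that the overall evidential value is the product of the evidential values of the independent samples (an assumption here; Section 1.5 gives the definition) and, where a range is listed, uses its lower end. The variable names are hypothetical. With the four entries of Table 4.3 the product is about 1,506.1, which agrees with the reported lower-bound of 1,506.134 up to the rounding of the tabulated values.

```python
import math

# Lower ends of the evidential values V listed in Table 4.3.
v_lower = {
    "exp1.dis": 9.7998,
    "exp4.dis": 14.0841,
    "exp5.loc": 2.7929,
    "exp7b.dis": 3.9071,
}

# Assumption: the overall evidential value multiplies over independent samples.
overall_v_lower = math.prod(v_lower.values())
print(f"overall V, lower bound: {overall_v_lower:.2f}")
```

The same product applied to the lower ends in Table 2.4 gives roughly 294.47, again close to the lower-bound reported for the expert ratings in Section 2.2.2.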
4.3 Remarks
Note that the JF09.JEPG publication was also included in the WB report (2012). The results
on the F test reported above concur with this report. Note, however, that the reported left-tail probabilities obtained by Fisher's method differ due to slightly differing sets of independent
samples (for example, here the exp1.sim sample is included in the first set of independent samples
while the WB report includes it in the set of secondary dependent variables). The overall results are
qualitatively similar though: They convey that the linear pattern seems too consistent. In addition,
the evidential values imply, for many experiments, the presence of a dependence structure between
test persons. Comparison to the results obtained on the control publications (see Table 1.2) may
further strengthen the notion of deviance of the results reported in JF09.JEPG. The evidence for
low scientific veracity of this publication is considered strong according to the criterion of Section
1.4.
Part II
Publications as First Author
5 JF.D12.JESP
Förster, J. and Denzler, M. (2012). When any worx looks typical to you: Global relative to local
processing increases prototypicality and liking. Journal of Experimental Social Psychology, 48:
416-419.
5.1 Synopsis
This publication contains 2 studies. For study 1, typicality ratings, liking ratings, and reaction times
are reported as dependent measures. For study 2, only liking ratings are reported as a dependent
measure. Table 5.1 provides an overview of the design of the experiments. The purpose of Study
2 was to replicate Study 1 (w.r.t. liking ratings). The between factor (processing style with levels:
global, local, control) and within factor (typicality with levels: exemplar 1, exemplar 2, exemplar
3) thus concur for Studies 1 and 2.
Table 5.1. Design Studies.

Study  Design                Dependent variables
1      3 between × 3 within  3
2      3 between × 3 within  1
5.2 Results
5.2.1 Exemplar 1, liking ratings
Trend lines for the liking ratings on the first within factor-level (Exemplar 1) can be found in
Figure 5.1. The trend lines indicate very linear effects. Table 5.2 lists the corresponding data (cell
sizes, cell means and corresponding standard deviations) as well as the corresponding results on
the F test and the evidential value.
The p-values for the F test are high. Employing Fisher's method in combining these p-values
gives a left-tail probability of 1 - 0.00136641 = .9986336. Roughly speaking, under the assumption
of perfect linearity in the population, the probability of finding results at least as consistent w.r.t.
linearity amounts to approximately 1 in 732. The listed evidential values have substantial lower-bounds.
The overall V has a lower-bound of 61.71324.
Table 5.2. Results on liking ratings for Exemplar 1. The number of observations per cell is indicated by
n, p(F) denotes the p-value of the F test, SD = standard deviation.

                         means               SDs
Study              n     low   medium  high  low  medium  high  F       p(F)    V
study1.ex1.liking  60/3  1.65  1.95    2.20  3.0  1.40    1.44  0.0019  0.9652  7.5437–11.8122
...                ...   ...   2.30    2.48  1.7  1.58    1.95  0.0005  0.9823  8.1808–26.0208
Fig. 5.1. Trend lines for the liking ratings on Exemplar 1. The error bars represent one standard deviation
from the cell mean.

Trend lines for the liking ratings on the second within factor-level (Exemplar 2) can be found in
Figure 5.2. The linear effects seem less extreme in comparison with the trend lines of Exemplar 1.
Table 5.3 lists the corresponding data and results on the F test and the evidential value.
Employing Fisher's method in combining the F p-values gives a left-tail probability of
1 - 0.1432497 = .8567503. Roughly speaking, under the assumption of perfect linearity in the
population, the probability of finding results at least as consistent w.r.t. linearity amounts to
approximately 1 in 7. The listed evidential values are below 2. The overall V amounts to 2.639857.
Study              n     means              SDs               F       p(F)    V
study1.ex2.liking  60/3  -0.05  0.80  2.00  2.61  1.51  1.41  0.1106  0.7407  1.7564
...                ...   ...    1.09  2.09  2.79  1.44  1.13  0.1548  0.6953  1.5030

Trend lines for the liking ratings on the third within factor-level (Exemplar 3) can be found in Figure
5.3. Table 5.4 lists the corresponding data and results. Employing Fisher's method in combining
the F p-values gives a left-tail probability of 1 - 0.09028803 = .909712. That is, roughly, under
the assumption of perfect linearity in the population, the probability of finding results at least as
consistent w.r.t. linearity amounts to approximately 1 in 11. The listed evidential value for Study
1 has a substantial lower-bound (6.6032). The overall V has a lower-bound of 9.926733.
Study              n     means               SDs               F       p(F)    V
study1.ex3.liking  60/3  -2.35  -0.45  1.55  2.43  1.85  2.09  0.0073  0.9322  6.6032–6.6669
study2.ex3.liking  68/3  -1.48  -0.05  1.91  2.39  2.36  2.02  0.2072  0.6505  1.5033
5.2.4 Study 1, Exemplar 1, typicality ratings

The trend line for the typicality ratings of Study 1 on the first within factor-level (Exemplar 1)
can be found in Figure 5.4. Table 5.5 lists the corresponding data and results.
The trend line for the typicality ratings of Study 1 on the second within factor-level (Exemplar 2)
The trend line for the typicality ratings of Study 1 on the third within factor-level (Exemplar 3)
Table 5.5. Results on typicality ratings for Exemplar 1 in Study 1. The number of observations per cell
is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

Study           n     means             SDs               F       p(F)    V
study1.ex1.typ  60/3  1.95  2.75  2.85  2.4   1.97  1.93  0.3666  0.5473  1.1786

Study           n     means             SDs               F       p(F)    V
study1.ex2.typ  60/3  -0.7  0.7   2.95  2.39  2.34  1.54  0.5328  0.4684  1.1118

Study           n     means             SDs               F       p(F)    V
study1.ex3.typ  60/3  -2.6  -1.2  2.25  2.04  2.51  2.65  2.4036  0.1266  1.0000
[Figure: Study 1: Exemplar 1, typicality ratings; conditions: control, global, local]
Fig. 5.4. Trend line for the typicality ratings on Exemplar 1 in Study 1. The error bars represent one
standard deviation from the cell mean.
5.2.7 Study 1, Exemplar 1, reaction times

The trend line for the reaction times of Study 1 on the first within factor-level (Exemplar 1) can
be found in Figure 5.7. Table 5.8 lists the corresponding data and results.
Table 5.8. Results on reaction times for Exemplar 1 in Study 1. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation.

Study            n     means             SDs            F       p(F)    V
study1.ex1.reac  60/3  4771  4779  4801  159  124  128  0.0344  0.8536  3.1713

The trend line for the reaction times of Study 1 on the second within factor-level (Exemplar 2)
Study  n    means            SDs            F       p(F)    V
...    ...  ...  4887  5034  143  353  568  0.0192  0.8902  4.1917
[Figure: Study 1: Exemplar 1, reaction times; conditions: global, control, local]
Fig. 5.7. Trend line for the reaction times on Exemplar 1 in Study 1. The error bars represent one standard
deviation from the cell mean.

The trend line for the reaction times of Study 1 on the third within factor-level (Exemplar 3) can
be found in Figure 5.9. Table 5.10 lists the corresponding data and results. The evidential value
(8.8874) is deemed substantial.
Study  n    means            SDs            F       p(F)    V
...    ...  ...  5056  5361  375  491  529  0.0049  0.9445  8.8874
5.3 Remarks
Left-tail probabilities and overall evidential values are allowed to grow more extreme when the
number of independent samples increases. When the number of independent samples is low, the
weight of evidence shifts towards evidential values for individual (sub)studies. There are multiple
(sub)studies with a substantial (lower-bound for the) evidential value (studies regarding liking
ratings and the reaction times of Study 1). These evidential values imply the presence of a dependence structure between test persons. The evidence for low scientific veracity of this publication is
considered strong according to the criterion of Section 1.4.
6 JF.D12.SPPS
Förster, J. and Denzler, M. (2012). Sense creative! The impact of global and local vision, hearing, touching, tasting and smelling on creative and analytic thought. Social Psychological and
Personality Science, 3: 108-117.
6.1 Synopsis
This publication was also investigated in the Whistleblower Report (2012) and in Klaassen (2015).
It features 12 studies. Studies 6 to 10b feature participant scores (analytic performance) as well
as expert ratings (on participant creativity). The expert ratings actually imply nested data (participants rated by experts). However, they will be evaluated from the perspective of a secondary
dependent variable. The participant scores and expert ratings are treated separately in the construction of sets of independent samples. Table 6.1 provides an overview of the design of the
studies.
Table 6.1. Design studies.

Study  Design     Dependent variables
1      3 between  1
2      3 between  1
3      3 between  1
4      3 between  1
5      3 between  1
6      3 between  2
7      3 between  2
8      3 between  2
9a     3 between  2
9b     3 between  2
10a    3 between  2
10b    3 between  2
Fig. 6.1. Trend lines for the participant scores. The error bars represent one standard deviation from the
cell mean.
6.2 Results
6.2.1 Participant scores
Trend lines for the independent samples regarding participant scores can be found in Figure 6.1.
The trend lines indicate very consistent linear effects. They may hint that (at least for the sample
sizes reported) the variation in group means may deviate too little from linearity given the spread
reported for the respective conditions. Table 6.2 lists the corresponding data (cell sizes, cell means
and corresponding standard deviations; see Section 6.3 for information on how these standard
deviations were obtained) as well as the corresponding results on the F test and the evidential
value.
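The comparison between the spread of the cell means and the spread within cells can be made concrete with a short computation. The sketch below is a minimal illustration, not the authors' implementation. It assumes that the F test is the one-degree-of-freedom test for deviation from linearity across three equally spaced conditions, with a common cell size equal to the total n divided by the number of cells. The function names `linearity_f` and `f_upper_tail` are hypothetical. Under these assumptions, the study1 summary statistics of Table 6.2 (means 2.47, 3.04, 3.68; SDs 1.21, 0.72, 0.68; 60/3 observations per cell) give an F of about 0.02 and a p-value of about 0.89.

```python
import math


def linearity_f(means, sds, n_per_cell):
    """F statistic (1 numerator df) for deviation from linearity in three
    equally spaced groups, from cell means, cell SDs and a common cell size.
    Assumption: equal cell sizes, so the pooled variance is a plain average."""
    m1, m2, m3 = means
    contrast = m1 - 2.0 * m2 + m3                  # zero when the means are collinear
    ss_nonlinear = n_per_cell * contrast ** 2 / 6.0  # 6 = 1^2 + (-2)^2 + 1^2
    mse = sum(s ** 2 for s in sds) / 3.0           # pooled within-cell variance
    df_error = 3.0 * (n_per_cell - 1.0)
    return ss_nonlinear / mse, df_error


def f_upper_tail(f_stat, df_error, steps=2000):
    """P(F(1, df_error) >= f_stat), using F(1, df) = t(df)^2 and Simpson's
    rule on the Student t density (steps must be even)."""
    t = math.sqrt(f_stat)
    log_c = (math.lgamma((df_error + 1.0) / 2.0) - math.lgamma(df_error / 2.0)
             - 0.5 * math.log(df_error * math.pi))

    def pdf(x):
        return math.exp(log_c - (df_error + 1.0) / 2.0 * math.log1p(x * x / df_error))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * pdf(i * h)
    central = total * h / 3.0                      # P(0 <= T <= t)
    return 1.0 - 2.0 * central


# Summary statistics of study1 as listed in Table 6.2.
f_stat, df_error = linearity_f((2.47, 3.04, 3.68), (1.21, 0.72, 0.68), 60 / 3)
p_value = f_upper_tail(f_stat, df_error)
print(round(f_stat, 4), round(p_value, 4))
```

A p-value close to 1 here means that the three cell means lie closer to a straight line than the within-cell variation would typically allow.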
Table 6.2. Results on participant scores. The number of observations per cell is indicated by n, p(F )
denotes the p-value of the F test, SD = standard deviation, NaN = not a number, ana = analytic
performance.
Study         n     means               SDs                 F       p(F)    V
study1        60/3  2.47  3.04  3.68    1.21  0.72  0.68    0.0200  0.8879  3.9228
study2        60/3  2.51  2.95  3.35    0.71  0.49  0.64    0.0139  0.9067  4.6815
study3        60/3  2.40  2.90  3.45    0.86  0.51  0.80    0.0152  0.9022  4.2635
study4        60/3  2.41  2.98  3.64    1.07  0.51  0.95    0.0351  0.8520  2.7173-2.7184
study5        60/3  2.14  2.82  3.41    1.20  0.78  0.71    0.0317  0.8592  3.2118
study6.ana    60/3  1.00  1.75  2.50    0.86  1.21  1.20    0.0000  1.0000  7.8744-NaN
study7.ana    60/3  0.95  1.75  2.50    1.10  1.21  1.10    0.0064  0.9363  7.8271
study8.ana    60/3  0.85  1.65  2.35    0.93  1.09  1.31    0.0265  0.8712  3.7232
study9a.ana   60/3  0.75  1.50  2.15    0.85  1.19  0.81    0.0358  0.8506  3.0827-3.6508
study9b.ana   45/3  1.13  2.00  2.80    1.13  1.00  0.94    0.0116  0.9146  5.5861
study10a.ana  60/3  0.95  1.70  2.40    1.00  1.30  0.99    0.0068  0.9345  4.5446-8.0421
study10b.ana  45/3  0.93  1.73  2.67    0.70  1.28  0.98    0.0476  0.8284  2.7084-3.2234
The p-values for the F test are consistently high (all are above .8), while under the null
hypothesis of perfect linearity in the population these p-values (by definition) are expected to be
uniformly distributed between 0 and 1. Employing Fisher's method in combining these p-values
gives a left-tail probability of 1 - 2.079375e-8. Thus, the accumulation of tests on the similar null
hypotheses of linearity favors the shared null hypothesis too strongly. Or, roughly speaking, under
the assumption of perfect linearity in the population, the probability of finding results at least as
consistent w.r.t. linearity amounts to 1 in 48,091,374.
The lion's share of evidential values in Table 6.2 lies above 3. The overall V has a lower-bound of
33,235,148.
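The combination step can be sketched as follows. This is a minimal illustration rather than the authors' code. It assumes that the small quantity subtracted from 1 in the text equals P(chi-square with 2k degrees of freedom <= X), where X = -2 * sum(log p) is Fisher's statistic over the k p-values. The function name `fisher_consistency_probability` is hypothetical. Because the inputs are the rounded p-values of Table 6.2, the result is close to, but not identical to, the reported 2.079375e-8.

```python
import math


def fisher_consistency_probability(p_values):
    """Return P(chi2_{2k} <= X) for Fisher's statistic X = -2 * sum(log p).

    A small value means the p-values are jointly closer to 1 than expected
    under uniformity. Uses the series exp(-h) * sum_{i >= k} h^i / i! with
    h = X / 2, which is exact for even degrees of freedom."""
    k = len(p_values)
    h = -sum(math.log(p) for p in p_values)
    if h == 0.0:
        return 0.0
    term = math.exp(-h + k * math.log(h) - math.lgamma(k + 1))  # term for i = k
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= h / i
        if i > h and term <= total * 1e-17:  # remaining terms are negligible
            break
    return total


# Rounded p-values of the F test as listed in Table 6.2.
p_table_6_2 = [0.8879, 0.9067, 0.9022, 0.8520, 0.8592, 1.0000,
               0.9363, 0.8712, 0.8506, 0.9146, 0.9345, 0.8284]
q = fisher_consistency_probability(p_table_6_2)
print(q, 1.0 / q)  # the reciprocal gives the "1 in N" reading used in the text
```

The series is summed directly instead of computing one minus the upper tail, which avoids cancellation when the probability is very small.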
6.2.2 Expert ratings (on creativity)
Trend lines for the independent samples regarding expert ratings can be found in Figure 6.2.
These trend lines also display very consistent linear effects. Table 6.3 lists the corresponding data
and results on the F test and the evidential value. Again, consistently high p-values for the
F test and substantial (lower-bounds for) evidential values are found. Fisher's method gives a
left-tail probability of 1 - 6.053435e-6 ≈ .9999939, giving a probability of finding results at least
as consistent w.r.t. linearity of approximately 1 in 165,196. The overall V has a lower-bound of
127,200.2.
Fig. 6.2. Trend lines for the expert ratings. The error bars represent one standard deviation from the cell
mean.
Table 6.3. Results on expert ratings. The number of observations per cell is indicated by n, p(F ) denotes
the p-value of the F test, SD = standard deviation, NaN = not a number, cre = creativity rating.
Study         n     means               SDs                 F       p(F)    V
study6.cre    60/3  3.19  4.01  4.79    1.07  1.21  0.82    0.0049  0.9446  4.9476-9.4121
study7.cre    60/3  2.63  3.73  4.73    1.49  1.21  1.55    0.0164  0.8985  4.4324
study8.cre    60/3  2.87  3.83  4.79    1.24  1.09  1.53    0.0000  1.0000  13.9492-NaN
study9a.cre   60/3  2.35  3.66  4.76    1.01  1.19  1.71    0.0823  0.7753  2.0960
study9b.cre   45/3  2.55  3.72  4.78    1.16  1.00  1.47    0.0201  0.8878  3.9481
study10a.cre  60/3  2.66  3.69  4.81    1.21  1.30  1.54    0.0147  0.9041  4.9429
study10b.cre  45/3  2.42  3.73  5.02    0.82  1.28  1.45    0.0007  0.9793  10.1661-23.9234
6.3 Remarks
Note that the JF.D12.SPPS publication was also investigated in the Whistleblower Report (2012)
and in Klaassen (2015). The results on the F test and the evidential value reported above concur
with these references. Also note that the standard deviations in Tables 6.2 and 6.3 were not obtained
from the JF.D12.SPPS publication (as they were not reported). These standard deviations were
obtained from the Whistleblower Report (2012). This report indicates that standard deviations
and cell sizes were communicated by JF through email.
The reported left-tail probabilities obtained by Fisher's method differ from the WB report due
to differing sets of independent samples (here, participant scores and expert ratings are demarcated). The overall results are, however, qualitatively similar: They convey that the linear pattern
seems too consistent. In addition, the evidential values imply the presence of a dependence structure
between test persons. Comparison to the results obtained on the control publications may further
strengthen the notion of deviance of the results reported in JF.D12.SPPS. The evidence for low
scientific veracity of this publication is considered strong according to the criterion of Section 1.4.
7 JF.EO09.PSPB
Förster, J., Epstude, K., and Özelsel, A. (2009). Why love has wings and sex has not: How reminders
of love and sex influence creative and analytic thinking. Personality and Social Psychology Bulletin,
35: 1479-1491.
7.1 Synopsis
This publication features 2 studies. Study 1 has 2 dependent variables: analytic and creative task
performance (all participant scores). Study 2 has 3 dependent variables: analytic and global\local
processing task performance (participant scores), and creative task performance (expert ratings).
Table 7.1 provides an overview of the design of the studies. The purpose of Study 2 was to replicate
Study 1 with subliminal instead of supraliminal priming.
Table 7.1. Design studies.
Study  Design     Dependent variables
1      3 between  2
2      3 between  3
7.2 Results
7.2.1 Analytic task
Trend lines for the analytic task can be found in Figure 7.1. The trend lines convey very linear
effects. Table 7.2 lists the corresponding data (cell sizes, cell means and corresponding standard
deviations) as well as the corresponding results on the F test and the evidential value.
The p-values for the F test are high. Employing Fisher's method in combining these p-values
gives a left-tail probability of 1 - 0.01092307 = .9890769. Roughly speaking, under the assumption
of perfect linearity in the population, the probability of finding results at least as consistent w.r.t.
linearity amounts to approximately 1 in 92. The listed (lower-bound to the) evidential value for
study2.ana is substantial. The overall V has a lower-bound of 33.39086.
Table 7.2. Results on the analytic task. The number of observations per cell is indicated by n, p(F )
denotes the p-value of the F test, SD = standard deviation, ana = analytic task.
Study       n     means              SDs                 F       p(F)    V
study1.ana  60/3  1.55  2.1  2.70    0.83  0.60  1.10    0.0111  0.9166  4.9937
study2.ana  60/3  0.80  1.5  2.25    1.06  0.95  1.25    0.0070  0.9338  6.6866-6.8333
Fig. 7.1. Trend lines for the analytic task. The error bars represent one standard deviation from the cell
mean.
7.2.2 Creative task

Trend lines for the creative task can be found in Figure 7.2. Again, the trend lines convey quite
linear effects. Table 7.3 lists the corresponding data and results on the F test and the evidential
value. Employing Fisher's method in combining the F p-values gives a left-tail probability of
1 - 0.03422873 = .9657713. Roughly speaking, under the assumption of perfect linearity in the
population, the probability of finding results at least as consistent w.r.t. linearity amounts to
approximately 1 in 29. The overall V amounts to 12.0694.
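The overall V quoted in this chapter can be reproduced from the tabulated values. The sketch below assumes that the overall evidential value is the product of the evidential values of the independent samples, and that lower-bounds are multiplied where a range is listed. This rule is inferred from the tabulated numbers and is an assumption as far as this chapter is concerned. The function name `overall_v` is hypothetical.

```python
import math


def overall_v(values):
    """Overall evidential value, assumed here to be the product of the
    evidential values of the independent (sub)studies."""
    return math.prod(values)


# Creative task (Table 7.3): evidential values of study1.cre and study2.cre.
v_creative = overall_v([4.4808, 2.6936])

# Analytic task (Table 7.2): lower-bounds of study1.ana and study2.ana.
v_analytic_lower = overall_v([4.9937, 6.6866])

print(round(v_creative, 4), round(v_analytic_lower, 4))
```

With the rounded table entries the products agree with the reported 12.0694 and 33.39086 up to rounding.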
Table 7.3. Results on the creative task. The number of observations per cell is indicated by n, p(F )
denotes the p-value of the F test, SD = standard deviation, cre = creative task.
Study       n     means               SDs                 F       p(F)    V
study1.cre  60/3  0.25  0.75  1.30    0.44  0.64  0.92    0.0172  0.8960  4.4808
study2.cre  60/3  3.59  4.23  4.98    1.21  0.75  0.90    0.0427  0.8371  2.6936
Fig. 7.2. Trend lines for the creative task. The error bars represent one standard deviation from the cell
mean.
7.2.3 Global\local processing task

The trend line for the global\local processing task can be found in Figure 7.3. Table 7.4 lists the
corresponding data and results. Again, a high evidential value is encountered.
Table 7.4. Results on the global\local processing task. The number of observations per cell is indicated
by n, p(F ) denotes the p-value of the F test, SD = standard deviation, gl = global\local processing
task.
Study      n     means               SDs                 F       p(F)    V
study2.gl  60/3  26.3  33.4  40.1    8.87  8.58  9.61    0.0065  0.9358  7.3405
7.3 Remarks
Left-tail probabilities and overall evidential values are allowed to grow more extreme when the
number of independent samples increases. When the number of independent samples is low, the
weight of evidence shifts towards evidential values for individual (sub)studies. Two (sub)studies
have a substantial (lower-bound for the) evidential value. These evidential values imply the presence
of a dependence structure between test persons. The evidence for low scientific veracity of this
publication is considered strong according to the criterion of Section 1.4.
[Figure 7.3 plot: Study 2: Global/local processing task; conditions: lust, control, love]
Fig. 7.3. Trend line for the global\local processing task. The error bars represent one standard deviation
from the cell mean.
8 JF.LS09.JEPG
Förster, J., Liberman, N., and Shapira, O. (2009). Preparing for novel versus familiar events: Shifts
in global and local processing. Journal of Experimental Psychology: General, 138: 383-399.
8.1 Synopsis
This publication features 10 experiments. The publication reports that in each experiment the
participants were randomly assigned to experimental conditions. Table 8.1 provides an overview of
the design of the experiments. Experiments 2 and 6 feature 4 and 2 factor levels respectively and
are not analyzed here.
Table 8.1. Design experiments.
Experiment  Design     Dependent variables
1a          3 between  1
1b          3 between  3
2           4 between  1
3a          3 between  1
3b          3 between  2
4a          3 between  2
4b          3 between  2
5a          3 between  1
5b          3 between  1
6           2 between  1
Additional design factors, in table order: 2 within; 2 within; 2 between; 2 within 2 within.
8.2 Results
8.2.1 First set of independent samples
Trend lines for a first set of independent samples can be found in Figure 8.1. The trend lines do not
convey very consistent linear effects. Table 8.2 lists the corresponding data (cell sizes, cell means
and corresponding standard deviations) as well as the corresponding results on the F test and the
evidential value.
Employing Fisher's method in combining the p-values for the F test gives a left-tail probability
of 1 - 0.1371673 = .8628327. Thus, the accumulation of tests on the similar null hypotheses of
linearity does not very strongly favor the shared null. Or, roughly speaking, under the assumption
of perfect linearity in the population, the probability of finding results at least as consistent w.r.t.
linearity amounts to approximately 1 in 7. The overall V is found to be 591.6182, which in
comparison with the overall V reported in preceding chapters is not to be deemed high. However,
for Experiments 5a and 5b the evidential values are substantial (above 7).
Table 8.2. Results on the first set of independent samples. The number of observations per cell is indicated
by n, p(F) denotes the p-value of the F test, SD = standard deviation, glob = global letters, RT =
reaction times, GCT = gestalt completion task, PA = positive valence, atypical exemplars, NA = negative
valence, atypical exemplars, Aty = atypical exemplars.
Study         n     means (low, medium, high)  SDs (low, medium, high)  F       p(F)    V
exp1a.glob    45/3  668.00  698.00  756.00     70.00  123.00  156.00    0.1325  0.7176  1.7891
exp1b.globRT  48/3  496.00  526.00  582.00     78.00   86.00  104.00    0.2226  0.6394  1.4125
exp3a         42/3    6.50    7.50    8.80      1.90    1.20    0.08    0.1246  0.7260  1.7696
exp3b.GCT     53/3    6.30    7.50    8.60      0.24    0.11    0.11    1.0799  0.3037  1.0000
exp4a.PA      72/6    2.08    2.48    3.33      0.76    0.36    0.99    0.7201  0.4022  1.0000
exp4a.NA      72/6    1.91    2.70    2.95      0.35    0.49    0.49    2.9029  0.0978  1.0000
exp4b.Aty     36/3    2.19    2.44    2.74      0.47    0.22    0.26    0.0445  0.8342  2.5083
exp5a         60/3    0.02    0.13    0.25      0.23    0.22    0.20    0.0071  0.9333  7.2848
exp5b         42/3    0.00    0.14    0.29      0.19    0.18    0.18    0.0069  0.9340  7.2405
Fig. 8.2. Trend lines for the second set of independent samples. The error bars represent one standard
deviation from the cell mean.
8.2.2 Second set of independent samples

Trend lines for a second set of independent samples can be found in Figure 8.2. Table 8.3 lists the
corresponding data and the corresponding results on the F test and the evidential value. Fisher's
method gives a left-tail probability of 1 - 0.008662376 = 0.9913376, roughly implying a probability
of finding results at least as consistent w.r.t. linearity of approximately 1 in 115. The overall V
has a lower-bound of 137.9026. At least 2 reported experiments have substantial (lower-bounds for
the) evidential values (exp4a.PT and exp4b.Typ).
Table 8.3. Results on the second set of independent samples. The number of observations per cell is
indicated by n, p(F ) denotes the p-value of the F test, SD = standard deviation, loc = local, RT =
reaction times, Gen = general knowledge task, PT = positive valence, typical exemplars, NT = negative
valence, typical exemplars, Typ = typical exemplars.
Study        n     means (low, medium, high)  SDs (low, medium, high)  F       p(F)    V
exp1a.loc    45/3  775.00  816.00  912.00     139.00  103.00  112.00   0.5342  0.4689  1.0533
exp1b.locRT  48/3  583.00  596.00  631.00      77.00   54.00   86.00   0.2384  0.6277  1.2721
exp3b.Gen    53/3    5.10    5.50    5.80       0.15    2.40    0.20   0.0152  0.9025  1.0794-6.9625
exp4a.PT     72/6    6.91    7.03    7.12       0.55    0.51    0.65   0.0055  0.9414  6.5389-7.7784
exp4a.NT     72/6    6.95    7.07    7.29       0.53    0.41    0.31   0.1101  0.7422  1.8992
exp4b.Typ    36/3    6.66    6.71    6.75       0.31    0.22    0.21   0.0032  0.9554  7.6776-10.1341
8.2.3 Remaining dependent variables Experiment 1b

Trend lines for the remaining dependent variables on Experiment 1b can be found in Figure 8.3.
Table 8.4 lists the corresponding data and results (note that these do not constitute independent
samples). One conjunction of dependent variable and within factor-level (exp1b.locER) is found to
have a substantial evidential value (6.0519).
Table 8.4. Results on the remaining dependent variables of Experiment 1b. The number of observations
per cell is indicated by n, p(F ) denotes the p-value of the F test, SD = standard deviation, glob =
global, loc = local, ER = (mean no. of) errors, PR = measure of recognition accuracy.
Study         n     means               SDs                 F       p(F)    V
exp1b.globER  48/3  0.19  0.31  0.81    0.40  0.48  1.52    0.4277  0.5164  1.0331
exp1b.locER   48/3  0.56  1.00  1.38    0.89  0.89  1.31    0.0087  0.9260  6.0519
exp1b.globPR  48/3  0.43  0.56  0.78    0.20  0.26  0.20    0.4390  0.5110  1.1999
exp1b.locPR   48/3  0.33  0.41  0.56    0.23  0.25  0.27    0.2082  0.6504  1.4739
Fig. 8.3. Trend lines for the remaining dependent variables of Experiment 1b. The error bars represent
one standard deviation from the cell mean.
8.2.4 Follow-up: Collapsing atypical exemplar ratings of Experiment 4a over valence

Experiment 4a has a 3 between (priming: novelty, control, oldness) × 2 between (valence of priming:
positive, negative) design with 2 dependent variables (typicality ratings for atypical exemplars and
typicality ratings for typical exemplars). From Table 8.2 it can be seen that the evidential values
for the atypical exemplar ratings of Experiment 4a are low. In the positive valence of priming
condition a V of 1 is found. In the negative valence of priming condition V is also found to be 1.
JF.LS09.JEPG also reports on pooled results, where the atypical exemplar ratings of Experiment
4a are collapsed over the valence factor. Reviewing these pooled results, a different picture emerges.
Figure 8.4 gives the trend line. Table 8.5 lists the corresponding data and results. The positive and
negative valence effects seem to cancel out into a very linear effect that yields a substantial V of
4.1277.
Table 8.5. Results when collapsing experiment 4a over the valence factor. The number of observations
per cell is indicated by n, p(F ) denotes the p-value of the F test, SD = standard deviation, Atyp =
atypical exemplars, coll = collapsed.
Study            n     means               SDs                F       p(F)    V
exp4a.Atyp.coll  72/3  2.00  2.59  3.14    0.6  0.43  0.8     0.0162  0.8991  4.1277
[Figure 8.4 plot: Experiment 4a: Mean typicality ratings of atypical exemplars; conditions: oldness, control, novelty]
Fig. 8.4. Trend lines for the atypical exemplar ratings of Experiment 4a. The black line represents the
pooled data collapsed over the valence factor. The error bars represent one standard deviation from the
(pooled) cell mean. The blue line represents the atypical exemplar ratings in the positive valence condition
while the red line represents the atypical exemplar ratings in the negative valence condition.
8.3 Remarks
Note that one can construct different sets of independent samples. Note also that the reported
standard deviation in the novelty condition of exp3b.Gen (2.4) might be the result of a typo in
JF.LS09.JEPG (the standard deviations in the remaining conditions are much smaller). Moreover,
there are some discrepancies in the summary measures reported by JF.LS09.JEPG:
- Table 2 in JF.LS09.JEPG reports the mean of the novelty condition in Exp1b.glob.RT to be
  496 while the accompanying text on page 387 of JF.LS09.JEPG reports this mean to be 469;
- Table 2 in JF.LS09.JEPG reports the standard deviation for the control condition in
  Exp1b.locER to be 0.89 while the accompanying text on page 387 of JF.LS09.JEPG reports
  this standard deviation to be 0.90.
We chose to work with the summary measures as reported in the tables printed in
JF.LS09.JEPG. Changes in the construction of the sets of independent samples, the removal
of exp3b.Gen for testing, or usage of the alternative summary measures for Exp1b.glob.RT or
Exp1b.locER, however, would not change the overall conclusion regarding this publication. For
example, using 469 instead of 496 as the mean for the novelty condition in Exp1b.glob.RT would
give a V with a lower-bound of 19.9060 (and an upper-bound of 32.6979). Using 0.90 instead of
0.89 as the standard deviation for the control condition in Exp1b.locER would yield a V of 6.0895.
These evidential values are higher than the ones currently reported in Tables 8.2 and 8.4.
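The direction of this sensitivity can be illustrated on the level of the F statistic. The sketch below is a minimal illustration that assumes the F test is the one-degree-of-freedom test for deviation from linearity with equal cell sizes. The function name `linearity_f` is hypothetical. It recomputes F for the global reaction times of Experiment 1b with the mean from the table (496) and with the mean from the running text (469) of JF.LS09.JEPG. It does not compute the evidential value itself.

```python
def linearity_f(means, sds, n_per_cell):
    """F statistic (1 numerator df) for deviation from linearity in three
    equally spaced groups; assumes a common cell size."""
    m1, m2, m3 = means
    contrast = m1 - 2.0 * m2 + m3         # zero when the means are collinear
    mse = sum(s ** 2 for s in sds) / 3.0  # pooled within-cell variance
    return n_per_cell * contrast ** 2 / 6.0 / mse


# Global reaction times of Experiment 1b: SDs and cell size as in Table 8.2.
sds = (78.0, 86.0, 104.0)
f_table_mean = linearity_f((496.0, 526.0, 582.0), sds, 48 / 3)  # mean from the table
f_text_mean = linearity_f((469.0, 526.0, 582.0), sds, 48 / 3)   # mean from the text

print(round(f_table_mean, 4), round(f_text_mean, 4))
```

With 469 the three means are almost exactly collinear, so F drops sharply. This is in line with the higher evidential value reported above for that variant.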
Thus, while the Fisher test does not give immediate reason for concern, the evidential values
for multiple (sub)experiments are substantial, implying the presence of a dependence structure
between test persons. In addition, the reported pooled results also render certain experiments
suspect. The evidence for low scientific veracity of this publication is considered strong according
to the criterion of Section 1.4.
9 JF.LK08.JPSP
Förster, J., Liberman, N., and Kuschel, S. (2008). The effect of global versus local processing styles
on assimilation versus contrast in social judgment. Journal of Personality and Social Psychology,
94: 579-599.
9.1 Synopsis
This publication contains 5 experiments. Their design can be found in Table 9.1. Experiment 4
features only 2 factor levels and is not analyzed here.
Table 9.1. Design experiments.
Experiment  Design                                                   Dependent variables
1           3 between × 2 between                                    2
2           3 between × 2 between                                    2
3           3 between × 2 between                                    2
4           2 between × 2 between × 2 between × 2 within × 2 within  2
5           3 between × 2 between                                    3
9.2 Results
9.2.1 Analysis first set of independent samples
Trend lines for a first set of independent samples can be found in Figure 9.1. The trend lines do not
convey very consistent linear effects. Table 9.2 lists the corresponding data (cell sizes, cell means
and corresponding standard deviations) as well as the corresponding results on the F test and the
evidential value. Employing Fisher's method in combining the p-values for the F test gives a left-tail probability
of 1 - 0.7297946 = .2702054. Thus, the accumulation of tests on the similar null hypotheses of
linearity does not very strongly favor the shared null. Or, roughly speaking, under the assumption
of perfect linearity in the population, the probability of finding results at least as consistent w.r.t.
linearity amounts to approximately 1 in 1.4. The overall V is found to have a lower-bound of
7.986024, which in comparison with the overall V reported in some preceding chapters is not to
be deemed high.
Table 9.2. Results on the first set of independent samples. The number of observations per cell is indicated
by n, p(F) denotes the p-value of the F test, SD = standard deviation, RA = ratings of aggression, SS
= subjective scale, ap = aggression priming, np = neutral priming, hs = high standard, ls = low standard.
Study       n      means                 SDs                 F       p(F)    V
exp1.RA.ap  82/6    2.86   5.63  6.53    1.15  1.25  1.21    5.4944  0.0244  1.0000
exp1.RA.np  82/6    4.15   4.29  4.62    1.25  1.23  1.16    0.0558  0.8145  2.6569
exp2.SS.hs  124/6  -0.29  -0.28  0.30    1.10  0.34  0.89    1.5854  0.2130  1.0000
exp2.SS.ls  124/6  -0.20   0.12  0.36    0.66  1.49  1.07    0.0174  0.8955  2.4960-5.4282
exp3.SS.hs  126/6  -0.71  -0.25  0.64    0.72  0.85  0.75    1.0765  0.3036  1.0001
exp3.SS.ls  126/6  -0.67   0.25  0.68    0.78  1.01  0.72    1.1743  0.2829  1.0003
exp5.SS.hs  123/6  -0.53  -0.25  0.32    0.74  0.85  1.07    0.3569  0.5525  1.1935
exp5.SS.ls  123/6  -0.35   0.30  0.52    0.68  0.75  1.37    0.6531  0.4223  1.0086
9.2.2 Analysis second set of independent samples

Trend lines for a second set of independent samples can be found in Figure 9.2. Table 9.3 lists the
corresponding data and the corresponding results on the F test and the evidential value. Fisher's
method gives a left-tail probability of 1 - .696499 = .303501, roughly implying a probability of
finding results at least as consistent w.r.t. linearity of approximately 1 in 1.4. The overall V has
a lower-bound of 154.0803. Two reported experiments have substantial (ranges for the) evidential
values (exp1.RU.ap and exp1.RU.np).
Table 9.3. Results on the second set of independent samples. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, RU = ratings of
traits unrelated to aggression, OS = objective scale, ap = aggression priming, np = neutral priming, hs =
high standard, ls = low standard.
Study       n      means                 SDs                 F       p(F)    V
exp1.RU.ap  82/6    4.43   4.58  4.72    0.44  0.67  0.43    0.0008  0.9772  3.1285-24.1990
exp1.RU.np  82/6    4.41   4.53  4.64    0.40  0.45  0.49    0.0011  0.9733  18.0583
exp2.OS.hs  124/6  -0.41   0.07  0.66    0.83  0.79  1.13    0.0483  0.8268  2.6349
exp2.OS.ls  124/6  -0.47  -0.45  0.60    0.57  0.98  0.99    4.8392  0.0318  1.0000
exp3.OS.hs  126/6  -0.35   0.36  0.56    0.33  1.08  1.04    1.1587  0.2860  1.0012
exp3.OS.ls  126/6  -0.58  -0.36  0.36    0.51  0.77  1.29    1.0429  0.3113  1.0000
exp5.OS.hs  123/6  -0.51   0.25  0.62    0.51  0.78  1.38    0.5622  0.4564  1.0338
exp5.OS.ls  123/6  -0.42  -0.35  0.50    0.35  0.48  1.38    2.7626  0.1018  1.0000
9.2.3 Analysis third dependent variable Experiment 5

Experiment 5 has a third dependent variable. Trend lines are found in Figure 9.3. Table 9.4 lists
the corresponding data and results.
Table 9.4. Results on the third dependent variable of experiment 5. The number of observations per cell
is indicated by n, p(F ) denotes the p-value of the F test, SD = standard deviation, RHA = right
hemispheric activation, hs = high standard, ls = low standard.
Study        n      means                SDs                 F       p(F)    V
exp5.RHA.hs  123/6  -0.70  0.30  0.97    1.68  1.05  1.85    0.1519  0.6981  1.4714
exp5.RHA.ls  123/6  -0.63  0.23  0.97    1.25  1.89  1.92    0.0167  0.8975  4.9104-4.9717
Fig. 9.3. Trend lines for the third dependent variable of Experiment 5. The error bars represent one
standard deviation from the cell mean.
9.2.4 Analysis reported Pooled results

JF.LK08.JPSP also reports on pooled results: The aggression ratings of Experiment 1 are also
reported as collapsed over the semantic priming factor and the right hemispheric activation measure
of Experiment 5 is also reported as collapsed over the standard factor. Figure 9.4 depicts the trend
lines while Table 9.5 lists the corresponding data and results. The results lead to a low overall V of
1.787103. The results convey that the deviance (as indicated by the evidential value) of the right
hemispheric activation measure of Experiment 5 in the low standard condition (exp5.RHA.ls) does
not translate to the pooled results.
Table 9.5. Results on the reported pooled results. The number of observations per cell is indicated by n,
p(F ) denotes the p-value of the F test, SD = standard deviation, RA = ratings of aggression, RHA =
right hemispheric activation.
Study     n      means                SDs                 F       p(F)    V
exp1.RA   82/3    3.70  5.04  5.43    1.44  1.40  1.71    1.7727  0.1869  1.0000
exp5.RHA  123/3  -0.67  0.26  0.97    1.47  1.54  1.86    0.1241  0.7252  1.7871
Figure 9.5 contains visualizations of the remaining poolings (not reported as such in
JF.LK08.JPSP). The factor-levels of the 2-between factor (high standard and low standard) seem
to cancel out into quite linear effects for the dependent variables of Experiment 3 (subjective scale
and objective scale). Using grand means over the reported cell means of the main experimental factor, and using the corresponding average variance to obtain pooled standard deviations, the
subjective scale and the objective scale would yield substantial evidential values with lower-bounds
of 5.05 and 18.13, respectively.
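The pooling step described above can be sketched as follows. This is a minimal illustration under stated assumptions. The two levels of the 2-between factor are weighted equally, the pooled mean is the plain average of the two cell means, and the pooled standard deviation is the square root of the average of the two cell variances. The function name `pool_cells` is hypothetical. The inputs are the subjective scale values of Experiment 3 as listed in Table 9.2. The evidential value itself is not computed here.

```python
import math


def pool_cells(means_a, sds_a, means_b, sds_b):
    """Pool two sets of cell summaries over a 2-level between factor.
    Assumptions: equal weighting of both levels; pooled SD taken as the
    square root of the average variance."""
    pooled_means = [(a + b) / 2.0 for a, b in zip(means_a, means_b)]
    pooled_sds = [math.sqrt((sa ** 2 + sb ** 2) / 2.0)
                  for sa, sb in zip(sds_a, sds_b)]
    return pooled_means, pooled_sds


# Experiment 3, subjective scale: high standard (hs) and low standard (ls).
means, sds = pool_cells((-0.71, -0.25, 0.64), (0.72, 0.85, 0.75),
                        (-0.67, 0.25, 0.68), (0.78, 1.01, 0.72))

# Distance of the middle pooled mean from the midpoint of the outer two.
deviation_from_line = means[1] - (means[0] + means[2]) / 2.0
print([round(m, 3) for m in means], [round(s, 3) for s in sds],
      round(deviation_from_line, 3))
```

The pooled middle mean lies very close to the midpoint of the two outer pooled means, which is the near-linearity referred to in the text.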
Fig. 9.4. Trend lines for pooled results on Experiments 1 and 5. In the left-hand figure the black line
represents the pooled data on aggression ratings (collapsed over the semantic priming factor). The error
bars represent one standard deviation from the (pooled) cell mean. The blue line represents the aggression
ratings in the aggression priming condition while the red line represents the aggression ratings in the
neutral priming condition. The right-hand figure can be interpreted analogously.
Fig. 9.5. Trend lines for remaining poolings. The error bars represent one standard deviation from the
cell mean.
9.3 Remarks
Note that one can construct different sets of independent samples. The (overall) results will, however,
remain qualitatively similar. While the Fisher test does not give immediate reason for concern, the
evidential value for at least 1 (sub)experiment is substantial, implying the presence of a dependence
structure between test persons. This high evidential value for exp1.RU.np, in conjunction with the
high evidential values for the pooled results of Experiment 3 (see Figure 9.5 and the last sentence of
Section 9.2.4), leads to the conclusion that the evidence for low scientific veracity of this publication
is inconclusive.
Part III
Publications as Co-author
Data Collected (Partly) in Amsterdam
10 WCY.JF11.JESP
Woltin, K.-A., Corneille, O., Yzerbyt, V.Y., and Förster, J. (2011). Narrowing down to open up
for other people's concerns: Empathic concern can be enhanced by inducing detailed processing.
Journal of Experimental Social Psychology, 47: 418-424.
10.1 Synopsis
This publication contains 4 experiments. Their design can be found in Table 10.1. Experiments 2-4
feature only 2 factor levels and are not analyzed here. The data for Experiment 1 are reported to
have been collected in Amsterdam.
Table 10.1. Design experiments.
Experiment  Design     Dependent variables
1           3 between  1
2           2 between  1
3           2 between  1
4           2 between  1
10.2 Results
The trend line for Experiment 1 can be found in Figure 10.1. Table 10.2 lists the corresponding
data (cell sizes, cell means and corresponding standard deviations) as well as the corresponding
results on the F test and the evidential value. The evidential value V is found to be 1.
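10.3 Remarks
Left-tail probabilities and overall evidential values are allowed to grow more extreme when the
number of independent samples increases. When the number of independent samples is low, the
weight of evidence shifts towards evidential values for individual experiments. The evidential value
of Experiment 1 equals the smallest possible value (1). According to the criterion of Section 1.4,
there is thus no statistical evidence for low scientific veracity.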
Table 10.2. Results experiment 1. The number of observations per cell is indicated by n, p(F ) denotes
the p-value of the F test, SD = standard deviation.
means
Study
SDs
exp1 41/3
3.05 3.31 0.25 0.27
p(F )
0.4 1.0201 0.3189 1.0000
[Figure 10.1 plot: Experiment 1; conditions: global, control, local]
Fig. 10.1. Trend lines for Experiment 1. The error bars represent one standard deviation from the cell
mean.
11 L.JF09.JPSP
Liberman, N. and Förster, J. (2009). Distancing from experienced self: How global-versus-local perception affects estimation of psychological distance. Journal of Personality and Social Psychology,
97: 203-216.
11.1 Synopsis
This publication contains 8 studies. The publication reports that in each study the participants
were randomly assigned to experimental conditions. The design of the studies can be found in Table
11.1. Data for studies 1, 2B, 2C, 3B, and 3C are reported to have been collected in Amsterdam.
The publication states that analyses were performed on z-transformed data, but reports only the
untransformed data. All analyses below are based on the reported untransformed data.
Table 11.1. Design studies.
Studies  Design                            Dependent variables
1        3 between × 2 between             1
2A       3 between × 2 between             1
2B       3 between × 2 between             1
2C       3 between × 2 between             1
3A       3 between × 2 between             1
3B       3 between × 2 between             1
3C       3 between × 2 between             1
4        3 between × 2 between × 2 within  1
11.2 Results
11.2.1 Analysis independent samples
Trend lines for the set of independent samples can be found in Figure 11.1. Table 11.2 lists the
corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the
corresponding results on the F test and the evidential value.
Employing Fisher's method in combining the p-values for the F test gives a left-tail probability of 1 - .0001072879 = .9998927. Thus, the accumulation of tests on the similar null hypotheses
of linearity quite strongly favors the shared null. Or, roughly speaking, under the assumption of
perfect linearity in the population, the probability of finding results at least as consistent w.r.t.
linearity amounts to approximately 1 in 9,321. The instance of NaN for the evidential value in
Table 11.2 is due to division by 0 (in its calculation). In a sense, one could conceive of as
being a lower-bound to NaN in this instance. The overall V is found to have a lower-bound of
3,569,168. Two reported (sub)studies have substantial (lower-bounds for the) evidential values
(study2A.ego, study3B.Nego). Other (sub)studies yield evidential values that may be termed substantive.
Table 11.2. Results on the set of independent samples. The number of observations per cell is indicated
by n, p(F ) denotes the p-value of the F test, SD = standard deviation, NaN = not a number, ego =
egocentric, Nego = nonegocentric, P = positive.
Study         n      means (low, medium, high)   SDs (low, medium, high)    F       p(F)    V
study1.ego    95/6      5.33    17.81    28.38     3.68     8.30    23.81   0.0445  0.8339  2.4151
study2A.ego   126/6   -17.14     0.15    17.05    20.77    30.95    29.47   0.0007  0.9789  6.0737-24.3184
study2B.ego   120/6  1214.00  1713.00  2325.00   682.00   907.00   934.00   0.0591  0.8088  2.6543
study2C.ego   106/6   144.00   186.00   209.00    32.00    32.00    40.00   0.8741  0.3543  1.0007
study3A.ego   113/6     3.70     4.37     5.21     1.98     1.34     1.08   0.0395  0.8431  2.9443
study3B.ego   79/6      3.00     4.50     5.38     1.87     2.03     1.45   0.2603  0.6130  1.4208
study3C.ego   127/6     1.48     1.86     2.68     0.67     0.73     1.42   0.6834  0.4117  1.0032
study4.egoP   120/6    40.00    56.00    69.00    18.00    24.00    22.00   0.0650  0.7996  2.5959
study1.Nego   95/6     13.38    14.19    14.75    10.55    18.70     9.04   0.0009  0.9760  2.2374-24.3335
study2A.Nego  126/6    -7.65    -7.64     6.67    18.58    24.33    36.92   0.9334  0.3378  1.0000
study2B.Nego  120/6  1608.00  1612.00  1732.00   905.00  1002.00   831.00   0.0535  0.8179  2.8161
study2C.Nego  106/6   174.00   182.00   186.00    48.00    43.00    48.00   0.0219  0.8830  3.9994
study3A.Nego  113/6     4.44     4.72     5.11     1.72     1.71     1.59   0.0135  0.9078  5.3018
study3B.Nego  79/6      4.50     4.50     4.54     1.78     1.83     1.45   0.0012  0.9723  9.4966-18.0647
study3C.Nego  127/6     2.55     2.55     2.55     1.57     0.76     1.33   0.0000  1.0000  4.7583-NaN
study4.NegoP  120/6    37.00    42.00    43.00    17.00    13.00    19.00   0.1954  0.6602  1.3930
Fig. 11.1. Trend lines for the set of independent samples. The error bars represent one standard deviation
from the cell mean.
11.2.2 Analysis secondary set independent samples Study 4

Due to the within-factor a secondary set of independent samples can be constructed for Study
4. Figure 11.2 visualizes the corresponding trend lines. Table 11.3 lists the corresponding data
and results on the F test and the evidential value. Fisher's method gives a left-tail probability
of 1 - .05720528 = .9427947, giving a probability of finding results at least as consistent w.r.t.
linearity of approximately 1 in 17. The overall V amounts to 8.774929.
Table 11.3. Results on the secondary set of independent samples for Study 4. The number of observations
per cell is indicated by n, p(F ) denotes the p-value of the F test, SD = standard deviation, ego =
egocentric, Nego = nonegocentric, N = negative.
Study         n      means       SDs         F       p(F)    V
study4.egoN   120/6   6  11  17  11  10  10  0.0312  0.8605  3.4346
study4.NegoN  120/6  17  18  21  13  16  12  0.0703  0.7919  2.5549
11.3 Remarks
By employing Fisher's method both an extreme and a moderate result are obtained. The evidential
values for multiple (sub)studies are substantial. The evidence for low scientific veracity of this
publication is considered strong according to the criterion of Section 1.4.
Fig. 11.2. Trend lines for the secondary set of independent samples for Study 4. The error bars represent
one standard deviation from the cell mean.
Part IV
Data Collected (Partly) in Bremen or Würzburg
12 D.JF.LR10.PSPB
Denzler, M., Förster, J., Liberman, N., and Rozenman, M. (2010). Aggressive, funny, and thirsty:
A motivational inference model (MIMO) approach to behavioral rebound. Personality and Social
Psychology Bulletin, 36: 1385-1396.
12.1 Synopsis
This publication contains 3 experiments. Table 12.1 gives an overview of their design. Data for
Experiment 2 are reported to have been collected in Bremen. Data for Experiment 3 are reported to
have been collected in Würzburg. In Experiment 2 the dependent variable behavioral aggression
is only reported for the 3-between condition. Reaction times are reported for the full design.
Experiment 3 has 4 additional dependent variables (control questions) for a single level of the within
factor (the suppression phase). The publication states that for the reaction times of Experiment
2 analyses were performed on log-transformed data, but reports only the untransformed data. All
analyses below are based on the reported untransformed data. Experiment 3 features expert ratings
rather than participant scores. The expert ratings actually imply nested data (participants rated
by experts) but are not treated as such.
Experiment   Design                             Dependent variables
1            3 between × 2 within               conjunction of within-factor and dependent variable
2            3 between × 2 within × 2 within    2
3            3 between × 2 within               1
12.2 Results
Trend lines for a first set of independent samples can be found in Figure 12.1. Table 12.2 lists the
corresponding data and the corresponding results on the F test and the evidential value. Fisher's
method gives a left-tail probability of 1 - .9999955 = 4.5e-6. Thus, the accumulation of tests on
the similar null hypotheses of linearity does not favor the shared null. Two evidential values equal
1 (the smallest possible value) and the remaining evidential value is not deemed substantial. The
overall V is found to have a lower-bound of 2.2802.
Table 12.2. Results on the first set of independent samples. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, Supp = suppression
phase, Agrs = behavioral aggression.
Study       n      means                 SDs                 F         p(F)      V
Exp1.Supp   42/3   1.64  2.07  12.64     2.62  1.82  4.85    21.3576   0.00004   1.0000
Exp2.Agrs   41/3   4.00  4.53   5.18     0.41  0.72  0.58     0.0962   0.7581    [2.2802, 2.2807]
Exp3.Supp   52/3   1.36  1.69   6.42     1.51  1.78  2.80    12.6265   0.0009    1.0000
Bracketed V entries give [lower bound, upper bound].

Trend lines for a second set of independent samples can be found in Figure 12.2. Table 12.3 lists
the corresponding data and the corresponding results on the F test and the evidential value.
Fisher's method gives a left-tail probability of 1 - .8821835 = .1178165. The overall V equals 1.
Table 12.3. Results on the second set of independent samples. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, Expr = expression
phase.
Study       n      means                    SDs                      F        p(F)     V
                   low    medium  high      low    medium  high
Exp1.Expr   42/3   32.29  33.14   80.57     45.12  46.58   35.09     2.7935   0.1027   1.0000
Exp3.Expr   52/3    5.03   6.08    6.12      0.78   1.46    1.91     1.3840   0.2451   1.0000
12.2.3 Analysis reaction times Experiment 2

Trend lines for the reaction times of Experiment 2 can be found in Figure 12.3. Table 12.4 lists the
corresponding data and results (note that these do not constitute independent samples). Exp2.IUR
yields a substantial evidential value (7.3019).
Table 12.4. Results on the reaction times of Experiment 2. The number of observations per cell is indicated
by n, p(F) denotes the p-value of the F test, SD = standard deviation, IRA = block I, words related
to aggression, IIRA = block II, words related to aggression, IUR = block I, words unrelated to aggression,
IIUR = block II, words unrelated to aggression.
Study       n      means            SDs              F        p(F)     V
Exp2.IRA    41/3   784  794  868     71  157  258    0.2908   0.5929   1.2466
Exp2.IUR    41/3   801  856  898    242  232  242    0.0068   0.9349   7.3019
Exp2.IIRA   41/3   539  670  743     70  156  100    0.5859   0.4487   1.1631
Exp2.IIUR   41/3   755  763  835    238  188  222    0.1981   0.6588   1.4276
12.2.4 Analysis control question ratings Experiment 3

Trend lines for the ratings on the control questions of Experiment 3 can be found in Figure
12.4. Table 12.5 lists the corresponding data and results (again, note that these do not constitute
independent samples). All evidential values are (near) unity.
Table 12.5. Results on the control question ratings of Experiment 3. The number of observations per
cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, Q = (control)
question.
Study     n      means               SDs                 F        p(F)     V
Exp3.Q1   52/3   2.94  4.81  5.50    2.13  3.21  3.13    0.4898   0.4873   1.1434
Exp3.Q2   52/3   2.72  4.94  5.19    1.81  3.28  2.69    1.5813   0.2145   1.0000
Exp3.Q3   52/3   2.44  3.83  3.88    2.41  2.73  3.20    0.6622   0.4197   1.0337
Exp3.Q4   52/3   1.94  5.06  5.06    2.80  3.43  3.40    2.7070   0.1063   1.0000
Fig. 12.3. Trend lines for the reaction times of Experiment 2. The error bars represent one standard
deviation from the cell mean.
Fig. 12.4. Trend lines for the control questions of Experiment 3. The error bars represent one standard
deviation from the cell mean.
12.3 Remarks
remain qualitatively similar. The evidential value of one (sub)experiment is larger than 7. The
remaining evidential values are small and often close to unity. In addition, the Fisher test does
not give reason for concern. According to the criterion of Section 1.4, there is thus no statistical
evidence for low veracity of this publication.
13
K.JF.D10.SPPS
Kuschel, S., Förster, J., and Denzler, M. (2010). Going beyond information given: How approach
versus avoidance cues influence access to higher order information. Social Psychological and
Personality Science, 1: 4-11.
13.1 Synopsis
This publication contains 3 experiments. Table 13.1 gives an overview of their design. The
publication states that analyses were performed on log-transformed data as well as the untransformed
data, but reports only the latter. All analyses below are based on the reported untransformed data.
Data for the experiments appear to have been collected in Bremen.
Experiment   Design                  Dependent variables
1            3 between × 2 within    1
2            3 between × 2 within    1
3            3 between × 2 within    2
13.2 Results
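Fisher's method gives a left-tail probability of 1 - .05406237 = .9459376. Roughly speaking, under
the assumption of perfect linearity in the population, the probability of finding results at least
as consistent w.r.t. linearity amounts to approximately 1 in 18. The overall V is found to be
8.647894 and may be deemed low.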
Table 13.2. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, masked = masked
items, correctHit = correct decision metaphor.
Study             n      means                SDs                 F        p(F)     V
Exp1.masked       67/3   5.05  7.0   9.36     3.35  2.28  4.28    0.0540   0.8169   2.3053
Exp2.masked       60/3   5.80  7.6  10.10     3.14  2.09  2.69    0.2283   0.6346   1.3126
Exp3.correctHit   45/3   3.93  4.4   4.80     0.59  0.51  0.41    0.0473   0.8288   2.8579

Fisher's method gives a left-tail probability of 1 - .08078424 = .9192158. The overall V has a
lower-bound of 292.4731. The instance of NaN for the evidential value in Table 13.3 is due to
division by 0 (in its calculation). In a sense, one could conceive of infinity as being a
lower-bound to NaN in this case.
The evidential value for Exp3.correctReject has a substantial lower-bound (116.6748).
Table 13.3. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, NaN = not a number,
intact = intact items, correctReject = correct rejection nonmetaphor.
Study                n      means               SDs                 F        p(F)     V
Exp1.intact          67/3   6.30  6.91  8.09    2.03  4.97  3.45    0.0891   0.7663   [2.0663, 2.4903]
Exp2.intact          60/3   6.65  6.70  7.90    1.98  4.00  2.47    0.5082   0.4788   1.2131
Exp3.correctReject   45/3   4.60  4.80  5.00    0.83  0.41  0.00    0.0000   1.0000   [116.6748, NaN]
Bracketed V entries give [lower bound, upper bound].
13.2.3 Analysis remaining samples Experiment 3 (reaction times)

Trend lines for the reaction times of Experiment 3 can be found in Figure 13.3. Table 13.4 lists
the corresponding data and results (note that these do not constitute independent samples). All
evidential values are low.
Table 13.4. Results on the reaction times of Experiment 3. The number of observations per cell is indicated
by n, p(F) denotes the p-value of the F test, SD = standard deviation, met.RT = reaction times correct
decision metaphor, Nmet.RT = reaction times correct rejection nonmetaphor.
Study          n      means               SDs                 F        p(F)     V
Exp3.met.RT    45/3   2809  3982  4831    1349  1052   825    0.2183   0.6428   1.4253
Exp3.Nmet.RT   45/3   4042  4159  4916    1509  1507  1268    0.4990   0.4838   1.1167
13.3 Remarks
remain qualitatively similar. Left-tail probabilities and overall evidential values are allowed to grow
more extreme when the number of independent samples increases. When the number of independent
samples is low, the weight of evidence shifts towards evidential values for individual experiments.
The evidential value for Exp3.correctReject (see Table 13.3) is substantial, implying the presence
of a dependence structure between test persons. Of note is also that the approach condition in
this (sub)experiment contains no variation (see Table 13.3 and the right-hand panel of Figure
13.2). Thus while according to a strict application of the criterion of Section 1.4 this publication
should be classified as containing no evidence for low veracity, we conclude, on the basis of the
peculiarity of the results pertaining to Exp3.correctReject, that the evidence for low veracity of
this publication has to be classified as inconclusive.
14
D.JF.L09.JESP
Denzler, M., Förster, J., and Liberman, N. (2009). How goal-fulfillment decreases aggression. Journal
of Experimental Social Psychology, 45: 90-100.
14.1 Synopsis
This publication contains 3 experiments. Their designs can be obtained from Table 14.1. Data for
Experiment 1 are reported to have been collected in Würzburg. Data for Experiments 2 and 3 are
reported to have been collected in Bremen. In Experiment 2 the dependent variable behavioral
aggression is only reported for the 3-between condition. Reaction times for Experiment 2 are
reported for the full design. The publication states that for the reaction times of Experiments 1
and 2 analyses were performed on log-transformed data, but reports only the untransformed data.
All analyses below are based on the reported untransformed data. Experiment 3 features 4 factor
levels and is not analyzed here.
Experiment   Design                                         Dependent variables
1            3 between × 2 between × 2 within × 3 within    1
2            3 between × 2 within × 2 within                2
3            4 between × 2 within                           1
14.2 Results
14.2.1 Reaction times Experiment 1
Trend lines for the reaction times of Experiment 1 can be found in Figure 14.1. Table 14.2 lists the
corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the
corresponding results on the F test and the evidential value. (Note that only the stabbing-type
samples within block number and within word-type constitute independent samples.) (Sub)experiments
exp1.WU.NS.B1 and exp1.WU.S.B3 yield a substantial (lower-bound to the) evidential value.
Table 14.2. Results on the reaction times of Experiment 1. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, WA = words related
to aggression, WU = words unrelated to aggression, S = stabbing, NS = no stabbing, B = block.
Study            n      means            SDs              F         p(F)     V
exp1.WA.S.B1     91/6   577  648  729     59   70  132     0.0294   0.8647   3.1937
exp1.WU.S.B1     91/6   718  738  738     98  256   67     0.0381   0.8462   [1.4933, 4.1380]
exp1.WA.NS.B1    91/6   653  681  714     73  107  101     0.0070   0.9336   [5.5163, 7.7375]
exp1.WU.NS.B1    91/6   704  710  711    100  112   88     0.0063   0.9373   [6.2716, 8.1405]
exp1.WA.S.B2     91/6   526  703  719     79   78   58    12.5290   0.0010   1.0000
exp1.WU.S.B2     91/6   704  716  725     84   62  151     0.0020   0.9643   [4.2421, 11.0578]
exp1.WA.NS.B2    91/6   648  704  718    137   84   46     0.4788   0.4927   1.0790
exp1.WU.NS.B2    91/6   715  720  734     98   77  109     0.0224   0.8817   3.7294
exp1.WA.S.B3     91/6   652  790  812     90   73  107     4.1017   0.0492   1.0000
exp1.WU.S.B3     91/6   699  713  723     84   61   92     0.0063   0.9371   6.8163
exp1.WA.NS.B3    91/6   609  699  732     84   71  180     0.5537   0.4609   1.0086
exp1.WU.NS.B3    91/6   709  709  740     93   71   77     0.3715   0.5455   1.1550
Bracketed V entries give [lower bound, upper bound].
Fig. 14.1. Trend lines for the reaction times of Experiment 1. The error bars represent one standard
deviation from the cell mean. WA = words related to aggression, WU = words unrelated to aggression,
S = stabbing, NS = no stabbing, B = block, AGT = aggression-goal-thwarting, AGF =
aggression-goal-fulfillment, NA = no aggression.
14.2.2 Reaction times Experiment 2

Trend lines for the reaction times of Experiment 2 can be found in Figure 14.2. Table 14.3 lists
the corresponding data and results (note that these do not constitute independent samples). All
evidential values are relatively low.
Table 14.3. Results on the reaction times of Experiment 2. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, WA = words related
to aggression, WU = words unrelated to aggression, B = block.
Study        n      means            SDs              F        p(F)     V
exp2.WA.B1   51/3   704  737  753    145  119  135    0.0460   0.8311   2.7492
exp2.WU.B1   51/3   787  843  844    224  169  172    0.2374   0.6283   1.3450
exp2.WA.B2   51/3   680  773  836    109  155   86    0.1767   0.6761   1.7798
exp2.WU.B2   51/3   720  769  783    150  134  191    0.1353   0.7146   1.6460
Fig. 14.2. Trend lines for the reaction times of Experiment 2. The error bars represent one standard
deviation from the cell mean. WA = words related to aggression, WU = words unrelated to aggression, B
= block, GT = goal-thwarting, GF = goal-fulfillment, NA = non aggressive conflict solution.
14.2.3 Behavioral aggression measure Experiment 2

The trend line for the behavioral aggression measure of Experiment 2 can be found in Figure 14.3.
Table 14.4 lists the corresponding data and results. The upper-bound for the evidential value may
be termed substantial (12.3772).
Table 14.4. Results on the behavioral aggression measure of Experiment 2. The number of observations
per cell is indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, BA =
behavioral aggression.
Study     n      means               SDs                F        p(F)     V
exp2.BA   51/3   4.13  4.61  5.11    0.57  0.75  0.5    0.0030   0.9566   [3.8421, 12.3772]
Bracketed V entries give [lower bound, upper bound].
14.3 Remarks
Due to the low number of independent samples the weight of evidence lies, as indicated in Chapter 1,
with the evidential value. The evidential values for two (sub)experiments are substantial. According
to the criterion of Section 1.4 this publication has to be classified as bearing inconclusive evidence
for low scientific veracity.
[Figure panel: Experiment 2: Behavioral aggression; conditions GT, GF, NA on the horizontal axis]
Fig. 14.3. Trend line for the behavioral aggression measure of Experiment 2. The error bars represent one
standard deviation from the cell mean. GT = goal-thwarting, GF = goal-fulfillment, NA = non aggressive
conflict solution.
15
L.JF09.CS
Liberman, N. and Förster, J. (2009). The effect of psychological distance on perceptual level of
construal. Cognitive Science, 33: 1330-1341.
15.1 Synopsis
This publication contains 3 studies. Table 15.1 gives an overview of their design. In all studies
the dependent variable is response time. The publication states that analyses were performed on
z-transformed data, but reports only the untransformed data. All analyses below are based on the
reported untransformed data. Data for all studies appear to have been collected in Bremen.
Study   Design                  Dependent variables
1       3 between × 2 within    1
2       3 between × 2 within    1
3       3 between × 2 within    1
15.2 Results
Fisher's method gives a left-tail probability of 1 - .2889791 = .7110209. Roughly speaking, under
the assumption of perfect linearity in the population, the probability of finding results at least
as consistent w.r.t. linearity amounts to approximately 1 in 3.5. The overall V is found to be
2.31976 and can be deemed low.
Trend lines for a second set of independent samples can be found in Figure 15.2. Table 15.3 lists
the corresponding data and the corresponding results on the F test and the evidential value.
Fisher's method gives a left-tail probability of 1 - .1562055 = .8437945. The overall V amounts to
5.510969.
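Table 15.2. Results on the first set of independent samples. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, global = global
letters.
Study           n           means                   SDs              F        p(F)     V
study1.global   54/3        619        661  766     122  131  140    0.6917   0.4095   1.0304
(missing)       (missing)   (missing)  641  696     115  131  114    0.4675   0.4979   1.1487
(missing)       (missing)   (missing)  652  711     130  113  119    0.0998   0.7537   1.9600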
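Table 15.3. Results on the second set of independent samples. The number of observations per cell is
indicated by n, p(F) denotes the p-value of the F test, SD = standard deviation, local = local letters.
Study          n           means                   SDs              F        p(F)     V
study1.local   54/3        682        710  759      94  115  141    0.0946   0.7596   2.0418
(missing)      (missing)   (missing)  690  765     100  109  145    0.6938   0.4096   1.0177
(missing)      (missing)   (missing)  716  753     119  102  132    0.0482   0.8274   2.6522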
15.3 Remarks
weight of evidence shifts towards evidential values for individual studies. On the basis of these
there is no statistical evidence for low scientific veracity.
Part V
Data Indicated as Collected by Other Authors
16
FG.JF12.MP
Friedman, R.S., Gordis, E., and Förster, J. (2012). Re-exploring the influence of sad mood on music
preference. Media Psychology, 15: 249-266.
16.1 Synopsis
This publication contains 3 experiments. Table 16.1 gives an overview of their design. Experiments
1 and 2 feature 2 factor-levels and are not analyzed here. The data are reported to have been collected
at a state university in the northeastern United States.
Experiment   Design      Dependent variables
1            2 between   7
2            2 between   6
3            3 between   8
16.2 Results
Trend lines for the dependent variables of Experiment 3 can be found in Figure 16.1. Table 16.2 lists
the corresponding data (cell sizes, cell means and corresponding standard deviations) as well as the
corresponding results on the F test and the evidential value. Note that these do not constitute
independent samples. The trend lines do not convey consistent linear effects. In addition, the
evidential values obtained are all low.
16.3 Remarks
weight of evidence shifts towards evidential values for individual studies. On the basis of these
there is no statistical evidence for low scientific veracity of this publication.
Table 16.2. Results on Experiment 3. The number of observations per cell is indicated by n, p(F) denotes
the p-value of the F test, SD = standard deviation, variable1 = Desire to listen to happy songs, variable2
= Desire to listen to sad songs, variable3 = How would listening to [happy songs] make you feel right now?,
variable4 = How would listening to [sad songs] make you feel right now?, variable5 = Appropriateness of
listening to happy songs, variable6 = Appropriateness of listening to sad songs, variable7 = Manipulation
check 1: How did the clip make you feel?, variable8 = Manipulation check 2: How do you feel right now,
at this very moment?.
Study            n      means                  SDs                 F        p(F)     V
Exp3.Variable1   93/3    2.56   4.58   4.77    1.72  1.77  1.90    5.3506   0.0230   1.0000
Exp3.Variable2   92/3    2.30   3.62   3.78    1.52  1.89  1.83    2.2350   0.1385   1.0000
Exp3.Variable3   93/3    0.55   1.09   1.43    1.49  1.29  1.07    0.1233   0.7263   1.8343
Exp3.Variable4   92/3   -0.98  -0.45  -0.07    1.28  1.38  1.14    0.0712   0.7901   2.4511
Exp3.Variable5   93/3   -0.52   0.39   0.92    1.23  0.86  1.28    0.5752   0.4502   1.0221
Exp3.Variable6   92/3   -0.78   0.15   0.16    1.15  0.87  1.26    3.5392   0.0632   1.0000
Exp3.Variable7   93/3   -2.06   0.00   1.24    0.89  0.38  1.22    4.2980   0.0410   1.0000
Exp3.Variable8   93/3   -0.93   0.21   1.15    1.24  1.11  1.35    0.1350   0.7142   1.6899
Fig. 16.1. Trend lines for Experiment 3. The error bars represent one standard deviation from the cell
mean.
Part VI
Concluding Remarks
17
Concluding Remarks
17.1 Classification of investigated publications

Section 1.4 states that, on the basis of the methods employed, a publication may receive one of three
qualitative judgments: Strong evidence for low veracity, inconclusive evidence for low veracity, or
no evidence for low veracity. Table 17.1 contains the publications for which the statistical evidence
leads to the conclusion that there is strong evidence for low veracity. The Whistleblower Report
(2012) treats the first, third, and fifth publication from this table.
Table 17.1. JF publications under UvA affiliation labeled as: strong statistical evidence for low veracity.
Abbreviation   Publication
As sole author:
JF11.JEPG      Förster, J. (2011).
JF10.EJSP      Förster, J. (2010).
JF09.JEPG      Förster, J. (2009).

F
JF.D12.SPPS F
JF.EO09.PSPB F
A. (2009).
JF.LS09.JEPG Förster, J., Liberman, N., and Shapira, O. (2009).
JF.D12.JESP
L.JF09.JPSP

Liberman, N. and Forster, J. (2009).
Table 17.2 contains the publications for which the statistical evidence leads to the conclusion
that there is inconclusive evidence for low veracity.
Table 17.2. JF publications under UvA affiliation labeled as: inconclusive statistical evidence for low
veracity.
Abbreviation
Publication

JF.LK08.JPSP Förster, J., Liberman, N., and Kuschel, S. (2008).
urzburg:
K.JF.D10.SPPS Kuschel, S., Forster, J., and Denzler, M. (2010).
D.JF.L09.JESP Denzler, M., Forster, J., and Liberman, N. (2009).
Table 17.3 contains the publications for which the statistical evidence leads to the conclusion
that there is no evidence for low veracity. (Note that not all constituent studies in the no evidence
category could be investigated as some studies have designs that fall outside the scope of the
methods employed.)
Table 17.3. JF publications under UvA affiliation labeled as: no statistical evidence for low veracity.
Abbreviation
Publication

WCY.JF11.JESP Woltin, K.-A., Corneille, O., Yzerbyt, V.Y., and Forster, J. (2011).
urzburg:
D.JF.LR10.PSPB Denzler, M., Förster, J., Liberman, N., and Rozenman, M. (2010).
L.JF09.CS
Cognitive Science, 33: 1330-1341
FG.JF12.MP

Friedman, R.S., Gordis, E., and Forster, J. (2012).
Media Psychology, 15: 249-266
Table 17.4 contains the publications that were not formally investigated. This is the case when
all studies conducted for a certain publication have designs that fall outside the scope of the
methods employed.
Table 17.4. JF publications that were not assessed formally with the methods described in Section 1.5.
Abbreviation
Publication
JF09.JESP
As sole author :
Förster, J. (2009).
JF.B12.EJSP
JF.OE10.JESP
L.JF08.SC
S.JF08.PACA
W.JF07.JASP
VE.JF08.HR

Förster, J. and Becker, D. (2012).
Förster, J., Ozelsel, A., and Epstude, K. (2010).
urzburg:
Social Cognition, 26: 515-533
Schimmel, K. and Forster, J. (2008).
Psychology of Aesthetics, Creativity, and the Arts, 2: 53-60
Werth, L. and Forster, J. (2007).
Journal of Applied Social Psychology, 37: 2764-2787
As co-author, data collected in online experiment:
Voelpel, S.C., Eckhoff, R.A., and Forster, J. (2008).
Human Relations, 61: 271-295

GV.JF.MS12.EJSP Gervais, S.J., Vescio, T.K., Forster, J., Maass, A., and Suitner, C. (2012).
DH.JF11.PSPB
Denzler, M., Hafner, M., and Forster, J. (2011).
17.2 Cumulative evidence

The publications investigated in this report contain 188 (sub)experiments that could be analyzed
with our methods. Of these, 37 yield a substantial evidential value of at least 6. Under the standard
assumption of independence of the test persons, the strong assumption of linearity in the means
in the relevant populations, and the (only partially valid) assumption of independence of the
(sub)experiments, the probability that at least 37 (sub)experiments yield a substantial evidential
value equals 4e-7 = .0000004, approximately. In case the strong assumption of linearity in the
means does not hold in one or more or all (sub)experiments, this probability is even smaller.
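The order of magnitude of this probability can be checked with a short sketch. It assumes a plain binomial model in which each of the 188 (sub)experiments independently yields V of at least 6 with probability 0.0809 (the approximate figure derived in Appendix A.2.2). The binomial model and the function name `binomial_upper_tail` are assumptions made for illustration; the report does not state how its figure was computed.

```python
from math import comb


def binomial_upper_tail(n_trials: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n_trials, p), summed term by term."""
    return sum(
        comb(n_trials, k) * p**k * (1.0 - p) ** (n_trials - k)
        for k in range(k_min, n_trials + 1)
    )


# Assumed inputs: 188 (sub)experiments, 37 of them with V >= 6,
# and a per-experiment chance of 0.0809 under linearity.
tail = binomial_upper_tail(188, 37, 0.0809)
print(f"P(at least 37 of 188) = {tail:.1e}")
```

Under these assumptions the result is of the order of 1e-7, in line with the figure quoted above.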
A
Appendix: Some Technical Details on the Methods Employed
The approaches employed consider the basic one-way ANOVA model with three factor levels:
\[ X_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2, 3, \quad j = 1, \ldots, n. \]
In this model $i$ denotes an index over experimental groups (the factor-levels), $j$ denotes the index
over subjects with the number of observations for each factor-level being $n$. Furthermore, the
$\mu_i$ denote unknown factor-level means and the $\varepsilon_{ij} \sim N(0, \sigma_i^2)$ denote
measurement errors. In addition, consider the following notation (used in the description of the
approaches below):
$\bar{x}_i$ = realization of factor-level mean $\mu_i$;
$s_i^2$ = estimate of the variance $\sigma_i^2$;
$\bar{x} = \frac{1}{3}\sum_{i=1}^{3} \bar{x}_i$ = the observed grand mean.
A.1 The F Test and Fisher's Method

This is the approach used in the Whistleblower Report (2012).
A.1.1 Assumptions
The following assumptions are made:
A1. One-way ANOVA setting with three factor-levels;
A2. Random assignment of subjects to experimental conditions;
A3. Equal number of observations per cell (balanced design);
A4. $\varepsilon_{ij} \sim N(0, \sigma_i^2)$;
A5. The $\bar{x}_i$ are ordered, s.t. $\bar{x}_1 \le \bar{x}_2 \le \bar{x}_3$.
A.1.2 Measure
The ANOVA F-model for one-way factorial designs with 3 levels of an experimental factor has
2 regression parameters. A linear regression between the low and high levels of the experimental
factor has only 1 regression parameter. This linear regression can be viewed as a reduced model
and is nested in the ANOVA model with 2 regression parameters. One can then perform a nested
F-test (F test) to assess if the more complex model significantly contributes to model fit. The
null hypothesis in this situation can be stated as:
\[ H_0 : \text{perfect linearity of the cell means.} \]
The test statistic under the stated assumptions amounts to:
\[ F = \frac{n \sum_i (\bar{x}_i - \bar{x})^2 - \frac{n}{2}(\bar{x}_3 - \bar{x}_1)^2}{\frac{1}{3}\sum_i s_i^2}
     = \frac{n(\bar{x}_1 - 2\bar{x}_2 + \bar{x}_3)^2}{2(s_1^2 + s_2^2 + s_3^2)}, \]
which basically is the sum of squares of the full model minus the sum of squares of the reduced
model divided by the pooled within-group variance. Note that $F$ vanishes if
$\bar{x}_2 = (\bar{x}_1 + \bar{x}_3)/2$,
which implies that the cell means are on a straight line. Under the null hypothesis the statistic
follows an F-distribution:
\[ F \sim F_{1,\,3(n-1)}. \]
The p-value for F (p(F )) can be obtained in reference to this distribution. It is the probability
under this distribution that a random variable equals at least the value obtained for F . If the
empirical results approach linearity, p(F ) approaches the value 1. When the null hypothesis is
true, the p-values for the F test must be uniformly distributed. Observing p-values that consistently creep towards 1 then raises suspicion. The deviance of consistently high p(F )-values can
then be formalized with the Fisher method.
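The computation of F and p(F) from reported cell means and standard deviations can be sketched as follows. This is a minimal illustration and not the DataVeracity.R script: the function names are hypothetical, and the p-value is obtained by numerically integrating the Student t density (using the fact that an F variable with 1 and d degrees of freedom is the square of a t variable with d degrees of freedom), which is a choice made here to keep the sketch free of external libraries.

```python
from math import exp, lgamma, log, pi, sqrt


def linearity_f(n: float, means, sds) -> float:
    """F statistic for H0: the three cell means lie on a straight line."""
    x1, x2, x3 = means
    s1, s2, s3 = sds
    z = x1 - 2.0 * x2 + x3
    return n * z**2 / (2.0 * (s1**2 + s2**2 + s3**2))


def t_density(t: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))


def f_p_value(f_stat: float, n: float, steps: int = 2000) -> float:
    """Upper-tail p-value of f_stat under F(1, 3(n-1)).

    Uses P(F >= f) = 1 - 2 * integral of the t density over [0, sqrt(f)],
    evaluated with Simpson's rule (steps must be even).
    """
    df = 3.0 * (n - 1.0)
    upper = sqrt(f_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


# Reported values for exp2.BA (Table 14.4): n = 51/3 per cell.
f_stat = linearity_f(51 / 3, (4.13, 4.61, 5.11), (0.57, 0.75, 0.5))
p_value = f_p_value(f_stat, 51 / 3)
print(f"F = {f_stat:.4f}, p(F) = {p_value:.4f}")
```

For the exp2.BA row this agrees, up to rounding, with the values F = 0.0030 and p(F) = 0.9566 listed in Table 14.4.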
Suppose we have p-values for the F test (p(F)) on e = 1, ..., E independent samples for
which similar null-hypotheses of linearity hold. We may then be interested to what extent the
accumulation of tests favors the shared null hypothesis. To this end Fisher's method may be
employed (Fisher, 1925). The statistic for this method is
\[ G = -2 \sum_{e=1}^{E} \ln[p(F)_e], \]
which follows a $\chi^2$ distribution with $2E$ degrees of freedom:
\[ G \sim \chi^2_{2E}. \]
The p-value for the shared null hypothesis can be obtained in reference to this distribution. The
left-tail probability $1 - p(G)$ then indicates how strongly the accumulation of tests favors the shared
null.
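A minimal sketch of this combination step is given below. The function name `fisher_combination` is hypothetical. Because the degrees of freedom 2E are always even, the upper-tail area of the chi-square distribution has a closed form, so no statistical library is needed. The function returns both tail areas at G and leaves it to the reader to match them to the quantities p(G) and 1 - p(G) as used in this report. All p-values are assumed to be strictly positive.

```python
from math import exp, log


def fisher_combination(p_values):
    """Return (G, upper_tail, lower_tail) for Fisher's method.

    G = -2 * sum(ln p_e) is referred to a chi-square distribution with 2E
    degrees of freedom. For even degrees of freedom the upper-tail area is
    exp(-G/2) * sum_{k=0}^{E-1} (G/2)^k / k!.
    """
    n_samples = len(p_values)
    g = -2.0 * sum(log(p) for p in p_values)
    half = g / 2.0
    term, series = 1.0, 1.0
    for k in range(1, n_samples):
        term *= half / k
        series += term
    upper = exp(-half) * series
    return g, upper, 1.0 - upper


# Rounded p(F) values of the two samples in Table 11.3.
g, upper, lower = fisher_combination([0.8605, 0.7919])
print(f"G = {g:.4f}, upper tail = {upper:.4f}, lower tail = {lower:.4f}")
```

With the rounded p-values of Table 11.3 the two tail areas come out near .9428 and .0572, the pair of numbers quoted in Section 11.2.2 (about 1 in 17).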
A.1.3 Interpretation
When the observed trend moves away from linearity, p(F) will move towards 0. When the observed
trend moves towards perfect linearity, p(F) will move towards 1. When H0 holds in
the population then, by definition, the p-values for the F test (p(F)) must be uniformly
distributed. Observing p-values that consistently creep towards 1 then raises suspicion. The deviance
of consistently high p(F)-values can then be formalized with the Fisher method. Combining
results on independent samples and usage of left-tail probabilities then indicates how strongly
the accumulation of tests favors the shared null. For example, say we find from Fisher's method
p(G) = 4.255229e-7 (example taken from the results of Chapter 2). This gives a left-tail
probability of 1 - 4.255229e-7 ≈ .9999996. Thus, the accumulation of tests on the similar null hypotheses
of linearity very strongly favors the shared null. Or, roughly speaking, under the assumption of
perfect linearity in the population, the probability of finding results at least as consistent w.r.t.
linearity amounts to 1 in 2,350,050 (1/4.255229e-7).
A.2 The Evidential Value V

Here, the evidential value is used as developed by Klaassen (2015).
A.2.1 Assumptions
The following assumptions are made:
A1. One-way ANOVA setting with three factor-levels;
A2. Random assignment of subjects to experimental conditions;
A3. Equal number of observations per cell (balanced design);
A4. $\varepsilon_{ij} \sim N(0, \sigma_i^2)$;
A5. The $\bar{x}_i$ are ordered, s.t. $\bar{x}_1 \le \bar{x}_2 \le \bar{x}_3$;
A6. $\mu_1 - 2\mu_2 + \mu_3 = 0$;
A7. The correlations between the random variables $\varepsilon_{ij}$ are denoted by
\[ \rho(\varepsilon_{1j}, \varepsilon_{2j}) = \rho_3, \qquad
   \rho(\varepsilon_{1j}, \varepsilon_{3j}) = \rho_2, \qquad
   \rho(\varepsilon_{2j}, \varepsilon_{3j}) = \rho_1, \]
and necessarily satisfy $1 - \rho_1^2 - \rho_2^2 - \rho_3^2 + 2\rho_1\rho_2\rho_3 \ge 0$.
A.2.2 Measure
The basic premise is that one tends to underestimate variation due to randomness when fabricating
data. In the framework of the ANOVA model this is incorporated by allowing for dependence
between the measurement errors of the respective factor-levels. This dependence might also be
the result of questionable research practices. The evidential value then assesses the hypothesis of
a dependence structure in the underlying data, which indicates fabrication (or questionable
research practices), versus the hypothesis of independence, which is the ANOVA model assumption
(Klaassen, 2015). More formally, it pits the hypothesis of data dependence
\[ H_F : \text{at least one } \rho_i \text{ is nonzero}, \quad i = 1, 2, 3, \]
against the hypothesis of independence
\[ H_I : \rho_1 = \rho_2 = \rho_3 = 0. \]
The evidential value of $H_F$ versus $H_I$ then is given as:
\[ V = \frac{f(E \mid H_F)}{f(E \mid H_I)}
     = \sup_{\substack{-1 < \rho_i < 1,\\ 1 - \rho_1^2 - \rho_2^2 - \rho_3^2 + 2\rho_1\rho_2\rho_3 > 0,\\ s(\rho_1,\rho_2,\rho_3) \le s(0,0,0)}}
       \frac{f_n(z; \rho_1, \rho_2, \rho_3)}{f_n(z; 0, 0, 0)}, \]
where $E$ stands for the evidence (in this case summary measures obtained from the publication
under investigation), and where
\[ z = \bar{x}_1 - 2\bar{x}_2 + \bar{x}_3, \]
\[ f_n(z; \rho_1, \rho_2, \rho_3) = \sqrt{\frac{n}{2\pi}}\, \frac{1}{s(\rho_1, \rho_2, \rho_3)}
   \exp\left\{ -\frac{n z^2}{2 s^2(\rho_1, \rho_2, \rho_3)} \right\}, \]
\[ s(\rho_1, \rho_2, \rho_3) = \left( s_1^2 + 4 s_2^2 + s_3^2 - 4 s_1 s_2 \rho_3 + 2 s_1 s_3 \rho_2 - 4 s_2 s_3 \rho_1 \right)^{1/2}. \]
Note that the condition $s(\rho_1, \rho_2, \rho_3) \le s(0, 0, 0)$ is a restriction on $H_F$ that reflects
the basic premise that variation due to randomness tends to be underestimated when fabricating data.
Consequently, large deviations from linearity will not yield a high evidential value.
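The evidential value can be computed with the help of the following theorem:
Theorem A.1 (Klaassen, 2015). Define
\[ S_L^2 = \inf_{\substack{-1 < \rho_i < 1,\\ 1 - \rho_1^2 - \rho_2^2 - \rho_3^2 + 2\rho_1\rho_2\rho_3 > 0,\\ s(\rho_1,\rho_2,\rho_3) \le s(0,0,0)}} s^2(\rho_1, \rho_2, \rho_3) \]
and
\[ \bar{S}_L^2 = \min\left\{ \left[ 2 s_2 - (s_1 + s_3) \right]^2,\ \left( 2 s_2 - \sqrt{s_1^2 + s_3^2} \right)^2 \right\}, \]
and write $S_0^2 = s^2(0, 0, 0) = s_1^2 + 4 s_2^2 + s_3^2$. Then
\[ S_L^2 \le \bar{S}_L^2 \le S_0^2 \]
holds. Furthermore, we have:
If $\bar{S}_L^2 \le n z^2 \le S_0^2$ holds, then the evidential value becomes
\[ V = \frac{S_0}{\sqrt{n z^2}} \exp\left\{ -\frac{1}{2}\, n z^2 \left( \frac{1}{n z^2} - \frac{1}{S_0^2} \right) \right\} \ge 1. \tag{A.1} \]
If $n z^2 \le \bar{S}_L^2$ holds, then the evidential value satisfies
\[ V \ge \frac{S_0}{\bar{S}_L} \exp\left\{ -\frac{1}{2}\, n z^2 \left[ \frac{1}{\bar{S}_L^2} - \frac{1}{S_0^2} \right] \right\} \ge 1, \]
and equals at most (A.1).
If $n z^2 \ge S_0^2$ holds, then the evidential value becomes $V = 1$.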
The DataVeracity.R script makes use of this result in computing V.
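The three cases of Theorem A.1 can be sketched in a few lines. This is an illustrative reading of the theorem, not a transcription of DataVeracity.R: the function name `evidential_value` is hypothetical, the computable bound from the minimum formula is used in the case distinctions, and an infinite upper bound is returned when z equals 0 (where the tables in this report show NaN).

```python
from math import exp, sqrt


def evidential_value(n: float, means, sds):
    """Return (lower, upper) bounds on V; both are equal when V is exact."""
    x1, x2, x3 = means
    s1, s2, s3 = sds
    nz2 = n * (x1 - 2.0 * x2 + x3) ** 2
    s0_sq = s1**2 + 4.0 * s2**2 + s3**2
    # Computable bound on the smallest attainable s^2 (minimum formula).
    sl_sq = min((2.0 * s2 - (s1 + s3)) ** 2,
                (2.0 * s2 - sqrt(s1**2 + s3**2)) ** 2)

    if nz2 >= s0_sq:  # third case of the theorem: V = 1
        return 1.0, 1.0

    def ratio(s_sq: float) -> float:
        # f_n(z; s) / f_n(z; S0) for a candidate squared scale s_sq
        return sqrt(s0_sq / s_sq) * exp(-0.5 * nz2 * (1.0 / s_sq - 1.0 / s0_sq))

    # Expression (A.1): the ratio at s^2 = n z^2 (unbounded when z = 0).
    a1 = ratio(nz2) if nz2 > 0.0 else float("inf")
    if nz2 >= sl_sq:  # first case: (A.1) is attained
        return a1, a1
    return ratio(sl_sq), a1  # second case: lower bound, at most (A.1)


# Reported values for exp2.BA (Table 14.4): n = 51/3 per cell.
lower, upper = evidential_value(51 / 3, (4.13, 4.61, 5.11), (0.57, 0.75, 0.5))
print(f"V lies between {lower:.4f} and {upper:.4f}")
```

For exp2.BA this gives bounds that agree, up to rounding, with the lower bound 3.8421 and the upper bound 12.3772 listed in Table 14.4.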

Some computation shows that $V \ge 6$ is approximately equivalent to $\sqrt{n}\,|z|/S_0 \le 0.1016$. Since
under the hypothesis of linearity $\sqrt{n}\,z/S_0$ has approximately a standard normal distribution,
this implies that the probability under the hypothesis of linearity that V equals at least 6 is
approximately equal to 0.0809. Note that this probability is smaller if the linearity does not hold,
because of the symmetry and unimodality of the normal distribution.
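The two numbers in this paragraph can be reproduced with a short sketch, assuming that expression (A.1) applies. Writing u for the standardized deviation from linearity, (A.1) becomes exp(-1/2 + u^2/2)/u, which decreases in u on (0, 1]; the threshold for V = 6 is found by bisection and the probability follows from the standard normal distribution. The function names are hypothetical.

```python
from math import erf, exp, sqrt


def v_from_u(u: float) -> float:
    """Expression (A.1) in terms of u = sqrt(n)|z| / S0, for 0 < u <= 1."""
    return exp(-0.5 + 0.5 * u * u) / u


def threshold_u(v_target: float, tol: float = 1e-12) -> float:
    """Solve v_from_u(u) = v_target by bisection on (0, 1]."""
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if v_from_u(mid) > v_target:
            lo = mid  # V still too large: move towards larger u
        else:
            hi = mid
    return 0.5 * (lo + hi)


u6 = threshold_u(6.0)
prob = erf(u6 / sqrt(2.0))  # P(|N(0, 1)| <= u6)
print(f"threshold u = {u6:.4f}, probability = {prob:.4f}")
```

The threshold and probability obtained this way round to 0.1016 and 0.0809, matching the figures above.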
A.2.3 Interpretation
From Theorem A.1 it is clear that the evidential value is always greater than or equal to unity
(1). Thus, in the words of Klaassen (2015): "within this framework there does not exist
exculpatory evidence". Studies with data that adhere to the standards of scientific experimentation can
be expected to yield a V close to unity. The larger the value for V, the more the evidence favors
the hypothesis of dependence, i.e., of fabrication or questionable research practices, versus the
hypothesis of independence. When multiple (sub)experiments are available for independent samples,
a (lower-bound to the) overall evidential value can be obtained by multiplication of individual
(lower-bound) values for V.
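As a small illustration of this multiplication rule, the sketch below combines the three individual values listed in Table 13.2. The function name `overall_evidential_value` is hypothetical, and the inputs are the rounded table entries, so the product differs from the reported overall value in the later decimals.

```python
from math import prod


def overall_evidential_value(values):
    """Multiply (lower-bound) evidential values of independent (sub)experiments."""
    return prod(values)


# Rounded V entries for Exp1.masked, Exp2.masked and Exp3.correctHit (Table 13.2).
overall = overall_evidential_value([2.3053, 1.3126, 2.8579])
print(f"overall V = {overall:.4f}")
```

The product agrees to three decimals with the overall V of 8.647894 reported in Section 13.2.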
References
Fisher, R. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd. Available
from http://psychclassics.yorku.ca/Fisher/Methods/
Förster, J., & Denzler, M. (2012). Sense creative! The impact of global and local vision, hearing,
touching, tasting and smelling on creative and analytic thought. Social Psychological and
Personality Science, 3, 108-117 [RETRACTED].
Klaassen, C. A. J. (2015). Evidential value in ANOVA-regression results in scientific integrity
studies. arXiv:1405.4540v2 [stat.ME].
LOWI. (2014). Lowi advies 2014, nr. 05. Available from
https://www.knaw.nl/shared/resources/thematisch/bestanden/LOWIadvies2014nr5.pdf
Whistleblower Report. (2012). Suspicion of scientific misconduct by dr. Jens Förster. Available
from https://retractionwatch.files.wordpress.com/2014/04/report_foerster.pdf
