
Intelligence 48 (2015) 15–29



Age-related changes in the mean and covariance structure of
fluid and crystallized intelligence in childhood
and adolescence
Ulrich Schroeders a,⁎, Stefan Schipolowski b, Oliver Wilhelm c

a Department of Educational Science, University of Bamberg, Germany
b Department of Psychology, Humboldt-Universität zu Berlin, Germany
c Department of Psychology and Education, Ulm University, Ulm, Germany

ARTICLE INFO

Article history:
Received 6 July 2014
Received in revised form 21 September 2014
Accepted 16 October 2014
Available online 8 November 2014
Keywords:
Fluid intelligence
Crystallized intelligence
Age differentiation
Local structural equation models

ABSTRACT
Evidence on age-related differentiation in the structure of cognitive abilities in childhood
and adolescence is still inconclusive. Previous studies often focused on the interrelations or
the g-saturation of broad ability constructs, neglecting abilities on lower strata. In contrast,
we investigated differentiation in the internal structure of fluid intelligence/gf (with verbal,
numeric, and figural reasoning) and crystallized intelligence/gc (with knowledge in the
natural sciences, humanities, and social studies). To better understand the development
of reasoning and knowledge during secondary education, we analyzed data from 11,756
students attending Grades 5 to 12. Changes in both the mean structure and the covariance
structure were estimated with locally-weighted structural equation models that allow
handling age as a continuous context variable. To substantiate a potential influence of
school tracking (i.e., different learning environments), analyses were additionally conducted
separately by school track (academic vs. nonacademic). Mean changes in gf and gc were
approximately linear in the total sample, with a steeper slope for the latter. There was little
indication of age-related differentiation for the different reasoning facets and knowledge
domains. The results suggest that the relatively homogeneous scholastic learning
environment in secondary education prevents the development of more pronounced ability
or knowledge profiles.
© 2014 Elsevier Inc. All rights reserved.

During the preparation of this manuscript, Stefan Schipolowski was a
fellow of the International Max Planck Research School "The Life Course:
Evolutionary and Ontogenetic Dynamics" (LIFE).
⁎ Corresponding author at: Bamberg Graduate School of Social Sciences,
University of Bamberg, 96045 Bamberg, Germany.
E-mail address: ulrich.schroeders@uni-bamberg.de (U. Schroeders).
0160-2896/© 2014 Elsevier Inc. All rights reserved.

1. Introduction

The structure and development of cognitive abilities
have been a focus of intelligence research for over 100 years
(e.g., Cudeck & MacCallum, 2007). Among the different
factors discussed in many contemporary theories on the
structure of intelligence (Carroll, 1993; Horn & Noll, 1997;
McGrew, 2009), fluid intelligence (gf) and crystallized
intelligence (gc) are the most prominent ones. Gf reflects
individual differences in decontextualized reasoning, that
is, the ability to arrive at understanding relations among
stimuli, comprehend implications, and draw inferences,
while gc is defined as acculturation knowledge measured
with tasks indicating "breadth and depth of the knowledge
of the dominant culture" (Horn & Noll, 1997, p. 69).
In the present study, we examine age-related changes in the
mean and covariance structure of fluid and crystallized
intelligence in order to better understand the development of
these prominent abilities. Previous studies on differentiation of
cognitive abilities frequently used second-order factor models.



Specifically, very broad ability constructs such as gf, gc, and gs

(mental speed) are modeled as first-order factors below a
second-order g factor (e.g., Li et al., 2004; Tucker-Drob, 2009).
In such models differentiation is expressed in terms of g
saturation and the magnitude of first-order factor loadings.
However, these studies did not consider structural changes that
may occur on lower strata of the ability hierarchy. In the
present paper, we address this research desideratum by
investigating age-related changes in the factor structure of gf
and gc. We focus our examination to late childhood and
adolescence because in these periods of time important
decisions with respect to later academic or vocational training
are made and the initially homogeneous learning environments begin to diverge. Furthermore, Tucker-Drob and Briley
(2014) showed in a comprehensive meta-analysis of longitudinal twin and adoption studies that the stability of cognitive
abilities approached asymptote in late childhood. From a
behavior genetic perspective this may be due to the fact that
sources of interindividual differences in cognition (i.e., genetic,
shared environmental, and nonshared environmental) reached
high levels of stability by early adulthood. Therefore, the period
of time before reaching adulthood seems of particular importance in the present context.
1.1. Age-related changes in the mean structure of gf and gc
Studies on mean changes in gf and gc were already
conducted by Cattell, Horn, and colleagues in the context of
the theory of fluid and crystallized intelligence (Cattell,
1971; Horn & Cattell, 1967; Horn & Donaldson, 1980; Horn &
Hofer, 1992). Summarizing these results, Horn (2008)
emphasized the stability of gc with maintenance or improvements through much of adulthood whereas gf reaches
a peak in late adolescence or early adulthood followed by a
steady decline. These results were also supported by
findings from developmental psychology over the life span.
For instance, Baltes, Staudinger, and Lindenberger (1999)
described a two-component model of life span intellectual
development similar to the original distinction between gc
and gf. They concluded that the crystallized cognitive
pragmatics remain relatively stable until old age and start
to decline only in very old age. In contrast, the age trajectory
of the fluid cognitive mechanics is marked by an early rise
and decline. The characteristic mean changes with age have
been conceptualized as the result of a combination of
biological factors (e.g., maturation and aging of the brain)
and cultural factors (Baltes, 1997).
Besides the characteristic differences in the mean
trajectories between gf and gc, the changes in reasoning
ability or crystallized knowledge presumably also depend on
the specific measurement instrument. For instance, for gc
the assessment may focus on general knowledge typically
acquired in school or on more subject-specific knowledge
such as vocational knowledge. Because specialized knowledge is mainly acquired outside of school in early adulthood
(e.g., vocational training, Ackerman, 2008), the learning
curve substantially increases even beyond middle adulthood
(Ackerman, 2000; Ackerman & Rolfhus, 1999). Furthermore,
there is evidence that the age trajectory of gc strongly
depends on the content domain captured. Ackerman (2000)
reported for a sample of 228 adults aged 21 to 62 on the one

hand positive correlations between age and knowledge in

the social sciences, humanities, and civics. On the other
hand, he found significantly negative correlations between
age and knowledge in the physical sciences. Among the different
knowledge domains, the physical sciences had the lowest
correlations with a composite of traditional verbal
gc indicators, but were comparatively highly associated with
gf. Gf itself was substantially negatively correlated with age.
These results suggest that the higher the relation of a specific
measure to gf, for example, a science knowledge test, the
more pronounced the age-related decline in that measure. In
the case of gf measures, the prototypicality of the measure
could relate to the magnitude of the decline with age.
Conversely, one could assume that the higher the relation of a
specific measure to gc, the higher the positive age-related gains.
1.2. Age-related changes in the covariance structure of gf and gc
The research reviewed so far focused on the mean structure,
that is, on the development of the average ability level in the
population. However, possible changes in the covariance
structure of cognitive abilities are of particular importance for
several reasons: The first reason concerns the invariance of a
measurement instrument which holds if the indicators of a
measure are invariant with respect to an external variable
(e.g., gender or age; Grimm & Widaman, 2012). Measurement
invariance is often tested with multi-group confirmatory factor
analysis (MGCFA; Vandenberg & Lance, 2000). Given that
statements concerning the mean structure of latent variables
and, thus, the interpretation of mean differences on a construct
level are only feasible if strong measurement invariance holds,
the inspection of possible changes in the covariance structure
is an essential prerequisite for such statements. Second,
assumptions about differentiation–dedifferentiation processes
are highly relevant for the development of theories on
cognitive abilities. For example, in the domain of crystallized
abilities Carroll (1993, p. 145) advocated an age-related
differentiation of language skills, according to which language
skills are limited to understanding oral input and to rudimentary speaking skills in early childhood. But through informal
and formal education they are hypothesized to become
increasingly more complex and diverse. The question of
development is inherently linked to questions of the generalizability of findings to different phases of the life span.
Research on age-related differentiation and dedifferentiation
of cognitive abilities (e.g., Baltes, Cornelius, Spiro, Nesselroade,
& Willis, 1980; Baltes et al., 1999) dates back to the early
observation that in childhood intellectual abilities might be
correlated more highly than in adulthood (Garrett, 1946).
Accordingly, the factor structure of cognitive abilities might be
less differentiated in childhood than it is in adolescence. This
assumption is usually studied by investigating a) the number of
factors required to account for individual differences in cognitive
abilities or b) the factor intercorrelations of a fixed set of
presumed cognitive abilities. With maturation the factor structure is hypothesized to differentiate and to be relatively stable in
adulthood until old age when dedifferentiation takes place. The
lower complexity of the structure of cognitive abilities both at
the beginning and at the end of the life span may be explained by a
stronger influence of neurobiological constraints on intellectual


functioning in these stages of ontogenesis (i.e., brain maturation

and aging, including dementia with a high prevalence in very old
age; see Baltes et al., 1999).
The age differentiation–dedifferentiation hypothesis has
received mixed evidence. For example, Li et al. (2004)
administered a broad battery of 15 cognitive tests including
memory, reasoning, and verbal knowledge tasks, to 291
participants aged 6 to 89 years. They considered the number
of dominant principal components (i.e., with eigenvalues > 1)
as an estimate for structural complexity in different age groups
and reported fewer dimensions at both ends of the life span
compared to middle-aged individuals. For a young age group,
that is, children aged 3 to 7 years, Tideman and Gustafsson
(2004) found empirical support for the notion that cognitive
abilities differentiate with increasing age.
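The component-counting criterion used by Li et al. (treating the number of principal components with eigenvalues above 1 as a proxy for structural complexity) can be sketched with simulated data. The two factor structures below are purely illustrative assumptions, not parameters from any of the cited studies:

```python
import numpy as np

def n_dominant_components(scores: np.ndarray) -> int:
    """Count principal components of the test-score correlation
    matrix with eigenvalues above 1 (Kaiser criterion)."""
    corr = np.corrcoef(scores, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))

rng = np.random.default_rng(0)
n, k = 2000, 6  # participants, tests

# "Undifferentiated" group: one strong general factor, so all
# six tests correlate highly with each other.
g = rng.normal(size=(n, 1))
undifferentiated = 0.9 * g + 0.3 * rng.normal(size=(n, k))

# "Differentiated" group: two unrelated factors of three tests each.
f1, f2 = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
differentiated = np.hstack([0.8 * f1 + 0.6 * rng.normal(size=(n, 3)),
                            0.8 * f2 + 0.6 * rng.normal(size=(n, 3))])

print(n_dominant_components(undifferentiated))  # one dominant component
print(n_dominant_components(differentiated))    # two dominant components
```

Fewer dominant components in a group is then read as a less differentiated ability structure, which is exactly the comparison Li et al. made across age groups.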
However, there is also contradicting evidence. Gignac
(2014) tested the strength of the g factor expressed by omega
hierarchical across a wide age range (2.5 to 90 years) and
observed a small increase in its magnitude from 2.5 to
10.0 years. This could be interpreted as dedifferentiation in
early childhood, even though this finding is possibly a statistical
artifact due to differences in test compilation across age groups
(see Gignac, 2014). For this early age, also a reanalysis of the
German standardization sample of the SON-R 2½-7
(Tellegen, Laros, & Petermann, 2007), a nonverbal intelligence
test battery for children between the ages of 2.5 and 7 years,
showed very little change in the relationship between a
reasoning and a performance factor in early childhood (Hülür,
Wilhelm, & Robitzsch, 2011). The age differentiation–dedifferentiation hypothesis was also rejected for other phases of life,
that is, for early childhood to adulthood (Bickley, Keith, &
Wolfle, 1995; Juan-Espinosa, García, Colom, & Abad, 2000) and
early maturity to senescence (Escorial, Juan-Espinosa, García,
Rebollo, & Colom, 2003). Considering the entire life span,
Tucker-Drob (2009) provided little evidence for age differentiation by means of nonlinear factor analyses of the standardization data of the WJ-III (Woodcock-Johnson III; Woodcock,
McGrew, & Mather, 2001) which included measures of seven
broad ability factors (among others gf and gc) and individuals
aged 4 to 101 years. In contradiction to the assumptions of the
age differentiation–dedifferentiation hypothesis, the analyses
even hinted at the reverse effect pattern, that is, dedifferentiation in school age.
In contrast to most studies in the research literature on
differentiation that investigated broad cognitive abilities,
Tucker-Drob and Salthouse (2008) also considered differentiation of performance in different subtests for gf, gc, and other
ability factors. In a sample of adults (aged 24 to 91 years) the
loadings of the indicators on the factors changed only slightly
with age, providing evidence for a high stability of the factor
structure on lower-order strata.


1.3. Methodological aspects of testing for differentiation

Testing for differentiation of cognitive abilities poses a
number of methodological and statistical challenges. This is not
limited to the investigation of age differentiation, but also holds
true to a greater extent for testing ability differentiation
(e.g., Molenaar, Dolan, & Verhelst, 2010; Molenaar, Dolan,
Wicherts, & van der Maas, 2010; Murray, Dixon, & Johnson,
2013). In developmental research, samples are often divided
into subsamples using arbitrary cutoff values to subsequently
analyze and compare these subsamples using MGCFA. While
this method may be appropriate or at least plausible for gender
or migration status, most context variables such as age are
continuous in nature. In these cases, defining artificial groups
leads to a loss of information and can produce spurious results.
More precisely, results may differ depending on how one
chooses the arbitrary cutoff values because near the thresholds,
observations are more similar across groups than within
groups. In the research literature on differentiation of cognitive
abilities, different approaches have been proposed to avoid
arbitrary groups. Specifically, Latent Moderated Structural
Equation Models (Klein & Moosbrugger, 2000) have been used
to introduce additional age terms to the standard model in
order to establish whether the linear relationship between two
variables is moderated by age (for an application see
Tucker-Drob, 2009).

In the present paper, we estimated Local Structural Equation
Models (LSEMs; Hildebrandt, Wilhelm, & Robitzsch, 2009),
which are traditional SEMs with weighted observations. The
weight of an observation is defined by its proximity to a specific
focal value of a continuous context variable such as age. With
increasing distance of the observation from the focal point,
the weight decreases according to a normal distribution. This
functional relationship is exemplarily depicted in Fig. 1 for a
focal age point of 15.8 years. The core idea is that observations
near the focal point provide more information for the
corresponding SEM than more distant observations (e.g., students
aged 13 years). If the age of the student is equal to the focal point,
the weight is set to 1; if the difference is 10 months, the weight is
about .50, and so on. For each focal value of the context variable, a
separate SEM is estimated, resulting in a series of models that
provide gradients of model parameters (e.g., means, correlations,
factor variances, and model fit indices). In the terminology of
LSEMs, the weights of a MGCFA are set to 1 in one age group and
to 0 for all other age groups. The more the weights of the LSEM
resemble such a dichotomous weighting, the closer the results of
both methods will match. Hildebrandt et al. (2009) pointed out
that both MGCFA and latent moderated SEMs can be understood
as approximations of LSEMs.

Fig. 1. The weight function for the focal age point 15.1 years. Note. Dashed lines
indicate the age points at which an observation will get a weight of 0.5.
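The weighting scheme described above (weight 1 at the focal age, about .50 at a distance of 10 months, decreasing like a normal curve) can be sketched as follows. The bandwidth here is back-solved from the ".50 at 10 months" example for illustration only; actual LSEM implementations choose the kernel bandwidth with their own rules (see Hildebrandt et al., 2009):

```python
import math

def lsem_weight(age: float, focal_age: float,
                half_weight_distance: float = 10 / 12) -> float:
    """Gaussian kernel weight: 1 at the focal age and 0.5 at a distance
    of `half_weight_distance` (in years), decaying like a normal curve."""
    # Solve exp(-d^2 / (2 * h^2)) = 0.5 at d = half_weight_distance for h.
    h = half_weight_distance / math.sqrt(2 * math.log(2))
    d = age - focal_age
    return math.exp(-d * d / (2 * h * h))

# A student exactly at the focal point gets full weight ...
print(round(lsem_weight(15.8, 15.8), 2))            # 1.0
# ... a student who is 10 months younger gets about half the weight.
print(round(lsem_weight(15.8 - 10 / 12, 15.8), 2))  # 0.5
```

Setting such weights to 1 within one age band and 0 outside it recovers the MGCFA case mentioned in the text, which is why MGCFA can be viewed as a special case of this weighting.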



Compared to MGCFA, LSEMs have some advantageous

features for analyzing differentiation. First, LSEMs make the
artificial categorization of a continuous context variable such as
age obsolete and avoid the associated loss of information. In
many instances, this method would describe the underlying
relationship more appropriately. Even in seemingly dichotomous
cases (e.g., migration status) the question arises whether the
relevant attribute is not in fact continuously scaled (e.g., mastery
of test language). Second, LSEMs impose lower requirements
regarding sample size because for each focal point all cases are
considered in the computation with varying weights. Thus, low
density or even small gaps in the sample distribution can be
smoothed with adjacent observations. The higher flexibility
regarding sampling and the effective usage of observations make
this method particularly attractive for developmental research
across the life span. These advantages also apply to latent
moderated SEM. Third and in contrast to nonlinear methods,
LSEMs produce nonparametric age curves. Thus, the fluctuation
of parameters (e.g., factor loadings) can be analyzed as a function
of a context variable. In the MGCFA approach parameters of two
or more groups are often considered without establishing their
functional relationship (e.g., negative linear vs. logarithmic
trend) or the onset of change.1 Gradients of model parameters
as derived from LSEMs can provide valid information for theory
development.
1.4. Research questions
To summarize, the evidence for age differentiation of
cognitive abilities is still inconclusive and often relies on
statistical methods that do not adequately deal with the
continuous context variable. But most importantly, research
efforts in this field have concentrated on broad ability constructs
without considering structural changes on lower strata of the
ability hierarchy. That is, the internal structure of broad factors
such as gf and gc is considered stable over age. As a consequence
of examining differentiation with second-order factor models,
our knowledge about changes in the factor structure of cognitive
abilities across the life span is still limited (Baltes et al., 1999) and
often restricted to broad abilities (i.e., stratum II factors). In the
present study, we provide an in-depth analysis of age-related
changes in the mean structure and the factor structure of fluid
and crystallized intelligence in late childhood and adolescence.
To study the internal structure of gf we assessed fluid
intelligence with reasoning scales covering verbal, numeric,
and figural content materials, because the distinction between
these content domains has been shown to be the most
important distinction under a common gf factor (Wilhelm,
2004). In line with the description of gc as acculturation
knowledge in the extended gf–gc model (Horn & Noll, 1997,
p. 69), we assessed gc with declarative knowledge items. The
knowledge tests covered factual knowledge in a broad variety
of 16 content domains from the natural sciences, the humanities, and social studies.
In our examination, we first describe the average mean
changes over time for the gc factors (sciences knowledge,
humanities knowledge, and social studies knowledge) and for
1 Please note, however, that by imposing linear and quadratic age constraints
across groups, it is also possible to examine age differences in the factor
structure by means of MGCFA (see Tucker-Drob & Salthouse, 2008).

the reasoning factors (verbal, numeric, and figural reasoning).

With regard to the age trajectories for gf and gc outlined above,
we expected the slope of the age gradients to be steeper for gc
than for gf. Furthermore, we wanted to shed light on the question
of whether the knowledge gain is stronger for specific domains in
childhood and adolescence. In accordance with the findings of
Ackerman (2000) for adults, one could assume that the higher
the relation of a specific measure to gc, the higher the positive
age-related gains. Measures depending more strongly on gf
such as items tapping the natural sciences may be subject to the
early decline in gf abilities which might result in a less
pronounced increase. On the other hand, such knowledge
items were presumably more strongly anchored in the school
curriculum which might cause a steeper knowledge gain in the
sciences. With respect to the internal structure of gf, we would
predict the mean change to be especially small for the figural
aspect since it is often conceptualized as prototypical for
reasoning (Wilhelm, 2004). On the other hand, the mean
change is presumably stronger for the verbal domain due to the
higher relations to gc.
Second, we investigate age-related changes in the
internal structure of gf and gc as well as the relationship
between both ability factors. The assumption is that overall
gc is more prone to age-related differentiation than gf, since
gc is mainly the result of educational training and the
opportunities for learning become more diverse in the
course of educational training with the stratification of the
school system. Secondary education, usually starting with
Grade 5, is a particularly important stage of formal training
in Germany because it marks the beginning of a tracked
school system. Accordingly, students in different tracks are
exposed to different learning environments (Baumert,
Stanat, & Watermann, 2006; Becker, Lüdtke, Trautwein,
Köller, & Baumert, 2012). From Grade 9 on, most students
also get the opportunity to specialize through course
selection which should expand the knowledge in one area
at the expense of another (e.g., by dedicating more study
time to the sciences than to the humanities). Furthermore,
differences in the exposure to out-of-school knowledge due
to different home environments and increasingly different
leisure time preferences (e.g., reading at home; Rolfhus &
Ackerman, 1999) may contribute to the differentiation of gc.
There is some evidence for complex knowledge structures in
young and middle-aged adults (Ackerman, 2000; Rolfhus &
Ackerman, 1999). Whether a differentiation of knowledge
structures is already present in secondary education has not
yet been studied systematically. Gf has also been shown
to be sensitive to differences in the individual learning
environment (e.g., Becker et al., 2012; Ceci, 1991), even
though to a lesser degree than gc. In order to substantiate a
potential influence of school tracking (i.e., different learning environments) on differentiation, results will also be
reported for academic vs. nonacademic-track school types.
2. Method
2.1. Design and participants
Analyses were based on the German standardization
sample of the Berlin Test of Fluid and Crystallized Intelligence
(BEFKI; see e.g., Wilhelm, Schroeders, & Schipolowski, 2014). A


multiple-matrix booklet design (Gonzalez & Rutkowski, 2010)

was used to keep the individual workload for the students
within manageable limits. That is, not all students were
administered all items for all constructs. In total, 72 different
booklets were used. Half of the booklets contained items on
both gf and gc, the other half either gf (24) or gc (12). After
completing the test items all students were given a
sociodemographic questionnaire where they indicated their
age, among other demographics. Depending on the grade level,
the complete test session including covariates and the
questionnaire took between 90 and 180 min.
After excluding students who did not indicate their age
(n = 212), the total sample for the following analyses
consisted of N = 11,756 students (50.6% female; nmiss. = 7)
attending Grades 5 to 12. Students from all German federal
states (except for the smallest one) and all school types of the
German general educational system were included in the
sample. Overall, almost one third of the sample (31.5%)
attended academic-track Gymnasium schools, 16.4% were
enrolled in intermediate-track Realschule schools, and 20.2%
attended vocational-track Hauptschule schools. The rest of
the sample (31.9%) was enrolled in mixed-track schools
(e.g., Gesamtschule) or advanced vocational schools. In the
following, we refer to Gymnasium as academic-track schools
and all remaining school types as nonacademic-track schools.
Mean age was 14.81 years (SD = 2.29) and 95% of the sample
was between 10.6 and 18.7 years old (range 9.123.6). Testing
took place between April 2008 and June 2010 at different time
points within the school year.
Despite the overall good coverage of the different school
tracks within the sample, there were some restrictions
with respect to the sample composition in Grade 8. More
specifically, the BEFKI standardization sample was embedded in an educational assessment study that intentionally
targeted students who pursued the least demanding regular
school certificate. As a consequence, no students from the


intermediate-track Realschule or academic-track Gymnasium were included in the sampling frame for Grade 8,
resulting in a considerable gap in the age distribution for
academic-track students between 13.5 and 14.5 years (see
Fig. 2, for the number of students in the academic group, left
side, and nonacademic group). The few remaining observations in the gap represented students from adjacent grades
(i.e., mostly students that either repeated or skipped one or
more grades). As pointed out above, through the weighting
of observations in LSEMs, it was possible, within certain
limits, to compensate for gaps within the distribution of the
context variable. Furthermore, the overrepresentation of
students attending Hauptschule and comparable tracks
leading to the lowest school certificate also resulted in a
distinct ability-restricted subsample for this particular
grade. These restrictions in the sampling procedure have
to be kept in mind when interpreting the following results.
2.2. Measures
To allow for efficient measurement, three age-adapted test
forms were used to assess fluid and crystallized intelligence.
Specifically, students in Grades 5 to 7 received the easiest test
form, whereas the most difficult test was administered in
Grades 11 and 12; scales of intermediate difficulty were used in
Grades 8 to 10. The test forms were linked with anchor items,
allowing students of all grades to be placed on a common metric.
The fluid intelligence part of the BEFKI covered the three
content facets verbal, numeric, and figural. This content distinction
has been shown to be the most prominent one under a
common gf factor (Wilhelm, 2004), which in turn is very closely
related or even identical to g (Kyllonen & Christal, 1990). The
verbal aspect of fluid intelligence (gfv) was measured with tasks
for relational reasoning; participants had to derive logically valid
conclusions based on a set of given premises. The numeric part
(gfn) consisted of mathematical text problems; solving these

Fig. 2. Number of observations for students attending academic-track schools and nonacademic-track schools at all focal age points. Note. The age distributions depicted
here are restricted to the range of focal age points between 11.6 and 18.5, even though adjacent observations outside this range were included in the weighting in
subsequent LSEM analyses. Left panel: nacad. = 3705 comprised of all students enrolled in academic-track who worked on at least one of the intelligence tests. The gap in
the distribution can be ascribed mainly to the missingness in Grade 8. Right panel: nnonacad. = 8051 comprised of all students enrolled in nonacademic-track schools
who worked on at least one of the intelligence tests.



items primarily required mathematical modeling, but only basic

mathematical knowledge. The figural reasoning scale (gff) was
composed of a sequence of geometric drawings that changed
their shading and form according to certain rules. Participants
had to infer these rules in order to determine the missing
drawings. The individual student worked on 16 to 32 multiple
choice items per content facet. Considering the different test
forms, a total of 192 multiple choice items were used in the study
(64 per facet). There were 24 anchor items linking Grades 5–7
with Grades 8–10 and 24 items linking Grades 8–10 with Grades
11–12. A total of 12 of these anchor items were identical in all
three test forms. An additional set of 12 items formed a link
between Grades 5–7 and Grades 11–12.
Crystallized intelligence was assessed with declarative
knowledge items covering 16 content domains from three
broad areas: natural sciences (physics, chemistry, biology,
medicine, geography, technology), the humanities (art, literature, music, religion, philosophy), and social studies (history,
law, politics, economy, finance). Item development and the
choice of knowledge domains aimed at covering the "breadth
and depth of the knowledge of the dominant culture" (Horn &
Noll, 1997, p. 69), taking into account knowledge that is
commonly rewarded in society, considered important, and a
"cultural good", in contrast to trivial, short-lived knowledge
(such as soccer results or the bus schedule). Both curriculum-related
and out-of-school knowledge was included. The
individual student worked on 64 to 128 multiple choice items
per knowledge area. Considering the different test forms, 240
knowledge items were used (90 for knowledge in the natural
sciences, and 75 each for the humanities and social studies).
Adjacent test forms were linked with 32 anchor
items, that is, there were 32 items linking Grades 5–7 with
Grades 8–10 and 32 items linking Grades 8–10 with Grades
11–12. A total of 16 of these anchor items were part of all three test
forms. An additional set of 16 items were presented both in
Grades 5–7 and in Grades 11–12.

et al., 2009; Hülür et al., 2011). In this permutation test, LSEMs

were run with 1000 datasets for statistical inference. In each
data set age was randomly assigned to individuals, although
the distribution of age originated from the observed sample.
This approach ensured that the ability data were completely
independent of the context variable in the permuted data sets,
allowing us to test whether changes in the gradients of the
real data set are connected to age. Through the random
assignment, the results of the permutation test were adjusted
for a general effect of age on ability. Therefore, the shape of the
parameter gradients, rather than their absolute values, has to be
compared between the observed and the permuted data sets.
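The logic of this permutation test (reshuffle age across individuals, re-estimate the gradients, and compare their shapes) can be illustrated with a toy example in which the local "model parameter" is simply a weighted correlation rather than a full SEM. The data, the bandwidth, and the differentiation pattern are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3000
age = rng.uniform(10, 19, size=n)

# Simulated scores whose correlation declines with age (differentiation):
# unique variance grows with age, so the shared part matters less.
shared = rng.normal(size=n)
unique_sd = (age - 10) / 9
x = shared + unique_sd * rng.normal(size=n)
y = shared + unique_sd * rng.normal(size=n)

def weighted_corr(a, b, w):
    """Correlation of a and b under observation weights w."""
    ma, mb = np.average(a, weights=w), np.average(b, weights=w)
    cov = np.average((a - ma) * (b - mb), weights=w)
    va = np.average((a - ma) ** 2, weights=w)
    vb = np.average((b - mb) ** 2, weights=w)
    return cov / np.sqrt(va * vb)

def gradient(age_values, focal_points, h=0.7):
    """Locally weighted correlations, one per focal age point."""
    return [weighted_corr(x, y, np.exp(-(age_values - f) ** 2 / (2 * h ** 2)))
            for f in focal_points]

focal = np.arange(11.0, 19.0)
observed = gradient(age, focal)
# Permutation: with ages reshuffled, scores are independent of age,
# so the permuted gradient should be flat up to sampling error.
permuted = gradient(rng.permutation(age), focal)

drop_observed = observed[0] - observed[-1]
drop_permuted = abs(permuted[0] - permuted[-1])
print(drop_observed > 0.2)            # clear age-related decline
print(drop_permuted < drop_observed)  # permuted gradient is flatter
```

A full test would repeat the permutation many times (the authors used 1000 data sets) and compare the observed gradient against the resulting distribution of flat gradients.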
In order to estimate the ability of all participants on a single
scale regardless of the test form, we used Weighted Likelihood
Estimates (WLEs; Warm, 1989) derived from estimations with
the one-parameter logistic model. WLEs were standardized
(M = 0; SD = 1) with all students in Grade 9 as reference
group. IRT scaling is the recommended procedure to compensate for slight deficits in test compilation with regard to the
distribution of item difficulties (Tucker-Drob, 2009). Scaling
was conducted separately for each gf facet and for each of four
item parcels per facet. For gc, the scaling was conducted
separately for each of the 16 content domains. Accordingly,
12 gf and 16 gc indicators were used in the LSEMs. From the 420
items aggregated in the 28 parcels only 2 items had an
information-weighted item fit value (infit) above 1.1 and
none had an infit above 1.2 which is often used as a cutoff-value
for good item fit (Bond & Fox, 2001).
In the total sample, skewness was not an issue for the
context variable age (see statistical objection of Murray et al.,
2013). R statistics (R Development Core Team, 2014) was used
for data preparation, weighting, and plotting. WLEs were
estimated in ConQuest 2.0 (Wu, Adams, Wilson, & Haldane,
2007). LSEMs were estimated with Mplus 7.11 (Muthn &
Muthn, 19982014) taking into account the nested structure
of the data, that is, students within classes, with the CLUSTER

2.3. Statistical analyses

3. Results
In this paper, we implemented the recently introduced
method of Local Structural Equation Models (LSEMs). As mentioned above, in LSEMs weights are used to treat context
variables as continuous variables instead of creating artificial
categories. The statistical procedures for defining the bandwidth
of the kernel function, calculating the weights for every focal point and rescaling of the weights have been described
elsewhere (Hildebrandt, Sommer, Herzmann, & Wilhelm, 2010;
Hildebrandt, Wilhelm, Herzmann, & Sommer, 2013; Hildebrandt
et al., 2009; Hlr et al., 2011). Note that weighted models could
not be calculated for focal points at the very edges of the
distribution of the context variable because estimation for each
focal point requires a sufficient number of cases below and above
that focal point (symmetrical weighting). Measurement error
was taken into account by estimating the 95% confidence interval
for all reported correlations.
One challenge of this new method is that the samples
used for the LSEMs are not independent (as in MGCFA), but
overlapping and therefore dependent. This prevents the use of
established indices to assess the significance of changes in
measurement parameters. In order to allow for traditional
inference tests we applied a permutation test (Hildebrandt

3.1. Changes in mean structure across age

The subject of the first research question was the
average gain in gf and gc in late childhood and adolescence.
Graphical representations of age-related changes in the
mean structure of the gf and gc facets in the total sample
are given in Fig. 3. The data points represent the means of
latent variables that were estimated with locally-weighted
three-dimensional measurement models in separate runs
for gf and gc. In order to freely estimate the means of the
latent variables, the effects-coding method was implemented. This method of identifying and scaling latent
variables fixes the mean of the loadings to 1 and the sum of
the intercepts to 0 for each factor in each measurement
model (Little, Slegers, & Card, 2006). As a consequence,
factor loadings, variances, and latent means can be
estimated and tested simultaneously. In contrast to other
scaling methods, effects-coding has the advantage that the
estimated latent variances and means "reflect the observed metric of the indicators, optimally weighted by the degree to which each indicator represents the underlying latent construct" (Little et al., 2006, p. 63). As Tucker-Drob and Salthouse (2008, p. 455) pointed out, the variances can be directly interpreted as "the average amount of variance in each indicator accounted for by the factor". In the context of differentiation–dedifferentiation testing, decreasing factor loadings and factor variances over age therefore indicate differentiation at the indicator level. In Fig. 3 the gradients around the means mark the 95% confidence interval (mean ± 1.96 SE) for the means of the latent variables.

Fig. 3. Change in the means of fluid and crystallized intelligence facets across age in the overall sample. Note. ngf = 10,652; ngc = 8673. Parameters are standardized means of the latent variables in the locally-weighted SEMs. Shaded areas indicate the 95% confidence interval. gfv = verbal fluid intelligence, gfn = numeric fluid intelligence, gff = figural fluid intelligence, sci = science, crystallized intelligence, hum = humanities, crystallized intelligence, soc = social studies, crystallized intelligence.
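The effects-coding identification described above can be illustrated numerically: with loadings constrained to average 1 and intercepts constrained to sum to 0, the latent mean is recovered in the metric of the indicators. A minimal sketch with invented parameter values (not the study's estimates):

```python
import numpy as np

# Invented values for one factor with three indicators.
loadings = np.array([0.8, 1.1, 1.1])      # effects coding: mean of loadings = 1
intercepts = np.array([0.3, -0.1, -0.2])  # effects coding: sum of intercepts = 0
assert np.isclose(loadings.mean(), 1.0) and np.isclose(intercepts.sum(), 0.0)

latent_mean = 0.5  # hypothetical latent mean (kappa)
# Model-implied indicator means: mu_i = intercept_i + loading_i * kappa
indicator_means = intercepts + loadings * latent_mean

# Recover kappa by least squares from the indicator means; under the
# constraints, the estimate stays in the observed metric of the indicators.
kappa_hat = np.linalg.lstsq(loadings.reshape(-1, 1),
                            indicator_means - intercepts, rcond=None)[0][0]
print(round(float(kappa_hat), 3))  # recovers 0.5
```

The constraints do not restrict the model; they only fix the scale of the latent variable, which is why means, loadings, and variances can all be estimated simultaneously.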
For all variables there was an approximately linear increase, and the difference between the youngest participants
(aged 11.7) and the oldest participants (aged 18.8) was
substantial for both gf and gc. Table 1 gives the mean change
per year for all subdimensions. The average age-related change
per annum was higher for gc (d = 0.33) than for gf (d = 0.22).
The slight decrease in the means around age 14 which is more
prominent for the gf curves was due to the ability restriction in
the subsample for Grade 8 (see Design and participants).
Knowledge gains over time were slightly higher for the natural
sciences and social studies (average d = 0.36) than for the
humanities (d = 0.27, see Table 1). Considering the shape of
the trajectories for the different subdimensions, there was no
indication of differential effects across secondary education.
Fig. 4 displays the means for all latent variables separated by
school track. Age-related changes in the nonacademic track
were essentially linear for the gf and gc subdimensions. The
trajectories of the mean changes in the academic track were
somewhat s-shaped. Beyond age 16, there was no substantial gain in the reasoning or knowledge scales (Table 1). As in the total sample, the slopes of the gradients were higher for gc. Interestingly, for gc (with the exception of the humanities) the mean differences between academic-track and nonacademic-track students increased with age, whereas for gf the difference between the school types decreased. Similar to the total sample, the age trajectories were very similar for all
subdimensions of gf and gc.
3.2. Age-related differentiation
To check for structural differentiation in gf and gc, we
estimated 1) a three-dimensional model for gf distinguishing
between the verbal, numeric, and figural facets and 2) a model
for gc with three correlated factors capturing knowledge in the
natural sciences, the humanities, and social studies. According
to widely used cut-off values (Hu & Bentler, 1999), the fit of the
three-dimensional models in the total sample was good (for gf:
CFI ≥ .992, RMSEA ≤ .009; for gc: CFI ≥ .969, RMSEA ≤ .016).
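For reference, the two fit indices used here are simple functions of the model and baseline-model chi-square statistics; a quick sketch with invented chi-square values (not the study's output):

```python
import math

def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative Fit Index: 1 minus the ratio of noncentrality estimates."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, 0.0)
    return 1.0 - d_model / d_baseline

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation for sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Invented values for a well-fitting model.
print(round(cfi(60.0, 50.0, 1300.0, 66.0), 3))  # 0.992
print(round(rmsea(60.0, 50.0, 2000), 3))        # 0.01
```

Both indices penalize the model chi-square only to the extent that it exceeds its degrees of freedom, which is one reason they are less sensitive to sample size than a raw chi-square test.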
Fig. 5 displays the correlations between latent variables
representing the gf and gc facets with their corresponding 95%
confidence intervals in the overall sample. The correlations
between the broad knowledge domains were exceedingly high
(> .90). Despite the high correlations, the three-dimensional model described the data slightly better than the one-dimensional model across the entire age range; the difference in model fit ranged between .002 (CFI) and .005 (RMSEA) in favor of the three-dimensional model.2 As has been pointed out
previously (Molenaar, Dolan, & Verhelst, 2010; Molenaar,
Dolan, Wicherts, & van der Maas, 2010), the factor correlations
of gf and gc represent only one potential source of differentiation. For the three-dimensional models considered in our
analyses, other possible sources of differentiation were a) the
variances of the latent variables, b) the factor loadings of the
indicators, and c) the residual variances of the indicators. The gradients of factor variances are suited to assess differentiation on the construct level, whereas the other parameters would reflect changes in the indicators (i.e., issues with the measurement instrument). An age-related decrease in factor variances would imply differentiation in the ability constructs (see also Tucker-Drob & Salthouse, 2008). The changes in the gradients of factor variances were small (see Fig. 6). Similar to the factor correlation gradients, there was a small decline in gf and gc variances above age 17, which indicated a small differentiation effect. For all other potential sources of differentiation, the age-related parameter changes were small and unsystematic over time. Changes of all measurement parameters are given in Tables 1 and 2 of the online supplement.

2 Because in LSEM the effective n varies between the focal points of the context variable, model comparisons between the one- and three-dimensional models also depend on the sensitivity of the fit measure to sample size. We chose the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA), which are less dependent on sample size than other fit indices.

Table 1
Change in gf and gc facets per year expressed as effect sizes.
[The cell values of Table 1 are not recoverable from this copy; the table reports the yearly effect sizes and the average change per annum, Avg. (d), for the gf and gc facets in the total sample, the academic subsample, and the nonacademic subsample.]
Note. Effect sizes are given as standardized mean differences between consecutive years (e.g., d(12/13) indicates the mean difference between students aged 13 and students aged 12); higher values indicate better mean performance of older students. In the last column the average change per annum is given.
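The per-year effect sizes in Table 1 are standardized mean differences between consecutive age groups. A minimal sketch with made-up group statistics (the means, SDs, and group sizes below are invented, not the study's values):

```python
import math

def cohens_d(mean_old, sd_old, n_old, mean_young, sd_young, n_young):
    """Standardized mean difference using a pooled standard deviation."""
    pooled_var = ((n_old - 1) * sd_old ** 2 + (n_young - 1) * sd_young ** 2) \
                 / (n_old + n_young - 2)
    return (mean_old - mean_young) / math.sqrt(pooled_var)

# e.g., d(12/13): hypothetical WLE means/SDs for 13- vs. 12-year-olds.
d_12_13 = cohens_d(mean_old=-0.40, sd_old=1.0, n_old=800,
                   mean_young=-0.70, sd_young=1.0, n_young=750)
print(round(d_12_13, 2))  # 0.3: older students perform better on average
```

A positive value indicates better mean performance of the older group, matching the convention stated in the table note.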
The gradients of the permutation test ran parallel to the
x-axis (see Fig. 5). Because the results of the permutation test
were trimmed for the main effect of age, not the differences in
the height of the parameters but in their progression are decisive for examining differentiation.

Fig. 4. Age-related change in the means of gf and gc across age split by school track. Note. The sample was split into academic-track schools (data points with lighter shades of gray; ngf, acad. = 3590; ngc, acad. = 2750) versus all remaining school types (ngf, nonacad. = 7062; ngc, nonacad. = 5923). gfv = verbal fluid intelligence, gfn = numeric fluid intelligence, gff = figural fluid intelligence, sci = science, crystallized intelligence, hum = humanities, crystallized intelligence, soc = social studies, crystallized intelligence.

Fig. 5. Age differentiation in terms of factor correlation gradients of gf and gc subdimensions. Note. ngf = 10,652; ngc = 8673. gfv = verbal fluid intelligence, gfn = numeric fluid intelligence, gff = figural fluid intelligence, sci = science, crystallized intelligence, hum = humanities, crystallized intelligence, soc = social studies, crystallized intelligence.

There was no indication of structural
differentiation for any of the gc dimensions; all gradients in the
real data set corresponded to their permuted counterparts.
For gf, however, there was a small decrease. The first small dent for the gf factor correlations (especially those that included the figural aspect) at about age 14 might be attributed to the ability restriction of the sample in Grade 8 (see Design and participants). This explanation was reaffirmed by splitting the sample into the academic-track and nonacademic-track school types (see Fig. 7). When considering the academic track (for which there was no ability restriction or overrepresentation of students with low educational aspirations), the gradients were essentially linearly declining.

Fig. 6. Age differentiation in terms of factor variance gradients of gf and gc subdimensions. Note. ngf = 10,652; ngc = 8673. gfv = verbal fluid intelligence, gfn = numeric fluid intelligence, gff = figural fluid intelligence, sci = science, crystallized intelligence, hum = humanities, crystallized intelligence, soc = social studies, crystallized intelligence.

The slight and constant decrease that was observable in the academic track subsample, but not in the nonacademic track subsample, could be taken as
evidence of an interaction between ability and age. In the total
sample, these relationships were less clearly visible due to
different selection effects. Taking into consideration the
confidence intervals of the factor correlations, however, the
observed differentiation effects were small and of no practical
consequences. It is interesting to note that small differentiation
effects seemed to occur when changes in the mean structure
are comparatively weak, that is, in the academic sample and for gf. From a measurement perspective, one might have expected that large changes in the means provide optimal conditions for the expression of differentiation.
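The logic of the permutation test used for these comparisons can be sketched as follows: age is shuffled across persons (keeping its distribution intact), so any age gradient in a locally weighted parameter must be flat up to sampling error. The sketch below uses a kernel-weighted correlation as the parameter and simulated data; it is a simplified illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(11, 19, n)
# Two ability scores whose correlation does not depend on age here.
x = rng.standard_normal(n)
y = 0.7 * x + 0.5 * rng.standard_normal(n)

def local_corr(age_vec, x_vec, y_vec, focal_age, bw=1.0):
    """Correlation of x and y, Gaussian-kernel weighted around a focal age."""
    w = np.exp(-0.5 * ((age_vec - focal_age) / bw) ** 2)
    cov = np.cov(np.vstack([x_vec, y_vec]), aweights=w)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

focal_points = np.arange(12, 19)
observed = [local_corr(age, x, y, a) for a in focal_points]

# Permutation: reassign ages at random while keeping their distribution.
perm_gradients = []
for _ in range(200):
    perm_age = rng.permutation(age)
    perm_gradients.append([local_corr(perm_age, x, y, a) for a in focal_points])
perm_mean = np.mean(perm_gradients, axis=0)

# The permuted gradients are flat; as in the text, the *shape* of the observed
# gradient is compared against them, not absolute parameter values.
print(np.round(perm_mean.max() - perm_mean.min(), 3))
```

Because the permuted data keep the overall effect of age on the marginal distributions, deviations of the observed gradient's shape from the flat permuted reference indicate an age-related change in the parameter.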
One anonymous reviewer suggested that the use of ability-tailored test forms for children of different age groups was problematic: if parameters showed age-related changes over time, it would remain unclear whether this was due to the children in the different grades completing different items or due to differentiation. In order to examine the influence of linking on the results, we replicated the LSEMs
with a reduced item pool of 12 gf and 16 gc items that were administered to all subjects (i.e., anchor items present in all test forms).

Fig. 7. Age differentiation as indicated by the gradients of factor correlations split by school track. Note. ngf, acad. = 3590; ngc, acad. = 2750; ngf, nonacad. = 7062; ngc, nonacad. = 5923. The results for gf and gc are reported separately for academic-track schools (left panel) and all remaining schools. The gradients of data points in lighter gray represent correlations between the gc-facets. gfv = verbal fluid intelligence, gfn = numeric fluid intelligence, gff = figural fluid intelligence, sci = science, crystallized intelligence, hum = humanities, crystallized intelligence, soc = social studies, crystallized intelligence.

Even though the gradients were expectedly less
precisely estimated for these anchor items, the gradients for
both the gf facets and the gc facets were similar to the original
analyses, especially when taking into account the underlying
number of items (gf: 12 vs. 180; gc: 16 vs. 240) and the different
estimators (ML for the continuously scaled WLEs vs. WLSMV
for the dichotomously scored items). A comparison of the
gradients obtained from the different scoring methods can be
found in the online supplement (Fig. 1). A further argument for
using age-adjusted test forms with subsequent linking to locate
the participants' ability on a common scale is that this approach offers considerable advantages over alternatives (e.g., the sum score of a fixed set of items). First, it is challenging to construct
items that are suited for both 12- and 18-year-old students with
respect to item difficulty. Second, substantially more items and
testing time would be required for each individual. Third, in
contrast to a sum score, IRT scaling can compensate for specificities
in test compilation. Tucker-Drob (2009) showed in a simulation study that sum scores are particularly problematic when
there is a mismatch between item difficulties and the
distribution of person abilities and that IRT scaling can reduce
such biases.
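Tucker-Drob's (2009) point can be illustrated with the Rasch model: the expected sum score is a nonlinear function of ability, so equal gains on the latent scale translate into unequal sum-score gains when item difficulties are off-target. A sketch with invented item difficulties:

```python
import numpy as np

def expected_sum_score(theta, difficulties):
    """Expected sum score under the Rasch model: sum of P(correct) over items."""
    return float(np.sum(1.0 / (1.0 + np.exp(-(theta - difficulties)))))

# Items that are too easy for able students (difficulties centered around -1).
items = np.linspace(-2.5, 0.5, 20)

# Equal gains on the latent scale (0.5 logits each) ...
gain_low = expected_sum_score(0.0, items) - expected_sum_score(-0.5, items)
gain_high = expected_sum_score(2.0, items) - expected_sum_score(1.5, items)

# ... yield a smaller sum-score gain near the ceiling of the test.
print(round(gain_low, 2), round(gain_high, 2))
assert gain_high < gain_low
```

IRT ability estimates such as WLEs are reported on the latent (logit) scale and are therefore not compressed in this way, which is the rationale for the scaling approach used here.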
We also specified a two-dimensional higher-order model
with gf and gc as second-order factors and the six facets
(i.e., verbal, numeric, and figural reasoning, and knowledge in
the natural sciences, the humanities, and social studies) as first-order factors to examine the relationship between gf and gc in more detail. The correlation between gf and gc was stable over age, both for the total sample (.74 ≤ r ≤ .84) and the subsamples (academic track: .61 ≤ r ≤ .75; nonacademic track: .73 ≤ r ≤ .80; see Fig. 8). To summarize, with the
exception of the above mentioned local decrease for gff, we
found no indication of age-related differentiation, neither
between gf and gc nor within these constructs. This observation also holds for the subgroup-specific analyses.

Fig. 8. Correlation of gf and gc in a two-dimensional higher-order model over time. Note. ntotal = 11,756; nacad. = 3705; nnonacad. = 8051.
4. Discussion
In the present study we investigated developmental
changes in the mean and the covariance structure of fluid and
crystallized intelligence. In comparison to previous research
on age differentiation–dedifferentiation of cognitive abilities
(e.g., Li et al., 2004; Tucker-Drob, 2009), we did not limit our
investigation to the broad ability constructs, but particularly
considered changes in the internal structure of gf and gc.
4.1. Age-related changes in the mean structure of gf and gc facets
Considering the subdimensions of gf (i.e., verbal, numeric,
and figural reasoning) and gc (i.e., knowledge in the sciences,
humanities, and social studies), we found empirical support for
the hypothesis that crystallized abilities show stronger age-related gains. The steeper slope for gc-type abilities during
secondary education was expected since gc is especially sensitive
to learning and (formal) education, arguably more so than fluid
intelligence (Hunt, 2008), even though this is still subject to
debate (see Gustafsson, 2008, for diverging results). In the age
range considered in this study (from 11.5 to 18 years) we also
found an approximately linear positive trend for gf-type abilities
in the total sample.
No differential effects could be ascertained for the mean
gradients of the different subdimensions, neither for the gf facets
nor for the gc content domains. For example, knowledge gains in
the natural sciences were neither higher (as might be assumed
because science knowledge is more strongly connected to the
school curriculum) nor lower (as might have been the case due
to the higher overlap with gf) than gains for the other knowledge
domains. It is conceivable that the opposing effects for the
science domain neutralized one another. A slight deviation from
the approximately linear trend could be observed only for gff, which
is probably due to fidelity problems of the gff measure in lower
grades (i.e., a lack of discriminative power at the lower end of the
ability distribution). Furthermore, the small dent in the mean
gradients at age 14 can be attributed to the sampling effects in
Grade 8 (see Design and participants).
The school track-specific investigation revealed linear trends for both the gf and the gc subdimensions in
the nonacademic group whereas the mean trajectories of
students attending academic-track schools were s-shaped.
That is, for students in the most demanding school type
gains in gf and gc over time followed a continuous function
that is flattened at the beginning and particularly at the
end of secondary education when compared to the steeper
gradients in the middle age range. These different functional relationships across school track may be due to
selection effects: Whereas the academic group is homogeneous over the course of secondary education (i.e., most
students enrolled in Gymnasium in Grade 5 remain in this
school type), the composition of the nonacademic group is
changing. That is, especially after completing Grade 9 and
again after Grade 10 many students in nonacademic tracks
leave the general educational system and as a result the
remaining students are on average more capable and strive
for more advanced certificates.


In the same vein, the group of students in academic-track schools represents an ability-selected subsample. We found
diminishing gains for both gf and gc at the end of adolescence.
The mean gf trajectories were consistent with the literature on
the development of cognitive abilities assuming an early peak of
fluid abilities in early adulthood (e.g., Horn, 2008). Surprisingly, this curvilinear trend was also present for the gains in
knowledge. One explanation for this finding is that the schooling
system may be less effective in conveying additional knowledge
to competent students at the end of upper secondary education,
but further research is needed to investigate this possibility.
Another explanation that cannot be ruled out completely is that
the measures were not sensitive enough to adequately capture
the knowledge gains that occur at the end of secondary
education. However, the use of age-adapted test forms (see
Measures) and the absence of ceiling effects in this (or any other)
age group contradict this explanation.
With regard to mean differences between school tracks,
Becker et al. (2012) examined the influence of a tracked
school system on decontextualized reasoning in a longitudinal data set from Grades 7 to 10 and reported increasing
differences between the tracks. Constraining the findings
of our study to the same age range (1316 years), we could
replicate these results. However, taking into account
the extended age range investigated here, the difference
between academic vs. nonacademic-track schools was
smaller at the end of secondary education (see Fig. 4). This
may be largely due to the composition of the nonacademic
group described earlier, since only the more competent
students will continue their general school education
beyond Grades 9 and 10, respectively (e.g., instead of taking
up an apprenticeship), which reduces the performance gap
to the academic-track group.
4.2. Age-related changes in the covariance structure of gf and gc
Despite its importance for interpreting changes in the
mean structure and ensuring the validity of the measure, the
covariance structure is often neglected in research on
the development of cognitive abilities. For gf, changes of
the internal factorial structure in the course of secondary
education may seem unlikely since decontextualized reasoning is often characterized as being less sensitive to
education (Ackerman, 2000). However, empirical evidence
challenging the strong assumption that changes in reasoning
ability are independent of the context (i.e., schooling) is
accumulating (Becker et al., 2012; Cahan & Cohen, 1989). For
gc, on the other hand, the structural invariance of the
construct over age is questionable: As the dependency on
learning, education, and acculturation is at the core of the
definition of crystallized intelligence (Cattell, 1943, 1971), it
is plausible to assume that gc "changes its very nature with culture and age" (Cattell, 1971, p. 128). Beier and
Ackerman's (2001) examination of knowledge about events
from the 1930s to the 1990s is an example illustrating the
variability in declarative knowledge over time. Cattell
(1971) argued that a structural differentiation of gc takes
place "only after school" (p. 121; italics in original) due to
the lack of a common, standardized educational treatment.
However, individual differences in the learning environment

and increasing specialization are already present during

secondary education and may cause an early differentiation
of knowledge structures. Hitherto, this assumption has not been systematically tested by means of a broad knowledge assessment.
In our analyses of cross-sectional data with the recently
introduced method of LSEMs, we found little evidence for age
differentiation in the internal structure of reasoning from late
childhood to the onset of adulthood. Even though in comparison to the results of the permutation test there was a slight
decline in the gradients of factor correlations, the effect was
rather small, which is consistent with Cattell's (1971; see
also Ackerman, 1996) assumptions. We also observed a high
stability between the second-order factors, that is, the
correlation between gf and gc (see Fig. 8). The small deviations
from stability of the gradients over time were likely due to
sampling issues combined with effects of ability differentiation
(i.e., stronger decline in the academic-track sample). In
addition, as mentioned earlier the test assessing the figural
content facet of fluid intelligence seemed to be affected by
fidelity problems at the lower end of the ability distribution;
due to the sampling procedure these issues were especially
prominent in Grade 8.
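As a technical aside, the core of the locally weighted estimation used in these analyses is a kernel weight assigned to every person at each focal age. The Gaussian kernel follows the LSEM literature, but the bandwidth and the rescaling shown below are illustrative assumptions, not the values used in the study:

```python
import numpy as np

def lsem_weights(ages, focal_age, bandwidth=1.0):
    """Gaussian kernel weights that downweight persons far from the focal age.

    Rescaling so the weight at the focal age equals 1 is one simple choice;
    the published LSEM procedure uses a different rescaling of the weights.
    """
    w = np.exp(-0.5 * ((ages - focal_age) / bandwidth) ** 2)
    return w / w.max()

ages = np.array([12.0, 13.0, 14.0, 15.0, 16.0])
w = lsem_weights(ages, focal_age=14.0)
print(np.round(w, 3))  # symmetric weights peaking at age 14
```

A weighted SEM is then fitted at each focal age with these person weights, which is what produces the smooth parameter gradients shown in the figures.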
The lack of clear evidence for age-related differentiation
for both gf and gc was surprising given that the learning
environments apparently become increasingly different
for individuals in the course of secondary education. On the
other hand, our findings coincide with previous results
showing a relatively high stability of interindividual
differences in primarily knowledge-based achievement
tests even during primary education (Weinert & Helmke,
1998). The main reason for this could be that all participants shared a learning environment that, despite the mentioned differences, was comparatively homogeneous,
as they all attended regular school classes. Moreover,
secondary education in Germany provides only limited
opportunities for individualized courses or subject choices.
Thus, differences in domain-specific interest, motivation,
or intellectual investment traits such as typical intellectual
engagement that are all assumed to steer differences in
knowledge acquisition (von Stumm & Ackerman, 2012)
may not be decisive within the homogeneous scholastic
learning environment. More precisely, these traits may
drive knowledge acquisition in general, but not (yet)
specialization. More pervasive specialization begins only
at the end of compulsory education (e.g., with important
course choices in Grade 11 and above in academic-track
schools or by starting vocational training) and in adulthood
(e.g., by acquiring specialized occupational knowledge;
Ackerman, 1996, 2000).
Recently, Kan, Kievit, Dolan, and van der Maas (2011)
questioned the status of gc as a psychological capacity, that is,
as a causal variable. The authors argued that a causal
relationship requires that latent variables can be distinguished
from their indicators. Through the reanalysis of a data set
from the Human Cognitive Abilities Project (McGrew, 2009) they
demonstrated that gc can be conceptualized as a purely
statistical entity. However, their line of argumentation relies
on rather strong assumptions: First, the perspective that gf is
the decisive source of the correlations between crystallized
abilities is problematic since gc has been shown to also depend
on additional sources such as investment traits (von Stumm &
Ackerman, 2012) and interests (Su, Rounds, & Armstrong,
2009). Kan et al. (2011, p. 296) also acknowledged the fact that
investment theory is broader in scope than the investment
hypothesis; nevertheless, their conclusions were based on this
assumption. Second, we think that neither the presence nor
the absence of transfer of training across domains can be
interpreted as proof or counterproof of gc being a psychological
construct. In this study, we conceptualize gc as behavior
domain scores (Markus & Borsboom, 2013), which do not
necessarily imply a causal interpretation (McDonald, 2003;
Schipolowski, Wilhelm, & Schroeders, 2014).


4.3. Limitations and future research

A limitation of the present study is that our analyses were based on cross-sectional data, rendering it impossible to make definite statements about the developmental perspective. While we assume that cohort effects were a minor problem, replication of the results with longitudinal data is pending. Second, although the sample can be considered representative for students in German general-education schools and the overall sample size was very large, the sample characteristics in Grade 8 deviated strongly from the population. Third, it was not possible to consider all types and tracks of vocational schools during sampling, although more and more students are dropping out of general education after Grade 9.

Future research could extend the study in multiple ways: To draw a more complete picture, the extension to a larger age range in a longitudinal design would be desirable; more precisely, to begin assessment in preschool and to extend it into adulthood when, according to Cattell (1971, p. 121), the gc factor may extend into "Protean forms". There is empirical evidence that the predictive power of reasoning and knowledge is not constant over the life span, but that in the course of educational training domain-specific knowledge becomes more and more important for the prediction of scholastic (Baumert, Lüdtke, Trautwein, & Brunner, 2009), academic (Kuncel, Hezlett, & Ones, 2001), and vocational success (Ackerman, 2000). The question as to what extent changes in the factor structure are responsible for this finding is an objective for future research. In our study with high school students, the learning environments were comparatively homogeneous. The diversification of learning opportunities in adulthood is presumably accompanied by a substantial differentiation of knowledge structures (see Ackerman & Rolfhus, 1999; Rolfhus & Ackerman, 1999) that may lead to factors being more predictive for vocational success. To gain further insights into the underlying mechanisms of knowledge acquisition, it would be interesting to supplement the assessment with relevant covariates such as domain-specific interest or investment traits such as typical intellectual engagement.

From a psychological assessment perspective, an extension to a much larger age range would require a very large item pool that covers the huge differences in cognitive abilities from childhood to middle adulthood. This is a surmountable obstacle for the assessment of gf, but especially challenging for gc since it would be necessary to capture the breadth and depth of (specialized) knowledge in adulthood, including but not limited to occupational knowledge. As Cattell (1971, p. 121) put it, such an approach "would amount to producing as many different tests as there are occupations, etc." Naturally, even the most comprehensive knowledge assessment would fall short of this demand. We question if an assessment of gc in an extensive longitudinal design can be conducted with fixed item sets because the knowledge considered relevant for gc is not only culture-specific, but also changing with time; this problem is more relevant for some domains (e.g., technology) than for others. Nevertheless, we concur with Ackerman (2000) that in spite of these hurdles, the assessment of specialized knowledge is a prerequisite for understanding intellectual development and changes in the structure of intelligence across the life span.

Appendix A. Supplementary data

Supplementary data to this article can be found online at
Ackerman, P.L. (1996). A theory of adult intellectual development: Process,
personality, interests, and knowledge. Intelligence, 22, 227257. http://
Ackerman, P.L. (2000). Domain-specific knowledge as the dark matter of adult
intelligence. The Journals of Gerontology Series B: Psychological Sciences and
Social Sciences, 55, 6984. http://dx.doi.org/10.1093/geronb/55.2.P69.
Ackerman, P.L. (2008). Knowledge and cognitive aging. In F.I.M. Craik, & T.A.
Salthouse (Eds.), The handbook of aging and cognition (pp. 433489) (3rd
ed.). New York, NY: Psychology Press.
Ackerman, P.L., & Rolfhus, E.L. (1999). The locus of adult intelligence:
Knowledge, abilities, and nonability traits. Psychology and Aging, 14,
314330. http://dx.doi.org/10.1037/0882-7974.14.2.314.
Baltes, P.B. (1997). On the incomplete architecture of human ontogeny:
Selection, optimization, and compensation as foundation of developmental
theory. American Psychologist, 52, 366380. http://dx.doi.org/10.1037/
Baltes, P.B., Cornelius, S.W., Spiro, A., Nesselroade, J.R., & Willis, S.L. (1980).
Integration versus differentiation of fluid/crystallized intelligence in old
age. Developmental Psychology, 16, 625635.
Baltes, P.B., Staudinger, U.M., & Lindenberger, U. (1999). Lifespan psychology:
Theory and application to intellectual functioning. Annual Review of
Psychology, 50, 471507. http://dx.doi.org/10.1146/annurev.psych.50.1.
Baumert, J., Lüdtke, O., Trautwein, U., & Brunner, M. (2009). Large-scale student assessment studies measure the results of processes of knowledge acquisition: Evidence in support of the distinction between intelligence and student achievement. Educational Research Review, 4, 165–176.
Baumert, J., Stanat, P., & Watermann, R. (2006). Schulstruktur und die Entstehung differentieller Lern- und Entwicklungsmilieus [School structure and the emergence of differential learning environments]. In J. Baumert, P. Stanat, & R. Watermann (Eds.), Herkunftsbedingte Disparitäten im Bildungswesen: Differenzielle Bildungsprozesse und Probleme der Verteilungsgerechtigkeit. Wiesbaden: Springer-Verlag.
Becker, M., Lüdtke, O., Trautwein, U., Köller, O., & Baumert, J. (2012). The differential effects of school tracking on psychometric intelligence: Do academic-track schools make students smarter? Journal of Educational Psychology, 104, 682–699.
Beier, M.E., & Ackerman, P.L. (2001). Current-events knowledge in adults: An investigation of age, intelligence, and nonability determinants. Psychology and Aging, 16, 615–628. http://dx.doi.org/10.1037/0882-7974.16.4.615.
Bickley, P.G., Keith, T.Z., & Wolfle, L.M. (1995). The three-stratum theory of cognitive abilities: Test of the structure of intelligence across the life span. Intelligence, 20, 309–328. http://dx.doi.org/10.1016/0160-2896(95)90013-6.
Bond, T.G., & Fox, C.M. (2001). Applying the Rasch model: Fundamental
measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
Cahan, S., & Cohen, N. (1989). Age versus schooling effects on intelligence development. Child Development, 60, 1239–1249.
Carroll, J.B. (1993). Human cognitive abilities: A survey of factor-analytic studies.
New York: Cambridge University Press.


U. Schroeders et al. / Intelligence 48 (2015) 1529

Cattell, R.B. (1943). The measurement of adult intelligence. Psychological Bulletin, 40, 153–193. http://dx.doi.org/10.1037/h0059973.
Cattell, R.B. (1971). Abilities: Their structure, growth, and action. Boston:
Houghton Mifflin.
Ceci, S.J. (1991). How much does schooling influence general intelligence and its cognitive components? A reassessment of the evidence. Developmental Psychology, 27, 703–722. http://dx.doi.org/10.1037/0012-1649.27.5.703.
Cudeck, R., & MacCallum, R.C. (2007). Factor analysis at 100: Historical developments and future directions (1st ed.). Mahwah, NJ: Routledge.
Escorial, S., Juan-Espinosa, M., García, L.F., Rebollo, I., & Colom, R. (2003). Does g variance change in adulthood? Testing the age de-differentiation hypothesis across sex. Personality and Individual Differences, 34, 1525–1532.
Garrett, H.E. (1946). A developmental theory of intelligence. American Psychologist, 1, 372–378. http://dx.doi.org/10.1037/h0056380.
Gignac, G.E. (2014). Dynamic mutualism versus g factor theory: An empirical test. Intelligence, 42, 89–97.
Gonzalez, E., & Rutkowski, L. (2010). Principles of multiple matrix booklet designs and parameter recovery in large-scale assessments. In D. Hastedt, & D. von Davier (Eds.), IERI monograph series: Issues and methodologies in large-scale assessments, Vol. 3. (pp. 125–156). Hamburg: IEA-ETS Research Institute.
Grimm, K.J., & Widaman, K.F. (2012). Construct validity. In H. Cooper, P.M. Camic, D.L. Long, A.T. Panter, D. Rindskopf, & K.J. Sher (Eds.), APA handbook of research methods in psychology. Foundations, planning, measures, and psychometrics, vol. 1. (pp. 621–642). Washington: American Psychological Association.
Gustafsson, J.-E. (2008). Schooling and intelligence: Effects of track of study on level and profile of cognitive abilities. In P.C. Kyllonen, R.D. Roberts, & L. Stankov (Eds.), Extending intelligence: Enhancement and new constructs (pp. 37–59). New York: LEA.
Hildebrandt, A., Sommer, W., Herzmann, G., & Wilhelm, O. (2010). Structural invariance and age-related performance differences in face cognition. Psychology and Aging, 25, 794–810. http://dx.doi.org/10.1037/a0019774.
Hildebrandt, A., Wilhelm, O., Herzmann, G., & Sommer, W. (2013). Face and object cognition across adult age. Psychology and Aging, 28, 243–248.
Hildebrandt, A., Wilhelm, O., & Robitzsch, A. (2009). Complementary and competing factor analytic approaches for the investigation of measurement invariance. Review of Psychology, 16, 87–102.
Horn, J.L. (2008). Spearman, g, expertise, and the nature of human cognitive capability. In P.C. Kyllonen, R.D. Roberts, & L. Stankov (Eds.), Extending intelligence: Enhancement and new constructs (pp. 185–230). New York: LEA.
Horn, J.L., & Cattell, R.B. (1967). Age differences in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.
Horn, J.L., & Donaldson, G. (1980). Cognitive development in adulthood. In O.G. Brim, & J. Kagan (Eds.), Constancy and change in human development (pp. 445–529). Cambridge, MA: Harvard University Press.
Horn, J.L., & Hofer, S. (1992). Major abilities and development in the adult period. In R.J. Sternberg, & C.A. Berg (Eds.), Intellectual development (pp. 44–99). Cambridge; New York: Cambridge University Press.
Horn, J.L., & Noll, J. (1997). Human cognitive capabilities: Gf–Gc theory. In D.P. Flanagan, J.L. Genshaft, & P.L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests and issues (pp. 53–91). New York: Guilford Press.
Hu, L., & Bentler, P.M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6, 1–55.
Hülür, G., Wilhelm, O., & Robitzsch, A. (2011). Intelligence differentiation in early childhood. Journal of Individual Differences, 32, 170–179.
Hunt, E. (2008). Improving intelligence: What's the difference from education? In P.C. Kyllonen, R.D. Roberts, & L. Stankov (Eds.), Extending intelligence: Enhancement and new constructs (pp. 15–35). New York: LEA.
Juan-Espinosa, M., García, L.F., Colom, R., & Abad, F.J. (2000). Testing the age related differentiation hypothesis through the Wechsler's scales. Personality and Individual Differences, 29, 1069–1075.
Kan, K.-J., Kievit, R.A., Dolan, C., & van der Maas, H. (2011). On the interpretation of the CHC factor Gc. Intelligence, 39, 292–302.
Klein, A., & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65.
Kuncel, N.R., Hezlett, S.A., & Ones, D.S. (2001). A comprehensive meta-analysis of the predictive validity of the graduate record examinations: Implications for graduate student selection and performance. Psychological Bulletin, 127, 162–181. http://dx.doi.org/10.1037/0033-2909.127.1.162.

Kyllonen, P.C., & Christal, R.E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14, 389–433.
Li, S.-C., Lindenberger, U., Hommel, B., Aschersleben, G., Prinz, W., & Baltes, P.B. (2004). Transformations in the couplings among intellectual abilities and constituent cognitive processes across the life span. Psychological Science, 15, 155–163. http://dx.doi.org/10.1111/j.0956-7976.2004.01503003.x.
Little, T.D., Slegers, D.W., & Card, N.A. (2006). A non-arbitrary method of identifying and scaling latent variables in SEM and MACS models. Structural Equation Modeling, 13, 59–72.
Markus, K.A., & Borsboom, D. (2013). Reflective measurement models, behavior domains, and common causes. New Ideas in Psychology, 31, 54–64.
McDonald, R.P. (2003). Behavior domains in theory and in practice. Alberta Journal of Educational Research, 49, 212–230.
McGrew, K.S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37, 1–10.
Molenaar, D., Dolan, C.V., & Verhelst, N.D. (2010). Testing and modelling non-normality within the one-factor model. British Journal of Mathematical and Statistical Psychology, 63, 293–317.
Molenaar, D., Dolan, C.V., Wicherts, J.M., & van der Maas, H.L.J. (2010). Modeling differentiation of cognitive abilities within the higher-order factor model using moderated factor analysis. Intelligence, 38, 611–624.
Murray, A.L., Dixon, H., & Johnson, W. (2013). Spearman's law of diminishing returns: A statistical artifact? Intelligence, 41, 439–451.
Muthén, L.K., & Muthén, B.O. (1998–2014). Mplus user's guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
R Development Core Team (2014). R: A language and environment for
statistical computing (version 3.0.3) [Computer software]. Retrieved from.
Rolfhus, E.L., & Ackerman, P.L. (1999). Assessing individual differences in knowledge: Knowledge, intelligence, and related traits. Journal of Educational Psychology, 91, 511–526.
Schipolowski, S., Wilhelm, O., & Schroeders, U. (2014). On the nature of crystallized intelligence: The relationship between verbal ability and factual knowledge. Intelligence, 46, 156–168.
Su, R., Rounds, J., & Armstrong, P.I. (2009). Men and things, women and people: A meta-analysis of sex differences in interests. Psychological Bulletin, 135, 859–884. http://dx.doi.org/10.1037/a0017364.
Tellegen, P.J., Laros, J.A., & Petermann, F. (2007). Snijders–Oomen Nonverbaler Intelligenztest von 2 1/2 bis 7 Jahren (SON-R 2 1/2-7). Handanweisung und deutsche Normen [Snijders–Oomen nonverbal intelligence test for 2 1/2- to 7-year-olds. Manual and German standardization] (2nd ed.). Göttingen:
Tideman, E., & Gustafsson, J.-E. (2004). Age-related differentiation of cognitive abilities in ages 3–7. Personality and Individual Differences, 36, 1965–1974.
Tucker-Drob, E.M. (2009). Differentiation of cognitive abilities across the life span. Developmental Psychology, 45, 1097–1118.
Tucker-Drob, E.M., & Briley, D.A. (2014). Continuity of genetic and environmental influences on cognition across the life span: A meta-analysis of longitudinal twin and adoption studies. Psychological Bulletin, 140, 949–979.
Tucker-Drob, E.M., & Salthouse, T.A. (2008). Adult age trends in the relations among cognitive abilities. Psychology and Aging, 23, 453–460.
Vandenberg, R.J., & Lance, C.E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–70. http://dx.doi.org/10.1177/109442810031002.
von Stumm, S., & Ackerman, P.L. (2012). Investment and intellect: A review and meta-analysis. Psychological Bulletin, 139, 841–869.
Warm, T.A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427–450.
Weinert, F.E., & Helmke, A. (1998). The neglected role of individual differences in theoretical models of cognitive development. Learning and Instruction, 8, 309–323. http://dx.doi.org/10.1016/S0959-4752(97)00024-8.


Wilhelm, O. (2004). Measuring reasoning ability. In O. Wilhelm, & R.W. Engle (Eds.), Handbook of understanding and measuring intelligence (pp. 373–392). Thousand Oaks, CA: Sage Publications.
Wilhelm, O., Schroeders, U., & Schipolowski, S. (2014). Berliner Test zur Erfassung fluider und kristalliner Intelligenz für die 8. bis 10. Jahrgangsstufe [Berlin test of fluid and crystallized intelligence for grades 8–10]. Göttingen: Hogrefe.


Woodcock, R.W., McGrew, K.S., & Mather, N. (2001). Examiner's manual. Woodcock-Johnson III Tests of Cognitive Ability. Itasca, IL: Riverside.
Wu, M.L., Adams, R.J., Wilson, M.R., & Haldane, S.A. (2007). ACER ConQuest
Version 2: Generalised item response modelling software. Camberwell:
Australian Council for Educational Research.