
Accident Analysis and Prevention 32 (2000) 633–642
www.elsevier.com/locate/aap
Modeling traffic accident occurrence and involvement

Mohamed A. Abdel-Aty *, A. Essam Radwan
Department of Civil Engineering, University of Central Florida, Orlando, FL 32816-2450, USA
Received 26 February 1999; received in revised form 10 June 1999; accepted 24 June 1999
Abstract
The Negative Binomial modeling technique was used to model the frequency of accident occurrence and involvement. Accident
data over a period of 3 years, accounting for 1606 accidents on a principal arterial in Central Florida, were used to estimate the
model. The model illustrated the significance of the Annual Average Daily Traffic (AADT), degree of horizontal curvature, lane,
shoulder and median widths, urban/rural classification, and the section's length, on the frequency of accident occurrence. Several Negative
Binomial models of the frequency of accident involvement were also developed to account for the demographic characteristics of
the driver (age and gender). The results showed that heavy traffic volume, speeding, narrow lane width, larger number of lanes,
urban roadway sections, narrow shoulder width and reduced median width increase the likelihood for accident involvement.
Subsequent elasticity computations identified the relative importance of the variables included in the models. Female drivers
experience more accidents than male drivers in heavy traffic volume, reduced median width, narrow lane width, and larger number
of lanes. Male drivers have a greater tendency to be involved in traffic accidents while speeding. The models also indicated that
young and older drivers experience more accidents than middle aged drivers in heavy traffic volume, and reduced shoulder and
median widths. Younger drivers have a greater tendency of being involved in accidents on roadway curves and while speeding.
© 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Accident occurrence; Accident involvement; Negative Binomial models; Roadway geometric characteristics; Driver characteristics;
Traffic safety
1. Introduction
1.1. Accident prediction methodology
Safety and efficiency are the two primary goals of
transportation engineering. The effort that public agencies put into reducing traffic accidents is highly justifiable. Traffic accidents place a huge financial burden
on society. Two major factors usually play an important role in traffic accident occurrence. The first is
related to the driver, and the second is related to the
roadway design. Many of the important road user
factors in traffic safety depend strongly on the gender
and the age of the driver (Miaou and Lum, 1993). This
study investigates the factors that affect accident occurrence on highway segments, and also the variables that
affect the accident involvement of the different driver
gender and age groups.
Researchers have attempted three approaches to relate accidents to geometric characteristics and traffic-related
explanatory variables: multiple linear regression, Poisson regression and Negative Binomial regression. However, recent research shows that multiple
linear regression suffers from some undesirable statistical
properties when applied to accident analysis, some of
which have been discussed by Jovanis and Chang
(1986). To overcome the problems associated with multiple linear regression models, Jovanis and Chang proposed Poisson regression for modeling accident
frequencies. They argued that Poisson regression is a
superior alternative to conventional linear regression
for applications related to highway safety. In addition,
it could be used with generally smaller sample sizes
than linear regression.
Joshua and Garber (1990) studied the relationship
between highway geometric factors and truck accidents
in Virginia using both linear and Poisson regression
* Corresponding author. Tel.: +1-407-8235657; fax: +1-407-8233315.
E-mail address: mabdel@mail.ucf.edu (M.A. Abdel-Aty)
0001-4575/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0001-4575(99)00094-9
models. They also concluded that linear regression techniques used in their research did not describe the relationship between truck accidents and the independent
variables adequately but that the Poisson models did.
Miaou et al. (1992) used a Poisson regression model
to establish the empirical relationship between truck
accidents and highway geometrics on a rural interstate
in North Carolina. The estimated Poisson model suggested that Average Annual Daily Traffic (AADT) per
lane, horizontal curvature, and vertical gradient were
significantly correlated with truck accident likelihood.
During their work, a limitation of the Poisson model
was uncovered. Using the Poisson model necessitates
that the mean and variance of the accident frequency
variable (the dependent variable) be equal. In most
accident data, the variance of the accident frequency
exceeds the mean and, in such cases, the data are said to be
over dispersed. They argued that, although over dispersion was present, it did not change the conclusion
about the relationship between truck accidents and the
examined traffic and highway geometric design variables. However, they did suggest a correction to overcome the problem of over dispersion.
A follow-up study was completed by Miaou and
Lum (1993). While this study was similar in scope to
the first, the main purpose was to evaluate the statistical properties of two conventional linear regression
models and two Poisson regression models. The models
studied by Miaou and Lum were comparable to those
developed in previous studies to explore the relationship between vehicle accidents and highway geometric
design. The four types of models considered were: (1) an
additive linear regression model; (2) a multiplicative
linear regression model; (3) a multiplicative Poisson
regression with an exponential rate function; and (4) a multiplicative Poisson regression with a non-exponential rate
function. The authors found that Poisson regression
models outperformed linear regression models. Furthermore, the Poisson regression model with the exponential rate function was the favored model. Miaou and
Lum also attempted to address over dispersion in their
frequency data. When over dispersion existed in the
data and a Poisson model was used, the variance of the
estimated model coefficients tended to be underestimated. They attempted to relax the Poisson constraint
of the mean being equal to the variance by using
Wedderburn's over dispersion parameter. They found
that with such over dispersed data, using the Poisson
model may not be appropriate for making probabilistic
statements about vehicle accidents because the model
may under or overestimate the likelihood of occurrence.
Because of the over dispersion difficulties, the authors
suggested the use of a more general probability distribution such as the Negative Binomial.
Miaou (1994) studied the relationship between highway geometrics and accidents using Negative Binomial
regression. In this study, Miaou evaluated the performance of Poisson regression, zero-inflated Poisson
regression, and Negative Binomial regression. Maximum likelihood was used to estimate the coefficients of
the models. As an initial step in developing a model,
Miaou suggested that the Poisson regression model
should be used to establish the relationship between
highway geometrics and accidents. If over dispersion
exists and is found to be moderate or high, both the
Negative Binomial and zero-inflated Poisson regression
models can be explored. He suggested that the zero-inflated Poisson regression model appears to be appropriate when the data exhibit a high number of zero
frequency observations.
Ivan and O'Mara (1997) applied Poisson regression
for the prediction of traffic accidents using the Connecticut Department of Transportation's accident data.
Results of the model suggest that the posted speed
limit and the annual average daily traffic of the highway
are critical accident prediction variables. The authors also reached the
conclusion that the Poisson regression model is preferable to the linear regression model.
Shankar et al. (1995) used both the Poisson and
Negative Binomial distributions (Poisson when the data
was not significantly over dispersed and negative binomial when it was) to evaluate the effects of roadway
geometrics and environmental factors on rural accident
frequency in Washington State. In addition to the
overall accident frequency on sections of highway, they
modeled the frequency of specific types of accidents.
The authors concluded that separate regression models
for specific types of accidents would have greater
explanatory power, and that this was statistically
confirmed.
Poch and Mannering (1996) applied the Negative
Binomial regression to predict the accident frequency
on sections of principal arterials in Washington State.
They concluded that the Negative Binomial regression
is a powerful predictive tool and one that should be
increasingly applied in future accident frequency
studies.
Fridstrom et al. (1995) measured the contribution of
randomness, exposure, weather, and daylight to the
variation in road accident counts. They stated that the
formulation of the generalized Poisson regression models for accident counts allows for the decomposition of
the total variation in the dependent variable into one
part due to normal random (inexplicable) variation,
and another part due to systematic, causal factors.
They concluded also that the simple Poisson regression
models can come very close to explaining almost all the
systematic variation in a cross-section/time series accident data set. However, when the events analyzed are
not independent, it would be strongly advisable to use
Negative Binomial rather than a pure Poisson specification, as a certain amount of over dispersion must always
be expected in such cases.
In summary, from a methodological perspective, previous researchers have shown that multiple linear regression is not a suitable method for modeling the
relationship between accident occurrence, and the geometric and traffic factors. Poisson regression, and in
case of over dispersion, Negative Binomial regression
are more appropriate approaches for accident
modeling.
1.2. Factors affecting highway accidents

A number of studies have attempted to quantify the
effects of highway geometric design variables and traffic
volume on accident rates or frequencies. For example,
Jovanis and Chang (1986) estimated Poisson regression
models using accident, travel mileage, and environmental data. Their models revealed that accident occurrence
increases with the vehicle miles of travel (VMT). Agent
and Deen (1975) attempted to identify high-accident
locations with respect to the functional type and geometry of the highway, using accident and volume data
from rural highways in Kentucky collected from 1970
through 1972. They found that four-lane undivided
highways had the highest accident, injury and fatality
rates. Also, two-lane highways had the highest percentage of accidents that involved curvature.
Milton and Mannering (1996) attempted to develop a
model for an arterial street in Washington State. They
found that narrow shoulder width, sharp horizontal
curve, reduced lane width and high volume of traffic all
have a potential effect on increasing accident frequency.
They also found that the number of lanes is a highly
significant factor in predicting accident frequency.
More lanes tend to increase accident frequency.
Knuiman et al. (1993) studied the effect of median
width on accident rates using a Negative Binomial
regression model. For a median without barrier, they
found that the accident rate declines rapidly when
median width exceeded about 7.6 m (25 ft). The decreasing trend seemed to become level at median widths
of approximately 18.9–24.4 m (60–80 ft).
Several studies have presented accident relationships
for design elements of horizontal curves. In general,
accident rate increases as a function of increasing degree of curvature, although the relationship is affected
by other variables, including the lane and shoulder
widths, roadside design, and the length of curve
(McGee et al., 1995).
A common shortfall of many of the previous studies
is that they did not consider the effect of the driver's
characteristics. Sabey and Taylor (1980) showed that
human factors are involved in around 95 percent of all
traffic accidents, either alone or in combination with
other factors. If motorists were cognizant of every
geometric deficiency encountered and warned to be
careful of these deficiencies, accident potential would be
reduced. However, because this is an impossible task,
correcting geometric deficiencies is an important step
toward reducing accidents.
1.3. Research objective

The primary objective of this research was to develop
a mathematical model that explains the relationship
between the frequency of accidents and highway geometric and traffic characteristics. Other objectives include developing models of accident involvement for
different gender and age groups using the Negative
Binomial regression technique. Previous research has
shown significant differences in accident involvement
between the different gender and age groups (see for
example Abdel-Aty et al., 1999a,b; Mostofa, 1998;
Chen, 1997). An elasticity method was applied to the
developed models in an attempt to identify the most
critical variables that contribute to accident occurrence
and involvement and their relative significance.
2. Data collection
In order to develop a mathematical model that correlates accident frequencies to the roadway geometric and
traffic characteristics, one needs to select a roadway
that possesses a wide variety of geometric and traffic
characteristics. The goal of this data collection exercise
was to divide this roadway into segments with homogeneous characteristics. After reviewing several roadways
in Central Florida, it was decided that State Road 50
(SR 50) was the most appropriate for this task.
SR 50 is a 227 km major principal arterial that
connects the east and west coasts of Central Florida
passing through the center of Orlando. Parts of SR 50
are rural, and the number of lanes varies between 2, 4
and 6 lanes. This roadway also experiences high accident rates, and had very limited changes during the
3-year study period (1992–94). This arterial is also long
enough to produce an adequate number of segments to
develop the model.
Traffic and roadway data were obtained from the Roadway Characteristics Inventory (RCI) database maintained by the Florida Department of Transportation
(FDOT). This database may be used to process, store,
and report information that describes all of the state
highway system in Florida. Information on roadways
includes geometric characteristics such as horizontal
curves, shoulder widths, median widths, and traffic
characteristics such as traffic volumes and speed limits.
SR 50 was divided into 566 highway segments defined
by any change in the geometric and/or roadway variables (e.g. a new section would be identified when
median changes from 3 to 6 m). Therefore, each highway segment is uniform with respect to all the possible
geometric and traffic features recorded by the FDOT
database. The data included the following variables:
AADT, degree of horizontal curvature, shoulder type,
divided/undivided, rural/urban classification, posted
speed limit, number of lanes, road surface and shoulder
types, and lane, median, and shoulder widths. Data
extracted from the RCI system was coded into a new
database based on the identified sections.
Accident data was obtained from an electronic accident database for three years from 1992 to 1994. This is
a relational database maintained by the Florida Department of Highway Safety and Motor Vehicles
(DHSMV). The DHSMV's accident database is the
most complete accident data available in the state of
Florida. DHSMV assembles all the accident reports in
the state (from counties, cities, police departments,
etc.). In all, 1606 accidents were available for SR 50.
This relational database contains specific information
about each accident including the driver characteristics.
Finally the accident data and the database developed
from the RCI system were merged based on the milepost of each accident and the beginning and ending
milepost of each segment. The resulting database contained information about the accidents occurring on
each segment together with the geometric and traffic
characteristics of this segment.
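The milepost-based merge described above can be illustrated with a minimal sketch. The segment boundaries and accident mileposts below are hypothetical, and the rule that a milepost on a boundary belongs to the downstream segment is an assumption; the paper does not say how boundary cases were handled.

```python
from bisect import bisect_right

# Hypothetical segments as (begin_milepost, end_milepost), sorted and contiguous.
segments = [(0.0, 1.2), (1.2, 2.5), (2.5, 4.0)]
# Hypothetical accident mileposts taken from the accident records.
accident_mileposts = [0.3, 1.2, 1.9, 3.7, 3.9, 4.5]


def count_accidents_by_segment(segments, mileposts):
    """Count the accidents falling in each segment.

    A milepost m is assigned to segment i when begin_i <= m < end_i
    (assumed convention); mileposts outside every segment are ignored.
    """
    begins = [begin for begin, _ in segments]
    counts = [0] * len(segments)
    for m in mileposts:
        i = bisect_right(begins, m) - 1  # last segment starting at or before m
        if i >= 0 and m < segments[i][1]:
            counts[i] += 1
    return counts


counts = count_accidents_by_segment(segments, accident_mileposts)
print(counts)
```

The resulting per-segment counts play the role of the dependent variable, with each segment's geometric and traffic attributes as the explanatory variables.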
3. Modeling methodology
The Poisson regression methodology was initially
attempted. However, the Poisson distribution was rejected because the mean and variance of the dependent
variables are different, indicating substantial over dispersion in the data. Such over dispersion suggests a
Negative Binomial model. The Negative Binomial modeling approach is an extension of the Poisson regression
methodology and allows the variance of the process to
differ from the mean. The Negative Binomial model
arises from the Poisson model by specifying:
$$\ln \lambda_i = \beta x_i + \varepsilon \qquad (1)$$
where $\lambda_i$ is the expected mean number of accidents on
highway section $i$; $\beta$ is the vector representing parameters to be estimated; $x_i$ is the vector representing the
explanatory variables on highway segment $i$; and $\varepsilon$ is the
error term, where $\exp(\varepsilon)$ has a gamma distribution with
mean 1 and variance $\alpha^2$.
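The resulting probability distribution is as follows:
$$\mathrm{Prob}(n_i \mid \varepsilon) = \frac{\exp[-\lambda_i \exp(\varepsilon)]\,[\lambda_i \exp(\varepsilon)]^{n_i}}{n_i!} \qquad (2)$$
where $n_i$ is the number of accidents on highway section $i$ over a time period $t$.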
Integrating o out of this expression produces the
unconditional distribution of ni. The formulation of this
distribution is:
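$$\mathrm{Prob}(n_i) = \frac{\Gamma(\theta + n_i)}{\Gamma(\theta)\, n_i!}\, u_i^{\theta}\, (1 - u_i)^{n_i} \qquad (3)$$
where $u_i = \theta/(\theta + \lambda_i)$ and $\theta = 1/\alpha$.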
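As an informal check of Eq. (3), the following sketch evaluates the unconditional probabilities with $\theta = 1/\alpha$ and $u_i = \theta/(\theta + \lambda_i)$, and confirms numerically that they sum to one and have mean $\lambda_i$. The function name `nb_pmf` and the value of `lam` are illustrative assumptions; `alpha` borrows the over dispersion estimate reported in Table 1.

```python
import math


def nb_pmf(n, lam, alpha):
    """Probability of n accidents under Eq. (3), with mean lam and over dispersion alpha."""
    theta = 1.0 / alpha
    u = theta / (theta + lam)
    # Work in logs so that the gamma functions do not overflow for large n.
    log_p = (math.lgamma(theta + n) - math.lgamma(theta) - math.lgamma(n + 1)
             + theta * math.log(u) + n * math.log(1.0 - u))
    return math.exp(log_p)


lam, alpha = 2.8, 0.235  # lam is an illustrative mean, not an estimate from the paper
probs = [nb_pmf(n, lam, alpha) for n in range(200)]  # tail beyond 200 is negligible here
total = sum(probs)
mean = sum(n * p for n, p in enumerate(probs))
print(round(total, 6), round(mean, 4))
```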

The Negative Binomial model can be estimated by
standard maximum likelihood methods. The corresponding likelihood function is:
$$L(\lambda_i) = \prod_{i=1}^{N} \frac{\Gamma(\theta + n_i)}{\Gamma(\theta)\, n_i!}\, u_i^{\theta}\, (1 - u_i)^{n_i} \qquad (4)$$
where $N$ is the total number of highway sections.

This function is maximized to obtain coefficient estimates for $\beta$ and $\alpha$. Compared with the Poisson model, this
model has an additional parameter $\alpha$, such that
$$\mathrm{Var}[n_i] = E[n_i]\{1 + \alpha E[n_i]\} \qquad (5)$$
The choice between the Negative Binomial model and
the Poisson model can largely be determined by the
statistical significance of the estimated coefficient $\alpha$. If $\alpha$
is not significantly different from zero (as measured by
t-statistics), the Negative Binomial model simply reduces to a Poisson regression with $\mathrm{Var}[n_i] = E[n_i]$. If $\alpha$
is significantly different from zero, then the Negative
Binomial is the correct approach.
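The decision rule above can be sketched in a few lines. The helper names are hypothetical, and the cutoff of 1.96 (a two-sided 5% large-sample threshold) is an assumption, since the paper does not state the critical value it used. The t-statistic of 5.45 is the one reported for the over dispersion parameter in Table 1.

```python
def nb_variance(mean, alpha):
    """Variance implied by Eq. (5): Var[n] = E[n] * (1 + alpha * E[n])."""
    return mean * (1.0 + alpha * mean)


def negative_binomial_preferred(t_stat_alpha, critical_t=1.96):
    """Return True when alpha differs significantly from zero.

    critical_t = 1.96 is an assumed threshold, not a value taken from the paper.
    """
    return abs(t_stat_alpha) > critical_t


print(nb_variance(2.0, 0.0))              # alpha = 0 collapses to the Poisson case
print(nb_variance(2.0, 0.235))            # variance exceeds the mean when alpha > 0
print(negative_binomial_preferred(5.45))  # t-statistic of alpha from Table 1
```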
It is worth mentioning that if a model has several
variables, there is a possibility that some of the explanatory variables would be related, causing the property
known as multicollinearity. Although
multicollinearity would not cause the estimators to be
biased, inefficient, or inconsistent, and does not affect
the forecasting performance of the model (Ramanathan, 1995), it might increase the standard errors
of the coefficients, thus making coefficients less significant. Multicollinearity could be identified by low values
of the t-statistics, high values of the correlation coefficients
between variables, and the sensitivity of the estimated
coefficients to specification (Ramanathan, 1995). None
of these symptoms were identified in the models presented in this paper. Pairwise correlations among explanatory variables did not have high values, and there
was no observation that the estimated coefficients were
drastically altered when variables were added or
dropped. Furthermore, the coefficients in the estimated
models were significant and had meaningful signs and
magnitudes. Therefore, there is no need to be concerned
about multicollinearity.
3.1. Goodness of fit

In order to decide which subset of independent variables should be included in an accident estimation
model, AIC (Akaike's information criterion) was used.
AIC identifies the best approximating model among a
class of competing models with different numbers of
parameters. AIC is defined as follows:
$$\mathrm{AIC} = -2\,ML + 2k \qquad (6)$$
where $ML$ is the maximized log-likelihood $LL(\beta)$ and $k$ is the number
of variables in the model.
The smaller the value of AIC, the better the model.
Starting with the full set of independent variables and their
interactions, a stepwise procedure has been used to
select the best model based on minimizing the AIC
value.
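A minimal sketch of the AIC comparison follows, taking AIC in its usual form of minus twice the maximized log-likelihood plus twice the number of variables. The three candidate models and their log-likelihoods are hypothetical and only illustrate how the smallest AIC identifies the preferred specification; the stepwise search itself is not reproduced.

```python
def aic(log_likelihood, k):
    """Akaike's information criterion: -2 * maximized log-likelihood + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * k


# Hypothetical candidates: (label, maximized log-likelihood, number of variables).
candidates = [("A", -1085.0, 5), ("B", -1077.0, 7), ("C", -1076.5, 9)]

best = min(candidates, key=lambda c: aic(c[1], c[2]))
for label, ll, k in candidates:
    print(label, aic(ll, k))
print("preferred:", best[0])
```

In this made-up example, model C fits slightly better than model B but uses two more variables, so its AIC is higher and B is preferred.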
To measure the overall goodness of fit, the
deviance value $2(LL(\beta) - LL(0))$, which follows a $\chi^2$
distribution, has been used as suggested by Agresti (1990). The log-likelihood
ratio index $\rho^2 = 1 - LL(\beta)/LL(0)$ of the model
(analogous to the R-square statistic in linear regression models),¹ which is an indication of the additional variation
in accident frequency explained by the obtained model
relative to the constant term, was also used (for a thorough
discussion of the goodness-of-fit measures for generalized Poisson regression models, the reader is referred to
Fridstrom et al., 1995).
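Both measures can be computed directly from the two log-likelihoods. The sketch below uses the magnitudes reported later in Table 1 (1210 at zero and 1077 at convergence), entered as negative numbers on the assumption that they are log-likelihoods of a count model and therefore cannot be positive. The critical value 14.07 is the standard 5% point of the $\chi^2$ distribution with 7 degrees of freedom; the function name is hypothetical.

```python
def likelihood_ratio_stats(ll_model, ll_null):
    """Return the deviance 2(LL(b) - LL(0)) and the ratio index 1 - LL(b)/LL(0)."""
    deviance = 2.0 * (ll_model - ll_null)
    rho_sq = 1.0 - ll_model / ll_null
    return deviance, rho_sq


CHI2_CRIT_DF7 = 14.07  # 5% critical value of chi-square with 7 degrees of freedom

deviance, rho_sq = likelihood_ratio_stats(ll_model=-1077.0, ll_null=-1210.0)
print(deviance, round(rho_sq, 2))
print("constant-only model rejected:", deviance > CHI2_CRIT_DF7)
```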
3.2. Relative significance of variables

A simple plot of the mean number of accidents
estimated using the Negative Binomial regression models against the different variables may be thought of as
a method of evaluating model quality. However, the
slope of this type of plotting does not indicate the
relative effects of variables with respect to the accident
frequency as well as accident involvement (Shankar et
al., 1995). To resolve this issue, one may apply the
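partial derivative of $E(y)$ or $\lambda$ with respect to the
independent variables:
$$\frac{\partial \lambda}{\partial x} = \frac{\partial \exp(x\beta)}{\partial (x\beta)} \cdot \frac{\partial (x\beta)}{\partial x} = \exp(x\beta)\,\beta = \lambda\beta \qquad (7)$$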
Since the Negative Binomial regression is nonlinear, the
value of the marginal effect depends on both the coefficient for independent variable $x$ and the expected value
of $y$. The larger the value of $\lambda$ or $E(y)$, the larger the
rate of change in $E(y)$, that is, in the likelihood of accidents. So, if we conclude anything regarding relative
effects of the independent variables from plotting accident frequencies versus an independent variable, it
would be misleading. As an example, if we plot the
accident involvement of male and female drivers
against the median width, any conclusion drawn from
the slope of the plotted line regarding the relative effect
of the median width on these two different groups of
drivers, would be misleading. The fact that the slope of
the line is influenced by the accident involvement of
male and female drivers supports the decision of not
using this method of assessment.
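The point can be illustrated with a small sketch of the marginal effect $\lambda\beta$ for two hypothetical driver groups that share the same coefficient but differ in their baseline accident level. All numbers and names are made up for illustration; only the form of the marginal effect comes from Eq. (7).

```python
import math


def marginal_effect(beta, x, intercept):
    """Slope of lambda = exp(intercept + beta * x) with respect to x, i.e. lambda * beta."""
    lam = math.exp(intercept + beta * x)
    return lam * beta


beta = -0.05  # hypothetical coefficient shared by both groups
slope_low = marginal_effect(beta, x=6.0, intercept=0.5)   # group with fewer accidents
slope_high = marginal_effect(beta, x=6.0, intercept=1.5)  # group with more accidents

# The slopes differ by exp(1.5 - 0.5) even though beta is identical.
ratio = slope_high / slope_low
print(round(slope_low, 4), round(slope_high, 4), round(ratio, 4))
```

A steeper plotted line for the second group would therefore reflect its higher accident level, not a stronger effect of the variable.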
¹ There are a variety of measures that are analogous to $R^2$.
However, none of them produces a true $R^2$ except under restricted
conditions.
To overcome this problem and to examine the true

relative effects of the variables included in the models,
Shankar et al. (1995) suggested the computation of an
elasticity parameter, which would measure the true
relative effect of the variable on accident frequency. In
general, elasticity is computed as,
$$E = \frac{\partial \lambda}{\partial x} \cdot \frac{x}{\lambda} \qquad (8)$$
where $\lambda$ is the mean number of accidents and $x$ are the
explanatory variables.
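A short numerical sketch of Eq. (8) follows, for a log-linear mean of the kind used in this paper. With $\lambda = \exp(x\beta)$, a variable that enters linearly has elasticity $\beta x$, while a variable that enters in logarithmic form has elasticity equal to its coefficient. The intercept `b0` and the shoulder coefficient `b_sh` are hypothetical; `b_len = 0.325` is the section-length coefficient from Table 1, which agrees with the elasticity of 0.33 listed for section length in Table 2.

```python
import math


def expected_accidents(length_km, shoulder_m, b0=0.5, b_len=0.325, b_sh=-0.1):
    """Log-linear mean; b0 and b_sh are hypothetical, b_len is taken from Table 1."""
    return math.exp(b0 + b_len * math.log(length_km) + b_sh * shoulder_m)


def elasticity(f, x, h=1e-6):
    """Point elasticity (df/dx) * (x / f(x)), using a central difference for df/dx."""
    deriv = (f(x + h) - f(x - h)) / (2.0 * h)
    return deriv * x / f(x)


# Variable entered in log form: elasticity equals the coefficient.
e_len = elasticity(lambda length: expected_accidents(length, 2.0), 0.4)
# Variable entered linearly: elasticity equals coefficient times the variable's value.
e_sh = elasticity(lambda width: expected_accidents(0.4, width), 2.0)
print(round(e_len, 3), round(e_sh, 3))
```

Because elasticities are unit-free, they can be compared across variables in a way that raw slopes cannot.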
4. Estimation results
4.1. Modeling accident frequency

The Negative Binomial results for arterial accident
frequency are presented in Table 1. This table shows
that all the variables have the expected sign (with a
positive sign indicating an increase in the accident
frequency and a negative sign indicating a decrease).
The deviance value $2(LL(\beta) - LL(0))$, which follows a
$\chi^2$ distribution, has been used for testing the overall
goodness of fit. The $\chi^2$ test of the deviance value (266,
with df = 7) rejects the null hypothesis that the obtained model has explanatory power equal to that of
the model with the constant term only. Therefore, the
model shows an overall good statistical fit. The $\rho^2$
value of the model, which is an indication of the
additional variation in accident frequency explained by
the model relative to the constant term alone, is relatively low.
This low value is usual for accident estimation because
there are many variables (e.g. human factors) which are
not measurable² (Jovanis and Chang, 1986; Poch and
Mannering, 1996).

Table 1
Negative binomial model of accident frequency

Independent variable                                      Coefficient   t-statistic
Constant                                                     4.182          3.78
Log of the section length (km)                               0.325          7.62
Log of AADT per lane                                         0.622          5.59
Degree of horizontal curve (degrees/100 m arc)               0.124          4.46
Shoulder width (m)                                          -0.122         -2.63
Median width (m)                                            -0.024         -1.58
Lane width (m)/no. of lanes                                 -0.364         -2.09
Urban section dummy variable (1 if urban, 0 otherwise)       0.302          3.78
Over dispersion parameter ($\alpha$)                         0.235          5.45

Summary statistics
Number of sections                                           566
Log-likelihood at zero                                      -1210
Log-likelihood at convergence                               -1077
$\rho^2 = 1 - LL(\beta)/LL(0)$                               0.11
$2(LL(\beta) - LL(0))$                                       266

Table 2
Elasticity estimates for the accident frequency model

Variable                                Elasticity
Section length (km)                        0.33
AADT per lane                              0.62
Degree of horizontal curve                 0.07
Shoulder width (m)                        -0.13
Median width (m)                          -0.12
Lane width (m)/number of lanes            -0.38
Turning to the specific variables entered in the
model, two exposure variables were found to be significant. The first is the log of the section's length. The
longer the length of the roadway section, the more
likely accidents would occur on these sections. A similar conclusion was reached for the log of the AADT per
lane. An increase in AADT per lane has a positive
impact on the likelihood of accidents.
The sharpness of the horizontal curve has a positive
effect on the likelihood of accidents. Accidents increase
with the increase of the degree of curve. Increases in
shoulder width and median width reduce the frequency
of accidents. Whether the roadway is divided or not is
accounted for implicitly in the median width variable.
If the roadway is undivided, then the median width
would be equal to zero. There is an interaction effect
between the lane width and the number of lanes. When
the lane width increases, and at the same time the
number of lanes decreases, the frequency of accidents
declines. No effect of vertical alignment entered the
model, possibly because Florida has relatively flat topography (i.e. little variation in slopes). Also the fact
that urban areas experience higher accident frequency
than rural areas as depicted by the model may be
explained by the larger number of access points and the
higher level of congestion. Finally, the significance of
the over dispersion parameter ($\alpha$) indicates that the
Negative Binomial formulation is preferred to the more
restrictive Poisson formulation.
To examine the relative effects of the variables included in the model, the average elasticities of all continuous
variables are presented in Table 2. The results show
that AADT per lane has the greatest relative effect
(0.62) on the accident frequency among all the independent variables. The interaction between lane width and
number of lanes has the next largest relative effect on accident
frequency. Shoulder width and median width have the
same relative effect, which is greater than the effect of
the degree of horizontal curve.
² Note that even in a full model explaining 100% of the variation of
the expected number of accidents in the population, the log-likelihood ratio usually obtains a very low value if the average expected
number of accidents in the data is low (as is usually the case). This is
attributed to the relatively large pure random variation of the observed accident numbers around the expected numbers of accidents in
each unit of the population. This can be accounted for by applying
one of the approaches proposed by Fridstrom et al. (1995).
4.2. Modeling the accident involvement by gender

Two Negative Binomial models of accident involvement were estimated. The first model represents the
male drivers' accident involvement, while the second represents
that of female drivers. The models are presented in Table 3. All
the variables that entered in these models are similar to
those of the accident frequency model presented before in Table
1. Only one additional variable was significant in the
estimation of the accident involvement of male drivers.
The variable is a speeding indicator ((estimated traveling speed − posted speed limit)/posted speed limit).
This variable represents the extent of speeding at the
time of the accident. The coefficient of this variable is
positive and significant, indicating that as male drivers
speed, their likelihood of being involved in an accident
increases. This variable was not significant and did not
enter in the female model, showing less variation in
speed among female drivers, which probably indicates
that male drivers have a tendency to be involved in
accidents while speeding.
The deviance value $2(LL(\beta) - LL(0))$, which follows a $\chi^2$ distribution, is significant for both the male
involvement model (2014, df = 8) and the female model (1088,
df = 7) (at the 95% confidence level, the critical $\chi^2$ value is 14 for
df = 7, and 15.5 for df = 8). Therefore, both models
show good statistical fit. Furthermore, the value of the
likelihood ratio index is higher for both the male model
($\rho^2$ = 0.275) and the female model ($\rho^2$ = 0.16) than for the
general accident frequency model ($\rho^2$ = 0.11).³ Therefore, the predictability of accident frequency is improved when accident involvement models
are estimated for each gender group. This could be
attributed to the fact that some behavioral variables explained
by gender are implicitly considered in the model.
The over dispersion parameters in both models are
significant, indicating that the variance differs from the
mean, which confirms the appropriateness of the
Negative Binomial relative to the Poisson formulation
for predicting accident involvement for both male and
female drivers.
³ Note that the higher $\rho^2$ obtained for separate gender equations is
a reflection of non-additivity or interaction. That is, the relationship
between the explanatory variables and accidents differs for males and
females. In the absence of interaction, one would not expect the
separate equations to yield substantially different predictive power
than the single equation.

Table 3
Negative binomial models of male and female drivers' accident involvement
Male accident model

Variables                              Coefficient   t-statistic
Constant                                  0.323          0.93
Log of section length (km)                0.096          4.25
Log of AADT per lane                      0.128          2.50
Degree of horizontal curve                0.119          7.29
Shoulder width (m)                       -0.108         -4.74
Median width (m)                         -0.025         -3.77
Lane width (m)/no. of lanes              -0.356         -4.88
Speed difference/speed limit              0.095          5.40
Urban (1 if urban, 0 if rural)            0.367          8.64
Over dispersion parameter ($\alpha$)      0.094          4.94

Summary statistics
Number of sections                        566
Log-likelihood at zero                   -3657
Log-likelihood at convergence            -2650
$\rho^2 = 1 - LL(\beta)/LL(0)$            0.275
$2(LL(\beta) - LL(0))$                    2014

Elasticity values computed from both models are
depicted in Table 4. Although both male and female
driver models show that an increase in AADT per lane
has a positive effect on accident involvement, the relative effect of AADT per lane on accident involvement is
higher for female drivers than male drivers. This shows
a tendency to more accident involvement by females
during heavy traffic. The decrease in median width
increases accident involvement frequencies for both
male and female drivers. But the relative effect of
median width for female drivers is more pronounced
than that for male drivers. The negative correlation
between the interaction of the lane width and number
of lanes and accident involvement is higher for females
than males. So it can be concluded that narrow lane
width and a larger number of lanes have a larger effect on
accident involvement for female than male drivers. For
male drivers, there is a positive correlation between the
percentage of speeding and accident involvement, which is
not significant for female drivers. This indicates that
male drivers have a tendency to be involved in accidents that are related to speeding.
4.3. Modeling the accident involvement by age

Three Negative Binomial models of young, middle
age, and old driver accident involvement were estimated. Young drivers were defined as drivers between
15 and 25 years old. Middle age are drivers between 26
and 75, and old drivers are those above the age of 75.
Based on results from a previous study (Abdel-Aty et
al., 1999a), it was decided that for the purpose of this
study it is adequate to divide age into these three
categories with the aforementioned cut off values.
Young drivers (25 or below) and old drivers (above 75)
have higher risk than middle aged drivers (Abdel-Aty et
al., 1999a). Including a wide middle age category also
solves the problem of double counting accidents for the
two groups of interest (i.e. the young and the old).
Screening the data showed that in most multiple-vehicle
accidents involving an old or a young driver, the
other driver is from the middle age group.

Table 3 (continued)
Female accident model (values in extracted order; signs and row alignment lost in extraction)
Coefficient:   2.52  0.092  0.375  0.107  0.077  0.063  0.800  0.317  0.137
t-statistics:  3.43  3.21   4.87   6.11   2.63   6.56   8.11   6.31   4.10

Summary statistics (female model)
Number of sections               566
Log-likelihood at zero           −3408
Log-likelihood at convergence    −2864
ρ² = 1 − LL(β)/LL(0)             0.16
2(LL(β) − LL(0))                 1088
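The age categories described above can be written as a small helper. This is only an illustrative sketch: the function name `age_group` and the handling of ages below 15 are assumptions, not part of the original study.

```python
def age_group(age: int) -> str:
    """Map a driver's age to the three categories used in the text.

    Young: 15 to 25, middle: 26 to 75, old: above 75.
    Rejecting ages below 15 is an assumption made for this sketch.
    """
    if age < 15:
        raise ValueError("ages below 15 fall outside the categories described")
    if age <= 25:
        return "young"
    if age <= 75:
        return "middle"
    return "old"


if __name__ == "__main__":
    for a in (18, 40, 80):
        print(a, age_group(a))
```

Keeping the cut-off values in one place makes it straightforward to test other groupings against the three-category split.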
The models are presented in Table 5. The variables that entered these models are similar to those in the
accident frequency model presented in Table 1.
Two additional variables were significant: a speeding
indicator variable ((estimated traveling speed − posted
speed limit)/posted speed limit), which entered the young
and the middle age driver involvement models, and a
dummy variable for shoulder pavement, which entered the
old age involvement model. These variables indicate
an increase in the probability of an accident as the
estimated speed at the time of the accident increases for both
young and middle age drivers. Also, older drivers' likelihood of accident involvement decreases when the
shoulder is paved.
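The speeding indicator variable is a simple ratio, and a short sketch may make its scale clearer. The function name and the example speeds below are hypothetical; only the formula itself comes from the text.

```python
def speeding_indicator(estimated_speed: float, posted_limit: float) -> float:
    """Return (estimated speed - posted limit) / posted limit.

    Positive values mean the vehicle was traveling above the posted limit,
    negative values below it. Both speeds must use the same unit.
    """
    if posted_limit <= 0:
        raise ValueError("posted speed limit must be positive")
    return (estimated_speed - posted_limit) / posted_limit


if __name__ == "__main__":
    # Hypothetical example: 66 km/h in a 60 km/h zone is 10% over the limit.
    print(speeding_indicator(66.0, 60.0))
```

Because the difference is divided by the posted limit, the variable measures relative rather than absolute speeding, so the same excess speed counts for more on a low-speed section.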
Table 4
Elasticity estimates for male and female accident involvement models

Variables: Section length (km); AADT per lane; Degree of horizontal curve; Shoulder width (m); Median width (m); Lane width (m)/no. of lanes; Speed difference/speed limit
Columns: Elasticity (male model); Elasticity (female model)
Values (in extracted order; signs and cell alignment lost in extraction):
0.09  0.12  0.08  0.09  0.37  0.08  0.18  0.13  0.35  0.13  0.34  0.79  0.07
Table 5
Negative binomial models of young, middle, and old drivers' accident involvement

Variables (as extracted): Constant; Log of section length (km); Shoulder width (m); Median width (m); Lane width (m)/no. of lanes; Shoulder pavement (1 if paved, 0 otherwise); Urban (1 if urban, 0 if rural)

Values are listed in extracted order; signs and row alignment were lost in extraction.

Young age model
Coefficient:   3.152  0.099  0.373  0.312  0.087  0.030  0.706  0.534  0.113  0.28
t-statistics:  4.04   3.23   4.74   16.47  2.66   2.98   5.71   9.48   14.42  1.1

Middle age model
Coefficient:   0.321  0.105  0.165  0.123  0.074  0.036  0.448  0.174  0.039  0.195
t-statistics:  4.01   4.60   2.81   8.91   2.98   5.08   5.37   4.56   2.01   9.60

Old age model
Coefficient:   3.020  0.218  0.342  0.162  0.094  0.725  0.236  0.458  0.211
t-statistics:  2.69   4.47   3.01   2.87   6.92   5.21   2.10   4.40   2.13

Summary statistics (young age model)
Number of sections               566
Log-likelihood at zero           −2763
Log-likelihood at convergence    −2313
ρ² = 1 − LL(β)/LL(0)             0.16
2(LL(β) − LL(0))                 900
The models have an overall good statistical fit. The
deviance values (2(LL(β) − LL(0))) for the young (900;
df = 8), middle (790; df = 8) and old (1922; df = 8) models,
which follow a χ² distribution, are significant (at the 95
percent confidence level the critical chi-square value is
15.5 for 8 df). The value of ρ² is higher for the young (0.16)
and old (0.385) accident involvement models than for the
general accident frequency model (0.11).
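The goodness-of-fit figures quoted above can be checked with a few lines of arithmetic. The sketch below uses the young age model's log-likelihood magnitudes from Table 5 (2763 and 2313), entered as negative values because the log-likelihood of a count model cannot be positive. The function name is hypothetical, and the critical value 15.507 is the standard χ² quantile for 8 df at the 95 percent level.

```python
def likelihood_ratio_stats(ll_zero: float, ll_beta: float) -> tuple[float, float]:
    """Return (2(LL(beta) - LL(0)), rho^2 = 1 - LL(beta)/LL(0))."""
    lr_statistic = 2.0 * (ll_beta - ll_zero)
    rho_squared = 1.0 - ll_beta / ll_zero
    return lr_statistic, rho_squared


# Critical chi-square value for 8 degrees of freedom at the 95% level.
CHI2_CRIT_95_DF8 = 15.507

if __name__ == "__main__":
    lr, rho_sq = likelihood_ratio_stats(ll_zero=-2763.0, ll_beta=-2313.0)
    print(f"2(LL(b) - LL(0)) = {lr:.0f}, rho^2 = {rho_sq:.3f}")
    print("significant:", lr > CHI2_CRIT_95_DF8)
```

The same two inputs reproduce both the deviance of 900 and the ρ² of about 0.16 reported for the young age model.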
For both the middle and old drivers' accident involvement models, the overdispersion parameters were
found significant. Therefore, the Negative Binomial
formulation is preferred to the more restricted Poisson.
However, the overdispersion parameter for the young
drivers' accident involvement model was insignificant,
indicating that the Negative Binomial formulation simply reduces to a Poisson regression with Var[n_i] = E[n_i].
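The way the Negative Binomial collapses to the Poisson can be illustrated numerically. The sketch assumes the common parameterization Var[n_i] = E[n_i](1 + α E[n_i]), where α is the overdispersion parameter; the function name and the numbers are hypothetical.

```python
def nb_variance(mean: float, alpha: float) -> float:
    """Variance of a Negative Binomial count with the given mean.

    Assumes Var = mean * (1 + alpha * mean). With alpha = 0 the variance
    equals the mean, which is the Poisson case.
    """
    if mean < 0 or alpha < 0:
        raise ValueError("mean and alpha must be non-negative")
    return mean * (1.0 + alpha * mean)


if __name__ == "__main__":
    mean = 2.0  # hypothetical expected number of accidents on a section
    for alpha in (0.0, 0.2, 0.5):
        print(f"alpha={alpha}: variance={nb_variance(mean, alpha)}")
```

An insignificant α therefore means the data give no evidence that the variance exceeds the mean, which is the situation reported for the young drivers' model.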
To measure the relative effects of the different variables on accident involvement of young, old and middle
age drivers, elasticities were evaluated. Table 6 presents
a comparison of the relative effects of the different
independent variables on the different age groups accident involvement.
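As a rough illustration of how such elasticities behave, the sketch below assumes a log-linear mean, λ = exp(βx), for which the elasticity of λ with respect to x is βx, and reduces to β when the variable enters in logarithmic form. The function names and the coefficient values are hypothetical and are not taken from the estimated models.

```python
import math
from typing import Callable


def elasticity(beta: float, x: float, logged: bool = False) -> float:
    """Elasticity of the mean under lambda = exp(beta * x).

    If the variable enters as log(x), lambda is proportional to x**beta
    and the elasticity is simply beta.
    """
    return beta if logged else beta * x


def numeric_elasticity(mean_fn: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Finite-difference check of (d lambda / d x) * (x / lambda)."""
    slope = (mean_fn(x + h) - mean_fn(x - h)) / (2.0 * h)
    return slope * x / mean_fn(x)


if __name__ == "__main__":
    beta, x = -0.1, 3.0  # hypothetical coefficient and variable value
    print(elasticity(beta, x))
    print(numeric_elasticity(lambda v: math.exp(beta * v), x))
```

The finite-difference version is included only to confirm that the closed form matches the definition of elasticity as a percentage change in the mean per percentage change in the variable.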
The results in Table 6 show that an increase in
AADT per lane increases accident involvement
frequency. The relative effect of AADT per lane on
accident involvement is higher for young and old drivers than for middle age drivers. Therefore, younger and
older drivers have more difficulty than middle aged
drivers with heavy traffic volume.
The elasticity values for both the degree of horizontal
curve and the speeding indicator variable show that the
effect of these variables is higher for young drivers than
middle age drivers. These two variables are sometimes
related because speeding on a horizontal curve increases
the likelihood of an accident. These two variables did
not enter the old drivers' accident involvement
model, which suggests less variation in the speed variable
for this group, and also indicates no safety problem on
curves for this group of drivers. The above findings
confirm previous results (Abdel-Aty et al., 1999b).

Table 5 (continued)
Summary statistics               Middle age model   Old age model
Number of sections               566                566
Log-likelihood at zero           −3994              −2518
Log-likelihood at convergence    −3599              −1557
ρ² = 1 − LL(β)/LL(0)             0.09               0.385
2(LL(β) − LL(0))                 790                1922

Shoulder and median widths negatively affect the
frequency of accident involvement; however, the elasticity values indicate that these effects are larger for older
drivers. The interaction variable (lane width/no. of
lanes) also negatively affects the accident involvement
frequency.
5. Summary and conclusion

This paper presents a model of accident frequency as
well as models of accident involvement for two driver
demographic factors: age and gender. The literature
suggests that the normal distribution, which underlies
the traditional multiple linear regression method,
should be used with caution because of the problems
associated with non-negativity and error terms. If the
underlying accident process is one in which the mean
accident frequency is functionally related to the variance (e.g. Poisson distribution), then parameters in a
linear regression model would give correct signs but
would have incorrect confidence limits. The literature
also suggests that the Poisson regression and the Negative Binomial models possess most of the desirable
statistical properties in describing vehicle accident
events. However, one of the stated limitations of the
Poisson regression model is that the variance of the
accident frequency is constrained to be equal to the
mean. Most accident frequency data are overdispersed
(having a variance greater than the mean), pointing to
the need for a correction to the Poisson formulation.
To overcome this problem, the Negative Binomial modeling methodology is used in this paper.
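A quick descriptive check for overdispersion is to compare the sample variance of the section counts with their sample mean. The sketch below is illustrative only: the counts are invented, the function name is hypothetical, and a formal test of the overdispersion parameter would still be needed before choosing between the two formulations.

```python
from statistics import mean, variance


def dispersion_index(counts: list[int]) -> float:
    """Ratio of sample variance to sample mean.

    Values well above 1 suggest overdispersion relative to the Poisson,
    for which the variance equals the mean.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean of counts is zero; index is undefined")
    return variance(counts) / m


if __name__ == "__main__":
    # Hypothetical accident counts on ten roadway sections.
    counts = [0, 1, 0, 3, 7, 2, 0, 12, 1, 4]
    print(f"dispersion index = {dispersion_index(counts):.2f}")
```

For these invented counts the index is close to 5, the kind of pattern that motivates moving from the Poisson to the Negative Binomial formulation.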
Negative Binomial models were developed to estimate
accident frequencies and accident involvement on a
major principal arterial (SR 50) in Central Florida. An
extensive data collection effort was undertaken for SR 50
using two data sources: the Roadway Characteristics Inventory database and the State Accident database. The former
contains the geometric characteristics of the roadway
and the latter contains the accidents that occurred on
this roadway.
The results showed that several roadway design and
traffic factors affect the safety of an arterial. Among
those, AADT is the most critical factor. As AADT per
lane increases, accident frequency increases significantly.
Lane width and the number of lanes are the next most
important geometric parameters contributing to accident occurrence. Narrow lane width and a larger number
of lanes increase accident occurrence. Moreover, narrow shoulder width, reduced median width and a larger
degree of horizontal curve (i.e. sharper curves) are also
geometric features that increase the frequency
of accidents. Urban roadway segments have a higher
potential for accident occurrence than rural sections.
From the male and female accident involvement
models, it can be concluded that female drivers experience a higher probability of accidents than male drivers
during heavy traffic volume and with reduced median
width. Moreover, narrow lane width and a larger number
of lanes have more effect on accident involvement for
female drivers than male drivers. Male drivers have a
greater tendency to be involved in accidents while
speeding.
Young and older drivers have a higher likelihood of
accident involvement than middle aged drivers when
experiencing heavy traffic volume. There is no effect of
horizontal curve on older drivers' accident involvement. Older drivers have a greater tendency toward accident
involvement than middle aged and young drivers for
reduced shoulder and median widths. Decreasing
lane width and increasing number of lanes create more
problems for older drivers and younger drivers than
middle age drivers. Older drivers experience fewer accidents if the shoulder is paved. Also, the
likelihood of younger drivers' accident involvement increases with speeding.
This paper confirmed some of the results reached in
previous studies. Shoulder and lane widths and sharp
horizontal curves are found to affect the safety of a
roadway as in Milton and Mannering (1996), and median width is significant as in Knuiman et al. (1993).
Posted speed was also found to be a significant accident
predictor in Ivan and O'Mara (1997), while AADT was
significant in the studies by Miaou (1994) and Ivan and
O'Mara (1997). This study, however, showed that the
log of the section length and the log of the AADT are
significant explanatory variables in modeling accident
frequency or involvement. The study also showed that
the interaction between the lane width and the number
of lanes is significant, and that the type of shoulder
pavement affects the probability of accident involvement for older drivers. Speed was introduced in this
study in a different form to capture the magnitude of
speeding relative to the posted speed limit. This speeding indicator variable was shown to affect the accident
involvement of male and young drivers. The exact
degree of horizontal curve was an important variable in
all the estimated models, which might indicate that the
continuous relationship is preferred to the more restrictive categorization of this variable (Milton and
Mannering, 1996).
This paper supports conclusions from previous literature that the Negative Binomial formulation is superior
to the more restricted Poisson regression. The paper
attempted to add to the literature the dimension of
including the effect of age and gender in modeling
accident involvement. It is worth mentioning that the
estimation of a model in which a multiple vehicle
accident is represented by two or more involvements
gives rise to a possible correlation of disturbances,
which refers to variations in unobserved contributing
factors. Although it is unlikely that this problem is
present in the female, young age, or old age accident
involvement models, it is likely that it is present in the
male and middle age models of accident involvement.
An important direction for future research, which is
methodological in nature, would be to account for this
correlation. This task will not be easy because of the
complexity of the error term structure in Negative
Binomial models.

Table 6
Elasticity estimates for age accident involvement models

Variables (as extracted): Section length (km); AADT per lane; Shoulder width (m); Median width (m); Lane width (m)/no. of lanes
Columns: Elasticity (young age model); Elasticity (middle age model); Elasticity (old age model)
Values (in extracted order; signs and cell alignment lost in extraction):
0.10  0.31  0.18  0.70  0.22  0.71  0.075  0.10  0.16  0.07  0.12  0.19  0.44  0.02  0.21  0.34  0.26  0.50  0.72

Acknowledgements
The authors wish to acknowledge the comments and
suggestions of the anonymous referees. Their recommendations resulted in a substantially improved paper.
References
Abdel-Aty, M., Chen, C., Radwan, E., Brady, 1999a. Analysis of
accident-involvement trends by driver's age in Florida. ITE Journal on the Web (Feb. 1999), pp. 69–74.
Abdel-Aty, M., Chen, C., Radwan, E., 1999b. Using conditional
probabilities to explore the driver age effect in accidents. ASCE
Journal of Transportation Engineering 125(6).
Agent, K., Deen, R., 1975. Relationship between roadway geometrics
and accidents. Transportation Research Record 541, 1–11.
Agresti, A., 1990. Categorical Data Analysis. Wiley, New York.
Chen, C., 1997. Statistical Analysis of the Effect of Demographic and
Roadway Factors on Traffic Crash Involvement. M.S. thesis,
Department of Civil Engineering, University of Central Florida.
Fridstrom, L., Ifver, J., Ingebrigtsen, S., Kulmala, R., Thomsen, L.,
1995. Measuring the contribution of randomness, exposure,
weather, and daylight to the variation in road accident counts.
Accident Analysis and Prevention 27 (1), 1–20.
Ivan, J., O'Mara, P., 1997. Prediction of Traffic Accident Rates
Using Poisson Regression. Presented at the 76th Annual Meeting
of the Transportation Research Board.
Joshua, S., Garber, N., 1990. Estimating truck accident rate and
involvement using linear and Poisson regression models. Transportation Planning and Technology 15, 41–58.
Jovanis, P., Chang, H., 1986. Modeling the relationship of accidents
to miles traveled. Transportation Research Record 1068.
Knuiman, M., Council, F., Reinfurt, D., 1993. The effect of median
width on highway accident rates. Transportation Research
Record 1401, 70–80.
McGee, H., Hughes, W., Daily, K., 1995. Effect of Highway Standards on Safety. NCHRP Report 374, Transportation Research
Board.
Miaou, S., 1994. The relationship between truck accidents and geometric design of road section: Poisson versus Negative Binomial
regression. Accident Analysis and Prevention 26(4).
Miaou, S., Lum, H., 1993. Modeling vehicle accidents and highway
geometric design relationships. Accident Analysis and Prevention
25 (6), 689–709.
Miaou, S., Hu, P., Wright, T., Rathi, A., Davis, S., 1992. Relationship between truck accidents and highway geometric design: a
Poisson regression approach. Transportation Research Record
1376, 10–18.
Milton, J., Mannering, F., 1996. The Relationship Between Highway
Geometrics, Traffic Related Elements and Motor Vehicle. Washington State Dept. of Transportation.
Mostofa, H., 1998. Modeling of Traffic Accidents on Principal Arterial, M.S. thesis, Department of Civil Engineering, University of
Central Florida.
Poch, M., Mannering, F., 1996. Negative binomial analysis of intersection accident frequencies, Journal of Transportation Engineering 122(2).
Ramanathan, R., 1995. Introductory Econometrics with Applications. The Dryden Press, Fort Worth, TX.
Sabey, B., Taylor, H., 1980. The Known Risks we Run: The Highway. Supplementary Report SR 567, Transport and Road Research Laboratory, UK.
Shankar, V., Mannering, F., Barfield, W., 1995. Effect of roadway
geometric and environment factors on rural freeway accident
frequencies. Accident Analysis and Prevention 27(30).
