
ARTICLE IN PRESS

Journal of Accounting and Economics 47 (2009) 108–130

Contents lists available at ScienceDirect

Journal of Accounting and Economics


journal homepage: www.elsevier.com/locate/jae

An empirical analysis of changes in credit rating properties: Timeliness, accuracy and volatility☆

Mei Cheng*, Monica Neamtiu
The University of Arizona, Tucson, AZ 85721, USA

Article info

Article history:
Received 8 March 2007
Received in revised form 24 September 2008
Accepted 6 November 2008
Available online 25 November 2008

JEL classification:
G10
G29
G38
M41

Keywords:
Credit ratings
Rating properties
Regulatory pressure
Investor criticism

Abstract

In recent years, credit rating agencies have faced increased regulatory pressure and
investor criticism for their ratings' lack of timeliness. This study investigates whether
and how rating agencies respond to such pressure and criticism. We find that the rating
agencies not only improve rating timeliness, but also increase rating accuracy and
reduce rating volatility. Our findings support the criticism that, in the past, rating
agencies did not avail themselves of the best rating methodologies/efforts possible.
When their market power is threatened by the possibility of increased regulatory
intervention and/or reputation concerns, rating agencies respond by improving their
credit analysis.
© 2008 Elsevier B.V. All rights reserved.

I am troubled by the extreme concentration in this [credit rating] industry. Two firms control the vast majority of
market share. To put it mildly, this is not an efficient market with robust competition. Rather, it has been identified,
accurately I might add, as a duopoly, a shared monopoly, and a partner monopoly.

Michael G. Oxley, 2005, House Financial Services Committee Chairman

1. Introduction

In recent years, the nationally recognized credit rating agencies (e.g., Moody's, Standard and Poor's (S&P), and Fitch)
have faced widespread criticism for their credit ratings' lack of timeliness in predicting some high-profile bankruptcies.1
These rating agencies maintained investment-grade ratings for Enron, California utilities, and other bankrupt companies,
days before each declared bankruptcy.

☆
We thank Dan Bens, Dan Dhaliwal, Dan Givoly, Adam Kolasinski (discussant), Thomas Lys (editor), Karl Muller, Mark Soliman (referee),
K.R. Subramanyam, Bill Waller, Hal White, and seminar participants at the University of Arizona, the AAA 2007 Annual Meeting, the JAE 2007 Conference
and the FARS 2008 Meeting for their valuable comments and suggestions.
* Corresponding author.
E-mail addresses: meicheng@email.arizona.edu (M. Cheng), mis125@email.arizona.edu (M. Neamtiu).
1
Before 2001, there were only three nationally recognized rating agencies: Moody's, S&P, and Fitch. Since 2001, the SEC also granted nationally
recognized status to Dominion Bond Rating Service and A.M. Best Co.

0165-4101/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.jacceco.2008.11.001

Since the Enron debacle, some corporate financial officers and treasury professionals (Association for Financial
Professionals, 2002; Kahn, 2002) have blamed the lack of competition in an industry dominated by several major agencies
for the lack of incentives to timely respond to the needs of credit rating users.2 In addition, the oligopolistic structure of
the credit rating industry has received intense regulatory scrutiny. Both the Securities and Exchange Commission (SEC) and the
US Congress have conducted a series of hearings on the possibility of a new regulatory regime for credit rating agencies,
one that promotes better competition and increased regulatory oversight by changing the way the SEC currently designates
rating agencies as nationally recognized rating agencies.
This study investigates whether and how the nationally recognized credit rating agencies have changed the properties
of their credit ratings in response to the recent increase in regulatory pressure and investor criticism. More specifically, we
examine whether the agencies have changed the timeliness, accuracy and volatility of their credit ratings. We focus on
these three rating properties, because prior academic literature (Altman and Rijken, 2004; Löffler, 2004; Beaver et al., 2006)
and the credit rating agencies (Cantor and Mann, 2003) identify them as being important to credit rating users.
In recent years, the lack of timeliness was the most criticized and highly visible rating property in the aftermath of some
high-profile bankruptcies. Given this increased investor and regulatory scrutiny, failing to timely downgrade a debt issue to
predict a deterioration in credit quality (i.e., a type I prediction error) has become more costly to the rating agencies.3
Therefore, if the rating agencies are concerned about potential loss of reputation and additional regulatory burden, we
expect the agencies to take measures to improve rating timeliness in the post increased regulatory pressure and criticism
period compared to the pre period (hereafter, we refer to these periods as the PRE and POST periods).4 Improvements in
this highly criticized rating property can help the agencies repair reputational damage and limit, even if not completely stop,
the degree of regulatory intervention in their industry.
Conditional on finding an increase in rating timeliness, we also investigate how a shift toward improved rating
timeliness affects accuracy. While, in recent years, investors and regulators focused explicitly on rating timeliness, it is
important to recognize that timely ratings are not useful unless they are also accurate. If the agencies were
to downgrade debt issues very early (i.e., achieve improved timeliness), and in the end such downgrades proved to be
unnecessary, then the needs of credit rating users would be poorly served. Given the importance of both rating timeliness
and accuracy, the nationally recognized agencies may attempt to improve their credit analysis and thus achieve better credit
rating accuracy along with improved timeliness in the POST period.
However, ex ante, it is unclear whether it is possible to improve both rating timeliness and accuracy at the same time,
despite both of them being important to rating users. The rating agencies (e.g., Cantor and Mann, 2006) argue that there are
unavoidable trade-offs between these desirable rating properties. In fact, the agencies use the existence of such trade-offs
between rating properties as an explanation for their perceived lack of timeliness. If the agencies are right and the trade-offs
between rating properties are unavoidable then, under pressure from regulators and investors, the rating agencies may
improve rating timeliness at the expense of rating accuracy.
There are at least two ways rating agencies can trade off rating accuracy for improved timeliness. First, since in the POST period
the relative cost of failing to timely downgrade has increased considerably, rating agencies can tighten their credit standards to
make sure they do not miss any defaulting issues. This strategy will result in more timely downgrades for higher risk issues, but
less accurate (i.e., too harsh) ratings for lower risk issues. In this case, a reduction in type I prediction errors (i.e., a reduction in the
number of missed defaults) will come at the expense of an increase in type II errors (i.e., an increase in the number of false
warnings). Second, to increase timeliness, agencies can shorten the period of time spent on information collection and analysis
before any rating change decision. Such a change in rating policy will lead to more timely but less accurate ratings.
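
The first trade-off can be made concrete with a small sketch. It is only an illustration under stated assumptions: the numeric rating scale (higher number = riskier), the cutoff that separates "favorable" from "unfavorable" ratings, and the six sample issues are all hypothetical and are not taken from the study's data or its exact classification rule.

```python
from dataclasses import dataclass

# Assumption: ratings are mapped to integers, higher = riskier.
# Assumption: ratings at or below this cutoff count as "favorable".
FAVORABLE_CUTOFF = 10


@dataclass
class Issue:
    rating: int      # numeric rating held before the outcome window
    defaulted: bool  # whether the issue defaulted within the window


def error_rates(issues, cutoff=FAVORABLE_CUTOFF):
    """Return (type_i_rate, type_ii_rate).

    Type I  = missed default: favorable rating on a defaulting issue.
    Type II = false warning: unfavorable rating on a non-defaulting issue.
    """
    defaults = [i for i in issues if i.defaulted]
    survivors = [i for i in issues if not i.defaulted]
    missed = sum(1 for i in defaults if i.rating <= cutoff)
    false_warnings = sum(1 for i in survivors if i.rating > cutoff)
    type_i = missed / len(defaults) if defaults else 0.0
    type_ii = false_warnings / len(survivors) if survivors else 0.0
    return type_i, type_ii


# Hypothetical sample: three defaulting and three non-defaulting issues.
sample = [
    Issue(8, True), Issue(15, True), Issue(18, True),
    Issue(5, False), Issue(9, False), Issue(12, False),
]

baseline = error_rates(sample)            # cutoff = 10
tightened = error_rates(sample, cutoff=7)  # stricter notion of "favorable"
print("baseline  type I, type II:", baseline)
print("tightened type I, type II:", tightened)
```

In this toy sample, moving to the stricter cutoff (a crude stand-in for tighter credit standards) removes the one missed default but turns an additional non-defaulting issue into a false warning, which is the pattern the first trade-off describes.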
Shortening the information collection period and reacting faster to new information (for example, reacting to
information revealed in daily stock prices) can affect rating volatility as well. By its nature, a borrower's credit risk changes
only gradually over time. Therefore, the long-term credit risk implications of new information can be fully understood only
over time, when new clarifying facts become available. If the agencies decide to shorten the period of time they wait for
confirmatory/clarifying information before changing a rating, then they will have to reverse some rating changes when
additional information invalidates the initial change. Such a decision would result in more timely, but more volatile credit
ratings. Ceteris paribus, high volatility in credit ratings is not desirable (Beaver et al., 2006), since the use of credit ratings
for contracting (e.g., the use of ratings for regulatory purposes and for portfolio governance rules for institutional investors)
makes volatile ratings and unexpected rating reversals costly for the contracting parties.5

2
The credit rating industry is dominated by two rating agencies, Moody's and S&P, with a market share of about 80% of the rating business. Moody's
operates with profit margins of more than 50%, according to testimony by Glenn Reynolds, chief executive of Credit Sights, an independent research group.
He estimates profit margins at S&P (a segment of McGraw-Hill) to be more than 40% (Hughes, 2006).
3
In this study, we define type I errors as missed defaults (i.e., instances where the rating agencies assign/maintain favorable ratings to defaulting
issues) and type II errors as false warnings (i.e., instances where the rating agencies assign/maintain unfavorable ratings to non-defaulting issues).
4
We define the pre increased criticism and regulatory pressure period as the period between January 1, 1996 and July 25, 2002. We define the post
increased criticism and regulatory pressure period as the period after July 25, 2002 until December 31, 2005, the end of our sample period (see Section 4
and Fig. 1 for a more detailed explanation).
5
To mitigate agency problems between investors in debt mutual funds and the fund managers, many funds include portfolio governance rules that
require the fund managers to hold only debt issues with credit ratings above a certain threshold. Therefore, volatile and unexpected rating changes would
force the managers to trade at inopportune times. In addition, frequent rating reversals over short periods of time would cause some institutional
investors to sell and then repurchase the same debt securities with high frequency, imposing large transaction costs.

Our empirical analysis focuses on identifying changes in the rating properties between the PRE and POST periods.6 To
detect changes in rating timeliness, we analyze the timeliness of credit ratings with respect to the event of default, since, by
definition, ratings are intended to capture the relative default probabilities and the expected loss recovery in case of
default. First, we conduct pre and post comparisons of the average rating levels at several time points during the year
leading to the default date. Second, our multivariate timeliness tests compare the number of days between downgrade
dates and corresponding default dates and the weighted average rating levels over the 1-year period leading to default. We
find that the nationally recognized rating agencies downgrade defaulting bonds earlier and assign significantly
closer-to-default (i.e., more timely) ratings in the POST period relative to the PRE period.
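
The two timeliness quantities mentioned above can be sketched as follows. This is a minimal illustration, not the paper's estimation procedure: the function names, the numeric rating scale (higher number = closer to default), the day-weighting scheme, and the sample rating history are assumptions made for the example.

```python
from datetime import date, timedelta


def days_ahead_of_default(downgrade_date, default_date):
    """Number of days by which a downgrade precedes the default."""
    return (default_date - downgrade_date).days


def weighted_average_rating(history, default_date, window_days=365):
    """Day-weighted average rating over the window before default.

    history: list of (effective_date, numeric_rating) sorted by date; each
    rating is assumed to stay in force until the next entry.
    Returns None if no rating is in force during the window.
    """
    window_start = default_date - timedelta(days=window_days)
    weighted_sum = 0.0
    covered_days = 0
    for idx, (effective, rating) in enumerate(history):
        next_change = history[idx + 1][0] if idx + 1 < len(history) else default_date
        seg_start = max(effective, window_start)
        seg_end = min(next_change, default_date)
        if seg_end > seg_start:
            days = (seg_end - seg_start).days
            weighted_sum += rating * days
            covered_days += days
    return weighted_sum / covered_days if covered_days else None


# Hypothetical issue that defaults on 31 December 2004.
default_day = date(2004, 12, 31)
history = [
    (date(2003, 1, 1), 12),   # rating in force when the window opens
    (date(2004, 7, 2), 16),   # first downgrade inside the window
    (date(2004, 11, 1), 20),  # second downgrade
]

lead_days = days_ahead_of_default(date(2004, 7, 2), default_day)
avg_rating = weighted_average_rating(history, default_day)
print(lead_days, round(avg_rating, 2))
```

Under this scale, a larger lead time and a higher weighted average rating over the pre-default year both correspond to more timely ratings.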
More importantly, we also find that the improved rating timeliness is accompanied by an increase in credit rating
accuracy. To examine potential changes in accuracy in the PRE and POST periods, we focus on the frequency of types I and II
prediction errors. We find that both types I and II errors decrease significantly, suggesting improved rating accuracy in the
POST period. To further confirm our improved accuracy conclusion, we examine two additional accuracy measures based
on a comprehensive sample of default and non-default debt issues. First, following Moody's methodology of measuring
relative credit rating accuracy, we use a comprehensive sample of all rated issues and plot cumulative accuracy profiles (or
power curves), to assess the power of credit ratings to correctly rank order debt issues according to their relative credit risk.
Second, we benchmark agency issued rating downgrades (upgrades) to increases (decreases) in the probability of
bankruptcy indicated by bankruptcy prediction models. We use both a market-based model proposed by Hillegeist et al.
(2004) and the Altman Z score (Altman, 1968) to compute changes in the probability of bankruptcy and find that when the
benchmark models indicate a change in the default risk, the rating agencies issue credit rating changes for a significantly
larger percentage of firms in the POST period relative to the PRE period. In conclusion, both our additional accuracy
measures indicate improved rating accuracy following the regulatory pressure and investor criticism period.
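
A cumulative accuracy profile of the kind referred to above can be sketched in a few lines. The sketch is generic and is not Moody's or the authors' implementation: the function names, the numeric scale (higher number = riskier), the arbitrary handling of tied ratings, and the six sample issues are assumptions for illustration.

```python
def cap_curve(ratings, defaulted):
    """Return CAP points [(share_of_issues, share_of_defaults_captured), ...].

    Issues are ordered from riskiest to safest rating; each point gives the
    share of all defaults found among the riskiest share of issues.
    Ties between equal ratings are broken by input order (a simplification).
    """
    total_defaults = sum(defaulted)
    if total_defaults == 0:
        raise ValueError("at least one default is needed to draw a CAP curve")
    order = sorted(range(len(ratings)), key=lambda i: ratings[i], reverse=True)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += defaulted[i]
        points.append((rank / len(ratings), captured / total_defaults))
    return points


def area_under(points):
    """Trapezoidal area under the curve; the diagonal (chance ranking) has area 0.5."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical sample of six rated issues, three of which default.
ratings = [20, 18, 15, 10, 8, 5]
defaulted = [True, True, False, True, False, False]

points = cap_curve(ratings, defaulted)
area = area_under(points)
print(points)
print(round(area, 3))
```

The further the curve bows above the diagonal (the larger the area), the better the ratings rank order issues by default risk, so comparing curves across two periods gives a relative accuracy comparison.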
We also find a reduction in rating volatility, measured as the standard deviation of credit rating levels, in the POST
period. These empirical findings indicate that the agencies have been able to improve both rating timeliness and accuracy
without causing an increase in rating volatility, consistent with the conclusion that the nationally recognized rating
agencies have improved their credit analysis.
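
As an illustration of the volatility measure, the sketch below computes the standard deviation of an issue's rating levels and, as a companion statistic, counts rating reversals. The numeric scale, the two sample rating paths, the choice of the population standard deviation, and the reversal count are assumptions for the example rather than the study's exact specification.

```python
from statistics import pstdev


def rating_volatility(levels):
    """Population standard deviation of one issue's numeric rating levels."""
    return pstdev(levels)


def count_reversals(levels):
    """Count rating changes whose direction is opposite to the previous change."""
    changes = [b - a for a, b in zip(levels, levels[1:]) if b != a]
    return sum(1 for prev, cur in zip(changes, changes[1:]) if prev * cur < 0)


# Hypothetical rating paths observed at equally spaced dates.
gradual = [10, 10, 11, 11, 12, 12]   # steady deterioration, no reversals
whipsaw = [10, 12, 10, 12, 10, 12]   # repeated downgrades and upgrades

for name, path in (("gradual", gradual), ("whipsaw", whipsaw)):
    print(name, round(rating_volatility(path), 3), count_reversals(path))
```

The second path has both a higher standard deviation and several reversals, which is the kind of pattern that imposes costs on parties whose contracts are tied to rating levels.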
Our study contributes to the credit rating literature since, to our knowledge, this is the first study to explicitly examine
the impact of recent changes in the operational environment of nationally recognized credit rating agencies on the
properties of their credit ratings. Understanding the properties of credit ratings is important, given that credit ratings play a
significant and increasingly important role in borrowers' access to capital and in federal and state legislation (Fabozzi,
2001; Sinclair, 2003; Securities and Exchange Commission (SEC), 2003). Our findings show that, when faced with the
possibility of increased regulatory oversight, and/or concerns about their reputation, rating agencies increase the quality of
their credit analysis.
In addition, our study contributes to the literature that explains the lack of timeliness of the credit ratings issued by the
nationally recognized rating agencies. Recent academic literature (Beaver et al., 2006; Altman and Rijken, 2004; Löffler,
2004) and the nationally recognized rating agencies themselves attribute the lack of timeliness in credit ratings to the fact
that there are unavoidable trade-offs between rating properties (i.e., the rating agencies have to sacrifice some accuracy to
achieve more timely ratings). Critics of the agencies, however, attribute the lack of rating timeliness to the market power
and lack of competition enjoyed by the nationally recognized agencies. Since we find that the agencies have been able to
improve rating timeliness without sacrificing rating accuracy and volatility, our results suggest that the trade-off between
desirable rating properties cannot be a complete explanation of credit ratings' lack of timeliness. In response to the threat
of regulatory intervention, rating agencies improve their credit analysis.
The remainder of this study is organized as follows. The next section provides background information on credit rating
agencies. Sections 3 and 4 present the hypothesis development and sample selection. Section 5 describes our research
design and empirical findings. Section 6 concludes the study.

2. Institutional background

Despite the large size of the debt markets, the credit rating industry has long been dominated by a handful of companies
designated as nationally recognized by the SEC. The term Nationally Recognized Statistical Ratings Organizations was
originally adopted by the SEC in 1975 solely for determining capital requirements for certain security brokers and dealers.
Over time, as capital markets and regulators relied more heavily on credit ratings, the use of the nationally recognized
certification became more widespread. Today, the ratings issued by nationally recognized agencies are used not only for
valuation purposes, but also in federal and state legislation, in capital adequacy rules issued by bank regulators and other
regulators, and in corporate debt contracts, while ratings issued by non-nationally recognized agencies are only used for

6
In this study, we focus on two rating property trade-offs: timeliness vs. accuracy and timeliness vs. volatility. Since we are interested in the effect of
increased criticism and scrutiny on rating properties, we start our analysis with the most criticized aspect of credit ratings: timeliness. Then, we
investigate how changes in rating timeliness interact with changes in accuracy and volatility. We acknowledge that, in reality, the interaction between
rating properties may be more complex than depicted in this study. For example, there could also be a trade-off between rating volatility and accuracy. For
the purpose of keeping our analysis at a manageable level, we do not directly investigate the trade-off between volatility and accuracy. We believe that
even if a trade-off between rating volatility and accuracy exists, it would not change or invalidate our current analysis.

valuation (Beaver et al., 2006). Thus, having the nationally recognized status gives a rating agency substantial influence
compared to non-nationally recognized rating agencies. Despite its importance, the term nationally recognized is not
defined anywhere in the regulations, and the criteria a rating agency must meet to qualify as a nationally recognized
agency are not explicitly specified. At the time of the Enron and WorldCom scandals, the SEC granted the nationally
recognized status to only three agencies out of about 130 agencies in the US.
In recent years, the idea of a new regulatory regime for credit rating agencies has gained considerable momentum
(see Fig. 1A for a timeline of the events discussed below). After Enron and WorldCom's filing for Chapter 11 (on December 2,
2001 and July 21, 2002, respectively), as part of an overall effort to restore investors' confidence in capital markets, Section
702 (b) of the Sarbanes-Oxley Act (passed on July 25, 2002 by both the Senate and the House) requires the SEC to carefully
study the role and function of credit rating agencies in the operation of securities markets. In response to the requirements
of Section 702 (b), the SEC has issued a series of reports (starting with a preliminary report on January 26, 2003) on the role
of credit rating agencies as financial intermediaries. In addition, the US Congress has conducted a series of hearings
(starting on April 2, 2003) focusing on the structure of the credit rating industry. At least two key proposals have emerged from
this review process. One proposal deals with reforms in the process by which the SEC certifies rating agencies as nationally
recognized, to facilitate the recognition of more rating agencies and thereby introduce competition in the industry. A
second proposal focuses on an increase in the level of regulatory oversight applied to credit rating agencies, which,
historically, have been self-regulated.
This lengthy regulatory review process has led to the adoption of the Credit Rating Agency Reform Act of 2006
(signed into law by the President on September 29, 2006). This act removes the SEC from the process of approving rating
agencies as nationally recognized. It replaces the SEC's opaque designation process with a system that allows any rating
agency that has 3 years of experience and meets certain quality standards to register with the SEC as a nationally
recognized rating agency. The act also increases oversight of credit rating agencies by bringing them under federal
regulation by the SEC.

[Fig. 1: timeline graphic not reproduced; the dates below are recovered from the figure labels.]
(A) Dec 2, 2001: Enron; Jul 21, 2002: WorldCom; Jul 25, 2002: SOX; Jan 26, 2003: SEC review; Apr 2, 2003: Congress hearing; Sep 29, 2006: bill passage.
(B) Ratings: PRE period (Jan 1, 1996 to Jul 25, 2002), POST period (Jul 25, 2002 to Dec 31, 2005). Defaults: PRE period (Jan 1, 1997 to Jul 25, 2002), deleted (Jul 25, 2002 to Jul 25, 2003), POST period (Jul 25, 2003 to Dec 31, 2005).

Fig. 1. (A) Presents the timeline of regulatory pressure faced by the nationally recognized credit rating agencies. December 2, 2001 and July 21, 2002 are
the dates when Enron and WorldCom, respectively, filed for Chapter 11 bankruptcy. July 25, 2002 is the date when both the Senate and the House passed
the Sarbanes-Oxley Act. Section 702 (b) of SOX requires the SEC to study the function of rating agencies. On January 26, 2003, the SEC issued a first report
on the role of rating agencies in the capital markets. On April 2, 2003, Congress started its series of hearings on the credit rating industry. On
September 29, 2006, the President signed the Credit Rating Agency Reform Act of 2006 into law. (B) Presents our default sample timeline. Since the SOX
date (i.e., July 25, 2002) is the first official date requiring regulatory investigation, we classify ratings before (after) this date as ratings in the PRE (POST)
period. The PRE and POST periods for ratings and defaults are not perfectly aligned in time. The default timeline starts a year later than the ratings
timeline because our tests use rating downgrades 1 year before default. We exclude all defaults occurring between July 25, 2002 and July 25, 2003 to
ensure that we eliminate contaminated observations where defaults happen in the POST period, but the ratings leading up to the defaults are issued in
the PRE period.
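
The period assignment described in the caption can be written as a small classification rule. This is a sketch under assumptions: the function and constant names are hypothetical, and the treatment of observations falling exactly on a boundary date (here, July 25, 2002 counts as PRE and July 25, 2003 as excluded) is a choice made for the example, since the text does not spell out boundary handling.

```python
from datetime import date

RATING_SAMPLE_START = date(1996, 1, 1)
DEFAULT_SAMPLE_START = date(1997, 1, 1)   # one year later than the ratings timeline
SOX_DATE = date(2002, 7, 25)
EXCLUSION_END = date(2003, 7, 25)         # end of the deleted default window
SAMPLE_END = date(2005, 12, 31)


def classify_rating(d):
    """Return 'PRE', 'POST', or None if the rating date is outside the sample."""
    if d < RATING_SAMPLE_START or d > SAMPLE_END:
        return None
    return "PRE" if d <= SOX_DATE else "POST"  # boundary handling is an assumption


def classify_default(d):
    """Return 'PRE', 'EXCLUDED', 'POST', or None if outside the sample."""
    if d < DEFAULT_SAMPLE_START or d > SAMPLE_END:
        return None
    if d <= SOX_DATE:
        return "PRE"
    if d <= EXCLUSION_END:
        return "EXCLUDED"  # ratings in the year before such defaults may predate SOX
    return "POST"


print(classify_rating(date(2001, 12, 2)))    # a rating dated on the Enron filing date
print(classify_default(date(2002, 12, 1)))   # a default inside the deleted window
print(classify_default(date(2004, 1, 1)))
```

The excluded window exists because a default shortly after the SOX date would be paired with ratings issued in the PRE period, which would blur the comparison between the two periods.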

3. Hypothesis development

3.1. Does rating timeliness improve?

The use of credit ratings in valuation (e.g., selecting investment opportunities in the capital markets) and in assessing
credit worthiness by third parties such as regulators, suppliers, customers and employees leads to demand for rating
timeliness (i.e., demand for providing early signals of changes in borrowers' credit risk). In recent years, the nationally
recognized rating agencies have faced intense investor criticism and reputation concerns with respect to the timeliness of
their ratings. A survey prepared by the Association for Financial Professionals in 2002 shows that many corporate treasurers
and financial officers believe that rating agencies are slow in responding to changes in corporate credit quality. Many
investors (Ip, 2002; Tafara, 2005) believe that the nationally recognized rating agencies do not face enough competition
and as a result, do not have incentives to promptly respond to rating users' needs. Given this increased investor scrutiny
with respect to rating timeliness, additional failures to timely warn market participants about deteriorations in credit
quality could inflict considerable damage to credit rating agencies' reputation and thus, could be costly to the agencies.
Rating agencies' lack of timeliness in predicting some high-profile bankruptcies also attracted regulatory scrutiny. In
particular, regulators have questioned whether the nationally recognized agencies were thorough in their review of some
bankrupt companies' public filings and whether they probed opaque financial disclosures and aggressive accounting
practices (Securities and Exchange Commission (SEC), 2003). The regulatory review process (started in the aftermath of some
highly visible bankruptcies and the SOX passage period) has centered on the possibility of a new regulatory regime for
rating agencies meant to improve the regulatory oversight of rating agencies and foster more competition. During the
entire regulatory scrutiny period, the nationally recognized rating agencies have strongly opposed and lobbied against
more regulation of their industry (Barlas, 2005; Partnoy, 2006; Shaw and Reason, 2006), on the grounds that the regulators
would second-guess their credit risk assessments and rating methodologies and, thus, compromise their independence,
which is crucial to the credibility of credit ratings. In this context of increased regulatory scrutiny, additional instances of
missed defaults (i.e., type I errors) could increase the risk and/or the extent of regulatory intervention in the credit rating
industry and thus, be costly to the rating agencies.
Since the increased investor criticism and regulatory scrutiny make it more costly to rating agencies to fail to timely
predict deteriorations in credit quality (i.e., the cost of type I errors has increased), we hypothesize that, following the
regulatory pressure and criticism period, the nationally recognized credit rating agencies improve the most criticized
aspect of their ratings, i.e., credit rating timeliness.7 Therefore, our first hypothesis (stated in alternative form) is:

H1. In the post increased regulatory pressure and criticism period, the nationally recognized rating agencies improve the
timeliness of their credit ratings.

Whether the nationally recognized rating agencies will improve the timeliness of their credit ratings is an open question.
Given the oligopolistic structure of the industry and the large market share the nationally recognized agencies enjoy, it is
unclear whether concerns about reputation and/or regulation are strong enough to make these agencies act.8 In addition, if
the agencies are right and there are unavoidable trade-offs between different desirable properties of credit ratings, they
may not significantly improve timeliness, because they do not want to sacrifice rating accuracy.

3.2. How does rating accuracy change?

Conditional on finding an improvement in the timeliness of credit ratings in the POST period, we examine how such an
improvement affects rating accuracy. According to the rating agencies, there are unavoidable trade-offs between desirable
rating properties. If this is the case, we expect a potential improvement in timeliness of rating downgrades to come at the
expense of rating accuracy.
Given that the cost of failing to downgrade a high risk debt issue, when in fact a downgrade is warranted, has increased
in the POST period (i.e., the cost of type I errors has increased), the agencies have great incentives to avoid cases where they
assign/maintain favorable ratings for defaulting issues. If the rating agencies have decided to tighten the rating standards to
avoid missed defaults, we expect to observe more timely downgrades for defaulting issues and thus, a lower frequency of
type I prediction errors.9 However, tighter standards can also lead to unnecessarily harsh ratings for some lower risk issues
and thus, to an increase in the frequency of type II errors.

7
Based on the anecdotal evidence about their lobbying activities, we infer that the rating agencies are concerned about regulatory intervention in their
industry. Even if the agencies are not sure they can completely preempt regulatory intervention by improving rating timeliness, they may try to limit the
extent of regulatory intervention.
8
We thank the referee for pointing this possibility out.
9
Rating accuracy may also be sacrificed, if rating agencies decide to shorten the period of time allocated to information collection and analysis before
any rating change decision. In an attempt to provide more timely warnings to market participants, credit ratings agencies may make their rating decisions
faster. Such a rating methodology may lead to reduced accuracy, if rating decisions are made based on less available information and/or less credible
information.

If the rating agencies are right, and timeliness cannot be improved without sacrificing accuracy, we expect to observe a
trade-off between rating timeliness and accuracy, i.e., a move from point B′ to point A′ along the timeliness-accuracy
frontier in Fig. 2A. Therefore, our hypothesis (in alternative form) is:

H2(a). In the post increased regulatory pressure and criticism period, the nationally recognized rating agencies achieve an
improvement in rating timeliness by sacrificing rating accuracy.

However, the existence of a trade-off between rating timeliness and accuracy in the POST period is not necessarily
unavoidable. If concerns over loss of reputation and potential additional regulatory intervention have motivated rating
agencies to intensify their efforts and improve their analysis, they would better discriminate between defaulting and
non-defaulting issues. Thus, the agencies can target more precisely those debt issues that are closer to default and a tightening
of rating standards for all issues would not be necessary. In this case, we expect an improvement in rating timeliness for
higher risk debt issues without an increase in the frequency of false warnings (i.e., the frequency of type II errors) for lower
risk issues. A move from point C′ to point D′ in Fig. 2B illustrates a case where the nationally recognized agencies are able to
expand the timeliness-accuracy frontier and improve timeliness without reducing accuracy.10 Note that an improvement in
rating timeliness without a decrease in rating accuracy implies that there used to be some slack in the timeliness-accuracy
frontier in the PRE period (i.e., the timeliness-accuracy frontier was not binding) and the agencies are able to step up and
eliminate some of this slack in the POST period. When their market power is threatened by regulatory intervention and/or
by reputation concerns, rating agencies could improve their credit analysis by increasing the scope and frequency of their
monitoring, intensifying their analysis of different credit risk factors, spending more resources on collecting data, hiring
more qualified personnel, and so forth.11,12 These improvements would lead to increased ability to detect changes in the
long-term credit risk of the debt issues, in a more timely and accurate manner. Therefore, our hypothesis (in alternative
form) is:

H2(b). In the post increased regulatory pressure and criticism period, the nationally recognized rating agencies are able to
achieve an improvement in rating timeliness without sacrificing rating accuracy.

3.3. How does rating volatility change?

We also examine how an improvement in rating timeliness affects rating volatility. On the one hand, a potential shift
towards more timely rating changes can be achieved at the expense of increased volatility. This trade-off exists because, by
its nature, information about borrowers' credit risk is revealed and can be processed only gradually over time. When first
disclosed, each new piece of information suggests a potential change in credit quality. However, the long-term implications
of such new information can be fully understood only over time, as more clarifying information becomes available. If the
agencies wait longer for additional confirmatory information to become available, then rating changes better reflect
long-term trends in borrowers' credit risk and the probability of future rating reversals decreases. In this case, however, rating
timeliness suffers. If in the POST period the agencies react faster to new information about potential changes in borrowers'
credit quality (for example, make rating changes based on information revealed in daily stock prices), we expect rating
timeliness to increase, but rating volatility will also increase. Our hypothesis (in alternative form) is:

H3(a). In the post increased regulatory pressure and criticism period, the nationally recognized rating agencies achieve an
improvement in rating timeliness at the expense of increased rating volatility.

On the other hand, better rating timeliness in the POST period does not necessarily come at the expense of increased
volatility, if the nationally recognized agencies are able to expand the timeliness-volatility frontier through improved credit
analysis. Our hypothesis (in alternative form) is:

H3(b). In the post increased regulatory pressure and criticism period, the nationally recognized rating agencies are able to
achieve an improvement in rating timeliness without increasing rating volatility.

10
The two ways of improving timeliness are not mutually exclusive. The agencies could improve timeliness by expanding the timeliness-accuracy
frontier and at the same time putting more weight on timeliness at the expense of accuracy. In Fig. 2B, a move from point C′ to anywhere between points
X′ and Y′ on the expanded frontier indicates the coexistence of the two ways of improving timeliness. A move from point C′ to anywhere between points Y′
and Z′ indicates an improvement in timeliness without sacrificing accuracy. Given that rating agencies acknowledge that both timeliness and accuracy are
important, we are ultimately interested in examining whether the agencies are able to push out the trade-off frontier far enough so that they can
achieve the desired improvement in timeliness without sacrificing accuracy.
11
Besides improved credit analysis, another possible explanation for the expanded frontier is that better quality accounting information is available
to rating agencies in the POST period. See Section 5.4.1 for further discussion of this point.
12
The nationally recognized agencies have announced some internal reform measures meant to improve credit analysis. For example, S&P has
compiled a study of financial statement disclosure issues for the S&P 500 companies, and it has also announced measures to strengthen its in-house
accounting expertise to begin addressing low quality disclosure concerns. Moody's has announced that it invested in technology to obtain and extract the
information content of current market data, and it substantially intensified its liquidity risk analysis.
[Fig. 2: two panels, each plotting timeliness (vertical axis) against accuracy (horizontal axis). Panel A, "The timeliness-accuracy frontier": a shift from point A to point B improves accuracy at the expense of timeliness; a shift from point B to point A improves timeliness at the expense of accuracy. Panel B, "The expanded timeliness-accuracy frontier" (Pre and Post frontiers): point D′ is clearly preferable to C′ along both the timeliness and accuracy dimensions if point D′ falls anywhere between Y′ and Z′ on the expanded frontier.]

Fig. 2. (A) Presents an example of a trade-off between timeliness and accuracy along an existing frontier. (B) Shows how both timeliness and accuracy can improve when the trade-off frontier is expanded
(i.e., a move from point C′ to anywhere between points Y′ and Z′).

4. Sample selection

Since we focus on several rating properties, our sample composition changes for different empirical tests designed
to assess changes in credit rating properties. Some of our empirical tests are based on a sample of defaulting debt issues
(e.g., the timeliness tests), while others (e.g., the types I and II error tests) use a comprehensive sample of defaulting and
non-defaulting rated bonds. Appendix A provides more detailed information about our sample composition for various
empirical tests. We use the Mergent Fixed Investment Securities Database (FISD) to obtain both a sample of defaulted bond
issues with default dates between 1997 and 2005 and a comprehensive sample of all rated debt issues. For these debt
issues, we collect from FISD information about credit rating changes issued between 1996 and 2005.13 We also collect from
FISD: (1) default information (e.g., default date and default type); (2) credit rating information (e.g., credit rating levels and
rating change dates); and (3) issue-specific information (e.g., offering amount, maturity, seniority level, etc.). We collect
information about issuer-specific characteristics (e.g., quarterly total assets, quarterly debt-to-equity ratio, quarterly
interest coverage ratio, etc.) from Compustat.
Our sample includes credit ratings issued by the three nationally recognized agencies covered by FISD: Moody's, S&P,
and Fitch. Two other rating agencies that are currently certified by the SEC as nationally recognized (Dominion Bond Rating
Service and A.M. Best Co.) were certified only during the later part of our sample period and have very small market shares
compared to the three agencies we examine. Therefore, for the purpose of this study, we focus exclusively on Moody's, S&P,
and Fitch. Each debt issue in our default and comprehensive samples is rated by at least one of the three rating agencies,
but it can also be rated by more than one agency. In the latter case, we treat ratings issued by different agencies as separate
observations.
The requirement that borrowers have financial information on Compustat and issue characteristics information on FISD
reduces the size of our sample.14 To test the sensitivity of our results to Compustat and FISD data restrictions, we perform
our univariate analysis using both larger samples of all the observations with available ratings and restricted samples
consisting of observations that have information available for all other control variables used in our multivariate analysis.
Fig. 1B presents the timeline for our default sample selection procedure. All the rating changes between January 1, 1996
and July 25, 2002 (between July 25, 2002 and December 31, 2005) are classified as being issued in the PRE (POST) period.
The defaults that take place between January 1, 1997 and July 25, 2002 (between July 25, 2003 and December 31, 2005) are
classified as PRE (POST) period defaults. Note that the PRE and POST periods for ratings and defaults are not perfectly
aligned in time. The default timeline starts a year later than the ratings timeline because our tests use rating changes issued
in the interval (−360, 0), where day 0 is the default date. We exclude all the defaults occurring between July 25, 2002 and
July 25, 2003 to ensure that our PRE and POST periods do not overlap. This condition eliminates
contaminated observations where defaults happen in the POST period, but the ratings leading up to the default are issued
in the PRE period.
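As an illustration only, the period assignment described above can be sketched in Python. The function and constant names are hypothetical, and the treatment of observations that fall exactly on a boundary date is an assumption: a rating change or default dated July 25, 2002 is treated as PRE, and a default dated July 25, 2003 is treated as POST.

```python
from datetime import date
from typing import Optional

# Dates are taken from the text; boundary-day handling is an assumption.
CUTOFF = date(2002, 7, 25)               # separates the PRE and POST periods
RATINGS_START = date(1996, 1, 1)         # first rating changes in the sample
DEFAULTS_START = date(1997, 1, 1)        # first defaults in the sample
POST_DEFAULTS_START = date(2003, 7, 25)  # defaults in the year after CUTOFF are dropped
SAMPLE_END = date(2005, 12, 31)


def classify_rating_change(change_date: date) -> Optional[str]:
    """Return 'PRE', 'POST', or None if the date is outside the sample period."""
    if RATINGS_START <= change_date <= CUTOFF:
        return "PRE"
    if CUTOFF < change_date <= SAMPLE_END:
        return "POST"
    return None


def classify_default(default_date: date) -> Optional[str]:
    """Return 'PRE', 'POST', or None (outside the sample or in the excluded year)."""
    if DEFAULTS_START <= default_date <= CUTOFF:
        return "PRE"
    if POST_DEFAULTS_START <= default_date <= SAMPLE_END:
        return "POST"
    # Defaults after the cutoff but before July 25, 2003 are excluded because
    # the ratings in their 360-day look-back window may have been issued in PRE.
    return None
```

Under these assumptions, a default on January 15, 2003 is dropped, since part of its 360-day look-back window lies before the cutoff.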
For the comprehensive (default and non-default) sample, instead of using the default dates as a reference point in our
variable definition, we measure our variables over yearly rolling windows starting on July 25th of each year in our sample.
We choose July 25th as our yearly cut-off point because July 25, 2002 is the date that separates our PRE and POST periods.
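A minimal sketch of how an observation date could be mapped to its yearly window is given below. The function name is hypothetical, and the choice to treat each window as running from July 25 of one year up to, but not including, July 25 of the next is an assumption rather than something the text specifies.

```python
from datetime import date


def window_start_year(obs_date: date) -> int:
    """Return the year Y of the window [July 25 of Y, July 25 of Y+1) containing obs_date."""
    # Tuple comparison orders dates within a year by (month, day).
    if (obs_date.month, obs_date.day) >= (7, 25):
        return obs_date.year
    return obs_date.year - 1
```

For example, an observation dated January 10, 2003 falls in the window that starts on July 25, 2002.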

5. Empirical analysis

5.1. Empirical tests and results for rating timeliness

5.1.1. Empirical tests for rating levels


To assess credit rating timeliness we compare rating levels at different time points (in the year before default) in the PRE
period and the POST period. As presented in Fig. 3, we find that rating agencies assign higher (i.e., closer to default) ratings
at −270, −180, −90, and −30 days to defaulting issues in the POST period compared to the PRE period.15 Fig. 3 also shows
that, as defaulting issues get closer to the default date, rating agencies assign closer and closer to default (i.e., worse)
ratings. Untabulated t-tests show that the average rating levels at −270, −180, −90, and −30 days are significantly higher
(i.e., closer to default) in the POST period than in the PRE period. Such a pattern is consistent with the conclusion that rating
timeliness has improved for the default sample. In a robustness test, we also plot rating levels 2 years back relative to the
default date (as opposed to just 1 year). The results are very similar to the results of our main analysis.
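The rule in footnote 15 (for each issue, take the most recent rating inside the window from day −360 to the cut-off day, then average across issues) can be sketched as follows. This is an illustration under stated assumptions: the function names and the input layout are hypothetical, both window endpoints are treated as inclusive, and issues with no rating in the window are skipped.

```python
from typing import List, Optional, Tuple

# One rating observation: (day relative to default, numeric rating code).
Rating = Tuple[int, int]


def rating_at_cutoff(ratings: List[Rating], cutoff: int,
                     window_start: int = -360) -> Optional[int]:
    """Most recent rating code issued in [window_start, cutoff], or None."""
    eligible = [(day, code) for day, code in ratings
                if window_start <= day <= cutoff]
    if not eligible:
        return None
    # The latest day in the window is the rating outstanding at the cut-off.
    return max(eligible, key=lambda r: r[0])[1]


def mean_rating_at_cutoff(issues: List[List[Rating]],
                          cutoff: int) -> Optional[float]:
    """Average the cut-off rating over issues that have one."""
    codes = [c for c in (rating_at_cutoff(r, cutoff) for r in issues)
             if c is not None]
    return sum(codes) / len(codes) if codes else None
```

With a hypothetical issue rated 14 on day −300, 17 on day −120 and 20 on day −20, the rating used at the −90 cut-off is 17.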

5.1.2. Empirical tests for rating downgrades


To test H1, we also examine changes in the timeliness of credit rating downgrades between the PRE and POST periods,
after controlling for other determinants of rating timeliness. Our first measure of timeliness is DAHEAD, a variable defined

13
Our sample period for credit ratings starts in 1996 because FISD provides very limited coverage of credit ratings prior to this year. Out of the total
number of credit rating changes included in the FISD database, less than 1% are rating changes issued before 1996.
14
When we merge our sample of FISD ratings with Compustat, we lose a number of observations because FISD covers bonds issued by private firms
that are not covered by Compustat.
15
All our cutoff points (i.e., −270, −180, −90, etc.) are measured relative to day 0, the default date. To compute an average rating level at each cut-off
date, for each issue, we use the most recent (i.e., closest to the cut-off date) rating that falls within the window (−360, cutoff point).
[Fig. 3 plot: "Rating Levels before Default - Moody's, S&P and Fitch". Series: Mean Rating Levels Pre and Mean Rating Levels Post. Vertical axis: Rating Levels (10 to 20). Horizontal axis: Days till Default (−270, −180, −90, −30, 0).]

Fig. 3. Presents mean rating levels at different time points in the 1-year period leading to the default date (where the default date is day 0) in PRE and
POST periods. The higher a rating level assigned to a debt issue, the riskier (i.e., closer to default) this issue is considered. Untabulated tests show that in
the 1-year period leading to default, rating agencies assign significantly higher (i.e., worse) ratings in the POST period than in the PRE period. See
Appendix B for the rating coding schemes.

as the number of days between the downgrade date and the corresponding default date for each debt issue by each rating
agency. For example, if a defaulting debt issue gets downgraded by a certain rating agency 180 days before default, then
DAHEAD takes a value of −180. One issue can have more than one downgrade in the window of interest, in which case it
will have more than one DAHEAD value.16 Our second measure is WRATE, the weighted average rating level, defined as the
sum of each rating level outstanding over the 1-year period leading to default multiplied by the number of days that level
has been outstanding, scaled by 360. For example, for a defaulting bond that is rated 18 at day −360 and is then
downgraded to 20 on day −100 (without any other rating change over the 1 year leading to default), rating level 18 is
outstanding for 260 days (from day −360 to day −100) and rating level 20 is outstanding for 100 days (from day −100 to
day 0). For this debt issue, our WRATE measure equals 18.56 (computed as follows: (18 × 260 days + 20 × 100 days)/360 days).
Unlike our first timeliness measure (DAHEAD), WRATE takes into account not only the timing of rating downgrades, but
also the magnitude and the pattern of rating changes. Thus, WRATE captures at the same time the timeliness and accuracy
of such changes. To test for changes in rating timeliness, we estimate the following regression:
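$$
\begin{aligned}
\text{TIMELINESS} ={}& a_0 + a_1 \times \text{POST} + a_2 \times \text{SP\_RATING} + a_3 \times \text{FT\_RATING} + a_4 \times \text{DTYPE} \\
&+ a_5 \times \text{QLOGASSET} + a_6 \times \text{QCOVER} + a_7 \times \text{QDE} + a_8 \times \text{FRAUD} + a_9 \times \text{SIZE} \\
&+ a_{10} \times \text{ASSETB} + a_{11} \times \text{CONV} + a_{12} \times \text{SS} + a_{13} \times \text{ENHANCE} + a_{14} \times \text{PUT} \\
&+ a_{15} \times \text{REDEEM} + a_{16} \times \text{MATURITY} + a_{17} \times \text{RATE} + a_{18} \times \text{GDP} + a_{19} \times \text{BOND30} \\
&+ a_{20} \times \text{RECESSION} + a_{21} \times \text{SPI} + a_{22} \times \text{LQDEFAULT} + e,
\end{aligned}
\tag{1}
$$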
Dependent variable:

TIMELINESS either DAHEAD or WRATE;


DAHEAD the number of days between the downgrade date and the default date (with a minimum possible value of −360
and a maximum possible value of 0)17;
WRATE weighted average of credit rating level during the last year leading to default;
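The two dependent variables can be sketched in Python as below. This is an illustration only: the function names and argument layout are hypothetical, and the sketch assumes that every rating change passed to `wrate` falls strictly inside the 360-day window and that the rating outstanding at day −360 is known.

```python
from datetime import date
from typing import List, Tuple


def dahead(downgrade_date: date, default_date: date) -> int:
    """Days from the default date to the downgrade date (negative before default)."""
    return (downgrade_date - default_date).days


def wrate(start_level: float, changes: List[Tuple[int, float]],
          horizon: int = 360) -> float:
    """Day-weighted average rating level over the `horizon` days before default.

    start_level: rating code outstanding at day -horizon.
    changes: (day relative to default, new rating code), with -horizon < day < 0.
    """
    total = 0.0
    level = start_level
    prev_day = -horizon
    for day, new_level in sorted(changes):
        total += level * (day - prev_day)  # days the old level was outstanding
        level, prev_day = new_level, day
    total += level * (0 - prev_day)        # last level runs until day 0
    return total / horizon
```

Applied to the worked example in the text (rated 18 at day −360, downgraded to 20 on day −100), `wrate(18, [(-100, 20)])` rounds to 18.56.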

Main variable of interest:

POST indicator variable that takes a value of 1 if the rating change date falls after July 25, 2002, and 0 otherwise. We
predict that in the POST period, rating agencies will respond in a more timely manner to deteriorations in credit quality. We
expect a negative coefficient on POST in the DAHEAD regression and a positive coefficient in the WRATE
regression18;

16
In robustness tests, instead of using all the downgrades issued in the 1 year prior to default, we use: (1) a sample including only the downgrades
from investment to non-investment grade; (2) a sample of just the first downgrade per issue in the 1 year leading to default; (3) average values of
DAHEAD based on all the downgrades per issue in the 1 year prior to default; (4) a sample of only one-notch downgrades and another sample of two-
notch downgrades; (5) three sub-samples where we keep only rating downgrades from: (a) above or equal to BBB to below or equal to BB+; (b) above or
equal to B to below or equal to CCC+; and (c) above or equal to CCC to below or equal to CC. Our results are robust to these alternative specifications.
17
To address the concern that our results are driven by a truncated dependent variable bias, we also compute the DAHEAD and WRATE variables
using all available ratings, not just the ones issued in the 1-year period leading to default. Our findings are robust to this alternative specification.
18
We measure DAHEAD as a negative value. Therefore, the more negative the DAHEAD value, the earlier the rating downgrade.

Rating type and default type:

SP_RATING indicator variable that takes a value of 1 if the rating agency is S&P, and 0 otherwise;
FT_RATING indicator variable that takes a value of 1 if the rating agency is Fitch, and 0 otherwise;
DTYPE indicator variable that takes a value of 1 if the default type is Bankruptcy, and 0 for other types of defaults (i.e.,
interest default and principal default)19;

Issuer characteristics:

QLOGASSET log of issuer quarterly total assets (Compustat Quarterly data44) for the most recent quarter before a
downgrade. Rating downgrades may be issued in a less timely manner for larger firms because larger firms are expected to
have lower default risk and rating agencies may monitor lower risk firms less closely. In addition, larger firms
usually have a more complex operating structure. Thus, it may be more difficult for rating agencies to predict
changes in default risk well in advance;
QCOVER issuer quarterly interest coverage measured as income before extraordinary items scaled by interest expense
(Compustat Quarterly data8/data22) for the most recent quarter before a downgrade. We expect more timely
downgrades for bonds issued by borrowers with worse financial health (i.e., lower QCOVER ratio);
QDE issuer quarterly debt to equity ratio (Compustat Quarterly data51/data59) for the most recent quarter before a
downgrade. Bonds with higher values of this ratio have higher default risk and may attract closer monitoring by
the rating agencies;
FRAUD indicator variable that takes a value of 1 if the default firm has a financial statement restatement during the
window (−365, +365) relative to the default date, 0 otherwise.20 This variable is a proxy for the possibility that
the borrower hides information from the rating agencies, which makes default prediction more difficult;

Issue characteristics:

SIZE log of issue size. Rating agencies may have greater incentives to provide early warnings for larger size debt
(White, 2001). At the same time, rating agencies might not monitor large issues as closely, if they are associated
with lower risk;
ASSETB indicator variable that takes a value of 1 if the issue is an asset-backed issue, 0 otherwise. Asset-backed issues
may be different from other types of issues with respect to risk characteristics;
CONV indicator variable that takes a value of 1 if the issue can be converted to the common stock (or other security) of
the issuer, 0 otherwise. Other things equal, the convertible feature is associated with lower risk for the
bondholders. We expect that rating agencies monitor less closely lower risk issues;
SS indicator variable that takes a value of 1 if the issue is senior secured debt, 0 otherwise. Senior secured issues are
less risky for the bondholders. We expect that rating agencies monitor less closely senior secured issues;
ENHANCE indicator variable that takes a value of 1 if the issue has the credit enhancement feature, 0 otherwise. The
enhancement feature is associated with lower risk for the bondholders. We expect that rating agencies monitor
less closely issues with credit enhancement features;
PUT indicator variable that takes a value of 1 if the bondholder has the option, but not the obligation, to sell the
security back to the issuer under certain circumstances, 0 otherwise. A put feature is usually associated with
lower risk for the bondholders. We expect that rating agencies monitor less closely issues with a put feature;
REDEEM indicator variable that takes a value of 1 if the issue is redeemable under certain circumstances, 0 otherwise. If
redeemable debt issues are associated with lower risk for the bondholders, we expect that rating agencies
monitor less closely redeemable issues;
MATURITY number of years to maturity. The longer the time to maturity, the higher the risk exposure of an issue and thus,
the more likely the agencies are to closely monitor the issue;
RATE the outstanding rating level 1 year before default date. Debt issues with a closer to default rating will have less
room for downgrades (this variable is only used in the DAHEAD regression);

19
There are four types of default in the FISD database: bankruptcy, interest default, principal default and debt covenant default. We do not have any
observations in the debt covenant default category for the sample used to estimate Eq. (1). Interest defaults and principal defaults add up to only 7% of the
sample.
20
Following Fich and Shivdasani (2007), we also use a lawsuit variable to proxy for fraud. We collect a sample of class action lawsuit firms from the
Stanford University and Cornerstone Research litigation database. This database includes shareholder class action lawsuits alleging violation of rule 10(b)-
5 of the SEC Act of 1934. This rule proscribes, among other things, the intent to deceive, manipulate, or defraud with misstatements of material fact made
in connection to financial condition, solvency and profitability. If the initial default firm is in the lawsuit database and the litigation filing date occurs
during the window (−365, +365) relative to the default date, we code the fraud variable equal to one. Otherwise, we code the fraud variable equal to zero.
All of our results are robust to this alternative definition of fraud.

Macroeconomic variables:

GDP annual gross domestic product. GDP is intended to control for overall economic conditions. Historically,
when economic conditions are less favorable, the number of defaulted issues increases. The agencies may be
more sensitive to any changes in credit quality in periods of heightened credit stress (Fons et al., 2002). However,
when the economy expands, it is more difficult for poorly performing firms to hide their deteriorating credit
quality;
BOND30 CRSP 30-year bond annual return. BOND30 is intended to capture overall bond market conditions;
RECESSION indicator variable that takes the value of 1 if the rating date falls between March 2001 and October 2001 (i.e., a
recession period according to the National Bureau of Economic Research), 0 otherwise;
SPI the annual measure of the Standard and Poor's 500 index. We control for SPI for two reasons: (1) this index captures
the overall stock market performance, which is another measure of overall economic conditions; and (2) there may be
a substitution effect between the stock market performance and the bond market performance (Chen et al., 1986);
and
LQDEFAULT the number of defaults in the quarter before a rating change. This variable is a proxy for the ability of rating
agencies to learn and better calibrate their default prediction models after periods of observed increased default
activity.

Following prior literature, we assign numerical values to different letter symbol ratings (see Appendix B for a detailed
description of our rating coding). Although Fitch, Moody's, and S&P have different rating symbols, prior studies
(Holthausen and Leftwich, 1986; Hand et al., 1992; Jorion et al., 2005) code them in a comparable system. Ratings with
codes smaller than or equal to 10 are considered investment grade ratings. Ratings with codes larger than or equal to 11 are
considered non-investment (speculative) grade ratings.
Panels A and B of Table 1 provide descriptive statistics including PRE and POST period comparisons and a correlation
table for the variables used in our DAHEAD regression analysis.21 Since the correlation matrix shows that some control
variables are highly correlated, we perform collinearity diagnostic tests on all of our regressions using variance inflation
factors and condition indices (Belsley et al., 1980). All variance inflation factors are less than 3 and all condition indices are
less than 30, suggesting collinearity is not a serious problem (Kennedy, 1992).
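For reference, the two diagnostics are conventionally defined as follows. These are the standard textbook definitions rather than formulas quoted from this paper, and the symbols are introduced here for illustration: $R_j^2$ is the R-squared from regressing regressor $j$ on all other regressors, and $\lambda_k$ are the eigenvalues of $X'X$ after the columns of the regressor matrix $X$ are scaled to unit length.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\eta_k = \sqrt{\frac{\lambda_{\max}}{\lambda_k}}
```

A regressor that is unrelated to the others has $R_j^2 = 0$ and hence a variance inflation factor of 1, and larger values of either diagnostic indicate stronger collinearity.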
Table 2 presents PRE and POST univariate comparisons of our DAHEAD measure for a sample of all rating downgrades
(Panel A) and for a specic type of downgrades from investment grade to non-investment grade ratings (Panel B). By
requiring additional issuer and issue control variables for the multivariate analysis, we lose many observations. Therefore,
we present results for a larger sample consisting of all the observations available for univariate analysis and a restricted
sample with additional data availability requirements for all other control variables in Eq. (1). In both Panels A and B, the
mean and the median of DAHEAD are significantly more negative in the POST period than in the PRE period in the larger
sample. A similar pattern exists for the restricted sample.
Panel C compares the WRATE timeliness measure between the PRE and the POST periods. Consistent with our prediction,
we find that WRATE is significantly larger in the POST period than in the PRE period, suggesting that in the POST period,
rating agencies assign closer-to-default ratings for the defaulting issues earlier. Our univariate tests provide preliminary
support for H1. However, since the univariate tests do not control for all other possible determinants of downgrade
timeliness, we focus our discussion mainly on the multivariate regression analysis.
Table 3 (Model 1) presents our empirical results for the DAHEAD measure in Eq. (1).22 Following our main specification
in Model 1, we find that the coefficient on the POST variable is significantly negative, suggesting that in the POST period,
rating agencies downgrade default issues earlier than in the PRE period.23 Most of our control variables are either consistent
with our predictions or have insignificant coefficients. Model 2 presents the empirical results for the WRATE measure. We
find that the coefficient on the POST variable is significantly positive, suggesting that in the POST period, rating agencies
assign worse ratings to defaulting issues in a timelier manner. These empirical findings provide support for our improved
timeliness hypothesis.

21
Since we have different samples for different rating property measures, the descriptive statistics are somewhat different for each one of our
samples. We choose to present the sample used in our first multivariate test (i.e., the DAHEAD test in Table 3). However, we note that the descriptive
statistics are similar across different samples.
22
In our robustness tests, we eliminate, in turn, the debt issue characteristics and the outstanding rating level at day −360 relative to the default date
(RATE). We also replace the SOX date with the Enron bankruptcy date as the cutoff point between our PRE and POST periods and rerun our tests. We
further examine the sensitivity of our results to the elimination of multiple downgrades for the same issuer at the same time. All the results are similar to
the results of our main analysis.
23
The magnitude of the coefficient on the POST variable (i.e., a1 = −134.42) is not surprising if we put it in the context of a rating agency survey
prepared by the Association for Financial Professionals (2002). Slightly more than half the surveyed corporate practitioners (from companies that
experienced a downgrade) report that it took the rating agencies between 1 and 6 months to incorporate deteriorations in their company's financials in
the rating changes. Twenty-seven percent of respondents report that a downgrade took place more than 6 months after the deterioration in the company's
financials.
Table 1
Descriptive statistics.

Panel A: Distributional characteristics of variables

             Full sample (N = 2,537)   Pre sample (N = 2,056)    Post sample (N = 481)     P-value for the difference
Variable     Mean        Median        Mean        Median        Mean        Median        Mean        Median

DAHEAD       −107.35     −73.00        −98.83      −73.00        −143.75     −143.00       0.00        0.00
POST         0.19        0.00          0.00        0.00          1.00        1.00          0.00        0.00
SP_RATING    0.49        0.00          0.49        0.00          0.47        0.00          0.47        0.47
FT_RATING    0.14        0.00          0.14        0.00          0.14        0.00          0.96        0.96
DTYPE        0.94        1.00          0.94        1.00          0.97        1.00          0.00        0.00
QLOGASSET    8.19        8.06          8.17        8.06          8.26        7.95          0.26        0.10
QCOVER       3.67        1.66          3.94        1.83          2.50        1.33          0.00        0.03
QDE          0.64        0.77          0.53        0.77          1.11        0.33          0.05        0.08
SIZE         12.18       12.21         12.29       12.32         11.72       11.92         0.00        0.00
ASSETB       0.05        0.00          0.00        0.00          0.26        0.00          0.00        0.00
CONV         0.06        0.00          0.06        0.00          0.05        0.00          0.21        0.25
SS           0.07        0.00          0.06        0.00          0.12        0.00          0.00        0.00
ENHANCE      0.17        0.00          0.16        0.00          0.21        0.00          0.01        0.01
PUT          0.04        0.00          0.03        0.00          0.05        0.00          0.11        0.11
REDEEM       0.75        1.00          0.72        1.00          0.85        1.00          0.00        0.00
MATURITY     11.38       9.77          10.93       9.73          13.30       10.02         0.00        0.00
GDP          9,970.17    9,890.65      9,810.00    9,890.65      10,655.00   10,703.50     0.00        0.00
BOND30       0.08        0.09          0.08        0.03          0.06        0.09          0.00        0.00
RECESSION    0.16        0.00          0.20        0.00          0.00        0.00          0.00        0.00
SPI          1,138.58    1,148.08      1,130.30    1,148.08      1,173.90    1,211.92      0.00        0.00
RATE         12.96       14.00         12.63       14.00         14.37       15.00         0.00        0.00
FRAUD        0.26        0.00          0.25        0.00          0.31        0.00          0.01        0.01
LQDEFAULT    64.41       51.00         29.61       78.00         42.94       17.00         0.00        0.00
Panel B: Correlation matrix


Variable 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

1 DAHEAD 0.17 0.03 0.06 0.04 0.31 0.05 0.02 0.08 0.16 0.01 0.03 0.13 0.06 0.16 0.08 0.12 0.11 0.03 0.30 0.20 0.25 0.29
2 POST 0.12 0.01 0.00 0.06 0.02 0.10 0.04 0.22 0.47 0.02 0.09 0.05 0.03 0.12 0.13 0.73 0.10 0.21 0.10 0.17 0.06 0.39
3 SP_RATING 0.04 0.01 0.40 0.02 0.14 0.03 0.09 0.05 0.05 0.04 0.00 0.10 0.02 0.08 0.06 0.02 0.03 0.00 0.05 0.13 0.04 0.07
4 FT_RATING 0.06 0.00 0.40 0.01 0.22 0.07 0.07 0.14 0.04 0.08 0.04 0.14 0.04 0.08 0.09 0.05 0.07 0.08 0.15 0.32 0.11 0.13
5 DTYPE 0.02 0.06 0.02 0.01 0.19 0.00 0.02 0.09 0.06 0.05 0.04 0.09 0.03 0.02 0.12 0.04 0.01 0.08 0.01 0.07 0.11 0.01
6 QLOGASSET 0.33 0.03 0.14 0.22 0.19 0.02 0.13 0.22 0.10 0.05 0.05 0.39 0.12 0.37 0.29 0.34 0.16 0.12 0.31 0.60 0.56 0.34
7 QCOVER 0.04 0.04 0.05 0.09 0.02 0.04 0.03 0.09 0.07 0.07 0.13 0.06 0.08 0.14 0.04 0.05 0.15 0.09 0.12 0.03 0.15 0.13
8 QDE 0.05 0.04 0.08 0.08 0.03 0.22 0.21 0.03 0.06 0.01 0.06 0.04 0.03 0.10 0.04 0.02 0.00 0.05 0.03 0.18 0.09 0.01
9 SIZE 0.09 0.17 0.06 0.16 0.16 0.38 0.03 0.10 0.42 0.03 0.02 0.07 0.15 0.05 0.03 0.02 0.15 0.04 0.29 0.19 0.30 0.33
10 ASSETB 0.09 0.47 0.05 0.04 0.06 0.08 0.02 0.04 0.29 0.06 0.06 0.10 0.05 0.10 0.18 0.35 0.06 0.10 0.06 0.09 0.14 0.20
11 CONV 0.01 0.02 0.04 0.08 0.05 0.04 0.11 0.02 0.00 0.06 0.07 0.07 0.27 0.12 0.03 0.05 0.01 0.10 0.07 0.19 0.02 0.06
12 SS 0.01 0.09 0.00 0.04 0.04 0.03 0.06 0.09 0.02 0.06 0.07 0.04 0.06 0.00 0.15 0.03 0.04 0.02 0.06 0.11 0.10 0.14
13 ENHANCE 0.12 0.05 0.10 0.14 0.09 0.39 0.05 0.02 0.14 0.10 0.07 0.04 0.03 0.24 0.16 0.01 0.04 0.05 0.22 0.27 0.21 0.17
14 PUT 0.05 0.03 0.02 0.04 0.03 0.13 0.10 0.03 0.15 0.05 0.27 0.06 0.03 0.02 0.22 0.08 0.05 0.03 0.09 0.05 0.06 0.08
15 REDEEM 0.15 0.12 0.08 0.08 0.02 0.37 0.21 0.18 0.02 0.10 0.12 0.00 0.24 0.02 0.12 0.02 0.11 0.08 0.15 0.28 0.19 0.17
16 MATURITY 0.04 0.11 0.05 0.03 0.14 0.24 0.02 0.08 0.01 0.28 0.07 0.03 0.18 0.18 0.09 0.15 0.01 0.11 0.07 0.28 0.01 0.03
17 GDP 0.18 0.69 0.04 0.10 0.05 0.37 0.09 0.02 0.17 0.33 0.08 0.01 0.10 0.10 0.04 0.14 0.12 0.08 0.15 0.00 0.24 0.01
18 BOND30 0.06 0.20 0.02 0.04 0.03 0.10 0.20 0.04 0.19 0.11 0.02 0.06 0.00 0.03 0.10 0.01 0.00 0.20 0.51 0.14 0.09 0.27
19 RECESSION 0.02 0.21 0.00 0.08 0.08 0.12 0.12 0.04 0.01 0.10 0.10 0.02 0.05 0.03 0.08 0.14 0.13 0.27 0.02 0.18 0.16 0.14
20 SPI 0.22 0.06 0.05 0.16 0.01 0.27 0.13 0.04 0.28 0.04 0.08 0.05 0.23 0.08 0.13 0.06 0.52 0.11 0.02 0.33 0.23 0.71
21 RATE 0.14 0.21 0.13 0.33 0.05 0.56 0.13 0.25 0.28 0.10 0.23 0.09 0.25 0.05 0.28 0.18 0.07 0.09 0.18 0.34 0.35 0.37
22 FRAUD 0.25 0.06 0.04 0.11 0.11 0.55 0.08 0.23 0.31 0.14 0.02 0.10 0.21 0.06 0.19 0.06 0.27 0.05 0.16 0.20 0.31 0.28
23 LQDEFAULT 0.22 0.39 0.08 0.14 0.01 0.32 0.12 0.04 0.33 0.20 0.07 0.12 0.17 0.07 0.18 0.02 0.24 0.16 0.13 0.72 0.39 0.25

This table presents summary statistics for a sample of 2,537 issue downgrades over the period 1996–2005. Panel A presents the descriptive statistics, including the PRE and POST period comparisons, for the
rating timeliness measure (DAHEAD), and all other control variables. P-values for the mean tests are based on two-tailed t-tests. P-values for the medians are based on two-tailed Wilcoxon tests. Panel B presents
correlations for the regression variables, where the upper triangle presents the Pearson correlations, and the lower triangle presents the Spearman correlations. In this table, boldface text indicates significance
at the 0.01 level or lower (two-sided). See Appendix C for variable definitions.

Table 2
Timeliness univariate comparisons.

                      Mean                                             Median
                      Pre sample        Post sample      P-value       Pre sample        Post sample      P-value

Panel A: Univariate analysis for all downgrades (DAHEAD)
Larger sample         −105.12 (6,247)   −136.04 (794)    0.00          −74.00 (6,247)    −129.50 (794)    0.00
Restricted sample     −98.83 (2,056)    −143.75 (481)    0.00          −73.00 (2,056)    −143.00 (481)    0.00

Panel B: Univariate analysis for downgrades from investment grade to non-investment grade ratings (DAHEAD)
Larger sample         −97.66 (621)      −200.92 (118)    0.00          −89.00 (621)      −241.00 (118)    0.00
Restricted sample     −63.44 (207)      −236.94 (69)     0.00          −73.00 (207)      −294.00 (69)     0.00

Panel C: Univariate analysis for rating levels (WRATE)
Larger sample         14.83 (2,124)     16.49 (433)      0.00          16.02 (2,124)     17.26 (433)      0.00
Restricted sample     15.30 (799)       17.42 (222)      0.00          16.34 (799)       18.01 (222)      0.00

This table presents credit rating timeliness univariate comparisons between the PRE and POST periods. The first timeliness measure (DAHEAD) is
measured as the number of days between the downgrade date and the default date. The second timeliness measure (WRATE) is measured as the weighted
average rating level defined as the sum of all rating levels outstanding over the 1-year period leading to default multiplied by the number of days each
level has been outstanding, scaled by 360. Panel A presents comparisons for all downgrades for DAHEAD. Panel B includes only downgrades from
investment grade to non-investment grade ratings for DAHEAD. Panel C presents comparisons for all rating levels for WRATE. Within each panel, we
present results for a larger sample consisting of all the observations available for univariate analysis and a restricted sample with additional data
availability requirements for all other control variables used in our multivariate analysis. Numbers in parentheses represent number of observations.
P-values for mean comparisons are based on two-tailed t-tests. P-values for median comparisons are based on two-tailed Wilcoxon tests.

Table 3
Timeliness multivariate comparisons..

Model 1 (DAHEAD) Model 2 (WRATE)

Predicted sign Coefcient P-value Predicted sign Coefcient P-value

Intercept / 654.71 (0.00) / 13.54 (0.00)


POST  134.42 (0.00) + 2.50 (0.00)
SP_RATING +/ 8.42 (0.03) +/ 0.07 (0.66)
FT_RATING +/ 5.25 (0.37) +/ 1.43 (0.00)
DTYPE +/ 16.67 (0.04) +/ 0.74 (0.01)
QLOGASSET + 8.09 (0.00)  0.96 (0.00)
QCOVER + 0.68 (0.04)  0.50 (0.00)
QDE  0.51 (0.09) + 0.01 (0.52)
FRAUD + 32.55 (0.00)  1.07 (0.00)
SIZE +/ 18.37 (0.00) +/ 0.49 (0.00)
ASSETB +/ 75.17 (0.00) +/ 2.77 (0.00)
CONV + 14.19 (0.09)  0.81 (0.01)
SS + 26.60 (0.00)  1.41 (0.00)
ENHANCE + 2.50 (0.65)  0.22 (0.32)
PUT + 11.70 (0.26)  0.94 (0.06)
REDEEM + 3.53 (0.48)  0.05 (0.84)
MATURITY  0.60 (0.04) + 0.07 (0.00)
RATE + 1.59 (0.02) / / /
GDP +/ 0.10 (0.00) +/ 0.00 (0.11)
BOND30 +/ 149.95 (0.00) +/ 0.23 (0.79)
RECESSION +/ 6.10 (0.26) +/ 0.30 (0.18)
SPI +/ 0.21 (0.00) +/ 0.00 (0.90)
LQDEFAULT  0.36 (0.00) + 0.03 (0.00)
N 2,537 1,021
ADJRSQ 26.36% 60.11%

This table presents credit rating timeliness multivariate regression comparisons in the PRE and POST periods for a sample of defaulting debt issues. Model
1 measures rating timeliness as the number of days between the downgrade date and the default date (DAHEAD). Model 2 measures rating timeliness as
the weighted average credit rating level during the last year leading to default (WRATE). See Appendix C for variable definitions. P-values are based on
two-tailed tests.

5.2. Empirical tests and results for rating accuracy

5.2.1. Absolute accuracy: types I and II errors


To test how rating accuracy changes over time, we use several measures of accuracy. First, we compare the frequency of
type I and type II errors in the PRE and POST periods. We define type I errors as missed defaults (i.e., instances where the rating
agencies assign/maintain favorable ratings to issues that will default within the next year) and type II errors as false
warnings (i.e., instances where the rating agencies assign/maintain unfavorable ratings to issues that will not default
within the next year).
Note that our timeliness measures discussed in the previous section (i.e., the DAHEAD and WRATE variables) and type I
errors capture a similar concept: the possibility that rating agencies maintain favorable ratings for defaulting issues in the
1 year leading to default. If the rating agencies start downgrading defaulting debt issues in a more timely manner,
empirically, we would observe both improved values for the timeliness measures and a lower frequency of missed defaults
(i.e., fewer type I errors).
Because our rating timeliness analysis shows that rating agencies issue more timely rating changes in the POST period,
we expect the frequency of type I errors to decrease. It is an open question whether the decrease in type I errors comes at the
expense of increasing type II errors (i.e., assigning too harsh ratings to non-defaulting issues). We examine these two types
of errors jointly to compare the absolute rating accuracy in the PRE and the POST periods.
To investigate the changes in the frequency of type I and type II errors between the PRE and POST periods, we use the
following multivariate model:
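$$
\begin{aligned}
\text{ERROR\_MEASURE} = {} & \beta_0 + \beta_1 \times \text{POST} + \beta_2 \times \text{SP\_RATING} + \beta_3 \times \text{FT\_RATING} + \beta_4 \times \text{QLOGASSET} \\
& + \beta_5 \times \text{QDE} + \beta_6 \times \text{LARGE\_LOSS} + \beta_7 \times \text{NEG\_RET} + \beta_8 \times \text{FRAUD} + \beta_9 \times \text{SIZE} \\
& + \beta_{10} \times \text{ASSETB} + \beta_{11} \times \text{CONV} + \beta_{12} \times \text{SS} + \beta_{13} \times \text{ENHANCE} + \beta_{14} \times \text{PUT} \\
& + \beta_{15} \times \text{REDEEM} + \beta_{16} \times \text{MATURITY} + \beta_{17} \times \text{GDP} + \beta_{18} \times \text{BOND30} \\
& + \beta_{19} \times \text{RECESSION} + \beta_{20} \times \text{SPI} + \beta_{21} \times \text{LQDEFAULT} + \varepsilon,
\end{aligned}
\tag{2}
$$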
Dependent variables:

ERROR_MEASURE = either TYPE_I_D or TYPE_II_D error;


TYPE_I_D = indicator variable that takes the value of 1 for missed defaults, and 0 otherwise. Specifically, for a sample of
issues that have an event of default within 1 year from the rating date, this variable takes the value of 1 (0), if a
debt issue has a rating better (worse) than the cut-off point (see the paragraph below the variable definition for
an explanation of the cut-off point);
TYPE_II_D = indicator variable that takes the value of 1 for false warnings, and 0 otherwise. Specifically, for a sample of
issues that do not have an event of default within 1 year from the rating date, this variable takes the value of 1 (0),
if a debt issue has a rating worse (better) than the cut-off point; and

Issue characteristics:

LARGE_LOSS = indicator variable that takes a value of 1 if a firm experiences an annual loss equal to or greater than 25% of total
assets, 0 otherwise, and
NEG_RET = indicator variable that takes a value of 1 if a firm reports negative retained earnings and 0 otherwise.

All other variables are defined as in Eq. (1).

Empirically, measuring type I and type II errors in a credit rating setting requires the choice of a cut-off point and the
assumption that all issues with ratings greater (i.e., worse) than the cut-off point are classified by the rating agencies as
defaulting. The cut-off point is necessary because we need to transform credit ratings (i.e., a variable with levels from 1 to
22) into a binary variable (i.e., 1 for predicting default, 0 for predicting non-default).24 To test the sensitivity of our findings
to the choice of a cut-off point, we use three alternative points. In turn, issues rated 20 and higher, 17 and higher, and 11 and
higher are assumed to be classified by the rating agencies as defaulting issues. Our type I and type II error regressions include
control variables for issue characteristics (SIZE, ASSETB, CONV, SS, ENHANCE, PUT, REDEEM and MATURITY), for
firm-specific financial characteristics (QDE, QLOGASSET, LARGE_LOSS, NEG_RET and FRAUD) and for macroeconomic conditions
(GDP, BOND30, RECESSION, SPI and LQDEFAULT).
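The cut-off transformation described above can be illustrated with a minimal Python sketch. It assumes ratings are coded numerically from 1 (AAA) to 22 (D) and that an issue rated at or above the cut-off level is treated as classified as defaulting. The function name `error_rates`, the tuple layout and the toy data are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple


def error_rates(issues: Iterable[Tuple[int, bool]], cutoff: int) -> Tuple[float, float]:
    """Return (type I rate, type II rate) for a given cut-off.

    Each issue is (rating, defaulted_within_one_year), with ratings coded
    1 (AAA) .. 22 (D). Assumption: rating >= cutoff means the issue is
    treated as classified as defaulting.
    """
    missed = defaults = false_warnings = non_defaults = 0
    for rating, defaulted in issues:
        flagged = rating >= cutoff
        if defaulted:
            defaults += 1
            if not flagged:        # favorable rating, but the issue defaults
                missed += 1
        else:
            non_defaults += 1
            if flagged:            # harsh rating, but the issue does not default
                false_warnings += 1
    type_i = missed / defaults if defaults else float("nan")
    type_ii = false_warnings / non_defaults if non_defaults else float("nan")
    return type_i, type_ii


# Toy data, purely illustrative: (numeric rating, defaulted within 1 year)
toy_issues = [(12, True), (18, True), (21, True),
              (5, False), (18, False), (9, False), (3, False)]

for cutoff in (11, 17, 20):
    print(cutoff, error_rates(toy_issues, cutoff))
```

Raising the cut-off in this toy sample increases the type I rate and lowers the type II rate, which is the trade-off the three alternative cut-off points are meant to probe.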

24
Note that credit rating agencies do not directly predict default events. The agencies issue default ratings (i.e., the 22 (D) level ratings) only after they
observe actual defaults. The D level ratings are not predictions of future defaults. They are assigned ex-post, while levels 1 to 21 (AAA to C) ratings are
assigned ex-ante. Moreover, only S&P and Fitch have a rating scale that includes D level ratings, while Moody's does not include D ratings in its rating
scheme.

Table 4
Frequency of types I and II errors.

Cut-off point Pre sample Post sample P-value

11 and higher
Type I error 0.231 (2,310) 0.005 (186) 0.00
Type II error 0.108 (403,298) 0.092 (157,061) 0.00

17 and higher
Type I error 0.581 (2,310) 0.365 (186) 0.00
Type II error 0.014 (403,298) 0.017 (157,061) 0.00

20 and higher
Type I error 0.809 (2,310) 0.650 (186) 0.00
Type II error 0.006 (403,298) 0.005 (157,061) 0.00

This table presents the frequency of type I and type II errors in the PRE and POST periods. Type I error is defined as one minus the number of defaulted debt
issues that have an outstanding rating greater than or equal to the cut-off level 1 year prior to default scaled by total number of defaults. Type I error
represents the proportion of defaulting debt issues for which the rating agencies fail to issue a timely warning 1 year prior to default, relative to all
defaulting issues. Type II error is defined as the number of debt issues with an outstanding rating higher than the cut-off level that do not default in the
next year scaled by total number of non-defaulting debt issues. Type II error represents the proportion of non-defaulting debt issues for which the rating
agencies issue too harsh ratings, relative to all non-defaulting issues. Numbers in parentheses represent number of observations. We use three alternative
cut-off points: 11 and higher (i.e., issues rated from BB+ to D), 17 and higher (i.e., issues rated from CCC+ to D), 20 and higher (i.e., issues rated from CC to
D). Issues with ratings greater than the cut-off point are assumed to be classified by the rating agencies as defaulting.

The univariate analysis in Table 4 indicates that for all three cut-off points, the frequency of type I errors is significantly
smaller in the POST period than in the PRE period (P-value < 0.00). With respect to the frequency of type II errors, the
univariate evidence is mixed. The frequency of type II errors decreases when we choose the 11 and higher, and the 20 and
higher cut-off points, but increases when we use the 17 and higher cut-off point.25
However, the multivariate type I and type II error analysis presents a much more consistent picture. When we define the
errors based on the 17 and higher cut-off point (Table 5), the frequencies of both type I and type II errors decrease significantly in
the POST regulatory pressure and criticism period. Specifically, the coefficient on the POST variable is significantly negative
(P-value ≤ 0.01) for both the type I and type II regressions. Untabulated results based on the 20 and higher cut-off point are
very similar to the findings presented in Table 5. Running the same analysis using the 11 and higher cut-off point is not
feasible because, based on this definition, there are no type I errors in the POST period (i.e., there are no defaulting debt
issues with investment grade ratings 1 year prior to default). The absence of investment grade ratings for defaulting issues
is in and of itself evidence that the rating agencies take measures to avoid type I errors in the POST period. Considered
together, our type I/type II error findings indicate that the overall absolute rating accuracy improves following the increased
regulatory pressure and investor criticism. These findings provide support for the improved credit analysis hypothesis
(H2(b)).

5.2.2. Relative accuracy: cumulative rating accuracy proles


In their rating methodologies, rating agencies explicitly indicate that credit ratings are designed to measure relative
credit risk. In other words, when rating agencies assign an unfavorable rating to an issue (for example, a C rating), it does not
mean the agencies predict that the issue will default. It just means that the issue is perceived as relatively riskier than other
debts which are assigned better ratings (for example, A or B ratings).
Therefore, to test how rating accuracy changes in the POST period, we also use a relative accuracy measure. Following
credit rating agencies' methodology, we use cumulative accuracy profiles to measure rating accuracy for an overall sample
of all issues (default and non-default) rated by Moody's, S&P and Fitch. According to Moody's, the cumulative accuracy
profiles (or power curves) are the key metric used to measure the relative accuracy of credit ratings. Because credit
ratings are designed to measure only relative (i.e., ordinal) and not absolute (i.e., cardinal) credit risk (Standard & Poor's,
2002), ratings are considered more accurate if debt issues that have closer-to-default ratings today prove to be on average
more risky (i.e., more likely to default in the future) than issues that have more favorable ratings.

25
The specification of type I/II error analysis presented in Table 4 uses a default sample to measure type I errors and a non-default sample to measure
type II errors. In a sensitivity test, we also employ an alternative specification that defines both type I and type II errors using default and non-default issues.
Under this alternative specification, we measure type I errors as one minus the number of issues assigned favorable ratings (i.e., ratings lower than the
cut-off point) that do not default in the next year, scaled by all issues (default and non-default) that are assigned lower than the cut-off ratings. We
measure type II errors as the number of issues assigned harsh ratings (i.e., ratings higher than the cut-off point) that will not default in the next year,
scaled by all issues (default and non-default) that are assigned higher than the cut-off ratings. Our findings are qualitatively similar when we use these
alternative definitions of type I and type II errors.

Table 5
Types I and II errors multivariate comparisons.

TYPE_I_D TYPE_II_D

Predicted sign Coefficient P-value Predicted sign Coefficient P-value

Intercept / 5.39 (0.00) / 14.44 (0.00)


POST  0.99 (0.01) +/ 0.37 (0.00)
SP_RATING +/ 0.51 (0.00) +/ 0.11 (0.00)
FT_RATING +/ 0.52 (0.06) +/ 0.78 (0.00)
QLOGASSET + 0.20 (0.00)  0.37 (0.00)
QDE  0.01 (0.00) + 0.00 (0.17)
LARGE_LOSS  1.44 (0.00) + 0.98 (0.03)
NEG_RET  0.10 (0.55) + 1.95 (0.00)
FRAUD + 0.65 (0.01) + 2.44 (0.00)
SIZE +/ 0.11 (0.31) +/ 0.03 (0.12)
ASSETB +/ 1.22 (0.01) +/ 1.28 (0.00)
CONV + 0.60 (0.03)  0.81 (0.00)
SS + 0.29 (0.37)  0.31 (0.00)
ENHANCE + 0.17 (0.36)  0.07 (0.18)
PUT + 0.23 (0.65)  0.51 (0.00)
REDEEM + 0.23 (0.35)  0.66 (0.00)
MATURITY  0.02 (0.11) + 0.02 (0.00)
GDP +/ 0.00 (0.56) +/ 0.00 (0.00)
BOND30 +/ 3.69 (0.00) +/ 1.28 (0.00)
RECESSION +/ 0.28 (0.17) +/ 0.14 (0.03)
SPI +/ 0.00 (0.00) +/ 0.00 (0.01)
LQDEFAULT  0.00 (0.05) +/ 0.00 (0.00)
N 869 70,797
ADJRSQ 21.85% 35.10%

This table presents the type I and type II error multivariate regression comparisons in the PRE and POST periods. TYPE_I_D is an indicator variable that takes
the value of 1 for missed defaults, and 0 otherwise. Specifically, for a sample of issues that have an event of default within 1 year from the rating date, this
variable takes the value of 1 (0), if a debt issue has a rating better (worse) than the cut-off point. TYPE_II_D is an indicator variable that takes the value of 1
for false warnings, and 0 otherwise. Specifically, for a sample of issues that do not have an event of default within 1 year from the rating date, this
indicator takes the value of 1 (0), if a debt issue has a rating worse (better) than the cut-off point. We use rating level 17 and higher as a cut-off point in our
type I and type II error definitions. Issues with ratings greater than the 17 cut-off point (i.e., issues rated from CCC+ to D) are assumed to be classified by the
rating agencies as defaulting. See Appendix C for detailed variable definitions. The numbers in parentheses are P-values based on two-tailed tests.

Based on this definition of relative accuracy, the cumulative accuracy profile is constructed by plotting, for each rating
category, the proportion of firms with the same or lower ratings that will default in the next year against the proportion of
all firms with the same or lower ratings. Each point along a cumulative accuracy curve indicates the power of a specific
rating category as a tool to detect defaulting issues from the overall population of rated issues. Fig. 4 depicts the cumulative
accuracy curves for the credit ratings issued in the PRE and the POST periods. For example, the fourth point from the left
indicates that rating level 19 and higher (i.e., issues rated from CCC- to D) represents about 0.99% (0.91%) of all the
rated debt issues and 27.32% (49.44%) of the defaulting debt issues in the PRE (POST) period. The closer a curve approaches
the upper left corner, the greater the proportion of all defaulting issues that can be accounted for by the lowest rating
categories. The closer a curve is to the 45° line (which is the cumulative accuracy curve associated with randomly assigned
ratings), the lower the accuracy of credit ratings.
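As an illustration of the construction just described, the following Python sketch accumulates issues and defaults from the worst rating level to the best and returns the plotted points. It is a sketch under stated assumptions: ratings are coded 1 (AAA) to 22 (D), the function names and toy counts are hypothetical, and the trapezoidal area is only one possible way to summarize a curve; it is not claimed to be the procedure behind the areas reported in this study.

```python
from typing import Dict, List, Tuple


def cap_points(issues_by_rating: Dict[int, int],
               defaults_by_rating: Dict[int, int]) -> List[Tuple[float, float]]:
    """Cumulative accuracy profile points, worst rating level first.

    x = share of all issues rated at this level or worse,
    y = share of all defaulting issues rated at this level or worse.
    """
    total_issues = sum(issues_by_rating.values())
    total_defaults = sum(defaults_by_rating.values())
    points = [(0.0, 0.0)]
    cum_issues = cum_defaults = 0
    for level in sorted(issues_by_rating, reverse=True):  # 22 (D) down to 1 (AAA)
        cum_issues += issues_by_rating[level]
        cum_defaults += defaults_by_rating.get(level, 0)
        points.append((cum_issues / total_issues, cum_defaults / total_defaults))
    return points


def trapezoid_area(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear curve through the points (an assumption)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy counts, purely illustrative: 100 issues, 10 of which default
toy_issues = {1: 50, 10: 30, 17: 15, 22: 5}
toy_defaults = {1: 0, 10: 1, 17: 4, 22: 5}

toy_curve = cap_points(toy_issues, toy_defaults)
print(toy_curve)
print(trapezoid_area(toy_curve))
```

Randomly assigned ratings would place every point on the diagonal, giving an area of 0.5 under this summary, so larger areas correspond to curves that bow toward the upper left corner.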
We find that the cumulative accuracy profile graph bows further toward the upper left corner in the POST period than in
the PRE period. This pattern indicates that, in the POST period, a greater proportion of all defaulting issues can be accounted
for by the closer-to-default rating categories. The pattern is consistent with a rating system that is able to better
discriminate (1 year before default) between defaulting and non-defaulting debt issues, and thus indicates greater relative
rating accuracy in the POST period. Further analysis shows that the area under the PRE (POST) curve is 0.944 (0.981) and the
standard error for the PRE (POST) curve is 0.002 (0.002). The area under the POST curve is significantly larger than the area
under the PRE curve with a Z(p) statistic of 13.08 (0.00) (see Bamber (1975), Hanley and McNeil (1983) and Liu et al. (2005)
for the detailed discussion of the statistical test for area differences).
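The reported Z statistic can be approximately recovered from the reported areas and standard errors. The sketch below assumes that the PRE and POST curves are treated as independent samples, so that no covariance term enters the denominator; this independence is an assumption of the sketch, and the function name is hypothetical.

```python
import math


def area_difference_z(area_a: float, se_a: float,
                      area_b: float, se_b: float) -> float:
    """Z statistic for the difference between two areas under the curve.

    Assumption: the two areas come from independent samples, so the
    variance of the difference is the sum of the two squared standard errors.
    """
    return (area_b - area_a) / math.sqrt(se_a ** 2 + se_b ** 2)


# Areas and standard errors as reported for the PRE and POST curves
z = area_difference_z(0.944, 0.002, 0.981, 0.002)

# Two-tailed p-value from the standard normal distribution
p = math.erfc(abs(z) / math.sqrt(2.0))

print(round(z, 2), p)
```

With the rounded inputs above, the statistic comes out at about 13.08, in line with the value reported in the text.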

5.2.3. Accuracy: benchmark model tests


In addition to the type I, type II and relative accuracy tests, we also use benchmark model tests to assess changes in
rating accuracy. Our tests benchmark rating changes issued by credit rating agencies against changes in default risk
suggested by bankruptcy prediction models. We chose two bankruptcy prediction models widely used in the academic
literature: (1) a market-based model developed by Hillegeist et al. (2004); and (2) an Altman Z score model. If rating
agencies become more accurate in the POST period, we expect them to be more responsive to the changes in the default

[Figure 4, plot: Rating Accuracy - Moody's, S&P and Fitch. Cumulative accuracy curves for the Pre and Post periods. X-axis: percentage of issues (0 to 1). Y-axis: percentage of defaults (0 to 1).]
Statistical tests of the area difference


Area Standard Error
Pre 0.944 0.002
Post 0.981 0.002
Z Statistics (P-value) 13.08 (0.00)

Data compositions of the cumulative accuracy profile


Rating level  Percentage of issues: Pre  Percentage of default: Pre  Percentage of issues: Post  Percentage of default: Post
22            0.0048  0.0885  0.0035  0.2135
22 to 21      0.0058  0.1258  0.0044  0.3034
22 to 20      0.0082  0.2220  0.0066  0.4270
22 to 19      0.0099  0.2732  0.0091  0.4944
22 to 18      0.0127  0.3408  0.0140  0.6292
22 to 17      0.0180  0.4536  0.0209  0.6854
22 to 16      0.0334  0.6080  0.0396  0.8090
22 to 15      0.0486  0.6687  0.0535  0.8989
22 to 14      0.0628  0.7173  0.0676  0.9438
22 to 13      0.0787  0.7346  0.0769  1.0000
22 to 12      0.0915  0.7493  0.0872  1.0000
22 to 11      0.1079  0.8092  0.0973  1.0000
22 to 10      0.1337  0.8352  0.1235  1.0000
22 to 9       0.1749  0.9141  0.1784  1.0000
22 to 8       0.2300  0.9809  0.2239  1.0000
22 to 7       0.2851  0.9861  0.2693  1.0000
22 to 6       0.4076  0.9948  0.3265  1.0000
22 to 5       0.4932  0.9965  0.4134  1.0000
22 to 4       0.5609  0.9974  0.4530  1.0000
22 to 3       0.5848  0.9974  0.4755  1.0000
22 to 2       0.5941  0.9974  0.4875  1.0000
22 to 1       1.0000  1.0000  1.0000  1.0000

Fig. 4. Presents cumulative accuracy profiles for the PRE and POST periods constructed by plotting (for each rating category) the proportion of debt
issues with the same or lower ratings that will default in the next year (Y-axis) against the proportion of all issues with the same or lower ratings (X-axis).
On the horizontal axis, each percentage point corresponds to a certain rating category, from D ratings (to the left) to AAA ratings (to the right). For example,
the fourth point (row) from the left (top) in the figure (data composition table) indicates that rating level 19 and higher (i.e., issues rated from CCC- to D)
represents about 0.99% (0.91%) of all the rated debt issues and 27.32% (49.44%) of the defaulting debt issues in the PRE (POST) period. Each point along a
cumulative accuracy curve indicates the power of a specific rating category as a tool to detect defaulting issues from the overall population of all rated
issues. The closer a curve approaches the upper left corner, the greater the fraction of all defaulting issues that can be accounted for by the lowest rating
categories and thus, the greater the relative rating accuracy. Statistical tests show that the area under the curve in the POST period is significantly larger
than the area under the curve in the PRE period.

risk suggested by the benchmark bankruptcy models. To test for changes in rating accuracy, we identify a subsample of
firm-quarter observations that experience relatively large default probability decreases by retaining the firm-quarters at
the top (bottom) 10% of the Altman Z score change distribution (market model probability change distribution). We also
identify a subsample of firm-quarter observations with large default probability increases by retaining the firm-quarters at
the bottom (top) 10% of the Altman Z score change distribution (market model probability change distribution). Then, we
calculate the frequency of rating upgrades (downgrades) for the subsample of substantial default probability decrease
(increase) in the PRE and POST periods. Table 6 presents our empirical findings for both upgrades and downgrades. We find
that, when the benchmark models indicate that the risk of default goes up (down), the rating agencies issue credit rating
downgrades (upgrades) for a significantly larger percentage of firms in the POST period relative to the PRE period. When we
use the top (bottom) 25% of quarterly bankruptcy likelihood change to identify firms with large changes in the probability

Table 6
Benchmark model tests.

Market model Altman Z score model

PRE period % change POST period % change P-value PRE period % change POST period % change P-value

Upgrades 1.81% 2.04% (0.00) 1.91% 2.78% (0.00)


Downgrades 10.69% 16.15% (0.00) 4.89% 6.14% (0.00)
This table presents rating change comparisons between the PRE and POST period for two benchmark bankruptcy prediction models. We use the Hillegeist
et al. market model and the Altman Z score model to measure quarterly bankruptcy probability changes. Based on these models, we identify a subsample
of firm-quarter observations with large default probability decreases by retaining the firm-quarters at the top (bottom) 10% of the Altman Z score change
distribution (market model probability change distribution). We also identify a subsample of firm-quarters with a large default probability increase by
retaining the firm-quarters at the bottom (top) 10% of the Altman Z score change distribution (market model probability change distribution). This table
compares the percentage of rating upgrades (downgrades) for the subsample of firm-quarter observations with a large default probability decrease
(increase). P-values are based on two-tailed t-tests.

of bankruptcy, the untabulated results are similar. These findings further support our hypothesis that rating agencies
increase the accuracy of their credit analysis in the POST period.
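The subsample selection used in the benchmark model tests can be sketched as follows for the downgrade case. The sketch assumes each firm-quarter carries a change in model-implied default probability and a rating change coded on the numeric scale (positive meaning a downgrade). The field names `dprob` and `rating_change`, the helper names and the toy data are hypothetical.

```python
from typing import Callable, Dict, List


def top_fraction(rows: List[Dict[str, float]],
                 key: Callable[[Dict[str, float]], float],
                 fraction: float = 0.10) -> List[Dict[str, float]]:
    """Keep the rows with the largest key values (top `fraction` of the sample)."""
    ranked = sorted(rows, key=key, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def downgrade_frequency(firm_quarters: List[Dict[str, float]],
                        fraction: float = 0.10) -> float:
    """Share of downgrades among firm-quarters with the largest increases in
    model-implied default probability (hypothetical field names)."""
    risky = top_fraction(firm_quarters, key=lambda r: r["dprob"], fraction=fraction)
    downgrades = sum(1 for r in risky if r["rating_change"] > 0)
    return downgrades / len(risky)


# Toy firm-quarters, purely illustrative: only the riskiest one is downgraded
toy_quarters = [{"dprob": i / 100.0, "rating_change": 1.0 if i == 19 else 0.0}
                for i in range(20)]

print(downgrade_frequency(toy_quarters))
```

Computing this frequency separately for PRE and POST firm-quarters gives the two percentages that a test of this kind compares; the upgrade case mirrors it with the opposite tail and sign.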

5.3. Empirical tests and results for rating volatility

Next, we investigate whether rating volatility changes in the POST period, using the following empirical model:

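$$
\begin{aligned}
\text{R\_VOLATILITY} = {} & \gamma_0 + \gamma_1 \times \text{POST} + \gamma_2 \times \text{SP\_RATING} + \gamma_3 \times \text{FT\_RATING} + \gamma_4 \times \text{DTYPE} + \gamma_5 \times \text{COVER\_STD} \\
& + \gamma_6 \times \text{FRAUD} + \gamma_7 \times \text{SIZE} + \gamma_8 \times \text{ASSETB} + \gamma_9 \times \text{CONV} + \gamma_{10} \times \text{SS} + \gamma_{11} \times \text{ENHANCE} \\
& + \gamma_{12} \times \text{PUT} + \gamma_{13} \times \text{REDEEM} + \gamma_{14} \times \text{MATURITY} + \gamma_{15} \times \text{RATE} + \gamma_{16} \times \text{GDP} \\
& + \gamma_{17} \times \text{BOND30} + \gamma_{18} \times \text{RECESSION} + \gamma_{19} \times \text{SPI} + \gamma_{20} \times \text{LQDEFAULT} + \varepsilon,
\end{aligned}
\tag{3}
$$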
Dependent variable:

R_VOLATILITY = the standard deviation of ratings outstanding during the 1-year period leading to default (this measure
requires at least three outstanding ratings)26; and

Issue characteristics:

COVER_STD = the standard deviation of the quarterly interest coverage ratio computed over the eight-quarter period
starting 3 years before and ending 1 year before the default date.

All other variables are defined as in Eq. (1).
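A minimal Python sketch of the R_VOLATILITY definition is given here for illustration. It assumes ratings are coded numerically from 1 to 22 and uses the unweighted sample standard deviation; whether the study uses the sample or population formula, or any weighting, is not stated, so that choice is an assumption. The function name is hypothetical.

```python
import statistics
from typing import Optional, Sequence


def r_volatility(ratings: Sequence[int]) -> Optional[float]:
    """Standard deviation of the ratings outstanding during the window.

    Returns None when fewer than three ratings are available, mirroring the
    data requirement in the variable definition. Assumption: unweighted
    sample standard deviation.
    """
    if len(ratings) < 3:
        return None
    return statistics.stdev(ratings)


# Toy issue, purely illustrative: rated 14, then 16, then 18 in the window
print(r_volatility([14, 16, 18]))
print(r_volatility([14, 16]))
```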

Given the conflicting expectations with respect to the trade-off between timeliness and volatility in the POST period, we do
not have a directional prediction for the POST variable. If rating agencies improve timeliness at the expense of increased
volatility, we expect to find a significantly positive coefficient on the POST variable (H3(a)). However, if rating agencies
achieve improved rating timeliness by improving their credit analysis and thus, expanding the existing timeliness-volatility
frontier, we do not expect a significantly positive coefficient on the POST variable (H3(b)).
Table 7 presents univariate rating volatility comparisons between the PRE and POST periods based on a sample of
defaulting debt issues (panel A) and a comprehensive sample of defaulting and non-defaulting issues (panel B).27 Panels A
and B include rating volatility measures both for a larger sample consisting of all the observations available for univariate
analysis and a restricted sample with additional data availability requirements for all other control variables in Eq. (3). For
both samples, the mean rating volatility is significantly lower in the POST period, providing support for our hypothesis
H3(b).

26
For the R_VOLATILITY measure we use only debt issues that have at least three ratings outstanding during the year (to be able to compute the
standard deviation). A limitation of this volatility measure is that it excludes issues with fewer than three outstanding ratings. However, in the stability tests,
our purpose is to compare ratings volatility in the PRE and POST periods. Since we apply the same definition of the R_VOLATILITY measure for both the
PRE and the POST periods, we do not believe that we introduce any bias by requiring at least three ratings.
27
For the volatility tests based on the default sample we measure rating volatility over periods of time starting 1 year before default and ending on
the default date. For the volatility tests based on the comprehensive sample, we measure the R_VOLATILITY variable over yearly rolling windows starting
on July 25th of each year in our sample. We choose July 25th as our yearly cut-off point because July 25, 2002 is the date that separates our PRE and POST
periods.

Table 7
Volatility univariate comparisons.

Mean Median

Pre sample Post sample P-value Pre sample Post sample P-value

Volatility of credit ratings in the year prior to default (R_VOLATILITY)


Panel A: Univariate analysis default sample
Larger sample 3.57 (1,354) 2.81 (130) 0.00 3.51 (1,354) 2.51 (130) 0.00
Restricted sample 3.40 (441) 2.60 (76) 0.00 2.91 (441) 2.50 (76) 0.00

Panel B: Univariate analysis default and non-default sample


Larger sample 1.40 (11,033) 1.27 (3,567) 0.00 0.95 (11,033) 0.83 (3,567) 0.30
Restricted sample 1.51 (3,110) 1.45 (981) 0.08 1.00 (3,110) 1.15 (981) 0.00

This table presents credit rating volatility comparisons between the PRE and POST periods. Our volatility measure (R_VOLATILITY) is the standard
deviation of ratings outstanding. For the default sample (Panel A), we measure volatility over periods of time starting 1 year before default and ending on
the default date. For the comprehensive sample (Panel B), we measure the volatility variable over yearly rolling windows starting on July 25th of each year
in our sample. We choose July 25th as our yearly cut-off point because July 25, 2002 is the date that separates our PRE and POST periods. In each panel, we
present a larger sample consisting of all the observations available for univariate analysis and a restricted sample with the requirements for the
availability of all other control variables. Numbers in parentheses represent number of observations. P-values for mean comparisons are based on two-
tailed t-tests. P-values for median comparisons are based on two-tailed Wilcoxon tests.

Table 8
Volatility of credit ratings multivariate comparisons.

Predicted sign Default sample Default and non-default sample

Coefficient P-value Coefficient P-value

Intercept / 3.43 (0.01) 4.95 (0.00)


POST  0.70 (0.00) 1.02 (0.00)
SP_RATING +/ 0.63 (0.00) 0.11 (0.01)
FT_RATING +/ 1.66 (0.00) 0.29 (0.00)
DTYPE +/ 0.10 (0.45) / /
COVER_STD + 0.01 (0.26) 0.02 (0.03)
FRAUD  0.00 (0.97) / /
SIZE +/ 0.08 (0.11) 0.01 (0.47)
ASSETB +/ 0.04 (0.87) 0.50 (0.00)
CONV  0.19 (0.25) 0.33 (0.00)
SS  0.01 (0.91) 0.30 (0.00)
ENHANCE  0.32 (0.00) 0.33 (0.00)
PUT  0.02 (0.92) 0.12 (0.15)
REDEEM  0.16 (0.09) 0.30 (0.00)
MATURITY + 0.01 (0.01) 0.00 (0.00)
RATE  0.28 (0.00) 0.02 (0.00)
GDP +/ 0.00 (0.15) 0.00 (0.00)
BOND30 +/ 0.91 (0.01) 1.02 (0.03)
RECESSION +/ 0.16 (0.19) 0.09 (0.12)
SPI +/ 0.00 (0.32) 0.00 (0.00)
LQDEFAULT + 0.00 (0.13) / /
N 517 4,091
ADJRSQ 74.47% 9.34%

This table presents credit rating volatility multivariate comparisons for the PRE and POST periods. Our measure of rating volatility (R_VOLATILITY) is
defined as the standard deviation of ratings outstanding. For the default sample, we measure volatility over periods of time starting 1 year before default
and ending on the default date. For the comprehensive sample of default and non-default issues, we measure the volatility variable over yearly rolling
windows starting on July 25th of each year in our sample. We choose July 25th as our yearly cut-off point because July 25, 2002 is the date that separates
our PRE and POST periods. See Appendix C for detailed variable definitions. P-values are based on two-tailed tests.

Table 8 presents the results of estimating Eq. (3) for the default and the comprehensive debt issue samples. For both
of these samples, the coefficient for the POST variable is significantly negative, indicating that rating volatility has decreased in
the POST period. This finding does not support the notion that rating agencies trade off rating timeliness and increased
volatility. On the contrary, we find both enhanced timeliness and reduced volatility in the POST period, consistent with the
idea that the nationally recognized agencies have improved their credit analysis in recent years.
In addition, in untabulated analysis, we compare the frequency of credit rating reversals in the PRE and POST periods,
where we define a reversal as a credit rating change followed within a 1-year period by another rating change in the opposite
direction (e.g., a downgrade followed by an upgrade). We find that the number of rating reversals is very small both in the

PRE and POST periods. For example, the number of downgrades that are reversed is only about 1.5% of all rating changes
both in the PRE and in the POST periods. The number of upgrades that are reversed is less than 1% of all rating changes both
in the PRE and in the POST periods. The differences in the frequency of reversals between the PRE and POST periods (both
for downgrades and upgrades) are not statistically significant at conventional significance levels. This finding supports our
conclusion that in the POST period, rating agencies are able to improve the timeliness and accuracy of their ratings without
increasing rating volatility.
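The reversal definition used in this untabulated analysis can be illustrated with a short Python sketch. It assumes each rating change is recorded as a date and a signed step on the numeric scale (positive for a downgrade), and it compares each change only with the immediately following one; that pairing rule, the 365-day window and the function name are assumptions of the sketch.

```python
from datetime import date
from typing import List, Tuple


def count_reversals(changes: List[Tuple[date, int]], window_days: int = 365) -> int:
    """Count rating changes followed within the window by a change in the
    opposite direction.

    `changes` must be sorted by date; the integer is the signed step on the
    numeric rating scale (positive = downgrade, negative = upgrade).
    Assumption: only consecutive changes are compared.
    """
    reversals = 0
    for (d0, step0), (d1, step1) in zip(changes, changes[1:]):
        opposite = step0 * step1 < 0
        if opposite and (d1 - d0).days <= window_days:
            reversals += 1
    return reversals


# Toy history, purely illustrative: a downgrade reversed after a few months,
# then a second downgrade more than a year after the upgrade
toy_changes = [(date(2001, 1, 10), 1),
               (date(2001, 6, 1), -1),
               (date(2003, 1, 1), 2)]

print(count_reversals(toy_changes))
```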

5.4. Robustness tests

5.4.1. Regulatory pressure and criticism versus improvements in accounting quality


An alternative explanation for our empirical findings is that, after SOX, debt issuers have improved their financial
reporting and disclosure quality, which could lead to improvement in credit rating properties.28 To mitigate the concern
that we attribute our findings to increased regulatory pressure and investor criticism, when, in fact, the rating property
improvement is only driven by improved accounting quality, we perform some robustness tests. We argue that if the
observed improved rating properties are due to improved accounting quality, we would expect accounting information to
better explain credit rating levels and changes as well as rating timeliness and volatility in the POST period.
Following prior studies that examine determinants of credit ratings (Kaplan and Urwitz, 1979; Sengupta, 1998), we
select the log of total assets, leverage, interest coverage and profit margin as our accounting variables. When we regress
credit rating levels on these four accounting variables, separately for the PRE and POST periods, we find a decrease in the
adjusted R-square from 52.68% in the PRE period to 46.67% in the POST period. Similarly, for regressions of changes in credit
ratings on the levels (changes) of the four accounting variables, we find that the R-square decreases from 2.75% (2.19%) to
1.72% (1.61%). These findings indicate that the explanatory power of accounting variables for credit rating levels and
changes actually decreases in the POST period.
In addition, we adjust our timeliness model in Table 3 by eliminating all the other explanatory variables and running
regressions of the timeliness measure (DAHEAD) on just accounting variables (i.e., QLOGASSET, QDE, QCOVER) for the PRE
and POST periods separately. The adjusted R-square from these regressions drops from 12.37% to 4.81%,
indicating that the same set of accounting variables does a poorer job in explaining rating timeliness in the POST period
than in the PRE period. Similarly, for our volatility regressions we find a decrease in adjusted R-square in the POST period.
While these tests cannot rule out the possibility that both the regulatory pressure/investor criticism and the improvements
in accounting quality are responsible for the improvements in credit rating properties, they mitigate the concern that our
findings are driven exclusively by improvements in accounting quality.29
Finally, we examine whether there are any cross-sectional differences in the degree of rating property improvements.
We compare ratings issued by Moody's and S&P, but not Fitch (given the small number of observations available for a
Fitch-only sample). With respect to rating timeliness, we find that, although both agencies improve, S&P improves more than
Moody's (i.e., the coefficient on an interaction term of the POST variable with SP_RATING is significantly negative in our
timeliness regression). We also find that S&P improves more than Moody's with respect to rating accuracy for the
cumulative accuracy profile tests. No clear difference between S&P and Moody's exists with respect to reductions in rating
volatility. In addition, we find that rating timeliness improves more for those debt issues that start the 1-year period
leading to default with investment grade ratings compared to the ones that start with non-investment grade ratings. To the
extent all agencies and types of rated issues benefit the same way from the same improved accounting/disclosure, we
should not observe differences in the degree of rating property improvement. These cross-sectional differences seem
inconsistent with the notion that improvements in rating properties are due exclusively to better accounting/disclosure in
our POST sample period.
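The interaction test mentioned above can be sketched as follows. This is a stripped-down illustration with simulated data, assuming a plain OLS regression of DAHEAD on POST, SP_RATING and their product; it omits the control variables of the Table 3 model, and the helper name `ols` and all simulated coefficients are hypothetical.

```python
import numpy as np

def ols(design, y):
    """Return OLS coefficients and their t-statistics (homoskedastic errors)."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 4000
post = rng.integers(0, 2, size=n)        # 1 if the rating change is in the POST period
sp_rating = rng.integers(0, 2, size=n)   # 1 if the rating agency is S&P

# Simulated DAHEAD (more negative = earlier downgrade). The coefficients
# below are invented for illustration; they are not estimates from the paper.
dahead = (-100.0 - 30.0 * post + 5.0 * sp_rating
          - 20.0 * post * sp_rating + rng.normal(scale=5.0, size=n))

design = np.column_stack([np.ones(n), post, sp_rating, post * sp_rating])
beta, t_stats = ols(design, dahead)

# A negative coefficient on POST x SP_RATING means the S&P downgrades move
# earlier relative to default by more than the Moody's downgrades do.
interaction_coef, interaction_t = beta[3], t_stats[3]
print(f"POST x SP_RATING: {interaction_coef:.2f} (t = {interaction_t:.1f})")
```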

28 Note, however, that improved financial reporting and disclosure may not necessarily have an effect on the improvement in credit rating properties.
Given that the rating agencies are exempted from Regulation FD's provisions, they could have always requested private information directly from
managers to supplement the quality of publicly available disclosures. In their rating methodologies, the agencies (e.g., Standard and Poor's, 2002) explicitly
state that they use their direct access to debt issuers' management to obtain private information.
29 These findings should be interpreted with caution for several reasons: (1) our tests are based on the assumption that the variation in the dependent
variable does not substantially change between our PRE and POST periods; (2) we do not perform any statistical tests to compare the R-squares; (3) the
accounting ratios we use (and the calculation of these ratios) may be different from the ratios actually used by the credit rating agencies.

5.4.2. Other robustness tests

To further examine the robustness of our findings, we run several sensitivity tests: (1) In addition to controlling for
issuer characteristics measured on a quarterly basis, we use issuer annual characteristics and issuer quarterly change
characteristics; (2) We examine the change in the timeliness of credit watch placements as opposed to rating downgrades for
our sample of defaulted issues; (3) In all our timeliness regressions, we add an indicator control variable for downgrades
from investment grade to non-investment grade; (4) Instead of measuring the Standard and Poor's 500 index in levels, we
use the Standard and Poor's 500 return index; (5) We separately analyze rating properties by S&P, Moody's and Fitch; (6)
Given that a large proportion of our sample observations (82%) come from the PRE period, we test the robustness of our
findings to a more balanced sample with an equal number of observations from the PRE and POST periods. We retain all the
observations from the POST period and randomly select an equal number of observations from the PRE period and re-run
all our tests to ensure a more balanced comparison. For all these sensitivity tests, the inferences are similar, suggesting that
our results are robust to alternative specifications.
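The balanced-sample procedure in test (6) can be sketched as below, assuming simple random sampling without replacement from the PRE observations. The function name and the toy 82/18 split are hypothetical; the paper does not report the sampling routine it used.

```python
import numpy as np

def balanced_subsample(post_flags, rng):
    """Keep every POST observation and draw an equal number of PRE
    observations at random, without replacement. Returns sorted row indices."""
    post_flags = np.asarray(post_flags).astype(bool)
    post_idx = np.flatnonzero(post_flags)
    pre_idx = np.flatnonzero(~post_flags)
    if len(pre_idx) < len(post_idx):
        raise ValueError("fewer PRE than POST observations; cannot balance")
    pre_draw = rng.choice(pre_idx, size=len(post_idx), replace=False)
    return np.sort(np.concatenate([pre_draw, post_idx]))

# Toy sample mirroring the reported proportions: 82% PRE, 18% POST.
post_flags = np.array([0] * 82 + [1] * 18)
rng = np.random.default_rng(42)
keep = balanced_subsample(post_flags, rng)
print(len(keep), int(post_flags[keep].sum()))
```

Because the PRE draw is random, re-running the tests over several seeds would show how sensitive the inferences are to the particular subsample selected.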

6. Conclusion

This study examines whether and how nationally recognized credit rating agencies change their rating properties in
response to the increased regulatory pressure and investor criticism following well-publicized bankruptcy scandals. We
hypothesize and find that credit rating agencies improve their rating timeliness in the period following increased regulatory
scrutiny and investor criticism brought about by several high-profile bankruptcy scandals. We also find that the rating
agencies improve rating accuracy and reduce rating volatility in the POST period, suggesting that credit rating agencies
enhance rating timeliness by improving their credit analysis, and not at the expense of rating accuracy and/or volatility.
One contribution of our study is that it uses the impact of increased regulatory pressure and criticism on the properties
of credit ratings to examine the potential trade-off between rating properties. In the past, rating agencies have attributed
the lack of timeliness in credit ratings to unavoidable trade-offs between desirable rating properties. Our study finds that
when faced with increased regulatory pressure and reputation concerns, rating agencies improve rating timeliness and
accuracy and reduce rating volatility. Therefore, the lack of rating timeliness cannot be attributed exclusively to the trade-off
explanation. It seems to be, at least partly, due to the market power enjoyed by the nationally recognized rating agencies.
This study is subject to at least one limitation. While we attempt to control for other economic factors that could affect
credit rating properties, we cannot rule out the possibility that our results are attributable, at least in part, to some other
factor unrelated to regulatory pressure and criticism. One such possible factor is the improved accounting/disclosure
quality after SOX.

Appendix A

Sample composition for timeliness, accuracy and volatility tests can be seen in Table A1.

Table A1
Sample composition for timeliness, accuracy and volatility tests.

Hypotheses            | Tests                                       | Debt issues                          | Rating changes          | Rating levels
Timeliness hypothesis | The DAHEAD regression (Table 3 model 1)     | Defaulting issues                    | Downgrades only         | /
Timeliness hypothesis | The WRATE regression (Table 3 model 2)      | Defaulting issues                    | /                       | All rating levels
Accuracy hypothesis   | Types I and II errors regressions (Table 5) | Defaulting and non-defaulting issues | /                       | All rating levels
Accuracy hypothesis   | Bankruptcy benchmark models (Table 6)       | Defaulting and non-defaulting issues | Downgrades and upgrades | /
Volatility hypothesis | Rating volatility regressions (Table 8)     | Defaulting and non-defaulting issues | /                       | All rating levels

This table presents our sample composition and the type of rating levels and changes used in our empirical tests. The sample composition changes for
different empirical tests. Some of our empirical tests are based on a sample of defaulting issues, while others use a comprehensive sample of all rated
bonds (defaulting and non-defaulting). See Appendix C for variable definitions.

Appendix B

Rating scheme definitions can be seen in Table B1.

Table B1
Rating scheme definitions.

Credit risk         | Moody's | Standard & Poor's | Fitch    | Code assigned
Highest grade       | Aaa     | AAA               | AAA      | 1
High grade          | Aa1     | AA+               | AA+      | 2
                    | Aa2     | AA                | AA       | 3
                    | Aa3     | AA-               | AA-      | 4
Upper medium grade  | A1      | A+                | A+       | 5
                    | A2      | A                 | A        | 6
                    | A3      | A-                | A-       | 7
Medium grade        | Baa1    | BBB+              | BBB+     | 8
                    | Baa2    | BBB               | BBB      | 9
                    | Baa3    | BBB-              | BBB-     | 10
Lower medium grade  | Ba1     | BB+               | BB+      | 11
                    | Ba2     | BB                | BB       | 12
                    | Ba3     | BB-               | BB-      | 13
Low grade           | B1      | B+                | B+       | 14
                    | B2      | B                 | B        | 15
                    | B3      | B-                | B-       | 16
                    | Caa1    | CCC+              | CCC+     | 17
                    | Caa2    | CCC               | CCC      | 18
                    | Caa3    | CCC-              | CCC-     | 19
                    | Ca      | CC                | CC       | 20
                    | C       | C                 | C        | 21
Default             | (none)  | D                 | DDD/DD/D | 22

This table lists the numerical codes we assign for various Moody's, S&P's, and Fitch's issue credit ratings. Credit rating agencies issue default ratings only
after they observe actual defaults (i.e., D level ratings are not predictions of future defaults, they are assigned ex-post, while AAA to C level ratings are
assigned ex-ante). Only S&P and Fitch have a rating scale that includes D level ratings, while Moody's does not include D ratings in its rating scheme.
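For readers who wish to replicate the coding in Table B1, the mapping can be written as a lookup. The sketch below is an illustration rather than the authors' code: the dictionary layout and function names are hypothetical, the lower notch of each S&P and Fitch category is assumed to carry the usual "-" suffix (AA-, A-, and so on), and the investment grade cutoff at code 10 (Baa3/BBB-) follows standard market convention.

```python
# Rating symbols in order of increasing credit risk (codes 1 to 21).
MOODYS = ["Aaa", "Aa1", "Aa2", "Aa3", "A1", "A2", "A3",
          "Baa1", "Baa2", "Baa3", "Ba1", "Ba2", "Ba3",
          "B1", "B2", "B3", "Caa1", "Caa2", "Caa3", "Ca", "C"]
SP_FITCH = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-",
            "BBB+", "BBB", "BBB-", "BB+", "BB", "BB-",
            "B+", "B", "B-", "CCC+", "CCC", "CCC-", "CC", "C"]

DEFAULT_CODE = 22

def build_scale(symbols, default_symbols):
    """Map each symbol to its numerical code; default symbols map to 22."""
    codes = {symbol: code for code, symbol in enumerate(symbols, start=1)}
    for symbol in default_symbols:
        codes[symbol] = DEFAULT_CODE
    return codes

RATING_CODES = {
    "moodys": build_scale(MOODYS, []),                   # no D rating in Moody's scheme
    "sp": build_scale(SP_FITCH, ["D"]),
    "fitch": build_scale(SP_FITCH, ["DDD", "DD", "D"]),
}

def rating_code(agency, symbol):
    """Return the Table B1 code for a rating symbol of the given agency."""
    return RATING_CODES[agency][symbol]

def is_investment_grade(code):
    """Codes 1-10 (Aaa/AAA through Baa3/BBB-) are investment grade."""
    return code <= 10

print(rating_code("moodys", "Baa3"), rating_code("sp", "BB+"))
```

A higher code therefore denotes higher credit risk, so a downgrade corresponds to an increase in the code.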

Appendix C

C.1. Variable definitions

DAHEAD: number of days between the downgrade date and the default date (with a minimum value of -360 and a maximum value of 0),
WRATE: weighted average of credit rating level during the last year leading to default,
POST: indicator variable that takes a value of 1 if the rating change date falls after July 25, 2002, and 0 otherwise,
SP_RATING: indicator variable that takes a value of 1 if the rating agency is S&P, and 0 otherwise,
FT_RATING: indicator variable that takes a value of 1 if the rating agency is Fitch, and 0 otherwise,
DTYPE: indicator variable that takes a value of 1 if the default type is Bankruptcy, and 0 otherwise,
QLOGASSET: log of issuer quarterly total assets (Compustat Quarterly data44) for the most recent quarter prior to a downgrade,
QCOVER: issuer quarterly interest coverage measured as income before extraordinary items scaled by interest expense (Compustat Quarterly data8/data22) for the most recent quarter prior to a downgrade,
QDE: issuer quarterly debt to equity ratio (Compustat Quarterly data51/data59) for the most recent quarter prior to a downgrade,
SIZE: log of issue size,
ASSETB: indicator variable that takes a value of 1 if the issue is an asset-backed issue, 0 otherwise,
CONV: indicator variable that takes a value of 1 if the issue can be converted to the common stock (or other security) of the issuer, 0 otherwise,
SS: indicator variable that takes a value of 1 if the issue is senior secured debt, 0 otherwise,
ENHANCE: indicator variable that takes a value of 1 if the issue has the credit enhancement feature, 0 otherwise,
PUT: indicator variable that takes a value of 1 if the bondholder has the option, but not the obligation, to sell the security back to the issuer under certain circumstances, 0 otherwise,
REDEEM: indicator variable that takes a value of 1 if the issue is redeemable under certain circumstances, 0 otherwise,
MATURITY: number of years to maturity,
GDP: annual gross domestic product,
BOND30: CRSP 30-year bond annual return,
RECESSION: indicator variable that takes a value of 1 if the rating date falls between March 2001 and October 2001 (i.e., a recession period according to the National Bureau of Economic Research), 0 otherwise,
SPI: annual measure of the Standard and Poor's 500 index,
RATE: issue outstanding credit rating 360 days before default,
R_VOLATILITY: the standard deviation of ratings outstanding during the 1-year period leading to default,
COVER_STD: the standard deviation of the quarterly interest coverage ratio computed over the eight-quarter period starting 3 years before and ending 1 year before the default date,
FRAUD: indicator variable that takes a value of 1 if the default firm has a restatement during the window (-365, +365) relative to the default date, 0 otherwise,
LQDEFAULT: the number of defaults in the quarter before a rating change,
LARGE_LOSS: indicator variable that takes a value of 1 if a firm experiences an annual loss equal to or greater than 25% of total assets, 0 otherwise, and
NEG_RET: indicator variable that takes a value of 1 if a firm reports negative retained earnings, and 0 otherwise.
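Two of the definitions above, DAHEAD and POST, lend themselves to a short worked sketch. The sign convention is an assumption made here for illustration: DAHEAD is taken as the downgrade date minus the default date, so that a downgrade preceding default yields a negative number, truncated to lie between -360 and 0. The function names and the example dates are hypothetical.

```python
from datetime import date

POST_CUTOFF = date(2002, 7, 25)  # POST = 1 for rating changes after this date

def dahead(downgrade_date, default_date):
    """Days from default to downgrade (negative if the downgrade comes first),
    truncated to the interval [-360, 0]. Sign convention assumed."""
    days = (downgrade_date - default_date).days
    return max(-360, min(0, days))

def post_indicator(rating_change_date):
    """1 if the rating change date falls after July 25, 2002, and 0 otherwise."""
    return 1 if rating_change_date > POST_CUTOFF else 0

# Hypothetical issue: downgraded 45 days before it defaults.
example = dahead(date(2003, 3, 1), date(2003, 4, 15))
print(example, post_indicator(date(2003, 3, 1)))
```

Under this convention a more negative DAHEAD indicates a timelier downgrade, which matches the interpretation of negative coefficients in the timeliness regression.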

References
Altman, E., 1968. Financial ratios, discriminant analysis and prediction of corporate bankruptcy. Journal of Finance 23, 589–609.
Altman, E., Rijken, H., 2004. How rating agencies achieve rating stability. In: Cantor, R. (Ed.), Recent Research on Credit Ratings (Special Issue). Journal of
Banking and Finance 28, 2679–2714.
Association for Financial Professionals, 2002. Ratings agencies survey: accuracy, timeliness, and regulation. Available from www.afponline.org.
Bamber, D., 1975. The area above the ordinal dominance graph and the area below the receiver operating graph. Journal of Mathematical Psychology 12,
387–415.
Barlas, S., 2005. Corporate credit ratings at issue. Strategic Finance 87 (3), 22–23.
Beaver, W., Shakespeare, C., Soliman, M., 2006. Differential properties in the ratings of certified versus non-certified bond-rating agencies. Journal of
Accounting and Economics 42, 303–334.
Belsley, D.A., Kuh, E., Welsch, R.E., 1980. Regression Diagnostics. Wiley, New York, NY.
Cantor, R., Mann, C., 2003. Measuring the performance of corporate bond ratings. Special Comment, Moody's Investor Services.
Cantor, R., Mann, C., 2006. Analyzing the tradeoff between ratings accuracy and stability. Special Comment, Moody's Investor Services.
Chen, N.E., Roll, R., Ross, S.A., 1986. Economic forces and stock market. Journal of Business 59 (3), 383–403.
Fabozzi, F., 2001. The Handbook of Fixed Income Securities, sixth ed. McGraw-Hill, New York, NY.
Fich, E., Shivdasani, A., 2007. Financial fraud, director reputation, and shareholder wealth. Journal of Financial Economics 86 (2), 306–336.
Fons, J., Cantor, R., Mahoney, C., 2002. Understanding Moody's corporate bond ratings and rating process. Special Comment, Moody's Investor Services.
Hand, J., Holthausen, R., Leftwich, R., 1992. The effect of bond rating agency announcements on bond and stock prices. The Journal of Finance 47, 733–752.
Hanley, J.A., McNeil, B.J., 1983. A method of comparing the areas under receiving operating characteristic curves derived from the same cases. Radiology
148, 839–843.
Hillegeist, S., Keating, E., Cram, D., Lundstedt, K., 2004. Assessing the probability of bankruptcy. Review of Accounting Studies 9 (1), 5–34.
Holthausen, R., Leftwich, R., 1986. The effect of bond rating changes on common stock prices. Journal of Financial Economics 17, 57–89.
Hughes, J., 2006. Credit ratings groups come under attack. Financial Times, March 8, pp. 45–46.
Ip, G., 2002. Companies keep closer eye on accounts: Enron collapse prompts more clout for raters, less business spending. Wall Street Journal, February
1, p. A2.
Jorion, P., Liu, Z., Shi, C., 2005. Informational effects of regulation FD: evidence from rating agencies. Journal of Financial Economics 76, 309–330.
Kahn, J., 2002. Watching the detectives. Fortune 146 (9, November 11), 184.
Kaplan, R., Urwitz, G., 1979. Statistical models of bond ratings: a methodological inquiry. Journal of Business 52, 231–261.
Kennedy, P., 1992. A Guide to Econometrics. MIT Press, Cambridge, MA.
Liu, H., Li, G., Cumberland, W.G., Wu, T., 2005. Testing statistical significance of the area under a receiving operating characteristics curve for repeated
measures design with bootstrapping. Journal of Data Science 3, 257–278.
Löffler, G., 2004. An anatomy of rating through the cycle. Journal of Banking & Finance 28, 695–720.
Oxley, M.G., 2005. Opening statement by Chairman Michael G. Oxley, Financial Services Committee, Subcommittee on Capital Market, Insurance, and
Government Sponsored Enterprises: Legislative Solutions for the Rating Agency Duopoly. June 29. (Accessed at http://financialservices.house.gov
in August, 2005.)
Partnoy, F., 2006. Take away the rating agencies' licenses. Financial Times, March 13, pp. 15–16.
Securities and Exchange Commission (SEC), 2003. Report on the role and function of credit rating agencies in the operation of securities markets. Available
from www.sec.gov.
Sengupta, P., 1998. Corporate disclosure quality and the cost of debt. Accounting Review 73, 459–474.
Shaw, H., Reason, T., 2006. House approves rating agency shakeup. Available from www.cfo.com.
Sinclair, T., 2003. Bond rating agencies. New Political Economy (8), 147–161.
Standard & Poor's, 2002. Standard & Poor's Corporate Governance Scores: Criteria, Methodology And Definitions. McGraw-Hill Companies, Inc., New York.
Tafara, E., 2005. Speech by SEC Staff: Remarks for the International Organization of Securities Commissions Annual Conference Panel on the Regulation of
Credit Rating Agencies, April 6.
White, L., 2001. The credit rating industry: an industrial organization analysis. Working Paper, New York University.