Olli Väyrynen
Master’s thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in Technology
The process of determining the current value of a company is of interest to legislators,
company management, asset managers and individuals alike. There are numerous
methods for determining the value; some are subjective, others more objective.
Public companies announce their results according to certain economic legislation in
order to improve the transparency of their businesses. At worst, a lack of integrity and
transparency in profit calculation, combined with unethical corporate governance, leads
to severe financial collapses. A company's value lies in its potential to generate a stream
of profits in the future.
The goal of this Thesis is to form an accurate model for identifying undervalued stocks.
The identification is based on eight financial ratios, so the analysis is multivariate in
nature. The valuation of stocks is carried out with the dividend discount model (DDM)
using actual cash flow data. Stocks are then classified as undervalued or overvalued with
linear discriminant analysis (LDA), which is widely used in corporate performance
studies. Different ratio combinations are evaluated in order to find the most
discriminating ratio profile. The statistical assumptions of discriminant analysis are
examined in depth, as they influence both the statistical and practical significance levels,
as well as the prediction capability of the model.
Based on this Thesis, the LDA-based multivariate identification of undervalued stocks
has some predictive capability. As expected, the predictive capability deteriorates
substantially when predicting across sectors or periods of time. Overall, more research
is needed before the model can be utilized in practice.
I would like to thank Professor Ahti Salo for comments and guidance.
Espoo, 3.12.2007
Olli Väyrynen
Contents
1 Introduction
4 Empirical analysis
8 References
1 Introduction
During the last three decades, financial markets have experienced an upheaval as
computers have gained capacity and financial information has become accessible in an
instant. Financial institutions compete with each other by offering the most accurate
and relevant financial information. Ever more sophisticated decision support systems
are built from mathematical models and forecasts.
There are numerous people in the investment world, with different backgrounds, who
value securities for several purposes. A common purpose is to benefit from knowing the
real value of a security. People valuing stocks can be divided into six broad categories
(Hoover 2006) according to their objectives. Corporate managers benefit from knowing
the value of their company, especially when making strategic decisions about raising
money, because in times of overvaluation a company can gather larger amounts of
money in a share issuance. They also benefit from knowing the values of other
companies in the case of an acquisition or a merger. Financial analysts in investment
banking help corporate managers to carry out share issuances, acquisitions and
mergers. Financial analysts in equity research track stocks and give out their
recommendations to public and/or private investors, although there is little empirical
evidence that abnormal profits can be made based on the information offered by equity
researchers. Asset managers are professional investors managing the funds of
individuals and organizations in order to earn abnormal profits. To "beat the market",
asset managers continuously seek out and act on market misvaluation of securities.
Individuals are rarely recommended to pick stocks; instead, they are encouraged to
invest in indices or funds. Still, stock picking can often be seen as an enjoyable hobby
that brings excitement. Economic policymakers value stock markets as a whole in order
to monitor the stability of the markets. They have to stay up to date as they make
decisions about interest rates and money supply and as they enact laws.
The two extremes of security valuation are technical analysis and fundamental
analysis (Ross et al. 1999). Technical analysis (Murphy 1999) disregards the
underlying business and values securities using statistics, such as past prices and
volumes. Advocates of technical analysis predict market prices and movements
based on the dynamics of market price and volume rather than on the fundamentals
of the corporation. Fundamental analysis strives to measure the intrinsic value of a
security by studying all the information that can affect that value. Both
microeconomic and macroeconomic factors are considered in fundamental
analysis. Many of the best-performing investors favor fundamental analysis.
Stocks are commonly given scores on the basis of their fundamentals. Nowadays,
various mathematical tools are used to aid in valuing and picking stocks. Financial
ratios are interpreted and compared to assess the intrinsic value of stocks. A
classical scoring method is linear discriminant analysis (Altman 1968;
Lachenbruch 1975; Sharma 1996), which classifies stocks into two or more groups
based on multiple financial ratios. The most recent models for stock valuation and
classification are neural network models (Brocket et al. 1994, 2006; Zapranis &
Ginoglou 2000).
In this Thesis, stock valuation and classification are studied from the asset
manager's point of view. Sifterfund supplied access to Bloomberg in order to
gather the data for the Thesis. The data consists of two subsectors of the US
manufacturing sector: industrial manufacturing and non-cyclical consumer goods.
The data covers the years 1989–2006.
1.2 Research objectives
The Thesis focuses on the relationship between stock valuation and financial
ratios. The primary objective is to form a model that classifies stocks into groups of
undervalued and overvalued based on their financial ratios.

The underlying idea is that the valuation level of a stock can be assessed from its
financial ratios. The valuation level is not apparent in any single ratio, so a
multivariate technique is required. The Thesis concentrates on two-group
discriminant analysis as a method for group classification and prediction. The
classification capability is evaluated from both practical and statistical points of
view. The classification capability is validated by testing the prediction capability
within the sector as well as within another sector. Therefore, three samples are
carefully formed: basic and hold-out samples from the industrial sector and an
additional sample from the non-cyclical consumer goods sector. The basic sample
is for fitting the model and the other two are for testing the prediction capability.
The suitability and limitations of discriminant analysis for the identification of
undervalued stocks are examined as well.
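As a sketch of the two-group approach, Fisher's linear discriminant can be computed directly with NumPy. The data below is synthetic and the ratio interpretations are illustrative assumptions only, not the Thesis data or its actual discriminant function:

```python
import numpy as np

# Synthetic two-group data: two hypothetical "financial ratios" per stock
# (the group means and the ratio labels are purely illustrative).
rng = np.random.default_rng(0)
undervalued = rng.normal(loc=[0.8, 0.12], scale=0.1, size=(50, 2))  # e.g. low PB, high ROCE
overvalued = rng.normal(loc=[1.4, 0.06], scale=0.1, size=(50, 2))   # e.g. high PB, low ROCE

# Fisher's rule: w = Sw^-1 (m1 - m2), with Sw the pooled within-group covariance.
m1, m2 = undervalued.mean(axis=0), overvalued.mean(axis=0)
Sw = np.cov(undervalued, rowvar=False) + np.cov(overvalued, rowvar=False)
w = np.linalg.solve(Sw, m1 - m2)

# Classify by comparing the discriminant score against the midpoint cutoff.
cutoff = w @ (m1 + m2) / 2

def classify(x):
    return "undervalued" if w @ x > cutoff else "overvalued"

# In-sample classification accuracy on the two groups.
accuracy = (np.mean(undervalued @ w > cutoff) + np.mean(overvalued @ w < cutoff)) / 2
```

In practice the hold-out and additional samples described above would be scored with the same `w` and `cutoff` fitted on the basic sample, which is exactly how prediction capability across samples is tested.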
1.3 Scope
The main constraint in this work is that only discriminant analysis is used,
because discriminant analysis alone was judged to match the scope of the Thesis.

The number of financial ratios in the Thesis is eight. The eight ratios were agreed
upon with the instructor as the most relevant ones in this regard. The limitation is
reasonable because a larger number of candidates would require an additional
model for ratio selection. Originally there were nine ratios, but dividend yield was
excluded because of its partially continuous nature. The exclusion was also
supported by its correlation with the valuation method: the dividends from the
following seven years affect the valuation of a stock. No qualitative measures are
included in the Thesis either.
data was commonly available since 1989. The actual valuation begins in 1991 in
order to increase the sample sizes.
1.4 Structure
The empirical data section covers the valuation and the sample-forming procedure.
This section is important because the rest of the work is based on the samples
formed here. The samples can be formed in countless ways, and the choice largely
depends on the characteristics of the task.

The results section first analyses the results from the discriminant analysis. The
Thesis continues with a sensitivity analysis, in which history data is processed and
trimmed and various ratio profiles are analyzed. The results section culminates in
the best discriminating functions.

The final section assesses the analysis critically and offers suggestions for further
study. The overall research process is shown in Figure 1, which illustrates the
crucial points in the research. The analysis diagram can be used as a guide
through the Thesis.
Figure 1 Analysis diagram.
2 Stock valuation
The starting point for fundamentals-based asset valuation is that the present value
of an asset depends on its future cash flows; stocks, for example, provide two kinds
of cash flows: dividends and the sale price at the end (Ross et al. 1999). If the
valuation concerns bonds, coupons are received, and if real projects are valued,
after-tax cash flows are discounted to present value. Summing up the discounted
future cash flows yields the discounted cash flow model (DCF), which is the same
regardless of the type of asset.
The Efficient Market Hypothesis (EMH) asserts that the prices of traded assets are
unbiased and reflect all available information. EMH was introduced by
Eugene Fama in 1970 (Fama 1970) and is one of the cornerstones of modern
financial theory. According to EMH, it is not possible to consistently
outperform the market with information already available on the market. The
three forms of EMH are the weak, semi-strong and strong forms. Weak-form
efficiency implies that technical analysis will not yield excess returns in the long
run. Semi-strong efficiency implies that fundamental analysis cannot yield excess
returns in the long run either. Strong-form efficiency implies that security prices
reflect all available information, so that no one can earn excess returns. In an
efficient market, above-average return has more to do with luck than skill. There
are numerous studies and statements for and against EMH. If the strong form
held, analyzing information would not benefit anyone. The weak form, on the
other hand, is hard to reject, since the majority of active asset managers
under-perform their appropriate benchmarks.
Two pervasive concepts in the financial literature are value investing and growth
investing (Hoover 2006). Value investing is an investment strategy that favors
stocks that are undervalued, for example because of an overreaction to news flow.
Growth investing is an investment strategy that favors stocks that are expected to
earn above-average earnings compared to the market. As the EMH implies: "If
the market prices stocks accurately, there is no consistent advantage in choosing
between one type of stocks over another" (Hoover 2006). Growth investors
typically screen out low-PE companies, believing that they can identify companies
entering high-growth phases. In contrast, value investors typically screen out
high-PE companies, their idea being to identify undervalued stocks with
relatively slower growth. The academic community has generally come to agree
that value investing is the better-performing strategy of the two (Chan &
Lakonishok 2004). The two popular investing strategies have different starting
points but the same objective, so they are not so different after all. As
Warren Buffett puts it: "Growth and Value Investing are joined at the hip".
One of the most common ways of assessing the relative values of stocks among
practitioners is to compare financial ratios. The main advantage of using financial
ratios instead of amounts from the income statement is that they are independent
of the size of the company. Thus, financial ratios allow elegant comparison of
securities. Academics have been studying financial ratios widely for almost a
century. As computers developed and financial reporting became more regular,
statistics emerged as a notable way of studying security valuation. Statistics are
useful when sample sizes are extensive. Edward I. Altman wrote in 1968 (Altman
1968): "Academicians seem to be moving toward the elimination of ratio analysis
as an analytical technique in assessing the performance of the business
enterprise". Despite Altman's observation, researchers have continued to publish
papers on the matter frequently (Dimitras et al. 1996). Although there is only a
little formal empirical evidence that financial ratios help in picking
well-performing stocks (Adnan & Dar 2006), ratio analysis is widely used and
agreed to be useful (Campbell & Shiller 1998).
The most popular valuation ratio is the price-to-earnings ratio (PE), which is
usually the first figure to be examined about a security. PE also appears in the
news when institutions assess the economy as a whole. Stock exchanges are also
valued as a whole, for example as "valued above historical average". Practitioners
rely heavily on the PE ratio. Technology and other volatile stocks generally sell at
high PE ratios because they are expected to grow fast in the future. Valuation
levels can get out of hand, as we saw at the beginning of the millennium when the
internet bubble burst. Small, loss-making internet companies could be valued tens
of times higher than, for example, stable manufacturing companies because of the
growth opportunities investors believed those companies would have. Billions
upon billions of dollars disappeared as the market corrected itself. PE ratios are
sector specific, and PE comparison among the companies in a sector gives a fast,
tentative estimate of their valuation levels. General PE levels also vary from
country to country; for example in Japan (Ross et al. 1999), the average multiple
on the Tokyo Stock Exchange has been 40–100, while in America it has been
around 25. This suggests huge and constant growth opportunities for Japanese
companies, but it may equally depend on the culture and on the level the market
has become used to. The PE trend in Japan is downward sloping, probably
because companies are becoming ever more multinational. In addition to growth
opportunities (Ross et al. 1999), a PE ratio can be high because of low risk or
because of conservative accounting, the first reason being the most important.
Usually the earnings figure covers the last four quarters (trailing) or the expected
next four quarters, but sometimes the two past quarters are used together with
predictions of the two future ones. Earnings typically refer to after-tax net
income, the ultimate success factor for a business. The PE ratio is calculated
according to equation
PE = SHARE_PRICE / EARNINGS_PER_SHARE. (1)
Earnings before interest and taxes (EBIT) is a very popular financial figure
indicating the profitability of a company. The difference from the earnings in the
PE ratio is that interest expenses and taxes are not deducted from operating
income. Corporate management has a rather wide margin to adjust EBIT through
amortization of intangibles and depreciation of tangibles. Yet one has to keep in
mind that expenses incurred from the firm's capital structure do not affect EBIT,
so EBIT cannot be observed in isolation. Another pitfall occurs with research and
development expenses: technology companies, for example, treat them as an
operating expense, although R&D is the single most important capital expenditure
in a technology company (Damodaran 2001). Company management may also
postpone earnings that exceed analyst estimates. The EBIT margin (EBITM) is
EBIT compared to net sales and is also called the operating margin. It indicates
how effective a company is at controlling the costs and expenses associated with
its normal business operations. The ratio is calculated as
EBITM = (REVENUE − OPERATING_EXPENSES) / NET_SALES, (2)
where the revenue less operating expenses equals the sum of earnings, interest
expenses and taxes.
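Equations (1) and (2) can be computed directly; a minimal sketch with hypothetical figures (not from the Thesis data):

```python
# Equation (1): price-to-earnings ratio from hypothetical per-share figures.
share_price = 42.0
earnings_per_share = 3.0
pe = share_price / earnings_per_share  # -> 14.0

# Equation (2): EBIT margin from hypothetical income-statement figures,
# with net sales here taken equal to revenue.
net_sales = 500.0
operating_expenses = 440.0
ebit = net_sales - operating_expenses  # revenue less operating expenses
ebitm = ebit / net_sales               # -> 0.12, i.e. a 12 % operating margin
```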
financial health or high earnings volatility (Defond 2003). Cash flow forecasts
assist in interpreting earnings and assessing firm viability. CPY is calculated as
Operating cash flow is the difference between the revenue from the
products/services (operating revenue) and the costs incurred in producing those
products/services (operating costs). Operating cash flow (OCF) equals the sum of
EBIT and depreciation, less taxes.
Free cash flow (FCF) is a measure of financial performance representing the cash
that is left after the costs of maintaining the company's asset base. It is calculated
by deducting capital expenditures from operational income, as below:
Free cash flow is the sum of net income and amortization/depreciation, less
changes in working capital and capital expenditures.
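The two cash flow measures defined above can be sketched as follows; all figures are hypothetical and illustrative only:

```python
# Operating cash flow: OCF = EBIT + depreciation - taxes.
ebit, depreciation, taxes = 60.0, 15.0, 18.0
ocf = ebit + depreciation - taxes  # -> 57.0

# Free cash flow: net income plus amortization/depreciation,
# less changes in working capital and capital expenditures.
net_income = 40.0
amort_depr = 15.0
change_in_working_capital = 5.0
capital_expenditures = 20.0
fcf = net_income + amort_depr - change_in_working_capital - capital_expenditures  # -> 30.0
```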
Free cash flow is another concrete measure of a company's ability to generate
profits, in addition to operating cash flow. Even profitable businesses can have
negative cash flow if they face increased financing costs from additional capital.
The difference between OCF and FCF is that FCF is stricter: it takes into account
changes in working capital and capital expenditures to reveal the hard cash that
the company has after all the costs the business requires. The reason why
amortization/depreciation is added back in the equation is that FCF measures the
cash flow at the present moment, so the effects of investments executed in past
years are eliminated.
Investors are quite interested in FCF because the growth of a company requires
cash and, even more importantly, the stream of dividends is paid in hard cash as
well. When a stock price is relatively low and FCF is rising steadily, a profitable
investment opportunity may be at hand. If the company is not wasting the
incoming money, earnings will rise eventually. On the contrary, if FCF levels
weaken for too long, the company will face liquidity problems and become
indebted.
FCF is the cash that can be used to invest in and upgrade businesses. Excessive
shareholder rewarding can deplete FCF, forcing the firm to borrow far more
expensive money from outside, thus increasing risk and lowering future cash
flows. The interests of corporate managers and shareholders have major conflicts,
which have drawn attention in the academic community (Jensen 1986). It is also
claimed that a high payout ratio reduces managers' power by reducing the
resources they are in charge of; otherwise, managers might have incentives to
grow the company beyond its optimal size (Jensen 1986). The bias develops
further because managers' compensation is positively related to growth in sales
(Murphy 1985).
The free cash flow hypothesis claims that high levels of FCF lead to wasteful
activities by management (Ross et al. 1999). According to the hypothesis, without
excess cash, management operates as if in a riskier situation and thus avoids
projects with negative NPV (Mitra et al. 1991). The hypothesis favors debt
financing, since interest payments reduce the free cash flow and thereby the
opportunity for managers to waste resources. According to a US oil industry
survey supporting the free cash flow hypothesis (Griffin 1988), the oil industry
changed considerably in the early eighties through mergers and share buybacks.
Market values increased even though debt-to-equity ratios increased
substantially, meaning that the markets viewed the increased debt as beneficial.
2.1.5 Return on capital employed (ROCE)
ROCE = EBIT / (TOTAL_ASSETS − CURRENT_LIABILITIES). (5)
Capital employed includes fixed tangible assets, other operating assets and
working capital. In other words, capital employed is the value of all the assets
employed in a business.
ROCE is closely related to return on equity (ROE), which is the ratio of net
income to average stockholders' equity. The differences from ROCE are that in
ROE, interest and taxes are subtracted from the income figure, and long-term
liabilities are also subtracted from the total assets. ROCE is a much-overlooked
ratio, possibly because it is not as intuitive as many others, but it is useful for
assessing the efficiency of a company's capital investments. A public company has
to raise capital to achieve higher returns, and ROCE measures the company's
ability to generate operating profit on operating assets. As a rule of thumb, ROCE
should always be higher than the rate at which the company borrows. A stable
history of high ROCE suggests high growth for a company. ROCE is especially
essential for capital-intensive companies because huge sums of money are needed
for investments and, once again, it is vital to invest in order to grow.
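Equation (5) and the rule of thumb above can be sketched with hypothetical balance-sheet figures (illustrative only):

```python
# Equation (5): return on capital employed.
ebit = 120.0
total_assets = 1000.0
current_liabilities = 200.0
roce = ebit / (total_assets - current_liabilities)  # -> 0.15

# Rule of thumb: ROCE should exceed the company's borrowing rate.
borrowing_rate = 0.06
creates_value = roce > borrowing_rate
```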
On the other hand, as Helfert (Helfert 2001) puts it, ROCE "does not, however,
relate well to economic measures used in judging new investments, nor does it
assist in making day-to-day decisions on an economic basis". Also, ROCE has a
tendency to rise even when cash flows stay the same, because assets are
depreciated over time. This is not a serious flaw, because companies take on new
debt rather regularly. When studying ROCE, long averages should be used for
assets. Another point worth considering with ROCE is that inflation increases
revenues but not the book value of assets, which can inflate the ratio substantially
in times of high inflation. Andersson (Andersson 2006) reports an interesting
statistic about the ROCE of the 158 S&P 500 survivor companies (in the index
from 1980 until 2003), which was stable at around 12 % for over two decades. In
the recession of 1990, ROCE went under 10 %, and in the internet crash of 2001 it
dropped to 5 %. Yet it seems to recover quickly.
Price to book value (PB) is the intuitive comparison of the market capitalization
and the book value of the company in the balance sheet. There are slightly varying
ways of defining the book value, but the basic definition uses share capital, which
is the difference between total assets and total liabilities. Retained earnings are
included in the equation because they are profit retained by the company after
paying the shareholders, so they really are a tangible asset. The PB ratio is
PB = MARKET_PRICE / (TOTAL_ASSETS − TOTAL_LIABILITIES + RETAINED_EARNINGS). (6)
metal industry, because they are not expected to grow rapidly in the future and
investments are time-demanding processes. The PB ratio is less meaningful for
companies that possess hidden assets, such as intellectual property, which are not
reflected in the book value.
The price-to-sales (PS) ratio values a stock by dividing the market price by the
trailing 12-month revenue. The PS ratio does not take capital structure into
account, so only similar companies should be compared. When comparing similar
companies, say, companies with similar capital composition, their PS ratios tell a
lot about each company's competence in generating revenue and about how much
the market values every dollar of the company's sales. The PS ratio is very handy
in cases where large-scale costs occur and the PE ratio becomes useless because
earnings may diminish even to negative levels. A company might have been
investing heavily while its revenues are rocketing, which is why valuation ratios
should be studied in a multivariate manner. The PS ratio is
One should be careful with revenues, because they can sometimes be net revenues,
meaning that cash discounts have been subtracted. Some practitioners consider a
relatively low price-to-sales ratio combined with a rising stock price to be an
investment opportunity in a growth stock. A warning sign might be rising
receivables even though revenue growth is strong, because then revenues are not
being collected. PS has been suggested to be a stable stock price predictor, but the
PE ratio outperforms it in most cases (Senchack & Martin 1987; Park & Lee
2003).
Gearing is a financial ratio describing the level of a company's debt compared to
its share capital. The gearing equation below indicates the degree to which the
firm is funded by creditors' and owners' money. A high level of gearing is
considered risky, but on the other hand, organic growth is not enough in most
cases and financial leverage is required to survive. The gearing level must be
considered in relation to a company's peers, and substantially high gearing should
be regarded as risky because, in the case of an economic downturn, debt service
causes serious risk for the company. The gearing ratio is
GEARING = (NET_DEBT / SHARE_CAPITAL) * 100. (8)
Net debt is total debt less liquid assets: cash and assets that can be converted to
cash immediately, such as savings deposits, certificates of deposit, money market
accounts and money market mutual funds.
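Equation (8) and the net debt definition can be sketched with hypothetical balance-sheet figures (illustrative only):

```python
# Net debt: total debt less liquid assets.
total_debt = 300.0
liquid_assets = 80.0  # cash plus assets convertible to cash immediately
net_debt = total_debt - liquid_assets

# Equation (8): gearing as a percentage of share capital.
share_capital = 440.0  # total assets less total liabilities
gearing = net_debt / share_capital * 100  # -> 50.0 %
```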
situation. Equity issuance is beneficial for a firm in times of high stock prices
because more money can be gathered, and managers avoid equity issuances if they
consider their stock undervalued. Recent studies also suggest that a firm's history
plays an important role in determining its capital structure (Hovakiam et al.
2001). Also, highly profitable firms tend to pay down their debt and become less
leveraged. A high payout ratio also affects gearing because it encourages taking
on debt for investments. Jensen (Jensen 1986) claims that shareholders support
paying dividends in order to reduce the resources under management's control,
thus reducing wasteful investing in negative-NPV targets (mentioned also in
connection with free cash flow).
The static trade-off theory (Myers 1984) says that a firm is viewed as setting a
target debt-to-equity ratio and makes its choices according to the current and
target debt-to-equity ratios. The theory builds on the tax benefit of debt and
claims that the marginal benefit of further increases in debt decreases, so the
debt-to-equity ratio must be optimized according to the marginal benefits.
The pecking order theory (Myers 1984) says that a firm prefers internal and debt
financing and that there is no actual target debt-to-equity ratio. The theory
suggests that companies make financing decisions according to the law of least
effort, or least resistance. Hence, the hierarchy of financing is internal funds, then
debt, with equity as the last resort. The theory also claims that firms adapt their
dividend payout ratios to their investment opportunities, which makes dividend
policies sticky.
The market timing hypothesis (Baker & Wurgler 2002) does not generally care
whether debt or equity is used; the choice depends on the current situation of the
financial markets and the price to be paid for the capital. Equity is issued when
prices are high and repurchased when prices are low. Firms take advantage of
perceived mispricing in the markets when financing their business, and therefore
the hypothesis belongs to behavioral finance.
The neutral mutation hypothesis (Miller 1977) says that firms fall into financing
patterns and habits which have no effect on firm value. Habits make interest
groups feel comfortable and make behavior easier to predict.
Stocks are valued according to the future cash flows they provide to investors,
meaning dividends, if any, and the sale price after the holding period. The future
cash flows are discounted according to the investor's yield requirement on the
investment. The dividend discount model (Ross et al. 1999) is the general starting
point for all security valuation methods, and a number of researchers have found
a positive correlation between dividend yields and future stock returns over
multi-year time horizons (Goetzmann & Jorion 1995). LeRoy and Porter (LeRoy
& Porter 1981) and Shiller (Shiller 1981), on the other hand, questioned the
usefulness of the DDM by claiming that stock prices appear too volatile to be
explained by fundamentals. The DDM is mostly used with future estimates, and
using it with realized dividends and market prices divides academics into
supporters and opponents. The basic idea of the DDM is (Ross et al. 1999)
P0 = Div1/(1 + r) + P1/(1 + r). (9)
The net present value of a stock, considered one year ahead, is the sum of the
dividend and the sale price after that year, both discounted by the investor's yield
requirement r, as can be seen in the equation above. When the following n years
are considered, the formula generalizes to (Ross et al. 1999)
P0 = Div1/(1 + r) + Div2/(1 + r)^2 + Div3/(1 + r)^3 + ... = Σ_{t=1}^{n} Divt/(1 + r)^t + Pn/(1 + r)^n. (10)
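The finite-horizon valuation of equation (10) can be sketched as a small function; the dividend stream, sale price and yield requirement below are hypothetical:

```python
def ddm_price(dividends, sale_price, r):
    """Equation (10): sum of discounted dividends plus the discounted sale price."""
    n = len(dividends)
    pv_dividends = sum(d / (1 + r) ** t for t, d in enumerate(dividends, start=1))
    return pv_dividends + sale_price / (1 + r) ** n

# Hypothetical example: three annual dividends of 2.0, a sale price of 50.0
# at the end of year 3, and a 10 % yield requirement.
price = ddm_price([2.0, 2.0, 2.0], sale_price=50.0, r=0.10)
```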
Macroeconomic conditions vary substantially over a ten-year period, which is why
dividends and investors' yield requirements cannot be assumed to be flat. A very
popular version of the DDM is the constant growth version, in which dividends
are assumed to grow at a constant rate g, as in the equation (Ross et al. 1999)
P0 = Div/(r − g). (11)
The equation is called the Gordon formula, and for the summation to be finite it
requires g to be smaller than r. The estimation of the growth rate for dividends is
usually based on the historical trend and future prospects.
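A minimal sketch of the Gordon formula with hypothetical inputs; the convergence condition g < r is checked explicitly:

```python
def gordon_price(next_dividend, r, g):
    """Equation (11): constant-growth dividend discount (Gordon) formula."""
    if g >= r:
        # The infinite sum of discounted dividends diverges otherwise.
        raise ValueError("growth rate must be below the discount rate")
    return next_dividend / (r - g)

# Hypothetical example: next year's dividend 2.0, 10 % yield requirement,
# 5 % constant dividend growth.
price = gordon_price(next_dividend=2.0, r=0.10, g=0.05)  # -> 40.0
```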
In a two-stage model, dividends grow at rate g1 for the first T years and at rate g2
thereafter, giving (Ross et al. 1999)

P0 = Σ_{t=1}^{T} Div(1 + g1)^t/(1 + r)^t + [Div_{T+1}/(r − g2)]/(1 + r)^T. (12)
The discount rate r can be differentiated between the two stages in exactly the
same manner.
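The two-stage calculation can be sketched as follows (hypothetical inputs, with Div_{T+1} taken as Div(1 + g1)^T (1 + g2)); note that with g1 = g2 it collapses to the Gordon formula:

```python
def two_stage_price(div, g1, g2, r, T):
    """Equation (12): dividends grow at g1 for T years, then at g2 forever."""
    # Stage 1: dividends grown at g1, discounted year by year.
    stage1 = sum(div * (1 + g1) ** t / (1 + r) ** t for t in range(1, T + 1))
    # Stage 2: Gordon terminal value at year T, discounted back to today.
    div_T1 = div * (1 + g1) ** T * (1 + g2)
    terminal = (div_T1 / (r - g2)) / (1 + r) ** T
    return stage1 + terminal

# Hypothetical example: current dividend 2.0, 8 % growth for 5 years,
# 3 % thereafter, 10 % yield requirement.
price = two_stage_price(div=2.0, g1=0.08, g2=0.03, r=0.10, T=5)
```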
The DDM has problems, like any model. The model can be viewed as too static if
constant discount rates are used over long periods. Another problem is that the
infinite sums may diverge, depending on the relation between the growth factor
and the discount factor. Yet another departure from reality is that the commonly
assumed zero-growth dividends rarely occur: sometimes companies cannot afford
to pay dividends at all, and usually they prefer paying steadily rising dividends.
2.2.1 Discount factor
The risk-adjusted discount factor used to discount future cash flows into present
value consists of the risk-free rate of return and a risk premium. The risk-free rate
of return is the minimum return to be expected from any investment, although
nothing is purely risk-free. The risk-free rate of return usually refers to the
interest rate of three-month US Treasury bills. The risk premium is the extra
return investors expect for tolerating higher risk. As Luenberger (1997) puts it, a
simplistic way of taking uncertainty into account is to increase the interest rate.
The discount rate is the rate at which US banks borrow from the US Federal
Reserve; it is also called the key rate or the Fed funds rate. The Fed adjusts the
key rate to control liquidity in the markets and thereby inflation. As banks lend
the money onward, their business is to benefit from it by charging a higher
interest rate than the key rate. The key rate sets the general level for interest
rates, and the end-user interest rates for each period are determined by the supply
of and demand for money, which are in turn greatly affected by the economic
outlook.
The risk premium is the rate of return above the risk-free interest rate, in other
words, the reward for holding a risky investment rather than a risk-free one. Risk
can be divided in two: market risk and specific risk. Market risk (systematic risk)
cannot be avoided, because economic cycles affect the whole market, although
stocks are affected differently than bonds. Specific risk (unsystematic risk)
depends on the investment, giving the investor more control over the risk she is
willing to take. Unlike market risk, specific risk can be diversified away.
Inflation is a risk for the lender because the purchasing power of the amount lent
is, in most cases, lower at repayment than it was in the beginning. As the general
price level of goods and services rises, lenders expect to be compensated for
lending the money. The best-known inflation measures are the CPI (Consumer
Price Index), which tracks consumer prices continuously and defines their change
as inflation, and the GDP deflator, which measures the cost of goods purchased by
US households, government and industry.
Liquidity premium is a term used to explain the difference between two loans that
are otherwise similar but have different maturity dates. A short-term loan is
expected to be less risky than a long-term one, giving the long-term loan a wider
premium. Graphically, this is described by the term structure, which relates spot
rates to their maturities and in which the yield curve is upward sloping.
A company’s cost of capital is frequently used as the standard interest rate for
discounting future cash flows into present value. The cost of capital is a weighted
sum of the cost of equity and the cost of debt (weighted average cost of capital,
WACC), in which the tax benefit of deductible interest payments is included:

$$\mathrm{WACC} = \frac{EQUITY}{EQUITY + DEBT}\, r_{equity} + \frac{DEBT}{EQUITY + DEBT}\, r_{debt}\,(1 - TAX). \qquad (13)$$
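As a concrete illustration, equation (13) can be computed as in the sketch below; the capital structure, rates and tax rate in the example are hypothetical figures, not data from this Thesis.

```python
def wacc(equity, debt, r_equity, r_debt, tax_rate):
    """Weighted average cost of capital as in equation (13).

    equity and debt are market values; the tax shield of deductible
    interest payments reduces the effective cost of debt.
    """
    total = equity + debt
    return (equity / total) * r_equity + (debt / total) * r_debt * (1 - tax_rate)

# Illustrative figures: 60/40 capital structure, 12 % cost of equity,
# 6 % cost of debt, 30 % tax rate.
rate = wacc(60.0, 40.0, 0.12, 0.06, 0.30)  # 0.6*0.12 + 0.4*0.06*0.7 = 0.0888
```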
The proportions of equity and debt are calculated with market values instead of
book values. The cost of debt is the easy part of the WACC because it is usually
clear how much a company is paying for its loans and bonds, but the cost of
equity is trickier. The cost of equity is normally higher than the cost of debt
because it involves the risk premium. In effect, the cost of equity is the yield that
shareholders require for lending the capital and bearing the risk of ownership. A
common method for estimating the cost of a firm’s equity is the dividend
capitalization model, which approximates the future dividends that capitalize to
the current market price. If the company is not paying dividends, these can be
estimated by comparing its average net income and cash flow with a similar-size
firm. The dividend capitalization model is also called the Gordon model (Gordon
1962):
$$COST\_OF\_EQUITY = \frac{NEXT\_YEARS\_DVDS}{MARKET\_VALUE} + DVD\_GROWTH\_RATE. \qquad (14)$$
The Gordon model itself is primarily used as a stock valuation method, but it can
also be used to assess the cost of equity from the dividend trend. The model
should only be used with mature firms with low growth rates, because it assumes
a constant growth rate in perpetuity.
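A minimal sketch of equation (14); the dividend, market value and growth figures used in the example are illustrative assumptions.

```python
def cost_of_equity_gordon(next_years_dividend, market_value, dividend_growth_rate):
    """Cost of equity from the Gordon model, equation (14): next year's
    dividend yield plus the assumed perpetual dividend growth rate."""
    return next_years_dividend / market_value + dividend_growth_rate

# Illustrative: $2 expected dividend, $40 share price, 3 % growth.
r_e = cost_of_equity_gordon(2.0, 40.0, 0.03)  # 0.05 + 0.03 = 0.08
```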
Another model for calculating the cost of equity is the Capital Asset Pricing
Model (CAPM), which describes the relation between expected return and risk.
The model begins with the time value of money in the form of the risk-free rate
and continues by taking into account the asset’s sensitivity to market risk. The
risk premium is the difference between the market return and the risk-free rate,
and this premium is multiplied by the sensitivity coefficient beta, β, which yields
the return above the risk-free rate. In other words, beta is the asset’s volatility in
relation to the rest of the market. In theory, the market portfolio includes all the
assets in the economy in proportion to their size, but in practice the S&P 500
index has often been taken as the market portfolio with a beta of 1. Conveniently,
news agencies such as Reuters and Bloomberg offer betas for many of the listed
stocks. The model is defined as

$$r = r_f + \beta\,(r_m - r_f), \qquad (15)$$

where $r_f$ is the risk-free rate and $r_m$ the expected market return.
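The CAPM calculation can be sketched as follows; the risk-free rate, beta and market return in the example are hypothetical values chosen only for illustration.

```python
def capm_expected_return(risk_free_rate, beta, market_return):
    """Expected return under the CAPM: the risk-free rate plus beta times
    the market risk premium (market return minus risk-free rate)."""
    return risk_free_rate + beta * (market_return - risk_free_rate)

# Illustrative: 4 % risk-free rate, beta 1.2, 10 % expected market return.
expected = capm_expected_return(0.04, 1.2, 0.10)  # 0.04 + 1.2 * 0.06 = 0.112
```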
3 Bankruptcy prediction models
The first objective in LDA is to identify a set of variables that has the strongest
discriminating ability (Sharma 1996). Those variables are called discriminator
variables. Group means $\bar{x}_i^{C_j}$, $i = 1,2,\ldots,r$, $j = 1,2,\ldots,q$, are calculated for each
variable $x_i$ in each group; the number of variables involved is denoted by $r$.
Categorization of observations is based on their “z-scores” given by the weighted
($w_i$) function, also called the discriminant function,

$$z = w_1 x_1 + w_2 x_2 + \ldots + w_r x_r. \qquad (16)$$
Two conditions must be satisfied to provide the maximum separation for z: the
group means of z should be as far apart as possible, and the values of z in each
group should be as homogeneous as possible. The two conditions are conjoined in
having a maximum between-group sum of squares and a minimum within-group
sum of squares. The second objective of LDA is to identify the z which provides
the maximum separation into the distinct categories. The third objective is to
classify future observations into each of the groups, respectively.
$$J(w) = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2}, \qquad (17)$$

$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2, \quad i = 1,2,\ldots,g. \qquad (18)$$
Maximizing Fisher’s criterion yields a closed-form solution. In order to find the
maximizing vector $w_{opt}$, we calculate the first derivative of the criterion and
solve the equation $\dot{J}(w) = 0$. The criterion needs to be rewritten to solve the
equation (Sharma 1996). It can be written as

$$J(w) = \frac{w^T B w}{w^T W w}, \qquad (19)$$

where $B$ and $W$ are the between-group and within-group sums of squares and
cross products, so that

$$SSCP = X^T X = B + W. \qquad (20)$$

Solving yields

$$w = S_W^{-1}(\bar{x}_1 - \bar{x}_2). \qquad (21)$$
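The closed-form solution above can be sketched numerically. The two synthetic groups below are illustrative only (they are not the thesis data), and the sketch computes the weight vector as the within-group scatter inverse applied to the difference of group means.

```python
import numpy as np

def fisher_lda(X1, X2):
    """Fisher's linear discriminant for two groups, cf. equations (17)-(21):
    w is computed as S_W^{-1} (m1 - m2), with S_W the pooled within-group
    scatter matrix."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-group scatter: sum of per-group scatter matrices.
    S_W = (np.cov(X1, rowvar=False) * (len(X1) - 1)
           + np.cov(X2, rowvar=False) * (len(X2) - 1))
    return np.linalg.solve(S_W, m1 - m2)

# Two well-separated synthetic groups of "cheap" and "expensive" ratios.
rng = np.random.default_rng(0)
X1 = rng.normal([0.0, 0.0], 0.5, size=(50, 2))
X2 = rng.normal([3.0, 3.0], 0.5, size=(50, 2))
w = fisher_lda(X1, X2)
z1, z2 = X1 @ w, X2 @ w  # z-scores of the two groups
```

A midpoint between the projected group means then serves as the classification threshold.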
Discriminant analysis has three assumptions that the data should meet:
multivariate normality, equality of the covariance matrices, and independence of
observations (Sharma 1996). In order to achieve statistically significant results,
the discriminator variables should come from a multivariate normal distribution.
In theory, classification results are also affected if the assumption is violated.
Unfortunately, there is no clear-cut answer to how much the variables can deviate
from normality. Studies have shown that even if the overall classification rate is
not affected, some groups might enjoy overestimation or suffer underestimation
(Lachenbruch et al. 1973). Violation of the assumption of equal covariance
matrices also affects the significance tests and the classification results. The
degree to which they are affected depends on the group sizes and the number of
discriminator variables (Marks & Dunn 1974). In cases of unequal group sizes,
and when the number of discriminator variables is large, the null hypothesis of
equal group means is rejected too often. The equality of the covariance matrices
can be tested with Box’s M statistic, which, in turn, can be approximated as an
F-statistic. Discriminant analysis is quite robust to these two assumptions, but it is
beneficial to know the possible effects of violating them. The final assumption,
independence of observations, is less discussed, but it too has a substantial effect
on the power and on the significance level. The assumption is often violated when
delicate procedures are used to form samples, causing correlation among the
observations. One can use stringent alpha levels if the observations are assumed
not to be independent of each other.
In 1968, Edward Altman conducted the pioneering research on discriminant
analysis with financial ratios. He assessed the quality of ratio analysis as an
analytical technique, using the prediction of corporate failure as an illustrative
case. He gathered sixty-six firms to fit a model that best categorizes between
bankrupt and non-bankrupt firms. The firms were from the US manufacturing
sector from the years 1946-1965, and their mean asset size was $6.4 million. He
used five financial ratios as explanatory variables. Altman was able to achieve a
94 % classification rate with the initial sample. He tested the model with several
secondary samples, which validated his results. The model could predict
bankruptcy two years prior to the actual failure.
According to a comprehensive historical review (Adnan & Dar 2006), about 30 %
of bankruptcy prediction research has been carried out with discriminant analysis.
The geometric mean of the prediction rates is 85 % among the 25 past studies
gathered in the review, and DA ranked number one in the bankruptcy prediction
field.
The two most frequently used methods for deriving the variable profile in LDA
are the simultaneous method and the stepwise method (Laitinen & Laitinen 1998).
The simultaneous method is direct: the discriminant analysis is executed with an
ex-ante defined variable profile. The stepwise method, on the contrary, uses
forward selection, backward elimination, or stepwise selection. Forward selection
begins with no variables in the model; at each step, the variable that contributes
the most to the classifying ability, measured by the coefficient of determination
$R^2$, is added if it meets the criterion to enter. Once in the profile, a variable stays
there. Backward elimination begins with the full variable profile, and variables
are eliminated one by one if they do not contribute significantly to the degree of
discrimination, $R^2$. The stepwise procedure is a compromise between the two
other procedures. It starts out empty, and the order of entry for the variables
included in the model is based solely on statistical criteria (Tabachnick & Fidell
2000). Variables can also be eliminated from the profile if they are no longer
found significant. The interpretation of the variables is not important in any of
these methods, which might not accord with reality, as some variables are more
meaningful than others. These methods are useful for testing explanatory
variables, especially when there are many of them, but the researcher must be
cautious not to unknowingly discard important variables.
Logit models provide results that are easy to interpret because they are based on
probabilities. Each $x_i$ represents a certain financial ratio, and the ratios are
weighted (coefficients $a$ and $b_i$) according to past data, usually by the method of
maximum likelihood. In other words, logistic regression determines whether each
explanatory variable has a predictive relationship with the dichotomous
dependent variable. Logit models are analogous to multiple linear regression
methods when the dependent variable is binary. The model in its basic form is

$$\ln\frac{p}{1-p} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \ldots \qquad (22)$$

The main advantage over discriminant analysis is that the logistic regression
model does not require the variables to be normally distributed or the samples to
have equal covariance matrices. The probability of the delayed payment, $p$, is
solved from equation 22 as

$$p = \frac{1}{1 + e^{-(a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \ldots)}}. \qquad (23)$$
The logistic curve illustrates the relation between the probability, p, and the
independent variable x (Figure 2). The logistic function “normalizes” all values
non-linearly to the probability scale from 0 to 1.

Figure 2. The logistic curve: probability p as a function of the independent
variable x.
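Equation (23) can be sketched directly; the intercept and weights in the example are arbitrary illustrative values, not fitted coefficients.

```python
import math

def logit_probability(a, b, x):
    """Failure probability from the logit model, equation (23).

    a is the intercept, b a list of ratio weights, x the ratio values.
    """
    z = a + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))

# At z = 0 the logistic function returns exactly 0.5; large positive z
# pushes the probability toward 1.
p_mid = logit_probability(0.0, [1.0], [0.0])   # 0.5
p_high = logit_probability(0.0, [1.0], [10.0])  # close to 1
```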
According to the comprehensive historical review (Adnan & Dar 2006), logit
models account for about 21 % of the bankruptcy prediction research methods.
The average predictive accuracy among 19 past logit studies is 87 %. Logit
models can thus be seen as simple and competitive methods for bankruptcy
prediction.
3.2 Artificial intelligent expert system models (AIESM)
$$F(z) = \frac{1}{1 + e^{-aZ}}, \qquad (24)$$
in which $a$ determines the steepness of the slope and $Z$ is the weighted
summation score. If the activation function were to mimic a real neuron, it would
give a binary output, but for many practical reasons a smooth function is used.
The output is usually centered to small values around zero. The most often-used
activation functions are the threshold, the sigmoid and the hyperbolic tangent; the
choice depends on the characteristics and the range of the desired outputs.

Neural networks offer strong results for corporate failure prediction. However,
neural network models neither explain how they arrived at a classification, nor do
they give a likelihood of possible failure. Training a neural network is time
consuming, and finding the most suitable model may be difficult because there
are many different models to pick from. A neuron is depicted below (Figure 3).
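A single neuron of this kind, a weighted summation followed by the sigmoid activation of equation (24), can be sketched as below; the weights, bias and inputs are illustrative.

```python
import math

def neuron_output(inputs, weights, bias, a=1.0):
    """One artificial neuron: the weighted summation score Z is passed
    through the sigmoid activation of equation (24), whose steepness is
    controlled by the slope parameter a."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-a * z))

# A zero summation score yields the midpoint output 0.5; a steeper slope
# pushes positive scores closer to the binary output 1.
mid = neuron_output([1.0, 1.0], [0.5, -0.5], 0.0)
```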
According to the historical review (Adnan & Dar 2006), past studies of neural
networks have had an average prediction rate of 87 %, and about 9 % of the
bankruptcy prediction studies have been neural network studies. Artificially
intelligent models cover about a fourth of the bankruptcy prediction research.

Theoretical models concentrate on qualitative causes of failure, and they are
drawn from information that could satisfy the theoretical argument. Theoretical
models are also multivariate in nature, and they usually employ a statistical
technique to support the qualitative theoretical argument. Four types of
theoretical models are reviewed below.
Balance sheet decomposition measures the changes in the balance sheet and relies
on the assumption that companies try to maintain equilibrium in their assets.
Heavy changes may be signs of financial distress in the future. Decomposition
measures can include current assets as a fraction of total assets, current liabilities
as a fraction of total assets, long-term liabilities as a fraction of total assets, and so
on. Booth (1983) produced empirical evidence that failed and non-failed
companies have distinct characteristics in the composition of their balance sheets,
even though his model was unable to successfully classify non-failed companies.
Yet balance sheet decomposition is a useful tool for assessing a company’s
financial condition.
In gambler’s ruin theory, firms are seen as gamblers betting constantly with some
probability of loss (Adnan & Dar 2006). Ultimately the game ends as the firm
fails. Coin flipping is a good illustration of gambler’s ruin, because the player
who starts with more coins is more likely to win, even with equal odds.
Cash management theory concerns the short-term management of corporate cash
balances; a persistent imbalance between cash inflows and outflows may lead to
financial distress and insolvency. Cash management theory models minimize the
costs of cash management, optimize the capital structure and maximize the
present value of net cash flows (Zapranis & Ginoglou 2000). Cash management
would be much simpler in an optimal business world without lags in payments. It
is needed for preparing for unexpected costs, debt service, inventory fills, reserve
cash for varying revenue, and so on. The variables used in cash management
models vary but can include, for example, the elasticity of the cash balance with
respect to the volume of transactions or with respect to the opportunity cost rate.
Empirical studies support the idea that cash management behaviour changes
notably prior to times of financial distress (Zapranis & Ginoglou 2000).
Credit risk theories are mostly aimed at firms that borrow money. Financial
institutions have created a number of models for measuring credit risk, and the
models are based on the international bank regulatory framework (Basel) as well
as on corporate finance theories. Two influential benchmark models (Gordy
2000) are J.P. Morgan’s CreditMetrics and Credit Suisse Financial Products’
CreditRisk+. Specifically, these models measure the portfolio value-at-risk for
credit risk. In a comparison of the two, the models were claimed to have no
serious differences, and they are more suitable for comparing the relative risk
levels of two portfolios than for producing absolute levels of risk (Gordy 2000).
4 Empirical analysis
The economic sector used in this research is the broad industrial sector, as
Bloomberg defines it. The industrial sector consists of the following sub-sectors:
aerospace/defence, building materials, electrical components and equipment,
electronics, engineering and construction, environmental control, hand and
machine tools, construction and mining machinery, diversified machinery, metal
fabricates and hardware, miscellaneous manufacturing, packaging and containers,
shipbuilding, transportation, and trucking and leasing. The industrial sector is
capital intensive, as companies need plants for production, and the sector seems
rather stable. On the other hand, the industrial sector suffers from raw material
and energy price hikes and from increased competition from emerging
economies, in which the cost of labour is lower. The industrial sector also
includes cyclical industries, such as construction, but compared to the other
sectors in Bloomberg’s partition, the industrial sector is considered the best for
this research. The industrial sector included 970 listed companies with a market
capitalization above $1 million at the end of 2006.
The other sectors in Bloomberg’s partitioning are basic materials,
communication, cyclical consumer products, non-cyclical consumer products,
diversified, energy, financial, technology and utilities. A second sector is
included to examine the inter-sector predictive capability of the discriminant
function, as validation of the predictive capability is an essential part of prediction
model formation. The reference sector is non-cyclical consumer goods, as it was
thought to be the sector most similar to the industrial sector in Bloomberg’s
partitioning. The non-cyclical consumer sector includes agriculture, beverages,
biotechnology, commercial services, cosmetics and personal care, food,
healthcare products, healthcare services, household products and wares, and
pharmaceuticals. Non-cyclical consumer companies are called defensive, and
they have a tendency to outperform the market in times of recession because they
produce products that consumers cannot do without. At the end of 2006, there
were 446 non-cyclical companies with a market capitalization above $1 million.
The first decision was to use annual historical data instead of quarterly data,
because annual data seemed to have less variance. Companies knowingly adjust
their quarterly results even though the underlying business stays the same, and
management prettifies results, for example, to please shareholders. Results can be
adjusted in the short run, but there is less room for adjustment in the annual
results, in which “everything comes together”. That makes annual results clearer
than quarterly data, and they are used in this Thesis. Annual data was also more
readily available than quarterly data.

After some examination, it appeared that Bloomberg has history data from the
beginning of the 1990s. There is not much financial ratio data available for the
1980s, although price history may reach much further back. Therefore, the
history data used to form the samples begins in 1989 and continues until the
latest full year, 2006.
4.1 Valuation phase
All the companies in the two selected sectors, industrial and non-cyclical, are
valued with the dividend discount model. The discount rates are calculated from
the Fed Funds rate, as it can be seen as the risk-free return on investment. The
Fed Funds rate is used in order to discount dividends and market prices in a
realistic manner. The goal of the Thesis is to value stocks on a relative basis,
which is why the general level of the discount rate is not of greatest interest; what
is important is that the discount rate is the same for all the observations. The
annual Fed Funds rates for the years 1989-2006 are shown in Table 1.
The valuation of a stock is the ratio between the price at time t and the sum of the
discounted dividends from the years t to t+7 and the discounted sale price from
year t+7:

$$VALUATION = \frac{P_0}{\displaystyle\sum_{t=1}^{7}\frac{Div_t}{(1+r_t)} + \frac{P_7}{(1+r_7)}}. \qquad (24)$$
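The valuation ratio can be sketched as below; the price, dividend and discount-factor inputs in the example are hypothetical, and the discount factors are assumed to be the cumulative factors of the kind shown in Table 2.

```python
def valuation_ratio(price_now, dividends, sale_price, discount_factors):
    """Valuation ratio of equation (24): the current price divided by the
    present value of the seven dividends plus the discounted sale price.

    dividends and discount_factors cover the seven forecast years; the
    last factor also discounts the sale price.
    """
    present_value = sum(d * f for d, f in zip(dividends, discount_factors))
    present_value += sale_price * discount_factors[-1]
    return price_now / present_value

# Degenerate illustration with zero rates (all factors 1.0): a $10 stock
# paying $1 for seven years and sold at $10 has PV 17, so the ratio is 10/17.
ratio = valuation_ratio(10.0, [1.0] * 7, 10.0, [1.0] * 7)
```

A ratio below one marks the stock as cheap relative to its discounted cash flows, a ratio above one as expensive.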
As the valuation uses a 7-year dividend discount model, the latest possible
valuation point in time is the year 1999, with data reaching to 2006. This is the
major limitation of the analysis. The history data starts from the year 1989, but
the valuation is started two years later, in 1991. The aim is to avoid the mispricing
of stocks in the first few years after a stock exchange listing. When a fresh new
company is about to gather capital by listing itself on a stock exchange, it arouses
plenty of prejudice and enthusiasm about its value. Companies are usually heavily
mispriced at the moment of listing but, sooner or later, the markets tend to fix the
price at the correct level. In accordance with the two-year existence restriction,
the valuation is executed for the years 1991-1999.
Now that there are nine separate periods for the valuation, nine separate sets of
discount factors need to be formed from the annual Fed Funds spot rates. The
required discount factors can be calculated by rolling over an investment each
year at the spot rate:

$$r_t^{*} = \prod_{t=t^{*}}^{t^{*}+7}\frac{1}{1 + Fed\_Funds\_rate_t/100}. \qquad (25)$$
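The rolling-over computation of equation (25) can be sketched as follows; the flat 5 % rate used in the example is illustrative, whereas in the Thesis the factors come from the actual Fed Funds rates of Table 1.

```python
def discount_factors(spot_rates_percent):
    """Cumulative discount factors obtained by rolling over a one-year
    investment at each annual spot rate, as in equation (25). The first
    factor (year t) is 1.0, cf. Table 2."""
    factors = [1.0]
    for rate in spot_rates_percent:
        factors.append(factors[-1] / (1.0 + rate / 100.0))
    return factors

# Flat 5 % spot rates give factors 1, 1/1.05, 1/1.05^2, ...
f = discount_factors([5.0, 5.0, 5.0])
```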
As can be seen in Table 2, the discount factors derived from the short rates vary
somewhat because of the sliding “current time”; in other words, different sets of
spot rates are used to calculate them.
Table 2 Discount factors.
Period t t+1 t+2 t+3 t+4 t+5 t+6 t+7
1989-1996 1,000 0,927 0,881 0,851 0,827 0,791 0,748 0,710
1990-1997 1,000 0,950 0,919 0,892 0,853 0,807 0,766 0,726
1991-1998 1,000 0,967 0,935 0,907 0,868 0,821 0,780 0,739
1992-1999 1,000 0,971 0,929 0,878 0,834 0,791 0,751 0,715
1993-2000 1,000 0,957 0,904 0,859 0,814 0,774 0,736 0,692
1994-2001 1,000 0,945 0,898 0,851 0,809 0,769 0,723 0,700
1995-2002 1,000 0,950 0,901 0,856 0,814 0,765 0,740 0,728
1996-2003 1,000 0,948 0,901 0,857 0,805 0,779 0,767 0,759
1997-2004 1,000 0,950 0,904 0,850 0,822 0,809 0,800 0,788
1998-2005 1,000 0,951 0,894 0,865 0,851 0,842 0,829 0,801
1997-2006 1,000 0,950 0,904 0,850 0,822 0,809 0,800 0,788
Companies with a market capitalization below $300 million or above $3 billion at
the moment of examination are excluded from the samples. The remaining
companies can therefore be called mid-caps, although the definition of a mid-cap
varies greatly. The mid-cap restriction is not very strict, but it excludes very
unstable start-up companies and, on the other hand, giant businesses that are
powerful and exceptional in numerous ways.
The nine-year period is divided into a basic sample, 1991-1996, and a hold-out
sample, 1997-1999. The idea of the two samples is that the basic sample is used
for setting up the coefficients of the discriminant function (in-sample prediction),
and the hold-out sample is used for testing the predictive capability of the
discriminant function (out-of-sample prediction). Yet another out-of-sample
prediction is made for the external sector of non-cyclical consumer goods for the
years 1991-1996. Researchers constantly argue whether to use in-sample
prediction or out-of-sample prediction (Inoue & Kilian 2002). The conventional
wisdom is that in-sample prediction has the weakness of model overfitting,
against which out-of-sample prediction protects. The overfitting with in-sample
prediction stems from using (at least partially) the same data to fit the model and
to assess its predictive capability.
4.3 Selection procedure of companies
The industrial sector has 290 companies with full price history starting from
1989. The procedure is to start from the year 1991 and select the nine smallest
valuations, meaning the nine cheapest companies. Each observation is required to
have full ratio data and a market capitalization between $300 million and $3
billion. The chosen companies are then excluded from the list, and the same
selection is carried out for the nine most expensive companies, respectively. This
is repeated for each of the six years, and the basic sample then consists of 108
companies, 54 cheap and 54 expensive ones. The reason to start from the furthest
year is that there are more companies with full history data the closer one gets to
the present day. That leaves a bigger sample for the hold-out period, in which
companies have to have a complete price history two years before the start, that
is, from 1995.
The hold-out sample is formed in a similar manner, starting from the year 1997
with the nine cheapest companies. The hold-out period includes 300 additional
companies under the two-year existence rule compared to the basic sample.
However, the number of companies with full ratio data under the restrictions
mentioned earlier is rather limited. The nine cheapest and nine most expensive
companies are chosen for each year, starting from 1997. Consequently, the
hold-out sample consists of 54 companies, 27 cheap and 27 expensive ones.
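The yearly selection rule described above can be sketched as follows. The data layout (a mapping from year to company valuation ratios) and the company names in the example are hypothetical; the sketch only illustrates the rule that a company is picked at most once, the n cheapest first and the n most expensive thereafter.

```python
def select_extremes(valuations_by_year, n):
    """For each year in order, pick the n smallest (cheapest) and n largest
    (most expensive) valuation ratios; already chosen companies are
    excluded from later years."""
    cheap, expensive, used = [], [], set()
    for year in sorted(valuations_by_year):
        ranked = sorted((v, c) for c, v in valuations_by_year[year].items()
                        if c not in used)
        picked_cheap = [c for _, c in ranked[:n]]
        used.update(picked_cheap)
        ranked = [(v, c) for v, c in ranked if c not in used]
        picked_expensive = [c for _, c in ranked[-n:]]
        used.update(picked_expensive)
        cheap += picked_cheap
        expensive += picked_expensive
    return cheap, expensive

# Hypothetical valuation ratios for two years.
data = {
    1991: {'A': 0.5, 'B': 0.9, 'C': 1.5, 'D': 2.0},
    1992: {'A': 0.4, 'B': 1.0, 'C': 1.2, 'E': 1.8},
}
cheap, expensive = select_extremes(data, 1)
```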
The extra sector of non-cyclical consumer products is treated in the exact same
fashion. The list of potential companies is smaller than the list of industrial
companies, with a total of 130 companies having full price history from 1989
until 2006. This time, the five cheapest and five most expensive companies are
chosen each year, which results in a sample of 60 companies.
4.4 Approaches to sample forming
After many trials, the procedure of forming the samples in yearly order, matching
the same number of the cheapest and the most expensive companies, seems
practical. The procedure balances all the years with the same number of
observations in both categories. An alternative procedure would be to simply
choose the cheapest companies without paying attention to the year in which they
occur, and match these with the most expensive ones, respectively. Then,
however, the discriminant function might concentrate on some years more than
others, and its universal applicability might become biased. The paired-sample
design is a frequently used technique for forming samples. Beaver (1966) is a
well-recognized researcher in bankruptcy prediction, and he, among others, used
the paired-sample design in his analysis. First, Beaver chose the bankrupt
companies (cf. cheap companies) and then selected a pair (cf. an expensive
company) for each bankrupt company. The pairs needed to be from the same
sector, to have roughly equal market capitalization and to be from the same year.
Compared to Beaver’s paired-sample design, the market capitalization matching
has been left out here, as the market capitalization restriction of $300m-$3bn is
thought to ensure adequate homogeneity among the companies. Also, the
paired-sample design turned out to require much bigger initial samples, as its
sampling rules are so strict. In this case, the paired-sample design felt somewhat
like data mining, although this Thesis is about data analysis. To be clear, data
mining focuses on extracting useful information from large sets of data (Mannila
et al. 2001), and the crucial point is that data-mining applications are to some
degree self-guiding. Data analysis, for its part, does not aim at discovering
unforeseen patterns hidden in the data, but at fitting an existing model to the data,
or extracting parameters for a model in order to adapt it to reality.
4.5 Ratio profile
The ratio data gathered for each company in the two sectors includes the price to
earnings ratio (PE), gearing ratio (GEA), price to book value ratio (PB), free cash
flow per share yield (FCPY), cash flow per share yield (CPY), return on capital
employed (ROCE), earnings before interest and taxes margin (EBITM) and price
to sales ratio (PS). The dividend yield is excluded from the profile because a
company that pays no dividends introduces a systematic error in the form of a
zero or empty value. All the variables must be continuous; for example,
companies with negative earnings must be excluded, because only positive PE
ratios are announced and all the rest are announced as zero or empty values,
which would make the variable discontinuous.
5 Results on predictive accuracy
Class-specific and overall mean values and variances of the basic sample are
presented in Table 3. Each of the eight ratios is inspected, as well as the market
capitalization of the companies. The variances are given as a percentage share of
the mean value, i.e., as the coefficient of variation. From now on, group I refers to
the cheap and group II to the expensive companies.

In general, the starting point with valuation ratios is that the higher the ratio, the
more expensive the stock. The price to sales ratio and the price to book ratio are
10 % and 35 % higher in group I, which conflicts with intuition. The average PE
ratio level of 31.908 is generally very high, but the massive variance of 306.8 %
in group I points to outliers, which in turn can explain the illogical order of
magnitude.
The group of cheap stocks is more heavily geared than the group of expensive
stocks but, again, there is a tremendous variance of 278.9 % in the group of cheap
stocks. A somewhat logical explanation for the higher gearing levels in the group
of cheap stocks could be that small, growing companies need to take on more
debt to be able to grow. The markets might price the group I stocks low because
of the risk stemming from the relatively high debt. The overall gearing ratio is a
moderate 54 %.

The cash flow ratios CPY and FCPY are stronger among the cheap stocks, but
they are the most severely affected by variance. The FCPY ratio in the group of
expensive stocks has a record variance of 2017.9 %, which is not tolerable for any
multivariate analysis.
The second highest correlation, 0.75, is between PB and PS. This correlation is
also a strong one, and it stems from the basics of corporate finance: sales is the
input for the balance sheet, with costs subtracted, of course. Also, the companies
in the sample must have quite similar EBITM, because correlated amounts of
money are spent and brought to the balance sheet. The lowest value in the overall
coefficients of variation (Table 3) supports the conclusion of similar EBITM
between the groups. The third highest correlation is between EBITM and PS,
which complements the aforementioned computational relationship. The other
correlations between the variables are not significant. The highest negative
correlation, -0.26, is between PS and GEA, and it suggests that the level of net
debt increases as sales increase or the market price decreases.
As is the case with most multivariate techniques, the significance of statistical
tests in discriminant analysis requires certain assumptions to be fulfilled.
Violating the assumptions can influence the significance and the power of the
statistical tests. The first assumption is that the data comes from a multivariate
normal distribution. An overall glance at the mean values and variances of the
basic sample suggests that the variables are not normally distributed.
At first, the individual variable dispersions are examined graphically. The values
of the variables are plotted as the inverse of the standard normal cumulative
distribution versus the ordered observations. First, the data is arranged from
smallest to largest, and then the cumulative percentiles are determined. The
z-scores are then calculated from the standard normal distribution, and each
z-value is plotted against the corresponding data value. If the data is normally
distributed, the plotted points form a straight line, and stragglers at either end
indicate outliers. The variables are examined by groups. Inspection of the
probability plots revealed that in most of the cases there are just a couple of
sky-high outliers. The assumption of variable normality is rejected for all the
variable groups because of those extreme values. The groups closest to normality
are GEA and CPY in group II, ROCE in group I and EBITM in both groups.
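The construction of the probability plot can be sketched with the standard library as follows; the small data set in the example is illustrative.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Points for a normal probability plot: each ordered observation is
    paired with the standard-normal z-score of its cumulative percentile
    (j - 0.5) / n. A straight line of points suggests normality, and
    stragglers at either end indicate outliers."""
    ordered = sorted(data)
    n = len(ordered)
    z_scores = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(z_scores, ordered))

# For three observations the percentiles are 1/6, 3/6 and 5/6, so the
# middle point sits at z = 0 and the outer z-values are symmetric.
pts = normal_plot_points([3, 1, 2])
```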
The most famous, and seemingly the most powerful, individual test for normality
is the Shapiro-Wilk test. The Shapiro-Wilk test is performed for both groups with
all 8 variables. The test values, probability levels and decisions at the significance
level of 0.05 are given in Table 5. Normality is rejected for all the variables in
both groups at the significance level of 5 %. Only 6 out of 16 have a probability
level above zero at six decimals. The same six variable groups were concluded to
be close to normally distributed in the “eye-balling” earlier. CPY in the group of
expensive stocks is the closest to normal, with a probability level of 0.006998.
The assumption of multivariate normality is necessary for the significance tests of
the explanatory variables and for the discriminant function itself. The degree to
which the assumption can be violated cannot be specified scientifically, but
earlier research has shown that although the overall classification rate remains
unaffected, some groups might suffer from underestimation and some from
overestimation (Lachenbruch et al. 1973). As stated in the discriminant analysis
help pages of NCSS (a statistical software package), “a sample size of at least
twenty observations in the smallest group is usually adequate to ensure
robustness of any inferential tests that may be made”. The phrase refers to the
central limit theorem, which suggests that a sum of variables is more likely to be
normally distributed as the number of observations increases, regardless of the
distributions they come from. The central limit theorem suggests that, in the end,
everything is normally distributed, even if the variables are discontinuous. The
multivariate normality assumption is not very strict, and researchers have not
paid much attention to it in earlier research. Unfortunately, there are very few
tests for the examination of multivariate normality, and in this regard, graphical
examination is employed (Johnson & Wichern 1987).
The multivariate normality is checked with Q-Q plot which is formed in the
following way. First, Mahalanobis distances are calculated for each of the
companies. Mahalanobis distance is a statistical distance from the sample
centroid. The distances are calculated with equation
MD²rs = (xr − xs)′ V⁻¹ (xr − xs).                                        (26)

The matrix V in the middle is the covariance matrix of x, defined as V = cov(x).
Second, the distances are sorted in order of magnitude and a percentile is
calculated for each distance according to its rank j with the equation
(j − 0.5)/n, where n is the total number of observations. Third, Chi-square
quantiles are calculated for the percentiles. It has been shown that when the
sample size is sufficiently large (25 or more) and the parent population is
normal, the Mahalanobis distances behave like a Chi-square random variable
(Gnanadesikan 1977).
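The three steps above can be sketched in code. This is a minimal sketch on synthetic data, with equation (26) taken from the sample centroid (xs replaced by the mean vector); the sample size 108 and eight variables mirror the basic sample, but the values themselves are assumptions.

```python
# Chi-square Q-Q check for multivariate normality: squared Mahalanobis
# distances from the centroid, sorted and matched against chi-square
# quantiles at the percentiles (j - 0.5)/n, then a Filliben-style
# correlation as a linearity measure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.normal(size=(108, 8))                  # n companies x p ratios

mean = X.mean(axis=0)
V_inv = np.linalg.inv(np.cov(X, rowvar=False))
diffs = X - mean
md2 = np.einsum("ij,jk,ik->i", diffs, V_inv, diffs)   # squared distances

md2_sorted = np.sort(md2)
n, p = X.shape
percentiles = (np.arange(1, n + 1) - 0.5) / n
chi2_q = stats.chi2.ppf(percentiles, df=p)

# Correlation between ordered distances and theoretical quantiles,
# to be compared against Filliben's critical values.
r = np.corrcoef(md2_sorted, chi2_q)[0, 1]
print(f"Q-Q correlation: {r:.3f}")             # near 1.0 for multinormal data
```

For truly multinormal data the correlation lands near 1.0; the 0.740 reported below for the basic sample is far from that.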
The Chi-square values are plotted against the Mahalanobis distances in Figure 4. A
curve is also fitted to the points by least squares to examine the linearity of the
plot. The plot appears not to be linear, at least because of the extreme
observations on the right side. The majority of the observations are concentrated
at the small end, where the slope is rather steep, and the plot is also skewed to
the left. The normality assumption is rejected after graphical examination of the
plot. Obviously, a test based on a Q-Q plot is subjective, because the researcher
determines the linearity visually. A more analytical way of assessing the linearity
of the plot is to compare its correlation coefficient to critical values. The
critical values give the percent points of the cumulative sampling distribution of
the correlation between sample values and theoretical quantiles, obtained
empirically by Filliben (1975). The correlation of the plot is 0.740; compared to
the critical value of 0.987 for the alpha level of 0.05 with 100 observations
(closest to 108) (Sharma 1996:446), the plot is not even close to linear. Although
the critical values are computed for a univariate distribution, they can be used
as a benchmark.
Since the significance levels are below 0.05 for all the variables except PB and
EBITM, those variables are assumed to have significantly unequal variances between
the groups. PB (0.221) and EBITM (0.370) have equal variances at a statistically
significant level, so they pass the test. The Box's M test is significant at any
significance level, with a probability of 0.000, indicating that the equality of
covariance matrices is not fulfilled. Tests for the covariance matrices and the
variances are affected by the non-normality of the variables.
Classification results were weak for the basic sample, with a total classification
rate of 56 % (Table 7). The classification is practically at the level of a random
guess, that being 50 %. There are 25 cheap companies classified as expensive
(type I error) and 22 expensive companies classified as cheap (type II error). The
error rates are 46 % and 41 % for types I and II. As discussed earlier, the
assumptions of multivariate normality and equality of variances were severely
violated, which is sure to weaken the classification results.
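The classification step and the two error rates above can be sketched as follows. This uses sklearn's `LinearDiscriminantAnalysis` in place of the NCSS routine, and weakly separated synthetic groups in place of the real ratio data, so the printed rates are illustrative only.

```python
# Dichotomous LDA classification with type I / type II error rates,
# in the spirit of Table 7 (cheap = group I, expensive = group II).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
cheap = rng.normal(0.0, 1.0, size=(54, 8))
expensive = rng.normal(0.5, 1.0, size=(54, 8))   # weakly separated groups
X = np.vstack([cheap, expensive])
y = np.array([0] * 54 + [1] * 54)                # 0 = cheap, 1 = expensive

lda = LinearDiscriminantAnalysis().fit(X, y)
pred = lda.predict(X)

type1 = np.mean(pred[y == 0] == 1)   # cheap classified as expensive
type2 = np.mean(pred[y == 1] == 0)   # expensive classified as cheap
total = np.mean(pred == y)
print(f"total rate {total:.0%}, type I {type1:.0%}, type II {type2:.0%}")
```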
two groups is accounted for by the discriminating variables, which appears
quite unimpressive.
Table 8 The significance of the discriminant function of the basic sample of 108 companies.
Canon   Canon             Numer  Denom  Prob    Wilks'
Corr    Corr2    F-Value  DF     DF     Level   Lambda
0.3621  0.1311   1.9      8      99     0.0737  0.868901
There is no analytical way of defining how high is "high" for practical
significance, but the measure is similar to R-squared in multiple regression and
is used to determine whether the strength of the relationship is strong on a
relative basis.
Frank et al. (1965) studied the biases in discriminant analysis and suggested the
use of split-sample validation. The sample is randomly divided into five distinct
subsets, each consisting of 14 cheap stocks and 14 expensive stocks. The
prediction is performed for the remaining 80 companies based on the
coefficients acquired from the sub-sample of 28 companies. All five
prediction accuracies (Table 9) are in line with the original classification rate
of 56 %, the final sub-sample even outperforming the original rate.
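The split-sample scheme above can be sketched as follows: each sub-sample of 28 companies (14 per group) trains a discriminant function, which then predicts the remaining 80. Synthetic data stand in for the real ratios, and random draws stand in for the thesis's exact subset assignment.

```python
# Split-sample validation sketch: train on 28 companies (14 cheap + 14
# expensive), predict the remaining 80, repeat five times.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (54, 8)), rng.normal(0.5, 1, (54, 8))])
y = np.array([0] * 54 + [1] * 54)          # 108 companies, as in the thesis

for k in range(5):
    c = rng.choice(54, 14, replace=False)          # 14 cheap indices
    e = rng.choice(54, 14, replace=False) + 54     # 14 expensive indices
    train = np.concatenate([c, e])
    holdout = np.setdiff1d(np.arange(108), train)  # the remaining 80
    lda = LinearDiscriminantAnalysis().fit(X[train], y[train])
    print(f"subset {k + 1}: prediction rate {lda.score(X[holdout], y[holdout]):.0%}")
```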
The t-values are based on a test of whether the proportion of correctly classified
cases in the sample differs significantly from the proportion that would be
obtained by chance. The intuitively clear boundary is 50 %, but the test takes
into account the number of observations in the prediction. According to Frank,
the t-value is biased in the direction of showing greater prediction rates than
there would be in the whole population, but the magnitude of the bias decreases
as the sample sizes become larger.
The test values are calculated with the equation

t = (α − P) / √(P(1 − P)/n),

where α is the observed proportion of correct classifications, P is the
theoretical likelihood of belonging to each group, which is 0.5 in the two-group
case, and n is the number of observations predicted (80).
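The chance-corrected t-value can be computed as below. The exact formula is not shown in the extracted text, so this is an assumed reconstruction of the standard test for a proportion against a chance rate P, consistent with the definitions of P and n given above.

```python
# t-value for an observed classification hit rate against the chance rate
# P, using the standard test for a proportion (assumed reconstruction).
import math

def proportion_t(correct, n, P=0.5):
    """t-value for observed hit rate vs. theoretical chance rate P."""
    p_hat = correct / n
    return (p_hat - P) / math.sqrt(P * (1 - P) / n)

# e.g. 45 of 80 hold-out companies classified correctly (about 56 %):
t = proportion_t(45, 80)
print(f"t = {t:.2f}")   # t = 1.12
```

Even a 56 % hit rate on 80 predictions yields a modest t-value, which matches the weak significance reported for the split-sample predictions.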
This time, the Wilks' Lambda test is used to test the statistical significance of
discriminant functions that consist of only one variable at a time, and also to
test the effect on significance of removing one variable at a time from the full
profile. The variable influence section gives clues about the most and least
discriminating variables, which is important especially when the ratio profile is
not fixed. If a stepwise procedure is used to try out different combinations of
variables, the variable influence section gives guidelines for the entry and
removal probabilities of the variables. The variable influence results are shown
in Table 10.
The "alone" significance tests reveal that only FCPY is statistically significant
(0.021637) at the significance level of 0.05. This suggests that the null
hypothesis of equal group centroids is rejected for FCPY. The second-best alone
F-probability, 0.143597, belongs to CPY, but there the null hypothesis remains in
force. The third most significant is PS, with a probability of 0.246839, but the
rest of the alone probabilities are much weaker, PE, PB and GEA being the weakest
discriminators.
The removed Wilks' Lambda is computed to test the impact of removing a variable
from the profile. The impact is statistically significant for FCPY and CPY at the
significance level of 0.05, which agrees with their strong alone F-probabilities.
EBITM is also close to being significant, with a probability of 0.082405. If
removed, PB and GEA have the weakest effect.
The last column in Table 10 gives the R-squared value that would be obtained if
the variable in question were regressed on all the other independent variables.
Values higher than 0.99 suggest severe multicollinearity among the variables, and
removal is advised for variables with such high R-squared values. The variables
that did well in the two previous tests are now the most correlated with the rest
of the variables, with values of 0.857286 and 0.859114. Although each variable is
regressed on all the others, the cash flow ratios can be concluded to cause most
of the correlation with each other, as their R-squared values are almost equal.
The values are not alarming, but the possibility of one variable being a linear
combination of the others has to be taken into account. ROCE is the most
independent variable and PE the second. The higher values for the cash flow
ratios can be explained by the similarities in the way the ratios are calculated;
in the end, all the variables are connected through the balance sheet.
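The multicollinearity screen in the last column of Table 10 amounts to regressing each ratio on the other seven and recording the R-squared. A minimal sketch on synthetic data, where one deliberately constructed linear dependence (standing in for the cash flow ratio overlap) shows up as a high R-squared:

```python
# Each-vs-others R-squared, as in the last column of Table 10. Values
# approaching 0.99 would flag a near-linear combination of variables.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(108, 8))
X[:, 4] = 0.9 * X[:, 3] + 0.1 * rng.normal(size=108)  # hypothetical overlap

for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])    # add intercept
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    print(f"variable {j}: R-squared vs others = {r2:.3f}")
```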
analysis. The cheap group has scores below zero and the expensive group above
zero.
It is interesting that the biggest absolute weight is on FCPY and the second
biggest on CPY, but they have opposite signs. The PS weight of 0.78 and the
EBITM weight of -0.59 are also relatively big, but the rest of the weights are
roughly equal. The correlation coefficients between the discriminant scores and
the discriminating variables help to interpret the relative contribution each
variable makes to the discrimination. The resulting matrix is called the
structure matrix.
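The structure matrix can be obtained as sketched below: the correlation of each discriminating variable with the discriminant scores. sklearn's LDA stands in for the NCSS routine and the data are synthetic, so the loadings are illustrative only.

```python
# Structure matrix sketch: correlate each variable with the (single)
# discriminant score of a two-group LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (54, 8)), rng.normal(0.5, 1, (54, 8))])
y = np.array([0] * 54 + [1] * 54)

scores = LinearDiscriminantAnalysis().fit(X, y).transform(X).ravel()
structure = [np.corrcoef(X[:, j], scores)[0, 1] for j in range(8)]
print(np.round(structure, 3))   # one loading per ratio
```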
The cash flow ratios are the two variables most correlated with the discriminant
scores, but their mutual relation is ambiguous because this time both
correlations have a negative sign. The FCPY correlation suggests that the cheap
stocks have a relatively strong FCPY ratio, because the cheaper the stock, the
more negative the discriminant score. A high FCPY ratio can easily be accepted
as a common characteristic of a cheap stock. According to the standardized
weights, the CPY ratio could be interpreted as a decelerator of the FCPY ratio,
but the interpretation becomes more complex once the negative correlation with
the discriminant scores is accounted for. A careful guess is that CPY decelerates
the FCPY ratio a little, but the effect remains positive for the cheap stocks.
The PS ratio shows its influence on the discriminant scores by taking third place
in both of the tables above. Its positive correlation with the discriminant
scores suggests high PS ratios for the expensive stocks, which is understandable,
as low sales incur a high PS ratio. Therefore, high sales figures (and thus low
PS ratios) can be accepted as a characteristic of cheap stocks. The conclusion is
consistent with the much-debated FCPY ratio as well, because a weak sales figure
is not likely to produce a strong FCPY.
The success of FCPY and CPY is repeated in the dichotomous classification test
for the basic sample (Table 13).
increased 21 percentage points, but the type II error of 44 % increased only 3
percentage points. From a practical point of view, the high type I error is
hazardous, because 67 % of the expensive stocks are thought to be cheap. The
explanation for the type I error could be the short statistical distance between
the groups combined with a discriminant function leaning towards group I.
Possible extreme points in the basic sample might also affect the classification
rate.
Yet another prediction is tried out on another sector in order to assess the
universal prediction capability. The sector is the non-cyclical consumer goods
sector in the U.S., and the sample is formed by a procedure similar to that of
the other samples. The non-cyclical sample consists of 60 companies, 30 cheap and
30 expensive, 10 for each year in 1991-1996. The prediction rate is what was
expected: 43 %, with error rates of 69 % and 73 %. Interpretation of rates below
the random rate of 50 % is rather pointless, but the prediction rate can be
compared to the internal classification rate of the non-cyclical sector. A
discriminant function is formed from the 60 companies in the non-cyclical sector
with the same 8 explanatory variables included. This quick formation results in
67 % discrimination between the groups, which cross-validates to 50 %. The
inter-sector prediction failed as badly as the prediction to the hold-out sample,
although the non-cyclical sector itself could be discriminated more accurately.
The estimated discriminant function coefficients for the basic sample are in
Table 15.
6 Sensitivity analysis
The predictive accuracy obtained from the unprocessed data was weak. The
assumptions of normality and equality of covariance matrices were severely
violated, and next the data will be processed to fulfill the assumptions, at
least approximately. This sensitivity chapter focuses on the classification and
prediction rates as the variables approach the normal distribution and as
different ratio profiles are tried. The procedure is more or less trial and error
because of the endless possibilities for executing the analysis.
As expected, the overall variances dropped dramatically for almost all the
variables (Table 16) as the outliers were excluded. FCPY, CPY and ROCE
experienced the heaviest drops in variance, of more than 230 percentage points,
with PE following at 190 percentage points. The lightest drop, of only 3.6
percentage points, was for EBITM.
6.1.1 Normality
Table 17 Shapiro-Wilk variable normality test for the trimmed sample.

Variable        Test Value  Prob Level  Decision (5 %)
PE Group I      0.789       0.000001    Reject Normality
PE Group II     0.924       0.005698    Reject Normality
GEA Group I     0.878       0.000208    Reject Normality
GEA Group II    0.984       0.796612    Accept Normality
PB Group I      0.862       0.000077    Reject Normality
PB Group II     0.932       0.011279    Reject Normality
FCPY Group I    0.875       0.000171    Reject Normality
FCPY Group II   0.961       0.132815    Accept Normality
CPY Group I     0.821       0.000007    Reject Normality
CPY Group II    0.976       0.478296    Accept Normality
ROCE Group I    0.988       0.915369    Accept Normality
ROCE Group II   0.943       0.028268    Reject Normality
EBITM Group I   0.837       0.000017    Reject Normality
EBITM Group II  0.930       0.009540    Reject Normality
PS Group I      0.824       0.000008    Reject Normality
PS Group II     0.755       0.000000    Reject Normality
The GEA ratio in group II and ROCE in group I achieved strong normality after the
exclusion of the outliers, with probability levels of 0.79 and 0.91. CPY and FCPY
in group II achieved probability levels of 0.47 and 0.13. Other ratios close to
being accepted are PE, PB and ROCE in group II. The PS ratio in group II is the
only one of them to remain at a probability level of zero. The variable normality
improved comfortably on a univariate basis.
The multivariate normality is again assessed with the Chi-square versus
Mahalanobis distance plot (Figure 5).
Figure 5 Chi-square plot for the trimmed sample.
Based on visual examination of the Chi-square plot, the multivariate normality
cannot be accepted. The plot could be seen as partly linear if it were cut in two
at around a Mahalanobis distance of 7; the first half is steeper than the second.
The values are concentrated at the small end. The correlation between the
Mahalanobis distances and the Chi-square values is 0.934, which is much higher
than the 0.740 for the basic sample. Yet the correlation is smaller than the
critical value of 0.985, and the plot is not linear. The normality assumption was
again rejected, but the plot improved in the sense of linearity.
range between 0.01 and 0.05 is considered to indicate marginally equal variances
between the groups, PE has marginally equal variances. PS is also close to having
marginally equal variances, with the value 0.009. FCPY remains at a probability
level of zero, which makes it the number one suspect for the rejection of the
Box's M test. The F approximation of the test was rejected at the significance
level of 0.05, but the value, 0.006, rose promisingly from zero. As discussed
earlier, the equality of the covariance matrices was found to influence the
type I error rate.
Table 18 Bartlett-Box tests for the equality of the covariance matrices for the trimmed sample.

          Bartlett                  F       F      Chi2    Chi2
Variable  Value     DF1  DF2       Approx  Prob   Approx  Prob
PE        4.0201    1    23232     3.980   0.046  3.970   0.046
GEA       0.6414    1    23232     0.630   0.426  0.630   0.426
PB        0.0112    1    23232     0.010   0.916  0.010   0.916
FCPY      13.4873   1    23232     13.340  0.000  13.330  0.000
CPY       2.9138    1    23232     2.880   0.090  2.880   0.090
ROCE      1.0398    1    23232     1.030   0.311  1.030   0.311
EBITM     0.0599    1    23232     0.060   0.808  0.060   0.808
PS        6.8076    1    23232     6.730   0.009  6.730   0.009
Box's M   67.6068   36   26057     1.700   0.006  61.160  0.006
A quick trial of the Box's M test without FCPY revealed that without it, the
equality of covariance matrices is accepted, with an F-probability of 0.055. With
the original basic sample, the coefficients of variation for FCPY were 2.918 for
group I and 20.179 for group II. In the trimmed sample, the coefficients were
1.431 and 2.873, respectively. The gap between the coefficients of variation
narrowed, but compared to group I, the variance is still twice as big in group
II. In FCPY's favor, it was the only variable to reject the null hypothesis of
equal group centroids when considered alone with the basic sample earlier. The
FCPY ratio is a strong but temperamental predictor.
exclusion of the outliers. The classification results improved from the level of
random guessing to the level of being indicative.
The exclusion of the extreme points from the basic sample has a moderate effect
on the classification rate. The improvement is understandable, because such
extreme values easily bias the discriminant function in one direction or another.
High within-group variance is very harmful in discriminant analysis; the
differences should be between the groups.
FCPY and CPY are significant predictors when used alone. They both achieved
statistically significant discriminant functions, with probabilities of 0.004 and
0.008 (Table 20). On the other hand, they are easier to regress on the other
variables than the rest, except for the PS ratio. The multicollinearity situation
has changed, because earlier the R-squared values were equal for the two cash
flow ratios. EBITM and GEA weakened as alone predictors, but the rest improved
somewhat. The rank 1 alone classifier, FCPY, is still the best alone predictor.
However, the significance of removing FCPY weakened from the previous 0.007 to
0.339. According to this test, ROCE now tops the list of ratios not to be removed
from the full ratio profile.
Table 20 Variable influence section for the trimmed sample of 90 companies.

          Removed   Removed  Removed   Alone     Alone    Alone     R-Squared
Variable  Lambda    F-Value  F-Prob    Lambda    F-Value  F-Prob    Other X's
PE        0.963179  3.1      0.082236  0.997806  0.19     0.66109   0.437305
GEA       0.987205  1.05     0.30859   0.999454  0.05     0.826927  0.380058
PB        0.986721  1.09     0.299554  0.985097  1.33     0.2517    0.432491
FCPY      0.988714  0.92     0.339121  0.911228  8.57     0.004343  0.635176
CPY       0.992533  0.61     0.437293  0.922106  7.43     0.007725  0.718753
ROCE      0.96662   2.8      0.098289  0.987987  1.07     0.303772  0.360985
EBITM     0.987048  1.06     0.305624  0.999992  0        0.979479  0.616409
PS        0.975639  2.02     0.15882   0.978474  1.94     0.167615  0.725707
Table 21 Prediction count table of the trimmed sample to the hold-out sample.

           Predicted
Actual     Group I  Group II  Total
Group I    19       8         27
Group II   13       14        27
Total      32       22        54    (61 % correct)
continues with both samples, the original basic sample (108 observations) and
the trimmed sample (90 observations), in order to test the variable
transformations and ratio profiles on them. The trimmed sample has a lead over
the basic sample because of the significant improvement in its classification and
prediction rates.
The impacts of three basic transformations are examined in this regard: the
square root, logarithmic, and inverse transformations. The square root
transformation is the most common and is best suited for counts. It can be
applied to distributions that look quite like the normal distribution but are
skewed, with a relatively large number of small values. As the square root cannot
be taken of negative values, a constant is added to make all the values positive.
Besides that, the square root is highly nonlinear for values between zero and
one, so the minimum should be set above that range. The logarithmic
transformation is commonly used for proportions, but with counts it can be used
similarly to the square root transformation when the distribution leans
exceedingly towards small values. The base of the logarithm changes the nature of
the transformation, and the natural base e is deemed suitable for the variable
values in this Thesis; greater bases, such as 10 and 100, would result in a loss
of resolution among the smaller values. The logarithm cannot be taken of negative
values either, so a constant must be added to level up the values. The inverse
transformation makes very small values very large and very large values very
small. This transformation has the effect of reversing the order of the scores,
and therefore the values must be reversed back by multiplying them by -1 and
adding a constant to bring the minimum back above 1.0. The inverse transformation
is suitable for variables whose distribution looks like a steep leftward slope,
meaning that there is a large number of small values and the number of bigger
values decreases rather linearly. The logarithmic transformation also works for
data whose residuals grow as the value of the variable grows. The three
transformations are introduced in order of their power, starting from the
weakest. Following the guidance of the literature (Osborne 2002), all the
variables are anchored to a minimum of 1.0.
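The three transformations and the anchoring step can be sketched as follows. The anchoring to a minimum of 1.0 keeps square roots and logarithms defined and avoids the nonlinear region below one; the re-reversal of the inverse transform preserves the order of the scores, as described above. The sample values are hypothetical.

```python
# Square root, natural log, and inverse transformations, each applied
# after anchoring the variable so its minimum equals 1.0 (per Osborne 2002).
import numpy as np

def anchor(x, minimum=1.0):
    """Shift values so the smallest equals `minimum`."""
    return x - x.min() + minimum

def sqrt_t(x):
    return np.sqrt(anchor(x))

def log_t(x):
    return np.log(anchor(x))          # natural base, as in the thesis

def inverse_t(x):
    inv = 1.0 / anchor(x)             # this reverses the order of scores...
    return anchor(-inv)               # ...so negate and re-anchor to 1.0

x = np.array([-3.0, 0.0, 2.0, 50.0])  # hypothetical skewed ratio values
for f in (sqrt_t, log_t, inverse_t):
    print(f.__name__, np.round(f(x), 3))
```

Note that all three transforms are monotone increasing after the correction, so the ranking of the companies is unchanged.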
All the variables are first transformed together as an exploratory analysis.
Univariate normality, multivariate normality, and the classification rates are in
focus. The test probabilities indicate that three variable groups are
significantly normal after the square root transformation, five after the
logarithmic transformation, and five as well with the inverse transformation
(Table 22). The probabilities without the transformations are very close to zero
because of the outliers, and they improved slightly as a consequence of the
transformations. On the other hand, the classification rates improved well above
the original level of 56 %. The classification rate improves as the
transformation gets stronger, the inverse transformation producing a creditable
9-percentage-point improvement to 65 %. The prediction rates improve quite
linearly with the classification rates, but even the highest rate of 54 %, by the
inverse transformation, is too close to the random guess rate to be considered
noteworthy.
Table 22 Shapiro-Wilk normality test after transformations for the basic sample.

Variable             Original  SQRT      LN        INV
PE Group I           0.000000  0.000000  0.000006  0.000000
PE Group II          0.000000  0.000000  0.000042  0.027550
GEA Group I          0.000000  0.000000  0.000161  0.181271
GEA Group II         0.000080  0.058680  0.908347  0.001157
PB Group I           0.000000  0.000004  0.015348  0.000016
PB Group II          0.000000  0.000000  0.000647  0.286330
FCPY Group I         0.000000  0.000000  0.000000  0.000000
FCPY Group II        0.000021  0.000007  0.000003  0.000000
CPY Group I          0.000000  0.000000  0.000000  0.000000
CPY Group II         0.006998  0.020684  0.055728  0.275222
ROCE Group I         0.000221  0.027411  0.394959  0.025635
ROCE Group II        0.000000  0.000000  0.000000  0.000000
EBITM Group I        0.000012  0.005744  0.560079  0.000628
EBITM Group II       0.000218  0.156928  0.099314  0.000000
PS Group I           0.000000  0.000001  0.000539  0.847031
PS Group II          0.000000  0.000000  0.000002  0.057886
Classification Rate  56 %      59 %      62 %      65 %
Prediction Rate      44 %      46 %      50 %      54 %
the multivariate normality is rejected for all three transformations, as the
critical value is 0.987.
The F approximations for the Box's M test remained at the zero level; thus none
of the crude transformations improved the equality of the covariance matrices. As
discussed earlier, this assumption is very sensitive to outliers and multivariate
normality.
Since the classification rates improved along with the slight improvements in the
normality assumptions, a simple sensitivity analysis is performed in order to
examine the effects of transforming one variable at a time. Only one
transformation has a negative effect on the classification rate: the square root
transformation of ROCE (Table 23). The transformations of the GEA ratio turned
out to strengthen the classification rate by as much as 8 percentage points at
best, with the logarithmic transformation. The inverse transformation improves
the classification the most, while the square root and logarithmic
transformations cause roughly equal improvements.
The main conclusion is that almost any of the variable transformations is
beneficial for the classification rate of the basic sample. The improvement might
stem from the transformations smoothing out the extreme distances between the
observations.
power of the variables. The three transformations are tested on the eight
variables, and the dichotomous classification rates are compared to the original
rates. There are three striking improvements among the dichotomous classification
rates: 12 and 10 percentage points with the PE ratio and 11 percentage points
with the ROCE ratio (Table 24). The three improvements are notable, but the
levels reached are still weak. The logarithmic transformation raises PE to a rate
of 61 % and ROCE to the original level of 56 % that the multivariate case
produced. FCPY is originally the strongest classifier, but it weakens when
transformed. The effects of the transformations vary quite a lot, but the
logarithmic transformation is the only one that does not decrease the rates.
The number of combinations that can be formed with eight variables and four
choices for each is too large to search through manually. Nevertheless, numerous
combinations were tried out, and they did not result in significantly higher
classification rates than the 65 % achieved by the comprehensive inverse
transformation. Even where a few percentage points were gained, the gain did not
follow logically from the results depicted earlier. Transformations are therefore
considered useful for increasing the classification rate of the basic sample,
which includes univariate and multivariate outliers. The inverse transformation
was the most beneficial for the classification and prediction rates. Although the
literature stresses that variable transformations must be well reasoned, a simple
crude trial of the inverse transformation yielded the highest results.
6.2.2 Transformations and the trimmed sample
As with the basic sample earlier, all the variables are first transformed in
order to see what happens. The univariate and multivariate normality, as well as
the classification and prediction rates, are examined. The Shapiro-Wilk tests and
the classification rates are shown in Table 25. The last row of the table reveals
that if all the variables are transformed, none of the classification rates
improves over the starting point of 66 %. The same goes for the prediction rates:
none of them improved. The exclusion of the outliers made four of the sixteen
variable groups normally distributed at the significance level 0.05. Taking the
square root of the variables kept the same four variable groups at an acceptable
level, and three additional variable groups are also accepted as normally
distributed. The number of normally distributed variable groups is 10 and 6 for
the logarithmic and inverse transformations, respectively. The variable normality
is tested by groups, but if a transformation is accepted, both groups of the
variable have to be transformed.
Table 25 Shapiro-Wilk normality test after transformations for the trimmed sample.

Variable             Trimmed   SQRT      LN        INV
PE Group I           0.000001  0.000453  0.057554  0.000013
PE Group II          0.005698  0.141455  0.347929  0.002027
GEA Group I          0.000208  0.026774  0.614070  0.274891
GEA Group II         0.796612  0.982204  0.473473  0.000139
PB Group I           0.000077  0.003091  0.087513  0.590774
PB Group II          0.011279  0.159392  0.639762  0.385379
FCPY Group I         0.000171  0.000618  0.002124  0.018725
FCPY Group II        0.132815  0.097257  0.068865  0.031599
CPY Group I          0.000007  0.000021  0.000063  0.000534
CPY Group II         0.478296  0.597056  0.702221  0.840173
ROCE Group I         0.915369  0.538298  0.105115  0.000341
ROCE Group II        0.028268  0.000220  0.000000  0.000000
EBITM Group I        0.000017  0.004104  0.312850  0.000624
EBITM Group II       0.009540  0.453685  0.038543  0.000000
PS Group I           0.000008  0.000368  0.015145  0.802802
PS Group II          0.000000  0.000013  0.000615  0.123910
Classification rate  66 %      61 %      62 %      59 %
Prediction rate      61 %      56 %      57 %      59 %
Now that the effects of the transformations on normality have been examined, a
test sample can be formed by choosing for each variable the transformation with
the best sum of test probabilities in Table 25. The test sample is then the most
normal on a univariate basis, and the effects on the classification rate can be
observed. The test sample consists of logarithmic PE, GEA and EBITM, inversed PB,
CPY and PS, and plain FCPY and ROCE. The test sample classifies 62 % of the
observations correctly. Now only four variable groups are not normally
distributed at the significance level 0.05, and two of them are close: the
logarithmic EBITM ratio in group II, with a probability of 0.038543, and the ROCE
ratio in group II, with a probability of 0.028268. The weakest two are the FCPY
ratio in group I (0.000171) and the inversed CPY ratio in group I (0.000534). The
correlation of the Mahalanobis distance versus Chi-square plot is 0.9686, which
means that multivariate normality is rejected when compared to the critical value
of 0.985 (Sharma 1996:466). Nonetheless, the correlation improved from the
earlier 0.934 for the trimmed sample. In general, both the univariate and the
multivariate normality assumptions are very close to being met. The
classification rate of the test sample weakened four percentage points because of
the transformations. Therefore, the method of selecting variable transformations
based on univariate probabilities from the Shapiro-Wilk normality test is
considered useless.
The equality of the covariance matrices did not improve earlier when the
variables of the basic sample were transformed. The F approximation of the Box's
M for the trimmed sample is 0.006; it improves slightly after the square root
transformation (0.026) and after the logarithmic transformation (0.007), but
fades after the inverse transformation (0.000). Therefore, the assumption of the
equality of the covariance matrices was not met with any of the three
transformations.
According to the classification results, there are no grounds for any of the
transformations used in the test sample.
The variable transformations did not improve the discriminating power of the
analysis (Table 27) even though the normality assumptions improved. The normality
and equality of covariance matrices assumptions do not seem very strict in
discriminant analysis. The assumptions can presumably be looser because the
ultimate output is a binary variable instead of a continuous one.
There are 255 distinct combinations, as there are eight variables under
consideration. The purpose of this section is to strive for higher classification
and prediction rates by trying out the different combinations. The three samples
are: the basic sample of 108 companies in both its original and
inverse-transformed versions, and the trimmed sample of 90 companies. The
inclusion of the three samples enables a comparison of the classification and
prediction rates after the two data processing techniques, outlier exclusion and
variable transformation.
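The combination search over the 2⁸ − 1 = 255 non-empty subsets can be sketched as below. The ratio names match the thesis, but the data are synthetic and the in-sample LDA score is only a stand-in for the thesis's classification rate.

```python
# Enumerate all 255 non-empty subsets of the eight ratios and score each
# with in-sample LDA classification.
import itertools
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
ratios = ["PE", "GEA", "PB", "FCPY", "CPY", "ROCE", "EBITM", "PS"]
X = np.vstack([rng.normal(0, 1, (54, 8)), rng.normal(0.4, 1, (54, 8))])
y = np.array([0] * 54 + [1] * 54)

best_rate, best_combo, count = 0.0, (), 0
for r in range(1, 9):                              # subset sizes 1..8
    for combo in itertools.combinations(range(8), r):
        count += 1
        Xc = X[:, list(combo)]
        rate = LinearDiscriminantAnalysis().fit(Xc, y).score(Xc, y)
        if rate > best_rate:
            best_rate, best_combo = rate, combo

print(count)                                       # 255
print([ratios[i] for i in best_combo], f"best rate {best_rate:.0%}")
```

Note that selecting the subset by in-sample rate overfits, which is why the thesis also checks the chosen profiles against hold-out predictions and cross-validation.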
The gearing ratio was discussed with the instructor, and the conclusion was to
try it out at first but also to substitute the ratio with a binary variable.
Dummy variables are encouraged with logit models rather than with discriminant
analysis, but since there is no clear reason not to use them, GEA MEDIAN is
introduced. The GEA MEDIAN simply divides the companies in the sample in two,
half of them being geared and the other half relatively free from debt. The
reason for the substitute ratio is that slight differences in the gearing ratios
do not affect the "goodness" of the companies, so such companies are considered
equal, whereas companies that are heavily geared are considered worse than
(different from) the rest. In other words, the additional information that the
continuous gearing ratio offers was questioned and the simpler division
introduced. The new variable serves as a substitute for the original GEA ratio.
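The GEA MEDIAN substitute amounts to a median split of the gearing ratio: companies at or below the sample median are coded 0 and the rest 1, so half the sample lands in each class. A hypothetical sketch with made-up gearing values:

```python
# Median-split dummy variable, standing in for the GEA MEDIAN substitute.
import numpy as np

gea = np.array([0.0, 0.1, 0.35, 0.8, 1.2, 2.5])   # hypothetical gearing ratios
gea_median = (gea > np.median(gea)).astype(int)   # 1 = geared, 0 = low gearing
print(gea_median)   # -> [0 0 0 1 1 1]
```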
The trials of substituting the original GEA ratio with the GEA MEDIAN ratio
improved the classification and prediction rates enough to justify the
substitution (Table 28). Both the inversed basic sample and the trimmed sample
achieved 71 %, which is 15 percentage points higher than the original rate. The
cell size can be considered to affect the credibility of the results; the 71 %
for the inversed basic sample is therefore more reliable than that of the trimmed
sample, because the trimmed sample is 17 % smaller. Surprisingly, the prediction
rate of the trimmed sample fell when the GEA MEDIAN substitute took place. So
far, the inversed basic sample is deemed to have the best discriminating
capability.
The most discriminating ratio profile for the basic sample is achieved without ROCE and EBITM; thus PE, GEA MEDIAN, PB, FCPY, CPY and PS are included. The classification rate is 69 %, while the error rates are 33 % for type I and 28 % for type II. The new ratio profile improved the prediction rate by 13 percentage points to 59 %, and the prediction errors are 30 % and 52 % for type I and type II, respectively. As can be seen in Table 28, higher rates can be achieved by outlier exclusion and variable transformation.
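The reported rates can be reproduced from a confusion matrix. The counts below are hypothetical but chosen to match the 69 % rate and the 33 %/28 % error rates of a balanced 108-company sample; the convention for which misclassification counts as type I is an assumption here:

```python
# Hypothetical confusion matrix for 54 undervalued + 54 overvalued companies
tp, fn = 36, 18  # undervalued classified correctly / misclassified
tn, fp = 39, 15  # overvalued classified correctly / misclassified

classification_rate = (tp + tn) / (tp + fn + tn + fp)  # 75/108
type_i = fn / (tp + fn)   # undervalued mistaken for overvalued (assumed convention)
type_ii = fp / (tn + fp)  # overvalued mistaken for undervalued
```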
The trials of different variable combinations for the inversed basic sample did not yield higher classification rates than the 71 % obtained with the full ratio profile. Equal classification rates can be achieved by leaving PE, PB or ROCE out, one at a time, or by leaving ROCE and EBITM out together. The four additional ratio profiles have error distributions leaning more toward type I errors, so the original full profile is chosen for further examination. The errors are rather uniformly distributed, 28 % for type I and 30 % for type II, and the classification is strongly supported by the cross-validation result of 67.6 % correct classifications. The cross-validation is of the leave-one-out kind, in which each observation is classified by the functions derived from all cases other than that case. Although the inversion is based on a trial rather than a well-justified variable normality approach, it yielded the second highest classification rate and the highest prediction rate.
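The leave-one-out procedure can be sketched as below. A nearest-centroid rule stands in for the fitted discriminant function, so this illustrates the cross-validation loop rather than LDA itself:

```python
# Illustrative leave-one-out cross-validation; nearest-centroid rule
# is an assumption standing in for the discriminant function.

def centroid(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def classify(x, c0, c1):
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return 0 if d0 <= d1 else 1

def loo_rate(data, labels):
    hits = 0
    for i in range(len(data)):
        # Refit on all cases except case i, then classify case i
        train = [(x, y) for j, (x, y) in enumerate(zip(data, labels)) if j != i]
        c0 = centroid([x for x, y in train if y == 0])
        c1 = centroid([x for x, y in train if y == 1])
        hits += classify(data[i], c0, c1) == labels[i]
    return hits / len(data)

# Hypothetical two-group data with clear separation
data = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 1, 1, 1]
rate = loo_rate(data, labels)
```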
The highest classification rate for the trimmed sample is 73 %, and it is achieved by two different combinations, of which the one with the higher type I error is discarded. The ratio profile excludes the EBITM and PS ratios, as with the basic sample, leaving PE, GEA MEDIAN, PB, FCPY, CPY and ROCE as explanatory variables. The type I error is 22 % while the type II error is 31 %, and the rate of 73 % is cross-validated as 61 % by the leave-one-out type of cross-validation. The cross-validation suggests that chance had little to do with the high classification rate. The exclusion of the EBITM and PS ratios did not improve the prediction rate, which remained at 57 %. The prediction errors were 37 % and 48 %.
The trimmed sample with the GEA MEDIAN substitution and the EBITM and PS exclusion resulted in the highest classification rate in this Thesis, 73 %. The highest prediction rate, 61 %, was achieved by the inversed basic sample with the GEA MEDIAN substitution. The two samples are henceforth called the ultimate classifier and the ultimate predictor. A statistical and practical significance comparison of the ultimate samples is gathered in Table 29.
Even when the group differences are statistically significant, the difference between the groups might not be large in practice, especially with large sample sizes. The practical significance of the contenders is therefore measured by the square of the canonical correlation, which is equal to the share of the between-group variance of the total variance. The square of the canonical correlation is analogous to the R-squared in multiple regression; hence, it measures the strength of the discriminant function. The ultimate predictor sample is two percentage points higher than the ultimate classifier in the squared canonical correlation, but the 19 % cannot be considered impressive. Still, there is a slight improvement from the 13 % that was achieved by the original basic sample.
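For a two-group, single-function analysis, the squared canonical correlation equals the between-group share of the variance of the discriminant scores. A small sketch (the scores are hypothetical):

```python
def squared_canonical_correlation(groups):
    """Share of between-group variance of the total variance of the scores."""
    scores = [s for g in groups for s in g]
    grand = sum(scores) / len(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical discriminant scores for the two valuation groups
r2 = squared_canonical_correlation([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```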
Standardized coefficients are normally used for assessing the relative importance of the discriminator variables forming the discriminant function. Table 30 exhibits the coefficients for the two ultimate samples as well as for the basic sample with GEA MEDIAN for comparative purposes. However, strict rankings of the discriminator variables should be avoided because of the multicollinearity present in the variables.
The structure matrices support the notion, suggested earlier, of the importance of the cash flow ratios. The FCPY ratio possesses the highest and CPY the second highest correlation with the discriminant scores among the variables in all three samples in Table 31. Although the variables are inversed in the ultimate predictor sample, the order of the FCPY and CPY ratios is the same as in the non-inversed samples. Compared to a value of 0.50, which is sometimes considered a cut-off value by researchers, the cash flow ratios are the only ones to be seriously correlated. The conclusion is that, in this regard, the cheap stocks have stronger cash flow ratios than the expensive stocks. The PE ratio is the least correlated with the discriminant score until it is transformed, which makes it the third most important variable. Furthermore, the GEA MEDIAN ratio slips from being mediocre in the basic and the ultimate classifier samples to second last in the ultimate predictor sample. The ambiguities in the rankings after variable transformations suggest that rigorous conclusions cannot be drawn from the composition of the discriminant scores.
The two ultimate samples were finally tested with the non-cyclical sector. The predictions for the external sample of 60 companies did not yield reasonable prediction rates: 43 % for both the ultimate classifier and the ultimate predictor. The conclusion is that the prediction models formed in this Thesis are not capable of out-of-sample prediction, even though the external sample itself can be discriminated more accurately with the existing variables.
7 Discussion and conclusions
This Thesis examines the relationship between stock valuation and financial ratios. In the beginning, the classification rate of the discriminant analysis was 56 %, which is too weak to be of use in practice. Later on, after transforming the data, analyzing various ratio combinations and introducing one new ratio, a classification rate of 73 % was achieved. This analysis suggests that the valuation level of a stock can be inferred from combinations of financial ratios.
The sample forming procedure used in the Thesis is built on a balanced assembly: the same number of companies was gathered for each year and for both groups. The major limitation of the procedure lies within the valuation phase. As the companies are valued according to the future cash flows of the following seven years, the analysis lags seven years behind the present time. Shortening the time span was thought to introduce volatility into the company valuations. Therefore, the whole valuation procedure should be redesigned if the time span is shortened. Besides the current valuation method, the sample size is also limited by the market capitalization restriction of $300m-$3bn and by the exclusion of companies with incomplete history data. The size and quality of history data were poor before the 1990s. The sample size should be greater so that more intervening companies could be excluded and a wider separation between the two groups obtained.
Several variable transformations were considered, including logarithmic and inverse. An optimal transformation for each of the variables was assessed based on trials, tests and histograms, but in the end, the crude inverse transformation turned out to outperform all the other alternatives.
By inversing all the variables, the classification rate improved to the level of 65 %. Thus, variable transformations and outlier exclusion are concluded to substitute for rather than complement each other in data harmonization. The substitution of the gearing ratio by a binary variable, GEA MEDIAN, further improved the discriminating capability. The new variable simply divided the companies at the median gearing ratio, which improved the classification rates to the level of 71 % for both the trimmed and the inversed samples.
The conclusion is that variable transformations and outlier exclusion are useful tools for approaching multivariate normality and better discrimination results. Variable transformations yielded ambiguous results, and the most discriminating results were achieved by trial rather than by proceeding in a strictly analytical manner. The exclusion of outliers, for its part, is easy to execute and it improved the prediction capability, but the diminishing sample size is a problem. The assumption of the equality of the covariance matrices was concluded to follow the normality assumption quite faithfully. Therefore, it seemed like another way to assess normality rather than a distinct requirement of discriminant analysis.
The conclusion is that growth stems from strong cash flow. This conclusion is in line with real life because, after all, a company needs hard cash in order to grow. The importance of an individual financial ratio is always of interest to economists. Throughout the Thesis, the most widespread attention was gained by the cash flow ratios, FCPY and CPY. The cash flow ratios were closely related, with a record correlation of 0.92. Despite the correlation, it was not beneficial to remove one or the other in almost any of the samples and ratio profiles. FCPY was the most and CPY the second most correlated variable with the discriminant scores. Both correlations were strong and negative, which indicates that the cheap stocks have stronger cash flow ratios. The dichotomous classification rate was strong for the cash flow ratios: 64 % for FCPY and 56 % for CPY with the original basic sample.
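A dichotomous (single-ratio) classification rate can be sketched as a median-cutoff rule; the toy data and the handling of orientation are assumptions for illustration:

```python
from statistics import median

def dichotomous_rate(values, labels):
    """Classification rate of a single ratio using a median cutoff.

    Tries both orientations (low or high values marking the cheap
    group, labeled 1) and keeps the better one.
    """
    m = median(values)
    hits_low = sum((v <= m) == (y == 1) for v, y in zip(values, labels))
    hits_high = sum((v > m) == (y == 1) for v, y in zip(values, labels))
    return max(hits_low, hits_high) / len(values)

# Hypothetical ratio values; cheap stocks (label 1) have low values here
rate = dichotomous_rate([1, 2, 3, 10, 11, 12], [1, 1, 1, 0, 0, 0])
```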
For practical purposes, the time lag, currently seven years, should be shortened to just a couple of years for the analysis to be usable in practice. Additionally, if the model values companies based on data from, say, two years ahead, the prediction should be rather accurate two years into the future. The idea behind the long valuation time span was to protect the valuation from momentary price fluctuations. Because of the volatility of financial ratios and stock prices, the use of moving averages should be examined. It is possible that smoothed data is more informative than the original data and that the predictive capability improves. Furthermore, the smoothed data might improve the robustness of the model, as the case-specific differences between data sets might be reduced. Tried and tested portfolios can also be used to adjust the discriminant function. Assuming that the portfolio consists of cheap stocks and that a group of expensive stocks is defined, the model could be used to classify the rest of the stocks in the light of the existing portfolio.
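The smoothing idea could be sketched as a trailing moving average over each ratio's yearly series; the window length is an assumption:

```python
def moving_average(series, window=3):
    """Trailing moving average; uses shorter windows at the start of the series."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(sum(series[lo:i + 1]) / (i + 1 - lo))
    return out

# Hypothetical yearly ratio values smoothed with a two-year window
smoothed = moving_average([1.0, 2.0, 3.0, 4.0], window=2)
```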
The identification of undervalued stocks could also be approached from the bottom up, with an agent-based simulation. A model is built starting from simple rules at the individual company level. The idea is to test how changes in individual behaviors affect the system as a whole, as the system consists of numerous heterogeneous companies. An individual agent has its own interests and limited knowledge, and it can, for example, learn from the past. A successful simulation generates behavior similar to empirical data, which helps in understanding the behavior of the stock markets. Agent-based modeling attempts to re-create and predict the behavior of complex systems that are not easily explained rationally.
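The agent-based idea can be illustrated with an entirely hypothetical rule set: each agent buys when its perceived discount exceeds a private threshold, and adjusts that threshold from experience. This is a sketch of the modeling style, not a calibrated market model:

```python
import random

class CompanyAgent:
    """Hypothetical agent with limited knowledge and a simple learning rule."""

    def __init__(self, threshold):
        self.threshold = threshold  # act only when perceived discount exceeds this

    def act(self, discount):
        return "buy" if discount > self.threshold else "hold"

    def learn(self, reward):
        # Naive reinforcement: lower the bar after gains, raise it after losses
        self.threshold += -0.01 if reward > 0 else 0.01

# A population of heterogeneous agents reacting to the same signal
random.seed(1)
agents = [CompanyAgent(random.uniform(0.0, 0.2)) for _ in range(100)]
buyers = sum(a.act(0.1) == "buy" for a in agents)
```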
Finally, various models could be combined into hybrid models that would give an output based on a certain weighting of the outputs of the individual models. Hybrid models are usually formed in order to achieve higher prediction robustness, because the strengths of some models may offset the weaknesses of others. An extensive survey of prediction robustness would be achieved by expanding further into different stocks from various countries and sectors, as well as into various time frames. More complex models do not inevitably mean better prediction results. On the contrary, they often mean less transparency and a growing danger of overfitting. When expanding the research, the costs and benefits should be evaluated thoroughly.
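The weighting scheme could be sketched as a normalized weighted average of the individual model outputs; the outputs and weights below are hypothetical:

```python
def hybrid_output(outputs, weights):
    """Weighted average of individual model outputs, e.g. undervaluation scores."""
    if len(outputs) != len(weights):
        raise ValueError("one weight per model output")
    total = sum(weights)
    return sum(o * w for o, w in zip(outputs, weights)) / total

# Hypothetical: one model says 1.0 (undervalued), another says 0.0
score = hybrid_output([1.0, 0.0], [3.0, 1.0])
```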
8 References
Adnan, M. & Dar, H.A. 2006, "Predicting corporate bankruptcy: where we stand?", Corporate Governance: The International Journal of Effective Board Performance, vol. 6, no. 1, pp. 18.
Altman, E.I. 1968, "Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy", The Journal of Finance, vol. 23, no. 4, pp. 589-609.
Altman, E.I. 1968, "The Prediction of Corporate Bankruptcy: A Discriminant Analysis", The Journal of Finance, vol. 23, no. 1, pp. 193-194.
Andersson, T. & Lee, E. 2006, "Financialized accounts: Restructuring and return on capital employed in the S&P 500", Accounting Forum (Elsevier), vol. 30, no. 1, pp. 21.
Baker, M. & Wurgler, J. 2002, "Market Timing and Capital Structure", The Journal of Finance, vol. 57, no. 1, pp. 1-32.
Beaver, W.H. 1966, "Financial Ratios as Predictors of Failure", Journal of Accounting Research, vol. 4, no. 3, pp. 71.
Blake, D. 2000, Financial Market Analysis, 2nd edn, John Wiley & Sons, New York.
Booth, P.J. 1983, "Decomposition Measures and the Prediction of Financial Failure", Journal of Business Finance & Accounting, vol. 10, no. 1, pp. 67.
Brockett, P.L., Golden, L.L., Jang, J. & Yang, C. 2006, "A Comparison of Neural Network, Statistical Methods, and Variable Choice for Life Insurers' Financial Distress Prediction", Journal of Risk & Insurance, vol. 73, no. 3, pp. 397.
Brockett, P.L., Cooper, W.W., Golden, L.L. & Pitaktong, U. 1994, "A Neural Network Method for Obtaining an Early Warning of Insurer Insolvency", The Journal of Risk and Insurance, vol. 61, no. 3, pp. 402-424.
Campbell, J.Y. & Shiller, R.J. 1998, "Valuation Ratios and the Long-Run Stock Market Outlook", Journal of Portfolio Management, vol. 24, no. 2, pp. 11.
Chan, L. & Lakonishok, J. 2004, "Value and growth investing: Review and update", Financial Analysts Journal, vol. 60, no. 1, pp. 71-86.
Damodaran, A. 2001, The Dark Side of Valuation: Valuing Old Tech, New Tech, and New Economy Companies, Financial Times/Prentice Hall, New York.
DeFond, M.L. 2003, "An empirical analysis of analysts' cash flow forecasts", Journal of Accounting & Economics, vol. 35, no. 1, pp. 73.
Dimitras, A.I., Zanakis, S.H. & Zopounidis, C. 1996, "A survey of business failures with an emphasis on prediction methods and industrial applications", European Journal of Operational Research, vol. 90, no. 3, pp. 487.
Fama, E.F. 1970, "Efficient Capital Markets: A Review of Theory and Empirical Work", Journal of Finance, vol. 25, no. 2, pp. 383.
Fama, E.F. & French, K.R. 1995, "Size and Book-to-Market Factors in Earnings and Returns", The Journal of Finance, vol. 50, no. 1, pp. 131-155.
Fama, E.F. & French, K.R. 1992, "The Cross-Section of Expected Stock Returns", The Journal of Finance, vol. 47, no. 2, pp. 427-465.
Fisher, R.A. 1936, "The use of multiple measurements in taxonomic problems", Annals of Eugenics, vol. 7, pp. 179-188.
Frank, M. & Jagannathan, R. 1998, "Why do stock prices drop by less than the value of the dividend? Evidence from a country without taxes", Journal of Financial Economics, vol. 47, no. 2, pp. 161.
Frank, R.E., Massy, W.F. & Morrison, G. 1965, "Bias in Multiple Discriminant Analysis", Journal of Marketing Research (JMR), vol. 2, no. 3, pp. 250.
Gnanadesikan, R. 1977, Methods for Statistical Data Analysis of Multivariate Observations, 1st edn, John Wiley & Sons.
Goetzmann, W.N. & Jorion, P. 1995, "A Longer Look at Dividend Yields", The Journal of Business, vol. 68, no. 4, pp. 483-508.
Gordon, M.J. 1962, "[Security and a Financial Theory of Investment]: Reply", The Quarterly Journal of Economics, vol. 76, no. 2, pp. 315-319.
Gordy, M.B. 2000, "A comparative anatomy of credit risk models", Journal of Banking & Finance, vol. 24, no. 1, pp. 119.
Griffin, J.M. 1988, "A Test of the Free Cash Flow Hypothesis: Results from the Petroleum Industry", The Review of Economics and Statistics, vol. 70, no. 1, pp. 76-82.
Hagstrom, R.G. 2001, The Essential Buffett: Timeless Principles for the New Economy, John Wiley & Sons.
Helfert, E.A. 2001, Financial Analysis: Tools and Techniques: A Guide for Managers, McGraw-Hill, New York.
Hoover, S. 2006, Stock Valuation: An Essential Guide to Wall Street's Most Popular Valuation Models, McGraw-Hill, New York.
Hovakimian, A., Opler, T. & Titman, S. 2001, "The debt-equity choice", Journal of Financial and Quantitative Analysis, vol. 36, no. 1, pp. 1-24.
Johnson, R.A. & Wichern, D.W. 1987, Applied Multivariate Statistical Analysis, Longman Higher Education.
Lachenbruch, P.A., Sneeringer, C. & Revo, L.T. 1973, "Robustness of linear and quadratic discriminant function to certain types of non-normality", Communications in Statistics - Theory and Methods, vol. 1, no. 1, pp. 39-56.
Lachenbruch, P.A. 1975, Discriminant Analysis, Macmillan, New York.
Laitinen, T., Back, B., Sere, K. & Wezel, M. 1995, "Choosing Bankruptcy Predictors Using Discriminant Analysis, Logit Analysis and Genetic Algorithms", Proceedings of the First International Meeting on Artificial Intelligence in Accounting, Finance and Tax, pp. 337-356.
Laitinen, E.K. & Laitinen, T. 1998, "Cash Management Behavior and Failure Prediction", Journal of Business Finance & Accounting, vol. 25, no. 7, pp. 893.
LeRoy, S.F. & Porter, R.D. 1981, "The Present-Value Relation: Tests Based on Implied Variance Bounds", Econometrica, vol. 49, no. 3, pp. 555.
Liu, J., Nissim, D. & Thomas, J. 2007, "Is Cash Flow King in Valuations?", Financial Analysts Journal, vol. 63, no. 2, pp. 56.
Luenberger, D.G. 1997, Investment Science, Oxford University Press, New York.
Mannila, H., Smyth, P. & Hand, D.J. 2001, Principles of Data Mining (Adaptive Computation and Machine Learning), The MIT Press.
Marks, S. & Dunn, O.J. 1974, "Discriminant Functions When Covariance Matrices Are Unequal", Journal of the American Statistical Association, vol. 69, no. 346, pp. 555.
Michaud, R. & Davis, P. 1982, "Valuation Model Bias and the Scale Structure of Dividend Discount Returns", Journal of Finance, vol. 37, no. 2, pp. 563-573.
Miller, M. 1977, "Debt and Taxes", Journal of Finance, vol. 32, no. 2, pp. 261-275.
Mitra, D., Biswas, A. & Owers, J. 1991, "A Direct Test of the Free Cash Flow Hypothesis", Financial Management, vol. 20, no. 1, pp. 13-14.
Murphy, J.J. 1999, Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications, New York Institute of Finance.
Murphy, K.J. 1985, "Corporate Performance and Managerial Remuneration: An Empirical Analysis", Journal of Accounting & Economics, vol. 7, no. 1, pp. 11.
Myers, S.C. 1984, "The Capital Structure Puzzle", The Journal of Finance, vol. 39, no. 3, pp. 575-592.
Ohlson, J.A. 1980, "Financial Ratios and the Probabilistic Prediction of Bankruptcy", Journal of Accounting Research, vol. 18, no. 1, pp. 109.
Osborne, J. 2002, "Notes on the use of data transformations", Practical Assessment, Research & Evaluation, vol. 8, no. 6.
Park, Y.S. & Lee, J. 2003, "An empirical study on the relevance of applying relative valuation models to investment strategies in the Japanese stock market", Japan & the World Economy, vol. 15, no. 3, pp. 331.
Penman, S.H. 1996, "The Articulation of Price-Earnings Ratios and Market-to-Book Ratios and the Evaluation of Growth", Journal of Accounting Research, vol. 34, no. 2, pp. 235.
Ross, S.A., Westerfield, R.W., Jaffe, J. & Ku, S. 1999, Corporate Finance, McGraw-Hill College.
Senchack Jr., A.J. & Martin, J.D. 1987, "The Relative Performance of the PSR and PER Investment Strategies", Financial Analysts Journal, vol. 43, no. 2, pp. 46.
Shiller, R.J. 1981, "Do Stock Prices Move Too Much to be Justified by Subsequent Changes in Dividends?", American Economic Review, vol. 71, no. 3, pp. 421.
Smith, K.A. & Gupta, J.N.D. 2002, Neural Networks in Business: Techniques and Applications, Idea Group Publishing, Hershey, PA.
Stracca, L. 2004, "Behavioral finance and asset prices: Where do we stand?", Journal of Economic Psychology, vol. 25, no. 3, pp. 373.
Tabachnick, B.G. & Fidell, L.S. 2000, Using Multivariate Statistics, 4th edn, Allyn & Bacon.
Yang, Z.R., Platt, M.B. & Platt, H.D. 1999, "Probabilistic Neural Networks in Bankruptcy Prediction", Journal of Business Research, vol. 44, no. 2, pp. 67.
Zavgren, C.V. 1985, "Assessing the Vulnerability to Failure of American Industrial Firms: A Logistic Analysis", Journal of Business Finance & Accounting, vol. 12, no. 1, pp. 19.