
HELSINKI UNIVERSITY OF TECHNOLOGY

Department of Automation and Systems Technology


System Analysis Laboratory

Olli Väyrynen

Identifying Undervalued Stocks with Multiple


Financial Ratios

Master’s thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in Technology

Espoo, Finland, December 2, 2007

Supervisor: Professor Ahti Salo


Instructor: Ph.D. Hannes Kulvik
HELSINKI UNIVERSITY OF TECHNOLOGY
Department of Automation and Systems Technology
ABSTRACT OF THE MASTER'S THESIS

Author: Olli Väyrynen
Date: 3.12.2007
Major subject: Systems and operations research
Minor subject: Strategy and international business
Number of pages: 86
Title: Identifying Undervalued Stocks with Multiple Financial Ratios
Title in Finnish: Aliarvostettujen osakkeiden identifiointi tunnuslukujen perusteella
Chair: Mat-2. Systems and operations research
Supervisor: Professor Ahti Salo
Instructor: Ph.D. Hannes Kulvik
Abstract:

The process of determining the current value of a company is of interest to legislators, company management, asset managers and private individuals alike. There are numerous methods for determining the value, some more subjective and others more objective. Public companies announce their results according to certain economic legislation in order to improve the transparency of their businesses. At worst, a lack of integrity and transparency in profit calculation, combined with unethical corporate governance, leads to severe financial collapses. A company’s value lies in its potential to generate a stream of profits in the future.

The goal of this Thesis is to form an accurate model for identifying undervalued stocks. The identification is based on eight financial ratios, so the analysis is multivariate in nature. The valuation of the stocks is performed with the dividend discount model (DDM) using actual cash flow data. The stocks are then classified into undervalued and overvalued with linear discriminant analysis (LDA), which is widely used in corporate performance studies. Different ratio combinations are evaluated in order to find the most discriminating ratio profile. The statistical assumptions of discriminant analysis are examined in depth, as they influence both the statistical and the practical significance levels as well as the prediction capability of the model.

Based on this Thesis, the LDA-based multivariate identification of undervalued stocks has some predictive capability. As expected, the predictive capability deteriorates substantially when predicting to other sectors or to other periods of time. Overall, more research is needed before the model can be utilized in practice.

Keywords: discriminant analysis, stock valuation, financial ratio analysis, bankruptcy prediction

HELSINKI UNIVERSITY OF TECHNOLOGY
Department of Automation and Systems Technology
ABSTRACT OF THE MASTER'S THESIS (IN FINNISH)

Author: Olli Väyrynen
Date: 3.12.2007
Major subject: Systems and operations research
Minor subject: Strategy and international business
Number of pages: 86
Title: Aliarvostettujen osakkeiden identifiointi tunnuslukujen perusteella (Identifying Undervalued Stocks with Multiple Financial Ratios)
Chair: Mat-2. Systems and operations research
Supervisor: Professor Ahti Salo
Instructor: D.Sc. (Tech.) Hannes Kulvik
Abstract:

Legislators, company management, portfolio managers and also private individuals are interested in determining the current value of a company. There are numerous ways to determine a company's value; some of them are subjective and others more objective. Public limited companies announce their results according to certain economic legislation whose purpose is to improve the transparency of their business. Inconsistent and opaque profit calculation, together with unethical governance practices, has caused severe financial collapses during recent decades. A company's value lies in its potential to generate profit in the future.

The goal of the Thesis is to form a usable model for the identification of undervalued stocks. The identification is based on eight financial ratios, so the study is multivariate in nature. The valuation of the stocks is based on the dividend discount model, in which realized cash flows are used. The stocks are classified into undervalued and overvalued with linear discriminant analysis, which is widely used in assessing corporate performance. To find the best discrimination, different ratio combinations are evaluated with analytical methods. The statistical assumptions of the discriminant analysis are examined thoroughly, since they affect the prediction capability of the model as well as the statistical and practical significance levels.

Based on this study, the multivariate identification of undervalued stocks based on linear discriminant analysis showed some predictive capability. As expected, the prediction capability deteriorates substantially when predicting to another sector or another point in time. All in all, further research is needed to develop the model so that it could be utilized in practice.

Keywords: discriminant analysis, stock pricing, financial ratio analysis, bankruptcy prediction
Preface

I would like to thank Sifterfund for giving me the opportunity to do this Thesis. In particular, I would like to thank my instructor Hannes Kulvik for his guidance and interest.

I would like to thank Professor Ahti Salo for his comments and guidance.

Furthermore, thanks to my parents, friends and Maria for their support.

Espoo, 3.12.2007

Olli Väyrynen
Contents
1 Introduction..................................................................................................................1

1.1 Problem context .........................................................................................................1


1.2 Research objectives....................................................................................................3
1.3 Scope..........................................................................................................................4
1.4 Structure.....................................................................................................................5

2 Stock valuation .............................................................................................................7

2.1 Financial ratio analysis ..............................................................................................8


2.1.1 Price to earnings ratio (PE) ....................................................................................9
2.1.2 Earnings before interests and taxes margin (EBITM) .........................................10
2.1.3 Cash flow to price yield (CPY)............................................................................11
2.1.4 Free cash flow to price yield (FCPY) ..................................................................12
2.1.5 Return on capital employed (ROCE)...................................................................14
2.1.6 Price to book value ratio (PB)..............................................................................15
2.1.7 Price to sales ratio (PS) ........................................................................................16
2.1.8 Gearing ratio (GEA) ............................................................................................17
2.2 Dividend discount model .........................................................................................19
2.2.1 Discount factor.....................................................................................................21

3 Bankruptcy prediction models..................................................................................24

3.1 Statistical Models.....................................................................................................24


3.1.1 Linear Discriminant Analysis (LDA) ..................................................................24
3.1.2 Logit models ........................................................................................................28
3.2 Artificial intelligent expert system models (AIESM)..............................................30
3.2.1 Neural networks ...................................................................................................30
3.3 Models based on economic theories ........................................................................32

4 Empirical analysis......................................................................................................34

4.1 Valuation phase........................................................................................................36


4.2 Basic sample, hold-out sample and external sample ...............................................38
4.3 Selection procedure of companies ...........................................................................39
4.4 Approaches to sample forming ................................................................................40
4.5 Ratio profile .............................................................................................................41

5 Results on predictive accuracy .................................................................................42


5.1 Mean values and variances in the basic sample.......................................................42
5.2 Variable correlations................................................................................................43
5.3 Variable normality ...................................................................................................44
5.4 Equality of covariance matrixes...............................................................................48
5.5 Classification capability...........................................................................................49
5.6 Variable influence section........................................................................................51
5.7 Prediction to hold-out sample ..................................................................................54
5.8 Prediction to another sector .....................................................................................55

6 Sensitivity analysis .....................................................................................................56

6.1 Excluding outliers ....................................................................................................56


6.1.1 Normality .............................................................................................................57
6.1.2 Equality of covariance matrixes...........................................................................59
6.1.3 Classification capability.......................................................................................60
6.1.4 Variable influence................................................................................................61
6.1.5 Out-of-sample prediction capability ....................................................................62
6.2 Variable transformations..........................................................................................63
6.2.1 Transformations and the basic sample.................................................................64
6.2.2 Transformations and the trimmed sample............................................................68
6.3 Superior ratio profile................................................................................................71
6.3.1 The proposed models for classifying and predicting ...........................................73

7 Discussion and conclusions .......................................................................................77

7.1 Evaluation of tested approaches...............................................................................77


7.2 Suggestions for further study ...................................................................................79

8 References...................................................................................................................82
1 Introduction

1.1 Problem context

During the last three decades, financial markets have experienced an upheaval as computers have gained capacity and financial information has become accessible in an instant. Financial institutions compete with each other by offering the most accurate and relevant financial information. Ever more sophisticated decision support systems are built on mathematical models and forecasts.

There are numerous people in the investment world, with different backgrounds, who value securities for several purposes. A common purpose is to benefit from knowing the real value of a security. People valuing stocks can be divided into six broad categories (Hoover 2006) according to their objectives. Corporate managers benefit from knowing the value of their company especially when making strategic decisions about raising money, because in times of overvaluation a company can gather larger amounts of money in a share issuance. They also benefit from knowing the values of other companies in case of an acquisition or a merger. Financial analysts in the investment banking business help corporate managers to realize additional share issuances, acquisitions and mergers. Financial analysts in equity research track stocks and give out their recommendations to public and/or private investors, although there is little empirical evidence supporting the idea of profiting abnormally from the information offered by equity researchers. Asset managers are professional investors managing the funds of individuals and organizations in order to earn abnormal profits. In order to “beat the market”, asset managers continuously seek out and act on market misvaluation of securities. Individuals are rarely recommended to pick stocks themselves but are instead encouraged to invest in indices or funds; still, stock picking can often be seen as an enjoyable hobby bringing excitement. Economic policymakers value stock markets as a whole in order to observe the stability of the markets. They have to stay up to date as they make decisions about interest rates and money supply, and as they enact laws.

The two extremes of security valuation are technical analysis and fundamental analysis (Ross et al. 1999). Technical analysis (Murphy 1999) disregards the underlying business and values securities by statistics, such as past prices and volumes. Advocates of technical analysis predict market prices and movements based on the dynamics of market price and volume rather than on the fundamentals of the corporation. Fundamental analysis strives to measure the intrinsic value of a security by studying all the information that can affect the value; both microeconomic and macroeconomic factors are considered. Many of the best performing investors support fundamental analysis.

Stocks are commonly given scores on the basis of their fundamentals. Nowadays, various mathematical tools are used to aid in valuing and picking stocks. Financial ratios are interpreted and compared to assess the intrinsic value of stocks. A classical scoring method is linear discriminant analysis (Altman 1968; Lachenbruch 1975; Sharma 1996), which classifies stocks into two or more groups based on multiple financial ratios. The most recent models for stock valuation and classification are neural network models (Brocket et al. 1994, 2006; Zapranis & Ginoglou 2000).

In this Thesis, stock valuation and classification are studied from the asset manager’s point of view. Sifterfund supplied access to Bloomberg in order to gather the data for the Thesis. The data consists of two sub-sectors of the US manufacturing sector: industrial manufacturing and non-cyclical consumer goods. The data is gathered from the years 1989-2006.
1.2 Research objectives

The Thesis focuses on the relationship between stock valuation and financial ratios. The primary objective is to form a model that classifies stocks into undervalued and overvalued groups based on their financial ratios.

The underlying idea is that the valuation level of a stock can be assessed from its financial ratios. The valuation level is not apparent in the ratios as such, so a multivariate technique is required. The Thesis concentrates on two-group discriminant analysis as a method for group classification and prediction. The classification capability is evaluated from both a practical and a statistical point of view. The validation of the classification capability is reinforced by testing the prediction capability within the sector as well as within another sector. Therefore, three samples are diligently formed: basic and hold-out samples from the industrial sector and an additional sample from the non-cyclical consumer goods sector. The basic sample is for adjusting the model and the other two are for testing the prediction capability. The suitability and limitations of discriminant analysis for the identification of undervalued stocks are inspected as well.

An important phase in the formation of the model is the definition of the ratio profile to be used. Thus, various ratio profiles will be tested in order to find the most suitable one. The candidate ratios are agreed with the instructor. As the ultimate goal is to achieve the highest possible classification and prediction rates, the statistical assumptions of the discriminant analysis are examined in depth.

Other secondary objectives include a literature review, an assessment of the soundness of the Thesis in general, and the identification of relevant areas for further study.

1.3 Scope

The main constraint in this work is that only discriminant analysis is used, because the scope of the Thesis was judged appropriate with discriminant analysis alone.

The number of financial ratios in the Thesis is eight. The eight ratios were agreed with the instructor as the most relevant ones in this regard. The limitation is reasonable because a larger number of candidates would require an additional model for ratio selection. Originally there were nine ratios, but dividend yield was excluded because of its partially continuous nature. The exclusion was also supported by its correlation with the valuation method, since the dividends from the following seven years affect the valuation of a stock. No qualitative measures are included in the Thesis either.

The Thesis is limited to two sub-sectors of the US manufacturing sector: industrial and non-cyclical consumer goods. The two sub-sectors are selected because they are considered to have low volatility and because of the popularity of the manufacturing sector in earlier research. The market capitalization is limited to roughly the mid-cap range, from $300 million to $3 billion. Larger companies are deemed to be somewhat self-sustaining and therefore not characteristic of the markets. Smaller companies, in turn, have a tendency to suffer from overall uncertainty.

From a practical implementation point of view, the largest constraint is the valuation method. The valuation method uses the cash flows realized during the following seven years, which impedes a direct implementation of the model in practice; a satisfactory prediction capability seven years into the future is not realistic. The valuation phase as well as the sample formation phase decrease the sample sizes because observations with incomplete history data are excluded. The financial ratio history data was commonly available from 1989 onwards. The actual valuation begins from the year 1991 in order to increase the sample sizes.

1.4 Structure

The literature survey on stock valuation consists of financial ratio analysis, the dividend discount model and bankruptcy prediction models. The financial ratio analysis concentrates on individual financial ratios and their characteristics. The dividend discount model is reviewed with a focus on the discount factor and the time span to be used in the valuation. The chapter on bankruptcy prediction models covers the various models used in company performance research.

The empirical data section consists of the valuation and sample-forming procedures. This section is important because the rest of the work is based on the samples formed here. The samples can be formed in countless ways, and the choice largely depends on the characteristics of the task.

The results section first analyses the results from the discriminant analysis. The Thesis continues with a sensitivity analysis, in which the history data is processed and trimmed and various ratio profiles are analyzed. The results section culminates in the superior discriminating functions.

The final section assesses the analysis critically and forms suggestions for further study. The overall research process is shown in Figure 1, which illustrates the crucial points in the research. The analysis diagram can be used as a guide through the Thesis.

Figure 1 Analysis diagram.

2 Stock valuation

The starting point for asset valuation based on fundamentals is that the present value of an asset depends on its future cash flows; stocks, for example, provide two kinds of cash flows: dividends and the sale price at the end of the holding period (Ross et al. 1999). If the valuation concerns bonds, coupons are received, and if real projects are valued, after-tax cash flows are discounted into the present value. Summing up the discounted future cash flows yields the discounted cash flow model (DCF), which is the same regardless of the type of asset.

The subjectivity of fundamental analysis often crystallizes when an investor realizes that the portfolio contains only quality companies. It is a natural tendency to analyze and choose only high-quality companies, because the markets, supply and demand, are defined by human behavior. The demand for quality stocks can be viewed as substantially high and the demand for low-quality stocks as low. This can create a wide gap between market prices and the real values of stocks. After all, stock picking is largely about timing and understanding the behavior of other people in the markets. Stracca made a comprehensive review of behavioral finance (Stracca 2004) and described it as the most promising field of economics at the moment. Bulletproof evidence is yet to be provided that the behavioral sciences help in outperforming the market. On the other hand, many studies provide evidence that market functioning is rather irrational. To be exact, behavioral finance scientifically studies human and social cognitive and emotional biases to better understand economic decision-making under uncertainty. It also studies how the biases affect market prices and the allocation of resources.

The Efficient Market Hypothesis (EMH) asserts that the prices of traded assets are unbiased and reflect all the information available. EMH was introduced by Eugene Fama in 1970 (Fama 1970) and it is one of the cornerstones of modern finance theory. According to EMH, it is not possible to consistently outperform the market with the information already available in the market. The three forms of EMH are the weak, semi-strong and strong forms. Weak-form efficiency implies that technical analysis will not yield excess returns in the long run. Semi-strong efficiency implies that fundamental analysis cannot yield excess returns in the long run either. Strong-form efficiency implies that security prices reflect all the information available and no one can earn excess returns. In an efficient market, an above-average return has more to do with luck than skill. There are numerous studies and statements for and against EMH. If the strong-form EMH held, analyzing information would not benefit anyone. The weak-form EMH is hardly the case, because the majority of active asset managers under-perform their appropriate benchmarks.

Pervasive concepts in the financial literature are value investing and growth investing (Hoover 2006). Value investing is an investment strategy that favors stocks that are undervalued, for example because of an overreaction to news flow. Growth investing is an investment strategy that favors stocks that are expected to earn above-average earnings compared to the market. As the EMH implies: “If the market prices stocks accurately, there is no consistent advantage in choosing between one type of stocks over another” (Hoover 2006). Growth investors typically screen out low-PE companies and believe that they are able to predict high-growth phases for companies. In contrast, value investors typically screen out high-PE companies, and their idea is to identify undervalued stocks with relatively slower growth. The academic community has generally come to agree that value investing is the better performing strategy of the two (Chan & Lakonishok 2004). The two popular investing strategies have different starting points but the same objective, so they are not so different after all. As Warren Buffett puts it: “Growth and Value Investing are joined at the hip”.

2.1 Financial ratio analysis

One of the most common ways of assessing the relative values of stocks among practitioners is to compare financial ratios. The main advantage of using financial ratios instead of absolute amounts from the income statement is that they are independent of the size of the company. Thus, financial ratios allow an elegant comparison of securities. Academicians have been studying financial ratios widely for almost a century. As computers developed and financial reporting became more regular, statistical methods emerged as a notable way of studying security valuation; they are useful when sample sizes are large. Edward I. Altman said in 1968 (Altman 1968): “Academicians seem to be moving toward the elimination of ratio analysis as an analytical technique in assessing the performance of the business enterprise”. Despite Altman’s argument, researchers have kept publishing papers on this matter frequently (Dimitras et al. 1996). Although there is little formal empirical evidence that financial ratios help in picking well-performing stocks (Adnan & Dar 2006), ratio analysis is widely used and agreed to be useful (Campbell & Shiller 1998).

The comparison of financial ratios is used to assess companies’ financial condition, operations and attractiveness as an investment. Based on their characteristics, the ratios can be divided into five categories. Leverage ratios (or gearing) show the extent of long-term debt in the capital structure of a company. Liquidity ratios imply the solvency, or a company’s ability to pay off its short-term debt obligations. Operational ratios imply the operational efficiency of a company in using its assets to generate profit. Profitability ratios show a company’s ability to generate earnings relative to the relevant costs or capital. Solvency ratios give a picture of a company’s ability to generate cash flow to pay its financial obligations with available cash. The eight potential ratios were agreed with the instructor and they are examined below; a computational sketch collecting them together is given at the end of Section 2.1.

2.1.1 Price to earnings ratio (PE)

The most popular valuation ratio is the price to earnings ratio (PE), which is usually the first thing to examine about a security. PE can also be seen in the news when institutions assess the economy as a whole, and stock exchanges are also valued as a whole, for example as “valued above historical average”. Practitioners rely heavily on the PE ratio. Technology and other volatile stocks generally sell at high PE ratios because they are expected to grow fast in the future. Valuation levels can get out of hand, as was seen at the beginning of the millennium when the internet bubble burst. Small, loss-making internet companies could be valued tens of times higher than, for example, stable manufacturing companies because of the growth opportunities investors believed the companies would have. Billions and billions of dollars disappeared as the market corrected itself. PE ratios are sector specific, and a PE comparison among the companies in a sector gives a fast, tentative estimate of the appreciation level of the companies. The general PE levels also vary from country to country; for example in Japan (Ross et al. 1999), the average multiple for the Tokyo Stock Exchange has been 40-100, while in America it has been around 25. This suggests huge and constant growth opportunities for Japanese companies, but it can as well depend on the culture and on the level the market has become used to. The PE trend in Japan is downward sloping, probably because companies are becoming ever more multinational. In addition to growth opportunities (Ross et al. 1999), the PE ratio can be high because of low risk or because earnings are accounted for in a conservative manner, the first reason being the most important. Usually the earnings figure is from the last four quarters (trailing) or the expected next four quarters, but sometimes the two past quarters are used together with estimates of the two future quarters. Earnings typically refer to after-tax net income, the ultimate success factor for a business. The PE ratio is calculated according to the equation

\[ \text{PE} = \frac{\text{SHARE PRICE}}{\text{EARNINGS PER SHARE}} \qquad (1) \]

2.1.2 Earnings before interests and taxes margin (EBITM)

Earnings before interest and taxes (EBIT) is a very popular financial figure indicating the profitability of a company. The difference to the earnings used in the PE ratio is that interest expenses and taxes are not deducted from operating income. Corporate management has a rather wide margin to adjust EBIT because of the amortization of intangibles and the depreciation of tangibles. Yet, one has to keep in mind that expenses incurred from the firm’s capital structure do not affect EBIT, and thus it cannot be observed in isolation. Another pitfall occurs with research and development expenses because, for example, technology companies treat them as an operating expense, although they are the single most important capital expenditure in a technology company (Damodaran 2001). Another phenomenon is for company management to postpone earnings that exceed analyst estimates. The EBITM is EBIT compared to net sales and it is also called the operating margin. It indicates how effective a company is at controlling the costs and expenses associated with its normal business operations. The ratio is calculated as

\[ \text{EBITM} = \frac{\text{REVENUE} - \text{OPERATING EXPENSES}}{\text{NET SALES}}, \qquad (2) \]

where the revenue less operating expenses is also equal to a sum of earnings,
interest expenses and taxes.
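
To make this identity concrete, the following minimal Python sketch, with invented illustrative figures rather than data from the Thesis, computes EBIT both from the top of the income statement and by building it up from earnings, and then forms the operating margin of equation (2) (net sales are assumed equal to revenue here).

```python
# Hypothetical income-statement figures (in millions); for illustration only.
revenue = 500.0
operating_expenses = 430.0        # includes depreciation and amortization
net_income = 42.0
interest_expense = 8.0
taxes = 20.0

ebit_top_down = revenue - operating_expenses             # EBIT from the top of the statement
ebit_bottom_up = net_income + interest_expense + taxes   # the same figure built up from earnings
assert abs(ebit_top_down - ebit_bottom_up) < 1e-9        # the identity noted in the text

ebitm = ebit_top_down / revenue                           # equation (2), net sales assumed = revenue
print(f"EBIT = {ebit_top_down:.1f}, EBITM = {ebitm:.1%}")
```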

2.1.3 Cash flow to price yield (CPY)

CPY is a reliable measure of the sustainability of a business because cash is concrete, contrary to PE, which is easily manipulated. Cash either comes in or it does not. If a company shifts a cost one year ahead, earnings weaken by that amount but cash flows are unchanged until the next year. Many practitioners and investor gurus place a great deal of emphasis on CPY. Jing Liu (Liu et al. 2007) recently studied the difference between the earnings yield (the inverse of the PE ratio) and CPY in the context of stock valuation and concluded that, in general, the earnings yield outperformed CPY. He used estimates instead of reported financials and also argued that the estimates outperformed the reported financials as well. However, a logical explanation for the popularity of CPY among investor gurus might be the clarity that the measure offers when assessing individual companies. There has been a growing trend of analysts making cash flow forecasts, especially for firms with, for example, poor financial health or high earnings volatility (Defond 2003). Cash flow forecasts assist in interpreting earnings and assessing firm viability. CPY is calculated as

\[ \text{CPY} = \frac{\text{OPERATING CASH FLOW}}{\text{COMMON SHARES OUTSTANDING}} \qquad (3) \]

Operating cash flow is the difference between the revenue from the
products/services (operating revenue) and costs incurred from producing the
products/services in question (operating costs). Operating cash flow (OCF) equals
the sum of EBIT and depreciation less taxes.
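
A minimal sketch of the same arithmetic in Python, with hypothetical figures: OCF is built up exactly as defined above (EBIT plus depreciation less taxes) and then divided by the share count, which, as equation (3) is written, yields a cash-flow-per-share figure.

```python
# Hypothetical figures (millions of dollars / millions of shares); illustration only.
ebit = 70.0
depreciation = 15.0
taxes = 20.0
shares_outstanding = 25.0

operating_cash_flow = ebit + depreciation - taxes     # OCF as defined in the text
cpy = operating_cash_flow / shares_outstanding        # equation (3): operating cash flow per share
print(f"OCF = {operating_cash_flow:.1f}, CPY = {cpy:.2f} per share")
```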

2.1.4 Free cash flow to price yield (FCPY)

Free cash flow is a measure of financial performance representing the cash that is left after the costs of maintaining the company’s asset base. It is calculated by deducting capital expenditures from operating income, as below:

\[ \text{FCPY} = \frac{\text{FREE CASH FLOW PER SHARE}}{\text{CURRENT MARKET PRICE PER SHARE}} \qquad (4) \]

Free cash flow is the sum of net income and amortization/depreciation less
changes in working capital and capital expenditures.
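
The FCF build-up just described can be sketched the same way. The figures are hypothetical, and the sign convention (an increase in working capital consumes cash) is an assumption of this illustration.

```python
# Hypothetical per-company figures (millions); illustration only.
net_income = 42.0
depreciation_amortization = 15.0
change_in_working_capital = 5.0    # an increase in working capital ties up cash
capital_expenditures = 12.0
shares_outstanding = 25.0
price_per_share = 18.0

free_cash_flow = (net_income + depreciation_amortization
                  - change_in_working_capital - capital_expenditures)
fcf_per_share = free_cash_flow / shares_outstanding
fcpy = fcf_per_share / price_per_share                 # equation (4)
print(f"FCF = {free_cash_flow:.1f}, FCPY = {fcpy:.1%}")
```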

Free cash flow is another concrete measure of a company’s ability to generate profits, in addition to operating cash flow. Even profitable businesses can have negative cash flow if they face increased financing costs from additional capital. The difference between OCF and FCF is that FCF is stricter and takes into account changes in working capital and capital expenditures to reveal the hard cash that the company has after all the costs the business requires. The reason why amortization/depreciation is added back in the equation is that FCF measures the cash flow at that moment, so the effects of investments executed in past years are eliminated.

Investors are quite interested in FCF because the growth of a company requires cash and, even more importantly, the stream of dividends is paid in hard cash as well. When a stock price is relatively low and FCF is on a steady rise, a profitable investment opportunity may have emerged. If the company is not wasting the incoming money, earnings will rise eventually. On the contrary, if FCF levels keep weakening for too long, the company will face liquidity problems and become indebted.

FCF is the cash that can be used to invest in and upgrade businesses. Excessive shareholder rewarding can deplete the FCF, and far more expensive money then has to be borrowed from outside the firm, thus increasing risk and lowering future cash flows. The interests of corporate managers and shareholders have major conflicts, and these have drawn attention in the academic community (Jensen 1986). It is also claimed that managers’ power is reduced by a high payout ratio because of the reduced resources they are in charge of, which might give managers incentives to grow the company beyond its optimal size (Jensen 1986). The bias develops further because managers’ compensation is positively related to growth in sales (Murphy 1985).

The free cash flow hypothesis claims that high levels of FCF lead to wasteful activities by the management (Ross et al. 1999). According to the hypothesis, without excess cash, management operates as if in a riskier situation and thus avoids projects with negative NPV (Mitra et al. 1991). The hypothesis favors debt financing as the principal source of capital, since interest payments reduce the free cash flow and thereby the opportunity for managers to waste resources. According to a US oil industry survey supporting the free cash flow hypothesis (Griffin 1988), the oil industry changed considerably at the beginning of the eighties through mergers and share buybacks. Market values increased even though debt to equity ratios increased substantially, meaning that the markets viewed the increased debt as beneficial.

2.1.5 Return on capital employed (ROCE)

ROCE measures the profitability of a company’s capital investments. The ratio is


defined as

\[ \text{ROCE} = \frac{\text{EBIT}}{\text{TOTAL ASSETS} - \text{CURRENT LIABILITIES}} \qquad (5) \]

Capital employed includes fixed tangible assets, other operating assets and
working capital. In other words, capital employed is the value of all the assets
employed in a business.

ROCE is closely related to the return on equity (ROE), which is the ratio between net income and average stockholders’ equity. The difference compared to ROCE is that in ROE interest and taxes are subtracted from the earnings and long-term liabilities are also subtracted from the total assets. ROCE is a much overlooked ratio, possibly because it is not as intuitive as many others, but it is a useful ratio for assessing the efficiency of a company’s capital investments. A public company has to raise capital to achieve higher returns, and ROCE measures the company’s ability to achieve operating profit on its operating assets. As a rule of thumb, ROCE should always be higher than the rate at which the company borrows. A stable history of high ROCE suggests high growth for a company, and ROCE is especially essential with capital-intensive companies because huge sums of money are needed for investments and, once again, it is vital to invest in order to grow.

On the contrary, as Helfert (Helfert 2001) puts it, ROCE “does not, however, relate well to economic measures used in judging new investments, nor does it assist in making day-to-day decisions on an economic basis”. Also, ROCE has a tendency to rise even when cash flows stay the same, because assets are being depreciated all the time. This is not a flaw, because companies increase debt rather repeatedly. When studying ROCE, long averages should be used for the assets. Another point worth considering with ROCE is that inflation increases only revenues but does not affect assets, which might increase the ratio substantially in times of high inflation. Andersson (Andersson 2006) reveals an interesting statistic about the ROCE of the 158 S&P 500 survivor companies (those that have been in the index from 1980 until 2003), which has been stable at around 12 % for over two decades. In the recession of 1990 ROCE went under 10 %, and in the internet crash of 2001 it dropped to 5 %. Yet, it seems to recover quickly.

2.1.6 Price to book value ratio (PB)

Price to book value is the intuitive comparison of the market capitalization and the book value of the company in the balance sheet. There are slightly varying ways of defining the book value, but the basic way of defining it is by using share capital, which is the difference between total assets and total liabilities. Retained earnings are included in the equation because they are profits retained in the company after paying the shareholders, so they really are a tangible asset. The PB ratio is

\[ \text{PB} = \frac{\text{MARKET PRICE}}{\text{TOTAL ASSETS} - \text{TOTAL LIABILITIES} + \text{RETAINED EARNINGS}} \qquad (6) \]

Book value manipulation is possible because plant depreciates but the management can choose the pace at which it is depreciated. The annual depreciation rates are regulated, but companies can adjust depreciations according to their results. Old buildings owned by a company are commonly complicated to value, and they can be depreciated to “worthless” according to the balance sheet while in reality they might be worth millions. Conversely, some equipment may have been depreciated for only a couple of years and thus still have value in the balance sheet, although nobody would buy the used equipment.

If a company is trading below its book value, it is usually thought of as cheap. PB ratios are usually low in capital-intensive industries, such as the engineering and metal industries, because they are not expected to grow rapidly in the future and investments are time-demanding processes. The PB ratio is less meaningful for companies that possess hidden assets, such as intellectual property, which are not reflected in the book value.

Penman (1996) nominated PB as an appropriate indicator of earnings growth (and also argued that PE is not sufficient) because PB is unaffected by current profitability. Fama and French (Fama & French 1992, 1995) show that firms with low PB have persistently low earnings and high financial leverage, and are more likely to cut dividends compared to companies with high PB.

2.1.7 Price to sales ratio (PS)

The price to sales ratio values a stock by dividing the market price by the trailing 12-month revenue. The price to sales ratio does not take capital structure into account, thus only similar companies should be compared. When comparing similar companies, say companies with a similar capital composition, their price to sales ratios tell a lot about the companies’ competence in generating revenue and how much the markets value every dollar of a company’s sales. The price to sales ratio is very handy in cases where large-scale costs occur and the PE ratio becomes useless because earnings may diminish even to a negative level. A company might have been investing heavily while its revenues are rocketing, so valuation ratios should be studied in a multivariate manner. The PS ratio is

\[ \text{PS} = \frac{\text{CURRENT MARKET PRICE}}{\text{12-MONTH TRAILING REVENUE}} \qquad (7) \]

One should be careful with revenues, because they can sometimes be net revenues, meaning that cash discounts have been subtracted. Some practitioners consider a relatively low price to sales ratio and a rising stock price to be an investment opportunity for a growth stock. Another warning sign might be rising receivables even though revenue growth is strong, because then the revenues are not being collected. PS is suggested to be a stable stock price predictor, but the PE ratio outperforms it in most cases (Senchack & Martin 1987; Park & Lee 2003).

2.1.8 Gearing ratio (GEA)

Gearing is a financial ratio describing the level of a company’s debt compared to its share capital. The gearing equation below indicates the degree to which the firm is funded by creditors’ and owners’ money. A high level of gearing is considered risky, but on the other hand organic growth is not enough in most cases and financial leverage is required to survive. The gearing level must be considered in relation to peers, and a substantially high gearing should be regarded as risky because in case of an economic downturn debt service causes serious risk for the company. The gearing ratio is

\[ \text{GEARING} = \frac{\text{NET DEBT}}{\text{SHARE CAPITAL}} \times 100 \qquad (8) \]

Net debt is total debt less liquid assets: cash and assets that can be converted to cash immediately, such as savings deposits, certificates of deposit, money market accounts and money market mutual funds.

Capital-intensive industries, such as the automobile industry, tend to have gearing ratios as high as 2, compared to ratios well below one in less capital-intensive industries. Textbooks put it briefly (Ross et al. 1999): “Changes in capital structure benefit the stockholders if and only if the value of the firm increases”. Traditional corporate finance is based on the Modigliani-Miller theorem (Modigliani & Miller 1958), which states that the value of the firm is unaffected by how the firm is financed, in the absence of tax effects, transaction costs, asymmetric information and bankruptcy costs. In practice, the capital structure is optimized with respect to the settings absent from the theorem, mostly according to tax effects and the economic situation. Equity issuance is beneficial for a firm in times of a high stock price because more money can be gathered, and managers avoid equity issuances if they consider their stock undervalued. Recent studies also suggest that a firm’s history plays an important role in determining capital structure (Hovakiam et al. 2001). Also, highly profitable firms tend to pay down their debt and become less leveraged. A high payout ratio also affects gearing because it supports taking on debt for investments. Jensen (Jensen 1986) claims that shareholders support paying dividends in order to reduce the resources under management’s control, thus reducing wasteful investing in negative-NPV targets (mentioned also in the context of free cash flow).

Some terminology explaining corporate financing behavior:

The static trade-off theory (Myers 1984) says that a firm is viewed as setting a target debt-to-equity ratio and makes its choices according to the current and target debt-to-equity ratios. The theory builds on the tax benefit of debt and claims that the marginal benefit of further increases in debt decreases, so the debt-to-equity ratio must be optimized according to the marginal benefits.

The pecking order theory (Myers 1984) says that a firm prefers internal and debt financing and that there is no actual target debt-to-equity ratio. The theory suggests that companies make their financing decisions according to the law of least effort, or least resistance. Hence, the hierarchy of financing is internal funds first, then debt, and equity as the last resort. The theory also claims that firms adapt their dividend payout ratio to their investment opportunities, which makes dividend policies sticky.

The market timing hypothesis (Baker & Wurgler 2002) does not generally care whether debt or equity is used; the choice depends on the current situation of the financial markets and the price to be paid for the capital. Equity is issued when prices are high and repurchased when prices are low. Firms take advantage of perceived “mis-pricing” of the markets in financing their business, and therefore the hypothesis belongs to behavioral finance.

The neutral mutation hypothesis (Miller 1977) says that firms fall into financing patterns and habits which have no effect on firm value. Habits make interest groups feel comfortable and make predictions more accurate.
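
To tie Section 2.1 together, the sketch below collects the eight ratios in one place. It is a minimal illustration built on a hypothetical dictionary of statement items; the field names and figures are invented and are not the Bloomberg fields used in the Thesis, and market capitalization is assumed in the PB and PS numerators where the formulas leave the choice open.

```python
# Hypothetical statement items for one company-year (monetary values in millions).
d = {
    "price": 18.0, "eps": 1.7, "revenue": 500.0, "ebit": 70.0,
    "operating_cash_flow": 65.0, "free_cash_flow_per_share": 1.6,
    "total_assets": 400.0, "current_liabilities": 90.0,
    "total_liabilities": 220.0, "retained_earnings": 60.0,
    "net_debt": 110.0, "share_capital": 180.0, "shares_outstanding": 25.0,
}

market_cap = d["price"] * d["shares_outstanding"]        # assumption for PB and PS numerators

ratios = {
    "PE":    d["price"] / d["eps"],                                        # eq (1)
    "EBITM": d["ebit"] / d["revenue"],                                     # eq (2), net sales = revenue
    "CPY":   d["operating_cash_flow"] / d["shares_outstanding"],           # eq (3)
    "FCPY":  d["free_cash_flow_per_share"] / d["price"],                   # eq (4)
    "ROCE":  d["ebit"] / (d["total_assets"] - d["current_liabilities"]),   # eq (5)
    "PB":    market_cap / (d["total_assets"] - d["total_liabilities"]
                           + d["retained_earnings"]),                      # eq (6)
    "PS":    market_cap / d["revenue"],                                    # eq (7)
    "GEA":   d["net_debt"] / d["share_capital"] * 100,                     # eq (8)
}

for name, value in ratios.items():
    print(f"{name:5s} {value:8.2f}")
```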

2.2 Dividend discount model

Stocks are valued according to their future cash flows to investors, meaning dividends, if any, and the sale price after the holding period. The future cash flows are discounted according to the investor’s yield requirement on the investment. The dividend discount model (Ross et al. 1999) is the general starting point for all security valuation methods, and a number of researchers have found a positive correlation between dividend yields and future stock returns over multiple-year time horizons (Goetzmann & Jorion 1995). LeRoy and Porter (Leroy & Porter 1981) and Shiller (Shiller 1981), on the other hand, questioned the usefulness of the DDM by claiming that stock prices appear to be too volatile to be explained by fundamentals. The DDM is mostly used with future estimates, and using it with realized dividends and market prices divides academicians into supporters and opponents. The basic idea of the DDM is (Ross et al. 1999)

\[ P_0 = \frac{Div_1}{1+r} + \frac{P_1}{1+r} \qquad (9) \]

The net present value of a stock, considered one year ahead, is the sum of the dividend and the sale price after that year, discounted by the investor’s yield requirement r, as can be seen in the equation above. When the following n years are in consideration, the formula evolves into (Ross et al. 1999)

\[ P_0 = \frac{Div_1}{1+r} + \frac{Div_2}{(1+r)^2} + \frac{Div_3}{(1+r)^3} + \dots = \sum_{t=1}^{n} \frac{Div_t}{(1+r)^t} + \frac{P_n}{(1+r)^n} \qquad (10) \]

Macroeconomic conditions vary substantially over a ten-year period, which is why dividends and investors’ yield requirements cannot be assumed to be flat. A very popular version of the DDM is the constant growth version, in which dividends are assumed to grow at a constant rate g, as in the equation (Ross et al. 1999)

\[ P_0 = \frac{Div}{r-g} \qquad (11) \]

The equation is called the Gordon formula, and for the summation to be finite it requires g to be smaller than r. The estimation of the growth rate for dividends is usually based on the historical trend and future prospects.

To illustrate reality in a more accurate way, the equation evolves into the differential growth DDM, in which there is more than one distinct growth rate $g_i$. There are two different growth rates in the equation (Ross et al. 1999)

\[ P_0 = \sum_{t=1}^{T} \frac{Div\,(1+g_1)^t}{(1+r)^t} + \frac{Div_{T+1}/(r-g_2)}{(1+r)^T} \qquad (12) \]

The discount rate, r, can be put into the equation in exactly the same, differential, manner.
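
As a worked illustration of equations (10)-(12), the sketch below discounts a finite dividend stream plus a terminal value; the dividend path, growth rates and discount rate are hypothetical inputs, and the terminal value follows the Gordon formula of equation (11).

```python
# Minimal dividend discount model sketch with invented inputs; not the Thesis data.
def ddm_value(dividends, terminal_price, r):
    """Equation (10): discount a finite dividend stream plus a sale price at the end."""
    n = len(dividends)
    pv_dividends = sum(div / (1 + r) ** t for t, div in enumerate(dividends, start=1))
    return pv_dividends + terminal_price / (1 + r) ** n

def two_stage_ddm(div0, g1, g2, r, horizon):
    """Equation (12): dividends grow at g1 for `horizon` years, then at g2 forever."""
    dividends = [div0 * (1 + g1) ** t for t in range(1, horizon + 1)]
    terminal = dividends[-1] * (1 + g2) / (r - g2)   # Gordon terminal value, requires g2 < r
    return ddm_value(dividends, terminal, r)

print(two_stage_ddm(div0=1.00, g1=0.08, g2=0.03, r=0.10, horizon=7))
```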

The DDM has problems, as does any model. The model can be viewed as too static if constant discount rates are used over long periods. Another problem is that when computing infinite sums, the sums might diverge, depending on the relation between the growth factor and the discount factor. Yet another departure from reality is that the commonly used zero-growth dividends are not realistic: sometimes companies cannot afford to pay dividends at all, and usually they prefer paying steadily rising dividends.

2.2.1 Discount factor

The risk-adjusted discount factor used to discount future cash flows into present value consists of the risk-free rate of return and a risk premium. The risk-free rate of return is the minimum return to be expected from any investment, although nothing is purely risk free. The risk-free rate of return usually refers to the interest rate of three-month US Treasury Bills. The risk premium is the extra pay investors expect to achieve for tolerating higher risk. As put in Luenberger (1997), a simplistic way of taking uncertainty into account is to increase the interest rate.

The discount rate is the name of the rate at which US banks borrow from the US Federal Reserve. It is also called the key rate or the FED funds rate. The FED adjusts the key rate to control the liquidity in the markets and thereby the inflation. As banks lend the money onwards, their business is to benefit from it and charge a higher interest rate than the key rate. The key rate sets the general level for interest rates, and the end-user interest rates for each period of time are determined by the supply and demand for money, which are in turn greatly affected by the economic outlook.

The risk premium is the rate of return above the risk-free interest rate, in other words the reward for holding a risky investment rather than a risk-free one. Risk can be divided in two: market risk and specific risk. Market risk (systematic risk) cannot be avoided because economic cycles affect the whole market, yet stocks are affected differently than bonds. Specific risk (unsystematic risk) depends on the investment, giving the investor more control over the risk she is willing to take. Unlike market risk, specific risk can be diversified away.

Inflation is a risk for the lender because the purchasing power of the amount lent is, in most cases, lower than it was in the beginning. As the general price level of goods and services rises, lenders expect to be compensated for lending the money. The best-known inflation measures are the CPI (Consumer Price Index), which measures consumer prices continuously and defines their change as inflation, and the GDP deflator, which measures the cost of goods purchased by U.S. households, government and industry.

Liquidity premium is a term used to explain the difference between two loans that are otherwise similar but have different maturity dates. A short-term loan is expected to be less risky than a long-term one, giving the long-term loan a wider premium. Graphically, this is described by the term structure, which relates spot rates to different maturities and in which the yield curve is upward sloping.

A company’s cost of capital is frequently used as the standard interest rate for discounting future cash flows into present value. The cost of capital is a weighted sum of the cost of equity and the cost of debt (the weighted average cost of capital, WACC), in which the tax benefit of deductible interest payments is included

\[ \text{WACC} = \frac{EQUITY}{EQUITY + DEBT}\, r_{equity} + \frac{DEBT}{EQUITY + DEBT}\, r_{debt} (1 - TAX) \qquad (13) \]

The proportions of equity and debt are calculated with market values instead of book values. The cost of debt is the easy part of the WACC because it is usually clear how much a company is paying for its loans and bonds, but the cost of equity is trickier. The cost of equity is normally higher than the cost of debt because it involves the risk premium. In fact, the cost of equity is the yield requirement of shareholders for lending the capital and bearing the risk of ownership. A common method for estimating the cost of a firm’s equity is to use the dividend capitalization model, which approximates the future dividends that capitalize to the current market price. If the company is not paying dividends, they can be estimated by comparing its average net income and cash flow with those of a similar-size firm. The formula for the dividend capitalization model is also called the Gordon model (Gordon 1962)

\[ \text{COST OF EQUITY} = \frac{\text{NEXT YEAR'S DIVIDENDS}}{\text{MARKET VALUE}} + \text{DIVIDEND GROWTH RATE} \qquad (14) \]

The Gordon model itself is primarily used as a stock valuation method, but it can be used to assess the cost of equity from the dividend trend. The model should only be used with mature firms with low growth rates because of its assumption of a constant growth rate in perpetuity.
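
Equation (14) reduces to a one-line computation, shown below with hypothetical per-share inputs; as noted above, the approach is only sensible for mature firms with a roughly constant dividend growth rate.

```python
# Cost of equity from the dividend capitalization (Gordon) model; invented inputs.
next_years_dividend = 1.05      # expected dividend per share next year
market_price = 18.0             # current price per share
dividend_growth_rate = 0.03     # assumed constant growth in perpetuity

cost_of_equity = next_years_dividend / market_price + dividend_growth_rate  # equation (14)
print(f"Cost of equity = {cost_of_equity:.1%}")   # about 8.8 % with these figures
```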

Another model for calculating the cost of equity is the Capital Asset Pricing Model (CAPM), which describes the relation between expected return and risk. The model begins with the time value of money in the form of the risk-free rate and continues by taking into account the asset’s sensitivity to market risk. The risk premium is the difference between the market return and the risk-free rate, and the premium is multiplied by the sensitivity coefficient beta, β, which yields the return above the risk-free rate. In other words, beta is the asset’s volatility in relation to the rest of the market. In theory, the market portfolio includes all the assets in the economy in proportion to their size, but in practice the S&P 500 index has often been used as the market portfolio with a beta of 1. Luckily, news agencies such as Reuters and Bloomberg offer betas for many of the listed stocks. The model is defined as

\[ r_{asset} = r_{free} + \beta\, (r_{market} - r_{free}) \qquad (15) \]
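
The two building blocks of the discount factor can be combined as in the sketch below: CAPM (equation (15)) gives the cost of equity, which is then weighted together with the cost of debt into a WACC (equation (13)). All inputs are hypothetical illustration values.

```python
# Cost of capital sketch; all inputs are invented for illustration.
def capm(r_free, beta, r_market):
    """Equation (15): expected return of the asset from the risk-free rate and beta."""
    return r_free + beta * (r_market - r_free)

def wacc(equity, debt, r_equity, r_debt, tax_rate):
    """Equation (13): market-value weighted cost of capital with the tax shield on debt."""
    total = equity + debt
    return equity / total * r_equity + debt / total * r_debt * (1 - tax_rate)

cost_of_equity = capm(r_free=0.04, beta=1.2, r_market=0.09)          # 10.0 %
discount_rate = wacc(equity=450.0, debt=110.0, r_equity=cost_of_equity,
                     r_debt=0.06, tax_rate=0.30)
print(f"Cost of equity = {cost_of_equity:.1%}, WACC = {discount_rate:.1%}")
```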

3 Bankruptcy prediction models

3.1 Statistical Models

Statistical models focus on symptoms of failure drawn mainly from company accounts. Statistical models follow classical standard modelling procedures and can be multivariate or univariate. They are mostly based on financial ratios, but the calculation methods divide them into two dominating research models: discriminant models and logit models. Both model types lack the direct influence of corporate governance structures and management practices in numerical form (Adnan & Dar 2006); yet they are found to be useful and intuitive explanatory models.

3.1.1 Linear Discriminant Analysis (LDA)

Linear discriminant analysis (LDA) (Lachenbruch 1975) concentrates on assigning observations to two or more distinct groups. The discrimination is based on their characteristics, so multidimensional data is projected onto one dimension. The discrimination of the observations into different groups is realized with a maximal separation between the groups, and the question is how to fairly select the characteristics involved in the analysis. LDA assumes that the initial data can be classified correctly into the different groups; this initial grouping is then used to estimate weights for each variable (characteristic). Moreover, LDA is a parametric method, meaning that all the variables should be normally distributed. LDA is closely related to regression analysis, a truly common method in statistics, with the difference that regression has a quantitative dependent variable where LDA has a categorical one.

The idea is to discriminate the observations into q different groups $C_1, C_2, \dots, C_q$. The first objective in LDA is to identify a set of variables that has the strongest discriminating ability (Sharma 1996). Those variables are called discriminator variables. Means $\bar{x}_{i,C_j}$, $i = 1,2,\dots,r$, $j = 1,2,\dots,q$, are calculated for each variable $x_i$ in each group, and the number of variables involved is denoted by r. The categorization of observations is based on their “z-scores” given by the weight ($w_i$) function, which is also called the discriminant function

\[ z_i = x_1 w_1 + x_2 w_2 + \dots + x_n w_n = X' w \qquad (16) \]

Two conditions must be satisfied to provide the maximum separation in z: the group means of z should be as far apart as possible, and the values of z within each group should be as homogeneous as possible. The two conditions are combined by requiring a maximum between-groups sum of squares and a minimum within-groups sum of squares. The second objective of LDA is to identify the z which provides the maximum separation into the distinct categories. The third objective is to classify future observations into each of the groups.

The LDA problem can be solved with Fisher’s (Fisher 1936) criterion

\[ J(w) = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2}, \qquad (17) \]

where the within-group variances are calculated by

\[ s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2, \quad i = 1,2,\dots,g. \qquad (18) \]

Maximizing Fisher’s criterion yields a closed-form solution. In order to find the maximizing vector $w_{opt}$, we calculate the first derivative of the criterion and solve the equation $\partial J(w)/\partial w = 0$. The criterion needs to be rewritten to solve this equation (Sharma 1996). The criterion can be written as

\[ J(w) = \frac{w^T B w}{w^T W w}. \qquad (19) \]

B is the between-groups sums-of-squares and cross-products matrix and W is the within-groups sums-of-squares and cross-products matrix. The relation between B and W is that together they form a matrix which has the sums of squares as its diagonal values and the sums of cross products as its off-diagonal values. It is also called the SSCP matrix (Sums of Squares and Cross Products)

\[ \text{SSCP} = X^T X = B + W. \qquad (20) \]

The solution for the weights in the criterion is

\[ w = W^{-1} (\bar{x}_1 - \bar{x}_2). \qquad (21) \]
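
A minimal NumPy sketch of the two-group case follows. It implements the weights of equation (21) with a pooled within-groups matrix and classifies new observations by comparing their z-scores to the midpoint of the projected group means; the variable names, the synthetic data and the cutoff rule are illustrative assumptions, not the exact procedure used later in the Thesis.

```python
import numpy as np

def fisher_lda(X1, X2):
    """Two-group Fisher discriminant: X1, X2 are (n_i x r) arrays of ratio profiles."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-groups sums-of-squares and cross-products matrix, cf. W above.
    W = np.cov(X1, rowvar=False) * (len(X1) - 1) + np.cov(X2, rowvar=False) * (len(X2) - 1)
    w = np.linalg.solve(W, m1 - m2)          # equation (21): w = W^{-1}(mean1 - mean2)
    cutoff = 0.5 * (m1 + m2) @ w             # midpoint of the projected group means
    return w, cutoff

def classify(X, w, cutoff):
    """Assign group 1 if the z-score exceeds the cutoff, group 2 otherwise."""
    return np.where(X @ w > cutoff, 1, 2)

# Tiny synthetic example with two financial ratios per observation.
rng = np.random.default_rng(0)
group1 = rng.normal([0.10, 1.0], 0.05, size=(40, 2))   # e.g. "undervalued" profiles
group2 = rng.normal([0.05, 1.4], 0.05, size=(40, 2))   # e.g. "overvalued" profiles
w, cutoff = fisher_lda(group1, group2)
print(classify(np.array([[0.09, 1.1]]), w, cutoff))     # -> [1]
```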

Discriminant analysis has three assumptions that the data should meet: multivariate normality, equality of the covariance matrices, and independence of the observations (Sharma 1996). In order to achieve statistically significant results, the discriminator variables should come from a multivariate normal distribution. In theory, the classification results are also affected if the assumption is violated. Unfortunately, there is no clear-cut answer to how much the variables can deviate from normality, although studies have shown that even if the overall classification rate is not affected, some groups might enjoy overestimation or suffer underestimation (Lachenbruch et al. 1973). Violation of the assumption of equal covariance matrices also affects the significance tests and the classification results. The degree to which they are affected depends on the group sizes and the number of discriminator variables (Marks & Dunn 1974). In cases of unequal group sizes and when the number of discriminator variables is large, the null hypothesis of equal group means is rejected too often. The equality of the covariance matrices can be tested with Box’s M test variable, which, in turn, can be approximated as an F-statistic. Discriminant analysis is quite robust to these two assumptions, but it is beneficial to know the possible effects of violating them. The final assumption of independence of the observations is less discussed, but it has a substantial effect on the power and on the significance level as well. The assumption is often violated when delicate procedures are used to form the samples, causing correlation among the observations. One can use more stringent alpha levels if the observations are assumed not to be independent of each other.

In 1968, Edward Altman conducted the pioneering research on discriminant analysis with financial ratios. He assessed the quality of ratio analysis as an analytical technique, using the prediction of corporate failure as an illustrative case. He gathered sixty-six firms to fit a model that best discriminates between bankrupt and non-bankrupt firms. The firms were from the US manufacturing sector from the years 1946-1965 and the mean asset size was $6.4 million. He used five financial ratios as explanatory variables. Altman was able to achieve a 94 % classification rate with the initial sample. He tested the model with several secondary samples, which validated his results. The model could predict bankruptcy two years prior to the actual failure.

According to a comprehensive history review (Adnan & Dar 2006), about 30 % of the bankruptcy prediction research has been carried out with discriminant analysis. The geometric mean of the prediction rates is 85 % among the 25 past studies gathered in the review, and DA ranked number one among bankruptcy prediction methods.

The two most frequently used methods for deriving the variable profile in LDA are the simultaneous method and the stepwise method (Laitinen & Laitinen 1998). The simultaneous method is direct: the discriminant analysis is executed with an ex-ante defined variable profile. The stepwise method, on the contrary, uses forward selection, backward elimination, or stepwise selection. Forward selection begins with no variables in the model; at each step, the variable that contributes most to the classifying ability, measured by the coefficient of determination $R^2$, is added if it meets the entry criterion. Once in the profile, the variable stays there. Backward elimination begins with the full variable profile, and variables are eliminated one by one if they do not contribute significantly to the degree of discrimination ($R^2$). The stepwise procedure is a compromise between the two other procedures: it starts out empty, and the order of entry for the variables included in the model is based solely on their statistical criteria (Tabachnick & Fidell 2000). Variables can also be eliminated from the profile if they are no longer found significant. The interpretation of the variables is not important in any of these methods, which may conflict with reality, as some variables are more meaningful than others. These methods are useful for screening explanatory variables, especially when there are many of them, but the researcher must be cautious not to discard important variables inadvertently.
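A minimal forward-selection sketch is given below. It is only schematic: it uses scikit-learn's LDA with cross-validated accuracy as the selection criterion instead of the $R^2$/Wilks' lambda criteria discussed above, and the function name and the min_gain threshold are illustrative assumptions.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def forward_select(X, y, names, min_gain=0.01):
    """Greedy forward selection of discriminator variables.

    X : (n, r) array of ratio values, y : (n,) group labels,
    names : list of r ratio names. A variable enters the profile only if it
    improves cross-validated accuracy by at least min_gain; once in, it stays.
    """
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scored = [(cross_val_score(LinearDiscriminantAnalysis(),
                                   X[:, selected + [j]], y, cv=5).mean(), j)
                  for j in remaining]
        acc, j = max(scored)
        if acc - best < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best = acc
    return [names[j] for j in selected], best
```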

3.1.2 Logit models

Logit models provide results that are easy to interpret because they are based on probabilities. Each $x_i$ represents a certain financial ratio, and the ratios are weighted (coefficients $a$ and $b_i$) according to past data, usually by the method of maximum likelihood. In other words, the logistic regression determines whether each explanatory variable has a predictive relationship with the dichotomous dependent variable. Logit models are analogous to multiple linear regression when the dependent variable is binary. The model in its basic form is

$\ln\big(p / (1 - p)\big) = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \ldots + b_n x_n$ .   (22)

The main advantage over discriminant analysis is that the logistic regression model does not require the variables to be normally distributed or the samples to have equal covariance matrices. The probability $p$, for example of a delayed payment, is solved from equation 22 as

$p = \dfrac{1}{1 + e^{-(a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \ldots)}}$ .   (23)

The logistic curve illustrates the relation between the probability $p$ and the independent variable $x$ (Figure 2). The logistic function "normalizes" all values non-linearly onto the probability scale from 0 to 1.

Figure 2 Logistic curve: probability p as a function of the independent variable x.
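As a minimal sketch, the probability of equation 23 can be computed directly once the coefficients have been estimated; the coefficient values below are purely illustrative, not estimates from any data in this Thesis.

```python
import numpy as np

def logit_probability(x, a, b):
    """Equation 23: p = 1 / (1 + exp(-(a + b'x))).

    x : (r,) array of financial ratios, a : intercept, b : (r,) coefficients
    (assumed to be already estimated, e.g. by maximum likelihood).
    """
    return 1.0 / (1.0 + np.exp(-(a + np.dot(b, x))))

# Illustrative coefficients only.
p = logit_probability(np.array([0.10, 0.25]), a=-1.2, b=np.array([3.0, 1.5]))
```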

In 1985, Zavgren conducted a bankruptcy prediction study of 90 companies (Zavgren 1985) from the years 1972-1978, making predictions one to five years prior to the actual corporate failure. The prediction was reliable only one year prior to the failure, with an accuracy rate of 82 %.

According to the comprehensive history review (Adnan & Dar 2006), logit models account for about 21 % of the bankruptcy prediction research methods. The average predictive accuracy among 19 past logit studies is 87 %. Logit models can thus be seen as simple and competitive methods for bankruptcy prediction.

3.2 Artificial intelligent expert system models (AIESM)

Artificially intelligent expert system models also focus on symptoms of failure drawn from company accounts. AIES models depend heavily on computer technology and they are multivariate in nature. There are many different kinds of AIES models (Adnan & Dar 2006), but neural networks reflect the basic idea of the field.

3.2.1 Neural networks

Compared to discriminant analysis, neural networks have the major advantage of not requiring a priori assumptions regarding the underlying structure of the relationship. Non-parametric models illustrate reality better than linear, parametric models, but linear models are easier to compute. Neural networks consist of neurons that work together to produce an output from inputs. Neural networks were initially invented to simulate the processes of the human brain (Brockett et al. 2006), which gives the structure of the neurons. Neurons are interconnected, meaning that a neuron has many inputs and calculates a weighted summation of the inputs to give an output. Mathematical neural networks function by constantly
adjusting the weights of the summation. The process of learning (recognizing
patterns, changing the interconnections, developing generalizations, etc.) is called
the training rule (Brockett 1994). Neurons are grouped in layers, so the neural
network functions in parallel, which gives the advantage of functioning even in
the case of malfunction of some of the neurons. Similarly to the discriminant z-
score, the weighted aggregate from inputs is calculated. Then, the sum is
interpreted by the activation function to be sent out from the neural unit. The
activation function is usually a logistic function (Smith & Gupta 2002)

$F(z) = \dfrac{1}{1 + e^{-aZ}}$ ,   (24)

in which $a$ determines the steepness of the slope and $Z$ is the weighted summation score. If the activation function were to mimic a real neuron, it would give a binary output, but for many practical reasons a smooth function is used. The output is usually centered on small values around zero. The most often used activation functions are the threshold, sigmoid and hyperbolic tangent; the choice depends on the characteristics and the range of the desired outputs.
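A minimal sketch of a single neuron with the logistic activation of equation 24 is shown below (the names are illustrative; real neural network libraries of course implement this differently):

```python
import numpy as np

def neuron_output(inputs, weights, a=1.0):
    """Single-neuron output: a weighted summation passed through the logistic
    activation F(z) = 1 / (1 + exp(-a*z)) of equation 24.

    inputs, weights : (k,) arrays; a : steepness of the activation slope.
    """
    z = np.dot(weights, inputs)          # weighted summation score Z
    return 1.0 / (1.0 + np.exp(-a * z))  # smooth, non-binary activation
```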

Neural networks offer strong results for corporate failure prediction. However, neural network models neither explain how they arrived at a classification, nor do they give a likelihood of possible failure. Training a neural network is time-consuming, and finding the most suitable model may be difficult because there are many different models to choose from. A neuron is depicted below (Figure 3).

Figure 3 Neural network neuron.

The neural network approach outperforms linear discriminant analysis marginally in corporate failure prediction, especially in classifying financially troubled firms (Zapranis & Ginoglou 2000). Yang et al. (1999) compared neural networks with DA using data gathered from the years 1984-1989 for 122 companies in the US oil
and gas industry. Five financial ratios were used to explain bankruptcy; net cash
flow to total assets, total debt to total assets, explorations expenses to total
reserves, current liabilities to total debt and the trend in total reserves. Yang’s
results were a little ambiguous depending on the data processing methods but the
conclusion was that Fisher’s discriminant analysis predicted bankrupt companies
more accurately than the neural networks.

According to the history review (Adnan & Dar 2006), past studies of neural networks have had an average prediction rate of 87 %, and about 9 % of the bankruptcy prediction studies have been neural network studies. Artificially intelligent models cover about a fourth of the bankruptcy prediction research.

3.3 Models based on economic theories

Theoretical models concentrate on qualitative causes of failure, and they are drawn from information that could satisfy the theoretical argument. Theoretical models are also multivariate in nature, and they usually employ a statistical technique to support the qualitative theoretical argument. Four types of theoretical models are reviewed below.

Balance sheet decomposition measures the changes in the balance sheet and relies on the assumption that companies try to maintain an equilibrium in their assets. Large changes may be signs of future financial distress. Decomposition measures can include current assets as a fraction of total assets, current liabilities as a fraction of total assets, long-term liabilities as a fraction of total assets, and so on. Booth (1983) produced empirical evidence that failed and non-failed companies have distinct characteristics in the composition of their balance sheets, even though his model was unable to successfully classify non-failed companies. Nevertheless, balance sheet decomposition is a useful tool for assessing a company's financial condition.

In Gambler’s ruin theory, firms are seen as gamblers betting constantly with some
probability of loss (Adnan & Dar 2006). Ultimately the game ends as the firm
fails. Flipping a coin is a good example of gambler’s ruin theory, because the
player who starts with more coins is more likely to win, even with equal odds.

Cash management theory puts weight on the short-term cash balances of a firm (Adnan & Dar 2006). Imbalances between inflows and outflows can cause financial distress and insolvency. Cash management theory models minimize the costs of cash management, optimize the capital structure and maximize the present value of net cash flows (Zapranis & Ginoglou 2000). Cash management would be much simpler in an optimal business world without lags in payments. In practice, cash management is needed to prepare for unexpected costs, debt service, inventory fills, reserve cash for varying revenue, and so on. The variables used in cash management models vary but can include, for example, the elasticity of the cash balance with respect to the volume of transactions or with respect to the opportunity cost rate. Empirical studies support the idea that cash management behaviour changes notably prior to times of financial distress (Zapranis & Ginoglou 2000).

Credit risk theories mostly concern borrowing firms. Financial institutions have created a number of models for measuring credit risk, and the models are based on the international bank regulatory framework (Basel) as well as on corporate finance theories. Two influential benchmark models (Gordy 2000) are J.P. Morgan's CreditMetrics and Credit Suisse Financial Products' CreditRisk+. Specifically, the models measure the portfolio value-at-risk arising from credit risk. In a comparison of the two, the models were found not to differ substantially, and they are better suited to comparing the relative risk levels of two portfolios than to producing absolute levels of risk (Gordy 2000).

4 Empirical analysis

Altman (1968), as well as other researchers, has used the US manufacturing sector as the test field for statistical multivariate prediction models. The reasons for this clustering are the enormous size of the US economy, the large number of listed companies and the availability of financial information. Also, many researchers come from the US, and it is a common tradition to inspect domestic markets. The manufacturing sector is suitable for statistical multivariate research because it is stable and more transparent than, for example, the financial sector, where the business is not as concrete as transforming raw materials into goods. The manufacturing sector produces finished goods from primary materials supplied by the primary sector and sells them to other businesses, to export markets or to domestic consumers.

The economic sector used in this research is the broader industrial sector, as Bloomberg defines it. The industrial sector consists of the following sub-sectors: aerospace/defence, building materials, electrical components and equipment, electronics, engineering and construction, environmental control, hand- and machine tools, construction and mining machinery, diversified machinery, metal fabricates and hardware, miscellaneous manufacturing, packaging and containers, shipbuilding, transportation, and trucking and leasing. The industrial sector is capital intensive, as the companies need plants for production, and the sector seems rather stable. On the other hand, the industrial sector suffers from raw material and energy price hikes and from increased competition from emerging economies, in which the cost of labour is lower. The industrial sector also includes cyclical industries, such as construction, but compared to the other sectors in Bloomberg's partition, the industrial sector is considered the most suitable for this research. The industrial sector included 970 listed companies at the end of 2006 with a market capitalization above $1 million.

The sectors other than the industrial sector in Bloomberg's partition are basic materials, communication, cyclical consumer products, non-cyclical consumer products, diversified, energy, financial, technology and utilities. A second sector is also included to examine the inter-sector predictive capability of the discriminating function; validation of the predictive capability is an essential part of prediction model formation. The reference sector is non-cyclical consumer goods, as it was thought to be the sector most similar to the industrial sector in Bloomberg's partitioning. The non-cyclical consumer sector includes agriculture, beverages, biotechnology, commercial services, cosmetics and personal care, food, healthcare products, healthcare services, household products and wares, and pharmaceuticals. Non-cyclical consumer companies are called defensive, and they have a tendency to outperform the market in times of recession because they produce products that consumers cannot easily live without. At the end of 2006 there were 446 non-cyclical companies with a market capitalization above $1 million.

The first decision was to use annual historical data instead of quarterly data, because annual data seemed to have less variance. Companies knowingly manage their quarterly results: the underlying business stays the same, but management polishes the figures, for example to please shareholders. Results can be adjusted in the short run, but there is less room for adjustment in the annual results, in which "all comes together". This makes annual results clearer than quarterly data, and annual data is therefore used in this Thesis. Annual data was also more readily available than quarterly data.

After some examination, it appeared that Bloomberg has history data from the beginning of the 1990s. There is not much financial ratio data available for the 1980s, although price history may be available much further back. Therefore, the history data used to form the samples begins in 1989 and continues until the latest full year, 2006.

4.1 Valuation phase

All the companies in the two selected sectors, industrial and non-cyclical, are valued with the dividend discount model. The discount rates are calculated from the Fed Funds rate, as it can be seen as the risk-free return on investment. The Fed Funds rate is used in order to discount dividends and market prices in a realistic manner. The goal of the Thesis is to value stocks on a relative basis, which is why the general level of the discount rate is not of primary interest; what matters is that the discount rate is the same for all observations. The annual Fed Funds rates for the years 1989-2006 are shown in Table 1.

Table 1 Fed Funds rate (%).

Year Fed Funds rate (%)
1989 9,156
1990 7,875
1991 5,250
1992 3,438
1993 3,000
1994 4,500
1995 5,813
1996 5,250
1997 5,500
1998 5,250
1999 5,125
2000 6,375
2001 3,375
2002 1,625
2003 1,063
2004 1,563
2005 3,500
2006 5,125

The valuation of a stock is the ratio between the price at time t and the sum of
discounted dividends from the years t, t+1, t+2, t+3, t+4, t+5, t+6 and t+7 and
discounted sale price from year t+7;

$\mathrm{VALUATION} = \dfrac{P_0}{\sum_{t=1}^{7} \dfrac{Div_t}{(1 + r_t)} + \dfrac{P_7}{(1 + r_7)}}$ .   (24)
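A minimal sketch of the valuation ratio is given below, assuming the dividends and the discount factors of Table 2 are available as inputs; the function name, the argument names and the convention of seven dividend payments (as in the sum of equation 24) are assumptions made for illustration.

```python
def valuation(price_now, dividends, sale_price, discount_factors):
    """Valuation ratio of equation (24): current price divided by the sum of
    discounted dividends and the discounted sale price.

    dividends        : 7 dividends for the years t+1..t+7
    discount_factors : 7 discount factors for the same years
                       (e.g. one row of Table 2, columns t+1..t+7)
    sale_price       : price at year t+7
    """
    discounted = sum(d * f for d, f in zip(dividends, discount_factors))
    discounted += sale_price * discount_factors[-1]
    return price_now / discounted
```

Lower ratios correspond to cheaper stocks relative to their realized dividend streams, which is how the cheap and expensive groups are formed in the selection procedure described below.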

As the valuation is carried out with a 7-year dividend discount model, the latest possible valuation point is the year 1999, with data reaching to the year 2006. This is the major limitation of the analysis. The history data starts from the year 1989, but the valuation is started two years later, in 1991. The aim is to avoid the mispricing of stocks in the first few years after the stock exchange listing. When a fresh new company gathers capital by listing itself on a stock exchange, its value attracts plenty of prejudice and enthusiasm. Companies are usually heavily mispriced at the moment of listing, but sooner or later the markets tend to correct the price to the right level. In accordance with the two-year existence restriction, the valuation is executed for the years 1991-1999.

Now that there are nine separate periods for which to perform the valuation, nine separate sets of discount factors need to be formed from the annual Fed Funds spot rates. The required discount factors can be calculated by rolling over an investment each year at the spot rate

$r_{t^*} = \prod_{t=t^*}^{t^*+7} \dfrac{1}{1 + \mathrm{Fed\_Funds\_rate}_t / 100}$ .   (25)

As can be seen in Table 2, the discount factors derived from the short rates vary somewhat because of the sliding "current time"; in other words, different sets of spot rates are used to calculate them.

Table 2 Discount factors.
Period t t+1 t+2 t+3 t+4 t+5 t+6 t+7
1989-1996 1,000 0,927 0,881 0,851 0,827 0,791 0,748 0,710
1990-1997 1,000 0,950 0,919 0,892 0,853 0,807 0,766 0,726
1991-1998 1,000 0,967 0,935 0,907 0,868 0,821 0,780 0,739
1992-1999 1,000 0,971 0,929 0,878 0,834 0,791 0,751 0,715
1993-2000 1,000 0,957 0,904 0,859 0,814 0,774 0,736 0,692
1994-2001 1,000 0,945 0,898 0,851 0,809 0,769 0,723 0,700
1995-2002 1,000 0,950 0,901 0,856 0,814 0,765 0,740 0,728
1996-2003 1,000 0,948 0,901 0,857 0,805 0,779 0,767 0,759
1997-2004 1,000 0,950 0,904 0,850 0,822 0,809 0,800 0,788
1998-2005 1,000 0,951 0,894 0,865 0,851 0,842 0,829 0,801
1997-2006 1,000 0,950 0,904 0,850 0,822 0,809 0,800 0,788
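A minimal sketch of how discount factors in the style of Table 2 could be reproduced from the Table 1 spot rates is shown below. The indexing convention (the factor for year t+k uses the spot rates of years t+1 to t+k) is inferred from the tabulated values and is an assumption, as are the function and variable names.

```python
def discount_factors(fed_funds, start_year, horizon=7):
    """Rolling discount factors in the spirit of equation (25) / Table 2.

    fed_funds : dict mapping year -> Fed Funds rate in percent (Table 1).
    Returns [d_t, d_{t+1}, ..., d_{t+horizon}] with d_t = 1.
    """
    factors = [1.0]
    for k in range(1, horizon + 1):
        factors.append(factors[-1] / (1.0 + fed_funds[start_year + k] / 100.0))
    return factors

# Example: the 1989-1996 row of Table 2.
rates = {1989: 9.156, 1990: 7.875, 1991: 5.250, 1992: 3.438, 1993: 3.000,
         1994: 4.500, 1995: 5.813, 1996: 5.250}
row = discount_factors(rates, 1989)  # approx. [1.000, 0.927, 0.881, 0.851, ...]
```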

Companies with a market capitalization below $300 million or above $3 billion at the moment of examination are excluded from the samples. The remaining companies can therefore be called mid-caps, although the definition of a mid-cap varies greatly. The mid-cap restriction is not very strict, but it excludes very unstable start-up companies and, on the other hand, giant businesses that are very powerful and exceptional in numerous ways.

4.2 Basic sample, hold-out sample and external sample

The nine-year period is divided into a basic sample, 1991-1996, and a hold-out sample, 1997-1999. The idea of the two samples is that the basic sample is used for estimating the coefficients of the discriminant function (in-sample prediction) and the hold-out sample for testing the predictive capability of the discriminant function (out-of-sample prediction). Yet another out-of-sample prediction is performed on the external sector of non-cyclical consumer goods for the years 1991-1996. Researchers continue to argue over whether to rely on in-sample or out-of-sample prediction (Inoue & Kilian 2002). The conventional wisdom is that in-sample prediction is prone to model overfitting and that out-of-sample prediction protects against it. The overfitting in in-sample prediction stems from using (at least partially) the same data both to fit the model and to assess its prediction capability.

4.3 Selection procedure of companies

The industrial sector has 290 companies with full price history starting from 1989. The procedure is to start from the year 1991 and select the nine smallest valuations, i.e. the nine cheapest companies. Each observation is required to have full ratio data and a market capitalization between $300 million and $3 billion. The chosen companies are then excluded from the list, and the same selection is made for the nine most expensive companies, respectively. This is repeated for each of the six years, so the basic sample consists of 108 companies, 54 cheap and 54 expensive ones. Starting from the year furthest in the past is justified by the fact that there are more companies with full history data closer to the present day. That leaves a bigger sample for the hold-out period, in which companies have to have a complete price history two years before its start, i.e. from 1995.

The hold-out sample is formed in a similar manner, starting from the year 1997 with the nine cheapest companies. Under the two-year existence rule, the hold-out period includes 300 additional companies compared to the basic sample. However, the number of companies with full ratio data under the restrictions mentioned earlier is rather limited. The nine cheapest and nine most expensive companies are chosen for each year, starting from 1997. Consequently, the hold-out sample consists of 54 companies, 27 cheap and 27 expensive ones.

The extra sector of non-cyclical consumer products is treated in exactly the same fashion. The list of potential companies is smaller than the list of industrial companies, with a total of 130 companies having full price history from 1989 until 2006. This time, the five cheapest and five most expensive companies are chosen for each year, which results in a sample of 60 companies.

4.4 Approaches to sample forming

After many trials, the procedure of forming the samples year by year, matching the same number of the cheapest and the most expensive companies, seems practical. The procedure balances all the years with the same number of observations in both categories. An alternative procedure would be to simply choose the cheapest companies without paying attention to the year in which they occur, and to match these with the most expensive ones, respectively. The discriminating function might then concentrate on some years more than others, and the universal applicability could become biased. The paired-sample design is a frequently used technique for forming samples. Beaver (1966) is a well-recognized researcher in bankruptcy prediction, and he, among others, used the paired-sample design in his analysis. At first, Beaver chose the bankrupt companies (cf. cheap companies) and then selected a pair (cf. an expensive company) for each bankrupt company. The pairs needed to be from the same sector, to have roughly equal market capitalization and to be from the same year. Compared to Beaver's paired-sample design, the market capitalization matching has been left out here, as the market capitalization restriction of $300m-$3bn is thought to provide adequate homogeneity among the companies. Also, the paired-sample design turned out to require much larger initial samples because the sampling rules are so strict. In this case, the paired-sample design felt somewhat like data mining, although the Thesis is about data analysis. To be clear, data mining focuses on extracting useful information from large sets of data (Mannila et al. 2001), and the crucial point is that data-mining applications are to some degree self-guiding. Data analysis, for its part, does not aim at discovering unforeseen patterns hidden in the data, but at fitting an existing model to the data or extracting parameters for a model to adapt it to reality.

4.5 Ratio profile

The ratio data gathered for each company in the two sectors includes the price to earnings ratio (PE), gearing ratio (GEA), price to book value ratio (PB), free cash flow per share yield (FCPY), cash flow per share yield (CPY), return on capital employed (ROCE), earnings before interest and taxes margin (EBITM) and price to sales ratio (PS). The dividend yield is excluded from the profile because companies that pay no dividends would introduce a systematic error in the form of zero or empty values. All the variables are required to be continuous; for example, companies with negative earnings must be excluded because only positive PE ratios are announced, while the rest are announced as zero or empty values, which would make the variable discontinuous.

5 Results on predictive accuracy

5.1 Mean values and variances in the basic sample

Class-specific and overall mean values and dispersions of the basic sample are presented in Table 3. Each of the eight ratios is inspected, as well as the market capitalization of the companies. The dispersions are given relative to the mean value, i.e. as coefficients of variation (CV). From now on, group I refers to the cheap and group II to the expensive companies.

Table 3 Mean values and variances of the basic sample.


Variable Cheap CV Expensive CV Overall CV
PE 34.858 3.068 28.957 1.152 31.908 2.473
GEA 60.966 2.789 46.380 1.675 53.673 2.455
PB 2.875 0.876 3.170 0.941 3.023 0.911
FCPY 0.095 2.918 0.004 20.179 0.049 4.201
CPY 0.275 3.347 0.090 0.930 0.183 3.604
ROCE 15.832 0.602 26.242 3.851 21.037 3.405
EBITM 11.628 0.585 10.779 0.558 11.203 0.572
PS 1.102 1.076 1.483 1.410 1.293 1.317
MCAP 748.199 0.862 874.486 0.668 811.343 0.759
Count 54 54 108

The market capitalization is restricted to $300m-$3bn, and the average capitalization in the group of expensive stocks is about one fifth higher than in the group of cheap ones. The overall average market capitalization is $811m with a coefficient of variation of 75.9 %. Companies among the expensive ones seem to be more homogeneous in size, with a coefficient of variation of 66.8 %.

In general, the starting point with valuation ratios is that the higher the ratio, the more expensive the stock. The price to sales ratio and the price to book ratio are 35 % and 10 % higher in group II, which agrees with intuition. The average PE level of 31.908 is generally very high, but the massive coefficient of variation of 306.8 % in group I points to outliers, which in turn can explain the illogical ordering of the group means.

The group of cheap stocks is more heavily geared than the group of expensive stocks, but again, there is a tremendous coefficient of variation of 278.9 % in the group of cheap stocks. A somewhat logical explanation for the higher gearing levels in the group of cheap stocks could be that small, growing companies need to take on more debt to be able to grow. Markets might price the group I stocks low because of the risk stemming from the relatively high debt. The overall gearing ratio is a moderate 54 %.

The cash flow ratios CPY and FCPY are stronger among the cheap stocks, but they are also the most severely affected by the variance. The FCPY ratio in the group of expensive stocks has the record coefficient of variation of 2017.9 %, which is untenable for any multivariate analysis.

The profitability ratios ROCE and EBITM are at understandable levels, except for the high variation of ROCE in the expensive group. EBITM is marginally stronger in the group of cheap stocks, whereas ROCE is weaker there.

5.2 Variable correlations

Correlations between the variables, without paying attention to the groups, are given in Table 4. The biggest correlation is between FCPY and CPY, which is understandable because of the similarity of their calculation methods: FCPY subtracts more than just the operating costs, namely capital expenditures and changes in working capital. The correlation of 0.92 causes multicollinearity, which complicates the interpretation of individual variable influences in the discriminant score composition.

The second highest correlation of 0.75 is between PB and PS. This correlation is also strong, and it stems from basic corporate finance: sales, with costs subtracted, is the input to the balance sheet. Also, the companies in the sample must have quite similar EBITM, because correlated amounts of money are spent and brought to the balance sheet. The lowest value among the overall coefficients of variation (Table 3) supports the conclusion of similar EBITM between the groups. The third highest correlation is between EBITM and PS, which complements the aforementioned computational relationship.

Table 4 Variable correlation in the basic sample.


Variable PE GEA PB FCPY CPY ROCE EBITM
PE
GEA 0.15
PB 0.24 0.09
FCPY -0.12 -0.03 -0.12
CPY -0.06 0.07 -0.15 0.92
ROCE 0.01 -0.00 0.08 -0.00 -0.03
EBITM -0.11 -0.21 0.22 0.06 0.06 0.04
PS 0.14 -0.26 0.75 -0.10 -0.13 0.02 0.49

Other correlations between the variables are not significant. The highest negative
correlation of -0.26 is between PS and GEA and it suggests that the level of net
debt increases as the sales increase or market price decreases.

5.3 Variable normality

As is the case with most multivariate techniques, the significance of the statistical tests in discriminant analysis requires certain assumptions to be fulfilled. Violating the assumptions can influence the significance and the power of the statistical tests. The first assumption is that the data comes from a multivariate normal distribution. An overall glance at the mean values and variances of the basic sample already suggests that the variables are not normally distributed.

At first, the individual variable dispersions are examined graphically. The ordered values of the variables are plotted against the inverse of the standard normal cumulative distribution. First, the data is arranged from smallest to largest, and the cumulative percentiles are determined. The z-scores are then calculated from the standard normal distribution, and each z-value is plotted against the corresponding data value. If the data is normally distributed, the plotted points form a straight line, and stragglers at either end indicate outliers. The variables are examined by groups. The inspection of the probability plots revealed that in most cases there are just a couple of extreme outliers. The assumption of variable normality is rejected for all the variable groups because of those extreme values. The groups closest to normality are GEA and CPY in group II, ROCE in group I and EBITM in both groups.

The most famous and seemingly the most powerful individual test for normality is the Shapiro-Wilk test. The Shapiro-Wilk test is performed for both groups with all eight variables. The test values, probability levels and decisions at the significance level of 0.05 are given in Table 5. Normality is rejected for all the variables in both groups at the 5 % significance level. Only 6 of the 16 variable groups have a probability level above zero at six decimals, and the same six variable groups were judged to be closest to normality in the earlier visual inspection. CPY in the group of expensive stocks is the most normal, with a probability level of 0.006998.

Table 5 Shapiro-Wilk normality test for the variables.


Test Prob Decision
Variable Value Level 5%
PE - Group I 0.201 0.000000 Reject Normality
PE - Group II 0.475 0.000000 Reject Normality
GEA - Group I 0.521 0.000000 Reject Normality
GEA - Group II 0.883 0.000080 Reject Normality
PB - Group I 0.681 0.000000 Reject Normality
PB - Group II 0.521 0.000000 Reject Normality
FCPY - Group I 0.411 0.000000 Reject Normality
FCPY - Group II 0.864 0.000021 Reject Normality
CPY - Group I 0.213 0.000000 Reject Normality
CPY - Group II 0.937 0.006998 Reject Normality
ROCE- Group I 0.897 0.000221 Reject Normality
ROCE - Group II 0.174 0.000000 Reject Normality
EBITM - Group I 0.856 0.000012 Reject Normality
EBITM - Group II 0.897 0.000218 Reject Normality
PS - Group I 0.701 0.000000 Reject Normality
PS - Group II 0.519 0.000000 Reject Normality
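As a minimal sketch, the univariate test of Table 5 could be reproduced with SciPy (the thesis used NCSS, so this is only an illustrative stand-in; the data layout is an assumption):

```python
import numpy as np
from scipy import stats

def shapiro_by_group(data, alpha=0.05):
    """Shapiro-Wilk normality test for each ratio in each group.

    data : dict mapping (ratio_name, group_label) -> 1-D array of values.
    Prints the W statistic, the p-value and the decision at level alpha.
    """
    for (ratio, group), values in data.items():
        w, p = stats.shapiro(np.asarray(values))
        decision = "Accept Normality" if p > alpha else "Reject Normality"
        print(f"{ratio} - {group}: W={w:.3f}, p={p:.6f}, {decision}")
```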

The assumption of multivariate normality is necessary for the significance tests of the explanatory variables and of the discriminant function itself. The degree to which the assumption can be violated cannot be specified exactly, but earlier research has shown that although the overall classification rate remains unaffected, some groups might suffer from underestimation and some from overestimation (Lachenbruch et al. 1973). As stated on the help pages of NCSS (the statistical software used) about discriminant analysis, "a sample size of at least twenty observations in the smallest group is usually adequate to ensure robustness of any inferential tests that may be made". The statement refers to the central limit theorem, which suggests that a sum of variables tends towards a normal distribution as the number of observations increases, regardless of the distributions they come from, even when the variables are discontinuous. The multivariate normality assumption is thus not very strict, and researchers have not paid much attention to it in earlier research. Unfortunately, there are very few tests for examining multivariate normality, and in this regard a graphical examination is performed (Johnson & Wichern 1987).

The multivariate normality is checked with a Q-Q plot, which is formed in the following way. First, Mahalanobis distances are calculated for each of the companies. The Mahalanobis distance is a statistical distance from the sample centroid. The distances are calculated with the equation


$MD_{rs}^2 = (x_r - x_s)' V^{-1} (x_r - x_s)$ .   (26)

The matrix in the middle is the covariance matrix of x, defined as V = cov(x). Second, the distances are sorted by order of magnitude, and percentiles are calculated for the distances according to the observation number j with the formula (j-0.5)/n, where n is the total number of observations. Third, Chi-square values are calculated for the percentiles. It has been shown that when the sample size is sufficiently large (25 or more) and when the parent population is normal, the Mahalanobis distances behave like a Chi-square random variable (Gnanadesikan 1977).
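A minimal sketch of the Q-Q plot construction described above is given below; the function name is illustrative, and the use of SciPy's chi-square quantiles is an assumption (the thesis produced the plot with its own tooling).

```python
import numpy as np
from scipy import stats

def chi_square_qq(X):
    """Squared Mahalanobis distances from the sample centroid and the matching
    chi-square quantiles, plus the correlation of the resulting plot.

    X : (n, r) array of observations (one row per company).
    """
    n, r = X.shape
    centered = X - X.mean(axis=0)
    V_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.sort(np.einsum("ij,jk,ik->i", centered, V_inv, centered))
    probs = (np.arange(1, n + 1) - 0.5) / n       # (j - 0.5) / n
    chi2_q = stats.chi2.ppf(probs, df=r)          # chi-square quantiles
    return d2, chi2_q, np.corrcoef(d2, chi2_q)[0, 1]
```

Plotting d2 against chi2_q gives a plot of the kind shown in Figure 4, and the returned correlation is the coefficient compared against Filliben-type critical values in the text.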

Figure 4 Chi square plot for the basic sample.

The Chi-square values are plotted against the Mahalanobis distances in Figure 4. A curve is also fitted to the points with a least squares method in order to examine the linearity of the plot. The plot appears not to be linear, at least because of the extreme observations on the right-hand side. The majority of the observations are concentrated at the small end, where the slope seems rather steep, and the plot is also skewed to the left. The normality assumption is rejected after the graphical inspection of the plot. Obviously, a test based on a Q-Q plot is subjective, because the researcher determines the linearity visually. A more analytical way of assessing the linearity of the plot is to compare the correlation coefficient of the plot to critical values. The critical values give the percentage points of the cumulative sampling distribution of the correlation between sample values and theoretical quantiles, obtained empirically by Filliben (1975). The correlation of the plot is 0.740; compared to the critical value of 0.987 for the alpha level of 0.05 with 100 observations (closest to 108) (Sharma 1996:446), the plot is not even close to linear. Although the critical values are computed for a univariate distribution, they can be used as a benchmark.

5.4 Equality of covariance matrixes

The second assumption of discriminant analysis is the equality of the covariance matrices. Violation of the assumption affects both the type I and the type II error, but the type I error is more severely affected (Sharma 1996). The type I error refers to the probability of falsely rejecting the null hypothesis due to chance, and the type II error is the probability of failing to reject the null hypothesis when it is false. As with the normality assumption, the group sizes should be kept equal, because then the significance level is not affected as badly by the inequality of the covariance matrices. The equality of the covariance matrices is tested with the Bartlett-Box homogeneity test for the individual variables and with Box's M test for all the variables together. The test results for the equality of the covariance matrices are shown in Table 6.

Table 6 Bartlett-Box tests for the equality of the covariance matrices.


Variable  Bartlett Value  DF1  DF2  F Approx  F Prob  Chi2 Approx  Chi2 Prob
PE 59.877 1 33708 59.420 0.000 59.310 0.000
GEA 29.6714 1 33708 29.420 0.000 29.390 0.000
PB 1.5138 1 33708 1.500 0.221 1.500 0.221
FCPY 70.8414 1 33708 70.320 0.000 70.170 0.000
CPY 181.9493 1 33708 181.210 0.000 180.230 0.000
ROCE 177.8006 1 33708 177.060 0.000 176.120 0.000
EBITM 0.8097 1 33708 0.800 0.370 0.800 0.370
PS 16.1772 1 33708 16.030 0.000 16.020 0.000
Box's M 590.5288 36 37807 15.090 0.000 543.790 0.000

Since the significance levels are below 0.05 for all the variables except PB and EBITM, those variables are assumed to have significantly unequal variances between the groups. PB (0.221) and EBITM (0.370) have statistically equal variances, so they pass the test. Box's M test is significant at any significance level, with a probability of 0.000, indicating that the equality of the covariance matrices is not fulfilled. The tests for the covariance matrices and the variances are themselves affected by the non-normality of the variables.
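The univariate variance tests of Table 6 could be approximated with SciPy's classical Bartlett test, as sketched below; this is a stand-in for the Bartlett-Box output of NCSS, and Box's M test for the full covariance matrices is not available in SciPy and would need a separate implementation.

```python
from scipy import stats

def bartlett_by_variable(group1, group2, names):
    """Bartlett test of equal variances for each ratio, one variable at a time.

    group1, group2 : (n1, r) and (n2, r) arrays of ratio values for the two
    groups; names : list of r ratio labels.
    """
    for j, name in enumerate(names):
        stat, p = stats.bartlett(group1[:, j], group2[:, j])
        print(f"{name}: Bartlett statistic = {stat:.3f}, p = {p:.3f}")
```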

5.5 Classification capability

Classification results were weak for the basic sample, with a total classification rate of 56 % (Table 7). The classification is practically at the level of a random guess, which would be 50 %. There are 25 expensive companies that are classified as cheap (type I error) and 22 cheap companies that are classified as expensive (type II error). The error rates are 46 % and 41 % for types I and II, respectively. As discussed earlier, the assumptions of multivariate normality and equality of covariance matrices were severely violated, which is sure to weaken the classification results.

Table 7 Classification count table for basic sample.


Predicted
Actual Group I Group II Total
Group I 32 22 54
Group II 25 29 54
Total 57 51 108 56 %

Wilks' Lambda is used to test the significance of the discriminant function as a whole (Table 8). Wilks' Lambda is a direct measure of the proportion of the variance in the combination of dependent variables that is not accounted for by the grouping variable. Wilks' Lambda is about 0.87, which suggests that the group means are not substantially different from each other. The F-value is an approximation based on Wilks' Lambda, and the null hypothesis is that the groups are not statistically different from each other. The null hypothesis is accepted at the significance level of 0.05, so the discriminant function is not statistically significant, although the probability level of 0.0737 is not far from being significant. The square of the canonical correlation can be used as a measure of the practical significance of the discriminant function. The squared canonical correlation equals 0.1311 (Table 8), meaning that only about 13 % of the variation between the two groups is accounted for by the discriminating variables, which is quite unimpressive.

Table 8 The significance of the discriminant function of the basic sample of 108 companies.
Canon Canon Numer Denom Prob Wilks'
Corr Corr2 F-Value DF DF Level Lambda
0,3621 0,1311 1,9 8 99 0,0737 0,868901

There is no analytical way of defining how high is "high" for the practical significance, but the measure is similar to R-squared in multiple regression and is used to judge the strength of the relationship on a relative basis.

Frank et al. (1965) studied the biases in discriminant analysis and suggested the use of split-sample validation. The sample is randomly divided into five distinct subsets, each consisting of 14 cheap stocks and 14 expensive stocks. The prediction is then performed for the remaining 80 companies based on the coefficients estimated from the sub-sample of 28 companies. All five prediction accuracies (Table 9) are in line with the original classification rate of 56 %, with the final sub-sample even outperforming the original rate.

Table 9 Split sample validation for the basic sample


Prediction t-value p-value
Sub 1 53 % 3.3 0.001
Sub 2 55 % 2.6 0.005
Sub 3 54 % 1.3 0.094
Sub 4 51 % 5.2 0.000
Sub 5 58 % 1.2 0.115

The t-values are based on a test of whether the proportion of correctly classified cases in the sample is significantly different from the proportion that would be obtained by chance. The intuitively clear boundary is 50 %, but the test takes into account the number of observations in the prediction. According to Frank, the t-value is biased towards showing greater prediction rates than there would be in the whole population, but the magnitude of the bias decreases as the sample size becomes larger.

The test values are calculated with equation

$t = \dfrac{\text{proportion correctly classified} - P}{\sqrt{P(1 - P)/n}}$ .   (27)

P is the theoretical likelihood of belonging to each group, which is 0.5 in the two
group case. The number of observations predicted is n (80).
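A minimal sketch of equation 27 is given below; the numbers in the example are illustrative only, not results from this Thesis.

```python
import math

def classification_t_value(correct, n, p_chance=0.5):
    """t-value of equation (27) for a classification rate against chance.

    correct : number of correctly classified cases, n : cases predicted,
    p_chance : theoretical likelihood of each group (0.5 in the two-group case).
    """
    proportion = correct / n
    return (proportion - p_chance) / math.sqrt(p_chance * (1 - p_chance) / n)

# Illustrative example: 45 of 80 predicted cases classified correctly.
t = classification_t_value(45, 80)   # about 1.12
```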

5.6 Variable influence section

This time, the Wilks' Lambda test is used to test the statistical significance of discriminant functions that consist of only one variable at a time, and also to test the effect on significance of removing one variable at a time from the full profile. The variable influence section gives clues about the most and least discriminating variables, which is important especially when the ratio profile is not fixed. If the stepwise procedure is used to try out different combinations of variables, the variable influence section gives guidelines for the entry and removal probabilities of the variables. The variable influence results are shown in Table 10.

Table 10 Variable influence for the basic sample of 108 companies.


Variable  Removed Lambda  Removed F-Value  Removed F-Prob  Alone Lambda  Alone F-Value  Alone F-Prob  R-Squared Other X's
PE 0.985312 1.48 0.227325 0.998588 0.15 0.699459 0.123465
GEA 0.997176 0.28 0.597645 0.996909 0.33 0.567667 0.323457
PB 0.997792 0.22 0.640783 0.997094 0.31 0.579475 0.694694
FCPY 0.929438 7.52 0.007258 0.951233 5.43 0.021637 0.857286
CPY 0.958672 4.27 0.041452 0.97993 2.17 0.143597 0.859114
ROCE 0.989552 1.05 0.309091 0.99467 0.57 0.452727 0.019774
EBITM 0.969838 3.08 0.082405 0.99556 0.47 0.493228 0.327410
PS 0.980207 2 0.160534 0.987369 1.36 0.246839 0.752938

The alone statistical significances reveal that only FCPY is statistically significant (0.021637) at the significance level of 0.05, meaning that the null hypothesis of equal group centroids is rejected for it. The second best alone F-probability, 0.143597, belongs to CPY, but for it the null hypothesis remains in force. The third most significant is PS with a probability of 0.246839, while the rest of the alone probabilities are much weaker, PE, PB and GEA being the weakest discriminators.

The removed Wilks' Lambda is computed to test the impact of removing a variable from the profile. The impact is statistically significant for FCPY and CPY at the significance level of 0.05, which is in line with their strong alone F-probabilities. EBITM is also close to being significant, with a probability of 0.082405. If removed, PB and GEA have the weakest effect.

The last column in Table 10 gives the R-squared value that would be obtained if the variable in question were regressed on all the other independent variables. Values higher than 0.99 suggest severe multicollinearity among the variables, and removal is advised for variables with such high R-squared values. The variables that did well in the two previous tests are now the most correlated with the rest of the variables, with values of 0.857286 and 0.859114. Each variable is regressed on all the other variables, but the cash flow ratios can be concluded to cause most of the correlation with each other, as their R-squared values are almost equal. The values are not alarming, but the possibility of one variable being a linear combination of the others has to be taken into account. ROCE is the most independent variable and PE the second. The higher values for the cash flow ratios can be explained by the similarities in the way the ratios are calculated; in the end, all the variables are connected through the balance sheet.

The standardized canonical coefficients (i.e. the standardized discriminant function coefficients) serve as an aid for interpreting the variate by showing the weight given to each variable in the construction of the score (Table 11). The standardized coefficients are analogous to the standardized beta coefficients in multiple regression analysis. The cheap group has scores below zero and the expensive group above zero.

Table 11 Standardized canonical coefficients for the basic sample.


FCPY CPY PS EBITM PE ROCE PB GEA
-1,96 1,51 0,78 -0,59 -0,36 0,29 -0,23 -0,18

It is interesting that the biggest absolute weight is on FCPY and the second biggest on CPY, but they have opposite signs. The PS weight of 0.78 and the EBITM weight of -0.59 are also relatively large, while the rest of the weights are roughly equal. The correlation coefficients between the discriminant scores and the discriminator variables help to interpret the relative contribution each variable makes to the discrimination. The resulting matrix is called the structure matrix:

Table 12 Structure matrix for the basic sample.


FCPY CPY PS ROCE EBITM GEA PB PE
-0,58 -0,37 0,29 0,19 -0,17 -0,14 0,14 -0,10

The cash flow ratios are the two variables most correlated with the discriminant scores, but their mutual relation is ambiguous because this time both correlations have a negative sign. The FCPY correlation suggests that the cheap stocks have a relatively strong FCPY ratio, because the cheaper the stock, the more negative the discriminant score. A high FCPY ratio can easily be accepted as a common characteristic of a cheap stock. According to the standardized weights, the CPY ratio could be interpreted as a decelerator of the FCPY ratio, but the interpretation is more complex once the negative correlation with the discriminant scores is accounted for. A careful guess is that CPY decelerates the FCPY effect a little, but the latter remains positive for the cheap stocks. The PS ratio shows its influence on the discriminant scores by taking third place in both of the tables above. Its positive correlation with the discriminant score suggests high PS ratios for the expensive stocks, which is understandable, as low sales incur a high PS ratio. Therefore, high sales figures (thus low PS ratios) can be accepted as a characteristic of cheap stocks. The conclusion is consistent with the much debated FCPY ratio as well, because a weak sales figure is not likely to produce a strong FCPY.

The success of FCPY and CPY is repeated in the dichotomous classification test for the basic sample (Table 13).

Table 13 Dichotomous classification rates of the basic sample of 108 companies.


Variable Group I Group II Overall
PE 17 % 81 % 49 %
GEA 33 % 63 % 48 %
PB 74 % 35 % 55 %
FCPY 52 % 76 % 64 %
CPY 24 % 87 % 56 %
ROCE 76 % 15 % 45 %
EBITM 35 % 67 % 51 %
PS 76 % 26 % 51 %

The overall classification rates are at the level of random guessing, with the exception of FCPY, which classified 64 % of the companies in the basic sample correctly. There seems to be a trend of predicting expensive stocks more accurately than cheap ones. In an investing situation, avoiding the type I error is more important, because investing in expensive stocks based on an incorrect classification can be disastrous. There are exceptionally high classification rates in group II for PE, FCPY and CPY; on the other hand, the overall classification rates are evened out by classifying almost all the companies as expensive.

5.7 Prediction to hold-out sample

The prediction capability is tested with the hold-out sample of 27 cheap companies and 27 expensive companies from the following years, 1997-1999. The companies are subject to the same selection procedure and the same market capitalization limits as in the basic sample. As expected, the prediction rate is weaker than the original classification rate, reaching only 44 % (Table 14). The discriminant function correctly predicted 15 out of 27 cheap companies but only 9 out of 27 expensive companies. Thus, the type I error of 67 % increased by 21 percentage points, whereas the type II error of 44 % increased by only 3 percentage points. From a practical point of view, the high type I error is hazardous, because 67 % of the expensive stocks are thought to be cheap. The explanation for the type I error could be a too short statistical distance between the groups, with the discriminant function leaning towards group I. Possible extreme points in the basic sample might also affect the classification rate.

Table 14 Prediction results of the hold-out sample.


Actual Group I Group II Total
Group I 15 12 27
Group II 18 9 27
Total 33 21 54 44 %

5.8 Prediction to another sector

Yet another prediction is tried out on another sector in order to assess the universal prediction capability. The sector is the non-cyclical consumer goods sector in the U.S., and the sample is formed by a procedure similar to the other samples. The non-cyclical sample consists of 60 companies, 30 cheap and 30 expensive ones, 10 for each year in 1991-1996. The prediction rate is what was expected, 43 %, with error rates of 69 % and 73 %. Interpreting rates below the random rate of 50 % is of little use, but the prediction rate can be compared to the internal classification rate of the non-cyclical sector. A discriminant function is formed from the 60 companies in the non-cyclical sector with the same 8 explanatory variables. This quick formation results in a 67 % discrimination rate between the groups, which is cross-validated to 50 %. The inter-sector prediction thus failed as badly as the prediction to the hold-out sample, although the non-cyclical sector itself could be discriminated more accurately. The estimated discriminant function coefficients for the basic sample are given in Table 15.

Table 15 Discriminant function for the basic sample.


Constant PE GEA PB FCPY CPY ROCE EBITM PS
0,8836 -0,0045 -0,0014 -0,0850 -9,6840 2,3117 0,0040 -0,0923 0,4616
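As a minimal sketch, the score of a new company under the Table 15 coefficients could be computed as below. The ratio values in the example are illustrative, and treating a score below zero as "cheap" follows the sign convention described for the standardized scores; the exact cutoff used by NCSS is an assumption here.

```python
# Discriminant function coefficients from Table 15 (basic sample).
COEFFS = {"const": 0.8836, "PE": -0.0045, "GEA": -0.0014, "PB": -0.0850,
          "FCPY": -9.6840, "CPY": 2.3117, "ROCE": 0.0040, "EBITM": -0.0923,
          "PS": 0.4616}

def discriminant_score(ratios):
    """Score = constant + sum of coefficient * ratio value.

    ratios : dict with keys PE, GEA, PB, FCPY, CPY, ROCE, EBITM, PS.
    """
    return COEFFS["const"] + sum(COEFFS[k] * v for k, v in ratios.items())

# Illustrative ratio values only, not an actual company from the samples.
example = {"PE": 20.0, "GEA": 50.0, "PB": 2.5, "FCPY": 0.08, "CPY": 0.15,
           "ROCE": 18.0, "EBITM": 11.0, "PS": 1.2}
score = discriminant_score(example)
# about -0.31 for these inputs; negative -> group I (cheap) under the
# zero-cutoff assumption stated above.
```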

6 Sensitivity analysis

The predictive accuracy obtained from the unprocessed data was weak. The assumptions of normality and equality of covariance matrices were severely violated, so next the data is processed to fulfill the assumptions at least approximately. This sensitivity chapter focuses on the classification and prediction rates as the variables approach the normal distribution and as different ratio profiles are tried. The procedure is more or less trial-and-error-like because of the endless possibilities for executing the analysis.

6.1 Excluding outliers

There are opposing viewpoints on whether or not it is wise to remove extreme points from the data. The grounds for and against the exclusion are not studied in this Thesis; the focus is on the effects of the exclusion on the classification and prediction assignments. Variable normality is approached by excluding outliers one by one. Outliers are excluded by observing the multivariate probability plot and the individual probability plots at the same time. In many cases, the most extreme univariate outliers turned out to be multivariate outliers as well. Research has shown that the significance level is not appreciably affected if the group sizes are equal (Sharma 1996), even if the covariance matrices are not; every effort should therefore be made to keep the group sizes equal. A total of 25 outliers are excluded based on the graphical assessment. The outliers correspond to only 18 companies, as some companies are outliers in several dimensions. The exclusion is suitable, because the same number of companies (9) is removed from both groups and the sample size is not reduced too much. The original sample is reduced by 17 %, and from now on the sample is called the trimmed sample.
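The thesis removes the outliers by visual inspection of the probability plots. A comparable automated rule, given only as a hedged sketch and not as the procedure actually used, would flag multivariate outliers by a chi-square cutoff on the squared Mahalanobis distance:

```python
import numpy as np
from scipy import stats

def flag_multivariate_outliers(X, alpha=0.001):
    """Boolean mask of rows whose squared Mahalanobis distance from the
    centroid exceeds the chi-square(1 - alpha) cutoff. The alpha level and the
    automated rule itself are illustrative assumptions.
    """
    centered = X - X.mean(axis=0)
    V_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centered, V_inv, centered)
    return d2 > stats.chi2.ppf(1 - alpha, df=X.shape[1])
```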

As expected, the overall coefficients of variation dropped dramatically for almost all the variables when the outliers were excluded (Table 16). FCPY, CPY and ROCE experienced the heaviest drops of more than 230 percentage points, with PE following at about 190 percentage points. The lightest drop, only 3.6 percentage points, was for EBITM.

Table 16 Development of the overall coefficients of variation.

Variable  CV, basic sample (108)  CV, trimmed sample (90)
PE 2.473 0.512
GEA 2.455 1.951
PB 0.911 0.478
FCPY 4.201 1.889
CPY 3.604 0.785
ROCE 3.405 0.528
EBITM 0.572 0.536
PS 1.317 0.862
MCAP 0.759 0.717

6.1.1 Normality

The exclusion of the 18 companies made four out of the 16 variable groups normally distributed at the significance level of 0.05 according to the Shapiro-Wilk test (Table 17).

Table 17 Shapiro-Wilk variable normality test for the trimmed sample.
Test Prob Decision
Variable Value Level 5%
PE Group I 0.789 0.000001 Reject Normality
PE Group II 0.924 0.005698 Reject Normality
GEA Group I 0.878 0.000208 Reject Normality
GEA Group II 0.984 0.796612 Accept Normality
PB Group I 0.862 0.000077 Reject Normality
PB Group II 0.932 0.011279 Reject Normality
FCPY Group I 0.875 0.000171 Reject Normality
FCPY Group II 0.961 0.132815 Accept Normality
CPY Group I 0.821 0.000007 Reject Normality
CPY Group II 0.976 0.478296 Accept Normality
ROCE Group I 0.988 0.915369 Accept Normality
ROCE Group II 0.943 0.028268 Reject Normality
EBITM Group I 0.837 0.000017 Reject Normality
EBITM Group II 0.930 0.009540 Reject Normality
PS Group I 0.824 0.000008 Reject Normality
PS Group II 0.755 0.000000 Reject Normality

The GEA ratio in group II and ROCE in group I achieved strong normality after the exclusion of the outliers, with probability levels of 0.79 and 0.91. CPY and FCPY in group II achieved probability levels of 0.47 and 0.13. Other ratios close to being accepted are PE, PB and ROCE in group II. The PS ratio in group II is the only one of them to remain at a probability level of zero. On a univariate basis, the variable normality thus improved considerably.

The multivariate normality is assessed with the help of the Chi square to
Mahalanobis distance plot, again (Figure 5).

Figure 5 Chi Square plot for the trimmed sample.

Based on a visual examination of the Chi square plot, multivariate normality cannot be accepted. The plot could be seen as partly linear if it were cut in two at around a Mahalanobis distance of 7; the first half is steeper than the second half, and the values are concentrated at the small end. The correlation between the Mahalanobis distances and the Chi square values is 0.934, which is much higher than the 0.740 of the basic sample. Yet the correlation is still smaller than the critical value of 0.985, and the plot is not linear. The normality assumption is again rejected, but the plot improved in terms of linearity.

6.1.2 Equality of covariance matrixes

The equality of covariance matrices was reasoned earlier to be dependent on the normality of the variables. Now that the normality assumption is a little closer to being satisfied, the equality of the covariances and variances is tested again with the Bartlett-Box test. The null hypothesis that the within-group variances are equal is accepted for GEA, PB, CPY, ROCE and EBITM (Table 18). If the range 0.01-0.05 is considered to indicate marginally equal variances between the groups, PE has marginally equal variances, and PS, with a value of 0.009, is also close. FCPY remains at a probability level of zero, which makes it the number one suspect for the rejection of Box's M test. The F approximation of that test is rejected at the significance level of 0.05, but its probability, 0.006, rose promisingly from zero. As discussed earlier, fulfilling the equality of the covariance matrices would primarily improve the type I error rate.

Table 18 Bartlett-Box tests for the equality of the covariance matrices for the trimmed
sample.
Variable   Bartlett Value   DF1   DF2   F Approx   F Prob   Chi2 Approx   Chi2 Prob
PE 4.0201 1 23232 3.980 0.046 3.970 0.046
GEA 0.6414 1 23232 0.630 0.426 0.630 0.426
PB 0.0112 1 23232 0.010 0.916 0.010 0.916
FCPY 13.4873 1 23232 13.340 0.000 13.330 0.000
CPY 2.9138 1 23232 2.880 0.090 2.880 0.090
ROCE 1.0398 1 23232 1.030 0.311 1.030 0.311
EBITM 0.0599 1 23232 0.060 0.808 0.060 0.808
PS 6.8076 1 23232 6.730 0.009 6.730 0.009
Box's M 67.6068 36 26057 1.700 0.006 61.160 0.006
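
For reference, the univariate rows of Table 18 correspond to a standard Bartlett test of equal variances between the two groups; the sketch below illustrates the test for a single ratio (the array names and the synthetic example data are purely illustrative). Box's M, which tests the full covariance matrices jointly, is not available in SciPy and would have to be implemented separately.

```python
# Sketch: Bartlett test of equal within-group variances for one ratio
# (cf. the univariate rows of Table 18). `group1` and `group2` are
# 1-D arrays of that ratio's values in groups I and II (illustrative).
import numpy as np
from scipy.stats import bartlett

def variance_equality(group1: np.ndarray, group2: np.ndarray):
    stat, p = bartlett(group1, group2)   # chi-square approximation
    return stat, p

# Example with synthetic data:
rng = np.random.default_rng(0)
g1, g2 = rng.normal(0, 1.0, 45), rng.normal(0, 2.0, 45)
stat, p = variance_equality(g1, g2)
print(f"Bartlett statistic {stat:.3f}, p-value {p:.3f}")  # small p -> unequal variances
```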

A quick trial of Box's M test without FCPY shows that, without it, the equality of
covariance matrices is accepted with an F-probability of 0.055. In the original
basic sample, the coefficients of variation for FCPY were 2.918 for group I and
20.179 for group II; in the trimmed sample they are 1.431 and 2.873, respectively.
The gap between the coefficients of variation narrowed, but the variation in group
II is still about twice that of group I. In FCPY's favor, it was the only ratio that,
when considered alone with the basic sample, rejected the null hypothesis of equal
group centroids. The FCPY ratio is thus a strong but temperamental predictor.

6.1.3 Classification capability

The classification capability improved by 10 percentage points to 66 % (Table 19).
The type I error is 33 % and the type II error 36 %, compared to 46 % and 41 %
prior to the exclusion of the outliers. The classification results thus improved from
the level of random guessing to being indicative.

Table 19 Classification count table for the trimmed sample.


Predicted
Actual Group I Group II Total
Group I 29 16 45
Group II 15 30 45
Total 44 46 90 66 %

The exclusion of the extreme points from the basic sample has a moderate effect
on the classification rate. The improvement is understandable, because extreme
values easily bias the discriminant function in one direction or another. High
within-group variance is harmful in discriminant analysis; the differences should
lie between the groups rather than within them.
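
The rates quoted above follow directly from the classification count table. The small sketch below derives the overall rate and the two group-wise misclassification rates from a 2x2 count table such as Table 19; which of the two rates is labelled type I and which type II is a convention that the sketch does not fix.

```python
# Sketch: overall classification rate and per-group misclassification
# rates from a 2x2 count table (cf. Table 19). The mapping of the two
# group-wise rates to "type I" and "type II" is a labelling convention.
def rates_from_counts(i_as_i, i_as_ii, ii_as_i, ii_as_ii):
    total = i_as_i + i_as_ii + ii_as_i + ii_as_ii
    overall = (i_as_i + ii_as_ii) / total
    group_i_error = i_as_ii / (i_as_i + i_as_ii)      # group I classified as II
    group_ii_error = ii_as_i / (ii_as_i + ii_as_ii)   # group II classified as I
    return overall, group_i_error, group_ii_error

# Table 19: actual group I -> (29, 16), actual group II -> (15, 30)
overall, e1, e2 = rates_from_counts(29, 16, 15, 30)
print(f"overall {overall:.0%}, group I error {e1:.0%}, group II error {e2:.0%}")
# -> overall 66 %, group I error 36 %, group II error 33 %
```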

6.1.4 Variable influence

The FCPY and CPY ratios are significant predictors when used alone: both achieve
statistically significant discriminant functions, with probabilities of 0.004 and 0.008
(Table 20). On the other hand, they can be regressed on the other variables more
easily than the rest of the ratios, except for PS. The multicollinearity situation has
changed, because earlier the R-squared values of the two cash flow ratios were
equal. EBITM and GEA weakened as stand-alone predictors, while the rest improved
somewhat. FCPY, previously the rank 1 stand-alone classifier, is still the best
stand-alone predictor. However, the F-probability for removing FCPY rose from the
previous 0.007 to 0.339. According to this test, ROCE now tops the list of ratios
that should not be removed from the full ratio profile.

Table 20 Variable influence section for the trimmed sample of 90 companies.
Variable   Removed Lambda   Removed F-Value   Removed F-Prob   Alone Lambda   Alone F-Value   Alone F-Prob   R-Squared Other X's
PE 0.963179 3.1 0.082236 0.997806 0.19 0.66109 0.437305
GEA 0.987205 1.05 0.30859 0.999454 0.05 0.826927 0.380058
PB 0.986721 1.09 0.299554 0.985097 1.33 0.2517 0.432491
FCPY 0.988714 0.92 0.339121 0.911228 8.57 0.004343 0.635176
CPY 0.992533 0.61 0.437293 0.922106 7.43 0.007725 0.718753
ROCE 0.96662 2.8 0.098289 0.987987 1.07 0.303772 0.360985
EBITM 0.987048 1.06 0.305624 0.999992 0 0.979479 0.616409
PS 0.975639 2.02 0.15882 0.978474 1.94 0.167615 0.725707

6.1.5 Out-of-sample prediction capability

Hold-out sample prediction is meant to resemble a real decision-making situation
and, thus, normality and the equality of covariance matrices are not checked. Hold-
out samples are rarely processed in the earlier studies, either. The prediction from
the trimmed sample to the hold-out sample reveals more surprising results
(Table 21). The overall prediction rate jumped from the noninformative 44 % to
61 %. The prediction rate is not high enough for practical use, but the improvement
is promising. The type I error is 48 % and the type II error a more credible 30 %.
The tendency to lean towards group I appears in the trimmed sample just as it did
earlier with the original sample. The lean might also be caused by general
differences in the ratio levels, because the hold-out sample comes after the trimmed
sample in time.

Table 21 Prediction count table of the trimmed sample to the hold-out sample.
Predicted
Actual Group I Group II Total
Group I 19 8 27
Group II 13 14 27
Total 32 22 54 61 %

The removal of the 18 companies with extreme characteristics turned out to be
beneficial for the classification and prediction rates. The research continues with
both samples, the original basic sample (108 observations) and the trimmed sample
(90 observations), in order to test the variable transformations and ratio profiles on
them. The trimmed sample has a lead over the basic sample because of the
significant improvement in its classification and prediction rates.

6.2 Variable transformations

There is a great variety of mathematical transformations for achieving variable
normality. A variable can be multiplied, squared, raised to a power, converted to a
logarithmic scale, inverted, and so on. For counts, the most common transformation
is the square root. Data transformations should be used only when there is a clear
reason to do so; restraint decreases the chance of drawing incorrect conclusions.

The impacts of three basic transformations are examined here: the square root,
logarithmic and inverse transformations. The square root transformation is the most
common one and the most suitable for counts. It can be applied to distributions that
resemble the normal distribution but whose mass is concentrated at small values,
that is, distributions with a relatively large number of small values and a longer tail
of large values. As the square root cannot be taken of negative values, a constant is
added to make all the values positive. In addition, the square root is highly
nonlinear for values between zero and one, so the minimum should be set above
one. The logarithmic transformation is commonly used for proportions, but with
counts it can be applied in the same way as the square root transformation when the
concentration of small values is even more pronounced. The base of the logarithm
changes the nature of the transformation, and the natural base is deemed suitable for
the variable values in this Thesis; larger bases, such as 10 or 100, would lose
resolution among the smaller values. The logarithm cannot be taken of negative
values either, so a constant must again be added to level up the values. The inverse
transformation makes very small values very large and very large values very
small. Because it reverses the order of the scores, the transformed values must be
reversed back by multiplying them by -1 and adding a constant to bring the
minimum back above 1.0. The inverse transformation is suitable for variables whose
distribution looks like a steep leftward-leaning slope, that is, a large number of
small values with the frequency of larger values decreasing roughly linearly. The
logarithmic transformation also works for data whose residuals grow as the value of
the variable grows. The three transformations were introduced above in order of
increasing power, starting from the weakest. Following the guidance of the
literature (Osborne 2002), all the variables are anchored to a minimum of 1.0.
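
The transformation recipe described above (square root, natural logarithm and inverse, each applied to a variable anchored to a minimum of 1.0 and, for the inverse, reflected back into the original order) can be written compactly. The following is a minimal illustration of the recipe rather than the exact code used in this Thesis:

```python
# Sketch: square root, natural-log and inverse transformations with the
# variables anchored to a minimum of 1.0 (Osborne 2002). Illustrative only.
import numpy as np

def anchor(x: np.ndarray, minimum: float = 1.0) -> np.ndarray:
    """Shift the variable so that its smallest value equals `minimum`."""
    return x - x.min() + minimum

def sqrt_transform(x):
    return np.sqrt(anchor(x))

def log_transform(x):
    return np.log(anchor(x))          # natural base

def inverse_transform(x):
    y = 1.0 / anchor(x)               # reverses the order of the scores...
    return anchor(-y)                 # ...so reflect and re-anchor above 1.0

# Example:
x = np.array([-3.0, 0.5, 2.0, 40.0])
print(sqrt_transform(x), log_transform(x), inverse_transform(x), sep="\n")
```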

6.2.1 Transformations and the basic sample

All the variables are first transformed in order to perform an exploratory analysis,
focusing on univariate normality, multivariate normality and the classification
rates. The normality test probabilities indicate that three variable groups are
significantly normal after the square root transformation, five after the logarithmic
transformation and likewise five with the inverse transformation (Table 22).
Without the transformations the probabilities are very close to zero because of the
outliers, and they improved only slightly as a consequence of the transformations.
On the other hand, the classification rates improved well above the original level of
56 %. The classification rate increases as the transformation gets stronger, the
inverse transformation yielding a creditable 9 percentage point improvement to
65 %. The prediction rates improve roughly in line with the classification rates, but
even the highest rate of 54 %, achieved with the inverse transformation, is too close
to the random guess rate to be considered noteworthy.

Table 22 Shapiro-Wilk normality test after transformations for the basic sample.
Original SQRT LN INV
PE Group I 0.000000 0.000000 0.000006 0.000000
PE Group II 0.000000 0.000000 0.000042 0.027550
GEA Group I 0.000000 0.000000 0.000161 0.181271
GEA Group II 0.000080 0.058680 0.908347 0.001157
PB Group I 0.000000 0.000004 0.015348 0.000016
PB Group II 0.000000 0.000000 0.000647 0.286330
FCPY Group I 0.000000 0.000000 0.000000 0.000000
FCPY Group II 0.000021 0.000007 0.000003 0.000000
CPY Group I 0.000000 0.000000 0.000000 0.000000
CPY Group II 0.006998 0.020684 0.055728 0.275222
ROCE Group I 0.000221 0.027411 0.394959 0.025635
ROCE Group II 0.000000 0.000000 0.000000 0.000000
EBITM GROUP I 0.000012 0.005744 0.560079 0.000628
EBITM GROUP II 0.000218 0.156928 0.099314 0.000000
PS GROUP I 0.000000 0.000001 0.000539 0.847031
PS GROUP II 0.000000 0.000000 0.000002 0.057886

Classification Rate 56 % 59 % 62 % 65 %
Prediction Rate 44 % 46 % 50 % 54 %

The primary objective of the transformation analysis is to achieve higher
classification rates through the implicit goal of improving variable normality and
the equality of the covariance matrices. Based on this objective, a test profile can
be formed to represent the highest normality probabilities on a univariate basis.
According to the probabilities of the Shapiro-Wilk test, the test profile consists of
inversed PE, PB, CPY and PS, logarithmic GEA, ROCE and EBITM, and plain
FCPY. Multivariate normality was earlier assessed with the correlation between the
Mahalanobis distances and the corresponding Chi-square values; for the test
profile, the correlation is 0.822539. This is better than the original 0.740, but
multivariate normality is still rejected, the critical value being 0.987 (Sharma
1996:466) for n=100 and alpha=0.05. The test profile yields a classification rate of
61 %, which outperforms the crude square root transformation. The logarithmic and
inverse transformations result in higher rates, so the process of choosing the
transformations on the basis of univariate normality is found useless. The
multivariate normality correlation was initially 0.740, and it improves to 0.769,
0.841 and 0.833 for the transformations in the order of their power (sqrt, ln and
inv). The assumption of multivariate normality is rejected for all three
transformations, as the critical value is 0.987.

The F approximations of Box's M test remained at zero, so none of the crude
transformations improved the equality of the covariance matrices. As discussed
earlier, this assumption is very sensitive to outliers and to deviations from
multivariate normality.

Since the classification rates improved along with the slight improvements in the
normality assumptions, a simple sensitivity analysis is performed in order to
examine the effect of transforming one variable at a time. Only one transformation
has a negative effect on the classification rate: the square root transformation of
ROCE (Table 23). The transformations of the GEA ratio strengthen the
classification rate by as much as 8 percentage points at best, with the logarithmic
transformation. The inverse transformation improves the classification the most,
while the square root and logarithmic transformations yield roughly equal
improvements.

Table 23 Classification rate: one transformation at a time.


SQRT LN INV
PE 2% 0% 1%
GEA 2% 8% 7%
PB 2% 2% 5%
FCPY 1% 1% 0%
CPY 4% 4% 6%
ROCE -1 % 2% 3%
EBITM 4% 4% 3%
PS 1% 2% 4%

The main conclusion is that each of the variable transformations is beneficial for
the classification rate of the basic sample. The improvement might stem from the
transformations smoothing the extreme distances between the observations.

Dichotomous classification rates are also of interest; despite their complex
relationship with multivariate classification, they hint at the discriminating power
of the individual variables. The three transformations are tested on the eight
variables and the dichotomous classification rates are compared with the original
rates. There are three conspicuous improvements among the dichotomous
classification rates: 12 and 10 percentage points for the PE ratio and 11 percentage
points for the ROCE ratio (Table 24). These improvements are notable, but the
levels reached are still weak. The logarithmic transformation raises PE to a rate of
61 % and ROCE to 56 %, the level that the original multivariate case yielded.
FCPY is originally the strongest classifier, but it weakens when transformed. The
effects of the transformations vary quite a lot, but the logarithmic transformation is
the only one that does not decrease any of the rates.

Table 24 Dichotomous classification rates: the original rate and the change caused by each transformation.


Variable   Original   SQRT   LN   INV
PE 49 % 7% 12 % 10 %
GEA 48 % -1 % 7% 9%
PB 55 % 3% 3% 0%
FCPY 64 % -1 % 0% -4 %
CPY 56 % 5% 2% 0%
ROCE 45 % -4 % 11 % 6%
EBITM 51 % 3% 0% 1%
PS 51 % 4% 5% 0%

The number of combinations that can be formed with eight variables and four
choices for each (4^8 = 65 536) is too large to work through manually.
Nevertheless, numerous combinations were tried out, and they did not result in
significantly higher classification rates than the 65 % achieved by the
comprehensive inverse transformation. Even where a few extra percentage points
were gained, the gain could not be explained logically by the results depicted
earlier. Transformations are therefore considered useful for increasing the
classification rate of the basic sample, which includes univariate and multivariate
outliers. The inverse transformation was the most beneficial for the classification
and prediction rates. Although the literature stresses that variable transformations
must be well reasoned, a simple crude trial of the inverse transformation yielded the
highest results.

6.2.2 Transformations and the trimmed sample

As with the basic sample earlier, all the variables are first transformed in order to
explore the effects. Univariate and multivariate normality, as well as the
classification and prediction rates, are examined. The Shapiro-Wilk test results and
the classification rates are shown in Table 25. The last rows of the table reveal that
when all the variables are transformed, none of the classification rates improves
compared with the starting point of 66 %; the same holds for the prediction rates.
The exclusion of the outliers had made four out of the 16 variable groups normally
distributed at the 0.05 significance level. Taking the square root of the variables
keeps the same four variable groups at an acceptable level, and three additional
variable groups are also accepted as normally distributed. The number of normally
distributed variable groups is 10 and 6 for the logarithmic and inverse
transformations, respectively. Variable normality is tested by group, but if a
transformation is adopted, both groups of the variable have to be transformed.

Table 25 Shapiro-Wilk normality test after transformations for the trimmed sample.
Variable Trimmed SQRT LN INV
PE Group I 0.000001 0.000453 0.057554 0.000013
PE Group II 0.005698 0.141455 0.347929 0.002027
GEA Group I 0.000208 0.026774 0.614070 0.274891
GEA Group II 0.796612 0.982204 0.473473 0.000139
PB Group I 0.000077 0.003091 0.087513 0.590774
PB Group II 0.011279 0.159392 0.639762 0.385379
FCPY Group I 0.000171 0.000618 0.002124 0.018725
FCPY Group II 0.132815 0.097257 0.068865 0.031599
CPY Group I 0.000007 0.000021 0.000063 0.000534
CPY Group II 0.478296 0.597056 0.702221 0.840173
ROCE Group I 0.915369 0.538298 0.105115 0.000341
ROCE Group II 0.028268 0.000220 0.000000 0.000000
EBITM GROUP I 0.000017 0.004104 0.312850 0.000624
EBITM GROUP II 0.009540 0.453685 0.038543 0.000000
PS GROUP I 0.000008 0.000368 0.015145 0.802802
PS GROUP II 0.000000 0.000013 0.000615 0.123910

Classification rate 66 % 61 % 62 % 59 %
Prediction rate 61 % 56 % 57 % 59%

Now that the effects of the transformations on normality have been examined, a
test sample can be formed by choosing for each variable the transformation with
the best sum of the normality test probabilities in Table 25. The test sample is then
the most normal on a univariate basis, and the effect on the classification rate can
be observed. The test sample consists of logarithmic PE, GEA and EBITM,
inversed PB, CPY and PS, and plain FCPY and ROCE. It classifies 62 % of the
observations correctly. Only four variable groups are now not normally distributed
at the 0.05 significance level, and two of them are close: the logarithmic EBITM
ratio in group II with a probability of 0.038543 and the ROCE ratio in group II with
a probability of 0.028268. The weakest two are the FCPY ratio in group I
(0.000171) and the inverted CPY ratio in group I (0.000534). The correlation of the
Mahalanobis distance versus Chi-square plot is 0.9686, so multivariate normality is
still rejected against the critical value of 0.985 (Sharma 1996:466). Nonetheless, the
correlation improved from the earlier 0.934 of the trimmed sample. In general, both
the univariate and the multivariate normality assumptions are very close to being
met. The classification rate of the test sample, however, weakened by four
percentage points because of the transformations. Therefore, the method of
selecting variable transformations based on univariate probabilities from the
Shapiro-Wilk normality test is considered useless.

The equality of the covariance matrices did not improve earlier when the variables
of the basic sample were transformed. The F approximation of Box's M test for the
trimmed sample is 0.006; it improves slightly after the square root transformation
(0.026) and the logarithmic transformation (0.007) but worsens after the inverse
transformation (0.000). Therefore, the assumption of equal covariance matrices is
not met with any of the three transformations.

A simple sensitivity analysis, transforming one variable at a time in the trimmed
sample, resulted in equal or weaker classification rates compared with the 66 % of
the non-transformed ratio profile (Table 26). According to the classification results,
there are no grounds for any of these transformations.

Table 26 Classification rates after transforming variables one at a time.


SQRT LN INV
PE 0% -2 % -4 %
GEA -3 % -3 % -2 %
PB -2 % -4 % -3 %
FCPY 0% -2 % 0%
CPY 0% -2 % -2 %
ROCE -3 % -4 % -6 %
EBITM -2 % -2 % -5 %
PS -4 % -5 % 0%

The transformations did not improve the classification rates on a dichotomous
basis. The drops of 8 percentage points for PE and 6 percentage points for ROCE
and EBITM can be considered harmful, while the rest of the changes are rather
marginal. The interesting question is why FCPY and CPY maintain their
classification rates even when transformed. Bartlett's test for the equality of
variances shows that FCPY is the furthest from equality (Table 18), with CPY
ranked fifth. This suggests that the violation of equal variances does not affect the
classification rate, at least not very obviously. A comparison of the group I and
group II mean values reveals that FCPY has the biggest gap (72 %) and CPY
(36 %) the second biggest, compared with the rest of the gaps, which range from
0.29 % to 28 %. Such a wide gap between the group means makes FCPY a strong
classifier, even when transformed.

Table 27 Effects of the transformations on the dichotomous classification rate.


Variable Trimmed SQRT LN INV
PE 63 % -2 % -4 % -8 %
GEA 56 % 1% 1% -1 %
PB 59 % -1 % -1 % -2 %
FCPY 63 % 0% 0% 0%
CPY 53 % 0% 1% 3%
ROCE 56 % -1 % -6 % -3 %
EBITM 52 % -4 % -6 % -3 %
PS 56 % 0% 1% -3 %

The variable transformations did not improve the discriminating power of the
analysis (Table 27), even though the normality assumptions improved. The
normality and equal covariance matrix assumptions therefore do not appear to be
very strict requirements for discriminant analysis. The requirements are presumably
looser because the ultimate output is a binary classification rather than a continuous
variable.

6.3 Superior ratio profile

There are 255 distinct non-empty combinations of the eight variables under
consideration. The purpose of this section is to strive for higher classification and
prediction rates by trying out the different combinations. Three samples are used:
the basic sample of 108 companies, in both its original and inverse-transformed
versions, and the trimmed sample of 90 companies. The inclusion of the three
samples enables a comparison of the classification and prediction rates after the two
data processing techniques, outlier exclusion and variable transformation.
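
Working through the 255 ratio subsets is straightforward to automate. The sketch below performs such an exhaustive search with scikit-learn's linear discriminant analysis as a stand-in for the software used in this Thesis; the estimator choice, the column names and the use of the resubstitution classification rate are assumptions:

```python
# Sketch: exhaustive search over all non-empty subsets of the eight ratios,
# ranked by (resubstitution) classification rate. scikit-learn's LDA is used
# here only for illustration; column names are assumed.
from itertools import combinations
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

RATIOS = ["PE", "GEA_MEDIAN", "PB", "FCPY", "CPY", "ROCE", "EBITM", "PS"]

def best_subsets(df: pd.DataFrame, top: int = 5):
    """df: one row per company, ratio columns plus a binary 'group' column."""
    results = []
    for k in range(1, len(RATIOS) + 1):
        for subset in combinations(RATIOS, k):
            X, y = df[list(subset)].values, df["group"].values
            rate = LinearDiscriminantAnalysis().fit(X, y).score(X, y)
            results.append((rate, subset))
    return sorted(results, reverse=True)[:top]

# Example: for rate, subset in best_subsets(sample_df): print(f"{rate:.0%}", subset)
```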

The gearing ratio was discussed with the instructor, and the conclusion was to try it
out at first but also to substitute a binary variable for it. The literature encourages
the use of dummy variables with logit models rather than with discriminant
analysis. Since there is no clear reason not to use dummy variables, GEA MEDIAN
is introduced. The GEA MEDIAN variable simply splits the companies in the
sample into two halves at the median gearing, one half consisting of the more
geared companies and the other of companies with little or no debt. The reason for
the substitute ratio is that slight differences in the gearing ratios do not affect the
"goodness" of the companies, which are then considered equal, whereas heavily
geared companies are considered worse (different) than the rest. In other words, the
additional information offered by the continuous gearing ratio was questioned and
the simpler division was introduced. The new variable serves as a substitute for the
original GEA ratio.
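
Constructing the GEA MEDIAN dummy amounts to a simple median split. The sketch below (with an assumed column name) codes the more heavily geared half of the sample as 1 and the rest as 0; which half is coded as 1 is arbitrary, since only the division matters for the discriminant analysis.

```python
# Sketch: replace the continuous gearing ratio with a median-split dummy.
# 'GEA' is the assumed column name for the gearing ratio.
import pandas as pd

def add_gea_median(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # 1 = gearing above the sample median, 0 = at or below the median.
    out["GEA_MEDIAN"] = (out["GEA"] > out["GEA"].median()).astype(int)
    return out

# Example: sample_df = add_gea_median(sample_df)
```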

Trials of substituting the GEA MEDIAN ratio for the original GEA ratio improved
the classification and prediction rates enough to justify the substitution (Table 28).
Both the inversed basic sample and the trimmed sample achieved 71 %, which is 15
percentage points higher than the original basic-sample rate of 56 %. The cell size
affects the credibility of the results; the 71 % of the inversed basic sample is
therefore more reliable than that of the trimmed sample, which is 17 % smaller.
Surprisingly, the prediction rate of the trimmed sample fell when the GEA MEDIAN
substitution was made. So far, the inversed basic sample is deemed to have the best
discriminating capability.

Table 28 GEA MEDIAN substitute; classification and prediction rates.


ORIGINAL GEA GEA MEDIAN
Classification Prediction Classification Prediction
Basic 56 % 44 % 65 % 46 %
Inversed basic 65 % 44 % 71 % 61 %
Trimmed 66 % 61 % 71 % 57 %

The most discriminating ratio profile for the basic sample is achieved without
ROCE and EBITM; thus PE, GEA MEDIAN, PB, FCPY, CPY and PS are included.
The classification rate is 69 %, while the error rates are 33 % for type I and 28 %
for type II. The new ratio profile improved the prediction rate by 13 percentage
points to 59 %, with prediction errors of 30 % and 52 % for type I and type II. As
can be seen in Table 28, higher rates can be achieved by outlier exclusion and
variable transformation.

Trials of different variable combinations for the inversed basic sample did not
yield higher classification rates than the 71 % obtained with the full ratio profile.
Equal classification rates can be achieved by leaving out PE, PB or ROCE, one at a
time, or by leaving out ROCE and EBITM together. These four additional ratio
profiles have error distributions leaning more towards type I errors, so the original
full profile is chosen for further examination. Its errors are rather uniformly
distributed, 28 % for type I and 30 % for type II, and the classification is strongly
supported by a cross-validated rate of 67.6 % correct classifications. The cross-
validation is of the leave-one-out type, in which each observation is classified by a
function derived from all cases other than that case. Although the inversion is based
on a trial rather than on a well-justified variable normality approach, it yielded the
second highest classification rate and the highest prediction rate.
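
The leave-one-out figure quoted above can be reproduced in spirit with standard tooling. The sketch below uses scikit-learn's linear discriminant analysis and leave-one-out splitter as stand-ins for the software used in this Thesis; the estimator and the data layout are assumptions:

```python
# Sketch: leave-one-out cross-validated classification rate, where each
# observation is classified by a function fitted on all other cases.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

def loo_classification_rate(X: np.ndarray, y: np.ndarray) -> float:
    lda = LinearDiscriminantAnalysis()
    scores = cross_val_score(lda, X, y, cv=LeaveOneOut())  # one 0/1 score per case
    return float(scores.mean())

# Example: print(f"{loo_classification_rate(X, y):.1%}")  # cf. the 67.6 % above
```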

The highest classification rate for the trimmed sample is 73 %, and it is achieved by
two different combinations; the one with the higher type I error is discarded. As
with the basic sample, two ratios are excluded from the profile, here EBITM and
PS, leaving PE, GEA MEDIAN, PB, FCPY, CPY and ROCE as explanatory
variables. The type I error is 22 % and the type II error 31 %, and the rate of 73 %
cross-validates to 61 % with the leave-one-out procedure. The drop in the cross-
validated rate suggests that chance contributed somewhat to the high classification
rate. The exclusion of the EBITM and PS ratios did not improve the prediction rate,
which remained at 57 %; the prediction errors were 37 % and 48 %.

6.3.1 The proposed models for classifying and predicting

The trimmed sample with the GEA MEDIAN substitution and the exclusion of
EBITM and PS resulted in the highest classification rate of this Thesis, 73 %. The
highest prediction rate, 61 %, was achieved by the inversed basic sample with the
GEA MEDIAN substitution. These two samples are from here on called the
ultimate classifier and the ultimate predictor. A comparison of the statistical and
practical significance of the ultimate samples is gathered in Table 29.

Table 29 Significances of the ultimate functions.


Canon Corr   Canon Corr^2   Prob Level   Wilks' Lambda
The ultimate classifier 0.4123 0.17 0.0147 0.83001
The ultimate predictor 0.4362 0.1903 0.0059 0.809717

As expected, the null hypothesis can be rejected at an alpha level of 0.05, implying
that in both cases the two groups differ significantly with respect to the explanatory
variables taken jointly. Even though the discriminant functions are statistically
significant, the difference between the groups might not be large, especially with
large sample sizes. The practical significance of the contenders is therefore
measured by the square of the canonical correlation, which equals the share of the
between-group variance of the total variance. The squared canonical correlation is
analogous to the R-squared of multiple regression and thus measures the strength of
the discriminant function. The ultimate predictor sample is two percentage points
higher than the ultimate classifier in the squared canonical correlation, but the 19 %
cannot be considered impressive. Still, it is a slight improvement over the 13 %
achieved by the original basic sample.
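
For a two-group analysis with a single discriminant function, the two measures of Table 29 are directly linked: Wilks' lambda equals one minus the squared canonical correlation,

\Lambda = 1 - c^2, \qquad 1 - 0.4123^2 \approx 0.830, \qquad 1 - 0.4362^2 \approx 0.810,

which matches the reported Wilks' lambda values and serves as a simple consistency check of the table.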

Standardized coefficients are normally used for assessing the relative importance
of the discriminator variables forming the discriminant function. Table 30 exhibits
the coefficients for the two ultimate samples as well as for the basic sample with
GEA MEDIAN, for comparative purposes.

Table 30 Standardized coefficients for the ultimate functions.


Basic Sample The Ultimate Classifier The Ultimate Predictor
FCPY -1.56 CPY 0.66 INVERSED PE -0.90
PS 1.13 GEA MEDIAN -0.50 INVERSED CPY -0.83
CPY 1.08 ROCE 0.49 INVERSED PS 0.70
EBITM -0.52 PB -0.43 GEA MEDIAN 0.67
PB -0.48 PE 0.43 INVERSED FCPY -0.59
GEA MEDIAN 0.48 FCPY 0.30 INVERSED ROCE -0.32
PE -0.37 INVERSED EBITM -0.30
ROCE 0.24 INVERSED PB 0.01

The existence of multicollinearity is fairly evident from the varying relative
importance of the variables across the samples. For example, the FCPY ratio is the
most important variable in the basic sample, but in the ultimate classifier sample
(outliers excluded) it is the least important contributor. Similarly, the PE ratio is the
most important in the ultimate predictor sample, compared with its second-to-last
position in the two other samples. A conservative conclusion about the joint
importance of the cash flow ratios can be drawn from the consistently high ranking
of the CPY ratio and the varying ranking of the FCPY ratio. Strong inferences
regarding the importance of individual discriminator variables should be avoided
because of the multicollinearity present in the variables.

Another way to rank the discriminator variables is to compare the correlation
coefficients between the discriminant score and each discriminator variable, which
vary from +1 to -1. A correlation close to either end of the range indicates high
communality between the discriminator variable and the discriminant score. The
structure matrices of the two ultimate samples and of the basic sample with the
GEA MEDIAN variable substitute are compared below.

Table 31 Comparison of the structure matrices.


Basic sample The ultimate classifier The ultimate predictor
FCPY -0.54 FCPY -0.69 INVERSED FCPY -0.61
CPY -0.34 CPY -0.64 INVERSED CPY -0.55
PS 0.27 GEA MEDIAN 0.35 INVERSED PE 0.36
GEA MEDIAN 0.27 PB 0.27 INVERSED ROCE -0.29
ROCE 0.17 ROCE -0.24 INVERSED PS 0.28
EBITM -0.16 PE 0.10 INVERSED PB 0.28
PB 0.13 GEA MEDIAN 0.23
PE -0.09 INVERSED EBITM -0.22

The structure matrices support the conservative notion suggested earlier about the
importance of the cash flow ratios. FCPY has the highest and CPY the second
highest correlation with the discriminant scores in all three samples of Table 31.
Although the variables are inversed in the ultimate predictor sample, the order of
FCPY and CPY is the same as in the non-inversed samples. Compared with a value
of 0.50, which researchers sometimes use as a cut-off, the cash flow ratios are the
only variables that are seriously correlated. The conclusion is that, in this regard,
the cheap stocks have stronger cash flow ratios than the expensive stocks. The PE
ratio is the least correlated with the discriminant score until it is transformed, after
which it becomes the third most important variable. Furthermore, the GEA
MEDIAN ratio slips from a mediocre position in the basic and ultimate classifier
samples to second to last in the ultimate predictor sample. These ambiguities in the
rankings after variable transformations suggest that rigorous conclusions cannot be
drawn from the composition of the discriminant scores.
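
The structure matrix of Table 31 is simply the vector of correlations between each discriminator variable and the discriminant scores. A minimal sketch, again with scikit-learn's linear discriminant analysis standing in for the software used in this Thesis (the estimator and data layout are assumptions):

```python
# Sketch: structure matrix = correlation of each variable with the
# discriminant scores (cf. Table 31). scikit-learn LDA is illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def structure_matrix(X: np.ndarray, y: np.ndarray, names):
    scores = LinearDiscriminantAnalysis().fit(X, y).transform(X).ravel()
    loadings = {
        name: float(np.corrcoef(X[:, j], scores)[0, 1])
        for j, name in enumerate(names)
    }
    # Sort by absolute correlation, as in the thesis tables.
    return sorted(loadings.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Example: for name, r in structure_matrix(X, y, RATIOS): print(f"{name:12s} {r:+.2f}")
```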

The two ultimate functions are finally tested on the non-cyclical sector. The
predictions to this external sample of 60 companies did not reach reasonable rates:
43 % for the ultimate classifier and 43 % for the ultimate predictor. The conclusion
is that the prediction models formed in this Thesis are not capable of out-of-sample
prediction to another sector, even though the external sample itself can be
discriminated more accurately with the same variables.

The estimated variable coefficients of the two ultimate discriminant functions are given in Table 32.

Table 32 Two ultimate discriminant functions.


The ultimate classifier              The ultimate predictor
GEA MEDIAN   -1.001                  INVERSED PE       8.454
PE            0.040                  INVERSED PB      -0.050
PB           -0.347                  INVERSED FCPY     8.281
FCPY          3.889                  INVERSED CPY      9.165
CPY           7.782                  INVERSED ROCE    22.978
ROCE          0.067                  INVERSED EBITM    2.963
Constant     -1.407                  INVERSED PS      -3.579
                                     GEA MEDIAN       -1.330
                                     Constant        -33.961
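
To illustrate how Table 32 would be applied, the sketch below evaluates the discriminant score of a single company with the ultimate classifier coefficients. The cut-off value separating the two groups, and which side corresponds to the undervalued group, are not reported in the table, so the classification rule below is only schematic:

```python
# Sketch: evaluating the ultimate classifier of Table 32 for one company.
# The cut-off score separating group I from group II is NOT given in the
# table; `cutoff` below is a placeholder that would have to be derived
# from the group centroids of the estimation sample.
ULTIMATE_CLASSIFIER = {
    "GEA_MEDIAN": -1.001, "PE": 0.040, "PB": -0.347,
    "FCPY": 3.889, "CPY": 7.782, "ROCE": 0.067,
}
CONSTANT = -1.407

def discriminant_score(ratios: dict) -> float:
    return CONSTANT + sum(coef * ratios[name]
                          for name, coef in ULTIMATE_CLASSIFIER.items())

def classify(ratios: dict, cutoff: float = 0.0) -> str:
    # Placeholder rule: the true cut-off (and which side is "undervalued")
    # must be taken from the estimation output, not from this sketch.
    return "group I" if discriminant_score(ratios) > cutoff else "group II"
```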

7 Discussion and conclusions

7.1 Evaluation of tested approaches

This Thesis examines the relationship between stock valuation and financial
ratios. In the beginning, the classification rate of the discriminant analysis was
56 %, which is too weak to be of use in practice. Later on, as outliers were
excluded, the data was transformed, various ratio combinations were analyzed and
one new ratio was introduced, a classification rate of 73 % was achieved. This
analysis suggests that the valuation level of a stock can be inferred from
combinations of financial ratios.

The sample-forming procedure used in the Thesis is built on a balanced assembly:
the same number of companies was gathered for each year and for both groups. The
major limitation of the procedure lies in the valuation phase. As the companies are
valued according to the cash flows of the following seven years, the analysis lags
seven years behind the present time. Shortening the time span was thought to
introduce volatility into the company valuations, and therefore the whole valuation
procedure should be redesigned if the time span is shortened. Besides the current
valuation method, the sample size is also limited by the market capitalization
restriction of $300m-$3bn and by the exclusion of companies with incomplete
history data; the size and quality of the historical data were poor before the 1990s.
The sample size should be greater in order to obtain a wider separation between the
two groups, so that more of the intervening companies could be excluded.

Classification and prediction rates improved as the data was harmonized by
excluding outliers or by transforming variables. When the original sample was
trimmed by 17 % through the exclusion of multivariate and univariate outliers, the
analysis was able to classify 66 % of the observations correctly. A similar
improvement was achieved by transforming the variables with the basic
transformations: square root, logarithmic and inverse. An optimal transformation
for each variable was sought on the basis of trials, tests and histograms, but in the
end the crude inverse transformation of all variables turned out to outperform the
other alternatives.

By inversing all the variables, the classification rate improved to the level of 65 %.
Thus, variable transformations and outlier exclusion are concluded to substitute for,
rather than complement, each other in data harmonization. The substitution of a
binary variable, GEA MEDIAN, for the gearing ratio further improved the
discriminating capability. The new variable simply divided the gearing ratios in half
at the median, which improved the classification rates to the level of 71 % for both
the trimmed and the inversed samples.

Finally, various ratio combinations were analyzed and a classification rate of 73 %
was achieved. The corresponding prediction rate to the hold-out sample is 57 %.
The inversed basic sample retained the second highest classification rate of 71 %
and the highest prediction rate of 61 % to the hold-out sample. The prediction rates
were further assessed with a sample from another sector, which resulted in 43 % for
both functions. While the classification rate can be deemed satisfactory, the
prediction capability of the formed functions is weak.

The conclusion is that variable transformations and outlier exclusion are useful
tools for approaching multivariate normality and better discrimination results.
Variable transformations yielded ambiguous results, and the most discriminating
results were achieved by a trial rather than by proceeding in a strictly analytical
manner. The exclusion of outliers, for its part, is easy to execute and it improved the
prediction capability, but the problem is the diminishing sample size. The
assumption of equal covariance matrices was concluded to follow the normality
assumption quite faithfully. Therefore, it seemed more like another way to assess
normality than a distinct requirement of discriminant analysis.

The conclusion is that the growth stems from strong cash flow. This is in line with
real life because, after all, a company needs hard cash in order to grow. The
importance of individual financial ratios is always of interest to economists.
Throughout the Thesis, the cash flow ratios, FCPY and CPY, attracted the most
attention. The two cash flow ratios are closely related, with a correlation as high as
0.92. Despite the correlation, it was not beneficial to remove one or the other in
almost any of the samples and ratio profiles. FCPY was the most and CPY the
second most correlated variable with the discriminant scores. Both correlations
were strong and negative, which indicates that the cheap stocks have stronger cash
flow ratios. The dichotomous classification rates of the cash flow ratios were also
strong: 64 % for FCPY and 56 % for CPY with the original basic sample.

In summary, it was found that the interpretation of individual variables in a
multivariate discriminant function is complicated and ambiguous. In some cases of
the sensitivity analysis, the composition of the discriminant score varied
significantly when the inputs were changed. This raises the question of whether it is
relevant to examine the score composition at all instead of focusing only on the
classification capability. Despite the many analyses, it was difficult to piece
together the behavior of the score compositions. That is why an iterative program
would be needed to try out many, if not all, of the combinations that can be formed
with the ratios and transformations. Probably because of the irregular data and the
multicollinearity, the stepwise procedure did not result in consistent variable
profiles. The stepwise procedure was found to require cleaner data, because it
selects the variables on the basis of a purely statistical criterion.

7.2 Suggestions for further study

For practical purposes, the time lag, currently seven years, should be shortened to
just a couple of years. Additionally, if the model values companies on the basis of
data from, say, two years ahead, the prediction should be rather accurate two years
into the future. The idea behind the long valuation time span was to protect the
valuation from momentary price fluctuations. Because of the volatility of financial
ratios and stock prices, the use of moving averages should be examined. It is
possible that smoothed data is more informative than the original data, so that the
predictive capability improves. Furthermore, smoothed data might improve the
robustness of the model, as case-specific differences between data sets might be
reduced. Tried and tested portfolios could also be used to adjust the discriminant
function: assuming that an existing portfolio consists of cheap stocks and a group of
expensive stocks is defined, the model could be used to classify the remaining
stocks in the light of that portfolio.

The financial ratios used in this Thesis were pre-determined. It would be
interesting to examine and design different kinds of ratio selection procedures in
order to find superior discriminator combinations. Five of the eight ratios in this
Thesis were pure valuation ratios, that is, they are compared with the market price.
A logical way to broaden the ratio base would be towards ratios that are not
compared with the market price. The ratio examination could also be expanded to
qualitative measures, such as management competency, working environment,
political risk, and so on. The inclusion of qualitative factors is not complicated,
because many of them are already measured by financial institutions and news
agencies. Including all the available company financials would be a practical
starting point for further study. An abundant ratio base combined with advanced
mathematical optimization methods might result in models that would be of use for
practical purposes.

There are several other multivariate techniques to study, such as logit and probit
models, which do not rest on the multivariate normality assumption, as well as the
more recent invention of neural networks. Neural networks are more complicated
than the basic MANOVA techniques, and it is time consuming to program the
models to learn from past events. Nonetheless, researchers have achieved excellent
bankruptcy prediction results with neural networks, and it would be interesting to
apply them to the problem of identifying cheap stocks. Neural networks can be used
to model complex relationships and patterns in data.

The identification of undervalued stocks could also be approached from the bottom
up, with an agent-based simulation. A model is built starting from simple rules at
the individual company level. The idea is to test how changes in individual
behaviors affect the overall system, as the system consists of numerous
heterogeneous companies. An individual agent has its own interests and limited
knowledge, and it can, for example, learn from the past. A successful simulation
generates behavior similar to empirical data, which helps in understanding the
behavior of the stock markets. Agent-based modeling attempts to re-create and
predict the behavior of complex systems that are not easily explained rationally.

Finally, various models could be combined into hybrid models that would give an
output based on a certain weighting of the outputs of the individual models. Hybrid
models are usually formed in order to achieve higher prediction robustness, because
the strengths of some models may offset the weaknesses of others. An extensive
survey of prediction robustness would be achieved by expanding further into stocks
from various countries and sectors as well as into various time frames. More
complex models do not inevitably mean better prediction results; on the contrary,
they often mean less transparency and a growing danger of overfitting. When
expanding the research, the costs and benefits should be evaluated thoroughly.

8 References

Adnan, M. & Dar, H.A. 2006, "Predicting corporate bankruptcy: where we stand?",
Corporate Governance: The International Journal of Effective Board Performance, vol. 6,
no. 1, pp. 18.

Altman, E.I. 1968, "Financial Ratios, Discriminant Analysis and the Prediction of Corporate
Bankruptcy", The Journal of Finance, vol. 23, no. 4, pp. 589-609.

Altman, E.I. 1968, "The Prediction of Corporate Bankruptcy: A Discriminant Analysis", The
Journal of Finance, vol. 23, no. 1, pp. 193-194.

Andersson, T. & Lee, E. 2006, "Financialized accounts: Restructuring and return
on capital employed in the S&P 500", Accounting Forum (Elsevier), vol. 30, no. 1, pp. 21.

Baker, M. & Wurgler, J. 2002, "Market Timing and Capital Structure", The Journal of Finance,
vol. 57, no. 1, pp. 1-32.

Beaver, W.H. 1966, "Financial Ratios as Predictors of Failure", Journal of
Accounting Research, vol. 4, no. 3, pp. 71.

Blake, D. 2000, Financial Market Analysis, 2nd edn, John Wiley & Sons, New York.

Booth, P.J. 1983, "DECOMPOSITION MEASURES AND THE PREDICTION OF
FINANCIAL FAILURE", Journal of Business Finance & Accounting, vol. 10, no. 1, pp. 67.

Brockett, P.L., Golden, L.L., Jang, J. & Yang, C. 2006, "A Comparison of Neural
Network, Statistical Methods, and Variable Choice for Life Insurers' Financial Distress
Prediction", Journal of Risk & Insurance, vol. 73, no. 3, pp. 397.

Brockett, P.L., Cooper, W.W., Golden, L.L. & Pitaktong, U. 1994, "A Neural Network Method for
Obtaining an Early Warning of Insurer Insolvency", The Journal of Risk and Insurance, vol.
61, no. 3, pp. 402-424.

Campbell, J.Y. & Shiller, R.J. 1998, "Valuation Ratios and the Long-Run Stock
Market Outlook", Journal of Portfolio Management, vol. 24, no. 2, pp. 11.

Chan, L. & Lakonishok, J. 2004, "Value and growth investing: Review and update", FINANCIAL
ANALYSTS JOURNAL, vol. 60, no. 1, pp. 71-86.

Damodaran, A. 2001, The dark side of valuation : valuing old tech, new tech, and new economy
companies, New York : Financial Times/Prentice Hall.

DeFond, M.L. 2003, "An empirical analysis of analysts' cash flow forecasts",
Journal of Accounting & Economics, vol. 35, no. 1, pp. 73.

Dimitras, A.I., Zanakis, S.H. & Zopounidis, C. 1996, "A survey of business failures
with an emphasis on prediction methods and industrial applications", European Journal of
Operational Research, vol. 90, no. 3, pp. 487.

Fama, E.F. 1970, "EFFICIENT CAPITAL MARKETS: A REVIEW OF THEORY
AND EMPIRICAL WORK", Journal of Finance, vol. 25, no. 2, pp. 383.

Fama, E.F. & French, K.R. 1995, "Size and Book-to-Market Factors in Earnings and Returns", The
Journal of Finance, vol. 50, no. 1, pp. 131-155.

Fama, E.F. & French, K.R. 1992, "The Cross-Section of Expected Stock Returns", The Journal of
Finance, vol. 47, no. 2, pp. 427-465.

FILLIBEN, J. 1975, "PROBABILITY PLOT CORRELATION COEFFICIENT TEST FOR
NORMALITY", TECHNOMETRICS, vol. 17, no. 1, pp. 111-117.

Fisher, R.A. 1936, "The use of multiple measurements in taxonomic problems", Annals Eugen,
vol. 7, pp. 179-188.

Frank, M. & Jagannathan, R. 1998, "Why do stock prices drop by less than the value of the
dividend? Evidence from a country without taxes", Journal of Financial Economics, vol. 47,
no. 2, pp. 161.

Frank, R.E., Massy, W.F. & Morrison, G. 1965, "Bias in Multiple Discriminant
Analysis", Journal of Marketing Research (JMR), vol. 2, no. 3, pp. 250.

Gnanadesikan, R. 1977, Methods for Statistical Data Analysis of Multivariate Observations 1st
edn, John Wiley & Sons.

Goetzmann, W.N. & Jorion, P. 1995, "A Longer Look at Dividend Yields", The Journal of
Business, vol. 68, no. 4, pp. 483-508.

Gordon, M.J. 1962, "[Security and a Financial Theory of Investment]: Reply", The Quarterly
Journal of Economics, vol. 76, no. 2, pp. 315-319.

Gordy, M.B. 2000, "A comparative anatomy of credit risk models", Journal of Banking &
Finance, vol. 24, no. 1, pp. 119.

Griffin, J.M. 1988, "A Test of the Free Cash Flow Hypothesis: Results from the Petroleum
Industry", The Review of Economics and Statistics, vol. 70, no. 1, pp. 76-82.

Hagstrom, R.G. 2001, The Essential Buffett: Timeless Principles for the New Economy, John Wiley
& Sons.

Helfert, E.A. 2001, Financial analysis [electronic resource]: tools and techniques: a guide for
managers, McGraw-Hill, New York.

Hoover, S. 2006, Stock valuation [electronic resource]: an essential guide to Wall Street's most
popular valuation models, McGraw-Hill, New York.

Hovakimian, A., Opler, T. & Titman, S. 2001, "The debt-equity choice", JOURNAL OF
FINANCIAL AND QUANTITATIVE ANALYSIS, vol. 36, no. 1, pp. 1-24.

JENSEN, M. 1986, "AGENCY COSTS OF FREE CASH FLOW, CORPORATE-FINANCE,
AND TAKEOVERS", AMERICAN ECONOMIC REVIEW, vol. 76, no. 2, pp. 323-329.

JOHNSON, R.A. & WICHERN, D.W. 1987, Applied Multivariate Statistical Analysis, Longman
Higher Education.

Lachenbruch, P.A., Sneeringer, C. & Revo, L.T. 1973, "Robustness of linear and quadratic
discriminant function to certain types of non-normality", Communications in Statistics -
Theory and Methods, vol. 1, no. 1, pp. 39-56.

Lachenbruch, P.A. 1975, "Discriminant Analysis", Macmillan Pub Co, New York.

Laitinen, T., Back, B., Sere, K. & Wezel, M. 1995, "Choosing Bankruptcy Predictors Using
Discriminant Analysis, Logit Analysis and Genetic Algorithms", Proceedings of the first
International Meeting on Artificial Intelligence in Accounting, Finance and Tax, pp. 337-
356.

Laitinen, E.K. & Laitinen, T. 1998, "CASH MANAGEMENT BEHAVIOR AND
FAILURE PREDICTION", Journal of Business Finance & Accounting, vol. 25, no. 7, pp.
893.

LeRoy, S.F. & Porter, R.D. 1981, "THE PRESENT-VALUE RELATION: TESTS
BASED ON IMPLIED VARIANCE BOUNDS", Econometrica, vol. 49, no. 3, pp. 555.

Liu, J., Nissim, D. & Thomas, J. 2007, "Is Cash Flow King in Valuations?", Financial
Analysts Journal, vol. 63, no. 2, pp. 56.

Luenberger, D.G. 1997, Investment Science, Oxford University Press, New York.

Mannila, H., Smyth, P. & Hand, D.J. 2001, Principles of Data Mining (Adaptive Computation and
Machine Learning), The MIT Press.

Marks, S. & Dunn, O.J. 1974, "Discriminant Functions When Covariance Matrices Are
Unequal", Journal of the American Statistical Association, vol. 69, no. 346, pp. 555.

MICHAUD, R. & DAVIS, P. 1982, "VALUATION MODEL BIAS AND THE SCALE
STRUCTURE OF DIVIDEND DISCOUNT RETURNS", Journal of Finance, vol. 37, no. 2,
pp. 563-573.

MILLER, M. 1977, "DEBT AND TAXES", Journal of Finance, vol. 32, no. 2, pp. 261-275.

MITRA, D., BISWAS, A. & OWERS, J. 1991, "A DIRECT TEST OF THE FREE CASH FLOW
HYPOTHESIS", Financial Management, vol. 20, no. 1, pp. 13-14.

MODIGLIANI, F. & MILLER, M. 1958, "THE COST OF CAPITAL, CORPORATION
FINANCE AND THE THEORY OF INVESTMENT", AMERICAN ECONOMIC REVIEW,
vol. 48, no. 3, pp. 261-297.

Murphy, J.J. 1999, Technical Analysis of the Financial Markets: A Comprehensive Guide to
Trading Methods and Applications, New York Institute of Finance, New York.

Murphy, K.J. 1985, "CORPORATE PERFORMANCE AND MANAGERIAL
REMUNERATION An Empirical Analysis", Journal of Accounting & Economics, vol. 7,
no. 1, pp. 11.

Myers, S.C. 1984, "The Capital Structure Puzzle", The Journal of Finance, vol. 39, no. 3, pp. 575-
592.

Ohlson, J.A. 1980, "Financial Ratios and the Probabilistic Prediction of
Bankruptcy", Journal of Accounting Research, vol. 18, no. 1, pp. 109.

Osborne, J. 2002, "Notes on the use of data transformations.", Practical Assessment, Research &
Evaluation, vol. 8, no. 6.

Park, Y.S. & Lee, J. 2003, "An empirical study on the relevance of applying relative valuation
models to investment strategies in the Japanese stock market", Japan & the World Economy,
vol. 15, no. 3, pp. 331.

Penman, S.H. 1996, "The Articulation of Price-Earnings Ratios and Market-to-
Book Ratios and the Evaluation of Growth", Journal of Accounting Research, vol. 34, no. 2,
pp. 235.

Ross, S.A., Westerfield, R.W., Jaffe, J. & Ku, S. 1999, Corporate Finance, McGraw-Hill College.

Senchack Jr., A.J. & Martin, J.D. 1987, "The Relative Performance of the PSR
and PER Investment Strategies", Financial Analysts Journal, vol. 43, no. 2, pp. 46.

Sharma, S. 1996, Applied Multivariate Techniques, New York : John Wiley.

Shiller, R.J. 1981, "Do Stock Prices Move Too Much to be Justified by Subsequent
Changes in Dividends?", American Economic Review, vol. 71, no. 3, pp. 421.

Smith, K.A. & Gupta, J.N.D. 2002, Neural networks in business [electronic resource]: techniques
and applications, Idea Group Publishing, Hershey, PA.

Stracca, L. 2004, "Behavioral finance and asset prices: Where do we stand?", Journal of Economic
Psychology, vol. 25, no. 3, pp. 373.

Tabachnick, B.G. & Fidell, L.S. 2000, Using Multivariate Statistics, 4th edn, Allyn & Bacon.

Yang, Z.R., Platt, M.B. & Platt, H.D. 1999, "Probabilistic Neural Networks in
Bankruptcy Prediction", Journal of Business Research, vol. 44, no. 2, pp. 67.

Zapranis, A. & Ginoglou, D. 2000, "FORECASTING CORPORATE FAILURE WITH
NEURAL NETWORK APPROACH: THE GREEK CASE", Journal of Financial
Management & Analysis, vol. 13, no. 2, pp. 11.

Zavgren, C.V. 1985, "ASSESSING THE VULNERABILITY TO FAILURE OF
AMERICAN INDUSTRIAL FIRMS: A LOGISTIC ANALYSIS", Journal of Business
Finance & Accounting, vol. 12, no. 1, pp. 19.
