
Time Series Analysis

an introduction

Sean R. Avent

Introduction - Goals - Data Types - Autocorrelation - Correlograms - Box-Jenkins Models - Frequency Analysis - References

A time series is defined as a collection of observations made sequentially in time, with equal intervals of time between observations.

Introduction
This page is designed for those who have a basic knowledge of elementary statistics and need a short introduction to time series analysis. Many references are included for those who want to probe further into the subject, which is recommended before applying these methods. This guide should help readers decide whether these are the right techniques to apply to their data, and it gives a quick summary of the basics involved. A number of statistical packages are available for analyzing the data.

Goals of Time Series Analysis


Time series analysis can be used to accomplish different goals:

1) Descriptive analysis determines what trends and patterns a time series has by plotting or using more complex techniques.
The most basic approach is to graph the time series and look at:

Overall trends (increase, decrease, etc.)
Cyclic patterns (seasonal effects, etc.)
Outliers – points of data that may be erroneous
Turning points – different trends within a data series

2) Spectral analysis is carried out to describe how variation in a time series may be accounted for by cyclic components.
This may also be referred to as "Frequency Domain". With this an estimate of the spectrum over a range of frequencies can
be obtained and periodic components in a noisy environment can be separated out.

Example: What is seen in the ocean as random waves may actually be a number of different frequencies and amplitudes that
are quite stable and predictable. Spectral analysis is used on the wave height vs. time to determine which frequencies are
most responsible for the patterns that are there, but can’t be readily seen without analysis.

3) Forecasting can do just that - if a time series has behaved a certain way in the past, the future behavior can be predicted
within certain confidence limits by building models.

Example: Tidal charts are predictions based upon tidal heights in the past. The known components of the tides (e.g.,
positions of the moon and sun and their weighted values) are built into models that can be employed to predict future values
of the tidal heights.

4) Intervention analysis determines whether a certain event has changed a time series. This technique is often used in planned experimental analysis. In other words, 'Is there a change in a time series before and after a certain
event?'
Example: 1. If a plant's growth rate differs before and after a change in the amount of light it receives, an
intervention has occurred: the change in light is the intervention. 2. When a community of goats changes its behavior after a
bear shows up in the area, there may be an intervention.

5) Explanative Analysis (Cross Correlation)


Using one or more explanatory time series, a mechanism that produces a dependent time series can be estimated. A common
question to be answered with this analysis would be "What relationship is there between two time series data sets?" This
topic is not discussed within this page although it is discussed in Chatfield (1996) and Box et al. (1994).

Example: Atmospheric pressure and seawater temperature affect sea level. All of these data are in time series and can relate
how and to what degree pressure and temperature affect the sea level.

Types of Time Series Data


Continuous vs. Discrete

Continuous - observations made continuously in time

Examples:
1. Seawater level as measured by an automated sensor.
2. Carbon dioxide output from an engine.

Discrete - observations made only at certain times.

Examples:
1. Animal species composition measured every month.
2. Bacteria culture size measured every six hours.

Stationary vs. Non-stationary

Stationary - Data that fluctuate around a constant value

Non-stationary - A series in which the parameters of the cycle (i.e., length, amplitude, or phase) change over time

Deterministic vs. Stochastic

Deterministic time series - Data that can be predicted exactly.

Stochastic time series - Data are only partly determined by past values, and future values have to be described with a
probability distribution. This is the case for most, if not all, natural time series. So many factors are involved in a natural
system that we cannot possibly account for all of them.

Transformations of the Data

We can transform data to:

1. Stabilize the variance - use the logarithmic transformation
2. Make the seasonal effect additive - this makes the effect constant from year to year - use the logarithmic transformation
3. Make the data normally distributed - this reduces the skewness in the data so that we may apply appropriate statistics - use the Box-Cox (logarithmic and square root) transformations

There are many more transformations, not discussed here, that are available for the many different things we may want to do with time series data. These are discussed in the various texts listed throughout this page.
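
As an illustration (not part of the original page), here is a minimal Python sketch of the logarithmic and Box-Cox transformations, assuming a positive-valued example series:

```python
# A minimal sketch of transformations 1-3, assuming a positive-valued
# example series; scipy's boxcox estimates the power parameter lambda.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = np.exp(rng.normal(size=200))      # skewed, positive example data

y_log = np.log(y)                     # stabilizes variance and makes a
                                      # multiplicative seasonal effect additive
y_bc, lam = stats.boxcox(y)           # Box-Cox: picks the lambda that best
print(f"Box-Cox lambda = {lam:.2f}")  # normalizes the data (log when lambda = 0)
```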
Autocorrelation
A series of data may have observations that are not independent of one another.

Example:
A population density on day 8 depends on what that population density was on day 7. Likewise, that in
turn depends on day 6, and so forth.

The order of these data has to be taken into account so that we can assess the autocorrelation involved.

To find out if autocorrelation exists:


Autocorrelation Coefficients measure correlations between observations a certain distance apart.
Based on the ordinary correlation coefficient 'r' (see Zar for a full explanation), we can see if successive observations are
correlated. The autocorrelation coefficient at lag k, the covariance of (xt, xt+k) divided by the variance of xt, can be found by:

rk = [ Σt=1…N−k (xt − x̄)(xt+k − x̄) ] / [ Σt=1…N (xt − x̄)² ]

An rk value beyond ±2/√N denotes a significant difference from zero and signifies an autocorrelation.
Also note that as k gets large, rk becomes smaller.
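
A short Python sketch of this computation (an illustration, not from the original page):

```python
# Autocorrelation coefficient r_k: covariance at lag k divided by the
# variance, with the +-2/sqrt(N) significance bound from the text.
import numpy as np

def acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    var = np.sum(d ** 2)
    return np.array([np.sum(d[:n - k] * d[k:]) / var
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
x = np.sin(np.linspace(0, 20, 100)) + rng.normal(0, 0.3, 100)  # example series
r = acf(x, 20)
bound = 2 / np.sqrt(len(x))
print(np.flatnonzero(np.abs(r) > bound) + 1)   # lags with significant r_k
```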

Another test for the presence or absence of autocorrelation is the Durbin-Watson d-statistic, computed in its standard form from the residuals et:

d = [ Σt=2…N (et − et−1)² ] / [ Σt=1…N et² ]

Fig. 1 shows the five regions of values in which autocorrelation is accepted or not.

Figure 1. The five regions of the Durbin-Watson d-statistic.
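
In Python, statsmodels provides this statistic directly (a sketch with stand-in residuals):

```python
# Durbin-Watson d on a series of residuals; d near 2 suggests no
# autocorrelation, d << 2 positive, d >> 2 negative autocorrelation.
import numpy as np
from statsmodels.stats.stattools import durbin_watson

resid = np.random.default_rng(2).normal(size=100)   # stand-in residuals
print(f"d = {durbin_watson(resid):.2f}")
```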

A Note on Non-Stationary Data

As stated above, non-stationary data have cycle parameters that change over time. Any trend must be removed before
calculating rk and the resulting correlograms seen below; without trend removal, the trend will dominate the other features
of the data.

Correlograms
The autocorrelation coefficient rk can then be plotted against the lag k to develop a correlogram. This gives a
visual overview of the correlation coefficients at the relevant time lags so that significant values can be spotted.
The correlogram in Fig. 2 shows short-term correlation that is significant at low k and small correlation at longer lags.
Remember that an rk value beyond ±2/√N denotes a significant difference (α = 0.05) from zero and signifies an
autocorrelation. Some procedures call for a more stringent α, since at α = 0.05 one out of every
twenty observations in a truly random data series will appear significant by chance.

Figure 2. A time series showing short-term autocorrelation together with its correlogram.
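
Although the original page works in SPSS, a correlogram with its significance band can be produced in one call in Python (a sketch with a simulated short-term-correlated series):

```python
# Correlogram of a simulated AR(1)-like series; bars outside the shaded
# band differ significantly from zero (approximately +-2/sqrt(N)).
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(3)
x = np.zeros(200)
for t in range(1, 200):
    x[t] = 0.7 * x[t - 1] + rng.normal()   # short-term autocorrelation

plot_acf(x, lags=30)
plt.show()
```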

Fig. 3 shows an alternating (negatively correlated) time series.

The coefficient rk alternates in sign just as the raw data does (r1 is negative, r2 is positive, and so on, with the odd lags negative).

Figure 3. An alternating time series with its correlogram.

A greater discussion on the correlograms and associated periodograms can be found in Chatfield (1996), Naidu (1996), and
Warner (1998).
Box-Jenkins Models (Forecasting)
Box and Jenkins developed the AutoRegressive Integrated Moving Average (ARIMA) model, which combines the
AutoRegressive (AR) and Moving Average (MA) models developed earlier with a differencing factor that removes any trend in
the data.

A time series can be expressed as: Y1, Y2, Y3, …, Yt-1, Yt

With random shocks (a) at each corresponding time: a1, a2, a3, …, at-1, at

In order to model a time series, we must state some assumptions about these 'shocks'. They have:

1. a mean of zero
2. a constant variance
3. no covariance between shocks
4. a normal distribution (although there are procedures for dealing with this)

An ARIMA (p,d,q) model is composed of three elements:

p: Autoregression
d: Integration or Differencing
q: Moving Average

A simple ARIMA (0,0,0) model without any of the three processes above is written as:

Yt = at

The autoregression process [ARIMA (p,0,0)] refers to how important previous values are to the current one over time. A
data value at t1 may affect the data values of the series at t2 and t3, but the influence of the value at t1 decays exponentially
as time passes, so the effect eventually approaches zero. It should be pointed out that φ is constrained between -1 and
1, and as it becomes larger, the effects at all subsequent lags increase.

Yt = φ1Yt-1 + at

The integration process [ARIMA (0,d,0)] differences the series to remove the trend and drift of the data (i.e., it makes
non-stationary data stationary). The first observation is subtracted from the second, the second from the third, and so on. The
final form without AR or MA processes is the ARIMA (0,1,0) model:

Yt = Yt-1 + at

The order of the process rarely exceeds one (d < 2 in most situations).

The moving average process [ARIMA (0,0,q)] is used for serially correlated data. The process is composed of the current
random shock and portions of the q previous shocks. An ARIMA (0,0,1) model is written:

Yt = at − θ1at-1

As with the integration process, the MA process rarely exceeds the first order.
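
As a sketch of how such a model is fit in practice (the page itself uses SPSS; this Python/statsmodels equivalent is an assumption):

```python
# Fit an ARIMA(1,1,1) -- one AR term, one difference, one MA term --
# to a trending example series, then forecast ahead.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(loc=0.1, size=200))   # non-stationary example data

res = ARIMA(y, order=(1, 1, 1)).fit()          # order = (p, d, q)
print(res.summary())
print(res.forecast(steps=10))                  # model-based predictions
```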

Time Series Intervention Analysis


(or Interrupted Time Series Analysis)
The basic question is "Has an event had an impact on a time series?"
The null hypothesis is that the level of the series before the intervention (bpre) is the same as the level of the series after the
intervention (bpost), or:

H0: bpre − bpost = 0

After building the ARIMA model, an intervention term (It) can be added; the ARIMA portion of the equation becomes the noise
component (Nt):

Yt = f(It) + Nt

The intervention component can be of four different types that are described by their onset and duration characteristics (Fig.
4):

Figure 4. Types of intervention components. From McDowall et al. (1980).
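
A minimal sketch of one such component, an abrupt and permanent (step) intervention, entered as an exogenous dummy alongside the ARIMA noise model (hypothetical data and event time):

```python
# Step intervention: I_t = 0 before the event, 1 after; its coefficient
# estimates the level shift b_post - b_pre.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
n, event = 200, 120
y = 5.0 + rng.normal(size=n)
y[event:] += 2.0                               # simulated permanent shift

step = (np.arange(n) >= event).astype(float)   # the intervention term I_t
res = ARIMA(y, exog=step, order=(1, 0, 0)).fit()
print(res.params)                              # exog coefficient near 2.0
```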

Frequency Analysis
(Frequency Domain)
Frequency analysis is used to decompose a time series into an array of sine and cosine functions that can be plotted by
their wavelengths. This spectrum of wavelengths can be analyzed to determine which are most relevant (see Fig. 5). In
Fig. 5 you can't tell what the major components of the raw data are, but once a spectral analysis is completed, you can pick out
the relevant wavelengths.
In all of these analyses, the data are assumed to be stationary. If they are not, a filter should be applied to the data
before carrying out the appropriate analysis. All angles are presented in radians.
Figure 5. Frequency analysis data sets. The top four plots are the raw data, whereas the bottom
four are the periodograms for the top four (not in order).

A Harmonic Analysis (a type of regression analysis) is used to fit a model when the period or cycle length is known a priori.
This can estimate the amplitude, cycle phase, and mean.

Xt = μ + A cos(ωt) + B sin(ωt) + εt

ω = 2π/τ (we know what the period τ is)
t = observation time or number
A and B = coefficients
ε = residuals that are uncorrelated

Given τ, we can use OLS regression methods to estimate the amplitude and the phase of the cycle.

Amplitude: R = (A² + B²)^1/2

Phase: φ = arctan(−B, A)

Using SPSS, we run a multiple regression with sin ωt and cos ωt as predictors to obtain estimates of A and B.
Once we have these, we can calculate the amplitude and phase, and the model is fit.
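
The same fit can be sketched with ordinary least squares in Python (mirroring the SPSS multiple-regression approach; the period used here is a made-up example):

```python
# Harmonic regression with known period tau: regress on cos and sin,
# then recover amplitude R and phase phi from the coefficients.
import numpy as np

rng = np.random.default_rng(6)
t = np.arange(120.0)
tau = 12.0                                  # known period (a priori)
w = 2 * np.pi / tau
x = 3 + 2 * np.cos(w * t) + 1 * np.sin(w * t) + rng.normal(0, 0.5, t.size)

X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
mu, A, B = np.linalg.lstsq(X, x, rcond=None)[0]   # OLS estimates

R = np.hypot(A, B)              # amplitude = sqrt(A^2 + B^2)
phi = np.arctan2(-B, A)         # phase
print(f"mean={mu:.2f}, R={R:.2f}, phi={phi:.2f} rad")
```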

A Periodogram or Spectral Analysis is used if there is no reason to suspect a certain period or cycle length. These methods
fit a suite of cycles of differing lengths or periods to the data.

Periodogram Analysis

To find which sinusoids describe the data and to what degree, a generalization of the harmonic analysis is
applied to the residuals of the data. The overall SS variance is partitioned into N/2 periodic components, each
with df = 2. A harmonic analysis is then done on each component and summed in an ANOVA source table.
From this we get estimates of A and B (SS's) for each component, and because they are additive to the SStotal, we can
get variance estimates for each component.
The null hypothesis is that the variances are all the same, which is indicative of white noise. The result is plotted
with intensity (SS) on the Y-axis and frequency on the X-axis. A large peak represents a
frequency that contributes significantly to the variance of the data.

Xt = μ + Σk [Ak cos(ωkt) + Bk sin(ωkt)]

Dependent variable: Xt = time series
Independent variables: A = cosine regression coefficient; B = sine regression coefficient

A and B determine the degree to which each function is correlated with the data.

ωk = 2πνk

Since the sine and cosine functions are orthogonal (mutually independent), periodogram values (Pk) are created
and plotted against frequency. These values are interpreted as the variances of the frequencies.

Pk = (Ak² + Bk²) × N/2

Pk = periodogram value at frequency νk
N = overall length of series
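
A sketch of the same idea with scipy (assumed equivalent to the procedure above; the two embedded periods are made-up examples):

```python
# Periodogram values P_k plotted against frequency nu_k; the two largest
# peaks should sit near the embedded frequencies 1/16 and 1/5.
import numpy as np
from scipy import signal

rng = np.random.default_rng(7)
t = np.arange(256)
x = (np.sin(2 * np.pi * t / 16) + 0.5 * np.sin(2 * np.pi * t / 5)
     + rng.normal(0, 1, t.size))

freqs, P = signal.periodogram(x)
print(freqs[np.argsort(P)[-2:]])   # dominant frequencies (cycles/sample)
```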

Leakage
Since the true data are not sampled continuously, a significant period's peak may leak into adjacent
frequencies. To alleviate this problem, the following are suggested:

1. Padding
2. Tapering or windowing
3. Smoothing

These methods can be found in Warner (1998), Chatfield (1996), and Gardner (1988).
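
Padding and tapering can both be requested in the same scipy call (a sketch; smoothing is shown in the next section):

```python
# Hann taper (windowing) reduces leakage from the peak at 1/16;
# nfft=1024 zero-pads the 256-point series onto a finer frequency grid.
import numpy as np
from scipy import signal

t = np.arange(256)
x = np.sin(2 * np.pi * t / 16)                 # example series

f_raw, P_raw = signal.periodogram(x)           # rectangular window: leaks
f_tap, P_tap = signal.periodogram(x, window='hann', nfft=1024)
```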

Spectral Analysis

In order to get a power spectrum, we must smooth the periodogram so that each periodogram
intensity is replaced by an average that includes weighted neighboring values. This gives a better and more
reliable picture of the distribution of power (or variance accounted for). Smoothing procedures can differ by
window width and weighting function.
Fourier frequencies are chosen with the longest cycle equal to the length of the series and the shortest cycle
having a period of two observations. All frequencies in between are equally spaced and don't overlap. A Fast Fourier
Transform uses the Euler relation involving complex numbers (Chatfield, 1996) and is too math-intensive to
do practically by hand. SPSS has a fast Fourier transform built in for these analyses.
Spectrum analysis significance tests use upper and lower bounds of a confidence interval derived from
a χ² distribution. The degrees of freedom depend on what kind of smoothing was used. This confidence
interval can be superimposed on the power spectrum so that significant values may be seen.
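
Welch's method, which averages windowed segments of the series, is one common way to obtain such a smoothed spectrum (a Python sketch standing in for the weighted-window smoothing described above):

```python
# Smoothed power spectrum: average the periodograms of overlapping,
# Hann-windowed 256-point segments (Welch's method).
import numpy as np
from scipy import signal

rng = np.random.default_rng(8)
t = np.arange(1024)
x = np.sin(2 * np.pi * t / 16) + rng.normal(0, 1, t.size)

f, Pxx = signal.welch(x, nperseg=256)
print(f[np.argmax(Pxx)])     # peak near 1/16 = 0.0625 cycles/sample
```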

For a more complete description see any one of the spectral analysis books listed below, but especially Chatfield
(1996) and Warner (1998).

References
There are many references out there for time series analysis. Most refer to applications involving econometrics or the social
sciences, but most techniques can be applied to the biological sciences. Most of the web pages involve very advanced
theories and techniques.

Books held by the SFSU Library


Box, G.E.P., G.M. Jenkins, and G.C. Reinsel. 1994. Time series analysis – Forecasting and control. 3rd ed. Prentice Hall, Englewood Cliffs, NJ, USA. A great introductory section, although the rest of the book is very involved and mathematically in-depth.

Chatfield, C. 1996. The analysis of time series – an introduction. 5th ed. Chapman and Hall, London, UK. A very good and readable book that goes over most aspects of time series data. Highly recommended.

Gardner, W.H. 1988. Statistical spectral analysis – A nonprobabilistic theory. Prentice-Hall Inc., Englewood Cliffs, NJ, USA. An in-depth book with advanced features and methods.

Harvey, H.C. 1981. Time series models. Halstead Press, New York, NY, USA. A moderately involved book with some understandable sections on model building.

McDowall, D., R. McCleary, E.E. Meidinger, and R.A. Hay Jr. 1980. Interrupted time series analysis. Sage Publications, Inc., Thousand Oaks, CA, USA. A good book in the Sage series on intervention analysis that covers the basics quite well. Very readable.

Naidu, P.S. 1996. Modern spectrum analysis of time series. CRC Press Inc., Boca Raton, FL, USA. A complete account of spectrum analysis, but very involved and assumes great comfort with basic statistics.

Ostrom, C.W. 1978. Time series analysis: regression techniques. Sage Publications, Beverly Hills, CA, USA. A good and short book in the Sage series that goes over the basics with decent ease.

Warner, R.M. 1998. Spectral analysis of time-series data. Guilford Press, New York, NY, USA. A very good book on spectral analysis that is especially good with experimental design and data collection/entry.

Other Books

SPSS for Beginners - $5.95. This can be downloaded as a PDF (Acrobat Reader) file for a small fee. Chapter 17, Time Series Analysis, can be downloaded separately for free from the SPSS site.

Internet Resources
An online textbook from Statsoft that covers most aspects of time series analysis. Very complete and readable.

Autobox tutorial. A rather bulky tutorial on ARIMA models.

Carnegie Mellon University - Datasets. Very wide range of datasets to play with.

Rob J Hyndman's Forecasting Pages. A set of pages with everything forecasting.

Time Series Analysis and Chaosdynamics - Rotating Fluids. A very in-depth page on advanced time series analysis.

Forum: sci.stat.edu. If you get stuck, you can post a question to this forum…

Forum: sci.stat.consult. Or this forum…

Forum: comp.soft-sys.stat.spss. Or, for any SPSS question, use this forum.


Journals

Journal of Time Series Analysis

Journal of Forecasting

International Journal of Forecasting

Software
SPSS - A great statistics package.

AFS - Autobox - Looks useful, but I haven't played with it. Starts at $400 and goes up from there. Forecasting and intervention analysis.

UCLA Statistics Bookmark Database - Lots of software. Need software? Look here!

Page last updated 14 Dec 1999.
