an introduction
Sean R. Avent
A time series is defined as a collection of observations made sequentially in time. The methods described here assume that the observations are separated by equal intervals of time.
Introduction
This page is designed for those who have a basic knowledge of elementary statistics and need a short introduction to time-series analysis. Many references are included for those who need to probe further into the subject, which is recommended if these methods are to be applied. This guide should help readers decide whether these are the correct methods to use on their data, and it gives a quick summary of the basics involved. A number of statistical packages are available for analyzing the data.
1) Descriptive analysis determines what trends and patterns a time series has by plotting or using more complex techniques.
The most basic approach is to graph the time series and look at:
2) Spectral analysis is carried out to describe how variation in a time series may be accounted for by cyclic components.
This may also be referred to as "Frequency Domain". With this an estimate of the spectrum over a range of frequencies can
be obtained and periodic components in a noisy environment can be separated out.
Example: What is seen in the ocean as random waves may actually be a number of different frequencies and amplitudes that
are quite stable and predictable. Spectral analysis is used on the wave height vs. time to determine which frequencies are
most responsible for the patterns that are there, but can’t be readily seen without analysis.
3) Forecasting can do just that - if a time series has behaved a certain way in the past, the future behavior can be predicted
within certain confidence limits by building models.
Example: Tidal charts are predictions based upon tidal heights in the past. The known components of the tides (e.g.,
positions of the moon and sun and their weighted values) are built into models that can be employed to predict future values
of the tidal heights.
4) Intervention analysis can determine whether a certain event changes a time series. This technique is often used in planned experiments. In other words, 'Is there a change in a time series before and after a certain event?'
Example: 1. If a plant's growth rate before changing the amount of light it gets is different from that afterwards, an
intervention has occurred - the change in light is the intervention. 2. When a community of goats changes its behavior after a
bear shows up in the area, then there may be an intervention.
Example: Atmospheric pressure and seawater temperature affect sea level. All of these data are in time series and can relate
how and to what degree pressure and temperature affect the sea level.
Examples:
1. Seawater level as measured by an automated sensor.
2. Carbon dioxide output from an engine.
Examples:
1. Animal species composition measured every month.
2. Bacteria culture size measured every six hours.
Non-stationary - A series whose cycle parameters (i.e., length, amplitude or phase) change over time.
Stochastic time series - Data are only partly determined by past values, and future values have to be described with a probability distribution. This is the case for most, if not all, natural time series. So many factors are involved in a natural system that we cannot possibly account for all of them.
There are many more transformations, not discussed here, that can be applied to time series data for different purposes. These are discussed in the various texts listed throughout this page.
Autocorrelation
A series of data may have observations that are not independent of one another.
Example:
The population density on day 8 depends on what the population density was on day 7, which in turn depends on day 6, and so forth.
The order of these data has to be taken into account so that we can assess the autocorrelation involved.
As another test for the presence or absence of autocorrelation, the Durbin-Watson d-statistic can be employed.
Fig. 1 shows the five regions of values in which autocorrelation is accepted or not.
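The d-statistic is conventionally computed from regression residuals e_t as d = Σ(e_t − e_{t−1})² / Σ e_t², with values near 2 suggesting no first-order autocorrelation and values well below 2 suggesting positive autocorrelation. A minimal Python sketch of that calculation, assuming NumPy is available; the function name durbin_watson and the simulated residuals are illustrative and not part of the original page:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson d: squared successive differences over the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)      # fixed seed so the sketch is repeatable
white = rng.normal(size=500)        # independent residuals

# Positively autocorrelated residuals (illustrative AR(1), coefficient 0.8)
correlated = np.zeros(500)
for t in range(1, 500):
    correlated[t] = 0.8 * correlated[t - 1] + white[t]

d_white = durbin_watson(white)
d_correlated = durbin_watson(correlated)
print(f"independent residuals: d = {d_white:.2f}")
print(f"correlated residuals:  d = {d_correlated:.2f}")
```

The computed d still has to be compared with tabulated lower and upper bounds for the sample size and number of regressors to decide which region it falls in.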
As stated above, non-stationary data have cycle parameters that change over time. This is a trend that must be removed before the calculation of r_k and the resulting correlograms seen below. Without this trend removal, the trend will tend to dominate the other features of the data.
Correlograms
The autocorrelation coefficient r_k can then be plotted against the lag (k) to develop a correlogram. This gives a visual look at a range of correlation coefficients at relevant time lags so that significant values may be seen.
The correlogram in Fig. 2 shows a short-term correlation that is significant at low k, with small correlation at longer lags. Remember that an r_k value beyond ±2/√N denotes a significant difference (α = 0.05) from zero and signifies an autocorrelation. Some procedures may call for a stricter significance level, since this criterion means that about one out of every twenty coefficients in a truly random data series is expected to appear significant.
Figure 2. A time series showing short-term autocorrelation together with its correlogram.
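As a rough illustration of how a correlogram is built, the sketch below computes r_k with the common estimator r_k = Σ(x_t − x̄)(x_{t+k} − x̄) / Σ(x_t − x̄)² and flags lags beyond ±2/√N. It assumes NumPy; the function name autocorr and the simulated series are hypothetical, chosen only to mimic short-term correlation:

```python
import numpy as np

def autocorr(x, max_lag):
    """Autocorrelation coefficients r_1 .. r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[:len(d) - k] * d[k:]) / denom
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
n = 400
noise = rng.normal(size=n)

# Illustrative series with short-term correlation (each value leans on the last)
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.7 * series[t - 1] + noise[t]

r = autocorr(series, max_lag=20)
bound = 2 / np.sqrt(n)                              # approximate 5% limits
significant_lags = np.flatnonzero(np.abs(r) > bound) + 1

for k, rk in enumerate(r, start=1):
    flag = "*" if abs(rk) > bound else " "
    print(f"lag {k:2d}  r_k = {rk:+.3f} {flag}")
```

Plotting r against the lag with any plotting library, with horizontal lines at ±bound, gives the usual correlogram picture.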
A greater discussion on the correlograms and associated periodograms can be found in Chatfield (1996), Naidu (1996), and
Warner (1998).
Box-Jenkins Models (Forecasting)
Box and Jenkins developed the AutoRegressive Integrated Moving Average (ARIMA) model, which combined the AutoRegressive (AR) and Moving Average (MA) models developed earlier with a differencing factor that removes the trend in the data.
This time series data can be expressed as: Y_1, Y_2, Y_3, …, Y_{t-1}, Y_t
With random shocks (a) at each corresponding time: a_1, a_2, a_3, …, a_{t-1}, a_t
In order to model a time series, we must state some assumptions about these 'shocks'. They have:
1. a mean of zero
2. a constant variance
3. no covariance between shocks
4. a normal distribution (although there are procedures for dealing with departures from normality)
p: Autoregression
d: Integration or Differencing
q: Moving Average
A simple ARIMA (0,0,0) model without any of the three processes above is written as:
Y_t = a_t
The autoregression process [ARIMA (p,0,0)] refers to how important previous values are to the current one over time. A data value at t1 may affect the data value of the series at t2 and t3, but the influence of the value at t1 decreases exponentially as time passes, so that the effect falls to near zero. It should be pointed out that φ is constrained between -1 and 1 and, as its magnitude becomes larger, the effects at all subsequent lags increase.
Y_t = φ_1 Y_{t-1} + a_t
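The exponential decay can be seen by following a single unit shock through the first-order autoregressive equation with all later shocks set to zero. A small Python sketch under that assumption, using NumPy; the function name ar1_impulse_response is illustrative:

```python
import numpy as np

def ar1_impulse_response(phi, n_steps):
    """Effect of one unit shock at time 0 on later values of an AR(1) series."""
    y = np.zeros(n_steps)
    y[0] = 1.0                      # the single shock; later shocks are zero
    for t in range(1, n_steps):
        y[t] = phi * y[t - 1]       # each step keeps a fraction phi of the last
    return y

for phi in (0.3, 0.8):
    print(f"phi = {phi}:", np.round(ar1_impulse_response(phi, 6), 3))
```

The effect at lag k is phi raised to the power k, so a coefficient closer to 1 in magnitude leaves a larger effect at every later lag.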
The integration process [ARIMA (0,d,0)] differences the data to remove trend and drift (i.e., it makes non-stationary data stationary). The first observation is subtracted from the second, the second from the third, and so on. The final form without AR or MA processes is the ARIMA (0,1,0) model:
Y_t = Y_{t-1} + a_t
The order of the process rarely exceeds one (d < 2 in most situations).
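First differencing is easy to check numerically. In this Python sketch (assuming NumPy; the simulated shocks are illustrative), a series is built by accumulating random shocks, and differencing it once recovers those shocks:

```python
import numpy as np

rng = np.random.default_rng(2)
shocks = rng.normal(size=300)       # a_t
y = np.cumsum(shocks)               # Y_t = Y_{t-1} + a_t, a random walk

dy = np.diff(y)                     # first difference: Y_t - Y_{t-1}
print("differences equal the shocks:", np.allclose(dy, shocks[1:]))

# A straight-line trend becomes a constant after one difference
trend = 5.0 + 2.0 * np.arange(10)
print("differenced trend:", np.diff(trend))
```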
The moving average process [ARIMA (0,0,q)] is used for serially correlated data. The process is composed of the current random shock and portions of the q previous shocks. An ARIMA (0,0,1) model is described as:
Y_t = a_t - θ_1 a_{t-1}
As with the integration process, the MA process rarely exceeds the first order.
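A first-order moving average series is correlated only with its immediate neighbour: in theory the lag-1 autocorrelation is -θ_1/(1 + θ_1²) and the autocorrelations at longer lags are zero. The Python sketch below simulates such a series to illustrate this, assuming NumPy; the function names and the value θ_1 = 0.6 are chosen for illustration only:

```python
import numpy as np

def simulate_ma1(theta, shocks):
    """Y_t = a_t - theta * a_{t-1}; the first value has no earlier shock."""
    a = np.asarray(shocks, dtype=float)
    y = a.copy()
    y[1:] -= theta * a[:-1]
    return y

def lag_autocorr(x, k):
    """Sample autocorrelation of x at lag k (k >= 1)."""
    d = x - x.mean()
    return np.sum(d[:-k] * d[k:]) / np.sum(d ** 2)

rng = np.random.default_rng(3)
theta = 0.6
y = simulate_ma1(theta, rng.normal(size=5000))

r1 = lag_autocorr(y, 1)
r2 = lag_autocorr(y, 2)
expected_r1 = -theta / (1 + theta ** 2)
print(f"lag 1: sample {r1:+.3f}, theory {expected_r1:+.3f}")
print(f"lag 2: sample {r2:+.3f}, theory +0.000")
```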
After building the ARIMA model, an intervention term (I_t) can be added; the ARIMA model then serves as the noise component (N_t):
Y_t = f(I_t) + N_t
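As a deliberately simplified illustration, the sketch below treats the intervention as an abrupt, permanent step (I_t is 0 before the event and 1 afterwards) and, as a loud assumption, takes the noise component to be plain white noise rather than a fitted ARIMA model. It uses Python with NumPy; the series length, event time, and shift size are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(4)
n, t_event, true_shift = 200, 120, 3.0

step = (np.arange(n) >= t_event).astype(float)   # I_t: 0 before, 1 after
noise = rng.normal(size=n)                        # N_t, assumed white noise here
y = 10.0 + true_shift * step + noise              # Y_t = level + shift * I_t + N_t

# Ordinary least squares on an intercept and the step indicator
X = np.column_stack([np.ones(n), step])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
level, shift = coef
print(f"estimated level before the event: {level:.2f}")
print(f"estimated shift after the event:  {shift:.2f}")
```

With autocorrelated noise the same idea applies, but the noise model has to be estimated as well, which is what the ARIMA step provides.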
The intervention component can be of four different types that are described by their onset and duration characteristics (Fig.
4):
Frequency Analysis
(Frequency Domain)
Frequency analysis is used to decompose a time series into an array of sine and cosine functions which can be plotted by their wavelengths. This spectrum of wavelengths can be analyzed to determine which are most relevant (see Fig. 5). In Fig. 5 you can’t tell what the major components of the raw data are, but once a spectral analysis is completed, you can pick out the relevant wavelengths.
In any one of these analyses, the data is considered to be stationary. If it is not, then a filter should be applied to the data
before instituting the appropriate analysis. All angles are presented as radians.
Figure 5. Frequency analysis data sets. The top four plots are the raw data, whereas the bottom four are the periodograms for the top four, but are not in order.
A Harmonic Analysis (a type of regression analysis) is used to fit a model when the period or cycle length is known a priori. This can estimate the amplitude, cycle phase, and mean.
X_t = μ + A cos(ωt) + B sin(ωt) + ε_t
ω = 2π/τ (the period τ is known)
t = observation time or number
A and B = coefficients
ε_t = residuals that are uncorrelated
Given τ, we can use OLS regression methods to estimate the amplitude and the phase of the cycle.
Using SPSS, we run a multiple regression with sin(ωt) and cos(ωt) as variables to obtain estimates of A and B.
Once we have these estimates, we can calculate the amplitude and phase, and the model is fit.
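The same regression can be sketched outside SPSS. The Python example below (assuming NumPy) fits the mean, A, and B by least squares for a known period and then converts them to an amplitude √(A² + B²) and a phase. The phase convention X_t = μ + R cos(ωt + phase) and the function name fit_harmonic are assumptions made for this illustration:

```python
import numpy as np

def fit_harmonic(x, period):
    """Least-squares fit of mean + A*cos(wt) + B*sin(wt) for a known period."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    w = 2 * np.pi / period
    X = np.column_stack([np.ones(len(x)), np.cos(w * t), np.sin(w * t)])
    (mean, A, B), *_ = np.linalg.lstsq(X, x, rcond=None)
    amplitude = np.hypot(A, B)
    # With x = mean + R*cos(wt + phase): A = R*cos(phase), B = -R*sin(phase)
    phase = np.arctan2(-B, A)
    return mean, A, B, amplitude, phase

# Illustrative data: four full cycles of period 12 with known parameters
t = np.arange(48)
x = 5.0 + 2.0 * np.cos(2 * np.pi * t / 12 + 0.5)

mean, A, B, amplitude, phase = fit_harmonic(x, period=12)
print(f"mean = {mean:.3f}, amplitude = {amplitude:.3f}, phase = {phase:.3f} rad")
```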
A Periodogram or Spectral Analysis is used if there is no reason to suspect a certain period or cycle length. These methods
fit a suite of cycles of differing lengths or periods to the data.
Periodogram Analysis
To find which sinusoids describe the data and to what degree, a generalization of the harmonic analysis is applied to the residuals of the data. The overall SS (sum of squares) is partitioned into N/2 periodic components, each with df = 2. A harmonic analysis is then done on each component and the results are summarized in an ANOVA source table. From this, we get estimates of A and B (and the corresponding SS) for each component and, as they are additive to the SS_total, we can get estimates of variances for each component.
The null hypothesis is that the variances are all the same, which is indicative of white noise. The periodogram is plotted with intensity or SS on the Y-axis and frequency on the X-axis. A large peak represents a frequency that accounts for a large share of the variation in the data.
A and B determine the degree to which each function is correlated with the data.
ω = 2π u_k, where u_k is the k-th frequency in cycles per unit time
Since the sine and cosine functions are orthogonal (mutually independent), Periodogram Values (P_k) are created and plotted against the frequency. These values are interpreted as variances of the frequencies.
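The partition of the sum of squares can be checked numerically. The Python sketch below, assuming NumPy, uses a fast Fourier transform to obtain the sum of squares attributable to each Fourier frequency and confirms that the components add up to the total sum of squares about the mean. The function name periodogram and the simulated series (a cycle of period 12 buried in noise) are illustrative:

```python
import numpy as np

def periodogram(x):
    """Sum of squares attributable to each Fourier frequency of a series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.rfft(x - x.mean())
    freqs = np.fft.rfftfreq(n)          # cycles per observation
    ss = 2.0 * np.abs(X) ** 2 / n       # equals (n/2) * (A_k**2 + B_k**2)
    if n % 2 == 0:
        ss[-1] /= 2.0                   # the two-observation cycle has df = 1
    return freqs[1:], ss[1:]            # drop the zero frequency (the mean)

rng = np.random.default_rng(5)
n = 120
t = np.arange(n)
x = 3.0 * np.sin(2 * np.pi * t / 12) + rng.normal(size=n)

freqs, ss = periodogram(x)
peak = freqs[np.argmax(ss)]
print(f"largest peak at frequency {peak:.4f}, period {1 / peak:.1f}")
print("components sum to SS total:",
      np.isclose(ss.sum(), np.sum((x - x.mean()) ** 2)))
```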
Leakage
Since the true data are not sampled continuously, the significant period peak may leak into other adjacent
frequencies. To alleviate this problem, deployment of the following are suggested:
1. Padding
2. Tapering or windowing
3. Smoothing
These methods can be found in Warner (1998), Chatfield (1996), and Gardner (1988).
Spectral Analysis
In order to get a power spectrum, we must smooth the periodogram so that each periodogram intensity is replaced by a weighted average that includes neighboring values. This gives a better and more reliable picture of the distribution of power (or variance accounted for). Smoothing procedures can differ by window width and weighting function.
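One of the simplest smoothers gives equal weight to each value in a moving window. The Python sketch below (assuming NumPy) applies such a window at two widths to the periodogram of simulated white noise; the function name smooth_periodogram, the reflected edges, and the widths are assumptions for illustration, and other weighting functions would be used the same way:

```python
import numpy as np

def smooth_periodogram(intensity, width=5):
    """Replace each ordinate by an equal-weight average of `width` neighbours."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    # Reflect the ends so the output has the same length as the input
    padded = np.pad(np.asarray(intensity, dtype=float), half, mode="reflect")
    weights = np.full(width, 1.0 / width)
    return np.convolve(padded, weights, mode="valid")

rng = np.random.default_rng(6)
x = rng.normal(size=256)                              # white noise: flat spectrum
# Raw periodogram, dropping the zero frequency and the two-observation cycle
raw = 2.0 * np.abs(np.fft.rfft(x - x.mean())[1:-1]) ** 2 / len(x)

narrow = smooth_periodogram(raw, width=3)
wide = smooth_periodogram(raw, width=11)
print(f"scatter of raw periodogram:  {raw.std():.2f}")
print(f"scatter with width 3:        {narrow.std():.2f}")
print(f"scatter with width 11:       {wide.std():.2f}")
```

A wider window gives a steadier estimate but blurs peaks that lie close together, which is the trade-off behind the choice of window width.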
Fourier frequencies are chosen with the longest cycle equal to the length of the series and the shortest cycle having a period of two observations. All frequencies in between are equally spaced and don’t overlap. A Fast Fourier Transform uses the Euler relation to express the sinusoids as complex exponentials (Chatfield, 1996) and is too math-intensive to practically do by hand. SPSS has a fast Fourier transform built in for these analyses.
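For a concrete picture of the Fourier frequencies, the short Python sketch below lists them for a hypothetical series of 12 observations, assuming NumPy is available:

```python
import numpy as np

n = 12                                   # illustrative series length
freqs = np.fft.rfftfreq(n)[1:]           # cycles per observation, zero dropped
periods = 1.0 / freqs                    # cycle lengths in observations

for f, p in zip(freqs, periods):
    print(f"frequency {f:.4f}  period {p:.1f}")
```

The periods run from the full series length (12) down to two observations, and the frequencies are spaced 1/n apart.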
Spectrum analysis significance tests use upper and lower bounds of a confidence interval that are derived using a χ² distribution. The degrees of freedom will depend on what kind of smoothing was used. This confidence interval can be superimposed on the Power Spectrum so that significant values may be seen.
For a more complete description see any one of the spectral analysis books listed below, but especially Chatfield
(1996) and Warner (1998).
References
There are many references out there for time series analysis. Most refer to applications involving econometrics or social sciences, but most techniques can be applied to the biological sciences. Most of the web pages involve very advanced theories and techniques.
Chatfield, C. 1996. The analysis of time series – an introduction. 5th ed. Chapman and Hall, London, UK.
    A very good and readable book that goes over most aspects of time series data. Highly recommended.
Harvey, H.C., 1981. Time series models. Halstead Press, New York, NY, USA.
    A moderately involved book with some understandable sections on model building.
McDowall, D., R. McCleary, E.E. Meidinger, and A.H. Richard Jr. 1980. Interrupted Time Series Analysis. Sage Publications, Inc., Thousand Oaks, CA, USA.
    A good book in the Sage series on intervention analysis that covers the basics quite well. Very readable.
Naidu, P.S. 1996. Modern spectrum analysis of time series. CRC Press Inc., Boca Raton, FL, USA.
    A complete account of spectrum analysis, but very involved and assumes great comfort with basic statistics.
Ostrom, C.W., 1978. Time series analysis: regression techniques. Sage Publications, Beverly Hills, CA, USA.
    A good and short book in the Sage series that goes over the basics with decent ease.
Warner, R. M. 1998. Spectral analysis of time-series data. Guilford Press, New York, NY, USA.
    A very good book on spectral analysis that is especially good with experimental design and data collection/entry.
Other Books
SPSS for Beginners - $5.95
    This can be downloaded in a pdf (Acrobat Reader) file for a small fee. Chapter 17, Time Series Analysis, can be downloaded separately for free from the SPSS site.
Internet Resources
An online textbook from Statsoft that covers most aspects of Time Series Analysis.
    Very complete and readable.
Carnegie Mellon University - Datasets
    Very wide range of datasets to play with.
Time Series Analysis and Chaosdynamics - Rotating Fluids
    A very in depth page on advanced time series analysis.
Forum: sci.stat.edu
    If you get stuck, you can post a question to this forum….
Journal of Forecasting
Software
SPSS
    Lots of Software - a great statistics package
AFS - Autobox
    Looks useful, but I haven't played with it. Starts at $400 and goes up from there. Forecasting and intervention analysis.
UCLA Statistics Bookmark Database
    Need Software - look here!