
Time Series Analysis: The Basics
Steve Carpenter, Zoology 535, Spring 2006

Models for Univariate Time Series

A series is an ordered sequence of observations in space or time. We will use t as a subscript to denote the position of an observation in a time series. For example, yt is observation t of time series y. The backshift operator B^s shifts a time series backward s steps. For example,

B yt = yt-1 [1]
B^2 yt = yt-2 [2]
B^s yt = yt-s [3]

Serial correlation is the correlation of a time series with itself. The noise model is the model for the stochasticity of a time series that cannot be explained by serial correlation, or by correlation with another variable. An intervention model is the model for proposed causal relationships. In an ecological study, the intervention model relates series of one or more independent variates, or input variates, to the response variate. It is the time-series analog of a regression. Intervention analysis is the process of identifying, fitting, and evaluating combined intervention and noise models for a set of input and response series.

Diagnostics for serial correlation include:

Autocorrelation function (ACF): a plot of the autocorrelation as a function of lag. The autocorrelation is simply the ordinary Pearson product-moment correlation of a time series with itself at a specified lag. The autocorrelation at lag 0 is the correlation of the series with its unlagged self, or 1. The autocorrelation at lag 1 is the correlation of the series with itself lagged one step; the autocorrelation at lag 2 is the correlation of the series with itself lagged two steps; and so forth.

Partial autocorrelation function (PACF): a plot of the partial autocorrelations versus lag. The partial autocorrelation at a given lag is the autocorrelation that is not accounted for by autocorrelations at shorter lags. To calculate the partial autocorrelation at lag s, yt is first regressed against yt-1, yt-2, . . ., yt-(s-1); then the residuals of this regression are correlated with yt-s. The partial autocorrelation is thus the autocorrelation that remains at lag s after the effects of shorter lags (1, 2, . . ., s-1) have been removed by regression.
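As a sketch (not part of the original handout), the lag-k autocorrelation and the regression-based partial autocorrelation described above can be written in a few lines of pure Python. All function and variable names here are illustrative, and the PACF shown follows the literal "regress, then correlate the residuals" recipe in the text rather than the Durbin-Levinson algorithm used by statistical packages.

```python
def autocorr(y, lag):
    """Lag-`lag` autocorrelation: Pearson correlation of y_t with y_{t-lag}."""
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y)  # full-series sum of squares
    return sum((y[t] - m) * (y[t - lag] - m) for t in range(lag, n)) / var

def pacf_lag2(y):
    """Partial autocorrelation at lag 2, by the procedure in the text:
    regress y_t on y_{t-1}, then correlate the residuals with y_{t-2}."""
    yt = y[2:]      # y_t
    y1 = y[1:-1]    # y_{t-1}
    y2 = y[:-2]     # y_{t-2}
    n = len(yt)
    mx, my = sum(y1) / n, sum(yt) / n
    # least-squares slope of y_t ~ y_{t-1}
    b = (sum((a - mx) * (c - my) for a, c in zip(y1, yt))
         / sum((a - mx) ** 2 for a in y1))
    resid = [c - (my + b * (a - mx)) for a, c in zip(y1, yt)]
    # correlate the regression residuals with y_{t-2}
    mr, m2 = sum(resid) / n, sum(y2) / n
    num = sum((r - mr) * (v - m2) for r, v in zip(resid, y2))
    den = (sum((r - mr) ** 2 for r in resid)
           * sum((v - m2) ** 2 for v in y2)) ** 0.5
    return num / den

series = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
r0 = autocorr(series, 0)  # exactly 1 by definition
r1 = autocorr(series, 1)
```

Note that, as the text says, the lag-0 autocorrelation is identically 1, since the numerator and denominator coincide.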

It is useful to know that for a time series of n observations, the smallest significant autocorrelation is about 2/sqrt(n). There are many kinds of models for serially correlated errors. Most ecological time series can be fit by one of a few types of models. We will restrict our attention to the fairly small group of models that usually work for ecological studies.

Moving average or MA models have the general form

yt = θ(B) εt [4]

In equation 4, y is the observed series, which is serially correlated; ε is a series of errors that are free of serial correlation. Time series without serially correlated errors are sometimes called "white noise". θ(B) is a polynomial involving the MA parameters and the backshift operator B:

θ(B) = 1 + θ1 B + θ2 B^2 + . . . [5]

The parameters θ must lie between -1 and 1. The order of an MA model is the number of θ parameters. For a pure MA process, the ACF cuts off at a lag corresponding to the order of the process. An ACF that cuts off at lag k is significant at lags of k and below, and nonsignificant at lags greater than k. If the ACF cuts off at lag k, that is evidence that an MA model of order k, or MA(k) model, is appropriate. An MA(k) model has k parameters: θ1, θ2, . . ., θk.

The MA(1) model is

yt = (1 + θB) εt [6]

which is equivalent to

yt = εt + θ εt-1 [7]

The MA(2) model is

yt = (1 + θ1 B + θ2 B^2) εt = εt + θ1 εt-1 + θ2 εt-2 [8]
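The ACF cutoff property of a pure MA process can be checked by simulation. The sketch below (mine, not the handout's) builds an MA(1) series from eq. 7 with θ = 0.8; theory gives a lag-1 autocorrelation of θ/(1 + θ^2) ≈ 0.49 and a lag-2 autocorrelation of zero, which is the "cuts off at lag 1" pattern described above.

```python
import random

def autocorr(y, lag):
    """Lag-`lag` autocorrelation of series y."""
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y)
    return sum((y[t] - m) * (y[t - lag] - m) for t in range(lag, n)) / var

random.seed(1)
theta = 0.8
e = [random.gauss(0, 1) for _ in range(5001)]       # white-noise errors
y = [e[t] + theta * e[t - 1] for t in range(1, 5001)]  # eq. 7: MA(1)

r1 = autocorr(y, 1)  # should be near theta/(1+theta^2) = 0.488
r2 = autocorr(y, 2)  # should be near 0: the ACF cuts off after lag 1
```

With n = 5000 observations, the 2/sqrt(n) rule puts the significance threshold near 0.03, so r1 is clearly significant and r2 is not.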

Examples of ACFs and PACFs for data that fit these models are presented in the illustrations on the next page.

Diagnostics from some simple Moving Average processes.

Autoregressive or AR models have the general form

φ(B) yt = εt [9]

Here yt and εt have the same meaning as in eq. 4. φ(B) is a polynomial involving the autoregressive parameters φ (-1 < φ < 1) and the backshift operator:

φ(B) = 1 + φ1 B + φ2 B^2 + . . . [10]

An AR(k) model, or AR model of order k, has k parameters. For a pure AR(k) process, the PACF cuts off at lag k.

The AR(1) model is

(1 + φB) yt = εt [11]

or

yt = -φ yt-1 + εt [12]

The AR(2) model is

(1 + φ1 B + φ2 B^2) yt = εt [13]

or

yt = -φ1 yt-1 - φ2 yt-2 + εt [14]
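In contrast to the MA case, the ACF of an AR(1) process does not cut off; it decays geometrically, while the PACF cuts off at lag 1. A small simulation sketch (mine, with illustrative names) of eq. 12 with φ = -0.7, so yt = 0.7 yt-1 + εt under the sign convention of eq. 11, shows the decay: the theoretical autocorrelations are r_k = (-φ)^k, i.e. 0.7 at lag 1 and 0.49 at lag 2.

```python
import random

def autocorr(y, lag):
    """Lag-`lag` autocorrelation of series y."""
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y)
    return sum((y[t] - m) * (y[t - lag] - m) for t in range(lag, n)) / var

random.seed(2)
phi = -0.7                      # (1 + phi*B) y_t = e_t  ->  y_t = 0.7*y_{t-1} + e_t
y = [0.0]
for _ in range(6000):
    y.append(-phi * y[-1] + random.gauss(0, 1))  # eq. 12 recursion
y = y[1000:]                    # discard burn-in so the sample is near-stationary

r1 = autocorr(y, 1)  # theory: (-phi)^1 = 0.7
r2 = autocorr(y, 2)  # theory: (-phi)^2 = 0.49 -- decaying, not cutting off
```

The geometric decay of the ACF, with a PACF that cuts off sharply, is the mirror image of the MA diagnostic pattern.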

Examples of ACFs and PACFs from data that fit these models are presented in the illustrations on the next page.

Diagnostics for some simple autoregressive processes.

Autoregressive Moving Average or ARMA models include both AR and MA terms:

φ(B) yt = θ(B) εt [15]

Since we are usually interested in an expression for y, ARMA models are often written

yt = [θ(B) / φ(B)] εt [16]

The ARMA(1,1) model is

(1 + φB) yt = (1 + θB) εt [17]

or

yt = -φ yt-1 + εt + θ εt-1 [18]
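The equivalence of the operator form (eq. 17) and the recursive form (eq. 18) can be verified numerically. This sketch (mine; parameter values are arbitrary) generates a series by the eq. 18 recursion and then checks, term by term, that (1 + φB) yt equals (1 + θB) εt at every step.

```python
import random

random.seed(3)
phi, theta = -0.5, 0.3
n = 200
e = [random.gauss(0, 1) for _ in range(n)]  # white-noise errors

# Build y by the recursive form, eq. 18: y_t = -phi*y_{t-1} + e_t + theta*e_{t-1}
y = [e[0]]
for t in range(1, n):
    y.append(-phi * y[t - 1] + e[t] + theta * e[t - 1])

# Check the operator form, eq. 17: (1 + phi*B) y_t == (1 + theta*B) e_t
identity_holds = all(
    abs((y[t] + phi * y[t - 1]) - (e[t] + theta * e[t - 1])) < 1e-9
    for t in range(1, n)
)
```

The check succeeds up to floating-point rounding, confirming that eqs. 17 and 18 describe the same process.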

Examples of ACFs and PACFs from ARMA(1,1) models are presented in the illustrations below.

Diagnostics for some simple ARMA processes.

Fitting Univariate Time Series Models

Model fitting is a sequential process of estimation and evaluation. Here is pseudocode for fitting a time series model.

1. Inspect the time series to see whether it is stationary. If it is not, apply an appropriate transformation (see below).
2. Calculate the ACF and PACF.
3. Guess a form for the model. Start with a simple model; add new parameters one at a time, only as needed.
4. Fit the model; calculate the AIC and the ACF and PACF of the residuals.
5. If the ACF or PACF of the residuals contains significant terms, return to step 3 and make a different guess for the form of the model, adding parameters one at a time.
6. If the ACF and PACF of the residuals contain no significant terms, stop.
7. A good model should have nonsignificant ACF and PACF for its residuals, residuals that appear stationary and normally distributed, and a lower AIC than other models.

A stationary series is one whose statistical distribution is constant throughout the series. In practice, we will call a series stationary if the mean is roughly constant (no obvious trends) and the variance is roughly constant (scatter around the mean is about the same). If a series is not stationary, it may be impossible to find a stable, well-behaved model. Transformations of the data can be used to make a series more stationary and improve model fitting. Since units are always arbitrary, there is no reason not to transform a series to units that improve model performance. Log transformations often help stabilize the variance of biological data. For more about choosing transformations, see Box et al. (1978).

Differencing transforms a series to first differences. The first difference of yt is (1-B)yt. Differencing often makes a nonstationary time series appear stationary. Models fit to differenced series are called Autoregressive Integrated Moving Average or ARIMA models.
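Differencing is easy to see in a toy example. The sketch below (mine, not the handout's) applies (1-B) to a series with a linear trend, which is nonstationary in the mean; the differenced series is constant, so the trend is removed exactly.

```python
# A series with a linear trend: y_t = 2t + 5, so the mean drifts upward.
y = [2.0 * t + 5.0 for t in range(10)]

# First differences, (1 - B) y_t = y_t - y_{t-1}
dy = [y[t] - y[t - 1] for t in range(1, len(y))]

# Every first difference equals the slope, 2.0: the trend has been removed
# and the differenced series is trivially stationary.
```

A model fit to dy rather than y would be an ARIMA model with one order of differencing.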

Intervention Analysis

Intervention analysis is used to detect effects of an independent variable on a dependent variable in a time series. In some ways it is a time-series analog of regression. Intervention analysis can be used to measure the effects of a driver on a response variable, or to measure the effects of disturbances (either experimental or inadvertent) in ecological time series.

Definitions

Intervention models link one or more input (or independent) variates to a response (or dependent) variate (Box and Tiao 1975, Wei 1990). For one input variate x and one response variate y, the general form of an intervention model is

yt = [ω(B) / δ(B)] xt-s + N(t) [19]

Here N(t) is an appropriate noise model for y (for example, an ARMA model). The delay between a change in x and a response in y is s. The intervention model has both a numerator polynomial and a denominator polynomial. The numerator polynomial is

ω(B) = ω0 + ω1 B + ω2 B^2 + . . . [20]

The numerator parameters ω determine the magnitude of the effect of x on y. These parameters can be any real number; they are not constrained to lie between -1 and +1. The numerator parameters are usually of greatest interest; they are analogs of regression parameters.

The denominator polynomial is

δ(B) = 1 + δ1 B + δ2 B^2 + . . . [21]

where -1 < δ < 1. The denominator determines the shape of the response. Graphs of some common intervention models are shown on the next page.

Cross-correlations are used to diagnose intervention models. The cross-correlation function (CCF) is a plot of the correlation of x and y versus lag. The CCF can be misleading if x or y is serially correlated, so x and y are usually fit to noise models (or "filtered") before calculating the CCF. The significant CCF term at the shortest lag indicates s, the delay of the intervention effect. If your intervention model has captured all the information about y that is available in x, then the residuals of the model will have no significant cross-correlations with x.

Fitting intervention models follows a sequential approach similar to the one used to fit univariate time series.

1. Inspect the Y and X time series to see whether they are stationary. If they are not, apply appropriate transformations.
2. Calculate the ACF and PACF for Y, and the CCF for Y versus X.
3. Guess a form for the model. Start with a simple model; add new parameters one at a time, only as needed.
4. Fit the model; calculate the AIC, the ACF and PACF of the residuals, and the CCF of the residuals versus X.
5. If the ACF, PACF, or CCF contains significant terms, return to step 3 and make a different guess for the form of the model, adding parameters one at a time.
6. If the ACF, PACF, and CCF contain no significant terms, stop.
7. A good model should have nonsignificant ACF and PACF for its residuals, a nonsignificant CCF for the residuals versus X, residuals that appear stationary and normally distributed, and a lower AIC than other models.
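How the CCF reveals the delay s can be illustrated with a toy example (mine, not the handout's; the input here is white noise, so no pre-whitening filter is needed). The response is built as y_t = 2 x_{t-s} + noise with s = 3; the CCF is near zero at lag 0 and large at lag 3, identifying the delay.

```python
import random

def crosscorr(x, y, lag):
    """Correlation of y_t with x_{t-lag}."""
    n = len(y)
    mx, my = sum(x) / len(x), sum(y) / n
    sx = sum((v - mx) ** 2 for v in x) ** 0.5
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    return sum((y[t] - my) * (x[t - lag] - mx) for t in range(lag, n)) / (sx * sy)

random.seed(4)
n, s = 2000, 3
x = [random.gauss(0, 1) for _ in range(n)]          # white-noise input
# Response driven by x with delay s: y_t = 2*x_{t-s} + small noise
y = [0.0] * s + [2.0 * x[t - s] + 0.1 * random.gauss(0, 1) for t in range(s, n)]

c0 = crosscorr(x, y, 0)  # near zero: no instantaneous effect
c3 = crosscorr(x, y, 3)  # large: the shortest significant lag identifies s = 3
```

In a real analysis x would itself be serially correlated, which is why the filtering step described above is applied before reading s off the CCF.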

REFERENCES

Box, G.E.P., G.M. Jenkins and G.C. Reinsel. 1994. Time Series Analysis: Forecasting and Control. Prentice-Hall, Englewood Cliffs, NJ.

Box, G.E.P. and G.C. Tiao. 1975. Intervention analysis with application to economic and environmental problems. J. Amer. Stat. Assoc. 70: 70-79.

Carpenter, S.R. 1993. Analysis of the ecosystem experiments. Chapter 3 in S.R. Carpenter and J.F. Kitchell (eds.), The Trophic Cascade in Lakes. Cambridge Univ. Press, London.

Wei, W.W.S. 1990. Time Series Analysis. Addison-Wesley, Redwood City, CA.
