Chapter 8

Autocorrelation

Correlation indicates a relationship between two variables. In simple terms, when one ‘wiggles’ the other ‘wiggles’ too. In autocorrelation, instead of correlation between two different variables, the correlation is between two values of the same variable at different times or different places.

The autocorrelation function (ACF) of a variable $X$ describes the correlation between its values at different points, $X_i$ and $X_j$. If $X$ has a mean of $\mu$ and a variance of $\sigma^2$, the ACF as a function of two points $i$ and $j$, where $E$ is the expected value, is given by:

$$\mathrm{ACF}(i,j) = \frac{E[(X_i-\mu)(X_j-\mu)]}{\sigma^2}$$

Autocorrelation occurs in both the spatial context of environmental variables and the temporal context of time series analysis. The main concern with autocorrelation is that failing to take it into account can exaggerate significance and hence produce errors, e.g.:

Correlation between an autocorrelated response variable and each of a set of explanatory variables is highly biased in favor of those explanatory variables that are highly autocorrelated [Len00].

That is, multiple regression will find a variable with high autocorrelation ‘significant’ more often than it should, so the variable features more prominently in a model than it deserves, possibly displacing a better variable that lacks autocorrelation. It has been claimed that niche models may falsely introduce ‘low frequency’ variables like temperature and rainfall because of the high autocorrelation in climate variables. In a fair comparison, ‘high frequency’ variables such as vegetation could be as accurate or better [Len00].

It is therefore important for successful niche modeling to understand autocorrelation and how it can lead to errors. The simplest way to study and understand autocorrelation is to look at the one-dimensional case of time series rather than the 2D case, to which most results generalize.

Here we construct a set of the basic types of series to examine their properties.
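The ACF definition above can be made concrete with a small sketch. It assumes the correlation depends only on the separation between the two points (the lag, $j - i$), and it estimates the expectation with a sample average normalized by the sample variance so that values lie between -1 and 1. The function name `sample_acf` and the two toy series are illustrative choices, not something taken from the text.

```python
from statistics import fmean


def sample_acf(x, lag):
    """Estimate the autocorrelation of sequence x at a non-negative lag.

    Uses the overall sample mean and the biased (divide-by-n) estimator,
    so the value at lag 0 is 1 for any non-constant series.
    """
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mu = fmean(x)
    # Sum of squared deviations: n times the (biased) sample variance.
    denom = sum((v - mu) ** 2 for v in x)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Sum of products of deviations separated by `lag` steps.
    num = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag))
    return num / denom


if __name__ == "__main__":
    trend = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # slowly varying
    zigzag = [1.0, -1.0] * 4                           # flips sign every step
    print(sample_acf(trend, 1))   # positive: neighbours move together
    print(sample_acf(zigzag, 1))  # negative: neighbours move oppositely
```

A slowly varying series gives a positive lag-1 value and a rapidly alternating one gives a negative value, which is the ‘low frequency’ versus ‘high frequency’ distinction drawn above.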
Niche Modeling. © 2007 by Taylor and Francis Group, LLC

8.1 Types

While basic features such as the mean, standard deviation and linear trends are usually the basis of analysis, little attention is usually paid to the autocorrelation properties of these models. There are a number of ways of generating autocorrelation. These internal features also have a bearing on explanations for phenomena.

As an example, we fit each type of series so that its parameters match those derived from global temperature. We use the global temperatures from the mid-nineteenth century to the present recorded by the Climate Research Unit (CRU) [Uni].

8.1.1 Independent identically distributed (IID)

An IID series is the simplest and most familiar series, consisting of independent random numbers drawn from a distribution such as the normal distribution. Future terms in the series are determined only by the long-term mean and variance of past data. Specifically, each value is not dependent on any other term. For example, where $e$ is a normally distributed random variable:

$$X_t = e$$

The series of random numbers with a normal distribution and a standard deviation equal to that of the CRU data is shown in Figure 8.1.

8.1.2 Moving average models (MA)

In moving averages, the average of a limited set or window of values is calculated at every position in the series. In R this is done with the filter command, the filter being determined by a list of numbers to use as coefficients in a summation; in this case 30 values of 1/30 provide a 30-year moving average for CRU. An MA is often called a low-pass filter, as it suppresses high frequency fluctuations while passing the low frequency ones. Here is the expression for generating the moving average shown in Figure 8.1, with $n$ the window length:

$$\frac{\sum_{i=1}^{n} X_{t-i} + e}{n}$$

8.1.3 Autoregressive models (AR)

In autoregressive models each term in the series is determined by the previous terms plus some random error.
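The three constructions introduced so far (independent noise, a windowed average, and a recursion on previous terms) can be sketched in a few lines. This is a minimal illustration using only the Python standard library rather than the R facilities the text refers to; the function names, the seed, and the numeric values in the usage example are assumptions made for illustration. The moving average here returns one value per complete window, which may be aligned differently from the output of R's filter.

```python
import random


def iid_series(n, sd, rng):
    """n independent normal draws with mean 0 and standard deviation sd."""
    return [rng.gauss(0.0, sd) for _ in range(n)]


def moving_average(x, window):
    """Mean of each run of `window` consecutive values.

    Returns len(x) - window + 1 values, one per complete window.
    """
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(x)")
    total = sum(x[:window])
    out = [total / window]
    for t in range(window, len(x)):
        total += x[t] - x[t - window]  # slide the window forward one step
        out.append(total / window)
    return out


def ar_series(n, coeffs, sd, rng):
    """Each term is a random error plus a weighted sum of previous terms.

    coeffs[k] multiplies the term k + 1 steps back; terms before the
    start of the series are treated as absent.
    """
    x = []
    for t in range(n):
        past = sum(a * x[t - 1 - k] for k, a in enumerate(coeffs) if t - 1 - k >= 0)
        x.append(rng.gauss(0.0, sd) + past)
    return x


if __name__ == "__main__":
    rng = random.Random(42)                  # arbitrary seed for repeatability
    noise = iid_series(150, 0.15, rng)       # illustrative length and sd
    smooth = moving_average(noise, 30)       # 30-point window, as in the text
    persistent = ar_series(150, [0.5], 0.15, rng)  # illustrative coefficient
    print(len(noise), len(smooth), len(persistent))
```

Passing an empty coefficient list to `ar_series` reduces it to the IID case, which makes the relationship between the two models explicit.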
In an AR(1) (or Markov) model only the previous term is used in predicting the next term. Each term in the AR(1) series, where $a$ is a coefficient and $e$ is a random error term, can be generated from the following equation:

$$X_t = e + aX_{t-1}$$

A random walk is the special case where $a = 1$. A walk can be generated from a series of random numbers by taking the cumulative sum.

We can estimate the value of $a$ in R with the ar() function and the CRU temperature data. We can then generate an AR(1) model using the R facility arima.sim with the given parameters. The coefficient is $a = 0.67$ and the standard deviation is $sd = 0.15$ for the AR(1) model of CRU.

8.1.4 Self-similar series (SSS)

The next series goes by many names: self-similar, fractal, roughness, fractional Gaussian noise model (FGN), long term persistence (LTP), clustering or simple scaling series (SSS). Mostly they are characterized as having constantly scaling variance (or standard deviation) over all time or spatial scales, and hence the term simple scaling series is most accurate.

Fractional differencing is a generalization of integer differencing, where the degree of differencing is allowed to take any real value rather than being restricted to integers. For example, in normal Brownian motion, the value of a series $X_t$ at time $t$ depends on its previous value $X_{t-1}$ plus a random variable $a_t$ (not to be confused with the AR(1) coefficient $a$), so the degree of differencing is one. In the following, $X_t$ is a function of the partial sum of all terms preceding it. The integer differencing operator is written in terms of a backshift operator $B$, where $BX_t = X_{t-1}$, as:

$$(1-B)X_t = a_t$$

The fractional difference operator $(1-B)^d$ is defined by the binomial series, where the $k$th term in the series is summed from 0 to infinity, and $d$ is a function of the Hurst exponent, $d = H - 0.5$. These are called FARIMA models.
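The binomial series for $(1-B)^d$ can be made concrete by computing its coefficients. The sketch below uses the standard recurrence $w_0 = 1$, $w_k = w_{k-1}(k-1-d)/k$, which follows from the binomial coefficients with alternating sign, and truncates the infinite sum at the start of the data. It is written with the Python standard library and does not use or reproduce the R tooling; the function names and the Hurst value in the usage example are illustrative assumptions.

```python
def frac_diff_weights(d, n_terms):
    """First n_terms coefficients of B^k in the expansion of (1 - B)^d."""
    w = [1.0]
    for k in range(1, n_terms):
        # Ratio of consecutive binomial terms, including the sign change.
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def frac_diff(x, d):
    """Apply (1 - B)^d to x, using only the terms available in the data."""
    w = frac_diff_weights(d, len(x))
    return [sum(w[k] * x[t - k] for k in range(t + 1)) for t in range(len(x))]


if __name__ == "__main__":
    hurst = 0.9                # hypothetical Hurst exponent, for illustration
    d = hurst - 0.5            # relation between d and H given in the text
    print(frac_diff_weights(d, 6))
    # With d = 1 the weights collapse to 1, -1, 0, 0, ... (ordinary differencing).
    print(frac_diff_weights(1.0, 4))
```

For fractional $d$ the weights never reach exactly zero, so every earlier term contributes to the current one; that slowly fading dependence is what distinguishes these series from the AR(1) case.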
A FARIMA(0,d,0) process is written:

$$(1-B)^d X_t = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k X_t = a_t$$

R has a package called fracdiff that allows estimation of the parameters ar, d, and ma for simulation of a FARIMA(ar,d,ma) process, where ar and ma are the classical ARMA(ar,ma) parameters.

8.2 Characteristics

In Figure 8.1 the simulated series are plotted. The AR(1) and the SSS series resemble the CRU natural series quite closely. However, the IID series does not capture the longer time scale fluctuations. In comparison, the random walk is difficult to plot as it tends to trend so strongly that it walks out of the figure area.

While it can be seen by eye in Figure 8.1 that IID and random walk are not good models for the natural series, more insightful methods are needed to distinguish them.

Highly autocorrelated models are described as having ‘fat tails’. This refers to the way the distribution of less frequent difference values fades out into a thicker tail (power-law type) rather than the exponential form of a normal distribution. When these distributions are plotted in Figure 8.2 it is hard to see which are power-law and which are not. We need more powerful ways to examine the data.

8.2.1 Autocorrelation Function (ACF)

One of the main tools for examining the autocorrelation structure of data is the autocorrelation function or ACF. The ACF provides a correlation for each distance between numbers in the series, or lag. The autocorrelation decays in a characteristic fashion for each series as the lags get longer, as shown in Figure 8.3.

It can be seen that the autocorrelations of the IID series decay very quickly (no long term correlation), the AR(1) model decays fairly quickly, the SSS next and the random walk most slowly.

The characteristic decay in autocorrelations relative to the inverse and inverse log plot is sometimes more easily seen by plotting the y axis on a log scale (Figure 8.4).
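The ordering of decay rates described above can be explored with a small simulation. This sketch builds an IID series, an AR(1) series with the coefficient 0.67 quoted earlier for CRU, and a random walk from the same random shocks, then prints sample autocorrelations at a few lags. It is a standard-library Python illustration, not the code behind Figure 8.3; the series length, seed, unit standard deviation and the choice of lags are assumptions, and the SSS series is omitted to keep the example short.

```python
import random
from itertools import accumulate
from statistics import fmean


def acf(x, max_lag):
    """Sample autocorrelations of x for lags 0..max_lag (biased estimator).

    Assumes max_lag < len(x) and that x is not constant.
    """
    n = len(x)
    mu = fmean(x)
    dev = [v - mu for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


def simulate(n, seed=1):
    """Build IID, AR(1) and random-walk series from one set of shocks."""
    rng = random.Random(seed)
    iid = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ar1 = []
    prev = 0.0
    for e in iid:
        prev = e + 0.67 * prev      # X_t = e + a X_{t-1} with a = 0.67
        ar1.append(prev)
    walk = list(accumulate(iid))    # cumulative sum, i.e. a = 1
    return {"iid": iid, "ar1": ar1, "walk": walk}


if __name__ == "__main__":
    series = simulate(2000)
    for name, x in series.items():
        r = acf(x, 20)
        print(name, [round(r[k], 2) for k in (1, 5, 10, 20)])
```

Driving all three series with the same shocks is a design choice that isolates the effect of the recursion itself: any difference in the printed autocorrelations comes from the model structure, not from different random draws.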
A second tool for examining the autocorrelation structure of data is the lag plot. Figure 8.5 shows the autocorrelated processes CRU, CRU30, AR1.67, WALK and SSS as diagonals, while the random IID variable is a cloud of points. Smoothing greatly increases the diagonalization of the points on the ...

FIGURE 8.1: Plots of the global temperatures (CRU), the simulated series random, walk, ar(1), and sss. [Panels: sss, ar1.67, CRU30, iid, CRU, walk; horizontal axis: year.]
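The points of a lag plot are simply each value paired with the value a fixed number of steps later, so the ‘diagonal versus cloud’ distinction can be summarized by the correlation of those pairs. The sketch below is a standard-library Python illustration; the function names and the two toy series are assumptions, and no plotting is attempted.

```python
from statistics import fmean


def lag_pairs(x, lag=1):
    """Return the (x[t], x[t + lag]) points that a lag plot would draw."""
    if not 1 <= lag < len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    return list(zip(x[:-lag], x[lag:]))


def pearson(points):
    """Pearson correlation of a list of (a, b) pairs."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in points)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]      # points fall on a rising diagonal
    zigzag = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # points fall on a falling diagonal
    print(pearson(lag_pairs(ramp)))
    print(pearson(lag_pairs(zigzag)))
```

A correlation near zero for the pairs corresponds to the structureless cloud described for the IID series, while values near 1 correspond to points hugging the diagonal.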