Stationarity and differencing of time series data (2024)

Data concepts

Principles and risks of forecasting (pdf)

Famous forecasting quotes
How to move data around
Get to know your data
Inflation adjustment (deflation)
Seasonal adjustment
Stationarity and differencing
The logarithm transformation

Stationarity and differencing

Statistical stationarity
First difference (period-to-period change)

Statistical stationarity: A stationary time series is one whose statistical properties such as mean, variance, autocorrelation, etc. are all constant over time. Most statistical forecasting methods are based on the assumption that the time series can be rendered approximately stationary (i.e., "stationarized") through the use of mathematical transformations. A stationarized series is relatively easy to predict: you simply predict that its statistical properties will be the same in the future as they have been in the past! (Recall our famous forecasting quotes.) The predictions for the stationarized series can then be "untransformed," by reversing whatever mathematical transformations were previously used, to obtain predictions for the original series. (The details are normally taken care of by your software.) Thus, finding the sequence of transformations needed to stationarize a time series often provides important clues in the search for an appropriate forecasting model. Stationarizing a time series through differencing (where needed) is an important part of the process of fitting an ARIMA model, as discussed in the ARIMA pages of these notes.
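
The transform, predict, and untransform cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transformation chosen here (a logarithm followed by a first difference) is one possible choice rather than a general recipe, the function names stationarize and forecast_original are hypothetical, and the data are made up.

```python
import math

def stationarize(series):
    """Take logs, then first differences (period-to-period log changes)."""
    logs = [math.log(y) for y in series]
    return [cur - prev for prev, cur in zip(logs, logs[1:])]

def forecast_original(series, horizon):
    """Predict the stationarized series by its past mean, then untransform."""
    changes = stationarize(series)
    mean_change = sum(changes) / len(changes)  # "future looks like the past"
    level = math.log(series[-1])
    forecasts = []
    for _ in range(horizon):
        level += mean_change               # reverse the differencing
        forecasts.append(math.exp(level))  # reverse the logarithm
    return forecasts

# Hypothetical series that grows by 10% every period.
y = [100 * 1.1 ** t for t in range(6)]
f = forecast_original(y, 3)
```

Because the made-up series grows at a constant rate, its log changes are constant, and the untransformed forecasts simply continue the 10% growth path.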

Another reason for trying to stationarize a time series is to be able to obtain meaningful sample statistics such as means, variances, and correlations with other variables. Such statistics are useful as descriptors of future behavior only if the series is stationary. For example, if the series is consistently increasing over time, the sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and variance in future periods. And if the mean and variance of a series are not well-defined, then neither are its correlations with other variables. For this reason you should be cautious about trying to extrapolate regression models fitted to nonstationary data.
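
A small numerical illustration of this point, using a made-up series that rises by a fixed amount each period (the series and the helper sample_stats are hypothetical):

```python
from statistics import mean, pvariance

# Hypothetical trending series: starts at 10 and rises by 2 each period.
series = [10 + 2 * t for t in range(40)]

def sample_stats(n):
    """Mean and variance of the first n observations."""
    window = series[:n]
    return mean(window), pvariance(window)

m10, v10 = sample_stats(10)       # mean 19, variance 33
m20, v20 = sample_stats(20)       # mean 29, variance 133
future_mean = mean(series[20:])   # mean of the next 20 periods: 69
```

Both statistics keep growing as the sample lengthens, and the mean of the first 20 periods falls well short of the mean of the following 20, so neither number describes future behavior.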

Most business and economic time series are far from stationary when expressed in their original units of measurement, and even after deflation or seasonal adjustment they will typically still exhibit trends, cycles, random-walking, and other non-stationary behavior. If the series has a stable long-run trend and tends to revert to the trend line following a disturbance, it may be possible to stationarize it by de-trending (e.g., by fitting a trend line and subtracting it out prior to fitting a model, or else by including the time index as an independent variable in a regression or ARIMA model), perhaps in conjunction with logging or deflating. Such a series is said to be trend-stationary. However, sometimes even de-trending is not sufficient to make the series stationary, in which case it may be necessary to transform it into a series of period-to-period and/or season-to-season differences. If the mean, variance, and autocorrelations of the original series are not constant in time, even after detrending, perhaps the statistics of the changes in the series between periods or between seasons will be constant. Such a series is said to be difference-stationary. (Sometimes it can be hard to tell the difference between a series that is trend-stationary and one that is difference-stationary, and a so-called unit root test may be used to get a more definitive answer. We will return to this topic later in the course.)
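
The two routes to stationarity mentioned above (de-trending versus differencing) can be sketched as follows. This is a rough illustration using only the Python standard library; the function names and the simulated data are hypothetical, and the trend line is fitted by ordinary least squares against the time index, which is one common way to de-trend but not the only one.

```python
import random
from statistics import mean

def detrend(series):
    """Subtract a least-squares trend line fitted against the time index."""
    n = len(series)
    t_bar = (n - 1) / 2
    y_bar = mean(series)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in enumerate(series))
             / sum((t - t_bar) ** 2 for t in range(n)))
    intercept = y_bar - slope * t_bar
    return [y - (intercept + slope * t) for t, y in enumerate(series)]

def first_difference(series):
    """Period-to-period changes."""
    return [cur - prev for prev, cur in zip(series, series[1:])]

def seasonal_difference(series, period):
    """Season-to-season changes, e.g. period=12 for monthly data."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Simulated data (hypothetical): the same noise drives both series.
random.seed(0)
noise = [random.gauss(0, 1) for _ in range(200)]

# Trend-stationary: noise around a fixed line; de-trending recovers it.
trend_stationary = [5 + 0.5 * t + e for t, e in enumerate(noise)]
residuals = detrend(trend_stationary)

# Difference-stationary: a random walk; differencing recovers the noise.
random_walk = [0.0]
for e in noise:
    random_walk.append(random_walk[-1] + e)
changes = first_difference(random_walk)
```

In this simulation the two cases are known by construction. With real data the distinction usually has to be judged from plots, autocorrelations, or a unit root test.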

The first difference of a time series is the series of changes from one period to the next. If Y_t denotes the value of the time series Y at period t, then the first difference of Y at period t is equal to Y_t - Y_{t-1}. In Statgraphics, the first difference of Y is expressed as DIFF(Y), and in RegressIt it is Y_DIFF1. If the first difference of Y is stationary and also completely random (not autocorrelated), then Y is described by a random walk model: each value is a random step away from the previous value. If the first difference of Y is stationary but not completely random (i.e., if its value at period t is autocorrelated with its value at earlier periods), then a more sophisticated forecasting model such as exponential smoothing or ARIMA may be appropriate. (Note: if DIFF(Y) is stationary and random, this indicates that a random walk model is appropriate for the original series Y, not that a random walk model should be fitted to DIFF(Y). Fitting a random walk model to Y is logically equivalent to fitting a mean (constant-only) model to DIFF(Y).)
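
The equivalence stated in the note can be made concrete with a short sketch. The function names and data below are hypothetical, and the random walk is assumed to include a constant step (drift) equal to the mean of the differences. The lag-1 autocorrelation function is a simple way to check whether the differences look random.

```python
from statistics import mean

def diff(y):
    """First difference: Y_t - Y_{t-1}."""
    return [cur - prev for prev, cur in zip(y, y[1:])]

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, len(x)))
    return num / denom

def mean_model_forecast(x, horizon):
    """Constant-only model: every future value is predicted by the mean."""
    return [mean(x)] * horizon

def random_walk_forecast(y, horizon):
    """Forecast DIFF(Y) with the mean model, then cumulate from the last Y."""
    level = y[-1]
    forecasts = []
    for change in mean_model_forecast(diff(y), horizon):
        level += change
        forecasts.append(level)
    return forecasts

y = [20.0, 23.0, 21.0, 26.0, 28.0]   # hypothetical data
d = diff(y)                          # [3.0, -2.0, 5.0, 2.0]
f = random_walk_forecast(y, 3)       # [30.0, 32.0, 34.0]
r1 = autocorrelation(d, 1)           # values near zero suggest randomness
```

A real check for randomness would use a much longer series and look at several lags, not just one.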

Here is a graph of the first difference of AUTOSALE/CPI, the deflated auto sales series. Notice that it now looks approximately stationary (at least the mean and variance are more-or-less constant) but it is not at all random (a strong seasonal pattern remains):


The following spreadsheet illustrates how the first difference is calculated for the deflated auto sales data:
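
The same column-by-column calculation can be sketched in Python. The numbers below are hypothetical placeholders chosen for easy arithmetic, not the actual AUTOSALE or CPI values, and the variable names are illustrative only.

```python
# Hypothetical values only; NOT the actual AUTOSALE or CPI data.
autosale = [100.0, 125.0, 150.0, 165.0]
cpi = [1.0, 1.25, 1.25, 1.5]   # price index, assumed scaled so base = 1.0

# Column 3: deflated series, AUTOSALE/CPI.
deflated = [a / c for a, c in zip(autosale, cpi)]

# Column 4: first difference of the deflated series. The first row has no
# previous period, so its difference is undefined (None).
first_diff = [None] + [cur - prev for prev, cur in zip(deflated, deflated[1:])]

# One tuple per spreadsheet row: (AUTOSALE, CPI, AUTOSALE/CPI, difference).
rows = list(zip(autosale, cpi, deflated, first_diff))
```

Note that the differenced column is one observation shorter than the original: the first period has no earlier value to subtract.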


Go on to next topic: The logarithm transformation