A time series is a sequence of data points indexed in time order, usually at equally spaced points in time. Examples include stock prices, monthly returns and a company’s sales. A time series can be seen as data with a target variable (price, returns, amount of sales…) and one feature only: time. The idea is that we can extract useful information by analyzing how a given variable behaves over time. Then, if relevant patterns emerge, we can ideally predict how that variable will evolve in the future.

However, before analyzing a time series in this way, we need to make some assumptions, since our object of interest is stochastic. To understand the random component of a time series, let’s consider the following definitions.

Imagine we have a stochastic process $\{X_t\}$, that is, an infinite sequence of random variables, each defined over Ω (the sample space) and returning a number in $\mathbb{R}$.

Each random variable is indexed at a fixed point in time and has a probability distribution. Nevertheless, at each point in time t we observe only one realization of our stochastic process: we have no information about the other possible realizations. As a result, we cannot compute the cross-sectional sample mean of our process (the mean across N realizations at a fixed t), that is:

$$\bar{X}_t = \frac{1}{N}\sum_{i=1}^{N} X_t(\omega_i)$$

This quantity cannot be computed because we cannot observe the realization of X for more than one ω at a time. We can visualize this restriction with the following example:

At each moment t in time, we have a probability distribution of our random variable Xt (the blue dots), but we can observe only one realization (the orange dot). The only way to compute the cross-sectional sample mean would be to draw multiple time series for the same phenomenon, but this is obviously impossible, since it would require going back in time.
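In a simulation, unlike in reality, we can draw as many realizations as we like, which makes the restriction easier to see. The following is a minimal sketch, assuming the process is i.i.d. standard normal noise (so the true mean is 0 at every t); the sizes `n_paths` and `n_steps` and the chosen `t` are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)

n_paths, n_steps = 5000, 50  # hypothetical sizes, chosen for illustration
# Each row is one realization (one omega); each column is one point in time t.
# Assumption: X_t is i.i.d. standard normal, so the true mean is 0 for every t.
paths = rng.normal(loc=0.0, scale=1.0, size=(n_paths, n_steps))

t = 10
# Cross-sectional mean at time t: average over all simulated realizations.
cross_sectional_mean = paths[:, t].mean()

# What we would have in practice: a single realization of X_t.
observed = paths[0, t]

print(f"cross-sectional mean at t={t}: {cross_sectional_mean:.3f}")
print(f"single observed value at t={t}: {observed:.3f}")
```

With real data only one row of `paths` is ever available, so the first quantity cannot be computed.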

As an alternative, we might change our framework and compute the sample average of our process across time, using the T observations of the single realization we have:

$$\bar{X}_T = \frac{1}{T}\sum_{t=1}^{T} X_t$$

We want this quantity to converge towards the true mean of the time series, and we can achieve that under two conditions.

## Stationarity

We want the mean of our time series to be time invariant, that is:

$$E[X_t] = \mu \quad \text{for all } t$$

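Plus, the autocovariance of any two elements of the process has to depend only on their relative distance in time, not on where they sit on the time axis. In formula:

$$\mathrm{Cov}(X_t, X_{t+k}) = \gamma(k) \quad \text{for all } t \text{ and } k$$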

These two requirements define weak stationarity. We can also strengthen this condition by introducing the so-called strong stationarity, which occurs when the joint distribution of any collection of elements depends only on their relative distances. Note that strong stationarity implies weak stationarity, provided that the mean and the covariance exist.
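To make the weak stationarity conditions more concrete, here is a minimal sketch that estimates the autocovariance at a few lags on two different stretches of the same simulated series. It assumes an AR(1) process with unit-variance noise, for which the autocovariance at lag k is known to be φ^k / (1 − φ²); the helper `sample_autocov` and the values of `phi` and `n` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_autocov(x, k):
    """Sample autocovariance of a 1-D array x at lag k (k >= 0)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc[: len(x) - k], xc[k:]) / len(x))

rng = np.random.default_rng(1)
phi, n = 0.6, 20000  # hypothetical AR(1) coefficient and sample size
eps = rng.normal(size=n)

# Simulate X_t = phi * X_{t-1} + eps_t, starting from the stationary distribution
x = np.empty(n)
x[0] = eps[0] / np.sqrt(1 - phi**2)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

first_half, second_half = x[: n // 2], x[n // 2 :]
for k in range(4):
    theory = phi**k / (1 - phi**2)  # gamma(k) for unit-variance noise
    print(k, round(sample_autocov(first_half, k), 3),
          round(sample_autocov(second_half, k), 3), round(theory, 3))
```

If the process is weakly stationary, the estimates from the two halves should be close to each other at every lag, since only the lag matters and not the position in time.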

Let’s see the difference between a non-stationary process and a stationary one (data available here):

```python
import pandas as pd

# Load the shampoo sales data and index it by month
ts = pd.read_csv('sales-of-shampoo-over-a-three-ye.csv')
ts.set_index('Month', inplace=True)
ts.head()
ts.plot()
```

Here, if we compute the sample average across time, we obtain a value which overestimates the series in the first periods and underestimates it in the last periods:

```python
import matplotlib.pyplot as plt

# Mean over time of the single data column, as a scalar
m = ts.iloc[:, 0].mean()
ts.plot()
plt.axhline(y=m, color='red')
```

On the other hand, if we take the first difference of this series, we obtain something which looks much more stationary:

```python
# First difference: X_t - X_{t-1} (the first value is NaN)
diff = ts.diff()
diff.plot()
```

And, computing the mean, we can see that this time it gives a more accurate summary of the level of our series:

```python
# mean() skips the leading NaN by default
m = diff.iloc[:, 0].mean()
diff.plot()
plt.axhline(y=m, color='red')
```
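Reading stationarity off a plot is subjective, so a rough numeric check can complement it: compare the mean of the first half of the series with the mean of the second half, before and after differencing. The sketch below does not use the shampoo file; it assumes a synthetic series (a linear trend plus noise) as a stand-in, and `half_means` is a hypothetical helper introduced only for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 36  # 36 monthly points, i.e. three years (illustration value)
# Assumption: a linear upward trend plus noise stands in for the sales data.
trend = pd.Series(100 + 10 * np.arange(n) + rng.normal(scale=5, size=n))
diffed = trend.diff().dropna()

def half_means(s):
    """Return the means of the first and second half of a series."""
    mid = len(s) // 2
    return s.iloc[:mid].mean(), s.iloc[mid:].mean()

print("levels:     ", half_means(trend))
print("differences:", half_means(diffed))
```

For the trending levels the two half-means are far apart, while for the differenced series they are close. This is only an informal check of the constant-mean condition, not a formal stationarity test.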

Now, is stationarity sufficient for our sample mean to be a good approximation of the real mean? The answer is no, since we need a further condition to reach this goal.

## Ergodicity

With ergodicity, we are asking our process to keep moving around its mean, taking values all over its support. Otherwise, there is a high risk that the sample mean is biased towards the ‘area’ where the process is stuck.

To give you the intuition, let’s consider an example where a process is stationary but not ergodic. Imagine a stochastic process where each X is a Bernoulli random variable, hence it takes values 1 or 0. Plus, our process is such that it takes a value (0 or 1) at the initial moment and stays fixed at this value forever. Hence, our time series will look like a straight line at either 1 or 0. We know that the true mean of our process is the probability of drawing a 1, which lies between 0 and 1; however, the average computed across time will always return exactly 1 or 0.

The idea behind ergodicity is that, while collecting more and more observations, we keep learning something new about the process. In other words, if I pick two random variables of the process which are sufficiently ‘far apart’, they should be (almost) independent of each other. If you think about the example above, you can now see how that process is clearly not ergodic: no matter how many observations we collect, we are not gathering any further information, since everything is known from the beginning (from the initial value taken by our process).
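The Bernoulli example can be simulated directly. The sketch below compares the ‘stuck’ process with a process that draws a fresh, independent Bernoulli value at every t, which serves as an ergodic benchmark. The values of `p`, `n_steps` and `n_paths` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n_steps, n_paths = 0.5, 1000, 2000  # hypothetical values for illustration

# "Stuck" process: draw X_0 once, then repeat it forever.
x0 = rng.binomial(1, p, size=n_paths)            # one initial draw per realization
stuck = np.repeat(x0[:, None], n_steps, axis=1)  # shape (n_paths, n_steps)

# Ergodic benchmark: a fresh independent Bernoulli(p) draw at every t.
iid = rng.binomial(1, p, size=(n_paths, n_steps))

stuck_time_avgs = stuck.mean(axis=1)  # time average of each realization
iid_time_avgs = iid.mean(axis=1)

print("stuck: distinct time averages ->", np.unique(stuck_time_avgs))
print("iid:   time averages range    ->", iid_time_avgs.min(), iid_time_avgs.max())
print("true mean p =", p, "| cross-sectional mean of stuck process:", x0.mean())
```

Every realization of the stuck process has a time average of exactly 0 or 1, even though its cross-sectional mean is close to `p`. For the independent draws, the time average of each single realization is already close to `p`.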

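The formal definition of ergodicity is the following: a stationary process $\{X_t\}$ is ergodic if, for any two bounded functions $f:\mathbb{R}^k \to \mathbb{R}$ and $g:\mathbb{R}^l \to \mathbb{R}$,

$$\lim_{n\to\infty} \left| E\left[f(X_t,\dots,X_{t+k-1})\, g(X_{t+n},\dots,X_{t+n+l-1})\right] \right| = \left| E\left[f(X_t,\dots,X_{t+k-1})\right] \right| \, \left| E\left[g(X_{t+n},\dots,X_{t+n+l-1})\right] \right|$$

In words, blocks of the process that are far enough apart behave as if they were independent.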

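Combining stationarity with ergodicity (and a finite mean), the following relation holds: the time average converges almost surely to the true mean of the process.

$$\frac{1}{T}\sum_{t=1}^{T} X_t \xrightarrow{\text{a.s.}} E[X_t] = \mu \quad \text{as } T \to \infty$$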
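The convergence of the time average to the true mean can be illustrated numerically. This is a minimal sketch, assuming a stationary and ergodic AR(1) process that fluctuates around a known mean `mu`; the parameters `mu`, `phi` and `n` are hypothetical and picked only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, phi, n = 5.0, 0.5, 100_000  # hypothetical parameters for illustration

# Stationary, ergodic AR(1) around mu: X_t - mu = phi * (X_{t-1} - mu) + eps_t
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = mu + eps[0] / np.sqrt(1 - phi**2)  # start from the stationary distribution
for t in range(1, n):
    x[t] = mu + phi * (x[t - 1] - mu) + eps[t]

# Running time average (1/T) * sum_{t=1}^{T} X_t for every T
running_mean = np.cumsum(x) / np.arange(1, n + 1)

for T in (10, 100, 1_000, 10_000, 100_000):
    print(T, round(float(running_mean[T - 1]), 4))
```

As T grows, the printed averages should settle closer and closer to `mu`, using a single realization only.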

Stationarity and ergodicity are the basic assumptions behind time series analysis, and it is important to keep in mind how to achieve them and how to test whether they hold.

If you are interested in time series analysis, I’m attaching the links of some previous articles I wrote about the topic, both in R and Python.

Hope you enjoyed the reading!

## Published by valentinaalto

I'm a 22-year-old student based in Milan, passionate about everything related to Statistics, Data Science and Machine Learning. I'm eager to learn new concepts and techniques as well as share them with whoever is interested in the topic.