Time series analysis is pivotal in finance, since much of the field revolves around analyzing the history of stock prices and attempting to predict their future values.

In this article, I will go through some stylized facts about financial time series. For this purpose, I'm going to use the historical stock prices of Altaba (ticker AABA). You can easily download the dataset from Kaggle.

Importing and preparing data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('AABA.csv')

ts = df.iloc[:, [0, 4]].copy()  # keep only the date and the closing price
ts.set_index('date', inplace=True)
ts.head()

ts.plot()
plt.xticks(rotation=45)

Intuitively, the price series does not look stationary: we can see (stochastic or even deterministic) trends and prolonged deviations from the unconditional mean. This means that today's price can tell us something about tomorrow's price.

Ideally, we want to work with stationary series in order to bypass the main concern of time series analysis: the impossibility of observing, at a fixed time t, all the possible realizations of the stock/index. Indeed, we only observe one realized path across time for any given stock. However, if we assume (weak) stationarity, the mean and variance do not depend on time:

E[X_t] = μ and Var(X_t) = σ² for every t

And, if we further assume ergodicity, the time average along our single observed path converges to the ensemble mean:

(1/T) Σ_{t=1}^{T} x_t → E[X_t] as T → ∞

In general, return series are more likely to be stationary (or at least trend-stationary or integrated processes, both of which are easily converted into stationary processes).
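We can make this intuition concrete with a small sketch on simulated data (not the AABA series): a random-walk "price" remembers its past, while its returns barely do, which the lag-1 autocorrelation captures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulate a random-walk log-price series and derive its percentage returns
log_prices = np.cumsum(rng.normal(0.0005, 0.02, 1000))
prices = pd.Series(np.exp(log_prices))
returns = prices.pct_change().dropna()

# Lag-1 autocorrelation: prices are highly persistent, returns are not
print(prices.autocorr(lag=1))   # close to 1 -> persistent, non-stationary
print(returns.autocorr(lag=1))  # close to 0
```

The same comparison on real data typically shows the same pattern: price levels with autocorrelation near one, returns with autocorrelation near zero.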

prices=ts['close']
daily_return = prices.pct_change(1)
ts['returns']=daily_return
ts.iloc[:,1].plot()
plt.xticks(rotation=45)

The last step before examining the stylized facts is converting our returns into log-returns.

Why do we use log-returns instead of simple returns? One easy yet powerful reason is that they are easy to manage from a mathematical point of view. Indeed, let's consider the compound return over a sequence of n periods with simple returns R_1, …, R_n:

1 + R_{1,n} = (1 + R_1)(1 + R_2) ⋯ (1 + R_n)

This formula is hard to manage, as the product of normally distributed variables is not normal. The sum of jointly normally distributed variables, instead, is always normal (even when they are correlated), which is useful when we recall the following logarithmic identity:

log((1 + R_1) ⋯ (1 + R_n)) = log(1 + R_1) + ⋯ + log(1 + R_n)

Thus, if the single-period log-returns r_t = log(1 + R_t) are normally distributed, the compound log-return is normally distributed too. The identity above then leads us to:

r_{1,n} = r_1 + r_2 + ⋯ + r_n

Log-returns, hence, have the property of time-additivity.
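We can verify time-additivity numerically with a quick sketch on a hypothetical four-day price path: summing log-returns recovers exactly the compounded simple return over the whole period.

```python
import numpy as np

# Hypothetical price path over four days
prices = np.array([100.0, 103.0, 99.5, 104.2])

simple_returns = prices[1:] / prices[:-1] - 1
log_returns = np.log(1 + simple_returns)

# Summing log-returns and exponentiating back...
total_from_logs = np.exp(log_returns.sum()) - 1
# ...matches compounding the simple returns period by period
total_compounded = np.prod(1 + simple_returns) - 1

print(total_from_logs, total_compounded)  # identical up to floating point
```

Both quantities equal 104.2/100 − 1, the total return from the first to the last price.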

Furthermore, if we assume that prices are log-normally distributed (which, in practice, may or may not be true for any given price series), then the log-return

r_t = log(P_t / P_{t−1}) = log(P_t) − log(P_{t−1})

is conveniently normally distributed, because the logarithm of a log-normal variable is normal by definition, and the difference of two jointly normal variables is again normal.

So let’s convert our data:

import numpy as np

ts['log_return'] = np.log(1 + ts['returns'])
ts.iloc[:, 2].plot()
plt.xticks(rotation=45)
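As a sanity check on this conversion (shown here on hypothetical prices, since it relies only on the identity above), np.log(1 + returns) is exactly the same as taking log-price differences directly:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices
prices = pd.Series([30.0, 30.5, 29.8, 31.1, 30.9])

# Two equivalent ways to compute log-returns
via_pct = np.log(1 + prices.pct_change())
direct = np.log(prices / prices.shift(1))

print(np.allclose(via_pct.dropna(), direct.dropna()))  # True
```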

We can now start analyzing some common facets of financial time series.

Returns do not follow a normal distribution

Prices tend not to be log-normally distributed; hence, log-returns are not normally distributed.

mean = ts['log_return'].mean()
std = ts['log_return'].std()
ts['log_return'].hist(bins=20)

plt.axvline(x=mean, color='r', linestyle='--')
plt.axvline(x=mean + std, color='k', linestyle='--')
plt.axvline(x=mean - std, color='k', linestyle='--')

plt.show()

Let’s check their normality with the Jarque-Bera test, where the Null Hypothesis is ‘data are normally distributed’:

import numpy as np
from scipy import stats

x = np.asarray(ts['log_return'].dropna())
stats.jarque_bera(x)  # returns (JB statistic, p-value)

Output:
(137611.29450582498, 0.0)

If the data come from a normal distribution, the JB statistic asymptotically follows a chi-squared distribution with two degrees of freedom. Note that this asymptotic approximation is only reliable for large samples (the SciPy documentation suggests more than 2000 observations). Since the p-value is smaller than any reasonable significance level alpha, we can reject the Null and conclude that the data do not follow a normal distribution.
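To see the test at work on data whose distribution we control, here is a small sketch comparing a Gaussian sample with a heavy-tailed Student-t sample (both simulated, not taken from AABA):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

gaussian = rng.normal(size=5000)
heavy_tailed = rng.standard_t(df=3, size=5000)  # fat tails, like stock returns

stat_g, p_g = stats.jarque_bera(gaussian)
stat_h, p_h = stats.jarque_bera(heavy_tailed)

print(p_g)  # typically well above 0.05 -> cannot reject normality
print(p_h)  # essentially 0 -> reject normality
```

The heavy-tailed sample produces a huge JB statistic driven by its excess kurtosis, which is exactly what happens with real return series.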

Return series are leptokurtic

From the data it emerges that large returns, both negative and positive, are more likely to occur than under a normal distribution. Furthermore, very small returns, both positive and negative, are more likely to occur too. That means that the returns' distribution has a greater concentration of probability in its tails (heavier tails than the normal) and in the neighborhood of zero (a higher hump). In numbers, this means that the kurtosis of the return distribution,

K = E[(X − μ)⁴] / σ⁴,

is greater than 3. Hence, the return distribution is called “leptokurtic”.

from scipy.stats import kurtosis
kurtosis(x)  # note: scipy returns *excess* kurtosis (K - 3) by default

Output:
33.06489953805719
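One caveat worth making explicit: scipy.stats.kurtosis returns the excess kurtosis (Fisher's definition, K − 3) by default, so the 33.06 above should be compared against 0, not 3. A quick sketch on a simulated normal sample shows both conventions:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)
sample = rng.normal(size=100_000)

# Default (Fisher): excess kurtosis, ~0 for a normal distribution
print(kurtosis(sample))
# Pearson's definition (fisher=False): raw fourth-moment ratio, ~3 for a normal
print(kurtosis(sample, fisher=False))
```

Either way, the AABA returns are far into leptokurtic territory.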

Returns are subject to volatility clustering

Examining the AABA returns, there is evidence that high volatility tends to concentrate in certain periods, and so does low volatility.

Indeed, from the graph we can see periods of high volatility (red circle) alternating with periods of low volatility (green circle). Given this evidence, it might be useful to model our time series with algorithms which take that phenomenon, known as heteroskedasticity, into account. Such models are ARCH and GARCH, which differ from standard Autoregressive Moving Average (ARMA) models in that homoskedasticity (constant conditional variance) is no longer assumed (you can read more about ARMA models here).
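To illustrate the clustering signature in isolation, we can simulate a GARCH(1,1)-style process (the parameter values here are illustrative, not estimated from AABA): the returns themselves are serially uncorrelated, but their squares are not, because today's variance depends on yesterday's shock and variance.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# GARCH(1,1) recursion: sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
omega, alpha, beta = 1e-5, 0.1, 0.85
n = 5000
returns = np.empty(n)
sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
for t in range(n):
    eps = rng.normal() * np.sqrt(sigma2)
    returns[t] = eps
    sigma2 = omega + alpha * eps**2 + beta * sigma2

r = pd.Series(returns)
# Returns themselves are nearly uncorrelated...
print(r.autocorr(lag=1))
# ...but their squares are positively autocorrelated: volatility clusters
print(r.pow(2).autocorr(lag=1))
```

This is precisely the pattern ARCH/GARCH models are built to capture: no memory in the level of returns, persistent memory in their magnitude.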

Large negative returns are often followed by periods of high volatility

Looking at the graph above, we can see that each period of high volatility began with a fall in returns. The reason is that economic agents tend to overreact to negative returns: bad expectations and uncertainty spread across financial markets, leading to a period of high volatility.

Intraday returns are subject to typical trading session effects

Empirical data show that around the opening and closing of a trading session, the volatility of returns is more pronounced. The reason is that trading volume is higher at those moments than during the rest of the day. Since our sample data are daily rather than intraday, we cannot visualize this effect on the graph.

Conclusions

Keeping these recurrent features of financial time series in mind, we can incorporate them while building predictive models on stock prices. To give an example, in recent years Behavioral funds, a category of mutual funds that use behavioral finance as a basis for their investment strategy, have become very popular, since they incorporate agents' behavioral features (like the overreaction to negative returns).

Published by valentinaalto

I'm a 22-year-old student based in Milan, passionate about everything related to Statistics, Data Science and Machine Learning. I'm eager to learn new concepts and techniques as well as share them with whoever is interested in the topic.
