• Abhishek Singh

Time series analysis in Python

In this tutorial, you will learn how to perform time series analysis in Python.


First, we have to understand what exactly time series analysis is.


According to Wikipedia, time series analysis is a statistical technique for dealing with time-series data, i.e. data recorded over a series of time intervals or periods.



For example, the price of a stock changes every day. Some of the factors that affect stock prices are company mergers and acquisitions, news related to company products, and government rules or decisions that directly impact the stock.


To perform time series analysis, there are mainly two things that we have to keep in mind.


- First, frame the problem; in other words, understand the problem in great depth.


For example, a few months back the price of Tesla stock was at a peak. The question arises: why did this happen, and is there a particular reason for it? One reason behind this hike in Tesla's stock price is the Gigafactory that opened in China.



As the saying goes:


"It is about knowing what you have... The challenge is that most organizations don't know what they have." - Rosario

- Second, we have to understand what type of problem this is. In time series analysis, five types of questions typically arise.


1- Descriptive: Summarise the characteristics of a set of data.


  • Which states have the highest number of corona cases?

  • What is the average price of medical kits for corona in India?


2- Exploratory: Examine or analyze the data to find any patterns, trends or relationships between variables.


  • Is there any pattern in the movement of stock prices?

  • Is there any relationship between salary and experience?


3- Inferential: Draw a conclusion about a larger population based on sample data.


  • How is the price of apples correlated with the volume of apples?


4- Predictive: Use the relationship between one factor and another to predict a future or unknown value.


  • What is the price of Tesla stock likely to be the next day?

  • What will the sales of Walmart be next month?


5- Causal: Determine whether changing one factor will change another factor in the data. It is used to establish a cause-and-effect link between two variables.


  • What effect will it have on sales if Nike chooses a small Instagram influencer instead of popular celebrities?








Step by Step approach for solving a time series problem

Step 1- The first step is to understand the problem: what is it asking, and how can we achieve the output?


Step 2- The next, and perhaps the most important, step is to gather the data. Some of the platforms where we can get data are Kaggle, Google Dataset Search, data.gov.in, Quandl and the UCI repository, or you can scrape the data from Facebook, Twitter, Reddit or any other social platform.




Step 3- The next step is to clean the data. More than 98% of the world's data is in a messy form, i.e. it may come as audio, text, images or video. Some data scientists say that more than 70% of their time is spent on data preprocessing.


Step 4- Now it's time to explore the data


"I don't know, what I don't know"

Now we have to understand the data's structure, i.e. its shape, data types, summary statistics, plots and more.
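A first look at the data's structure can be sketched with pandas. The prices below are made-up numbers and the column name `price` is only an assumption for illustration; a real dataset would be loaded from a file or an API instead.

```python
import pandas as pd

# Hypothetical data: 10 daily closing prices (made-up numbers).
dates = pd.date_range("2020-01-01", periods=10, freq="D")
prices = pd.DataFrame(
    {"price": [100.0, 101.5, 99.8, 102.3, 103.1, 102.7, 104.0, 105.2, 104.8, 106.1]},
    index=dates,
)

print(prices.shape)         # (number of rows, number of columns)
print(prices.dtypes)        # data type of each column
print(prices.describe())    # count, mean, std, min, quartiles, max
print(prices.isna().sum())  # number of missing values per column
```

Using a date index from the start makes later steps such as resampling and plotting against time simpler.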


Step 5- Now we are ready to build our time series forecasting model to forecast the prices.


Some of the basic models are:


1- Mean constant model: Also called the mean model. The forecast is simply the mean of the output variable. Its plot looks like this.


Plot of Mean constant model

2- Linear trend model: As the name suggests, we use linear regression to fit a linear trend. Linear regression is an approach to modelling the relationship between the target/output variable and the input variable. In this case, our price column is dependent on time. Its plot looks something like this.



Plot of Linear trend model

3- Random walk model: In this model, we forecast the price from the change over one period (for stocks, today's price minus yesterday's price). It is also called one-period-ahead forecasting.



Plot of Random walk model
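The three basic models above can be sketched in a few lines of NumPy. The series `y` is made-up data, and the drift variant of the random walk is an addition for illustration rather than something taken from this post.

```python
import numpy as np

# Hypothetical price series (made-up numbers), one value per period.
y = np.array([10.0, 11.0, 12.5, 13.0, 14.2, 15.1, 16.0, 17.3])
t = np.arange(len(y))

# 1. Mean (constant) model: every forecast is the historical mean.
mean_forecast = y.mean()

# 2. Linear trend model: fit price = slope * t + intercept by least squares.
slope, intercept = np.polyfit(t, y, deg=1)
trend_forecast = slope * len(y) + intercept  # forecast for the next period

# 3. Random walk model: the expected change is zero, so the forecast
#    for the next period is simply the last observed value.
random_walk_forecast = y[-1]

# Random walk with drift: add the average period-to-period change.
drift = np.diff(y).mean()
drift_forecast = y[-1] + drift

print(mean_forecast, trend_forecast, random_walk_forecast, drift_forecast)
```

These models are mostly useful as baselines: a more complex model should at least beat them.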


The earlier time-series models often assume that the data is stationary: if the series has followed some pattern over a long time, there is a very high probability that it will follow the same pattern in the future. This holds in theory, but real data is often not stationary. To avoid this problem, we convert non-stationary data into stationary data.


A series is said to be stationary if its mean, variance and covariance are not functions of time.


So the next step is to check for stationarity. There are several ways to check whether our dataset is stationary.


1- Visual plot test: In this method, we check the graph visually to find whether the data is stationary or not.


2- Summary statistics: In this method, we split the series into parts and compare the mean and variance of each part. If they are not in the same range, we can say that the data is non-stationary.


3- Augmented Dickey-Fuller Test: This test checks for the presence of a unit root in the series. There are two hypotheses: the null hypothesis (the series has a unit root, i.e. it is non-stationary) and the alternative hypothesis (the series is stationary).


The test result consists of a test statistic and critical values. If the test statistic is less than the critical value, we reject the null hypothesis and say that the series is stationary.


4- Kwiatkowski–Phillips–Schmidt–Shin (KPSS) Test: Its hypotheses are the opposite of those in the Augmented Dickey-Fuller Test: the null hypothesis is that the series is stationary. If the test statistic is greater than the critical value, we reject the null hypothesis and say that the series is non-stationary.


Now we know how to check for stationarity. The next step is to make the dataset stationary.


Some of the methods are:


1- Differencing: In this technique, we take the difference between the observation at a particular instant and the observation at the previous instant.


2- Log Transformation: It is used to stabilize non-constant variance by taking the log of the values.


3- Decompose our time-series data: In this, we find the trend and seasonality and remove them from the series.
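Differencing and the log transformation can be sketched with NumPy. The series below is made-up data that grows by 10% per period, chosen so that the effect of each transformation is easy to see.

```python
import numpy as np

# Hypothetical series growing by 10% each period (made-up numbers).
y = np.array([100.0, 110.0, 121.0, 133.1, 146.41, 161.051])

# 1. Differencing: y[t] - y[t-1]. The result is one element shorter.
diff = np.diff(y)

# 2. Log transformation: compresses large values, which helps when the
#    size of the fluctuations grows with the level of the series.
log_y = np.log(y)

# Applying both: a constant growth rate becomes a constant log difference.
log_diff = np.diff(log_y)

print(diff)      # differences still grow over time
print(log_diff)  # every entry equals log(1.1)
```

Decomposition is not shown here; one option, assuming statsmodels is installed, is `seasonal_decompose` from `statsmodels.tsa.seasonal`, which splits a series into trend, seasonal and residual parts.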


After all these steps, from data exploration to data preprocessing, we finally reach the last step: making the forecast.


Some of the models that we can use for time series forecasting are:


1- ARIMA model

2- SARIMA model

3- FB-Prophet model


In this blog, we learn how to make a forecast with the ARIMA model. In the upcoming blog, we will learn about the SARIMA and FB-Prophet models.


ARIMA Model

So, let us understand what the ARIMA model is. ARIMA stands for AutoRegressive Integrated Moving Average: as the name implies, it combines two models, the autoregressive model and the moving average model, with an integration step. Here integration means the reverse of differencing. Let us understand the terms one by one.


Autoregressive (p): In this model, we use the values at previous time steps as the input to the regression equation to forecast the value at the next time step. It is denoted by the symbol p.


For example, if p is 6, then the predictors of y(t) are y(t-1), ..., y(t-6).
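The autoregressive idea can be sketched as an ordinary least-squares regression on lagged values. The helpers `fit_ar` and `predict_next` are hypothetical names written for this illustration, and the data is simulated; library implementations estimate the coefficients differently.

```python
import numpy as np


def fit_ar(y, p):
    """Least-squares fit of y[t] = c + a1*y[t-1] + ... + ap*y[t-p]."""
    y = np.asarray(y, dtype=float)
    # Column k holds the series lagged by k + 1 steps.
    lags = np.column_stack([y[p - k - 1 : len(y) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a1, ..., ap]


def predict_next(y, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    recent = np.asarray(y, dtype=float)[-1 : -p - 1 : -1]  # y[t-1], ..., y[t-p]
    return coef[0] + coef[1:] @ recent


# Simulated series in which each value depends on the previous one.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, len(y)):
    y[t] = 0.7 * y[t - 1] + rng.normal()

coef = fit_ar(y, p=6)  # p = 6, as in the example above
print(coef)
print(predict_next(y, coef))
```

Since the simulated series only depends on one lag, the coefficient on y(t-1) should come out much larger than the other five.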


Moving average (q): In this model, we use the forecast errors from previous time steps as the input to the regression equation. This is different from the rolling moving average of a price over a specific number of periods, which is commonly used as a technical indicator in the stock market. It is denoted by the symbol q.


For example, if q is 6, the predictors for y(t) will be e(t-1), ..., e(t-6), where e(i) is the difference between the model's forecast and the actual value at the ith instant.


Number of differences (d): It denotes the number of nonseasonal differences applied to make the series stationary.


Now the question arises: how do we determine the values of p and q? The answer is simple: we make two plots to determine these numbers.


The two plots are:


  • Autocorrelation Function (ACF): It measures the correlation between the time series and a lagged version of itself.

  • Partial Autocorrelation Function (PACF): It measures the correlation between the time series and a lagged version of itself, after removing the variation already explained by the intervening lags.
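Both functions can be sketched directly in NumPy. The `acf` and `pacf` helpers below are simplified versions written for this illustration, with the PACF at lag k taken as the last coefficient of a least-squares fit on k lags. A common rule of thumb, not stated above, is to read p from the lag where the PACF cuts off and q from the lag where the ACF cuts off.

```python
import numpy as np


def acf(y, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.dot(y, y)
    values = [1.0]
    for k in range(1, max_lag + 1):
        values.append(np.dot(y[k:], y[:-k]) / denom)
    return np.array(values)


def pacf(y, max_lag):
    """Partial autocorrelation: last coefficient of a regression on k lags."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    values = [1.0]
    for k in range(1, max_lag + 1):
        # Column j holds the series lagged by j + 1 steps.
        X = np.column_stack([y[k - j - 1 : len(y) - j - 1] for j in range(k)])
        coef, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
        values.append(coef[-1])
    return np.array(values)


# Simulated series in which each value depends only on the previous one.
rng = np.random.default_rng(1)
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = 0.7 * y[t - 1] + rng.normal()

print(acf(y, 5))   # decays gradually with the lag
print(pacf(y, 5))  # close to zero after lag 1
```

For plotting, statsmodels provides `plot_acf` and `plot_pacf` in `statsmodels.graphics.tsaplots`, which also draw confidence bands.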


The results look like this





Conclusion

In this blog post, we learned about the technical concepts and methods used to perform time series analysis and forecasting.


In the upcoming blog post, we will code everything we talked about using a real-world example.


So if you like this blog post, please like it and subscribe to our data spoof community to get real-time updates.


