
Lasso Regression in R

Regularization is a very important technique in machine learning for avoiding overfitting, which occurs when a model fits the training data so closely that its performance on the testing data degrades badly.

In this blog, we will learn about a regularization technique named Lasso regression: its pros and cons, its implementation in R, and its use cases.

What is Lasso regression?

Lasso stands for least absolute shrinkage and selection operator, and the full name already hints at how the algorithm works and in which circumstances it can be applied. Lasso regression is a form of linear regression or, more precisely, a modification of it. It uses the process of shrinkage to produce simpler models with fewer effective parameters. Shrinkage means that the coefficient estimates are pulled towards a central value such as zero. Lasso is a regularized regression algorithm: it performs L1 regularization, adding a penalty equivalent to the absolute value of the magnitude of the coefficients. Through the generalized linear model framework (for example, lasso-penalized logistic regression), the same idea can also be applied to classification problems.

In the Lasso regression algorithm, the loss function is modified to limit the complexity of the model by bounding the sum of the absolute values of its coefficients, also called the L1 norm. Lasso regression is similar to ridge regression in several ways; both are techniques for applying regularization. However, the two algorithms differ in the manner in which they assign the penalty to the coefficients. Lasso regression is particularly useful for models that exhibit high levels of multicollinearity, and it can also automate certain parts of model selection, such as variable selection or the elimination of parameters.

The mathematical formulation of Lasso regression

In this algorithm, regularization is implemented by adding a "penalty" term to the best fit derived from the training dataset. This helps keep the variance low on the testing data and restricts the influence of the predictor variables on the output variable by compressing their coefficients. The principle behind regularization is that we keep the same number of features but reduce the magnitude of the coefficients; different regression techniques that use regularization accomplish this in different ways.

Let us now see how the Lasso regression is represented in mathematical terms. The cost function of Lasso regression can be expressed through this equation:

Cost = Residual sum of squares (RSS) + λ * ( Sum of the absolute values of the coefficients ),

where this can be written as an equation:

Cost = Σᵢ (yᵢ − ŷᵢ)² + λ * Σⱼ |βⱼ|

In this equation,

λ denotes the amount of shrinkage applied.

If λ = 0, all of the features are retained and the penalty term vanishes; this is equivalent to linear regression, where only the residual sum of squares is used to build the predictive model.

As λ approaches ∞, the penalty dominates and more and more coefficients are driven to zero, so the features are eliminated.

Finally, the variance of the model increases as the value of λ decreases, and the bias increases with the increase in the value of λ.
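For a single standardized predictor, the lasso solution has a simple closed form, the soft-thresholding of the least-squares estimate, which makes the role of λ easy to see. A minimal base-R sketch (the value 2.5 and the grid of λ values are illustrative assumptions):

```r
# Soft-thresholding: the one-variable lasso solution shrinks the
# ordinary least-squares estimate towards zero by lambda, and sets
# it exactly to zero once lambda exceeds its absolute value.
soft_threshold <- function(beta_ols, lambda) {
  sign(beta_ols) * pmax(abs(beta_ols) - lambda, 0)
}

beta_ols <- 2.5                  # hypothetical OLS estimate
lambdas  <- c(0, 1, 2.5, 4)      # increasing amounts of shrinkage
soft_threshold(beta_ols, lambdas)
# -> 2.5 1.5 0.0 0.0
```

At λ = 0 we recover the least-squares estimate; once λ reaches 2.5 the coefficient is eliminated entirely, which is exactly the behaviour described above.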

Difference between lasso regression and ridge regression

Earlier in this blog, we discussed that the lasso regression algorithm works by adding a penalty, and the ridge regression algorithm does the same. So, what is the difference between the two?

The answer is simple: if a regression model uses the L1 regularization technique to add the penalty, it is Lasso regression, and if the model uses the L2 regularization technique, it is Ridge regression. The L1 penalty is equivalent to the absolute value of the coefficients' magnitudes. L1 regularization results in sparse models with a small number of coefficients; some coefficients shrink exactly to zero and are thereby eliminated from the model. Larger values of λ impose larger penalties and drive more coefficients to zero, whereas the L2 penalty shrinks coefficients towards zero without ever making them exactly zero.
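The contrast is easy to see numerically. Under orthonormal predictors both penalties have closed-form solutions: ridge rescales every coefficient, while the lasso soft-thresholds them. A base-R sketch with made-up OLS coefficients (all values here are illustrative assumptions):

```r
beta_ols <- c(3.0, 0.8, -0.3, 0.05)  # hypothetical OLS coefficients
lambda   <- 0.5

# L2 (ridge): every coefficient is rescaled, none becomes exactly zero
ridge_beta <- beta_ols / (1 + lambda)

# L1 (lasso): soft-thresholding drives small coefficients to exactly zero
lasso_beta <- sign(beta_ols) * pmax(abs(beta_ols) - lambda, 0)

ridge_beta  # all four coefficients survive, just smaller
lasso_beta  # -> 2.5 0.3 0.0 0.0 : the two small ones are eliminated
```

The lasso produces a sparse model (two coefficients are exactly zero), while ridge keeps all four predictors, only shrunk.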

Implementation of Lasso regression

To apply the Lasso regression algorithm in accordance with the machine learning problem, we can do it with the help of some simple steps:

  1. The first step is to import all the required libraries and then load the dataset we are going to work on.
  2. The second step is to analyze the dataset and divide it into training and testing sets: the training set is used to train the model, and the testing set is used to evaluate its performance. Scaling of the training and testing data is also performed at this stage so that all features contribute on a comparable scale.
  3. The third step is to perform exploratory data analysis, which helps us study the dataset by computing summaries such as the mean, median, and quartiles.
  4. Finally, we apply the Lasso regression algorithm using the commands shown in the implementation below. After that, we can use testing metrics such as the RMSE and R-squared values to judge the efficiency of the algorithm, and we can also compute the mean cross-validation score.
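The steps above can be sketched end to end in R. This is a minimal sketch using the built-in mtcars data (predicting mpg from the remaining columns) and assumes the glmnet package is installed; the 80/20 split and the seed are illustrative choices, not part of the recipe itself:

```r
library(glmnet)

# Steps 1-2: load the data and split into training and testing sets.
# glmnet expects a numeric matrix of predictors.
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

set.seed(42)
train   <- sample(nrow(x), floor(0.8 * nrow(x)))
x_train <- x[train, ];  y_train <- y[train]
x_test  <- x[-train, ]; y_test  <- y[-train]

# Step 3 (exploratory analysis) would go here: summary(mtcars), cor(x), etc.

# Step 4: cross-validated lasso (alpha = 1 selects the L1 penalty);
# glmnet standardizes the predictors internally by default.
cv_fit <- cv.glmnet(x_train, y_train, alpha = 1)

# Evaluate on the held-out data with RMSE and R-squared.
pred <- predict(cv_fit, newx = x_test, s = "lambda.min")
rmse <- sqrt(mean((y_test - pred)^2))
r2   <- 1 - sum((y_test - pred)^2) / sum((y_test - mean(y_test))^2)
cat("RMSE:", rmse, " R-squared:", r2, "\n")
```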

Uses of the Lasso regression algorithm

The lasso regression algorithm is a modified version of linear regression and is helpful in solving various problems that are difficult for the plain linear regression algorithm. Its main usage is in situations where we need to apply regularization to the dataset. The aim of lasso regression is to identify a subset of predictors that minimizes the prediction error for a quantitative response variable. It therefore tends to work well when only a small number of predictors have substantial coefficients and the remaining ones are close to zero. With the help of lasso regularization, we shrink the coefficients and push them towards zero, which makes the model perform better on new datasets. Regularization also lets us use more complex models while avoiding overfitting. Through the generalized linear model framework, the lasso can also be used as a classification model.

Feature selection in the Lasso regression model

Lasso regression is very efficient for feature selection, which makes it an important machine learning algorithm whenever regularization is needed. In the lasso model, we minimize the cost function given by the equation above, and the regression line is obtained from the resulting coefficients.
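In practice, the selected features can be read straight off a fitted model: coefficients shrunk exactly to zero have been eliminated. A sketch on the built-in mtcars data (assuming the glmnet package is installed; the seed is an illustrative choice):

```r
library(glmnet)

# Cross-validated lasso: mpg against the remaining mtcars columns.
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

set.seed(1)
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the cross-validated lambda; entries printed as "."
# are exactly zero, i.e. those predictors were eliminated.
coef(cv_fit, s = "lambda.min")
```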

Implementation of Lasso regression in R

# Load the glmnet package (install.packages("glmnet") if needed)
library(glmnet)

# Fit the lasso: alpha = 1 selects the L1 penalty (alpha = 0 gives ridge).
# `features` must be a numeric matrix and `target` a numeric vector.
lasso <- glmnet(x = features, y = target, alpha = 1)

# Coefficient paths: how each coefficient shrinks as lambda grows
plot(lasso, xvar = "lambda", label = TRUE)

# Choose lambda by k-fold cross-validation (10 folds by default)
cv_lasso <- cv.glmnet(x = features, y = target, alpha = 1)

# Cross-validation curve with lambda.min and lambda.1se marked
plot(cv_lasso)

Advantages and disadvantages of using the Lasso regression model

Advantages:

  1. Lasso regression can be used efficiently for regularization, so it avoids overfitting and can be applied even when the number of features is greater than the number of observations.
  2. The Lasso regression model performs feature selection and is fast in terms of both fitting and inference.
  3. The results obtained through the Lasso regression model are often better than other automatic variable-selection methods such as forward, backward, and stepwise selection.

Disadvantages:

  1. One of the main criticisms of the lasso is that its model selection is automatic: the procedure runs without guidance from the analyst, removing the step where the analyst thinks about which variables belong in the model.
  2. Because it is automatic, the lasso can produce models that make little substantive sense, and it does not follow any hierarchy principles among the variables.
  3. The lasso may drop variables it deems non-significant even when they matter for the problem at hand.
  4. The models selected by the lasso may or may not be stable.
  5. When we have highly correlated features, the lasso tends to select only one of them (or a subset), and which one it picks can depend on the implementation.

Conclusion

In this blog, we have learned what lasso regression is, how to implement it in R, its use cases, and its advantages and disadvantages.

If you liked this blog, tell your friends or coworkers about it. You can find me on social media platforms including Twitter, Instagram, and LinkedIn.

LinkedIn – https://www.linkedin.com/in/abhishek-kumar-singh-8a6326148

Twitter- https://twitter.com/Abhi007si

Instagram- www.instagram.com/dataspoof