
Stacking Classifier in Python


In machine learning, no single algorithm performs best across all problems. This is where ensemble learning comes into play: a powerful technique that combines the strengths of multiple models to improve overall performance. In this blog, we will talk about the stacking classifier in Python.

Meta learning

A meta-learning algorithm learns from the outputs of other machine learning algorithms.

Image taken from https://meta-world.github.io/

Stacking classifier in Python

The stacking classifier uses the concept of meta learning: a meta learner learns how to combine the predictions of two or more machine learning algorithms.

Stacking in Machine learning

The pseudo-algorithm is as follows:

  • Choose the base estimators (m1, m2, m3).
  • Choose the meta learner (ml1).
  • The stacking classifier takes the base estimators and the meta learner as input.
  • The model is then trained on the training data.
  • Finally, it makes the final prediction.
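The steps above can be sketched as a short, self-contained example. The toy data, the choice of base estimators, and the names m1/m2/m3/ml1 here are purely illustrative, standing in for the real dataset used later:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for a real training set
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# Base estimators (m1, m2, m3)
base_estimators = [
    ('m1', DecisionTreeClassifier(random_state=42)),
    ('m2', GaussianNB()),
    ('m3', KNeighborsClassifier()),
]

# Meta learner (ml1) combines the base estimators' predictions
clf = StackingClassifier(estimators=base_estimators,
                         final_estimator=LogisticRegression())

clf.fit(X, y)               # the model trains on the training data
preds = clf.predict(X[:5])  # ...and makes the final predictions
print(preds)
```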

Implementation of Stacking classifier in Python

The first step is to import all the necessary libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier,AdaBoostClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report

from catboost import CatBoostClassifier
from xgboost import XGBClassifier

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

The second step is to load the CSV dataset using the Pandas library

company = pd.read_csv("Company_Data.csv")
company.head()

The third step is to create a new column from an existing one, i.e., feature engineering. Here the pd.cut() function is used on the Sales column to bin the continuous sales values into categories.

company['Sales_Category'] = pd.cut(company['Sales'], bins=[0, 5, 10, 15, 20, 25], labels=['Very Low', 'Low', 'Medium', 'High', 'Very High'])
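To see what pd.cut() does here, a tiny example with made-up sales figures (not from the dataset) and the same bins and labels:

```python
import pandas as pd

# Hypothetical sales figures to show how pd.cut assigns each value to a bin
sales = pd.Series([2.5, 7.1, 12.0, 18.4])
cats = pd.cut(sales, bins=[0, 5, 10, 15, 20, 25],
              labels=['Very Low', 'Low', 'Medium', 'High', 'Very High'])
print(list(cats))  # ['Very Low', 'Low', 'Medium', 'High']
```

Each value falls into the half-open interval (left, right] of its bin, so 7.1 lands in (5, 10] and is labelled 'Low'.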

The fourth step is to label encode the categorical columns, i.e., convert each categorical column into a numerical one (for example, red as 1, green as 2, yellow as 3).

for col_name in company.columns:
    if company[col_name].dtype == 'object':
        company[col_name] = company[col_name].astype('category')
        company[col_name] = company[col_name].cat.codes
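A small illustration of what this loop does, on a hypothetical column: .astype('category') collects the sorted unique values, and .cat.codes replaces each value with its category's integer code.

```python
import pandas as pd

# Hypothetical categorical column to show what cat.codes produces
df = pd.DataFrame({'ShelveLoc': ['Bad', 'Good', 'Medium', 'Bad']})
df['ShelveLoc'] = df['ShelveLoc'].astype('category')
print(dict(enumerate(df['ShelveLoc'].cat.categories)))  # {0: 'Bad', 1: 'Good', 2: 'Medium'}
df['ShelveLoc'] = df['ShelveLoc'].cat.codes
print(df['ShelveLoc'].tolist())  # [0, 1, 2, 0]
```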

The fifth step is to drop the null values from the dataset.

company = company.dropna()

The sixth step is to define the input columns and the target column, which is Sales_Category.

X = company.drop(['Sales', 'Sales_Category'], axis=1)
y = company['Sales_Category']

After that, the data is split into training and testing sets in an 80/20 ratio, with a fixed random state for reproducibility.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
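On a toy array, the 80/20 split looks like this; the fixed random_state makes the shuffle reproducible across runs:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays to illustrate the 80/20 split: 10 rows -> 8 train, 2 test
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_tr), len(X_te))  # 8 2
```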

The next step is to define the base estimators; in this case they are the random forest, AdaBoost, XGBoost, and CatBoost classifiers.

# Define base estimators
base_estimators = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)), #bagging
    ('ada', AdaBoostClassifier(n_estimators=100, random_state=42)),    #boosting
    ('xgb', XGBClassifier()), #boosting
    ('catboost', CatBoostClassifier())  #boosting
]

After that, a meta classifier such as logistic regression is defined.

# Define meta classifier
meta_estimator = LogisticRegression()

The next step is to initialize the stacking classifier and fit it on the training data.

# Initialize Stacking Classifier
stacking_model = StackingClassifier(estimators=base_estimators, final_estimator=meta_estimator)

# Fit the Stacking Classifier
stacking_model.fit(X_train, y_train)

The next step is to evaluate the model on the test dataset using classification metrics such as precision, recall, F1 score, and accuracy.

# Evaluate the Stacking Classifier
accuracy = stacking_model.score(X_test, y_test)
print("Accuracy:", accuracy)

y_pred = stacking_model.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))

Output of Stacking in Machine learning

The results show that the model achieves 80% accuracy on the test dataset, with a precision of 85%, a recall of 65%, and an F1 score of 74%.
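As a quick sanity check, the F1 score reported by classification_report is the harmonic mean of precision and recall; with illustrative values of 85% precision and 65% recall:

```python
# F1 is the harmonic mean of precision and recall
precision, recall = 0.85, 0.65
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.74
```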

Conclusion

In this blog, we discussed stacking, an ensemble algorithm, in a very simple manner, and walked through a practical implementation.

