Ensemble learning is a technique that combines the predictions of multiple machine learning algorithms to make more accurate predictions than any individual model.

Bagging (Ensemble learning)
Bagging, also called bootstrap aggregation, is a very powerful ensemble method. Bagging helps reduce the variance of algorithms that have high variance.
Some algorithms with high variance that benefit from bagging are decision trees, support vector machines, and k-nearest neighbors.
Bagging can be applied to both classification and regression problems.

Some of the bagging algorithms are:
- Random Forest
- Bagged Decision Trees
- Bagged K-Nearest Neighbors (Bagged KNN)
- Bagged SVM (Support Vector Machine)
- Pasting
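To make this concrete, below is a minimal bagging sketch in Python. It assumes scikit-learn (version 1.2 or later for the `estimator` keyword) and a synthetic dataset, so treat it as an illustration rather than a production recipe.

```python
# Minimal bagging sketch: many decision trees, each trained on a
# bootstrap sample, voting on the final class (assumes scikit-learn >= 1.2).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # high-variance base learner
    n_estimators=50,                     # number of bootstrap models
    bootstrap=True,                      # sample with replacement
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Test accuracy:", bagging.score(X_test, y_test))
```

Setting bootstrap=False draws samples without replacement, which gives the pasting variant listed above.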
Boosting (Ensemble learning)
Before learning boosting, let us first understand what a weak learner and a strong learner are.
Weak learner
A weak learner produces a classifier that performs only slightly better than random guessing; it is sometimes also called a weak classifier.
For a binary classification problem, the accuracy of a weak learner is only slightly greater than 50%.
Some common weak learners are:
- Decision tree: a tree with a single split node, also known as a decision stump.
- K-nearest neighbors: with K set to 1, it behaves as a weak learner.
- Multi-layer perceptron: when it has only a single node.
- Naïve Bayes: when the algorithm is applied to a single input variable.
Strong learner
A strong learner produces a classifier that achieves good accuracy on a classification problem; it is sometimes also called a strong classifier.
Some common strong learners are:
- Logistic regression
- Support vector machine
- K-nearest neighbors (K > 1)
Boosting
Boosting refers to a family of algorithms that can convert weak learners into a strong learner.

Some of the most common boosting algorithms are:
- Adaptive boosting (Adaboost)
- Extreme gradient boosting (XGBoost)
- LightGBM
- CatBoost
- Gradient boosting
Let us understand boosting with the help of an example.
Suppose you are a computer science student with 7 subjects in total, and a test is conducted on all 7 subjects. Each subject has a weight, denoted (w1, w2, w3, …, w7).
In the first test (M1) you fail three subjects: Physics, Math, and CSE.

In the second test (M2) you give more focus to the failed subjects (that is, the weights of these subjects are increased).

Suppose in the second test you fail Chemistry and Biology; you again increase the weights of these subjects.


In the third test (M3) you fail Physics, so you increase the weight of Physics.

Finally, you take the majority vote of the three models (M1, M2, M3) for classification, or their average for regression, and make the final prediction.
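This re-weighting of mistakes is essentially what AdaBoost does with misclassified samples. Below is a minimal AdaBoost sketch using decision stumps as the weak learners; it assumes scikit-learn (1.2 or later for the `estimator` keyword) and a synthetic dataset.

```python
# Minimal AdaBoost sketch: decision stumps boosted sequentially, with
# misclassified samples re-weighted each round (assumes scikit-learn >= 1.2).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump = weak learner
    n_estimators=100,                               # number of boosting rounds
    random_state=0,
)
booster.fit(X_train, y_train)
print("Test accuracy:", booster.score(X_test, y_test))
```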

Stacking (Ensemble learning)
Stacking learns from the outputs of other machine learning algorithms.

It uses the concept of meta-learning: a meta-model learns how to combine the predictions of two or more base machine learning algorithms.
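Below is a minimal stacking sketch (assuming scikit-learn and a synthetic dataset): two base models feed their cross-validated predictions into a logistic regression meta-model.

```python
# Minimal stacking sketch: base models' out-of-fold predictions become
# the training features for a meta-model (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=1)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),  # the meta-learner
    cv=5,  # cross-validated predictions avoid training-label leakage
)
stack.fit(X_train, y_train)
print("Test accuracy:", stack.score(X_test, y_test))
```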

Blending (Ensemble learning)
Blending is an ensemble machine learning technique that uses a machine learning model to learn how to best combine the predictions from multiple contributing ensemble member models. Unlike stacking, blending typically trains this combining model on predictions made for a held-out validation set rather than on cross-validated predictions.
It is applicable to regression as well as classification problems.
🎯 Scenario: Loan Default Prediction for a Bank
✅ Step 1: Define your base models
The bank uses:
- A Logistic Regression model (good for interpreting relationships).
- A Decision Tree model (captures non-linear patterns).
✅ Step 2: Define your blending model
They choose a Random Forest as the blending model. It’s powerful, robust, and can learn from a combination of different model outputs.
✅ Step 3: Train the base models on your training data
The bank trains both Logistic Regression and Decision Tree models on the same customer data (features and default status).
✅ Step 4: Get the predictions from the base models
Once trained, both base models predict whether a new customer will default. These are usually probability scores (e.g., 0.7 means 70% chance of default).
✅ Step 5: Create an empty list and append predictions
For each customer, the bank takes the predictions from the two models and stores them — say:
- Logistic Regression: 0.7
- Decision Tree: 0.6
→ Stored as: [0.7, 0.6]
✅ Step 6: Convert the list into a structured format
They organize the predictions for all customers into a 2D array, where each row is [LR prediction, DT prediction]. This becomes the input to the blending model.
✅ Step 7: Train your blending model
Now, the Random Forest model is trained only on these predictions as features and learns patterns in the confidence levels of the base models.
✅ Step 8: Make final predictions
The blending model now makes the final decision — whether the customer will default — by leveraging the strengths of both base models. The bank then evaluates performance using accuracy, AUC, recall, etc.
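The following is a hypothetical sketch of these eight steps in Python (assuming scikit-learn; the synthetic dataset stands in for the bank's real customer records):

```python
# Hypothetical blending sketch for the loan-default scenario
# (assumes scikit-learn; synthetic data stands in for customer records).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=7)
# Hold out a validation set; its predictions will train the blender.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=7)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=7)

# Steps 1 and 3: train the two base models on the training data.
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
tree = DecisionTreeClassifier(random_state=7).fit(X_train, y_train)

# Steps 4-6: collect base-model probabilities into a 2D array,
# one row of [LR prediction, DT prediction] per customer.
def blend_features(X):
    return np.column_stack([
        log_reg.predict_proba(X)[:, 1],
        tree.predict_proba(X)[:, 1],
    ])

# Steps 2 and 7: a Random Forest blender learns from those predictions.
blender = RandomForestClassifier(random_state=7)
blender.fit(blend_features(X_val), y_val)

# Step 8: final predictions on unseen customers.
print("Blended test accuracy:", blender.score(blend_features(X_test), y_test))
```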
Conclusion
In this blog we have covered ensemble learning in a simple manner, walking through bagging, boosting, stacking, and blending with the help of examples.