Naive Bayes classifier and its implementation in Python

Suppose you are working in a retail store and you have to distinguish beauty products from household items. Or suppose you are working as an eye doctor and you have to tell a diabetic retina apart from a normal retina. Both examples fall under the category of classification.

“Science is the systematic classification of experience.”

Naive Bayes is a subset of Bayesian decision theory. It is called "naive" because the formulation makes some simplifying (naive) assumptions.

Naive Bayes is a classification algorithm used to solve classification problems. Here "naive" means that every feature in the dataset is assumed to be independent of every other feature: the algorithm works on the assumption that the value of one feature tells us nothing about the values of the others.

Naive Bayes is used in various applications, such as classifying tweets as positive, negative, or neutral. Other examples include medical image classification, document classification, spam filtering, and many more.

How a classification pipeline works

The first step in solving a classification problem is to understand the objective of the data and identify the features and the labels. Features are the characteristics on which the output depends.

For example, if you want to predict whether a patient has diabetes, we have to check certain factors such as body mass index, age, blood pressure, skin thickness, and, if the patient is a woman, the number of times she has been pregnant, among others.

After identifying the objective, we split the dataset into two parts: training and testing. We train our classifier on the training data, save the results, and then perform testing on the test data. After that, we evaluate the performance using various metrics such as accuracy, error, precision, and recall.

The Naive Bayes algorithm and how it works

The Naive Bayes classifier is a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
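In mathematical form, Bayes' theorem is written as:

P(A|B) = P(B|A) × P(A) / P(B)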

P(A|B) represents the posterior probability: the probability of ‘A’ being TRUE given that ‘B’ is TRUE.

P(B|A) represents the likelihood: the probability of ‘B’ being TRUE given that ‘A’ is TRUE.

P(A) is the prior probability, i.e., the probability of ‘A’ being TRUE.

P(B) is the predictor prior probability (the evidence), i.e., the probability of ‘B’ being TRUE.

Let us understand the Naive Bayes classifier with the help of an example: given some fruits and their characteristics, identify which type of fruit each one is based on those characteristics.

Algorithm

1- Calculate the prior probability for the given class labels.

2- Find the likelihood probability for each feature and then calculate the posterior probability.

3- After calculating all the probabilities, put them into the Bayes formula.

4- Repeat this for all the other classes, see which class has the highest probability, and assign that class to the input (a from-scratch sketch of these steps follows below).
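To make these steps concrete, here is a minimal from-scratch sketch in Python. The tiny fruit dataset and its counts are made up purely for illustration:

from collections import Counter, defaultdict

# Made-up training data: (features, class label); counts are purely illustrative
train = [
    ({"color": "yellow", "taste": "sweet"}, "mango"),
    ({"color": "yellow", "taste": "sweet"}, "mango"),
    ({"color": "yellow", "taste": "sour"}, "lemon"),
    ({"color": "green", "taste": "sour"}, "lemon"),
]

# Step 1: prior probabilities P(class)
labels = [label for _, label in train]
priors = {c: n / len(train) for c, n in Counter(labels).items()}

# Step 2: likelihoods P(feature = value | class) from frequency counts
counts = defaultdict(Counter)
for features, label in train:
    for feat, value in features.items():
        counts[(label, feat)][value] += 1

def likelihood(label, feat, value):
    total = sum(counts[(label, feat)].values())
    return counts[(label, feat)][value] / total

# Steps 3-4: posterior score (the numerator of the Bayes formula) per class, pick the max
def predict(features):
    scores = {}
    for c in priors:
        score = priors[c]
        for feat, value in features.items():
            score *= likelihood(c, feat, value)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict({"color": "yellow", "taste": "sour"}))  # -> lemon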

Applications of Naive Bayes Classifier

Customer churn prediction is one of the major applications of Naive Bayes. Many service-provider companies use this algorithm to find out which customers are likely to cancel their subscription plans, so that they can offer those customers a better deal or a discount before they churn.

We can also use Naive Bayes to predict whether a person has heart disease.

The process involves the following steps:

  1. We take the patient's medical history.
  2. Then we use the Naive Bayes algorithm to calculate the probability of each attribute.
  3. After that, we calculate the probability that the person has heart disease (yes) or does not (no).
  4. Then we tell the patient about the risk.

Using this report, the patient's family members can take proper care of that person.

More than 500 million tweets are generated every single day. Suppose we want to know the sentiment of people regarding who will win the US presidential election. For that purpose, we can perform sentiment analysis and make a forecast; the Naive Bayes algorithm is a pretty good fit for that.

The 3 types of Naive Bayes algorithms

Gaussian Naive Bayes

It is used for datasets where the features are continuous; to model the probabilities of those features we use a Gaussian distribution, which is also called the normal distribution.

The normal distribution can be represented by a bell-shaped curve, and its mathematical formula is:

f(x) = (1 / (σ√(2π))) · e^( −(x − μ)² / (2σ²) )

where μ is the mean and σ is the standard deviation.

Some examples where we can use Gaussian Naive Bayes are

1- Iris dataset

2- Text classification problems
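As a quick sketch of the first example, here is Gaussian Naive Bayes on the Iris dataset with scikit-learn, using default parameters throughout:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris features (sepal/petal measurements) are continuous, so GaussianNB is a natural fit
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GaussianNB()
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))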

Multinomial Naive Bayes

It is used for datasets where the features represent discrete counts, such as the number of occurrences or the frequency of events; for those datasets we use multinomial Naive Bayes.

For example, suppose we are given a text document and we want to know how often a certain word occurs in it; in that case we can use the multinomial Naive Bayes algorithm. It follows the multinomial distribution.

Some examples are

1- Word counts for text classification datasets

2- Document classification

3- Topic Modelling
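A minimal sketch of the first two examples, using scikit-learn's CountVectorizer to turn a few made-up documents into word counts for Multinomial Naive Bayes:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus with sentiment-style labels
docs = ["great product loved it", "terrible waste of money",
        "loved the quality", "terrible product"]
labels = ["positive", "negative", "positive", "negative"]

# CountVectorizer turns each document into a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

model = MultinomialNB()
model.fit(X, labels)
print(model.predict(vectorizer.transform(["loved this product"])))  # -> ['positive']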

Let us understand this mathematically with the help of an example.

In a survey of a population, we recorded people's heights.

Suppose we sampled 6 random people and grouped their heights as follows:

1: Less than 5 ft

4: Between 5 and 6 ft

1: Greater than 6 ft

What is the probability that a person picked at random from this sample is taller than 6 ft?

Answer: the sample proportions give the estimated category probabilities (1/6, 4/6, 1/6), which are exactly the parameters of a multinomial distribution over the three height categories. The probability of picking someone taller than 6 ft is therefore 1/6 ≈ 0.167, which is the final answer.
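To see the multinomial distribution formula itself in action, here is a small scipy check using the category probabilities estimated above:

from scipy.stats import multinomial

# Estimated category probabilities: <5 ft, 5-6 ft, >6 ft
p = [1/6, 4/6, 1/6]

# Probability that a single random person is taller than 6 ft
print(p[2])  # 1/6 ~= 0.167

# Probability of observing exactly the counts (1, 4, 1) in 6 draws,
# computed with the multinomial formula n!/(x1! x2! x3!) * p1^x1 * p2^x2 * p3^x3
print(multinomial.pmf([1, 4, 1], n=6, p=p))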

Bernoulli Naive Bayes

It is used for datasets where the features are binary (0 or 1). In those cases, we use Bernoulli Naive Bayes.

This algorithm follows the Bernoulli distribution, which describes a random variable that can take only two possible values, usually 0 and 1.

Some examples are

  1. When we toss a coin, the outcome is either heads or tails.
  2. In an exam, a student can either pass or fail.
  3. In the customer churn problem, a customer can either cancel the subscription or stick with it.
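As a minimal sketch of the churn example, here is Bernoulli Naive Bayes on made-up binary features (each column is a yes/no flag, and the data is invented purely for illustration):

import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up binary features: [long-term contract, contacted support, paid on time]
X = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 0]])
y = np.array([0, 1, 0, 1, 1])  # 1 = churned, 0 = stayed

model = BernoulliNB()
model.fit(X, y)
print(model.predict([[1, 0, 0]]))  # predicted churn label for a new customer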

Let us understand this mathematically with the help of an example.

Suppose you started a business; you can either succeed or fail at running it.

Let's say the probability of success in the business is p.

Then the probability of failure is q, which is nothing but 1 − p.

We represent 1 as success and 0 as failure.

Let us introduce a random variable X.

X= 1 [success]

X= 0 [failure]

Now assume that the random variable X has a Bernoulli distribution. Its probability mass function is:

P(X = x) = p^x · (1 − p)^(1 − x),  for x ∈ {0, 1}

If we put x = 1 in the above formula we get P(X = 1) = p, which is the probability of success; similarly, if we put x = 0 we get P(X = 0) = 1 − p, which is the probability of failure.
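A quick numerical check of this formula, assuming for example p = 0.7:

from scipy.stats import bernoulli

p = 0.7  # assumed probability of success
print(bernoulli.pmf(1, p))  # P(X=1) = p = 0.7
print(bernoulli.pmf(0, p))  # P(X=0) = 1 - p = 0.3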

Naive Bayes Classifier example

The example that we are going to take is a fruit classifier based on fruit characteristics.

In this problem we are given 3 classes of fruit: mango, lemon, and orange. And there are 3 characteristics a fruit can have: Yellow, Sweet, and Long.

The question is: if a fruit has the properties (Yellow, Sweet, Long), which type of fruit is it?

Let X denote the observed set of properties (Yellow, Sweet, Long). Now we find out which fruit this is:

In mathematical terms, we have to find P(Mango | X), P(Lemon | X), and P(Orange | X); by Bayes' theorem, each of these is proportional to P(X | class) × P(class).

Now let us apply Naive Bayes to solve the above problem:

Now calculate P(X | Mango) = 0.5 × 0.69 × 0 = 0

Similarly, we calculate the likelihoods for the other two classes and get:

P(X | Lemon) = 1 × 0.75 × 0.87 = 0.65

P(X | Orange) = 0.33 × 0.66 × 0.33 = 0.072

Finally, we look for the class with the highest score; in our case it is Lemon, so we assign the Lemon class to the given input.

Solving real-world problems using Naive Bayes classifier

The dataset that we are going to use is the Bank Marketing dataset. There are a total of 17 columns; some of them are age, job, marital status, education, balance, whether the customer has a housing loan, and so on.

We have to use machine learning algorithms, especially Naive Bayes, to find out which customers are going to continue our subscription service and which are going to cancel it. That way, we can give them a better offer to keep them subscribed.

So let’s start the coding.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

data= pd.read_csv('bank_marketing.csv')

data.head()

data.columns

data.shape

data.dtypes

data['job'].value_counts()

data['job'].value_counts().plot(kind='barh')

Line 1-5: The first step is to load all the required libraries.

Line 7: The second step is to load the bank marketing dataset using the pandas read_csv function.

Line 9-15: In this step, we check the first five rows of the dataset, the names of the columns, the shape of the dataset, and the data types of the columns.

Line 17-19: Now we use the value_counts() function to get a Series containing the counts of unique values, and then plot it using the Matplotlib library.

Now after that, we do value_counts() for all the other columns in the dataset, as shown in the snippet below.
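A compact way to inspect every categorical column at once (a small convenience sketch):

# Print value counts for every categorical (object-typed) column
for col in data.select_dtypes(include='object').columns:
    print(data[col].value_counts(), '\n')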

data.isnull().sum() # To check for missing values in the dataset


# Encode all the categorical columns as integer codes
for col_name in data.columns:
    if data[col_name].dtype == 'object':
        data[col_name] = data[col_name].astype('category')
        data[col_name] = data[col_name].cat.codes

# Define the features and the target variable in the dataset
features = data.loc[:, data.columns != 'y']
target = data['y']


# Split the dataset into training and test sets using the train_test_split function
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

The next step is to check for missing values in the dataset using isnull().sum().

In our dataset, there are no missing values.

Now we encode all the categorical variables as integer codes (0, 1, 2, …) based on the number of unique values in each column. Next, we define our input columns and the target column on which we have to make the prediction.

Then we split the dataset into training and testing sets: 20% of the data goes into testing and 80% into training, with random_state=42.

Model Definition

from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)

print("Number of mislabeled points out of a total %d points : %d"
...       % (X_test.shape[0], (y_test != y_pred).sum()))

Here we instantiate the Naive Bayes classifier, fit it on the training data, and make predictions on the test set. First we create an instance of the class GaussianNB, which will represent the Naive Bayes model.

This statement creates the variable gnb as an instance of GaussianNB. You can provide several optional parameters to GaussianNB:

  • priors: The prior probabilities of the classes. If None (the default), the priors are estimated from the training data.
  • var_smoothing: The portion of the largest variance of all features that is added to the variances for calculation stability. The default value is 1e-09.

This example uses the default values of all parameters.

Now we call the fit() method and predict on the test set. After that, we print the number of data points that are incorrectly classified.

Output: Number of mislabeled points out of a total of 905 points: 152.

#Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Now we use the sklearn metrics module to compute the accuracy score (i.e., how often our classifier is correct). Result: Accuracy: 0.8320441988950277
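Since the pipeline section also mentioned precision and recall, a quick way to inspect them for the same predictions is scikit-learn's classification_report:

# Precision, recall, and F1-score for each class
print(metrics.classification_report(y_test, y_pred))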

The pros and cons of using the Naive Bayes algorithm

Pros

  • It is very easy to implement and often obtains very good results.
  • It is computationally efficient.
  • It works well on large datasets.
  • It can also be used to predict multiple classes.
  • It can also be used in NLP text classification tasks.
  • It has low storage requirements.
  • Another advantage over logistic regression is that it can handle missing values in the data, since a missing feature can simply be left out of the probability calculation.

Cons

  • If the dataset is small, precision can drop noticeably.
  • When the goal is to predict probabilities rather than class labels, the method's probability estimates tend to be heavily biased (poorly calibrated).

Conclusion

In this tutorial, we have covered the following points about the Naive Bayes algorithm:

1- What is the Naive Bayes algorithm and how it works

2- Applications of Naive Bayes Classifier

3- Types of Naive Bayes algorithm

4- Naive Bayes Classifier example

5- Implementing a Naive Bayes classifier on a real-world problem using Scikit-learn

6- Pros and cons of using the Naive Bayes algorithm

If you want to learn about machine learning, deep learning, natural language processing, or computer vision, you can subscribe to my blog to get notified when I release a new post.

If you liked this tutorial, please like it and share it, and if you have any problem regarding the implementation or any topic, feel free to leave a comment.

You can follow us on our Instagram account for daily AI updates and the latest research on Artificial Intelligence.