What is Support Vector Machine- Learn to implement SVM in Python
In this tutorial we learn about Support Vector Machine, types of SVM, and its implementation in python from scratch.
Some other blog post that you may want to read is
In this machine learning series now we move onto the fifth algorithm of machine learning which is a support vector machine. The support vector machine solves the problem of classification as well as regression. It belongs to the category of the deterministic algorithm (given a particular input, will always produce the same output).
Let us understand the algorithm with the help of an example.
A few days ago, My niece was buying fruits from a fruitseller. My niece was very confused that which fruit should he buy because two fruits were looking similar fro him. After that, he asks me uncle which one is apple they all looking very similar to me. He is confused between the apple and green pears because of their shapes and tastes. The problem of classifying things can be solved by SVM.
SVM is mainly used to solve the classification problem. As from the above example, we clearly understand what SVM is. Let us represent the above figure in coordinate representations.
Let us understand the technical terms behind the SVM.
1- The red lines that separate the two-class is called as a hyperplane.
2- The nearest point( specifically orange points) to the lines are called support vectors.
Types of SVM
There are basically two types of SVM are there.
1- Linear SVM- It creates a line or a hyperplane which separates the data into classes. Here the dataset is linearly separable.
2- Non-linear SVM- It is used to classifying a non-linearly separable dataset.
What is linear SVM and how does it works
As we all know the linear SVM goal is used to separate the dataset into two classes by creating a hyperplane. Your goal is to create a line that classifies the data into two groups or classes.
So the workflow is as follows:
First, we have to draw all the possible lines which separate the two classes and then we have to pick those lines which have the highest margin.
So the question arises why the above lines are the best split why not this line.
This is because we choose only that line which has the widest margin that separates the two groups or classes.
In other words, SVM models try to maximize the distance between the two classes by creating a well-defined decision boundary.
What is Non-linear SVM and how to make the data separable
Non-linear SVM is also called Kernel SVM and it is used to map the dataset into the higher dimensional space.
The below graph reveals the non-linear separable dataset
So the question arises on how to make this thing separable.
There are two ways to solve this problem
1- Map this dataset into some higher dimensional feature space where the data points become separable.
2- And the second way is by adding the polynomial features and similarity features.
So when we add polynomial features to the dataset, and in some cases, it creates the linear separable dataset.
In the below figure the dataset is not separable but if we add x2= (x1 )^2, the data becomes linearly separable.
Next in this SVM Tutorial, we will see the implementation of SVM in Python.
Implementation of SVM in python from scratch
Steps that are involved in writing SVM code are
Step 1- We import all the required libraries
Step 2- Define our data that is the input data which is in the form of (X, Y, bias term). After that, we define our output labels which are in the form of -1 or 1.
Step 3- The third step is to plot these data on a 2d graph
Step 4- In this, we define a function name svm_plot to learn the separating hyperplane between both classes.
Step 5- After that, we call our svm_plot function and store the result in the w variable.
w = svm_plot(X,y)
Step 6- After that we plot the final result
Pros and Cons of using Support Vector Machine
SVM works very well with a clear margin of separation
It is effective in high dimensional spaces.
It is effective in cases where the number of dimensions is bigger than the number of samples.
It uses a subset of training points within the decision function (called support vectors), so it's also memory efficient
It doesn't perform well when we have a large data set because the specified training time is higher.
It also doesn't perform alright, when the data set has more noise i.e. target classes are overlapped
Speed and size requirements while training as well as in testing.
Real-life Applications of SVM are
1- Classification of Images: Like classifying between alcoholic beverages and non-alcoholic beverages.
2- Face detection: Identifying faces from the images by using a kernel function.
3- Documents Classification: Whether it is a sports news or tech news.
4- Detecting whether the email is spam or not.
5- Classification between Human handwriting and Computer alphabets.
6- Time series forecasting: Forecasting the price of the electricity bill in next month.
7- Customer churn prediction: whether the customer is going to cancel our membership plans in the future or not.
Wrap up the Session
In this SVM tutorial, we learn about what is SVM and how does the SVM works. After that, we learned about the types of SVM and then we implement the SVM algorithm using python from scratch. Then we talk about some of the benefits and drawbacks of using SVM. And at last, we learned about the application of SVM in real life.
If you like this SVM tutorial please like it and share it with your friends and colleagues And Subscribe to our newsletter to keep up to date.