In the previous blog post we learned about regression. In this blog post we will learn about classification, mainly logistic regression, building it from scratch and implementing it in Python.
Why do we need logistic regression?
We all know that linear regression is used to predict numerical values, but what happens if we have to predict categorical values like yes/no or spam/not spam? Linear regression isn’t suitable in such cases because it tries to fit a straight line, which can lead to invalid predictions (e.g., values less than 0 or greater than 1).

When a dataset contains a large number of outliers, linear regression can become unreliable, causing the best-fit line to deviate significantly. As a result, the predicted values may often fall outside the valid probability range (i.e., less than 0 or greater than 1).
To address this limitation, we use logistic regression: it maps predictions to a bounded range between 0 and 1 using the sigmoid function, making it suitable for classification problems.
Logistic Regression Overview
The name logistic regression sounds misleading: this statistical method is not used to model regression problems but to solve classification problems.
Logistic regression is named after the function used at its core, the logistic function. In linear regression, the result (dependent variable) is continuous; it can take any one of an infinite number of possible values.
In logistic regression, the result (dependent variable) has only a limited number of possible values. Logistic Regression is employed when the response variable is categorical in nature.
The logistic function, also called the sigmoid function, is an S-shaped curve that takes any real-valued number and maps it to a value between 0 and 1, but never exactly at those limits.

Logistic regression uses an equation as its representation, similar to linear regression. The central assumption of logistic regression is that your input space can be separated into two ‘regions’, one for each class, by a linear (read: straight) boundary. In other words, your data must be linearly separable in n dimensions.

In simple linear regression, the equation of the best-fit line is
y= βo + β1X
In logistic regression, the output is in the form of a probability (P). If we modelled it directly with the linear equation
P = βo + β1X
we would run into a problem: in linear regression the right-hand side can be greater than 1 or less than 0, but probabilities must lie in the range (0, 1). To deal with this issue we model the log of the odds instead:
log(P / (1 − P)) = βo + β1X
The term logistic refers to these log odds being the quantity that is modelled. The odds are defined as the ratio of the probability that an event occurs to the probability that it does not occur; for example, if P = 0.8, the odds are 0.8 / 0.2 = 4 and the log odds are log(4) ≈ 1.386.

Then the complete equation is

log(P / (1 − P)) = βo + β1X

To compute the probability we take the exponent on both sides:

P / (1 − P) = e^(βo + β1X)
P = e^(βo + β1X) / (1 + e^(βo + β1X)) = 1 / (1 + e^−(βo + β1X))

Let (βo + β1X) be x and let P be σ(x); then

σ(x) = 1 / (1 + e^−x)
- where σ(x) is the sigmoid function
- if x is a very large positive number, then σ(x) approaches 1
- if x is a very large negative number, then σ(x) approaches 0
There are many ways to map a real number into the (0, 1) range; logistic regression uses the sigmoid function to do this (a quick numeric check follows).
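As a quick illustration (separate from the main implementation later in this post), here is how the sigmoid squashes a few sample inputs into the (0, 1) range:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Every real input lands strictly between 0 and 1
print(sigmoid(np.array([-10, -1, 0, 1, 10])))
# approx. [4.54e-05  0.269  0.5  0.731  0.99995]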
The goal of learning is to estimate the parameter vector β̂ (beta hat) in order to make predictions.
Just as linear regression uses the least squares method to estimate its parameters, logistic regression uses the Maximum Likelihood method for parameter estimation.
So the question arises: how does maximum likelihood work?
Step 1 – Consider n samples with labels either 0 or 1.
Step 2 – For the samples labelled “1”: estimate β̂ such that p(X) is as close to 1 as possible.
Step 3 – For the samples labelled “0”: estimate β̂ such that 1 − p(X) is as close to 1 as possible (i.e., p(X) is close to 0).
Mathematically this is represented as

for the samples labelled 1: maximize ∏ p(x_i), and for the samples labelled 0: maximize ∏ (1 − p(x_i))

where i indexes the i-th sample.
On combining these, we want to find the beta parameters such that the product of both terms is as large as possible over all elements of the dataset:

L(β) = ∏ p(x_i)^(y_i) · (1 − p(x_i))^(1 − y_i)
The function that we need to optimize is called the likelihood function.
After that we take the log of the likelihood, which converts the product into a summation:

ℓ(β) = Σ [ y_i · log p(x_i) + (1 − y_i) · log(1 − p(x_i)) ]
Now we substitute the p(x) values and group the coefficients of y_i:

ℓ(β) = Σ [ y_i · log( p(x_i) / (1 − p(x_i)) ) + log(1 − p(x_i)) ]

After simplifying the equation (using log(p(x_i) / (1 − p(x_i))) = βo + β1·x_i), we get

ℓ(β) = Σ [ y_i · (βo + β1·x_i) − log(1 + e^(βo + β1·x_i)) ]
Let us recall that our goal is to find the Beta which maximizes the function.
The equations we get by setting the derivative of this log-likelihood to zero are transcendental equations, which cannot be solved exactly. That is why we use a numerical approximation method,
namely the Newton–Raphson method, to find the beta coefficients. A rough illustrative sketch of that update follows.
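For the curious, here is a minimal sketch of the Newton–Raphson (IRLS) update for logistic regression. The names X (an n × d feature matrix with a bias column), t (0/1 labels) and newton_raphson are introduced only for this illustration; the from-scratch code later in this post uses plain gradient descent instead.

import numpy as np

def newton_raphson(X, t, n_iter=10):
    # X: n x d feature matrix (bias column included), t: 0/1 labels
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))         # predicted probabilities
        W = np.diag(p * (1 - p))                # weights p(1 - p) on the diagonal
        grad = X.T @ (t - p)                    # gradient of the log-likelihood
        H = X.T @ W @ X                         # negative Hessian
        beta = beta + np.linalg.solve(H, grad)  # Newton step
    return beta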
Now let us move on to code.
Logistic Regression from scratch in Python
There are basically six steps to implementing logistic regression in Python.
Step 1- Import all the required libraries
import numpy as np
import matplotlib.pyplot as plt
import numpy.matlib  # provides repmat, used to tile the cluster centers
from sklearn.metrics import accuracy_score
Set random seed for reproducibility
np.random.seed(42)
Step 2- Create custom dataset aka training data
# Cluster centers for training data
c1 = [2, 3] # Cluster center for class 1
c2 = [10, 11] # Cluster center for class 2
no = 50 # Number of samples in each class
class1 = np.matlib.repmat(c1, no, 1) + np.random.randn(no, len(c1))
class2 = np.matlib.repmat(c2, no, 1) + np.random.randn(no, len(c2))
# Stack both classes and append a column of ones for the bias term
D = np.append(class1, class2, axis=0)
Data = np.concatenate((D, np.ones((2 * no, 1))), axis=1)
# Labels for training data
c1_label = np.ones((no, 1))
c2_label = -1 * np.ones((no, 1))
label = np.concatenate((c1_label, c2_label), axis=0)
# Transpose Data and labels for easier handling
Data = Data.T
y = label.T
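As an optional sanity check, after the transpose the data matrix should have features along the rows and samples along the columns:

print(Data.shape)  # (3, 100): two features plus a bias row, 100 samples
print(y.shape)     # (1, 100): one +1/-1 label per sample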
Step 3- Create validation data
# Cluster centers for validation data
v1 = [5, 4] # Cluster center for validation class 1
v2 = [7, 12] # Cluster center for validation class 2
v_no = 30 # Number of samples in each class for validation
v_class1 = np.matlib.repmat(v1, v_no, 1) + np.random.randn(v_no, len(v1))
v_class2 = np.matlib.repmat(v2, v_no, 1) + np.random.randn(v_no, len(v2))
v_D = np.append(v_class1, v_class2, axis=0)
v_Data = np.concatenate((v_D, np.ones((2 * v_no, 1))), axis=1)
v_c1_label = np.ones((v_no, 1))
v_c2_label = -1 * np.ones((v_no, 1))
v_label = np.concatenate((v_c1_label, v_c2_label), axis=0)
v_Data = v_Data.T
v_y = v_label.T # Validation labels
Step 4- Plot the training data and the validation data
# Plotting training and validation data in subplots
fig, axs = plt.subplots(1, 2, figsize=(12, 6))
# Training data plot
axs[0].plot(class1[:, 0], class1[:, 1], 'ro', class2[:, 0], class2[:, 1], 'bo')
axs[0].set_title('Training Data')
axs[0].set_xlabel('Feature 1')
axs[0].set_ylabel('Feature 2')
# Validation data plot
axs[1].plot(v_class1[:, 0], v_class1[:, 1], 'ro', v_class2[:, 0], v_class2[:, 1], 'bo')
axs[1].set_title('Validation Data')
axs[1].set_xlabel('Feature 1')
axs[1].set_ylabel('Feature 2')
# Show the plot
plt.tight_layout()
plt.show()

Step 5- Define the sigmoid function, which maps any value to a probability between 0 and 1, and the prediction function, which turns those probabilities into class labels.
# Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Prediction function: predict +1 if the probability exceeds 0.5, else -1
def prediction(w, Data):
    pred = []
    z = np.dot(w, Data)
    a = sigmoid(z)
    for i in range(len(a[0])):
        if a[0][i] > 0.5:
            pred.append(1)
        else:
            pred.append(-1)
    return pred
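Just to illustrate the interface (optional, not part of training), prediction takes a 1 x 3 weight row vector and the 3 x N data matrix and returns a plain Python list of +1/-1 labels; w_demo below is a throwaway name for an untrained weight vector:

w_demo = np.random.randn(1, 3)
demo_labels = prediction(w_demo, Data)
print(len(demo_labels))   # 100 predictions
print(set(demo_labels))   # a subset of {1, -1}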
Step 6- Training. In this step, the loss is computed and minimized with respect to the weights using gradient descent.
Now define the hyperparameters and initialize the weights
learning_rate = 0.01
w = np.random.randn(1, 3)  # one weight per feature plus a bias term
Start the training loop
# Training loop: gradient descent on the logistic loss for +/-1 labels,
# J(w) = sum_i log(1 + exp(-y_i * w.x_i))
for i in range(1, 1500):
    z = np.dot(w, Data)                  # linear scores w.x for every sample
    y_pred = prediction(w, Data)         # current +/-1 predictions
    val = -np.multiply(y, z)             # -y_i * w.x_i
    J = np.sum(np.log(1 + np.exp(val)))  # logistic loss
    num = -np.multiply(y, np.exp(val))
    den = 1 + np.exp(val)
    f = num / den                        # per-sample derivative of the loss w.r.t. the score
    gradJ = np.dot(Data, f.T)            # gradient of J w.r.t. the weights
    w = w - learning_rate * gradJ.T      # gradient descent step
    print(f"Epoch {i}, Loss {J}, Training Accuracy {accuracy_score(y[0], y_pred) * 100:.2f}%")
We are getting 100% accuracy on the training data.
Now let us make predictions on the validation (test) data and then plot the result.
# Test the model
Test_predict = prediction(w, v_Data)
print(f"Test Accuracy: {accuracy_score(v_y[0], Test_predict) * 100:.2f}%")
#Test Accuracy: 88.33%
Now plot the decision boundary
# Plot decision boundary
domain = np.linspace(0, 13, 100)
h_x = -(w[0, 0] / w[0, 1]) * domain - (w[0, 2] / w[0, 1])
plt.plot(v_class1[:, 0], v_class1[:, 1], 'ro', v_class2[:, 0], v_class2[:, 1], 'bo')
plt.plot(domain, h_x)
plt.show()
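To see where the plotted line comes from: the classifier predicts +1 when w0·x1 + w1·x2 + w2 > 0 and −1 otherwise, so the boundary between the two regions is the line w0·x1 + w1·x2 + w2 = 0. Solving for x2 gives x2 = −(w0 / w1)·x1 − w2 / w1, which is exactly the h_x expression used in the plot above.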

Wrapping Up the Session
You now know what logistic regression is and how you can implement it for classification in Python. Along the way you used open-source packages: NumPy to work with arrays, Matplotlib to visualize the results, and scikit-learn’s accuracy_score to evaluate the model.
You’ve come a long way in understanding one of the most important areas of machine learning! If you have questions or comments, then please put them in the comments section below.
If you like the article and would like to support me, make sure to: