Building a Recommender System Using Python

In this tutorial, we are going to build a location-based recommendation system using Python and machine learning.

Introduction

With the growing volume of online information, recommender systems have become an important aid to consumer choice. The web offers a practically countless number of products, so customers often get confused about which one to buy. This problem is known as the paradox of choice.

Challenge Accepted

Recommendation systems were built to solve this problem. Their main aim is to give customers a better experience by showing them the products they are most likely to want.

Recommendation systems do not benefit consumers only; they also benefit online stores such as Amazon, Flipkart, Alibaba and many more.

Let us take an example. Suppose you are going to buy an Apple laptop from Amazon. The Amazon recommender system will also suggest related products such as a hard disk, a laptop cover, a mouse and more. This is how a recommender system works.

Some years back, it was reported that Amazon's recommender system had increased its sales by more than 30%. Netflix is reported to have benefited in a similar way.

So the question that arises is:

How does the recommender system work

The basic idea is simple: recommendation lists are generated from user preferences, item features and the user's interactions with particular products. Additional information, such as temporal and spatial data, can also be used.

Temporal data is data tied to time. For example, a user's transaction history shows how frequently they buy products from us.

Spatial data is information about the physical location of an object. For example, the latitude and longitude of a restaurant tell us where it is, which is what a location-based recommender relies on.

Types of recommender system

Recommendation models fall mainly into three categories:

1- Collaborative filtering recommender system

A collaborative system predicts what you will like based on what other, similar users liked in the past.

For example, Jack is recommended a computer because one of his friends, who has similar tastes, bought a computer last month.

That is how a collaborative system works.
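To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. The ratings matrix and the helper names (cosine_similarity, predict_rating) are made up for illustration, and treating an unrated item as 0 in the similarity is a simplifying assumption. It is a toy example, not how any particular store does it.

```python
import numpy as np

# Hypothetical ratings matrix: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],   # user 0 (the target user, who has not rated item 2)
    [4, 5, 5, 1],   # user 1 (similar taste to user 0)
    [1, 1, 2, 5],   # user 2 (different taste)
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    weighted_sum, weight_total = 0.0, 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue  # skip the target user and users who have not rated the item
        sim = cosine_similarity(ratings[user], ratings[other])
        weighted_sum += sim * ratings[other, item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else 0.0

predicted = predict_rating(ratings, user=0, item=2)
print(round(predicted, 2))
```

The prediction for user 0 leans towards the rating given by user 1, because their past ratings are much more alike than those of user 0 and user 2.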

2- Content-based recommender system

Content-based systems predict what you will like based on what you liked in the past.
For example, Shane bought earphones last week, so a content-based system recommends that he buy a headset or AirPods.
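The earphones example can be sketched in a few lines. The item catalogue, the feature columns and the function recommend_similar are all hypothetical; a real system would derive the features from product descriptions or categories.

```python
import numpy as np

# Hypothetical catalogue: each item is described by binary features
# in the order [audio, wireless, computing, accessory].
items = {
    "earphones": np.array([1, 0, 0, 1], dtype=float),
    "headset":   np.array([1, 1, 0, 1], dtype=float),
    "AirPods":   np.array([1, 1, 0, 1], dtype=float),
    "laptop":    np.array([0, 0, 1, 0], dtype=float),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_similar(purchased, items, top_n=2):
    """Rank the items the user has not bought by similarity to the purchased item."""
    target = items[purchased]
    scores = {name: cosine_similarity(target, vec)
              for name, vec in items.items() if name != purchased}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

recommendations = recommend_similar("earphones", items)
print(recommendations)
```

Only the features of the items are used here; no other user's behaviour is involved, which is the main difference from the collaborative approach.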

One major issue with content-based systems is that they are limited to products from categories similar to those the user bought from in the past, even though the user may have other interests. The hybrid recommender system addresses this issue.

3- Hybrid recommender system

It is a combination of collaborative filtering and content-based filtering.
For example, Netflix uses a hybrid recommender system to recommend movies to its users.
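One simple way to combine the two approaches is a weighted blend of their scores. The sketch below is only an illustration of that idea: the function hybrid_scores, the weight alpha and the scores are hypothetical, and this is not a description of how Netflix actually does it.

```python
import numpy as np

def hybrid_scores(collab_scores, content_scores, alpha=0.5):
    """Blend two score vectors; alpha is the weight given to the collaborative scores."""
    collab = np.asarray(collab_scores, dtype=float)
    content = np.asarray(content_scores, dtype=float)
    return alpha * collab + (1 - alpha) * content

# Hypothetical scores for three items, already scaled to the range 0..1.
collab = [0.9, 0.2, 0.4]
content = [0.1, 0.8, 0.5]

blended = hybrid_scores(collab, content, alpha=0.7)
best_item = int(np.argmax(blended))  # index of the item with the highest blended score
print(blended, best_item)
```

Setting alpha closer to 1 trusts the collaborative scores more, while a value closer to 0 trusts the content-based scores more.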

Top 6 Applications of recommender systems

1- Product recommendation system, used by Amazon to recommend products.

2- Movie recommendation system, used by Netflix. It is reported that more than 80% of the movies watched on Netflix come from recommendations.

3- Video recommendation system, used by YouTube to recommend home-page videos.

4- App recommender system, used by Google Play and the App Store to recommend similar apps to the user.

5- Location-based recommendation system, used by an application or service to recommend, for example, the best hotels in the XYZ area or the best restaurants in Las Vegas.

6- Facebook recommender system, used to recommend people you might know.

How to make a location-based recommendation system using Python

In this example, we are going to make a location-based recommendation system with Python and machine learning.

You can download the dataset, the Yelp business dataset, directly from Kaggle.

The dataset consists of business information across 11 metropolitan areas in four countries. Its features include opening hours, latitude, longitude, reviews, stars, categories and the name of the business along with its id.

Keep in mind that this location-based recommendation system relies on four important external libraries:

1- Plotly- It is used for plotting interactive charts and graphs.
2- Geopandas- It is used for working with geospatial data.
3- Folium- It is used to visualize data on an interactive Leaflet map.
4- GDAL- It is used for reading and writing geospatial data.

!pip install git+https://github.com/geopandas/geopandas.git
!pip install folium
!pip install plotly_express
!apt install -y gdal-bin python3-gdal

Now let us start coding

Step 1- Load all the required libraries

import pandas as pd 
import numpy as np
import geopandas as gpd

import matplotlib.pyplot as plt
import seaborn as sns

import folium

import plotly 
import plotly.offline as py
import plotly.graph_objs as go
import plotly_express as px

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

Step 2- Load the dataset and display the features along with the shape.

df = pd.read_json('yelp_academic_dataset_business.json', lines=True)

df.head()

df.shape  #(192609, 14)

Step 3- Now it's time to get some insights from the data

We create a new Boolean column called Restaurants that flags every business whose categories include Restaurants, and then assign the matching rows to a new variable named df_restaurants. After that, we check the shape of the new variable.

# na=False treats businesses with no categories as non-restaurants
df['Restaurants'] = df['categories'].str.contains('Restaurants', na=False)
df.head(2)

df_restaurants = df.loc[df['Restaurants']]
df_restaurants.head()
df_restaurants.shape #(59371, 15)

Now we plot how many restaurants have each star rating, in the range 1 to 5.

fig, ax = plt.subplots(figsize=(12,10))
# pass the column as x=; recent seaborn versions no longer accept it positionally
sns.countplot(x=df_restaurants['stars'], ax=ax)
plt.title('Review Stars Countplot')
plt.savefig('stars.png')
plt.show()

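Now we sort the restaurants in descending order, then plot the top 20 restaurants with their respective ratings and save the figure in PNG format.

# top_restaurants is not defined in the original listing;
# sorting by review count and then by stars is an assumption
top_restaurants = df_restaurants.sort_values(by=['review_count', 'stars'], ascending=False).head(20)

fig, ax = plt.subplots(figsize=(12,10))
sns.barplot(x='stars', y='name', data=top_restaurants, ax=ax)
plt.savefig('top20_restaurants.png')
plt.show()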

Step 4- Now set the Mapbox access token.

To get an access token, create an account at Mapbox; the token is available from there.

After that, draw a scatter plot of the restaurant locations, i.e. latitude and longitude, using the plotly library.

Then let's plot the restaurants in the state of NV (Nevada), where Las Vegas is located.

px.set_mapbox_access_token("pk.eyJ1Ijoic2hha2Fzb20iLCJhIjoiY2plMWg1NGFpMXZ5NjJxbjhlM2ttN3AwbiJ9.RtMNPmreKiyBfHuElgYq_w")

#configure_plotly_browser_state()
px.scatter_mapbox(df_restaurants, lat="latitude", lon="longitude", color="stars", size='review_count' ,
                   size_max=30, zoom=3, width=1200, height=800)
                   
# .copy() lets us add a column to lasVegas later without a SettingWithCopyWarning
lasVegas = df_restaurants[df_restaurants.state == 'NV'].copy()
px.scatter_mapbox(lasVegas, lat="latitude", lon="longitude", color="stars", size='review_count' ,
                   size_max=15, zoom=10, width=1200, height=800)

Step 5- Now let us use the well-known K-Means clustering algorithm to get our expected output, which is the recommendation of restaurants in a particular place.

The first step in applying K-Means clustering is to determine the optimal number of clusters.

There are two methods to find the optimal number of clusters:

1- Elbow method– This method relies on inertia, which is the sum of squared distances between each instance and its closest centroid. Using the sklearn package, we run the K-Means algorithm for a range of values of k and record the inertia for each one. Because inertia keeps decreasing as k grows, we do not simply pick the k with the lowest inertia; instead we choose the k at the "elbow" of the curve, where adding more clusters stops reducing the inertia by much.

2- Silhouette Score– We use the silhouette coefficient to find the optimal number of clusters. For a single instance, the silhouette coefficient is

Silhouette Coefficient = (x - y) / max(x, y)

where y is the mean intra-cluster distance, i.e. the mean distance to the other instances in the same cluster, and x is the mean nearest-cluster distance, i.e. the mean distance to the instances of the next closest cluster. The coefficient ranges from -1 to 1, and a higher value means the instance sits more clearly inside its own cluster.
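To see the formula at work, here is a small sketch that computes the silhouette coefficient of one instance by hand. The function silhouette_coefficient and the four sample points are hypothetical, and the sketch assumes every cluster has at least two members. sklearn's silhouette_score reports the mean of this quantity over all instances.

```python
import numpy as np

def silhouette_coefficient(points, labels, i):
    """Silhouette coefficient (x - y) / max(x, y) for the instance at index i."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    distances = np.linalg.norm(points - points[i], axis=1)

    # y: mean distance to the other members of the instance's own cluster
    same = labels == labels[i]
    same[i] = False
    y = distances[same].mean()

    # x: smallest mean distance to the members of any other cluster
    x = min(distances[labels == other].mean()
            for other in np.unique(labels) if other != labels[i])
    return (x - y) / max(x, y)

# Two tight, well separated hypothetical clusters.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
labels = [0, 0, 1, 1]

score = silhouette_coefficient(points, labels, i=0)
print(round(score, 3))
```

Because the two clusters are far apart compared with their size, the coefficient comes out close to 1.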

# Using elbow method to find optimal no of cluster

coords = lasVegas[['longitude','latitude']]
distortions = []
K = range(1,25)

for k in K:
    kmeansModel = KMeans(n_clusters=k)
    kmeansModel = kmeansModel.fit(coords)
    distortions.append(kmeansModel.inertia_)
    
# And then plot them  
fig, ax = plt.subplots(figsize=(12, 8))
plt.plot(K, distortions, marker='o')
plt.xlabel('k')
plt.ylabel('Distortions')
plt.title('Elbow Method For Optimal k')
plt.savefig('elbow.png')
plt.show()

In the resulting plot we can see that, by the elbow method, the optimal number of clusters is 5.

Now let us find the optimal number of clusters using the silhouette method.

from sklearn.metrics import silhouette_score

sil = []
kmax = 50

# dissimilarity would not be defined for a single cluster, thus, minimum number of clusters should be 2
for k in range(2, kmax+1):
  kmeans = KMeans(n_clusters = k).fit(coords)
  labels = kmeans.labels_
  sil.append(silhouette_score(coords, labels, metric = 'euclidean'))
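The loop above only collects the scores; the preferred k is the one with the highest mean silhouette score. Here is a small self-contained sketch of that last step. It uses made-up points (coords_demo) in place of the real coords, and the names k_values, sil_demo and best_k are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for `coords`: three blobs of (longitude, latitude) points.
rng = np.random.default_rng(0)
centers = np.array([[-115.2, 36.1], [-115.0, 36.3], [-115.4, 36.3]])
coords_demo = np.vstack([c + rng.normal(scale=0.01, size=(50, 2)) for c in centers])

k_values = list(range(2, 7))
sil_demo = []
for k in k_values:
    # fixed random_state so that repeated runs give the same clustering
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords_demo)
    sil_demo.append(silhouette_score(coords_demo, labels, metric='euclidean'))

# The k with the highest mean silhouette score is the preferred number of clusters.
best_k = k_values[int(np.argmax(sil_demo))]
print("best k:", best_k)
```

With the real data the same idea applies to the sil list: the score at position 0 belongs to k = 2, so the best k is the position of the maximum plus 2.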

After that, we apply the k-means algorithm with the chosen number of clusters and predict the cluster of each restaurant location.

Then we plot the predicted clusters on a scatter map box.

kmeans = KMeans(n_clusters=5, init='k-means++')
kmeans.fit(coords)
y = kmeans.labels_
print("k = 5", " silhouette_score ", silhouette_score(coords, y, metric='euclidean'))

lasVegas['cluster']=kmeans.predict(lasVegas[['longitude','latitude']])
lasVegas.head()

px.scatter_mapbox(lasVegas, lat="latitude", lon="longitude", color="cluster", size='review_count', 
                  hover_data= ['name', 'latitude', 'longitude'], zoom=10, width=1200, height=800)
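With every restaurant assigned to a cluster, a recommendation for a particular place can be sketched as follows: predict the cluster for the user's location and return the best restaurants in that cluster. The function recommend_restaurants, the tiny demo table and the ranking by stars and then review_count are assumptions made for illustration, not part of the dataset or of sklearn.

```python
import pandas as pd
from sklearn.cluster import KMeans

def recommend_restaurants(df, model, longitude, latitude, top_n=5):
    """Return the best-rated restaurants in the cluster nearest to a location."""
    # Column order must match the order used when fitting: longitude, then latitude.
    query = pd.DataFrame({'longitude': [longitude], 'latitude': [latitude]})
    cluster = model.predict(query)[0]
    candidates = df[df['cluster'] == cluster]
    ranked = candidates.sort_values(['stars', 'review_count'], ascending=False)
    return ranked[['name', 'stars', 'review_count']].head(top_n)

# Hypothetical miniature dataset with the same columns as `lasVegas`.
demo = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'longitude': [-115.30, -115.31, -115.29, -115.10, -115.11, -115.09],
    'latitude': [36.10, 36.11, 36.09, 36.20, 36.21, 36.19],
    'stars': [4.5, 3.0, 5.0, 4.0, 2.5, 4.5],
    'review_count': [120, 40, 15, 300, 10, 80],
})
demo_model = KMeans(n_clusters=2, n_init=10, random_state=0)
demo['cluster'] = demo_model.fit_predict(demo[['longitude', 'latitude']])

result = recommend_restaurants(demo, demo_model, longitude=-115.30, latitude=36.10, top_n=2)
print(result)
```

With the real data you would pass lasVegas and the fitted kmeans model instead of the demo objects, for example recommend_restaurants(lasVegas, kmeans, longitude, latitude) with the user's own coordinates.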

Wrap up the Session

In this tutorial, we learned how to make a recommendation system using Python. Now your task is to take this code as a reference and make a Facebook-style recommendation system in Python that recommends friends.

If you like this tutorial please like it and subscribe to our newsletter or you may follow our Facebook page to get a notification when I release any other blog post.
