
Scrape any website in just 7 lines of code


Have you ever tried to scrape content from a website, only to find that pulling even a few pieces of content meant writing lengthy scripts and doing a complex setup? 😩 Well, not anymore! In this blog you will learn how to scrape any website in just 7 lines of code: quick, simple, and efficient. Say goodbye to complexity and hello to effortless web scraping! 💻⚡

FireCrawl the Savior

FireCrawl is a web scraping service with a Python library (firecrawl-py) that lets you scrape the content of a website in a simple and efficient manner. You can also pass the scraped data to an LLM, such as the OpenAI models behind ChatGPT, and ask questions about it.

Get the FireCrawl API Key

The first step is to get your FireCrawl API key from https://www.firecrawl.dev/app/api-keys

Copy the default API key and save it somewhere safe.

Implementation in Python

☑ The first step is to install the FireCrawl library using pip.

pip install -q firecrawl-py

☑ The second step is to import two libraries:

  • getpass: a standard-library module that lets us enter the API key without echoing it on screen
  • firecrawl: the library that provides FirecrawlApp, the client we use to talk to FireCrawl
import getpass
from firecrawl import FirecrawlApp

☑ The third step is to enter the API key securely by running the line of code below

api_key = getpass.getpass("Enter your Firecrawl API Key: ")

Enter your Firecrawl API Key: ··········
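If you run the script outside a notebook, typing the key every time gets tedious. Here is a minimal sketch, assuming you have stored the key in an environment variable (called FIRECRAWL_API_KEY here; load_api_key is a hypothetical helper name for this example), that falls back to the getpass prompt when the variable is not set:

```python
import getpass
import os


def load_api_key(env_var="FIRECRAWL_API_KEY"):
    """Return the API key from the environment, or prompt for it."""
    # env_var is the variable name assumed for this example.
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to a hidden prompt so the key is never echoed on screen.
        key = getpass.getpass("Enter your Firecrawl API Key: ").strip()
    return key
```

You would then write api_key = load_api_key() in place of the direct getpass call.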

☑ The fourth step is to initialize the Firecrawl client by passing the API key

# Initialize Firecrawl client
firecrawl = FirecrawlApp(api_key=api_key)

☑ The fifth step is to define the URL that you want to scrape. If you have multiple URLs, you can keep them in a list and process them one at a time.

# Define the URL to scrape
url = 'https://timesofindia.indiatimes.com/india/fire-at-delhi-hc-judges-house-leads-to-recovery-of-cash-pile/articleshow/119272174.cms'
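When you keep several URLs in a list, it helps to weed out malformed or duplicate entries before spending API credits on them. Here is a minimal sketch using only the standard library. clean_urls is a hypothetical helper name, and the example.com address is just a placeholder for illustration:

```python
from urllib.parse import urlparse


def clean_urls(urls):
    """Keep only well-formed http(s) URLs, dropping duplicates in order."""
    seen = set()
    valid = []
    for raw in urls:
        url = raw.strip()
        parts = urlparse(url)
        # A usable URL needs an http(s) scheme and a host name.
        if parts.scheme in ("http", "https") and parts.netloc and url not in seen:
            seen.add(url)
            valid.append(url)
    return valid


urls = clean_urls([
    "https://timesofindia.indiatimes.com/india/fire-at-delhi-hc-judges-house-leads-to-recovery-of-cash-pile/articleshow/119272174.cms",
    "https://example.com/page",   # placeholder URL for illustration
    "https://example.com/page",   # duplicate, will be dropped
    "not-a-url",                  # malformed, will be dropped
])
```

You can then loop over urls and scrape each entry in turn.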

☑ The sixth step is to define the scrape options, for example: scrape only the main content and return the results in markdown format.

scrape_options = {
    'scrapeOptions': {'formats': ['markdown'], 'onlyMainContent': True},
}
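If you find yourself tweaking these options often, a small builder function keeps them in one place. The sketch below is only an illustration: build_crawl_params is a hypothetical helper, and the 'limit' key (a cap on the number of pages) is assumed to be accepted alongside 'scrapeOptions' by the version of firecrawl-py you have installed, so check the FireCrawl documentation before relying on it:

```python
def build_crawl_params(formats=("markdown",), only_main_content=True, limit=None):
    """Build the options dictionary in the same shape used in this post."""
    params = {
        "scrapeOptions": {
            "formats": list(formats),
            "onlyMainContent": only_main_content,
        }
    }
    if limit is not None:
        # Assumption: 'limit' caps the number of pages fetched.
        params["limit"] = limit
    return params


scrape_options = build_crawl_params(limit=1)
```

Called with no arguments, build_crawl_params() returns exactly the dictionary defined in step six.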

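After that, perform the scrape. Note that crawl_url starts from the given URL and can follow links to further pages, so the result holds a list of pages under the 'data' key.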

# Perform the scrape
result = firecrawl.crawl_url(url, scrape_options)
result
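A network call can fail for many reasons (a bad key, an exhausted quota, a timeout), so you may want to wrap it. Here is a minimal sketch, assuming only that the client has a crawl_url(url, params) method as used above. crawl_safely is a hypothetical helper, and FailingClient is a made-up stand-in so that the example runs without touching the network:

```python
def crawl_safely(client, url, params):
    """Run client.crawl_url(url, params); return None instead of raising."""
    try:
        return client.crawl_url(url, params)
    except Exception as exc:  # broad on purpose: network, auth, and quota errors
        print(f"Crawl failed for {url}: {exc}")
        return None


class FailingClient:
    """Hypothetical stand-in for the Firecrawl client, used to demo the failure path."""

    def crawl_url(self, url, params):
        raise RuntimeError("simulated network error")


outcome = crawl_safely(FailingClient(), "https://example.com", {})
```

With the real client you would call crawl_safely(firecrawl, url, scrape_options) and check whether the result is None before reading it.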

☑ The seventh step is to print only the markdown content

for page_data in result['data']:
    markdown_content = page_data.get('markdown', '')
    print(markdown_content)
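Printing is fine for a quick look, but you will often want the markdown as a single string that you can save or pass on. Here is a minimal sketch, assuming the result is a dictionary with a 'data' list as in the loop above. collect_markdown is a hypothetical helper, and sample_result is made-up data standing in for a real response:

```python
def collect_markdown(result):
    """Return the non-empty markdown strings from a crawl result."""
    pages = result.get("data", [])
    return [page["markdown"] for page in pages if page.get("markdown")]


# Made-up data in the same shape as the result used above.
sample_result = {
    "data": [
        {"markdown": "# First page\n\nSome text."},
        {"markdown": ""},                 # empty page, skipped
        {"metadata": {"title": "n/a"}},   # no markdown key, skipped
        {"markdown": "# Second page"},
    ]
}

chunks = collect_markdown(sample_result)
combined = "\n\n---\n\n".join(chunks)
```

From there, writing combined to a .md file is a single open(...).write(...) call.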

Conclusion

Web scraping doesn’t have to be hard or time-consuming. With just 7 lines of code, you can quickly grab data from any website. Try it out, and you’ll see how easy and fun scraping can be!
