Methods of Web Scraping

  1. Using software
  2. Writing code

How to build a basic web scraper using the Python Beautiful Soup module

  1. Send an HTTP request to the URL of the web page you want to access; the server responds with the page's HTML source.
  2. Once you have the HTML content, parse it.
  3. Finally, navigate and search the parse tree that the parser created. We will use Beautiful Soup for this task.

Steps:

  1. Install the required third-party libraries

pip install requests
pip install lxml
pip install bs4

  2. Get the HTML content from the web page

source = requests.get('website link').text
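A slightly more defensive version of this request can be sketched as follows. The helper name `fetch_html` and the 10-second default timeout are illustrative assumptions, not part of the original project; `requests.get`, its `timeout` argument, and `raise_for_status()` are standard parts of the requests library.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML text of `url`, raising on network or HTTP errors.

    `fetch_html` and the 10-second default are hypothetical choices
    made for this sketch.
    """
    # Without a timeout, requests can wait indefinitely on a stalled server.
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx responses into a requests.HTTPError instead of
    # quietly handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('website link')` with a real URL in place of the placeholder would then return the same string as `requests.get('website link').text`, provided the request succeeds.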

  3. Parse the HTML content

soup = BeautifulSoup(source, 'lxml')

  4. Print the HTML content

print(soup.prettify())
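To see what parsing and `prettify()` do without touching the network, it may help to run them on a small inline string. The markup below is invented for illustration, and the sketch assumes Python's built-in `html.parser` so that it runs even where lxml is not installed; passing `'lxml'` instead should behave the same for this input.

```python
from bs4 import BeautifulSoup

# Invented sample markup standing in for a downloaded page.
sample_html = (
    "<html><body><article><div><h3>Hello</h3></div></article></body></html>"
)

# 'html.parser' ships with Python; 'lxml' must be installed separately.
soup = BeautifulSoup(sample_html, "html.parser")

# prettify() returns the parse tree as an indented string,
# with each tag and each piece of text on its own line.
pretty = soup.prettify()
print(pretty)
```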

  5. Navigate and search the parse tree

article = soup.find('article')
headline = article.div.h3.text
print(headline)
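As a rough illustration of how this navigation behaves, the sketch below runs the same calls on an invented two-article snippet (the markup and variable names are assumptions). `find()` returns only the first match, while `find_all()` returns every match, which is useful when a page lists several articles.

```python
from bs4 import BeautifulSoup

# Invented markup with two articles, mirroring the structure used above.
sample_html = """
<article>
  <div><h3>First post</h3></div>
</article>
<article>
  <div><h3>Second post</h3></div>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first match (or None); find_all() returns every match.
first_article = soup.find("article")
all_articles = soup.find_all("article")

# Dot access finds the first <div> inside the article,
# then the first <h3> inside that <div>.
headline = first_article.div.h3.text
headlines = [a.div.h3.text for a in all_articles]

print(headline)
print(headlines)
```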

Next, let's grab the text of the official website link, which sits inside the article's entry-content div

official_website = article.find('div', class_='entry-content').a.text
print(official_website)
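Note that `.a.text` yields the visible link text, not the address the link points to. A minimal sketch of reading both, assuming invented markup and a placeholder URL, with a guard for elements that are missing:

```python
from bs4 import BeautifulSoup

# Invented markup; example.com is a placeholder address.
sample_html = """
<article>
  <div class="entry-content">
    <p>Visit the <a href="https://example.com">official site</a>.</p>
  </div>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")
article = soup.find("article")

# class_ has a trailing underscore because `class` is a Python keyword.
content = article.find("div", class_="entry-content")

# find() and dot access return None when nothing matches,
# so check before reading attributes.
link = content.a if content is not None else None

# .text is the visible link text; the href attribute holds the address.
link_text = link.text if link is not None else None
link_url = link.get("href") if link is not None else None

print(link_text)
print(link_url)
```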

Putting it all together, the complete script looks like this:

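import requests
from bs4 import BeautifulSoup
# lxml only needs to be installed; BeautifulSoup loads it by name.

# Download the page and parse it into a tree
source = requests.get('website link').text
soup = BeautifulSoup(source, 'lxml')

# Headline of the first article on the page
article = soup.find('article')
headline = article.div.h3.text
print(headline)

# Text of the first link inside the article's entry-content div
official_website = article.find('div', class_='entry-content').a.text
print(official_website)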
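One possible refinement is to separate extraction from downloading, so the parsing logic can be tried on a saved or inline page. The function name `extract_article`, the sample markup, and the choice to return `None` for missing pieces are all assumptions made for this sketch; the tag layout mirrors the script above.

```python
from bs4 import BeautifulSoup


def extract_article(html, parser="html.parser"):
    """Return (headline, link_text) for the first <article> in `html`.

    Hypothetical helper: either value is None when the matching
    element is not present.
    """
    soup = BeautifulSoup(html, parser)
    article = soup.find("article")
    if article is None:
        return None, None

    h3 = article.find("h3")
    content = article.find("div", class_="entry-content")
    link = content.find("a") if content is not None else None

    # get_text(strip=True) trims surrounding whitespace from the text.
    headline = h3.get_text(strip=True) if h3 is not None else None
    link_text = link.get_text(strip=True) if link is not None else None
    return headline, link_text


# Invented markup; example.com is a placeholder address.
sample_html = """
<article>
  <div><h3> Sample headline </h3></div>
  <div class="entry-content"><a href="https://example.com">Official site</a></div>
</article>
"""

print(extract_article(sample_html))
print(extract_article("<p>no article here</p>"))
```

Because the function takes a string, the downloaded `source` from the script can be passed in directly, with `parser='lxml'` if that parser is installed.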
