Web Scraping

Methods of Web Scraping

Using software
Writing code

How to build a basic web scraper using the Python Beautiful Soup module.

To get the HTML source code of a web page, send an HTTP request to that page's URL.
After retrieving the HTML content, parse it into a tree structure.
The last task is navigating and searching the parse tree that the parser created. For this task, we will use Beautiful Soup.

Steps:
Install the required third-party libraries

pip install requests
pip install lxml
pip install bs4

Get the HTML content from the web page

source = requests.get('website link').text
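The one-line request above assumes the download succeeds. A minimal sketch of a more defensive fetch might look like the following. The helper name `fetch_html`, the 10-second default timeout, and the optional `session` parameter are illustrative assumptions, not part of the original project.

```python
import requests


def fetch_html(url, timeout=10.0, session=None):
    """Return the body of `url` as text, raising on HTTP errors.

    `fetch_html`, `timeout`, and `session` are hypothetical names
    chosen for this sketch.
    """
    # Use the caller's session if one is given; otherwise call requests.get directly.
    client = session if session is not None else requests
    response = client.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Passing a `requests.Session()` as `session` would reuse one connection across several pages; leaving it out behaves like the plain `requests.get` call shown in this step.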

Parse the HTML content

soup = BeautifulSoup(source, 'lxml')
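The second argument to `BeautifulSoup` names the parser. As a rough illustration, the sketch below swaps in Python's built-in `'html.parser'`, which needs no extra install, and runs on a made-up HTML string rather than a real page. The sample markup is an assumption for demonstration only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
sample = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# 'html.parser' ships with Python; 'lxml' (used in this write-up) is installed separately.
soup = BeautifulSoup(sample, "html.parser")

title = soup.title.string      # text inside the <title> tag
paragraph = soup.p.get_text()  # text inside the first <p> tag
print(title)
print(paragraph)
```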

Print the HTML content

print(soup.prettify())

Navigating and searching the parse tree

article = soup.find('article')
headline = article.div.h3.text
print(headline)
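The chained lookup above raises an `AttributeError` if the page has no `article`, `div`, or `h3`, because `find` and the attribute shortcuts return `None` when nothing matches. One possible way to guard against that is sketched here, using a hypothetical helper `get_headline` and invented sample markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the article > div > h3 shape the step above expects.
sample = """
<article>
  <div><h3>First headline</h3></div>
</article>
"""


def get_headline(soup):
    """Return the first article's h3 text, or None if any element is missing."""
    article = soup.find("article")
    if article is None:
        return None
    # article.find("div") is the explicit form of article.div.
    div = article.find("div")
    heading = div.find("h3") if div is not None else None
    return heading.get_text(strip=True) if heading is not None else None


headline = get_headline(BeautifulSoup(sample, "html.parser"))
missing = get_headline(BeautifulSoup("<p>no article here</p>", "html.parser"))
print(headline)
print(missing)
```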

Next, let’s grab the text of the official website link

officialWebsite = article.find('div', class_='entry-content').a.text
print(officialWebsite)
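Note that `.a.text` returns the visible link label, not the address it points to. If the URL itself is wanted, it can be read from the tag's `href` attribute. The sketch below shows both on invented sample markup; the `entry-content` class name comes from the step above, everything else is assumed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the 'entry-content' class comes from the original step.
sample = """
<article>
  <div class="entry-content"><p>Visit <a href="https://example.com">Example site</a></p></div>
</article>
"""

soup = BeautifulSoup(sample, "html.parser")
# .a finds the first <a> anywhere inside the matched div.
link = soup.find("div", class_="entry-content").a

link_text = link.text        # the visible label of the link
link_url = link.get("href")  # the target URL, or None if the attribute is absent
print(link_text)
print(link_url)
```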

So the complete code looks like this:

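import requests
from bs4 import BeautifulSoup
import lxml  # not used directly; importing it fails fast if the lxml parser is missing

# Download the page and parse it with the lxml parser.
source = requests.get('website link').text
soup = BeautifulSoup(source, 'lxml')

# Headline of the first article, then the text of the link in its entry-content div.
article = soup.find('article')
headline = article.div.h3.text
print(headline)
officialWebsite = article.find('div', class_='entry-content').a.text
print(officialWebsite)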
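The code above reads only the first `article` on the page. As one possible extension, `find_all` returns every match, so the same lookups can run in a loop. This is a sketch on invented sample markup; the function name `scrape_articles` and the dictionary keys are assumptions, not part of the original project.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two articles; the second has no link.
sample = """
<article><div><h3>Post one</h3></div>
  <div class="entry-content"><a href="https://example.com/1">Site one</a></div></article>
<article><div><h3>Post two</h3></div>
  <div class="entry-content"><p>No link here</p></div></article>
"""


def scrape_articles(html):
    """Return one dict per <article>, with None for any missing piece."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for article in soup.find_all("article"):
        heading = article.find("h3")
        content = article.find("div", class_="entry-content")
        link = content.find("a") if content is not None else None
        results.append({
            "headline": heading.get_text(strip=True) if heading is not None else None,
            "website": link.get_text(strip=True) if link is not None else None,
        })
    return results


results = scrape_articles(sample)
for row in results:
    print(row)
```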

Built With

Python

Updates

Alexy P Thomas started this project — Jan 16, 2021 03:21 AM EST
