Methods of Web Scraping
- Using software
- Writing code
How to build a basic web scraper using the Python Beautiful Soup module
- To get the HTML source code of a web page, send an HTTP request to the URL of the page you want to access.
- After accessing the HTML content, parse the data.
- The last task is navigating and searching the parse tree that the parser created. For this task, we will be using Beautiful Soup.
Steps:
- Install the required third-party libraries
pip install requests
pip install lxml
pip install bs4
- Get the HTML content from the web page
source = requests.get('website link').text
- Parsing the HTML content
soup = BeautifulSoup(source, 'lxml')
- Print the HTML content
print(soup.prettify())
- Navigating and searching the parse tree
article = soup.find('article')
headline = article.div.h3.text
print(headline)
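The chained lookup `article.div.h3.text` raises an `AttributeError` as soon as one of the tags is missing, because `find` and attribute-style navigation return `None` for absent tags. Here is a minimal sketch of a more defensive version. The markup in `SAMPLE_HTML` is made up for illustration, `first_headline` is a hypothetical helper name, and the standard-library `html.parser` backend is used so the example runs without lxml. Note that `article.find("h3")` searches all descendants of the article, which is slightly looser than `article.div.h3`.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; a real page will be structured differently.
SAMPLE_HTML = """
<article>
  <div><h3>First post</h3></div>
  <div class="entry-content"><a href="https://example.com/">Example site</a></div>
</article>
"""


def first_headline(html):
    """Return the text of the first <article>'s <h3>, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")
    if article is None:
        return None
    h3 = article.find("h3")  # searches all descendants of the article
    if h3 is None:
        return None
    return h3.get_text(strip=True)


print(first_headline(SAMPLE_HTML))               # First post
print(first_headline("<p>no article here</p>"))  # None
```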
Next, let’s grab the text of the official website link:
officialWebsite = article.find('div', class_='entry-content').a.text
print(officialWebsite)
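Note that `.a.text` returns the visible text of the link, not the address it points to. If the URL itself is wanted, it lives in the tag's `href` attribute. The sketch below shows both, again on made-up markup; `SAMPLE_HTML` and the helper name `first_link` are assumptions for illustration, not part of the original scraper.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
SAMPLE_HTML = """
<article>
  <div class="entry-content">
    <a href="https://example.com/">Example site</a>
  </div>
</article>
"""


def first_link(html):
    """Return (link_text, href) for the first link in div.entry-content, or None."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", class_="entry-content")
    if content is None or content.a is None:
        return None
    link = content.a
    # .get() returns None instead of raising KeyError when href is missing.
    return link.get_text(strip=True), link.get("href")


print(first_link(SAMPLE_HTML))  # ('Example site', 'https://example.com/')
```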
So the complete code looks like this:
import requests
from bs4 import BeautifulSoup
import lxml

source = requests.get('website link').text
soup = BeautifulSoup(source, 'lxml')

article = soup.find('article')
headline = article.div.h3.text
print(headline)

officialWebsite = article.find('div', class_='entry-content').a.text
print(officialWebsite)
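The script above reads only the first article and assumes the request succeeds. As a hedged sketch of one way to extend it, the version below adds a timeout and an HTTP status check to the request, and loops over every `<article>` with `find_all`. The function names `fetch_html` and `extract_headlines`, the 10-second timeout, and the `demo` markup are illustrative assumptions; `html.parser` is used so the demonstration runs without lxml.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def extract_headlines(html):
    """Return the <h3> text of every <article> that has one."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for article in soup.find_all("article"):
        h3 = article.find("h3")
        if h3 is not None:
            headlines.append(h3.get_text(strip=True))
    return headlines


# Offline demonstration on made-up markup, so no network access is needed.
demo = (
    "<article><h3>One</h3></article>"
    "<article><p>no title</p></article>"
    "<article><h3>Two</h3></article>"
)
print(extract_headlines(demo))  # ['One', 'Two']
```

Against a live page, the two functions would be combined as `extract_headlines(fetch_html('website link'))`, with the placeholder replaced by a real URL.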