I came across Python libraries such as BeautifulSoup and Selenium and was inspired to scrape a website. So, I first targeted Amazon ;-)
What it does
It simply sends a request to a website and scrapes the data we want. Here I've used the website title. The page is nothing but the OnePlus smartphone reviews page.
How I built it
First of all, I used the Python BeautifulSoup library together with requests, imported like this:
from bs4 import BeautifulSoup as bs
import requests
After that:
-Requested the page from the link.
-Got the page content to see the document of the website.
-Declared a variable to hold the parsed document, for which I used:
soup = bs(amazon_html.content,'html.parser')
-Prettified the document and printed it to inspect it again.
-From the prettified output, picked out all the class names that were required.
-Declared a list and stored in it only the text I wanted, such as the title of each review.
-Did this for all the required fields.
-Imported the pandas library and created a data frame in a variable:
import pandas as pd
df = pd.DataFrame()
-Added each field as a column to complete the structure of the CSV file.
-Then converted it to a CSV file using:
df.to_csv(r'C:\Users\Itika\Downloads\WebScrapingAmazon.csv',index=True)
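The steps above can be sketched roughly as follows. This is only a minimal illustration, not the exact script: the sample HTML, the class names `review-title` and `review-text`, and the helper names `fetch_page` and `extract_texts` are all assumptions made up for the example, since the real class names come from inspecting the prettified page. A small inline HTML string stands in for the downloaded page so the sketch runs without a network connection.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
# The tags and class names here are invented for illustration.
SAMPLE_HTML = """
<html><body>
  <div class="review">
    <a class="review-title">Great phone</a>
    <span class="review-text">Battery lasts two days.</span>
  </div>
  <div class="review">
    <a class="review-title">Decent value</a>
    <span class="review-text">Camera is average.</span>
  </div>
</body></html>
"""


def fetch_page(url):
    """Download a page and return its raw bytes (defined but not called here)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content


def extract_texts(soup, tag, class_name):
    """Collect the stripped text of every element with the given tag and class."""
    return [el.get_text(strip=True) for el in soup.find_all(tag, class_=class_name)]


# Parse the document, then pull out one list per field.
soup = bs(SAMPLE_HTML, "html.parser")
titles = extract_texts(soup, "a", "review-title")
bodies = extract_texts(soup, "span", "review-text")

# One column per field completes the structure of the CSV file.
df = pd.DataFrame({"Title": titles, "Review": bodies})

# With no path given, to_csv returns the CSV as a string;
# pass a file path instead to write it to disk.
csv_text = df.to_csv(index=True)
print(csv_text)
```

For a live page, the bytes returned by `fetch_page` would be passed to `bs(...)` in place of `SAMPLE_HTML`.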
Challenges I ran into
Great things take time and a few failures, and that was true for me too :) The first challenge I faced was repeated reviews. To fix that, I found the length of each field and then popped the entries that were repeating.
The second challenge was that I was getting an extra '\' character in every review. For that, I used the lstrip and rstrip methods to kill those :) My third challenge was that the index in my CSV file was coming out null, so I took help from the StackOverflow website and made the
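The first two challenges can be illustrated with a small sketch. It is not the original fix: instead of comparing lengths and popping entries, it drops repeats by remembering what has already been seen, and it uses `strip("\\")`, which has the same effect as calling `lstrip` and `rstrip` one after the other. The function names and the sample reviews are hypothetical.

```python
def drop_repeats(items):
    """Keep the first occurrence of each item, preserving the original order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def clean_review(text):
    """Remove surrounding whitespace and stray backslashes from both ends."""
    return text.strip().strip("\\").strip()


# Invented sample data: one repeated review and stray backslashes.
raw_reviews = ["\\Great phone\\", "\\Great phone\\", "  \\Camera is average\\ "]

cleaned = drop_repeats(clean_review(r) for r in raw_reviews)
print(cleaned)
```

Cleaning before removing repeats means two reviews that differ only in stray characters are still treated as the same review.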
Accomplishments that I'm proud of
I'm proud of the moment when all the data from the website was scraped into my CSV file :)
What we learned
Yes, I've learned a lot about the BeautifulSoup and Selenium libraries.
What's next for Amazon Price Scraping
My next target is to scrape Google web searches. Let's hope for the best :)