Use Natural Language Processing

Step-1: After logging in, you will see the home page of the software → go to the “+New” button in the upper-left corner → click on it → choose the advanced option from the drop-down → a New Task window will open → create a new group to organize your tasks (optional) → enter the URL of the webpage you want to scrape (e.g., https://twitter.com/realDonaldTrump/status/1287119187324874754 ) → click Save.

Step-2: After saving, a pop-up window asks you to choose between two options: let the Octoparse bot scrape the webpage automatically, or select specific elements manually.

Step-3: Twitter has no “next” button and is not divided into pages, so the bot cannot detect pagination automatically. To create a pagination loop, we have to set up infinite scrolling so that more data loads as the page scrolls.

Step-4: The next step is to adjust the settings, starting with pagination → go to the “Click to paginate” option in the workflow → hover over it and you will see the “Action settings” option → click on it → specify the changes in the settings.

Step-5: The next task is to clean the data. I am extracting only tweet replies for my project. You can also extract details like “Retweets”, “Likes”, etc., so keep the other attributes if they are useful for your project.

Step-6: After making these changes, if you are getting the expected data, you are all set to run your task.
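
The cleaning described in Step-5 can also be done on the exported data outside Octoparse. Below is a minimal sketch in Python, assuming the task was exported as CSV. The column names (reply_text, retweets, likes) and the helper keep_reply_column are hypothetical and used only for illustration; replace them with the field names in your own export.

```python
import csv
import io


def keep_reply_column(csv_text, column="reply_text"):
    """Return the non-empty, de-duplicated values of one column.

    `column` is a hypothetical field name; use the one from your export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    replies = []
    for row in reader:
        # Missing or empty cells are treated as empty strings and skipped.
        text = (row.get(column) or "").strip()
        if text and text not in seen:
            seen.add(text)
            replies.append(text)
    return replies


# Made-up sample rows standing in for an exported file.
sample = (
    "reply_text,retweets,likes\n"
    "Great point,3,10\n"
    ",0,0\n"
    "Great point,1,2\n"
    "I disagree,0,4\n"
)

print(keep_reply_column(sample))
```

To keep other attributes such as retweets or likes, collect the whole row instead of only the reply text.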

Built With

natural-language-processing

Updates

Kashish Dharmani started this project — Apr 03, 2021 01:00 PM EDT
