Inspiration
Coming into Hack Holyoke, our only goal was to build a working project. We were particularly interested in web scraping and in the Python wordcloud library. Our original idea was to scrape Instagram hashtags and build a word cloud from relevant keywords, sized by their number of posts. However, we found Instagram effectively impossible to scrape, so we changed direction. We still wanted to use web scraping to produce a word cloud as a helpful visual aid, so we turned our attention to how this might support college research.
We made a program that shows the user a visual of related keywords: the bigger the word, the more results it has. This can guide student research by giving students an easy-to-read visual that helps them narrow down search terms and see how many resources exist for each. The generated word cloud may also be used as a visual in a presentation or to aid learning about a topic. Furthermore, the program screenshots the first 10 pages of Google results for the keyword, which makes it easy to see what resources are available.
What it does
Given a keyword from the user, the program Googles that keyword and screenshots the first 10 pages of results. The program then finds synonyms of the word and searches each one in Google Scholar to find out how many results it returns. Finally, it creates a word cloud in which larger words represent synonyms with more results.
Synonyms are found by scraping thesaurus.com, and the number of results is found by scraping Google Scholar.
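As a rough illustration of the result-count step, here is a minimal sketch of turning a scraped results line into a number. It assumes the scraped text looks like "About 1,230,000 results (0.05 sec)". The function name parse_result_count is hypothetical and is not taken from our project code.

```python
import re


def parse_result_count(stats_text):
    """Extract the integer result count from a results line.

    Assumes text shaped like 'About 1,230,000 results (0.05 sec)'.
    Returns 0 when no count can be found.
    """
    # Grab the run of digits and separators that sits right before "result(s)".
    match = re.search(r"([\d,.]+)\s+results?", stats_text)
    if match is None:
        return 0
    # Drop thousands separators (commas or periods) before converting.
    digits = re.sub(r"[^\d]", "", match.group(1))
    return int(digits) if digits else 0


if __name__ == "__main__":
    print(parse_result_count("About 1,230,000 results (0.05 sec)"))
```

Returning 0 for unparseable text keeps a synonym with no results from crashing the run; it simply ends up with no weight in the word cloud.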
How we built it
We used Python with Selenium, Requests, and BeautifulSoup to scrape data from thesaurus.com and Google Scholar, as well as to take screenshots.
The library used alongside Selenium is WebDriver Manager for Python (Chrome).
The libraries used to create the word cloud are matplotlib, PIL.Image, and wordcloud.
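The screenshot step can be sketched roughly as follows. This is an illustration under assumptions, not our exact code: it pages through Google results with the start offset in the search URL, the helper names google_page_urls and screenshot_pages are hypothetical, and it lets Selenium locate the Chrome driver itself rather than going through WebDriver Manager.

```python
from urllib.parse import urlencode


def google_page_urls(keyword, pages=10, per_page=10):
    """Build one search URL per results page using the `start` offset."""
    return [
        "https://www.google.com/search?"
        + urlencode({"q": keyword, "start": page * per_page})
        for page in range(pages)
    ]


def screenshot_pages(keyword, out_dir="."):
    """Open each results page in Chrome and save a screenshot of it."""
    # Imported here so the URL helper can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        for i, url in enumerate(google_page_urls(keyword), start=1):
            driver.get(url)
            driver.save_screenshot(f"{out_dir}/{keyword}_page{i}.png")
    finally:
        # Always close the browser, even if a page fails to load.
        driver.quit()
```

Calling screenshot_pages("machine learning") would write ten PNG files into the current directory, one per results page, provided Chrome is installed and Google does not interrupt the session with a consent or bot check.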
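To show how the result counts can drive word size, here is a minimal sketch using the wordcloud library's generate_from_frequencies method. The function names to_frequencies and render_cloud, the image size, and the output path are assumptions for illustration. The sketch saves the image directly with to_file instead of displaying it through matplotlib.

```python
def to_frequencies(counts):
    """Keep only synonyms with at least one result.

    The remaining values are used directly as word weights.
    """
    return {word: n for word, n in counts.items() if n > 0}


def render_cloud(counts, out_path="cloud.png"):
    """Draw a word cloud where a larger count gives a larger word."""
    # Imported here so to_frequencies works without wordcloud installed.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(to_frequencies(counts))
    cloud.to_file(out_path)
    return out_path
```

For example, render_cloud({"happy": 1200000, "joyful": 85000, "glad": 0}) would draw "happy" larger than "joyful" and leave out "glad". The counts in this example are made up.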
Challenges we ran into
As mentioned above, our original idea of scraping Instagram did not work out, and we had to change direction after working on it for about 8 hours. We also had trouble scraping Google Scholar because of bot detection and login requirements. We were able to work through these issues by shifting our focus, doing research, and relying on each other for help.
Accomplishments that we're proud of
We are both proud that we were able to create a program in 24 hours!
What we learned
We learned the basics of participating in a hackathon and the importance of planning, how to scrape the web with Selenium and BeautifulSoup, which websites can be scraped well and which cannot (e.g. Instagram), and how to collaborate effectively on a short-term project.
What's next for Synonym Cloud
We may find a way to put our project into a web framework such as Flask!
Built With
- beautiful-soup
- matplotlib
- pil
- python
- requests
- selenium
- webdriver
- wordcloud
