I love reading scary stories every October, and the best ones always come from regular users sharing their experiences on forums. The only downside is that they are unorganized and hard to search. My goal is a website that makes discovering these spooky tales easier by centralizing them in a single place and categorizing them.
What it does
A Google Cloud Function runs automatically, scraping user-submitted scary stories from two sources: Reddit's r/nosleep subreddit and Jezebel's scary story contest.
It then analyzes the posts it finds using Google's Cloud Natural Language API. Finally, the extracted information is saved to Firebase's Realtime Database.
These posts are then visible and searchable on the website.
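The pipeline boils down to three steps per story: scrape, analyze, store. As a minimal sketch of the record that ends up in the Realtime Database (the field names and the `buildStoryRecord` helper are my assumptions, not the project's actual schema), it might look like:

```javascript
// Hypothetical shape of the record stored per story. In the real
// pipeline, `entities` would come from the Cloud Natural Language API
// and the write would go through the firebase-admin SDK.
function buildStoryRecord(source, post, entities) {
  return {
    source,              // e.g. "nosleep" or "jezebel"
    title: post.title,
    author: post.author,
    body: post.body,
    entities,            // entity names, reused later for the word cloud
    scrapedAt: Date.now(),
  };
}

const record = buildStoryRecord(
  'nosleep',
  { title: 'The House', author: 'u/anon', body: 'It started at night...' },
  ['house', 'night']
);
console.log(record.source, record.entities.length);
```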
How I built it
The website is built with React, using the Firebase libraries to access the data it displays. The scraper is built with Node.js, using cheerio for HTML handling and the Firebase Admin SDK to store the information.
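With cheerio, pulling story titles out of a page is just a selector query. A dependency-free stand-in using a regex shows the same idea (the HTML snippet and the `story-link` class are invented; the real scraper would use cheerio selectors instead):

```javascript
// Dependency-free sketch of scraping titles out of fetched HTML.
// The actual project uses cheerio for this; the markup is made up.
const html = `
  <div class="stories">
    <a class="story-link" href="/s/1">The Whistler</a>
    <a class="story-link" href="/s/2">Room 9</a>
  </div>`;

// Capture the text content of each story link.
const titles = [...html.matchAll(/<a class="story-link"[^>]*>([^<]+)<\/a>/g)]
  .map((m) => m[1]);

console.log(titles);
```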
Challenges I ran into
- Figuring out the basics of web scraping
- In one particular case, some information I needed was inside an iframe, which required a workaround to extract.
- It took me a while to understand Google's Natural Language API and to find a use case for it.
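The iframe workaround amounts to a two-step scrape: the content isn't in the page itself, so you first pull the iframe's src attribute, then fetch and parse that embedded document separately. A minimal sketch (the HTML and URL are invented):

```javascript
// Step 1 of the iframe workaround: find the embedded document's URL.
// The outer page only contains a pointer, not the content itself.
const page =
  '<article><iframe src="https://example.com/embed/story-42"></iframe></article>';

const match = page.match(/<iframe[^>]*\bsrc="([^"]+)"/);
const iframeSrc = match ? match[1] : null;

console.log(iframeSrc);
// Step 2 (not shown): fetch iframeSrc and scrape the embedded page.
```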
Accomplishments that I'm proud of
- I added a word cloud built from the entities that the Natural Language API returns.
- I was able to extract all the information I wanted from the sites I originally planned to scrape.
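The Natural Language API's entity analysis returns each entity with a name and a salience score, and summing salience per name gives a natural word-cloud weight. A small sketch (the sample entities are invented, not real API output):

```javascript
// Aggregate NL-API-style entity results into word-cloud weights by
// summing salience per entity name. Sample data below is invented.
function wordCloudWeights(entities) {
  const weights = {};
  for (const { name, salience } of entities) {
    weights[name] = (weights[name] || 0) + salience;
  }
  return weights;
}

const weights = wordCloudWeights([
  { name: 'basement', salience: 0.5 },
  { name: 'mirror', salience: 0.25 },
  { name: 'basement', salience: 0.25 },
]);
console.log(weights); // { basement: 0.75, mirror: 0.25 }
```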
What I learned
- Google Cloud Functions
- Web scraping techniques and libraries
- Google's Natural Language API
What's next for Scary Story Scraper
- More sophisticated categorization of the stories.
- Extracting user stories from other sites/sources.