Inspiration
While brainstorming ideas, we were shocked to find quite a few existing projects similar to ones we've worked on before, and even some similar to ideas we had come up with during PennApps. We decided to build a website and search tool that helps hackers like us check whether an idea is actually new and avoid duplicating existing projects.
What it does
"Been There Done That" is a minimalist and elegant website with just a search bar, but behind the scenes. it's a powerful data retrieval and analysis tool. Type in a couple sentences describing the core ideas of projects you're thinking about, and the website will look through hundreds of old hackathon projects submitted on Devpost while analyzing how similar their developer-written descriptions are to yours. Receive a nice overview of the key words of your descriptions, and which old projects are most similar to yours.
How we built it
We first gathered hackathon project data using Apify's web scraping API. We wrote logic that lets the scraper crawl through the many pages of a Devpost hackathon's submissions page, as well as those of previous editions of the hackathon, opening each project along the way and storing its "What it does" section.
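The crawl-and-extract step can be sketched in plain Python. This is only an illustration of the idea, not our Apify scraper: it does not call Apify at all, `fetch_page` is a hypothetical callable standing in for whatever returns the text of each project on a submissions page, and the list of section headings is an assumption about how project write-ups are laid out.

```python
from typing import Callable, List, Optional

# Section headings assumed to appear in project write-ups (illustrative list).
HEADINGS = {
    "Inspiration",
    "What it does",
    "How we built it",
    "Challenges we ran into",
    "Accomplishments that we're proud of",
    "What we learned",
    "Built With",
}


def extract_section(page_text: str, heading: str = "What it does") -> Optional[str]:
    """Return the text under `heading`, up to the next known heading.

    Returns None if the heading never appears in the page text.
    """
    collected: List[str] = []
    inside = False
    for line in page_text.splitlines():
        stripped = line.strip()
        if stripped == heading:
            inside = True
            continue
        # Stop at the next section heading once we are inside the target section.
        if inside and (stripped in HEADINGS or stripped.startswith("What's next for")):
            break
        if inside and stripped:
            collected.append(stripped)
    return " ".join(collected) if inside else None


def crawl(fetch_page: Callable[[int], List[str]], max_pages: int = 50) -> List[str]:
    """Collect "What it does" text from successive submissions pages.

    `fetch_page(n)` is a hypothetical callable that returns the plain text of
    each project on page n, or an empty list once there are no more pages.
    """
    descriptions: List[str] = []
    for page_number in range(1, max_pages + 1):
        projects = fetch_page(page_number)
        if not projects:
            break  # ran out of pages
        for text in projects:
            section = extract_section(text)
            if section:
                descriptions.append(section)
    return descriptions
```

The same loop could be run once per hackathon edition, with a different `fetch_page` for each, to build up the full set of stored descriptions.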
Then, we created a similarity checking process, starting with Twinword's text similarity API. It gave a decent metric for quickly finding which projects have at least some similarity to the user-supplied one. We improved this analysis by using Twinword's topic tagging API to find the keywords in the user's description, and then searching the hackathon database with them.
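A rough sketch of how keyword search and similarity scoring can work together is below. Everything in it is a stand-in: `extract_keywords` uses simple word counts and `similarity` uses Jaccard overlap of word sets, neither of which is what the Twinword APIs do internally, and the choice to filter by keyword first and then rank by similarity is an assumption made for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "that", "with", "this", "from", "are", "you", "your"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def extract_keywords(text: str, top_n: int = 5) -> List[str]:
    """Stand-in for topic tagging: the most frequent non-stopword words."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


def similarity(a: str, b: str) -> float:
    """Stand-in for a text similarity score: Jaccard overlap of word sets."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def rank_projects(idea: str, projects: List[str], top_n: int = 3) -> List[Tuple[float, str]]:
    """Keep projects that share a keyword with the idea, then rank by similarity."""
    keywords = set(extract_keywords(idea))
    candidates = [p for p in projects if keywords & set(tokenize(p))]
    scored = [(similarity(idea, p), p) for p in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

For example, calling `rank_projects("search engine that checks hackathon ideas for duplicate projects", stored_descriptions)` would drop descriptions that share none of the idea's keywords and return the rest, most similar first.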
We designed the website using Materialize, and put the web app together using Flask and Python.
Challenges we ran into
The most difficult part of developing this was figuring out how to improve the text similarity analysis beyond what the Twinword API offers. We tried many different ideas, but most of them either failed or gave too small an improvement to justify their performance cost.
Accomplishments that we're proud of
We finished all parts of the product, and created a website that's really easy to use while having really useful info to offer. Our idea of using a keyword extractor for searching also improved accuracy beyond that of Twinword's text similarity API alone.
What we learned
The people who research NLP have very difficult jobs. Even with an existing text analysis tool, the results weren't very good, and most of our attempts to improve the process made it either slower or less accurate. However, as a team, we learned the power of determination, from staying up so long and finally feeling the satisfaction of a (decently) working product with a beautiful look.
What's next for Been There Done That
We hope to continue developing the tool to scan more hackathon submissions and become more accurate at detecting similar ideas.
Built With
- apify-api
- flask
- javascript
- materialize
- python
- twinword-text-similarity
- twinword-topic-tagging