How often do you visit a website such as Wikipedia and end up reading heaps of content just to find what you're actually looking for? Find++ aims to solve this by locating information on a page using natural-language semantics and word relationships.

What it does

Find++ is a Chrome extension that helps users locate what they're looking for on a webpage, even when the wording doesn't match exactly. It uses pre-trained word embeddings to determine how words relate to one another in meaning.
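To illustrate how word embeddings allow non-exact matches, here is a minimal sketch that compares words by the cosine similarity of their vectors. The tiny `TOY_VECTORS` table, the `related_words` helper and the `0.9` threshold are all made up for illustration; they stand in for real pre-trained embeddings and are not taken from the Find++ code.

```python
import math

# Made-up 3-dimensional vectors standing in for pre-trained embeddings.
# Real embeddings typically have hundreds of dimensions.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.9, 0.3],
    "fruit": [0.1, 0.85, 0.35],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_words(query, candidates, threshold=0.9):
    """Return (word, score) pairs close to the query word, best match first."""
    if query not in TOY_VECTORS:
        return []
    scored = []
    for word in candidates:
        if word == query or word not in TOY_VECTORS:
            continue
        score = cosine_similarity(TOY_VECTORS[query], TOY_VECTORS[word])
        if score >= threshold:
            scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


matches = related_words("car", list(TOY_VECTORS))
print([word for word, _ in matches])
```

With these toy values, a search for "car" surfaces "automobile" and "vehicle" but not "banana" or "fruit", even though none of the words share any letters with the query in a way an exact-match search could use.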

How I built it

The backend is a Flask API written in Python, with the Gensim library powering its semantic search. In particular, it uses paragraph embeddings to determine context, building on pre-trained embeddings through transfer learning. The client is a Chrome extension that the user installs in their Chrome browser and can trigger on any webpage. Searches can then be made in a non-exact fashion.
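The server-side flow described above can be sketched roughly as follows. This is not the project's actual code: it swaps Gensim's paragraph embeddings for a simpler stand-in (averaging made-up toy word vectors), and `handle_search`, the payload keys and `top_k` are hypothetical names. In a Flask app, a route would parse the request JSON and pass it to a function like `handle_search`.

```python
import math
import re

# Made-up toy vectors; a real backend would load pre-trained embeddings.
TOY_VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.9, 0.3],
    "fruit": [0.1, 0.85, 0.35],
}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def text_vector(text):
    """Average the vectors of known words; None if no word is known."""
    vectors = [TOY_VECTORS[t] for t in tokenize(text) if t in TOY_VECTORS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_paragraphs(query, paragraphs, top_k=3):
    """Score each paragraph against the query and return the best matches."""
    query_vec = text_vector(query)
    if query_vec is None:
        return []
    scored = []
    for index, paragraph in enumerate(paragraphs):
        para_vec = text_vector(paragraph)
        if para_vec is not None:
            scored.append({"index": index, "score": cosine(query_vec, para_vec)})
    scored.sort(key=lambda match: match["score"], reverse=True)
    return scored[:top_k]


def handle_search(payload):
    """Hypothetical handler: {"query", "paragraphs"} in, {"matches"} out."""
    query = payload.get("query", "")
    paragraphs = payload.get("paragraphs", [])
    return {"matches": rank_paragraphs(query, paragraphs)}


response = handle_search({
    "query": "car",
    "paragraphs": [
        "The banana is a popular fruit.",
        "An automobile is a wheeled vehicle.",
        "Nothing relevant here.",
    ],
})
print(response)
```

Returning paragraph indices rather than text keeps the response small and lets the extension highlight the matching elements it already holds in the page.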

Challenges I ran into

Integrating the Flask API with JavaScript and Chrome proved particularly challenging. Another challenge was that matches were non-deterministic and depended heavily on the amount and quality of content on the webpage.

Accomplishments that I'm proud of

Getting something sizeable done in the time provided.

What I learned

Integration between JavaScript, Python and Chrome extensions. Data cleaning and preprocessing. REST API creation with Flask. Applying embedding techniques to retrieve vectors for sentences.
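As an illustration of the data cleaning step, the sketch below pulls paragraph text out of raw HTML using only Python's standard library. The write-up doesn't say how Find++ cleaned its input, so the `ParagraphExtractor` class and `extract_paragraphs` helper are assumptions about one plausible approach, not the project's implementation.

```python
import re
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the visible text of each <p> element, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = []
        self._in_paragraph = False
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p" and self._in_paragraph:
            # Collapse runs of whitespace and drop paragraphs that end up empty.
            text = re.sub(r"\s+", " ", "".join(self._buffer)).strip()
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph and self._skip_depth == 0:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return a list of cleaned paragraph strings from an HTML document."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


page = (
    "<html><body><script>var x = 1;</script>"
    "<p>Hello   <b>world</b>!</p><p>  </p>"
    "<p>Second\nparagraph.</p></body></html>"
)
print(extract_paragraphs(page))
```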

What's next for Find++

Improve the accuracy of matches using more sophisticated deep learning models. Augment page information to help search on sites with less content. Implement voice-based interaction.
