Curate is an AI-enabled browser extension that helps individuals consume digital information more effectively and efficiently.


Content on the web rarely communicates ideas effectively. Whether browsing the latest news or perusing a technical report, readers are often inundated with distractions such as advertisements and boilerplate. The problem is even more pronounced in an enterprise setting: journalists and researchers must sift through countless sources to collect evidence, software developers must parse lengthy documentation to implement programs, and executives must gather high-level context from volumes of meeting minutes and reports. The lack of support for ingesting the ever-increasing amount of digital information is therefore a significant productivity drain on both individuals and enterprises.

While many tools exist to make content creation more efficient (e.g., Google Docs and Grammarly), few services help individuals consume that content better.

What it does

Curate is a browser extension for better content consumption. It provides a distraction-free "reader mode" so that readers can process and digest digital information more efficiently. Using recent machine learning models (BERT in particular), our service automatically highlights the most important sentences within an article. We also recognize that people differ in their preferred learning strategies. To cater to these preferences and to improve accessibility, we use text-to-speech technology to narrate a given piece of content.

How I built it

The browser extension was built with React.js and common libraries, starting from a cross-browser extension boilerplate. The backend is a Python server powered by FastAPI. For extractive summarization, we used Google's BERT implementation through a third-party library.
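To illustrate the idea behind extractive summarization, here is a minimal sketch of how sentences might be ranked and selected for highlighting. It is not our production code: plain word-count vectors stand in for BERT sentence embeddings so that the example runs with only the standard library, and every function name (`split_sentences`, `embed`, `highlight_sentences`) is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(sentence: str) -> Counter:
    """Stand-in for a BERT sentence embedding (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def highlight_sentences(text: str, k: int = 2) -> List[str]:
    """Return the k sentences closest to the document centroid, in original order."""
    sentences = split_sentences(text)
    vectors = [embed(s) for s in sentences]

    # The centroid is the sum of all sentence vectors (scale does not affect cosine).
    centroid: Counter = Counter()
    for vector in vectors:
        centroid.update(vector)

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(vectors[i], centroid),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]
```

Swapping `embed` for a function that returns dense BERT vectors (with a matching `cosine`) would keep the same selection logic while using a much stronger notion of sentence similarity.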

For text-to-speech, we used the neural voice models offered by Google Cloud Text-to-Speech. We used Google App Engine to host an internal endpoint for development. Because the BERT model requires quite a bit of memory, we needed the F4_1G instance class.
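Narrating a full article usually means splitting it into pieces that fit within the text-to-speech service's per-request input limit. The sketch below shows one way this could be done. It assumes a 5,000-byte limit and the request body shape of the Cloud Text-to-Speech REST `text:synthesize` method; the helper names (`chunk_text`, `build_request`) and the chosen voice are illustrative assumptions, not a description of our deployed code.

```python
import re
from typing import Dict, List

# Assumed per-request input limit in bytes; verify against the current quota docs.
MAX_BYTES = 5000


def chunk_text(text: str, max_bytes: int = MAX_BYTES) -> List[str]:
    """Group whole sentences into chunks whose UTF-8 size stays within max_bytes."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_bytes is passed through unsplit.
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def build_request(chunk: str, voice_name: str = "en-US-Wavenet-D") -> Dict:
    """Build a JSON-serializable body for one synthesis request (voice is an assumption)."""
    return {
        "input": {"text": chunk},
        "voice": {"languageCode": "en-US", "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
```

Each chunk would be sent as its own request, and the returned audio segments played back in order.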

What's next for Curate

We see significant value and a monetization opportunity in Curate, and we would like to keep developing the product after the hackathon. We are excited to continue this work with an initial market focus on digital-forward content consumers (such as young software engineers and journalists) using a freemium pricing model.

Our competitive advantage comes from a unique application of the latest machine learning capabilities to create a unified and efficient platform for content ingestion.

As we are among the first to enter the market with a priority on leveraging ML, we believe that we can maintain our competitive advantage by establishing a data moat. Tracked user behaviors, such as edits to the automatically identified sentences or to the generated transcripts, can be used as training data to fine-tune our ML models. This would let us offer a significantly more effective product than subsequent competitors. That said, we want to stress that data privacy is of utmost concern: we have no intention of continuing the history of data exploitation.

We also see a significant specialization and monetization opportunity within the corporate market (especially in law, education, and journalism), where the advantage of a data moat is especially clear. It would be immensely difficult for new entrants to compete with ML models fine-tuned to industry-specific content.
