Omni - Text to API

Results for a sample query

Inspiration

Whenever I am doing research on a specific topic, I open 20-30 pages and go through them looking for specific things that I want to find. I always end up creating a google sheet to keep track of things.

What it does

It takes in raw pages as input and provides you a way to extract entities out of the raw form text and return it in a JSON format specified by you.

How we built it

Used Google Cloud Functions for the serverless infra and API endpoint
Built a chrome extension to download raw text from a webpage
Used claude2 to convert raw text into JSON format

Challenges we ran into

I started with using saved html files but that was too much content
I looked into embeddings to solve the problem of auto categorization, but we are not there yet
Ran into the challenge of getting just a JSON as output

Accomplishments that we're proud of

Building all these things within the short time period
Keeping the tech stack slim
Keeping the user interactions to as minimal as possible

What we learned

How to enforce model to return JSON outputs
How to use embeddings

What's next for Omni - Text to API

Auto capture pages and put them in cloud storage
Background tasks to process data async
Semantic search using ScaNN over embeddings

Built With

Updates

Vikram Tiwari started this project — Jul 30, 2023 05:41 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.