Sieve Data

Searching for people wearing shorts in a courtyard, so we can enforce a performance bar on that subset.
Searching for people wearing blue pants, so we can enforce a performance bar on that subset.
Searching for data with dark lighting, so we can enforce a performance bar on that subset.
An example API call that makes a query directly from evaluation code.

Inspiration

Computer vision models need to be tested rigorously on important subsets of data. However, rigorous testing involves hand-curation, which is time-consuming, expensive, and subjective. Take the example of building a stop sign detector: if we want to know how well it's doing on curvy roads in foggy weather, engineers have to go through the dataset manually and pick out the relevant samples.

What it does

Sieve addresses this with a managed platform that automatically tags data and lets you search any dataset by text, image, or generated tags. It also proactively suggests interesting groups and makes them easy to create.
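To make the idea of searching by generated tags concrete, here is a minimal sketch in Python. It is not Sieve's API: the `Sample` class, the `search_by_tags` and `subset_accuracy` helpers, and the tag strings are all hypothetical names invented for illustration. It assumes each sample already carries a set of auto-generated tags and a flag saying whether the model under test got it right.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Sample:
    """One dataset item (hypothetical structure, not Sieve's data model)."""
    sample_id: str
    tags: Set[str] = field(default_factory=set)  # assumed auto-generated tags
    correct: bool = False  # whether the model under test got this sample right


def search_by_tags(samples: Iterable[Sample], required_tags: Iterable[str]) -> List[Sample]:
    """Return the samples that carry every tag in required_tags."""
    required = set(required_tags)
    return [s for s in samples if required <= s.tags]


def subset_accuracy(samples: Iterable[Sample], required_tags: Iterable[str]) -> Optional[float]:
    """Accuracy on the tagged subset, or None if no sample matches."""
    subset = search_by_tags(samples, required_tags)
    if not subset:
        return None
    return sum(s.correct for s in subset) / len(subset)


# Made-up samples, purely for illustration.
samples = [
    Sample("img_001", {"person", "shorts", "courtyard"}, correct=True),
    Sample("img_002", {"person", "blue pants"}, correct=True),
    Sample("img_003", {"person", "shorts", "courtyard", "dark lighting"}, correct=False),
    Sample("img_004", {"dark lighting"}, correct=True),
]

shorts_accuracy = subset_accuracy(samples, {"shorts", "courtyard"})
dark_accuracy = subset_accuracy(samples, {"dark lighting"})
```

An evaluation script could compare values like `shorts_accuracy` against a minimum threshold and fail the run when a subset falls below it, which is one way to hold a performance bar on each subset.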

How we built it

The hard part of this problem is quickly building fine-tuned models that actually work on a prospective customer's dataset, especially for hyper-specific attributes such as the type of clothing someone is wearing in a factory.

Challenges we ran into

Fine-tuning models is hard, and the models can be finicky. Finding a quick way to train with little data is the trick, but the hardest part is building infrastructure that can serve these models reliably, especially when customers can add data to a storage bucket whenever they like.

Accomplishments that we're proud of

The tagging solutions we built work pretty well!

What we learned

Infrastructure is the hardest part, along with the software that makes it easy to fine-tune models on customers' data as we receive it.

What's next for Sieve Data

Building the full-fledged product!

Updates

Mokshith Voodarla started this project — Aug 06, 2021 11:00 AM EDT
