Computer vision models need to be tested rigorously on important subsets of data. However, rigorous testing requires hand-curation, which is time-consuming, expensive, and subjective. Take the example of building a stop sign detector: if we want to know how well it's doing on curvy roads in foggy weather, engineers have to manually go through the dataset and pick out the relevant samples.

What it does

Sieve fixes this. It is a managed platform that automatically tags data and lets you search any dataset by text, image, or generated tags. It also proactively suggests interesting groups of samples and makes them easy to create.
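
To make the idea of searching by generated tags concrete, here is a minimal sketch of how tagged samples could be filtered into a group and scored. It is not Sieve's actual API: the `Sample` record, the tag names, and the helpers `filter_by_tags` and `slice_accuracy` are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    # Hypothetical record: one image, its auto-generated tags, and an eval result.
    image_id: str
    tags: set[str] = field(default_factory=set)
    correct: bool = False  # whether the model's prediction matched the label


def filter_by_tags(samples, required):
    """Return the samples that carry every tag in `required`."""
    required = set(required)
    return [s for s in samples if required <= s.tags]


def slice_accuracy(samples, required):
    """Accuracy on the subset matching `required`, or None if it is empty."""
    subset = filter_by_tags(samples, required)
    if not subset:
        return None
    return sum(s.correct for s in subset) / len(subset)


# Toy data standing in for a tagged stop sign dataset (assumed values).
samples = [
    Sample("img_001", {"curvy_road", "fog"}, correct=True),
    Sample("img_002", {"curvy_road", "fog"}, correct=False),
    Sample("img_003", {"straight_road", "clear"}, correct=True),
    Sample("img_004", {"curvy_road", "clear"}, correct=True),
]

foggy_curvy = slice_accuracy(samples, {"curvy_road", "fog"})
```

With tags in place, the "curvy roads in foggy weather" question becomes a one-line query instead of a manual pass through the dataset.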

How we built it

The hard part of this problem is quickly building fine-tuned models that actually work on a prospective customer's dataset, especially for hyper-specific things like what type of clothes someone is wearing in a factory.

Challenges we ran into

Fine-tuning models is really hard, and they can be finicky. Building a quick way to train with little data is the trick, but the hardest part is actually building the infra that can reliably serve these models, especially when customers can add data to a storage bucket whenever they'd like.
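
One piece of that infra problem, noticing data that customers have added since the last run, can be sketched as below. This is only an illustration under assumptions: `find_new_keys` is a hypothetical helper, and `list_keys` stands in for whatever call lists the objects in a customer's bucket. A real system would also need durable state, retries, and handling for overwritten objects.

```python
from typing import Callable, Iterable


def find_new_keys(list_keys: Callable[[], Iterable[str]], seen: set[str]) -> list[str]:
    """Return keys that have not been processed yet and mark them as seen."""
    new = sorted(k for k in list_keys() if k not in seen)
    seen.update(new)
    return new


# An in-memory list stands in for a storage bucket listing (assumption).
bucket = ["a.jpg", "b.jpg"]
seen: set[str] = set()

first = find_new_keys(lambda: bucket, seen)   # everything is new on the first pass
bucket.append("c.jpg")                        # the customer uploads another image
second = find_new_keys(lambda: bucket, seen)  # only the new upload is returned
```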

Accomplishments that we're proud of

The tagging solutions we built work pretty well!

What we learned

Infra is the hardest part, along with the software that makes it easy to fine-tune models specifically on the data we receive from customers.

What's next for Sieve Data

Building the full-fledged product!
