Searching for people wearing shorts in a courtyard, so we can enforce a performance bar on that subset.
Searching for people wearing blue pants, so we can enforce a performance bar on that subset.
Searching for data with dark lighting, so we can enforce a performance bar on that subset.
Example API call that makes a query directly from evaluation code.
Computer vision models need to be tested rigorously on important subsets of data. However, rigorous testing involves hand-curation, which is time-consuming, expensive, and subjective. Take the example of building a stop sign detector: if we want to know how well it's doing on curvy roads in foggy weather, engineers have to manually go through the dataset and pick out the relevant samples.
What it does
Sieve fixes this with a managed platform that automatically tags data and lets you search any dataset by text, image, or generated tags. It also proactively suggests interesting groups of samples and makes them easy to create.
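To make the idea of testing on a tagged subset concrete, here is a minimal in-memory sketch. It is not Sieve's API: the `Sample` class, the tag names, and the `select_subset` and `accuracy` helpers are all hypothetical, and the tags are written by hand where a real platform would generate them automatically.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One dataset item. All fields here are hypothetical, for illustration."""
    sample_id: str
    tags: set[str] = field(default_factory=set)  # stand-in for generated tags
    label: bool = False        # ground truth: is a stop sign present?
    prediction: bool = False   # what the detector said


def select_subset(samples, required_tags):
    """Return the samples that carry every tag in required_tags."""
    required = set(required_tags)
    return [s for s in samples if required <= s.tags]


def accuracy(samples):
    """Fraction of samples where prediction matches label; None if empty."""
    if not samples:
        return None
    correct = sum(s.prediction == s.label for s in samples)
    return correct / len(samples)


# Toy data, invented for this example.
dataset = [
    Sample("a", {"curvy_road", "fog"}, label=True, prediction=True),
    Sample("b", {"curvy_road", "fog"}, label=True, prediction=False),
    Sample("c", {"straight_road", "clear"}, label=True, prediction=True),
    Sample("d", {"curvy_road", "clear"}, label=False, prediction=False),
]

foggy_curves = select_subset(dataset, ["curvy_road", "fog"])
subset_accuracy = accuracy(foggy_curves)
overall_accuracy = accuracy(dataset)

print(f"overall: {overall_accuracy}, curvy + fog: {subset_accuracy}")
```

In this toy data the detector looks fine overall but is noticeably worse on the curvy, foggy subset, which is the kind of gap that an aggregate metric hides and a subset query surfaces.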
How we built it
The hard part of this problem is being able to quickly build fine-tuned models that actually work on a prospective customer's dataset, especially for hyper-specific things like what type of clothes someone is wearing in a factory.
Challenges we ran into
Fine-tuning models is really hard, and the models can be finicky. Building a quick way to train with little data is the trick, but the hardest part is building the infra that can reliably serve these models, especially when customers can add data to a storage bucket whenever they'd like.
Accomplishments that we're proud of
The tagging solutions we built work pretty well!
What we learned
Infra is the hardest part, along with the software that makes it easy to fine-tune models for a specific customer as we receive their data.
What's next for Sieve Data
Building the full-fledged product!