Inspiration
The inspiration came from Xian, who deals with constant debugging issues at Meta. Across many kinds of objects and types, Autoval aims to solve regression issues, automate PRs, and improve agentic loops for both general and compliance-heavy industries, such as those served by companies like Crosby and LuminAI.
What it does
Autoval helps teams catch regressions early, generate evals from real failures, and automatically ship safer fixes through PRs.
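To make the "evals from real failures" idea concrete, here is a minimal sketch of how a logged failure could become a regression eval case and how a candidate fix could be checked against those cases. Everything in it is hypothetical: the `FailureLog` and `EvalCase` shapes and the `evalsFromFailures` and `findRegressions` helpers are illustrative assumptions, not Autoval's actual schema or code, and the exact-string comparison stands in for whatever scoring a real eval would use.

```typescript
// Hypothetical shapes: Autoval's real schema is not described in this write-up.
interface FailureLog {
  id: string;
  input: string;          // the prompt or request that triggered the failure
  badOutput: string;      // what the agent actually returned
  expectedOutput: string; // the corrected answer, e.g. taken from the merged fix
}

interface EvalCase {
  name: string;
  input: string;
  expected: string;
}

// Turn each logged failure into a named regression eval case.
function evalsFromFailures(logs: FailureLog[]): EvalCase[] {
  return logs.map((log) => ({
    name: `regression-${log.id}`,
    input: log.input,
    expected: log.expectedOutput,
  }));
}

// Run every eval case against an agent and return the names of cases that fail.
// Exact string match (after trimming) is a simplifying assumption.
function findRegressions(
  cases: EvalCase[],
  agent: (input: string) => string,
): string[] {
  const failed: string[] = [];
  for (const c of cases) {
    const output = agent(c.input);
    if (output.trim() !== c.expected.trim()) {
      failed.push(c.name);
    }
  }
  return failed;
}
```

In this sketch, a PR would only be considered safe to ship when `findRegressions` returns an empty list for every eval case generated from past failures.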
How we built it
Within the scope of the hackathon, we were impressed by the observability capabilities of Datadog's Lapdog, as well as by ClickHouse and LuminAI. Treating these as strong, authoritative sources of truth, we built a fast, compliance-friendly engine that identifies problems within the development loop.
Challenges we ran into
With agents, we wanted users to be able to find case studies of real value, so we vibecoded a second project that lets users create example datasets to test their evals against. For a long time, our logs were not showing up in Lapdog, which worried us until they finally populated.
Accomplishments that we're proud of
What we are most proud of is how immediately usable and practical Autoval is. We built something that can be used today for debugging and PR work that would otherwise take hours, if not days, to complete.
What we learned
Time management was essential to building this project. Along the way, we ran into a few hallucinations and made quick changes to what we wanted to build, and we relied on strong communication throughout the process.
What's next for Autoval
With more time, we would have connected a custom domain with a front end that allows… We would also have added defined endpoints that let users not only create new agents but also use predefined, industry-specific tool calls and commands.
Built With
- claude
- clickhouse
- datadog
- nimble
- react
- supabase
- typescript
- vercel