Inspiration

AI applications using MCP (Model Context Protocol) and Agent-to-Agent interactions are notoriously difficult to test comprehensively. Manual testing is time-consuming, inconsistent, and doesn't scale with the complexity of modern agentic systems.

What it does

Magic Eval is an intelligent testing automation platform that automatically generates comprehensive test scenarios for AI applications and evaluates their performance using Google's Agent Development Kit (ADK).

How we built it

  • Automated Scenario Generation: Using CrewAI, we automatically create diverse, realistic test scenarios based on your AI agent's available tools and capabilities (first sketch below)
  • Evaluation: Leverages Google ADK to run comprehensive evaluations of those scenarios against your AI application (second sketch below)
  • Human-in-the-Loop Validation: Incorporates human oversight at critical evaluation points to ensure quality and catch edge cases (third sketch below)
  • Observability: Built on Weave for tracking and monitoring every run (fourth sketch below)
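A minimal sketch of the scenario-generation step: a CrewAI agent is prompted with the target agent's tool list and asked to produce test scenarios. The role, prompts, and `tool_specs` input here are illustrative placeholders, not the exact production crew.

```python
from crewai import Agent, Task, Crew

# Illustrative tool manifest for the agent under test; in practice this
# is extracted from the target agent's actual tool definitions.
tool_specs = "search_flights(origin, destination, date), book_flight(flight_id)"

scenario_designer = Agent(
    role="Test Scenario Designer",
    goal="Generate diverse, realistic test scenarios for an AI agent",
    backstory="A QA engineer who specializes in breaking agentic systems.",
)

generate_scenarios = Task(
    description=(
        f"The agent under test exposes these tools: {tool_specs}. "
        "Write 10 varied test scenarios, including edge cases and "
        "multi-step tool use."
    ),
    expected_output="A JSON list of scenarios with id, prompt, and expected behavior.",
    agent=scenario_designer,
)

crew = Crew(agents=[scenario_designer], tasks=[generate_scenarios])
scenarios = crew.kickoff()
print(scenarios)
```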
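For the evaluation step, ADK ships an `AgentEvaluator` helper that replays an eval set against an agent module. The module name and file path below are hypothetical, and the exact signature may differ across ADK versions, so treat this as a sketch rather than the definitive call.

```python
import asyncio

from google.adk.evaluation.agent_evaluator import AgentEvaluator

async def main():
    # "travel_agent" and the eval-set path are placeholders; point these
    # at your own agent package and the scenario file generated above.
    await AgentEvaluator.evaluate(
        agent_module="travel_agent",
        eval_dataset_file_path_or_dir="evals/generated_scenarios.test.json",
    )

asyncio.run(main())
```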
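Human-in-the-loop validation can be as simple as a review gate on low-confidence results. This blocking-prompt version is a sketch of the idea; the 0.7 threshold is an assumed parameter, and a real deployment would more likely push flagged results to a review queue.

```python
def human_review_gate(result: dict, threshold: float = 0.7) -> dict:
    """Pass high-scoring evaluations through; route the rest to a human."""
    if result["score"] >= threshold:
        result["verdict"] = "auto-accepted"
        return result
    # Low-confidence result: ask a reviewer to confirm or reject it.
    print(f"Scenario {result['id']} scored {result['score']:.2f}")
    answer = input("Accept this evaluation? [y/n] ")
    result["verdict"] = "accepted" if answer.strip().lower() == "y" else "rejected"
    return result

checked = human_review_gate({"id": "demo-1", "score": 0.55})
```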
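Weave instrumentation is mostly decoration: `weave.init` names the project and `@weave.op` traces each call's inputs, outputs, latency, and exceptions. The project name and function body below are illustrative.

```python
import weave

weave.init("magic-eval")  # hypothetical project name; requires a W&B login

@weave.op()
def evaluate_scenario(scenario: dict) -> dict:
    # Placeholder logic; the real pipeline invokes the ADK evaluation here.
    return {"id": scenario["id"], "score": 1.0}

evaluate_scenario({"id": "demo-1", "prompt": "Find me a flight to Lisbon."})
```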

Challenges we ran into

Accomplishments that we're proud of

What we learned

What's next for Magic Eval

Built With

CrewAI, Google ADK, Weave
