Inspiration
We were inspired by a simple problem we face every day while using AI tools: we never know which answer to trust. Different AI models give different responses; sometimes one is correct, sometimes incomplete, sometimes totally wrong. There is no system to compare them or explain why one answer is better than another. This confusion pushed us to build something that doesn’t just give answers, but validates them through competition.
What it does
AI Arena is a multi-agent AI platform where multiple AI models answer the same question in parallel. An intelligent AI referee then evaluates all responses based on correctness, clarity, and usefulness. The best response is selected and further refined using Google Gemini to deliver one final, high-quality answer. Users can see all responses, scores, and reasoning transparently, along with saved history for future reference.
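The flow described above (parallel answers, referee scoring, then refinement of the winner) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `agent_a`, `agent_b`, `referee`, `refine`, and `run_arena` are hypothetical names, and the model calls are replaced by stubs with made-up scores so the example runs without any API keys.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Verdict:
    agent: str
    answer: str
    scores: dict[str, float]  # keys: correctness, clarity, usefulness

    @property
    def total(self) -> float:
        return sum(self.scores.values())


# Hypothetical stand-ins for real model calls (for example Gemini or Ollama).
async def agent_a(question: str) -> str:
    await asyncio.sleep(0.01)
    return f"Short answer to: {question}"


async def agent_b(question: str) -> str:
    await asyncio.sleep(0.01)
    return f"Detailed, step-by-step answer to: {question}"


async def referee(agent: str, answer: str) -> Verdict:
    # Placeholder scoring; a real referee would ask a model for these values.
    detailed = "step-by-step" in answer
    scores = {
        "correctness": 0.8,
        "clarity": 0.9 if detailed else 0.6,
        "usefulness": 0.9 if detailed else 0.5,
    }
    return Verdict(agent, answer, scores)


async def refine(answer: str) -> str:
    # Placeholder for the final refinement pass over the winning answer.
    return answer.strip() + " (refined)"


async def run_arena(question: str) -> tuple[str, list[Verdict]]:
    agents = {"agent_a": agent_a, "agent_b": agent_b}
    # All agents answer the same question concurrently.
    answers = await asyncio.gather(*(fn(question) for fn in agents.values()))
    # The referee scores every answer; verdicts are kept for transparency.
    verdicts = await asyncio.gather(
        *(referee(name, ans) for name, ans in zip(agents, answers))
    )
    best = max(verdicts, key=lambda v: v.total)
    return await refine(best.answer), list(verdicts)
```

Calling `asyncio.run(run_arena("What is 2+2?"))` returns the refined winning answer together with every verdict, which is the data a UI would need to show all responses and scores side by side.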
How we built it
We built AI Arena using a modern full-stack architecture. The frontend is built with React, TypeScript, Vite, and TailwindCSS for a smooth and responsive UI. The backend is powered by FastAPI, which orchestrates the multi-agent flow. We integrated the Google Gemini API as both an answering agent and an AI referee. For privacy, we also support local AI execution using Ollama. Authentication is handled using Google Firebase, ensuring secure sign-in and session management.
Challenges we ran into
The biggest challenge was designing fair AI judging logic. Making sure the referee evaluates answers objectively and consistently, rather than arbitrarily, took multiple iterations. Handling parallel responses without slowing the system was another issue. We also faced challenges in balancing local AI execution with cloud-based APIs while keeping performance fast and stable. Debugging async flows under time pressure was honestly tough.
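One common way to keep a slow model from stalling the whole round is a per-agent timeout. The sketch below shows that idea with `asyncio.wait_for`; it is an assumption about how such a problem could be handled, not a description of what AI Arena does. The agent functions, `ask_with_timeout`, `collect_answers`, and the timeout values are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Agent = Callable[[str], Awaitable[str]]


async def fast_cloud_agent(question: str) -> str:
    await asyncio.sleep(0.01)  # simulates a quick cloud API call
    return "cloud answer"


async def slow_local_agent(question: str) -> str:
    await asyncio.sleep(1.0)  # simulates a local model that is too slow
    return "local answer"


async def ask_with_timeout(
    name: str, agent: Agent, question: str, timeout: float
) -> tuple[str, Optional[str]]:
    try:
        return name, await asyncio.wait_for(agent(question), timeout)
    except asyncio.TimeoutError:
        # The slow call is cancelled; the round continues without it.
        return name, None


async def collect_answers(
    agents: dict[str, Agent], question: str, timeout: float = 0.2
) -> dict[str, str]:
    results = await asyncio.gather(
        *(ask_with_timeout(n, a, question, timeout) for n, a in agents.items())
    )
    # Keep only the agents that answered in time.
    return {name: answer for name, answer in results if answer is not None}
```

With this shape, total latency is bounded by the timeout rather than by the slowest model, at the cost of sometimes judging fewer answers.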
Accomplishments that we're proud of
We’re proud that we built a working multi-agent AI council system within a limited time. Seeing AI models compete, get scored, and produce one refined answer in real time feels very rewarding. We successfully integrated Google Gemini in multiple roles and maintained privacy support with local models. The UI, animations, and overall experience came together better than expected.
What we learned
We learned that AI answers need validation, not just generation. Building AI systems is not only about models, but about trust, transparency, and explainability. We also learned a lot about async system design, prompt engineering for evaluation, and integrating Google’s AI tools effectively. Team collaboration under hackathon pressure taught us real-world problem solving.
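To make "prompt engineering for evaluation" concrete, here is a small sketch of a referee prompt builder and a strict parser for the reply. Everything in it is illustrative: the prompt wording, the 0 to 10 scale, the JSON reply format, and the choice to label answers A, B, C instead of showing model names are assumptions, not details taken from the project.

```python
import json

CRITERIA = ("correctness", "clarity", "usefulness")


def build_referee_prompt(question: str, answers: list[str]) -> str:
    # Label answers A, B, C... so the referee never sees which model wrote them.
    labelled = "\n\n".join(
        f"Answer {chr(65 + i)}:\n{a}" for i, a in enumerate(answers)
    )
    return (
        "You are a referee. Score each answer from 0 to 10 on "
        + ", ".join(CRITERIA)
        + ".\n"
        'Reply with JSON only, e.g. {"A": {"correctness": 7, "clarity": 8, '
        '"usefulness": 6}}.\n\n'
        f"Question: {question}\n\n{labelled}"
    )


def parse_scores(reply: str, n_answers: int) -> dict[str, dict[str, float]]:
    data = json.loads(reply)  # raises ValueError on non-JSON replies
    scores: dict[str, dict[str, float]] = {}
    for i in range(n_answers):
        label = chr(65 + i)
        entry = data[label]  # raises KeyError if the referee skipped an answer
        scores[label] = {}
        for criterion in CRITERIA:
            value = float(entry[criterion])
            if not 0 <= value <= 10:
                raise ValueError(f"{label}.{criterion} out of range: {value}")
            scores[label][criterion] = value
    return scores
```

Rejecting malformed or out-of-range replies, rather than silently accepting them, makes it easier to notice when the referee is not following the rubric.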
What's next for AI-Arena
Next, we plan to add more AI models like Claude and GPT-4, improve analytics, and allow users to export results. We want to build team collaboration features and developer APIs. Long term, we aim to create domain-specific AI councils for medical, legal, and enterprise use cases, making AI Arena a standard for AI answer verification.
Built With
- fastapi
- gemini
- mongodb
- ollama
- openrouter
- python
- react
- typescript