Aegis: AI Talent Validation Platform

Inspiration

Our primary inspiration came directly from the hackathon sponsor, Calyptus, and their mission to provide "rigorously vetted, AI-fluent talent." We were fascinated by the challenge this presents: in a world where AI is a core part of a developer's workflow, how do you truly measure a candidate's ability to leverage AI effectively?
We realized that traditional hiring methods can't answer this question. A resume can't prove AI skills, and a standard coding test actively discourages using the very AI tools that define modern productivity.
This led to our "Aha!" moment: What if we used an AI-powered system to validate AI skills? We were inspired to build a platform that doesn't just test code, but analyzes the entire problem-solving process, providing the objective proof that companies like Calyptus need.

What it does

Aegis is an end-to-end, autonomous platform for technical assessments that evaluates how a candidate solves a problem with AI, not just the final code. Here's the user journey:

Challenge Generation: A recruiter pastes any job description into Aegis. Our Architect Agent, acting as an expert hiring manager, instantly designs a custom, relevant coding challenge to test the role's core competencies.
Candidate Workspace: The candidate enters a simulated development environment featuring a code editor and an integrated AI Assistant. They can write their solution and, crucially, ask our Helper Agent for guidance, just like they would with a real AI tool.
Autonomous Validation: Upon submission, our Validator Tool securely executes the candidate's code in a sandboxed environment, providing an immediate, objective report on whether the code runs or fails.
Intelligent Evaluation: The Assessor Agent, the core of our system, analyzes all the data: the final code, the validation report, and the full chat transcript with the AI assistant.
Data-Driven Dashboard: The final output is a professional, interactive dashboard that provides a holistic view of the candidate's performance, including a Technical Score, a unique AI Fluency Score, and detailed summaries of their strengths and weaknesses.
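
The Autonomous Validation step could look roughly like the sketch below. It is an illustration, not the Aegis Validator Tool: the function name `validate_submission`, the report keys, and the 5-second timeout are assumptions, and a child process with a timeout is much weaker isolation than a real sandbox (it sets no memory, filesystem, or network limits).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def validate_submission(code: str, timeout_s: float = 5.0) -> dict:
    """Run candidate code in a separate Python process and report the outcome.

    Hypothetical helper: the report keys ("status", "stdout", "stderr")
    are assumptions made for this sketch.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "submission.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I runs Python in isolated mode: environment variables
                # and the user site-packages directory are ignored.
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            return {"status": "timeout", "stdout": "", "stderr": f"exceeded {timeout_s}s"}
    status = "passed" if proc.returncode == 0 else "failed"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}
```

The returned report is the kind of "runs or fails" signal that a later evaluation step can read alongside the code and the chat transcript.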

How we built it

We built Aegis by designing a collaborative, multi-agent system, with each component having a specialized role.

Frontend: We used Streamlit to rapidly build a polished, interactive, and data-driven user interface. We wrote custom CSS to enhance the theme and layout for a professional feel.
AI Framework: LangChain served as the backbone of our project. It allowed us to "chain" together prompts, models, and parsers, and to orchestrate the complex interactions between our different agents.
LLM Provider: We used the Groq API to run our agents on the high-speed llama-3.3-70b-versatile model, which kept both the AI Assistant chat and the evaluation steps responsive.
The Agents: Each agent was built with a carefully engineered prompt, giving it a unique persona (Architect, Helper, Assessor) and a strict set of instructions to ensure reliable and consistent behavior.
Structured Output: A key innovation was prompting our Assessor Agent to return its evaluation as a structured JSON object. This allowed us to easily parse the data and create the visual dashboard, moving beyond simple text generation.
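
A minimal sketch of the Structured Output idea, assuming the Assessor Agent returns a single JSON object. The field names and the 0 to 100 score range are hypothetical; the write-up names only a Technical Score, an AI Fluency Score, and summaries of strengths and weaknesses. The sketch uses only the standard library and leaves out the LangChain and Groq calls.

```python
import json
from dataclasses import dataclass


@dataclass
class Assessment:
    """Typed view of the Assessor's evaluation (field names are assumed)."""
    technical_score: int
    ai_fluency_score: int
    strengths: list[str]
    weaknesses: list[str]


def parse_assessment(raw: str) -> Assessment:
    """Turn the model's JSON text into an Assessment, rejecting bad scores."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for key in ("technical_score", "ai_fluency_score"):
        score = data[key]  # a missing score raises KeyError
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
            raise ValueError(f"{key} must be an integer from 0 to 100, got {score!r}")
    return Assessment(
        technical_score=data["technical_score"],
        ai_fluency_score=data["ai_fluency_score"],
        strengths=[str(s) for s in data.get("strengths", [])],
        weaknesses=[str(w) for w in data.get("weaknesses", [])],
    )
```

Once the evaluation is a typed object rather than free text, each field can be bound directly to a dashboard widget such as a metric tile or a bullet list.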

Challenges we ran into

API Instability: We started with one API provider but quickly ran into rate limits. We then pivoted to another, only to find that the model we were using had been deprecated overnight. This forced us to build a more resilient system, including a script that programmatically checks which models are available, which was a great real-world learning experience.
Prompt Engineering for Fairness: Our biggest challenge was making the Assessor Agent "fair." Early versions would unfairly penalize candidates if their code failed due to a missing library in our simple sandboxed environment. We overcame this through iterative prompt engineering, explicitly teaching the agent about its own limitations and instructing it to focus on the candidate's logic rather than environmental errors.
Getting Structured JSON: Forcing an LLM to consistently return clean, parsable JSON is non-trivial. It required multiple revisions of our prompt to add strict formatting rules, examples, and error-handling instructions for the model.
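
The three challenges above each suggest a small defensive helper. The sketch below is one possible shape for them, not the code we shipped: all function names are hypothetical, the set of available model IDs is assumed to come from the provider's model listing (fetching it is left out), and treating only import failures as "environmental" is a simplifying assumption.

```python
import json


def pick_model(preferred: list[str], available: set[str]) -> str:
    """Return the first preferred model ID that the provider still offers."""
    for model_id in preferred:
        if model_id in available:
            return model_id
    raise RuntimeError("none of the preferred models is currently available")


# Exception names treated as sandbox problems rather than candidate mistakes.
_ENV_ERRORS = ("ModuleNotFoundError", "ImportError")


def is_environment_error(stderr: str) -> bool:
    """True when the last traceback line names a missing-dependency error."""
    lines = [ln for ln in stderr.strip().splitlines() if ln.strip()]
    return bool(lines) and lines[-1].startswith(_ENV_ERRORS)


def extract_json(raw: str) -> dict:
    """Pull the outermost {...} object out of a reply that may wrap it in prose."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a subclass of ValueError, so callers can
    # catch ValueError for both "not found" and "not parsable".
    return json.loads(raw[start:end + 1])
```

A flag like the one from `is_environment_error` could be passed to the Assessor alongside the validation report, so the prompt can tell it to judge the candidate's logic rather than the missing library.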

Accomplishments that we're proud of

A Complete End-to-End System: We didn't just build a single-function tool; we created a complete, multi-stage platform that handles the entire assessment lifecycle autonomously.
The "AI Fluency" Score: We're incredibly proud of this unique metric. It's the core innovation of our project and directly addresses the initial problem statement by creating a quantifiable measure of how effectively someone uses AI.
The Context-Aware Assessor: The fact that our final agent can understand the limitations of its own tools and provide a fair, nuanced evaluation is a significant step towards truly intelligent, agentic systems.
The Polished UI: We're proud of the final user experience, from the professional theme to the interactive dashboard. It shows that we care about building a product, not just a proof-of-concept.

What we learned

The Power of Multi-Agent Systems: We learned that breaking a complex problem down into smaller, specialized agents (like an Architect and an Assessor) makes the system more robust, easier to debug, and ultimately more capable.
Prompt Engineering is an Art and a Science: The quality of our application is directly tied to the quality of our prompts. We learned that to get reliable results, you need to be incredibly specific, providing roles, context, examples, and strict output formats.
Iteration is Key: Every part of this project, from the UI to the agent prompts, went through multiple iterations. We learned that the first idea is rarely the best, and continuous refinement is what leads to a great final product.

What's next for Aegis

Aegis is more than just a hackathon project; it's a vision for the future of technical hiring. Here are our next steps:

Support for Multiple Languages: Extend the Validator Tool to securely execute code in other major languages like JavaScript, Java, and C++.
Time Tracking and Analytics: Implement a visible timer during the assessment and add "Completion Time" as a key metric on the final dashboard.
Database Integration: Store all assessment results in a database to allow recruiters to track candidate performance over time and identify hiring trends.
Full IDE Integration: Embed a more powerful, full-featured code editor directly into the workspace for an even more realistic development experience.
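
As a rough idea of how the Database Integration and Completion Time items might fit together, here is a sketch using SQLite from the Python standard library. Nothing here exists in Aegis yet: the table layout, column names, and function names are all assumptions made for illustration.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per completed assessment.
SCHEMA = """
CREATE TABLE IF NOT EXISTS assessments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    candidate TEXT NOT NULL,
    technical_score INTEGER NOT NULL,
    ai_fluency_score INTEGER NOT NULL,
    completion_seconds REAL NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the results database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_result(conn: sqlite3.Connection, candidate: str, technical: int,
                ai_fluency: int, completion_seconds: float) -> None:
    """Insert one assessment; the with-block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO assessments "
            "(candidate, technical_score, ai_fluency_score, completion_seconds) "
            "VALUES (?, ?, ?, ?)",
            (candidate, technical, ai_fluency, completion_seconds),
        )


def average_scores(conn: sqlite3.Connection) -> tuple[Optional[float], Optional[float]]:
    """Average technical and AI fluency scores; both are None for an empty table."""
    row = conn.execute(
        "SELECT AVG(technical_score), AVG(ai_fluency_score) FROM assessments"
    ).fetchone()
    return (row[0], row[1])
```

The completion time itself could be measured by recording `time.monotonic()` when the candidate opens the workspace and again at submission, then storing the difference.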
