Inspiration
Our key inspiration for building Swarm AI was the reality of how important real-life conversation is, and how few ways there are to truly prepare for it before it happens. Every interview prep tool gives everyone the same generic questions, and maybe one robotic voice to simulate a live interview experience if they're lucky. We wanted to build something that does the real work behind a high-quality mock interview: researching your specific situation, building agents from scratch, and putting you in the actual room before it's too late.
What it does
Swarm AI launches right after you describe your situation. Whether it's an emotional conversation with loved ones, a panel interview at a Fortune 500 company, or a high-stakes VC pitch, Swarm spins up 5 specialized AI research agents in parallel, gives each generated persona a distinct, matching ElevenLabs voice, and runs a live spoken practice session that feels like joining the real video call. The session ends with a cinematic, detailed, voice-powered debrief that provides feedback and an overall clarity score, plus an option to ask Swarm AI questions about your interview. The debrief agent has the entire context of what happened during the session and is ready to answer any question with feedback to boost your performance next time. On top of this, Swarm AI has a dashboard where you can review your past sessions and interview history, learn from them, and charge forward even stronger.
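As a rough illustration of the "5 agents in parallel" step, here is a minimal sketch in JavaScript. The persona names and the `researchAgent` helper are our own illustrative stand-ins, not Swarm's actual code; the real agents would call an LLM API instead of returning canned notes.

```javascript
// Illustrative persona list (assumption, not Swarm's real roster).
const PERSONAS = ["recruiter", "hiring manager", "engineer", "skeptic", "ally"];

// Stand-in for a Claude-backed research call; the real agent would hit an LLM API.
async function researchAgent(persona, situation) {
  return { persona, notes: `research for ${persona} on: ${situation}` };
}

// All five agents run concurrently, so total wall time is roughly the
// slowest single agent, not the sum of all five.
async function spinUpAgents(situation) {
  return Promise.all(PERSONAS.map((p) => researchAgent(p, situation)));
}
```

The key point is `Promise.all`: the research calls are independent, so launching them together keeps session startup fast.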
How we built it
Groq powers the Judge Orchestrator at sub-500 ms latency for real-time conversation routing. The ElevenLabs Conversational Agents API handles dynamic voice configuration per persona and also gives the site its text-to-speech and speech-to-text features. Claude drives all 5 research agents and the debrief analysis, and the Tavily API provides live web search so the agents learn from real data on the web. On the backend, we used Google OAuth to create accounts for Swarm. The frontend is built with React, and we used 3D features in React to create animations and particle movements.
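To make the orchestrator's role concrete, here is a hedged sketch of how a routing request to Groq's OpenAI-compatible chat endpoint might be assembled. The model name, prompt, and `buildJudgeRequest` helper are assumptions for illustration, not Swarm's actual configuration; the caller would POST `body` with `fetch()` and an `Authorization: Bearer <key>` header.

```javascript
// Groq exposes an OpenAI-compatible chat completions endpoint.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";

// Build the judge's routing request: given the transcript so far, ask the
// model to name which persona should speak next.
function buildJudgeRequest(transcript, personas) {
  return {
    url: GROQ_URL,
    body: {
      model: "llama-3.1-8b-instant", // a fast Groq-hosted model; actual choice may differ
      messages: [
        {
          role: "system",
          content: `You are the judge. Given the transcript, reply with exactly one persona name from: ${personas.join(", ")}.`,
        },
        { role: "user", content: transcript },
      ],
      max_tokens: 8, // a tiny completion keeps routing latency well under 500 ms
    },
  };
}
```

Keeping the completion to a single persona name is one simple way to hold routing latency down, since the model generates only a few tokens per decision.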
Challenges we ran into
OpenAI hit multiple rate limits during the build, so we switched to Groq, which is faster and completely free. Voice latency at the start was extremely slow, around 15 seconds; we brought it down to around 5 seconds by optimizing the architecture. The ElevenLabs Conversational Agents API also proved far more complex than standard text-to-speech (TTS).
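The kind of architectural change that cuts startup latency like this is running independent stages concurrently instead of awaiting them one by one. The sketch below is our own illustration with simulated delays, not Swarm's actual pipeline.

```javascript
const delay = (ms) => new Promise((r) => setTimeout(r, ms));

// Stand-ins for independent startup stages (e.g. persona research,
// voice configuration, session setup), each simulated as a 100 ms task.
async function sequentialStartup() {
  const t0 = Date.now();
  await delay(100);
  await delay(100);
  await delay(100);
  return Date.now() - t0; // roughly 300 ms: delays add up
}

async function concurrentStartup() {
  const t0 = Date.now();
  await Promise.all([delay(100), delay(100), delay(100)]);
  return Date.now() - t0; // roughly 100 ms: delays overlap
}
```

With three 5-second stages, the same restructuring is enough to explain a 15-second startup collapsing to about 5 seconds.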
Accomplishments that we're proud of
We are proud of getting 5 models running in parallel, each with its own purpose in creating a session. We were also able to configure ElevenLabs voices based on each persona and stream their outputs through a 3D React UI. Most importantly, we accomplished all of this in under 12 hours.
What we learned
Sub-500 ms orchestration latency is the difference between a conversation that feels alive and one that feels broken, which is why Groq was the right choice for the orchestrator. We also learned that the orchestration logic for a multi-agent system is much harder than building the agents themselves.
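A toy example of why orchestration is the hard part: even a minimal turn router has to track shared history and enforce conversational rules that no individual agent ever sees. This sketch is our own illustration, not Swarm's code.

```javascript
// Pick the next speaker given the panel and the turn history so far.
function pickNextSpeaker(personas, history) {
  const last = history[history.length - 1];
  // Rule 1: never let the same persona take two turns in a row.
  const eligible = personas.filter((p) => p !== last);
  // Rule 2: prefer whoever has spoken least, to keep the panel balanced.
  const counts = new Map(
    eligible.map((p) => [p, history.filter((h) => h === p).length])
  );
  return eligible.sort((a, b) => counts.get(a) - counts.get(b))[0];
}
```

Each agent only needs to answer when asked; the orchestrator owns the cross-cutting state, which is where the real complexity lives.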
What's next for SwarmAI
AMD GPU credits will power a fine-tuning pipeline trained on session star ratings, turning every user interaction into labeled training data that makes Swarm's agent generation smarter over time. Long term, SwarmAI will become the first conversation prep tool that actually learns what works for each individual user and adapts every session accordingly.