Inspiration

Most tools out there teach you how to speak in interviews: tone, filler words, body language. But that's not the real problem. The real problem is that people don't know what to say based on who they are, who they're meeting, and what's actually at stake. Alex, a junior in college, has a Google interview next week, a coffee chat with a senior PM, and a networking event on Friday, and no single tool helps with all three in a way that actually knows his background. School taught us everything except how to navigate the professional world. That gap is what we built for.
What it does

Babysitter in the Professional World is a holistic AI agent for professional development. You upload your resume, portfolio, and any relevant personal content once, and it becomes your persistent professional profile. From there, you can:
- Prepare for interviews by telling it the company and role. It generates personalized behavioral and situational questions that map your actual background onto that company's culture, then runs a voice mock interview where the AI plays the interviewer in character. At the end, you get a scored report with specific improvements.
- Prepare for a coffee chat by entering the person's name, title, and company. It surfaces a personalized question list and a framework for connecting your background to theirs.
- Prepare for networking events by describing the event's context, demographics, and tone. It helps you craft and rehearse a personalized elevator pitch as a personal story, not a generic 30-second script.
Across all three session types, the AI remembers your profile as persistent context. You speak, it responds, and after every session it scores you, generates a report, and gives you a concrete framework to do better next time.
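The shared-profile idea above can be sketched in a few lines. This is an illustrative sketch, not our actual implementation: the `ProfileStore` class and its field names are hypothetical, chosen only to show one profile being injected into every session type.

```typescript
// Hypothetical sketch: one uploaded profile, injected as context into
// every session type. Names are illustrative, not the shipped code.
type SessionType = "interview" | "coffee-chat" | "networking";

interface Profile {
  name: string;
  resumeText: string;      // parsed from the uploaded resume
  portfolioNotes?: string; // optional extra personal content
}

class ProfileStore {
  private profiles = new Map<string, Profile>();

  save(userId: string, profile: Profile): void {
    this.profiles.set(userId, profile);
  }

  // Build the context block a new session starts from, regardless of type.
  sessionContext(userId: string, session: SessionType, brief: string): string {
    const p = this.profiles.get(userId);
    if (!p) throw new Error("no profile uploaded yet");
    return [
      `User profile: ${p.name}`,
      p.resumeText,
      p.portfolioNotes ?? "",
      `Session type: ${session}`,
      `Session brief: ${brief}`,
    ].filter(Boolean).join("\n");
  }
}
```

The point of the design is that the user briefs the agent once; every subsequent session (interview, coffee chat, networking) starts from the same profile block plus a short session-specific brief.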
How we built it

We split into two parallel tracks that converged into the final product.

Iris and Enge used TRAE to build the full product, frontend and backend, within the hackathon window. They wired up the complete voice pipeline (Web Speech API for speech-to-text, MiniMax Speech-02-HD for text-to-speech), integrated the conversation engine across all three session types, and built the profile context system that persists across sessions.

Adi trained a custom LLM on MiniMax, specialized on the professional world and grounded in our core value proposition. Rather than prompting a general-purpose model, we fine-tuned it to deeply understand interview dynamics, networking behavior, and the nuances of professional communication.

The architecture uses a two-prompt system: a simulation prompt keeps the AI fully in character throughout the session, and a separate one-shot feedback prompt fires at the end to generate the structured report, so the model never gets confused between playing a role and analyzing performance.
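The two-prompt separation can be sketched roughly as follows. The prompt wording here is invented to illustrate the split, not the prompts we shipped:

```typescript
// Sketch of the two-prompt system: one prompt locks the model into its
// role for the whole session; a second, separate prompt runs once at the
// end over the transcript. Wording is illustrative only.

// Prompt 1: keeps the model in character for the entire session.
function simulationPrompt(role: string, company: string, profileCtx: string): string {
  return [
    `You are a ${role} at ${company} conducting this session.`,
    `Stay fully in character. Never give coaching or feedback mid-session.`,
    `Candidate background:\n${profileCtx}`,
  ].join("\n");
}

// Prompt 2: a one-shot evaluator that only ever sees the transcript.
function feedbackPrompt(transcript: string): string {
  return [
    `You are an evaluator. You are NOT the interviewer.`,
    `Score the candidate's answers below and return a structured report`,
    `with a numeric score and concrete improvements.`,
    `Transcript:\n${transcript}`,
  ].join("\n");
}
```

Because the feedback prompt is a separate call that sees only the finished transcript, the in-character model never has to switch between role-playing and analysis mid-conversation.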
Challenges we ran into

Context without scraping. We wanted to automatically fetch LinkedIn profiles and company data to enrich each session, but LinkedIn's anti-scraping restrictions made this infeasible within the hackathon window. We reframed it as a deliberate product decision: a user-driven context template where the user briefs the agent directly. It turns out users often know nuances about the person they're meeting that no public data source captures.

Keeping the AI in character. Getting the model to stay fully in character as an interviewer or networking contact, without randomly breaking into coaching mode mid-conversation, required careful prompt architecture. The two-prompt separation was our solution.

Voice latency. The full round trip of speech recognition → LLM response → TTS playback needed to feel natural enough for a real-time conversation. We optimized by using the browser's native Web Speech API for STT to cut one API round trip, and by falling back to MiniMax Speech-02-Turbo when latency spiked.
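The latency fallback amounts to a small piece of logic. A minimal sketch, assuming a latency budget and smoothing window that are our own illustrative values, not tuned numbers from the project:

```typescript
// Illustrative sketch of the TTS fallback: track recent round-trip times
// and switch from Speech-02-HD to Speech-02-Turbo when latency spikes.
// The budget and window size are assumptions for illustration.
const LATENCY_BUDGET_MS = 1500; // assumed budget for a natural-feeling turn
const WINDOW = 3;               // average over the last few turns

function pickTtsModel(recentLatenciesMs: number[]): "speech-02-hd" | "speech-02-turbo" {
  const recent = recentLatenciesMs.slice(-WINDOW);
  if (recent.length === 0) return "speech-02-hd"; // default to the quality model
  const avg = recent.reduce((a, b) => a + b, 0) / recent.length;
  return avg > LATENCY_BUDGET_MS ? "speech-02-turbo" : "speech-02-hd";
}
```

Averaging over a short window rather than reacting to a single slow turn keeps the voice from flip-flopping between models on one-off network hiccups.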
Accomplishments that we're proud of

The moment the agent says "I see you had a PM internship at X — a Google interviewer would push on this. Let's start there," fully unprompted, based purely on the uploaded resume: that's the wow moment we're proud of. It reads who you are and immediately personalizes the entire session. No other tool does that out of the box. We're also proud of shipping three distinct, coherent session types (interview, coffee chat, networking) under one unified profile system within 24 hours, using TRAE to move fast without sacrificing product quality.
What we learned

Personalization is the actual product. Generic AI tools are everywhere. What makes this different is that it knows you, and that changes everything about how the conversation feels. The more specific the context, the more useful the agent becomes.

We also learned that voice changes the product entirely. The gap between knowing an answer and saying it out loud is real and significant. Building for voice rather than text forced us to think of the product as a practice environment, not just an information tool.
What's next for Babysitter in Job Market
- AI voice tutor: currently the user speaks and the AI responds in text. The next step is full two-way voice interaction, making every session feel like a genuine real-time conversation.
- Automated research: web scraping and LinkedIn fetch so the agent can research the company and the person you're meeting automatically, without requiring the user to provide context manually.
- Smart prompt templates: in-chat guided inputs surfaced at the right moment to help users brief the agent more efficiently and get better, more personalized sessions faster.
Built With
- minimax
- trae