Inspiration

We were inspired to build something that solves a challenge we and our peers know all too well: the high-pressure world of CS interviews. Focusing on the education sector felt personal to us, and we wanted to tackle a problem we directly face because we understand exactly how much it matters to students. Our goal is to streamline the path to internships and jobs, using InnerView to turn interview anxiety into interview readiness.

What it does

InnerView gives you an "inner view" into your interview performance, bridging the gap between preparation and professional confidence. The platform uses generative AI to produce realistic, role-specific interview questions that simulate a high-stakes environment. As the user responds, the AI analyzes the content of each answer in real time and returns instant, actionable feedback. Beyond verbal communication, InnerView uses computer vision to monitor eye contact. By encouraging users to hold a steady gaze throughout the session, the platform helps them project the confidence and engagement needed to land the job. It’s not just a mock interview; it’s a data-driven coaching session for your voice and your presence.

How we built it

We built InnerView as a multi-modal AI feedback system, with Python orchestrating vision, audio, and generative AI. The core experience is powered by the Gemini 2.5 API, which dynamically generates industry-specific interview questions and provides granular behavioral feedback on user responses. To simulate a realistic environment, we integrated ElevenLabs to convert the text questions into high-fidelity synthesized speech. On the frontend, we used Streamlit to build a responsive web interface, refactoring our backend logic to support live camera feeds and real-time audio processing in the browser. For the behavioral analysis component, we used OpenCV with Haar Cascades to track eye contact and pupil centering, giving users data-driven insight into their non-verbal communication. By managing complex library dependencies and implementing a polling loop for file processing, we closed the loop between the user’s performance and the AI’s critique.
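The eye-contact tracking described above can be illustrated with a minimal sketch. It assumes the detector returns eye boxes as (x, y, w, h) tuples relative to the face region, which is the layout OpenCV's Haar cascade `detectMultiScale` produces when run on a cropped face. The function names, the 15% tolerance, and the centering heuristic itself are assumptions for illustration, not necessarily what InnerView does.

```python
from typing import Iterable, Optional, Sequence, Tuple

# (x, y, w, h) in pixels, relative to the cropped face region.
Box = Tuple[int, int, int, int]


def is_centered(face_width: int, eyes: Sequence[Box],
                tolerance: float = 0.15) -> Optional[bool]:
    """Check whether the midpoint between the eyes is near the face's
    horizontal center. Returns None when fewer than two eyes were detected."""
    if face_width <= 0:
        raise ValueError("face_width must be positive")
    if len(eyes) < 2:
        return None
    # Keep the two largest detections; smaller boxes are often false positives.
    top_two = sorted(eyes, key=lambda b: b[2] * b[3], reverse=True)[:2]
    mid_x = sum(x + w / 2.0 for (x, _, w, _) in top_two) / 2.0
    offset = abs(mid_x - face_width / 2.0) / face_width
    return offset <= tolerance


def eye_contact_ratio(flags: Iterable[Optional[bool]]) -> float:
    """Fraction of frames judged centered, ignoring frames with no detection."""
    usable = [f for f in flags if f is not None]
    if not usable:
        return 0.0
    return sum(1 for f in usable if f) / len(usable)
```

Running `is_centered` once per frame and passing the results to `eye_contact_ratio` would give a single per-session number that can be reported back to the user.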

Challenges we ran into

Building InnerView presented technical hurdles that demanded both precision and several strategic pivots. Our most persistent challenge was "dependency hell," specifically a protobuf version conflict between TensorFlow, MediaPipe, and the Gemini API; we eventually resolved it by pinning protobuf to version 4.25.3 to stabilize the environment. This compatibility issue also forced a pivot in our computer vision logic. When MediaPipe’s internal requirements clashed with our AI pipeline, we switched to Haar Cascades via OpenCV to keep eye tracking working without the extra library overhead. We also struggled with the Presage SDK, which proved unstable for us in both C++ and Swift, so we de-prioritized it to protect the core project timeline. On the backend, we had to implement a custom polling loop to handle Gemini 2.5’s audio processing latency, so that the system waited for the uploaded file to reach an active status before requesting feedback. Finally, moving our interface to Streamlit required us to refactor our tracking logic to render camera feeds directly within a web view rather than in standard OpenCV windows. Throughout this process, strict use of Python virtual environments and consistent package installation through python3 and pip became essential to keeping a functional, unified codebase across our different local setups.
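The polling loop mentioned above can be sketched as follows. This is a minimal illustration rather than InnerView's actual code: `wait_until_active` and its parameters are hypothetical names, `get_state` stands in for whatever call re-fetches the uploaded file's metadata, and the state strings are assumptions modeled on the PROCESSING, ACTIVE, and FAILED states that Gemini reports for uploaded files.

```python
import time
from typing import Callable


def wait_until_active(get_state: Callable[[], str],
                      interval_s: float = 2.0,
                      timeout_s: float = 120.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until get_state() reports "ACTIVE".

    Raises RuntimeError if processing fails and TimeoutError if the file
    is still not active once timeout_s seconds have passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"file still {state!r} after {timeout_s} s")
        # Pause between checks so the API is not hammered with requests.
        sleep(interval_s)
```

Passing the state check in as a callable keeps the loop independent of any particular SDK version, which also makes it easy to test with a fake sequence of states. The timeout guards against a file that never leaves the processing state.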

Accomplishments that we're proud of

We are proud of how our project manages a complex "closed-loop" cycle in which the Gemini API dynamically generates context-aware interview questions that are immediately synthesized into lifelike speech via ElevenLabs. We then capture the user's verbal response while simultaneously recording their eye contact metrics, fusing these two distinct data streams into a single STAR-method critique. To complete the cycle, the system uses ElevenLabs to "speak" the improved answer back to the user, a flow we successfully moved from a local script into a polished Streamlit web application. Beyond the code itself, we are proud of how deeply we came to understand our tech stack; rather than simply generating a product with AI, we treated it as a collaborative tool, combining its capabilities with our own skills to build a more robust and intentional application.
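One way to picture the fusion of the two data streams is a single prompt that carries both the spoken answer and the eye-contact measurement. The sketch below is purely illustrative: the function name, the prompt wording, and the idea of passing eye contact as a 0-to-1 ratio are all assumptions, not the prompt InnerView actually sends to Gemini.

```python
def build_critique_prompt(question: str, transcript: str,
                          eye_contact_ratio: float) -> str:
    """Combine the answer text and the eye-contact metric into one request
    for a STAR-method (Situation, Task, Action, Result) critique."""
    if not 0.0 <= eye_contact_ratio <= 1.0:
        raise ValueError("eye_contact_ratio must be between 0 and 1")
    return (
        "You are an interview coach.\n"
        f"Question asked: {question}\n"
        f"Candidate answer: {transcript}\n"
        f"Eye contact held for {eye_contact_ratio:.0%} of the answer.\n"
        "Critique the answer using the STAR method (Situation, Task, "
        "Action, Result), comment on the eye contact, and then give an "
        "improved version of the answer."
    )
```

Keeping the prompt assembly in one small function makes it straightforward to adjust the coaching instructions without touching the audio or vision code.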

What we learned

This hackathon was a steep but rewarding learning curve that changed how we approach technical development. We became much more disciplined about environment management, learning to distinguish between Conda (base) and our project's virtual environment (venv) so that every team member worked with identical Python versions and dependencies. By pointing our VS Code interpreter at this virtual environment, we avoided installation conflicts and kept our workflow smooth. On the security front, we followed standard practice by using .env files and the python-dotenv library to manage sensitive API keys, and by configuring .gitignore to keep those secrets out of public repositories. The core of our project relied on a bridge between two AI services: Gemini for text generation and ElevenLabs for voice synthesis. This required us to get comfortable with stream handling, where we learned to consume API "generators" by joining the streamed chunks into bytes to produce playable MP3 audio. Beyond the code we actually deployed, we broadened our technical horizons by exploring tools like Presage and MediaPipe, gaining a deeper understanding of the ecosystem available for modern AI applications.
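The generator handling described above comes down to draining the stream and concatenating its chunks. Here is a minimal sketch, assuming the text-to-speech client yields the audio as an iterable of `bytes` chunks; `collect_audio` and `save_mp3` are hypothetical helper names, and the fake generator in the usage note stands in for the real API response.

```python
from typing import Iterable


def collect_audio(chunks: Iterable[bytes]) -> bytes:
    """Drain a stream of audio chunks into one bytes object.

    Empty chunks are skipped; a generator can only be consumed once,
    so the joined result is what should be stored or replayed.
    """
    return b"".join(chunk for chunk in chunks if chunk)


def save_mp3(path: str, chunks: Iterable[bytes]) -> int:
    """Write the joined audio to path and return the number of bytes written."""
    audio = collect_audio(chunks)
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

For example, `collect_audio(iter([b"ID3", b"", b"abc"]))` returns `b"ID3abc"`, and the same bytes object can be handed to an audio player widget or written to disk with `save_mp3`.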

What's next for InnerView

Our vision for InnerView is to move beyond a simple practice tool and become a comprehensive behavioral coaching platform. Future iterations will replace the standard self-view mirror with a real-time AI-generated human interviewer, creating a more authentic environment. By adding a choice of interviewer voices across genders and selectable feedback coaching styles (blunt, kind, constructive, etc.), we want users to be able to "stress-test" their performance in any scenario. We also plan to give users full control over their practice by allowing manual job-title input, so that whether you are interviewing for a niche technical role or a leadership position, the AI adapts to you. With the addition of more precise eye-tracking models and a session recording library, InnerView would provide the data-driven insights needed to turn interview anxiety into professional confidence.
