Inspiration

We drew inspiration from Baymax, the healthcare robot from Big Hero 6. In the movie, Baymax provides personalized advice and immediate care after scanning for injuries and pain. Similarly, our goal was to build a system that continuously reads and responds to human emotion. Our robot uses facial expression detection during conversations to interpret the user’s feelings, while an AI chatbot generates personalized responses in real time. By integrating emotional awareness into a physical robot, we aim to make therapeutic interactions feel more empathetic, engaging, and human than those conducted through a standard laptop interface.

What it does

The robot’s camera scans your facial expressions during the first ten seconds of interaction. Once you begin speaking, it initiates a conversation tailored to your detected mood. Simultaneously, LED lights display a corresponding color that reflects your expression, and a piezo buzzer plays a matching musical note. The AI chatbot’s voice also adapts in tone—becoming softer and more comforting when you appear upset, or more energetic and upbeat when you seem excited.
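The mood-to-feedback pairing described above is essentially a lookup table. A minimal Python sketch, with illustrative color and note choices (not necessarily the exact ones our build uses):

```python
# Illustrative mapping from a detected emotion to LED color and buzzer note.
# The specific pairings are examples, not the final build's values.
FEEDBACK = {
    "happy":    {"led_rgb": (255, 200, 0),   "note": "C5"},
    "sad":      {"led_rgb": (0, 0, 255),     "note": "A3"},
    "angry":    {"led_rgb": (255, 0, 0),     "note": "E3"},
    "surprise": {"led_rgb": (255, 0, 255),   "note": "G5"},
    "neutral":  {"led_rgb": (255, 255, 255), "note": "C4"},
}

def feedback_for(emotion: str) -> dict:
    """Return LED color and buzzer note for an emotion,
    falling back to neutral for anything unrecognized."""
    return FEEDBACK.get(emotion, FEEDBACK["neutral"])
```

Falling back to neutral keeps the hardware in a sensible state even when the classifier reports an emotion the table does not cover.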

How we built it

We used a Raspberry Pi 5 (4GB) and a Logitech webcam to capture and analyze facial expressions. A Python program built on OpenCV (running headless) and DeepFace detects the user’s face and identifies their current emotion. Based on the detected emotion, the system lights the corresponding LED color and plays a matching audio frequency through the piezo buzzer. The AI chatbot runs on the Gemini API and uses ElevenLabs for natural, customizable voice responses, creating a more human-like and emotionally adaptive interaction experience.
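The emotion-detection step can be sketched roughly as follows, assuming DeepFace’s `analyze` call returns a list of result dicts containing an `"emotion"` score map (as in recent deepface releases). The heavy import is deferred inside the function so the pure helper stays usable on its own:

```python
def dominant_emotion(scores: dict) -> str:
    """Pick the highest-scoring emotion from a DeepFace-style
    {emotion_name: confidence} dictionary."""
    return max(scores, key=scores.get)

def detect_emotion(frame):
    """Run emotion analysis on a single BGR frame.

    Sketch only: assumes DeepFace.analyze(actions=["emotion"])
    returns a list with an "emotion" score map per detected face.
    """
    from deepface import DeepFace  # heavy dependency, imported lazily
    results = DeepFace.analyze(
        frame,
        actions=["emotion"],
        enforce_detection=False,  # don't raise when no face is visible
    )
    return dominant_emotion(results[0]["emotion"])
```

`enforce_detection=False` matters on a live webcam feed, where many frames contain no usable face.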

Challenges we ran into

We encountered several challenges integrating the software and hardware components. The facial recognition program was originally written for a laptop camera, so it did not work immediately on the Raspberry Pi with the Logitech webcam. Our first version used OpenCV to draw bounding boxes around detected faces, but since the Raspberry Pi has no need for that visualization, we adapted the system to run in headless mode, making it suitable for lower-powered hardware. We also had trouble combining the AI chatbot code with the Raspberry Pi: the two systems could not reliably communicate, and the Pi struggled to stream emotion detection results to the laptop in real time. Additionally, we lacked a reliable microphone and speaker setup, which prevented us from running the chatbot’s voice output directly through the Raspberry Pi. Finally, we spent time refining the AI’s responsiveness and ensuring the text-to-speech output was accurate and natural rather than delayed or distorted, which was essential for smooth, emotionally aware interactions.
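The Pi-to-laptop communication problem boils down to streaming small messages between two machines. One stdlib-only pattern that addresses it is newline-delimited JSON over TCP; the host and port below are placeholders, not our actual configuration:

```python
import json
import socket

def send_emotion(emotion: str, host: str = "127.0.0.1", port: int = 5055) -> None:
    """Send one newline-delimited JSON emotion update to a listener."""
    msg = json.dumps({"emotion": emotion}).encode() + b"\n"
    with socket.create_connection((host, port)) as conn:
        conn.sendall(msg)

def receive_one(port: int = 5055) -> dict:
    """Accept a single connection and decode one JSON line."""
    with socket.create_server(("127.0.0.1", port)) as server:
        conn, _addr = server.accept()
        with conn:
            line = conn.makefile().readline()
    return json.loads(line)
```

Framing each update as one JSON line keeps the messages self-delimiting, so the receiver never has to guess where one emotion result ends and the next begins.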

Accomplishments that we're proud of

We are proud that we successfully implemented facial recognition on the Raspberry Pi, allowing it to detect emotions and trigger the corresponding LED color and piezo buzzer frequency in real time. We’re also proud of our AI chatbot, which we optimized to be fast and responsive, producing accurate text-to-speech output without distortion or gibberish. The chatbot can now initiate conversations based on the detected emotion, generating emotion-aware responses through ElevenLabs voice synthesis for a more natural and personalized interaction. Additionally, we CAD-modeled and 3D printed a custom enclosure for the robot’s components, ensuring a stable structure, organized wiring, and a clean, professional final design.
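On the buzzer side, choosing a note is just choosing a frequency. In equal temperament that is a single formula, f = 440 * 2^(n/12) Hz for a note n semitones above or below A4; a quick sketch:

```python
def note_frequency(semitones_from_a4: int) -> float:
    """Equal-temperament frequency for a note offset in semitones
    from A4 (440 Hz): f = 440 * 2 ** (n / 12)."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)
```

For example, A4 itself is 440 Hz, one octave up (n = 12) doubles it to 880 Hz, and C5 (n = 3) lands near 523.25 Hz.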

What we learned

Through this project, we learned how to bridge software and hardware while building an emotion-aware robot. We discovered that the Raspberry Pi cannot efficiently run OpenCV’s bounding-box visualization, and that its audio output is digital only: it lacks an analog audio output, so driving an 8 Ω speaker requires a DAC or amplifier. We gained hands-on experience CAD-modeling and 3D printing custom parts for the robot’s enclosure, learned how to implement facial recognition and emotion detection using DeepFace, and developed skills in integrating the Gemini API with ElevenLabs to achieve responsive, accurate text-to-speech interactions that adapt to user emotions.

What's next for TheraPi

Moving forward, we plan to fully integrate the AI chatbot into the physical robot, enabling complete on-device emotion detection and response. We also aim to develop a reliable microphone and speaker system for clearer, real-time communication. In addition, we want to improve the robot’s physical design to make it more visually appealing and relatable to users. Finally, we plan to customize the ElevenLabs voice options—allowing users to choose preferred accents, tones, or gendered voices for a more personal and human experience.

Built With

python, raspberry-pi, opencv, deepface, gemini, elevenlabs
