Table Number: 4.1

What Inspired Us

The singing fish was initially a gift for a friend, who instantly came up with the idea to make more out of our fishy companion. We asked ourselves: How can we give the fish a real personality and a way to dynamically answer us? Ultimately, we wanted to free him from the limitation of the two default songs he was originally programmed to sing.

How We Built It

Our project is built upon three main pillars: Hardware, Software Frontend, and Software Backend.

  1. Hardware: At the core is the Raspberry Pi 4, which acts as the physical brain. It connects to our custom H-bridge and the DC motors that bring the fish's mouth and body to life.
  2. Software Frontend: This handles the direct user interaction. It captures the user's vocal prompt, transcribes it into a text string, and manages the physical output, playing the incoming audio chunks and executing the synchronized motor movements.
  3. Software Backend: We programmed a server to handle the core logic and AI integration. It receives the transcribed text and sends it via API to deepseek-v4-flash. Based on the prompt, DeepSeek decides which voice and emotion to use and generates a text response. This text is then converted to audio by the Piper text-to-speech pipeline and streamed back to the frontend in chunks.
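The backend flow described above can be sketched roughly as follows. This is only an illustration under assumptions: we assume here that the model is prompted to answer with a JSON object, and the field names `voice`, `emotion`, and `text`, the `FishReply` class, and the chunk size are hypothetical rather than taken from the actual server.

```python
import json
from dataclasses import dataclass
from typing import Iterator


@dataclass
class FishReply:
    """Hypothetical container for one structured model reply."""
    voice: str
    emotion: str
    text: str


def parse_reply(raw: str) -> FishReply:
    """Parse a JSON reply (assumed format) into voice, emotion, and text."""
    data = json.loads(raw)
    return FishReply(voice=data["voice"], emotion=data["emotion"], text=data["text"])


def iter_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Split synthesized audio into fixed-size chunks for streaming.

    The last chunk may be shorter than chunk_size.
    """
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


# Example with a made-up reply and placeholder audio bytes.
reply = parse_reply('{"voice": "pirate", "emotion": "happy", "text": "Ahoy!"}')
chunks = list(iter_chunks(b"abcdefghij", chunk_size=4))
```

Keeping the voice and emotion as separate fields would let the server pick the text-to-speech voice before synthesis starts, while the frontend only ever sees audio chunks.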

The Challenges We Faced

One of the primary physical challenges was controlling the motors. We realized we had insufficient voltage to drive them directly, which forced us to integrate an H-bridge to supply the necessary power. To control the motor speed and position effectively, we had to calculate and apply Pulse Width Modulation (PWM), where the average output voltage depends on the duty cycle D, the fraction of each period during which the signal is high (between 0 and 1): $$V_{out} = D \cdot V_{in}$$
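As a small worked example of the PWM relation, the sketch below computes the duty cycle needed for a target average voltage. The 6 V supply and 3 V target are illustrative assumptions, not measurements from our build, and the function names are hypothetical.

```python
def duty_cycle_for_voltage(v_target: float, v_in: float) -> float:
    """Return the duty cycle D (clamped to 0.0..1.0) so that D * v_in ~= v_target."""
    if v_in <= 0:
        raise ValueError("v_in must be positive")
    return min(max(v_target / v_in, 0.0), 1.0)


def average_output_voltage(duty: float, v_in: float) -> float:
    """Apply V_out = D * V_in for a duty cycle between 0 and 1."""
    return duty * v_in


# Assumed values: a 6 V motor supply and a 3 V target average voltage.
duty = duty_cycle_for_voltage(3.0, 6.0)
v_out = average_output_voltage(duty, 6.0)

# On the Pi, a value in this 0.0..1.0 range could be handed to a PWM driver,
# for example the `value` attribute of gpiozero's PWMOutputDevice.
```

Clamping keeps the result valid even if the requested voltage exceeds the supply, in which case the motor simply runs at full duty.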

Beyond the electronics, figuring out the movement logic was a major hurdle. We had to conceptualize exactly how the mouth and body should move to look natural, and then program this logic into the motor controllers. Determining exactly which movements the fish should execute, when to trigger them, and how to synchronize them with the incoming audio chunks required intense problem-solving and fine-tuning.
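One simple way to tie mouth movement to the incoming audio is to open the mouth whenever a chunk is loud enough. The sketch below is not our exact movement logic: it assumes the chunks contain 16-bit little-endian mono PCM, and the threshold of 500 is an arbitrary value that would need tuning on the real fish.

```python
import math
import struct


def chunk_rms(pcm_chunk: bytes) -> float:
    """Root-mean-square loudness of a chunk of 16-bit little-endian PCM."""
    n_samples = len(pcm_chunk) // 2  # ignore a trailing odd byte, if any
    if n_samples == 0:
        return 0.0
    samples = struct.unpack(f"<{n_samples}h", pcm_chunk[: 2 * n_samples])
    return math.sqrt(sum(s * s for s in samples) / n_samples)


def mouth_should_open(pcm_chunk: bytes, threshold: float = 500.0) -> bool:
    """Decide whether the mouth motor should open for this chunk."""
    return chunk_rms(pcm_chunk) >= threshold


# Example: a silent chunk keeps the mouth closed, a loud one opens it.
silence = bytes(200)
loud = struct.pack("<100h", *([10000] * 100))
states = [mouth_should_open(silence), mouth_should_open(loud)]
```

Evaluating the decision per chunk, right before that chunk is played, keeps the motor command close in time to the sound it belongs to.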

What We Learned

Above all, we learned how to build a functional bridge between hardware, frontend software, and backend servers. Connecting physical DC motors and an H-bridge to an LLM API over a network taught us valuable lessons in system architecture. We also gained insight into handling asynchronous data streams and using prompt engineering to trigger specific physical actions and voice emotions.
