Inspiration

Mood Paw was inspired by a real problem in multi-animal homes: pets, especially cats, can experience sudden episodes of fighting, stress, pain, or territorial aggression, and owners are not always there to notice in time. In homes with multiple animals, tension can build quickly, and by the time a human sees the situation, the conflict may already have happened.

Regular cameras are also not enough. A cat can move out of frame, hide under furniture, stay in another room, or express discomfort in ways that are hard to interpret visually. Even when a camera captures the scene, it may still miss the emotional meaning behind the behavior. We realized that one of the most important signals is not just what we see, but what we hear. Cat vocalizations often carry urgent emotional information that visual monitoring alone cannot fully capture.

We wanted to build a system that could listen for meaningful cat sounds, distinguish them from everyday background noise, and identify emotionally important situations such as anger, pain, or fighting. Our goal was to give pet owners a smarter way to understand what is happening in the home, especially when traditional monitoring falls short.

What it does

Mood Paw is an AI-powered cat monitoring system that listens to audio in the home and turns it into actionable insight for pet owners. The system captures sound through an ESP32-S3-based device, records audio clips when triggered, and sends them to a backend server for analysis.

The server first determines whether the sound is actually cat-related or just background noise. If it is not a cat sound, the file is discarded automatically. If it is a cat vocalization, the system performs cat emotion classification. When the detected emotional state indicates a potentially urgent situation, such as anger, pain, or fighting, Mood Paw sends a Telegram alert to the owner in real time. For non-urgent cat sounds, the system stays quiet. This helps reduce unnecessary notifications while still surfacing situations that may require attention.
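The decision logic described above can be sketched as a small routine. This is only an illustration of the flow, not the actual server code; `is_cat_sound` and `classify_emotion` are hypothetical placeholders standing in for the two real models:

```python
# Urgent emotional states that trigger an owner alert (from the write-up).
URGENT_EMOTIONS = {"anger", "pain", "fighting"}

def handle_clip(clip, is_cat_sound, classify_emotion):
    """Run one audio clip through the two-stage decision flow.

    `is_cat_sound` and `classify_emotion` are hypothetical stand-ins
    for the real cat detector and emotion classifier.
    """
    if not is_cat_sound(clip):
        return {"kept": False, "alert": None}   # discard background noise
    emotion = classify_emotion(clip)            # second stage: emotion label
    urgent = emotion in URGENT_EMOTIONS
    return {"kept": True, "alert": emotion if urgent else None}
```

For a non-urgent label such as "calm", `alert` stays `None`, which is what keeps the notification volume low.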

How we built it

We built Mood Paw by combining embedded hardware, audio recording, backend AI inference, and real-time notifications.

On the device side, we used an ESP32-S3 with a microphone to continuously monitor sound. Because longer audio clips exceeded the practical memory limits of the device, we redesigned the recording pipeline so that audio could be written as WAV files to an SD card instead of being stored entirely in RAM. Once the file is saved, the device automatically uploads it to the server over Wi-Fi.

On the backend side, we designed a two-stage analysis pipeline. The first stage performs cat-vs-non-cat detection so that random household sounds, speech, and other irrelevant audio can be filtered out. The second stage runs cat vocal emotion classification on files that are likely to contain cat sounds. Finally, the backend applies notification logic and only sends Telegram alerts for the emotions we consider high-risk, specifically fighting, pain, and anger.
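As one concrete piece of that notification step, a Telegram alert is ultimately an HTTPS call to the Bot API's `sendMessage` method. A minimal sketch of building that call (the token, chat id, and message text here are placeholder assumptions, and the request is constructed rather than sent):

```python
def build_alert_request(bot_token, chat_id, emotion):
    """Build (but do not send) a Telegram Bot API sendMessage call
    for a high-risk emotion. Token and chat_id are placeholders."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    payload = {
        "chat_id": chat_id,
        "text": f"Mood Paw alert: possible cat {emotion} detected",
    }
    return url, payload
```

In the real backend, the returned URL and payload would be POSTed with an HTTP client once the pipeline flags one of the high-risk emotions.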

Challenges we ran into

One of our biggest challenges was building on hardware while learning much of the process from scratch. We had to quickly understand how embedded audio capture, storage, Wi-Fi communication, and file transfer worked together, all while trying to turn the idea into a working prototype.

The tiny XIAO device made this even more challenging. Its compact size was great for the vision of a lightweight pet-monitoring system, but it also came with stricter setup constraints. Even connecting it to Wi-Fi reliably during the event was harder than we expected.

Storage also became a major design problem. Once we moved from the idea of “recording sound” to actually handling multi-second WAV files, we had to rethink how much data to save, when to save it, and how to work within limited device resources.

On the AI side, the challenge was not just classification, but data. To detect cat emotion meaningfully, recordings need to be associated with the correct emotional context, such as anger, pain, fighting, or calm behavior. Building and validating that kind of labeled dataset takes significant time and care.

Overall, the hardest part was balancing everything at once: unfamiliar hardware, constrained storage, file transfer, cat-vs-non-cat filtering, and useful real-time alerts. Turning all of those moving parts into one working system was one of the most difficult and rewarding parts of building Mood Paw.

Accomplishments that we're proud of

We are especially proud that we were able to build a working system even though we started with little to no hardware background. This project pushed us far outside our comfort zone, and one of our biggest accomplishments was learning quickly enough to make the core system function the way we originally envisioned.

We are proud that Mood Paw became a real end-to-end prototype rather than just an idea. Despite challenges with hardware setup, Wi-Fi connectivity, storage limitations, and audio processing, we were still able to make the system work across the key functions we cared about: capturing sound, transferring recordings, analyzing cat-related audio, and supporting meaningful alert logic.

More than anything, we are proud that we proved to ourselves that we could enter an unfamiliar area, learn from scratch, and still build something functional, thoughtful, and impactful.

What we learned

We learned an enormous amount from the hardware side of this project, especially because we started with very limited background in embedded systems. Through building Mood Paw, we had to learn Arduino development, board setup, microphone integration, audio capture, SD card storage, Wi-Fi communication, and how to connect all of those components into one functioning pipeline.

We also learned that real-world hardware systems force you to think differently from pure software projects. Constraints such as memory, storage, device-specific networking behavior, and reliability under live conditions can strongly influence the overall design. Working with the tiny XIAO device made that especially clear.

Finally, we learned that building an AI-powered pet monitoring system is not just a modeling problem. It is also a systems problem and a data problem. Reliable emotion understanding depends not only on inference, but also on collecting the right sounds, filtering irrelevant signals, and associating recordings with meaningful emotional context.

What's next for Mood Paw

The next step for Mood Paw is live pet-centered training, fine-tuning, and real-world testing. We want to collect more realistic cat vocalizations, improve the model with better domain-specific data, and evaluate the system in everyday home environments rather than only in prototype conditions.

A key part of that process will be testing Mood Paw on our own cats. By observing the real situations behind the recordings, we can better connect sounds with emotional context and use that feedback to improve both the cat-vs-non-cat detector and the cat emotion classifier. This kind of live iteration is important if we want the system to become truly reliable and useful.

Looking further ahead, we want to reduce false alarms, improve emotional accuracy, and evolve Mood Paw into a richer pet intelligence platform, one that not only detects urgent situations but also helps owners understand long-term behavioral and emotional patterns in their pets.

Built With

  • arduino
  • c++
  • embedded-audio-recording
  • esp32/xiao-microcontroller-development
  • flask/fastapi-style-backend-services
  • python
  • sd-card-storage
  • telegram
  • wav-audio-processing
  • wi-fi-networking