Inspiration

I've always wondered what my pets are thinking. That curious head tilt, the sudden zoomies, the 3 AM meowing — there has to be something going on in there. When I saw the BearHacks 2026 theme "Break the Norm," it clicked: what if AI could break the communication barrier between humans and pets? Not literally translate animal sounds (we're not there yet!), but use AI to imagine what your pet might say based on who they actually are — their breed, their personality, their quirks.

That's how PetSpeak was born.

What it does

PetSpeak turns a single photo of your pet into a full interactive experience:

  1. Scan — Upload a photo and Google Cloud Vision identifies the breed
  2. Profile — Google Gemini generates a unique personality based on breed traits
  3. Voice — ElevenLabs gives your pet a one-of-a-kind AI voice with expressive delivery
  4. Chat — Have a real conversation with your pet, and they respond in character
  5. Care — Get personalized care tips, daily tasks, and fun facts
  6. Learn — Browse a knowledge base of 32 pet care articles across 6 categories
  7. Quiz — Test your pet knowledge with a timed trivia game

How I built it

Solo developer, one weekend, three AI APIs.

The tech stack:

  • Next.js 16 with App Router and TypeScript for a modern, fast web experience
  • Google Cloud Vision API for breed identification from photos
  • Google Gemini AI (gemma-3-27b-it) for personality generation and chat conversations
  • ElevenLabs Text-to-Speech for unique, expressive pet voices
  • Tailwind CSS + shadcn/ui for clean, responsive UI
  • Zustand with localStorage persistence so your pets survive page refreshes
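The persistence layer boils down to write-through state: every update is serialized to storage, and the store rehydrates from it on load. Here's a minimal hand-rolled stand-in for Zustand's `persist` middleware (which is what the app actually uses), with an injectable `StorageLike` backend standing in for `localStorage` so the sketch runs anywhere:

```typescript
// Sketch of refresh-proof state: writes go through to a Storage-like
// backend (localStorage in the browser), and a new store instance
// rehydrates from whatever was saved. Names here are illustrative.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function createPersistedStore<T extends object>(
  key: string,
  initial: T,
  storage: StorageLike
) {
  const saved = storage.getItem(key);
  let state: T = saved ? (JSON.parse(saved) as T) : initial;
  return {
    get: () => state,
    set(partial: Partial<T>) {
      state = { ...state, ...partial };
      storage.setItem(key, JSON.stringify(state)); // write-through on change
    },
  };
}
```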

I built it incrementally — starting with project scaffolding, then the API integration layer, then each feature one at a time. The architecture prioritizes reliability: every external API call has error handling, fallback strategies, and caching.

Challenges I ran into

1. ElevenLabs rate limiting and account blocking. Free tier abuse detection flagged my account after extensive testing. I solved this by implementing a serial request queue to prevent concurrent API calls, adding server-side voice caching (an MD5-hash-based .mp3 file cache), and building a browser-native SpeechSynthesis fallback so the app never fails silently: if ElevenLabs goes down, the browser takes over.
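The serial queue is the key piece: each TTS request runs only after the previous one settles, so the API never sees concurrent calls. A minimal sketch of the idea (the `speak` and `ttsCall` names are illustrative placeholders, not the actual PetSpeak code):

```typescript
// A serial request queue: tasks run one at a time, in enqueue order,
// even if some of them fail.
type Task<T> = () => Promise<T>;

class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Enqueue a task; it starts only after all prior tasks have settled.
  enqueue<T>(task: Task<T>): Promise<T> {
    const run = this.tail.then(task, task); // run whether prior task failed or not
    this.tail = run.catch(() => undefined); // keep the chain alive on errors
    return run;
  }
}

const ttsQueue = new SerialQueue();

// Wrap every TTS call so requests go out serially; on failure, fall back
// instead of failing silently (in the browser this would hand off to
// window.speechSynthesis).
async function speak(
  text: string,
  ttsCall: (t: string) => Promise<string>
): Promise<string> {
  try {
    return await ttsQueue.enqueue(() => ttsCall(text));
  } catch {
    return "fallback";
  }
}
```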

2. Gemini model compatibility. The systemInstruction parameter doesn't work with the gemma-3-27b-it model, so follow-up chat messages would fail silently. I fixed this by embedding the system prompt directly into the prompt text as a [System Instructions] block.
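The workaround is plain string assembly: fold the system prompt into the message text itself. A sketch (`personaPrompt` is an illustrative placeholder, not the app's actual prompt):

```typescript
// Since systemInstruction isn't honored by the model, embed the system
// prompt in the message body as a labeled block the model can follow.
function buildPrompt(systemPrompt: string, userMessage: string): string {
  return [
    "[System Instructions]",
    systemPrompt,
    "[/System Instructions]",
    "",
    userMessage,
  ].join("\n");
}

// Example: a generated pet personality wrapped around a chat message.
const personaPrompt =
  "You are Biscuit, a dramatic tabby cat. Stay in character. No markdown.";
const prompt = buildPrompt(personaPrompt, "What did you do today?");
```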

3. Voice uniqueness and expressiveness. I wanted each pet to sound different, so I built a deterministic voice assignment system using string hashing: the same pet always gets the same voice. Then I added per-pet voice settings (stability, style, similarity) that vary within expressive ranges, so each pet has a distinct personality in their delivery.
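Deterministic assignment just means hashing a stable identifier and mapping the hash onto a voice and a settings range. A sketch, assuming an illustrative voice pool and setting ranges (not the app's actual values):

```typescript
// Illustrative voice pool; real ElevenLabs voice IDs would go here.
const VOICE_POOL = ["voice-a", "voice-b", "voice-c", "voice-d"];

// FNV-1a string hash: deterministic across runs, so the same pet name
// always yields the same hash.
function hashString(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned
}

// Map the hash to a voice plus per-pet settings that vary within
// expressive ranges (ranges chosen here for illustration).
function voiceFor(petId: string) {
  const h = hashString(petId);
  return {
    voiceId: VOICE_POOL[h % VOICE_POOL.length],
    stability: 0.3 + ((h >>> 8) % 40) / 100, // 0.30-0.69
    style: 0.2 + ((h >>> 16) % 50) / 100,    // 0.20-0.69
  };
}
```

Using different bits of the same hash for each setting keeps everything reproducible from a single identifier while still spreading pets across the expressive range.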

4. AI output formatting. Gemini loves markdown asterisks (*like this*), which look terrible in a UI. I built a stripMarkdown() utility and added explicit "no markdown" instructions to all AI prompts.
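A minimal sketch of such a stripMarkdown() utility, covering only the bold/italic cases described (a production version would also handle headings, links, and code spans):

```typescript
// Strip the emphasis markers LLMs tend to emit, keeping the inner text.
function stripMarkdown(text: string): string {
  return text
    .replace(/\*\*([^*]+)\*\*/g, "$1") // **bold** -> bold
    .replace(/\*([^*]+)\*/g, "$1")     // *italic* -> italic
    .replace(/__([^_]+)__/g, "$1")     // __bold__ -> bold
    .replace(/_([^_]+)_/g, "$1");      // _italic_ -> italic
}
```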

5. Credit management. With three paid APIs, every unnecessary call costs money. I implemented click-to-play voice (instead of auto-play), server-side file caching, and pre-generated voice cache files for the demo pets.
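The file cache hinges on a deterministic key: hash the text plus the voice, and identical requests resolve to the same .mp3 on disk instead of a fresh API call. A sketch (the `voice-cache` directory name is an assumption):

```typescript
import { createHash } from "node:crypto";
import { join } from "node:path";

// Derive a stable cache filename from the spoken text and voice ID, so
// repeat requests hit the disk cache instead of the TTS API.
function voiceCachePath(text: string, voiceId: string): string {
  const key = createHash("md5").update(`${voiceId}:${text}`).digest("hex");
  return join("voice-cache", `${key}.mp3`);
}
```

Pre-generating these files for the demo pets means the demo never spends credits at all.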

Accomplishments that I'm proud of

  • Three AI APIs working together seamlessly in a single user flow
  • Voice reliability engineering — serial queue, retry logic, browser fallback, file caching
  • The "wow" moment when you scan a pet photo and hear it introduce itself for the first time
  • Built entirely solo in one weekend with a clean 12-commit git history

What I learned

  • How to integrate multiple AI APIs (Vision, LLM, TTS) into a cohesive product
  • The importance of fallback strategies when depending on external services
  • Rate limiting patterns and voice caching for cost optimization
  • That AI-generated pet personalities are surprisingly delightful and addictive to chat with

What's next for PetSpeak

  • Multi-pet household support (pets that know about each other)
  • Real-time camera scanning (no upload needed)
  • Pet mood detection from facial expressions
  • Community features — share your pet's funniest AI conversations
  • Mobile app with push notification reminders for care tasks

Built With

  • elevenlabs
  • google-cloud-vision-api
  • google-gemini-ai
  • next.js
  • node.js
  • react
  • shadcn/ui
  • tailwind-css
  • typescript
  • vercel
  • zustand