Inspiration

I've always wondered what my pets are thinking. That curious head tilt, the sudden zoomies, the 3 AM meowing — there has to be something going on in there. When I saw the BearHacks 2026 theme "Break the Norm," it clicked: what if AI could break the communication barrier between humans and pets? Not literally translate animal sounds (we're not there yet!), but use AI to imagine what your pet might say based on who they actually are — their breed, their personality, their quirks.

That's how PetSpeak was born.

What it does

PetSpeak turns a single photo of your pet into a full interactive experience:

  1. Scan — Upload a photo and Google Cloud Vision identifies the breed
  2. Profile — Google Gemini generates a unique personality based on breed traits
  3. Voice — ElevenLabs gives your pet a one-of-a-kind AI voice with expressive delivery
  4. Chat — Have a real conversation with your pet, and they respond in character
  5. Care — Get personalized care tips, daily tasks, and fun facts
  6. Learn — Browse a knowledge base of 32 pet care articles across 6 categories
  7. Quiz — Test your pet knowledge with a timed trivia game

How I built it

Solo developer, one weekend, three AI APIs.

The tech stack:

  • Next.js 16 with App Router and TypeScript for a modern, fast web experience
  • Google Cloud Vision API for breed identification from photos
  • Google Gemini AI (gemma-3-27b-it) for personality generation and chat conversations
  • ElevenLabs Text-to-Speech for unique, expressive pet voices
  • Tailwind CSS + shadcn/ui for clean, responsive UI
  • Zustand with localStorage persistence so your pets survive page refreshes
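The persistence layer boils down to write-through state: every update is serialized to storage, and the store rehydrates from it on load. Here's a minimal hand-rolled stand-in for Zustand's `persist` middleware (which is what the app actually uses), with an injectable `StorageLike` backend standing in for `localStorage` so the sketch runs anywhere:

```typescript
// Sketch of refresh-proof state: writes go through to a Storage-like
// backend (localStorage in the browser), and a new store instance
// rehydrates from whatever was saved. Names here are illustrative.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function createPersistedStore<T extends object>(
  key: string,
  initial: T,
  storage: StorageLike
) {
  const saved = storage.getItem(key);
  let state: T = saved ? (JSON.parse(saved) as T) : initial;
  return {
    get: () => state,
    set(partial: Partial<T>) {
      state = { ...state, ...partial };
      storage.setItem(key, JSON.stringify(state)); // write-through on change
    },
  };
}
```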

I built it incrementally — starting with project scaffolding, then the API integration layer, then each feature one at a time. The architecture prioritizes reliability: every external API call has error handling, fallback strategies, and caching.

Challenges I ran into

1. ElevenLabs rate limiting and account blocking. Free tier abuse detection flagged my account after extensive testing. I solved this by implementing a serial request queue to prevent concurrent API calls, adding server-side voice caching (an MD5-hash-based .mp3 file cache), and building a browser-native SpeechSynthesis fallback so the app never fails silently: if ElevenLabs goes down, the browser takes over.
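The serial queue is the key piece: each TTS request runs only after the previous one settles, so the API never sees concurrent calls. A minimal sketch of the idea (the `speak` and `ttsCall` names are illustrative placeholders, not the actual PetSpeak code):

```typescript
// A serial request queue: tasks run one at a time, in enqueue order,
// even if some of them fail.
type Task<T> = () => Promise<T>;

class SerialQueue {
  private tail: Promise<unknown> = Promise.resolve();

  // Enqueue a task; it starts only after all prior tasks have settled.
  enqueue<T>(task: Task<T>): Promise<T> {
    const run = this.tail.then(task, task); // run whether prior task failed or not
    this.tail = run.catch(() => undefined); // keep the chain alive on errors
    return run;
  }
}

const ttsQueue = new SerialQueue();

// Wrap every TTS call so requests go out serially; on failure, fall back
// instead of failing silently (in the browser this would hand off to
// window.speechSynthesis).
async function speak(
  text: string,
  ttsCall: (t: string) => Promise<string>
): Promise<string> {
  try {
    return await ttsQueue.enqueue(() => ttsCall(text));
  } catch {
    return "fallback";
  }
}
```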

2. Gemini model compatibility. The systemInstruction parameter doesn't work with the gemma-3-27b-it model, so follow-up chat messages would fail silently. I fixed this by embedding the system prompt directly into the prompt text as a [System Instructions] block.
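The workaround is plain string assembly: fold the system prompt into the message text itself. A sketch (`personaPrompt` is an illustrative placeholder, not the app's actual prompt):

```typescript
// Since systemInstruction isn't honored by the model, embed the system
// prompt in the message body as a labeled block the model can follow.
function buildPrompt(systemPrompt: string, userMessage: string): string {
  return [
    "[System Instructions]",
    systemPrompt,
    "[/System Instructions]",
    "",
    userMessage,
  ].join("\n");
}

// Example: a generated pet personality wrapped around a chat message.
const personaPrompt =
  "You are Biscuit, a dramatic tabby cat. Stay in character. No markdown.";
const prompt = buildPrompt(personaPrompt, "What did you do today?");
```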

3. Voice uniqueness and expressiveness. I wanted each pet to sound different, so I built a deterministic voice assignment system using string hashing: the same pet always gets the same voice. Then I added per-pet voice settings (stability, style, similarity) that vary within expressive ranges, so each pet has a distinct personality in their delivery.
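Deterministic assignment just means hashing a stable identifier and mapping the hash onto a voice and a settings range. A sketch, assuming an illustrative voice pool and setting ranges (not the app's actual values):

```typescript
// Illustrative voice pool; real ElevenLabs voice IDs would go here.
const VOICE_POOL = ["voice-a", "voice-b", "voice-c", "voice-d"];

// FNV-1a string hash: deterministic across runs, so the same pet name
// always yields the same hash.
function hashString(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force unsigned
}

// Map the hash to a voice plus per-pet settings that vary within
// expressive ranges (ranges chosen here for illustration).
function voiceFor(petId: string) {
  const h = hashString(petId);
  return {
    voiceId: VOICE_POOL[h % VOICE_POOL.length],
    stability: 0.3 + ((h >>> 8) % 40) / 100, // 0.30-0.69
    style: 0.2 + ((h >>> 16) % 50) / 100,    // 0.20-0.69
  };
}
```

Using different bits of the same hash for each setting keeps everything reproducible from a single identifier while still spreading pets across the expressive range.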

4. AI output formatting. Gemini loves markdown asterisks (*like this*), which look terrible in a UI. I built a stripMarkdown() utility and added explicit "no markdown" instructions to all AI prompts.
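A minimal sketch of such a stripMarkdown() utility, covering only the bold/italic cases described (a production version would also handle headings, links, and code spans):

```typescript
// Strip the emphasis markers LLMs tend to emit, keeping the inner text.
function stripMarkdown(text: string): string {
  return text
    .replace(/\*\*([^*]+)\*\*/g, "$1") // **bold** -> bold
    .replace(/\*([^*]+)\*/g, "$1")     // *italic* -> italic
    .replace(/__([^_]+)__/g, "$1")     // __bold__ -> bold
    .replace(/_([^_]+)_/g, "$1");      // _italic_ -> italic
}
```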

5. Credit management. With three paid APIs, every unnecessary call costs money. I implemented click-to-play voice (instead of auto-play), server-side file caching, and pre-generated voice cache files for the demo pets.
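The file cache hinges on a deterministic key: hash the text plus the voice, and identical requests resolve to the same .mp3 on disk instead of a fresh API call. A sketch (the `voice-cache` directory name is an assumption):

```typescript
import { createHash } from "node:crypto";
import { join } from "node:path";

// Derive a stable cache filename from the spoken text and voice ID, so
// repeat requests hit the disk cache instead of the TTS API.
function voiceCachePath(text: string, voiceId: string): string {
  const key = createHash("md5").update(`${voiceId}:${text}`).digest("hex");
  return join("voice-cache", `${key}.mp3`);
}
```

Pre-generating these files for the demo pets means the demo never spends credits at all.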

Accomplishments that I'm proud of

  • Three AI APIs working together seamlessly in a single user flow
  • Voice reliability engineering — serial queue, retry logic, browser fallback, file caching
  • The "wow" moment when you scan a pet photo and hear it introduce itself for the first time
  • Built entirely solo in one weekend with a clean 12-commit git history

What I learned

  • How to integrate multiple AI APIs (Vision, LLM, TTS) into a cohesive product
  • The importance of fallback strategies when depending on external services
  • Rate limiting patterns and voice caching for cost optimization
  • That AI-generated pet personalities are surprisingly delightful and addictive to chat with

What's next for PetSpeak

  • Multi-pet household support (pets that know about each other)
  • Real-time camera scanning (no upload needed)
  • Pet mood detection from facial expressions
  • Community features — share your pet's funniest AI conversations
  • Mobile app with push notification reminders for care tasks

Built With

  • elevenlabs
  • google-cloud-vision-api
  • google-gemini-ai
  • next.js
  • node.js
  • react
  • shadcn/ui
  • tailwind-css
  • typescript
  • vercel
  • zustand