Inspiration

A cat's intelligence is roughly equivalent to that of a three- or four-year-old child. Parents learn about their kids' day through conversation: what they did, how they felt, whether anything unusual happened. But cats can't speak. If only my cat could tell me what it did all day. This idea inspired me to create MeowDiary, using AI technology to bridge the communication gap between humans and cats. We believe AI should not replace pets for companionship; instead, it should serve as an emotional connector between humans and their pets.

What it does

MeowDiary is an AI cat diary application powered by Gemini 3. It automatically converts surveillance footage or recorded videos into diaries written from the cat's first-person perspective, helping users truly understand their cat's daily life while providing behavioral guidance to create a positive communication loop.

Core features include:

  • Daily Diary Generation + Actionable Advice: Transforms video into personality-driven first-person cat narratives with actionable recommendations
  • Behavior Translation: Upload video clips anytime, anywhere to get a closed-loop response of behavior interpretation and interaction suggestions
  • Personality Profiling: Builds communication on a foundation of deep understanding through authoritative personality assessments (MBTI-style)
  • Long-term Tracking: Monitors behavioral trends over 7–30 days, flagging potential anomalies
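
The write-up does not say how Long-term Tracking decides that a trend is anomalous; multi-day correlation is handed to Gemini 3 Pro in the Trend Module. As one illustrative, swapped-in technique, a cheap deterministic pre-check could compare each day against the mean and standard deviation of the preceding 7 to 30 days. The `DailyMetric` type, the metric itself, and the 2-sigma threshold below are hypothetical assumptions, not MeowDiary's actual logic.

```typescript
// Hypothetical per-day summary produced by earlier modules (field names are illustrative).
interface DailyMetric {
  date: string;  // ISO date, e.g. "2025-01-08"
  value: number; // e.g. minutes of active play that day (assumed metric)
}

/**
 * Flags days whose value deviates from the mean of the preceding `windowDays`
 * days by more than `threshold` standard deviations (a plain z-score rule).
 */
function flagAnomalies(
  series: DailyMetric[],
  windowDays = 7,
  threshold = 2,
): string[] {
  if (windowDays < 7 || windowDays > 30) {
    throw new RangeError("windowDays must be between 7 and 30");
  }
  const flagged: string[] = [];
  for (let i = windowDays; i < series.length; i++) {
    // Baseline: the previous `windowDays` values, excluding today.
    const baseline = series.slice(i - windowDays, i).map((d) => d.value);
    const mean = baseline.reduce((a, b) => a + b, 0) / baseline.length;
    const variance =
      baseline.reduce((acc, v) => acc + (v - mean) ** 2, 0) / baseline.length;
    const std = Math.sqrt(variance);
    const deviation = Math.abs(series[i].value - mean);
    // With a perfectly flat baseline (std = 0), any change counts as unusual.
    const unusual = std === 0 ? deviation > 0 : deviation / std > threshold;
    if (unusual) flagged.push(series[i].date);
  }
  return flagged;
}
```

With a 7-day window, a day on which active minutes suddenly drop from around 60 to 20 is flagged. The days right after it are not flagged, because the drop itself widens their baseline. A fixed rule like this could serve as a cheap filter before a model call, not as a replacement for it.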

How we built it

We designed a Gemini 3 Pro + Flash dual-engine layered architecture with a 6-module AI pipeline:

  1. Extract Module (Pro) — Multimodal visual analysis of video frames to identify cat behaviors, outputting structured JSON
  2. Assess Module (Flash) — 7-dimension behavioral analysis based on professional veterinary scales
  3. Story Module (Flash) — First-person diary generation based on the cat's personality profile
  4. Advice Module (Flash) — Tiered behavioral recommendations (green/yellow/red)
  5. Personality Module (Pro) — Long-term behavioral pattern analysis to build cat personality profiles
  6. Trend Module (Pro) — Multi-day data correlation with deep reasoning (thinkingConfig: 2048 tokens)
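
The Pro/Flash split above amounts to a static routing table. The following is a minimal TypeScript sketch. It assumes the model IDs `gemini-3-pro-preview` and `gemini-3-flash-preview` (verify these against the current Gemini model list) and uses a hypothetical `routeModule` helper. The 2,048-token thinking budget is attached only to the Trend Module, as described above.

```typescript
// The six pipeline modules listed above.
type ModuleName = "extract" | "assess" | "story" | "advice" | "personality" | "trend";
type Tier = "pro" | "flash";

// Pro handles vision and deep reasoning; Flash handles text tasks.
const MODULE_TIER: Record<ModuleName, Tier> = {
  extract: "pro",
  assess: "flash",
  story: "flash",
  advice: "flash",
  personality: "pro",
  trend: "pro",
};

// Assumed model IDs; check the current model list before relying on them.
const MODEL_ID: Record<Tier, string> = {
  pro: "gemini-3-pro-preview",
  flash: "gemini-3-flash-preview",
};

// Thinking budget only where the write-up mentions one (Trend Module).
const THINKING_BUDGET: Partial<Record<ModuleName, number>> = { trend: 2048 };

interface ModelChoice {
  model: string;
  thinkingBudget?: number;
}

// Hypothetical helper: resolve which model (and budget) a module should call.
function routeModule(name: ModuleName): ModelChoice {
  const choice: ModelChoice = { model: MODEL_ID[MODULE_TIER[name]] };
  const budget = THINKING_BUDGET[name];
  if (budget !== undefined) choice.thinkingBudget = budget;
  return choice;
}
```

Keeping the table as data, rather than hard-coding a model in each module, means a module can be moved between tiers without touching its prompt code.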

The pipeline follows a Chain-of-Thought pattern: Perceive → Understand → Express. We leveraged Gemini 3's core capabilities including Thinking Mode (deep reasoning), Structured Output (JSON output via responseMimeType: application/json), Multimodal Input, and System Instructions (role-specific prompting).
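
Structured Output and System Instructions are both set through the request config. The sketch below builds the `config` object one might pass to `ai.models.generateContent({ model, contents, config })` in the `@google/genai` SDK. It does not import the SDK, so it runs standalone. The behavior-event schema, its field names, and the `buildJsonConfig` helper are hypothetical. The schema type strings (`"ARRAY"`, `"OBJECT"`, ...) correspond to the values of the SDK's `Type` enum.

```typescript
// Hypothetical schema for Extract Module output: a list of observed behaviors.
const behaviorSchema = {
  type: "ARRAY",
  items: {
    type: "OBJECT",
    properties: {
      timestamp: { type: "STRING" },  // e.g. "00:12:30"
      behavior: { type: "STRING" },   // e.g. "grooming"
      confidence: { type: "NUMBER" }, // 0..1
    },
    required: ["timestamp", "behavior", "confidence"],
  },
};

// Shape of the config fields used here (a subset of what the SDK accepts).
interface JsonModeConfig {
  systemInstruction: string;
  responseMimeType: "application/json";
  responseSchema: typeof behaviorSchema;
  thinkingConfig?: { thinkingBudget: number };
}

// Builds a JSON-mode config with a role-specific system instruction and an
// optional thinking budget (only set for modules that need deep reasoning).
function buildJsonConfig(systemInstruction: string, thinkingBudget?: number): JsonModeConfig {
  const config: JsonModeConfig = {
    systemInstruction,
    responseMimeType: "application/json",
    responseSchema: behaviorSchema,
  };
  if (thinkingBudget !== undefined) config.thinkingConfig = { thinkingBudget };
  return config;
}
```

In a real call, the JSON text comes back in `response.text` and still needs `JSON.parse`. The schema constrains the output format, but a robust pipeline would still validate the parsed result.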

For cost control, we implemented four key strategies:

  • Smart Preprocessing: Client-side video frame extraction (1 FPS sampling + quality filtering), reducing input volume by ~99%
  • Model-layer Routing: Pro handles only vision + deep reasoning, Flash handles all text tasks, cutting costs by 70–80%
  • Local Caching: IndexedDB + localStorage tiered caching with 24-hour TTL
  • Engineering Quality Assurance: TypeScript type safety + exponential backoff retry + JSON Schema constraints
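
The Smart Preprocessing bullet combines two steps: picking one timestamp per second, then discarding low-quality frames. The sketch below assumes frames arrive as RGBA pixel buffers with the same layout as a canvas `ImageData`. In the browser, these would come from `drawImage` of the `<video>` element followed by `getImageData`. Variance of the Laplacian is used as the quality score. That blur metric is a common heuristic swapped in for illustration; the write-up does not say which quality filter MeowDiary uses, and the threshold is an assumption.

```typescript
// Seconds at which to grab frames: one per 1/fps seconds, all strictly before the end.
function sampleTimestamps(durationSec: number, fps = 1): number[] {
  const step = 1 / fps;
  const count = Math.ceil(durationSec * fps);
  return Array.from({ length: count }, (_, i) => i * step);
}

// Same layout as canvas ImageData: 4 bytes (R, G, B, A) per pixel, row-major.
interface FrameData {
  width: number;
  height: number;
  data: Uint8ClampedArray;
}

// Sharpness score: variance of a 4-neighbour Laplacian over luminance.
// Blurry or flat frames score low; detailed frames score high.
function sharpness(frame: FrameData): number {
  const { width, height, data } = frame;
  const lum = new Float64Array(width * height);
  for (let i = 0; i < width * height; i++) {
    lum[i] = 0.299 * data[4 * i] + 0.587 * data[4 * i + 1] + 0.114 * data[4 * i + 2];
  }
  const responses: number[] = [];
  for (let y = 1; y < height - 1; y++) {
    for (let x = 1; x < width - 1; x++) {
      const c = y * width + x;
      responses.push(lum[c - 1] + lum[c + 1] + lum[c - width] + lum[c + width] - 4 * lum[c]);
    }
  }
  if (responses.length === 0) return 0;
  const mean = responses.reduce((a, b) => a + b, 0) / responses.length;
  return responses.reduce((acc, v) => acc + (v - mean) ** 2, 0) / responses.length;
}

// Keeps only frames whose sharpness meets the (assumed) threshold.
function keepSharpFrames<T extends { frame: FrameData }>(frames: T[], minSharpness: number): T[] {
  return frames.filter((f) => sharpness(f.frame) >= minSharpness);
}
```

Only the kept frames would then be encoded and sent to the Extract Module, which is where the reduction in upload volume comes from.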

Challenges we ran into

We faced two major challenges:

  1. High Latency — Video analysis through multiple AI modules is inherently slow. We addressed this with an asynchronous "Fire-and-Forget" architecture, cognitive transfer guidance, parallel video data distribution, and edge-side preprocessing acceleration.

  2. High Cost — API costs can make or break a product. We optimized model division of labor: Gemini 3 Pro handles only "Vision + Deep Reasoning," while Gemini 3 Flash handles only "Text Generation + Rapid Assessment." Combined with video frame preprocessing, layered routing, and local caching, we successfully reduced costs to a profitable range — an estimated operational cost of $7.65 per user per month.
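
The write-up names the latency techniques without detailing them. One plausible reading of "Fire-and-Forget" plus "parallel video data distribution" is sketched below. The upload handler starts the pipeline and returns a job ID at once. Meanwhile, sampled frames are split into chunks that are analyzed concurrently and merged back in order. The in-memory `jobs` registry, `fireAndForget`, and `analyzeInParallel` are hypothetical names, and chunking frames is an assumed interpretation of "parallel distribution".

```typescript
type JobStatus = "pending" | "done" | "failed";

interface Job<T> {
  id: string;
  status: JobStatus;
  result?: T;
  error?: string;
}

// Hypothetical in-memory job registry; a real app might persist it (e.g. to IndexedDB).
const jobs = new Map<string, Job<unknown>>();
let nextId = 0;

/**
 * Starts `work` in the background and returns immediately with a job ID.
 * `settled` resolves (never rejects) when the job finishes; the UI can
 * ignore it and read `jobs` later instead.
 */
function fireAndForget<T>(work: () => Promise<T>): { id: string; settled: Promise<void> } {
  const id = `job-${++nextId}`;
  const job: Job<T> = { id, status: "pending" };
  jobs.set(id, job);
  const settled = Promise.resolve()
    .then(work) // deferred, so even a synchronous throw becomes a rejection
    .then(
      (result) => { job.status = "done"; job.result = result; },
      (err: unknown) => { job.status = "failed"; job.error = String(err); },
    );
  return { id, settled };
}

// Splits frames into chunks, analyzes all chunks concurrently, and
// concatenates the per-chunk results in the original frame order.
async function analyzeInParallel<F, R>(
  frames: F[],
  chunkSize: number,
  analyze: (chunk: F[]) => Promise<R[]>,
): Promise<R[]> {
  if (chunkSize < 1) throw new RangeError("chunkSize must be at least 1");
  const chunks: F[][] = [];
  for (let i = 0; i < frames.length; i += chunkSize) {
    chunks.push(frames.slice(i, i + chunkSize));
  }
  const perChunk = await Promise.all(chunks.map((chunk) => analyze(chunk)));
  return ([] as R[]).concat(...perChunk);
}
```

`Promise.all` preserves input order, so results come back in timeline order even if later chunks finish first.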

Accomplishments that we're proud of

  • Built a complete 6-module AI pipeline where each module has an independent professional identity and dedicated prompt, achieving high-quality output through specialized division of labor
  • Reduced video input tokens by ~99% through intelligent client-side preprocessing (50MB raw video → ~500KB optimized frames)
  • Achieved 70–80% cost reduction compared to an all-Pro approach through smart model routing, while actually improving response speed
  • Based on our market analysis, the product sits in an undervalued market: it is profitable, faces no direct competition, and holds a first-mover advantage
  • Achieved zero API calls for repeated content viewing through a tiered caching strategy
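
Zero API calls for repeated viewing follows from checking a cache before calling the model. The following is a minimal sketch of one tier. It assumes a 24-hour TTL (as in the Local Caching bullet) and a store with the `getItem`/`setItem`/`removeItem` methods of the Web Storage API, so `localStorage` could be passed in directly in the browser. The IndexedDB tier is omitted, and `cachedCall`, `MemoryStore`, and the key format are hypothetical.

```typescript
// Subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const DAY_MS = 24 * 60 * 60 * 1000;

interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

// Returns the cached value while it is fresh; otherwise calls `fetcher`
// (e.g. a Gemini request) once and stores the result with a TTL.
async function cachedCall<T>(
  store: KeyValueStore,
  key: string,
  fetcher: () => Promise<T>,
  now: () => number = Date.now,
  ttlMs = DAY_MS,
): Promise<T> {
  const raw = store.getItem(key);
  if (raw !== null) {
    const entry = JSON.parse(raw) as CacheEntry<T>;
    if (entry.expiresAt > now()) return entry.value; // hit: no API call
    store.removeItem(key); // expired entry
  }
  const value = await fetcher();
  const entry: CacheEntry<T> = { value, expiresAt: now() + ttlMs };
  store.setItem(key, JSON.stringify(entry));
  return value;
}

// In-memory stand-in for localStorage, useful outside the browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null { return this.data.get(key) ?? null; }
  setItem(key: string, value: string): void { this.data.set(key, value); }
  removeItem(key: string): void { this.data.delete(key); }
}
```

Injecting the clock (`now`) keeps the expiry logic testable; in the browser, the default `Date.now` is used.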

What we learned

  • AI should not replace pets for companionship — it should serve as an emotional connector between humans and their pets
  • The biggest challenge in cross-species communication is understanding — and understanding is a long-term process that doesn't happen overnight
  • Model routing is crucial: choosing the right model for the right task is more impactful than upgrading to a more expensive model across the board
  • Thinking Mode (thinkingConfig) significantly improves complex reasoning quality — without a thinking budget, models easily miss subtle but important behavioral trends
  • Structured Output is essential for multi-module pipelines — a single parsing failure can break the entire chain
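
Because one malformed reply can break the chain, it helps to treat a parse or validation failure like any other transient error. The sketch below combines the exponential backoff retry and the JSON checks mentioned under Engineering Quality Assurance. Retrying on parse failures, the delay constants, and the `withRetry`/`parseJsonObject` helpers are assumptions made for illustration.

```typescript
// Retries `attempt` up to `maxAttempts` times, waiting baseDelayMs * 2^i
// between attempts; rethrows the last error if every attempt fails.
async function withRetry<T>(
  attempt: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await attempt();
    } catch (err) {
      lastError = err;
      if (i < maxAttempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}

// Parses model output and checks that every required key is present, so a
// malformed reply fails here instead of in the next module.
function parseJsonObject(text: string, requiredKeys: string[]): Record<string, unknown> {
  const data: unknown = JSON.parse(text); // throws SyntaxError on invalid JSON
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("expected a JSON object");
  }
  const obj = data as Record<string, unknown>;
  for (const key of requiredKeys) {
    if (!(key in obj)) throw new Error(`missing key: ${key}`);
  }
  return obj;
}
```

With the defaults, a call that fails twice waits 500 ms and then 1,000 ms before its third and final attempt.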

What's next for MeowDiary - AI Cat Diary

  • User Testing: Gauge user interest in this format — level of engagement, acceptance of privacy and pricing, whether the appealing aspects outweigh any concerns, and willingness to co-evolve the pet communication model
  • Cloud Camera Integration: Automatically pull videos from cloud-connected cameras for a truly seamless experience; add a "Configure Device" feature to fully automate diary generation
  • User System & Cloud Storage: Support user login, multi-device sync, and historical diary archiving
  • Community & Sharing: Diary sharing, cat-owner social features, and improved bullet comments backed by video storage, building an interactive community where clicking any bullet comment plays the original video from that user's query
  • More Pet Support: Expand beyond cats to dogs and other pets
  • Monetization: Introduce premium subscription features
  • Memory Mode: Make diaries increasingly intelligent over time with automatic contextual recall
  • Prompt Optimization: Continuously refine AI prompts for better output quality
  • Diary Skins: Offer diverse diary theme customization options
  • Waiting for the Right Moment: When multimodal AI costs drop further and generation speeds increase, that will be the true dawn of MeowDiary's era

Built With

  • Google Gemini API (Gemini 3 Pro & Flash): frame-by-frame behavior extraction, complex reasoning (used with thinkingConfig), creative storytelling (persona adoption, via gemini-3-flash-preview), fast chat interaction, and personality profiling
  • @google/genai SDK: official Node.js/Web SDK integration
  • React 19: functional components and hooks (useState, useEffect, useRef)
  • TypeScript: type-safe development and robust data interfaces
  • Tailwind CSS: responsive design and glassmorphism effects
  • Recharts
  • IndexedDB (idb)
  • HTML5 Canvas and video APIs: client-side video frame extraction

Updates


Hi, I’m Guan Chunlin, and my product is called "Meow Diary".

A cat's intelligence is roughly equivalent to that of a three- or four-year-old child. When parents pick up their kids from school, they talk with them to learn about their day: their activities, their mood, and any unusual signs. But cats can't speak. If only my cat could tell me what it did all day. This inspired me to create Meow Diary, utilizing AI technology to bridge the communication gap between humans and cats.

Meow Diary is an AI product powered by Gemini 3. It automatically converts surveillance footage or recorded videos into a diary written from the cat's first-person perspective. It helps users correctly understand their cat's daily life while providing guidance on behavioral expression, creating a positive communication loop.

Beyond the daily diary, users can upload video clips anytime, anywhere to address specific issues, and every video gets a closed-loop response: "Behavior Translation + Actionable Advice." The biggest challenge in cross-species communication is understanding, so we use authoritative personality tests to build communication on a foundation of deep understanding. Understanding is a long-term process; it doesn't happen overnight. Sudden behavioral changes can signal something important, which is why long-term tracking is needed.

We faced two major challenges: high latency and high cost.

When designing the technical architecture, we addressed these hurdles head-on. To solve latency, we designed an asynchronous "Fire-and-Forget" architecture, cognitive transfer guidance, parallel video data distribution, and edge-side preprocessing acceleration.

Cost determines a product's survival. Therefore, we optimized the division of labor between the models:

Gemini 3 Pro handles only "Vision + Deep Reasoning."

Gemini 3 Flash handles only "Text Generation + Rapid Assessment."

We also optimized the pipeline design—incorporating video frame extraction preprocessing, layered routing, and local caching—successfully reducing costs to a profitable range. After multi-dimensional calculations, the estimated operational cost is $7.65 per month per user.

Based on our analysis of the market, profit margins, positioning, and innovation, we can see that this product sits in an undervalued market. It is profitable, faces no direct competition, and possesses a first-mover advantage.

There are many AI companionship products on the market where AI simulates humans or pets for voice chat. However, I believe AI should not replace pets for companionship. Instead, it should serve as an emotional connector between humans and their pets.
