MeowDiary - AI Cat Diary

Inspiration

A cat's intelligence is often compared to that of a three- or four-year-old child. Parents learn about their kids' day through conversation: what they did, how they felt, whether anything unusual happened. But cats can't speak. If only my cat could tell me what it did all day. That thought inspired MeowDiary, which uses AI to bridge the communication gap between humans and cats. We believe AI should not replace pets as companions; it should serve as an emotional connector between people and their pets.

What it does

MeowDiary is an AI cat diary application powered by Gemini 3. It automatically converts surveillance footage or recorded videos into diaries written from the cat's first-person perspective, helping users truly understand their cat's daily life while providing behavioral guidance to create a positive communication loop.

Core features include:

Daily Diary Generation + Actionable Advice: Transforms video into personality-driven first-person cat narratives with actionable recommendations
Behavior Translation: Upload video clips anytime, anywhere to get a closed-loop response of behavior interpretation and interaction suggestions
Personality Profiling: Builds communication on a foundation of deep understanding through authoritative personality assessments (MBTI-style)
Long-term Tracking: Monitors behavioral trends over 7–30 days, flagging potential anomalies

How we built it

We designed a Gemini 3 Pro + Flash dual-engine layered architecture with a 6-module AI pipeline:

Extract Module (Pro) — Multimodal visual analysis of video frames to identify cat behaviors, outputting structured JSON
Assess Module (Flash) — 7-dimension behavioral analysis based on professional veterinary scales
Story Module (Flash) — First-person diary generation based on the cat's personality profile
Advice Module (Flash) — Tiered behavioral recommendations (green/yellow/red)
Personality Module (Pro) — Long-term behavioral pattern analysis to build cat personality profiles
Trend Module (Pro) — Multi-day data correlation with deep reasoning (thinkingConfig: 2048 tokens)
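
As a rough illustration of the module list above, the Pro/Flash split can be written as a small routing table. This is a sketch, not the project's actual code: the `gemini-3-pro-preview` identifier, the `modelFor` helper, and the ordering in `DAILY_CHAIN` are assumptions made for the example.

```typescript
type ModelTier = "pro" | "flash";
type ModuleName =
  | "extract" | "assess" | "story" | "advice" | "personality" | "trend";

// Tier per module, copied from the module list above.
const MODULE_TIER: Record<ModuleName, ModelTier> = {
  extract: "pro",
  assess: "flash",
  story: "flash",
  advice: "flash",
  personality: "pro",
  trend: "pro",
};

// Model identifiers are placeholders; the Pro name in particular is assumed.
const MODEL_ID: Record<ModelTier, string> = {
  pro: "gemini-3-pro-preview",
  flash: "gemini-3-flash-preview",
};

// Resolve which model a given module should call.
function modelFor(module: ModuleName): string {
  return MODEL_ID[MODULE_TIER[module]];
}

// Assumed order for one diary run: Perceive (extract), Understand (assess),
// Express (story, advice).
const DAILY_CHAIN: ModuleName[] = ["extract", "assess", "story", "advice"];

for (const step of DAILY_CHAIN) {
  console.log(`${step} -> ${modelFor(step)}`);
}
```

Keeping the mapping in one table means a module can be moved between tiers without touching its prompt or call site.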

The pipeline follows a Chain-of-Thought pattern: Perceive → Understand → Express. We leveraged Gemini 3's core capabilities including Thinking Mode (deep reasoning), Structured Output (JSON output via responseMimeType: application/json), Multimodal Input, and System Instructions (role-specific prompting).
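
The capabilities listed here map onto per-request configuration. The sketch below only builds the config object and does not call the API; the field names follow what the @google/genai SDK's `generateContent` config is understood to accept (`systemInstruction`, `responseMimeType`, `thinkingConfig.thinkingBudget`), and the `buildConfig` helper and the role text are hypothetical.

```typescript
// Plain object shape used for illustration; not imported from the SDK.
interface ModuleRequestConfig {
  systemInstruction: string;
  responseMimeType: "application/json";
  thinkingConfig?: { thinkingBudget: number };
}

// Build a config for one module: role prompt, JSON output, optional thinking budget.
function buildConfig(role: string, thinkingBudget?: number): ModuleRequestConfig {
  const config: ModuleRequestConfig = {
    systemInstruction: role,
    responseMimeType: "application/json",
  };
  if (thinkingBudget !== undefined) {
    config.thinkingConfig = { thinkingBudget };
  }
  return config;
}

// Trend Module: deep reasoning with the 2048-token budget mentioned above.
const trendConfig = buildConfig(
  "You are a feline behavior analyst reviewing multi-day logs.", // hypothetical role text
  2048,
);

// Story Module: no thinking budget requested.
const storyConfig = buildConfig("You are the cat, writing today's diary.");

console.log(JSON.stringify(trendConfig, null, 2));
console.log(JSON.stringify(storyConfig, null, 2));
```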

For cost control, we implemented four key strategies:

Smart Preprocessing: Client-side video frame extraction (1 FPS sampling + quality filtering), reducing input volume by ~99%
Model-layer Routing: Pro handles only vision + deep reasoning, Flash handles all text tasks, cutting costs by 70–80%
Local Caching: IndexedDB + localStorage tiered caching with 24-hour TTL
Engineering Quality Assurance: TypeScript type safety + exponential backoff retry + JSON Schema constraints
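
The preprocessing step can be sketched as two pure functions: pick timestamps at 1 FPS, then drop low-quality frames. The actual frame grab (HTML5 video plus canvas in the browser) is left out so the example runs anywhere, and the `sharpness` score and its threshold are hypothetical stand-ins for whatever quality metric the app uses.

```typescript
interface Frame {
  timeSec: number;
  sharpness: number; // hypothetical quality score in [0, 1]
}

// Timestamps to sample for a clip of the given length, one every 1/fps seconds.
function sampleTimestamps(durationSec: number, fps: number = 1): number[] {
  const count = Math.ceil(durationSec * fps);
  const out: number[] = [];
  for (let i = 0; i < count; i++) {
    out.push(i / fps);
  }
  return out;
}

// Keep only frames at or above the quality threshold.
function filterByQuality(frames: Frame[], minSharpness: number): Frame[] {
  return frames.filter((f) => f.sharpness >= minSharpness);
}

// A 10-second clip yields 10 candidate frames at 1 FPS.
const stamps = sampleTimestamps(10);

// Fake scores for illustration: every third frame is blurry.
const candidates: Frame[] = stamps.map((t) => ({
  timeSec: t,
  sharpness: t % 3 === 0 ? 0.2 : 0.8,
}));
const kept = filterByQuality(candidates, 0.5);
console.log(`${kept.length} of ${candidates.length} frames kept`);
```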

Challenges we ran into

We faced two major challenges:

High Latency — Video analysis through multiple AI modules is inherently slow. We addressed this with an asynchronous "Fire-and-Forget" architecture, cognitive transfer guidance, parallel video data distribution, and edge-side preprocessing acceleration.
High Cost — API costs can make or break a product. We optimized model division of labor: Gemini 3 Pro handles only "Vision + Deep Reasoning," while Gemini 3 Flash handles only "Text Generation + Rapid Assessment." Combined with video frame preprocessing, layered routing, and local caching, we successfully reduced costs to a profitable range — an estimated operational cost of $7.65 per user per month.
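
One way to picture the "Fire-and-Forget" idea: the caller submits a job, gets an id back immediately, and checks status later instead of waiting on the pipeline. The `JobQueue` class below is a hypothetical in-memory sketch of that pattern, not the project's architecture.

```typescript
type JobStatus = "pending" | "done" | "failed";

class JobQueue<T> {
  private nextId = 1;
  private statuses = new Map<number, JobStatus>();
  private results = new Map<number, T>();

  // Start the task in the background and return its id without awaiting it.
  submit(task: () => Promise<T>): number {
    const id = this.nextId++;
    this.statuses.set(id, "pending");
    void Promise.resolve()
      .then(task)
      .then(
        (result) => {
          this.results.set(id, result);
          this.statuses.set(id, "done");
        },
        () => {
          this.statuses.set(id, "failed");
        },
      );
    return id;
  }

  getStatus(id: number): JobStatus | undefined {
    return this.statuses.get(id);
  }

  getResult(id: number): T | undefined {
    return this.results.get(id);
  }
}

const queue = new JobQueue<string>();
// The async function stands in for a slow multi-module analysis.
const jobId = queue.submit(async () => "Dear diary, I napped on the keyboard.");
console.log(queue.getStatus(jobId)); // still pending right after submit
```

The UI can then poll `getStatus` or show a placeholder while the diary is generated.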

Accomplishments that we're proud of

Built a complete 6-module AI pipeline where each module has an independent professional identity and dedicated prompt, achieving high-quality output through specialized division of labor
Reduced video input tokens by ~99% through intelligent client-side preprocessing (50MB raw video → ~500KB optimized frames)
Achieved 70–80% cost reduction compared to an all-Pro approach through smart model routing, while actually improving response speed
Product sits in an undervalued market — profitable, no direct competition, and holds a first-mover advantage
Achieved zero API calls for repeated content viewing through a tiered caching strategy
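
The caching behind the last point can be illustrated with a minimal TTL cache. This sketch covers only the 24-hour expiry; a `Map` stands in for the IndexedDB and localStorage tiers, and the `TtlCache` class and key format are assumptions for the example.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<V> {
  private store = new Map<string, CacheEntry<V>>();
  private ttlMs: number;

  constructor(ttlMs: number = DAY_MS) {
    this.ttlMs = ttlMs;
  }

  // `now` is injectable so expiry can be exercised without waiting.
  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.store.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

const diaryCache = new TtlCache<string>();
diaryCache.set("diary:2026-02-09", "Slept in the sunbeam again.", 0);
console.log(diaryCache.get("diary:2026-02-09", DAY_MS - 1)); // hit: no API call needed
console.log(diaryCache.get("diary:2026-02-09", DAY_MS)); // expired: regenerate
```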

What we learned

AI should not replace pets for companionship — it should serve as an emotional connector between humans and their pets
The biggest challenge in cross-species communication is understanding — and understanding is a long-term process that doesn't happen overnight
Model routing is crucial: choosing the right model for the right task is more impactful than upgrading to a more expensive model across the board
Thinking Mode (thinkingConfig) significantly improves complex reasoning quality — without a thinking budget, models easily miss subtle but important behavioral trends
Structured Output is essential for multi-module pipelines — a single parsing failure can break the entire chain
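
To make the last point concrete, here is a sketch of a parse guard that turns a malformed model response into a retry instead of a crash, plus the delay schedule for exponential backoff. The `BehaviorEvent` shape, the base delay, and the cap are hypothetical values chosen for illustration.

```typescript
// Hypothetical shape of one item in the Extract Module's JSON output.
interface BehaviorEvent {
  behavior: string;
  startSec: number;
}

// Returns the parsed events, or null if the text is not valid JSON
// or does not match the expected shape.
function parseEvents(raw: string): BehaviorEvent[] | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!Array.isArray(data)) return null;
  const valid = data.every((item) => {
    if (typeof item !== "object" || item === null) return false;
    const rec = item as Record<string, unknown>;
    return typeof rec.behavior === "string" && typeof rec.startSec === "number";
  });
  return valid ? (data as BehaviorEvent[]) : null;
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const good = parseEvents('[{"behavior":"sleeping","startSec":3}]');
const bad = parseEvents("Sure! Here is the JSON you asked for");
console.log(good, bad);
console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n)));
```

A caller would retry the module while `parseEvents` returns null, sleeping `backoffDelayMs(attempt)` between attempts, and stop the chain only after the retries are exhausted.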

What's next for MeowDiary - AI Cat Diary

User Testing: Gauge user interest in this format — level of engagement, acceptance of privacy and pricing, whether the appealing aspects outweigh any concerns, and willingness to co-evolve the pet communication model
Cloud Camera Integration: Automatically pull videos from cloud-connected cameras for a truly seamless experience; add a "Configure Device" feature to fully automate diary generation
User System & Cloud Storage: Support user login, multi-device sync, and historical diary archiving
Community & Sharing: Diary sharing, social features for cat owners, and improved bullet comments (danmaku-style overlays) backed by video storage, building an interactive community where clicking any bullet comment plays the original video from that user's query
More Pet Support: Expand beyond cats to dogs and other pets
Monetization: Introduce premium subscription features
Memory Mode: Make diaries increasingly intelligent over time with automatic contextual recall
Prompt Optimization: Continuously refine AI prompts for better output quality
Diary Skins: Offer diverse diary theme customization options
Waiting for the Right Moment: When multimodal AI costs drop further and generation speeds increase, that will be the true dawn of MeowDiary's era

Built With

Google Gemini API (Gemini 3 Pro & Flash)
Frame-by-frame behavior extraction and complex reasoning (used with thinkingConfig)
gemini-3-flash-preview: creative storytelling (persona adoption), fast chat interaction, and personality profiling
Google GenAI SDK (@google/genai): official Node.js/Web SDK integration
React 19: hooks (useState, useEffect, useRef) and functional components
TypeScript: type-safe development and robust data interfaces
Tailwind CSS: responsive design, glassmorphism effects
HTML5 video and Canvas API (client-side)
IndexedDB (idb)
Recharts

Updates

wudixiaolou guan posted an update — Feb 25, 2026 11:11 PM EST

Hi, I’m Guan Chunlin, and my product is called "Meow Diary".

A cat's intelligence is roughly equivalent to that of a three or four-year-old child. When parents pick up their kids from school, they have conversations to learn about their day—their activities, their mood, and any unusual signs. But cats can't speak. If only my cat could tell me what it did all day. This inspired me to create Meow Diary, utilizing AI technology to bridge the communication gap between humans and cats.

Meow Diary is an AI product powered by Gemini 3. It automatically converts surveillance footage or recorded videos into a diary written from the cat's first-person perspective. It helps users correctly understand their cat's daily life while providing guidance on behavioral expression, creating a positive communication loop.

Beyond the daily diary, users can upload video clips anytime, anywhere to address specific issues. For any video, you get a closed-loop response: "Behavior Translation + Actionable Advice." The biggest challenge in cross-species communication is understanding. We use authoritative personality tests to build communication upon a foundation of deep understanding. Understanding is a long-term process; it doesn't happen overnight. Sudden behavioral changes can signal something special, so we need long-term tracking.

We faced two major challenges:

High Latency

High Cost

When designing the technical architecture, we addressed these hurdles head-on. To solve latency, we designed an asynchronous "Fire-and-Forget" architecture, cognitive transfer guidance, parallel video data distribution, and edge-side preprocessing acceleration.

Cost determines a product's survival. Therefore, we optimized the division of labor between the models:

Gemini 3 Pro handles only "Vision + Deep Reasoning."

Gemini 3 Flash handles only "Text Generation + Rapid Assessment."

We also optimized the pipeline design—incorporating video frame extraction preprocessing, layered routing, and local caching—successfully reducing costs to a profitable range. After multi-dimensional calculations, the estimated operational cost is $7.65 per month per user.

Based on our analysis of the market, profit margins, positioning, and innovation, we can see that this product sits in an undervalued market. It is profitable, faces no direct competition, and possesses a first-mover advantage.

There are many AI companionship products on the market where AI simulates humans or pets for voice chat. However, I believe AI should not replace pets for companionship. Instead, it should serve as an emotional connector between humans and their pets.


wudixiaolou guan started this project — Feb 09, 2026 07:44 PM EST
