Inspiration
We were inspired by the challenge of making iMessage more interactive and emotionally engaging. Text messaging can feel cold and impersonal, especially when you're stressed, sad, or just need a pick-me-up. We wanted to create something that would:
- Bring emotion to messaging: Transform everyday conversations into interactive experiences
- Solve the "impossible" problem: iMessage integration on macOS is notoriously difficult - we wanted to prove it was possible
- Combine multiple AI services: Showcase how different AI technologies can work together seamlessly
- Create a delightful UX: Build something that's not just functional, but genuinely fun to use
The idea of a pet that transforms into different animals based on your emotions came from the observation that people often anthropomorphize technology and form emotional connections with digital assistants. We wanted to take that a step further with a pet that actually responds to your feelings.
What it does
Moji is an AI-powered digital pet that lives in your iMessage, transforming your messaging experience into an interactive, emotionally intelligent companion. At its core, Moji demonstrates hybrid intelligence - combining multiple specialized AI services (OpenAI DALL-E 3, ElevenLabs, Imgflip) with rule-based mood classification and intelligent orchestration to create a cohesive, emotionally responsive system that's greater than the sum of its parts.
Core Features
π Emotional Intelligence
- Analyzes your messages to detect mood (excited, stressed, sad, neutral)
- Responds with appropriate emotional support and encouragement
- Transforms into different animals based on your emotional state
π±πΆπ¦ Animal Transformation
- Cat (π±) - Appears when you're excited or neutral
- Dog (πΆ) - Appears when you're stressed or need support
- Bird (π¦) - Appears when you're sad and need comfort
- Pet literally morphs its emoji appearance in real-time
π€ Voice Responses (ElevenLabs)
- Generates natural animal voices using ElevenLabs text-to-speech
- Creates realistic cat, dog, and bird voices on-demand
- Voices match the pet's current animal form
- Plays automatically in the desktop UI
π¨ Meme Generation
- Generates contextual memes using 100+ Imgflip templates
- Command:
@moji meme: finals stress - Intelligently selects templates and generates captions
- Sends memes directly to iMessage
β¨ Custom Stickers (DALL-E 3)
- Creates unique, high-quality stickers from natural language
- Command:
@moji sticker: cute cat studying - Powered by OpenAI DALL-E 3
- Generates custom artwork on-demand
π Reaction Stickers
- Reacts to your messages with contextual memes
- Command:
@moji send sticker - Analyzes conversation context to select appropriate reactions
π€ Voice Notes
- Record audio in the browser
- Auto-converts WebM β MP3 using ffmpeg
- Sends voice notes directly to iMessage
πΈ Image Sharing
- Upload and share images from desktop UI
- Drag-and-drop interface
- Sends images directly to iMessage
π₯οΈ Beautiful Desktop UI
- Real-time pet state updates
- Animated pet sprite with speaking animations
- Speech bubbles with pet responses
- Debug panel for monitoring
- Voice playback with visual indicators
User Experience
- In iMessage: Send commands like
@moji meme: finals stressand receive memes/stickers - In Desktop UI: Interact with Moji, see pet transformations, hear voice responses
- Real-time updates: Pet state syncs every 3 seconds
- Emotional responses: Pet adapts to your mood and responds accordingly
Hybrid Intelligence Architecture
Moji showcases hybrid intelligence - the strategic combination of multiple specialized AI systems, each optimized for specific tasks, working together to create a more intelligent and capable system than any single AI could achieve alone.
The Hybrid Intelligence Approach
Traditional AI Approach: Single model trying to do everything β Limited capabilities, high latency, expensive, compromises on quality
Moji's Hybrid Intelligence Approach:
- OpenAI DALL-E 3 β Best-in-class image generation for custom stickers (specialized for creative artwork)
- ElevenLabs β Specialized text-to-speech for natural animal voices (optimized for voice synthesis)
- Imgflip API β Pre-trained meme templates for fast, contextual memes (purpose-built for memes)
- Rule-based mood classification β Fast, efficient emotion detection (lightweight, real-time)
- Intelligent orchestration β Seamless coordination between all systems (routing, parallel processing, fallbacks)
How Hybrid Intelligence Works in Moji
- Message Reception β Rule-based mood classifier analyzes text patterns (fast, efficient, no API calls)
- Mood Detection β Determines emotional state (excited/stressed/sad/neutral) using pattern matching
- Intelligent Routing β System decides which AI service to use based on context:
- ElevenLabs for voice generation (mood-specific animal voices)
- Imgflip for meme generation (contextual template selection)
- DALL-E 3 for custom stickers (unique artwork creation)
- Parallel Processing β Multiple AI services can work simultaneously when needed
- Response Generation β Each AI service generates specialized output optimized for its task
- Orchestration β System combines outputs into cohesive, intelligent response
- User Experience β Seamless, natural interaction that feels like a single intelligent system
Benefits of Hybrid Intelligence
- π― Specialization: Each AI excels at its specific task (DALL-E for images, ElevenLabs for voices, Imgflip for memes)
- β‘ Performance: Faster than single large model (parallel processing, optimized APIs)
- π° Cost-Effective: Use expensive models (DALL-E 3) only when needed, cheaper alternatives (Imgflip) for common tasks
- π§ Flexibility: Easy to swap or upgrade individual components without affecting the whole system
- π§ Intelligence: Combines strengths of multiple AI systems (creative generation + voice synthesis + contextual memes)
- β¨ Quality: Best-in-class results for each task (no compromises)
- π Scalability: Can handle multiple requests efficiently with specialized services
Example: Meme Generation Flow (Hybrid Intelligence)
- User:
@moji meme: finals stress - Rule-based parser (fast, no AI needed) extracts topic: "finals stress"
- Intelligent template selector (rule-based logic) chooses appropriate meme template from 100+ options
- Caption generator (pattern matching) creates contextual text based on topic
- Imgflip API (specialized meme service) generates meme (fast, free, contextual)
- Orchestration layer routes meme to iMessage
- Result: Perfect meme in < 2 seconds, using the right tool for the job
Example: Voice Response Flow (Hybrid Intelligence)
- User: "I'm feeling really sad"
- Rule-based mood classifier (fast pattern matching) detects: "sad"
- Animal selector (rule-based logic) chooses: Bird (comforting)
- Response generator (template-based) creates: "Sending hugs π€"
- ElevenLabs API (specialized TTS) generates bird voice (natural, realistic, optimized for voices)
- Orchestration layer plays voice + updates UI
- Result: Emotional, natural response with perfect voice in real-time
Example: Custom Sticker Flow (Hybrid Intelligence)
- User:
@moji sticker: cute cat studying - Rule-based parser extracts prompt: "cute cat studying"
- Prompt enhancer (template-based) creates detailed prompt for image generation
- OpenAI DALL-E 3 (specialized image generation) creates high-quality sticker
- Orchestration layer sends sticker to iMessage
- Result: Unique, high-quality custom sticker in ~10 seconds
Why Hybrid Intelligence Matters
Instead of forcing a single AI model to handle everything (which would be slow, expensive, and produce mediocre results), Moji uses the right tool for each job:
- Fast tasks (mood detection, template selection) β Rule-based systems (instant, free)
- Creative tasks (custom stickers) β DALL-E 3 (best quality, worth the cost)
- Voice tasks (animal voices) β ElevenLabs (specialized, natural)
- Meme tasks (contextual memes) β Imgflip (fast, free, perfect for memes)
This hybrid approach results in a system that's faster, cheaper, more capable, and produces better results than using a single AI model for everything.
Built with
Quick List (for Devpost)
TypeScript, JavaScript, React, Fastify, Bun, Node.js, SQLite, Photon AI, OpenAI DALL-E 3, ElevenLabs, Imgflip API, ffmpeg, macOS, Vite
Detailed Breakdown
Languages
- TypeScript - Type-safe JavaScript for all backend and frontend code
- JavaScript - Runtime language
- SQL - Database queries
Frameworks & Libraries
- React - UI framework for desktop application
- Fastify - High-performance web framework for backend services
- Vite - Build tool and development server
- Bun - Fast JavaScript runtime for backend services
Platforms
- macOS - Operating system (required for iMessage integration)
- Node.js - JavaScript runtime environment
- Browser - Chrome/Firefox/Safari for desktop UI
APIs & Services
- Photon AI iMessage Kit - iMessage integration and message handling
- OpenAI DALL-E 3 - Custom sticker generation from text prompts
- ElevenLabs - Natural animal voice synthesis (text-to-speech)
- Imgflip API - Meme template library (100+ templates)
Databases
- SQLite - Lightweight database for state management, message storage, and user data
Tools & Utilities
- ffmpeg - Audio/video processing (WebM to MP3 conversion)
- Git - Version control
- npm - Package management for Node.js
- Bun - Package management and runtime
Cloud Services (Optional)
- AWS S3 - Optional cloud storage for audio files (if configured)
Development Tools
- VS Code - Code editor
- Chrome DevTools - Browser debugging
- Postman/curl - API testing
How we built it
Architecture
We built Moji as a three-service architecture:
Desktop UI (React + Vite)
β
Pet Brain (Fastify + Bun + SQLite)
β
iMessage Bridge (Photon AI SDK)
β
iMessage (macOS Messages.app)
Technology Stack
Backend
- Bun - Fast JavaScript runtime for backend services
- Fastify - High-performance web framework
- SQLite - Lightweight database for state management
- TypeScript - Type safety across the stack
Frontend
- React - UI framework for desktop app
- Vite - Lightning-fast build tool
- CSS3 - Beautiful animations and gradients
AI & APIs
- Photon AI iMessage Kit - iMessage integration via database polling
- OpenAI DALL-E 3 - Custom sticker generation
- ElevenLabs - Natural animal voice synthesis (cat, dog, bird voices)
- Imgflip API - 100+ meme templates
Infrastructure
- ffmpeg - Audio/video processing (WebM β MP3 conversion)
- Node.js - Runtime for desktop app
Implementation Details
1. iMessage Bridge (imessage-bridge/)
- Database Polling: Directly queries macOS
chat.dbSQLite database - Message Detection: Polls for new messages every few seconds
- Command Parsing: Detects
@mojicommands and routes to appropriate handlers - Message Sending: Uses Photon AI SDK to send messages, images, and audio files
2. Pet Brain (pet-brain/)
- Mood Classification: Pattern-based emotion detection (sad, stressed, excited, neutral)
- Voice Generation: Integrates with ElevenLabs API to generate animal voices
- Meme Generation: Selects templates from Imgflip and generates captions
- Sticker Generation: Calls OpenAI DALL-E 3 API for custom stickers
- State Management: Stores user data, messages, and utterances in SQLite
- Audio Processing: Converts WebM to MP3 using ffmpeg
- File Uploads: Handles audio and image uploads from desktop UI
3. Desktop UI (desktop-app/)
- Real-time Polling: Fetches pet state every 3 seconds
- Voice Playback: Plays animal voices from Pet Brain
- Audio Recording: Records audio in browser, uploads to Pet Brain
- Image Upload: Handles image file selection and upload
- Visual Feedback: Animated pet sprite, speech bubbles, debug panel
- User Interaction: Handles first-click audio enablement (browser autoplay policy)
Key Implementation Challenges Solved
- iMessage Integration: Used Photon AI SDK with database polling to access iMessage
- Audio Autoplay: Implemented user interaction tracking to enable audio playback
- Voice Synthesis: Integrated ElevenLabs API with animal voice IDs
- File Handling: Implemented WebM β MP3 conversion with ffmpeg
- State Synchronization: Real-time pet state updates between services
- Mood Detection: Pattern-based emotion classification system
Development Process
Day 1: Research and prototyping
- Investigated iMessage integration options
- Tested Photon AI SDK
- Prototyped mood classification system
Day 1: Core development
- Built iMessage Bridge with database polling
- Implemented Pet Brain with mood classification
- Created meme and sticker generation systems
Day 2: Voice integration
- Integrated ElevenLabs API
- Implemented voice generation and playback
- Added animal transformation logic
Day 2: UI and polish
- Built desktop UI with React
- Added animations and visual feedback
- Implemented audio recording and image upload
- Fixed browser autoplay issues
Challenges we ran into
1. iMessage Integration
Challenge: iMessage on macOS has no public API. Apple doesn't provide official ways to programmatically access iMessage.
Solution:
- Used Photon AI SDK which provides iMessage access through database polling
- Directly queried macOS
chat.dbSQLite database - Implemented robust message detection and parsing
- Handled edge cases like message deduplication and timestamp conversion
2. Browser Audio Autoplay Policy
Challenge: Browsers block audio autoplay until user interaction, preventing automatic voice playback.
Solution:
- Implemented user interaction tracking
- Queued audio until first user click
- Added visual indicators ("Click anywhere to enable Moji's voice!")
- Played queued audio after user interaction
3. Voice Synthesis Integration
Challenge: Integrating ElevenLabs API and generating natural animal voices.
Solution:
- Researched ElevenLabs voice IDs for animal-like voices
- Implemented voice generation based on mood and animal type
- Added fallback to pre-recorded audio files
- Handled API rate limits and error cases
4. Audio Format Conversion
Challenge: Browser records audio as WebM, but iMessage prefers MP3.
Solution:
- Integrated ffmpeg for audio conversion
- Implemented WebM β MP3 conversion pipeline
- Added fallback to WebM if conversion fails
- Handled file size calculations and duration estimates
5. Real-time State Synchronization
Challenge: Keeping desktop UI in sync with Pet Brain state.
Solution:
- Implemented polling mechanism (3-second intervals)
- Added state comparison to detect changes
- Implemented voice playback triggers on new utterances
- Added visual feedback for state changes
6. Message Deduplication
Challenge: Preventing duplicate processing of the same message.
Solution:
- Implemented message ID tracking
- Used
processedMessagesSet to track processed messages - Added timestamp-based filtering
- Handled edge cases like message updates
7. Mood Classification
Challenge: Detecting user emotions from text messages accurately.
Solution:
- Implemented pattern-based classification system
- Created regex patterns for different emotions
- Prioritized specific emotions (sad before stressed)
- Added fallback to neutral mood
8. File Upload Handling
Challenge: Handling file uploads from desktop UI and sending to iMessage.
Solution:
- Implemented multipart form data handling
- Added file validation and size checks
- Integrated with Photon AI SDK for file sending
- Added error handling and user feedback
Accomplishments that we're proud of
1. Solved the "Impossible" iMessage Integration
We successfully integrated with iMessage on macOS using Photon AI SDK, something that many developers consider nearly impossible. We built a robust system that can send and receive messages, images, and audio files.
2. Hybrid Intelligence Architecture
We implemented a hybrid intelligence system that seamlessly orchestrates 4 different AI services, each optimized for its specific task:
- Photon AI - iMessage integration
- OpenAI DALL-E 3 - Custom sticker generation (best-in-class image generation)
- ElevenLabs - Natural animal voice synthesis (specialized TTS)
- Imgflip - Meme template library (fast, contextual memes)
This hybrid approach is faster, more cost-effective, and produces better results than using a single AI model for everything. We combined rule-based systems (mood classification, template selection) with specialized AI services (DALL-E 3, ElevenLabs) to create a system that's greater than the sum of its parts.
3. Emotional Intelligence System
We built a mood classification system that accurately detects user emotions and responds appropriately. The pet transforms into different animals based on your mood, creating a unique emotional connection.
4. Real Animal Voice Synthesis
We integrated ElevenLabs to generate natural animal voices on-demand. The pet responds with realistic cat, dog, and bird voices that match its current form.
5. Beautiful Desktop UI
We created a polished desktop UI with:
- Real-time pet state updates
- Animated pet sprite
- Speech bubbles
- Voice playback
- Audio recording
- Image upload
6. Complete Product
We built a complete, working system - not just a demo. Every feature works end-to-end:
- Message detection β Mood classification β Voice generation β UI updates
- Meme generation β Template selection β Caption generation β iMessage sending
- Sticker generation β DALL-E 3 API β Image generation β iMessage sending
7. Technical Excellence
- Type-safe codebase - Full TypeScript implementation
- Error handling - Robust error handling throughout
- Performance - Optimized polling and state management
- User experience - Smooth animations and visual feedback
8. Documentation
We created comprehensive documentation:
- Detailed README with setup instructions
- Architecture diagrams
- API documentation
- Troubleshooting guides
- Video script for demo
What we learned
Technical Learnings
iMessage Integration
- Learned about macOS
chat.dbdatabase structure - Understood Photon AI SDK capabilities and limitations
- Discovered challenges with message timestamp conversion
- Learned about message deduplication strategies
- Learned about macOS
Voice Synthesis
- Explored ElevenLabs API capabilities
- Learned about voice ID selection and customization
- Understood audio format requirements
- Discovered browser autoplay policy limitations
Audio Processing
- Learned about WebM and MP3 formats
- Understood ffmpeg conversion pipeline
- Discovered file size and duration estimation techniques
- Learned about audio buffer handling
State Management
- Learned about real-time state synchronization
- Understood polling vs. WebSocket trade-offs
- Discovered state comparison techniques
- Learned about UI update optimization
API Integration
- Learned about rate limiting and error handling
- Understood API authentication and key management
- Discovered best practices for API calls
- Learned about fallback strategies
Hybrid Intelligence
- Learned about combining multiple AI services for optimal results
- Understood the importance of choosing the right tool for each task
- Discovered benefits of hybrid approaches (speed, cost, quality)
- Learned about intelligent orchestration and routing
- Understood trade-offs between rule-based and AI-based systems
Non-Technical Learnings
User Experience
- Learned about emotional design and user connection
- Understood importance of visual feedback
- Discovered value of animations and transitions
- Learned about accessibility considerations
Project Management
- Learned about breaking down complex problems
- Understood importance of incremental development
- Discovered value of testing and iteration
- Learned about time management in hackathons
Team Collaboration
- Learned about code organization and structure
- Understood importance of documentation
- Discovered value of clear communication
- Learned about version control best practices
Problem Solving
- Learned about researching solutions
- Understood importance of trying multiple approaches
- Discovered value of persistence and debugging
- Learned about asking for help when needed
What's next for Moji
Short-term Enhancements
Multi-User Support
- Support multiple users in the same chat
- Track individual user moods and preferences
- Personalized responses per user
More Animal Types
- Add more animals (fox, rabbit, penguin, etc.)
- More diverse emotional responses
- Custom animal selection
Voice Customization
- Allow users to customize animal voices
- Support for different voice styles
- Voice pitch and speed adjustment
Enhanced Mood Detection
- Improve mood classification accuracy
- Support for more emotions (angry, happy, anxious, etc.)
- Context-aware mood detection
Better Meme Generation
- More meme templates
- Smarter template selection
- Custom meme creation
Medium-term Goals
Mobile App
- React Native mobile app
- iOS and Android support
- Push notifications
Group Chat Support
- Support for group iMessage chats
- Multi-user interactions
- Group mood analytics
Voice-to-Text
- Whisper API integration
- Voice message transcription
- Better voice note handling
Image Analysis
- GPT-4 Vision integration
- Image captioning
- Visual mood detection
Pet Personality
- Customizable pet personalities
- Learning user preferences
- Adaptive responses
Long-term Vision
AI Agent Enhancement
- More sophisticated AI decision-making
- Long-term memory
- Contextual conversations
Social Features
- Share pet moments with friends
- Pet competitions
- Social mood tracking
Analytics Dashboard
- Mood trends over time
- Conversation analytics
- Pet interaction statistics
Monetization
- Premium features
- Custom animal packs
- Advanced voice options
Open Source
- Open source the project
- Community contributions
- Plugin system
Technical Improvements
Performance Optimization
- Reduce polling frequency
- Implement WebSocket connections
- Optimize database queries
Scalability
- Support for more users
- Distributed architecture
- Cloud deployment
Reliability
- Better error handling
- Automatic recovery
- Health monitoring
Security
- End-to-end encryption
- Secure API key storage
- Privacy controls
Testing
- Unit tests
- Integration tests
- End-to-end tests
Conclusion
Moji represents a successful demonstration of hybrid intelligence - the strategic combination of multiple specialized AI services working together to create something greater than the sum of its parts. By combining OpenAI DALL-E 3 (best-in-class image generation), ElevenLabs (specialized voice synthesis), Imgflip (fast meme generation), and rule-based mood classification, we've created a system that's faster, more cost-effective, and more capable than using a single AI model.
We're proud of what we built and excited about the future possibilities. The project demonstrates the power of hybrid intelligence - choosing the right tool for each task and orchestrating them seamlessly to create a cohesive, delightful user experience that's truly unique and engaging.
Built With
- bun
- elevenlabs
- fastify
- ffmpeg
- imgflip-api
- javascript
- macos
- node.js
- openai-dall-e-3
- photon-ai
- react
- sqlite
- typescript


Log in or sign up for Devpost to join the conversation.