Penguin

Inspiration

Every major team platform has AI bots: Slack, Discord, Teams. iMessage doesn't. Yet for most friend groups and startup co-founders, iMessage is where the real decisions, plans, and conversations happen.

Apple famously keeps iMessage locked down with zero official APIs. We wanted to smash through that walled garden. If AI is going to be truly useful, it needs to live where we already talk, without forcing everyone to download another app.

What it does

Penguin is not a chatbot; it is a context-aware AI agent that lives inside your existing iMessage group chat. Tag @penguin and it replies with full awareness of what the group has been talking about.

Talk to it instead of typing. Drop a voice note in the chat and VoiceOS transcribes what you said into text Penguin can act on. Say "met with the Acme founder, wants a follow-up next week" and Penguin turns it into a clean contact card and action item.
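To make the last step of that flow concrete, here is a minimal sketch of turning a model's structured reply into a contact card and an action item. It assumes, hypothetically, that the model is prompted to answer in JSON with `contact` and `action_item` objects; the field names and the `parse_voice_note_output` helper are illustrative, not Penguin's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactCard:
    # Hypothetical fields; Penguin's real card schema may differ.
    name: str
    company: Optional[str] = None
    notes: str = ""


@dataclass
class ActionItem:
    description: str
    due: Optional[str] = None  # free-text due hint, e.g. "next week"


def parse_voice_note_output(raw: str) -> tuple[ContactCard, ActionItem]:
    """Validate the model's JSON reply and turn it into typed records.

    Raises ValueError (json.JSONDecodeError is a subclass) when the JSON is
    malformed or required fields are missing, so the caller can fall back
    instead of posting a half-empty card to the chat.
    """
    data = json.loads(raw)
    contact = data.get("contact") or {}
    action = data.get("action_item") or {}
    if not contact.get("name") or not action.get("description"):
        raise ValueError("model output is missing a contact name or action")
    card = ContactCard(
        name=contact["name"],
        company=contact.get("company"),
        notes=contact.get("notes", ""),
    )
    item = ActionItem(description=action["description"], due=action.get("due"))
    return card, item
```

Validating before replying matters here because a transcribed voice note is the noisiest input Penguin gets; a rejected parse is better than a confidently wrong card.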

Send it images. Photos dropped in the chat (whiteboard shots, screenshots, design mockups) are stored in Tencent Cloud Object Storage, so Penguin can pull them back up later to give real design feedback or turn a messy whiteboard into structured notes.

It remembers. Penguin keeps a rolling memory of the group, so it acts with context from days ago without anyone re-explaining.

It can take action. We are wiring in MCP tools (Google Search, Calendar, GitHub, Linear) so you can tell @penguin to "schedule a meeting for Tuesday" or "open an issue on our repo" right from the chat.

How we built it

Because Apple has no official API, we had to get creative.

The bridge. A BlueBubbles server on a macOS machine bridges iMessage via AppleScript. It is exposed through a Cloudflare Tunnel and connected to a FastAPI webhook hosted on Railway.
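The first job of the webhook is deciding whether an incoming event is a new message that tags @penguin. The sketch below works on the already-parsed JSON body and deliberately leaves FastAPI out. The payload shape it reads (`type == "new-message"`, `data.guid`, `data.text`, `data.isFromMe`, `data.chats[0].guid`) is an assumption modeled on BlueBubbles' new-message webhook and should be checked against the BlueBubbles docs; `extract_mention` and `Mention` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "@penguin" as a standalone tag, case-insensitive ("@penguins" won't match).
MENTION = re.compile(r"(?<!\w)@penguin\b", re.IGNORECASE)


@dataclass
class Mention:
    message_guid: str
    chat_guid: str
    prompt: str  # message text with the @penguin tag stripped


def extract_mention(event: dict) -> Optional[Mention]:
    """Return a Mention if this webhook event is a new message tagging
    @penguin, else None. The payload shape is an assumption (see above)."""
    if event.get("type") != "new-message":
        return None
    msg = event.get("data") or {}
    if msg.get("isFromMe"):
        # Skip messages sent from the Mac's own account; this includes
        # Penguin's replies, so it never answers itself.
        return None
    text = msg.get("text") or ""
    if not MENTION.search(text):
        return None
    chats = msg.get("chats") or [{}]
    return Mention(
        message_guid=msg.get("guid", ""),
        chat_guid=chats[0].get("guid", ""),
        prompt=MENTION.sub("", text).strip(),
    )
```

A FastAPI route would call this on the request body and return early on `None`, so untagged chatter never reaches a model.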

The agent (Adal). We used Adal (AdalFlow) as the agent framework. It runs the @penguin loop: read the message, decide which tool or model to use, and return one clean reply. It kept the agent logic readable instead of letting it turn into a tangle of if-statements.
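The shape of that loop can be sketched without the framework. This is plain Python, not AdalFlow's API: the `Agent` class, the tool registry, and the name-matching `pick_tool` rule are hypothetical stand-ins for the decision Adal makes with a model, and the tools are stubs where real MCP calls would go.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A tool takes the user's prompt and returns a reply string.
ToolFn = Callable[[str], str]


@dataclass
class Agent:
    tools: Dict[str, ToolFn]  # hypothetical stubs, e.g. calendar, GitHub
    answer: ToolFn            # default path: a model answers directly

    def pick_tool(self, prompt: str) -> str:
        """Decide which tool should handle the prompt ("" means none).

        A real agent asks a model; this stand-in picks the first tool
        whose name appears in the prompt.
        """
        lowered = prompt.lower()
        for name in self.tools:
            if name in lowered:
                return name
        return ""

    def run(self, prompt: str) -> str:
        # 1. read the message, 2. decide, 3. return exactly one reply.
        name = self.pick_tool(prompt)
        handler = self.tools.get(name, self.answer)
        return handler(prompt).strip()
```

For example, `Agent(tools={"schedule": book_meeting}, answer=ask_model).run("schedule a meeting for Tuesday")` would go to the (hypothetical) `book_meeting` tool, while anything else falls through to the model.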

The brain (multi-model router). Instead of one big expensive model for everything, the agent routes each message to the cheapest model that can handle it:

A zero-cost keyword filter catches the obvious stuff first.
A small Qwen 2.5 1.5B classifier picks the right specialist.
Specialists: Qwen 2.5 14B for hard reasoning, Mistral 7B for transcription, Gemma 2 9B for editing and copywriting, Qwen 2.5 7B for image and design feedback, all served via HuggingFace and featherless-ai.
Every model call has a fallback, so a failed request never makes Penguin go silent.
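A minimal sketch of the routing cascade above, under stated assumptions: the keyword rules and route labels are hypothetical, `classify` stands in for the Qwen 2.5 1.5B classifier, and the `specialists` callables stand in for the hosted specialist models. Fallback handling is left out to keep the routing visible.

```python
from typing import Callable, Dict, Optional

Model = Callable[[str], str]

# Stage 1 rules: hypothetical examples of zero-cost keyword routes.
KEYWORD_ROUTES = {
    "transcribe": "transcription",
    "rewrite": "editing",
    "proofread": "editing",
    "mockup": "design",
}


def keyword_route(prompt: str) -> Optional[str]:
    lowered = prompt.lower()
    for keyword, label in KEYWORD_ROUTES.items():
        if keyword in lowered:
            return label
    return None


def route(prompt: str, classify: Model, specialists: Dict[str, Model]) -> str:
    """Send the prompt to the cheapest model that can handle it."""
    # Stage 1: free keyword filter.
    label = keyword_route(prompt)
    # Stage 2: only on a miss do we pay for the small classifier.
    if label is None:
        label = classify(prompt).strip().lower()
    # Stage 3: hand off to the specialist; unknown labels go to the
    # general reasoning model.
    specialist = specialists.get(label, specialists["reasoning"])
    return specialist(prompt)
```

The ordering is the point of the design: each stage only runs when the cheaper one before it could not decide.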

Storage (Tencent Cloud). Images sent in the chat go to Tencent Cloud Object Storage, which is S3-compatible, so Penguin can fetch them later for design feedback.
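Because COS speaks the S3 protocol, one way to use it is a standard S3 client pointed at the COS endpoint. The sketch assumes boto3 and the `https://cos.<region>.myqcloud.com` endpoint form; the key layout (`chats/<chat>/<sha256>.<ext>`) and the helper names `image_key`, `make_cos_client`, and `store_image` are hypothetical. Content-addressing the key means the same image posted twice in a chat lands on the same object.

```python
import hashlib


def image_key(chat_guid: str, image_bytes: bytes, ext: str = "jpg") -> str:
    """Deterministic object key: identical images in the same chat map
    to the same key, so re-uploads overwrite instead of piling up."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    # Chat GUIDs can contain ';' and '+'; keep only safe characters.
    safe_chat = "".join(c if c.isalnum() or c in "-_" else "_" for c in chat_guid)
    return f"chats/{safe_chat}/{digest}.{ext.lstrip('.')}"


def make_cos_client(region: str, secret_id: str, secret_key: str):
    """Build an S3 client aimed at Tencent COS (requires boto3)."""
    # Imported lazily so image_key() works without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "s3",
        endpoint_url=f"https://cos.{region}.myqcloud.com",
        region_name=region,
        aws_access_key_id=secret_id,
        aws_secret_access_key=secret_key,
        # Virtual-hosted style: <bucket>.cos.<region>.myqcloud.com
        config=Config(s3={"addressing_style": "virtual"}),
    )


def store_image(client, bucket: str, chat_guid: str, image_bytes: bytes) -> str:
    """Upload an image and return the key to remember for later."""
    key = image_key(chat_guid, image_bytes)
    client.put_object(Bucket=bucket, Key=key, Body=image_bytes)
    return key
```

Fetching the image back for feedback is the mirror call, `client.get_object(Bucket=bucket, Key=key)["Body"].read()`.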

Memory (Butterbase). Butterbase is our database. It stores the rolling group context and the contact cards Penguin builds, so it remembers who said what across days without us running any infrastructure.
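Butterbase's own API is not shown here. The sketch below only illustrates the "rolling" part in plain Python: a bounded per-chat window that would be loaded from and saved to the database. The class name, default window size, and transcript format are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime
from typing import Deque, Dict, Tuple

Entry = Tuple[datetime, str, str]  # (timestamp, sender, text)


class RollingMemory:
    """Keeps the last `max_messages` messages per group chat."""

    def __init__(self, max_messages: int = 200) -> None:
        self._chats: Dict[str, Deque[Entry]] = defaultdict(
            lambda: deque(maxlen=max_messages)
        )

    def add(self, chat_guid: str, sender: str, text: str, at: datetime) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._chats[chat_guid].append((at, sender, text))

    def context(self, chat_guid: str) -> str:
        """Render the window as a transcript to prepend to the prompt."""
        return "\n".join(
            f"[{at:%Y-%m-%d %H:%M}] {sender}: {text}"
            for at, sender, text in self._chats.get(chat_guid, ())
        )
```

Keeping timestamps in the transcript is what lets the model resolve references like "what we decided Tuesday" days later.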

Challenges we ran into

Breaking the walled garden. Getting BlueBubbles cleanly authenticated through a Cloudflare tunnel with a secure FastAPI webhook ate up a big chunk of our morning.

The duplicate trap. BlueBubbles fires 2 to 3 duplicate webhook events per message. We added an LRU dedup cache so Penguin doesn't triple-reply to every text.
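A minimal sketch of such a cache, assuming the duplicate events for one message carry the same message GUID (the class name and capacity are hypothetical). An `OrderedDict` gives LRU behavior: repeats are refreshed to the end, and the least recently seen GUID is evicted once capacity is exceeded.

```python
from collections import OrderedDict


class DedupCache:
    """Remembers the most recent `capacity` message GUIDs in LRU order."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def first_time(self, guid: str) -> bool:
        """True the first time a GUID is seen, False for repeats."""
        if guid in self._seen:
            self._seen.move_to_end(guid)  # refresh recency
            return False
        self._seen[guid] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # evict least recently seen
        return True
```

The webhook handler calls `first_time(guid)` and returns immediately on `False`. If the handler is a plain `def` endpoint, FastAPI runs it in a thread pool, so the check should then be guarded by a lock.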

The silent agent. Early on, any model-API hiccup made Penguin reply with nothing. The failure was being swallowed instead of surfaced. We hardened every model call so it falls back instead of going quiet.
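The fix can be sketched as a wrapper that tries each model in order, logs every failure instead of swallowing it, treats an empty reply as a failure too, and ends on a canned message so the chat always gets an answer. The model callables, logger name, and fallback text are hypothetical.

```python
import logging
from typing import Callable, Sequence

log = logging.getLogger("penguin")

FALLBACK_REPLY = "Sorry, I couldn't reach my models just now. Try again in a minute?"


def call_with_fallback(prompt: str, models: Sequence[Callable[[str], str]]) -> str:
    """Try each model in order; never return an empty reply."""
    for model in models:
        name = getattr(model, "__name__", repr(model))
        try:
            reply = model(prompt)
        except Exception:
            # Surface the failure (with traceback), then try the next model.
            log.exception("model %s failed", name)
            continue
        if reply and reply.strip():
            return reply.strip()
        log.warning("model %s returned an empty reply", name)
    return FALLBACK_REPLY
```

Counting an empty string as a failure is the part that addresses the "silent" symptom directly: an API can succeed at the HTTP level and still hand back nothing.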

Demo-induced rate limits. Hammering the API while testing kept hitting provider rate limits. We added a /warmup endpoint and request spacing for a smooth live demo.
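Request spacing can be as small as a gate that enforces a minimum gap between calls to a provider. This sketch assumes a single process and synchronous calls; `RequestSpacer` and `min_interval` are hypothetical, and the interval is a tuning knob, not any provider's documented limit. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class RequestSpacer:
    """Blocks just long enough that calls are >= `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous call

    def wait(self) -> None:
        """Call right before each request to the provider."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.min_interval - now
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now
```

In async handlers the same idea would use `asyncio.sleep` behind an `asyncio.Lock`, so concurrent requests queue up instead of all waking at once.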

Accomplishments that we're proud of

We bypassed the locked gates. We got a working, context-aware AI agent running inside iMessage, a platform built to prevent exactly this.

Voice-first. With VoiceOS handling transcription, you can just talk to Penguin and still get structured, actionable output back.

Cheap and smart. Routing each message to the smallest capable model shows that real agent workflows can run fast and cheap.

What we learned

The hard part of an iMessage agent is not the model but everything around it: deduplicating noisy webhooks, surviving rate limits, and making sure one failed API call never makes the agent go mute. Resilience mattered more than model size. And once talking became an option through VoiceOS, most messages to Penguin were spoken, not typed.

What's next for Penguin

Right now, Penguin can talk. Next, we want it to act.

Proactive interventions. Instead of waiting to be tagged, Penguin will flag stalled decisions, summarize overnight threads you missed, and nudge people about open action items. We plan to hit this in the 24-hour track.

Hyper-localized fine-tuning. We are training small custom models on a group's own chat history so Penguin picks up that team's inside jokes and slang. This is currently in testing.