Inspiration
Putting pen to paper (or Apple Pencil to iPad) is one of the most natural ways of getting early thoughts out and giving shape to ideas. It's often much easier to describe a complicated software system, or a tricky geometry problem, with a simple diagram. However, agentic AI tools today tend to heavily favor textual input. Sure, you could take a quick snap of a drawing and upload it—but what if you wanted to make a tiny edit?
What if agentic tools on your Mac could read your iPad screen?
Write some code on your Mac, have Iris draw a diagram on your iPad, edit the diagram, and have Iris edit the code—this is just one of the possibilities with Iris, the agentic AI that allows you to work cross-device.
What it does
Iris is an always-on AI that spans your Apple devices. It watches your screens, listens to your voice, and acts—drawing diagrams on your iPad, overlaying information on your Mac, and coordinating across both simultaneously.
- Sees: watches your Mac and iPad screens, understands what you're working on
- Hears: streams speech-to-text input from any device (including iPhones!) for natural voice interaction
- Draws: renders diagrams, widgets, and visual plans directly onto your iPad canvas with Apple Pencil interaction
- Thinks across devices: your Mac and iPad are one unified workspace, not two isolated screens
How we built it
We built Iris as a local-first, cross-device system centered on a Mac-hosted backend.
- Backend (Python + Flask): We implemented a unified API for sessions, transcript ingestion, screenshot upload, device commands, and chat sync. Data is stored on the filesystem (backend/data/...) so the whole system is easy for agents to inspect and act on.
- iPad app (SwiftUI + PencilKit): The iPad app's infinite canvas was built using a custom coordinate system. It also exposes a local canvas API for agent-driven actions.
- iPhone app (SwiftUI): We built a lightweight mobile companion for session control, push-to-talk voice capture (with on-device transcription), camera/photo uploads, and live chat/status viewing.
- Mac app (Electron + React): The Mac acts as the orchestration/control center, with session management and agent interaction UI.
- Agent routing: We support multiple model/provider paths (including Codex/Claude Code-linked sessions) so the same Iris session can map to external coding-agent conversations.
- Proactive behavior: Iris periodically analyzes incoming screen context and can propose or place useful UI help (widgets/suggestions) instead of only reacting to direct prompts.
Challenges we ran into
Getting proactive visual understanding to work reliably was our hardest problem. Foundation models are good at raw image understanding, but much weaker at grounding that understanding in interactive, spatial action. We solved this by building a dual visual pipeline: (1) live screenshot capture from the iPad and Mac at inference time, so each agent turn sees current state, and (2) screenshot ingestion plus structured image parsing for proactive monitoring. We also passed normalized canvas/viewport coordinate snapshots so visual reasoning could map back to actionable placement. In practice, this shifted us from “the model can describe the screen” to “the agent can interact visually.”
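The coordinate mapping can be sketched as follows. The model reasons over normalized (0..1) viewport coordinates, and a snapshot of the current viewport converts those into absolute canvas coordinates where the agent can draw. The snapshot fields here are assumptions about what such a snapshot holds, not the actual Iris schema.

```python
# Illustrative sketch: map normalized viewport coords (what the model
# reasons about) to absolute canvas coords (where the agent draws).
from dataclasses import dataclass


@dataclass
class ViewportSnapshot:
    origin_x: float  # canvas coordinate at the viewport's top-left
    origin_y: float
    width: float     # visible canvas extent at the current zoom level
    height: float


def to_canvas(snap: ViewportSnapshot, nx: float, ny: float) -> tuple[float, float]:
    """Convert normalized (0..1) viewport coords into canvas coords."""
    return (snap.origin_x + nx * snap.width,
            snap.origin_y + ny * snap.height)
```

With this mapping, a placement like "put the widget at the center of what the user sees" becomes `to_canvas(snap, 0.5, 0.5)`, regardless of pan or zoom.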
Accomplishments that we're proud of
- Implemented proactive behavior: Iris can analyze incoming visual context and suggest useful next actions instead of waiting for explicit prompts.
- Built a true cross-device AI workflow: Iris treats Mac, iPad, and iPhone as one shared workspace, not separate apps.
- Shipped an iPad infinite-canvas app with agent control: The agent can place widgets, draw/trace SVGs, and interact with the canvas using coordinate-aware APIs.
- Integrated external coding agents cleanly: Codex and Claude Code can be linked as first-class session backends, with chat history synchronized across the Mac/iPhone/iPad views—leveraging these tools' impressive capabilities without reinventing the wheel, and letting users stick with what they already know.
What we learned
- Proactive AI needs to be carefully controlled—without gating and confidence thresholds, “helpful” suggestions flood the user.
- Multimodal agents need structure: screenshots alone weren’t enough—coordinate snapshots and consistent structural schemas made visual reasoning usable.
What's next for Iris
- Move from periodic sync to low-latency shared state so all devices update more smoothly.
- Better context management, with long-horizon context and cross-provider continuity.
- Shared workspaces and role-based workflows with stronger auth/permissions scoping, allowing Iris to be used by teams.
