Inspiration

Most AI products today still inherit the same fundamental limitation: they add AI to software that was originally designed only for humans.

[Screenshot: APPI desktop overview with multiple windows open]

That creates a long list of problems:

  • the agent is bolted onto the side instead of being part of the system
  • the interaction model stays trapped inside a chat window
  • the agent depends on brittle tool glue, screenshots, RAG layers, or application-specific integrations
  • every product reinvents a slightly different tool wrapper instead of giving the agent a native environment
  • collaboration between humans and AI is treated as an afterthought rather than a first-class design problem

We found that frustrating.

We wanted to explore a different question:

What if software were designed for AI from the start?

And more specifically:

What if an agent could operate inside its own lightweight environment, and humans could join it, guide it, interrupt it, and collaborate with it directly?

That is the idea behind Appi.

Appi is not a chatbot with access to tools. It is an AI-native operating environment. Instead of exposing an agent to a patchwork of human apps and APIs, Appi gives the agent a structured runtime, native commands, prepared context, and a workspace that humans can share.

This project is also inspired by a practical observation: many current AI agents are too heavy, too fragile, too expensive to run continuously, or too inconsistent in how they understand context. We wanted to move in the opposite direction:

  • lighter runtime
  • more explicit command surface
  • more structured context
  • clearer system boundaries
  • better collaboration between human and agent

Appi is our proof of concept for that direction.

[Image: APPI mascot / welcome screen]

What it does

Appi is an AI-native desktop-like workspace where an agent can perceive, reason, and act across multiple apps, while humans collaborate with it in real time.

In the current prototype, Appi already provides:

  • a desktop shell with windows, a dock, a command palette, and workspace state
  • multiple built-in apps such as Notes, Mail, Files, Music, Chess, Sheets, Preview, Calendar, Reminder, Messages, Terminal, and more
  • voice/live interaction with Gemini
  • visual understanding through desktop capture and live vision streaming
  • a native command layer for agent actions
  • local sessions and remote collaborative sessions
  • public/private presence semantics

This means the agent can do things such as:

  • open and navigate apps
  • move across the workspace through native commands
  • read and update notes and text files
  • work with files and file-backed media
  • control music playback
  • play chess
  • compose mail drafts
  • inspect a spreadsheet
  • interact with the workspace in a way that is visible to the human user

The important part is not only that Appi can perform actions. The important part is how it performs them.

Instead of giving the model a giant unstructured tool menu or asking it to guess its way through a desktop entirely from pixels, Appi provides a native operational surface. The agent acts through explicit commands, and the system prepares relevant context in advance.

That makes the product feel less like "AI watching over your shoulder" and more like "AI working inside a system designed for it."
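
To make that concrete, here is a minimal, hypothetical sketch of a single agent action in this model. The command name, context fields, and mode flag are illustrative assumptions, not APPI's actual surface:

// Hypothetical sketch of one agent action: the system prepares structured
// context first, and the agent responds with an explicit, bounded command.
// All names here are illustrative, not the real APPI command surface.

interface AgentCommand {
  app: string;                    // target app, e.g. "notes"
  action: string;                 // explicit verb, e.g. "update"
  args: Record<string, unknown>;
  mode: "visible" | "headless";   // execution mode is explicit, not implied
}

// Context prepared by the runtime/host before the model acts:
const preparedContext = {
  openApps: ["notes", "music"],
  focusedWindow: "notes:note-42",
  note: { id: "note-42", title: "Groceries", length: 112 },
};

// The action the agent emits in response to "rewrite my groceries note":
const command: AgentCommand = {
  app: "notes",
  action: "update",
  args: { id: "note-42", body: "rewritten note text" },
  mode: "visible",                // the human can watch it happen
};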

Current demoable flows

For the demo, Appi can credibly show:

  • voice-driven navigation across apps
  • opening Music, Notes, and Chess through natural spoken instructions
  • selecting and controlling music playback
  • reading and rewriting notes
  • switching from voice to text while preserving the same workspace context
  • collaborative desktop behavior with remote session foundations and presence

Why this fits the hackathon

Appi fits UI Navigator because the agent can observe the workspace and execute actions inside it.

It also fits Live Agents because the interaction can happen through live voice, interruption, and back-and-forth collaboration.

We deliberately do not position Appi primarily as a Creative Storyteller. The stronger and more honest angle is the AI-native operating environment itself.

[Screenshot: Appi multi-window workspace]

How we built it

Appi is built as a layered system rather than as a single UI with model calls attached to it.

At a high level, the architecture is split into four main layers in the current repo:

  1. Renderer: the Svelte frontend that renders the desktop, windows, apps, overlays, and interactions.

  2. Runtime: a pure JavaScript core that owns state, commands, dialogs, and app behavior.

  3. Local Host: the authority that boots the runtime, persists local state, resolves effective presentation, and exposes a stable host interface to the renderer.

  4. Remote Host / Go Agent Engine: the remote collaboration layer that lets users create or join shared sessions over WebSocket, persist them through Firebase/Firestore, and run cloud-side agent turns on Google Cloud.

Why this architecture matters

The key architectural decision is that the frontend is not supposed to be the source of truth.

The frontend reflects state.

The runtime and host layers own state, command routing, and presentation resolution. That distinction is critical because it is what makes Appi feel like an operating environment rather than a chat UI.
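
As a minimal sketch of that separation (method names are assumptions; the real host interface lives in the repo), the renderer only ever talks through a narrow contract like this:

// Hypothetical host contract: the renderer can request changes, read state,
// and subscribe to events, but it never owns or mutates state directly.

interface HostContract {
  dispatch(commandName: string, args: unknown): Promise<void>;  // request a change
  query<T>(queryName: string, args?: unknown): Promise<T>;      // read state
  onEvent(handler: (name: string, payload: unknown) => void): () => void; // react to changes
  snapshot(): Promise<unknown>;                                 // full workspace view
}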

Native agent protocol

One of the most important technical ideas in Appi is its native command and protocol layer.

Instead of giving the model arbitrary shell access or exposing hundreds of loosely defined tool calls, Appi uses:

  • protocol envelopes for command, query, event, and snapshot
  • a compact native command surface for the agent
  • explicit command grammar
  • explicit visible vs headless execution modes
  • per-app capabilities and references

This makes the agent surface:

  • bounded
  • more secure
  • easier to reason about
  • easier to test
  • more uniform across apps

This is a major product and technical difference from many agent demos that rely on improvised wrappers around existing applications.
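
To make the envelope idea concrete, here is a hedged sketch of what those shapes could look like; field names are assumptions, and the real grammar is documented in AGENT-COMMANDS.md:

// Hypothetical protocol envelopes, following the command/query/event/snapshot
// split described above. Shapes here are illustrative assumptions.

type AppiEnvelope =
  | { kind: "command"; name: string; args: unknown; mode: "visible" | "headless" }
  | { kind: "query"; name: string; args?: unknown }
  | { kind: "event"; name: string; payload: unknown }
  | { kind: "snapshot"; workspace: unknown };

// A bounded, explicit action: the agent cannot improvise outside this surface.
const playTrack: AppiEnvelope = {
  kind: "command",
  name: "music.play",   // invented example following a group.action grammar
  args: { trackId: "track-7" },
  mode: "visible",
};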

Prepared context instead of heavy glue

Another core technical idea is context preparation.

Many agent systems wait until the model is already in session, then try to reconstruct the world through:

  • repeated screenshots
  • large visual payloads
  • complex retrieval layers
  • expensive and late-stage context assembly

Appi takes a different approach.

The runtime, host, and workspace model already know a lot about the environment:

  • open apps
  • windows
  • files
  • notes
  • metadata
  • session mode
  • visible state
  • workspace structure

That means the system can prepare meaningful context before or during interaction in a native, structured way. Vision still matters, but it is not the only way the agent knows what is happening.

This makes Appi lighter, cheaper, and more uniform than systems that depend entirely on screen interpretation or brittle tool chains.
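
A rough sketch of what such prepared, structured context might contain (field names are assumptions):

// Hypothetical prepared-context shape, assembled from state the runtime and
// host already own, so no screenshot round-trip is needed to answer
// "what is open right now?".

interface PreparedContext {
  sessionMode: "local" | "remote";
  openApps: string[];
  windows: { id: string; app: string; focused: boolean }[];
  files: { path: string; kind: string; bytes: number }[]; // metadata first; content on demand
  notes: { id: string; title: string }[];
}

function serializeForModel(ctx: PreparedContext): string {
  // Compact, structured context for the model; vision is layered on top
  // only when visual confirmation is actually needed.
  return JSON.stringify(ctx);
}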

Local-first and remote-capable

Appi already supports:

  • local in-browser sessions
  • remote session creation and joining
  • workspace and presence synchronization through a Go WebSocket server
  • Cloud Run-friendly Go deployment packaging
  • Firebase/Firestore-backed session persistence for remote rooms
  • a Gemini-backed cloud orchestration endpoint grounded in the native APPI command surface

In the current branch, the Go service already does more than simple room fanout:

  • it keeps the current WebSocket collaboration path for the UI
  • it can persist remote session state in Firestore
  • it exposes HTTP endpoints for session management
  • it can call Gemini in the cloud and ask for native APPI commands using the same command language documented in AGENT-COMMANDS.md
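
For illustration, a client-side flow over that surface might look like the sketch below. The /v1/sessions and /ws paths match the service surface described later in this writeup; the request and message shapes are assumptions:

// Hypothetical client flow: create a remote session over HTTP, then attach
// to the WebSocket fanout for workspace and presence sync.
// Payload shapes and the session query parameter are assumptions.

async function joinRemoteSession(baseUrl: string): Promise<WebSocket> {
  const res = await fetch(`${baseUrl}/v1/sessions`, { method: "POST" });
  const { sessionId } = (await res.json()) as { sessionId: string };

  const wsUrl = `${baseUrl.replace(/^http/, "ws")}/ws?session=${sessionId}`;
  const ws = new WebSocket(wsUrl);
  ws.onmessage = (msg) => {
    // Envelopes from other clients (and cloud agent turns) arrive here.
    console.log("remote envelope:", msg.data);
  };
  return ws;
}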

The longer-term architecture, which also exists in ongoing Go work outside this branch, moves toward a much stronger model:

  • a Go backend reflecting the operating environment state
  • a headless runtime controlled by the agent
  • a frontend that acts as a projection of that runtime rather than as the primary host

That architecture is important because it means Appi does not conceptually need the visible UI in order to exist.

Current architecture diagram

flowchart LR
    Human["Human User"]
    Frontend["APPI Frontend\nSvelte Desktop UI"]
    Host["Host Contract\ncommand / query / event / snapshot"]
    LocalHost["Local Host\nstate persistence + presentation"]
    Runtime["APPI Runtime\napps + commands + workspace state"]
    Gemini["Gemini\nlive voice + multimodal reasoning"]
    Vision["Vision Capture\nscreenshot + live stream"]
    GoServer["Go Agent Engine\nWebSocket + HTTP"]
    RemoteClients["Other APPI Clients"]
    Files["Local Files\nOPFS + metadata index"]
    Firestore["Firebase / Firestore\nremote session state"]

    Human --> Frontend
    Frontend --> Host
    Host --> LocalHost
    LocalHost --> Runtime
    Runtime --> Files
    Frontend --> Vision
    Vision --> Gemini
    Frontend --> Gemini
    LocalHost --> GoServer
    GoServer --> Firestore
    GoServer --> RemoteClients

[Diagram: current local and remote flow]

Target cloud architecture

The broader Appi direction extends the same model into a cloud-native, headless deployment shape.

This is the intended architecture:

  • a lightweight Go backend deployed on Cloud Run
  • Firebase / Firestore for remote session state and reconnectable workspaces
  • Cloud Storage for file persistence
  • WebSocket fanout for shared live state
  • HTTP surfaces where needed for integrations and control
  • a headless runtime that can keep working even when no desktop UI is open
  • the UI acting as a connected client that reflects and interacts with the live runtime
  • a cloud orchestration turn that reads the native command contract from AGENT-COMMANDS.md

flowchart LR
    Human["Human User"]
    WebUI["APPI UI Client\nDesktop / Web"]
    Agent["Appi Agent Runtime\nHeadless / Autonomous"]
    GoBackend["Go Agent Engine\nCloud Run"]
    WS["Realtime Transport\nWebSocket"]
    Firestore["Firebase / Firestore\nworkspace/session data"]
    Storage["Cloud Storage\nfiles and assets"]
    Gemini["Gemini\nLive + Cloud turns"]
    Commands["AGENT-COMMANDS.md\nnative command contract"]
    Gmail["Gmail via GCP OAuth2 + Pub/Sub"]
    Future["Future Channels\nTelegram / SMS / others"]

    Human --> WebUI
    WebUI --> WS
    WS --> GoBackend
    GoBackend --> Agent
    Agent --> Firestore
    Agent --> Storage
    Agent --> Gemini
    GoBackend --> Commands
    GoBackend --> Gmail
    GoBackend --> Future

What is on Google Cloud now

For the hackathon build, the cloud-side APPI engine is packaged as a Go service designed for Cloud Run.

It includes:

  • WebSocket remote sessions at /ws
  • REST session management at /v1/sessions
  • Firestore persistence through Google Cloud / Firebase infrastructure
  • a cloud orchestration endpoint at /v1/sessions/{sessionId}/agent-turn
  • a deploy script and Cloud Build config in the repo
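
As a concrete illustration, calling the orchestration endpoint could look like this; only the path comes from the list above, while the body and response shapes are assumptions:

// Hypothetical call to the cloud agent-turn endpoint. The engine asks Gemini
// for native APPI commands expressed in the command language from
// AGENT-COMMANDS.md, and returns them to the caller.

async function runAgentTurn(baseUrl: string, sessionId: string, instruction: string) {
  const res = await fetch(`${baseUrl}/v1/sessions/${sessionId}/agent-turn`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ instruction }), // body shape is an assumption
  });
  return res.json(); // expected: native APPI commands to apply to the workspace
}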

This matters for the submission because it gives a direct code path showing:

  • the backend is hosted on Google Cloud
  • the project uses Gemini
  • the agent surface is grounded in an explicit command contract
  • remote collaboration is not just local browser state pretending to be multi-user

This second diagram is important conceptually: it shows why Appi is more than a desktop demo. The user interface can become one client of a runtime that also exists in the cloud.

Apps and command surface

Appi is also unusual because it gives the agent a consistent command surface across many built-in apps instead of scattering tool contracts everywhere.

Current command groups already include:

  • system
  • vision
  • files
  • text
  • notes
  • calendar
  • sheets
  • preview
  • clock
  • calculator
  • messages
  • mail
  • reminder
  • music
  • settings
  • terminal
  • chess

This matters because the agent is not navigating a random jungle of one-off wrappers. It is learning one coherent operating language.
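
To picture that coherence, here are invented examples of a flat group.action grammar spanning the groups above; these specific names are illustrations, not the documented command set:

// Invented examples of a group.action grammar across command groups;
// the point is uniformity, not these exact names.

const exampleCommands = [
  "system.openApp",
  "files.list",
  "notes.update",
  "calendar.createEvent",
  "music.play",
  "chess.move",
  "mail.composeDraft",
] as const;

type ExampleCommand = (typeof exampleCommands)[number]; // union of the names above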

[Screenshots: command palette / command list, Notes app, Files app, Chess app, Music app]

Challenges we ran into

One of the biggest challenges was refusing the easiest path.

The easiest path would have been to build a normal app with a chat panel, connect a few tools, and call that an agent.

Instead, we decided to build around stronger constraints:

  • the UI should not become the source of truth
  • the runtime should remain framework-independent
  • the command surface should stay explicit
  • local and remote modes should share the same host contract
  • collaboration should be treated as a product concern, not only an infrastructure concern

That made the architecture cleaner, but it also made the implementation harder.

Challenge 1: separating product logic from UI logic

A real AI-native environment cannot just store all of its behavior in UI components.

We had to keep the renderer thin and move product logic into runtime and host layers. That is a much better long-term design, but it requires more discipline than shipping everything in a frontend component tree.

Challenge 2: making the agent surface explicit

Another challenge was deciding how much freedom to give the agent.

Giving an agent unlimited access looks impressive in a demo, but it quickly becomes:

  • inconsistent
  • harder to validate
  • harder to secure
  • harder to reason about

So we built a native command surface instead. That reduced ambiguity, but it meant designing command grammar, capability coverage, references, visible/headless modes, and agent-facing documentation.

Challenge 3: context without bloat

A lot of AI agent systems become heavy because they depend on large context assembly loops. We wanted something more lightweight.

That led us to a hybrid model:

  • native workspace state where possible
  • metadata-first file context
  • selective vision when visual confirmation is actually needed

That is technically more interesting than simply taking screenshots forever, but it also means more architecture work up front.
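
A toy sketch of that hybrid selection logic, with invented inputs:

// Hypothetical context-source selection for the hybrid model: prefer native
// workspace state, fall back to metadata, and request vision only when
// visual confirmation is genuinely required.

type ContextSource = "workspace-state" | "file-metadata" | "vision";

function chooseContextSource(q: {
  needsPixels: boolean;       // "does the chart render correctly?"
  aboutFileContents: boolean; // "what kind of file is report.pdf?"
}): ContextSource {
  if (q.needsPixels) return "vision";          // selective, not continuous
  if (q.aboutFileContents) return "file-metadata";
  return "workspace-state";                    // cheapest and most reliable
}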

Challenge 4: collaboration semantics

It is easy to say "collaborative AI." It is harder to define what that means in product terms.

Questions we had to face:

  • what is shared and what is local?
  • how should public/private mode work?
  • how should presence be represented?
  • what should happen when multiple clients observe the same workspace?

These are product questions, not just transport questions, and they remain an active frontier for Appi.
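
One way to pin those questions down is to make the answers explicit in the data model; a hedged sketch with assumed field names:

// Hypothetical presence and visibility model: "shared vs local" and
// "public vs private" become explicit fields rather than implicit UI behavior.

interface Presence {
  clientId: string;
  kind: "human" | "agent";
  visibility: "public" | "private"; // private presence is never broadcast
}

interface WorkspaceView {
  shared: { openApps: string[]; presence: Presence[] };    // synced to every client
  local: { privateWindows: string[]; draftText?: string }; // stays on this client
}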

Challenge 5: local-first vs cloud-native

We wanted Appi to work locally and to have a credible path to a much lighter cloud runtime.

That means balancing:

  • local persistence
  • remote sessions
  • protocol design
  • future server authority
  • headless operation

The result is a system that already demonstrates the core ideas, while still pointing clearly toward the next architecture step.

Accomplishments that we're proud of

We are proud that Appi already feels like a system, not just a prompt demo.

1. We built a real AI-native workspace

Appi already has a desktop shell, apps, windows, presence, local/remote modes, and agent actions. That gives the product a coherent world for the agent to live in.

2. We created a native operating language for the agent

The Appi protocol and command surface are among the most important parts of the project.

They provide:

  • explicit actions
  • predictable behavior
  • app-level consistency
  • a cleaner bridge between LLM reasoning and product execution

This is one of the technical accomplishments we are most proud of.

3. We made multimodal interaction feel product-native

Voice, vision, and workspace actions are not scattered around the app as unrelated features. They reinforce the same central idea: the agent is acting inside the system.

4. We built collaboration into the concept itself

Appi is not only about autonomous agents. It is about human and AI collaboration in the same environment.

That includes:

  • guiding the agent
  • interrupting it
  • observing it
  • sharing workspace state
  • switching between public and private behavior

5. We kept the architecture lightweight

A big part of the Appi thesis is that useful agents should not require a giant stack of expensive glue.

We are proud of the direction we took:

  • lightweight runtime
  • explicit command model
  • prepared context
  • metadata-first file handling
  • cloud architecture that can stay small and practical

6. We already have a credible path to headless agents

Even though it is not fully complete on this branch, the architecture direction is already strong: Appi can evolve from a visible collaborative workspace into a headless cloud runtime with UI clients attached to it.

That opens the door to persistent, always-on, affordable agents.

[Screenshots: local / private mode view; agent collaboration and approval flow]

What we learned

One of the clearest lessons from building Appi is that agent quality is not only about model quality.

It is also about environment quality.

If you want a better agent, you need:

  • better runtime boundaries
  • better state models
  • better commands
  • better context preparation
  • better collaboration models

We learned that "AI-native" has to be architectural

It is easy to call a product AI-native because it uses a modern model.

But in practice, the AI-native part only becomes meaningful when:

  • the system is designed around the agent's operating constraints
  • the UI is not the only source of state
  • the command surface is coherent
  • the environment can exist independently of one visible chat panel

We learned that collaboration is a first-class systems problem

The question "how do humans collaborate with AI?" is still underexplored.

Building Appi made it clear that collaboration needs:

  • shared state
  • private state
  • presence
  • interruption
  • clear visibility rules
  • a stable runtime model

This is not a cosmetic UX issue. It is part of the core system design.

We learned that lighter can be better

There is a strong temptation in agent systems to keep adding more:

  • more tools
  • more wrappers
  • more screenshots
  • more retrieval
  • more orchestration

Appi pushed us toward a different lesson: sometimes the better path is to reduce complexity by designing a cleaner environment for the agent.

We learned that proof of concept can still mean deep infrastructure thinking

Appi is still a proof of concept, but it already forced serious questions about:

  • runtime authority
  • transport contracts
  • shared presence
  • file persistence
  • local vs remote execution
  • headless cloud architecture

In that sense, the prototype is already teaching us how a broader product category might need to be built.

What's next for Appi

Appi is not finished. In many ways, this project is the beginning of a larger direction.

1. Stronger cloud runtime

The next major step is to keep moving from client-side shared state toward a stronger Go-based backend/runtime authority model.

That means:

  • merging the server and runtime concepts more tightly
  • making the server a fuller source of truth
  • improving headless execution
  • making persistent autonomous operation more natural

2. Better integrations

A major part of the roadmap is communication and trigger-based workflows.

That includes:

  • Gmail integration through GCP services
  • OAuth2-based account connection
  • Pub/Sub-driven event ingestion
  • future messaging channels and notifications

The goal is for Appi not only to respond when opened, but to keep working when events arrive.
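
As a sketch of that trigger-based direction (the Pub/Sub push message format is the standard GCP shape; the resulting event name is an invented example, and in practice this would run server-side):

// Hypothetical ingestion of a Gmail watch notification delivered as a
// Pub/Sub push message; it is decoded and turned into a native APPI event.
// Server-side (Node) sketch; "mail.incoming" is an invented event name.

interface PubSubPush {
  message: { data: string; messageId: string }; // data is base64-encoded JSON
  subscription: string;
}

function handleGmailPush(body: PubSubPush): { name: string; payload: unknown } {
  const decoded = JSON.parse(
    Buffer.from(body.message.data, "base64").toString("utf8"),
  );
  // Gmail notifications carry emailAddress and historyId; the runtime can
  // then fetch message details and wake the agent with an event.
  return { name: "mail.incoming", payload: decoded };
}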

3. Persistent autonomous agents

We want Appi to support agents that:

  • keep a workspace alive over time
  • maintain files and memory
  • react to incoming events
  • operate headlessly in the cloud
  • stay affordable enough for real users

4. Multi-user collaboration

The remote session foundations already exist. The next steps are:

  • stronger server authority
  • richer presence propagation
  • stricter sync semantics
  • more robust collaborative workflows

We think this is one of the most important long-term areas for the product. Appi should not only be an environment for one user and one agent. It should become a collaborative workspace for humans and agents together.

5. A common runtime language for agents

One of the most exciting long-term directions is treating the Appi runtime and command surface as a common language for agents.

That could unlock:

  • primary agent + sub-agent collaboration
  • more robust agent-to-agent delegation
  • better testing and observability
  • more uniform automation behavior across applications

6. Better product polish without losing the architecture

We also want to keep improving:

  • app separation
  • UX polish
  • multimodal smoothness
  • onboarding
  • reliability

But the priority remains the same: preserve the architecture that makes Appi different.

Why we think this is technically innovative

From a technical perspective, Appi is innovative for several reasons:

  • it treats the agent as a first-class system actor
  • it introduces a native, bounded operating language instead of relying on ad hoc tool chaos
  • it prepares structured context for the agent instead of depending entirely on expensive perception loops
  • it separates runtime, host, renderer, and transport concerns clearly
  • it provides a path from local interactive use to cloud headless operation
  • it treats collaboration and presence as architectural concerns

In other words, the innovation is not "we called a model API." The innovation is the system shape around the model.

Why we think this is product innovative

From a product perspective, Appi proposes a different category:

  • not chatbot-first
  • not tool-wrapper-first
  • not browser-automation-first
  • not a pure operating-system clone

Instead, it is:

  • agent-native
  • human-collaborative
  • lightweight
  • multimodal
  • local-to-cloud
  • runtime-first

We think that matters because the next generation of AI products will not only need better models. They will need better environments to live in.

[Image: final hero shot of the Appi desktop]

Short closing version

Appi is a proof of concept for a different kind of AI product.

Instead of adding AI to software designed for humans, Appi explores what happens when you design the software for AI from the start.

The result is a lightweight, structured, collaborative operating environment where humans and agents can work together locally, remotely, and eventually headlessly in the cloud.
