Inspiration

Most AI products today still inherit the same fundamental limitation: they add AI to software that was originally designed only for humans.

[Screenshot: APPI desktop overview with multiple windows open]

That creates a long list of problems:

  • the agent is bolted onto the side instead of being part of the system
  • the interaction model stays trapped inside a chat window
  • the agent depends on brittle tool glue, screenshots, RAG layers, or application-specific integrations
  • every product reinvents a slightly different tool wrapper instead of giving the agent a native environment
  • collaboration between humans and AI is treated as an afterthought rather than a first-class design problem

We found that frustrating.

We wanted to explore a different question:

What if software were designed for AI from the start?

And more specifically:

What if an agent could operate inside its own lightweight environment, and humans could join it, guide it, interrupt it, and collaborate with it directly?

That is the idea behind Appi.

Appi is not a chatbot with access to tools. It is an AI-native operating environment. Instead of exposing an agent to a patchwork of human apps and APIs, Appi gives the agent a structured runtime, native commands, prepared context, and a workspace that humans can share.

This project is also inspired by a practical observation: many current AI agents are too heavy, too fragile, too expensive to run continuously, or too inconsistent in how they understand context. We wanted to move in the opposite direction:

  • lighter runtime
  • more explicit command surface
  • more structured context
  • clearer system boundaries
  • better collaboration between human and agent

Appi is our proof of concept for that direction.

[Image: APPI mascot / welcome screen]

What it does

Appi is an AI-native desktop-like workspace where an agent can perceive, reason, and act across multiple apps, while humans collaborate with it in real time.

In the current prototype, Appi already provides:

  • a desktop shell with windows, a dock, a command palette, and workspace state
  • multiple built-in apps such as Notes, Mail, Files, Music, Chess, Sheets, Preview, Calendar, Reminder, Messages, Terminal, and more
  • voice/live interaction with Gemini
  • visual understanding through desktop capture and live vision streaming
  • a native command layer for agent actions
  • local sessions and remote collaborative sessions
  • public/private presence semantics

This means the agent can do things such as:

  • open and navigate apps
  • move across the workspace through native commands
  • read and update notes and text files
  • work with files and file-backed media
  • control music playback
  • play chess
  • compose mail drafts
  • inspect a spreadsheet
  • interact with the workspace in a way that is visible to the human user

The important part is not only that Appi can perform actions. The important part is how it performs them.

Instead of giving the model a giant unstructured tool menu or asking it to guess its way through a desktop entirely from pixels, Appi provides a native operational surface. The agent acts through explicit commands, and the system prepares relevant context in advance.

That makes the product feel less like "AI watching over your shoulder" and more like "AI working inside a system designed for it."
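
To make that concrete, here is a minimal, hypothetical sketch of a single agent action in this model. The command name, context fields, and mode flag are illustrative assumptions, not APPI's actual surface:

// Hypothetical sketch of one agent action: the system prepares structured
// context first, and the agent responds with an explicit, bounded command.
// All names here are illustrative, not the real APPI command surface.

interface AgentCommand {
  app: string;                    // target app, e.g. "notes"
  action: string;                 // explicit verb, e.g. "update"
  args: Record<string, unknown>;
  mode: "visible" | "headless";   // execution mode is explicit, not implied
}

// Context prepared by the runtime/host before the model acts:
const preparedContext = {
  openApps: ["notes", "music"],
  focusedWindow: "notes:note-42",
  note: { id: "note-42", title: "Groceries", length: 112 },
};

// The action the agent emits in response to "rewrite my groceries note":
const command: AgentCommand = {
  app: "notes",
  action: "update",
  args: { id: "note-42", body: "rewritten note text" },
  mode: "visible",                // the human can watch it happen
};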

Current demoable flows

For the demo, Appi can credibly show:

  • voice-driven navigation across apps
  • opening Music, Notes, and Chess through natural spoken instructions
  • selecting and controlling music playback
  • reading and rewriting notes
  • switching from voice to text while preserving the same workspace context
  • collaborative desktop behavior with remote session foundations and presence

Why this fits the hackathon

Appi fits UI Navigator because the agent can observe the workspace and execute actions inside it.

It also fits Live Agents because the interaction can happen through live voice, interruption, and back-and-forth collaboration.

We deliberately do not position Appi primarily as a Creative Storyteller. The stronger and more honest angle is the AI-native operating environment itself.

[Screenshot: Appi multi-window workspace]

How we built it

Appi is built as a layered system rather than as a single UI with model calls attached to it.

At a high level, the architecture is split into four main layers in the current repo:

  1. Renderer: the Svelte frontend that renders the desktop, windows, apps, overlays, and interactions.

  2. Runtime: a pure JavaScript core that owns state, commands, dialogs, and app behavior.

  3. Local Host: the authority that boots the runtime, persists local state, resolves effective presentation, and exposes a stable host interface to the renderer.

  4. Remote Host / Go Agent Engine: the remote collaboration layer that lets users create or join shared sessions over WebSocket, persist them through Firebase/Firestore, and run cloud-side agent turns on Google Cloud.

Why this architecture matters

The key architectural decision is that the frontend is not supposed to be the source of truth.

The frontend reflects state.

The runtime and host layers own state, command routing, and presentation resolution. That distinction is critical because it is what makes Appi feel like an operating environment rather than a chat UI.
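
As a minimal sketch of that separation (method names are assumptions; the real host interface lives in the repo), the renderer only ever talks through a narrow contract like this:

// Hypothetical host contract: the renderer can request changes, read state,
// and subscribe to events, but it never owns or mutates state directly.

interface HostContract {
  dispatch(commandName: string, args: unknown): Promise<void>;  // request a change
  query<T>(queryName: string, args?: unknown): Promise<T>;      // read state
  onEvent(handler: (name: string, payload: unknown) => void): () => void; // react to changes
  snapshot(): Promise<unknown>;                                 // full workspace view
}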

Native agent protocol

One of the most important technical ideas in Appi is its native command and protocol layer.

Instead of giving the model arbitrary shell access or exposing hundreds of loosely defined tool calls, Appi uses:

  • protocol envelopes for command, query, event, and snapshot
  • a compact native command surface for the agent
  • explicit command grammar
  • explicit visible vs headless execution modes
  • per-app capabilities and references

This makes the agent surface:

  • bounded
  • more secure
  • easier to reason about
  • easier to test
  • more uniform across apps

This is a major product and technical difference from many agent demos that rely on improvised wrappers around existing applications.
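
To make the envelope idea concrete, here is a hedged sketch of what those shapes could look like; field names are assumptions, and the real grammar is documented in AGENT-COMMANDS.md:

// Hypothetical protocol envelopes, following the command/query/event/snapshot
// split described above. Shapes here are illustrative assumptions.

type AppiEnvelope =
  | { kind: "command"; name: string; args: unknown; mode: "visible" | "headless" }
  | { kind: "query"; name: string; args?: unknown }
  | { kind: "event"; name: string; payload: unknown }
  | { kind: "snapshot"; workspace: unknown };

// A bounded, explicit action: the agent cannot improvise outside this surface.
const playTrack: AppiEnvelope = {
  kind: "command",
  name: "music.play",   // invented example following a group.action grammar
  args: { trackId: "track-7" },
  mode: "visible",
};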

Prepared context instead of heavy glue

Another core technical idea is context preparation.

Many agent systems wait until the model is already in session, then try to reconstruct the world through:

  • repeated screenshots
  • large visual payloads
  • complex retrieval layers
  • expensive and late-stage context assembly

Appi takes a different approach.

The runtime, host, and workspace model already know a lot about the environment:

  • open apps
  • windows
  • files
  • notes
  • metadata
  • session mode
  • visible state
  • workspace structure

That means the system can prepare meaningful context before or during interaction in a native, structured way. Vision still matters, but it is not the only way the agent knows what is happening.

This makes Appi lighter, cheaper, and more uniform than systems that depend entirely on screen interpretation or brittle tool chains.
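
A rough sketch of what such prepared, structured context might contain (field names are assumptions):

// Hypothetical prepared-context shape, assembled from state the runtime and
// host already own, so no screenshot round-trip is needed to answer
// "what is open right now?".

interface PreparedContext {
  sessionMode: "local" | "remote";
  openApps: string[];
  windows: { id: string; app: string; focused: boolean }[];
  files: { path: string; kind: string; bytes: number }[]; // metadata first; content on demand
  notes: { id: string; title: string }[];
}

function serializeForModel(ctx: PreparedContext): string {
  // Compact, structured context for the model; vision is layered on top
  // only when visual confirmation is actually needed.
  return JSON.stringify(ctx);
}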

Local-first and remote-capable

Appi already supports:

  • local in-browser sessions
  • remote session creation and joining
  • workspace and presence synchronization through a Go WebSocket server
  • Cloud Run-friendly Go deployment packaging
  • Firebase/Firestore-backed session persistence for remote rooms
  • a Gemini-backed cloud orchestration endpoint grounded in the native APPI command surface

In the current branch, the Go service already does more than simple room fanout:

  • it keeps the current WebSocket collaboration path for the UI
  • it can persist remote session state in Firestore
  • it exposes HTTP endpoints for session management
  • it can call Gemini in the cloud and ask for native APPI commands using the same command language documented in AGENT-COMMANDS.md
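
For illustration, a client-side flow over that surface might look like the sketch below. The /v1/sessions and /ws paths match the service surface described later in this writeup; the request and message shapes are assumptions:

// Hypothetical client flow: create a remote session over HTTP, then attach
// to the WebSocket fanout for workspace and presence sync.
// Payload shapes and the session query parameter are assumptions.

async function joinRemoteSession(baseUrl: string): Promise<WebSocket> {
  const res = await fetch(`${baseUrl}/v1/sessions`, { method: "POST" });
  const { sessionId } = (await res.json()) as { sessionId: string };

  const wsUrl = `${baseUrl.replace(/^http/, "ws")}/ws?session=${sessionId}`;
  const ws = new WebSocket(wsUrl);
  ws.onmessage = (msg) => {
    // Envelopes from other clients (and cloud agent turns) arrive here.
    console.log("remote envelope:", msg.data);
  };
  return ws;
}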

The longer-term architecture, which also exists in ongoing Go work outside this branch, moves toward a much stronger model:

  • a Go backend reflecting the operating environment state
  • a headless runtime controlled by the agent
  • a frontend that acts as a projection of that runtime rather than as the primary host

That architecture is important because it means Appi does not conceptually need the visible UI in order to exist.

Current architecture diagram

flowchart LR
    Human["Human User"]
    Frontend["APPI Frontend\nSvelte Desktop UI"]
    Host["Host Contract\ncommand / query / event / snapshot"]
    LocalHost["Local Host\nstate persistence + presentation"]
    Runtime["APPI Runtime\napps + commands + workspace state"]
    Gemini["Gemini\nlive voice + multimodal reasoning"]
    Vision["Vision Capture\nscreenshot + live stream"]
    GoServer["Go Agent Engine\nWebSocket + HTTP"]
    RemoteClients["Other APPI Clients"]
    Files["Local Files\nOPFS + metadata index"]
    Firestore["Firebase / Firestore\nremote session state"]

    Human --> Frontend
    Frontend --> Host
    Host --> LocalHost
    LocalHost --> Runtime
    Runtime --> Files
    Frontend --> Vision
    Vision --> Gemini
    Frontend --> Gemini
    LocalHost --> GoServer
    GoServer --> Firestore
    GoServer --> RemoteClients

[Diagram: current local and remote flow]

Target cloud architecture

The broader Appi direction extends the same model into a cloud-native, headless deployment shape.

This is the intended architecture:

  • a lightweight Go backend deployed on Cloud Run
  • Firebase / Firestore for remote session state and reconnectable workspaces
  • Cloud Storage for file persistence
  • WebSocket fanout for shared live state
  • HTTP surfaces where needed for integrations and control
  • a headless runtime that can keep working even when no desktop UI is open
  • the UI acting as a connected client that reflects and interacts with the live runtime
  • a cloud orchestration turn that reads the native command contract from AGENT-COMMANDS.md

flowchart LR
    Human["Human User"]
    WebUI["APPI UI Client\nDesktop / Web"]
    Agent["Appi Agent Runtime\nHeadless / Autonomous"]
    GoBackend["Go Agent Engine\nCloud Run"]
    WS["Realtime Transport\nWebSocket"]
    Firestore["Firebase / Firestore\nworkspace/session data"]
    Storage["Cloud Storage\nfiles and assets"]
    Gemini["Gemini\nLive + Cloud turns"]
    Commands["AGENT-COMMANDS.md\nnative command contract"]
    Gmail["Gmail via GCP OAuth2 + Pub/Sub"]
    Future["Future Channels\nTelegram / SMS / others"]

    Human --> WebUI
    WebUI --> WS
    WS --> GoBackend
    GoBackend --> Agent
    Agent --> Firestore
    Agent --> Storage
    Agent --> Gemini
    GoBackend --> Commands
    GoBackend --> Gmail
    GoBackend --> Future

What is on Google Cloud now

For the hackathon build, the cloud-side APPI engine is packaged as a Go service designed for Cloud Run.

It includes:

  • WebSocket remote sessions at /ws
  • REST session management at /v1/sessions
  • Firestore persistence through Google Cloud / Firebase infrastructure
  • a cloud orchestration endpoint at /v1/sessions/{sessionId}/agent-turn
  • a deploy script and Cloud Build config in the repo
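
As a concrete illustration, calling the orchestration endpoint could look like this; only the path comes from the list above, while the body and response shapes are assumptions:

// Hypothetical call to the cloud agent-turn endpoint. The engine asks Gemini
// for native APPI commands expressed in the command language from
// AGENT-COMMANDS.md, and returns them to the caller.

async function runAgentTurn(baseUrl: string, sessionId: string, instruction: string) {
  const res = await fetch(`${baseUrl}/v1/sessions/${sessionId}/agent-turn`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ instruction }), // body shape is an assumption
  });
  return res.json(); // expected: native APPI commands to apply to the workspace
}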

This matters for the submission because it gives a direct code path showing:

  • the backend is hosted on Google Cloud
  • the project uses Gemini
  • the agent surface is grounded in an explicit command contract
  • remote collaboration is not just local browser state pretending to be multi-user

This second diagram is important conceptually: it shows why Appi is more than a desktop demo. The user interface can become one client of a runtime that also exists in the cloud.

Apps and command surface

Appi is also unusual because it gives the agent a consistent command surface across many built-in apps instead of scattering tool contracts everywhere.

Current command groups already include:

  • system
  • vision
  • files
  • text
  • notes
  • calendar
  • sheets
  • preview
  • clock
  • calculator
  • messages
  • mail
  • reminder
  • music
  • settings
  • terminal
  • chess

This matters because the agent is not navigating a random jungle of one-off wrappers. It is learning one coherent operating language.
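
To picture that coherence, here are invented examples of a flat group.action grammar spanning the groups above; these specific names are illustrations, not the documented command set:

// Invented examples of a group.action grammar across command groups;
// the point is uniformity, not these exact names.

const exampleCommands = [
  "system.openApp",
  "files.list",
  "notes.update",
  "calendar.createEvent",
  "music.play",
  "chess.move",
  "mail.composeDraft",
] as const;

type ExampleCommand = (typeof exampleCommands)[number]; // union of the names above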

[Screenshots: command palette / command list, Notes app, Files app, Chess app, Music app]

Challenges we ran into

One of the biggest challenges was refusing the easiest path.

The easiest path would have been to build a normal app with a chat panel, connect a few tools, and call that an agent.

Instead, we decided to build around stronger constraints:

  • the UI should not become the source of truth
  • the runtime should remain framework-independent
  • the command surface should stay explicit
  • local and remote modes should share the same host contract
  • collaboration should be treated as a product concern, not only an infrastructure concern

That made the architecture cleaner, but it also made the implementation harder.

Challenge 1: separating product logic from UI logic

A real AI-native environment cannot just store all of its behavior in UI components.

We had to keep the renderer thin and move product logic into runtime and host layers. That is a much better long-term design, but it requires more discipline than shipping everything in a frontend component tree.

Challenge 2: making the agent surface explicit

Another challenge was deciding how much freedom to give the agent.

Giving an agent unlimited access looks impressive in a demo, but it quickly becomes:

  • inconsistent
  • harder to validate
  • harder to secure
  • harder to reason about

So we built a native command surface instead. That reduced ambiguity, but it meant designing command grammar, capability coverage, references, visible/headless modes, and agent-facing documentation.

Challenge 3: context without bloat

A lot of AI agent systems become heavy because they depend on large context assembly loops. We wanted something more lightweight.

That led us to a hybrid model:

  • native workspace state where possible
  • metadata-first file context
  • selective vision when visual confirmation is actually needed

That is technically more interesting than simply taking screenshots forever, but it also means more architecture work up front.
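
A toy sketch of that hybrid selection logic, with invented inputs:

// Hypothetical context-source selection for the hybrid model: prefer native
// workspace state, fall back to metadata, and request vision only when
// visual confirmation is genuinely required.

type ContextSource = "workspace-state" | "file-metadata" | "vision";

function chooseContextSource(q: {
  needsPixels: boolean;       // "does the chart render correctly?"
  aboutFileContents: boolean; // "what kind of file is report.pdf?"
}): ContextSource {
  if (q.needsPixels) return "vision";          // selective, not continuous
  if (q.aboutFileContents) return "file-metadata";
  return "workspace-state";                    // cheapest and most reliable
}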

Challenge 4: collaboration semantics

It is easy to say "collaborative AI." It is harder to define what that means in product terms.

Questions we had to face:

  • what is shared and what is local?
  • how should public/private mode work?
  • how should presence be represented?
  • what should happen when multiple clients observe the same workspace?

These are product questions, not just transport questions, and they remain an active frontier for Appi.
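
One way to pin those questions down is to make the answers explicit in the data model; a hedged sketch with assumed field names:

// Hypothetical presence and visibility model: "shared vs local" and
// "public vs private" become explicit fields rather than implicit UI behavior.

interface Presence {
  clientId: string;
  kind: "human" | "agent";
  visibility: "public" | "private"; // private presence is never broadcast
}

interface WorkspaceView {
  shared: { openApps: string[]; presence: Presence[] };    // synced to every client
  local: { privateWindows: string[]; draftText?: string }; // stays on this client
}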

Challenge 5: local-first vs cloud-native

We wanted Appi to work locally and to have a credible path to a much lighter cloud runtime.

That means balancing:

  • local persistence
  • remote sessions
  • protocol design
  • future server authority
  • headless operation

The result is a system that already demonstrates the core ideas, while still pointing clearly toward the next architecture step.

Accomplishments that we're proud of

We are proud that Appi already feels like a system, not just a prompt demo.

1. We built a real AI-native workspace

Appi already has a desktop shell, apps, windows, presence, local/remote modes, and agent actions. That gives the product a coherent world for the agent to live in.

2. We created a native operating language for the agent

The Appi protocol and command surface are among the most important parts of the project.

They provide:

  • explicit actions
  • predictable behavior
  • app-level consistency
  • a cleaner bridge between LLM reasoning and product execution

This is one of the technical accomplishments we are most proud of.

3. We made multimodal interaction feel product-native

Voice, vision, and workspace actions are not scattered around the app as unrelated features. They reinforce the same central idea: the agent is acting inside the system.

4. We built collaboration into the concept itself

Appi is not only about autonomous agents. It is about human and AI collaboration in the same environment.

That includes:

  • guiding the agent
  • interrupting it
  • observing it
  • sharing workspace state
  • switching between public and private behavior

5. We kept the architecture lightweight

A big part of the Appi thesis is that useful agents should not require a giant stack of expensive glue.

We are proud of the direction we took:

  • lightweight runtime
  • explicit command model
  • prepared context
  • metadata-first file handling
  • cloud architecture that can stay small and practical

6. We already have a credible path to headless agents

Even though it is not fully complete on this branch, the architecture direction is already strong: Appi can evolve from a visible collaborative workspace into a headless cloud runtime with UI clients attached to it.

That opens the door to persistent, always-on, affordable agents.

[Screenshots: local / private mode view; agent collaboration and approval flow]

What we learned

One of the clearest lessons from building Appi is that agent quality is not only about model quality.

It is also about environment quality.

If you want a better agent, you need:

  • better runtime boundaries
  • better state models
  • better commands
  • better context preparation
  • better collaboration models

We learned that "AI-native" has to be architectural

It is easy to call a product AI-native because it uses a modern model.

But in practice, the AI-native part only becomes meaningful when:

  • the system is designed around the agent's operating constraints
  • the UI is not the only source of state
  • the command surface is coherent
  • the environment can exist independently of one visible chat panel

We learned that collaboration is a first-class systems problem

The question "how do humans collaborate with AI?" is still underexplored.

Building Appi made it clear that collaboration needs:

  • shared state
  • private state
  • presence
  • interruption
  • clear visibility rules
  • a stable runtime model

This is not a cosmetic UX issue. It is part of the core system design.

We learned that lighter can be better

There is a strong temptation in agent systems to keep adding more:

  • more tools
  • more wrappers
  • more screenshots
  • more retrieval
  • more orchestration

Appi pushed us toward a different lesson: sometimes the better path is to reduce complexity by designing a cleaner environment for the agent.

We learned that proof of concept can still mean deep infrastructure thinking

Appi is still a proof of concept, but it already forced serious questions about:

  • runtime authority
  • transport contracts
  • shared presence
  • file persistence
  • local vs remote execution
  • headless cloud architecture

In that sense, the prototype is already teaching us how a broader product category might need to be built.

What's next for Appi

Appi is not finished. In many ways, this project is the beginning of a larger direction.

1. Stronger cloud runtime

The next major step is to keep moving from client-side shared state toward a stronger Go-based backend/runtime authority model.

That means:

  • merging the server and runtime concepts more tightly
  • making the server a fuller source of truth
  • improving headless execution
  • making persistent autonomous operation more natural

2. Better integrations

A major part of the roadmap is communication and trigger-based workflows.

That includes:

  • Gmail integration through GCP services
  • OAuth2-based account connection
  • Pub/Sub-driven event ingestion
  • future messaging channels and notifications

The goal is for Appi not only to respond when opened, but to keep working when events arrive.
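
As a sketch of that trigger-based direction (the Pub/Sub push message format is the standard GCP shape; the resulting event name is an invented example, and in practice this would run server-side):

// Hypothetical ingestion of a Gmail watch notification delivered as a
// Pub/Sub push message; it is decoded and turned into a native APPI event.
// Server-side (Node) sketch; "mail.incoming" is an invented event name.

interface PubSubPush {
  message: { data: string; messageId: string }; // data is base64-encoded JSON
  subscription: string;
}

function handleGmailPush(body: PubSubPush): { name: string; payload: unknown } {
  const decoded = JSON.parse(
    Buffer.from(body.message.data, "base64").toString("utf8"),
  );
  // Gmail notifications carry emailAddress and historyId; the runtime can
  // then fetch message details and wake the agent with an event.
  return { name: "mail.incoming", payload: decoded };
}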

3. Persistent autonomous agents

We want Appi to support agents that:

  • keep a workspace alive over time
  • maintain files and memory
  • react to incoming events
  • operate headlessly in the cloud
  • stay affordable enough for real users

4. Multi-user collaboration

The remote session foundations already exist. The next steps are:

  • stronger server authority
  • richer presence propagation
  • stricter sync semantics
  • more robust collaborative workflows

We think this is one of the most important long-term areas for the product. Appi should not only be an environment for one user and one agent. It should become a collaborative workspace for humans and agents together.

5. A common runtime language for agents

One of the most exciting long-term directions is treating the Appi runtime and command surface as a common language for agents.

That could unlock:

  • primary agent + sub-agent collaboration
  • more robust agent-to-agent delegation
  • better testing and observability
  • more uniform automation behavior across applications

6. Better product polish without losing the architecture

We also want to keep improving:

  • app separation
  • UX polish
  • multimodal smoothness
  • onboarding
  • reliability

But the priority remains the same: preserve the architecture that makes Appi different.

Why we think this is technically innovative

From a technical perspective, Appi is innovative for several reasons:

  • it treats the agent as a first-class system actor
  • it introduces a native, bounded operating language instead of relying on ad hoc tool chaos
  • it prepares structured context for the agent instead of depending entirely on expensive perception loops
  • it separates runtime, host, renderer, and transport concerns clearly
  • it provides a path from local interactive use to cloud headless operation
  • it treats collaboration and presence as architectural concerns

In other words, the innovation is not "we called a model API." The innovation is the system shape around the model.

Why we think this is product innovative

From a product perspective, Appi proposes a different category:

  • not chatbot-first
  • not tool-wrapper-first
  • not browser-automation-first
  • not a pure operating-system clone

Instead, it is:

  • agent-native
  • human-collaborative
  • lightweight
  • multimodal
  • local-to-cloud
  • runtime-first

We think that matters because the next generation of AI products will not only need better models. They will need better environments to live in.

[Image: final hero shot of the Appi desktop]

Short closing version

Appi is a proof of concept for a different kind of AI product.

Instead of adding AI to software designed for humans, Appi explores what happens when you design the software for AI from the start.

The result is a lightweight, structured, collaborative operating environment where humans and agents can work together locally, remotely, and eventually headlessly in the cloud.
