Orion

Inspiration

Orion started with a small, daily frustration. Every morning, our commute begins at the Caltrain station, where we book a cab to work. Booking the "best" cab means each of us opening Uber, Lyft, and Waymo, waiting for them to load, typing in the same addresses, comparing fares, and finally agreeing on one - every single day.

We caught ourselves wishing the assistants already living on our phones could just do this for us. The apps are installed. Our accounts are signed in. Our home addresses are saved. The information is all there - the orchestration is missing.

That small wish turned into a bigger question: if our phones know us better than any other device we own, why are agents still stuck on the desktop? Orion is our humble first attempt at an answer.

What it does

Orion does what you do on your phone - by actually using your phone.

It has access to the screen and to gestures, which means it can operate any app you already have, with all the preferences, logins, and personalization you've already set up. There's no API to wire up, no integration to maintain, and no data to migrate to the cloud. The agent simply uses the device the way a person would.

At the core is a quantized Gemma model running fully on-device on a Galaxy S25 Ultra, where it is surprisingly fast for its size. It acts as an agentic orchestrator: perceiving the current screen, deciding the next step, and executing it through gestures, over and over until the task is done.
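
As a rough illustration, the perceive, decide, execute cycle described above can be sketched in Python as follows. This is not Orion's actual code: the `Action` type, the `run_agent` function, the three callbacks, and the `max_steps` safety cap are all hypothetical names and choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the model (hypothetical shape, for illustration)."""
    kind: str           # e.g. "tap", "type", "swipe", or "done"
    argument: str = ""  # target label or text; meaning depends on kind


def run_agent(
    goal: str,
    perceive: Callable[[], str],
    decide: Callable[[str, str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat perceive -> decide -> execute until the model says it is done.

    max_steps is an assumed safety cap so a confused model cannot loop forever.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        screen = perceive()                      # read the current screen
        action = decide(goal, screen, history)   # ask the model for the next step
        history.append(action)
        if action.kind == "done":                # model reports the task finished
            break
        execute(action)                          # perform the gesture on the device
    return history
```

In a real system, `perceive` would read the device screen, `decide` would call the on-device model, and `execute` would dispatch a gesture. Keeping them as plain callbacks makes the loop easy to test with fakes.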

In practice, that means anything from booking a cab across three ride-share apps, to triaging messages, to navigating settings buried six menus deep — all without your data ever leaving the phone.

How we built it

Orion is built as a closed loop between perception, reasoning, and action, all running on-device:

Reasoning. A quantized Gemma model serves as the planner, deciding the next action given the goal and the current screen. We chose a small on-device model deliberately, so nothing about the user's screen ever has to leave the device.
Runtime. We use LiteRT-LM as the on-device inference runtime. As part of this work, we ported Qwen 2.5-VL to run on-device through LiteRT-LM and contributed custom export support for that model family back to the runtime.
Orchestration. A lightweight agent loop ties it all together, with prompts carefully shaped to fit within the tight context budget of a small on-device model.
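
To make the context budget idea concrete, here is a minimal Python sketch of one way to fit a prompt into a fixed token budget: always keep the goal and the current screen, then add past steps from newest to oldest until the budget runs out. The function names, the prompt layout, and the four-characters-per-token estimate are assumptions for illustration, not Orion's actual prompts or tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token).

    An assumption for this sketch; a real system would count with the
    model's own tokenizer.
    """
    return max(1, len(text) // 4)


def build_prompt(goal: str, screen: str, history: List[str], budget: int) -> str:
    """Build a prompt that keeps the newest history steps that fit the budget."""
    header = f"Goal: {goal}\nScreen: {screen}\n"
    footer = "Next action:"
    remaining = budget - estimate_tokens(header) - estimate_tokens(footer)
    if remaining < 0:
        raise ValueError("budget too small for the goal and screen alone")

    kept: List[str] = []
    for step in reversed(history):        # newest steps are the most relevant
        line = f"Did: {step}\n"
        cost = estimate_tokens(line)
        if cost > remaining:
            break                         # older steps are dropped entirely
        kept.append(line)
        remaining -= cost

    # Restore chronological order before assembling the prompt.
    return header + "".join(reversed(kept)) + footer
```

With a generous budget the whole history is kept. With a tight one, only the most recent steps survive, which is one simple way to decide what the model gets to remember.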

The whole system is designed around a simple constraint: the device is the boundary.

Challenges we ran into

The hardest part wasn't the plumbing; it was getting a small model to behave like a careful agent.

Writing prompts and the agentic framework under tight context limits was a constant balancing act. Smaller models are wonderfully private and fast, but they are also less forgiving — every token of context has to earn its place. We spent a lot of time learning how to give the model just enough to act correctly, and no more.

Accomplishments that we're proud of

It actually books the cab. The original frustration that started this project is now handled by Orion, end-to-end, on-device.
Qwen 2.5-VL on the edge. As a side quest, we ported Qwen 2.5-VL to run on-device using LiteRT-LM, and added custom support in LiteRT-LM for exporting models from this family — something we hope is useful to others working on on-device VLMs.
A working on-device agent loop. Perception, reasoning, and action all running locally on a phone, fast enough to feel like a usable tool rather than a demo.

What we learned

Working with tiny models and tight context windows is a craft of its own. Resource constraints shape every design decision: what to perceive, what to remember, what to forget, and how to phrase a prompt so the model can succeed instead of guessing.

We also learned, very concretely, that capability does not scale linearly with size. As models get smaller, accuracy and reasoning fall off in ways that aren't always obvious until you watch the agent in the wild. Carefully curating what the model sees turned out to matter as much as which model we picked.
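
One way to picture "curating what the model sees" is a filter that reduces a screen to a short, numbered list of things the agent could act on. The Python sketch below assumes the screen is available as a list of UI elements with a label and a clickable flag. That representation, the `UiElement` type, and the `max_elements` cap are hypothetical; this write-up does not describe how Orion actually encodes the screen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UiElement:
    """A simplified on-screen element (hypothetical shape, for illustration)."""
    label: str
    clickable: bool
    visible: bool = True


def curate_screen(elements: List[UiElement], max_elements: int = 15) -> str:
    """Keep only visible, clickable, labeled elements, capped at max_elements.

    Each kept element gets a short index the model can refer to in its answer.
    """
    lines: List[str] = []
    for el in elements:
        label = el.label.strip()
        if not (el.visible and el.clickable and label):
            continue                      # skip decoration, hidden, or unlabeled items
        lines.append(f"[{len(lines)}] {label}")
        if len(lines) == max_elements:
            break                         # hard cap protects the context budget
    return "\n".join(lines)
```

The trade-off is deliberate: dropping non-interactive elements saves tokens but also hides text the model might need, such as a displayed fare, so a real filter would likely need task-specific exceptions.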

What's next for Orion

We'd like to open-source Orion and grow it into the OpenClaw for mobile — a community-built, on-device agentic framework that anyone can extend, improve, and trust on their own phone.

There is a lot more to do: better long-horizon planning, richer memory, broader device and app coverage, and stronger safety primitives for an agent that genuinely controls your device. We're excited to keep building, and even more excited to build it with others.