Inspiration

Genie started with a question I couldn't stop thinking about: Why does an AI assistant become useless the moment the internet disappears?

The idea came to me while I was experimenting with different AI tools. They were impressive when everything worked perfectly, but the second the connection became unstable, most of them simply stopped functioning. It felt strange that technology marketed as intelligent was completely dependent on being connected to a server somewhere else.

The more I looked into it, the more I realized this wasn't just my experience. Not everyone has fast, reliable internet all the time. Students may run out of data, travelers may lose connection, and people in rural communities often deal with limited coverage. Yet most modern AI systems are designed as if constant internet access is guaranteed.

I was also concerned about privacy. Many AI assistants rely on cloud processing, which means user requests, conversations, and documents are sent to external servers. While that approach enables powerful features, it also means users have less control over where their information goes and how it is handled.

That led me to a simple challenge: could an AI assistant work entirely on the device itself?

Genie is my answer to that question. It is a privacy-first autonomous AI assistant that runs locally on a phone without depending on cloud infrastructure. By using optimized on-device AI models, Genie can understand requests, analyze documents, perform actions, and assist users even when there is no internet connection available.

My goal with Genie wasn't just to build another AI assistant. I wanted to explore a future where advanced AI remains available regardless of connectivity and where users keep control of their own data. At its core, Genie is built on a simple belief: technology should work for people when they need it most, not only when they have a strong internet connection.

What it does

Most AI assistants today are essentially a single chat interface connected to the cloud. Genie was designed differently. It consists of nine specialized profiles running locally on-device and is powered by Google's Gemma 4 through LiteRT-LM. The goal was to combine local AI reasoning with direct operating-system interaction while maintaining user privacy.

One of the core capabilities of Genie is device autonomy. Rather than only generating text responses, Genie can observe the screen, identify interface elements, and execute actions within applications. For example, if a user asks Genie to send a WhatsApp message, it can navigate the interface, locate the correct controls, and complete the task locally without depending on cloud infrastructure.

Healthcare was one of the first use cases I explored. The Scribe Profile functions as an offline medical documentation assistant. During a patient consultation, Genie can process natural conversation, extract relevant information, and structure it into clinical records in real time. This reduces the need for manual note-taking and helps maintain continuity of information.

Memory is another important component of the system. Genie stores user-specific information locally rather than relying on external servers. For example, if a user tells Genie that they have a severe peanut allergy, that information becomes part of their local profile and can be referenced when relevant.

This memory system works alongside Genie’s multimodal Vision Profile. If the same user points their camera at a restaurant menu, Genie can perform OCR on the text, compare detected ingredients against stored allergy information, and identify potential risks before a meal is ordered.
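A minimal sketch of the menu check, assuming OCR has already turned the menu into plain text and that each stored allergy carries a few single-word synonyms. `AllergyProfile`, `findAllergenRisks`, and the synonym lists are hypothetical illustrations, not Genie's actual code:

```kotlin
// Hypothetical stand-in for an allergy stored in Genie's local memory.
data class AllergyProfile(val allergen: String, val synonyms: Set<String>)

// Returns the stored allergens whose name or synonym appears as a whole word
// in the OCR'd menu text. Matching is case-insensitive and English-only.
fun findAllergenRisks(ocrText: String, allergies: List<AllergyProfile>): List<String> {
    // Split into lowercase words so a term only matches as a whole word.
    val words = ocrText.lowercase()
        .split(Regex("[^a-z]+"))
        .filter { it.isNotEmpty() }
        .toSet()
    return allergies
        .filter { profile -> (profile.synonyms + profile.allergen).any { it.lowercase() in words } }
        .map { it.allergen }
}
```

Matching against a synonym list matters because menus rarely say "peanut" outright; a dish such as satay is the kind of case this is meant to catch.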

The Vision Profile is also useful for digital safety. When viewing products on online marketplaces such as Jumia, Genie can analyze information visible on the screen and explain potential warning signs that may indicate counterfeit products or misleading listings.

Education was another area where I wanted Genie to be practical. Users can ask Genie to locate a specific document on their device, process its contents, and generate quizzes or study sessions automatically. This allows students to interact with learning materials directly instead of manually searching through folders and applications.

Genie also includes a Teaching Profile that uses an interactive whiteboard interface. Rather than responding only with text, it can draw diagrams, timelines, and visual explanations while narrating concepts step by step. This creates a more interactive learning experience, particularly for complex topics.

For healthcare-related questions, Genie's Health Profile is grounded in locally stored WHO fact sheets. Responses are generated from those verified sources rather than relying solely on model-generated knowledge, reducing the likelihood of inaccurate medical information.

Accessibility was another major design consideration. User preferences are treated as part of the system's adaptive behavior. For example, if a user with tremors indicates difficulty performing double-tap gestures, Genie can adjust how it interacts with the user interface to accommodate those limitations.

Because Genie has the ability to perform actions on behalf of a user, safety mechanisms were integrated throughout the system. Low-risk tasks such as document summarization or information retrieval can be executed immediately. High-risk actions, including financial transactions or critical system modifications, trigger a Dynamic Risk Assessment layer that pauses execution and requires biometric authentication through face unlock or fingerprint verification before proceeding.

Genie demonstrates that advanced agentic AI can operate locally on consumer devices while maintaining privacy, accessibility, and user control. The project explores an alternative approach to AI systems by showing that autonomy, multimodal reasoning, memory, and operating-system interaction can be delivered without requiring continuous cloud connectivity.

How we built it

Genie is a native Android application built entirely around local inference. We run Google's Gemma 4 (specifically the E2B, or "effective 2B-parameter", variant, a 2.4 GB model file) directly on the device's mobile GPU.

Because we refused to rely on cloud computation, every piece of our architecture had to be custom-engineered to manage memory constraints, conserve battery, and handle physical UI interactions safely. We built Genie across a 7-layer architecture:

  1. The Voice Pipeline: Two Engines, One Microphone

Most voice assistants use a single speech engine. We use two, because of power and microphone access. Running Android's full-sentence SpeechRecognizer 24/7 drains the battery and holds the microphone. Instead, we use Vosk, a lightweight offline speech-recognition library. It runs continuously in the background at 16 kHz, listening for exactly one word: "Gemma." It uses about 30 MB of RAM, responds with low latency, and hands off to the heavier STT engine only after it hears the wake word.
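```kotlin
import org.json.JSONObject

// Vosk wake-word detection (method of the class implementing Vosk's RecognitionListener).
// Partial results arrive as JSON, e.g. {"partial": "hey gemma"}.
override fun onPartialResult(hypothesis: String?) {
    if (hypothesis == null) return
    val partial = JSONObject(hypothesis).optString("partial", "")
    if (partial.lowercase().contains("gemma")) {
        speechService?.stop()            // Stop Vosk and release the microphone
        setUiState(AgentUIState.Waking)  // Show the overlay
        startSttListening()              // Start full speech-to-text
    }
}
```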

  2. Intercepted Execution (The Inference Bridge)

We load Gemma 4 using Google's LiteRT-LM SDK. Our most important architectural decision, however, comes down to a single parameter: automaticToolCalling = false.

By default, the SDK executes a model's tool calls automatically. That's fine for a chatbot, but catastrophic for an agent that can tap "Send $500" in a banking app. Disabling it forces every tool call through a custom Kotlin callbackFlow bridge, where we intercept the call, validate it, and, for high-risk actions, require biometric authentication before it ever touches the OS.

```kotlin
val newConversation = engine.createConversation(
    ConversationConfig(
        samplerConfig = agentSamplerConfig(),
        systemInstruction = PromptFormatting.buildSystemInstruction(systemPrompt),
        tools = tools,
        automaticToolCalling = false, // ← This is everything
    )
)
```
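To make the interception concrete, here is a minimal, stdlib-only sketch of the gate a tool call passes through once automatic execution is off. It assumes a set of registered tool names, a risk check, and an authorization callback; `ToolCall`, `GateDecision`, and `gateToolCall` are hypothetical names, and Genie's real bridge runs inside a `callbackFlow` rather than a plain function:

```kotlin
// Hypothetical shape of a tool call emitted by the model.
data class ToolCall(val name: String, val args: Map<String, String>)

sealed class GateDecision {
    object Execute : GateDecision()
    data class Reject(val reason: String) : GateDecision()
}

// Every model-requested call goes through this function before any OS action.
fun gateToolCall(
    call: ToolCall,
    registeredTools: Set<String>,
    isHighRisk: (ToolCall) -> Boolean,
    authorize: () -> Boolean, // e.g. a biometric prompt; true if the user approved
): GateDecision = when {
    // 1. Validate: unknown tools never reach the OS.
    call.name !in registeredTools -> GateDecision.Reject("unknown tool: ${call.name}")
    // 2. Authorize: high-risk calls need explicit approval (authorize() is only called here).
    isHighRisk(call) && !authorize() -> GateDecision.Reject("user denied authorization")
    // 3. Otherwise the call may execute.
    else -> GateDecision.Execute
}
```

Because `&&` short-circuits, low-risk calls never trigger the authorization prompt, which keeps routine steps fast.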

  3. The Agent Loop & The Sliding Window

A chatbot answers in a single pass. Genie works in a loop, the way a person would: Observe → Plan → Act → Evaluate → Repeat. Because the model has a limited context window, we built a Sliding Window Manager. It keeps the user's primary goal visible, tracks the last 9 OS actions, and prunes transient errors once they are resolved. If a tap fails twice because a UI is slow to render but succeeds on the third try, our code removes the failures from history so the model doesn't get confused.

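```kotlin
// After a successful tool call, remove the transient failures of the same tool
// that immediately preceded it, so the model never reasons over stale errors.
fun pruneAfterSuccess(history: MutableList<HistoryEntry>) {
    val lastEntry = history.lastOrNull() as? HistoryEntry.ToolResult ?: return
    if (lastEntry.outcome !is ToolOutcome.Ok) return

    val successToolName = lastEntry.toolName
    // Walk backwards from the entry just before the success.
    var index = history.size - 2
    while (index >= 0) {
        val entry = history[index]
        if (entry is HistoryEntry.ToolResult &&
            entry.toolName == successToolName &&
            entry.outcome is ToolOutcome.TransientErr
        ) {
            history.removeAt(index) // safe: indices below `index` are unaffected
        } else {
            break // stop at the first entry that isn't a matching transient failure
        }
        index--
    }
}
```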
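The pruning function handles one half of the manager. The other half is the window itself. A minimal sketch, assuming history entries can be reduced to strings and using the numbers from the text (goal pinned, last 9 actions kept). `SlidingWindow` and its methods are hypothetical names:

```kotlin
// Keeps the user's goal pinned and only the most recent actions for the prompt.
class SlidingWindow(private val goal: String, private val maxActions: Int = 9) {
    private val actions = ArrayDeque<String>()

    fun record(action: String) {
        actions.addLast(action)
        // Drop the oldest actions once the window is full.
        while (actions.size > maxActions) actions.removeFirst()
    }

    // What gets sent to the model each turn: goal first, then recent actions.
    fun promptContext(): List<String> = listOf("GOAL: $goal") + actions
}
```

Pinning the goal at the head of the context means it can never be evicted, no matter how many steps a task takes.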

  4. The Safety Net: Dynamic Risk Assessment & Biometric Human-in-the-Loop (HITL)

Static safety rules are crude. Opening Settings is safe; opening PayPal and tapping "Send" is not, yet both use the same click tool. Genie's RiskAssessor evaluates the screen context in real time. If it detects two or more independent high-risk signals (e.g., a currency symbol plus a destructive verb like "Pay" or "Send"), it freezes the agent.

It then launches a transparent activity (Theme.Translucent.NoTitleBar) to show Android's native BiometricPrompt. The agent cannot proceed until the user authorizes the action with a fingerprint or face unlock.
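A minimal sketch of the two-signal rule, assuming the assessor sees the foreground package, the visible screen text, and the label of the element about to be tapped. The signal vocabularies, the placeholder package name, and the `countRiskSignals` / `requiresBiometric` names are illustrative assumptions, not Genie's actual heuristics:

```kotlin
data class ScreenContext(val packageName: String, val visibleText: String, val targetLabel: String)

// Hypothetical signal vocabularies; a real assessor would use larger, localized lists.
val currencySymbols = setOf('$', '€', '£', '₦')
val destructiveVerbs = setOf("pay", "send", "transfer", "delete", "purchase")
val sensitivePackages = setOf("com.example.wallet") // placeholder package name

fun countRiskSignals(ctx: ScreenContext): Int {
    var signals = 0
    // Signal 1: a currency symbol anywhere on screen.
    if (ctx.visibleText.any { it in currencySymbols }) signals++
    // Signal 2: the element being tapped carries a destructive verb.
    if (ctx.targetLabel.lowercase().split(" ").any { it in destructiveVerbs }) signals++
    // Signal 3: the foreground app is on a sensitive-app list.
    if (ctx.packageName in sensitivePackages) signals++
    return signals
}

// Two or more independent signals freeze the agent until biometric approval.
fun requiresBiometric(ctx: ScreenContext): Boolean = countRiskSignals(ctx) >= 2
```

Requiring two signals rather than one is what lets a "Send" button in a notes app, or a price shown on a product page, pass without interrupting the user.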

  5. Error Taxonomy & Circuit Breakers

Not all errors are equal. Genie classifies every failure into one of four tiers:

  • TransientErr (e.g., the UI is still loading): we retry with exponential backoff.
  • LogicErr (e.g., the agent hallucinated a tool): the error is fed into history so the model can self-correct.
  • AuthErr (e.g., the user denied biometrics): hard stop.
  • FatalErr (e.g., the engine ran out of memory): hard stop.

We also built two circuit breakers: five consecutive failures of any kind, or three requests for the same non-existent tool, immediately abort the loop so the agent cannot spin indefinitely.
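A minimal sketch of the taxonomy and both circuit breakers, using the thresholds stated above. The `Ok` outcome mirrors the `ToolOutcome.Ok` used in the pruning code; the message fields, the backoff base (200 ms) and cap (5 s), and the `CircuitBreaker` class are assumptions for illustration:

```kotlin
// The four error tiers plus success; payload fields are illustrative.
sealed class ToolOutcome {
    object Ok : ToolOutcome()
    data class TransientErr(val message: String) : ToolOutcome()
    data class LogicErr(val message: String) : ToolOutcome()
    data class AuthErr(val message: String) : ToolOutcome()
    data class FatalErr(val message: String) : ToolOutcome()
}

// Delay before retry number `attempt` (0-based): 200 ms, 400 ms, 800 ms, ... capped.
fun backoffMillis(attempt: Int, baseMillis: Long = 200, capMillis: Long = 5_000): Long =
    minOf(capMillis, baseMillis shl attempt.coerceAtMost(20))

// Trips after 5 consecutive failures or 3 requests for the same unknown tool.
class CircuitBreaker(
    private val maxConsecutiveFailures: Int = 5,
    private val maxUnknownToolRepeats: Int = 3,
) {
    private var consecutiveFailures = 0
    private val unknownToolCounts = mutableMapOf<String, Int>()

    /** Records one step; returns true if the agent loop must stop now. */
    fun shouldAbort(toolName: String, outcome: ToolOutcome, toolExists: Boolean): Boolean {
        // AuthErr and FatalErr are hard stops regardless of counters.
        if (outcome is ToolOutcome.AuthErr || outcome is ToolOutcome.FatalErr) return true
        if (outcome is ToolOutcome.Ok) {
            consecutiveFailures = 0
            return false
        }
        consecutiveFailures++
        if (!toolExists) {
            val count = (unknownToolCounts[toolName] ?: 0) + 1
            unknownToolCounts[toolName] = count
            if (count >= maxUnknownToolRepeats) return true
        }
        return consecutiveFailures >= maxConsecutiveFailures
    }
}
```

Capping the shift amount keeps the backoff arithmetic from overflowing if a retry counter ever runs long.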

  6. Zero-Cloud Self-Improvement (The Skill Cache)

Every time Genie completes a novel task, it serializes the successful plan and stores it in a local Room database:

```kotlin
import androidx.room.Entity
import androidx.room.PrimaryKey

@Entity(tableName = "skills")
data class Skill(
    @PrimaryKey(autoGenerate = true) val id: Int = 0,
    val goalPattern: String,   // pattern used to match future requests
    val planJson: String,      // serialized plan that completed the task
    val successCount: Int = 0,
    val createdAt: Long = System.currentTimeMillis(),
)
```

The next time you ask for something similar, Genie searches the local cache. If it finds a match, it replays the cached plan step by step without invoking the LLM, so repeated tasks skip heavy inference entirely. As the cache grows, more requests can take this fast path.
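A minimal in-memory sketch of the lookup-and-replay path, assuming goals are normalized to a lowercase, punctuation-free pattern before matching. A plain `MutableMap` stands in for the Room DAO, and `CachedSkill`, `normalizeGoal`, `SkillCache`, and the plan-as-list-of-strings format are hypothetical:

```kotlin
// Stand-in for the Room entity, with the plan already deserialized.
data class CachedSkill(val goalPattern: String, val plan: List<String>, var successCount: Int = 0)

// Hypothetical normalization: lowercase, replace punctuation with spaces, collapse whitespace.
fun normalizeGoal(goal: String): String =
    goal.lowercase()
        .replace(Regex("[^a-z0-9 ]"), " ")
        .trim()
        .split(Regex("\\s+"))
        .joinToString(" ")

class SkillCache {
    private val skills = mutableMapOf<String, CachedSkill>()

    // Called after a novel task succeeds.
    fun store(goal: String, plan: List<String>) {
        val key = normalizeGoal(goal)
        val skill = skills.getOrPut(key) { CachedSkill(key, plan) }
        skill.successCount += 1
    }

    // Returns the cached plan to replay step by step, or null to fall back to the LLM.
    fun lookup(goal: String): List<String>? = skills[normalizeGoal(goal)]?.plan
}
```

This kind of exact-pattern matching is cheap but brittle: "Turn off the internet" will not find a plan stored for "Disable Wi-Fi".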

  7. The Hands & The Profile System

We exposed 53 custom tools, grouped into 6 families, to the model through Android's AccessibilityService. Every gesture is a Kotlin suspend function that wraps dispatchGesture() in a coroutine.

Finally, because not every task requires heavy autonomous planning, we built 9 specialized profiles. For deterministic tasks, such as the Scribe profile transcribing a doctor's audio or the Health profile pulling WHO fact sheets, Genie bypasses the orchestrator entirely to minimize latency and hallucination risk.
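A minimal sketch of that routing decision, assuming each request arrives already tagged with a profile. The enum is deliberately partial (only Scribe and Health are named as bypassing the orchestrator), and `AUTONOMOUS`, `Route`, and `route` are hypothetical names:

```kotlin
// Partial, hypothetical profile list; Genie has nine profiles in total.
enum class Profile { SCRIBE, HEALTH, AUTONOMOUS }

// Profiles whose work is deterministic skip the Observe-Plan-Act orchestrator.
val directProfiles = setOf(Profile.SCRIBE, Profile.HEALTH)

sealed class Route {
    data class Direct(val profile: Profile) : Route()
    object Orchestrator : Route()
}

fun route(profile: Profile): Route =
    if (profile in directProfiles) Route.Direct(profile) else Route.Orchestrator
```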

Challenges we ran into

1. Always-On Voice Detection Without Excessive Battery Drain

One of the first challenges was implementing an always-listening wake-word system without draining the battery or blocking other applications from using the microphone.

Keeping Android's native SpeechRecognizer running continuously is not practical because it keeps the CPU active, increases battery consumption, and maintains exclusive access to the microphone.

To address this, I implemented a two-stage speech pipeline. A lightweight Vosk instance runs continuously in the background at 16 kHz, listening only for the wake word "Gemma." Vosk maintains a memory footprint of roughly 30 MB while consuming relatively little power.

Once the wake word is detected, Vosk immediately releases the microphone by calling speechService?.stop(). Control is then handed to Android's native SpeechRecognizer, which performs high-accuracy recognition for the user's actual request. This approach keeps idle power consumption low while still providing accurate speech recognition during active interactions.
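The handoff works because only one engine owns the microphone at any moment. A minimal, stdlib-only state-machine sketch of that rule; `VoiceState`, `MicOwner`, `VoicePipeline`, and the event methods are hypothetical, and the comments mark where Genie's real engine calls would go:

```kotlin
enum class VoiceState { WAKE_WORD, COMMAND, PROCESSING }

enum class MicOwner { VOSK, SPEECH_RECOGNIZER, NONE }

class VoicePipeline {
    var state = VoiceState.WAKE_WORD
        private set

    // Exactly one engine (or none) holds the microphone in each state.
    val micOwner: MicOwner
        get() = when (state) {
            VoiceState.WAKE_WORD -> MicOwner.VOSK
            VoiceState.COMMAND -> MicOwner.SPEECH_RECOGNIZER
            VoiceState.PROCESSING -> MicOwner.NONE
        }

    fun onPartial(text: String) {
        if (state == VoiceState.WAKE_WORD && text.lowercase().contains("gemma")) {
            // Genie calls speechService?.stop() here to free the mic before STT starts.
            state = VoiceState.COMMAND
        }
    }

    fun onCommandRecognized() {
        if (state == VoiceState.COMMAND) state = VoiceState.PROCESSING
    }

    fun onTaskFinished() {
        // Hand the microphone back to the wake-word listener.
        state = VoiceState.WAKE_WORD
    }
}
```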

2. Preventing Unsafe Autonomous Actions

One of the biggest safety concerns came from LiteRT-LM's default configuration. automaticToolCalling defaults to true, so the SDK executes tools as soon as the model generates a tool call.

For an autonomous Android agent connected to AccessibilityService, I considered this too risky. If the model generated an incorrect tool call during a multi-step task, it could potentially interact with sensitive parts of the operating system before additional validation occurred.

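Instead, I explicitly disabled automatic tool execution in ConversationConfig. When a model response contains tool calls, the bridge emits them as a request rather than executing them:

```kotlin
// Inside the callbackFlow: surface tool calls instead of letting the SDK run them
if (message.toolCalls.isNotEmpty()) {
    trySend(AgentResponse.ToolCallRequest(message))
}
```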

Every tool request is intercepted before execution and passed through a custom asynchronous bridge implemented with Kotlin callbackFlow.

The request is then evaluated by a RiskAssessor, which considers multiple signals before allowing execution. Operations involving financial interfaces, destructive actions, or sensitive system changes require biometric authentication before the tool can continue. This keeps the model responsible for reasoning while leaving final authorization with the user.

3. Managing Context Window Growth

Running an autonomous agent locally introduces another limitation: context size. As Genie completes multi-step tasks, conversation history quickly fills with screen descriptions, UI coordinates, execution logs, and intermediate tool results. Without active management, this increases inference time and can cause the model to lose track of the original objective. To solve this, I implemented a custom SlidingWindowManager.

The manager permanently keeps the user's primary objective at the beginning of the history while retaining only the most recent interactions needed for reasoning. It also removes obsolete execution history after successful actions.

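```kotlin
// Excerpt from pruneAfterSuccess: drop earlier transient failures of the tool that just succeeded
if (entry is HistoryEntry.ToolResult &&
    entry.toolName == successToolName &&
    entry.outcome is ToolOutcome.TransientErr
) {
    history.removeAt(index)
}
```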

For example, if an action fails twice because an interface has not fully rendered but succeeds on the third attempt, the earlier transient failures are removed from history. This prevents the model from repeatedly reasoning over outdated failures and helps reduce logic drift.

4. Handling Infinite Loops and Invalid Tool Calls

Another issue encountered during testing was repetitive execution.

When the model encountered an unexpected application state, it could repeatedly attempt the same failing action or generate tool names that were not registered in the native ToolRegistry.

To prevent this, I implemented a four-level error taxonomy consisting of TransientErr, LogicErr, AuthErr, and FatalErr, together with runtime circuit breakers.

If execution exceeds a predefined failure threshold, such as five consecutive failures or three requests for the same unmapped tool, the orchestrator immediately terminates the execution loop, reports the issue to the user, and waits for further instructions instead of continuing indefinitely.

5. Adapting to Changing User Interfaces

Android applications change constantly. Buttons move, layouts are redesigned, and rendering behavior varies depending on network conditions and application updates.

Because of this, I avoided relying on fixed screen coordinates wherever possible.

Instead, Genie combines accessibility metadata, interface hierarchy information, and screen understanding to locate interface elements dynamically. This makes the agent more resilient to UI changes and reduces the amount of maintenance required whenever applications update.
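A minimal sketch of locating a control by its accessibility metadata instead of hard-coded coordinates. `UiNode` is a hypothetical, simplified stand-in for Android's `AccessibilityNodeInfo` (text, content description, view ID, bounds), and the substring matching rule is an assumption:

```kotlin
data class Bounds(val left: Int, val top: Int, val right: Int, val bottom: Int) {
    val centerX get() = (left + right) / 2
    val centerY get() = (top + bottom) / 2
}

data class UiNode(
    val text: String? = null,
    val contentDescription: String? = null,
    val viewId: String? = null,
    val clickable: Boolean = false,
    val bounds: Bounds = Bounds(0, 0, 0, 0),
    val children: List<UiNode> = emptyList(),
)

// Depth-first search for a clickable node whose label or view ID contains the query.
fun findClickable(root: UiNode, query: String): UiNode? {
    val q = query.lowercase()
    val matches = root.clickable &&
        listOfNotNull(root.text, root.contentDescription, root.viewId).any { it.lowercase().contains(q) }
    if (matches) return root
    for (child in root.children) {
        findClickable(child, query)?.let { return it }
    }
    return null
}
```

The tap point is then taken from the matched node's current bounds, so a button that moves in an app update is still found by its label.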

Accomplishments that we're proud of

1. Safely Extending LiteRT-LM for Autonomous Execution

Loading a 2.4 GB language model on a mobile device was already challenging. Allowing that model to control the operating system introduced an entirely different set of problems.

One accomplishment we're especially proud of is changing how tool execution works on top of Google's LiteRT-LM SDK. By setting automaticToolCalling = false and implementing our own Kotlin callbackFlow bridge, every tool request is intercepted before execution. This gave us the flexibility to introduce a dynamic risk assessment layer, where high-risk operations require biometric authentication before the model is allowed to continue.

2. Offline Learning with the Room-Backed Skill Cache

Most AI assistants improve by relying on cloud infrastructure. We wanted Genie to improve locally.

To achieve this, we built a Room-backed Skill Cache that allows Genie to remember successful task executions. After completing a workflow once, the system can serialize the execution plan and reuse it later. In many cases, Genie can repeat previously learned tasks without running another LLM inference, improving both speed and power efficiency while remaining completely offline.

3. The Dual-Engine Voice Pipeline

Designing an always-on voice assistant for mobile devices required balancing battery life with recognition accuracy.

We developed a dual-engine speech pipeline where a lightweight Vosk model continuously listens for the wake word, while Android's native SpeechRecognizer handles full command recognition after activation. This approach minimizes idle resource consumption without sacrificing recognition quality during normal use.

4. Building Within Real Hardware Constraints

One aspect of the project we're particularly proud of is that it was built under real-world constraints rather than relying on constant cloud resources.

As student developers, we had to optimize for limited hardware and computing resources, as well as unstable internet connectivity. That meant carefully managing context windows, reducing memory usage, implementing structured error taxonomies, and designing efficient execution pipelines instead of relying on larger cloud models to solve performance issues.

Those constraints shaped many of the engineering decisions behind Genie.

5. Building AI That Can Make a Practical Difference

The accomplishment we value most is moving beyond a conversational chatbot into an assistant capable of performing meaningful tasks.

With 53 native tools and 9 specialized profiles, Genie can help users navigate applications, automate routine workflows, retrieve information, and support accessibility use cases while keeping all processing on the device. Whether assisting someone with limited mobility or allowing users to reference locally stored WHO health documents without sending data to external servers, the project demonstrates that advanced AI can remain private, reliable, and useful even without continuous internet access.

What we learned

Building for On-Device AI Changes How You Think About Performance

Developing for local inference forced us to work within the physical limits of mobile hardware. Unlike cloud-based systems, we couldn't rely on additional compute whenever performance became an issue. Every design decision affected battery life, memory usage, and latency.

One example was our voice pipeline. By combining Vosk for continuous wake-word detection with Android's native SpeechRecognizer for full command recognition, we were able to balance responsiveness with power efficiency. This taught us that good on-device AI is as much a systems engineering problem as it is a machine learning problem.

Autonomous Agents Need Strong Safety Controls

One of the biggest lessons was that an autonomous agent requires a different safety model than a traditional chatbot.

During development, we realized that LiteRT-LM's default automaticToolCalling = true configuration was not appropriate for an agent capable of interacting with the Android operating system. Disabling automatic tool execution and introducing our own Kotlin callbackFlow pipeline gave us complete control over every requested action before it reached the operating system.

This experience reinforced the importance of treating biometric authentication, user confirmation, and risk assessment as core components of the agent architecture rather than optional interface features.

AI Models Need Structured Control Logic

Another lesson was that language models are probabilistic, while operating systems require deterministic behavior.

To bridge that gap, we designed the AgentOrchestrator together with a Four-Tier Error Taxonomy that manages execution flow, detects repeated failures, and prevents tool hallucinations from causing infinite loops.

Building these control mechanisms showed us that reliable autonomous agents depend not only on model quality but also on the engineering systems surrounding the model.

Local Learning Can Be Both Practical and Private

We also learned that useful personalization does not always require cloud infrastructure.

By implementing a Room-backed Skill Cache, Genie can store successful execution plans locally and reuse them for future tasks. In many situations, previously learned workflows can be executed without performing another LLM inference, improving response time while reducing computational cost.

This demonstrated that edge AI can continue improving through local learning while allowing users to keep their data on their own devices.

Engineering Within Constraints Leads to Better Design

Perhaps the biggest lesson from the project was that constraints often lead to better engineering decisions.

Working with limited memory, restricted context windows, mobile hardware, and offline execution forced us to think carefully about every component of the system. Features such as the SlidingWindowManager, RiskAssessor, Skill Cache, and dual-engine voice pipeline were all developed because we had to solve practical limitations rather than relying on additional cloud resources.

Those constraints ultimately shaped Genie into an AI assistant that is efficient, privacy-focused, and designed to operate reliably on-device.

What's next for Genie

Semantic Local Skill Cache

Genie's current Skill Cache uses structured pattern matching to replay previously successful workflows. The next step is adding lightweight on-device embedding models so the assistant can match requests by meaning rather than exact wording.

For example, a request such as "Turn off the internet" could be matched to a cached workflow for "Disable Wi-Fi." The goal is to make Genie's offline learning system more flexible while keeping all processing on-device.
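One way the planned matching could work is cosine similarity between embedding vectors, with a threshold below which Genie falls back to full planning. In this sketch the three-dimensional vectors are hand-made stand-ins for what an on-device embedding model would produce, and the 0.8 threshold and function names are arbitrary assumptions:

```kotlin
import kotlin.math.sqrt

fun cosine(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return if (normA == 0.0 || normB == 0.0) 0.0 else dot / (sqrt(normA) * sqrt(normB))
}

// Returns the best-matching cached goal if it clears the threshold, else null (fall back to the LLM).
fun bestSkillMatch(query: DoubleArray, cached: Map<String, DoubleArray>, threshold: Double = 0.8): String? =
    cached.maxByOrNull { cosine(query, it.value) }
        ?.takeIf { cosine(query, it.value) >= threshold }
        ?.key
```

The threshold is the key design knob: set too low, Genie replays the wrong plan; set too high, it rarely benefits from the cache.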

Expanding the Tool Registry

Genie currently includes 53 custom tools across six functional categories. We plan to extend the Android AccessibilityService integration to support more complex interactions, including multi-touch gestures, drag-and-drop actions, and deeper application integrations.

This should allow the agent to handle a wider range of real-world workflows on Android devices.

Multi-Model Device Orchestration

At the moment, Gemma 4 handles most of the reasoning tasks. A future version of Genie will introduce a multi-model orchestration layer where smaller specialized models run alongside Gemma through LiteRT.

For example, a compact vision-grounding model could handle visual localization tasks while Gemma focuses on planning and reasoning. The aim is to reduce latency and memory usage by assigning each task to the most appropriate model.

Accessibility and Localization

We also want to improve Genie's accessibility features and make them more useful in local contexts.

Planned work includes expanding the Scribe Profile to better support regional dialects, improving the Teaching Profile for hands-free learning, and working with accessibility advocates to refine how Genie adapts to different physical interaction needs.

Long term, we want Genie to feel less like a generic assistant and more like a system that can adapt to the specific ways different people use their devices.

Built With

  • ai
  • api
  • gamma
  • gemma
  • kotlin
  • security
  • vosk