Project Story

About the project

Android Autonomous Dev Agent is an autonomous software-delivery agent for real Android apps. It does not stop at writing code. It plans a change, edits the codebase, builds the APK, installs it on a real Android device, launches the app, reads logcat, captures screenshots, diagnoses failures, patches the project again, and repeats the loop until there is verifiable evidence that the app actually works.

The project was inspired by a very practical frustration: most coding agents stop too early. They can generate plausible code, but they often do not complete the last mile of software delivery. For Android, that last mile is where many real problems happen: Gradle errors, Kotlin compilation failures, missing permissions, broken activities, runtime crashes, device-specific behavior, logcat noise, and UI states that only appear after installation on a physical phone.

I wanted to build an agent that treats “code written” and “software delivered” as two different things.

$$ \text{Done} \neq \text{Code Generated} $$

For this project:

$$ \text{Done} = \text{Built} + \text{Installed} + \text{Launched} + \text{Verified} $$

The agent follows a closed-loop workflow:

Plan
  → Patch code
  → Build APK
  → Install on real Android device
  → Launch app
  → Capture logs and screenshots
  → Diagnose errors
  → Patch again
  → Verify final result
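
As an illustration only, the loop above can be sketched in Python. This is not the project's actual code: `run_delivery_loop`, `LoopResult`, the `Step` type, and the `max_attempts` limit are all hypothetical names chosen for the sketch. Each step is a callable that returns a success flag plus some evidence text, and any failure triggers a patch followed by a restart from the first step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical step signature: returns (ok, evidence_text).
Step = Callable[[], Tuple[bool, str]]


@dataclass
class LoopResult:
    verified: bool
    attempts: int
    history: List[str] = field(default_factory=list)


def run_delivery_loop(steps: List[Tuple[str, Step]],
                      patch: Callable[[str, str], None],
                      max_attempts: int = 5) -> LoopResult:
    """Run the steps in order; on any failure, patch and restart from the top."""
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        failed = False
        for name, step in steps:
            ok, evidence = step()
            status = "ok" if ok else "FAILED"
            history.append(f"attempt {attempt}: {name} {status}")
            if not ok:
                # Hand the failing step and its evidence to the patcher.
                patch(name, evidence)
                failed = True
                break
        if not failed:
            # Every step passed in a single pass: the result counts as verified.
            return LoopResult(True, attempt, history)
    return LoopResult(False, max_attempts, history)
```

The loop only reports `verified=True` when build, install, launch, and verification all pass in the same attempt, which mirrors the stricter definition of "done" given earlier.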

What I built

I built an autonomous Android development loop that combines:

  • LLM-based planning and code modification
  • Build execution and compiler-error recovery
  • APK installation through ADB
  • Real-device launch verification
  • logcat analysis for runtime crashes
  • Screenshot-based evidence collection
  • Iterative repair loops
  • Human-readable delivery reports

The system is designed around the principle that every claimed result should have an artifact behind it:

  • build output
  • APK metadata
  • install result
  • launch result
  • logcat excerpt
  • screenshot
  • final verdict
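
One way to make this principle concrete is an evidence record whose verdict is derived from which artifacts are present. The sketch below is an assumption about how such a record could look, not the project's real data model; the class name `DeliveryEvidence` and its field names are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DeliveryEvidence:
    # One slot per artifact in the list above; None means "not collected yet".
    build_output: Optional[str] = None
    apk_metadata: Optional[str] = None
    install_result: Optional[str] = None
    launch_result: Optional[str] = None
    logcat_excerpt: Optional[str] = None
    screenshot_path: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of artifacts that have not been collected."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def verdict(self) -> str:
        """The final verdict is derived from the artifacts, never asserted."""
        gaps = self.missing()
        if gaps:
            return "unverified: missing " + ", ".join(gaps)
        return "verified"
```

Because the verdict is computed rather than set by hand, a report cannot say "verified" while an artifact slot is still empty.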

This makes the agent more auditable than a normal coding assistant. Instead of saying “I implemented it,” the agent has to prove that the app built and ran.

Validation case: TransLite

As a validation case, I used the agent to deliver TransLite, an Android translation app involving Kotlin development and offline-model integration. TransLite is not the final product of this hackathon. Its role is to give the agent a real mobile-app delivery problem instead of a toy example.

The agent had to go through the same workflow a human Android developer would:

  • read the project requirements
  • modify Android/Kotlin code
  • handle build issues
  • debug offline-model integration problems
  • produce a releasable Android app
  • verify behavior on a real device

This helped validate that the system is not just generating source files. It is coordinating the full delivery loop.

Why this matters

AI coding tools are becoming very good at producing code, but production software engineering is not just code generation. A large part of engineering work is verification, debugging, recovery, and evidence.

In mobile development, this gap is especially visible. A generated Android project can look correct in text but still fail because of:

  • Gradle configuration errors
  • Android SDK incompatibilities
  • Kotlin compiler errors
  • missing runtime permissions
  • bad activity declarations
  • crashes visible only in logcat
  • behavior differences on real devices
  • model or asset loading failures

Android Autonomous Dev Agent focuses on this gap. It is an agent for software delivery, not only software generation.

Resilience

The project is also about resilient agents. A useful agent should not fail permanently after the first bad build, tool error, or runtime crash. It should preserve state, inspect the failure, and recover.

The agent handles failures as part of the normal workflow:

Failure → Evidence → Diagnosis → Patch → Retry → Verification

Examples of failure modes include:

  • compiler errors
  • Gradle build failures
  • APK install failures
  • app launch failures
  • runtime crashes
  • missing permissions
  • model-loading errors
  • incomplete or incorrect generated code
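
A diagnosis step needs some way to sort raw tool output into categories like these. The following Python sketch shows one heuristic approach based on text markers that commonly appear in Gradle, Kotlin compiler, `adb install`, and logcat output. The rule table, the labels, and the function name `classify_failure` are assumptions for illustration and would need tuning on a real project.

```python
import re
from typing import List, Tuple

# Ordered from most specific to least specific; the first match wins.
# These patterns are heuristics, not an exhaustive or official list.
_RULES: List[Tuple[str, str]] = [
    ("install_failure", r"INSTALL_FAILED_\w+"),
    ("missing_permission", r"SecurityException|Permission Denial"),
    ("runtime_crash", r"FATAL EXCEPTION"),
    ("compiler_error", r"^e: "),            # Kotlin compiler error lines
    ("gradle_failure", r"FAILURE: Build failed"),
]


def classify_failure(output: str) -> str:
    """Map raw build, install, or logcat text to a coarse failure label."""
    for label, pattern in _RULES:
        if re.search(pattern, output, re.MULTILINE):
            return label
    return "unknown"
```

Rule order matters: a Kotlin error also ends in a generic Gradle failure banner, and a permission crash also contains `FATAL EXCEPTION`, so the more specific rule is checked first.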

The long-term goal is to make this a general control plane for agentic software delivery: one where agents can be measured not only by whether they produce an answer, but by whether they recover from failure and ship a verified result.

How I built it

The system was built using a combination of:

  • autonomous agent orchestration
  • Android build tooling
  • ADB-based real-device automation
  • logcat inspection
  • screenshot capture
  • iterative compile-fix loops
  • structured delivery reports

The important design choice was to keep the loop grounded in real artifacts. The agent is not allowed to treat a code diff as completion. It must move through the delivery pipeline and collect evidence.

A simplified architecture looks like this:

User goal
  ↓
Planning agent
  ↓
Code modification agent
  ↓
Build runner
  ↓
Failure analyzer
  ↓
Patch/retry loop
  ↓
ADB installer
  ↓
Real-device launcher
  ↓
logcat + screenshot verifier
  ↓
Final delivery report
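
To show what the build, install, launch, and evidence stages might run underneath, here is a small Python sketch that only constructs command lines and does not execute them. The exact commands the agent uses are not stated in this writeup, so treat the choice of `./gradlew assembleDebug`, `adb install -r`, `adb shell am start -n`, `adb logcat -d`, and `adb exec-out screencap -p` as plausible assumptions. The helper names `adb` and `delivery_commands` are hypothetical.

```python
from typing import Dict, List, Optional


def adb(serial: Optional[str], *args: str) -> List[str]:
    """Build an adb command line, targeting one device when a serial is given."""
    base = ["adb"] + (["-s", serial] if serial else [])
    return base + list(args)


def delivery_commands(apk_path: str, package: str, activity: str,
                      serial: Optional[str] = None) -> Dict[str, List[str]]:
    """Return one command per pipeline stage, in execution order."""
    return {
        "build": ["./gradlew", "assembleDebug"],
        "install": adb(serial, "install", "-r", apk_path),
        "launch": adb(serial, "shell", "am", "start", "-n", f"{package}/{activity}"),
        "logcat": adb(serial, "logcat", "-d"),             # dump the log and exit
        "screenshot": adb(serial, "exec-out", "screencap", "-p"),  # PNG on stdout
    }
```

Each list can be passed to `subprocess.run` by a runner that stores the exit code and output as evidence for that stage.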

What I learned

The biggest lesson was that autonomy is less about making one perfect decision and more about building a reliable recovery loop.

In early versions, the agent could write code but still fail silently at later steps. That is not good enough for real software work. The useful behavior emerged when the agent became strict about verification:

  • If the build fails, read the error and patch.
  • If the app installs but crashes, read logcat and patch.
  • If the UI is not visible, capture a screenshot and inspect the state.
  • If the result cannot be verified, do not claim success.
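
These four rules can be written down as a small decision function. The Python sketch below is one possible encoding, with hypothetical names (`Observation`, `next_action`) and action labels invented for the example. It is meant to show the ordering of the checks, not the agent's real control logic.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    build_ok: bool
    installed: bool = False
    crashed: bool = False
    ui_visible: bool = False
    verified: bool = False


def next_action(obs: Observation) -> str:
    """Apply the verification rules in order and return the next action."""
    if not obs.build_ok:
        return "read_build_error_and_patch"
    if obs.installed and obs.crashed:
        return "read_logcat_and_patch"
    if obs.installed and not obs.ui_visible:
        return "capture_screenshot_and_inspect"
    if not obs.verified:
        # Nothing failed loudly, but nothing proves success either.
        return "do_not_claim_success"
    return "report_success"
```

The last check is the important one: the absence of an error is not treated as evidence, so success is only reported when `verified` is explicitly true.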

I also learned that real-device testing changes the quality bar. Running on a physical Android phone exposes failures that are invisible in a text-only coding loop.

Challenges

The hardest challenges were not the initial code edits. The hardest parts were:

  1. Build-system recovery
    Android projects can fail for many reasons: Gradle versions, SDK configuration, Kotlin errors, dependency conflicts, manifest issues, and generated-resource problems.

  2. Runtime diagnosis
    A successful build does not mean the app works. The agent needs to inspect logcat and distinguish meaningful crash signals from noisy Android logs.

  3. Real-device verification
    Device state matters. The app has to be installed, launched, and visually checked. This adds complexity but makes the result much more trustworthy.

  4. Avoiding false completion
    The agent must not say “done” just because a file was edited. The workflow had to enforce a stricter definition of completion.

  5. Making evidence readable
    The final output needs to be useful to a human: what changed, what failed, what was fixed, what was verified, and what artifacts prove it.
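
For the runtime-diagnosis challenge, a first filter can pull the crash block out of a noisy log before any deeper analysis. The sketch below assumes logcat's `threadtime` line layout (date, time, PID, TID, level, tag, message) and keeps only error-level lines tagged `AndroidRuntime` that follow a `FATAL EXCEPTION` marker. The function name `extract_crash` and the regular expression are illustrative assumptions, and other log formats would need a different pattern.

```python
import re
from typing import List

# Assumed layout: "MM-DD HH:MM:SS.mmm  PID  TID L Tag: message"
_LINE = re.compile(r"^\S+\s+\S+\s+\d+\s+\d+\s+([VDIWEF])\s+([^:]+?)\s*: (.*)$")


def extract_crash(logcat: str) -> List[str]:
    """Return the messages of the first AndroidRuntime crash block, or []."""
    block: List[str] = []
    for line in logcat.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        level, tag, msg = m.groups()
        if level != "E" or tag != "AndroidRuntime":
            continue  # noise from other tags and lower log levels
        if msg.startswith("FATAL EXCEPTION"):
            if block:
                break  # a second crash begins; keep only the first
            block.append(msg)
        elif block:
            block.append(msg)
    return block
```

An empty result means no crash marker was found in the captured log. That is weaker than proof that the app works, so it should be combined with the launch result and a screenshot.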

What is next

The next step is to turn the current workflow into a reusable agentic delivery platform for Android teams.

Planned improvements include:

  • a dashboard showing each delivery step
  • failure-injection demos for build errors, runtime crashes, ADB failures, and model/tool outages
  • support for more Android project types
  • clearer resilience metrics such as retry count, recovery time, and fallback path
  • team-facing reports suitable for CI/CD pipelines
  • human approval gates for risky changes
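
As a rough idea of what the planned resilience metrics could measure, the Python sketch below computes a retry count and a recovery time from a list of attempts. The definitions are assumptions made for this example: retry count is the number of attempts after the first, and recovery time is the gap between the end of the first failed attempt and the end of the next successful one. The `Attempt` record and `resilience_metrics` function are hypothetical, and fallback paths are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Attempt:
    started_at: float   # seconds since the run began
    finished_at: float
    ok: bool


def resilience_metrics(attempts: List[Attempt]) -> Dict[str, object]:
    """Summarize retries and time-to-recovery for one delivery run."""
    recovery: Optional[float] = None
    first_failure_end: Optional[float] = None
    for a in attempts:
        if not a.ok and first_failure_end is None:
            first_failure_end = a.finished_at
        elif a.ok and first_failure_end is not None:
            recovery = a.finished_at - first_failure_end
            break
    return {
        "retry_count": max(len(attempts) - 1, 0),
        "recovery_seconds": recovery,
        "recovered": bool(attempts) and attempts[-1].ok,
    }
```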

The broader vision is to make AI development agents accountable. They should not only generate code; they should deliver working software with evidence.

Android Autonomous Dev Agent is my attempt to move from AI coding assistants toward AI software-delivery agents.
