Inspiration
My grandpa has Parkinson’s and struggles with his phone, which made me wonder: Why couldn't he just "tell" his phone what to do? That's when I realized that the current phone assistants are too dumb.
What it does
I'm building a fully autonomous, general-purpose AI agent that operates your smartphone like a human to accomplish complex, user-defined goals. Simply put, it's a far more capable smartphone assistant designed to replace today's limited options like Siri and Gemini - through its ability to "do" and not just "tell".
The end vision is to optimize and refine it to the point where the Agent becomes much faster than using your fingers - shifting people's preference toward simply telling their phone what to do rather than doing it themselves.
Use Cases (including but not limited to):
- Greater accessibility for people who are visually impaired, have Parkinson's, are amputees, etc.
- More intuitive phone use for elderly people who struggle with technology
- More convenience for everyday smartphone users
- Automated social media management for influencers, increasing interaction, content output, etc.
- Automated mobile app testing for corporations
- Applying to thousands of jobs overnight if you prompt it to
- Automating dating apps for you
- Hyper-customized morning news aggregation (it would know its users well)
- Hands-free phone use while you're driving, cooking, etc.
How I built it
All the code up to this point was written by me from scratch in about 12 days, starting around April 27th. It took two more days to design the website (for the waitlist) and record the demo. All of the work has been full-time.
This is the most Google-ish product I've ever built. It's built using the Cursor IDE (not vibe coded).
Tech Stack:
- Android Accessibility APIs for phone automation
- Gemini 1.5 Flash for the Orchestrator Agent's reasoning
- Gemini 1.5 Flash 8B for the Communication Agent
- Google ADK agentic framework with Python
- Android app built with Kotlin
- LiteLLM for Gemini API calls
- Firebase for user authentication + database
- Google Cloud Run for hosting the Agent Server
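To make the architecture above concrete, here is a minimal sketch of the perceive -> plan -> act loop an orchestrator agent of this kind typically runs. The action vocabulary, function names, and stand-in planner below are my illustrative assumptions, not the project's actual code - in the real system the planning step would be a Gemini 1.5 Flash call (via LiteLLM) and the actions would be dispatched through the Android Accessibility APIs.

```python
from dataclasses import dataclass

# Hypothetical action vocabulary the orchestrator could emit; the real
# project's tool names and schema are not public, so these are illustrative.
@dataclass
class Action:
    kind: str          # e.g. "tap", "type", "scroll", "done"
    target: str = ""   # accessibility node label, or text to enter

def plan_next_action(goal: str, screen_nodes: list[str]) -> Action:
    """Stand-in for the Gemini-backed planning step: choose the next UI
    action given the labels visible in the current accessibility tree.
    A real agent would send `goal` and `screen_nodes` to the model and
    parse its structured reply."""
    for node in screen_nodes:
        if goal.lower() in node.lower():
            return Action("tap", node)
    return Action("scroll")  # nothing relevant on screen: keep looking

def run_agent(goal: str, screens: list[list[str]], max_steps: int = 10) -> list[Action]:
    """Drive the perceive -> plan -> act loop over a scripted sequence of
    screens (each screen is the list of node labels the Accessibility API
    would expose at that step)."""
    history: list[Action] = []
    for nodes in screens[:max_steps]:
        action = plan_next_action(goal, nodes)
        history.append(action)
        if action.kind == "tap":  # toy stopping condition for the sketch
            break
    return history

# Example: the target isn't on the first screen, so the agent scrolls,
# then taps once "Settings" becomes visible.
actions = run_agent("Settings", [["Camera", "Gallery"], ["Settings", "Clock"]])
```

The key design point the sketch captures is that the model never touches the screen directly: it only emits declarative actions, and a thin Accessibility-API layer executes them, which keeps the LLM swappable.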
Challenges I ran into
I could ship it right now with about 2 days of refactoring. But before I release the kraken, I have a larger concern to take care of first, which has been quite a headache. I am running out of runway: my student visa ends in the first week of July, and before then I need to find an employer who will file for my STEM extension (extending my visa until 2027), after which I'll be able to devote time to this project.
I really want to get this out there as soon as possible before the public interest dies down, but I need to get the visa concerns out of my head first, as it's a critical problem looming over me.
Accomplishments that I'm proud of
I finished building the MVP 3 days ago and posted a demo on Reddit and LinkedIn. It went super viral: the Reddit posts collectively have 73K+ views, and I've received 150+ comments across social media platforms. Thousands of people from 11 different countries visited the website, and about 50 of them joined the waitlist - all in a matter of two days. Having a fully functioning prototype built from scratch in just 12 days is an accomplishment I'm very proud of.
What I learned
The incredible viral response made me realize this has insane potential, and I'm working to ship a commercial product asap. I've also learned from early feedback: many people have asked me to open-source it for transparency, so that privacy-conscious users can host their own servers while I offer my cloud-hosted solution as an option. The main lesson is that there is significant, global demand for a more capable smartphone assistant.
What's next for Android-Use
As soon as I find an employer and resolve my visa situation, I will release and continue working on this.
Currently, I am talking to the people who have joined the waitlist and testing the app myself to see if it reliably carries out their specific use cases. I have a fully functioning prototype and am focused on getting it out to beta testers as soon as possible. I'll need to make some structural changes to let testers bring their own LLM API keys and customize agent prompts, and I may also have to write rough documentation explaining the available tools and underlying architecture.
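One way the bring-your-own-key change could look is a small config layer that merges a tester's settings over sane defaults. This is purely a sketch under my own assumptions - the field names (`api_key`, `model`, `orchestrator_prompt`) and the JSON shape are hypothetical, not the project's actual schema.

```python
import json

# Assumed defaults; a tester's config overrides any of these.
DEFAULTS = {
    "model": "gemini/gemini-1.5-flash",   # LiteLLM-style model id (assumption)
    "orchestrator_prompt": "You control an Android phone to accomplish the user's goal.",
}

def load_agent_config(raw_json: str) -> dict:
    """Merge a tester-supplied JSON config over the defaults.

    Testers bring their own LLM API key, so the key is required;
    model choice and agent prompts are optional overrides.
    """
    cfg = {**DEFAULTS, **json.loads(raw_json)}
    if not cfg.get("api_key"):
        raise ValueError("api_key is required: beta testers bring their own LLM key")
    return cfg

# A tester supplies a key and swaps in the smaller 8B model:
cfg = load_agent_config('{"api_key": "sk-test", "model": "gemini/gemini-1.5-flash-8b"}')
```

Keeping the key and prompts in per-user config (rather than baked into the server) is also what makes the self-hosted, open-source deployment mode mentioned above straightforward.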
The plan is to work out the kinks, test different use cases, add caching mechanisms, tighten security, reduce unnecessary delays, handle edge cases, and engineer better agent prompts - making this production- and distribution-ready by June.
Built With
- adk
- agent-development-kit
- android
- android-accessibility-api
- android-studio
- firebase
- gcp
- gemini
- google-adk
- google-cloud
- google-cloud-run
- kotlin
- vertex