Inspiration

Before this hackathon, I published a Chrome extension on the Web Store — one I had spent about six months building. As a junior developer, it was the first project I actually finished, after two failed attempts.

I expected the install numbers to be low. But when I opened the developer dashboard, I was shocked — the uninstall rate was over 50%. Compared to the average uninstall rate of roughly 30%, it was devastating. What made it even more confusing was that the only two reviews were both 5 stars. (One was from my sister, who had tested it for me.) I had absolutely no idea why people were uninstalling.

I added an uninstall survey after the fact, but with so few installs and even fewer uninstalls, I only got 1–2 responses — and they weren't particularly helpful. I didn't have a wide network to ask for testing either. I was left carrying that frustration, never knowing why users left.

Then I came across this hackathon. At first, I tried to think of ideas that would help others, but nothing clicked. So I flipped the question: what do I need most, now and in the future? The answer was clear — a tool that lets multiple users try my product and tells me where they dropped off and why they uninstalled.

I built UninstallMe for indie developers and small teams who have been through — or will go through — the same experience. Even without deep development expertise or a user research budget, anyone should be able to see their product through 50 pairs of eyes with just a URL.

What it does

UninstallMe takes a single product URL and has 50 AI personas independently explore and review it.

Each persona is generated based on Rogers' Innovation Adoption Lifecycle — from eager early adopters to cautious laggards. They explore the product in a real browser using Amazon Nova Act: clicking links, checking pricing pages, navigating menus — just like real users would.

After exploration, each persona independently reviews the product across 5 dimensions (landing page, usefulness, UI/design, onboarding, and trust) and decides whether to keep or uninstall the app. The 50 independent judgments are synthesized into a statistical dashboard with actionable insights: keep rate, average star rating, AARRR funnel drop-off points, churn category distribution, and prioritized improvement recommendations.

Traditional user testing costs $200+ and takes 2+ weeks. UninstallMe does it for $0.25 in 5 minutes.

How we built it

UninstallMe runs on a 4-stage AI pipeline, powered entirely by Amazon Nova.

Stage 1 — Persona Generation: Amazon Nova Lite analyzes the product URL and generates 50 diverse personas. Each has attributes like tech comfort level, adoption type (innovator → laggard), motivation level, and predicted churn trigger.
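To make the attribute list concrete, here is a minimal sketch of what one generated persona might look like. The field names and values are illustrative assumptions, not the actual schema:

```javascript
// Hypothetical shape of one generated persona (field names are illustrative).
const examplePersona = {
  id: 17,
  adoptionType: "early_majority", // innovator | early_adopter | early_majority | late_majority | laggard
  techComfort: 3,                 // 1 (low) to 5 (high)
  motivation: "wants a faster way to compare tools before committing",
  predictedChurnTrigger: "confusing pricing page",
};
```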

Stage 2 — Product Exploration: Amazon Nova Act drives a real browser to explore the product. 5 exploration profiles (confident user, average user, skeptical user, impatient user, struggling user) each perform 6–8 navigation steps. This is real browsing, not simulation.
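The five profiles can be captured as a small configuration table. The step counts come from the 6–8 range above; the behavioral flags are assumptions about how each profile might differ:

```javascript
// Illustrative exploration profiles; names match the writeup,
// step counts and behavior flags are assumptions.
const explorationProfiles = [
  { name: "confident user",  steps: 8, givesUpOnError: false },
  { name: "average user",    steps: 7, givesUpOnError: false },
  { name: "skeptical user",  steps: 7, givesUpOnError: false },
  { name: "impatient user",  steps: 6, givesUpOnError: true  },
  { name: "struggling user", steps: 6, givesUpOnError: true  },
];
```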

Stage 3 — Independent Review: Nova Lite generates reviews through 50 parallel calls. Each persona only sees the exploration data from pages they "visited" and never references other personas' reviews. Unvisited dimensions default to a neutral score of 3 to prevent fabrication.
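The neutral-default rule can be sketched as a small normalization step. The dimension keys and function name here are assumptions for illustration:

```javascript
// Dimension keys are illustrative shorthand for the five review dimensions.
const DIMENSIONS = ["landing", "usefulness", "design", "onboarding", "trust"];

// Fill in a neutral 3 for any dimension the persona never saw,
// so the model cannot fabricate an opinion about unvisited pages.
function normalizeScores(rawScores, visitedDimensions) {
  const scores = {};
  for (const dim of DIMENSIONS) {
    scores[dim] =
      visitedDimensions.includes(dim) && rawScores[dim] != null
        ? rawScores[dim]
        : 3; // neutral default for unvisited dimensions
  }
  return scores;
}
```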

Stage 4 — Synthesis: All 50 reviews are aggregated. Every metric — keep rate, average star rating, NPS, AARRR funnel — is computed in JavaScript, not by the AI, because when you ask an AI to count, it gets it wrong.
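A minimal sketch of that deterministic aggregation step follows. The review shape and the star-to-NPS mapping (5★ promoter, 4★ passive, ≤3★ detractor) are assumptions:

```javascript
// Compute aggregate metrics in plain JavaScript rather than asking the model to count.
function synthesize(reviews) {
  const n = reviews.length;
  const kept = reviews.filter(r => r.decision === "keep").length;
  const avgStars = reviews.reduce((sum, r) => sum + r.stars, 0) / n;
  // NPS mapped from stars (assumption): 5 = promoter, 4 = passive, <= 3 = detractor.
  const promoters = reviews.filter(r => r.stars === 5).length;
  const detractors = reviews.filter(r => r.stars <= 3).length;
  return {
    keepRate: kept / n,
    avgStars,
    nps: ((promoters - detractors) / n) * 100,
  };
}
```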

The frontend is built with Next.js (App Router) + Tailwind CSS, with 10+ chart types visualized through Recharts. It supports dark/light themes and English/Korean.

Challenges we ran into

Fighting hallucination. The biggest challenge was AI writing reviews about pages it never visited. A persona would critique the "complex settings page" — except it never navigated to settings. Even when the prompt explicitly said "only mention visited pages," Nova tended to ignore instructions buried in the middle of long prompts. We built a 3-layer defense: prompt constraints + code-level validation (auto-correcting drop points to actually visited pages) + a Grounding Grade (A–F) measurement system. This improved our grounding score from Grade C to B.
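The code-level validation layer (layer 2) can be sketched roughly like this. The function name and fallback strategy are illustrative assumptions about how a claimed drop point gets snapped back to reality:

```javascript
// Layer 2 of the defense: snap a claimed drop-off point back to a page
// the persona actually visited (names and fallback rule are illustrative).
function correctDropPoint(claimedPage, visitedPages) {
  if (visitedPages.includes(claimedPage)) return claimedPage;
  // Fall back to the last page actually visited — the likeliest real exit point.
  return visitedPages[visitedPages.length - 1];
}
```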

Contradictions between scores and decisions. The AI would give 1 star but say "I'll keep it," or give 5 stars but choose to uninstall. Simply overriding decisions based on scores would eliminate realistic edge cases like "great app, but not for me" (4 stars + uninstall). We designed an enforceConsistency function that only corrects extreme contradictions (1–2 stars + keep, 5 stars + uninstall) while respecting Nova's judgment in the ambiguous 3–4 star range.
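A minimal sketch of that rule, assuming a review object with `stars` and `decision` fields:

```javascript
// Only override extreme contradictions; leave the ambiguous 3-4 star range
// to Nova's judgment so realistic edge cases survive.
function enforceConsistency(review) {
  const { stars, decision } = review;
  if (stars <= 2 && decision === "keep") return { ...review, decision: "uninstall" };
  if (stars === 5 && decision === "uninstall") return { ...review, decision: "keep" };
  return review; // e.g. 4 stars + uninstall ("great app, but not for me") is allowed
}
```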

Speed vs. thoroughness trade-off. Running a real browser for exploration takes time, and generating 50 reviews sequentially was far too slow. We parallelized Nova Act exploration across 3 concurrent agents and batched review generation in groups of 10 with exponential backoff retry, achieving both stability and speed.
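The batching-with-backoff pattern can be sketched as follows. The function names and retry parameters are assumptions; `generateReview` stands in for the actual per-persona Nova Lite call:

```javascript
// Sketch: process personas in batches of 10, retrying each failed
// generation with exponential backoff.
async function generateAllReviews(personas, generateReview, batchSize = 10) {
  const results = [];
  for (let i = 0; i < personas.length; i += batchSize) {
    const batch = personas.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map(p => withRetry(() => generateReview(p)))
    );
    results.push(...batchResults);
  }
  return results;
}

async function withRetry(fn, attempts = 3, baseDelayMs = 500) {
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === attempts - 1) throw err;
      // 500 ms, 1000 ms, 2000 ms, ... between retries
      await new Promise(res => setTimeout(res, baseDelayMs * 2 ** attempt));
    }
  }
}
```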

Accomplishments that we're proud of

Applying statistics to AI personas. Statistics knowledge from a college summer course came in handy here. By designing persona distribution based on Rogers' Innovation Adoption Lifecycle, we didn't just have "50 personas reviewing independently" — we reproduced a realistic user spectrum from early adopters to laggards. When 50 independent judgments converge, a Wisdom of Crowds effect emerges, making the aggregate result more accurate than any individual review.
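Applying Rogers' proportions (2.5% innovators, 13.5% early adopters, 34% early majority, 34% late majority, 16% laggards) to 50 personas can be sketched like this; the allocation function and rounding rule are illustrative:

```javascript
// Rogers' Innovation Adoption Lifecycle proportions.
const ROGERS = {
  innovator: 0.025,
  early_adopter: 0.135,
  early_majority: 0.34,
  late_majority: 0.34,
  laggard: 0.16,
};

// Allocate personas across the curve, giving any rounding drift to the
// largest bucket so the counts always sum to the requested total.
function allocatePersonas(total = 50) {
  const counts = {};
  let assigned = 0;
  for (const [type, share] of Object.entries(ROGERS)) {
    counts[type] = Math.round(total * share);
    assigned += counts[type];
  }
  counts.early_majority += total - assigned; // fix rounding drift
  return counts;
}
```

For 50 personas this yields roughly 1 innovator, 7 early adopters, 17 early majority, 17 late majority, and 8 laggards.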

3-layer hallucination defense. We're proud of designing and implementing a system that catches AI fabrication. From prompt constraints → code validation → measurement and reporting, we turned the principle of "trust but verify" into working code.

The simplicity of one URL. No test scenario design. No survey creation. No participant recruitment. Paste a URL, wait 5 minutes, and get a dashboard with 50 reviews and actionable insights. We're proud of that simplicity.

What we learned

This was our first hackathon and our first time building an AI agent system. We had been using LLMs frequently, but designing 50 independent agents with a hallucination prevention system from scratch revealed how surface-level our understanding really was. The gap between knowing something in theory and implementing it was larger than expected. This hackathon gave us a strong motivation to study AI systems more deeply.

The most practical lesson was the boundary between "what to delegate to AI" and "what to guarantee with code." Ask AI to calculate scores — it gets them wrong. Ask it to count — it miscounts. Tell it not to fabricate — it fabricates anyway. The key is to never rely on prompts alone and always validate at the code level. This lesson will apply to any AI-powered system we build in the future.

What's next for UninstallMe

Authenticated exploration. Currently, Nova Act cannot explore pages behind login walls. We originally planned to accept test credentials for deep exploration of dashboards, settings, and payment flows, but couldn't implement it in time. This is our top priority for the next version.

Diverse input formats. Right now, analysis requires an accessible URL. We want to expand to app store links, screenshots, and Figma prototypes so that even pre-development products can be evaluated.

Improving design evaluation reliability. Honestly, we're not fully confident that AI design assessments accurately reflect human aesthetic sensibility. We plan to establish clearer criteria for design evaluation and validate them through comparison with real user ratings.

Competitive positioning analysis. Even apps in the same category with the same core features can have very different positioning — some offer comprehensive feature sets, others focus on simplicity. We want to detect product positioning to improve the contextual accuracy of reviews.

Built With

  • amazon-nova-act
  • amazon-nova-lite-(aws-bedrock)
  • javascript
  • next.js
  • react
  • recharts
  • tailwind-css