Inspiration
Everyone on our team had seen the same notification: "Your iPhone storage is almost full." It is one of those annoying little problems that everybody ignores because the solution feels tedious: scrolling through hundreds of photos, squinting to spot the blurry ones, deleting near-identical burst shots one by one.
We asked ourselves: what if cleaning your camera roll was not a chore? What if it was actually fun?
That question became ParaSight.
The theme was "Make it Fun," and we took it literally. We did not want to build just another utility app with a coat of paint. We wanted to give it a personality. That meant a slime mascot named Rodney who eats your bad photos, a retro CRT aesthetic, an AI that roasts your photography skills, and a Tinder-style swipe interface for reviewing duplicates.
The goal was simple: build a tool people would actually want to open.
How We Built It
Planning and Architecture
It started with a conversation. We mapped out an MVP, debated stack options, and settled on what played to our team's strengths: React Native + Expo for mobile, FastAPI for the backend, and Groq + Gemini for AI.
We used Spec Kit to generate our initial specification, contracts, data models, API shapes, and implementation plan. That gave us a shared source of truth before a single line of code was written, which turned out to be critical when multiple people were building in parallel.
From there, we worked from a shared GitHub repository with feature branches, pull requests, and commit conventions, treating it like a real production workflow even under hackathon pressure.
The Stack
Mobile:
- React Native + Expo
- Expo Media Library for photo access
- Expo Image Manipulator for image compression
- ElevenLabs text-to-speech for Rodney's voice
Backend:
- FastAPI with a stateless architecture
- Pillow for local image analysis
- Groq with Llama 4 for roast generation
AI Pipeline:
- Local pixel analysis using sharpness, brightness, contrast, dHash duplicate detection, and RGB histogram comparison
- Groq only sees the bad photos and a text summary
- Rodney returns a personalized roast
The Duplicate Detection Pipeline
The core technical challenge was building a clustering algorithm that actually works on a real camera roll.
Our pipeline runs in three phases.
First, we use a time-window pre-filter. Photos are sorted by timestamp and grouped into 30-second anchor-based windows. Only windows with at least two photos become candidates.
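The anchor-based grouping described above can be sketched in a few lines. This is a minimal illustration, not ParaSight's actual code: the `(photo_id, unix_timestamp)` pair shape and function name are assumptions.

```python
def time_window_candidates(photos, window_seconds=30):
    """Group photos into anchor-based time windows.

    `photos` is assumed to be a list of (photo_id, unix_timestamp) pairs;
    the names here are illustrative, not ParaSight's actual API.
    """
    photos = sorted(photos, key=lambda p: p[1])
    windows, current, anchor = [], [], None
    for photo_id, ts in photos:
        if anchor is None or ts - anchor > window_seconds:
            if len(current) >= 2:      # only windows with 2+ photos become candidates
                windows.append(current)
            current, anchor = [], ts   # start a new window anchored at this photo
        current.append(photo_id)
    if len(current) >= 2:
        windows.append(current)
    return windows
```

Anchoring the window at the first photo keeps a long burst from chaining indefinitely: a photo more than 30 seconds after the anchor starts a fresh window.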
Next comes visual similarity confirmation. Candidate windows are sent to a /cluster backend endpoint that computes dHash, which acts like a perceptual fingerprint, along with RGB histogram cosine similarity for every pair of images.
Photos are considered duplicates if either metric confirms similarity. dHash catches pixel-identical burst shots, while histogram comparison catches photos of the same scene taken from slightly different angles or with different lighting.
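The two metrics are simple to sketch with Pillow. The thresholds below are placeholders (the write-up notes that real-world values needed serious calibration), and the function names are ours, not the backend's.

```python
import math
from PIL import Image

def dhash(img: Image.Image, size: int = 8) -> int:
    """Difference hash: a 64-bit perceptual fingerprint from row-wise gradients."""
    g = img.convert("L").resize((size + 1, size))
    px = list(g.getdata())
    bits = 0
    for row in range(size):
        for col in range(size):
            left = px[row * (size + 1) + col]
            right = px[row * (size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def histogram_cosine(a: Image.Image, b: Image.Image) -> float:
    """Cosine similarity between flattened RGB histograms (3 x 256 bins)."""
    ha = a.convert("RGB").histogram()
    hb = b.convert("RGB").histogram()
    dot = sum(x * y for x, y in zip(ha, hb))
    na = math.sqrt(sum(x * x for x in ha))
    nb = math.sqrt(sum(x * x for x in hb))
    return dot / (na * nb) if na and nb else 0.0

def is_duplicate(a, b, hash_thresh=10, hist_thresh=0.95):
    # Either metric alone is enough to confirm; thresholds are illustrative.
    return hamming(dhash(a), dhash(b)) <= hash_thresh or histogram_cosine(a, b) >= hist_thresh
```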
Finally, confirmed duplicate clusters move to /analyze, which scores each image on sharpness, exposure, and contrast, selects the best one to keep, and asks Groq to write a roast based on the weaker photos.
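A Pillow-only quality score along those lines might look like the sketch below. The weights and the edge-variance proxy for sharpness are our assumptions, not the scoring ParaSight actually ships.

```python
from PIL import Image, ImageFilter, ImageStat

def quality_score(img: Image.Image) -> float:
    """Score an image on sharpness, exposure, and contrast (illustrative weights)."""
    gray = img.convert("L")
    # Sharpness: variance of the edge response (FIND_EDGES approximates a Laplacian).
    sharpness = ImageStat.Stat(gray.filter(ImageFilter.FIND_EDGES)).var[0]
    # Exposure: penalize distance from mid-gray.
    exposure = 1.0 - abs(ImageStat.Stat(gray).mean[0] - 128) / 128
    # Contrast: standard deviation of luminance.
    contrast = ImageStat.Stat(gray).stddev[0] / 128
    return 0.5 * (sharpness / (sharpness + 100)) + 0.25 * exposure + 0.25 * contrast

def pick_best(images):
    """Keep the highest-scoring photo; the rest feed the roast prompt."""
    ranked = sorted(images, key=quality_score, reverse=True)
    return ranked[0], ranked[1:]
```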
Giving Rodney a Voice
The roast does not just appear on screen. Rodney reads it out loud.
We integrated ElevenLabs text-to-speech to give Rodney a live voice, with settings pushed toward maximum expressiveness so he sounds genuinely disgusted by your blurry photos.
In Fun Mode, every swipe triggers a randomized one-liner. Rodney complains when you keep a photo and celebrates when you feed him a bad one.
The voice layer turned a functional swipe interface into something that actually made people laugh during testing.
Challenges
Clustering was harder than we expected.
Our first approach used only timestamps. Photos taken within three minutes of each other became a cluster. It grouped too aggressively and missed the point entirely.
We iterated through file-size heuristics, post-analysis filters, and eventually landed on the two-metric similarity approach.
Getting the thresholds right took real calibration. dHash Hamming distances for duplicate real-world camera photos ended up between 29 and 37 out of 64, much higher than most literature suggests for pixel-identical copies.
Token efficiency was another challenge.
Early versions sent every photo to Groq for analysis, and we burned through our API quota in a single test session.
We redesigned the pipeline so all quality evaluation happens locally with Pillow. Groq only sees the weaker photos and a short text summary, cutting token usage roughly tenfold.
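The shape of that text summary might look like the sketch below. The field names and prompt wording are hypothetical; the point is that only compact, locally computed metrics for the rejected photos reach the LLM.

```python
def build_roast_payload(cluster):
    """Summarize locally computed metrics into a short text prompt.

    `cluster` is assumed to be a list of dicts like
    {"name": ..., "sharpness": ..., "exposure": ..., "kept": bool};
    only this summary (plus the rejected images) goes to the model.
    """
    rejected = [p for p in cluster if not p["kept"]]
    lines = [
        f'{p["name"]}: sharpness={p["sharpness"]:.0f}, exposure={p["exposure"]:.2f}'
        for p in rejected
    ]
    return "Roast the photographer for these rejected shots:\n" + "\n".join(lines)
```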
Merge conflicts also became a real issue. Four people pushing to the same repository during a hackathon gets chaotic quickly.
We had several conflicts on important files like index.tsx and summary.tsx, which required careful resolution to avoid losing each other's work.
Finally, we had to design around demo-day constraints. Groq rate limits meant we needed a fallback experience that still felt intentional. If the roast API is exhausted, Rodney still has something snarky to say.
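A fallback like that can be as simple as wrapping the roast call. This is a minimal sketch under assumed names; the canned lines here are placeholders, not Rodney's actual material.

```python
import random

FALLBACK_ROASTS = [
    "My API budget died looking at these photos.",
    "Even offline, I can tell that one's blurry.",
]

def roast_or_fallback(generate_roast, photos):
    """Try the LLM roast; on failure (e.g. a rate limit), fall back to canned snark."""
    try:
        return generate_roast(photos)
    except Exception:
        return random.choice(FALLBACK_ROASTS)
```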
What We Learned
- Building a real duplicate detection system requires combining multiple metrics. No single algorithm is robust enough across real-world photos.
- A stateless backend made the API simpler and faster, but required more careful batching logic on the mobile side.
- The difference between a useful app and a fun app is personality. Rodney, the voice lines, the retro UI, and the roasts are what make ParaSight memorable.
- Voice adds more personality than almost any visual design choice. The moment Rodney started talking, the app felt alive in a way it did not before.
- Shipping under pressure teaches you which abstractions matter and which ones you can skip.
What’s Next
Our next step is turning ParaSight from a hackathon prototype into a polished product.
We want to improve duplicate detection further by experimenting with more advanced local similarity models, smarter clustering, and better handling of screenshots, memes, and edited photos.
We also want to expand Rodney’s personality with more voice lines, unlockable themes, and different mascot styles.
On the product side, we would like to add camera roll statistics, storage insights, batch deletion history, and support for cloud backup before deleting photos.
Long term, we think ParaSight could become a real consumer app. Everyone has a cluttered camera roll, and almost nobody enjoys cleaning it. We think there is room for a product that solves that problem while still feeling entertaining, personal, and memorable.
Built With
- axios
- elevenlabs
- fastapi
- gemini
- groq
- pillow
- pydantic
- python
- react-native
- typescript
- uvicorn