Inspiration
Every day, over 500 million photos are shared across social media. Behind every one of those images is a human face — carrying emotion, identity, and story. Yet most AI tools treat faces as nothing more than a grid of pixels to be mathematically compared.
We started with a single question: What if AI could understand faces the way humans do?
Not by comparing pixel distances, but by reasoning about facial structure — the curve of a jawline, the spacing between eyes, the way an expression shifts the geometry of a face. That question became FrameIQ.
What It Does
FrameIQ is a multimodal AI platform with three core features:
👤 Face Recognition — Creator ID
Upload or capture a face using your live camera. FrameIQ registers facial identity using structural analysis and recognizes it across future uploads — with a confidence score showing how certain the match is.
🔍 Face Analysis — Emotion Intel
Drop any photo and FrameIQ returns:
- Age estimation
- Gender detection
- Emotion classification (Happy, Neutral, Serious, Surprised, Sad)
- Ethnicity detection
- Animated confidence scores for each attribute
📝 Text Summarization — Caption AI
Paste any article, script, caption, or report. Choose your style (Concise, Detailed, Bullet Points, Simple) and word limit. FrameIQ returns a clean, intelligent summary powered by Gemini AI.
How We Built It
| Layer | Technology |
|---|---|
| Frontend | React + TypeScript + Vite |
| Styling | Tailwind CSS + Framer Motion |
| Backend | Node.js + Express + TypeScript |
| AI Engine | Google Gemini AI (multimodal) |
| Face Detection | OpenCV Haar Cascade |
| Face Matching | Custom KNN (Euclidean distance, k=5) |
| Database | PostgreSQL |
| Deployment | Replit |
The architecture follows a clean client-server separation:
- The React frontend handles live camera capture using the browser's native `getUserMedia` API, renders results in real time, and communicates with the backend via typed REST API calls
- The Express backend processes uploaded images, runs OpenCV face detection, serializes face vectors as JSON, and orchestrates all Gemini API calls
- Gemini AI powers both face analysis and text summarization through carefully structured prompts that return clean, parseable JSON
- PostgreSQL stores registered faces, analysis history, and summaries with full timestamps
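To illustrate what "carefully structured prompts" can look like, here is a minimal sketch of a prompt builder. The wording and the `buildFaceAnalysisPrompt` name are ours for illustration, not the production prompt:

```typescript
// Hypothetical sketch of a structured-prompt builder for face analysis.
// The attribute list mirrors the features described above; the exact
// production prompt may differ.
interface AnalysisPrompt {
  systemInstruction: string;
  userInstruction: string;
}

function buildFaceAnalysisPrompt(): AnalysisPrompt {
  return {
    systemInstruction:
      "You are a face analysis engine. Respond with raw JSON only: " +
      "no markdown fences, no prose before or after the object.",
    userInstruction:
      "Analyze the attached face image and return exactly this shape: " +
      '{"age": number, "gender": string, "emotion": ' +
      '"Happy" | "Neutral" | "Serious" | "Surprised" | "Sad", ' +
      '"ethnicity": string, "confidence": {attribute: number}}',
  };
}
```

Spelling out the exact output shape inside the prompt is what makes the response machine-parseable downstream.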
The face recognition system works by:
- Detecting the face region using Haar Cascade
- Resizing to a normalized 100×100 pixel crop
- Flattening to a feature vector of length $100 \times 100 \times 3 = 30{,}000$
- Comparing against stored vectors using Euclidean distance:
$$d(v_1, v_2) = \sqrt{\sum_{i=1}^{n}(v_{1i} - v_{2i})^2}$$
- Selecting the k=5 nearest neighbors and returning the majority label
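The matching steps above can be sketched in TypeScript. This is a minimal version of the idea; the names (`StoredFace`, `knnClassify`) are illustrative, not the actual FrameIQ code:

```typescript
// Minimal KNN face matcher: Euclidean distance over flattened pixel
// vectors, then a majority vote among the k nearest stored faces.
interface StoredFace {
  label: string;
  vector: number[]; // length 30,000 in FrameIQ (100 x 100 x 3)
}

function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

function knnClassify(query: number[], faces: StoredFace[], k = 5): string {
  // Rank every stored face by distance to the query vector.
  const ranked = faces
    .map((f) => ({ label: f.label, dist: euclidean(query, f.vector) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k);
  // Count votes among the k nearest and return the majority label.
  const votes: Record<string, number> = {};
  for (const { label } of ranked) {
    votes[label] = (votes[label] ?? 0) + 1;
  }
  let best = ranked[0].label;
  for (const label of Object.keys(votes)) {
    if (votes[label] > votes[best]) best = label;
  }
  return best;
}
```

With a linear scan over all stored faces this is O(n · d) per query, which is fine at hackathon scale.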
Challenges We Ran Into
🔴 Gemini Multimodal Integration

Getting Gemini to return clean, structured JSON consistently was harder than expected. The model would sometimes wrap responses in markdown code fences or add conversational preamble. We solved this with response cleaning:

```typescript
const cleaned = raw.replace(/```json|```/g, '').trim();
const result = JSON.parse(cleaned);
```
🔴 Live Camera on Mobile

Browser camera APIs behave differently across devices. Getting `getUserMedia` to work reliably on both desktop and mobile required careful stream management — starting, capturing a frame onto a canvas, converting to Blob, and cleanly stopping the stream.
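That lifecycle can be sketched with only browser-native APIs. This is our illustration of the approach, not FrameIQ's exact code; element wiring and JPEG quality are assumptions:

```typescript
// Sketch of the live-capture lifecycle: start the stream, draw a frame
// onto a canvas, convert it to a Blob for upload, then stop every track.
async function startCamera(video: HTMLVideoElement): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { facingMode: "user" }, // prefer the front camera on mobile
    audio: false,
  });
  video.srcObject = stream;
  await video.play();
  return stream;
}

function captureFrame(video: HTMLVideoElement): Promise<Blob> {
  // Draw the current video frame onto an offscreen canvas.
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);
  // toBlob is callback-based, so wrap it in a Promise.
  return new Promise((resolve, reject) =>
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error("capture failed"))),
      "image/jpeg",
      0.9,
    ),
  );
}

function stopCamera(stream: MediaStream): void {
  // Stopping every track releases the camera (and its indicator light).
  stream.getTracks().forEach((t) => t.stop());
}
```

Forgetting `stopCamera` is the classic bug here: the camera stays locked and mobile browsers keep the recording indicator on.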
🔴 Face Vector Storage Without Python

The original design used NumPy `.npy` files. Moving to a pure TypeScript backend meant reimplementing face vector serialization as plain JSON arrays and rewriting the KNN algorithm from scratch in TypeScript.
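The JSON replacement for `.npy` is simple; here is a minimal sketch (function names are illustrative) with a defensive parse so a corrupted row fails loudly:

```typescript
// Serialize a face vector for storage in a PostgreSQL text/jsonb column,
// and parse it back for distance comparisons.
function serializeVector(vector: number[]): string {
  return JSON.stringify(vector);
}

function deserializeVector(json: string): number[] {
  const parsed: unknown = JSON.parse(json);
  // Validate the shape instead of trusting the stored value blindly.
  if (!Array.isArray(parsed) || !parsed.every((x) => typeof x === "number")) {
    throw new Error("stored value is not a numeric vector");
  }
  return parsed;
}
```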
🔴 Confidence Score Calibration

Raw Euclidean distances don't map intuitively to percentages. We normalized confidence using:
$$\text{confidence} = \frac{100}{1 + d_{\text{avg}}}$$
where $d_{\text{avg}}$ is the average distance across the top 3 nearest neighbors.
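Under this formula a perfect match ($d_{\text{avg}} = 0$) scores 100 and confidence decays toward 0 as distance grows. A direct translation (illustrative code, not the production implementation):

```typescript
// Map the average distance to the top-3 nearest neighbors onto a
// 0-100 confidence score: 100 / (1 + dAvg).
function confidenceFromDistances(distances: number[]): number {
  const top3 = [...distances].sort((a, b) => a - b).slice(0, 3);
  const dAvg = top3.reduce((sum, d) => sum + d, 0) / top3.length;
  return 100 / (1 + dAvg);
}
```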
Accomplishments That We're Proud Of
- ✅ Built a fully functional multimodal AI app in under 48 hours
- ✅ Implemented KNN face recognition from scratch in TypeScript with no ML libraries
- ✅ Achieved live camera capture on both mobile and desktop using only native browser APIs
- ✅ Integrated Gemini AI for both vision and language tasks in a single unified app
- ✅ Shipped a production-ready mobile-first UI that feels like a real product, not a hackathon demo
What We Learned
- Gemini's multimodal capabilities are remarkably powerful for structured analysis tasks when prompted correctly
- Prompt engineering matters — the difference between a response that returns clean JSON and one that wraps it in markdown is a single line in your system prompt
- Browser-native APIs (`getUserMedia`, `canvas`, `Blob`) are more capable than most developers realize — no libraries needed for live camera capture
- TypeScript across the full stack dramatically reduces bugs at the API boundary — shared types between frontend and backend eliminated an entire class of runtime errors
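The shared-types pattern looks roughly like this (the response shape and endpoint path are illustrative, not FrameIQ's actual API):

```typescript
// A response type defined once and imported by both the Express handler
// and the React client, so a renamed field fails at compile time on
// both sides instead of at runtime.
interface FaceAnalysisResponse {
  age: number;
  gender: string;
  emotion: "Happy" | "Neutral" | "Serious" | "Surprised" | "Sad";
  ethnicity: string;
  confidence: Record<string, number>;
}

// Client-side call typed end to end against the shared interface.
async function analyzeFace(image: Blob): Promise<FaceAnalysisResponse> {
  const body = new FormData();
  body.append("image", image);
  const res = await fetch("/api/analyze", { method: "POST", body });
  if (!res.ok) throw new Error(`analysis failed: ${res.status}`);
  return (await res.json()) as FaceAnalysisResponse;
}
```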
What's Next for FrameIQ
FrameIQ is just getting started. Here's the roadmap:
🎬 Video Intelligence (Next Major Feature)
- Process video files frame by frame
- Track emotional arcs across an entire video
- Identify when audience engagement peaks and drops
- Built for content creators and social media teams
📊 Creator Analytics Dashboard
- Upload your social media content
- Get a full emotional and demographic breakdown of faces in your posts
- See which content generates the most positive emotional response
🔗 Social Media Integrations
- Connect directly to TikTok, Instagram, and YouTube
- Auto-analyze posted content and surface insights
- Suggest caption improvements using Text Summarization
🏢 Enterprise Use Cases
- HR teams analyzing candidate video interviews
- Brand teams measuring talent emotion consistency across campaigns
- Media companies building searchable face databases across archives
FrameIQ started as a hackathon project. It's ending as a product roadmap.
Built With
- api
- express.js
- gemini
- html5
- minimax
- node.js
- postgresql
- react
- replit
- typescript