Inspiration

Every day, over 500 million photos are shared across social media. Behind every one of those images is a human face — carrying emotion, identity, and story. Yet most AI tools treat faces as nothing more than a grid of pixels to be mathematically compared.

We started with a single question: What if AI could understand faces the way humans do?

Not by comparing pixel distances, but by reasoning about facial structure — the curve of a jawline, the spacing between eyes, the way an expression shifts the geometry of a face. That question became FrameIQ.


What It Does

FrameIQ is a multimodal AI platform with three core features:

👤 Face Recognition — Creator ID

Upload or capture a face using your live camera. FrameIQ registers facial identity using structural analysis and recognizes it across future uploads — with a confidence score showing how certain the match is.

🔍 Face Analysis — Emotion Intel

Drop any photo and FrameIQ returns:

  • Age estimation
  • Gender detection
  • Emotion classification (Happy, Neutral, Serious, Surprised, Sad)
  • Ethnicity detection
  • Animated confidence scores for each attribute

📝 Text Summarization — Caption AI

Paste any article, script, caption, or report. Choose your style (Concise, Detailed, Bullet Points, Simple) and word limit. FrameIQ returns a clean, intelligent summary powered by Gemini AI.


How We Built It

  • Frontend: React + TypeScript + Vite
  • Styling: Tailwind CSS + Framer Motion
  • Backend: Node.js + Express + TypeScript
  • AI Engine: Google Gemini AI (multimodal)
  • Face Detection: OpenCV Haar Cascade
  • Face Matching: Custom KNN (Euclidean distance, k=5)
  • Database: PostgreSQL
  • Deployment: Replit

The architecture follows a clean client-server separation:

  • The React frontend handles live camera capture using the browser's native getUserMedia API, renders results in real time, and communicates with the backend via typed REST API calls
  • The Express backend processes uploaded images, runs OpenCV face detection, serializes face vectors as JSON, and orchestrates all Gemini API calls
  • Gemini AI powers both face analysis and text summarization through carefully structured prompts that return clean, parseable JSON
  • PostgreSQL stores registered faces, analysis history, and summaries with full timestamps
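
The "typed REST" contract between frontend and backend can be sketched as a shared types module that both sides import. The names and fields below are illustrative, not the actual schema:

```typescript
// Illustrative shared API types; both the React client and the Express
// handlers import these, so a renamed field breaks the build at compile
// time instead of failing at runtime.
export interface FaceAnalysis {
  age: number;
  gender: string;
  emotion: 'Happy' | 'Neutral' | 'Serious' | 'Surprised' | 'Sad';
  ethnicity: string;
  confidence: Record<string, number>; // per-attribute confidence, 0-100
}

export interface AnalyzeResponse {
  faces: FaceAnalysis[];
  processedAt: string; // ISO timestamp
}
```

Sharing one module this way is what eliminates drift between what the server sends and what the client expects.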

The face recognition system works by:

  1. Detecting the face region using Haar Cascade
  2. Resizing to a normalized 100×100 pixel crop
  3. Flattening to a feature vector of length $100 \times 100 \times 3 = 30{,}000$
  4. Comparing against stored vectors using Euclidean distance:

$$d(v_1, v_2) = \sqrt{\sum_{i=1}^{n}(v_{1i} - v_{2i})^2}$$

  5. Selecting the k=5 nearest neighbors and returning the majority label
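
The matching steps above can be sketched in TypeScript as follows (interface and function names are illustrative, not the actual implementation):

```typescript
// A stored face: a label plus the flattened 100x100x3 RGB vector.
interface StoredFace {
  label: string;
  vector: number[]; // length 30,000 in the real system
}

// Euclidean distance between two equal-length vectors.
function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// KNN classification: take the k nearest stored faces, return the
// label that wins the majority vote.
function knnClassify(query: number[], faces: StoredFace[], k = 5): string {
  const neighbors = faces
    .map((f) => ({ label: f.label, dist: euclidean(query, f.vector) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k);

  const votes = new Map<string, number>();
  for (const n of neighbors) {
    votes.set(n.label, (votes.get(n.label) ?? 0) + 1);
  }

  let best = neighbors[0].label;
  let bestCount = 0;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}
```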

Challenges We Ran Into

🔴 Gemini Multimodal Integration

Getting Gemini to return clean, structured JSON consistently was harder than expected. The model would sometimes wrap responses in markdown code fences or add conversational preamble. We solved this with response cleaning:

// Strip markdown code fences before parsing the model's JSON
const cleaned = raw.replace(/```json|```/g, '').trim();
const result = JSON.parse(cleaned);
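
A slightly more defensive variant of the same idea (a sketch, with error handling of our own choosing) also tolerates conversational preamble by falling back to slicing out the first JSON object:

```typescript
// Hedged sketch: recover a JSON object from a model response that may
// include markdown fences or preamble text around it.
function extractJson(raw: string): unknown {
  const cleaned = raw.replace(/```json|```/g, '').trim();
  try {
    return JSON.parse(cleaned);
  } catch {
    // Fall back to the span from the first '{' to the last '}'.
    const start = cleaned.indexOf('{');
    const end = cleaned.lastIndexOf('}');
    if (start === -1 || end === -1) throw new Error('No JSON object found');
    return JSON.parse(cleaned.slice(start, end + 1));
  }
}
```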

🔴 Live Camera on Mobile

Browser camera APIs behave differently across devices. Getting getUserMedia to work reliably on both desktop and mobile required careful stream management: starting the stream, capturing a frame onto a canvas, converting it to a Blob, and cleanly stopping the stream afterward.

🔴 Face Vector Storage Without Python

The original design used NumPy .npy files. Moving to a pure TypeScript backend meant reimplementing face vector serialization as plain JSON arrays and rewriting the KNN algorithm from scratch in TypeScript.
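
The .npy replacement boils down to a JSON round-trip with a shape check. A minimal sketch (record shape is illustrative):

```typescript
// A registered face persisted as a plain JSON document instead of a
// NumPy .npy file.
interface FaceRecord {
  label: string;
  vector: number[];
}

function serializeFace(record: FaceRecord): string {
  return JSON.stringify(record);
}

function deserializeFace(json: string): FaceRecord {
  const parsed = JSON.parse(json) as FaceRecord;
  // Guard against corrupted rows: the vector must be a numeric array.
  if (!Array.isArray(parsed.vector) || typeof parsed.label !== 'string') {
    throw new Error('Invalid face record');
  }
  return parsed;
}
```

JSON arrays are bulkier than binary .npy, but at 30,000 floats per face the difference was acceptable for a hackathon-scale database.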

🔴 Confidence Score Calibration

Raw Euclidean distances don't map intuitively to percentages. We normalized confidence using:

$$\text{confidence} = \frac{100}{1 + d_{\text{avg}}}$$

where $d_{\text{avg}}$ is the average distance across the top 3 nearest neighbors.
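
In code, the normalization above is a one-liner over the three smallest distances (function name is illustrative):

```typescript
// Map raw KNN distances to a 0-100 confidence score using
// confidence = 100 / (1 + d_avg), where d_avg averages the
// three smallest distances.
function confidenceFromDistances(distances: number[]): number {
  const top3 = [...distances].sort((a, b) => a - b).slice(0, 3);
  const dAvg = top3.reduce((sum, d) => sum + d, 0) / top3.length;
  return 100 / (1 + dAvg);
}
```

A perfect match (all distances zero) yields 100, and confidence decays smoothly as the average distance grows, which reads naturally in the UI.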


Accomplishments That We're Proud Of

  • ✅ Built a fully functional multimodal AI app in under 48 hours
  • ✅ Implemented KNN face recognition from scratch in TypeScript with no ML libraries
  • ✅ Achieved live camera capture on both mobile and desktop using only native browser APIs
  • ✅ Integrated Gemini AI for both vision and language tasks in a single unified app
  • ✅ Shipped a production-ready mobile-first UI that feels like a real product, not a hackathon demo

What We Learned

  • Gemini's multimodal capabilities are remarkably powerful for structured analysis tasks when prompted correctly
  • Prompt engineering matters — the difference between a response that returns clean JSON and one that wraps it in markdown is a single line in your system prompt
  • Browser-native APIs (getUserMedia, canvas, Blob) are more capable than most developers realize — no libraries needed for live camera capture
  • TypeScript across the full stack dramatically reduces bugs at the API boundary — shared types between frontend and backend eliminated an entire class of runtime errors

What's Next for FrameIQ

FrameIQ is just getting started. Here's the roadmap:

🎬 Video Intelligence (Next Major Feature)

  • Process video files frame by frame
  • Track emotional arcs across an entire video
  • Identify when audience engagement peaks and drops
  • Built for content creators and social media teams

📊 Creator Analytics Dashboard

  • Upload your social media content
  • Get a full emotional and demographic breakdown of faces in your posts
  • See which content generates the most positive emotional response

🔗 Social Media Integrations

  • Connect directly to TikTok, Instagram, and YouTube
  • Auto-analyze posted content and surface insights
  • Suggest caption improvements using Text Summarization

🏢 Enterprise Use Cases

  • HR teams analyzing candidate video interviews
  • Brand teams measuring talent emotion consistency across campaigns
  • Media companies building searchable face databases across archives

FrameIQ started as a hackathon project. It's ending as a product roadmap.
