Inspiration

Every day, over 500 million photos are shared across social media. Behind every one of those images is a human face — carrying emotion, identity, and story. Yet most AI tools treat faces as nothing more than a grid of pixels to be mathematically compared.

We started with a single question: What if AI could understand faces the way humans do?

Not by comparing pixel distances, but by reasoning about facial structure — the curve of a jawline, the spacing between eyes, the way an expression shifts the geometry of a face. That question became FrameIQ.


What It Does

FrameIQ is a multimodal AI platform with three core features:

👤 Face Recognition — Creator ID

Upload or capture a face using your live camera. FrameIQ registers facial identity using structural analysis and recognizes it across future uploads — with a confidence score showing how certain the match is.

🔍 Face Analysis — Emotion Intel

Drop any photo and FrameIQ returns:

  • Age estimation
  • Gender detection
  • Emotion classification (Happy, Neutral, Serious, Surprised, Sad)
  • Ethnicity detection
  • Animated confidence scores for each attribute

📝 Text Summarization — Caption AI

Paste any article, script, caption, or report. Choose your style (Concise, Detailed, Bullet Points, Simple) and word limit. FrameIQ returns a clean, intelligent summary powered by Gemini AI.


How We Built It

  • Frontend: React + TypeScript + Vite
  • Styling: Tailwind CSS + Framer Motion
  • Backend: Node.js + Express + TypeScript
  • AI Engine: Google Gemini AI (multimodal)
  • Face Detection: OpenCV Haar Cascade
  • Face Matching: Custom KNN (Euclidean distance, k=5)
  • Database: PostgreSQL
  • Deployment: Replit

The architecture follows a clean client-server separation:

  • The React frontend handles live camera capture using the browser's native getUserMedia API, renders results in real time, and communicates with the backend via typed REST API calls
  • The Express backend processes uploaded images, runs OpenCV face detection, serializes face vectors as JSON, and orchestrates all Gemini API calls
  • Gemini AI powers both face analysis and text summarization through carefully structured prompts that return clean, parseable JSON
  • PostgreSQL stores registered faces, analysis history, and summaries with full timestamps
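
The "typed REST" contract between frontend and backend can be sketched as a shared types module that both sides import. The names and fields below are illustrative, not the actual schema:

```typescript
// Illustrative shared API types; both the React client and the Express
// handlers import these, so a renamed field breaks the build at compile
// time instead of failing at runtime.
export interface FaceAnalysis {
  age: number;
  gender: string;
  emotion: 'Happy' | 'Neutral' | 'Serious' | 'Surprised' | 'Sad';
  ethnicity: string;
  confidence: Record<string, number>; // per-attribute confidence, 0-100
}

export interface AnalyzeResponse {
  faces: FaceAnalysis[];
  processedAt: string; // ISO timestamp
}
```

Sharing one module this way is what eliminates drift between what the server sends and what the client expects.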

The face recognition system works by:

  1. Detecting the face region using Haar Cascade
  2. Resizing to a normalized 100×100 pixel crop
  3. Flattening to a feature vector of length $100 \times 100 \times 3 = 30{,}000$
  4. Comparing against stored vectors using Euclidean distance:

$$d(v_1, v_2) = \sqrt{\sum_{i=1}^{n}(v_{1i} - v_{2i})^2}$$

  5. Selecting the k=5 nearest neighbors and returning the majority label
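
The matching steps above can be sketched in TypeScript as follows (interface and function names are illustrative, not the actual implementation):

```typescript
// A stored face: a label plus the flattened 100x100x3 RGB vector.
interface StoredFace {
  label: string;
  vector: number[]; // length 30,000 in the real system
}

// Euclidean distance between two equal-length vectors.
function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// KNN classification: take the k nearest stored faces, return the
// label that wins the majority vote.
function knnClassify(query: number[], faces: StoredFace[], k = 5): string {
  const neighbors = faces
    .map((f) => ({ label: f.label, dist: euclidean(query, f.vector) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k);

  const votes = new Map<string, number>();
  for (const n of neighbors) {
    votes.set(n.label, (votes.get(n.label) ?? 0) + 1);
  }

  let best = neighbors[0].label;
  let bestCount = 0;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}
```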

Challenges We Ran Into

🔴 Gemini Multimodal Integration

Getting Gemini to return clean, structured JSON consistently was harder than expected. The model would sometimes wrap responses in markdown code fences or add conversational preamble. We solved this with response cleaning:

// Strip markdown code fences before parsing the model's JSON
const cleaned = raw.replace(/```json|```/g, '').trim();
const result = JSON.parse(cleaned);
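
A slightly more defensive variant of the same idea (a sketch, with error handling of our own choosing) also tolerates conversational preamble by falling back to slicing out the first JSON object:

```typescript
// Hedged sketch: recover a JSON object from a model response that may
// include markdown fences or preamble text around it.
function extractJson(raw: string): unknown {
  const cleaned = raw.replace(/```json|```/g, '').trim();
  try {
    return JSON.parse(cleaned);
  } catch {
    // Fall back to the span from the first '{' to the last '}'.
    const start = cleaned.indexOf('{');
    const end = cleaned.lastIndexOf('}');
    if (start === -1 || end === -1) throw new Error('No JSON object found');
    return JSON.parse(cleaned.slice(start, end + 1));
  }
}
```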

🔴 Live Camera on Mobile

Browser camera APIs behave differently across devices. Getting getUserMedia to work reliably on both desktop and mobile required careful stream management: starting the stream, capturing a frame onto a canvas, converting it to a Blob, and cleanly stopping the stream afterward.

🔴 Face Vector Storage Without Python

The original design used NumPy .npy files. Moving to a pure TypeScript backend meant reimplementing face vector serialization as plain JSON arrays and rewriting the KNN algorithm from scratch in TypeScript.
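
The .npy replacement boils down to a JSON round-trip with a shape check. A minimal sketch (record shape is illustrative):

```typescript
// A registered face persisted as a plain JSON document instead of a
// NumPy .npy file.
interface FaceRecord {
  label: string;
  vector: number[];
}

function serializeFace(record: FaceRecord): string {
  return JSON.stringify(record);
}

function deserializeFace(json: string): FaceRecord {
  const parsed = JSON.parse(json) as FaceRecord;
  // Guard against corrupted rows: the vector must be a numeric array.
  if (!Array.isArray(parsed.vector) || typeof parsed.label !== 'string') {
    throw new Error('Invalid face record');
  }
  return parsed;
}
```

JSON arrays are bulkier than binary .npy, but at 30,000 floats per face the difference was acceptable for a hackathon-scale database.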

🔴 Confidence Score Calibration

Raw Euclidean distances don't map intuitively to percentages. We normalized confidence using:

$$\text{confidence} = \frac{100}{1 + d_{\text{avg}}}$$

where $d_{\text{avg}}$ is the average distance across the top 3 nearest neighbors.
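
In code, the normalization above is a one-liner over the three smallest distances (function name is illustrative):

```typescript
// Map raw KNN distances to a 0-100 confidence score using
// confidence = 100 / (1 + d_avg), where d_avg averages the
// three smallest distances.
function confidenceFromDistances(distances: number[]): number {
  const top3 = [...distances].sort((a, b) => a - b).slice(0, 3);
  const dAvg = top3.reduce((sum, d) => sum + d, 0) / top3.length;
  return 100 / (1 + dAvg);
}
```

A perfect match (all distances zero) yields 100, and confidence decays smoothly as the average distance grows, which reads naturally in the UI.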


Accomplishments That We're Proud Of

  • ✅ Built a fully functional multimodal AI app in under 48 hours
  • ✅ Implemented KNN face recognition from scratch in TypeScript with no ML libraries
  • ✅ Achieved live camera capture on both mobile and desktop using only native browser APIs
  • ✅ Integrated Gemini AI for both vision and language tasks in a single unified app
  • ✅ Shipped a production-ready mobile-first UI that feels like a real product, not a hackathon demo

What We Learned

  • Gemini's multimodal capabilities are remarkably powerful for structured analysis tasks when prompted correctly
  • Prompt engineering matters — the difference between a response that returns clean JSON and one that wraps it in markdown is a single line in your system prompt
  • Browser-native APIs (getUserMedia, canvas, Blob) are more capable than most developers realize — no libraries needed for live camera capture
  • TypeScript across the full stack dramatically reduces bugs at the API boundary — shared types between frontend and backend eliminated an entire class of runtime errors

What's Next for FrameIQ

FrameIQ is just getting started. Here's the roadmap:

🎬 Video Intelligence (Next Major Feature)

  • Process video files frame by frame
  • Track emotional arcs across an entire video
  • Identify when audience engagement peaks and drops
  • Built for content creators and social media teams

📊 Creator Analytics Dashboard

  • Upload your social media content
  • Get a full emotional and demographic breakdown of faces in your posts
  • See which content generates the most positive emotional response

🔗 Social Media Integrations

  • Connect directly to TikTok, Instagram, and YouTube
  • Auto-analyze posted content and surface insights
  • Suggest caption improvements using Text Summarization

🏢 Enterprise Use Cases

  • HR teams analyzing candidate video interviews
  • Brand teams measuring talent emotion consistency across campaigns
  • Media companies building searchable face databases across archives

FrameIQ started as a hackathon project. It's ending as a product roadmap.
