OpenCloset: Devpost submission
Inspiration
Digital closets have been tried for over a decade (Cladwell, Whering, Save Your Wardrobe, Stylebook), and none scaled. The reason is brutal onboarding: nobody manually catalogs their wardrobe, and every product built on top of wardrobe data (rental, styling, resale, dupe scouting) dies at that step. Meanwhile a single friend's closet can hold $8,000 of clothing that gets worn 20% of the time, while their best friend drops $180 on a one-wear dress for a wedding. The inventory is there. The need is there. The protocol between them is missing.
What it does
OpenCloset is "Plaid for closets." Drop in your camera roll and your wardrobe becomes structured, searchable, rentable inventory across your friend graph. Gemini 3 Flash extracts every garment in every photo, collapses duplicate shots, and indexes each item with dual embeddings: FashionCLIP for visual similarity, Gemini for natural-language description. Type "cream tailored blazer" on the wishlist page and in roughly 220 milliseconds you see ranked matches from every friend's closet in your graph. Click "request rental" and two Claude Sonnet 4.6 agents spin up (yours and the owner's) to negotiate price, duration, and handoff autonomously, streaming every turn into a split-pane UI. Mutual accept finalizes the deal.
How we built it
- Frontend: Next.js 16 (App Router, Turbopack) + Tailwind v4, React 19
- Backend: Next.js API route handlers + Drizzle ORM + porsager/postgres
- Database: Postgres 16 + pgvector with HNSW indexes (cosine) on image and text embedding columns
- Vision extraction: Gemini 3 Flash (preview) with strict JSON schema output, Zod-validated, with low-confidence extractions dropped
- Image embeddings: FashionCLIP (`patrickjohncyh/fashion-clip`, ViT-B/32) running locally on Apple M4 via MPS through a small FastAPI service, at zero per-call cost
- Text embeddings: `gemini-embedding-2-preview`, Matryoshka-truncated to 768 dims via raw REST
- Agents: Claude Sonnet 4.6 with tool use (`query_my_closet`, `query_friend_closet`, `propose_rental`, `counter_offer`, `accept`, `reject`), information-vs-action tool separation, hard turn cap of 8 enforced in code
- Streaming: Server-Sent Events for both ingestion progress and turn-by-turn agent negotiation
- Session: pre-seeded two-user demo graph (Alice ↔ Bob) with an `?as=alice|bob` switcher
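The friend-graph search above ultimately reduces to cosine-distance ranking over the embedding columns, which pgvector's `<=>` operator and HNSW index compute server-side. A minimal TypeScript sketch of that same math (function names and sample vectors are illustrative, not from the repo):

```typescript
// Illustrative sketch of the cosine ranking pgvector performs with its
// <=> operator; names (Garment, rankByCosine) are assumptions, not repo code.
type Garment = { id: string; embedding: number[] };

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // 1 - cosine similarity, matching pgvector's cosine-distance semantics
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function rankByCosine(query: number[], items: Garment[], k = 5): Garment[] {
  return [...items]
    .sort((x, y) => cosineDistance(query, x.embedding) - cosineDistance(query, y.embedding))
    .slice(0, k);
}
```

In production the sort happens inside Postgres via the HNSW index rather than in application code; the sketch just makes the distance metric concrete.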
Challenges we ran into
- Reasoning tokens ate our filter. Our "does this photo contain clothing?" prompt had `maxOutputTokens: 4`. Gemini 2.5/3 Flash are reasoning models: they burn hidden thinking tokens before producing output, so 4 tokens left zero for the actual answer. Every photo came back as an empty string and got rejected. Fix: bump to 8192 with a defensive "accept on empty string" fallback.
- Turbopack × unzipper. Unzipper unconditionally `require()`s `@aws-sdk/client-s3` as an optional peer, and Turbopack's module resolver refused to compile around it. Swapped to `adm-zip`.
- FashionCLIP output shape. Newer `transformers` versions return `BaseModelOutputWithPooling`, not a raw tensor, from `CLIPModel.get_image_features`. Our TypeScript client was receiving nested lists instead of flat 512-dim arrays. Fixed by calling the `vision_model` + `visual_projection` primitives directly in the Python service.
- Gemini embedding API migration mid-hackathon. `text-embedding-004` was removed from the v1beta endpoint; the replacements (`gemini-embedding-001`, `gemini-embedding-2-preview`) return 3072 dims natively where our schema expected 768. Solved with Matryoshka truncation via raw REST (`outputDimensionality: 768`): zero schema churn, no quality loss.
- Agent tool-use loop design. Letting Claude freely interleave info-tool calls with action-tool calls led to agents either refusing to commit (all info, no action) or firing action tools prematurely. Solved with a cap-and-commit pattern: at most 4 information steps per turn, then the orchestration forces one action tool before yielding to the other agent.
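The Matryoshka fix above has a simple client-side equivalent: keep the leading 768 dimensions and re-normalize to unit length (truncated Matryoshka embeddings are generally no longer unit vectors). A hedged sketch; the helper name is an assumption, and in our case the API's `outputDimensionality` parameter did the truncation server-side:

```typescript
// Client-side Matryoshka truncation sketch: take the first `dims` values
// and re-normalize so cosine search still works. Illustrative helper name.
function truncateEmbedding(full: number[], dims = 768): number[] {
  const head = full.slice(0, dims);
  const norm = Math.sqrt(head.reduce((s, v) => s + v * v, 0));
  return head.map((v) => v / norm); // unit length again
}
```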
Accomplishments that we're proud of
- End-to-end system from empty repo to live agent negotiation in a single build arc
- Friend-graph search latency consistently 190–280ms including the Gemini embedding call
- Retrieval quality that holds up under scrutiny: "blue striped button-up" returns the blue striped dress shirt at cosine distance 0.082; "tortoise shell glasses" returns brown tortoise-shell glasses at 0.241
- Working dedup on real photos: the same shirt across multiple shots collapses to one canonical garment with multiple contributing images
- Fully autonomous two-agent negotiation with hard turn cap, tool-level vault enforcement, and real-time SSE streaming to a split-pane UI
- Zero per-call cost on image embeddings by hosting FashionCLIP locally on Apple Silicon via MPS instead of paying Replicate
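The dedup accomplishment above can be sketched as a greedy pass over photo embeddings: a shot whose similarity to an existing canonical garment clears a threshold joins it, otherwise it founds a new one. Everything here (types, threshold, the assumption that embeddings are unit-normalized) is illustrative, not the repo's actual implementation:

```typescript
// Greedy embedding dedup sketch. Assumes unit-normalized embeddings, so the
// dot product is cosine similarity. Threshold and names are illustrative.
type Shot = { photoId: string; embedding: number[] };
type Canonical = { photoIds: string[]; embedding: number[] };

function dot(a: number[], b: number[]): number {
  return a.reduce((s, v, i) => s + v * b[i], 0);
}

function dedup(shots: Shot[], minSim = 0.9): Canonical[] {
  const canon: Canonical[] = [];
  for (const s of shots) {
    const match = canon.find((c) => dot(c.embedding, s.embedding) >= minSim);
    if (match) match.photoIds.push(s.photoId); // same garment, another shot
    else canon.push({ photoIds: [s.photoId], embedding: s.embedding });
  }
  return canon;
}
```

A real system would likely also merge the contributing embeddings into a centroid; the greedy first-match version keeps the idea visible.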
What we learned
- Reasoning models need generous token budgets. Hidden thinking tokens consume your output budget invisibly. Default to ≥1024 `maxOutputTokens` for any structured-output call, even trivial yes/no ones.
- Matryoshka embeddings are a free lunch when providers shift dimensions. Truncating from 3072 → 768 via a single request parameter beat the alternative of a schema migration.
- Run vision locally when you can. A small FastAPI service on Apple M4 MPS took 10 minutes to stand up and saved every per-call cent vs. a hosted image-embedding API, which is meaningful at demo-rehearsal volume.
- Verify API response shapes before writing integration code. Our five biggest bugs were all "the call returned a shape we didn't anticipate." A pre-flight `curl` on every new endpoint would have saved an hour each.
- Hard caps belong in code, not prompts. Agents will happily ignore "max 8 turns" written into their system prompt. The orchestration loop has to enforce it structurally.
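"Hard caps belong in code" reduces to putting the turn limit in the orchestration loop itself. A minimal sketch with stub agents, assuming an illustrative `Agent` interface that is not OpenCloset's actual one:

```typescript
// Structural turn-cap sketch: the loop, not the prompt, ends the negotiation.
// Action/Agent shapes and the negotiate() helper are illustrative assumptions.
type Action = { type: "counter_offer" | "accept" | "reject"; price?: number };
type Agent = (lastAction: Action | null) => Action;

const MAX_TURNS = 8;

function negotiate(renter: Agent, owner: Agent): { turns: number; outcome: string } {
  const agents = [renter, owner];
  let last: Action | null = null;
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    last = agents[turn % 2](last); // alternate sides each turn
    if (last.type === "accept") return { turns: turn + 1, outcome: "deal" };
    if (last.type === "reject") return { turns: turn + 1, outcome: "no deal" };
  }
  return { turns: MAX_TURNS, outcome: "expired" }; // cap enforced structurally
}
```

Even an agent that counters forever terminates at exactly 8 turns, regardless of what its system prompt says.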
What's next for OpenCloset
- Calendar-aware handoff. Plumb Google Calendar OAuth so agents can propose "bring the dress to Thursday's dinner" against real shared events, the hero beat we deliberately cut for time.
- iOS-native PhotoKit ingestion. Replace web ZIP upload with direct camera-roll access and passive backfill as new photos are taken.
- Dupe scouting. Embed retailer catalogs (Poshmark, Depop, Shopify) and warn users before they buy something they, or a friend, already own.
- Trust, payments, and damage flow. Stripe rails, deposits, insurance partnership, returns logistics.
- On-device vision extraction. Move garment extraction to Core ML so photos never leave the phone, the privacy story Phia's HTML-capture incident taught the industry to take seriously.
- Expanded social graph. Invite flow, tiered visibility (close friends vs. acquaintances), explicit opt-in-to-rental per item, vault-by-default for new additions.
Built With
- claude
- fastapi
- huggingface
- supabase