OpenCloset: Devpost submission
Inspiration
Digital closets have been tried for over a decade (Cladwell, Whering, Save Your Wardrobe, Stylebook), and none scaled. The reason is brutal onboarding: nobody manually catalogs their wardrobe, and every product built on top of wardrobe data (rental, styling, resale, dupe scouting) dies at that step. Meanwhile a single friend's closet can hold $8,000 of clothing that gets worn 20% of the time, while their best friend drops $180 on a one-wear dress for a wedding. The inventory is there. The need is there. The protocol between them is missing.
What it does
OpenCloset is "Plaid for closets." Drop in your camera roll and your wardrobe becomes structured, searchable, rentable inventory across your friend graph. Gemini 3 Flash extracts every garment in every photo, collapses duplicate shots, and indexes each item with dual embeddings: FashionCLIP for visual similarity, Gemini for natural-language description. Type "cream tailored blazer" on the wishlist page and in roughly 220 milliseconds you see ranked matches from every friend's closet in your graph. Click "request rental" and two Claude Sonnet 4.6 agents spin up (yours and the owner's) to negotiate price, duration, and handoff autonomously, streaming every turn into a split-pane UI. Mutual accept finalizes the deal.
How we built it
- Frontend: Next.js 16 (App Router, Turbopack) + Tailwind v4, React 19
- Backend: Next.js API route handlers + Drizzle ORM + porsager/postgres
- Database: Postgres 16 + pgvector with HNSW indexes (cosine) on image and text embedding columns
- Vision extraction: Gemini 3 Flash (preview) with strict JSON schema output, Zod-validated, with low-confidence extractions dropped
- Image embeddings: FashionCLIP (`patrickjohncyh/fashion-clip`, ViT-B/32) running locally on Apple M4 via MPS through a small FastAPI service, at zero per-call cost
- Text embeddings: `gemini-embedding-2-preview`, Matryoshka-truncated to 768 dims via raw REST
- Agents: Claude Sonnet 4.6 with tool use (`query_my_closet`, `query_friend_closet`, `propose_rental`, `counter_offer`, `accept`, `reject`), information-vs-action tool separation, hard turn cap of 8 enforced in code
- Streaming: Server-Sent Events for both ingestion progress and turn-by-turn agent negotiation
- Session: pre-seeded two-user demo graph (Alice ↔ Bob) with an `?as=alice|bob` switcher
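The friend-graph search above ultimately reduces to cosine-distance ranking over the embedding columns, which pgvector's `<=>` operator and HNSW index compute server-side. A minimal TypeScript sketch of that same math (function names and sample vectors are illustrative, not from the repo):

```typescript
// Illustrative sketch of the cosine ranking pgvector performs with its
// <=> operator; names (Garment, rankByCosine) are assumptions, not repo code.
type Garment = { id: string; embedding: number[] };

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // 1 - cosine similarity, matching pgvector's cosine-distance semantics
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function rankByCosine(query: number[], items: Garment[], k = 5): Garment[] {
  return [...items]
    .sort((x, y) => cosineDistance(query, x.embedding) - cosineDistance(query, y.embedding))
    .slice(0, k);
}
```

In production the sort happens inside Postgres via the HNSW index rather than in application code; the sketch just makes the distance metric concrete.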
Challenges we ran into
- Reasoning tokens ate our filter. Our "does this photo contain clothing?" prompt had `maxOutputTokens: 4`. Gemini 2.5/3 Flash are reasoning models: they burn hidden thinking tokens before producing output, so 4 tokens left zero for the actual answer. Every photo came back as an empty string and got rejected. Fix: bump to 8192 with a defensive "accept on empty string" fallback.
- Turbopack × unzipper. Unzipper unconditionally `require()`s `@aws-sdk/client-s3` as an optional peer, and Turbopack's module resolver refused to compile around it. Swapped to `adm-zip`.
- FashionCLIP output shape. Newer `transformers` versions return `BaseModelOutputWithPooling`, not a raw tensor, from `CLIPModel.get_image_features`. Our TypeScript client was receiving nested lists instead of flat 512-dim arrays. Fixed by calling the `vision_model` + `visual_projection` primitives directly in the Python service.
- Gemini embedding API migration mid-hackathon. `text-embedding-004` was removed from the v1beta endpoint; the replacements (`gemini-embedding-001`, `gemini-embedding-2-preview`) return 3072 dims natively where our schema expected 768. Solved with Matryoshka truncation via raw REST (`outputDimensionality: 768`): zero schema churn, no quality loss.
- Agent tool-use loop design. Letting Claude freely interleave info-tool calls with action-tool calls led to agents either refusing to commit (all info, no action) or firing action tools prematurely. Solved with a cap-and-commit pattern: at most 4 information steps per turn, then the orchestration forces one action tool before yielding to the other agent.
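The Matryoshka fix above has a simple client-side equivalent: keep the leading 768 dimensions and re-normalize to unit length (truncated Matryoshka embeddings are generally no longer unit vectors). A hedged sketch; the helper name is an assumption, and in our case the API's `outputDimensionality` parameter did the truncation server-side:

```typescript
// Client-side Matryoshka truncation sketch: take the first `dims` values
// and re-normalize so cosine search still works. Illustrative helper name.
function truncateEmbedding(full: number[], dims = 768): number[] {
  const head = full.slice(0, dims);
  const norm = Math.sqrt(head.reduce((s, v) => s + v * v, 0));
  return head.map((v) => v / norm); // unit length again
}
```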
Accomplishments that we're proud of
- End-to-end system from empty repo to live agent negotiation in a single build arc
- Friend-graph search latency consistently 190–280ms including the Gemini embedding call
- Retrieval quality that holds up under scrutiny: "blue striped button-up" returns the blue striped dress shirt at cosine distance 0.082; "tortoise shell glasses" returns brown tortoise-shell glasses at 0.241
- Working dedup on real photos: the same shirt across multiple shots collapses to one canonical garment with multiple contributing images
- Fully autonomous two-agent negotiation with hard turn cap, tool-level vault enforcement, and real-time SSE streaming to a split-pane UI
- Zero per-call cost on image embeddings by hosting FashionCLIP locally on Apple Silicon via MPS instead of paying Replicate
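The dedup accomplishment above can be sketched as a greedy pass over photo embeddings: a shot whose similarity to an existing canonical garment clears a threshold joins it, otherwise it founds a new one. Everything here (types, threshold, the assumption that embeddings are unit-normalized) is illustrative, not the repo's actual implementation:

```typescript
// Greedy embedding dedup sketch. Assumes unit-normalized embeddings, so the
// dot product is cosine similarity. Threshold and names are illustrative.
type Shot = { photoId: string; embedding: number[] };
type Canonical = { photoIds: string[]; embedding: number[] };

function dot(a: number[], b: number[]): number {
  return a.reduce((s, v, i) => s + v * b[i], 0);
}

function dedup(shots: Shot[], minSim = 0.9): Canonical[] {
  const canon: Canonical[] = [];
  for (const s of shots) {
    const match = canon.find((c) => dot(c.embedding, s.embedding) >= minSim);
    if (match) match.photoIds.push(s.photoId); // same garment, another shot
    else canon.push({ photoIds: [s.photoId], embedding: s.embedding });
  }
  return canon;
}
```

A real system would likely also merge the contributing embeddings into a centroid; the greedy first-match version keeps the idea visible.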
What we learned
- Reasoning models need generous token budgets. Hidden thinking tokens consume your output budget invisibly. Default to ≥1024 `maxOutputTokens` for any structured-output call, even trivial yes/no ones.
- Matryoshka embeddings are a free lunch when providers shift dimensions. Truncating from 3072 → 768 via a single request parameter beat the alternative of a schema migration.
- Run vision locally when you can. A small FastAPI service on Apple M4 MPS took 10 minutes to stand up and saved every per-call cent vs. a hosted image-embedding API, which is meaningful at demo-rehearsal volume.
- Verify API response shapes before writing integration code. Our five biggest bugs were all "the call returned a shape we didn't anticipate." A pre-flight `curl` on every new endpoint would have saved an hour each.
- Hard caps belong in code, not prompts. Agents will happily ignore "max 8 turns" written into their system prompt. The orchestration loop has to enforce it structurally.
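"Hard caps belong in code" reduces to putting the turn limit in the orchestration loop itself. A minimal sketch with stub agents, assuming an illustrative `Agent` interface that is not OpenCloset's actual one:

```typescript
// Structural turn-cap sketch: the loop, not the prompt, ends the negotiation.
// Action/Agent shapes and the negotiate() helper are illustrative assumptions.
type Action = { type: "counter_offer" | "accept" | "reject"; price?: number };
type Agent = (lastAction: Action | null) => Action;

const MAX_TURNS = 8;

function negotiate(renter: Agent, owner: Agent): { turns: number; outcome: string } {
  const agents = [renter, owner];
  let last: Action | null = null;
  for (let turn = 0; turn < MAX_TURNS; turn++) {
    last = agents[turn % 2](last); // alternate sides each turn
    if (last.type === "accept") return { turns: turn + 1, outcome: "deal" };
    if (last.type === "reject") return { turns: turn + 1, outcome: "no deal" };
  }
  return { turns: MAX_TURNS, outcome: "expired" }; // cap enforced structurally
}
```

Even an agent that counters forever terminates at exactly 8 turns, regardless of what its system prompt says.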
What's next for OpenCloset
- Calendar-aware handoff. Plumb Google Calendar OAuth so agents can propose "bring the dress to Thursday's dinner" against real shared events, the hero beat we deliberately cut for time.
- iOS-native PhotoKit ingestion. Replace web ZIP upload with direct camera-roll access and passive backfill as new photos are taken.
- Dupe scouting. Embed retailer catalogs (Poshmark, Depop, Shopify) and warn users before they buy something they, or a friend, already own.
- Trust, payments, and damage flow. Stripe rails, deposits, insurance partnership, returns logistics.
- On-device vision extraction. Move garment extraction to Core ML so photos never leave the phone, the privacy story Phia's HTML-capture incident taught the industry to take seriously.
- Expanded social graph. Invite flow, tiered visibility (close friends vs. acquaintances), explicit opt-in-to-rental per item, vault-by-default for new additions.
Built With
- claude
- fastapi
- huggingface
- supabase