Inspiration

We’ve all been there: you walk into a crowded hackathon hall or a networking event, and the room is buzzing, but you have no idea where to start. You see 200 people, but you don't know who shares your niche interest in building AI tools for businesses or who is looking for a co-founder with your exact skill set.

We built Affiniti to act as an intelligent "social wingman." We wanted to move past "LinkedIn-style" scrolling and create a system that understands intent: not just who you are, but what you want to achieve today.

What it does

Affiniti is an AI-driven networking layer for physical events. It eliminates the "cold start" problem of meeting strangers by providing:

  • Semantic Matching: Instead of keyword matching, it uses local LLMs to understand the "vibe" of your profile and current goals.

  • Intent-Based Discovery: Users can set a temporary "Intent" (e.g., "Looking for a Swift developer for a side project"), which instantly re-ranks their discovery feed.

  • AI Icebreakers: For every top match, the app generates a personalized, context-aware icebreaker to help you start the conversation.

  • Live Status: A real-time broadcast of whether an attendee is "Open to Chat," "Deep in Work," or "Grabbing Food."

  • Privacy First: Users choose exactly which data points to share for each specific event, and all sensitive AI processing happens locally.

How we built it

Affiniti was forged with a focus on high-performance local inference.

The Tech Stack:

  • Mobile: iOS (SwiftUI, MVVM, Keychain)
  • Server: FastAPI (Async), SQLAlchemy
  • Database: PostgreSQL + pgvector
  • Local LLM: Gemma 4 via Apple Silicon Metal (mlx-lm)
  • Embeddings: sentence-transformers (all-mpnet-base-v2)

The AI Pipeline

  • Normalization: Raw bios are processed by Gemma 4 to create a clean, 3–5 sentence semantic summary.
  • Vectorization: These summaries are converted into 768-dimensional embeddings and stored in PostgreSQL.
  • The "Blended" Query: We use a weighted cosine similarity search. When a user has an active intent, the match score is calculated as: Score = (0.6 * Profile_Similarity) + (0.4 * Intent_Similarity)
  • Diversity Pass: To prevent "echo chambers," we apply a constraint that ensures the top 4 matches don't all share the same primary skill tag.
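The blended scoring and diversity pass above can be sketched in plain Python. This is an illustrative sketch, not our production code: field names like `"tag"` and `"score"` are made up for the example.

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

PROFILE_WEIGHT, INTENT_WEIGHT = 0.6, 0.4

def blended_score(profile_sim, intent_sim=None):
    """Weighted blend; falls back to profile-only when no intent is set."""
    if intent_sim is None:
        return profile_sim
    return PROFILE_WEIGHT * profile_sim + INTENT_WEIGHT * intent_sim

def diversity_pass(ranked, top_n=4):
    """If the top_n candidates all share one primary skill tag, promote the
    best differently-tagged candidate from further down the ranking."""
    top = ranked[:top_n]
    if len(ranked) <= top_n or len({c["tag"] for c in top}) > 1:
        return ranked  # already diverse (or nothing to swap in)
    for i in range(top_n, len(ranked)):
        if ranked[i]["tag"] != top[0]["tag"]:
            # Swap the differently-tagged candidate into the last top slot.
            return ranked[:top_n - 1] + [ranked[i]] + ranked[top_n - 1:i] + ranked[i + 1:]
    return ranked
```

In production the similarity terms come from pgvector rather than Python-side math; the sketch just makes the 0.6/0.4 blend and the tag constraint concrete.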

Challenges we ran into

The Latency Wall: We initially used Ollama via HTTP, but the overhead added 2 seconds per call. In a hackathon, that’s an eternity. We pivoted to mlx-lm, running directly on the Apple Silicon Metal GPU, which dropped our inference time by 70%.

Thread-Local Nightmares: MLX Metal streams are thread-local. Integrating this into an asyncio FastAPI loop caused immediate crashes. We had to implement a dedicated ThreadPoolExecutor(max_workers=1) to serialize all inference calls while keeping the rest of the API responsive.

Network Fluidity: Developing for physical iPhones using ngrok meant the backend URL changed constantly. We had to build a custom Backend URL Override in the app's settings to keep the demo from breaking every time the tunnel restarted.

JWT Headaches: Debugging Auth0 tokens on physical devices with shifting local IPs was a challenge. We built a "Demo Mode" bypass to ensure the live presentation wouldn't be derailed by an expired JWKS cache.

Accomplishments that we're proud of

  • Zero Cloud Dependence: We are running a state-of-the-art LLM (Gemma 4) and vector search entirely on a local MacBook Pro, serving two physical iPhones. No API credits, no cloud latency.

  • The UI/UX Polish: We managed to build 14 distinct SwiftUI screens, including a complex privacy-masking system that ensures users only see what others have consented to share.

  • Efficient Vector Search: Implementing HNSW (Hierarchical Navigable Small World) indexes in pgvector gave us near-instant match results, even after seeding the database with demo users.
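For reference, the pgvector setup looks roughly like the statements below. Table and column names are illustrative, not our actual schema; pgvector's `<=>` operator returns cosine distance, so we order ascending and report similarity as `1 - distance`.

```python
# Illustrative pgvector DDL and match query (schema names are made up).
CREATE_INDEX = """
CREATE INDEX ON profiles
USING hnsw (embedding vector_cosine_ops);
"""

TOP_MATCHES = """
SELECT user_id, 1 - (embedding <=> %(query_vec)s) AS similarity
FROM profiles
ORDER BY embedding <=> %(query_vec)s
LIMIT 4;
"""
```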

What we learned

  • Local Inference is Ready: The "Apple Silicon as a Server" paradigm is incredibly viable for small-to-medium scale applications.

  • The Power of Generalization: Using an LLM to "clean" a profile before embedding it significantly improves similarity scores compared to embedding raw, messy user input.

  • Async vs. Threads: We gained a deep understanding of when asyncio isn't enough, specifically when dealing with hardware-bound, thread-local resources like GPUs.

What's next for Affiniti

  • Proximity Awareness: Integrating iBeacon or Ultra Wideband (UWB) to notify you when a "High-Affinity" match is within 10 feet of you.

  • Group Matching: AI-driven suggestions for "Mini-Masterminds", grouping 3 or 4 people together who have complementary skills for a specific project.

  • Agentic Scheduling: Allowing the AI to suggest a time to meet based on the live "Status" of both users.

  • Cloud Scaling: While we love local inference, we plan to explore a hybrid approach to support thousands of concurrent users at massive conferences.
