Shark Finder

Inspiration

We’ve all watched founders spend weeks cold-emailing the wrong investors. Meanwhile, investors wade through generic decks that don’t fit their thesis. Shark Finder aims to compress this matchmaking loop into minutes by collecting structured signals from both sides and ranking the best fits, transparently.

What it does

Shark Finder has two entry points:

Investors: create a profile (risk tolerance, industry focus, stage, check size, board seat preference, years active, number of investments, etc.) and get a ranked feed of firms/startups that match their thesis.
Businesses (founders): shoot their shot through an on-site, immersive VC “pitch” simulation (problem, traction, market, ask) and receive a list of investors most likely to engage.

Under the hood, we compute a match score from overlapping attributes (e.g., industry, stage, investment size) plus soft signals (experience, follow-on behavior). Results are ranked and filterable by a minimum score threshold.

How we built it

Backend: FastAPI service with SQLAlchemy models for Investors and Firms, and endpoints to create profiles and fetch matches. The repo contains a backend/ folder with main.py and model files under app/models/ (e.g., investor.py, firm.py).
Database: PostgreSQL (via Supabase). We mapped our domain to normalized tables (investor profile, firm profile) and kept room for future features like interaction signals and pitch embeddings.
Frontend: React + TypeScript + Vite in frontend/, with simple forms for onboarding and a dashboard to view ranked matches (and adjust the minimum score threshold).
Cloud: We designed for AWS integration (Cognito for auth, plus storage) and wired environment variables for a hosted Postgres (Supabase), so the service can run locally or in the cloud with minor config changes. We used OpenAI Whisper (speech-to-text) alongside Google Gemini to parse the business pitch.

Architecture

API: FastAPI in backend/main.py exposes routes like:
- POST /investors and POST /firms to create or update profiles
- GET /investors/{id}/matching-firms?min_score=...&limit=...
- GET /matching/investors-firms for batch or cross-side comparisons
These live alongside SQLAlchemy models in backend/app/models/.
DB: Postgres (Supabase) with typed columns for thesis & preferences.
Web: React + TypeScript + Vite app in frontend/ for onboarding and results.
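
As a rough illustration of how a client might call the matching route, here is a minimal Python sketch that builds the request URL. The base URL, the integer investor ID, and the assumption that min_score lies in [0, 1] are ours for illustration; only the route shape and the min_score and limit parameter names come from the list above.

```python
from urllib.parse import urlencode

# Hypothetical base URL for a locally running instance (an assumption).
BASE_URL = "http://localhost:8000"


def matching_firms_url(investor_id: int, min_score: float = 0.5, limit: int = 10) -> str:
    """Build the URL for GET /investors/{id}/matching-firms."""
    # Assumption: scores are normalized, so the threshold lies in [0, 1].
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score is assumed to lie in [0, 1]")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    query = urlencode({"min_score": min_score, "limit": limit})
    return f"{BASE_URL}/investors/{investor_id}/matching-firms?{query}"


if __name__ == "__main__":
    print(matching_firms_url(42, min_score=0.7, limit=5))
```

Validating the threshold on the client side is optional; it simply surfaces bad input before a request is sent.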

Matching (scoring) approach

We normalize attributes and compute a weighted score:

$$ \text{score} = \sum_i w_i \cdot f_i(\text{investor}, \text{firm}),\quad \text{with } \sum_i w_i = 1 $$

where $f_i$ handles per-feature compatibility (exact/partial match, distance bucketing, range fit for check size, etc.). We expose min_score and limit as query params to return a ranked, thresholded list.
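
The weighted score can be sketched in a few lines of Python. This is a simplified illustration, not the project's actual scoring code: the Investor and Firm fields, the three feature functions, the weights, and the linear decay used for check-size fit are all assumptions chosen to show the shape of the formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Investor:  # hypothetical, simplified profile
    industries: frozenset[str]
    stages: frozenset[str]
    min_check: float
    max_check: float


@dataclass(frozen=True)
class Firm:  # hypothetical, simplified profile
    name: str
    industry: str
    stage: str
    ask: float


def industry_fit(inv: Investor, firm: Firm) -> float:
    """Exact match: 1.0 if the firm's industry is in the investor's focus."""
    return 1.0 if firm.industry in inv.industries else 0.0


def stage_fit(inv: Investor, firm: Firm) -> float:
    """Exact match on funding stage."""
    return 1.0 if firm.stage in inv.stages else 0.0


def check_size_fit(inv: Investor, firm: Firm) -> float:
    """Range fit: 1.0 inside the range, decaying linearly to 0.0 outside it."""
    if inv.min_check <= firm.ask <= inv.max_check:
        return 1.0
    gap = inv.min_check - firm.ask if firm.ask < inv.min_check else firm.ask - inv.max_check
    span = max(inv.max_check - inv.min_check, 1.0)  # avoid division by zero
    return max(0.0, 1.0 - gap / span)


# Illustrative weights (assumed values); they sum to 1 as in the formula.
FEATURES = {
    "industry": (0.4, industry_fit),
    "stage": (0.3, stage_fit),
    "check_size": (0.3, check_size_fit),
}


def score(inv: Investor, firm: Firm) -> float:
    """score = sum_i w_i * f_i(investor, firm)."""
    return sum(w * f(inv, firm) for w, f in FEATURES.values())


def rank(inv: Investor, firms: list[Firm], min_score: float = 0.0, limit: int = 10) -> list[tuple[float, Firm]]:
    """Return (score, firm) pairs at or above min_score, best first, capped at limit."""
    scored = [(score(inv, firm), firm) for firm in firms]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    inv = Investor(frozenset({"fintech", "health"}), frozenset({"seed"}), 100_000, 500_000)
    firms = [
        Firm("A", "fintech", "seed", 250_000),
        Firm("B", "fintech", "series_a", 250_000),
        Firm("C", "gaming", "seed", 900_000),
    ]
    for s, firm in rank(inv, firms, min_score=0.5):
        print(firm.name, round(s, 2))
```

Because each feature function returns a value in [0, 1] and the weights sum to 1, the total also stays in [0, 1], which keeps a min_score threshold easy to reason about.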

What we learned

Schema first, features second. The quality of matches depends on properly modeled attributes (e.g., separate “investment size” vs. “fund AUM” vs. “reserved capital”).
Start with explainability. Judges (and users) loved that we can point to why something matched (industry, stage, check size); ML can be layered on afterward.
Keep infra boring. A small, well-typed FastAPI codebase and a Postgres you can inspect beats over-engineering when time is tight.
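
To make the explainability point concrete, here is one way a per-match explanation could be assembled from per-feature scores. It is a sketch under assumptions: the explain function, the payload shape, and the example weights are hypothetical and not taken from the project's API.

```python
from __future__ import annotations


def explain(feature_scores: dict[str, float], weights: dict[str, float]) -> dict:
    """Break a weighted match score into per-feature contributions.

    feature_scores maps a feature name to its compatibility in [0, 1];
    weights maps the same names to weights that sum to 1 (both assumed).
    """
    contributions = {name: weights[name] * feature_scores[name] for name in weights}
    # Largest contribution first, so the strongest reason is shown first.
    ordered = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "score": round(sum(contributions.values()), 4),
        "reasons": [
            {"feature": name, "contribution": round(value, 4)}
            for name, value in ordered
            if value > 0  # features that contributed nothing are omitted
        ],
    }


if __name__ == "__main__":
    result = explain(
        {"industry": 1.0, "stage": 0.0, "check_size": 0.5},
        {"industry": 0.4, "stage": 0.3, "check_size": 0.3},
    )
    print(result)
```

Returning the breakdown next to the total lets a UI show "matched on industry and check size" without recomputing anything.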

Challenges

Schema drift during a hackathon: Balancing “move fast” with data integrity as we added investor/firm fields (e.g., risk tolerance, check size, stage) required careful migrations and seed data.
Signal design: Choosing which attributes matter most (and how to weight them) without overfitting to a single use-case.
Explainability vs. simplicity: We wanted transparent scoring while keeping the UI and API simple.

What’s next

LLM-assisted intake: conversational data capture (“tell us about your fund/startup”) that writes structured fields + a pitch embedding.
Embedding-based matching: augment the rule score with cosine similarity between investor theses and founder pitches.
Event signals: track opens/saves/shortlists as weak labels to adapt weights automatically.
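
The embedding-based matching idea could look roughly like the sketch below, which blends a rule score with the cosine similarity of two vectors. Everything specific here is an assumption: the blending weight alpha, the rescaling of similarity from [-1, 1] to [0, 1], and the toy vectors standing in for real thesis and pitch embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def blended_score(
    rule_score: float,
    thesis_vec: Sequence[float],
    pitch_vec: Sequence[float],
    alpha: float = 0.7,  # assumed share given to the rule score
) -> float:
    """Combine the rule score with embedding similarity, both on a [0, 1] scale."""
    similarity = cosine_similarity(thesis_vec, pitch_vec)
    similarity_01 = (similarity + 1.0) / 2.0  # map [-1, 1] onto [0, 1]
    return alpha * rule_score + (1.0 - alpha) * similarity_01


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings would come from a model.
    print(round(blended_score(0.8, [0.2, 0.9, 0.1], [0.25, 0.8, 0.0]), 3))
```

Keeping alpha explicit preserves the transparency of the rule score: with alpha set to 1.0 the ranking reduces to the current rule-based behavior.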