Inspiration
It is a brutal, well-known statistic that 90% of startups fail. Founders often spend years of their lives and thousands of dollars building products that nobody actually wants. Startup consulting is heavily gatekept, and unless you have connections to top-tier venture capitalists, you are usually stuck relying on generic advice. We wanted to build a literal predictive oracle: a platform that doesn't just vaguely grade your idea, but actively interrogates you, cross-references your answers with thousands of successful YCombinator alumni, and predicts your mathematical probability of success.
What it does
ForeSight is a hyper-intelligent, predictive startup consultant. You sit down and go through a dynamic, 20-question adaptive interview. Our AI engine acts like a notably harsh VC partner. It reads your answers, identifies holes in your logic, and drills into vague statements to demand hard numbers regarding your market size or technical moat.
Once the interview concludes, ForeSight generates a premium consulting dashboard. It mathematically scores your startup across 12 distinct dimensions, estimates your 5-year funding probability and failure-event risk, writes a ruthless investment-thesis memo, generates three highly actionable pivot strategies to patch your weak points, and matches you with three real YC startups that succeeded in your space.
How we built it
The frontend was built with React and Vite, using CSS modules to create a glass-morphic, dark-mode visual experience.
Our backend runs on Python and FastAPI. The intelligence layer uses the OpenAI API for the adaptive question generation, the 12-dimensional rubric scoring, and the deal memo formulation.
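As a rough sketch of how the rubric-scoring call can be kept rigid, here is the pattern of prompting for strict JSON and validating it server-side. The dimension names, prompt wording, and helper are illustrative assumptions, not the actual production prompt:

```python
import json

# Hypothetical dimension names -- the real 12-dimensional rubric may differ.
RUBRIC_DIMENSIONS = [
    "market_size", "technical_moat", "team", "traction",
    "timing", "competition", "business_model", "distribution",
    "defensibility", "capital_efficiency", "founder_fit", "clarity",
]

SYSTEM_PROMPT = (
    "You are a ruthless VC partner. Score the founder's answers on each "
    "dimension from 0 to 100. Spread your scores; never default to 50. "
    "Respond ONLY with a JSON object mapping dimension name to integer."
)

def parse_rubric_scores(raw: str) -> dict:
    """Validate the model's JSON reply into a rigid integer rubric vector."""
    scores = json.loads(raw)
    out = {}
    for dim in RUBRIC_DIMENSIONS:
        value = int(scores[dim])  # raises KeyError/ValueError on bad output
        if not 0 <= value <= 100:
            raise ValueError(f"{dim} out of range: {value}")
        out[dim] = value
    return out

# The chat-completion call itself would look roughly like:
#   client.chat.completions.create(model=..., messages=[
#       {"role": "system", "content": SYSTEM_PROMPT},
#       {"role": "user", "content": interview_transcript},
#   ])
```

Rejecting any reply that fails validation (and re-prompting) is one way to keep free-form LLM output compatible with a downstream numeric model.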
But the real magic happens in our localized Machine Learning layer, which is built entirely on scikit-learn. To predict the continuous funding probability $\hat{y}$ and discrete failure risk, we trained an ensemble of Random Forest Regressors ($K=200$ trees) that map the 12-dimensional venture rubric vector $\mathbf{x} \in \mathbb{R}^{12}$ to empirical startup outcomes.
For a single decision tree $k$, the mathematical space is partitioned into $M$ regions $R_m$, and the prediction is the mean response $c_m$:
$$f_k(\mathbf{x}) = \sum_{m=1}^{M} c_m \mathbb{I}(\mathbf{x} \in R_m), \quad \text{where } c_m = \frac{1}{N_m}\sum_{\mathbf{x}_i \in R_m} y_i$$
The ensemble prediction averages all $K$ trees to reduce variance and curb overfitting:
$$\hat{y}(\mathbf{x}) = \frac{1}{K} \sum_{k=1}^{K} f_k(\mathbf{x})$$
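A minimal scikit-learn sketch of this ensemble, using synthetic stand-in data in place of the empirical startup outcomes (the feature and target construction here is purely illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Stand-in training data: 12-dimensional rubric vectors with a toy
# target (funding probability in [0, 1]).
X = rng.uniform(0, 100, size=(500, 12))
y = X.mean(axis=1) / 100.0

# K = 200 trees, matching the ensemble size described above; the
# prediction is the average of the 200 per-tree predictions f_k(x).
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

rubric_vector = rng.uniform(0, 100, size=(1, 12))
y_hat = model.predict(rubric_vector)[0]
```

Because each tree predicts a mean of training targets, the averaged output stays within the observed outcome range, which is convenient when the target is a probability.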
For the Inspirational Role Models feature, we built a semantic search engine using TF-IDF and k-Nearest Neighbors on a dataset of 4,600 highly successful Active and Acquired YCombinator startups. The raw text data is tokenized and weighted:
$$w_{t,d} = \text{tf}(t, d) \times \log\left(\frac{N}{|\{d \in D : t \in d\}|}\right)$$
The algorithm then queries the user's interview history vector $\mathbf{x}_{\text{user}}$ against all alumni vectors $\mathbf{x}_{\text{alumni}}$ using cosine similarity to find the closest thematic matches:
$$\text{sim}(\mathbf{x}_{\text{user}}, \mathbf{x}_{\text{alumni}}) = \frac{\mathbf{x}_{\text{user}} \cdot \mathbf{x}_{\text{alumni}}}{\|\mathbf{x}_{\text{user}}\| \, \|\mathbf{x}_{\text{alumni}}\|}$$
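The TF-IDF plus k-nearest-neighbors pipeline can be sketched with scikit-learn like so; the four-company corpus is a toy stand-in for the ~4,600 alumni descriptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Toy stand-in corpus; the real index holds ~4,600 YC alumni descriptions.
alumni = [
    "AI-powered accounting automation for small businesses",
    "Marketplace connecting freelance designers with startups",
    "Developer tools for deploying machine learning models",
    "Telehealth platform for mental health care",
]

vectorizer = TfidfVectorizer()
alumni_vectors = vectorizer.fit_transform(alumni)

# Cosine distance = 1 - cosine similarity, so the nearest neighbors
# are exactly the highest-similarity alumni.
knn = NearestNeighbors(n_neighbors=3, metric="cosine").fit(alumni_vectors)

user_answers = "We deploy and monitor ML models for engineering teams"
user_vector = vectorizer.transform([user_answers])
distances, indices = knn.kneighbors(user_vector)
matches = [alumni[i] for i in indices[0]]
```

Using `metric="cosine"` on TF-IDF vectors implements the similarity formula above directly, so no separate normalization step is needed.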
Challenges we ran into
Parsing unstructured, conversational interview text into rigid, integer-based rubric scores was incredibly tough. It took heavy prompt engineering to stop the model from handing out a generic 50 out of 100 to every founder; we had to force the AI to be ruthlessly critical. It also took significant time to filter our massive JSON dataset of YC startups down to strictly the successful ones, and then serialize that data into fast-loading pickle files so the FastAPI server could pull them into memory instantly on boot.
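The filter-then-pickle step might look roughly like the following; the field names and status values are assumptions about the dataset schema, not the actual YC export format:

```python
import pickle
from pathlib import Path

# Hypothetical records -- the real YC dataset schema may differ.
raw = [
    {"name": "Stripe", "status": "Active", "description": "Payments APIs"},
    {"name": "Twitch", "status": "Acquired", "description": "Live video"},
    {"name": "DefunctCo", "status": "Inactive", "description": "..."},
]

# Keep only the successful (Active or Acquired) companies.
successful = [c for c in raw if c["status"] in {"Active", "Acquired"}]

# Serialize once offline; the FastAPI server then unpickles this file at
# startup so every request hits an in-memory structure, not the raw JSON.
path = Path("yc_successful.pkl")
path.write_bytes(pickle.dumps(successful))
loaded = pickle.loads(path.read_bytes())
```

Deserializing a pre-filtered pickle is much faster than re-parsing and re-filtering the full JSON on every boot, at the cost of pickle's usual caveat that the file must come from a trusted source.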
Accomplishments that we're proud of
We completely bridged the gap between generative language models and hard, quantitative machine-learning algorithms. It is wildly cool that the AI can hold a natural conversation with you, then immediately hand that context-rich data off to a local Random Forest model for cold, empirical probabilities. We are also incredibly proud of the dashboard UI; it feels genuinely premium and intuitive.
What we learned
We learned a massive amount about prompt-engineering constraints. Forcing an LLM to be extremely harsh, demand numbers, and explicitly forbid vague jargon requires very strict system-level boundaries. We also learned how to embed mathematical models and pickle files directly into an asynchronous web server without creating major bottlenecks.
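One way to avoid blocking the event loop is a load-once pattern: deserialize each artifact a single time and cache the in-memory object so request handlers never touch disk. A minimal stdlib sketch of that idea (the model payload here is a placeholder):

```python
import pickle
from functools import lru_cache

@lru_cache(maxsize=1)
def get_model():
    """Deserialize once; every later call returns the cached object."""
    # In a real server this would read the pickle file from disk at boot;
    # here we round-trip an in-memory placeholder instead.
    return pickle.loads(pickle.dumps({"trees": 200}))

# Request handlers call get_model() freely: only the first call pays the
# deserialization cost, so per-request latency stays flat.
```

Warming the cache during server startup (e.g. in a startup hook) moves even that first-call cost out of the request path entirely.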
What's next for ForeSight
We want to integrate real-time financial tracking. In the future, founders will be able to securely upload their runway Excel sheets, and the AI will automatically extract their real-world burn rate metrics to feed directly into the algorithms for significantly richer predictions.
Built With
- chart.js
- chatgpt
- claude
- fastapi
- gemini
- html2pdf
- javascript
- numpy
- pandas
- pickle
- pydantic
- python
- railway
- react
- scikit-learn
- tf-idf
- uvicorn
- vanilla
- vercel
- vite
- ycombinator
- zustand