Inspiration: Every year, thousands of groundbreaking startups struggle to get noticed by investors and miss out on the funding they need to bring their ideas to life and solve real-world problems. Semantic Scout is changing that.

What it does: Semantic Scout is a platform that empowers startups by using AI-driven semantic search and real-time research analysis to connect them with funding opportunities and market trends. It categorizes recent research into topics and subtopics to identify emerging trends, so VCs can discover startups aligned with those trends and aspiring entrepreneurs can get noticed for their work. The platform scrapes startup databases such as YC and Wellfound, extracts detailed company profiles, and embeds them in a vector space so that users can look up startups by description or industry with semantic search. Because semantic search captures the intent behind a description rather than matching exact keywords, a startup can still surface in relevant results even if it doesn't use the words a searcher typed. This reduces bias in discovery: small companies can find funding they might otherwise have missed, and researchers can more easily identify commercialization paths, ultimately accelerating innovation and growth in industries such as AI, healthcare, and climate tech.

How we built it: We divided the project into three key components: data scraping, the front end, and the semantic search algorithms. For the research-trend component, we scrape arXiv to collect details on recent papers and sort them into a wide range of topics using keyword matching. The data is then visualized to highlight which categories have the highest volume of research, making emerging trends easy to spot at a glance. For the startup data, we scrape the YC and Wellfound websites and research sources, extracting the results into JSON files.
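As a rough illustration of the keyword-based sorting step, the Python sketch below tags each paper with every topic whose keywords appear in its title or abstract, then counts papers per topic, which is the kind of per-category volume a trend visualization would display. The topic names, keyword lists, the `categorize` and `topic_volume` helpers, and the sample papers are all hypothetical; the writeup does not give Semantic Scout's actual categories or which arXiv fields it uses.

```python
from collections import Counter

# Hypothetical topic -> keyword map; the real category list is not given in the writeup.
TOPIC_KEYWORDS = {
    "AI": ["neural", "language model", "transformer", "reinforcement learning"],
    "Healthcare": ["clinical", "patient", "medical", "drug"],
    "Climate tech": ["carbon", "climate", "renewable", "emission"],
}


def categorize(text: str, topic_keywords: dict[str, list[str]]) -> list[str]:
    """Return every topic with at least one keyword found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [
        topic
        for topic, keywords in topic_keywords.items()
        if any(kw in lowered for kw in keywords)
    ]


def topic_volume(papers: list[dict], topic_keywords: dict[str, list[str]]) -> Counter:
    """Count papers per topic; a paper matching several topics counts toward each of them."""
    counts: Counter = Counter()
    for paper in papers:
        # Missing fields default to "" so incomplete records don't crash the pass.
        text = f"{paper.get('title', '')} {paper.get('abstract', '')}"
        counts.update(categorize(text, topic_keywords))
    return counts


if __name__ == "__main__":
    # Made-up sample records standing in for scraped arXiv entries.
    papers = [
        {"title": "A transformer for clinical notes",
         "abstract": "We fine-tune a language model on patient records."},
        {"title": "Forecasting carbon emission",
         "abstract": "A neural approach to emission estimates."},
        {"title": "Grid storage for renewable energy",
         "abstract": "Battery scheduling."},
    ]
    for topic, n in topic_volume(papers, TOPIC_KEYWORDS).most_common():
        print(f"{topic}: {n}")
```

Plain substring matching is simple but crude: short keywords can match inside unrelated words, and a paper that uses none of the listed terms stays uncategorized, which is the same limitation that motivates semantic search on the startup side.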
The startup JSON files are then loaded into LlamaIndex to build a vector index, which we query with semantic search to return the top 5 results ranked by cosine similarity. For the front end, we used Next.js, React, HTML, and CSS to build a website where users can enter a query and search for industry trends or related startups.

Challenges we ran into: We ran into many challenges during the project, with bugs in every single component we tried to build. In the data scraping code, we sometimes couldn't extract data properly, or necessary values were missing. On the front end, we sometimes had trouble connecting to the server. Our biggest problem, however, came when we tried to implement semantic search: the query returned null and the page displayed an empty array. To resolve this, we built a retrieval-augmented generation (RAG) system from scratch while also debugging several smaller errors along the way.
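The ranking behind both the startup search and the retrieval half of a RAG system can be sketched without any library: score every stored vector by cosine similarity to the query vector and keep the five highest. This Python sketch does not use LlamaIndex; the `cosine_similarity` and `top_k` helpers, the three-dimensional vectors, and the startup names are hypothetical stand-ins, since real embeddings would come from an embedding model and have many more dimensions.

```python
import math

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Return (a . b) / (|a| * |b|), or 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Vector, index: list[tuple[str, Vector]], k: int = 5) -> list[tuple[str, float]]:
    """Score every (name, vector) entry against the query and return the k best, highest first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy "embeddings"; dimensions loosely mean (health, software, climate).
    index = [
        ("startup_a_clinical_ai", [0.9, 0.1, 0.0]),
        ("startup_b_grid_storage", [0.0, 0.2, 0.95]),
        ("startup_c_medical_records", [0.8, 0.3, 0.1]),
    ]
    query = [1.0, 0.2, 0.0]  # a query vector leaning toward "health"
    for name, score in top_k(query, index):
        print(f"{name}: {score:.3f}")
```

In this sketch, `top_k` returns an empty list only when the index itself is empty, so an empty result points at the indexing or loading step rather than at the similarity scoring.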
Accomplishments that we're proud of: We are proud of our semantic search model. We put a lot of effort into debugging it, and after staying up the whole night and changing the model entirely, we finally got it to work; along the way we learned about different models and how to choose the right one. The model lets users search by similar or related words instead of exact keywords, which will be a huge benefit for small, early-stage startups that might not get much recognition initially. Through semantic search, our platform can still surface them.

What we learned: We learned a lot from this project, for example how to scrape data from websites and how to implement semantic search.

What's next for Semantic Scout: We want to continue this project, add new functionality, keep making improvements, and possibly turn it into a full-fledged startup that helps other startups. Here are some more specific things we want to implement. We want to add enhanced AI and data expansion: improve semantic search, integrate more startup and research databases, and provide better trend visualization. Another idea is automated research-to-startup matching, where AI would connect emerging research with relevant startups and funding opportunities in real time. As a bonus feature, we could also add a collaborative Innovation Hub: a platform where researchers, startups, and investors collaborate to accelerate commercialization.