Inspiration

Pranav and I belong to a variety of communities, from entrepreneurship circles in D.C. (DCSF) to AI enthusiasts (GenAI Collective) to music lovers. Yet despite the abundance of these groups, we often struggled to connect with the right people. Imagine missing out on your next co-founder, collaborator, or mentor simply because you didn’t know they were right there in your community. That’s exactly what we were experiencing.

In a world where connecting digitally has never been easier, we realized something important was missing: the ability to find relevant people in our local communities who could make a meaningful impact in our lives. This frustration inspired us to create OrgSearch, a platform designed to help people discover and connect with others in their community effortlessly.

What it does

  1. Search based on shared interests, goals, or expertise.
  2. Receive a ranked list of the most relevant people to connect with in your community.

How we built it

We started with a predefined dataset of profiles from a community we are actively part of. Each profile contained links to various sources of information, and we wanted to use that data to create meaningful connections. Using Beautiful Soup, we scraped these links to gather the content tied to each profile, then split the text into smaller, digestible segments (chunks) for analysis.
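The scrape-then-chunk step can be sketched roughly as follows. Beautiful Soup did the HTML parsing in the actual project, but this sketch swaps in Python's standard-library `html.parser` so that it has no dependencies. The `TextExtractor` class plays roughly the role of Beautiful Soup's `get_text()`. The chunking uses overlapping word windows, which is one common way to keep context across chunk boundaries. `chunk_words`, the 200-word window, and the 50-word overlap are all hypothetical choices, not the team's actual settings.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping script/style contents.

    Stand-in for Beautiful Soup, which the original project used.
    """

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside a tag whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page's visible text as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words.

    Hypothetical defaults; the overlap keeps some context across boundaries.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window already reaches the end
            break
    return chunks
```

Each profile's scraped pages would go through `extract_text`, and the combined text would then pass to `chunk_words`. The overlap trades a little redundancy for chunks that don't cut a sentence's meaning in half.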

Next, we used Cohere’s embedding models to convert each chunk of text into a high-dimensional vector that captures the meaning of that part of a person's profile. We stored these embeddings in a MongoDB database for fast retrieval and semantic search.
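One plausible way to lay out the stored data is one MongoDB document per chunk, with each document keeping a pointer back to its profile. The sketch below is written under that assumption. The field names (`profile_id`, `chunk_index`, `text`, `embedding`) and the `build_chunk_documents` helper are hypothetical. `embed_fn` stands in for whatever call returned the Cohere embeddings, and the sketch does not reproduce Cohere's API.

```python
from typing import Callable, Sequence

# Hypothetical signature: takes a batch of texts, returns one vector per text.
EmbedFn = Callable[[Sequence[str]], list[list[float]]]


def build_chunk_documents(profile_id: str, chunks: Sequence[str],
                          embed_fn: EmbedFn) -> list[dict]:
    """Embed a profile's chunks in one batch and return MongoDB-ready dicts."""
    vectors = embed_fn(chunks)
    if len(vectors) != len(chunks):
        raise ValueError("embed_fn must return one vector per chunk")
    return [
        {
            "profile_id": profile_id,   # lets search results roll up to a person
            "chunk_index": i,           # preserves the chunk's original order
            "text": text,               # kept for display and for RAG context
            "embedding": list(vector),  # stored as a plain array of floats
        }
        for i, (text, vector) in enumerate(zip(chunks, vectors))
    ]
```

With pymongo, these documents could be written in a single call, for example `collection.insert_many(docs)`. Storing the raw `text` next to each vector lets the search step return readable evidence without a second lookup.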

The cool part happens when a user enters a query. We embed the query with the same model and compare it against the stored embeddings using cosine similarity to find the most relevant chunks of text. From these results, we compute a relevance score that ranks profiles by how closely they match the query. On top of the score, we use a retrieval-augmented generation (RAG) approach to write a custom bio for each person. Each bio is tailored to the query and gives a personalized summary of why that person is relevant and worth connecting with.
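The ranking and RAG steps above can be sketched in plain Python. This is a minimal sketch, and it makes three assumptions. First, chunks are stored as in a per-chunk layout with `profile_id`, `text`, and `embedding` fields. Second, a profile's relevance score is its best chunk's cosine similarity, which is one reasonable aggregation among several (a mean over top chunks is another). Third, the bio prompt wording is a hypothetical template. `rank_profiles`, `build_bio_prompt`, and the top-3 evidence cutoff are illustrative names and choices. The call to the text-generation model is omitted.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_profiles(query_vec, chunk_docs, top_k=5, evidence_per_profile=3):
    """Score every chunk, roll scores up per profile, return the best profiles.

    Assumption: a profile's score is the similarity of its best-matching chunk.
    Returns (profile_id, score, evidence_texts) tuples, highest score first.
    """
    per_profile = defaultdict(list)
    for doc in chunk_docs:
        score = cosine_similarity(query_vec, doc["embedding"])
        per_profile[doc["profile_id"]].append((score, doc["text"]))

    ranked = []
    for profile_id, scored in per_profile.items():
        scored.sort(key=lambda item: item[0], reverse=True)
        evidence = [text for _, text in scored[:evidence_per_profile]]
        ranked.append((profile_id, scored[0][0], evidence))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def build_bio_prompt(query, profile_id, evidence):
    """Assemble a retrieval-augmented prompt (hypothetical wording)."""
    context = "\n".join(f"- {text}" for text in evidence)
    return (
        f"A community member searched for: {query}\n"
        f"Retrieved notes about {profile_id}:\n{context}\n"
        "Using only these notes, write a short bio explaining why this "
        "person is relevant to the search."
    )
```

Each entry from `rank_profiles` supplies both the displayed relevance score and the retrieved chunks that ground the generated bio. Restricting the prompt to those chunks is what makes the bio retrieval-augmented instead of free-form.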

Challenges we ran into

The biggest challenge was extracting meaningful data from links with Beautiful Soup, especially since it was our first experience with web scraping. Many pages had inconsistent or dynamic structures that required custom parsing logic.

Preparing the scraped text for the embedding models added complexity: we had to chunk the data while preserving context so the embeddings captured relevant information. Implementing semantic search in MongoDB raised another hurdle. MongoDB lacked a built-in function for calculating cosine similarity, so we had to implement the calculation ourselves to rank profiles effectively. Fine-tuning the Cohere embeddings and integrating these calculations took significant debugging to handle the diverse, noisy dataset efficiently.
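One way to compute cosine similarity inside MongoDB without a built-in operator is to spell it out with aggregation expressions. In that approach, `$zip` pairs up vector components, `$reduce` sums the products and squares, and `$sqrt` and `$divide` finish the formula. The sketch below builds such a pipeline as a Python dict. It assumes each document stores its vector in an `embedding` array of the same dimension as the query. `cosine_pipeline`, the `score` field name, and the default limit are hypothetical, and the team's actual implementation may have differed. Because the query vector is fixed, its norm is computed once in Python and embedded as a constant.

```python
import math


def cosine_pipeline(query_vec, field="embedding", limit=10):
    """Build an aggregation pipeline that scores documents by cosine similarity.

    Note: $zip truncates to the shorter array, so a dimension mismatch would
    silently produce a wrong score; vectors are assumed to share a dimension.
    """
    q_norm = math.sqrt(sum(x * x for x in query_vec))
    if q_norm == 0:
        raise ValueError("query vector must be non-zero")

    # Dot product: sum of pairwise products of [doc_i, query_i] pairs.
    dot = {"$reduce": {
        "input": {"$zip": {"inputs": [f"${field}", list(query_vec)]}},
        "initialValue": 0,
        "in": {"$add": ["$$value", {"$multiply": [
            {"$arrayElemAt": ["$$this", 0]},
            {"$arrayElemAt": ["$$this", 1]},
        ]}]},
    }}
    # Document norm: square root of the sum of squared components.
    doc_norm = {"$sqrt": {"$reduce": {
        "input": f"${field}",
        "initialValue": 0,
        "in": {"$add": ["$$value", {"$multiply": ["$$this", "$$this"]}]},
    }}}
    # Guard against zero-norm documents, which would otherwise divide by zero.
    score = {"$let": {
        "vars": {"dot": dot, "dnorm": doc_norm},
        "in": {"$cond": [
            {"$eq": ["$$dnorm", 0]},
            0,
            {"$divide": ["$$dot", {"$multiply": ["$$dnorm", q_norm]}]},
        ]},
    }}
    return [
        {"$addFields": {"score": score}},
        {"$sort": {"score": -1}},
        {"$limit": limit},
    ]
```

With pymongo, this would run as `collection.aggregate(cosine_pipeline(query_vec))`. The pipeline scores every document in the collection on each query. That full scan is simple and exact, and it is likely fine for a single community's profiles, but it will not scale the way an indexed vector search would.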

Accomplishments that we're proud of

We're proud that we can offer OrgSearch to the communities we belong to and see people actually use it.

What we learned

We learned how to scrape and process unstructured data efficiently, implement semantic search with embeddings, and write custom cosine similarity calculations in MongoDB. We also gained experience integrating AI models to generate meaningful outputs tailored to user queries.

What's next for OrgSearch

We plan to expand OrgSearch to support more communities and refine our recommendation algorithms for greater relevance.

