ScholarSchema

Inspiration

Cold emailing for research can be intimidating and exhausting, yet it's a shared experience amongst so many college students. Many of the competitive labs at universities only allow entry through working with current lab members, making cold emailing a crucial step. Faculty and lab websites are often inconsistent, unstructured, and hard to navigate, making it challenging for students to identify mentors who truly align with their interests. In hopes of making this process easier and helping students find projects that they are genuinely passionate about, ScholarSchema was born. 🌟

What it does

ScholarSchema helps students find professors whose research aligns with their interests and background, then quickly draft personalized outreach emails. can select a university, choose topic chips or add custom keywords, upload a resume for better matching, browse ranked researcher cards, view recent papers, save favorites, and generate email drafts tailored to selected papers.

How we built it

Frontend React + TypeScript

core UI framework

Vite

frontend build/dev server for fast local iteration and API proxying to the Flask backend

Fetch API

call /api/search and /api/email

Backend Serper API (Google Search + Google Scholar endpoints)

used Serper to discover researcher pages and fetch recent paper metadata from Google Scholar-style result that helped bootstrap professor profiles and publication context

Google Gemini API (Generative Language API)

used Gemini to transform raw paper snippets into structured outputs (research summaries, research areas, and paper-level summaries), and to enrich noisy scraped text into cleaner profile information.

Groq API (LLM inference)

integrated Groq for fast LLM-based generation/enrichment paths, including profile/email-related generation logic during development and iteration.

Challenges we ran into

Working with unstructured data was one of the biggest challenges. Every university had different website structures for faculty pages, and even individual researchers presented information differently across personal sites, CVs, and lab pages. Normalizing this inconsistency into a single structured format required iterative prompt design and robust parsing strategies. Eventually, I developed a reliable way to extract and parse the necessary information from the websites, but not before most of my API usage had already exceeded its limits.

Accomplishments that we're proud of

I successfully built a system that transforms highly unstructured academic data into usable researcher profiles that can reduce friction in finding potential mentors. This is something that I will definitely use and I hope it will be equally valuable for other students who are interested in research. Beyond the technical outcome, I'm proud of completing the entire project by myself from start to finish and iterating through multiple failed approaches without giving up.

What we learned

I learned the importance of iterating on small controlled datasets before scaling up. Instead of immediately running large numbers of API calls, it was more effective to first refine the extraction and prompt design on a smaller sample to ensure reliable results. This approach helped reduce wasted compute, improved output quality, and made it easier to debug and refine the system before scaling.

What's next for ScholarSchema

I plan to expand ScholarSchema to include more universities and build a larger, continuously updated database. My goal is to help students everywhere easily discover research opportunities aligned with their passions and connect more directly with labs around the world.

Built With

beautiful-soup
css
flask
google-gemini-api
groq
html
javascript
local-json
localstorage
python
react
requests
serper-api
sessionstorage
typescript
vite

Updates

Angelina K Wang started this project — Apr 26, 2026 07:36 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.