setos.ai

TEAM: Arjun -- Backend Development; Kelly -- Frontend Development

Presentation + Demo (Demo at 1:14) : https://youtu.be/0E39KpYYs78

Inspiration

Millions of research papers are published every year, yet most people, no matter how curious, can't access the knowledge inside them. Google Scholar is overwhelming. arXiv and PubMed are unfiltered firehoses. Even finding the right papers means wading through dense, jargon-heavy search results. There is no clear, guided way to begin learning from scientific literature.

We are two high school students who met at a research program and ran into the same problem: not knowing where to begin.

That's why we built setos.ai.

What it does

We use natural language processing (NLP) and machine learning (ML) to convert any plain-language research question (e.g., “How does mutation count affect tumor behavior?”) into a step-by-step checklist (a “roadmap”) of papers to read.

From there, we use LLMs to create study aids such as:

Practice questions
Summaries
Vocabulary guides

Setos.ai is your one-stop shop for becoming an expert in any research topic.

How we built it

Website: Python FastAPI backend + React frontend
Paper database: Supabase with PostgreSQL
Roadmap creation:
- Uses SciBERT embeddings and LLM query expansion to map your question into the same semantic space as millions of research papers
- Finds truly relevant matches without complex keyword searching
- Organizes a personalized learning roadmap using the Kneedle algorithm, citation counts, and publication dates
Study aids: Powered by the Gemini API (free tier)
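
The roadmap steps listed above can be illustrated with a minimal sketch. It is not our production code: the `Paper` class, the toy 2-dimensional vectors (standing in for real SciBERT embeddings), and the "oldest first, most cited first" ordering rule are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record; a real row would come from the paper database.
    title: str
    embedding: list[float]  # stand-in for a SciBERT embedding vector
    citations: int
    year: int


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_papers(query_embedding: list[float], papers: list[Paper]) -> list[tuple[float, Paper]]:
    """Score every paper against the query and sort by similarity, best first."""
    scored = [(cosine_similarity(query_embedding, p.embedding), p) for p in papers]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def order_roadmap(papers: list[Paper]) -> list[Paper]:
    """Assumed ordering rule: older papers first, ties broken by citation count."""
    return sorted(papers, key=lambda p: (p.year, -p.citations))


# Toy data: three papers and a query embedded in the same 2-D space.
papers = [
    Paper("A", [1.0, 0.0], citations=10, year=2020),
    Paper("B", [0.0, 1.0], citations=5, year=2018),
    Paper("C", [1.0, 1.0], citations=50, year=2018),
]
ranked = rank_papers([1.0, 0.0], papers)
roadmap = order_roadmap([paper for _, paper in ranked])
```

Ranking answers "which papers are relevant?", while the second sort answers "in what order should they be read?". Keeping the two steps separate makes it easy to swap in a different ordering rule later.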

Challenges we ran into

We faced significant difficulties with:

Sourcing high-quality paper data: API rate limits and cloud free-tier restrictions were major constraints
Roadmap creation: initially tricky, but we settled on cosine similarity plus the Kneedle algorithm for simplicity
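
To show why a knee-finding step is useful here, below is a simplified sketch in the spirit of Kneedle. It is not the full algorithm (there is no smoothing or sensitivity parameter), and the function name `knee_index` and the sample scores are made up for illustration. The idea: given similarity scores sorted from best to worst, find the point where the curve falls farthest below the straight line joining its endpoints, and cut the roadmap there.

```python
def knee_index(scores: list[float]) -> int:
    """Return the index of the knee in a descending score list.

    Simplified Kneedle-style heuristic: normalize both axes to [0, 1] and
    pick the point with the largest gap below the chord from the first to
    the last point. Returns len(scores) when no knee is found.
    """
    n = len(scores)
    if n < 3:
        return n
    hi, lo = scores[0], scores[-1]
    if hi == lo:
        return n  # flat curve: nothing to cut
    best_index, best_gap = n, 1e-9  # small epsilon ignores float noise
    for i, score in enumerate(scores):
        x = i / (n - 1)                # normalized rank
        y = (score - lo) / (hi - lo)   # normalized score
        gap = (1.0 - x) - y            # distance below the chord
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


# Made-up similarity scores: three strong matches, then a sharp drop.
scores = [0.95, 0.93, 0.90, 0.50, 0.48, 0.47, 0.45]
cutoff = knee_index(scores)
kept = scores[:cutoff]  # keep only the papers ranked before the knee
```

Whether the knee point itself is kept or dropped is a design choice; this sketch drops it. A fuller implementation of the algorithm is available in the `kneed` Python package.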

In the future, we hope to secure funding to overcome these limitations.

Accomplishments we're proud of

Built a fully functional prototype that converts research questions into personalized reading roadmaps
Integrated LLMs to generate study aids directly from papers
Developed an efficient semantic matching pipeline using SciBERT embeddings

What we learned

The importance of data quality and pipeline scalability for research-focused apps
Cloud storage and optimization challenges
Real-world application of NLP algorithms

What's next for setos.ai

Increase paper coverage with PubMed, arXiv, and bioRxiv dumps
Implement Gaussian Mixture Models (GMM) for more accurate roadmap creation
Use citation networks to improve suggestion ordering
Fine-tune LLMs to improve explanations, definitions, and practice questions
Build a full recommendation system for paper suggestions beyond roadmaps
Secure funding to transform this prototype into a fully featured tool