FirstPR

Paste a GitHub URL, understand the codebase, make your first contribution.

Inspiration

I cloned a repo with 347 Python files. The README said "TODO: Add docs." No CONTRIBUTING.md. I spent three hours just trying to figure out where authentication was handled.

Every developer has been there. You find a project you want to contribute to, pick a "good first issue," and then... nothing. Which file do I even open? How does this thing work? Why are there four different config files?

The problem isn't that the issues are too hard. It's that nobody explains the codebase. You spend hours grepping through imports, reading outdated docs, trying to build a mental map of a system someone else designed.

I thought about all the times I gave up on contributing because I couldn't figure out where to start. Then I realized this isn't just my problem—it's everyone's problem. Thousands of developers want to help with open source but can't get past understanding the project structure.

So I built FirstPR. Paste a GitHub URL and get a breakdown of the entire codebase in under a minute: an architecture diagram, ranked issues, a tech stack explanation, and a roadmap to your first contribution.

What it does

FirstPR analyzes any GitHub repository and builds an interactive dashboard to help you get started.

Architecture visualization - Gemini generates a diagram showing how the codebase is organized. Instead of opening 20 files to understand the structure, you see it immediately.

File explorer with explanations - Browse the codebase with syntax highlighting. Click any file and hit "Explain" to understand what it does and why it exists.

Tech stack breakdown - Parses package.json, requirements.txt, Cargo.toml, and other config files, then explains why each dependency matters to the project.

Smart issue ranking - Analyzes all open issues and ranks them by how beginner-friendly they are. We look at comment count, labels like "good-first-issue," and how recently they were opened. New contributors see the most approachable work first.

Repository health score - Checks commit frequency, PR merge rates, active contributors, and what tools are configured. You can tell if a project is actively maintained before investing time.

Onboarding roadmap - Step-by-step guide from cloning the repo to submitting your first PR, customized for the specific project you're analyzing.
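To make the issue ranking described above concrete, here is a minimal sketch of a scoring function built on the three signals we mention (labels, comment count, recency). The weights, the 180-day decay window, the label list, and the choice to treat fewer comments as more approachable are all illustrative assumptions, not the exact values FirstPR uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Labels treated as beginner signals (assumed list for illustration).
BEGINNER_LABELS = {"good first issue", "good-first-issue", "help wanted", "beginner"}


@dataclass
class Issue:
    title: str
    comments: int
    labels: list
    created_at: datetime


def beginner_score(issue: Issue, now: datetime) -> float:
    """Return a score in [0, 1]; higher means more approachable."""
    has_label = any(label.lower() in BEGINNER_LABELS for label in issue.labels)
    label_score = 1.0 if has_label else 0.0
    # Assumption: fewer comments means less prior discussion to catch up on.
    comment_score = 1.0 / (1.0 + issue.comments)
    # Linear decay to zero over 180 days (assumed window).
    age_days = (now - issue.created_at).days
    recency_score = max(0.0, 1.0 - age_days / 180)
    return 0.5 * label_score + 0.3 * comment_score + 0.2 * recency_score


def rank_issues(issues, now=None):
    """Sort issues so the most beginner-friendly come first."""
    now = now or datetime.now(timezone.utc)
    return sorted(issues, key=lambda i: beginner_score(i, now), reverse=True)
```

A weighted sum like this is easy to tune by hand, which matters more than precision when the goal is only to push approachable issues to the top of the list.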

How we built it

The frontend is React with TypeScript and Vite. We used TailwindCSS for styling (dark theme inspired by GitHub's interface), Mermaid.js to render the architecture diagrams, and react-markdown for displaying explanations.

The backend is Python with FastAPI. When you submit a GitHub URL, the backend creates an analysis job and returns a job ID. The frontend polls that job for progress while the backend runs several tasks in parallel:

Fetches data from GitHub's API (metadata, files, issues, PRs, commits)
Analyzes the tech stack by parsing config files
Computes repository health metrics
Sends structured prompts to Gemini to generate the architecture diagram and onboarding plan
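The job-and-poll flow above can be sketched with plain asyncio. This is a simplified stand-in rather than our actual FastAPI code: the in-memory `JOBS` dict, the `_step` helper, and the task names are hypothetical, and each task is a stub where the real GitHub or Gemini call would go.

```python
import asyncio
import uuid

# In-memory job store (assumption; a real deployment would likely persist this).
JOBS = {}


def create_job() -> str:
    """Register a new analysis job and return its ID for the frontend to poll."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "running", "completed": [], "result": None}
    return job_id


async def _step(job_id: str, name: str, result: dict) -> dict:
    """Stand-in for one analysis task; records completion for the poller."""
    await asyncio.sleep(0)  # placeholder for real network I/O
    JOBS[job_id]["completed"].append(name)
    return result


async def run_analysis(job_id: str, repo_url: str) -> None:
    """Run the analysis tasks concurrently and store their results on the job."""
    results = await asyncio.gather(
        _step(job_id, "github_data", {"repo": repo_url}),
        _step(job_id, "tech_stack", {"dependencies": []}),
        _step(job_id, "health", {"score": None}),
        _step(job_id, "architecture", {"diagram": ""}),
    )
    JOBS[job_id]["result"] = results
    JOBS[job_id]["status"] = "done"


def get_progress(job_id: str) -> dict:
    """What a polling endpoint would return for this job."""
    job = JOBS[job_id]
    return {"status": job["status"], "completed": list(job["completed"])}
```

The list of completed step names is also what lets the frontend show specific progress messages instead of a generic spinner.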

We cache GitHub requests aggressively since the API has strict rate limits (60 requests per hour without auth, 5,000 per hour with a token). When the same repo is analyzed multiple times or you ask follow-up questions in chat, we hit the cache instead of making new requests.
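A minimal sketch of that request cache, assuming least-recently-used eviction keyed by request URL. The `LRUCache` class, the `cached_fetch` helper, and the default capacity are illustrative names and values, not our production code.

```python
from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache; the oldest entry is evicted when full."""

    def __init__(self, capacity: int = 256):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


def cached_fetch(cache: LRUCache, url: str, fetch):
    """Return the cached response for url, calling fetch(url) only on a miss."""
    hit = cache.get(url)
    if hit is not None:
        return hit
    value = fetch(url)
    cache.put(url, value)
    return value
```

Passing the fetch function in as an argument keeps the cache independent of whichever HTTP client makes the actual GitHub call.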

For large repos, we sample the file structure intelligently—prioritizing config files, entry points, and top-level organization—then let Gemini infer the rest from patterns.
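One way to express that sampling priority is a sort key over file paths. The sketch below is an assumption about how such a sampler could look: the entry-point names, the extra config names beyond the three we list, and the 200-file limit are illustrative.

```python
from pathlib import PurePosixPath

# package.json, requirements.txt and Cargo.toml come from our tech stack parser;
# the remaining names are assumed examples.
CONFIG_NAMES = {"package.json", "requirements.txt", "Cargo.toml", "pyproject.toml"}
ENTRY_NAMES = {"main.py", "app.py", "index.js", "index.ts", "main.rs", "main.go"}


def priority(path: str) -> int:
    """Lower number means the file is sampled earlier."""
    p = PurePosixPath(path)
    if p.name in CONFIG_NAMES:
        return 0  # config files
    if p.name in ENTRY_NAMES:
        return 1  # likely entry points
    if len(p.parts) <= 2:
        return 2  # top-level organization
    return 3  # everything else


def sample_paths(paths, limit: int = 200):
    """Keep the highest-priority paths, preferring shallower files on ties."""
    ranked = sorted(paths, key=lambda p: (priority(p), len(PurePosixPath(p).parts), p))
    return ranked[:limit]
```

Sorting on a tuple keeps the output deterministic, so repeated analyses of the same repo send the same sample to the model.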

Challenges we ran into

GitHub's rate limits - A single analysis makes 10-20 API calls. Without authentication, you're limited to 60 requests per hour, which works out to roughly 3 to 6 repos per hour. We added token-based auth, built an LRU cache for repeated requests, and implemented exponential backoff when we hit rate limits.
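The exponential backoff piece can be sketched as a small retry wrapper. The `RateLimited` exception, the `with_backoff` helper, and the delay values are hypothetical; the real code would raise on whatever rate-limit response the HTTP client reports.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied request function when the API says to slow down."""


def with_backoff(request, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call request(), doubling the wait after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so concurrent jobs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists so the wrapper can be exercised without actually waiting, for example by passing `delays.append` and inspecting the recorded delays.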

Getting reliable output from Gemini - We needed Gemini to return valid JSON with specific fields. LLMs naturally want to add commentary, wrap JSON in markdown code blocks, or make up extra fields. It took many iterations of the prompt to get consistent, structured output we could parse reliably.
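Prompt iteration did most of the work, but a defensive parser on the receiving side helps with the same failure modes. The sketch below assumes a response schema with three fields; the field names in `REQUIRED_FIELDS` and the `parse_model_json` helper are hypothetical, not our actual schema.

```python
import json
import re

# Hypothetical schema; the real field names may differ.
REQUIRED_FIELDS = {"summary", "diagram", "roadmap"}

FENCE = "`" * 3  # markdown code fence, built indirectly to keep this listing readable
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_json(text: str) -> dict:
    """Extract one JSON object from model output and keep only the expected fields."""
    match = _FENCED.search(text)
    candidate = match.group(1) if match else text
    # Ignore commentary before or after the outermost braces.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(candidate[start:end + 1])
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Drop any extra fields the model made up.
    return {key: data[key] for key in REQUIRED_FIELDS}
```

Raising `ValueError` on bad output gives the caller a single error type to catch when deciding whether to retry the prompt.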

Handling huge repositories - Some repos have 70,000+ files. We can't send that much data to Gemini without blowing the context window and our API budget. We built a sampling system that grabs the top-level structure, config files, and entry points, then lets Gemini extrapolate the rest.

The wait feels long - The first analysis takes 15-30 seconds. People will leave if they just see a blank screen. We built an animated loading sequence with specific progress steps like "Fetching repository data" and "Analyzing architecture." Users told us it felt like the tool was actively working for them instead of just hanging.

Making chat actually useful - Generic chatbots aren't helpful. We load the entire analysis result (summary, tech stack, file structure) into the chat context when it opens. Now when someone asks "Where's the authentication?" they get actual file paths, not generic advice about where auth is usually located.
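As a rough illustration of loading the analysis into the chat context, the sketch below turns an analysis result into a context string. The dictionary keys (`summary`, `tech_stack`, `files`), the `build_chat_context` name, the prompt wording, and the 50-file cap are assumptions made for the example.

```python
import json


def build_chat_context(analysis: dict, max_files: int = 50) -> str:
    """Turn an analysis result into context text that precedes the user's question."""
    files = analysis.get("files", [])[:max_files]
    sections = [
        "You are answering questions about one specific repository.",
        "Answer with concrete file paths from the list below when possible.",
        "## Summary\n" + analysis.get("summary", ""),
        "## Tech stack\n" + json.dumps(analysis.get("tech_stack", {}), indent=2),
        "## Files\n" + "\n".join(files),
    ]
    return "\n\n".join(sections)
```

Because the file list is part of the context, a question like "Where's the authentication?" can be answered with a path such as `app/auth.py` when that file appears in the analysis.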

Accomplishments that we're proud of

We built something people actually want. Every developer we showed FirstPR to said they wished they had it when they were getting started with open source. The problem we're solving is real.

The issue ranking works well. Our algorithm successfully surfaces beginner-friendly issues by looking at comment count, labels, and how recent they are. New contributors see approachable work first instead of getting lost in 200+ open issues.

The interface feels right. We didn't just slap together a UI—we made it look and feel like the tools developers already use. Dark theme, clean layout, syntax highlighting. When people see it, they trust it immediately.

Caching made everything possible. Without aggressive caching, we'd constantly hit GitHub's rate limits and chat responses would take 2+ seconds. Now follow-up questions are instant, and we can handle repeated analyses of the same repo.

We got Gemini to produce reliable structured output. Turning unreliable LLM responses into consistent JSON with architecture diagrams, tech explanations, and roadmaps took serious iteration on the prompts, but we got there.

What we learned

Design matters for developer tools. The dark theme, syntax highlighting, and familiar layout weren't just aesthetic choices—they built trust. Developers are more likely to use a tool that looks like something they already know.

The biggest barrier to open source contribution isn't skill, it's understanding the codebase. We kept hearing the same thing from everyone we talked to: "I wanted to contribute but couldn't figure out where to start." The technical skills are there. The documentation isn't.

Caching isn't optional when you're working with API rate limits. We originally thought of it as a performance optimization. It's actually fundamental—without it, the tool doesn't work at all.

Chat needs context to be useful. A generic chatbot that gives advice about "where authentication is usually stored" isn't helpful. Loading the actual analysis results into the chat context means we can point to specific files in the repo being analyzed.

What's next for FirstPR

PR readiness checker - Before you submit a PR, validate it against the project's linting rules, test conventions, and CI requirements. Know if it'll pass checks before you open it.

Team onboarding - Generate onboarding docs for private company repos, not just public open source. Help new engineers ramp up faster.

Contribution tracking - Track your open source journey—PRs submitted, issues resolved, repos contributed to—with suggestions on where to focus next.