Inspiration

Inspired by Mintlify, especially after Hahnbee and Ryan’s conversation about how great docs accelerate adoption, we thought about what documentation would look like if it were written for developers onboarding a repo, not just end-users. Mintlify nails user-facing clarity, but devs often face something different: sprawling, generic pages where 70% isn’t relevant to their role. That disjointed experience slows ramp-up and blurs ownership. Our answer is role-based, repo-native documentation: for any GitHub repo, we analyze the codebase and generate living guides tailored to the selected role (Frontend, Backend, or Infra), highlighting only the modules, dependencies, APIs, and run/debug steps that person needs. The result is focused, current docs that cut noise, reduce cognitive load, and let engineers start shipping faster.

What it does

DocFlow Lite turns any GitHub repo into clear, role-specific documentation your team can actually use. Pick a role (Frontend, Backend, or Infra) and it delivers a focused guide that shows how to run the project, where the important pieces live, which APIs and modules matter, common pitfalls, and your best first tasks, all while filtering out everything irrelevant to that role. As the code changes, the docs stay in sync and call out what’s new, so they remain trustworthy. The result: faster onboarding, fewer “where do I start?” questions, and more confident, safer contributions from day one.

How we built it

Under the hood, DocFlow Lite is a Node.js/Express service that uses Octokit to pull live repo content, runs a hierarchical analyzer to spot frameworks/routes/components/services, and feeds a structured, role-tuned context to Gemini 1.5 Flash to generate sectioned docs (how to run, key modules, APIs, pitfalls, first tasks). It’s stateless (no server-side database), so each request pulls fresh code and streams results directly to the Next.js/Tailwind UI for progressive rendering: GitHub Repository → API Analysis → Gemini AI → JSON Response → Frontend Display. We chose this flow for stateless operations (each analysis stands alone), real-time processing (no intermediate storage), and simplicity that keeps the MVP fast and reliable without extra moving parts.
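The role-tuned context step can be sketched roughly like this (a minimal illustration, not the actual DocFlow Lite code: the signal patterns, function names, and object shape are assumptions). The idea is to classify the flat file listing pulled from GitHub into role-relevant files before anything reaches the model, so the prompt stays small and focused:

```javascript
// Illustrative role signals: path patterns that suggest a file matters
// to a given role (patterns here are examples, not the shipped set).
const ROLE_SIGNALS = {
  frontend: [/components?\//, /pages?\//, /\.tsx$/, /\.css$/],
  backend: [/routes?\//, /controllers?\//, /services?\//, /models?\//],
  infra: [/Dockerfile/, /\.ya?ml$/, /terraform\//, /\.github\//],
};

// Build a compact, role-tuned context from a flat list of repo paths.
function buildRoleContext(files, role) {
  const signals = ROLE_SIGNALS[role] ?? [];
  const relevant = files.filter((path) =>
    signals.some((pattern) => pattern.test(path))
  );
  // Irrelevant files are only counted, not included, keeping the
  // prompt within token limits.
  return { role, relevant, otherCount: files.length - relevant.length };
}
```

Because the service is stateless, a context like this is rebuilt from fresh repo content on every request rather than cached.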

Challenges we ran into

Our two biggest hurdles were ranking and models. We wanted to use a PageRank-style centrality score over a function-call graph to weight what a developer should read first, but building a reliable graph and getting the math/damping tuned wasn’t feasible within the time box. So, we shipped a simpler heuristic (role signals + basic call counts) and earmarked full PageRank for v2. We also started with the OpenAI API but kept hitting quota/rate limits, which stalled runs mid-analysis; we switched to Gemini 1.5 Flash for its speed, lower friction, and large context window. On top of that, large repos and GitHub rate limits pushed us to cap file sizes and stream results so the stateless pipeline stayed responsive.
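The v1 heuristic we shipped instead of full PageRank might look something like the sketch below (names, weights, and input shape are illustrative assumptions): a file's score is a strong bump for matching the selected role plus a light weight for how often other files call into it.

```javascript
// Rank files for a role using role signals plus basic call counts.
// Each file is assumed to carry its matched roles and an inbound-call
// count from the analyzer (hypothetical shape, not shipped code).
function rankFiles(files, role) {
  return files
    .map((f) => ({
      path: f.path,
      score:
        (f.roles.includes(role) ? 10 : 0) + // strong role signal
        (f.inboundCalls ?? 0),              // light call-frequency weight
    }))
    .sort((a, b) => b.score - a.score);
}
```

A flat additive score like this is easy to tune and debug under time pressure, which is exactly why graph centrality was deferred to v2.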

Accomplishments that we're proud of

We transformed the output from generic summaries into deep, role-tuned guides by (1) building a simple hierarchy that forces the model to read actual code (functions, routes, configs) rather than just repo titles, and (2) crafting a tighter prompt that maps those findings into clear sections (How to Run, Key Modules, APIs, Pitfalls, First Tasks) with anchors tied to real file paths. We also made the docs role-specific and introduced weighting so files most relevant to the selected role (Frontend, Backend, Infra) rise to the top, using role signals plus light call-frequency heuristics. Net result: sharper, more specific documentation that surfaces the right modules first and cuts the noise for every developer.
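The mapping from analyzer findings to sectioned docs with file-path anchors can be sketched like this (a simplified illustration; the function name and findings shape are assumptions, and the real prompt does this inside the model rather than in plain code):

```javascript
// The fixed section order the generated guide always follows.
const SECTIONS = ["How to Run", "Key Modules", "APIs", "Pitfalls", "First Tasks"];

// Map analyzer findings into sections, anchoring every claim to a
// real file path so readers can verify it against the repo.
function toDocSections(findings) {
  return SECTIONS.map((title) => ({
    title,
    items: (findings[title] ?? []).map(
      ({ text, path }) => `${text} (\`${path}\`)`
    ),
  }));
}
```

Emitting all five sections even when some are empty keeps the output deterministic, which makes the generated docs easier to render and diff.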

What we learned

We learned that great docs aren’t just written, but rather prioritized, scoped, and delivered in context. We learned that role-first framing (FE/BE/Infra) and a simple hierarchy over real code beats generic summaries every time, because “how to run,” key modules, and pitfalls must be anchored to actual file paths to be trusted. We also learned that shipping a stateless, real-time pipeline keeps the system fast and reliable, but forces discipline around token limits, file caps, and smart selection. On the AI side, prompt specificity > model mystique: tight, sectioned prompts and deterministic templates mattered more than squeezing in one more file, and when OpenAI quotas blocked us, switching to Gemini 1.5 Flash reminded us to design for portability. Finally, trying (and punting) on full PageRank taught us the value of progressive complexity (start with lightweight signals, measure the lift, then earn your way to fancier graph math in v2).

What's next for DocFlow

In the future, we plan to ship true PageRank on the function/class/module graph, blend it with role signals, and measure lift with simple ablations, so the “read this first” list becomes even smarter. We’ll add a fast, local search (keyword + lightweight semantic) with a command-palette to find files, anchors, and Doc Diffs instantly. To make the ranking tangible, we’re building CodeGraph, an interactive visualization where node size reflects PageRank and edges show calls/imports; click a node to jump straight to the relevant doc section. Finally, we’ll expand beyond FE/BE/Infra to non-technical roles (Product, Design, QA, Support, Sales/CS), generating role-aware guides that surface workflows, dashboards, terminology, and handoffs so everyone on the team gets actionable, mission-aligned docs.
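The PageRank planned for v2 is standard power iteration over the call/import graph; a minimal sketch, assuming a simple adjacency-map graph shape (this is a textbook illustration, not shipped code):

```javascript
// Power-iteration PageRank. `graph` maps each node to the nodes it
// calls/imports. Returns a rank per node; ranks sum to 1.
function pageRank(graph, damping = 0.85, iterations = 50) {
  const nodes = Object.keys(graph);
  const n = nodes.length;
  let rank = Object.fromEntries(nodes.map((v) => [v, 1 / n]));
  for (let i = 0; i < iterations; i++) {
    // Start each node with the teleport term, then add in-link mass.
    const next = Object.fromEntries(nodes.map((v) => [v, (1 - damping) / n]));
    for (const v of nodes) {
      const out = graph[v];
      if (out.length === 0) {
        // Dangling node: spread its rank evenly across all nodes.
        for (const u of nodes) next[u] += (damping * rank[v]) / n;
      } else {
        for (const u of out) next[u] += (damping * rank[v]) / out.length;
      }
    }
    rank = next;
  }
  return rank;
}
```

Blending could then be as simple as a weighted sum of PageRank and the existing role-signal score, with the weight chosen via the planned ablations.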

Built With

node.js, express, octokit, gemini, next.js, tailwindcss
