DevRipe

Senior Engineer Interface: Knowledge Ingestion (GitHub, Slack, Confluence)
Junior Engineer Interface (Role and First Ticket Input)
Day One Brief for Junior Engineer
24/7 Access to Dynamic Senior Engineer Agent via Chat Interface
Contextual Answers from Dynamic Senior Engineer Agent with Sources for Trust
Personalized Onboarding Platform (Modules, Lessons, Quizzes)
Personalized Domain Knowledge + Codebase Quizzes - Wright Answer
Personalized Domain Knowledge + Codebase Quizzes - Wrong Answer

Inspiration

Onboarding into a new codebase can feel like being dropped into a foreign city with no map. Critical context is scattered across GitHub commits, Slack threads, Confluence pages, and, most frustratingly, inside the heads of senior engineers who already have too much on their plates.

Our team has lived this firsthand. Working at companies like Amazon and early-stage startups, we watched new engineers spend weeks just finding answers that a senior already knew. The senior gets interrupted. The new hire feels lost. Productivity tanks on both sides.

We kept asking: why does this knowledge have to live only in someone's head?

The problem became impossible to ignore once we put numbers to it. A senior engineer earning ~$180K/year costs roughly:

$$\frac{\$180{,}000}{2{,}080 \text{ hrs}} \approx \$87/\text{hr}$$

With new hires typically ramping over a 3-month window, and seniors losing an estimated 30% of their focus to onboarding interruptions, the hidden senior cost per hire is:

$$520 \text{ hrs} \times 0.30 \times \$87/\text{hr} \approx \$13{,}000$$

Stack that on top of the $21,000 in new hire productivity loss, and the true cost per engineer hired is:

Total Onboarding Cost = $21,000 (new hire ramp) + $13,000 (senior interruptions) = $34,000

For a typical team onboarding 5 new engineers with a single senior leading ramp-up, the math shifts. The senior absorbs interruptions from all 5 hires at once, but there is still only one senior losing productivity:

Team Onboarding Cost = 5 × $21,000 (5 junior ramps) + 1 × $13,000 (1 senior interrupted) = $105,000 + $13,000 = $118,000

We built DevRipe because tribal knowledge shouldn't die in a Slack thread or sit locked behind a busy calendar.

What it does

Beyond Q&A, DevRipe generates a custom onboarding course for every new engineer based on their role, their team's stack, and the actual codebase they'll be working in. Instead of a generic wiki dump, each hire gets a structured learning path with lessons, coding prompts, and checkpoints pulled directly from your repos and docs, so they ramp on what actually matters, not what someone remembered to write down.

This is how DevRipe attacks the $34,000 per hire problem directly. The faster a new engineer reaches their first meaningful contribution, the less time your senior spends getting interrupted, and the sooner the hire starts generating value instead of draining it.

The goal: make tribal knowledge repeatable, searchable, and always available, so your best engineers can stay heads-down.

DevRipe is also an AI-powered onboarding agent that ingests your GitHub repositories, Slack workspace, and internal documentation, then gives new engineers a single place to ask questions and get instant, cited, source-grounded answers.

Instead of pinging a senior to ask "why do we use MQTT instead of REST?", a new engineer asks DevRipe and gets an answer traced directly back to the ADR, the Slack thread where the decision was debated, and the Confluence doc that benchmarked the alternatives.

Key capabilities:

Multi-source ingestion, connects to GitHub, Slack, Confluence, Jira, and more
RAG-powered Q&A, answers grounded in your actual codebase and docs, never hallucinated
Cited responses, every answer links back to the exact source so engineers can verify and dig deeper
Stale content flagging, automatically detects when a source may be outdated and surfaces that warning inline
Role-aware context, answers tailored to the engineer's role and current project
Access controls, respects existing permissions so engineers only see what they're supposed to
Custom onboarding courses, auto-generates a role-specific learning path with lessons, coding prompts, and checkpoints built from your actual codebase, not generic templates

How we built it

DevRipe is built as a multi-agent RAG (Retrieval-Augmented Generation) system that transforms scattered tribal knowledge into a unified, queryable knowledge base.

1. Multi-Source Ingestion Pipeline

GitHub Parser Agent, clones repositories, extracts commit history, PR discussions, code comments, and README files. Uses tree-sitter for syntax-aware code chunking.
Slack Connector, ingests message exports, preserves threading context, and identifies decision-making conversations through keyword detection (ADR, RFC, decision, etc.).
Documentation Processor, handles Markdown, Confluence exports, and PDFs. Extracts structured content while maintaining document hierarchy.

2. Knowledge Base Construction

Vector Database, Pinecone for semantic search across all ingested content. Each chunk is embedded using OpenAI's text-embedding-3-large model.
Metadata Layer, PostgreSQL stores source metadata, timestamps, author information, and cross-references between related content.
Chunking Strategy, adaptive chunking based on content type — code uses function/class boundaries, docs use section headers, Slack uses message threads.

3. RAG Query Engine

Retrieval, hybrid search combining semantic similarity (vector) and keyword matching (BM25) to surface relevant context.
Reranking, cross-encoder model reranks top-k results to prioritize the most relevant sources.
Generation, Claude 3.5 Sonnet generates answers with strict citation requirements — every claim must link back to a source chunk.

4. Staleness Detection

Timestamps on all sources are compared against recent activity (commits, messages, doc updates). Heuristic rules flag content older than 18 months or contradicted by newer sources, and inline warnings surface automatically in responses.

5. Custom Learning Path Generator

Role Analysis Agent, analyzes job role and assigned task to identify required knowledge areas.
Curriculum Builder Agent, extracts relevant code patterns, architectural decisions, and best practices from the knowledge base.
Quiz Generator Agent, creates comprehension checks based on actual codebase scenarios.
Output, structured JSON course with modules, lessons, and quizzes rendered as an interactive learning interface.

Tech Stack

Layer	Tools
Frontend	React, TypeScript, TailwindCSS, Framer Motion
Backend	Node.js, Express, Server-Sent Events (SSE)
LLM	Anthropic Claude API (reasoning), OpenAI API (embeddings)
Data	Pinecone (vectors), PostgreSQL (metadata)
Infra	Vite, GitHub API, Slack API

Key Design Decisions

Why multi-agent architecture? Each source type (code, conversations, docs) has unique structure and semantics. Specialized agents handle domain-specific parsing, then feed into a unified knowledge graph.

Why Claude over GPT-4? Claude's 200K context window allows more retrieved chunks per prompt, improving answer quality. Its instruction-following is also more reliable for citation formatting.

Why hybrid search? Pure semantic search misses exact keyword matches like function names or error codes. Pure keyword search misses conceptual queries. Combining both gives the best results.

Challenges we ran into

Multi-agent coordination, early versions had agents producing inconsistent output formats. We fixed this by implementing strict JSON schema validation with Zod, with retry logic and schema examples baked into each prompt.
Citation accuracy, the LLM occasionally hallucinated citations referencing sources that didn't contain the claimed information. We added a post-generation verification step that checks each citation against retrieved chunks, regenerating or flagging unverified ones.
GitHub parsing at scale, large monorepos (10K+ files) caused memory issues and slow ingestion. We solved this with streaming file processing and selective indexing, prioritizing recently modified files and content relevant to the user's role.
Slack thread context, short messages like "Agreed, let's go with MQTT" are meaningless without their thread. We implemented thread-aware chunking that includes parent messages and replies so every chunk carries full conversation context.
Handling contradictory information, different sources sometimes conflict (e.g., old docs say "use REST", a recent Slack thread says "migrated to MQTT"). Timestamp-aware ranking prioritizes newer sources, and when contradictions are detected, the answer explicitly flags both perspectives.

Accomplishments that we're proud of

Built an end-to-end pipeline ingesting heterogeneous sources (code, conversations, docs) and surfacing unified, cited answers in seconds
Designed a staleness detection system that proactively warns engineers when information may be out of date, without requiring manual doc maintenance
Created a cross-source knowledge cortex linking concepts across GitHub, Slack, and docs so answers reflect the full context of a decision
Validated the core pain point with real engineering teams, the $34,000 cost-per-hire problem is universal from 10-person startups to enterprise orgs

What we learned

The real product is trust. Engineers won't use an AI onboarding tool if they can't verify answers. Citations aren't a nice-to-have, they're the core feature.
Tribal knowledge is a systems problem, not a documentation problem. Teams don't fail at onboarding because they're lazy about writing docs. Knowledge is inherently distributed and dynamic, the solution has to meet it where it lives.
Chunking and retrieval quality compound. Small improvements in how we chunk source content had outsized downstream effects on answer quality. This deserved more early investment.
Senior engineers are the real customer. New hires feel the pain, but it's seniors whose time gets drained. Framing DevRipe as a tool that frees seniors, not just helps juniors, changed how we pitched and prioritized.

What's next for DevRipe

Deeper repo intelligence, Architecture-aware indexing that understands service boundaries, data flows, and dependency graphs, not just file contents
Proactive onboarding flows, DevRipe surfaces relevant context based on what files a new engineer is actively working in, rather than waiting to be asked
Role-based learning paths, Curated onboarding journeys tailored by role (backend, frontend, platform) and team
Knowledge gap analytics, Dashboards for engineering leads showing top-asked questions, thinnest knowledge areas, and stalest docs
Enterprise integrations, Deeper support for Jira, Linear, Notion, and internal wikis, plus SSO and SOC 2 compliance
Path to scale, DevRipe's long-term vision is a per-seat SaaS model targeting the $34,000-per-hire problem across the 4M+ software engineers hired annually in the US alone, a multi-billion dollar market we're just beginning to unlock