RepoTerrain — 3D Semantic Codebase Intelligence for GitLab

Paste any GitLab repo URL — RepoTerrain ingests, embeds with Google AI text-embedding-004, and builds the 3D terrain in ~15 seconds
Gemini 2.0 Flash agent with full terrain context — explains clusters, flags tech debt, answers architectural questions
Real GitLab issues created live by the agent via MCP — verifiable proof of real-world action execution
3D Heat Map — red peaks = active core, blue valleys = tech debt. Hottest files, semantic cluster, and cold zone for gitlab-org/gitlab-runner
RepoTerrain pipeline: GitLab API → text-embedding-004 → UMAP 3D projection → Gemini agent

Inspiration

Every developer has faced this: clone an unfamiliar GitLab repo and stare at a flat file tree that tells you names but nothing about relationships, activity, or importance. Onboarding to a 150-file legacy codebase takes days. Tech debt hides in plain sight. There's no spatial representation of what's hot, what's cold, and what clusters functionally with what.

What it does

Paste any public GitLab URL. In ~15 seconds RepoTerrain:

Fetches up to 150 files via GitLab REST API v4
Embeds each file into a 768-dim semantic vector using Google AI text-embedding-004
Projects all vectors into 3D space via UMAP cosine similarity — related files cluster together physically
Renders a live navigable 3D terrain using Three.js CSS3DRenderer — files as cards, colored by heat (red = active core, blue = cold legacy)
Activates a Gemini 2.0 Flash agent with full terrain context — explains clusters, flags tech debt, answers architectural questions
Executes real GitLab actions via MCP HTTP transport — creates issues, lists MRs, fetches pipeline status
Lets you navigate with bare hands via MediaPipe gesture recognition — open palm to fly, pinch to zoom, point to select, fist to rotate

How we built it

Built on the Google AI stack: Gemini 2.0 Flash for agentic reasoning and text-embedding-004 for semantic embeddings — both via Google AI API.

The agent implements multi-step agentic reasoning: intent classification → model selection → GitLab MCP action execution → structured response. Rather than using a low-code wrapper, this pipeline is implemented directly in Python for performance and full control — which is a stronger engineering signal.

Pipeline (backend/pipeline.py): GitLab REST API v4 recursive fetch → Google AI text-embedding-004 (768-dim, TF-IDF fallback) → UMAP cosine projection → heat scoring from filename patterns + size + depth

Agent (backend/agent.py): Gemini 2.0 Flash primary with full terrain context injected per query. Groq LLaMA 3.1 as transparent documented fallback when quota is exhausted. GitLab MCP HTTP transport first, REST API v4 fallback for free-tier reliability. 16-turn conversation history per session. WebSocket streaming.

Frontend: Three.js r128 CSS3DRenderer, MediaPipe Tasks Vision (4 gesture types), agent panel with WebSocket token streaming, 3D heat map modal.

Challenges we ran into

Gemini quota exhaustion — solved with transparent Groq fallback, same system prompt, documented openly
UMAP instability on small repos — fixed with n_neighbors = min(15, n-1) guard
MediaPipe CDN reliability — multi-CDN eager loading + GPU→CPU delegate fallback
Railway builder migration mid-hackathon — Nixpacks broke, fixed with explicit Python version pinning
GitLab MCP Premium requirement — implemented MCP HTTP transport with REST API v4 fallback for free-tier reliability

Accomplishments that we're proud of

Genuinely novel combination: semantic 3D terrain + gesture navigation + live Gemini agent + real GitLab actions — no prior art
Real GitLab issues created live, verifiable at: https://gitlab.com/ashish-doing/repoterrain-demo/-/issues
Tested on gitlab-org/gitlab-runner — 149 files, 19 semantic clusters, ~15s end-to-end
MediaPipe hand tracking as primary navigation — every competing submission will be a chat interface

What we learned

text-embedding-004 produces dramatically better semantic clusters than TF-IDF — Python files cluster with Python files, tests cluster near the code they test, emergent structure from pure geometry
UMAP cosine metric on code naturally separates languages and functional modules without explicit classification
Grounding the agent with actual file content (not just filenames) is the difference between hallucination and insight

What's next for RepoTerrain — 3D Semantic Codebase Intelligence for GitLab

Private repo support via user-provided GitLab tokens
Semantic diff view — terrain changes between two commits, visualizing architectural drift
Team mode — multiple cursors navigating the same terrain simultaneously
Export terrain as shareable static HTML

Built With

fastapi
gemini-2.0-flash
gitlab-mcp
gitlab-rest-api-v4
google-ai-text-embedding-004
mediapipe
railway
three.js
umap

Updates

Ashish Kumar started this project — Jun 11, 2026 03:23 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.