- Paste any GitLab repo URL: RepoTerrain ingests, embeds with Google AI text-embedding-004, and builds the 3D terrain in ~15 seconds
- Gemini 2.0 Flash agent with full terrain context: explains clusters, flags tech debt, answers architectural questions
- Real GitLab issues created live by the agent via MCP: verifiable proof of real-world action execution
- 3D Heat Map: red peaks = active core, blue valleys = tech debt. Hottest files, semantic cluster, and cold zone for gitlab-org/gitlab-runner
- RepoTerrain pipeline: GitLab API → text-embedding-004 → UMAP 3D projection → Gemini agent
Inspiration
Every developer has faced this: clone an unfamiliar GitLab repo and stare at a flat file tree that tells you names but nothing about relationships, activity, or importance. Onboarding to a 150-file legacy codebase takes days. Tech debt hides in plain sight. There's no spatial representation of what's hot, what's cold, and what clusters functionally with what.
What it does
Paste any public GitLab URL. In ~15 seconds RepoTerrain:
- Fetches up to 150 files via GitLab REST API v4
- Embeds each file into a 768-dim semantic vector using Google AI text-embedding-004
- Projects all vectors into 3D space via UMAP with the cosine metric, so related files cluster together physically
- Renders a live navigable 3D terrain using Three.js CSS3DRenderer — files as cards, colored by heat (red = active core, blue = cold legacy)
- Activates a Gemini 2.0 Flash agent with full terrain context — explains clusters, flags tech debt, answers architectural questions
- Executes real GitLab actions via MCP HTTP transport — creates issues, lists MRs, fetches pipeline status
- Lets you navigate with bare hands via MediaPipe gesture recognition — open palm to fly, pinch to zoom, point to select, fist to rotate
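The first step above, turning a pasted repo URL into a GitLab REST API v4 request, can be sketched as follows. This is a minimal illustration and not the code in backend/pipeline.py: the function name `project_tree_url` is hypothetical. It relies on the documented repository tree endpoint, which accepts a URL-encoded project path in place of a numeric ID.

```python
from urllib.parse import quote, urlencode, urlparse


def project_tree_url(repo_url: str, per_page: int = 100, page: int = 1) -> str:
    """Build the repository tree endpoint URL for a pasted GitLab repo URL.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    parsed = urlparse(repo_url)
    project_path = parsed.path.strip("/")
    if project_path.endswith(".git"):
        project_path = project_path[: -len(".git")]
    if not parsed.netloc or not project_path:
        raise ValueError(f"not a GitLab repo URL: {repo_url!r}")

    # The API accepts "namespace/project" as the project ID once "/" is
    # percent-encoded, so no extra lookup request is needed.
    encoded_path = quote(project_path, safe="")
    query = urlencode({"recursive": "true", "per_page": per_page, "page": page})
    return (
        f"{parsed.scheme}://{parsed.netloc}"
        f"/api/v4/projects/{encoded_path}/repository/tree?{query}"
    )


if __name__ == "__main__":
    print(project_tree_url("https://gitlab.com/gitlab-org/gitlab-runner"))
```

A caller would page through the results until it has collected its file budget (150 in RepoTerrain's case), then fetch each file's content separately.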
How we built it
Built on the Google AI stack: Gemini 2.0 Flash for agentic reasoning and text-embedding-004 for semantic embeddings — both via Google AI API.
The agent implements multi-step agentic reasoning: intent classification → model selection → GitLab MCP action execution → structured response. Rather than using a low-code wrapper, we implemented this pipeline directly in Python for performance and full control over each step.
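The four stages can be sketched as plain Python functions. Everything here is an illustrative assumption rather than the code in backend/agent.py: the keyword rules, the `AgentResponse` shape, and the model labels (plain strings, not API model IDs) are all hypothetical, and a real classifier would likely be more robust than substring matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical keyword rules; checked in insertion order.
INTENT_KEYWORDS = {
    "create_issue": ("create issue", "open an issue", "file an issue"),
    "list_mrs": ("merge request", "list mrs"),
    "pipeline_status": ("pipeline",),
}


@dataclass
class AgentResponse:
    intent: str
    model: str
    action_result: str


def classify_intent(query: str) -> str:
    """Stage 1: map a user query to an intent, defaulting to 'explain'."""
    text = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "explain"


def select_model(primary_available: bool) -> str:
    """Stage 2: pick the primary model unless its quota is exhausted."""
    return "gemini-2.0-flash" if primary_available else "groq-llama-3.1"


def run_agent(
    query: str,
    actions: Dict[str, Callable[[str], str]],
    primary_available: bool = True,
) -> AgentResponse:
    """Stages 3 and 4: run the matching action, return a structured response."""
    intent = classify_intent(query)
    model = select_model(primary_available)
    handler = actions.get(intent)
    result = handler(query) if handler else ""
    return AgentResponse(intent=intent, model=model, action_result=result)


if __name__ == "__main__":
    stub_actions = {"create_issue": lambda q: "issue created (stub)"}
    print(run_agent("Create issue for the cold zone", stub_actions))
```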
Pipeline (backend/pipeline.py): GitLab REST API v4 recursive fetch → Google AI text-embedding-004 (768-dim, TF-IDF fallback) → UMAP cosine projection → heat scoring from filename patterns + size + depth
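As an illustration of heat scoring from filename patterns, size, and depth, here is a minimal sketch. The patterns, weights, and caps are invented for this example and are not the values used in backend/pipeline.py; only the three input signals come from the description above.

```python
import re

# Hypothetical patterns: entry-point style names count as "hot",
# legacy or vendored directories count as "cold".
HOT_PATTERN = re.compile(r"(^|/)(main|app|core|index|server)\.[a-z]+$")
COLD_PATTERN = re.compile(r"(^|/)(legacy|deprecated|vendor|old)(/|$)")


def heat_score(path: str, size_bytes: int) -> float:
    """Return a heat value in [0, 1]; higher means closer to the active core."""
    lowered = path.lower()
    score = 0.5  # neutral baseline

    # Filename pattern signal.
    if HOT_PATTERN.search(lowered):
        score += 0.3
    if COLD_PATTERN.search(lowered):
        score -= 0.3

    # Size signal: larger files are nudged hotter, capped at 20 kB.
    score += 0.1 * min(size_bytes, 20_000) / 20_000

    # Depth signal: each directory level cools the file slightly, capped at 5.
    depth = lowered.count("/")
    score -= 0.04 * min(depth, 5)

    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    for path, size in [("main.go", 20_000), ("vendor/legacy/old/util.go", 100)]:
        print(path, round(heat_score(path, size), 3))
```

The resulting value can be mapped onto the red-to-blue color ramp used for the file cards.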
Agent (backend/agent.py): Gemini 2.0 Flash primary with full terrain context injected per query. Groq LLaMA 3.1 as transparent documented fallback when quota is exhausted. GitLab MCP HTTP transport first, REST API v4 fallback for free-tier reliability. 16-turn conversation history per session. WebSocket streaming.
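The fallback chain and the bounded conversation history can be sketched with the standard library alone. This is an assumption-laden illustration, not the agent's actual code: `call_with_fallback` and `Session` are hypothetical names, the model callables are stand-ins for real API clients, and each stored message is counted as one turn, which may differ from how the project counts turns.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

MAX_TURNS = 16  # history limit stated in the write-up


class Session:
    """Per-session history that silently drops the oldest entries."""

    def __init__(self) -> None:
        self.history: Deque[Dict[str, str]] = deque(maxlen=MAX_TURNS)

    def add(self, role: str, content: str) -> None:
        self.history.append({"role": role, "content": content})


def call_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    prompt: str,
) -> Tuple[str, str]:
    """Try the primary model; on failure, send the same prompt to the fallback.

    Returns (provider_label, reply) so the UI can show which model answered.
    A real implementation would catch the provider's quota error specifically
    instead of the broad Exception used here.
    """
    try:
        return "primary", primary(prompt)
    except Exception:
        return "fallback", fallback(prompt)


if __name__ == "__main__":
    def exhausted(prompt: str) -> str:
        raise RuntimeError("quota exhausted")  # simulated failure

    provider, reply = call_with_fallback(exhausted, lambda p: f"echo: {p}", "hi")
    print(provider, reply)
```

The same wrapper shape applies to the MCP-first, REST-second action path: pass the MCP call as `primary` and the REST call as `fallback`.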
Frontend: Three.js r128 CSS3DRenderer, MediaPipe Tasks Vision (4 gesture types), agent panel with WebSocket token streaming, 3D heat map modal.
Challenges we ran into
- Gemini quota exhaustion — solved with transparent Groq fallback, same system prompt, documented openly
- UMAP instability on small repos — fixed with n_neighbors = min(15, n-1) guard
- MediaPipe CDN reliability — multi-CDN eager loading + GPU→CPU delegate fallback
- Railway builder migration mid-hackathon — Nixpacks broke, fixed with explicit Python version pinning
- GitLab MCP Premium requirement — implemented MCP HTTP transport with REST API v4 fallback for free-tier reliability
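The UMAP guard from the list above can be sketched as below. The `min(15, n-1)` rule is from the write-up; the function names, the lower bound of three files, and the `project_3d` wrapper are assumptions added for illustration. The wrapper imports `umap` (the umap-learn package) and NumPy lazily, so the guard itself can be used without them installed.

```python
from typing import Sequence


def safe_n_neighbors(n_samples: int, default: int = 15) -> int:
    """Clamp n_neighbors so it never exceeds the number of other points.

    The lower bound is an assumption: with fewer than 3 files there is no
    meaningful neighbor graph, so the caller should skip the projection.
    """
    if n_samples < 3:
        raise ValueError("need at least 3 files to build a neighbor graph")
    return min(default, n_samples - 1)


def project_3d(vectors: Sequence[Sequence[float]]):
    """Project embedding vectors to 3D with UMAP using the cosine metric.

    Requires umap-learn and NumPy. Very small inputs may still need extra
    handling beyond the n_neighbors guard.
    """
    import numpy as np
    import umap

    data = np.asarray(vectors, dtype=float)
    reducer = umap.UMAP(
        n_components=3,
        n_neighbors=safe_n_neighbors(len(data)),
        metric="cosine",
    )
    return reducer.fit_transform(data)


if __name__ == "__main__":
    for n in (149, 10, 3):
        print(n, safe_n_neighbors(n))
```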
Accomplishments that we're proud of
- A combination we have not seen elsewhere: semantic 3D terrain + gesture navigation + live Gemini agent + real GitLab actions
- Real GitLab issues created live, verifiable at: https://gitlab.com/ashish-doing/repoterrain-demo/-/issues
- Tested on gitlab-org/gitlab-runner — 149 files, 19 semantic clusters, ~15s end-to-end
- MediaPipe hand tracking as primary navigation, rather than the chat-only interface typical of agent projects
What we learned
- text-embedding-004 produces dramatically better semantic clusters than TF-IDF: Python files cluster with Python files and tests cluster near the code they test, a structure that emerges from the geometry of the embeddings alone
- UMAP cosine metric on code naturally separates languages and functional modules without explicit classification
- Grounding the agent with actual file content (not just filenames) is the difference between hallucination and insight
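For readers unfamiliar with the cosine metric mentioned above, the following sketch shows the distance it computes. The toy 3-dimensional vectors are invented stand-ins for the 768-dimensional embeddings. The point is that cosine distance depends only on direction, so two vectors pointing the same way are at distance zero regardless of their length.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(angle between a and b); 0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up vectors: two "Python-like" files and one "YAML-like" file.
    py_small = [1.0, 0.2, 0.0]
    py_large = [2.0, 0.4, 0.0]  # same direction, twice the magnitude
    yaml_file = [0.0, 0.1, 1.0]
    print(cosine_distance(py_small, py_large))
    print(cosine_distance(py_small, yaml_file))
```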
What's next for RepoTerrain — 3D Semantic Codebase Intelligence for GitLab
- Private repo support via user-provided GitLab tokens
- Semantic diff view — terrain changes between two commits, visualizing architectural drift
- Team mode — multiple cursors navigating the same terrain simultaneously
- Export terrain as shareable static HTML