Inspiration

Whenever I came across an open-source project on GitHub, I enjoyed figuring out the logic and code flow by myself (and, lately, by prompting Cursor to do it for me), but I kept wishing for a proper visualization of that logic. Contributing to open-source projects would be far more intuitive if there were anchors pointing to where the codebase needs contributions. Although the agent was built for the startup acquisition pipeline, any developer can run it on any GitHub repository to surface pain points and carry out the rest of the necessary due diligence on the product.

What it does

DueDiligence is a Zero-Trust Technical Due Diligence Officer. It transforms massive, opaque repositories into transparent risk-and-asset scorecards for investors and engineering leaders. With only read access to the startup's repository, the agent builds a Deep Semantic Graph and identifies the objects that represent the crux of the company's logic. It visualizes the codebase through a comprehensive, interactive Architecture Orbit and compiles a scorecard on key metrics such as security liabilities, test coverage, and operational risk. Flipping a tile highlights the corresponding nodes, and the user can see each node's (file's) purpose and why it is categorized as a liability or an asset.

How we built it

Secure Ingestion: We used Token Vault as a secure proxy so that the agent accesses GitHub with only the permissions an audit requires.

Structural Mapping: We built a custom Syntax Tree parser to traverse directories, calculate code complexity, and track cross-file dependencies.
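The custom parser itself is not shown in this writeup. As a minimal sketch, assuming Python source files and using the standard-library `ast` module as a stand-in for the custom parser, complexity can be approximated by counting branching constructs (a rough McCabe-style count), and cross-file dependencies can be gathered from import statements. The function names `complexity` and `imported_modules` are hypothetical.

```python
from __future__ import annotations
import ast

# Node types treated as one extra decision branch each (a rough McCabe-style approximation).
BRANCH_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp, ast.comprehension)

def complexity(source: str) -> int:
    """Return 1 + the number of branching constructs found anywhere in the module."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))

def imported_modules(source: str) -> set[str]:
    """Collect the module names this file depends on via import statements."""
    tree = ast.parse(source)
    deps: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from . import x` has module=None and is skipped in this sketch.
            deps.add(node.module)
    return deps
```

Running both functions over every file yields per-file complexity scores plus an edge list, which is the kind of structure a dependency graph can be built from.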

Visualization: We used d3-force-3d to render the Architecture Orbit. Nodes are color-coded dynamically based on risk categories (Red = Liability, Blue = Asset, Purple = Key Person Risk).

Semantic Reasoning: The highest-risk files are routed to an LLM to generate plain-English, boardroom-ready reasoning (e.g., "Flagged as a liability due to hardcoded entropy secrets").

Challenges we ran into

The major challenge was the LLM hitting its context limit when auditing large codebases such as Excalidraw. With only a free API key, it was difficult to test the audit flow repeatedly, and the agent eventually ran out of tokens. The audit time was also excessive (~30 minutes) for a moderate-to-large codebase.

Accomplishments that we're proud of

When I hit these challenges, I pivoted to optimizing the LLM's input as much as possible, through the following steps:

Algorithmic Pre-Filtering: I engineered a deterministic "gatekeeper" using AST parsing. By first ranking files by complexity and in-degree (how many other files depend on them), the system isolates the crux of the logic before making, and potentially wasting, any LLM calls.
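The exact ranking formula is not given here. A minimal sketch, assuming per-file complexity scores and a file-to-dependencies map are already available, could compute in-degree from the dependency map and score each file as `complexity * (1 + in_degree)`; that weighting, and the name `rank_files`, are hypothetical choices for illustration.

```python
from __future__ import annotations
from collections import Counter

def rank_files(complexity: dict[str, int],
               deps: dict[str, set[str]],
               top_k: int = 3) -> list[str]:
    """Return the top_k files by a hypothetical score: complexity * (1 + in-degree)."""
    # In-degree = number of files that import a given file (only files we know about).
    in_degree = Counter(
        target
        for targets in deps.values()
        for target in targets
        if target in complexity
    )

    def score(path: str) -> int:
        return complexity[path] * (1 + in_degree[path])

    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(complexity, key=lambda p: (-score(p), p))[:top_k]
```

Only the files returned here would be sent to the LLM, so token usage scales with `top_k` rather than with repository size.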

Massive Time & Token Optimization: Thanks to this syntax-tree gatekeeper, the agent strips out thousands of lines of boilerplate, forcing the LLM to focus on actual structural debt and proprietary IP instead of getting lost in semantic hallucinations. The agent also audits the repository logic concurrently across multiple threads, each using an isolated Gemini Flash instance.
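A minimal sketch of the concurrent step, assuming each audit is an I/O-bound call that creates its own model client. The Gemini SDK call itself is deliberately left out: `fake_audit` is a hypothetical stand-in for one isolated Gemini Flash request, and `audit_concurrently` is an illustrative name.

```python
from __future__ import annotations
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def audit_concurrently(files: dict[str, str],
                       audit_file: Callable[[str, str], str],
                       max_workers: int = 4) -> dict[str, str]:
    """Run audit_file(path, source) for every file on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every file up front so the calls overlap while waiting on the network.
        futures = {path: pool.submit(audit_file, path, src) for path, src in files.items()}
        # Collect results keyed by path; .result() re-raises any worker exception.
        return {path: fut.result() for path, fut in futures.items()}

def fake_audit(path: str, source: str) -> str:
    """Hypothetical stand-in for a request to a freshly created Gemini Flash client."""
    return f"{path}: {len(source.splitlines())} lines reviewed"
```

Threads suit this workload because the time is spent waiting on API responses rather than on CPU; giving each task its own client avoids sharing mutable client state between threads.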

Graceful Degradation: The agent fails safely. If API quotas are hit, it bypasses the LLM but still delivers the interactive 3D structural graph, so the UX stays unbroken.
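One way to express this fallback, as a minimal sketch: the structural graph is built first and always returned, and the LLM layer is attached only if it succeeds. `QuotaExceeded` and `build_report` are hypothetical names; a real integration would catch whatever quota or rate-limit exception the provider's SDK actually raises.

```python
from __future__ import annotations
from typing import Any, Callable

class QuotaExceeded(Exception):
    """Hypothetical stand-in for the provider's quota / rate-limit error."""

def build_report(graph: dict[str, Any],
                 audit: Callable[[dict[str, Any]], dict[str, str]]) -> dict[str, Any]:
    """Always return the structural graph; attach LLM findings only if the audit succeeds."""
    report: dict[str, Any] = {"graph": graph, "llm_findings": None, "degraded": False}
    try:
        report["llm_findings"] = audit(graph)
    except QuotaExceeded:
        # Skip the LLM layer but keep the graph so the 3D view still renders.
        report["degraded"] = True
    return report
```

The `degraded` flag lets the frontend show the orbit without reasoning text instead of failing the whole audit.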

What we learned

Token Vault was definitely the biggest learning for me. I learned about a whole new authorization ecosystem and, after some (much) trial and error, figured out how to integrate it into the product pipeline. This was also my first solo hackathon, so I handled frontend/backend integration in depth and learned about MapReduce (I went down quite a rabbit hole on this one).

What's next for DueDiligence

The API models used for repository analysis can be upgraded. The next step is to transition the agent from a "Read-Only" auditor to an active participant that can open automated pull requests to pay down the technical debt it discovers. With a few minor tweaks, the open-source community could adopt it to speed up development and contribution cycles. Integration into the startup acquisition pipeline remains the foremost use case.
