Inspiration
I think we've all been there: you're working on a coding project and hit a wall, so you search Stack Overflow for an answer. You find a post with a promising title and opening question, but when you open it there is no answer, or at least no satisfactory or highly upvoted one. Where does that leave you? I created this project to help solve that conundrum.
What it does
Coding Cold Cases Cracker treats those posts as cold support incidents rather than trivia questions. A user picks a cold case from a curated backlog, creates an isolated workspace, and starts the casework pipeline. Kiro acts as the investigator and repair engineer: it reconstructs the smallest responsible failing project, studies the evidence, and proposes the fix. Lark acts as the forensic lab: it runs the reproduction workflow before the fix, captures pass/fail evidence and logs, then runs verification after the fix. A case is only closed when Lark verification passes.
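The closing rule above can be sketched as a small gate in Java. This is only an illustration under stated assumptions, not the project's actual code: the class name `ColdCasePipeline`, the `Status` values, and the three callbacks are hypothetical stand-ins for the Lark reproduction workflow, the Kiro fix step, and the Lark verification workflow.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from the project itself.
class ColdCasePipeline {

    enum Status { NOT_REPRODUCED, FIX_REJECTED, CLOSED }

    /**
     * Runs one case through the reproduce -> fix -> verify sequence.
     *
     * @param failureReproduced stand-in for the pre-fix reproduction workflow;
     *                          assumed to return true when the reported failure was observed
     * @param applyFix          stand-in for the investigator applying its proposed fix
     * @param fixVerified       stand-in for the post-fix verification workflow;
     *                          assumed to return true when verification passes
     */
    static Status investigate(BooleanSupplier failureReproduced,
                              Runnable applyFix,
                              BooleanSupplier fixVerified) {
        // Without a reproduced failure there is no evidence to fix against.
        if (!failureReproduced.getAsBoolean()) {
            return Status.NOT_REPRODUCED;
        }
        applyFix.run();
        // The case closes only when independent verification passes.
        return fixVerified.getAsBoolean() ? Status.CLOSED : Status.FIX_REJECTED;
    }

    public static void main(String[] args) {
        boolean[] fixed = {false};
        Status status = investigate(() -> true, () -> fixed[0] = true, () -> fixed[0]);
        System.out.println(status);
    }
}
```

The point of the design is that the fix step never decides the outcome itself: the status comes only from the two independent checks.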
How we built it
The project is a working Dockerized prototype, not just a slide deck. It includes the web shell, browser terminal, case index parsing, isolated run workspaces, Kiro prompts, Lark workflow provisioning/execution paths, GitHub publishing hooks, and gallery/report surfaces.
Built with:
- Lark CLI and Lark workflow groups
- Kiro CLI with phase-specific agent prompts
- Docker Compose
- Node.js
- ttyd browser terminal
- Java 21
- Maven and Gradle-ready case runners
- GitHub workspace publishing
- Markdown evidence reports and case files
Challenges we ran into
There are many reasons a Stack Overflow question remains unsolved. One of them is the difficulty of reproducing project setups that rely on very specific dependencies, drivers, or even devices. Our system tries hard to assemble the required execution environment, but it still struggles with some hard cases.
Accomplishments that we're proud of
The system has successfully resolved many Java cold cases. Used by itself to fix bugs, Kiro tends to get stuck: it often fixates on the wrong "fix", does not dig deep enough, or relies on workarounds rather than long-term fixes. Lark puts the bug-fixing agent back on track by vetting the reproduction steps and validating the proposed fix.
What we learned
The most compelling agentic developer tool demos are not the ones where an AI says it solved something; they are the ones where the system produces replayable evidence. Lark is powerful in that role because it can turn a plain-English testing intent into workflow execution, logs, artifacts, and a verdict that a separate coding agent must answer to.
What's next for Coding Cold Cases Cracker
Unanswered developer-support incidents are costly because they lack reproducible evidence. This project turns unresolved public bug reports into replayable labs with an independent testing verdict, making answers more trustworthy than an AI-generated explanation alone.
The same pattern can become a support automation product for developer tools teams: ingest GitHub issues, Stack Overflow posts, Discord/Slack reports, or Linear tickets; reconstruct the failure; run Lark evidence workflows; propose a fix; and publish a verified case file or PR.