Inspiration

Andrej Karpathy described a simple but important problem in his viral tweet: agents can generate custom software quickly, but they still lose time when the services they depend on are not built for agents. Instead of calling a clean interface, the agent has to inspect docs, infer authentication, guess endpoints, and reverse engineer behavior just to get access to basic data.

Karpathy viral tweet on X

LUCA is built for that gap. It helps an agent turn a messy or undocumented service into a callable interface with discovered endpoints, auth behavior, and generated tooling, so the agent can spend less time and tokens reverse engineering the infrastructure and more time building the software the user actually wants.

What it does

LUCA takes a service URL and optional credentials, then begins reconstructing how that service can be used by an agent. It pulls in the available source material, uses Nova to decide what to inspect and probe next, builds up an understanding of the available endpoints and authentication requirements, and then turns that result into a Python client bundle with an MCP-style server wrapper.

LUCA is API-first. For the demo, we added a thin frontend so judges can watch discovery, auth reasoning, and generated artifacts as the system runs, but that interface is only a visibility layer over the agent-facing backend.

How we built it

We built LUCA with a clean split between reasoning and execution. Amazon Nova makes the discovery decisions: what evidence to inspect, which path to probe next, when to test an auth variant, how to interpret auth signals, and how to generate the final bundle. The code handles the mechanical parts around that reasoning, such as fetching sources, executing probes, parsing exposed specs, storing artifacts, and validating outputs safely.

System Architecture

LUCA system architecture

LUCA is built as an API-first service. A user or agent sends a target URL and optional credentials to the backend. LUCA then opens a discovery session, stores the evidence it collects, and tracks the artifacts it generates.

The key architectural choice is that Amazon Nova sits inside the decision loop. After LUCA ingests source material from the target service, Nova decides what to inspect next, which endpoint to probe, when to test an auth variant, how to interpret the auth signals it sees, and how to generate the final client bundle. When embeddings are enabled, Nova embeddings can also help rank the most relevant source chunks during discovery.
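As a rough illustration of the embedding-based ranking step, the sketch below orders source chunks by cosine similarity to a query vector. It is not the code in the repository: `cosine` and `rank_chunks` are hypothetical names, and the vectors are toy values standing in for whatever the embedding model would return.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunks, top_k=3):
    """Return the text of the top_k chunks most similar to query_vec.

    chunks is a list of (text, vector) pairs; the vectors are assumed to be
    precomputed by an embedding model.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Toy example: two-dimensional vectors instead of real embeddings.
chunks = [
    ("login form markup", [1.0, 0.0]),
    ("footer links", [0.0, 1.0]),
    ("auth header docs", [1.0, 1.0]),
]
top = rank_chunks([1.0, 0.0], chunks, top_k=2)
```

The highest-ranked chunks would then be the ones handed to the model when it decides what to inspect next.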

Around that reasoning layer, the rest of the system stays deterministic. The code handles source fetching, HTTP execution, OpenAPI parsing when a spec is exposed, artifact storage, and output validation. That split lets LUCA stay adaptive on messy real-world services while still keeping the workflow controlled, safe, and reproducible.
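One way to picture that split is a small gate between the model's proposal and the HTTP layer. The sketch below assumes the model returns its decision as a JSON object with `kind` and `path` fields; that shape, the `ALLOWED_KINDS` set, and the `parse_action` helper are illustrative assumptions, not LUCA's actual schema.

```python
import json
from urllib.parse import urljoin, urlparse

# Hypothetical action vocabulary; the real planner may use different names.
ALLOWED_KINDS = {"inspect", "probe", "test_auth", "stop"}

def parse_action(raw: str, base_url: str) -> dict:
    """Validate a model-proposed action before any deterministic code runs it."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("action must be a JSON object")
    kind = data.get("kind")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown action kind: {kind!r}")
    action = {"kind": kind}
    if kind == "probe":
        path = data.get("path")
        if not isinstance(path, str):
            raise ValueError("probe action needs a string path")
        url = urljoin(base_url, path)
        # Reject probes that would leave the target host.
        if urlparse(url).netloc != urlparse(base_url).netloc:
            raise ValueError("probe must stay on the target host")
        action["url"] = url
    return action

ok = parse_action('{"kind": "probe", "path": "/v1/users"}', "https://api.example.com")
```

Anything the model proposes outside the allowed vocabulary, or pointing at a different host, is rejected before a request is sent.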

Reverse Engineering Loop

LUCA reverse-engineering loop

LUCA starts with a target URL and whatever context is available, including optional credentials and any exposed source material. From there, it begins building an evidence set by fetching what it can see, parsing a spec if one exists, and turning the available material into something the model can reason over.

Once that evidence exists, Nova drives the loop. It decides whether LUCA should inspect a source chunk more closely, probe a specific endpoint, test an auth variant, or stop discovery because enough of the service has been reconstructed. Each action produces new evidence, and that evidence is fed back into the next decision.
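The shape of that loop can be sketched in a few lines. This is a simplified illustration, not the planner in the repository: `Action`, `Session`, `run_discovery`, and the stub planner and executor are hypothetical names, and the stub planner stands in for the model call.

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str          # "inspect", "probe", "test_auth", or "stop"
    target: str = ""   # chunk id, path, or auth variant, depending on kind

@dataclass
class Session:
    evidence: list = field(default_factory=list)

def run_discovery(session, planner, executor, max_steps=20):
    """Ask the planner for the next action, execute it, feed the result back."""
    for _ in range(max_steps):          # hard cap so the loop always terminates
        action = planner(session.evidence)
        if action.kind == "stop":
            break
        session.evidence.append(executor(action))
    return session

def stub_planner(evidence):
    # Stand-in for the model: probe two paths, then decide to stop.
    queue = ["/health", "/users"]
    if len(evidence) < len(queue):
        return Action("probe", queue[len(evidence)])
    return Action("stop")

def stub_executor(action):
    # Stand-in for real HTTP execution: records what would have been done.
    return f"{action.kind} {action.target} -> 200"

session = run_discovery(Session(), stub_planner, stub_executor)
```

The planner only ever sees accumulated evidence and only ever returns an action, which keeps the decision step and the execution step separable and testable on their own.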

That loop is what makes LUCA more than a parser. Instead of depending on a single source of truth, it can move through partial information and gradually reconstruct how a service works. When the loop ends, LUCA uses what it learned to synthesize auth behavior and generate the final client and server bundle.

In the current codebase, that flow is split across:

  • backend/app/main.py for the public API,
  • backend/app/discovery.py for orchestration,
  • backend/app/ingestion.py for source fetching and initial evidence,
  • backend/app/planner.py for the Nova-driven discovery loop,
  • backend/app/auth.py for auth reasoning, and
  • backend/app/generation.py for artifact generation and validation.
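To make the final generation step concrete, here is a minimal sketch of turning discovered endpoints into Python client source. It is an assumption-laden illustration, not what `generation.py` does: in LUCA the bundle is produced with Nova, whereas this sketch uses a plain template, handles only parameterless JSON endpoints, and uses hypothetical names (`method_name`, `render_client`, `Client`).

```python
import re

def method_name(http_method, path):
    """Derive a Python-safe method name, e.g. GET /v1/users -> get_v1_users."""
    parts = [p for p in re.split(r"[^0-9a-zA-Z]+", path) if p]
    return "_".join([http_method.lower(), *parts])

def render_client(endpoints):
    """Render client source for a list of (http_method, path) pairs."""
    lines = [
        "import json",
        "import urllib.request",
        "",
        "class Client:",
        "    def __init__(self, base_url):",
        "        self.base_url = base_url.rstrip('/')",
    ]
    for http_method, path in endpoints:
        lines += [
            "",
            f"    def {method_name(http_method, path)}(self):",
            f"        req = urllib.request.Request(self.base_url + {path!r}, method={http_method.upper()!r})",
            "        with urllib.request.urlopen(req) as resp:",
            "            return json.loads(resp.read())",
        ]
    return "\n".join(lines) + "\n"

source = render_client([("GET", "/v1/users"), ("POST", "/v1/sessions")])
compile(source, "<generated client>", "exec")  # syntax check only; nothing is executed
```

Compiling the rendered source without running it is a cheap first validation pass before the bundle is stored or handed to an agent.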

Challenges we ran into

One of the hardest parts was building a system that could reason in open-ended environments without collapsing back into narrow integrations. The more useful targets are usually the least structured ones, so LUCA had to work from fragments of evidence and keep making forward progress even when the service did not expose a clean spec.

That made discovery a much harder problem than simple API parsing. Instead of reading one source of truth, LUCA has to move through partial pages, ambiguous responses, auth failures, and small clues, then decide what to inspect or probe next in a way that actually expands understanding of the service.

We also had to make the system trustworthy enough to sit between an agent and a real service. Model output could not be treated as ground truth, and generated code could not be executed blindly, so validation, constrained parsing, and safer execution boundaries became part of the architecture itself.
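As an example of what not executing generated code blindly can look like, the sketch below inspects generated Python source statically before anything runs. It is a partial check under stated assumptions, not a sandbox and not LUCA's actual validator: `ALLOWED_IMPORTS` and `check_generated_source` are hypothetical, and the allowlist is arbitrary.

```python
import ast

# Hypothetical allowlist of top-level modules a generated client may import.
ALLOWED_IMPORTS = {"json", "typing", "urllib", "dataclasses"}

def check_generated_source(source: str) -> list:
    """Return a list of problems found in generated source; empty means it passed."""
    try:
        tree = ast.parse(source)  # parses only, never executes the code
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            if name.split(".")[0] not in ALLOWED_IMPORTS:
                problems.append(f"disallowed import: {name}")
    return problems

clean = check_generated_source("import json\nfrom urllib import request\n")
flagged = check_generated_source("import os\nimport json\n")
```

A check like this only narrows what generated code can reach; it would still need to be paired with isolated execution before the output is trusted.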

Accomplishments that we're proud of

What we are most proud of is that LUCA became a clear technical answer to the problem that inspired it. Instead of stopping at API parsing or code generation around exposed specs, it now works as a reverse-engineering layer that can inspect evidence, probe a service, reason about auth, and turn that result into tooling an agent can actually use.

We are also proud that Nova is not just attached to the project as a wrapper. It sits in the center of discovery, auth interpretation, and generation, while the surrounding code exists to execute those decisions safely and preserve the results. That made the architecture much closer to the actual problem we wanted to solve.

What we learned

"Agentic" is not a UI style or a prompt. It is an architectural decision about where reasoning lives, and building LUCA made that much clearer for us. The right split is not model-only versus code-only, but model for decisions and deterministic code for execution and safety.

We also learned that reverse engineering legacy services is a real bottleneck in the path toward bespoke software. A strong agent still loses time if every service has to be rediscovered from scratch, which makes infrastructure layers like LUCA much more important than they first appear.

What's next for LUCA

The next step is to make LUCA easier for agents to consume directly, through stronger agent-facing interfaces and tool integrations. The demo UI should then be only a visibility layer, not the main way the system is used.

We also want to push LUCA further into the hard cases by adding stronger reverse-engineering tools for undocumented services, including JavaScript asset inspection, browser and network capture, session handling, and richer multi-step workflow probing. Alongside that, we want to run real Bedrock-backed evaluations on harder targets and harden the hosted workflow so LUCA can operate as a reliable reverse-engineering service in front of legacy systems.
