Inspiration

Customer support teams often receive vague complaints like “checkout is broken,” “my discount disappeared,” or “I got charged twice.” Most support systems treat these as one-time customer service issues: apologize, issue a refund, offer a coupon, or create a vague ticket.

But those complaints can be early warning signs of bigger product bugs. The real question is not just how to respond to the customer, but where the problem is coming from and how to stop it from affecting more users.

That inspired us to build BugBack AI, a Nemotron-powered investigation agent that turns vague customer complaints into verified engineering evidence.

What it does

BugBack AI takes a messy customer complaint and turns it into a full bug investigation.

For example, if a customer says:

“I tried to use my discount code SAVE20, but it disappeared at checkout.”

BugBack:

  1. Understands the complaint.
  2. Reasons that it is likely a checkout discount issue.
  3. Opens a demo ecommerce store using browser automation.
  4. Adds an item to the cart.
  5. Applies the SAVE20 discount code.
  6. Goes to checkout.
  7. Verifies that the discount disappears.
  8. Checks logs and feature flags.
  9. Identifies the likely root cause.
  10. Checks persistent memory for repeated incidents.
  11. Applies safety guardrails before recommending risky actions.
  12. Generates an engineering ticket and a customer-ready response.

Instead of engineering getting a vague ticket like:

“Customer says checkout is broken.”

BugBack gives them:

“Bug verified. Here are the reproduction steps. Here is the matching evidence. Here is the likely cause. Here is the recommended fix path.”

How we built it

We built BugBack AI as a local agentic workflow with a dashboard and a demo ecommerce store.

The project uses:

  • Flask for the demo ecommerce store
  • HTML for the BugBack dashboard
  • Playwright for real browser automation
  • NVIDIA Nemotron as the reasoning layer
  • Local JSON memory for persistent incident history
  • Mock logs and feature flags to simulate engineering systems
  • Policy guardrails to require human approval for risky actions

The workflow follows a ReAct pattern:

Reason → Act → Observe

BugBack reasons from the complaint, acts by calling tools, observes the result, gathers evidence, checks memory, applies policy guardrails, and generates final outputs.
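
To make the Reason → Act → Observe loop concrete, here is a minimal sketch of its shape. It is not our actual implementation: the tool functions, their return values, and the `reason` stub (which stands in for the Nemotron call) are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    thought: str       # why the agent chose this tool
    tool: str          # which tool it called
    observation: dict  # what the tool returned


# Hypothetical stand-ins for the real tools (browser, logs, feature flags).
def reproduce_in_browser(complaint: str) -> dict:
    return {"reproduced": True, "detail": "discount missing at checkout"}


def check_logs(complaint: str) -> dict:
    return {"matches": ["discount dropped during checkout"]}


def check_feature_flags(complaint: str) -> dict:
    return {"recently_changed": ["new_checkout_flow"]}


TOOLS: dict[str, Callable[[str], dict]] = {
    "reproduce_in_browser": reproduce_in_browser,
    "check_logs": check_logs,
    "check_feature_flags": check_feature_flags,
}


def reason(complaint: str, history: list[Step]) -> tuple[str, Optional[str]]:
    """Stub for the model call: pick the next unused tool, or None to stop."""
    used = {step.tool for step in history}
    for name in TOOLS:
        if name not in used:
            return f"Need evidence from {name} for: {complaint}", name
    return "All tools consulted; enough evidence to conclude.", None


def run_agent(complaint: str, max_steps: int = 5) -> list[Step]:
    history: list[Step] = []
    for _ in range(max_steps):
        thought, tool = reason(complaint, history)   # Reason
        if tool is None:
            break
        observation = TOOLS[tool](complaint)         # Act
        history.append(Step(thought, tool, observation))  # Observe
    return history
```

Calling `run_agent("my discount code SAVE20 disappeared at checkout")` returns the list of steps, which is also what a dashboard could render to show each decision. The `max_steps` cap is an assumption we would want in any real loop so a confused model cannot call tools forever.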

Challenges we ran into

One challenge was making BugBack feel like a real agent rather than just another chatbot. We wanted it to actually take action, so we added browser automation with Playwright to reproduce bugs in a real website flow.
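
As an illustration of the reproduction step, the sketch below drives a store with Playwright's sync API and reports whether the discount survives to checkout. The routes, CSS selectors, and the `http://localhost:5000` address are assumptions made for this example, not the demo store's real markup.

```python
def reproduce_discount_bug(page, base_url: str, code: str = "SAVE20") -> dict:
    """Walk the cart-to-checkout flow and compare what the two pages show.

    `page` is a Playwright sync `Page`. Every route and selector below is
    hypothetical and would need to match the actual store.
    """
    page.goto(f"{base_url}/product/1")
    page.click("#add-to-cart")

    page.goto(f"{base_url}/cart")
    page.fill("#discount-code", code)
    page.click("#apply-discount")
    cart_text = page.inner_text("#cart-summary")

    page.click("#checkout")
    checkout_text = page.inner_text("#order-summary")

    applied_in_cart = code in cart_text
    present_at_checkout = code in checkout_text
    return {
        "applied_in_cart": applied_in_cart,
        "present_at_checkout": present_at_checkout,
        # The bug is confirmed only if the code was accepted and then lost.
        "bug_reproduced": applied_in_cart and not present_at_checkout,
    }


def main() -> None:
    # Imported here so the helper above can be exercised without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        try:
            print(reproduce_discount_bug(page, "http://localhost:5000"))
        finally:
            browser.close()
```

Running `main()` requires Playwright and its browsers to be installed and the store to be running locally. Returning a structured result rather than a bare boolean lets the agent distinguish "discount never applied" from "discount applied, then lost", which point to different causes.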

Another challenge was making the agent’s reasoning visible. Judges and users should be able to see why BugBack made each decision, so we added sections for tool calls, root cause identification, model reasoning, memory impact, and safety policies.

We also had to balance realism with time. Since we did not have access to real production systems, we built mock logs, feature flags, and policies that can later be swapped with real tools like Datadog, LaunchDarkly, Jira, Zendesk, or Stripe.
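
One way to keep the mocks swappable is to code against small interfaces instead of concrete classes. The sketch below shows that idea with `typing.Protocol`; the interface names, method names, and sample data are assumptions for illustration, and a real adapter for a logging or feature-flag service would need to implement the same methods.

```python
from typing import Protocol


class LogSource(Protocol):
    def search(self, keyword: str) -> list[str]: ...


class FlagSource(Protocol):
    def is_enabled(self, flag: str) -> bool: ...


class MockLogs:
    """In-memory log store standing in for a real logging backend."""

    def __init__(self, lines: list[str]) -> None:
        self._lines = list(lines)

    def search(self, keyword: str) -> list[str]:
        # Case-insensitive substring match over the stored lines.
        return [line for line in self._lines if keyword.lower() in line.lower()]


class MockFlags:
    """In-memory flag table standing in for a real feature-flag service."""

    def __init__(self, flags: dict[str, bool]) -> None:
        self._flags = dict(flags)

    def is_enabled(self, flag: str) -> bool:
        # Unknown flags are treated as disabled.
        return self._flags.get(flag, False)


def gather_evidence(logs: LogSource, flags: FlagSource, keyword: str, flag: str) -> dict:
    """Collect log matches and flag state; works with mocks or real adapters."""
    return {
        "log_matches": logs.search(keyword),
        "flag": flag,
        "flag_enabled": flags.is_enabled(flag),
    }
```

Because `gather_evidence` only depends on the two protocols, replacing `MockLogs` or `MockFlags` with a production adapter should not require changes to the agent logic.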

Accomplishments that we're proud of

We are proud that BugBack completes the full support-to-engineering workflow:

Customer complaint → bug reproduction → evidence gathering → root cause reasoning → memory check → safety policy → engineering ticket → customer response

We are especially proud that the browser reproduction is real. BugBack actually opens the demo store, applies the discount code, goes through checkout, and verifies that the bug occurs.

We are also proud of the Nemotron reasoning section, where the model explains why the evidence points to a specific root cause instead of simply saying “bug found.”

What we learned

We learned that agentic AI is most powerful when it is connected to tools. A model alone can summarize, but an agent can reason, act, observe, and complete a workflow.

We also learned the importance of persistent memory. If a similar complaint appears multiple times, BugBack can treat it as a recurring issue and raise the severity instead of handling it like an isolated ticket.
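
The memory check can be as simple as a JSON file of past incidents plus a count per category. The sketch below shows that approach; the file layout, category names, and severity thresholds are assumptions chosen for the example rather than the values BugBack uses.

```python
import json
from pathlib import Path


def load_incidents(path: Path) -> list[dict]:
    """Read all stored incidents, or an empty list if the file does not exist."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def severity_for(count: int) -> str:
    """Map how often a category has been seen to a severity (assumed thresholds)."""
    if count >= 5:
        return "critical"
    if count >= 3:
        return "high"
    if count >= 2:
        return "medium"
    return "low"


def record_incident(path: Path, category: str, complaint: str) -> str:
    """Append the incident to the JSON file and return the updated severity."""
    incidents = load_incidents(path)
    incidents.append({"category": category, "complaint": complaint})
    path.write_text(json.dumps(incidents, indent=2), encoding="utf-8")
    count = sum(1 for item in incidents if item["category"] == category)
    return severity_for(count)
```

Each call to `record_incident` rewrites the whole file, which is fine for a local demo but would need a real datastore once many agents write concurrently.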

Finally, we learned that safety matters. BugBack can recommend actions like rolling back a feature flag, but business-critical actions should require human approval.
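
A guardrail of this kind can be sketched as a small policy check that runs before any recommended action is executed. The action names and the `approved_by` parameter below are hypothetical; the point is only that risky actions are blocked until a named human signs off.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of actions considered business-critical.
RISKY_ACTIONS = {"rollback_feature_flag", "issue_refund", "disable_checkout"}


@dataclass(frozen=True)
class Decision:
    action: str
    allowed: bool
    reason: str


def evaluate_action(action: str, approved_by: Optional[str] = None) -> Decision:
    """Allow low-risk actions; require a named approver for risky ones."""
    if action not in RISKY_ACTIONS:
        return Decision(action, True, "low-risk action, no approval needed")
    if approved_by:
        return Decision(action, True, f"approved by {approved_by}")
    return Decision(action, False, "requires human approval")
```

For example, `evaluate_action("rollback_feature_flag")` is blocked, while `evaluate_action("rollback_feature_flag", approved_by="on-call engineer")` is allowed. Returning the reason alongside the verdict gives the dashboard something to display in its safety policy section.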

What's next for BugBack AI

Next, we would connect BugBack to real company tools:

  • Zendesk or Intercom for customer tickets
  • Datadog or Splunk for logs
  • LaunchDarkly for feature flags
  • Jira or GitHub Issues for engineering tickets
  • Stripe for payment investigations
  • Slack for engineering alerts

We would also expand BugBack to handle more types of issues, including login failures, subscription problems, shipping errors, account bugs, and API outages.

Long term, BugBack could become an early-warning system for product failures. Instead of treating every complaint as a one-off support ticket, companies could use BugBack to detect repeated issues, verify bugs faster, and turn customer pain into engineering action.

Built With

  • flask
  • html/css
  • javascript
  • json
  • local-persistent-memory
  • mock-feature-flags
  • mock-logs
  • nvidia-nemotron
  • openrouter/nvidia-api
  • playwright
  • python
  • streamlit