Inspiration

We've all been there: pushing a "harmless" one-line CSS change that accidentally misaligns the entire signup page. Traditional CI stays green because the tests technically pass, and human reviewers gloss over .css diffs because they can't visualize the outcome. "LGTM" isn't enough for a user interface.

We asked ourselves: What if you had a QA engineer who reviewed every single PR in seconds, actually clicked through the app, and sent you a video of what changed? That's Aura.

What it does

Aura is an autonomous GitHub Agent that lives in your repository. When a developer opens a Pull Request:

  1. It Understands: Aura reads the code diff and uses Google Gemini 1.5 Flash to understand the intent of the change (e.g., "The user moved the login button to the top right").
  2. It Plans: It dynamically generates a Playwright automation script tailored to verify that specific user flow.
  3. It Verifies: It spins up a sandbox, executes the plan against the live code, and records the session.
  4. It Reports: Aura comments on the PR with a one-line summary of the visual change and attaches a video walkthrough of the interaction.
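
The four steps above can be read as a small pipeline. The sketch below is illustrative only: the names `Stages`, `reviewPullRequest`, `PullRequestRef`, and `ReviewResult` are hypothetical and not taken from Aura's code, and reusing the inferred intent as the one-line summary is an assumption. Each stage is passed in as a function so the flow can be exercised without Gemini, Playwright, or GitHub.

```typescript
// Hypothetical shapes for illustration; none of these names come from Aura.
interface PullRequestRef {
  owner: string;
  repo: string;
  number: number;
}

interface ReviewResult {
  summary: string;  // one-line description of the visual change
  videoUrl: string; // link to the recorded session
}

// One function per step in the list above, injected by the caller.
interface Stages {
  understand(diff: string): Promise<string>; // diff -> intent of the change
  plan(intent: string): Promise<string>;     // intent -> Playwright script source
  verify(script: string): Promise<string>;   // script -> URL of the recording
  report(pr: PullRequestRef, result: ReviewResult): Promise<void>;
}

async function reviewPullRequest(
  pr: PullRequestRef,
  diff: string,
  stages: Stages
): Promise<ReviewResult> {
  const intent = await stages.understand(diff);
  const script = await stages.plan(intent);
  const videoUrl = await stages.verify(script);
  // Assumption: the inferred intent doubles as the one-line summary.
  const result: ReviewResult = { summary: intent, videoUrl };
  await stages.report(pr, result);
  return result;
}
```

Keeping the stages behind an interface also makes it possible to swap the model or the sandbox without touching the orchestration.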

No more "pulling the branch to check locally." Just effective, visual confirmation right in your PR.

How we built it

Aura is built on a modern, agentic stack:

  • Backend: Node.js & TypeScript for the core agent logic.
  • Brain: Google Gemini 1.5 Flash. We chose this for its 1M-token context window, which lets us feed it the relevant part of the file tree (fetched via the GitHub API) so it understands the application structure instead of guessing selectors. Its low latency matters for giving fast feedback on a PR.
  • Eyes & Hands: Playwright for high-fidelity browser automation and video capture.
  • Integration: GitHub Apps & Webhooks (Octokit) for a seamless, zero-config developer experience.
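
On the integration side, a webhook handler first has to decide which deliveries deserve a review at all. The sketch below is an assumption about how that filter might look, not Aura's actual logic: `shouldReview` and `REVIEW_ACTIONS` are hypothetical names, and reviewing on `reopened` and `synchronize` (new commits pushed) or skipping drafts are illustrative policy choices. The `action`, `pull_request.number`, and `pull_request.draft` fields are part of GitHub's `pull_request` webhook payload, and the event name arrives in the `X-GitHub-Event` header.

```typescript
// Minimal slice of GitHub's `pull_request` webhook payload: only the fields used here.
interface PullRequestEvent {
  action: string;
  pull_request: { number: number; draft: boolean };
}

// Assumption: review when a PR is opened, reopened, or receives new commits.
const REVIEW_ACTIONS = ["opened", "reopened", "synchronize"];

// `eventName` is the value of the X-GitHub-Event header on the delivery.
function shouldReview(eventName: string, payload: PullRequestEvent): boolean {
  if (eventName !== "pull_request") return false;
  if (payload.pull_request.draft) return false; // assumption: skip draft PRs
  return REVIEW_ACTIONS.indexOf(payload.action) !== -1;
}
```

Filtering early keeps sandbox runs and model calls limited to events where the rendered UI could actually have changed.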

Challenges we ran into

  • The "Remote Context" Problem: We didn't want to clone huge repositories for every small check. We had to engineer a smart context fetcher that recursively maps the remote file tree via the GitHub API and surgically retrieves only the components related to the PR diff.
  • Selector Hallucination: LLMs often guess a selector such as id="login-btn" that doesn't exist. We solved this by injecting the actual source code of the UI components into Gemini's context, allowing it to "see" the real data-testid values and other attributes before writing the test.
  • Video Delivery: Automating the pipeline from "Headless Browser" -> "Video File" -> "Public Link in PR Comment" required orchestrating ephemeral storage and secure link embedding.
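
To make the first two challenges concrete, here is a minimal sketch under stated assumptions. `selectContextPaths` narrows a remote tree listing to UI files that sit next to the changed files, and `extractTestIds` collects the literal `data-testid` values found in a component's source. Both function names, the extension list, the same-directory heuristic, and the file limit are hypothetical; the write-up only says that Aura maps the tree via the GitHub API and injects component source into the prompt. The `path` and `type` fields mirror what GitHub's Git Trees API returns for each entry.

```typescript
// One entry of a recursive tree listing ("blob" = file, "tree" = directory).
interface TreeEntry {
  path: string;
  type: "blob" | "tree";
}

// Assumption: these extensions are what counts as "UI" in the target repo.
const UI_EXTENSIONS = [".tsx", ".jsx", ".css", ".html"];

function dirname(path: string): string {
  const i = path.lastIndexOf("/");
  return i === -1 ? "" : path.slice(0, i);
}

function hasUiExtension(path: string): boolean {
  return UI_EXTENSIONS.some((ext) => path.slice(-ext.length) === ext);
}

// Heuristic (assumed): keep UI files in the same directory as a changed file,
// capped at `limit` so the prompt stays bounded.
function selectContextPaths(tree: TreeEntry[], changed: string[], limit = 20): string[] {
  const dirs = changed.map(dirname);
  return tree
    .filter((e) => e.type === "blob" && hasUiExtension(e.path))
    .filter((e) => dirs.indexOf(dirname(e.path)) !== -1)
    .map((e) => e.path)
    .slice(0, limit);
}

// Collect unique literal data-testid values, in order of first appearance.
function extractTestIds(source: string): string[] {
  const pattern = /data-testid=["']([^"']+)["']/g;
  const ids: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const id = match[1];
    if (id && ids.indexOf(id) === -1) ids.push(id);
  }
  return ids;
}
```

A list like the one `extractTestIds` returns could also be used after generation, to reject a script that references a test id absent from the fetched source. That check is a suggestion, not something the write-up describes.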

Accomplishments that we're proud of

  • True Autonomy: We moved beyond "chatbots" to an "agent" that executes code. Aura writes TypeScript, runs it, and reports back.
  • Zero-Config: Most testing tools require heavy setup (YAML files, test suites). Aura works immediately upon installation by inferring behavior from code.
  • The "One-Link" Experience: Refining the output to be just a helpful summary and a video, effectively replacing the need for a manual QA pass for minor visual tweaks.

What we learned

  • Context is Key: The difference between a flaky agent and a solid one is how much truth (source code) you can fit in the prompt.
  • Agents need Tools: Giving the LLM direct control over a browser (Playwright) unlocked capabilities that text-generation alone could never achieve.

What's next for Aura

  • Visual Regression: Using Gemini Pro Vision to compare the "Before" vs "After" video frames and automatically flag unexpected visual shifts.
  • Self-Healing Tests: If Aura detects a UI change that was intentional, it can automatically update the repo's regression test suite.
  • Cloud Hosting: Scaling the agent infrastructure to handle concurrent PRs across thousands of repositories.
