Project Story: Self-Correcting Hardware Agent (SCHA)

💡 Inspiration

The inspiration for Self-Correcting Hardware Agent (SCHA) stems from the "Right to Repair" movement. Modern electronics have become "black boxes" that are difficult for average users to fix, leading to massive e-waste. While generative AI can provide generic advice, it often fails at spatial reasoning and technical precision—crucial elements when guiding a human through a complex circuit board or a fragile screen assembly.

We realized that for AI to be a reliable repair partner, it must do more than just "imagine" a solution; it must visually verify its own logic against physical constraints.

🛠️ How We Built It

SCHA is built as an autonomous, multi-turn agent using the Google Gen AI SDK.

  1. Thinking & Planning: The agent first uses Gemini 2.5 Flash to analyze a user-uploaded photo. It generates a structured Repair Graph in JSON, identifying specific layers and technical components.
  2. Multimodal Generation: We utilize Imagen 4.0 to transform the logical plan into a high-fidelity, isometric exploded view.
  3. The Vision Audit Loop: This is our core innovation. As seen in our generate_with_correction method, the agent does not trust its first attempt. It performs a "Vision Audit" where Gemini 2.5 Flash inspects the generated image for duplicate labels, missing parts, or ordering errors. If an error is found, it provides specific corrective feedback (e.g., "Numbers 2 and 3 are duplicated") and triggers a re-generation.

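As an illustration of step 1's output, a Repair Graph for a phone screen assembly might look like the sketch below. The field names here are hypothetical stand-ins, not the project's actual schema; the validation helper shows the kind of ordering and uniqueness checks the later audit relies on.

```python
# Hypothetical Repair Graph as produced by the planning step.
# Field names are illustrative assumptions, not the project's real schema.
repair_graph = {
    "device": "smartphone",
    "layers": [
        {"order": 1, "label": "Back cover", "tool": "suction cup"},
        {"order": 2, "label": "Battery connector", "tool": "spudger"},
        {"order": 3, "label": "Display assembly", "tool": "Phillips #00"},
    ],
}

def validate_plan(plan: dict) -> bool:
    """Check that layer ordering is a clean 1..N sequence with unique labels."""
    orders = [layer["order"] for layer in plan["layers"]]
    labels = [layer["label"] for layer in plan["layers"]]
    return orders == list(range(1, len(orders) + 1)) and len(set(labels)) == len(labels)
```

A plan that passes this structural check is what the Vision Audit later cross-references against the generated image.
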
🚀 Challenges We Faced

The primary challenge was ensuring Visual Logical Consistency. Generative models often "hallucinate" numbers or scramble sequences in technical diagrams.

We solved this by implementing a Closed-Loop Audit System. By forcing the Auditor (Gemini) to cross-reference the pixel data of the generated image against the initial JSON schema, we created a bridge between symbolic logic and visual representation. The success condition is defined as:

Audit(I, P) = pass

where I represents the visual output and P represents the logical plan. If this condition is not met, the agent persists through a defined max_retries cycle until a technically sound guide is produced.
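The write-up references a generate_with_correction method; a minimal sketch of that closed loop is shown below. Here generate_image and audit_image are hypothetical stand-ins for the Imagen generation call and the Gemini vision audit, injected as parameters so the control flow is self-contained.

```python
def generate_with_correction(plan, generate_image, audit_image, max_retries=3):
    """Closed-loop audit sketch (assumed shape, not the project's exact code).

    generate_image(plan, feedback) -> image: stand-in for the Imagen call.
    audit_image(image, plan) -> list[str]: stand-in for the Gemini vision audit;
    an empty defect list means the image is consistent with the plan.
    """
    feedback = None
    for attempt in range(1, max_retries + 1):
        image = generate_image(plan, feedback)
        defects = audit_image(image, plan)
        if not defects:
            return image, attempt  # success condition met: Audit(I, P) = pass
        # Feed the auditor's specific defects back into the next generation.
        feedback = "; ".join(defects)
    raise RuntimeError(f"No consistent image after {max_retries} attempts: {feedback}")
```

The key design choice is that the auditor returns concrete, actionable defects (e.g. "Numbers 2 and 3 are duplicated") rather than a bare pass/fail, so each retry prompt is strictly more informed than the last.
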

🧠 What We Learned

We discovered that Reasoning is the bridge between pixels and reality. Multi-turn multimodal interactions—where the AI looks, thinks, acts, observes the result, and corrects—dramatically increase the reliability of AI agents in high-stakes physical domains.

🔮 Future Work: The "PaperBanana" Evolution

We aim to scale SCHA from a standalone tool into a professional-grade engineering platform by adopting key methodologies from the PaperBanana (arXiv:2601.23265, Jan 2026) framework:

1. Reference-Driven Multi-Agent Collaboration

Following PaperBanana’s architecture, we plan to decompose the repair task into specialized roles:

  • Retriever Agent: To identify high-quality reference examples from engineering databases, providing the generator with structural and stylistic guidance.
  • Specialized Stylist & Planner: To ensure that every repair guide adheres to professional engineering norms and standardized visual clarity.
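The planned role decomposition could be wired together as a simple pipeline of callables; the sketch below is an assumed structure for this future work, with each role reduced to a function signature.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RepairPipeline:
    """Hypothetical PaperBanana-style role split (sketch of planned work)."""
    retriever: Callable[[str], list[str]]       # photo/query -> reference examples
    planner: Callable[[str, list[str]], dict]   # photo + references -> repair graph
    stylist: Callable[[dict], dict]             # enforce engineering style norms

    def run(self, photo: str) -> dict:
        refs = self.retriever(photo)            # gather structural/stylistic guidance
        plan = self.planner(photo, refs)        # plan grounded in the references
        return self.stylist(plan)               # normalize to professional conventions
```

Keeping each role behind a plain function signature would let the team swap a heuristic retriever for a real engineering-database search without touching the rest of the agent.
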

2. Benchmarking with "Repair-Banana-Bench"

Inspired by the PaperBananaBench, we propose the "Repair-Banana-Bench" to objectively measure AI-generated technical documentation.

  • VLM-as-a-Judge: Using Gemini 3 Pro as an automated judge to score repair guides on Fidelity (technical accuracy), Conciseness (focus on essentials), Readability (layout clarity), and Aesthetics (engineering standards).
  • This benchmark will help transition SCHA from "plausible" images to "publication-quality" engineering blueprints.

🛠️ Built With

  • Core Model: Gemini 2.5 Flash (Analysis & Audit)
  • Image Generation: Imagen 4.0
  • Language: Python 3.10+
  • Libraries: google-genai, Pillow (PIL), IPython
