Project Story: Self-Correcting Hardware Agent (SCHA)
💡 Inspiration
The inspiration for Self-Correcting Hardware Agent (SCHA) stems from the "Right to Repair" movement. Modern electronics have become "black boxes" that are difficult for average users to fix, leading to massive e-waste. While generative AI can provide generic advice, it often fails at spatial reasoning and technical precision—crucial elements when guiding a human through a complex circuit board or a fragile screen assembly.
We realized that for AI to be a reliable repair partner, it must do more than just "imagine" a solution; it must visually verify its own logic against physical constraints.
🛠️ How We Built It
SCHA is built as an autonomous, multi-turn agent using the Google Gen AI SDK.
- Thinking & Planning: The agent first uses Gemini 2.5 Flash to analyze a user-uploaded photo. It generates a structured "Repair Graph" in JSON, identifying specific layers and technical components.
- Multimodal Generation: We utilize Imagen 4.0 to transform the logical plan into a high-fidelity, isometric exploded view.
- The Vision Audit Loop: This is our core innovation. As seen in our generate_with_correction method, the agent does not trust its first attempt. It performs a "Vision Audit" in which Gemini 2.5 Flash inspects the generated image for duplicate labels, missing parts, or ordering errors. If an error is found, it provides specific corrective feedback (e.g., "Numbers 2 and 3 are duplicated") and triggers a re-generation.
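The control flow of this loop can be sketched as follows. The name generate_with_correction comes from our codebase, but the callable-injection structure, the AuditResult type, and the default of three retries shown here are illustrative assumptions rather than our exact implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AuditResult:
    """Outcome of a Vision Audit pass (hypothetical helper type)."""
    passed: bool
    feedback: str = ""


def generate_with_correction(
    generate: Callable[[str], bytes],       # e.g. wraps an Imagen 4.0 call
    audit: Callable[[bytes], AuditResult],  # e.g. wraps a Gemini 2.5 Flash call
    prompt: str,
    max_retries: int = 3,
) -> Optional[bytes]:
    """Generate an image, audit it, and retry with corrective feedback."""
    current_prompt = prompt
    for _ in range(max_retries):
        image = generate(current_prompt)
        result = audit(image)
        if result.passed:
            return image
        # Fold the auditor's feedback (e.g. "Numbers 2 and 3 are
        # duplicated") back into the next generation attempt.
        current_prompt = f"{prompt}\nFix the following issue: {result.feedback}"
    return None  # no technically sound image within the retry budget
```

Injecting the generator and auditor as callables keeps the retry logic testable without live API calls.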
🚀 Challenges We Faced
The primary challenge was ensuring Visual Logical Consistency. Generative models often "hallucinate" numbers or scramble sequences in technical diagrams.
We solved this by implementing a Closed-Loop Audit System. By forcing the Auditor (Gemini) to cross-reference the pixel-data of the generated image against the initial JSON schema, we created a bridge between symbolic logic and visual representation. The success condition is defined as:

Audit(V, P) = passed

where V represents the visual output and P represents the logical plan. If this condition is not met, the agent persists through a defined max_retries cycle until a technically sound guide is produced.
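The audit call itself can be sketched as below, assuming the google-genai Python SDK. The prompt wording and the JSON reply schema are illustrative assumptions, not our exact code; the SDK import is deferred so the prompt helper also works standalone:

```python
import json


def build_audit_prompt(plan: dict) -> str:
    """Prompt asking the auditor model to cross-reference image vs. plan."""
    return (
        "You are auditing a technical exploded-view diagram.\n"
        f"Expected plan (JSON): {json.dumps(plan)}\n"
        "Check for duplicate labels, missing parts, or ordering errors. "
        'Reply with JSON only: {"passed": bool, "feedback": str}.'
    )


def vision_audit(client, image_bytes: bytes, plan: dict) -> dict:
    """Ask Gemini 2.5 Flash to audit the rendered image against the JSON plan."""
    # Imported lazily so build_audit_prompt works without the SDK installed.
    from google.genai import types

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
            build_audit_prompt(plan),
        ],
    )
    return json.loads(response.text)
```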
🧠 What We Learned
We discovered that Reasoning is the bridge between pixels and reality. Multi-turn multimodal interactions—where the AI looks, thinks, acts, observes the result, and corrects—dramatically increase the reliability of AI agents in high-stakes physical domains.
🔮 Future Work: The "PaperBanana" Evolution
We aim to scale SCHA from a standalone tool into a professional-grade engineering platform by adopting key methodologies from the PaperBanana (arXiv:2601.23265, Jan 2026) framework:
1. Reference-Driven Multi-Agent Collaboration
Following PaperBanana’s architecture, we plan to decompose the repair task into specialized roles:
- Retriever Agent: To identify high-quality reference examples from engineering databases, providing the generator with structural and stylistic guidance.
- Specialized Stylist & Planner: To ensure that every repair guide adheres to professional engineering norms and standardized visual clarity.
2. Benchmarking with "Repair-Banana-Bench"
Inspired by the PaperBananaBench, we propose the "Repair-Banana-Bench" to objectively measure AI-generated technical documentation.
- VLM-as-a-Judge: Using Gemini 3 Pro as an automated judge to score repair guides on Fidelity (technical accuracy), Conciseness (focus on essentials), Readability (layout clarity), and Aesthetics (engineering standards).
- This benchmark will help transition SCHA from "plausible" images to "publication-quality" engineering blueprints.
🛠️ Built With
- Core Model: Gemini 2.5 Flash (Analysis & Audit)
- Image Generation: Imagen 4.0
- Language: Python 3.10+
- Libraries: google-genai, PIL, IPython