Transforming semiconductor quality control using Amazon Nova's multimodal vision and agentic AI

What inspired us?

The semiconductor industry is the backbone of the modern technological world. In high-volume fabs, a single unnoticed wafer defect can cascade into hundreds of thousands of dollars in lost yield. Traditional automated optical inspection (AOI) systems exist, but they are purely reactive: they flag a defect and stop there. We realized that floor engineers don't just need a bounding box; they need an intelligent assistant. We were inspired to build NovaFab to bridge the gap between passive computer vision and proactive, generative AI-driven manufacturing, and to see whether Amazon's multimodal models could not only see an edge crack but also explain the physics behind it.

How we built it?

We engineered a multi-stage pipeline combining classic computer vision with cutting-edge foundation models.

Preprocessing: Raw wafer images undergo Gaussian noise filtering and contrast normalization using OpenCV to isolate the wafer geometry from the background.

```python
import cv2

# raw_wafer_image: grayscale wafer capture loaded upstream.
# Apply Gaussian blur to remove high-frequency noise, then
# stretch the contrast to the full 0-255 range.
blurred = cv2.GaussianBlur(raw_wafer_image, (5, 5), 0)
processed_wafer = cv2.normalize(blurred, None, 0, 255, cv2.NORM_MINMAX)
```

Multimodal Vision: Preprocessed images are fed into Amazon Nova 2 Omni, which acts as our primary visual inspector to identify complex defect signatures.

Visual Explainability: To ground the generative AI in spatial reality, we extract feature maps to create a localization heatmap. The neuron importance weights $\alpha_k^c$ for a feature map $A^k$ and class $c$ are computed via global average pooling of gradients:

$$\alpha_k^c=\frac{1}{Z}\sum_{i}\sum_{j}\frac{\partial Y^c}{\partial A_{ij}^k}$$

where $Z$ is the number of pixels in the feature map, $Y^c$ is the pre-softmax score for class $c$, and $A_{ij}^k$ is the activation at pixel $(i, j)$ of feature map $k$. The final heatmap is generated by applying a ReLU activation to the linear combination of the forward activation maps:

$$L^c_{Grad-CAM}=ReLU\left(\sum_k\alpha_k^cA^k\right)$$

Generative Diagnostics: The raw image and the mathematically derived $L^c_{Grad-CAM}$ heatmap coordinates are passed to Amazon Nova Pro to generate a human-readable engineering report.

Agentic Action: Amazon Nova Act parses the diagnostic report and searches simulated machine calibration logs to automatically format a maintenance ticket.

Challenges we faced!

Grounding Multimodal Hallucinations: Early iterations saw the generative model hallucinating technical root causes that didn't match the actual visual defect. We overcame this with strict prompt engineering, forcing Nova Pro to constrain its reasoning exclusively to the regions highlighted by the Grad-CAM heatmap.

Orchestrating Agentic Workflows: Connecting Nova Act to simulated factory logs required highly structured tool calling and strict JSON schema adherence.

What we learned?

We learned that the true power of generative AI in industrial settings lies in orchestration. Moving from a single model predicting a class label to an ecosystem of specialized models completely changes what an application can achieve.
We also deepened our understanding of integrating mathematical explainability with LLM prompt design. For more details on the models we utilized, you can read the Amazon Nova Documentation.
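The Grad-CAM weighting and heatmap equations above can be sketched directly in NumPy, assuming the per-pixel gradients and forward activation maps have already been extracted from the vision backbone (the array names and random data here are illustrative stand-ins):

```python
import numpy as np

# Hypothetical extracted tensors: K feature maps of size H x W.
# grads[k, i, j]   stands in for dY^c / dA^k_ij (class-score gradients)
# activations[k]   stands in for the forward activation maps A^k
K, H, W = 4, 8, 8
rng = np.random.default_rng(0)
grads = rng.standard_normal((K, H, W))
activations = rng.standard_normal((K, H, W))

# alpha_k^c: global average pooling of the gradients (Z = H * W)
alpha = grads.mean(axis=(1, 2))  # shape (K,)

# L^c = ReLU(sum_k alpha_k^c * A^k): weighted sum, negatives clipped
heatmap = np.maximum(np.einsum("k,kij->ij", alpha, activations), 0.0)
```

The ReLU matters: only features that positively support class $c$ should light up in the localization map.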
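Constraining Nova Pro's reasoning to the Grad-CAM region came down to prompt construction. One simple approach is to binarize the heatmap, take the bounding box of the hot region, and embed those coordinates in the prompt; this is a minimal sketch, and the threshold and prompt wording are our assumptions rather than the exact production prompt:

```python
import numpy as np

def heatmap_to_prompt(heatmap: np.ndarray, threshold: float = 0.5) -> str:
    """Turn a Grad-CAM heatmap into a region-constrained prompt fragment."""
    norm = heatmap / (heatmap.max() + 1e-8)   # scale to [0, 1]
    ys, xs = np.nonzero(norm >= threshold)    # pixels above threshold
    x0, y0, x1, y1 = xs.min(), ys.min(), xs.max(), ys.max()
    return (
        f"Analyze ONLY the region bounded by (x={x0}, y={y0}) "
        f"to (x={x1}, y={y1}). Describe the defect and its likely "
        "physical root cause. Do not reason about areas outside "
        "this region."
    )

# Toy heatmap with a hot 3x3 patch
hm = np.zeros((8, 8))
hm[2:5, 3:6] = 1.0
print(heatmap_to_prompt(hm))
```

Keeping the spatial constraint explicit in the text is what let us audit whether the model's explanation actually referred to the highlighted defect.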
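On the agentic side, strict JSON schema adherence meant the ticket formatter had to reject malformed agent output rather than emit a partial ticket. The sketch below shows the shape of that idea; the field names, tool IDs, and priority rule are illustrative, not NovaFab's actual schema:

```python
import json

def format_maintenance_ticket(report: dict) -> str:
    """Map a parsed diagnostic report onto a strict ticket schema.

    Raises KeyError if a required field is missing, so malformed
    agent output fails fast instead of producing a partial ticket.
    """
    ticket = {
        "ticket_type": "preventive_maintenance",
        "tool_id": report["tool_id"],
        "defect_class": report["defect_class"],
        "root_cause": report["root_cause"],
        "priority": "high" if report["defect_class"] == "edge_crack" else "normal",
    }
    return json.dumps(ticket, indent=2)

# Example report as Nova Act might parse it from the diagnostics
example_report = {
    "tool_id": "litho-03",
    "defect_class": "edge_crack",
    "root_cause": "chuck vacuum pressure drift",
}
print(format_maintenance_ticket(example_report))
```

Failing fast on missing keys proved far easier to debug than letting loosely structured agent output propagate downstream.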
