Gradio App

FabMi: Exploring LLM Fine-Tuning for Semiconductor Defect Analysis

Inspiration

A Gap Between Detection and Understanding

In semiconductor manufacturing, AI-powered defect detection has matured significantly: vision systems can classify defect patterns with high accuracy. The reasoning step that follows, working out why a defect occurred and how to fix it, remains largely manual.

This root cause analysis (RCA) typically requires experienced process engineers who can:

- Correlate subtle patterns: "Wafers 3, 5, 7, 9 affected → likely FOUP slot contamination"
- Apply domain knowledge: "RF-hours approaching 3,000 → check O-ring condition"
- Recommend specific actions: "Inspect part P/N 839-0234, verify leak rate"

With the semiconductor industry growing toward $1 trillion and experienced engineers becoming scarce, I was curious: Could a small, fine-tuned LLM learn to assist with this reasoning task?

This project was my exploration of that question.

What I Built

FabMi is a fine-tuned ERNIE-4.5-0.3B model that generates structured root cause analysis reports for semiconductor wafer defects. Given a defect observation and process context, it produces outputs like:

## Root Cause Analysis
**Primary Cause**: O-ring seal degradation in etch chamber
- RF-hours (2,847) approaching 3,000-hour limit
- Edge-ring pattern suggests gas leakage at chamber seal

## Corrective Actions
1. **Immediate**: Tool down ETCH-LAM-07
2. **Inspect**: Main chamber O-ring for degradation
3. **Replace**: O-ring P/N 839-0234 (Viton, ID 350mm)
4. **Verify**: Leak rate < 1 mTorr/min after replacement

**Severity**: Critical | **Yield Impact**: 15%

Results

I compared the fine-tuned 0.3B model against a zero-shot 21B baseline on a held-out test set:

| Metric | Baseline (21B zero-shot) | FabMi (0.3B fine-tuned) |
|---|---|---|
| ROUGE-L | 0.063 | 0.446 |
| ROUGE-1 | 0.101 | 0.561 |
| ROUGE-2 | 0.028 | 0.282 |
| Severity Accuracy | 20.0% | 60.0% |
| Structure Score | 35.5% | 88.0% |

The results suggest that domain-specific fine-tuning lets a small model outperform a much larger zero-shot model on this specialized task: ROUGE-L rose from 0.063 to 0.446, severity accuracy from 20.0% to 60.0%, and the structure score from 35.5% to 88.0%.
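To make the size of the gap concrete, the sketch below computes the absolute gain and the fine-tuned/baseline ratio for each metric. The numbers are copied from the results table; the `improvement` helper is a name introduced here for illustration only.

```python
# Metric values from the results table: (baseline 21B zero-shot, FabMi 0.3B fine-tuned).
# Percentages are written as fractions.
results = {
    "ROUGE-L": (0.063, 0.446),
    "ROUGE-1": (0.101, 0.561),
    "ROUGE-2": (0.028, 0.282),
    "Severity Accuracy": (0.200, 0.600),
    "Structure Score": (0.355, 0.880),
}


def improvement(baseline, tuned):
    """Return (absolute gain, ratio of fine-tuned score to baseline score)."""
    return tuned - baseline, tuned / baseline


if __name__ == "__main__":
    for name, (base, tuned) in results.items():
        gain, ratio = improvement(base, tuned)
        print(f"{name:18s} gain={gain:+.3f}  ratio={ratio:.1f}x")
```

On these figures, ROUGE-L improves by roughly a factor of seven, while severity accuracy triples.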

How I Built It

1. Synthetic Data Generation

Real semiconductor RCA data is proprietary and hard to obtain. I created synthetic training data using multiple LLMs (Claude, Gemini, GPT, Grok) to increase diversity, then combined the per-source sets into one dataset:

$$\mathcal{D} = \bigcup_{i \in \text{sources}} \mathcal{D}_i$$
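The union above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: it assumes each source yields a list of Alpaca-style dicts, drops exact duplicates, and tags each sample with a hypothetical `source` field.

```python
def merge_sources(sources):
    """Union samples from several generators, dropping exact duplicates.

    `sources` maps a source name to a list of dicts with the keys
    "instruction", "input", and "output". The added "source" key is an
    assumption made for this sketch.
    """
    merged, seen = [], set()
    for name, samples in sources.items():
        for sample in samples:
            key = (sample["instruction"], sample["input"], sample["output"])
            if key in seen:
                continue  # same sample already contributed by another source
            seen.add(key)
            merged.append({**sample, "source": name})
    return merged


if __name__ == "__main__":
    toy = {
        "source_a": [{"instruction": "i", "input": "edge-ring", "output": "seal"}],
        "source_b": [
            {"instruction": "i", "input": "edge-ring", "output": "seal"},  # duplicate
            {"instruction": "i", "input": "scratch", "output": "handling"},
        ],
    }
    print(len(merge_sources(toy)))  # 2
```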

I generated 1,511 samples across 9 defect types:

| Defect Pattern | Samples | Example Root Cause |
|---|---|---|
| Edge-Ring | 242 | Seal degradation |
| Donut | 186 | Edge ring wear |
| Center | 180 | Showerhead issues |
| Scratch | 174 | Mechanical damage |
| Edge-LOC | 162 | Chuck contamination |
| Random | 158 | Particle sources |
| LOC | 152 | Localized contamination |
| Near-Full | 147 | Process failures |
| None | 110 | Normal baseline |

Each sample captures domain-specific reasoning patterns, such as:

Input: "Wafers 3,5,7,9 affected, slots 1,2,4,6,8 clean"
Reasoning: Odd-slot pattern → FOUP-side contamination
           (not robot arm, which would show sequential pattern)
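The distinction in this example, a regularly spaced set of slots versus a consecutive run, can be expressed as a toy rule. The function below is a hypothetical illustration of that pattern check only; it is not part of FabMi, which learns such reasoning from the training text rather than from hand-written rules.

```python
def classify_slot_pattern(affected_slots):
    """Label affected slot numbers as 'sequential', 'periodic', or 'irregular'."""
    slots = sorted(set(affected_slots))
    if len(slots) < 2:
        return "irregular"  # too few slots to see any spacing pattern
    strides = {b - a for a, b in zip(slots, slots[1:])}
    if strides == {1}:
        return "sequential"  # contiguous run of slots
    if len(strides) == 1:
        return "periodic"  # constant stride > 1, e.g. every other slot
    return "irregular"


if __name__ == "__main__":
    print(classify_slot_pattern([3, 5, 7, 9]))  # periodic
    print(classify_slot_pattern([4, 5, 6, 7]))  # sequential
```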

2. Fine-Tuning with LoRA

I used LLaMA-Factory to fine-tune ERNIE-4.5-0.3B-PT with Low-Rank Adaptation (LoRA):

$$W' = W_0 + BA$$

where $W_0 \in \mathbb{R}^{d \times k}$ is the frozen pretrained weight and only $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are trained, with rank $r = 16$.
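A small pure-Python sketch of the update may help. It applies $W' = W_0 + BA$ to a tiny matrix and compares trainable parameter counts for a full update ($d \times k$) against LoRA ($r(d + k)$). The dimensions $d = k = 1024$ used in the count are illustrative assumptions, not confirmed values for ERNIE-4.5-0.3B.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_merge(W0, B, A):
    """Return W' = W0 + B @ A, the merged weight after LoRA training."""
    BA = matmul(B, A)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W0, BA)]


def lora_param_counts(d, k, r):
    """Trainable parameters: full update (d*k) versus LoRA (r*(d+k))."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    # Tiny example with d = 2, k = 3, r = 1.
    W0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    B = [[0.5], [-1.0]]    # d x r
    A = [[2.0, 0.0, 4.0]]  # r x k
    print(lora_merge(W0, B, A))  # [[2.0, 0.0, 2.0], [-2.0, 1.0, -4.0]]

    # Illustrative dimensions only (assumed, not taken from the model config).
    full, lora = lora_param_counts(d=1024, k=1024, r=16)
    print(full, lora, full // lora)  # 1048576 32768 32
```

Under these assumed dimensions, a rank-16 adapter trains about 1/32 of the parameters of the matrix it adapts, which is why the saved adapter stays small.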

Training setup:

- Base model: baidu/ERNIE-4.5-0.3B-PT
- LoRA config: rank=16, alpha=32
- Learning rate: $5 \times 10^{-5}$ with cosine decay
- Training: 10 epochs
- Data split: 90/10 train/test (1,356 / 155 samples)
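For readers unfamiliar with cosine decay, the schedule can be written as $\eta_t = \eta_{\min} + \tfrac{1}{2}(\eta_0 - \eta_{\min})(1 + \cos(\pi t / T))$. The sketch below evaluates it with the base rate from the setup above. It assumes no warmup and a floor of zero, and `total_steps` is a hypothetical value, since the step count depends on a batch size not stated here.

```python
import math


def cosine_lr(step, total_steps, base_lr=5e-5, min_lr=0.0):
    """Cosine-decayed learning rate at `step`, for 0 <= step <= total_steps."""
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 1000  # hypothetical; depends on batch size and epochs
    for step in (0, 250, 500, 750, 1000):
        print(step, f"{cosine_lr(step, total_steps):.2e}")
```

The rate starts at the base value, passes through half of it at the midpoint, and reaches the floor at the final step.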

The resulting LoRA adapter is only 24 MB, which makes it easy to share and deploy.

3. Evaluation Approach

I evaluated using multiple metrics to capture different aspects of output quality:

- ROUGE scores: Measure textual overlap with reference outputs
- Severity accuracy: Whether the model correctly classifies Critical/Major/Minor
- Structure score: Presence of expected sections (Root Cause, Corrective Actions, etc.)

$$S_{\text{structure}} = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}[\text{section}_i \in \text{output}]$$
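One way to compute this score is a simple substring check per expected section. The sketch below is an assumption about how such a check could look, not the project's evaluation code: the section names in `EXPECTED_SECTIONS` and the severity regex are inferred from the sample report format shown earlier.

```python
import re

# Assumed section markers, based on the sample report format.
EXPECTED_SECTIONS = ("Root Cause Analysis", "Corrective Actions", "Severity")


def structure_score(output, sections=EXPECTED_SECTIONS):
    """Fraction of expected section names that appear in `output`."""
    text = output.lower()
    return sum(section.lower() in text for section in sections) / len(sections)


def extract_severity(output):
    """Return the label following '**Severity**:' or None if it is missing."""
    match = re.search(r"\*\*Severity\*\*:\s*(Critical|Major|Minor)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    report = (
        "## Root Cause Analysis\n"
        "**Primary Cause**: O-ring seal degradation in etch chamber\n"
        "## Corrective Actions\n"
        "1. **Immediate**: Tool down ETCH-LAM-07\n"
        "**Severity**: Critical | **Yield Impact**: 15%\n"
    )
    print(structure_score(report))   # 1.0
    print(extract_severity(report))  # Critical
```

Averaging `structure_score` over all test outputs would give a dataset-level figure comparable in spirit to the one reported.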

Challenges Faced

Challenge 1: Training Data Scarcity

Problem: Semiconductor RCA data is proprietary and not publicly available.

Solution: I used multiple LLMs to generate diverse synthetic training samples, encoding domain-specific reasoning patterns based on publicly available semiconductor process knowledge.

Challenge 2: Output Format Consistency

Problem: LLMs often produce inconsistent formats, making downstream integration difficult.

Solution: I used consistent Alpaca-style formatting across all training samples:

{
  "instruction": "Analyze semiconductor defect...",
  "input": "Defect observation + process context",
  "output": "Structured markdown RCA report"
}

With this formatting, the fine-tuned model reached an 88% structure score on the test set.
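A minimal sketch of enforcing that format when assembling samples is shown below. The helper names `make_record` and `is_valid_record` are hypothetical, and the example strings are the placeholders from the snippet above rather than real training data.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def make_record(instruction, model_input, output):
    """Build one Alpaca-style training record."""
    return {"instruction": instruction, "input": model_input, "output": output}


def is_valid_record(record):
    """True if the record has exactly the required keys, each a non-empty string."""
    if set(record) != set(REQUIRED_KEYS):
        return False
    return all(isinstance(record[k], str) and record[k].strip() != "" for k in REQUIRED_KEYS)


if __name__ == "__main__":
    record = make_record(
        "Analyze semiconductor defect...",
        "Defect observation + process context",
        "Structured markdown RCA report",
    )
    assert is_valid_record(record)
    # ensure_ascii=False keeps characters such as arrows readable in the file.
    print(json.dumps([record], indent=2, ensure_ascii=False))
```

Validating every record before training is one cheap way to keep a multi-source dataset uniform.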

Challenge 3: Evaluation Design

Problem: No standard benchmark exists for semiconductor RCA evaluation.

Solution: I designed a multi-metric evaluation framework:

- ROUGE scores for textual quality
- Severity classification accuracy
- Structure score for format compliance
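To show what the first two metrics measure, here is a simplified sketch of ROUGE-L (an F1 score built on the longest common subsequence of tokens) and of severity accuracy. It uses lowercase whitespace tokenization, which is an assumption; a standard ROUGE package tokenizes differently, so scores from this sketch would not match the reported values exactly.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over lowercase whitespace tokens."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def severity_accuracy(predicted, expected):
    """Share of samples whose predicted severity label equals the reference."""
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


if __name__ == "__main__":
    print(rouge_l_f1("replace the chamber o-ring", "replace o-ring"))
    print(severity_accuracy(["Critical", "Major", "Minor"], ["Critical", "Minor", "Minor"]))
```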

Challenge 4: Efficient Model Selection

Problem: Balancing model capability with deployment efficiency.

Solution: LoRA fine-tuning on ERNIE-4.5-0.3B achieved strong results while keeping the adapter at just 24 MB, a size that is practical for real-world deployment.

What I Learned

- Domain-specific fine-tuning is effective: A small model with targeted training data can perform well on specialized tasks, even compared to larger models used zero-shot.
- Multi-source synthetic data works: Using multiple LLMs to generate training data produced a dataset that covers a range of reasoning patterns.
- Structure can be learned: Consistent formatting in training data translated to reliable structured outputs (88% structure score).
- LoRA enables practical deployment: The 24 MB adapter is small enough to share and deploy without heavy infrastructure.

Future Directions

- Retrieval-augmented generation for equipment-specific manuals
- Multi-modal input (wafer map images + sensor data)
- Expansion to additional process modules (CVD, PVD, implant, CMP)
- Integration with fab MES/FDC systems

Acknowledgments

Baidu ERNIE for the base model
LLaMA-Factory for the fine-tuning framework

Built with ERNIE-4.5-0.3B and LLaMA-Factory for the Baidu ERNIE AI Developer Challenge.

Built With

ernie-4.5-0.3b
gradio
hugging-face
llama-factory
lora
python
pytorch
transformers

Updates

Faizan K started this project — Dec 23, 2025 08:49 AM EST
