Inspiration
Fine-tuning is powerful but opaque. Developers often cannot observe what actually changes inside a model after training. Embeddings shift, probabilities redistribute, and output confidence rises, yet these changes stay hidden.
I wanted to build a tool that makes fine-tuning transparent.
Instead of guessing what changed, developers should be able to measure it.
What it does
VectorScan v2 is a developer-centric diagnostic engine that detects internal representation drift after fine-tuning.
It measures:
- Embedding movement (cosine drift)
- Neighborhood structure changes
- Probability redistribution (KL divergence)
- Logit shifts
- Entropy compression
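To make these metrics concrete, here is a minimal sketch of how each could be computed for a single token or a single next-token distribution. It uses only the standard library so it runs anywhere; the function names, the toy logits, and the KL direction (fine-tuned relative to baseline) are my assumptions for illustration, not VectorScan's actual code.

```python
import math

def cosine_drift(base_vec, tuned_vec):
    """1 - cosine similarity between a baseline and a fine-tuned embedding."""
    dot = sum(a * b for a, b in zip(base_vec, tuned_vec))
    norm_base = math.sqrt(sum(a * a for a in base_vec))
    norm_tuned = math.sqrt(sum(b * b for b in tuned_vec))
    return 1.0 - dot / (norm_base * norm_tuned)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Toy logits for the same prompt before and after fine-tuning (made-up numbers).
base_logits = [2.0, 1.0, 0.5, 0.1]
tuned_logits = [4.0, 1.0, 0.2, 0.0]
p_base = softmax(base_logits)
p_tuned = softmax(tuned_logits)

logit_shift = [t - b for b, t in zip(base_logits, tuned_logits)]
kl = kl_divergence(p_tuned, p_base)                 # probability redistribution
entropy_drop = entropy(p_base) - entropy(p_tuned)   # positive means compression
drift = cosine_drift([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # orthogonal vectors

print(f"cosine drift={drift:.3f} KL={kl:.3f} entropy drop={entropy_drop:.3f}")
```

A positive entropy drop means the fine-tuned distribution is more peaked than the baseline, which is what "entropy compression" refers to here.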
It supports both encoder models (e.g., DistilBERT) and decoder LLMs (e.g., GPT-2), automatically detecting the architecture and applying the appropriate analysis pipeline.
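One plausible way to do this kind of detection is to look at the architecture class names a Hugging Face model config lists in its `architectures` field. The sketch below assumes that approach; `detect_pipeline` and the `"encoder"` / `"decoder"` labels are hypothetical names, and VectorScan's real detection logic may differ.

```python
def detect_pipeline(architectures):
    """Map Hugging Face architecture class names to an analysis pipeline label.

    `architectures` is a list of class-name strings, such as the one found in
    a model's config (e.g. ["GPT2LMHeadModel"]).
    """
    for name in architectures or []:
        if name.endswith("ForMaskedLM"):
            return "encoder"   # masked language model
        if name.endswith("ForCausalLM") or name.endswith("LMHeadModel"):
            return "decoder"   # causal language model
    raise ValueError(f"Unrecognized architecture list: {architectures!r}")

print(detect_pipeline(["DistilBertForMaskedLM"]))  # encoder
print(detect_pipeline(["GPT2LMHeadModel"]))        # decoder
```

Matching on class-name suffixes keeps the check cheap because it needs only the config, not the model weights. It would not cover encoder-decoder models, which this sketch deliberately rejects.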
How I built it
VectorScan is built using:
- Python
- PyTorch
- HuggingFace Transformers
- NumPy
- SciPy
- scikit-learn
The system compares baseline and fine-tuned models, computes embedding drift, analyzes the top 100 most changed tokens, and generates a structured drift report.
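The token-selection step can be sketched as follows: compute a drift score for every token that both models share, then keep only the largest k. This is a simplified illustration with a four-token toy vocabulary; `top_drifted_tokens` and the data are hypothetical, and a real run would use the models' embedding matrices and k=100.

```python
import heapq
import math

def cosine_drift(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def top_drifted_tokens(base_emb, tuned_emb, vocab, k=100):
    """Return up to k (token, drift) pairs, largest drift first."""
    drifts = (
        (tok, cosine_drift(base_emb[i], tuned_emb[i]))
        for i, tok in enumerate(vocab)
    )
    return heapq.nlargest(k, drifts, key=lambda pair: pair[1])

# Toy data: row i of each matrix is the embedding of vocab[i].
vocab = ["the", "cat", "sat", "mat"]
base_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
tuned_emb = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]

top_tokens = top_drifted_tokens(base_emb, tuned_emb, vocab, k=2)
for token, drift in top_tokens:
    print(f"{token}: {drift:.4f}")
```

Note that "mat" scores zero drift even though its vector doubled in length, because cosine drift ignores changes in magnitude.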
Special care was taken to make it lightweight and CPU-friendly, allowing it to run locally without GPU dependency.
Challenges I ran into
One key challenge was supporting both encoder and decoder architectures in a unified framework. The internal behavior of masked language models and causal language models differs significantly.
Another challenge was balancing depth and performance. Brute-force token comparisons were too slow, so I optimized the system to analyze only the top 100 most drifted tokens.
What I learned
I learned that embedding movement and probability redistribution are often disconnected. A model can show minimal embedding drift but significant behavioral shift.
This reinforced the importance of measuring multiple layers of change, not just token vectors.
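A toy example shows one way the two signals can come apart. It assumes a model whose logits are dot products between a hidden state and each token's embedding (tied weights), and a fine-tune that only rescales the embeddings. Both assumptions are illustrative, not a description of what VectorScan observed on real models. Because cosine similarity ignores vector length, the drift stays at zero while the output distribution sharpens noticeably.

```python
import math

def cosine_drift(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q has no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def logits_for(hidden, emb):
    """Logit of each token = dot product of the hidden state with its embedding."""
    return [sum(h * e for h, e in zip(hidden, row)) for row in emb]

hidden = [1.0, 0.5]                                   # made-up hidden state
base_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # made-up embeddings
tuned_emb = [[3.0 * x for x in row] for row in base_emb]  # same directions, 3x norm

max_drift = max(cosine_drift(b, t) for b, t in zip(base_emb, tuned_emb))
p_base = softmax(logits_for(hidden, base_emb))
p_tuned = softmax(logits_for(hidden, tuned_emb))
kl = kl_divergence(p_tuned, p_base)

print(f"max cosine drift: {max_drift:.6f}")
print(f"KL(tuned || base): {kl:.4f}")
```

In this setup an embedding-only report would show nothing, while the KL term flags a clear behavioral change.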
Impact
VectorScan accelerates the fine-tuning evaluation phase. Developers can quickly identify which tokens shifted most and whether probability distributions became biased or overconfident.
It transforms hidden internal changes into measurable diagnostics.