Inspiration

Fine-tuning is powerful but opaque. Developers often cannot observe what actually changes inside a model after training. Embeddings shift, probabilities redistribute, and the model grows more confident in its predictions, yet these changes remain hidden.

I wanted to build a tool that makes fine-tuning transparent.

Instead of guessing what changed, developers should be able to measure it.


What it does

VectorScan v2 is a developer-centric diagnostic engine that detects internal representation drift after fine-tuning.

It measures:

  • Embedding movement (cosine drift)
  • Neighborhood structure changes
  • Probability redistribution (KL divergence)
  • Logit shifts
  • Entropy compression
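
To make three of these metrics concrete, here is a minimal NumPy sketch of cosine drift, KL divergence, and entropy. It is an illustration under assumptions, not VectorScan's actual code: the function names, the epsilon clipping, and the choice to compute KL from the fine-tuned distribution to the baseline are all hypothetical.

```python
import numpy as np

def cosine_drift(base, tuned, eps=1e-12):
    """Per-token drift: 1 - cosine similarity between matching rows."""
    base = np.asarray(base, dtype=np.float64)
    tuned = np.asarray(tuned, dtype=np.float64)
    dot = np.sum(base * tuned, axis=1)
    norms = np.linalg.norm(base, axis=1) * np.linalg.norm(tuned, axis=1)
    return 1.0 - dot / np.maximum(norms, eps)

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; inputs are clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=np.float64), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def entropy(p, eps=1e-12):
    """Shannon entropy in nats."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0)
    return float(-np.sum(p * np.log(p)))

# Toy next-token logits over a 4-token vocabulary (made-up numbers).
base_probs = softmax([1.0, 1.0, 1.0, 1.0])   # uniform baseline
tuned_probs = softmax([4.0, 1.0, 1.0, 1.0])  # sharper after fine-tuning

redistribution = kl_divergence(tuned_probs, base_probs)
entropy_compression = entropy(base_probs) - entropy(tuned_probs)
```

In this sketch, a positive `entropy_compression` means the fine-tuned distribution is more peaked than the baseline, which is one way to read "entropy compression".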

It supports both encoder models (e.g., DistilBERT) and decoder LLMs (e.g., GPT-2), automatically detecting the architecture and applying the appropriate analysis pipeline.
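
One plausible way to detect the architecture is to inspect the class names recorded in a model's configuration. The sketch below is an assumption about how such a check could look, not VectorScan's actual logic: `detect_architecture` is a hypothetical helper, and the suffix heuristic relies on the `architectures` field that HuggingFace checkpoints usually record in `config.json` (for example `GPT2LMHeadModel` or `DistilBertForMaskedLM`). Stand-in config objects are used so the example runs without downloading a model.

```python
from types import SimpleNamespace

# Hypothetical suffix lists; a real tool may need more cases.
DECODER_SUFFIXES = ("ForCausalLM", "LMHeadModel")
ENCODER_SUFFIXES = ("ForMaskedLM",)

def detect_architecture(config):
    """Return 'decoder', 'encoder', or 'unknown' from a config-like object."""
    names = getattr(config, "architectures", None) or []
    if any(name.endswith(DECODER_SUFFIXES) for name in names):
        return "decoder"
    if any(name.endswith(ENCODER_SUFFIXES) for name in names):
        return "encoder"
    return "unknown"

# Stand-ins for loaded configs (assumed field values).
gpt2_like = SimpleNamespace(architectures=["GPT2LMHeadModel"])
distilbert_like = SimpleNamespace(architectures=["DistilBertForMaskedLM"])

pipeline_for_gpt2 = detect_architecture(gpt2_like)
pipeline_for_distilbert = detect_architecture(distilbert_like)
```

Returning `"unknown"` instead of guessing lets the caller fall back to an explicit user choice when a checkpoint does not record its architecture.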


How I built it

VectorScan is built using:

  • Python
  • PyTorch
  • HuggingFace Transformers
  • NumPy
  • SciPy
  • scikit-learn

The system compares baseline and fine-tuned models, computes embedding drift, analyzes the top 100 most changed tokens, and generates a structured drift report.
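
The selection and reporting steps could be sketched as follows. This is a hedged illustration, not the project's implementation: `top_drifted_tokens`, the report field names, and the toy vocabulary are hypothetical, and the drift scores are assumed to be already computed as one value per token.

```python
import numpy as np

def top_drifted_tokens(drift, vocab, k=100):
    """Return the k tokens with the largest drift, sorted descending."""
    drift = np.asarray(drift, dtype=np.float64)
    k = min(k, drift.shape[0])
    if k == 0:
        return []
    # argpartition picks the k largest scores without sorting the whole
    # vocabulary; only those k entries are then fully sorted.
    idx = np.argpartition(drift, -k)[-k:]
    idx = idx[np.argsort(drift[idx])[::-1]]
    return [
        {"token": vocab[i], "token_id": int(i), "drift": float(drift[i])}
        for i in idx
    ]

# Toy data: per-token drift scores for a 5-token vocabulary (made up).
vocab = ["the", "loan", "cat", "rate", "blue"]
drift = [0.01, 0.42, 0.03, 0.37, 0.00]

report = {
    "vocab_size": len(vocab),
    "mean_drift": float(np.mean(drift)),
    "top_tokens": top_drifted_tokens(drift, vocab, k=2),
}
```

A plain dictionary like `report` serializes directly with `json.dumps`, which is one simple way to produce a structured drift report.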

Special care was taken to make it lightweight and CPU-friendly, allowing it to run locally without GPU dependency.


Challenges I ran into

One key challenge was supporting both encoder and decoder architectures in a unified framework. The internal behavior of masked language models and causal language models differs significantly.

Another challenge was balancing depth and performance. Brute-force token comparisons were too slow, so I optimized the system to analyze only the top 100 most drifted tokens.


What I learned

I learned that embedding movement and probability redistribution are often disconnected. A model can show minimal embedding drift but significant behavioral shift.
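
A contrived toy case shows how this can happen. It assumes a two-token vocabulary whose output logits are the dot product of a fixed hidden state with each token embedding. Real models are far more complex, and every number here is made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax for a 1-D array."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Hypothetical hidden state with a large component along the second axis.
hidden = np.array([0.0, 50.0])

# Baseline: both tokens share the same embedding, so logits tie.
base_emb = np.array([[1.0, 0.0], [1.0, 0.0]])
# Fine-tuned: each embedding is nudged slightly in opposite directions.
tuned_emb = np.array([[1.0, 0.05], [1.0, -0.05]])

# Embedding view: cosine drift per token is tiny.
dot = np.sum(base_emb * tuned_emb, axis=1)
norms = np.linalg.norm(base_emb, axis=1) * np.linalg.norm(tuned_emb, axis=1)
cos_drift = 1.0 - dot / norms

# Behavioral view: the output distribution moves a lot.
base_probs = softmax(base_emb @ hidden)
tuned_probs = softmax(tuned_emb @ hidden)
kl = float(np.sum(tuned_probs * np.log(tuned_probs / base_probs)))

print("max cosine drift:", float(np.max(cos_drift)))
print("KL(tuned || base):", kl)
```

In this setup the embeddings barely rotate, but the large hidden-state component amplifies the small difference into a strongly skewed distribution, so the two measurements disagree.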

This reinforced the importance of measuring multiple layers of change, not just token vectors.


Impact

VectorScan accelerates the fine-tuning evaluation phase. Developers can quickly identify which tokens shifted most and whether probability distributions became biased or overconfident.

It transforms hidden internal changes into measurable diagnostics.
