Inspiration

Vgel's representation engineering (a.k.a. control vectors): https://vgel.me/posts/representation-engineering/ Support for control vectors was recently merged into llama.cpp: https://github.com/ggerganov/llama.cpp/commit/877b4d0c628cc70dddb5df72ed8fc14d126ca7e8 The technique needs direct access to the model's activations, so it can only be used on open source models.

What it does

Using 100-200 example prompts that share a theme, a control vector for the model's activations is generated using principal component analysis. The control vector can be scaled and then added to or subtracted from the activations of an LLM during inference to bias the output toward or away from the intended "concept" or "emotion".
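The idea above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the "repeng" implementation: the activations here are synthetic stand-ins for one layer's hidden states, the examples are assumed to come in positive/negative pairs, and the names control_vector and steer are hypothetical.

```python
import numpy as np

def control_vector(pos_acts, neg_acts):
    """Unit-norm first principal component of paired activation differences."""
    diffs = np.asarray(pos_acts, dtype=np.float64) - np.asarray(neg_acts, dtype=np.float64)
    centered = diffs - diffs.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # positive examples project higher than negative ones on average.
    if np.mean(diffs @ direction) < 0:
        direction = -direction
    return direction

def steer(hidden, direction, scale):
    """Add (scale > 0) or subtract (scale < 0) the control vector."""
    return hidden + scale * direction

# Synthetic stand-in data (assumption): 150 prompt pairs, 16-dim hidden state.
rng = np.random.default_rng(0)
dim, n_pairs = 16, 150
true_dir = np.zeros(dim)
true_dir[3] = 1.0                                  # the hidden "concept" axis
strength = rng.uniform(0.5, 2.0, size=(n_pairs, 1))
base = rng.normal(size=(n_pairs, dim))
pos = base + strength * true_dir + 0.1 * rng.normal(size=(n_pairs, dim))
neg = base - strength * true_dir + 0.1 * rng.normal(size=(n_pairs, dim))

vec = control_vector(pos, neg)
steered = steer(np.zeros(dim), vec, 2.0)
print("cosine with true concept axis:", float(vec @ true_dir))
```

In a real pipeline the rows of pos and neg would be hidden states read out of the model at a given layer, and one vector would be computed per layer.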

This way the emotion/mood of a character can be controlled without changing the prompt, which makes it possible to expose dials for tuning an LLM persona.

How we built it

Made an interface to llama.cpp's ./main that runs a local model with control vectors applied. The vectors are generated from the PyTorch version of the same LLM using the "repeng" library.
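A wrapper of this kind might look like the sketch below, which only builds the argument list for ./main. The flag names --control-vector-scaled and --control-vector-layer-range are what I understand the linked llama.cpp commit to add; check ./main --help on your build before relying on them. The function name, file names, and layer range are hypothetical.

```python
import shlex

def build_main_command(model_path, prompt, vectors, layer_range=None,
                       binary="./main", n_predict=128):
    """Build an argv list for llama.cpp's ./main with scaled control vectors.

    vectors is a list of (gguf_path, scale) pairs; a negative scale
    subtracts the vector instead of adding it.
    """
    cmd = [binary, "-m", model_path, "-p", prompt, "-n", str(n_predict)]
    for path, scale in vectors:
        # Assumed flag from the linked commit: --control-vector-scaled FNAME SCALE
        cmd += ["--control-vector-scaled", path, str(scale)]
    if layer_range is not None:
        start, end = layer_range
        # Assumed flag from the linked commit: --control-vector-layer-range START END
        cmd += ["--control-vector-layer-range", str(start), str(end)]
    return cmd

# Hypothetical file names, for illustration only.
cmd = build_main_command(
    "model.gguf",
    "The innkeeper greets you:",
    [("happy.gguf", 1.5), ("angry.gguf", -0.5)],
    layer_range=(10, 20),
)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, capture_output=True, text=True) to collect the completion for the game.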

Challenges we ran into

Control vectors that are scaled too high make completions worse: the model falls into repeat loops and other failure modes much more easily. However, it's interesting to see exactly where the limits are, and if anything the network seems to be more robust to changes than I would've expected.

Accomplishments that we're proud of

Got it integrated into a multiplayer game using llama.cpp's ./main CLI tool, since ./server does not have control vector support yet.

What we learned

Gained significant intuition into activation hacking.

What's next for Control Vectors for NPCs

Now that the pipeline is working, I will try combining many subtle control vectors to see how many can be superimposed before output quality degrades significantly.
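One straightforward way to run that experiment, assuming the vectors are simply combined as a weighted sum, is sketched below. The pairwise cosine similarities give a rough sense of how much two vectors overlap and might interfere. The function names and toy vectors are hypothetical.

```python
import numpy as np

def combine(vectors, weights):
    """Weighted sum of control vectors (one row per vector)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    return weights @ vectors

def pairwise_cosines(vectors):
    """Cosine similarity between every pair of control vectors."""
    v = np.asarray(vectors, dtype=np.float64)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    return unit @ unit.T

# Toy 4-dimensional vectors standing in for real control vectors.
happy = np.array([1.0, 0.0, 0.0, 0.0])
formal = np.array([0.0, 1.0, 0.0, 0.0])
sleepy = np.array([1.0, 1.0, 0.0, 0.0])

mixed = combine([happy, formal, sleepy], [0.5, -0.25, 0.1])
overlap = pairwise_cosines([happy, formal, sleepy])
print(mixed)
print(overlap.round(3))
```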
