Inspiration
vgel's post on representation engineering (a.k.a. control vectors): https://vgel.me/posts/representation-engineering/ Support for control vectors was recently merged into llama.cpp: https://github.com/ggerganov/llama.cpp/commit/877b4d0c628cc70dddb5df72ed8fc14d126ca7e8 The technique needs access to a model's activations, so it can only be used on open source models.
What it does
Using 100-200 example prompts that share a theme, a control vector for the model's activations is generated using principal component analysis. The control vector can be scaled and then added to or subtracted from the activations of an LLM during inference, biasing the output toward or away from the intended "concept" or "emotion".
This way the emotion or mood of a character can be controlled without changing the prompt, which opens the door to dials for tuning an LLM persona.
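The idea can be sketched in a few lines of plain Python. This is a toy illustration, not the repeng implementation: it assumes the inputs are per-example activation vectors (or differences between contrasting examples, which is my reading of the approach), uses power iteration in place of a real PCA routine, and works on short lists rather than per-layer tensors. The function names `top_principal_direction` and `apply_control_vector` are made up for this example.

```python
import math
import random


def top_principal_direction(rows, iterations=200, seed=0):
    """Estimate the first principal component of `rows` by power iteration."""
    dim = len(rows[0])
    # Center the data so the component captures variance, not the mean.
    mean = [sum(r[i] for r in rows) / len(rows) for i in range(dim)]
    centered = [[r[i] - mean[i] for i in range(dim)] for r in rows]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # Multiply v by X^T X without building the covariance matrix.
        proj = [sum(c[i] * v[i] for i in range(dim)) for c in centered]
        w = [
            sum(proj[k] * centered[k][i] for k in range(len(centered)))
            for i in range(dim)
        ]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data: no variance to find
        v = [x / norm for x in w]
    return v


def apply_control_vector(activation, control, strength):
    """Shift an activation along the control direction.

    A positive strength adds the concept, a negative one subtracts it.
    """
    return [a + strength * c for a, c in zip(activation, control)]
```

The sign of a principal component is arbitrary, so a real pipeline has to decide which end of the direction means "more" of the concept before the dial behaves as expected.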
How we built it
We made an interface to llama.cpp's ./main that runs a local model with control vectors. The vectors are generated from the PyTorch version of the same LLM using the "repeng" library.
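As a rough sketch of what such an interface has to do, the helper below assembles an argument list for ./main. It is written in Python for brevity and is not the project's actual code. The `--control-vector-scaled FNAME SCALE` and `--control-vector-layer-range START END` flags are, to the best of my knowledge, the ones added by the llama.cpp commit linked above; check `./main --help` on your build before relying on them. The function name and file names are hypothetical.

```python
def build_main_args(model_path, prompt, vectors, layer_range=None, binary="./main"):
    """Build an argv list for llama.cpp's main CLI with control vectors.

    `vectors` is a list of (gguf_path, scale) pairs; a negative scale
    pushes the output away from the concept.
    """
    args = [binary, "-m", model_path, "-p", prompt]
    for path, scale in vectors:
        # Assumed flag: one scaled control vector per occurrence.
        args += ["--control-vector-scaled", path, str(scale)]
    if layer_range is not None:
        start, end = layer_range
        # Assumed flag: restrict the vectors to a range of layers.
        args += ["--control-vector-layer-range", str(start), str(end)]
    return args
```

The resulting list can be passed straight to `subprocess.run(args, capture_output=True, text=True)`, which avoids shell quoting problems in prompts.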
Challenges we ran into
Activation vectors that are scaled too high make completions worse: it's much easier to fall into repeat loops and other failure modes. However, it's interesting to see exactly where the limits are, and if anything the network seems to be more robust to changes than I would have expected.
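One crude way to locate those limits is to score each completion for repetition while sweeping the scale. The metric below, the fraction of duplicated word n-grams, is my own assumption about what a useful signal looks like; it is not something the project is known to have used, and it only catches the repeat-loop failure mode.

```python
def repeated_ngram_fraction(text, n=3):
    """Return the share of word n-grams that duplicate an earlier n-gram.

    0.0 means every n-gram is unique; values near 1.0 suggest a repeat loop.
    """
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # text too short to contain any n-gram
    return 1.0 - len(set(ngrams)) / len(ngrams)
```

Plotting this score against the control vector scale should show roughly where completions start to collapse, though any threshold would have to be picked by eye.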
Accomplishments that we're proud of
Got it integrated into a multiplayer game using llama.cpp's ./main CLI tool, since ./server does not support control vectors yet.
What we learned
Gained significant intuition for activation hacking.
What's next for Control Vectors for NPCs
Now that the pipeline is working, I will try combining many subtle activation vectors to see how many can be superimposed before output quality degrades significantly.
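A minimal sketch of that superposition, assuming the vectors are simply combined as a weighted sum and that all of them share the same dimensionality. The helper names are hypothetical and the toy lists stand in for real per-layer vectors.

```python
import math


def combine_control_vectors(weighted_vectors):
    """Sum (weight, vector) pairs into a single control vector."""
    dim = len(weighted_vectors[0][1])
    combined = [0.0] * dim
    for weight, vector in weighted_vectors:
        if len(vector) != dim:
            raise ValueError("all control vectors must have the same length")
        for i, x in enumerate(vector):
            combined[i] += weight * x
    return combined


def l2_norm(vector):
    """Euclidean length, a rough proxy for how hard the model is pushed."""
    return math.sqrt(sum(x * x for x in vector))
```

Tracking the norm of the combined vector is one way to keep the total push comparable as more vectors are stacked, since many small weights can still add up to a large shift.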
Built With
- bun
- llama.cpp
- pytorch
- repeng
- typescript