Learning features in pLMs using sparse autoencoders
What features are protein language models (pLMs) learning? Inspired by mechanistic interpretability work on LLMs, we train sparse autoencoders on pLMs. We identify highly context-specific features at the residue and motif scale.
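For readers new to the technique, here is a minimal sketch of a sparse autoencoder forward pass and loss, not the project's actual implementation. It assumes a single-hidden-layer autoencoder with a ReLU encoder and an L1 sparsity penalty, applied to per-residue embeddings. The embedding width, dictionary size, penalty coefficient, and the names `init_sae`, `sae_forward`, and `sae_loss` are all illustrative, and random vectors stand in for real pLM activations.

```python
import numpy as np


def init_sae(d_model, d_hidden, seed=0):
    """Create small random weights for a one-hidden-layer autoencoder."""
    rng = np.random.default_rng(seed)
    return {
        "W_enc": rng.normal(0.0, 0.1, size=(d_model, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, 0.1, size=(d_hidden, d_model)),
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode embeddings x (n_residues, d_model) into non-negative features, then decode."""
    pre_activation = x @ params["W_enc"] + params["b_enc"]
    z = np.maximum(0.0, pre_activation)  # ReLU keeps feature activations >= 0
    x_hat = z @ params["W_dec"] + params["b_dec"]
    return z, x_hat


def sae_loss(x, x_hat, z, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that pushes features toward sparsity."""
    reconstruction = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(z), axis=-1))
    return reconstruction + l1_coeff * sparsity


# Assumption: random vectors stand in for per-residue pLM embeddings
# (50 residues, hypothetical embedding width 32, hypothetical dictionary size 128).
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(50, 32))
params = init_sae(d_model=32, d_hidden=128)
features, reconstruction = sae_forward(params, embeddings)
loss = sae_loss(embeddings, reconstruction, features)
```

In a real setup the weights would be trained by gradient descent on this loss, and each column of `features` would then be inspected across residues to see which sequence contexts activate it.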