Learning features in pLMs using sparse autoencoders

What features are protein language models (pLMs) learning? Inspired by mechanistic interpretability work on LLMs, we train sparse autoencoders on pLMs. We identify highly context-specific features at the residue and motif scales.
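
For readers unfamiliar with the technique, the following is a minimal sketch of a sparse autoencoder's forward pass and objective, not the authors' implementation. It assumes a common formulation (ReLU encoder, linear decoder, squared reconstruction error plus an L1 sparsity penalty). The array sizes, the `l1_coeff` value, and the random stand-in for per-residue pLM embeddings are all hypothetical.

```python
import numpy as np


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode inputs into non-negative features, then reconstruct them."""
    # ReLU keeps feature activations non-negative; the L1 term below pushes most to zero.
    f = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    x_hat = f @ W_dec + b_dec
    return f, x_hat


def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 sparsity penalty."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity


# Hypothetical sizes: 8 residues, 16-dim embeddings, 64 dictionary features.
rng = np.random.default_rng(0)
n_residues, d_model, d_features = 8, 16, 64

# Random stand-in for per-residue embeddings taken from a pLM layer (assumption).
x = rng.normal(size=(n_residues, d_model))

W_enc = rng.normal(scale=0.1, size=(d_model, d_features))
b_enc = np.zeros(d_features)
W_dec = rng.normal(scale=0.1, size=(d_features, d_model))
b_dec = np.zeros(d_model)

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat)
```

In such a setup, each column of `f` is one candidate feature, and inspecting which residues activate it most strongly is one way to judge whether it tracks a single residue or a longer motif.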