Anthropic's Activation Verbalizer Reads Model Latents as Text
Why it matters for interpretability:
- Translates latent activations to text, like activation oracles
- Extends the earlier functional-emotion work
- Surfaced strategic-manipulation and concealment signals firing during self-cleanup

