Gemma 2 2B
Emerging
5 papers using it
First seen: 2025
The 'Gemma 2 2B' dataset/benchmark is a collection of language model representations used to evaluate the effectiveness of interpretability methods, particularly in identifying non-Gaussian directions in model activations.
Papers using Gemma 2 2B (5)
- ICA Lens: Interpreting Language Models Without Training Another Dictionary
- Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have)
- CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features
- CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection
- Interpretable Steering of Large Language Models with Feature Guided Activation Additions