Gemma 2 2B

Emerging

5 papers using it

First seen: 2025

Gemma 2 2B is the smallest model in Google's Gemma 2 family of open-weight language models. It is indexed here as a dataset/benchmark because papers use its internal representations (activations) to evaluate interpretability methods, particularly methods that identify non-Gaussian directions in those activations.
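As a rough illustration of what "non-Gaussian direction" means, the sketch below scores candidate directions by the excess kurtosis of the activations projected onto them. Excess kurtosis is about 0 for Gaussian data and positive for heavy-tailed data. This is an assumed, generic measure and is not the method of any paper listed below. The function name `excess_kurtosis_scores` is hypothetical, and the synthetic array stands in for real Gemma 2 2B activations.

```python
import numpy as np

def excess_kurtosis_scores(activations: np.ndarray, directions: np.ndarray) -> np.ndarray:
    """Hypothetical helper: excess kurtosis of activations projected onto each direction.

    activations: (n_samples, d_model) array (synthetic here, not real model activations).
    directions:  (n_directions, d_model) array; rows are normalized to unit length below.
    Returns an (n_directions,) array: ~0 for Gaussian projections, > 0 for heavy tails.
    """
    # Center the activations so projections have zero mean.
    x = activations - activations.mean(axis=0, keepdims=True)
    # Normalize each candidate direction to unit length.
    u = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    proj = x @ u.T  # shape (n_samples, n_directions)
    # Standardize each projection, then compute the fourth moment minus 3.
    proj = (proj - proj.mean(axis=0)) / proj.std(axis=0)
    return (proj ** 4).mean(axis=0) - 3.0

# Synthetic example: Gaussian noise everywhere except a heavy-tailed (Laplace) axis 0.
rng = np.random.default_rng(0)
n, d = 20_000, 16
acts = rng.normal(size=(n, d))
acts[:, 0] = rng.laplace(size=n)  # Laplace has excess kurtosis 3

scores = excess_kurtosis_scores(acts, np.eye(d))
print(int(np.argmax(scores)))  # index of the most non-Gaussian candidate direction
```

Scoring by kurtosis is the simplest choice. ICA-style methods such as FastICA typically optimize smoother non-Gaussianity proxies (negentropy approximations), which are less sensitive to outliers.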


Papers using Gemma 2 2B (5)

ICA Lens: Interpreting Language Models Without Training Another Dictionary (2026)

Check Your LLM's Secret Dictionary! Five Lines of Code Reveal What Your LLM Learned (Including What It Shouldn't Have) (2026)

CorrSteer: Generation-Time LLM Steering via Correlated Sparse Autoencoder Features (2025)

CorrSteer: Steering Improves Task Performance and Safety in LLMs through Correlation-based Sparse Autoencoder Feature Selection (2025)

Interpretable Steering of Large Language Models with Feature Guided Activation Additions (2025)