Awesome Similarity Search

Neel Nanda

13 papers · 5132 citations
Most-cited papers
  • Training A Helpful And Harmless Assistant With Reinforcement Learning From Human Feedback
    2022 · 3872 citations
  • Refusal In Language Models Is Mediated By A Single Direction
    2024 · 599 citations
  • Linear Representations Of Sentiment In Large Language Models
    2023 · 147 citations
  • Improving Dictionary Learning With Gated Sparse Autoencoders
    2024 · 145 citations
  • Transcoders Find Interpretable LLM Feature Circuits
    2024 · 126 citations
Topics
  • Model Architecture
  • Evaluation
  • Safety & Alignment
  • Fine-Tuning
  • Code
  • Training Techniques
  • Reinforcement Learning
  • In-Context Learning
  • Efficiency
