4 papers using it
2025 first seen
The Reddit dataset/benchmark contains data from the social media platform Reddit and is used to evaluate large language model agents in real-world, personalized applications.
Papers using Reddit (4)
- MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation
- Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation
- Navigating through the hidden embedding space: steering LLMs to improve mental health assessment
- Incongruent Positivity: When Miscalibrated Positivity Undermines Online Supportive Conversations