GTZAN

Canonical

7 papers using it

2022 first seen

GTZAN is a dataset for musical genre classification of audio signals. It consists of 1,000 audio tracks, each 30 seconds long, covering 10 genres with 100 tracks per genre. All tracks are 22,050 Hz mono 16-bit audio files in WAV format. The genres are: blues, classical, country, disco, hiphop, j
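The stated audio format can be checked programmatically. The following is a minimal sketch using only Python's standard `wave` module; the function names, the one-second duration tolerance, and the synthetic silent clip are illustrative assumptions, not part of the dataset or any official tooling.

```python
import io
import wave

# Format stated in the dataset description.
SAMPLE_RATE_HZ = 22_050
CHANNELS = 1            # mono
SAMPLE_WIDTH_BYTES = 2  # 16-bit
CLIP_SECONDS = 30


def matches_stated_format(wav_file, tolerance_s=1.0):
    """Check sample rate, channel count, bit depth and approximate duration.

    `wav_file` may be a path or a binary file-like object.
    `tolerance_s` is an assumed allowance for clips that are not exactly 30 s.
    """
    with wave.open(wav_file, "rb") as wav:
        duration_s = wav.getnframes() / wav.getframerate()
        return (
            wav.getframerate() == SAMPLE_RATE_HZ
            and wav.getnchannels() == CHANNELS
            and wav.getsampwidth() == SAMPLE_WIDTH_BYTES
            and abs(duration_s - CLIP_SECONDS) <= tolerance_s
        )


def make_silent_clip(seconds=CLIP_SECONDS, rate=SAMPLE_RATE_HZ):
    """Build an in-memory silent WAV clip (a synthetic stand-in, not GTZAN data)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(rate)
        # One zero-valued 16-bit sample per frame.
        wav.writeframes(b"\x00" * (seconds * rate * CHANNELS * SAMPLE_WIDTH_BYTES))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(matches_stated_format(make_silent_clip()))            # clip in the stated format
    print(matches_stated_format(make_silent_clip(rate=44_100))) # wrong sample rate
```

At 22,050 Hz, a clip of exactly 30 seconds holds 30 × 22,050 = 661,500 samples. To check real files, pass a path to a downloaded track in place of the synthetic clip.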


Papers using GTZAN (7)

S-KEY: Self-supervised Learning of Major and Minor Keys from Audio · 2025 · 1 cite

Whisper-AuT: Domain-Adapted Audio Encoder for Efficient Audio-LLM Training · 2026

Evaluating Pretrained General-Purpose Audio Representations for Music Genre Classification · 2026

SingNet: A Real-time Singing Voice Beat and Downbeat Tracking System · 2023 · 3 cites

Singing Beat Tracking With Self-supervised Front-end and Linear Transformers · 2022 · 1 cite

Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio · 2024

M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation · 2024