📊 Datasets — Awesome Time Series

1,321 datasets & benchmarks — 15 canonical foundations plus emerging datasets mined from recent papers. Each links to the papers that use it.

S&P 500 · Emerging

The S&P 500 is a stock market index that contains 500 of the largest publicly traded companies in the United States and is used to evaluate the overall performance of the U.S. stock market.

📄 36 papers

ETT · Canonical

Electricity Transformer Temperature — a multivariate time-series benchmark for long-horizon forecasting.

📄 23 papers · ⬇ 778 · 💛 11 · 🤗 HF · cc-by-4.0

GIFT-Eval · Emerging

GIFT-Eval is a benchmark for evaluating time series foundation models. It provides a standardized set of tasks and metrics for assessing how well their learned representations transfer across diverse temporal patterns.

📄 20 papers · ⬇ 716 · 🤗 HF

ERA5 · Emerging

ERA5 is a reanalysis dataset containing comprehensive atmospheric data, used to evaluate and improve multi-variable weather forecasting models.

📄 18 papers

M4 · Canonical

100,000 time series across domains and frequencies from the M4 forecasting competition.

📄 15 papers

Traffic · Canonical

DepositPhotos Traffic & Transportation Dataset (Sample). Overview: This dataset is a curated sample of high-resolution traffic, transportation, and urban mobility images sourced from the DepositPhotos library. The subset is designed for training, testing, and evaluating computer vision models for autonomous driving, smart city infrastructure, and generative AI applications focused on urban environments. This sample demonstrates the exact quality, diversity, and… See the full description on the dataset page: https://huggingface.co/datasets/Depositphotos/traffic.

📄 14 papers · ⬇ 11 · 💛 2 · 🤗 HF · other

MIMIC-III · Emerging

MIMIC-III is a publicly available critical care database that contains de-identified health data from patients admitted to intensive care units, and it is used to evaluate predictive models for clinical time-series analysis.

📄 14 papers

PEMS · Canonical

The 'PEMS' dataset is a benchmark that contains multivariate traffic series data used to evaluate traffic forecasting models, particularly in the context of extreme events and their complex spatio-temporal correlations.

📄 14 papers

Electricity · Canonical

The 'Electricity' dataset is a multivariate time series dataset used to evaluate forecasting methods by capturing the interdependencies among different variables over time.

📄 13 papers · ⬇ 44 · 🤗 HF

M5 · Canonical

The M5 dataset is a large-scale real-world retail dataset used to evaluate time series forecasting models, capturing complex dynamics across multiple variables.

📄 12 papers · ⬇ 197 · 🤗 HF

COVID-19 · Emerging

The 'COVID-19' dataset/benchmark contains multi-modal data related to the pandemic, including epidemiological time series, public health policies, and demographic information, and is used to evaluate forecasting models for the short-term spread of the disease.

📄 12 papers

M-4 competition dataset · Emerging

The M-4 competition dataset is a benchmark that contains a diverse set of time series data used to evaluate the performance of forecasting methods.

📄 11 papers

Weather · Canonical

The 'Weather' dataset is a benchmark used to evaluate time series imputation methods by providing time recordings of weather-related data with missing values.

📄 9 papers

Mackey-Glass · Emerging

Pre-generated NumPy arrays of Mackey-Glass time series, generated with the jitcdde library. Because of the lower-level solvers the library uses, different machines may produce different data even with the same ISA and library versions, so please use the pre-generated data included here. The dataset contains 14 time series, each using the MG parameters beta=0.2, gamma=0.1, n=10; tau is varied per time series from 17 to 30. Each time series is 50 Lyapunov times in length, with 75… See the full description on the dataset page: https://huggingface.co/datasets/NeuroBench/mackey_glass.

📄 8 papers · ⬇ 20 · 🤗 HF · cc-by-4.0
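
To make the Mackey-Glass parameters above concrete, here is a minimal sketch of the delay differential equation dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t), integrated with a plain forward-Euler scheme. This is not the jitcdde solver the dataset was generated with, and it will not reproduce the published arrays; the step size `dt`, the number of steps, and the constant initial history `x0` are assumptions chosen only for illustration.

```python
import numpy as np

def mackey_glass(tau=17.0, beta=0.2, gamma=0.1, n=10, dt=0.1, steps=5000, x0=1.2):
    """Forward-Euler sketch of the Mackey-Glass delay differential equation.

    dt, steps, and x0 are illustrative assumptions, not dataset settings.
    """
    delay = int(round(tau / dt))           # delay expressed in integration steps
    x = np.empty(steps + delay + 1)
    x[: delay + 1] = x0                    # constant history on [-tau, 0] (assumption)
    for t in range(delay, steps + delay):
        x_tau = x[t - delay]               # delayed state x(t - tau)
        dxdt = beta * x_tau / (1.0 + x_tau ** n) - gamma * x[t]
        x[t + 1] = x[t] + dt * dxdt
    return x[delay:]                       # drop the artificial history

series = mackey_glass(tau=17.0)
```

For actual benchmarking, the pre-generated arrays from the dataset page remain the reference, since solver and machine differences change the trajectories of a chaotic system.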

ILI · Canonical

The 'ILI' dataset contains time series data related to influenza-like illness prevalence and is used to evaluate forecasting methods for time series data.

📄 8 papers

M-3 · Emerging

The M-3 dataset is a benchmark for evaluating time series forecasting models, containing a diverse set of time series data across various domains.

📄 7 papers

PEMS-04 · Emerging

PEMS-04 is a benchmark dataset used for evaluating spatiotemporal forecasting techniques, containing traffic data that reflects the complex spatiotemporal dynamics of transportation systems.

📄 7 papers

PeMS-08 · Emerging

PEMS-08 is a benchmark dataset used for evaluating spatiotemporal forecasting techniques, containing traffic data that captures the complex spatiotemporal dynamics of transportation systems.

📄 7 papers

Bitcoin · Emerging

The 'Bitcoin' dataset contains daily close-price data used to evaluate the performance of quantile deep learning models for multi-step ahead time series prediction, particularly under high volatility and extreme conditions.

📄 6 papers · ⬇ 31 · 💛 1 · 🤗 HF · apache-2.0
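
Quantile forecasting models such as those mentioned above are commonly scored with the pinball (quantile) loss. The sketch below shows that loss for a single quantile level; the function name `pinball_loss` and the price values are hypothetical and are not taken from the dataset or from any specific paper.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss for quantile level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) is weighted by q, over-prediction by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

# Hypothetical close prices and a flat forecast, for illustration only.
y_true = np.array([100.0, 102.0, 98.0])
y_pred = np.array([101.0, 101.0, 101.0])

loss_q10 = pinball_loss(y_true, y_pred, 0.1)  # penalizes over-prediction more
loss_q90 = pinball_loss(y_true, y_pred, 0.9)  # penalizes under-prediction more
```

Because the flat forecast sits above most of the hypothetical observations, the loss is larger at the 0.1 level than at the 0.9 level.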

Lorenz · Emerging

The 'Lorenz' dataset contains chaotic time series data used to evaluate the performance of time series prediction models, particularly in capturing intrinsic patterns and temporal dynamics.

📄 6 papers
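
As a rough illustration of where such chaotic series come from, the sketch below integrates the Lorenz system with a fixed-step fourth-order Runge-Kutta scheme. The parameter values (sigma=10, rho=28, beta=8/3), the step size, and the initial state are the commonly used textbook choices and are assumptions here; the papers using this dataset may generate their series differently.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classic parameters assumed)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def lorenz_series(steps=2000, dt=0.01, state0=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz system with fixed-step RK4; returns (steps + 1, 3)."""
    traj = np.empty((steps + 1, 3))
    traj[0] = state0
    for i in range(steps):
        s = traj[i]
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        traj[i + 1] = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

trajectory = lorenz_series()
x_series = trajectory[:, 0]  # a univariate series, e.g. for a forecasting task
```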

S&P 500 index · Emerging

The 'S&P 500 index' dataset contains daily and hourly closing prices for the 500 large-cap U.S. companies in the S&P 500 stock market index, and it is used to evaluate short-term stock market trends and the efficacy of predictive models.

📄 6 papers

UCR Archive · Canonical

The UCR Archive is a benchmark dataset that contains a collection of time series data used to evaluate the performance of various time series classification algorithms.

📄 6 papers

Ethereum · Emerging

Ethereum is a cryptocurrency dataset used to evaluate the performance of quantile deep learning models for multi-step ahead time series prediction, specifically focusing on daily close-price data.

📄 5 papers · ⬇ 16 · 🤗 HF

EUR/USD · Emerging

The 'EUR/USD' dataset contains daily foreign exchange rates between the Euro and the US Dollar and is used to evaluate forecasting models in the context of non-stationary time series.

📄 5 papers

German · Emerging

The 'German' dataset/benchmark contains data related to continuous intraday electricity markets in Germany and is used to evaluate forecasting models for electricity prices by analyzing the dynamics of buy and sell orders in the orderbook.

📄 5 papers

Kuramoto-Sivashinsky equation · Emerging

📄 5 papers

Los Angeles · Emerging

The 'Los Angeles' dataset contains hourly recordings of wind speed and is used to evaluate deep learning-based probabilistic forecasting methods for wind speed.

📄 5 papers

NASDAQ · Emerging

The 'NASDAQ' dataset contains historical stock price data from the NASDAQ market and is used to evaluate stock price forecasting models and to test the efficient-market hypothesis.

📄 5 papers

NASDAQ-100 · Emerging

The NASDAQ-100 is a stock market index that includes 100 of the largest non-financial companies listed on the NASDAQ stock exchange, and it is used to evaluate volatility forecasting models based on high-frequency trading data.

📄 5 papers

MNIST · Emerging

The MNIST dataset consists of 70,000 28x28 black-and-white images of handwritten digits extracted from two NIST databases. There are 60,000 images in the training dataset and 10,000 images in the validation dataset, with one class per digit for a total of 10 classes and 7,000 images (6,000 train images and 1,000 test images) per class. Half of the images were drawn by Census Bureau employees and the other half by high school students… See the full description on the dataset page: https://huggingface.co/datasets/ylecun/mnist.

📄 4 papers · ⬇ 91.7k · 💛 262 · 🤗 HF · mit

Solar · Emerging

Šolar is a developmental corpus of 5485 school texts (e.g., essays), written by students in Slovenian secondary schools (age 15-19) and pupils in the 7th-9th grade of primary school (13-15), with a small percentage also from the 6th grade. Part of the corpus (1516 texts) is annotated with teachers' corrections using a system of labels described in the document available at https://www.clarin.si/repository/xmlui/bitstream/handle/11356/1589/Smernice-za-oznacevanje-korpusa-Solar_V1.1.pdf (in Slovenian).

📄 4 papers · ⬇ 34 · 🤗 HF · cc-by-nc-sa-4.0

Beijing · Emerging

The 'Beijing' dataset contains multi-year pollutant and meteorological time-series data used to evaluate the performance of various forecasting models for hourly PM2.5 prediction in Beijing, China.