Funded by the Institute for Data Valorization IVADO in 2020. Supported by the R Consortium from 2021 to 2024. Funded by the University of Lugano USI in 2025.

The goal of COVID-19 Data Hub is to provide the research community with a unified dataset by collecting worldwide fine-grained case data, merged with exogenous variables helpful for a better understanding of COVID-19.

JOB OFFER (published November 5, 2024): Hiring a research assistant with a Master’s or higher degree to work on the COVID-19 Data Hub, starting ASAP! The position is funded by the University of Lugano, Switzerland. Possibility to work partially or fully remotely. Read more here.

Download the data

All the data are provided at the download centre.

Unified dataset

The dataset includes an extensive list of epidemiological variables, several policy measures by Oxford’s government response tracker, and a set of external keys to match the data with Google and Apple mobility reports, with the Hydromet dataset, and with spatial databases such as Eurostat for Europe or GADM worldwide.
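As a rough illustration of how an external key can be used to match the data with a mobility report, here is a minimal sketch of a left join on key and date. All records are made up, and the names `key_google_mobility`, `place_id` and `retail_change` are assumptions for illustration, not confirmed column names of either dataset.

```python
# Hypothetical epidemiological rows; "key_google_mobility" is an assumed column name.
cases = [
    {"id": "A1", "date": "2020-03-01", "confirmed": 10, "key_google_mobility": "place_a"},
    {"id": "B2", "date": "2020-03-01", "confirmed": 4, "key_google_mobility": None},
]

# Hypothetical mobility records; "place_id" and "retail_change" are assumed names.
mobility = [
    {"place_id": "place_a", "date": "2020-03-01", "retail_change": -12},
]


def join_on_key(case_rows, mobility_rows):
    """Left-join case rows with mobility rows on (external key, date)."""
    index = {(m["place_id"], m["date"]): m for m in mobility_rows}
    merged = []
    for row in case_rows:
        match = index.get((row["key_google_mobility"], row["date"]))
        # Rows without a matching mobility record keep None for the new field.
        merged.append({**row, "retail_change": match["retail_change"] if match else None})
    return merged


merged = join_on_key(cases, mobility)
for row in merged:
    print(row["id"], row["date"], row["confirmed"], row["retail_change"])
```

Rows with a missing key are kept and get an empty mobility value, so no epidemiological observation is dropped by the join.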

Software packages

The R and Python packages simplify the interaction with the Data Hub. In general, it is possible to import the data in any software by reading the CSV files provided at the download centre.
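For readers who prefer not to use the packages, the sketch below shows one way to read such a CSV file with the Python standard library only. The inline sample stands in for a downloaded file, and the column names (`id`, `date`, `confirmed`, `deaths`) are assumptions for illustration; check the header of the file you actually download.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file from the download centre.
# The column names below are assumptions for illustration.
SAMPLE_CSV = """id,date,confirmed,deaths
A1,2020-03-01,10,0
A1,2020-03-02,15,1
B2,2020-03-01,,0
"""


def load_rows(handle):
    """Parse CSV rows, converting empty cells to None and counts to int."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {"id": raw["id"], "date": raw["date"]}
        for column in ("confirmed", "deaths"):
            row[column] = int(raw[column]) if raw[column] else None
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Keep the last non-missing confirmed count per id
# (the sample rows are already in date order within each id).
latest_confirmed = {}
for row in rows:
    if row["confirmed"] is not None:
        latest_confirmed[row["id"]] = row["confirmed"]

print(latest_confirmed)
```

To read a real file, pass `open(path, newline="")` to `load_rows` instead of the `io.StringIO` sample, where `path` points to the CSV you downloaded.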

Data transparency

The data acquisition pipeline is open source. All the code used to generate the data files can be found at our GitHub repository. In principle, one can use the function covid19 from the repository to generate the same data available at the download centre. However, this takes between 1 and 2 hours, so downloading the pre-computed files is typically more convenient. The full list of sources from which the data are pulled is available here.

Research reproducibility

Because most governments update the data retroactively, we provide vintage data to simplify the reproducibility of academic research. These are immutable snapshots of the data taken each day. We gratefully acknowledge financial support from the R Consortium in maintaining the vintage data.

Academic publications

The first version of the project is described in “COVID-19 Data Hub”, Journal of Open Source Software, 2020. The implementation details and the latest version of the data are described in “A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution”, Scientific Data, Nature, 2022. You can browse the publications that use COVID-19 Data Hub here and here. Please cite our paper(s) when using COVID-19 Data Hub.

Contribute

If you find some issues with the data, please report a bug at our GitHub repository.

Terms of use

By using COVID-19 Data Hub, you agree to our terms of use.

Authors

The project was initiated via the R package COVID19 developed by Emanuele Guidotti (University of Neuchâtel), leveraged by David Ardia (HEC Montréal) via the funding by IVADO, enhanced by an awesome open source community, and it is maintained by Emanuele Guidotti.

Logo courtesy of Gary Sandoz and Talk-to-Me.

Supported by

R Consortium, IVADO, HEC Montréal, Hack Zurich, Università degli Studi di Milano