The goal of COVID-19 Data Hub is to provide the research community with a unified dataset by collecting worldwide fine-grained case data, merged with exogenous variables helpful for a better understanding of COVID-19.
All the data are provided at the download centre.
The dataset includes an extensive list of epidemiological variables, several policy measures by Oxford’s government response tracker, and a set of external keys to match the data with Google and Apple mobility reports, with the Hydromet dataset, and with spatial databases such as Eurostat for Europe or GADM worldwide.
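As a rough illustration of how an external key can be used, the sketch below joins a few toy rows of case data with a few toy rows of mobility data. It is a minimal example under stated assumptions, not the project's own code: the column names `key_google_mobility`, `place_id` and `retail_change`, the helper `left_join_on_key`, and all values are hypothetical and chosen only for the example.

```python
import csv
import io

# Toy stand-ins for a Data Hub file and a mobility report.
# Column names and values are assumptions made for this sketch.
HUB_CSV = """id,date,confirmed,key_google_mobility
A,2020-04-01,100,place_a
B,2020-04-01,50,place_b
C,2020-04-01,10,
"""

MOBILITY_CSV = """place_id,date,retail_change
place_a,2020-04-01,-40
place_b,2020-04-01,-25
"""


def left_join_on_key(hub_rows, ext_rows, hub_key, ext_key, on_date="date"):
    """Attach external fields to each hub row that matches on key and date.

    Hub rows without a match are kept unchanged (a left join).
    """
    # Index the external rows by (key, date) for constant-time lookup.
    index = {(r[ext_key], r[on_date]): r for r in ext_rows}
    merged = []
    for row in hub_rows:
        match = index.get((row[hub_key], row[on_date]), {})
        # Copy every external field except the join columns themselves.
        extra = {k: v for k, v in match.items() if k not in (ext_key, on_date)}
        merged.append({**row, **extra})
    return merged


hub_rows = list(csv.DictReader(io.StringIO(HUB_CSV)))
ext_rows = list(csv.DictReader(io.StringIO(MOBILITY_CSV)))
merged = left_join_on_key(hub_rows, ext_rows, "key_google_mobility", "place_id")

for row in merged:
    # Rows with an empty key (such as "C") simply carry no mobility field.
    print(row["id"], row["confirmed"], row.get("retail_change"))
```

Keeping unmatched rows means that locations without a mobility key stay in the result, so the epidemiological series is never shortened by the merge.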
The data acquisition pipeline is open source. All the code used to generate the data files can be found at our GitHub repository. In principle, one can use the function covid19 from the repository to regenerate the data available at the download centre. However, this takes 1 to 2 hours, so downloading the pre-computed files is typically more convenient. The full list of sources from which the data are pulled is available here.
Because most governments update their data retroactively, we provide vintage data to make academic research easier to reproduce. These are immutable snapshots of the data taken each day. We gratefully acknowledge financial support from the R Consortium in maintaining the vintage data.
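To see why snapshots matter for reproducibility, the following sketch compares two toy snapshots taken on consecutive days and lists the values that were revised in between. It is only an illustration: the column names, the helpers `load` and `revised_values`, and the figures are hypothetical and do not come from the actual vintage files.

```python
import csv
import io

# Two toy snapshots of the same series; all values are made up for this sketch.
SNAPSHOT_DAY_1 = """id,date,confirmed
A,2020-04-01,100
A,2020-04-02,120
"""

SNAPSHOT_DAY_2 = """id,date,confirmed
A,2020-04-01,100
A,2020-04-02,135
A,2020-04-03,150
"""


def load(text):
    """Map (id, date) to the reported value for one snapshot."""
    reader = csv.DictReader(io.StringIO(text))
    return {(r["id"], r["date"]): r["confirmed"] for r in reader}


def revised_values(old, new):
    """Return {(id, date): (old_value, new_value)} for entries that changed.

    Only entries present in both snapshots are compared, so newly added
    dates are not reported as revisions.
    """
    return {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}


revisions = revised_values(load(SNAPSHOT_DAY_1), load(SNAPSHOT_DAY_2))
for (location, date), (before, after) in sorted(revisions.items()):
    print(location, date, before, "->", after)
```

An analysis run on the first snapshot would use 120 for 2 April, while the same analysis on the second would use 135, which is why citing a fixed snapshot keeps results reproducible.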
The first version of the project is described in “COVID-19 Data Hub”, Journal of Open Source Software, 2020. The implementation details and the latest version of the data are described in “A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution”, Scientific Data, Nature, 2022. You can browse the publications that use COVID-19 Data Hub here and here. Please cite our paper(s) when using COVID-19 Data Hub.
If you find any issues with the data, please report a bug at our GitHub repository.