Funded by the Institute for Data Valorization (IVADO), Canada.
The goal of COVID-19 Data Hub is to provide the research community with a unified dataset by collecting worldwide fine-grained case data, merged with exogenous variables helpful for a better understanding of COVID-19.
All the data are provided at the download centre.
The dataset includes an extensive list of epidemiological variables, several policy measures by Oxford’s government response tracker, and a set of external keys to match the data with Google and Apple mobility reports, with the Hydromet dataset, and with spatial databases such as Eurostat for Europe or GADM worldwide.
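As a rough illustration of how an external key can be used, the sketch below left-joins mobility records onto case rows. The column name `key_google_mobility`, the mobility fields `place_id` and `retail_change`, and all sample values are assumptions made for this example only; check the headers of the actual files before relying on them.

```python
def attach_mobility(cases, mobility, key="key_google_mobility"):
    """Left-join mobility records onto case rows by (external key, date)."""
    # Index the mobility records by the pair (place identifier, date).
    index = {(m["place_id"], m["date"]): m for m in mobility}
    merged = []
    for row in cases:
        match = index.get((row.get(key), row.get("date")))
        # Rows without a matching mobility record keep a None placeholder.
        merged.append({**row, "retail_change": match["retail_change"] if match else None})
    return merged


# Made-up sample rows standing in for the real files (hypothetical values).
cases = [
    {"id": "A", "date": "2020-03-01", "key_google_mobility": "place-1"},
    {"id": "B", "date": "2020-03-01", "key_google_mobility": ""},
]
mobility = [{"place_id": "place-1", "date": "2020-03-01", "retail_change": -12}]

merged = attach_mobility(cases, mobility)
print(merged[0]["retail_change"], merged[1]["retail_change"])
```

A left join is used here so that case rows without a mobility match are kept rather than dropped.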
We release R and Python packages to simplify the interaction with the Data Hub. In general, it is possible to import the data in any software by reading the CSV files provided at the download centre.
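As a minimal sketch of the CSV route, the snippet below reads a Data Hub style file with Python's standard library and keeps the rows for one country. The column names (`id`, `date`, `confirmed`, `iso_alpha_3`) and the sample rows are assumptions made for illustration, not a description of the actual files.

```python
import csv
import io


def load_country(source, iso_code):
    """Return the rows of a CSV file-like object whose iso_alpha_3 column equals iso_code."""
    reader = csv.DictReader(source)
    return [row for row in reader if row.get("iso_alpha_3") == iso_code]


# In-memory sample standing in for a downloaded file (values are made up).
sample = io.StringIO(
    "id,date,confirmed,iso_alpha_3\n"
    "A,2020-03-01,10,ITA\n"
    "A,2020-03-02,15,ITA\n"
    "B,2020-03-01,3,CAN\n"
)

rows = load_country(sample, "ITA")
# ISO dates sort correctly as strings, so max() picks the most recent row.
latest = max(rows, key=lambda r: r["date"])
print(latest["date"], int(latest["confirmed"]))
```

With a real file, the same function can be called with an open file handle, for example `open(path, newline="")`, where `path` points to a CSV saved from the download centre.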
The data acquisition pipeline is open source. All the code used to generate the data files can be found at our GitHub repository. In principle, one can use the function covid19 from the repository to generate the same data we provide at the download centre. However, this takes 1 to 2 hours, so downloading the pre-computed files is typically more convenient. The full list of data sources from which the data are pulled is also provided.
Because most governments update their data retroactively, we provide vintage data to simplify the reproducibility of academic research. These are immutable snapshots of the data taken each day. We gratefully acknowledge financial support from the R Consortium in maintaining the vintage data.
See the publications that use COVID-19 Data Hub.
If you find any issues with the data, please report a bug at our GitHub repository. Suggestions about where to find data that we do not currently provide are also very welcome! Help our project grow: star the repo!
By using COVID-19 Data Hub, you agree to our terms of use.