Install from pip with
pip install covid19dh
Importing the main function covid19()
from covid19dh import covid19
x, src = covid19()
Package is regularly updated. Update with
pip install --upgrade covid19dh
The function covid19() returns 2 pandas dataframes: the data and the references to the data sources.
country: List of country names (case-insensitive) or ISO codes (alpha-2, alpha-3 or numeric).
Fetching data from a particular country:
x, src = covid19("USA")  # United States
Specify multiple countries at the same time:
x, src = covid19(["ESP", "PT", "andorra", 250])
If country is omitted, the whole dataset is returned:
x, src = covid19()
raw: Logical. Skip data cleaning? Default True. If raw=False, the raw data are cleaned by filling missing dates with NaN values. This ensures that all locations share the same grid of dates and no single day is skipped. Then, NaN values are replaced with the previous non-NaN value or 0.
x, src = covid19(raw=False)
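The cleaning described above can be sketched in plain Python. This is only an illustration of the fill logic, not the package's implementation: the helper name fill_date_grid and the sample counts are hypothetical.

```python
from datetime import date, timedelta


def fill_date_grid(observed, start, end):
    """Return one (day, value) pair for every day from start to end.

    observed maps a date to a count; days missing from it are gaps.
    Hypothetical helper that mimics the behaviour described in the
    text, not the covid19dh internals.
    """
    filled = []
    previous = 0  # used when a gap occurs before any observation
    day = start
    while day <= end:
        value = observed.get(day)
        if value is None:
            value = previous  # carry the last known value forward
        previous = value
        filled.append((day, value))
        day += timedelta(days=1)
    return filled


# Made-up sample: 31 March, 2 April and 4 April are not observed.
observed = {date(2020, 4, 1): 10, date(2020, 4, 3): 15}
grid = fill_date_grid(observed, date(2020, 3, 31), date(2020, 4, 4))
values = [value for _, value in grid]
```

Every day in the range appears exactly once in grid, so locations processed this way share the same grid of dates.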
start, end: Dates can be specified as datetime.datetime, datetime.date or as a str in the format YYYY-mm-dd.
from datetime import datetime
x, src = covid19("SWE", start=datetime(2020, 4, 1), end="2020-05-01")
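If your own code needs to handle the same three kinds of date input, a small normalising helper may be useful. The sketch below is an assumption about how such a conversion could look; to_date is a hypothetical name and is not part of covid19dh.

```python
from datetime import date, datetime


def to_date(value):
    """Normalise a datetime, a date, or a 'YYYY-mm-dd' string to a date."""
    # datetime is a subclass of date, so it has to be checked first
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%d").date()
    raise TypeError(f"unsupported date value: {value!r}")


start = to_date(datetime(2020, 4, 1))
end = to_date("2020-05-01")
```

Once both ends are plain date objects they can be compared or subtracted directly, for example (end - start).days.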
level: Integer. Granularity level of the data:
1 - country level
2 - state, region or canton level
3 - city or municipality level
from datetime import date
x, src = covid19("USA", level=2, start=date(2020, 5, 1))
cache: Logical. Memory caching? Significantly improves performance on successive calls. By default, the cached data are used.
Caching can be disabled (e.g. for long running programs) by:
x, src = covid19("FRA", cache=False)
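The general idea of memory caching with an opt-out flag can be sketched as follows. This illustrates the concept only and makes no claim about how covid19dh stores its data; load_cached, slow_loader and the returned dictionary are hypothetical stand-ins.

```python
_cache = {}


def load_cached(key, loader, cache=True):
    """Return loader() for key, reusing a stored result when cache is True."""
    if cache and key in _cache:
        return _cache[key]
    result = loader()
    if cache:
        _cache[key] = result
    return result


calls = []


def slow_loader():
    # Stand-in for an expensive download; records each invocation.
    calls.append(1)
    return {"rows": 3}


first = load_cached("FRA", slow_loader)
second = load_cached("FRA", slow_loader)              # served from memory
third = load_cached("FRA", slow_loader, cache=False)  # forces a reload
```

In this sketch the loader runs only twice for three calls. A long-running program would bypass the cache for the opposite reason: a result stored hours ago may no longer reflect the latest data.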
vintage: Logical. Retrieve the snapshot of the dataset that was generated at the end date instead of using the latest version. Default False.
To fetch, for example, the US data that were accessible on 22nd April 2020, type
x, src = covid19("US", end="2020-04-22", vintage=True)
The vintage data are collected at the end of the day but published with a delay of approximately 48 hours, once the day is completed in all time zones.
Hence, if vintage = True but end is not set, a warning is raised and None is returned.
x, src = covid19("USA", vintage=True)  # too early to get today's vintage
UserWarning: vintage data not available yet
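Because None cannot be unpacked into two names, code that requests vintage data may want to check the result before unpacking it. The wrapper below is a hedged sketch: fetch_vintage is a hypothetical helper, and fake_covid19 is a stand-in that imitates the behaviour described above so that the example runs offline.

```python
import warnings


def fetch_vintage(fetch, *args, **kwargs):
    """Call fetch and return (x, src, messages).

    fetch is expected to behave like covid19() as described in the
    text: it returns either a pair of dataframes, or None together
    with a warning. x and src are None when no data came back.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = fetch(*args, **kwargs)
    messages = [str(w.message) for w in caught]
    if result is None:
        return None, None, messages
    x, src = result
    return x, src, messages


def fake_covid19(country, vintage=False, end=None):
    # Stand-in used only to make the sketch runnable without network access.
    if vintage and end is None:
        warnings.warn("vintage data not available yet", UserWarning)
        return None
    return "data", "sources"


x, src, messages = fetch_vintage(fake_covid19, "USA", vintage=True)
```

With the real package, covid19 would be passed in place of fake_covid19, assuming it returns None in this situation as stated above.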
We have invested a lot of time and effort in creating COVID-19 Data Hub; please cite the references below when using it.
The output data files are published under the CC BY license. All other code and assets are published under the GPL-3 license.
Guidotti, E., Ardia, D., (2020), “COVID-19 Data Hub”, Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.
A BibTeX entry for LaTeX users is:
@Article{guidotti2020,
title = {COVID-19 Data Hub},
year = {2020},
doi = {10.21105/joss.02376},
author = {Emanuele Guidotti and David Ardia},
journal = {Journal of Open Source Software},
volume = {5},
number = {51},
pages = {2376}
}
The implementation details and the latest version of the data are described in:
Guidotti, E., (2022), “A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution”, Sci Data 9(1):112, doi: 10.1038/s41597-022-01245-1
A BibTeX entry for LaTeX users is:
@Article{guidotti2022,
title = {A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution},
year = {2022},
doi = {10.1038/s41597-022-01245-1},
author = {Emanuele Guidotti},
journal = {Scientific Data},
volume = {9},
number = {1},
pages = {112}
}