All the CSV files at the download centre share the same structure: a table consisting of rows and columns. Each row represents an observation for a geographical entity at a given point in time. The columns are documented below.

The SQLite database contains two tables: the table location where each row represents a geographical entity, and the table timeseries with the time-series observations for each location. The two tables can be joined on the column id. The columns are consistent with the CSV files and are documented below.
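The join described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the in-memory database, the identifier 'abc123', the area name, and the counts are made up, and the placement of administrative_area_level_1 in location and of date and confirmed in timeseries is an assumption based on the table descriptions above.

```python
import sqlite3


def load_joined(conn, location_id):
    """Return (date, confirmed, area name) rows for one geographical entity."""
    query = """
        SELECT t.date, t.confirmed, l.administrative_area_level_1
        FROM timeseries AS t
        JOIN location AS l ON l.id = t.id
        WHERE t.id = ?
        ORDER BY t.date
    """
    return conn.execute(query, (location_id,)).fetchall()


# Toy in-memory database standing in for the real SQLite file (made-up values).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id TEXT PRIMARY KEY, administrative_area_level_1 TEXT);
    CREATE TABLE timeseries (id TEXT, date TEXT, confirmed INTEGER);
    INSERT INTO location VALUES ('abc123', 'Exampleland');
    INSERT INTO timeseries VALUES ('abc123', '2020-03-02', 15);
    INSERT INTO timeseries VALUES ('abc123', '2020-03-01', 10);
""")

rows = load_joined(conn, "abc123")
for row in rows:
    print(row)
```

With the real database, the same query should work after replacing ":memory:" with the path to the downloaded file, provided the columns are located in the tables as assumed here.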

Identifiers

Variable Description
id Unique identifier for the geographical entity.
date Observation date in the format YYYY-MM-DD.

Epidemiological variables

Confirmed cases, deaths, recovered, tests, and vaccination data are cumulative counts. When computing daily counts by differencing, you may end up with negative numbers. This is a known issue caused by cumulative counts that decrease in the data published by the original provider. COVID-19 Data Hub applies no cleaning procedure for this kind of issue, which is typically due to changes in the data collection methodology. Refer to the data sources to identify the original providers and obtain more information about the data. If the provider corrects the data retroactively, the changes are reflected in this dataset.
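As a minimal sketch of how negative daily counts arise, the helper below differences a cumulative series and reports the positions where the result is negative. The function name daily_counts and the sample values are hypothetical, and leaving negative values untouched (rather than clipping or smoothing them) is a choice made here for illustration, not a recommendation from the data provider.

```python
def daily_counts(cumulative):
    """Difference consecutive cumulative values.

    The first entry is None (no previous day), and so is any entry
    where the current or previous cumulative value is missing.
    """
    if not cumulative:
        return []
    daily = [None]
    for prev, curr in zip(cumulative, cumulative[1:]):
        daily.append(None if prev is None or curr is None else curr - prev)
    return daily


# Made-up cumulative series with a downward revision on the third day.
confirmed = [100, 120, 115, 130]
daily = daily_counts(confirmed)

# Positions where the cumulative count decreased in the source data.
negative_days = [i for i, d in enumerate(daily) if d is not None and d < 0]
print(daily, negative_days)
```

How to treat the flagged days (keep, drop, or redistribute) depends on the analysis and is left to the user.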

Variable Description
confirmed Cumulative number of confirmed cases.
deaths Cumulative number of deaths.
recovered Cumulative number of patients released from hospitals or reported recovered.
tests Cumulative number of tests.
vaccines Cumulative number of total doses administered.
people_vaccinated Cumulative number of people who received at least one vaccine dose.
people_fully_vaccinated Cumulative number of people who received all doses prescribed by the vaccination protocol.
hosp Number of hospitalized patients on date.
icu Number of hospitalized patients in intensive therapy on date.
vent Number of patients requiring invasive ventilation on date.
population Total population.

Policy measures

Policy measures are provided by the Oxford Covid-19 Government Response Tracker. From the documentation: “Policies often vary by region within countries. The Tracker codes the most stringent government policy that is in place in a country/territory, as represented by the highest ordinal value. Sometimes the most stringent policy in a country/territory will only apply to a small part of the population. If the most stringent policy is only present in a limited geographic area, the Tracker uses a binary flag variable to denote this limited scope. The indicators have a flag for whether they are “targeted” to a specific geographical region (flag=0) or whether they are a “general” policy that is applied across the whole country/territory (flag=1).”

COVID-19 Data Hub uses these flags and indicators as follows:

  • General policies (flag=1) are encoded with a positive value.
  • Targeted policies (flag=0) are encoded with a negative value.
  • Missing flags are treated as flag=0 (targeted policy) to represent the fact that the policy may vary within the country.
  • All the policies for a country are also used for its sub-national areas (levels 2 and 3), unless sub-national policies are directly available.
  • When data on sub-national policies are available (level 2), policies at level 3 are inherited from the policies at level 2.

In short: positive integers identify policies applied to the entire administrative area. Negative integers identify policies that represent a best guess of the policy in force, but may not represent the real status of the given area. The negative sign is used solely to distinguish the two cases; it should not be treated as a real negative value.
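The sign convention above can be unpacked as in the following sketch. The names Policy and decode_policy are hypothetical, and the handling of 0 is an assumption: zero carries no sign, so the scope is left undetermined.

```python
from typing import NamedTuple, Optional


class Policy(NamedTuple):
    level: int                  # ordinal policy level, always non-negative
    area_wide: Optional[bool]   # True: applies to the whole area; False: best guess


def decode_policy(value):
    """Split an encoded policy value into its level and scope.

    Returns None for missing values. A value of 0 has no sign, so its
    scope is reported as None (an assumption made for this sketch).
    """
    if value is None:
        return None
    value = int(value)
    if value == 0:
        return Policy(level=0, area_wide=None)
    return Policy(level=abs(value), area_wide=value > 0)


print(decode_policy(3))    # policy of level 3 applied to the entire area
print(decode_policy(-2))   # level 2, targeted or inherited from an upper level
```

Decoding before any aggregation avoids treating the sign as a magnitude, for example when averaging or plotting policy levels.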

Variable Description
school_closing 0 - no measures
1 - recommend closing or all schools open with alterations resulting in significant differences compared to non-Covid-19 operations
2 - require closing (only some levels or categories, eg just high school, or just public schools)
3 - require closing all levels
workplace_closing 0 - no measures
1 - recommend closing (or recommend work from home) or all businesses open with alterations resulting in significant differences compared to non-Covid-19 operation
2 - require closing (or work from home) for some sectors or categories of workers
3 - require closing (or work from home) for all-but-essential workplaces (eg grocery stores, doctors)
cancel_events 0 - no measures
1 - recommend cancelling
2 - require cancelling
gatherings_restrictions 0 - no restrictions
1 - restrictions on very large gatherings (the limit is above 1000 people)
2 - restrictions on gatherings between 101-1000 people
3 - restrictions on gatherings between 11-100 people
4 - restrictions on gatherings of 10 people or less
transport_closing 0 - no measures
1 - recommend closing (or significantly reduce volume/route/means of transport available)
2 - require closing (or prohibit most citizens from using it)
stay_home_restrictions 0 - no measures
1 - recommend not leaving house
2 - require not leaving house with exceptions for daily exercise, grocery shopping, and ‘essential’ trips
3 - require not leaving house with minimal exceptions (eg allowed to leave once a week, or only one person can leave at a time, etc)
internal_movement_restrictions 0 - no measures
1 - recommend not to travel between regions/cities
2 - internal movement restrictions in place
international_movement_restrictions 0 - no restrictions
1 - screening arrivals
2 - quarantine arrivals from some or all regions
3 - ban arrivals from some regions
4 - ban on all regions or total border closure
information_campaigns 0 - no Covid-19 public information campaign
1 - public officials urging caution about Covid-19
2 - coordinated public information campaign (eg across traditional and social media)
testing_policy 0 - no testing policy
1 - only those who both (a) have symptoms AND (b) meet specific criteria (eg key workers, admitted to hospital, came into contact with a known case, returned from overseas)
2 - testing of anyone showing Covid-19 symptoms
3 - open public testing (eg “drive through” testing available to asymptomatic people)
contact_tracing 0 - no contact tracing
1 - limited contact tracing; not done for all cases
2 - comprehensive contact tracing; done for all identified cases
facial_coverings 0 - No policy
1 - Recommended
2 - Required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible
3 - Required in all shared/public spaces outside the home with other people present or all situations when social distancing not possible
4 - Required outside the home at all times regardless of location or presence of other people
vaccination_policy 0 - No availability
1 - Availability for ONE of following: key workers/ clinically vulnerable groups (non elderly) / elderly groups
2 - Availability for TWO of following: key workers/ clinically vulnerable groups (non elderly) / elderly groups
3 - Availability for ALL of following: key workers/ clinically vulnerable groups (non elderly) / elderly groups
4 - Availability for all three plus partial additional availability (select broad groups/ages)
5 - Universal availability
elderly_people_protection 0 - no measures
1 - Recommended isolation, hygiene, and visitor restriction measures in LTCFs and/or elderly people to stay at home
2 - Narrow restrictions for isolation, hygiene in LTCFs, some limitations on external visitors and/or restrictions protecting elderly people at home
3 - Extensive restrictions for isolation and hygiene in LTCFs, all non-essential external visitors prohibited, and/or all elderly people required to stay at home and not leave the home with minimal exceptions, and receive no external visitors

The Oxford Covid-19 Government Response Tracker calculates several indices to give an overall impression of government activity, as described in the OxCGRT documentation. COVID-19 Data Hub propagates the index value from upper-level areas to lower-level areas, as described above for individual policy measures. When the index is not provided directly for the given area and is inherited from an upper-level area, we place a minus sign in front of the value of the index. The negative sign is used solely to distinguish the two cases; it should not be treated as a real negative value.

Variable Description
government_response_index The index records how the response of governments has varied over all indicators in the OxCGRT database, becoming stronger or weaker over the course of the outbreak.
stringency_index The index records the strictness of ‘lockdown style’ policies that primarily restrict people’s behaviour.
containment_health_index The index combines ‘lockdown’ restrictions and closures with measures such as testing policy and contact tracing, short-term investment in healthcare, as well as investments in vaccines.
economic_support_index The index records measures such as income support and debt relief.

Administrative areas

Variable Description
administrative_area_level Level of the administrative area: 1 for countries; 2 for states, regions, cantons, or local equivalent; 3 for cities, municipalities, or local equivalent.
administrative_area_level_1 Name of the administrative area of level 1.
administrative_area_level_2 Name of the administrative area of level 2.
administrative_area_level_3 Name of the administrative area of level 3.

Coordinates

Latitude and longitude are computed using st_point_on_surface, which is guaranteed to return a point on the surface of the administrative area.

Variable Description
latitude Latitude.
longitude Longitude.

ISO codes

Variable Description
iso_alpha_3 3-letter code of the country according to the standard ISO 3166-1 Alpha-3.
iso_alpha_2 2-letter code of the country according to the standard ISO 3166-1 Alpha-2.
iso_numeric Numeric code of the country according to the standard ISO 3166-1 Numeric.
iso_currency 3-letter code of the currency used in the country according to the standard ISO 4217.

External Keys

Variable Description
key_local The administrative area identifier used by the local authorities, usually the national institute of statistics or local equivalent. E.g., key_local are FIPS codes for United States, IBGE codes for Brazil, ISTAT codes for Italy, etc.
key_google_mobility The place_id used in Google Mobility Reports. The identifier can also be used to interact with the Google Places and Google Maps APIs.
key_apple_mobility The administrative area identifier used in Apple Mobility Reports. This is constructed by concatenating region and, when not null, sub-region in Apple Mobility Reports: i.e., <region>, <sub-region>.
key_jhu_csse The administrative area identifier used in the JHU CSSE Unified COVID-19 Dataset. In particular, this makes it possible to match administrative areas with the Hydromet dataset.
key_nuts The 2021 NUTS codes for Europe, by Eurostat. This is a one-to-many relation as one NUTS code can be associated with multiple administrative areas. The lowest level used is NUTS 3. Local Administrative Units (LAU) are mapped to NUTS 3.
key_gadm The identifier (GID) used in the GADM database version 3.6. This is a one-to-one relation as one GID is associated with a unique administrative area. For some administrative areas, it is not possible to find the exact correspondence in the GADM database. In these cases we proceed as follows: (a) when the administrative area is the union of multiple areas in GADM, we use the GID of the area closest to the centroid; (b) when the administrative area is a subdivision not present in GADM, we create its GID by appending .n_0 to the GID of the upper division, where n is a sequential integer and 0 denotes the version number (see the GADM documentation for more about GIDs).
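As an illustration of how key_apple_mobility is described above, the following sketch concatenates region and sub-region with a comma and a space, falling back to the region alone when the sub-region is null. The function name apple_mobility_key and the sample names are hypothetical, and the exact separator is an assumption read off the pattern <region>, <sub-region>.

```python
def apple_mobility_key(region, sub_region=None):
    """Build a key of the form '<region>, <sub-region>'.

    When sub_region is None, the key is the region alone.
    """
    if sub_region is None:
        return region
    return f"{region}, {sub_region}"


# Made-up names, used only to show the two cases.
print(apple_mobility_key("Some Region"))
print(apple_mobility_key("Some Region", "Some Sub-region"))
```

A key built this way can then be compared against key_apple_mobility to match rows from Apple Mobility Reports to administrative areas, assuming the separator matches the one used in the dataset.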

Terms of use

We have invested a lot of time and effort in creating COVID-19 Data Hub. If you use it, please cite it as described under “Cite as” below.

The output data files are published under the CC BY license. All other code and assets are published under the GPL-3 license.

Cite as

Guidotti, E., Ardia, D., (2020), “COVID-19 Data Hub”, Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.

A BibTeX entry for LaTeX users is:

@Article{guidotti2020,
    title = {COVID-19 Data Hub},
    year = {2020},
    doi = {10.21105/joss.02376},
    author = {Emanuele Guidotti and David Ardia},
    journal = {Journal of Open Source Software},
    volume = {5},
    number = {51},
    pages = {2376}
}

The implementation details and the latest version of the data are described in:

Guidotti, E., (2022), “A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution”, Sci Data 9(1):112, doi: 10.1038/s41597-022-01245-1

A BibTeX entry for LaTeX users is:

@Article{guidotti2022,
    title = {A worldwide epidemiological database for COVID-19 at fine-grained spatial resolution},
    year = {2022},
    doi = {10.1038/s41597-022-01245-1},
    author = {Emanuele Guidotti},
    journal = {Scientific Data},
    volume = {9},
    number = {1},
    pages = {112}
}
