This document describes the steps for obtaining the data used to calculate the tourism and recreation goal for the 2019 global assessment.
The general calculation is: tr = Ep * Sr * Tw, and Xtr = tr / (90th quantile of tr across regions).
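As a rough illustration of the calculation above, the following base R sketch computes tr and Xtr for a few made-up regions. The input values are hypothetical, and the reading of Ep, Sr, and Tw as employment proportion, sustainability score, and travel warning multiplier is an assumption, since this section does not define them.

```r
# Hypothetical inputs for four regions (all values are made up for illustration)
tr_data <- data.frame(
  rgn_id = 1:4,
  Ep = c(0.02, 0.10, 0.05, 0.08),  # assumed meaning: proportion of jobs in tourism
  Sr = c(0.6, 0.7, 0.5, 0.8),      # assumed meaning: sustainability score
  Tw = c(1, 1, 0.25, 0)            # travel warning multiplier
)

# tr = Ep * Sr * Tw
tr_data$tr <- tr_data$Ep * tr_data$Sr * tr_data$Tw

# Reference point: 90th quantile of tr across regions
ref_point <- unname(quantile(tr_data$tr, probs = 0.9, na.rm = TRUE))

# Xtr = tr / reference point (values above 1 are possible at this stage)
tr_data$Xtr <- tr_data$tr / ref_point
```

A region with a travel warning multiplier of 0 ends up with Xtr = 0 regardless of its other inputs.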
WTTC data include projections 10 years into the future. In 2018 it was unclear where these projections began, so 2017 was used as the maximum data year. When downloading data for the 2019 assessment, the WTTC data gateway clearly showed where the real data ended and the projections began, so 2019 was used as the maximum data year.
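A minimal sketch of how a maximum data year can be used to drop projected values is shown here. The `wttc` data frame and its `jobs_pct` column are hypothetical stand-ins, not the actual WTTC file layout or the logic in the project's processing scripts.

```r
# Maximum year of real (non-projected) WTTC data
year_max <- 2019

# Hypothetical WTTC-style series running through 2029 (values are made up)
wttc <- data.frame(
  year = 2015:2029,
  jobs_pct = seq(5, by = 0.1, length.out = 15)
)

# Keep only years with real data; later years are projections
wttc_real <- wttc[wttc$year <= year_max, ]
```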
We also updated the code to account for uninhabited/low population areas, as we did with the Artisanal Fishing Opportunities data prep script.
Previously, only US State Department data were used to identify travel warnings for each country. In 2019 we incorporated advisories from the Canadian government to fill in data for countries for which the US did not issue warnings (such as the US itself). The Canadian advisories were matched to the numeric scale used by the US State Department since 2018, which ranges from level 1 (normal precautions) to level 4 (do not travel).
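One way to match text advisories to the 1-4 scale is a simple lookup vector, sketched here. The advisory wording, the mapping itself, and the example countries are assumptions for illustration; the matching actually used for the 2019 raw file may differ.

```r
# Hypothetical lookup from Canadian advisory wording to the 1-4 US scale
# (wording and mapping are assumptions, not taken from the raw data)
canada_to_us <- c(
  "Exercise normal security precautions" = 1,
  "Exercise a high degree of caution"    = 2,
  "Avoid non-essential travel"           = 3,
  "Avoid all travel"                     = 4
)

# Made-up advisories for placeholder countries
canada_adv <- data.frame(
  country  = c("Country A", "Country B", "Country C"),
  advisory = c("Exercise normal security precautions",
               "Avoid non-essential travel",
               "Avoid all travel"),
  stringsAsFactors = FALSE
)

# Translate the advisory text into a numeric level
canada_adv$level <- unname(canada_to_us[canada_adv$advisory])
```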
Data on the US State Dept website span 2018 and 2019, but all of these will be considered advisory data for 2019, regardless of when they were created.
In 2019 we also gapfilled travel warnings for missing territorial regions, using administrative country data. In 2018 a multiplier of 1 was applied to each region with no data. In 2019 all of these regions also received a multiplier of 1 based on the admin country advisories.
We were able to update the following data:
Tourism sustainability data from the WEF Travel and Tourism Competitiveness Report were not updated, as the 2019 report had not been released as of 15 July 2019.
#library(devtools)
#devtools::install_github("ohi-science/ohicore@dev")
library(ohicore)
library(tidyverse)
library(stringr)
library(WDI)
library(here)
library(janitor)
library(plotly)
source('https://raw.githubusercontent.com/OHI-Science/ohiprep_v2019/gh-pages/workflow/R/common.R')
## maximum year of wttc data:
year_max <- 2019
source(here("globalprep/tr/v2019/R/tr_fxns.R"))
These data are from the World Travel & Tourism Council. We use “direct” employment data (see mazu: git-annex/globalprep/_raw_data/WTTC/d2019/README.md for instructions on obtaining data). The data extend to 2029, which includes 10 years of projections. The actual data go to 2019 (projected and real data are differentiated on the data gateway chart).
These data are cleaned and formatted using the R/process_WTTC.R script. Missing values are gapfilled using the UN georegion information.
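The idea of georegion-based gapfilling can be sketched as follows: a missing value is replaced by the mean of the other regions in the same georegion. This is only an illustration with made-up data and hypothetical column names; the statistic and georegion hierarchy used in R/process_WTTC.R may differ.

```r
# Hypothetical employment proportions with two missing values
emp <- data.frame(
  rgn_id    = 1:5,
  georegion = c("A", "A", "A", "B", "B"),
  Ep        = c(0.04, NA, 0.06, 0.10, NA),
  stringsAsFactors = FALSE
)

# Mean of the non-missing values within each georegion, repeated per row
geo_mean <- ave(emp$Ep, emp$georegion,
                FUN = function(x) mean(x, na.rm = TRUE))

# Record which rows were gapfilled, then fill them
emp$gapfilled <- as.integer(is.na(emp$Ep))
emp$Ep <- ifelse(is.na(emp$Ep), geo_mean, emp$Ep)
```

Keeping a `gapfilled` flag alongside the filled values makes it possible to report which regions rely on estimates.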
The primary source of information is the U.S. State Department; the secondary source is the Canadian government.
For future assessments it would be worthwhile to see if data can be “scraped” directly from the government websites into R. This seems possible given the new format of the State Department travel warning data.
The following code is used to transform the warnings into a multiplier that is used to calculate tourism and recreation scores. Data from each country are copied from the US and Canadian government travel websites, pasted into an Excel file, and saved as a .csv in the raw folder (tr_travelwarning_20??_raw.csv).
Date downloaded: 1 July 2019
Date range of warnings: 18 June 2018 - 1 July 2019 (note: regardless of date of the warning, the advisory year will be the assessment year)
##Reading and wrangling 2019 warning data
warn_raw <- read.csv(here('globalprep/tr/v2019/raw/tr_travelwarning_2019_raw.csv'), na.strings = " ") %>%
mutate(country = as.character(country))
# Remove text information from level and filter out regional warnings
warn_clean <- warn_raw %>%
mutate(level = as.numeric(str_extract(level, '[1-4]'))) %>% # extract the first digit 1-4 (the original class '[1,2,3,4]' would also match a comma)
filter(!(regional %in% 1)) %>% # remove regions that have regional warnings, as those are no longer considered in the assessment
select(assess_year, level, country) %>%
rename(year = assess_year)
## Correct regions that are reported together - check to make sure these are necessary and that everything is covered as data change from year to year. Also make sure level data is coming through for each.
french_indies <- data.frame(country="French West Indies",
country_new =c("Northern Saint-Martin"), stringsAsFactors = FALSE) %>%
left_join(filter(warn_clean, country=="French West Indies"), by = "country") %>%
select(country=country_new, year, level)
BES <- data.frame(country="Bonaire, Sint Eustatius and Saba",
country_new =c("Saba", "Sint Eustatius"), stringsAsFactors = FALSE) %>% # Bonaire already reported separately
left_join(filter(warn_clean, country=="Bonaire, Sint Eustatius and Saba"), by = "country") %>%
select(country=country_new, year, level)
line <- data.frame(country="Line Islands (Kiribati)",
country_new =c("Line Group", "Phoenix Group"), stringsAsFactors = FALSE) %>%
left_join(filter(warn_clean, country=="Line Islands (Kiribati)"), by = "country") %>%
select(country=country_new, year, level)
# These are not the region names reported in OHI, but they are used in the name_2_rgn function
warn_improved <- filter(warn_clean, country != "French West Indies") %>%
bind_rows(french_indies)
warn_improved <- filter(warn_improved, country != "Bonaire, Sint Eustatius and Saba") %>%
bind_rows(BES)
warn_improved <- filter(warn_improved, country != "Line Islands (Kiribati)") %>%
bind_rows(line)
## Correct names for regions not identified by the name_2_rgn function, based on error messages after running the function below
# Change names to match those reported by OHI (not always necessary)
warn_improved <- warn_improved %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"Israel"), "Israel", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country, "^Republic of the Congo"), "Republique du Congo", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"U.S. Virgin Islands"), "Puerto Rico and Virgin Islands of the United States", country)) %>% # creates duplicate of PR/VI with same warning level
dplyr::mutate(country = ifelse(stringr::str_detect(country,"Puerto Rico"), "Puerto Rico and Virgin Islands of the United States", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"Guadeloupe"), "Guadeloupe and Martinique", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"Saint Vincent and The Grenadines"), "Saint Vincent and the Grenadines", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"Burma"), "Myanmar", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"North Korea"), "North Korea", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"Solomon Island"), "Solomon Islands", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"Guam"), "Northern Mariana Islands and Guam", country)) %>%
dplyr::mutate(country = ifelse(stringr::str_detect(country,"Northern Mariana Islands"), "Northern Mariana Islands and Guam", country))
# Look at warn_improved and remove other regions that are duplicated or aren't reported in the OHI. Make sure there are no NAs.
get_dupes(warn_improved)
warn_improved <- warn_improved %>%
filter(!duplicated(country))
write_csv(warn_improved, here("globalprep/tr/v2019/intermediate/warning_2019.csv"))
Travel warning | Multiplier | Description |
---|---|---|
Level 1 | 1 (no penalty) | Exercise Normal Precautions: This is the lowest advisory level for safety and security risk. There is some risk in any international travel. |
Level 2 | 1 (no penalty) | Exercise Increased Caution: Be aware of heightened risks to safety and security. |
Level 3 | 0.25 | Reconsider Travel: Avoid travel due to serious risks to safety and security. |
Level 4 | 0 (full penalty, results in zero scores) | Do Not Travel: This is the highest advisory level due to greater likelihood of life-threatening risks. |
warn_complete <- read_csv(here("globalprep/tr/v2019/intermediate/warning_2019.csv"))
scores <- data.frame(level = c(1, 2, 3, 4), multiplier = c(1, 1, 0.25, 0))
warn_multiplier <- warn_complete %>%
left_join(scores, by="level") %>%
group_by(year, country) %>%
mutate(warning_count = n()) %>%
ungroup()
# Check to see if there are regions with more than one warning (in general there should be no regions with more than one advisory, but some are combined after being reported separately and may have different advisories)
warn_count <- filter(warn_multiplier, warning_count>1)
# If warn_count has any rows, multipliers from duplicate regions will be averaged:
warn_multiplier <- warn_multiplier %>%
group_by(year, country) %>%
summarize(multiplier = mean(multiplier)) %>%
ungroup() # drop the remaining grouping by year before saving
#Save file with 2019 multiplier data
write_csv(warn_multiplier, here("globalprep/tr/v2019/intermediate/warning.csv"))
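To make the level-to-multiplier step concrete, here is a small base R sketch with made-up regions. It joins the levels to the multipliers from the table above and averages the multipliers where a region appears more than once. The region names and levels are hypothetical.

```r
# Multipliers for each advisory level (same values as in the table above)
scores <- data.frame(level = c(1, 2, 3, 4), multiplier = c(1, 1, 0.25, 0))

# Hypothetical advisories: "Region A" appears twice with different levels
warn <- data.frame(
  year    = 2019,
  country = c("Region A", "Region A", "Region B"),
  level   = c(1, 3, 4),
  stringsAsFactors = FALSE
)

# Attach the multiplier for each level
warn <- merge(warn, scores, by = "level")

# Average multipliers within each year and country
warn_avg <- aggregate(multiplier ~ year + country, data = warn, FUN = mean)
```

Here "Region A" receives the mean of 1 and 0.25, and "Region B" keeps its single multiplier of 0.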
Many European regions now have a travel warning due to increased terrorism (e.g., United Kingdom, Italy, Spain, Germany), although this doesn’t show up in the following figure because these regions previously had no travel warning (and were thus NA).
The change to no longer penalize subregional warnings tended to reduce the penalty (i.e., increase the multiplier value).
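# warn_rgn (multipliers matched to OHI regions, e.g. with name_2_rgn) and georegion_labels are assumed to already be loaded at this point
georegions <- georegion_labels %>%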
select(rgn_id)
warn_rgn_spread <- warn_rgn %>%
spread(year, multiplier) %>%
full_join(georegions, by=c("rgn_id")) %>%
data.frame() %>%
gather(year, multiplier, starts_with("X")) %>%
mutate(year = gsub("X", "", year)) %>%
filter(rgn_id <= 250) %>%
filter(rgn_id != 213) # Filter out Antarctica
# Check number of regions reported - should be 220
table(warn_rgn_spread$year)
# Identify territories without advisories and connect them with multipliers for admin regions
region_data() # reload common.R if this isn't working
warn_rgn_nas <- warn_rgn_spread %>%
filter(is.na(rgn_name)) %>%
select(rgn_id) %>%
left_join(rgns_eez, by = "rgn_id") %>%
select(rgn_id, rgn_name, admin_rgn_id, admin_country_name)
admin_rgn_multipliers <- warn_rgn_nas %>%
select(rgn_id = admin_rgn_id) %>%
left_join(warn_rgn_spread, by = "rgn_id") %>%
rename(admin_rgn_id = rgn_id) %>%
select(admin_rgn_id, year, multiplier) %>%
filter(!duplicated(admin_rgn_id))
warn_rgn_nas <- warn_rgn_nas %>%
left_join(admin_rgn_multipliers, by = "admin_rgn_id") %>%
select(rgn_id, rgn_name, year, multiplier)
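The administrative-country gapfilling can be illustrated with a toy example in base R. The region ids, multipliers, and the `admin` lookup table are hypothetical; in the workflow above the lookup comes from rgns_eez.

```r
# Hypothetical multipliers: region 3 is a territory with no advisory of its own
warn <- data.frame(rgn_id = c(1, 2, 3), multiplier = c(1, 0.25, NA))

# Hypothetical lookup of each region's administering region
admin <- data.frame(rgn_id = c(1, 2, 3), admin_rgn_id = c(1, 2, 2))

# Find each region's administering region, then that region's multiplier
admin_id   <- admin$admin_rgn_id[match(warn$rgn_id, admin$rgn_id)]
admin_mult <- warn$multiplier[match(admin_id, warn$rgn_id)]

# Flag and fill regions that have no advisory of their own
warn$gapfilled  <- as.integer(is.na(warn$multiplier))
warn$multiplier <- ifelse(is.na(warn$multiplier), admin_mult, warn$multiplier)
```

Region 3 inherits the multiplier of region 2, its administering region, and is flagged as gapfilled.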
### Finalize warnings data and save
# Remove NAs from warn_rgn_spread, then add them back in
warn_rgn_spread <- warn_rgn_spread %>%
filter(!is.na(rgn_name)) %>%
bind_rows(warn_rgn_nas)
warn_rgn_all_rgns <- warn_rgn_spread %>%
select(rgn_id, year, multiplier) %>%
arrange(year, rgn_id) %>%
mutate(year = as.numeric(year))
# Check again that we have 220 regions reported
table(warn_rgn_all_rgns$year)
# Save 2019 data
write_csv(warn_rgn_all_rgns, here('globalprep/tr/v2019/output/tr_travelwarnings_2019only.csv'))
## Combine with previous years' data and save
travel_warnings_all <- read_csv(here("globalprep/tr/v2018/output/tr_travelwarnings.csv")) %>%
bind_rows(warn_rgn_all_rgns)
write_csv(travel_warnings_all, here('globalprep/tr/v2019/output/tr_travelwarnings.csv'))
## Create gapfill file (run after the combined file has been written, since it is read back in here)
# Add information about gapfill method to regions that were filled based on administrative country advisory above
travelwarning_gf <- read_csv(here("globalprep/tr/v2019/output/tr_travelwarnings.csv")) %>%
mutate(gapfilled = ifelse(rgn_id %in% warn_rgn_nas$rgn_id, 1, 0)) %>%
mutate(method = ifelse(gapfilled == 1, "Gapfilled based on administrative country advisory", NA)) %>%
select(-multiplier)
write_csv(travelwarning_gf, here('globalprep/tr/v2019/output/tr_travelwarnings_gf.csv'))