1 Summary

This document describes the steps for obtaining the data used to calculate the tourism and recreation goal for the 2019 global assessment.

The general calculation is: tr = Ep * Sr * Tw, and Xtr = tr / (90th quantile of tr across regions)

  • Ep = Proportion of workforce directly employed in tourism
  • Sr = (S-1)/5; Sustainability of tourism
  • Tw = A penalty applied to regions with travel warnings from the U.S. State Department (or the Government of Canada's Travel Advice and Advisories)
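
As a rough illustration of the calculation above, the sketch below applies the formula to a small made-up table. The data frame, its column names, and all values are hypothetical and only meant to show how the three terms and the 90th-quantile reference point combine.

```r
# Hypothetical values for five regions (not real OHI data)
tr_example <- data.frame(
  rgn_id = 1:5,
  Ep = c(0.02, 0.05, 0.10, 0.08, 0.01),  # proportion of workforce in tourism
  S  = c(3.5, 4.0, 5.0, 4.5, 3.0),       # tourism sustainability score
  Tw = c(1, 1, 0.25, 1, 0)               # travel warning multiplier
)

# Sr = (S - 1) / 5, then tr = Ep * Sr * Tw
tr_example$Sr <- (tr_example$S - 1) / 5
tr_example$tr <- tr_example$Ep * tr_example$Sr * tr_example$Tw

# Reference point: 90th quantile of tr across regions
ref_point <- quantile(tr_example$tr, probs = 0.9, na.rm = TRUE, names = FALSE)

# Xtr = tr / reference point
tr_example$Xtr <- tr_example$tr / ref_point
```

Regions above the reference point end up with Xtr greater than 1 in this sketch; how such values are handled downstream is not covered here.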

1.1 The following data are used:

  • Tourism sustainability: Travel and Tourism Competitiveness Index (TTCI) from World Economic Forum (WEF) (NOT updated for 2019)
  • Proportion of workforce directly employed in tourism: World Travel & Tourism Council (WTTC)
  • Travel warnings: (U.S. State Department and Canadian Government)
  • Per capita GDP: (World Bank with gaps filled using CIA data), used to gapfill missing values in Tourism sustainability (in previous years)

2 Updates from previous assessment

2.1 Tourism employment

WTTC data include projections 10 years into the future. In 2018 it was unclear where these projections began, so 2017 was used as the maximum data year. When downloading data for the 2019 assessment, the WTTC data gateway showed clearly where the real data ended and the projections began, so 2019 was used as the maximum data year.
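
One way to keep projected years out of the analysis is to filter on the maximum data year. This is a minimal sketch, assuming a hypothetical long-format table with `country`, `year`, and `value` columns; the actual processing lives in R/process_WTTC.R and may differ.

```r
year_max <- 2019  # last year of real (non-projected) data for the 2019 assessment

# Hypothetical WTTC-style series for one country, including projected years
wttc_example <- data.frame(
  country = "Exampleland",
  year    = 2015:2029,
  value   = seq(0.040, by = 0.001, length.out = 15)
)

# Keep only years up to and including year_max, dropping the projections
wttc_real <- wttc_example[wttc_example$year <= year_max, ]
```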

We also updated the code to account for uninhabited/low population areas, as we did with the Artisanal Fishing Opportunities data prep script.

2.2 Travel warnings

Previously, only U.S. State Department data were used to identify travel warnings for each country. In 2019 we incorporated advisories from the Canadian government to fill in data for countries for which the US did not issue warnings (such as the US itself). The Canadian advisories were matched to the numeric scale used by the U.S. State Department since 2018, which ranges from level 1 (normal precautions) to level 4 (do not travel).
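
The matching described above can be sketched as a lookup followed by a fallback. The Canadian wording, the exact mapping to levels 1-4, and the column names below are assumptions for illustration; in the actual workflow the advisories are entered by hand into the raw .csv file.

```r
# Assumed mapping from Canadian risk-level wording to the US 1-4 scale
canada_to_us <- c(
  "Exercise normal security precautions" = 1,
  "Exercise a high degree of caution"    = 2,
  "Avoid non-essential travel"           = 3,
  "Avoid all travel"                     = 4
)

# Hypothetical advisories; NA marks a country with no US advisory
advisories <- data.frame(
  country      = c("Exampleland", "United States", "Samplestan"),
  us_level     = c(2, NA, 4),
  canada_level = c("Exercise a high degree of caution",
                   "Exercise normal security precautions",
                   "Avoid all travel"),
  stringsAsFactors = FALSE
)

# Use the US level where it exists, otherwise the mapped Canadian level
advisories$level <- ifelse(
  is.na(advisories$us_level),
  unname(canada_to_us[advisories$canada_level]),
  advisories$us_level
)
```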

Data on the US State Dept website span 2018 and 2019, but all of these will be considered advisory data for 2019, regardless of when they were created.

In 2019 we also gapfilled travel warnings for missing territorial regions, using administrative country data. In 2018 a multiplier of 1 was applied to each region with no data. In 2019 all of these regions also received a multiplier of 1 based on the admin country advisories.

We were able to update the following data:

  • Proportion of jobs in tourism - WTTC data reported until 2029, but 2019 is most recent year of real data (year_max) (downloaded from WTTC on 07/08/2019)
  • Travel warnings for 2019 (downloaded from U.S. State Department and Canadian Government on 07/02/2019)

Tourism sustainability data from the WEF Travel and Tourism Competitiveness Report were not updated, as the 2019 report had not been released as of 15 July 2019.

3 Ep: Proportion of workforce directly employed in tourism

These data are from the World Travel & Tourism Council. We use “direct” employment data (see mazu: git-annex/globalprep/_raw_data/WTTC/d2019/README.md for instructions on obtaining data). The data extend to 2029, which includes 10 years of projections. The actual data go to 2019 (projected and real data are differentiated on the data gateway chart).

These data are cleaned and formatted using the R/process_WTTC.R script. Missing values are gapfilled using the UN georegion information.
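
To give a sense of what georegion-based gapfilling can look like, the sketch below fills missing values with the mean of the other regions in the same georegion. Using the mean is an assumption made for illustration, as are the data and column names; the method actually used is defined in R/process_WTTC.R.

```r
# Hypothetical Ep values with UN georegion labels (not real data)
ep_example <- data.frame(
  rgn_id    = 1:6,
  georegion = c("A", "A", "A", "B", "B", "B"),
  Ep        = c(0.04, 0.06, NA, 0.10, NA, 0.12)
)

# Mean of the available values within each georegion, repeated per row
georgn_mean <- ave(ep_example$Ep, ep_example$georegion,
                   FUN = function(x) mean(x, na.rm = TRUE))

# Flag rows that will be gapfilled, then fill them with the georegion mean
ep_example$gapfilled <- as.integer(is.na(ep_example$Ep))
ep_example$Ep <- ifelse(is.na(ep_example$Ep), georgn_mean, ep_example$Ep)
```

Keeping a `gapfilled` flag alongside the filled values makes it possible to report which regions were estimated rather than observed.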

3.1 Data check and outlier investigation

4 Tw: Travel warnings

The primary source of information is the U.S. State Department; the secondary source is the Canadian Government.

For future assessments, it would be worthwhile to see if data can be “scraped” directly from the government websites into R. This seems possible given the new format of the State Department travel warning data.

4.0.1 Getting data for 2019 assessment

The following code is used to transform the warnings into a multiplier that is used to calculate tourism and recreation scores. Data for each country are copied from the US and Canadian government travel websites, pasted into an Excel file, and saved as a .csv in the raw folder (tr_travelwarning_20??_raw.csv).

Date downloaded: 1 July 2019

Date range of warnings: 18 June 2018 - 1 July 2019 (note: regardless of date of the warning, the advisory year will be the assessment year)

4.1 After raw data are uploaded, wrangle and clean the new data:

##Reading and wrangling 2019 warning data

warn_raw <- read.csv(here('globalprep/tr/v2019/raw/tr_travelwarning_2019_raw.csv'), na.strings = " ") %>% 
  mutate(country = as.character(country)) 

# Remove text information from level and filter out regional warnings
warn_clean <- warn_raw %>% 
  mutate(level = as.numeric(str_extract(level, '[1-4]'))) %>% # extract the first digit 1-4 (the original class '[1,2,3,4]' also matched commas)
  filter(!(regional %in% 1)) %>% # remove regions that have regional warnings, as those are no longer considered in the assessment
  select(assess_year, level, country) %>% 
  rename(year = assess_year)


## Correct regions that are reported together - check to make sure these are necessary and that everything is covered as data change from year to year. Also make sure level data is coming through for each. 


french_indies <- data.frame(country="French West Indies", 
                            country_new =c("Northern Saint-Martin")) %>%
  left_join(filter(warn_clean, country=="French West Indies")) %>%
  select(country=country_new, year, level)

BES <- data.frame(country="Bonaire, Sint Eustatius and Saba", 
                            country_new =c("Saba", "Sint Eustatius")) %>% # Bonaire already reported separately 
  left_join(filter(warn_clean, country=="Bonaire, Sint Eustatius and Saba")) %>%
    select(country=country_new, year, level)


line <- data.frame(country="Line Islands (Kiribati)", 
                            country_new =c("Line Group", "Phoenix Group")) %>%
  left_join(filter(warn_clean, country=="Line Islands (Kiribati)")) %>%
    select(country=country_new, year, level)
# These are not the region names reported in OHI, but they are used in the name_2_rgn function
  
warn_improved <- filter(warn_clean, country != "French West Indies") %>%
  bind_rows(french_indies) 

warn_improved <- filter(warn_improved, country != "Bonaire, Sint Eustatius and Saba") %>%
  bind_rows(BES)

warn_improved <- filter(warn_improved, country != "Line Islands (Kiribati)") %>%
  bind_rows(line)



## Correct names for regions not identified by the name_2_rgn function, based on error messages after running the function below
# Change names to match those reported by OHI (not always necessary)
warn_improved <- warn_improved %>%
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"Israel"), "Israel", country)) %>%
  dplyr::mutate(country = ifelse(stringr::str_detect(country, "^Republic of the Congo"), "Republique du Congo", country)) %>% 
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"U.S. Virgin Islands"), "Puerto Rico and Virgin Islands of the United States", country)) %>% # creates duplicate of PR/VI with same warning level
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"Puerto Rico"), "Puerto Rico and Virgin Islands of the United States", country)) %>%
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"Guadeloupe"), "Guadeloupe and Martinique", country)) %>%
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"Saint Vincent and The Grenadines"), "Saint Vincent and the Grenadines", country)) %>% 
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"Burma"), "Myanmar", country)) %>%
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"North Korea"), "North Korea", country)) %>% 
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"Solomon Island"), "Solomon Islands", country)) %>% 
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"Guam"), "Northern Mariana Islands and Guam", country)) %>% 
  dplyr::mutate(country = ifelse(stringr::str_detect(country,"Northern Mariana Islands"), "Northern Mariana Islands and Guam", country))
  

# Look at warn_improved and remove other regions that are duplicated or aren't reported in the OHI. Make sure there are no NAs. 
get_dupes(warn_improved)

warn_improved <- warn_improved %>% 
  filter(!duplicated(country))

write_csv(warn_improved, here("globalprep/tr/v2019/intermediate/warning_2019.csv"))
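
Renaming several source names to one combined OHI region (for example Puerto Rico and the U.S. Virgin Islands) produces duplicate rows, and keeping the first row is only safe when the duplicates agree. The sketch below shows one way to check this before dropping duplicates; the data frame is hypothetical and the check is a suggestion, not part of the original script.

```r
# Hypothetical rows after renaming; two source names now share one region name
warn_example <- data.frame(
  country = c("Puerto Rico and Virgin Islands of the United States",
              "Puerto Rico and Virgin Islands of the United States",
              "Myanmar"),
  year  = 2019,
  level = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# Number of distinct levels per region name; more than one means a conflict
n_levels <- tapply(warn_example$level, warn_example$country,
                   function(x) length(unique(x)))
conflicts <- names(n_levels)[n_levels > 1]

# Stop if any duplicated region carries conflicting levels
stopifnot(length(conflicts) == 0)

# Safe to keep the first row per region
warn_dedup <- warn_example[!duplicated(warn_example$country), ]
```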

4.2 Transform the warnings into a multiplier that is used to calculate tourism and recreation scores.

| Travel warning | Multiplier | Description |
|---|---|---|
| Level 1 | 1 (no penalty) | Exercise Normal Precautions: This is the lowest advisory level for safety and security risk. There is some risk in any international travel. |
| Level 2 | 1 (no penalty) | Exercise Increased Caution: Be aware of heightened risks to safety and security. |
| Level 3 | 0.25 | Reconsider Travel: Avoid travel due to serious risks to safety and security. |
| Level 4 | 0 (full penalty, results in zero scores) | Do Not Travel: This is the highest advisory level due to greater likelihood of life-threatening risks. |
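
The conversion in the table above can be sketched as a named-vector lookup. The data frame and country names are hypothetical; only the level-to-multiplier values come from the table.

```r
# Lookup from advisory level to multiplier, following the table above
level_to_multiplier <- c("1" = 1, "2" = 1, "3" = 0.25, "4" = 0)

# Hypothetical cleaned warnings, one row per country
warn_example <- data.frame(
  country = c("Exampleland", "Samplestan", "Testonia", "Mockland"),
  year    = 2019,
  level   = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Index the lookup by level (as character) to get each row's multiplier
warn_example$multiplier <-
  unname(level_to_multiplier[as.character(warn_example$level)])
```

A level outside 1-4 would yield NA here, which makes unexpected values easy to spot.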

4.3 Convert names to OHI regions and clean.

4.4 Final step: Compare with previous year’s data

Many European regions now have a travel warning due to increased terrorism (e.g., United Kingdom, Italy, Spain, Germany), although this doesn’t show up in the following figure because these regions previously had no travel warning (and were thus NA).

The decision to no longer penalize subregional warnings tended to reduce the penalty (i.e., increase the multiplier value).

4.5 Gapfill territorial regions with admin country data and save the travel warning data in the output folder

georegions <- georegion_labels %>%
  select(rgn_id)
  
warn_rgn_spread <- warn_rgn %>%
  spread(year, multiplier) %>%
  full_join(georegions, by=c("rgn_id")) %>%
  data.frame() %>%
  gather(year, multiplier, starts_with("X")) %>%
  mutate(year = gsub("X", "", year)) %>%
  filter(rgn_id <= 250) %>%
  filter(rgn_id != 213) # Filter out Antarctica
  
# Check number of regions reported - should be 220 
table(warn_rgn_spread$year) 


# Identify territories without advisories and connect them with multipliers for admin regions 
region_data() # reload common.R if this isn't working
warn_rgn_nas <- warn_rgn_spread %>% 
  filter(is.na(rgn_name)) %>% 
  select(rgn_id) %>% 
  left_join(rgns_eez, by = "rgn_id") %>% 
  select(rgn_id, rgn_name, admin_rgn_id, admin_country_name)

admin_rgn_multipliers <- warn_rgn_nas %>% 
  select(rgn_id = admin_rgn_id) %>% 
  left_join(warn_rgn_spread, by = "rgn_id") %>% 
  rename(admin_rgn_id = rgn_id) %>% 
  select(admin_rgn_id, year, multiplier) %>%
  filter(!duplicated(admin_rgn_id))

warn_rgn_nas <- warn_rgn_nas %>% 
  left_join(admin_rgn_multipliers, by = "admin_rgn_id") %>% 
  select(rgn_id, rgn_name, year, multiplier)


### Finalize warnings data and save
# Remove NAs from warn_rgn_spread, then add them back in

warn_rgn_spread <- warn_rgn_spread %>%
  filter(!is.na(rgn_name)) %>% 
  bind_rows(warn_rgn_nas)
  
warn_rgn_all_rgns <- warn_rgn_spread %>%
  select(rgn_id, year, multiplier) %>%
  arrange(year, rgn_id) %>% 
  mutate(year = as.numeric(year))

# Check again that we have 220 regions reported
table(warn_rgn_all_rgns$year) 


# Save 2019 data
write_csv(warn_rgn_all_rgns, here('globalprep/tr/v2019/output/tr_travelwarnings_2019only.csv'))


## Create gapfill file
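# Add information about gapfill method to regions that were filled based on administrative country advisory above
# Note: v2019/output/tr_travelwarnings.csv is written in the final step below; run that step first if the file does not exist yet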

travelwarning_gf <- read_csv(here("globalprep/tr/v2019/output/tr_travelwarnings.csv")) %>% 
  mutate(gapfilled = ifelse(rgn_id %in% warn_rgn_nas$rgn_id, 1, 0)) %>% 
  mutate(method = ifelse(gapfilled == 1, "Gapfilled based on administrative country advisory", NA)) %>% 
  select(-multiplier)

write_csv(travelwarning_gf, here('globalprep/tr/v2019/output/tr_travelwarnings_gf.csv'))


## Combine with previous years' data and save

travel_warnings_all <- read_csv(here("globalprep/tr/v2018/output/tr_travelwarnings.csv")) %>%
  bind_rows(warn_rgn_all_rgns)

write_csv(travel_warnings_all, here('globalprep/tr/v2019/output/tr_travelwarnings.csv'))
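
The manual checks in the code above (220 regions per year, no NAs) could also be made automatic before saving. This is a minimal sketch on a hypothetical stand-in table; `expected_n` is set to 3 only to match the toy data and would be 220 for the real data.

```r
# Hypothetical stand-in for the combined travel warning table
tw_example <- data.frame(
  rgn_id     = rep(1:3, times = 2),
  year       = rep(c(2018, 2019), each = 3),
  multiplier = c(1, 0.25, 1, 1, 0, 1)
)

expected_n <- 3  # would be 220 for the real data, per the checks above

# Every year should report the expected number of regions
rgn_counts <- table(tw_example$year)
counts_ok <- all(rgn_counts == expected_n)

# Multipliers should only take the values in the penalty table (NA fails this)
values_ok <- all(tw_example$multiplier %in% c(0, 0.25, 1))

# No region should appear twice within a year
dupes_ok <- !any(duplicated(tw_example[, c("rgn_id", "year")]))

stopifnot(counts_ok, values_ok, dupes_ok)
```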