

1 Summary

This script downloads WGI data and prepares it for a pressures (1 - WGI) and resilience data layer.

2 Updates from previous assessment

No methods updates; additional year of data added.

Consider this improvement for future assessments: create a linear model to estimate missing data rather than just taking averages of years with data (~ line 230).
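A minimal sketch of that suggested improvement, assuming a long table with hypothetical columns `country`, `year`, and `score`. It fits a per-country linear trend with `lm()` and predicts only the missing years. The toy values and the helper name `gapfill_lm` are illustrative and do not come from the script.

```r
# Toy data: one country with two missing years (values are made up)
wgi <- data.frame(
  country = "A",
  year    = 2012:2017,
  score   = c(0.10, 0.20, NA, 0.40, NA, 0.60)
)

# Hypothetical helper: fill NA scores from a linear trend over year.
# Falls back to the mean when fewer than 2 years have data.
gapfill_lm <- function(df) {
  known <- df[!is.na(df$score), ]
  if (nrow(known) < 2) {
    df$score[is.na(df$score)] <- mean(known$score)
    return(df)
  }
  fit <- lm(score ~ year, data = known)
  missing <- is.na(df$score)
  df$score[missing] <- predict(fit, newdata = df[missing, , drop = FALSE])
  df
}

# Apply per country and recombine
wgi_filled <- do.call(rbind, lapply(split(wgi, wgi$country), gapfill_lm))
```

Unlike a plain average, a trend-based fill lets the estimate for a missing year follow the direction of the surrounding years.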


3 Data Source

Reference: http://info.worldbank.org/governance/wgi/index.aspx#home

Downloaded: March 11 2019 (data updated Sep 21 2018)

Description:
The Worldwide Governance Indicators (WGI) project reports aggregate and individual governance indicators for 215 economies over the period 1996–2017, for six dimensions of governance:

  • Voice and Accountability
  • Political Stability and Absence of Violence
  • Government Effectiveness
  • Regulatory Quality
  • Rule of Law
  • Control of Corruption

Time range: 1996-2017


4 Obtain the WGI data

Download each of the 6 WGI indicators:

Combine the indicators into a single table, with a column for each indicator, and rows for each country-year pair.
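The combine step can be sketched as follows, assuming each indicator has already been read into a long data frame with hypothetical columns `country`, `year`, and `value`. The download itself is omitted, only two of the six indicators are shown, and the values are placeholders.

```r
# Placeholder long tables, one per indicator (made-up values)
indicators <- list(
  VA = data.frame(country = c("A", "B"), year = 2017, value = c(0.5, -0.2)),
  RL = data.frame(country = c("A", "B"), year = 2017, value = c(1.1,  0.3))
)

# Rename each value column to its indicator name
named <- Map(function(df, nm) {
  names(df)[names(df) == "value"] <- nm
  df
}, indicators, names(indicators))

# Full join on country-year so a pair missing from one indicator is kept as NA
wgi_wide <- Reduce(
  function(x, y) merge(x, y, by = c("country", "year"), all = TRUE),
  named
)
```

A full join (`all = TRUE`) is used here so that gaps stay visible as `NA` for the gapfilling steps that follow.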

4.1 Save a record of any new raw data for archival purposes

Uncomment the lines in this code chunk when updating the WGI data. This will most likely happen when calculating scores for a new assessment year:
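A minimal sketch of a date-stamped archive write. The folder, the file name pattern `wgi_combined_scores_<date>.csv`, and the stand-in table are all hypothetical, since the script's actual paths are not shown here.

```r
# Hypothetical output folder; the real script's path is not shown here
raw_dir <- file.path(tempdir(), "raw")
dir.create(raw_dir, showWarnings = FALSE, recursive = TRUE)

# Stand-in for the combined WGI table
wgi_raw <- data.frame(country = "A", year = 2017, VA = 0.5)

# Append the download date so older copies are never overwritten
date_stamp <- format(Sys.Date(), "%Y-%m-%d")
raw_file <- file.path(raw_dir, paste0("wgi_combined_scores_", date_stamp, ".csv"))
write.csv(wgi_raw, raw_file, row.names = FALSE)
```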

5 Gapfill, part 1: filling missing years of data for indicators, within countries

The first gapfilling step uses the average of previous years' data within each region/indicator. It applies when a region has data for an indicator, but not for all years.
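A simplified sketch of this step, assuming a long table with hypothetical columns `country`, `indicator`, `year`, and `value`. For brevity it fills each gap with the mean of all years that have data for that country and indicator, which may differ in detail from the script's chunk.

```r
wgi_long <- data.frame(
  country   = c("A", "A", "A", "B", "B", "B"),
  indicator = "VA",
  year      = rep(2015:2017, 2),
  value     = c(0.2, NA, 0.4, NA, NA, NA)
)

# Mean of the years with data, per country and indicator (NaN if none)
grp <- interaction(wgi_long$country, wgi_long$indicator, drop = TRUE)
grp_mean <- ave(wgi_long$value, grp,
                FUN = function(x) mean(x, na.rm = TRUE))

# Record which values were gapfilled, then fill them
wgi_long$gapfilled <- is.na(wgi_long$value) & !is.nan(grp_mean)
wgi_long$value <- ifelse(wgi_long$gapfilled, grp_mean, wgi_long$value)
```

Country B has no data in any year, so it stays `NA` and is left for a different gapfilling method.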

Read in WGI data - change appended date in file name to reflect the most recent version of the saved WGI data:

5.1 Safeguard: cut regions with < 4 indicators (if any) to calculate score.

Once gapfilling is complete, the WGI scores are calculated as an average of the 6 indicators. However, if a country is missing 4 or more of the indicators within a year, the average would be very biased. In these cases, a different method should be used to gapfill these data.

(NOTE: for the 2019 assessment all regions had at least 3 of the 6 indicators).
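The safeguard can be sketched as a count of missing indicators per country-year. The column names, the toy values, and the cutoff variable `max_missing` are assumptions. The cutoff follows the "missing 4 or more" wording above.

```r
# Toy wide table: one row per country-year, one column per indicator
wgi_wide <- data.frame(
  country = c("A", "B"),
  year    = 2017,
  VA = c(0.5, NA), PV = c(0.1, NA), GE = c(0.3, NA),
  RQ = c(NA, NA),  RL = c(0.2, NA), CC = c(0.4, 0.6)
)
indicator_cols <- c("VA", "PV", "GE", "RQ", "RL", "CC")

# Count missing indicators per country-year
wgi_wide$na_count <- rowSums(is.na(wgi_wide[, indicator_cols]))

# Assumed cutoff, following "missing 4 or more" in the text
max_missing <- 3
countries_low_data <- unique(wgi_wide$country[wgi_wide$na_count > max_missing])

# Drop those countries so a different gapfilling method can handle them
wgi_kept <- wgi_wide[!(wgi_wide$country %in% countries_low_data), ]
```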

6 Calculate overall WGI score for each country

This involves:

  • taking the average of the 6 indicators (assuming there are at least 4 of the 6 indicators)
  • rescaling the data from 0 to 1
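A minimal sketch of the two steps above, with made-up values. It assumes the WGI estimates are bounded by roughly -2.5 and 2.5 (the variable `wgi_range` is illustrative), and it also derives the pressure value mentioned in the summary as 1 - WGI.

```r
wgi_wide <- data.frame(
  country = c("A", "B"),
  VA = c(-2.5, 1.0), PV = c(-2.5, 0.0), GE = c(-2.5, NA),
  RQ = c(-2.5, 1.0), RL = c(-2.5, 0.0), CC = c(-2.5, 0.5)
)
indicator_cols <- c("VA", "PV", "GE", "RQ", "RL", "CC")

# Average the available indicators on the original WGI scale
wgi_wide$score_wgi_scale <- rowMeans(wgi_wide[, indicator_cols], na.rm = TRUE)

# Assumed bounds of the WGI estimates; adjust if the source range differs
wgi_range <- c(-2.5, 2.5)
wgi_wide$score <- (wgi_wide$score_wgi_scale - wgi_range[1]) /
  (wgi_range[2] - wgi_range[1])

# Pressure layer as described in the summary: 1 - WGI
wgi_wide$pressure <- 1 - wgi_wide$score
```

Rescaling against fixed bounds, rather than the observed minimum and maximum, keeps scores comparable across assessment years.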

7 Convert country names to ohi regions

## We report these regions at a greater spatial resolution:

## Aruba is part of the Netherlands Antilles, but it is reported separately
country_split_1 <- data.frame(country = "Netherlands Antilles", region = c('Bonaire', 'Curacao', 'Saba', 'Sint Maarten', 'Sint Eustatius'))
country_split_2 <- data.frame(country = "Jersey, Channel Islands", region = c('Jersey', 'Guernsey'))
country_split <- rbind(country_split_1, country_split_2)

country_split_data <- country_split %>%
  left_join(d_calcs, by = "country") %>%
  select(-country) %>%
  rename(country = region)

d_calcs <- d_calcs %>%
  filter(!(country %in% c("Netherlands Antilles", "Jersey, Channel Islands"))) %>%
  rbind(country_split_data)  %>%
  mutate(country = as.character(country))

d_calcs$country[grep("Korea, Dem.", d_calcs$country)] <- "North Korea"
# Maybe in future update package with country synonym list


## Function to convert to OHI region ID
d_calcs_rgn <- name_2_rgn(df_in = d_calcs, 
                       fld_name='country', 
                       flds_unique=c('year'))
# Eswatini is a landlocked country (aka Swaziland) in Southern Africa

## Combine the duplicate regions (we report these at lower resolution)
## In this case, we take the weighted average
population_weights <- data.frame(country = c("Virgin Islands (U.S.)", "Puerto Rico",
                                             "China", "Hong Kong SAR, China", "Macao SAR, China"),
                                 population = c(107270, 3337180,
                                         1386395000, 7391700, 622570))
# updated population values on 1 Apr 2019 (source: World Bank website, 2017 values)


d_calcs_rgn <- d_calcs_rgn %>%
  left_join(population_weights, by="country") %>%
  mutate(population = ifelse(is.na(population), 1, population)) %>% 
  group_by(rgn_id, year) %>%
  summarize(score = weighted.mean(score, population),
            gapfill_within_rgn = weighted.mean(gap_fill, population)) %>%
  ungroup() %>%
  filter(rgn_id <= 250)

summary(d_calcs_rgn)

8 Gapfill, part 2: Filling in missing territorial region value

Assign each missing territorial region the mean of its parent country's value and the values of territorial regions with data that share the same sov_id.
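A sketch of this rule on a toy table. The columns `rgn_id`, `sov_id`, `year`, and `score`, and the method label `"sov_mean"`, are assumptions about the structure of `d_sovs` and are not taken from the script.

```r
# Toy table: region 1 is the parent (rgn_id == sov_id); 2 and 3 are territories
d_sovs <- data.frame(
  rgn_id = c(1, 2, 3),
  sov_id = c(1, 1, 1),
  year   = 2017,
  score  = c(0.8, 0.6, NA)
)

# Mean of all available scores sharing a sovereign and year
sov_mean <- ave(d_sovs$score, d_sovs$sov_id, d_sovs$year,
                FUN = function(x) mean(x, na.rm = TRUE))

# Keep real values, fill gaps with the sovereign-group mean, record the method
d_gf <- d_sovs
d_gf$gapfill_method <- ifelse(is.na(d_sovs$score) & !is.nan(sov_mean),
                              "sov_mean", NA)
d_gf$score <- ifelse(is.na(d_sovs$score), sov_mean, d_sovs$score)
```

Here region 3 receives the mean of the parent (0.8) and the territory with data (0.6).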

Define new data object from d_sovs which includes gapfill method and gapfilled scores:

Add region names and clean the region data, and make sure we have all the regions:

8.1 Look at data table for the territories (gapfilled)

9 Check data

Compare this year’s values against last year’s. These should be the same unless there have been updates to the WGI source data or a change to methods. For this year, there was a small change that affected a few territorial regions: in the past we used the sovereign country value, but in this case we averaged the sovereign country and the available territorial values.

Plot the most recent year shared between last year’s and this year’s data, and look for a relationship close to 1:1. If points are far off the line, look at the original (raw) data to investigate.
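A numeric complement to the plot, sketched on toy data: join the two assessments on the most recent shared year and flag regions whose score moved by more than a tolerance. The table names `old` and `new` and the tolerance `tol` are arbitrary choices for illustration.

```r
# Toy stand-ins for last year's and this year's output layers
old <- data.frame(rgn_id = 1:3, year = 2016L, score = c(0.50, 0.70, 0.20))
new <- data.frame(rgn_id = rep(1:3, 2), year = rep(2016:2017, each = 3),
                  score = c(0.50, 0.70, 0.35, 0.52, 0.71, 0.36))

# Most recent year present in both assessments
shared_year <- max(intersect(old$year, new$year))

cmp <- merge(old[old$year == shared_year, ], new[new$year == shared_year, ],
             by = c("rgn_id", "year"), suffixes = c("_old", "_new"))

# Flag regions that moved more than an arbitrary tolerance
tol <- 0.1
outliers <- cmp$rgn_id[abs(cmp$score_new - cmp$score_old) > tol]
```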

Look at top/bottom 10 regions to make sure these seem reasonable:

Look at a summary to confirm scores are between 0 and 1, there are 220 regions, and there are no NAs (for this particular dataset):
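These checks can also be made to fail loudly. A minimal sketch on a toy layer, where the table name `final` and the variable `expected_regions` are illustrative:

```r
# Toy final layer; the real one should have 220 regions
final <- data.frame(rgn_id = 1:3, year = 2017L, score = c(0, 0.45, 1))
expected_regions <- 3  # 220 for the real data, per the text above

stopifnot(
  all(final$score >= 0 & final$score <= 1),          # scores within 0-1
  length(unique(final$rgn_id)) == expected_regions,  # all regions present
  !anyNA(final$score)                                # no missing scores
)
```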

10 Save the data

Save gapfilling and data for this assessment year.

Checking on outliers in 2019 assessment: