ohi logo
OHI Science | Citation policy

REFERENCE RMD FILE

1 Summary

This document describes the steps for obtaining and wrangling the data used to calculate the mariculture population pressure subdimension for the 2019 global assessment. The general data preparation calculations are summarized here. For context and explanation see the mariculture (subgoal of food provision) model summary.

2 Updates from previous assessment

New 25 mile inland buffer raster was created; previous version did not include some regions. Hosting all spatial files on Mazu server instead of github to decrease processing time.


3 Data Sources

3.1 Gridded Population of the World (v4.11) by CIESE and CIAT

Reference: http://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-adjusted-to-2015-unwpp-country-totals-rev11/data-download

Center for International Earth Science Information Network - CIESIN - Columbia University. 2018. Gridded Population of the World, Version 4 (GPWv4): Population Density Adjusted to Match 2015 Revision UN WPP Country Totals, Revision 11. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/H4F47M65. Accessed DAY MONTH YEAR.

Downloaded: April 17 2019

Description: The Gridded Population of the World, Version 4 (GPWv4): Population Density Adjusted to Match 2015 Revision of UN WPP Country Totals, Revision 11 consists of estimates of human population density (number of persons per square kilometer) based on counts consistent with national censuses and population registers with respect to relative spatial distribution, but adjusted to match the 2015 Revision of the United Nation’s World Population Prospects (UN WPP) country totals, for the years 2000, 2005, 2011, 2015, and 2020. A proportional allocation gridding algorithm, utilizing approximately 13.5 million national and sub-national administrative units, was used to assign UN WPP-adjusted population counts to 30 arc-second grid cells. The density rasters were created by dividing the UN WPP-adjusted population count raster for a given target year by the land area raster. Documentation for gridded population of the world is located here.

Native data resolution: 30 arc-seconds

Time range: 2000-2020

Format: GeoTiff

3.2 UN Population

UN population data is used in this data prep routine to check population counts; spatially distributed counts derived from the gridded world population are aggregated by region and checked against the UN population estimates.

Reference: https://esa.un.org/unpd/wpp/

Downloaded: August 24, 2017

Description: Population (in thousands) for countries.

Native data resolution: Country scores

Time range: 1950-2015

Format: Excel file


4 Setup

Load all relevant libraries including parallel processing packages

Define frequently used pathnames. Change scenario and data years in file pathnames code chunk to reflect the most recent data (d) and current assessment year (v).

5 Methods and Calculations

5.1 Reproject and Resample

Gridded world population densities from CIESE and CIAT are projected into the World Mollweide coordinate reference system and resampled using nearest neighbor method to match the ocean raster which is sourced from spatial_commons.R and has 934.4789 x 934.4789 meter resolution.

## Resample resolution and CRS of population data, convert year to numeric, project rasters and save human_density_year_mol.tifs to server with these updates
# Use temporary file path name (_tmp.tif) so that R isn't trying to load and write the same file within the loop - MAKE SURE TO DELETE THESE FILES AFTER THE LOOP IS COMPLETE! 
# set up a loop to go through raw data frame, convert to a mollweide CRS:

cl <- makeCluster(5) # 5 clusters for 5 files being targeted
registerDoParallel(cl)
foreach(i = raw, .packages = c("raster", "rgdal", "stringr")) %dopar% {
#  i = raw[1] # send this down alone and check to see which file it calls; see if there are any errors in file naming
  yr <- as.numeric(as.character(str_sub(i, -17, -14))) # counts backwards by character in file name to grab the year digits
  raster::raster(i) %>% # read in file path to raster
    raster::projectRaster(crs = crs(ocean), res = res(ocean), over = TRUE, method = "ngb", # convert large grids to small grids using nearest neighbor method
                          filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_%s_mol_tmp.tif", scenario_yr, yr)), # write to the server because the files are so large
                          overwrite = TRUE, progress = "text")
  
raster::raster(file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_%s_mol_tmp.tif", scenario_yr, yr))) %>% 
  raster::resample(ocean, method = 'ngb', overwrite = TRUE, filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_%s_mol.tif", scenario_yr, yr)))
} 

# read in the created resampled raster and make sure it matches the eez_25mi_inland raster (projection and extents)


stopCluster(cl)
closeAllConnections()

# this takes a while to run: ~30-45 min; will generate some warnings 


# for(rast in raw){
#   yr <- as.numeric(as.character(str_sub(rast, -8, -5)))
#   raster(rast) %>%
#     projectRaster(crs = crs(ocean), over=TRUE) %>%
#     resample(ocean, method = 'ngb',
#              filename = file.path(dir_github, sprintf("int/human_density_%s_mol.tif", yr)),
#              overwrite = TRUE)
# }

5.2 Interpolate between Years

GWPv4 data is for years 2005, 2010, 2015, and 2020. Data for missing years must be generated by interpolation.

## define and apply function to calculate yearly change in population density
files <- list.files(file.path(dir_M, "git-annex/globalprep/mar_prs_population/v2019/int"), pattern = "human_density_\\d{4}_mol.tif$", full = TRUE)
# Can I replace v2019 with a %s so this is reproducible for next year?

# Create function for average yearly change
yearly_diff <- function(year_min, year_max, density_files = files){
  rast_diff <- stack(density_files[grep(year_max, density_files)], density_files[grep(year_min, density_files)]) %>%
    overlay(fun = function(x, y){(x - y)/5},
            filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/yearly_change_%s_to_%s.tif", scenario_yr, year_min, year_max)),
            overwrite = TRUE)
}


# Create four rasters for each set of data
yearly_diff(year_min = 2000, year_max = 2005) # this should save 2001, 2002, 2003, etc to estimate intervening years using a linear model - check to make sure it's being saved in the Mazu directory properly
yearly_diff(year_min = 2005, year_max = 2010)
yearly_diff(year_min = 2010, year_max = 2015)
yearly_diff(year_min = 2015, year_max = 2020)


## define and apply function to interpolate population densities for years between years in dataset
# Apply the difference to each year, overwrite as new rasters
# Can double check this by running just one (i = 1)
yearly_interpolate <- function(year, start_raster, yearly_change){
    for(i in 1:4){
      # i = 1 send this down to see what the loop produces when i=1 
      raster::overlay(raster::raster(start_raster), raster::raster(yearly_change), fun = function(x, y){(x + i*y)},
              filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_%s_mol.tif", scenario_yr, (year+i))),
              overwrite = TRUE)
    }
}


## interpolate missing years
files <- list.files(file.path(dir_M, "git-annex/globalprep/mar_prs_population/v2019/int"), pattern = ".tif$", full = TRUE)
# Can I replace v2019 with a %s so this is reproducible next year?

for(i in c(2000, 2005, 2010, 2015)) {
  yearly_interpolate(i, start_raster = files[grep(sprintf("density_%s_mol",i), files)],
                     yearly_change = files[grep(sprintf("change_%s_to_%s", i, i+5), files)])
}

# if getting a "Cannot create a RasterLayer object from this file" error after running this loop, check that previously-created _tmp.tif files have been deleted from the mazu directory
# check a random raster from the mazu directory
density_2018 <- raster(file.path(dir_M, "git-annex/globalprep/mar_prs_population/v2019/int/human_density_2018_mol.tif"))
plot(density_2018)

5.3 Convert Density to Population Counts

Density data are converted to population by multiplying density times the area of the cell (934.4789 x 934.4789m) and converting to square km from square meters, to get people per square km. GWPv4 provides both densities and population counts. However, we derive population counts from density because the density data is independent of raster scale, so we can reproject/resample and it will still be correct. Change the scale using count data, and it is necessaary to correct by dividing by cell area otherwise the values become meaningless; it is easier to just start with density.

5.9 Data Checks and/or Meta-Analysis