This document describes the steps for obtaining and wrangling the data used to calculate the mariculture population pressure subdimension for the 2019 global assessment. The general data preparation calculations are summarized here. For context and explanation see the mariculture (subgoal of food provision) model summary.
A new 25-mile inland buffer raster was created; the previous version excluded some regions. All spatial files are now hosted on the Mazu server instead of GitHub to decrease processing time.
Center for International Earth Science Information Network - CIESIN - Columbia University. 2018. Gridded Population of the World, Version 4 (GPWv4): Population Density Adjusted to Match 2015 Revision UN WPP Country Totals, Revision 11. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/H4F47M65. Accessed 17 April 2019.
Downloaded: April 17, 2019
Description: The Gridded Population of the World, Version 4 (GPWv4): Population Density Adjusted to Match 2015 Revision of UN WPP Country Totals, Revision 11 consists of estimates of human population density (number of persons per square kilometer) based on counts consistent with national censuses and population registers with respect to relative spatial distribution, but adjusted to match the 2015 Revision of the United Nations' World Population Prospects (UN WPP) country totals, for the years 2000, 2005, 2010, 2015, and 2020. A proportional allocation gridding algorithm, utilizing approximately 13.5 million national and sub-national administrative units, was used to assign UN WPP-adjusted population counts to 30 arc-second grid cells. The density rasters were created by dividing the UN WPP-adjusted population count raster for a given target year by the land area raster. Documentation for gridded population of the world is located here. A quick inspection of one downloaded file is sketched after this metadata.
Native data resolution: 30 arc-seconds
Time range: 2000-2020
Format: GeoTiff
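Before any processing, it can be worth loading one of the downloaded density GeoTiffs to confirm the native resolution and units; a minimal sketch, with a hypothetical file name standing in for one of the downloaded rasters:
library(raster)
gpw_check <- raster("gpw_v4_population_density_rev11_2020_30_sec.tif") # hypothetical file name
res(gpw_check) # should be c(0.008333333, 0.008333333): 30 arc-seconds = 1/120 degree
crs(gpw_check) # native data are unprojected WGS84 longitude/latitude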
UN population data is used in this data prep routine to check population counts; spatially distributed counts derived from the gridded world population are aggregated by region and checked against the UN population estimates.
Reference: https://esa.un.org/unpd/wpp/
Downloaded: August 24, 2017
Description: Population (in thousands) for countries.
Native data resolution: Country level
Time range: 1950-2015
Format: Excel file
Load all relevant libraries including parallel processing packages
## set options for all chunks in code
knitr::opts_chunk$set(warning = FALSE, message = FALSE, fig.width = 6, fig.height = 4, fig.path = "figs/")
## load packages, installing them first where necessary
pkg <- c("raster", "rgdal", "sp", "sf", "fasterize", "tidyverse", "foreach", "parallel","doParallel")
new_pkg <- pkg[!(pkg %in% installed.packages())]
if (length(new_pkg)){install.packages(new_pkg)}
if (!("ohicore" %in% installed.packages())){devtools::install_github("ohi-science/ohicore")}
library(ohicore)
## spatial libraries
library(raster)
library(rgdal)
library(sp)
library(sf)
library(fasterize)
## data wrangling libraries
library(tidyverse)
library(here)
library(plotly)
## parallel processing libraries
library(parallel)
library(foreach)
library(doParallel)
Define frequently used file paths. Change the scenario and data years in the file paths code chunk below to reflect the most recent data (d) and the current assessment year (v).
## source common and spatial common files
source('http://ohi-science.org/ohiprep_v2019/workflow/R/common.R')
## Update these!
scenario_yr <- "v2019" # change to reflect assessment year!
data_yr_gpw <- "d2019" # change to reflect year of most recently downloaded data! (no change for v2019)
data_yr_un_pop <- "d2017" # change to reflect year of most recently downloaded data! (no change for v2019)
## define commonly used file paths (matched to what we did in ico_data_prep.Rmd)
dir_server <- file.path(dir_M, "git-annex/globalprep/_raw_data")
dir_github <- here("globalprep/mar_prs_population", scenario_yr)
# v2018: (replaced these in the script)
# path_raw_data <- file.path(dir_M, "git-annex/globalprep/_raw_data")
# path_mar_prs <- file.path(dir_M, "git-annex/globalprep/mar_prs_population", scenario_yr)
### Added this for v2019:
## Checking to see if there is a README in the mar_prs_pop and mar_prs_pop/v20?? folders - all goal prep files will need a readme!
if(!file.exists(file.path(dir_github, 'README.md'))) {
warning(sprintf('No README detected in %s', dir_github))
}
if(!file.exists(file.path(dirname(dir_github), 'README.md'))) {
warning(sprintf('No README detected in %s', file.path(dirname(dir_github))))
}
## read in the raw density data to be reprojected and resampled
raw <- list.files(file.path(dir_server, sprintf("CIESEandCIAT_population/%s", data_yr_gpw)),
full.names = TRUE, pattern = "\\.tif$",
recursive = TRUE)
raw <- raw[grep("density", raw)] # keep only the files that include the word "density"
raw # check that this looks correct; can double-check against the raw-data folder on the Mazu server
ohi_rasters() # loads rasters
# eez_raster <- zones # sourced from ohi_rasters()
## import raw, cleaned UN population data to be wrangled then used to confirm gpw-derived spatial population
pop <- read.csv(file.path(dir_server,
sprintf("UnitedNations_population/%s/UN_pop_clean.csv", data_yr_un_pop)),
strip.white = TRUE, stringsAsFactors = FALSE)
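The UN data are reported in thousands per country, while the data-checking section below compares per-region totals via a pop_rgn data frame. A minimal sketch of that wrangling step, assuming the cleaned file has country, year, and population columns (the column names are assumptions; name_2_rgn() is from ohicore):
## convert country-level UN population (in thousands) to OHI regions; a sketch only
pop_rgn <- name_2_rgn(df_in = pop, fld_name = "country", flds_unique = c("year")) %>%
dplyr::group_by(rgn_id, year) %>%
dplyr::summarize(population = sum(population, na.rm = TRUE) * 1000) %>% # thousands to persons
dplyr::ungroup()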
Gridded world population densities from CIESIN and CIAT are projected into the World Mollweide coordinate reference system and resampled using the nearest neighbor method to match the ocean raster, which is sourced from spatial_commons.R and has 934.4789 x 934.4789 meter resolution.
## Resample resolution and CRS of population data, convert year to numeric, project rasters and save human_density_year_mol.tifs to server with these updates
# Use temporary file path name (_tmp.tif) so that R isn't trying to load and write the same file within the loop - MAKE SURE TO DELETE THESE FILES AFTER THE LOOP IS COMPLETE!
# set up a loop to go through raw data frame, convert to a mollweide CRS:
cl <- makeCluster(5) # 5 clusters for 5 files being targeted
registerDoParallel(cl)
foreach(i = raw, .packages = c("raster", "rgdal", "stringr")) %dopar% {
# i = raw[1] # send this down alone and check to see which file it calls; see if there are any errors in file naming
yr <- as.numeric(as.character(str_sub(i, -17, -14))) # counts backwards by character in file name to grab the year digits
raster::raster(i) %>% # read in file path to raster
raster::projectRaster(crs = crs(ocean), res = res(ocean), over = TRUE, method = "ngb", # convert large grids to small grids using nearest neighbor method
filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_%s_mol_tmp.tif", scenario_yr, yr)), # write to the server because the files are so large
overwrite = TRUE, progress = "text")
raster::raster(file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_%s_mol_tmp.tif", scenario_yr, yr))) %>%
raster::resample(ocean, method = 'ngb', overwrite = TRUE, filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_%s_mol.tif", scenario_yr, yr)))
}
# after the loop, read a created resampled raster back in and make sure it matches the eez_25mi_inland raster (projection and extents); see the sketch below
stopCluster(cl)
closeAllConnections()
# this takes a while to run: ~30-45 min; will generate some warnings
# for(rast in raw){
# yr <- as.numeric(as.character(str_sub(rast, -8, -5)))
# raster(rast) %>%
# projectRaster(crs = crs(ocean), over=TRUE) %>%
# resample(ocean, method = 'ngb',
# filename = file.path(dir_github, sprintf("int/human_density_%s_mol.tif", yr)),
# overwrite = TRUE)
# }
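A minimal sketch of the check mentioned above, assuming the 2020 layer has been written by the loop (the eez_25mi_inland raster, loaded later, shares the same grid as ocean):
new_rast <- raster::raster(file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_2020_mol.tif", scenario_yr)))
raster::compareRaster(new_rast, ocean) # TRUE if extent, dimensions, and CRS all match; errors otherwise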
GPWv4 data are for the years 2000, 2005, 2010, 2015, and 2020. Data for the intervening years must be generated by interpolation.
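The interpolation is linear: each intervening year adds one fifth of the five-year difference, which is what the two functions below implement cell by cell. A worked example on a single cell (values are illustrative only):
## illustrative single-cell example of the linear interpolation
d_2000 <- 100; d_2005 <- 120 # densities for the two bounding years
yearly_change <- (d_2005 - d_2000) / 5 # 4 persons per km2 per year
d_2000 + (1:4) * yearly_change # 2001-2004: 104 108 112 116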
## define and apply function to calculate yearly change in population density
files <- list.files(file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int", scenario_yr)), pattern = "human_density_\\d{4}_mol.tif$", full.names = TRUE) # use scenario_yr rather than a hardcoded version so this is reproducible next year
# Create function for average yearly change
yearly_diff <- function(year_min, year_max, density_files = files){
rast_diff <- stack(density_files[grep(year_max, density_files)], density_files[grep(year_min, density_files)]) %>%
overlay(fun = function(x, y){(x - y)/5},
filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/yearly_change_%s_to_%s.tif", scenario_yr, year_min, year_max)),
overwrite = TRUE)
}
# Create one yearly-change raster for each five-year interval
yearly_diff(year_min = 2000, year_max = 2005) # saves yearly_change_2000_to_2005.tif, used below to estimate 2001-2004 with a linear model - check to make sure it's being saved in the Mazu directory properly
yearly_diff(year_min = 2005, year_max = 2010)
yearly_diff(year_min = 2010, year_max = 2015)
yearly_diff(year_min = 2015, year_max = 2020)
## define and apply function to interpolate population densities for years between years in dataset
# Apply the difference to each year, overwrite as new rasters
# Can double check this by running just one (i = 1)
yearly_interpolate <- function(year, start_raster, yearly_change){
for(i in 1:4){
# i = 1 send this down to see what the loop produces when i=1
raster::overlay(raster::raster(start_raster), raster::raster(yearly_change), fun = function(x, y){(x + i*y)},
filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_density_%s_mol.tif", scenario_yr, (year+i))),
overwrite = TRUE)
}
}
## interpolate missing years
files <- list.files(file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int", scenario_yr)), pattern = ".tif$", full.names = TRUE) # use scenario_yr rather than a hardcoded version so this is reproducible next year
for(i in c(2000, 2005, 2010, 2015)) {
yearly_interpolate(i, start_raster = files[grep(sprintf("density_%s_mol",i), files)],
yearly_change = files[grep(sprintf("change_%s_to_%s", i, i+5), files)])
}
# if getting a "Cannot create a RasterLayer object from this file" error after running this loop, check that previously-created _tmp.tif files have been deleted from the mazu directory
# check a random raster from the mazu directory
density_2018 <- raster(file.path(dir_M, "git-annex/globalprep/mar_prs_population/v2019/int/human_density_2018_mol.tif"))
plot(density_2018)
Density data are converted to population counts by multiplying each cell's density (persons per square km) by the cell area (934.4789 m x 934.4789 m, converted from square meters to square kilometers). GPWv4 provides both densities and population counts, but we derive counts from density because density is independent of raster scale: we can reproject and resample it and the values remain correct. If we changed the scale of count data instead, we would need to correct by dividing by cell area, otherwise the values would become meaningless; it is easier to just start with density.
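For a single cell the conversion works out as follows (the density value is illustrative):
cell_area_km2 <- 934.4789 * 934.4789 * 0.000001 # cell area converted from m2 to km2 (~0.873 km2)
1000 * cell_area_km2 # a cell with density 1000 persons/km2 holds ~873 people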
den_raster <- list.files(file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int", scenario_yr)), pattern = "density_\\d{4}_mol.tif$", full.names = TRUE)
den_raster # check to make sure this looks correct; should only have human_density tif files for each year, 2000-2020
cl <- makeCluster(5)
registerDoParallel(cl)
foreach(i = den_raster, .packages = c("raster", "rgdal", "stringr")) %dopar% {
# i = den_raster[1] - does i generate the 2000 raster?
nm <- basename(i)
yr <- stringr::str_sub(nm, -12, -9) # pulls out the year from the file name
cat(nm)
tmp <- raster::raster(i) # reads in raster
raster::calc(tmp, fun = function(x){x * (934.4789 * 934.4789 * 0.000001)}, # multiply density (persons/km2) by cell area, converted from m2 to km2
filename = file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_count_%s_mol.tif", scenario_yr, yr)),
overwrite = TRUE) # writes a human count raster to the intermediate folder
}
stopCluster(cl)
closeAllConnections()
## No longer using shape files; raster only
eez_25mi_inland <- raster(file.path(dir_M, "git-annex/globalprep/spatial/v2019/ocean_plus25mile_inland.tif"))
## Make sure that the layers make sense!
plot(eez_25mi_inland) # will appear to only feature the Antarctic ocean; can check that gray area actually has data. Because we don't include this region in the OHI, the raster has CCAMLR IDs for points in this region, which appear to have high region ID values.
click(eez_25mi_inland)
# Click on gray areas on the map and region IDs will pop up - the data are there!
# Want to make sure we're including the full 25-mile inland buffer - zoom in on a smaller region using zoom()
zoom(eez_25mi_inland)
# Once you select a small area to zoom in on, you can layer the outlines of the land and EEZ boundaries to make sure the buffer extends 25 miles inland:
regions_shape()
plot(regions[1], col = NA, add = TRUE)
Stack the rasters of population counts for each year, sum within regions (using eez_25mi_inland) for each year, and save this as coastal_pop_zonal_sums.csv in the intermediate folder. Gather this into a long-format dataframe and save as output/mar_pop_25mi.csv.
pop_rasters <- list.files(file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int", scenario_yr)), pattern = "count_\\d{4}_mol.tif$", full.names = TRUE) # should list all the human_count files in the folder
pop_stack <- raster::stack(pop_rasters) # stack human_count rasters 2000-2020
coastal_pop <- raster::zonal(pop_stack, eez_25mi_inland, fun="sum", progress="text") # sum within zones (regions as specified by eez_25mi_inland)
coastal_pop <- as.data.frame(coastal_pop)
write_csv(coastal_pop, file.path(dir_github, "int/coastal_pop_zonal_sums.csv")) # do we need to save this as intermediate data?
# Organize data to have a population sum for each region and each year
coastal_pop2 <- data.frame(coastal_pop) %>%
tidyr::gather("year", "coastal_pop_25mi", -1) %>%
dplyr::select(rgn_id = zone, year = year, popsum = coastal_pop_25mi) %>%
dplyr::mutate(year = as.numeric(substring(year, 13, 16))) %>% # extract the 4-digit year from column names like human_count_2000_mol
dplyr::filter(rgn_id <= 250) # keep only OHI regions (drops the high-ID CCAMLR zones)
# v2018: write_csv(coastal_pop2, paste0(scenario_yr, "/output/mar_pop_25mi.csv"), row.names = FALSE)
write_csv(coastal_pop2, file.path(dir_github, "output/mar_pop_25mi.csv"))
## rescale population density to a 0-1 pressure score
pop_rescaled <- coastal_pop2 %>%
left_join(area, by = "rgn_id") %>% # area: 25-mile inland buffer area (area_km2) per region
mutate(density = popsum/area_km2) %>%
mutate(ln_density = log(density + 1)) %>% # log(x + 1) handles regions with zero population
mutate(scalar = max(ln_density, na.rm = TRUE)) %>% # reference point: maximum observed ln-density
mutate(dens_rescaled = ln_density/scalar) %>%
mutate(dens_rescaled = ifelse(dens_rescaled > 1, 1, dens_rescaled)) # cap scores at 1
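In other words, each region's score is min(1, ln(density + 1) / max(ln(density + 1))), where density is the summed coastal population within the 25-mile inland buffer divided by the buffer area in km2, and the max is taken across all regions and years.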
filter(pop_rescaled, is.na(area_km2)) # no NA values for the area column (Antarctica not included)
pressure_data <- pop_rescaled %>%
dplyr::select(rgn_id, year, pressure_score = dens_rescaled)
write_csv(pressure_data, file.path(dir_github, "output/prs_pop_density.csv"))
#v2018: write.csv(pressure_data, sprintf("%s/output/prs_pop_density.csv", scenario_yr), row.names = FALSE)
No gapfilling was completed as part of this data prep. Gapfill flag files are saved with "_gf" appended; all flag values are set to 0 to indicate that no values were gapfilled.
# v2018 pulled from 2017 data; changed to 2019 data for v2019
prs <- read.csv(file.path(dir_github, "output/prs_pop_density.csv")) %>%
dplyr::mutate(pressure_score = 0) # 0 = not gapfilled
write.csv(prs, file.path(dir_github, "output/prs_pop_density_gf.csv"), row.names = FALSE) # writes a new _gf flag file to the current scenario's output folder, so no previous data are overwritten
mar <- read.csv(file.path(dir_github, "output/mar_pop_25mi.csv")) %>%
dplyr::mutate(popsum = 0) # 0 = not gapfilled
write.csv(mar, file.path(dir_github, "output/mar_pop_25mi_gf.csv"), row.names = FALSE)
Data-checking: compare the UN_population.csv data with the calculated population counts.
compare_yr <- 2015 # doing below functions for comparison between raster and UN popn data for 2015
## sum counts regionally for the scenario year
pop_rast <- raster(file.path(dir_M, sprintf("git-annex/globalprep/mar_prs_population/%s/int/human_count_%s_mol.tif", scenario_yr, compare_yr)))
zones <- raster(file.path(dir_M, "git-annex/globalprep/spatial/v2017/regions_land_ocean.tif")) # pulling from 2017 server data
pop_counts <- zonal(pop_rast, zones, fun = "sum", na.rm = TRUE, progress="text")
pop_UN <- pop_rgn %>%
dplyr::filter(year == compare_yr) %>%
dplyr::select(rgn_id, pop_UN = population)
## join filtered UN population and summed calculated counts
compare_pop <- data.frame(pop_counts) %>%
dplyr::select(rgn_id = zone, pop_rast = sum) %>%
dplyr::left_join(pop_UN, by="rgn_id") %>%
filter(!is.na(pop_UN))
## check plot - investigate outliers (post to issue first)
pop_compare_plot <- ggplot(compare_pop) +
geom_point(aes(pop_rast, pop_UN, label = rgn_id), alpha = .6) +
geom_abline(slope = 1, intercept = 0, color = 'red') +
labs(x = sprintf('Calculated Population %s', compare_yr),
y = sprintf('UN Population %s', compare_yr),
title = 'Population Comparison')
ggplotly(pop_compare_plot)
plot(log(compare_pop$pop_rast), log(compare_pop$pop_UN))
abline(0, 1, col="red")
prev_scen_yr <- paste0("v", as.numeric(substr(scenario_yr, 2, 5)) -1)
old <- read_csv(here(sprintf('globalprep/mar_prs_population/%s/output/mar_pop_25mi.csv', prev_scen_yr))) %>% # use prev_scen_yr rather than a hardcoded version
dplyr::select(rgn_id, year, popsum_old = popsum)
tmp <- coastal_pop2 %>%
dplyr::left_join(old, by=c("rgn_id", "year"))
years_compare_plot <- ggplot(tmp) +
geom_point(aes(log(popsum), log(popsum_old), label = rgn_id), alpha = .6) +
geom_abline(slope = 1, intercept = 0, color = 'red') +
labs(x = sprintf('Log of Calculated Population %s', scenario_yr),
y = sprintf('Log of Calculated Population %s', prev_scen_yr),
title = 'Population comparison between assessments')
ggplotly(years_compare_plot)
plot(log(tmp$popsum), log(tmp$popsum_old))
abline(0, 1, col="red")
## compare area with that calculated in previous year (should look very similar)
# v2018: old_area <- read.csv(paste0(prev_scen_yr, "/rgn_area_inland25mi.csv")) %>%
old_area <- read_csv(here(sprintf('globalprep/mar_prs_population/%s/int/area_km2_25mi.csv', prev_scen_yr))) %>%
rename(old_area_km2 = area_km2) %>%
dplyr::mutate(old_area_km2 = round(old_area_km2)) %>%
dplyr::left_join(area, by = "rgn_id")
area_compare_plot <- ggplot(old_area) +
geom_point(aes(area_km2, old_area_km2, label = rgn_id), alpha = .6) +
geom_abline(slope = 1, intercept = 0, color = 'red') +
labs(x = sprintf('%s Area (km2)', scenario_yr),
y = sprintf('%s Area (km2)', prev_scen_yr),
title = 'Area Calculation Comparison')
ggplotly(area_compare_plot)
summary(old_area)