1 Summary
2 Updates from previous assessment
3 Data Source
- 3.1 Production data
- 3.2 Sustainability data
4 Methods
5 Import Raw Data: FAO Mariculture data
6 Wrangle:
7 Gapfilling
- 7.1 Fill in missing years after first year of harvest data with 0 values
8 Save file:
9 Import data: Trujillo sustainability scores
10 Wrangle
- 10.1 Convert country names to OHI region names.
11 FAO mariculture and sustainability scores
12 Save Data:
13 Wrangle: Genetic Escapes Data
14 Gapfill
15 Potential Aquaculture Calculations for New Reference Point
16 Summary
17 Data Source
- 17.1 Growth Potential (Phi) data
18 Methods

1 Summary

This analysis converts FAO mariculture data into data used to calculate the OHI global mariculture status score. This also calculates the genetic escapee from mariculture pressure data.

2 Updates from previous assessment

New year of FAO mariculture yield data for 2017.
We have updated the way we calculate reference points. This code has been added to the bottom of this script, and will stay static for future years, since the potential mariculture data will likely not be updated in the future.

3 Data Source

3.1 Production data

Reference:
http://www.fao.org/fishery/statistics/software/fishstatj/en#downlApp Release date: March 2019 FAO Global Aquaculture Production Quantity 1950_2016 FAO metadata found here

Downloaded: 5/17/2019

Description: Quantity (tonnes) of mariculture for each country, species, year.

Time range: 1950-2017

3.2 Sustainability data

Reference:
Mariculture Sustainability Index (Trujillo 2008)

Trujillo, Pablo. 2008. “Using a Mariculture Sustainability Index to Rank Countries’ Performance.” In Fisheries Centre Research Reports, edited by Jackie Alder and Daniel Pauly. University of British Columbia, Vancouver, Canada: Fisheries Centre Research Reports. https://circle.ubc.ca/handle/2429/40933.

Original data table is located in the Trujillo 2008 paper saved in the OHI Zotero database. Report starts on page 28 of the link provided in the reference.

Description:

Original MSI rescaled from 1-10 to 0-1 for mariculture taxa groups.

4 Methods

getwd()

## [1] "/home/sgclawson/github/ohiprep_v2019/globalprep/mar/v2019"

knitr::opts_chunk$set(eval=FALSE)
  
## load libraries, set directories
library(ohicore)  #devtools::install_github('ohi-science/ohicore@dev')
library(dplyr)
library(stringr)
library(tidyr)
library(tidyverse)

## Load FAO-specific user-defined functions
source('mar_fxs.R') # functions specific to mariculture dealing with compound countries
source('../../../workflow/R/fao_fxn.R') # function for cleaning FAO files
source('../../../workflow/R/common.R') # directory locations

## This file makes it easier to process data for the OHI global assessment
##  by creating the following objects:
## 
##  * dir_M = identifies correct file path to Mazu (internal server) based on your operating system
##  * mollCRS = the crs code for the mollweide coordinate reference system we use in the global assessment
##  * regions_shape() = function to load global shapefile for land/eez/high seas/antarctica regions
##  * ohi_rasters() = function to load two rasters: global eez regions and ocean region
##  * region_data() = function to load 2 dataframes describing global regions 
##  * rgn_syns() = function to load dataframe of region synonyms (used to convert country names to OHI regions)
##  * low_pop() = function to load dataframe of regions with low and no human population
##  * UNgeorgn = function to load dataframe of UN geopolitical designations used to gapfill missing data

5 Import Raw Data: FAO Mariculture data

Mariculture production in tonnes.

mar <- read.csv(file.path(dir_M, 'git-annex/globalprep/_raw_data/FAO_mariculture/d2019/FAO_GlobalAquacultureProduction_Quantity_1950_2017.csv'), check.names=FALSE, stringsAsFactors=FALSE) ; head(mar)

6 Wrangle:

6.1 Tidy mariculture data

Filter freshwater mariculture, make long format, and clean FAO codes.

mar <- mar %>%
  rename(country = `Country (Country)`,
         FAO_name = `Species (ASFIS species)`, 
         fao = `Aquaculture area (FAO major fishing area)`, 
         environment = `Environment (Environment)`)
table(mar$environment)  

## include only marine environments
mar <- mar %>%
filter(environment %in% c("Brackishwater", "Marine"))  

## convert to long format and clean FAO codes:
## for some reason, I can't provide the data range in gather programatically!
mar <- mar %>%
  select(-Unit) 

mar <- mar %>%
  gather(key="year", value="value", num_range("",1950:2017)) %>%
    fao_clean_data()

6.2 Update species names

Update species name in the raw/species_list.csv file with names in the mar dataset (exclude non-food species). I simplified the species_list. I cut the “species” name columns because it wasn’t clear what this was trying to accomplish and created potential error.

## Commented out 'read.csv' lines are different versions of the dataset, representing the three different seaweed exclusion methods (exclude some seaweed, all seaweed, all seaweed nei). Use at the end for data checking.The first one is the one which is saved to the "output" folder. 
mar_sp <- read.csv('raw/species_list.csv', stringsAsFactors=FALSE) %>%
  select(FAO_name, exclude, alias, Taxon_code)
#mar_sp <- read.csv('raw/species_list.csv', stringsAsFactors=FALSE) %>%
#  select(FAO_name, exclude_no_seaweed, alias, Taxon_code)
#mar_sp <- read.csv('raw/species_list.csv', stringsAsFactors=FALSE) %>%
#  select(FAO_name, exclude_no_nei, alias, Taxon_code)

new.spp <- setdiff(mar$FAO_name, mar_sp$FAO_name)
new.spp # check: if dim has 0 rows it means all match
## if there is a list of species, hand check species_list.csv to see whether to keep (exclude seaweeds and species harvested for ornamental/medicinal), check if synonyms match Trujillo names. 
## This is outlined in issue #8 and issue #81 for reference. You need to add the species output from this to species_list.csv and fill out the corresponding information. I.e. add FAO_name, exlude (whether to include in assessment or not. If they are only harvested for medicinal purposes, do not include), alias, and Taxon_code to species_list.csv.

## REMOVE SPECIES not relevant to mariculuture goal (i.e., non-food species)
mar <- mar %>% left_join(mar_sp, by="FAO_name") 

## Filters out species that should be excluded from the MAR sub-goal, depending on which of the 3 scenarios was read in. Searches for column name "exclude", "exclude_no_seaweed", or "exclude_no_nei"
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
  
 mar <- mar %>% filter(exclude==0)

 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
  
   mar <- mar %>% filter(exclude_no_seaweed==0)

   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
  
     mar <- mar %>% filter(exclude_no_nei==0)

     }
 
## change names using species alias or FAO species name (global changes)
mar$species <- ifelse(!is.na(mar$alias), mar$alias, mar$FAO_name) 

## sum production values for each group to account for duplicate rows after name change (remove NA values)
mar <- mar %>%
  filter(!is.na(value)) %>%
  group_by(country, fao, environment, species, year, Taxon_code) %>% 
    summarize(value = sum(value)) %>% 
  ungroup()


## eliminate country-species data with zero production throughout the time-series (1950-recent)
mar <- mar %>%
  group_by(country, species) %>%
  mutate(total_value = sum(value)) %>%
  filter(total_value > 0) %>%
  select(-total_value) %>%
  ungroup()

6.3 Convert country names to OHI regions

# Divide mariculture from countries that we report as separate regions (assume equal production in all regions)
# Netherlands Antilles: Conch restoration among Aruba, Bonaire, Curacao
# Channel Islands: Jersey and Guernsey
# Bonaire/S.Eustatius/Saba
# Yugoslavia SFR: no longer a country after 1992

mar <- mar %>%
  mutate(country = ifelse(country=="R\xe9union", "Reunion", country)) %>%  # this one is hard to get right
  mar_split()  # function in mar_fxs.R

mar_rgn <- name_2_rgn(df_in = mar, 
                       fld_name='country', 
                       flds_unique=c('species', 'fao', 'environment', 'Taxon_code', 'year')) 


## sum values of regions with multiple subregions
mar_rgn <- mar_rgn %>%
  group_by(fao, environment, species, year, Taxon_code, rgn_id) %>% 
  summarize(value = sum(value)) %>%
  ungroup()

Take a look at the tidied data for a single year and region

data.frame(filter(mar_rgn, rgn_id==130) %>%
  filter(year==2013) %>%
  arrange(species))

For some regions, a specific species can be altered so that it matches more general Trujillo sustainability data. In this case, I don’t want the name changes to be global because some regions may have more specific species data.

(Will explore this in the future, but will not implement this year).

# ## based on looking at the list, make a few name changes to match the regions Trujillo data
# # Chile name modification
# mar_rgn$species[mar_rgn$rgn_id==224 & mar_rgn$species == "Red abalone"] <- "Abalones nei"
# mar_rgn$species[mar_rgn$rgn_id==224 & mar_rgn$species == "Japanese abalone"] <- "Abalones nei"
# 
# # China
# mar_rgn$species[mar_rgn$rgn_id==209 & mar_rgn$species == "Areolate grouper"] <- "Groupers nei"
# mar_rgn$species[mar_rgn$rgn_id==209 & mar_rgn$species == "Greasy grouper"] <- "Groupers nei"
# mar_rgn$species[mar_rgn$rgn_id==209 & mar_rgn$species == "Hong Kong grouper"] <- "Groupers nei"
# mar_rgn$species[mar_rgn$rgn_id==209 & mar_rgn$species == "Orange-spotted grouper"] <- "Groupers nei"
# 
# # Honduras
# mar_rgn$species[mar_rgn$rgn_id==133 & mar_rgn$species == "Whiteleg shrimp"] <- "Penaeus shrimps nei"
# 
# # Italy
# mar_rgn$species[mar_rgn$rgn_id==84 & mar_rgn$species == "Pacific cupped oyster"] <- "Cupped oysters nei"
# 
# # New Zealand
# mar_rgn$species[mar_rgn$rgn_id==162 & mar_rgn$species == "Rainbow abalone"] <- "Abalones nei"
# 
# # Pakistan
# mar_rgn$species[mar_rgn$rgn_id==53 & mar_rgn$species == "Penaeus shrimps nei"] <- "Marine crustaceans nei"
# 
# # Philippines
# mar_rgn$species[mar_rgn$rgn_id==15 & mar_rgn$species == "Whiteleg shrimp"] <- "Penaeus shrimps nei"
# 
# # Portugal
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Golden carpet shell"] <- "Marine molluscs nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Peppery furrow"] <- "Marine molluscs nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Solen razor clams nei"] <- "Razor clams nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Atlantic bluefin tuna"] <- "Marine fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Meagre"] <- "Marine fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Seabasses nei"] <- "Marine fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Soles nei"] <- "Marine fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "White seabream"] <- "Marine fishes nei"
# 
# # Spain
# mar_rgn$species[mar_rgn$rgn_id==182 & mar_rgn$species == "Atlantic bluefin tuna"] <- "Tuna-like fishes nei"
# 
# # Turkey
# mar_rgn$species[mar_rgn$rgn_id==76 & mar_rgn$species == "Atlantic bluefin tuna"] <- "Tuna-like fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==76 & mar_rgn$species == "European seabass"] <- "Seabasses nei"
# 
# ### sum values of regions with multiple subregions
# mar_rgn <- mar_rgn %>%
#   group_by(fao, environment, species, year, Taxon_code, rgn_id) %>%
#   summarize(value = sum(value)) %>%
#   ungroup()

7 Gapfilling

7.1 Fill in missing years after first year of harvest data with 0 values

For example: Production of blue shrimp in Maine starts in 1983 – don’t include years before that.

Checked to make sure that there weren’t instances in which it made more sense to carry the previous year’s data forward as a method of gapilling. This didn’t seem to be the case.

## spread mar_rgn to create a value for every year-species-region (if missing it will be given a NA)
mar_rgn_spread <- spread(mar_rgn, year, value)
dim(mar_rgn_spread)

## turn data frame back into long format
mar_rgn_gf <- gather(mar_rgn_spread, "year", "value", num_range("",1950:2017)) %>%
  arrange(rgn_id, species, year, Taxon_code, fao, environment) 

## NA values are converted to zero.
mar_rgn_gf <- mar_rgn_gf %>%
  mutate(year = as.numeric(as.character(year))) %>%
  mutate(value_w_0 = ifelse(is.na(value), 0, value)) %>%
  group_by(fao, environment, species, Taxon_code, rgn_id) %>% 
  mutate(cum_value = cumsum(value_w_0)) %>%
  ungroup() %>%
  filter(cum_value > 0) %>% # eliminates years before mariculture began
  mutate(gap_0_fill = ifelse(is.na(value), "NA_to_zero", "0")) %>% # record gapfill
  mutate(value = ifelse(is.na(value), 0, value)) %>% # finally, convert all NAs in original column to 0
  select(-cum_value, -value_w_0)

See how may NA values were converted to 0

table(mar_rgn_gf$gap_0_fill)
## 3790 of these out of 27810+3790 cases had NA values converted to 0 - 2018 assessment
## 3763 of these out of 28694+4291 cases had NA values converted to 0 - 2019 assessment

Remove species-region-environment time series with less than four years of mariculture production > 0 tonnes (assume these are not established mariculture programs).

mar_rgn_gf = mar_rgn_gf %>% 
  group_by(rgn_id, species, fao, environment) %>%
  mutate (not_0 = length(value[value>0])) %>% # length of vector of years greater than 0
  filter (not_0>3) %>% # filter for groups that have at least four years of mariculture production 
  ungroup() %>% 
  select(rgn_id, species, fao, environment, year, value, Taxon_code, gap_0_fill)

Add a unique identifier per cultivated stock that describes each species, fao region, and environment grouping.

## add a unique identifier per cultivated stock
identifier = mar_rgn_gf %>% 
  select(rgn_id, species, fao, environment) %>% 
  unique() %>% 
  mutate(species_code = 1:n())

mar_rgn_gf = left_join(mar_rgn_gf, identifier)
maric <- mar_rgn_gf

8 Save file:

Used to estimate total mariculture yield per country.

Saves the appropriate Mariculture-FP file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above.

if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(maric, 'output/MAR_FP_data.csv', row.names=FALSE)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(maric, 'test/MAR_FP_data_no_seaweed.csv', row.names=FALSE)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(maric, 'test/MAR_FP_data_no_nei.csv', row.names=FALSE)
     }

9 Import data: Trujillo sustainability scores

These data describe the sustainability and genetic escapes for country/species combinations (and, in a couple cases, environment and fao region combinations). In cases where these data were not available for a specific county/species, we averaged the data across taxonomic groups to gapfill the missing data.

## Trujillo sustainability data:
sus = read.csv('raw/Truj_label_sust.csv', stringsAsFactors = FALSE, na.strings = NA)

10 Wrangle

## these need to be re-added (get cut when associated with region ids)
sus_no_rgn <- filter(sus, is.na(country))

10.1 Convert country names to OHI region names.

sus_rgn <- name_2_rgn(df_in = sus, 
                       fld_name='country', 
                       flds_unique=c('species_fao', 'fao', 'environment', 'species_Truj'))

sus_rgn <- bind_rows(sus_rgn, sus_no_rgn) %>%
  unique()

Check that the non-matches between Trujillo sustainability FAO spp (sus_rgn) and the FAO mariculture species in the wrangled FAO Aquaculture Production data table (maric) are not due to spelling errors or slightly different names. We want to include as many species that have sustainability scores as possible

## Make sure same species are spelled the same in the two data tables (e.g. check that there are no extra spaces)
sort(setdiff(sus_rgn$species_fao, maric$species)) # species that are no longer have mariculture industry or are not included due to being freshwater or non-food


sort(setdiff(maric$species, sus_rgn$species_fao)) # FAO species with no Trujillo data - there will probably be a long list

## Hand check each of the species output here. I mainly check to make sure there aren't obvious cases when the species is the same, but the names are presented slightly differently in the lists. For example, some species might have an extra space in the name somewhere. If any of these obvious differences exist, we need to go in and change the names to match. 
## Its not too surprising (unfortunately) that the FAO list has far more species than the sustainability list. But, I like to go over them to make sure that we are getting sustainability scores for as many species as we can.

11 FAO mariculture and sustainability scores

Append sustainability score to the FAO mariculture data.

The following joins the sustainability scores to regions/species that have Trujillo data.

table(sus_rgn$match_type)

## join taxa specific to country/species/environment
c_sp_env = sus_rgn %>% 
  filter(match_type == 'c_sp_env') %>% 
  select(rgn_id, species=species_fao, environment, Sust_c_sp_env = Maric_sustainability)

mar_sus <- maric %>%
  left_join(c_sp_env, by= c("species", "environment", "rgn_id"))

## join taxa specific to country/species/fao region
c_sp_fao = sus_rgn %>% 
  filter(match_type == 'c_sp_fao') %>% 
  select(rgn_id, species=species_fao, fao, Sust_c_sp_fao = Maric_sustainability)

mar_sus <- mar_sus %>%
  left_join(c_sp_fao, by= c("species", "fao", "rgn_id"))

Take a look at the data thus far

head(data.frame(filter(mar_sus, rgn_id==218 & species == "Atlantic salmon")))
head(data.frame(filter(mar_sus, !is.na(Sust_c_sp_fao))))

## join taxa specific to country/species
c_sp = sus_rgn %>% 
  filter(match_type == 'c_sp') %>% 
  select(rgn_id, species=species_fao, Sust_c_sp = Maric_sustainability)

mar_sus <- mar_sus %>%
  left_join(c_sp, by= c("species", "rgn_id"))

Now look at a summary after appending all the Trujillo data

summary(mar_sus)

Merge the three Trujillo type categories into a single sustainability score column in the following order:

Sust_c_sp_env: taxa specific to country/species/environment (smallest taxonomic level)
Sust_c_sp_fao: taxa specific to country/species/fao region
Sust_c_sp: taxa specific to country/species (highest taxonomic level)

For example, if Sust_c_sp_env is missing, use Sust_c_sp_fao and so on.

mar_sus = mar_sus %>% 
  mutate(Sust = ifelse(!is.na(Sust_c_sp_env), Sust_c_sp_env, Sust_c_sp_fao)) %>%
  mutate(Sust = ifelse(is.na(Sust), Sust_c_sp, Sust)) %>%
  select(-Sust_c_sp_env, -Sust_c_sp_fao, -Sust_c_sp)

This joins the sustainability data that is gapfilled either at the species level (average of specific species/genera across regions) or at a higher course taxonomic levels and documents which data are gapfilled and how.

## Select observations gapfilled at the species/genera level:
gf_sp_sus <- sus_rgn %>% 
  filter(gapfill != "actuals" & match_type == "species") %>%
  select(species = species_fao, gapfill, Sust_gf_sp = Maric_sustainability)

## check that there are no duplicated species_fao
gf_sp_sus[duplicated(gf_sp_sus$species), ]

## Match gapfilling values by species
mar_sus_gf = mar_sus %>%
  left_join(gf_sp_sus, by = 'species')

## Select observations gapfilled at the coarse taxon level:
gf_taxon_sus <- sus_rgn %>% 
  filter(gapfill != "actuals" & match_type == "taxon") %>%
  select(Taxon_code=taxon, Sust_gf_taxon = Maric_sustainability)

## Match gapfilling values by species
mar_sus_gf = mar_sus_gf %>%
  left_join(gf_taxon_sus, by = c('Taxon_code'))

Take a look at the wrangled data

summary(mar_sus_gf) # should be no NA's in Sust_gf_taxon column
table(mar_sus_gf$gapfill)

Obtain a sustainability score for each record, and a book-keeping column of whether it’s actual or gap-filled

For missing sustainability scores:

Use species-level Sustainability score
If no species-level scores, gapfill with taxon-level sustainability average

mar_sus_final = mar_sus_gf %>% 
  mutate(gapfill = ifelse(!is.na(Sust), "none", gapfill)) %>%
  mutate(Sust = ifelse(is.na(Sust), Sust_gf_sp, Sust)) %>% # gapfill with species level
  mutate(gapfill = ifelse(is.na(Sust), "taxon_average", gapfill)) %>% # add in taxon gapfill record
  mutate(Sust = ifelse(is.na(Sust), Sust_gf_taxon, Sust)) %>% # gapfill with taxon level
  mutate(taxa_code = paste(species, species_code, sep="_")) %>%
  select(rgn_id, species, species_code, taxa_code, taxa_group=Taxon_code, year, gapfill_sus = gapfill, gapfill_fao = gap_0_fill, tonnes=value, Sust)

12 Save Data:

## save mariculture harvest data
mar_harvest_tonnes = mar_sus_final %>%
  select(rgn_id, taxa_code, taxa_group, year, tonnes)

anyDuplicated(mar_harvest_tonnes) # check for duplications

## Saves the appropriate Mariculture Harvest Tonnes file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above.
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(mar_harvest_tonnes, 'output/mar_harvest_tonnes.csv', row.names=F)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(mar_harvest_tonnes, 'test/mar_harvest_tonnes_no_seaweed.csv', row.names=F)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(mar_harvest_tonnes, 'test/mar_harvest_tonnes_no_nei.csv', row.names=F)
     }

## save gapfill data for mariculture harvest
mar_harvest_tonnes_gf = mar_sus_final %>%
  select(rgn_id, taxa_code, taxa_group, year, tonnes=gapfill_fao)


## Saves the appropriate Mariculture Harvest Gapfill file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(mar_harvest_tonnes_gf, 'output/mar_harvest_tonnes_gf.csv', row.names=F)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(mar_harvest_tonnes_gf, 'test/mar_harvest_tonnes_gf_no_seaweed.csv', row.names=F)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(mar_harvest_tonnes_gf, 'test/mar_harvest_tonnes_gf_no_nei.csv', row.names=F)
     }


## save sustainability scores data for 2012
mar_sustainability_score = mar_sus_final %>% 
  mutate(year = 2012) %>% # Only 2012 sustainability scores exist (Trujillo)
  select(rgn_id, year, taxa_code, sust_coeff = Sust) %>% 
  unique()

anyDuplicated(mar_sustainability_score)


## Saves the appropriate Sustainability Score file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(mar_sustainability_score, 'output/mar_sustainability.csv', row.names=F)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(mar_sustainability_score, 'test/mar_sustainability_no_seaweed.csv', row.names=F)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(mar_sustainability_score, 'test/mar_sustainability_no_nei.csv', row.names=F)
     }


## save gapfill data for sustainability scores
mar_sustainability_score_gf = mar_sus_final %>% 
  select(rgn_id, taxa_code, sust_coeff = gapfill_sus) %>% 
  unique()


## Saves the appropriate Sustainability Score Gapfill file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(mar_sustainability_score_gf, 'output/mar_sustainability_gf.csv', row.names=F)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(mar_sustainability_score_gf, 'test/mar_sustainability_gf_no_seaweed.csv', row.names=F)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(mar_sustainability_score_gf, 'test/mar_sustainability_gf_no_nei.csv', row.names=F)
     }

13 Wrangle: Genetic Escapes Data

Combine genetic escapes data to mariculture data.

Data from Trujillo data table.
These data are used as a pressure layer to describe the risk of genetic escapees due to mariculture.
First merge with the species data (no gapfilling) for each country/species/fao region combination.

## can eliminate the environment category because these have the same scores
esc = sus_rgn %>% 
  filter(!is.na(Genetic.escapees)) %>%
  mutate(match_type = ifelse(match_type == "c_sp_env", "c_sp", match_type)) %>%
  group_by(rgn_id, species=species_fao, fao, match_type, taxon, gapfill) %>%
  summarize(Genetic.escapees = mean(Genetic.escapees)) %>%
  ungroup()
  
## join taxa specific to country/species/fao
c_sp_fao = esc %>% 
  filter(match_type == 'c_sp_fao') %>% 
  select(rgn_id, species, fao, Esc_c_sp_fao = Genetic.escapees) 

mar_esc <- maric %>%
  left_join(c_sp_fao, by= c("species", "fao", "rgn_id"))

head(data.frame(filter(mar_esc, !is.na(Esc_c_sp_fao))))

## join taxa specific to country/species
c_sp = esc %>% 
  filter(match_type == 'c_sp') %>% 
  select(rgn_id, species, Esc_c_sp = Genetic.escapees) # fao is blank

mar_esc <- mar_esc %>%
  left_join(c_sp, by= c("species", "rgn_id"))

Look at a summary of the wrangled data table

summary(mar_esc)

Merge the different match types (esc_c_sp_fao, esc_c_sp) into a single sustainability score

First use Esc_c_sp_fao
If no Esc_c_sp_fao value, use Esc_c_sp

mar_esc = mar_esc %>% 
  mutate(Escapees = ifelse(!is.na(Esc_c_sp_fao), Esc_c_sp_fao, Esc_c_sp)) %>%
  select(-Esc_c_sp_fao, -Esc_c_sp)

14 Gapfill

Join the sustainability data that is gapfilled either at the species level (average of specific species/genera across regions) or at a higher course taxonomic levels and documents which data are gapfilled and how.

## Select observations gapfilled at the species/genera level:
gf_species_esc <- esc %>% 
  filter(gapfill != "actuals" & match_type == "species") %>%
  select(species, gapfill, Esc_gf_sp = Genetic.escapees)

## check that there are no duplicated species_fao
gf_species_esc[duplicated(gf_species_esc$species), ]

## Match gapfilling values by species
mar_esc_gf = mar_esc %>%
  left_join(gf_species_esc, by = 'species')

## Select observations gapfilled at the coarse taxon level:
gf_taxon_sus <- esc %>% 
  filter(gapfill != "actuals" & match_type == "taxon") %>%
  select(Taxon_code=taxon, Esc_gf_taxon = Genetic.escapees)

## Match gapfilling values by species
mar_esc_gf = mar_esc_gf %>%
  left_join(gf_taxon_sus, by = c('Taxon_code'))

Take a look at wrangled data

summary(mar_esc_gf)
table(mar_esc_gf$gapfill)

14.1 Record Gapfilling Methods

Obtain a sustainability score for each record, and a book-keeping column of whether it’s actual or gap-filled.

For gapfill column

If no missing escapee data, record “none”
If NA, then gapfill with Esc_gf_sp (gapfill for species-level already recorded)
If no Esc_gf_sp value, then record “taxon_average” as the gapfill method
Fill in missing value with Esc_gf_taxon value

tonnes_esc = mar_esc_gf %>% 
  mutate(gapfill = ifelse(!is.na(Escapees), "none", gapfill)) %>% 
  mutate(Escapees = ifelse(is.na(Escapees), Esc_gf_sp, Escapees)) %>%
  mutate(gapfill = ifelse(is.na(Escapees), "taxon_average", gapfill)) %>%
  mutate(Escapees = ifelse(is.na(Escapees), Esc_gf_taxon, Escapees)) %>%
  select(rgn_id, species, species_code, year, gapfill_escapees = gapfill, tonnes=value, Escapees)

**Take a look at the summary

summary(tonnes_esc)
anyDuplicated(tonnes_esc) # check for duplicates

Final formatting of the escapee data. This is used as a pressure layer.

For each region/year: take a weighted average of the genetic escape probability for each taxa based on tonnes mariculture

genEscapes <- tonnes_esc %>%
  group_by(rgn_id, year) %>%
  summarize(genEscapes = weighted.mean(Escapees, tonnes, na.rm=TRUE))

Obtain gapfill information from genEscapes Gapfill values in output file are the proportion of data that is gapfilled.

# obtain corresponding gapfilling information for each region (average of gapfilled data, weighted by tonnes of mariculture).
genEscapes_gf <- tonnes_esc %>%
  mutate(gapfill = ifelse(gapfill_escapees=="none", 0, 1)) %>%
  group_by(rgn_id, year) %>%
  summarize(genEscapes = weighted.mean(gapfill, tonnes, na.rm=TRUE)) %>%
  ungroup() %>%
  filter(year==2017) %>%
  select(rgn_id, pressures.score=genEscapes) %>%
  mutate(pressures.score=ifelse(pressures.score=="NaN", NA, pressures.score)) %>%
  data.frame()

14.2 Save gapfill data

## Saves the appropriate Genetic Escapes Gapfill file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(genEscapes_gf, 'output/GenEsc_gf.csv', row.names=FALSE)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(genEscapes_gf, 'test/GenEsc_gf_no_seaweed.csv', row.names=FALSE)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(genEscapes_gf, 'test/GenEsc_gf_no_nei.csv', row.names=FALSE)
     }

Obtain escapee data layers from genEscapes

# create the escapee data layers:
data <- genEscapes %>%
    select(rgn_id, year, pressure_score = genEscapes)

14.3 Save escapee pressure layer

## Saves the appropriate Genetic Escapes data file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(data, 'output/GenEsc.csv', row.names=FALSE)  
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(data, 'test/GenEsc_no_seaweed.csv', row.names=FALSE) 
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
    write.csv(data, 'test/GenEsc_no_nei.csv', row.names=FALSE)  
     }

14.4 Data check

Comparing this year’s data to previous year’s data. Expect small variation from year to year. Plot to view differences.

## Compare genetic escapes pressure scores for Russia; saw large changes between 2018 and 2017 assessment years. No big changes between 2019 and 2018 assessment years. 
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 73) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 73) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")



## Compare yield data for Russia
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 73, year == 2016) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 73, year == 2017) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)



## Compare genetic escapes pressure scores for Vietnam; no change in pressures
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 207) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 207) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")



## Compare yield data for Vietnam
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 207, year == 2016) %>% 
  select(rgn_id, species, fao, environment, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 207, year == 2017) %>% 
  select(rgn_id, species, fao, environment, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao", "environment")); View(yield)



## Compare genetic escapes pressure scores for Iceland; saw large changes between 2018 and 2017 assessment years, we do not see a big difference between 2019 and 2018 assessment years. 
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 143) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 143) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")



## Compare yield data for Iceland. Atlantic Salmon increased by a lot between 2018 and 2019 assessment years. 
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 143, year == 2016) %>% 
  select(rgn_id, species, fao, environment, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 143, year == 2017) %>% 
  select(rgn_id, species, fao, environment, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao", "environment")); View(yield)



## Compare genetic escapes pressure scores for Belize; saw large changes between 2018 and 2017 assessment years, we do not see big differences between 2019 and 2018 assessment years. 
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 164) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 164) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")

## Compare yield data for Belize
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 164, year == 2016) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 164, year == 2017) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)


## Compare genetic escapes pressure scores for France; saw large changes between 2018 and 2017 assessment years, we do not see big difference between 2019 and 2018 assessment years. 
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 179) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 179) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")


## Compare yield data for France
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 179, year == 2016) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 179, year == 2017) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)

plot(yield$old_value, yield$value);abline(0,1,col="red")



## Compare new/old Mariculture sustainability scores
tmp_old <- read.csv("../v2018/raw/Truj_label_sust.csv") %>% 
  filter(country == "Russian Federation") %>% 
  select(country, species_Truj, sust_old = Maric_sustainability, gen_old = Genetic.escapees)
  
tmp_new <- read.csv("../v2019/raw/Truj_label_sust.csv") %>% 
  filter(country == "Russian Federation") %>% 
  select(country, species_Truj, Maric_sustainability, Genetic.escapees)

test <- tmp_new %>% 
  full_join(tmp_old, by=c("country", "species_Truj"))
View(test)

plot(test$sust_old, test$Maric_sustainability);abline(0,1,col="red")


## Compare new/old genetic escapees scores
plot(test$gen_old, test$Genetic.escapees);abline(0,1,col="red")

14.5 Data Check

Compare the three exclusion methods for seaweed species.

Method 1:Exclude some seaweed species (exclude) Method 2 (original): Exclude all seaweed species (exclude_no_sw) *Method 3: Exclude all seaweed nei species (exclude_no_nei)

## Read in the three different versions of wrangled data
## Best Professional Judgement (BPJ)
exclude <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(year == 2017) %>% # compare the most recent shared year between old and new
  select(rgn_id, prs_score_BPJ=pressure_score)

## Exclude all seaweeds (no seaweed)
exclude_no_sw <- read.csv("../v2019/test/GenEsc_no_seaweed.csv") %>%
  filter(year == 2017) %>% # compare the most recent shared year between old and new
  select(rgn_id, prs_score_no_sw=pressure_score)

## Exclude only nei seaweeds (no nei)
exclude_no_nei <- read.csv("../v2019/test/GenEsc_no_nei.csv") %>%
  filter(year == 2017) %>% # compare the most recent shared year between old and new
  select(rgn_id, prs_score_no_nei=pressure_score)


## Compare No Seaweed to Best Professional Judgement
compare <- full_join(exclude, exclude_no_sw, by="rgn_id")
compare <- na.omit(compare)
## Compare genetic escapes pressure scores
plot(compare$prs_score_BPJ, compare$prs_score_no_sw); abline(0,1,col="red")

## Compare No Nei to No Seaweed (not much difference)
compare <- full_join(exclude_no_nei, exclude_no_sw, by="rgn_id")
compare <- na.omit(compare)
## Compare genetic escapes pressure scores
plot(compare$prs_score_no_nei, compare$prs_score_no_sw); abline(0,1,col="red")

## Compare No Nei to BPJ
compare <- full_join(exclude_no_nei, exclude, by="rgn_id")
compare <- na.omit(compare)
## Compare genetic escapes pressure scores
plot(compare$prs_score_BPJ, compare$prs_score_no_nei); abline(0,1,col="red")

15 Potential Aquaculture Calculations for New Reference Point

[REFERENCE RMD FILE: https://raw.githubusercontent.com/OHI-Science/ohiprep_v2019/gh-pages/globalprep/mar/v2019/reference_point/CountryProductionEstimate.Rmd]

16 Summary

This analysis produces potential tonnes of aquaculture from growth potential estimates of finfish and bivalves. These aquaculture numbers will be used for the reference point for the mariculture subgoal.

17 Data Source

17.1 Growth Potential (Phi) data

Reference:
https://knb.ecoinformatics.org/view/doi:10.5063/F1CF9N69 Rebecca Gentry, Halley Froehlich, Dietmar Grimm, Peter Kareiva, Michael Parke, et al. SNAPP - Mapping the Global Potential for Marine Aquaculture. Knowledge Network for Biocomplexity. doi:10.5063/F1CF9N69.

Downloaded: 7/3/2019

Description: Growth Potential estimate raster for global cells

18 Methods

knitr::opts_chunk$set(eval=FALSE)
#######regression inputs from VBGF_Fish.r

# This is a heavily modified script from KNB:
# https://knb.ecoinformatics.org/view/doi:10.5063/F1CF9N69

# From this paper:
# https://www.nature.org/content/dam/tnc/nature/en/documents/Mapping_the_global_potential_for_marine_aquaculture.pdf


### libraries useful for data wrangling
library(dplyr)
library(tidyr)
library(tidyverse)

## libraries useful for spatial data
library(raster)       
library(rgdal)        
library(sf)         
library(fasterize)

## data visualization
library(RColorBrewer)
library(ggplot2)
library(rasterVis)    
library(maps)

## path management
library(here)


## some OHI files
source('http://ohi-science.org/ohiprep_v2019/workflow/R/common.R')
dir_git <- file.path(here(), "globalprep/mar/v2019")

18.1 Establish variables and coefficients

## Variables needed to get from PHI to tonnes of production
## MRF: assumes 1 farm is 1km2  ##
# (seems easier to just units of km2, rather than farm)

F_estCoef <- c(7.6792, (-5.8198)) #from regression estimated in VBGF_Fish_Final.r
B_estCoef <- c(2.9959,(-1.6659)) #from regression estimate in VBGF_Bivalves.r
density <- 20 #juveniles per m3
cagesize <- 9000 #m3
cagesperfarm <- 24 #located atleast 1 km apart.. cagesperkm2
bivperfarm <- 130000000 #(100 longlines/km2) * 4 km * (100 bivalves seeded/0.0003 km) = 133333333 bivalve/km2
weight35cm <- 554.8  ## in grams see VBGF_Fish_Final. Paper reports 548 grams

18.2 Finfish Production

## Global tiff file of PHI (Growth Potential) estimates
FishPhiALLConstraints <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishPhiALLConstraints95LT2.tiff"))
plot(FishPhiALLConstraints)
FishPhiVector=getValues(FishPhiALLConstraints)


## Convert Phi raster to number of years it takes to grow a 35 cm fish

LogFishYears <- calc(FishPhiALLConstraints, fun=function(x){F_estCoef[1]+F_estCoef[2]*log(x)})
LogFishYears
plot(LogFishYears)

FishYears <- calc(LogFishYears, fun=function(x){exp(x)})

FishYears
plot(FishYears)
writeRaster(FishYears,file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishYearsbyCell.tif"), overwrite=TRUE)

FishYears <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishYearsbyCell.tif"))
FishYearsVector=getValues(FishYears)



#OHI 2018 regions (original analysis used older regions file)

## call spatial file from sourced file
regions_shape()
OHIcountries <- regions %>%
  filter(type_w_ant == "eez")
OHIcountries <- st_transform(OHIcountries, crs(FishPhiALLConstraints))
OHIcountries_raster <- fasterize(OHIcountries, FishPhiALLConstraints, field="rgn_id")  
CountryVector=getValues(OHIcountries_raster)




### area of each cell (each cell is different given lat/long coordinate reference system)
areaPerCell <- area(FishPhiALLConstraints, weights=FALSE, na.rm=TRUE)
areaPerCellVector <- getValues(areaPerCell)

### Make a dataframe with raster values that includes cells: Country, area, Phi, and Years to Harvest
productionDF <- data.frame(CellID = 1:933120000,
                           Country = CountryVector,
                           AreaKM2 = areaPerCellVector,
                           PhiPrime = FishPhiVector, 
                           YearsToHarvest = FishYearsVector)

head(productionDF)

summary(FishYearsVector)
summary(areaPerCellVector) ##they seem to match



## calculate production for each cell
productionDFFishCells <- productionDF %>%
  filter(!is.na(YearsToHarvest)) %>%
  mutate(F_yieldperfarmMT = (weight35cm * density * cagesize * cagesperfarm)/1000000) %>%  # MRF: units yieldperkm2? 554.8 grams * 20 juv/m3 * 9000 m3 * 24 cages/km2 = grams/km2
  mutate(F_yieldpercellperyear = (F_yieldperfarmMT/YearsToHarvest) * AreaKM2) %>%
  arrange(YearsToHarvest) %>%
  mutate(YieldCumSum = cumsum(F_yieldpercellperyear)) %>%
  mutate(AreaCumSum = cumsum(AreaKM2))

write.csv(productionDFFishCells, file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/productionDFFishCells.csv")) #save to mazu because so large. This is functionally a raster file. 

productionDFFishCells <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/productionDFFishCells.csv"))

head(productionDFFishCells)
summary(productionDFFishCells)
str(productionDFFishCells)
##cumsum area is 11,402,629 km2 -  # matches the paper 
##cumprod is 15,950,000,000MT -   # matches the paper 

### how many of these cells are not in a country?
sum(is.na(productionDFFishCells$Country))
dim(productionDFFishCells)

## MRF: with new OHI regions: 544,569 are not in a country..probably a lot are in conflicted areas


## Calculate production if 1% of top production area is used:
productionByCountryFishDF <- productionDFFishCells %>%
  filter(!is.na(Country)) %>%
  dplyr::select(-YieldCumSum, -AreaCumSum, -X) %>%
  arrange(YearsToHarvest) %>%
  mutate(ID = Country) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  mutate(CountryYieldCumSum = cumsum(F_yieldpercellperyear)) %>%
  mutate(CountryAreaCumSum = cumsum(AreaKM2)) %>%
  mutate(MaxProdPerCountry = max(CountryYieldCumSum)) %>% 
  mutate(OnePercentDevPerCountry = .01*max(CountryAreaCumSum)) #calculating 1 percent of area

write.csv(productionByCountryFishDF,file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/FishProdByCountryByCell.csv"), row.names = FALSE) #save to mazu because so large. This is functionally a raster file. 

productionByCountryFishDF <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/FishProdByCountryByCell.csv"))

## MRF: For each area identify amount of area that corresponds to 1% of production area, 
## MRF: assume maximum production within the country for the 1% of area
CountryProdSummary <- productionByCountryFishDF %>%
  dplyr::arrange(YearsToHarvest) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  filter(CountryAreaCumSum <= OnePercentDevPerCountry) %>%
  mutate(ProdPerCountryOnePercent = max(CountryYieldCumSum)) %>%
  slice(1)


write.csv(CountryProdSummary, file.path(dir_git, "/int/FishProdByCountrySummary.csv"), row.names = FALSE) #save to github

CountryProdSummary <- read.csv(file.path(dir_git, "int/FishProdByCountrySummary.csv"))

# MRF: get fasted YearsToHarvest for each country
CountryProdSummaryNop <- productionByCountryFishDF %>%
  dplyr::arrange(YearsToHarvest) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  slice(1)



# Add country names

region_data()
CountryLabel <- rgns_eez %>%
  dplyr::select(ID = rgn_id, rgn_name)


## Final data
## I think the relevant value we want for the reference point is in this table:  ProdPerCountryOnePercent. Indonesia number for fish matches up with paper. > 24 million tonnes of fish if 1% of aquaculture potential developed. 
CountryProdSummaryFAO <- CountryProdSummary %>%
  ungroup %>%
  dplyr::select(ID:ProdPerCountryOnePercent) %>%
  full_join(CountryLabel, by= "ID")

write.csv(CountryProdSummaryFAO, file.path(dir_git, "/int/FishProdByCountrySummaryFAO.csv"), row.names = FALSE) #save to github

sum(CountryProdSummaryFAO$MaxProdPerCountry, na.rm = TRUE)
#15451023277 number matches paper. > 15 billion tonnes

18.3 Bivalve Production

##Now for Bivalves 
BivalvePhiALLConstraints=raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalvePhiALLConstraints95LT1.tif"))

plot(BivalvePhiALLConstraints)
BivalvePhiVector <- getValues(BivalvePhiALLConstraints)


#OHI 2018 regions (original analysis used older regions file)

## call spatial file from sourced file
regions_shape()

OHIcountries <- regions %>%
  filter(type_w_ant == "eez")

OHIcountries <- st_transform(OHIcountries, crs(FishPhiALLConstraints))

OHIcountries_raster <- fasterize(OHIcountries, FishPhiALLConstraints, field="rgn_id")  

CountryVector <- getValues(OHIcountries_raster)

#make the value of each cell the years it takes to grow a 4 cm bivlave
LogBivalveYears <- calc(BivalvePhiALLConstraints, fun = function(x){B_estCoef[1] + B_estCoef[2]*(x)})
LogBivalveYears
plot(LogBivalveYears)

BivalveYears=calc(LogBivalveYears,fun=function(x){exp(x)})

BivalveYears
plot(BivalveYears)

writeRaster(BivalveYears,file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalveYearsByCell.tif"), overwrite=TRUE)

BivalveYears <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalveYearsByCell.tif"))

BivalveYearsVector <- getValues(BivalveYears)


###now load in area values for each cell
#areaPerCell=raster("Spatial_Data/MiddleFiles/AreaBivalveLT1.grd")
areaPerCell <- area(BivalvePhiALLConstraints, weights=FALSE, na.rm=TRUE)
areaPerCell

areaPerCellVector=getValues(areaPerCell)


productionDFBiv <- data.frame(CellID=1:933120000,
                           Country=CountryVector,
                           AreaKM2=areaPerCellVector,
                           PhiPrime=BivalvePhiVector, 
                           YearsToHarvest=BivalveYearsVector)

head(productionDFBiv)

productionDFBivCells <- productionDFBiv %>%
  filter(!is.na(YearsToHarvest)) %>%
  mutate(B_yieldperfarmInd = bivperfarm) %>%
  mutate(B_yieldpercellperyear = (B_yieldperfarmInd / YearsToHarvest) * AreaKM2) %>%
  arrange(YearsToHarvest) %>%
  mutate(YieldCumSum = cumsum(B_yieldpercellperyear)) %>%
  mutate(AreaCumSum = cumsum(AreaKM2))

head(productionDFBivCells)
summary(productionDFBivCells)
str(productionDFBivCells)

write.csv(productionDFBivCells, file.path(dir_M, "/git-annex/globalprep/mar/v2019/int/productionDFBivCells.csv")) #save to mazu because so large. This is functionally a raster file. 

productionDFBivCells <- read.csv(file.path(dir_M, "/git-annex/globalprep/mar/v2019/int/bivalve/productionDFBivCells.csv"))

productionByCountryBivDF <- productionDFBivCells %>%
  filter(!is.na(Country)) %>%
  dplyr::select(-YieldCumSum, -X) %>%
  dplyr::select(-AreaCumSum) %>%
  arrange(YearsToHarvest) %>%
  mutate(ID = Country) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  mutate(CountryYieldCumSum = cumsum(B_yieldpercellperyear)) %>%
  mutate(CountryAreaCumSum = cumsum(AreaKM2)) %>%
  mutate(MaxPhi = max(PhiPrime)) %>%
  mutate(averagePhi = mean(PhiPrime)) %>%
  mutate(averageWeightedPhi = sum(PhiPrime*AreaKM2)/(max(CountryAreaCumSum))) %>%
  mutate(MaxDevPerCountry = max(CountryAreaCumSum)) %>%
  mutate(MaxProdPerCountry = max(CountryYieldCumSum)) %>%
  mutate(OnePercentDevPerCountry = .01*max(CountryAreaCumSum))

head(productionByCountryBivDF)

write.csv(productionByCountryBivDF,file.path(dir_M, "git-annex/globalprep/mar/v2019/int/bivalve/BivProdByCountryByCell.csv"), row.names = FALSE) #save to mazu because so large. This is functionally a raster file. 

productionByCountryBivDF <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/bivalve/BivProdByCountryByCell.csv")) 


CountryProdSummary <- productionByCountryBivDF %>%
  filter(CountryAreaCumSum <= OnePercentDevPerCountry) %>%
  mutate(ProdPerCountryOnePercent = max(CountryYieldCumSum)) %>%
  slice(1)

write.csv(CountryProdSummary,file.path(dir_git, "int/BivalveProdByCountrySummary.csv"), row.names = FALSE) #save to github


region_data()
CountryLabel <- rgns_eez %>%
  dplyr::select(ID = rgn_id, rgn_name)

head(CountryLabel)

## I think the relevant value we want for the reference point is in this table:  ProdPerCountryOnePercent. Indonesia number for bivalves is higher than the paper, however the paper does use the phrase "over 3.9*10^11 tonnes". Data says 4.7 * 10^11 million tonnes of bivalves if 1% of aquaculture potential developed, which is greater than 3.9 * 10^11 tonnes.  
CountryProdSummaryFAO <- CountryProdSummary %>%
  ungroup %>%
  dplyr::select(ID:ProdPerCountryOnePercent) %>%
  full_join(CountryLabel, by = "ID")

write.csv(CountryProdSummaryFAO, file.path(dir_git, "int/BivalveProdByCountrySummaryFAO.csv"), row.names = FALSE) #save to github 

sum(CountryProdSummaryFAO$MaxDevPerCountry, na.rm = TRUE)
#1491404 km2. Matches paper ~1,500,000 km2

18.4 Unit Conversion and Gap Filling

To compare potential vs harvest, we need to convert bivalve units to metric tonnes, they are currently in units of individual bivalves. Some figures:

Scallops: 40-50 per kg (~22g ea)
Mussels: 15-20 per 1 lb blue (~25g ea)
Oysters: 12-15 per 1 lb (~33g ea) aquaculture halfshell
Clams: 20-30 per 1 lb Manila (~18g ea), 8-14 per 1 lb littlenecks (~40g ea)

Averaging these gives about 27.5g per piece.

aq_mass_per_pc <- 0.0275 * 1e-3 ### mass of bivalve piece in tonnes

pot_b <- read_csv(file.path(dir_git, 'int/BivalveProdByCountrySummaryFAO.csv')) %>%
  mutate(potential_prod_one_percent_b = ProdPerCountryOnePercent * aq_mass_per_pc,
         potential_prod_max_b = MaxProdPerCountry * aq_mass_per_pc,
         aq_type = 'shellfish',
         units   = 'tonnes') %>%
  dplyr::select("rgn_id" = "ID", rgn_name, potential_prod_one_percent_b, potential_prod_max_b)

pot_f <- read_csv(file.path(dir_git, 'int/FishProdByCountrySummaryFAO.csv')) %>%
  mutate(aq_type = 'finfish',
         units   = 'tonnes') %>%
  dplyr::select("rgn_id" = "ID", rgn_name, "potential_prod_one_percent_f" = "ProdPerCountryOnePercent", "potential_prod_max_f" = "MaxProdPerCountry")

pot_aq_int <- full_join(pot_b, pot_f) ## makes sense. 1% of potential AREA... not 1% of potential PRODUCTION

write_csv(pot_aq, file.path(dir_git, 'int/aq_potential_int.csv')) ##intermediate file that might come in handy in the future

DT::datatable(pot_aq_int)


pot_aq_final <- pot_aq_int %>%
  mutate(potential_prod_one_percent_b = replace_na(potential_prod_one_percent_b, 0),
         potential_prod_one_percent_f = replace_na(potential_prod_one_percent_f, 0)) %>%
  mutate(potential_mar_tonnes = potential_prod_one_percent_b + potential_prod_one_percent_f) %>%
  arrange(rgn_id) %>%
  dplyr::select(rgn_id, potential_mar_tonnes) %>%
  mutate(year = 2019)

write_csv(pot_aq_final, file.path(dir_git, 'output/aq_potential_final.csv'))

DT::datatable(pot_aq_final)

## make gapfilling dataset 
pot_aq_final_gf <- pot_aq_final %>%
  mutate(gapfilled = case_when(
    potential_mar_tonnes == 0 ~ 1,
    potential_mar_tonnes > 0 ~ 0
  ), 
  method = case_when(
    potential_mar_tonnes == 0 ~ "missing regions given 0 value",
    potential_mar_tonnes > 0 ~ ""
  )) %>%
  dplyr::select(rgn_id, gapfilled, method, year)

write_csv(pot_aq_final_gf, file.path(dir_git, "output/aq_potential_gf.csv"))

OHI 2019: Mariculture

Compiled on Thu Aug 29 15:45:53 2019 by sgclawson