ohi logo
OHI Science | Citation policy

1 Summary

This analysis converts FAO mariculture data into data used to calculate the OHI global mariculture status score. This also calculates the genetic escapee from mariculture pressure data.

2 Updates from previous assessment


3 Data Source

3.1 Production data

Reference:
http://www.fao.org/fishery/statistics/software/fishstatj/en#downlApp Release date: March 2019 FAO Global Aquaculture Production Quantity 1950_2016 FAO metadata found here

Downloaded: 5/17/2019

Description: Quantity (tonnes) of mariculture for each country, species, year.

Time range: 1950-2017

3.2 Sustainability data

Reference:
Mariculture Sustainability Index (Trujillo 2008)

Trujillo, Pablo. 2008. “Using a Mariculture Sustainability Index to Rank Countries’ Performance.” In Fisheries Centre Research Reports, edited by Jackie Alder and Daniel Pauly. University of British Columbia, Vancouver, Canada: Fisheries Centre Research Reports. https://circle.ubc.ca/handle/2429/40933.

Original data table is located in the Trujillo 2008 paper saved in the OHI Zotero database. Report starts on page 28 of the link provided in the reference.

Description:

Original MSI rescaled from 1-10 to 0-1 for mariculture taxa groups.


4 Methods

## [1] "/home/sgclawson/github/ohiprep_v2019/globalprep/mar/v2019"
## This file makes it easier to process data for the OHI global assessment
##  by creating the following objects:
## 
##  * dir_M = identifies correct file path to Mazu (internal server) based on your operating system
##  * mollCRS = the crs code for the mollweide coordinate reference system we use in the global assessment
##  * regions_shape() = function to load global shapefile for land/eez/high seas/antarctica regions
##  * ohi_rasters() = function to load two rasters: global eez regions and ocean region
##  * region_data() = function to load 2 dataframes describing global regions 
##  * rgn_syns() = function to load dataframe of region synonyms (used to convert country names to OHI regions)
##  * low_pop() = function to load dataframe of regions with low and no human population
##  * UNgeorgn = function to load dataframe of UN geopolitical designations used to gapfill missing data

5 Import Raw Data: FAO Mariculture data

Mariculture production in tonnes.

6 Wrangle:

6.2 Update species names

Update species name in the raw/species_list.csv file with names in the mar dataset (exclude non-food species). I simplified the species_list. I cut the “species” name columns because it wasn’t clear what this was trying to accomplish and created potential error.

## Commented out 'read.csv' lines are different versions of the dataset, representing the three different seaweed exclusion methods (exclude some seaweed, all seaweed, all seaweed nei). Use at the end for data checking.The first one is the one which is saved to the "output" folder. 
mar_sp <- read.csv('raw/species_list.csv', stringsAsFactors=FALSE) %>%
  select(FAO_name, exclude, alias, Taxon_code)
#mar_sp <- read.csv('raw/species_list.csv', stringsAsFactors=FALSE) %>%
#  select(FAO_name, exclude_no_seaweed, alias, Taxon_code)
#mar_sp <- read.csv('raw/species_list.csv', stringsAsFactors=FALSE) %>%
#  select(FAO_name, exclude_no_nei, alias, Taxon_code)

new.spp <- setdiff(mar$FAO_name, mar_sp$FAO_name)
new.spp # check: if dim has 0 rows it means all match
## if there is a list of species, hand check species_list.csv to see whether to keep (exclude seaweeds and species harvested for ornamental/medicinal), check if synonyms match Trujillo names. 
## This is outlined in issue #8 and issue #81 for reference. You need to add the species output from this to species_list.csv and fill out the corresponding information. I.e. add FAO_name, exlude (whether to include in assessment or not. If they are only harvested for medicinal purposes, do not include), alias, and Taxon_code to species_list.csv.

## REMOVE SPECIES not relevant to mariculuture goal (i.e., non-food species)
mar <- mar %>% left_join(mar_sp, by="FAO_name") 

## Filters out species that should be excluded from the MAR sub-goal, depending on which of the 3 scenarios was read in. Searches for column name "exclude", "exclude_no_seaweed", or "exclude_no_nei"
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
  
 mar <- mar %>% filter(exclude==0)

 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
  
   mar <- mar %>% filter(exclude_no_seaweed==0)

   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
  
     mar <- mar %>% filter(exclude_no_nei==0)

     }
 
## change names using species alias or FAO species name (global changes)
mar$species <- ifelse(!is.na(mar$alias), mar$alias, mar$FAO_name) 

## sum production values for each group to account for duplicate rows after name change (remove NA values)
mar <- mar %>%
  filter(!is.na(value)) %>%
  group_by(country, fao, environment, species, year, Taxon_code) %>% 
    summarize(value = sum(value)) %>% 
  ungroup()


## eliminate country-species data with zero production throughout the time-series (1950-recent)
mar <- mar %>%
  group_by(country, species) %>%
  mutate(total_value = sum(value)) %>%
  filter(total_value > 0) %>%
  select(-total_value) %>%
  ungroup()

6.3 Convert country names to OHI regions

Take a look at the tidied data for a single year and region

For some regions, a specific species can be altered so that it matches more general Trujillo sustainability data. In this case, I don’t want the name changes to be global because some regions may have more specific species data.

(Will explore this in the future, but will not implement this year).

# ## based on looking at the list, make a few name changes to match the regions Trujillo data
# # Chile name modification
# mar_rgn$species[mar_rgn$rgn_id==224 & mar_rgn$species == "Red abalone"] <- "Abalones nei"
# mar_rgn$species[mar_rgn$rgn_id==224 & mar_rgn$species == "Japanese abalone"] <- "Abalones nei"
# 
# # China
# mar_rgn$species[mar_rgn$rgn_id==209 & mar_rgn$species == "Areolate grouper"] <- "Groupers nei"
# mar_rgn$species[mar_rgn$rgn_id==209 & mar_rgn$species == "Greasy grouper"] <- "Groupers nei"
# mar_rgn$species[mar_rgn$rgn_id==209 & mar_rgn$species == "Hong Kong grouper"] <- "Groupers nei"
# mar_rgn$species[mar_rgn$rgn_id==209 & mar_rgn$species == "Orange-spotted grouper"] <- "Groupers nei"
# 
# # Honduras
# mar_rgn$species[mar_rgn$rgn_id==133 & mar_rgn$species == "Whiteleg shrimp"] <- "Penaeus shrimps nei"
# 
# # Italy
# mar_rgn$species[mar_rgn$rgn_id==84 & mar_rgn$species == "Pacific cupped oyster"] <- "Cupped oysters nei"
# 
# # New Zealand
# mar_rgn$species[mar_rgn$rgn_id==162 & mar_rgn$species == "Rainbow abalone"] <- "Abalones nei"
# 
# # Pakistan
# mar_rgn$species[mar_rgn$rgn_id==53 & mar_rgn$species == "Penaeus shrimps nei"] <- "Marine crustaceans nei"
# 
# # Philippines
# mar_rgn$species[mar_rgn$rgn_id==15 & mar_rgn$species == "Whiteleg shrimp"] <- "Penaeus shrimps nei"
# 
# # Portugal
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Golden carpet shell"] <- "Marine molluscs nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Peppery furrow"] <- "Marine molluscs nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Solen razor clams nei"] <- "Razor clams nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Atlantic bluefin tuna"] <- "Marine fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Meagre"] <- "Marine fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Seabasses nei"] <- "Marine fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "Soles nei"] <- "Marine fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==183 & mar_rgn$species == "White seabream"] <- "Marine fishes nei"
# 
# # Spain
# mar_rgn$species[mar_rgn$rgn_id==182 & mar_rgn$species == "Atlantic bluefin tuna"] <- "Tuna-like fishes nei"
# 
# # Turkey
# mar_rgn$species[mar_rgn$rgn_id==76 & mar_rgn$species == "Atlantic bluefin tuna"] <- "Tuna-like fishes nei"
# mar_rgn$species[mar_rgn$rgn_id==76 & mar_rgn$species == "European seabass"] <- "Seabasses nei"
# 
# ### sum values of regions with multiple subregions
# mar_rgn <- mar_rgn %>%
#   group_by(fao, environment, species, year, Taxon_code, rgn_id) %>%
#   summarize(value = sum(value)) %>%
#   ungroup()

7 Gapfilling

7.1 Fill in missing years after first year of harvest data with 0 values

For example: Production of blue shrimp in Maine starts in 1983 – don’t include years before that.

Checked to make sure that there weren’t instances in which it made more sense to carry the previous year’s data forward as a method of gapilling. This didn’t seem to be the case.

See how may NA values were converted to 0

Remove species-region-environment time series with less than four years of mariculture production > 0 tonnes (assume these are not established mariculture programs).

Add a unique identifier per cultivated stock that describes each species, fao region, and environment grouping.

8 Save file:

Used to estimate total mariculture yield per country.

Saves the appropriate Mariculture-FP file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above.

9 Import data: Trujillo sustainability scores

These data describe the sustainability and genetic escapes for country/species combinations (and, in a couple cases, environment and fao region combinations). In cases where these data were not available for a specific county/species, we averaged the data across taxonomic groups to gapfill the missing data.

10 Wrangle

10.1 Convert country names to OHI region names.

Check that the non-matches between Trujillo sustainability FAO spp (sus_rgn) and the FAO mariculture species in the wrangled FAO Aquaculture Production data table (maric) are not due to spelling errors or slightly different names. We want to include as many species that have sustainability scores as possible

11 FAO mariculture and sustainability scores

Append sustainability score to the FAO mariculture data.

The following joins the sustainability scores to regions/species that have Trujillo data.

Take a look at the data thus far

Now look at a summary after appending all the Trujillo data

Merge the three Trujillo type categories into a single sustainability score column in the following order:

  1. Sust_c_sp_env: taxa specific to country/species/environment (smallest taxonomic level)
  2. Sust_c_sp_fao: taxa specific to country/species/fao region
  3. Sust_c_sp: taxa specific to country/species (highest taxonomic level)

For example, if Sust_c_sp_env is missing, use Sust_c_sp_fao and so on.

This joins the sustainability data that is gapfilled either at the species level (average of specific species/genera across regions) or at a higher course taxonomic levels and documents which data are gapfilled and how.

Take a look at the wrangled data

Obtain a sustainability score for each record, and a book-keeping column of whether it’s actual or gap-filled

For missing sustainability scores:

  1. Use species-level Sustainability score
  2. If no species-level scores, gapfill with taxon-level sustainability average

12 Save Data:

## save mariculture harvest data
mar_harvest_tonnes = mar_sus_final %>%
  select(rgn_id, taxa_code, taxa_group, year, tonnes)

anyDuplicated(mar_harvest_tonnes) # check for duplications

## Saves the appropriate Mariculture Harvest Tonnes file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above.
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(mar_harvest_tonnes, 'output/mar_harvest_tonnes.csv', row.names=F)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(mar_harvest_tonnes, 'test/mar_harvest_tonnes_no_seaweed.csv', row.names=F)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(mar_harvest_tonnes, 'test/mar_harvest_tonnes_no_nei.csv', row.names=F)
     }

## save gapfill data for mariculture harvest
mar_harvest_tonnes_gf = mar_sus_final %>%
  select(rgn_id, taxa_code, taxa_group, year, tonnes=gapfill_fao)


## Saves the appropriate Mariculture Harvest Gapfill file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(mar_harvest_tonnes_gf, 'output/mar_harvest_tonnes_gf.csv', row.names=F)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(mar_harvest_tonnes_gf, 'test/mar_harvest_tonnes_gf_no_seaweed.csv', row.names=F)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(mar_harvest_tonnes_gf, 'test/mar_harvest_tonnes_gf_no_nei.csv', row.names=F)
     }


## save sustainability scores data for 2012
mar_sustainability_score = mar_sus_final %>% 
  mutate(year = 2012) %>% # Only 2012 sustainability scores exist (Trujillo)
  select(rgn_id, year, taxa_code, sust_coeff = Sust) %>% 
  unique()

anyDuplicated(mar_sustainability_score)


## Saves the appropriate Sustainability Score file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(mar_sustainability_score, 'output/mar_sustainability.csv', row.names=F)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(mar_sustainability_score, 'test/mar_sustainability_no_seaweed.csv', row.names=F)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(mar_sustainability_score, 'test/mar_sustainability_no_nei.csv', row.names=F)
     }


## save gapfill data for sustainability scores
mar_sustainability_score_gf = mar_sus_final %>% 
  select(rgn_id, taxa_code, sust_coeff = gapfill_sus) %>% 
  unique()


## Saves the appropriate Sustainability Score Gapfill file depending on whether we are excluding species based on best judgement (exclude), all seaweeds (exclude_no_seaweed), or just seaweeds nei (exclude_no_nei). See method changes above
if(sum(str_detect(names(mar_sp), "exclude$"))==1) {
 write.csv(mar_sustainability_score_gf, 'output/mar_sustainability_gf.csv', row.names=F)
 } else if (sum(str_detect(names(mar_sp), "exclude_no_seaweed"))==1) {
   write.csv(mar_sustainability_score_gf, 'test/mar_sustainability_gf_no_seaweed.csv', row.names=F)
   } else if (sum(str_detect(names(mar_sp), "exclude_no_nei"))==1) {
     write.csv(mar_sustainability_score_gf, 'test/mar_sustainability_gf_no_nei.csv', row.names=F)
     }

13 Wrangle: Genetic Escapes Data

Combine genetic escapes data to mariculture data.

Look at a summary of the wrangled data table

Merge the different match types (esc_c_sp_fao, esc_c_sp) into a single sustainability score

  1. First use Esc_c_sp_fao
  2. If no Esc_c_sp_fao value, use Esc_c_sp

14 Gapfill

Join the sustainability data that is gapfilled either at the species level (average of specific species/genera across regions) or at a higher course taxonomic levels and documents which data are gapfilled and how.

Take a look at wrangled data

14.1 Record Gapfilling Methods

Obtain a sustainability score for each record, and a book-keeping column of whether it’s actual or gap-filled.

For gapfill column

  1. If no missing escapee data, record “none”
  2. If NA, then gapfill with Esc_gf_sp (gapfill for species-level already recorded)
  3. If no Esc_gf_sp value, then record “taxon_average” as the gapfill method
  4. Fill in missing value with Esc_gf_taxon value

**Take a look at the summary

Final formatting of the escapee data. This is used as a pressure layer.

For each region/year: take a weighted average of the genetic escape probability for each taxa based on tonnes mariculture

Obtain gapfill information from genEscapes Gapfill values in output file are the proportion of data that is gapfilled.

14.4 Data check

Comparing this year’s data to previous year’s data. Expect small variation from year to year. Plot to view differences.

## Compare genetic escapes pressure scores for Russia; saw large changes between 2018 and 2017 assessment years. No big changes between 2019 and 2018 assessment years. 
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 73) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 73) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")



## Compare yield data for Russia
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 73, year == 2016) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 73, year == 2017) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)



## Compare genetic escapes pressure scores for Vietnam; no change in pressures
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 207) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 207) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")



## Compare yield data for Vietnam
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 207, year == 2016) %>% 
  select(rgn_id, species, fao, environment, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 207, year == 2017) %>% 
  select(rgn_id, species, fao, environment, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao", "environment")); View(yield)



## Compare genetic escapes pressure scores for Iceland; saw large changes between 2018 and 2017 assessment years, we do not see a big difference between 2019 and 2018 assessment years. 
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 143) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 143) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")



## Compare yield data for Iceland. Atlantic Salmon increased by a lot between 2018 and 2019 assessment years. 
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 143, year == 2016) %>% 
  select(rgn_id, species, fao, environment, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 143, year == 2017) %>% 
  select(rgn_id, species, fao, environment, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao", "environment")); View(yield)



## Compare genetic escapes pressure scores for Belize; saw large changes between 2018 and 2017 assessment years, we do not see big differences between 2019 and 2018 assessment years. 
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 164) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 164) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")

## Compare yield data for Belize
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 164, year == 2016) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 164, year == 2017) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)


## Compare genetic escapes pressure scores for France; saw large changes between 2018 and 2017 assessment years, we do not see big difference between 2019 and 2018 assessment years. 
old <- read.csv("../v2018/output/GenEsc.csv") %>%
  filter(rgn_id == 179) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(rgn_id == 179) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")


## Compare yield data for France
mar_old <- read.csv("../v2018/output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 179, year == 2016) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- read.csv("output/MAR_FP_data.csv") %>% 
  filter(rgn_id == 179, year == 2017) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)

plot(yield$old_value, yield$value);abline(0,1,col="red")



## Compare new/old Mariculture sustainability scores
tmp_old <- read.csv("../v2018/raw/Truj_label_sust.csv") %>% 
  filter(country == "Russian Federation") %>% 
  select(country, species_Truj, sust_old = Maric_sustainability, gen_old = Genetic.escapees)
  
tmp_new <- read.csv("../v2019/raw/Truj_label_sust.csv") %>% 
  filter(country == "Russian Federation") %>% 
  select(country, species_Truj, Maric_sustainability, Genetic.escapees)

test <- tmp_new %>% 
  full_join(tmp_old, by=c("country", "species_Truj"))
View(test)

plot(test$sust_old, test$Maric_sustainability);abline(0,1,col="red")


## Compare new/old genetic escapees scores
plot(test$gen_old, test$Genetic.escapees);abline(0,1,col="red")

14.5 Data Check

Compare the three exclusion methods for seaweed species.

Method 1:Exclude some seaweed species (exclude) Method 2 (original): Exclude all seaweed species (exclude_no_sw) *Method 3: Exclude all seaweed nei species (exclude_no_nei)

## Read in the three different versions of wrangled data
## Best Professional Judgement (BPJ)
exclude <- read.csv("../v2019/output/GenEsc.csv") %>%
  filter(year == 2017) %>% # compare the most recent shared year between old and new
  select(rgn_id, prs_score_BPJ=pressure_score)

## Exclude all seaweeds (no seaweed)
exclude_no_sw <- read.csv("../v2019/test/GenEsc_no_seaweed.csv") %>%
  filter(year == 2017) %>% # compare the most recent shared year between old and new
  select(rgn_id, prs_score_no_sw=pressure_score)

## Exclude only nei seaweeds (no nei)
exclude_no_nei <- read.csv("../v2019/test/GenEsc_no_nei.csv") %>%
  filter(year == 2017) %>% # compare the most recent shared year between old and new
  select(rgn_id, prs_score_no_nei=pressure_score)


## Compare No Seaweed to Best Professional Judgement
compare <- full_join(exclude, exclude_no_sw, by="rgn_id")
compare <- na.omit(compare)
## Compare genetic escapes pressure scores
plot(compare$prs_score_BPJ, compare$prs_score_no_sw); abline(0,1,col="red")

## Compare No Nei to No Seaweed (not much difference)
compare <- full_join(exclude_no_nei, exclude_no_sw, by="rgn_id")
compare <- na.omit(compare)
## Compare genetic escapes pressure scores
plot(compare$prs_score_no_nei, compare$prs_score_no_sw); abline(0,1,col="red")

## Compare No Nei to BPJ
compare <- full_join(exclude_no_nei, exclude, by="rgn_id")
compare <- na.omit(compare)
## Compare genetic escapes pressure scores
plot(compare$prs_score_BPJ, compare$prs_score_no_nei); abline(0,1,col="red")

15 Potential Aquaculture Calculations for New Reference Point

[REFERENCE RMD FILE: https://raw.githubusercontent.com/OHI-Science/ohiprep_v2019/gh-pages/globalprep/mar/v2019/reference_point/CountryProductionEstimate.Rmd]

16 Summary

This analysis produces potential tonnes of aquaculture from growth potential estimates of finfish and bivalves. These aquaculture numbers will be used for the reference point for the mariculture subgoal.


17 Data Source

17.1 Growth Potential (Phi) data

Reference:
https://knb.ecoinformatics.org/view/doi:10.5063/F1CF9N69 Rebecca Gentry, Halley Froehlich, Dietmar Grimm, Peter Kareiva, Michael Parke, et al. SNAPP - Mapping the Global Potential for Marine Aquaculture. Knowledge Network for Biocomplexity. doi:10.5063/F1CF9N69.

Downloaded: 7/3/2019

Description: Growth Potential estimate raster for global cells


18 Methods

18.2 Finfish Production

## Global tiff file of PHI (Growth Potential) estimates
FishPhiALLConstraints <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishPhiALLConstraints95LT2.tiff"))
plot(FishPhiALLConstraints)
FishPhiVector=getValues(FishPhiALLConstraints)


## Convert Phi raster to number of years it takes to grow a 35 cm fish

LogFishYears <- calc(FishPhiALLConstraints, fun=function(x){F_estCoef[1]+F_estCoef[2]*log(x)})
LogFishYears
plot(LogFishYears)

FishYears <- calc(LogFishYears, fun=function(x){exp(x)})

FishYears
plot(FishYears)
writeRaster(FishYears,file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishYearsbyCell.tif"), overwrite=TRUE)

FishYears <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishYearsbyCell.tif"))
FishYearsVector=getValues(FishYears)



#OHI 2018 regions (original analysis used older regions file)

## call spatial file from sourced file
regions_shape()
OHIcountries <- regions %>%
  filter(type_w_ant == "eez")
OHIcountries <- st_transform(OHIcountries, crs(FishPhiALLConstraints))
OHIcountries_raster <- fasterize(OHIcountries, FishPhiALLConstraints, field="rgn_id")  
CountryVector=getValues(OHIcountries_raster)




### area of each cell (each cell is different given lat/long coordinate reference system)
areaPerCell <- area(FishPhiALLConstraints, weights=FALSE, na.rm=TRUE)
areaPerCellVector <- getValues(areaPerCell)

### Make a dataframe with raster values that includes cells: Country, area, Phi, and Years to Harvest
productionDF <- data.frame(CellID = 1:933120000,
                           Country = CountryVector,
                           AreaKM2 = areaPerCellVector,
                           PhiPrime = FishPhiVector, 
                           YearsToHarvest = FishYearsVector)

head(productionDF)

summary(FishYearsVector)
summary(areaPerCellVector) ##they seem to match



## calculate production for each cell
productionDFFishCells <- productionDF %>%
  filter(!is.na(YearsToHarvest)) %>%
  mutate(F_yieldperfarmMT = (weight35cm * density * cagesize * cagesperfarm)/1000000) %>%  # MRF: units yieldperkm2? 554.8 grams * 20 juv/m3 * 9000 m3 * 24 cages/km2 = grams/km2
  mutate(F_yieldpercellperyear = (F_yieldperfarmMT/YearsToHarvest) * AreaKM2) %>%
  arrange(YearsToHarvest) %>%
  mutate(YieldCumSum = cumsum(F_yieldpercellperyear)) %>%
  mutate(AreaCumSum = cumsum(AreaKM2))

write.csv(productionDFFishCells, file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/productionDFFishCells.csv")) #save to mazu because so large. This is functionally a raster file. 

productionDFFishCells <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/productionDFFishCells.csv"))

head(productionDFFishCells)
summary(productionDFFishCells)
str(productionDFFishCells)
##cumsum area is 11,402,629 km2 -  # matches the paper 
##cumprod is 15,950,000,000MT -   # matches the paper 

### how many of these cells are not in a country?
sum(is.na(productionDFFishCells$Country))
dim(productionDFFishCells)

## MRF: with new OHI regions: 544,569 are not in a country..probably a lot are in conflicted areas


## Calculate production if 1% of top production area is used:
productionByCountryFishDF <- productionDFFishCells %>%
  filter(!is.na(Country)) %>%
  dplyr::select(-YieldCumSum, -AreaCumSum, -X) %>%
  arrange(YearsToHarvest) %>%
  mutate(ID = Country) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  mutate(CountryYieldCumSum = cumsum(F_yieldpercellperyear)) %>%
  mutate(CountryAreaCumSum = cumsum(AreaKM2)) %>%
  mutate(MaxProdPerCountry = max(CountryYieldCumSum)) %>% 
  mutate(OnePercentDevPerCountry = .01*max(CountryAreaCumSum)) #calculating 1 percent of area

write.csv(productionByCountryFishDF,file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/FishProdByCountryByCell.csv"), row.names = FALSE) #save to mazu because so large. This is functionally a raster file. 

productionByCountryFishDF <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/FishProdByCountryByCell.csv"))

## MRF: For each area identify amount of area that corresponds to 1% of production area, 
## MRF: assume maximum production within the country for the 1% of area
CountryProdSummary <- productionByCountryFishDF %>%
  dplyr::arrange(YearsToHarvest) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  filter(CountryAreaCumSum <= OnePercentDevPerCountry) %>%
  mutate(ProdPerCountryOnePercent = max(CountryYieldCumSum)) %>%
  slice(1)


write.csv(CountryProdSummary, file.path(dir_git, "/int/FishProdByCountrySummary.csv"), row.names = FALSE) #save to github

CountryProdSummary <- read.csv(file.path(dir_git, "int/FishProdByCountrySummary.csv"))

# MRF: get fasted YearsToHarvest for each country
CountryProdSummaryNop <- productionByCountryFishDF %>%
  dplyr::arrange(YearsToHarvest) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  slice(1)



# Add country names

region_data()
CountryLabel <- rgns_eez %>%
  dplyr::select(ID = rgn_id, rgn_name)


## Final data
## I think the relevant value we want for the reference point is in this table:  ProdPerCountryOnePercent. Indonesia number for fish matches up with paper. > 24 million tonnes of fish if 1% of aquaculture potential developed. 
CountryProdSummaryFAO <- CountryProdSummary %>%
  ungroup %>%
  dplyr::select(ID:ProdPerCountryOnePercent) %>%
  full_join(CountryLabel, by= "ID")

write.csv(CountryProdSummaryFAO, file.path(dir_git, "/int/FishProdByCountrySummaryFAO.csv"), row.names = FALSE) #save to github

sum(CountryProdSummaryFAO$MaxProdPerCountry, na.rm = TRUE)
#15451023277 number matches paper. > 15 billion tonnes 

18.3 Bivalve Production

##Now for Bivalves 
BivalvePhiALLConstraints=raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalvePhiALLConstraints95LT1.tif"))

plot(BivalvePhiALLConstraints)
BivalvePhiVector <- getValues(BivalvePhiALLConstraints)


#OHI 2018 regions (original analysis used older regions file)

## call spatial file from sourced file
regions_shape()

OHIcountries <- regions %>%
  filter(type_w_ant == "eez")

OHIcountries <- st_transform(OHIcountries, crs(FishPhiALLConstraints))

OHIcountries_raster <- fasterize(OHIcountries, FishPhiALLConstraints, field="rgn_id")  

CountryVector <- getValues(OHIcountries_raster)

#make the value of each cell the years it takes to grow a 4 cm bivlave
LogBivalveYears <- calc(BivalvePhiALLConstraints, fun = function(x){B_estCoef[1] + B_estCoef[2]*(x)})
LogBivalveYears
plot(LogBivalveYears)

BivalveYears=calc(LogBivalveYears,fun=function(x){exp(x)})

BivalveYears
plot(BivalveYears)

writeRaster(BivalveYears,file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalveYearsByCell.tif"), overwrite=TRUE)

BivalveYears <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalveYearsByCell.tif"))

BivalveYearsVector <- getValues(BivalveYears)


###now load in area values for each cell
#areaPerCell=raster("Spatial_Data/MiddleFiles/AreaBivalveLT1.grd")
areaPerCell <- area(BivalvePhiALLConstraints, weights=FALSE, na.rm=TRUE)
areaPerCell

areaPerCellVector=getValues(areaPerCell)


productionDFBiv <- data.frame(CellID=1:933120000,
                           Country=CountryVector,
                           AreaKM2=areaPerCellVector,
                           PhiPrime=BivalvePhiVector, 
                           YearsToHarvest=BivalveYearsVector)

head(productionDFBiv)

productionDFBivCells <- productionDFBiv %>%
  filter(!is.na(YearsToHarvest)) %>%
  mutate(B_yieldperfarmInd = bivperfarm) %>%
  mutate(B_yieldpercellperyear = (B_yieldperfarmInd / YearsToHarvest) * AreaKM2) %>%
  arrange(YearsToHarvest) %>%
  mutate(YieldCumSum = cumsum(B_yieldpercellperyear)) %>%
  mutate(AreaCumSum = cumsum(AreaKM2))

head(productionDFBivCells)
summary(productionDFBivCells)
str(productionDFBivCells)

write.csv(productionDFBivCells, file.path(dir_M, "/git-annex/globalprep/mar/v2019/int/productionDFBivCells.csv")) #save to mazu because so large. This is functionally a raster file. 

productionDFBivCells <- read.csv(file.path(dir_M, "/git-annex/globalprep/mar/v2019/int/bivalve/productionDFBivCells.csv"))

productionByCountryBivDF <- productionDFBivCells %>%
  filter(!is.na(Country)) %>%
  dplyr::select(-YieldCumSum, -X) %>%
  dplyr::select(-AreaCumSum) %>%
  arrange(YearsToHarvest) %>%
  mutate(ID = Country) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  mutate(CountryYieldCumSum = cumsum(B_yieldpercellperyear)) %>%
  mutate(CountryAreaCumSum = cumsum(AreaKM2)) %>%
  mutate(MaxPhi = max(PhiPrime)) %>%
  mutate(averagePhi = mean(PhiPrime)) %>%
  mutate(averageWeightedPhi = sum(PhiPrime*AreaKM2)/(max(CountryAreaCumSum))) %>%
  mutate(MaxDevPerCountry = max(CountryAreaCumSum)) %>%
  mutate(MaxProdPerCountry = max(CountryYieldCumSum)) %>%
  mutate(OnePercentDevPerCountry = .01*max(CountryAreaCumSum))

head(productionByCountryBivDF)

write.csv(productionByCountryBivDF,file.path(dir_M, "git-annex/globalprep/mar/v2019/int/bivalve/BivProdByCountryByCell.csv"), row.names = FALSE) #save to mazu because so large. This is functionally a raster file. 

productionByCountryBivDF <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/bivalve/BivProdByCountryByCell.csv")) 


CountryProdSummary <- productionByCountryBivDF %>%
  filter(CountryAreaCumSum <= OnePercentDevPerCountry) %>%
  mutate(ProdPerCountryOnePercent = max(CountryYieldCumSum)) %>%
  slice(1)

write.csv(CountryProdSummary,file.path(dir_git, "int/BivalveProdByCountrySummary.csv"), row.names = FALSE) #save to github


region_data()
CountryLabel <- rgns_eez %>%
  dplyr::select(ID = rgn_id, rgn_name)

head(CountryLabel)

## I think the relevant value we want for the reference point is in this table:  ProdPerCountryOnePercent. Indonesia number for bivalves is higher than the paper, however the paper does use the phrase "over 3.9*10^11 tonnes". Data says 4.7 * 10^11 million tonnes of bivalves if 1% of aquaculture potential developed, which is greater than 3.9 * 10^11 tonnes.  
CountryProdSummaryFAO <- CountryProdSummary %>%
  ungroup %>%
  dplyr::select(ID:ProdPerCountryOnePercent) %>%
  full_join(CountryLabel, by = "ID")

write.csv(CountryProdSummaryFAO, file.path(dir_git, "int/BivalveProdByCountrySummaryFAO.csv"), row.names = FALSE) #save to github 

sum(CountryProdSummaryFAO$MaxDevPerCountry, na.rm = TRUE)
#1491404 km2. Matches paper ~1,500,000 km2

18.4 Unit Conversion and Gap Filling

To compare potential vs harvest, we need to convert bivalve units to metric tonnes, they are currently in units of individual bivalves. Some figures:

Averaging these gives about 27.5g per piece.

aq_mass_per_pc <- 0.0275 * 1e-3 ### mass of bivalve piece in tonnes

pot_b <- read_csv(file.path(dir_git, 'int/BivalveProdByCountrySummaryFAO.csv')) %>%
  mutate(potential_prod_one_percent_b = ProdPerCountryOnePercent * aq_mass_per_pc,
         potential_prod_max_b = MaxProdPerCountry * aq_mass_per_pc,
         aq_type = 'shellfish',
         units   = 'tonnes') %>%
  dplyr::select("rgn_id" = "ID", rgn_name, potential_prod_one_percent_b, potential_prod_max_b)

pot_f <- read_csv(file.path(dir_git, 'int/FishProdByCountrySummaryFAO.csv')) %>%
  mutate(aq_type = 'finfish',
         units   = 'tonnes') %>%
  dplyr::select("rgn_id" = "ID", rgn_name, "potential_prod_one_percent_f" = "ProdPerCountryOnePercent", "potential_prod_max_f" = "MaxProdPerCountry")

pot_aq_int <- full_join(pot_b, pot_f) ## makes sense. 1% of potential AREA... not 1% of potential PRODUCTION

write_csv(pot_aq, file.path(dir_git, 'int/aq_potential_int.csv')) ##intermediate file that might come in handy in the future

DT::datatable(pot_aq_int)


pot_aq_final <- pot_aq_int %>%
  mutate(potential_prod_one_percent_b = replace_na(potential_prod_one_percent_b, 0),
         potential_prod_one_percent_f = replace_na(potential_prod_one_percent_f, 0)) %>%
  mutate(potential_mar_tonnes = potential_prod_one_percent_b + potential_prod_one_percent_f) %>%
  arrange(rgn_id) %>%
  dplyr::select(rgn_id, potential_mar_tonnes) %>%
  mutate(year = 2019)

write_csv(pot_aq_final, file.path(dir_git, 'output/aq_potential_final.csv'))

DT::datatable(pot_aq_final)

## make gapfilling dataset 
pot_aq_final_gf <- pot_aq_final %>%
  mutate(gapfilled = case_when(
    potential_mar_tonnes == 0 ~ 1,
    potential_mar_tonnes > 0 ~ 0
  ), 
  method = case_when(
    potential_mar_tonnes == 0 ~ "missing regions given 0 value",
    potential_mar_tonnes > 0 ~ ""
  )) %>%
  dplyr::select(rgn_id, gapfilled, method, year)

write_csv(pot_aq_final_gf, file.path(dir_git, "output/aq_potential_gf.csv"))