ohi logo
OHI Science | Citation policy

1 Summary

This analysis converts FAO mariculture and seafood watch sustainability data into data used to calculate the OHI global mariculture status score. This also calculates the genetic escapee from mariculture pressure data.

2 Updates from previous assessment


3 Data Source

3.1 Production data

Reference:
https://www.fao.org/fishery/en/collection/aquaculture?lang=en Release date: March 2023 FAO Global aquaculture production Quantity (1950 - 2021) Global aquaculture production (Value) (1950-2021)

Downloaded: July 5th, 2023

Instructions on downloading the FAO aquaculture data:

Navigate to this link: https://www.fao.org/fishery/statistics-query/en/home.

Click browse on Global aquaculture production. Click browse on Global aquaculture production (Quantity). Scroll to the bottom of the dimensions section, and de-select all of the pre-selected years. Under the query definition section, drag Country Name En (if not already present), ASFIS species Name En, FAO major fishing area Name En, ASFIS species Family scientific name, and Environment Name En, from the possible fields column into the selected rows column. Click on show data and confirm that data is present for 1950- two years prior to current year. Click download and select yes to include Null Values. Rename the file in the format FAO_GlobalAquacultureProduction_Quantity_1950_last_data_year. Upload the file to the appropriate year folder in the FAO_mariculture folder on mazu.

Description: Quantity (tonnes) of mariculture for each country, species, year.

Time range: 1950-2021

3.2 Seafood Watch sustainability data

Obtained upon request of SFW.

Reference:
Seafood Watch Complete Recommendation documentation
Seafood Watch Scoring Theory
Seafood Watch Global Seaweed Assessment (provides gapfilling constants)

Downloaded: May 31, 2023

Description: Monterey Bay Aquarium Seafood Watch aquaculture recommendations. Sustainability scored from 0-10. Rescaled to 0-1 using the max score in the data.


4 Methods

knitr::opts_chunk$set(eval=FALSE)
  
## Load libraries, set directories
library(ohicore)  #devtools::install_github('ohi-science/ohicore@dev')
library(dplyr)
library(stringr)
library(tidyr)
library(tidyverse)
#library(taxize) had issues installing taxize because its dependency phanghorn is not compatible with this version of R
library(knitr)
library(kableExtra)
library(here)
library(plotly)

## Load FAO-specific user-defined functions
source('mar_fxs.R') # functions specific to mariculture dealing with compound countries
source('../../../workflow/R/fao_fxn.R') # function for cleaning FAO files
source('../../../workflow/R/common.R') # directory locations
## This file makes it easier to process data for the OHI global assessment
##  by creating the following objects:
## 
##  * dir_M = identifies correct file path to Mazu (internal server) based on your operating system
##  * mollCRS = the crs code for the mollweide coordinate reference system we use in the global assessment
##  * regions_shape() = function to load global shapefile for land/eez/high seas/antarctica regions
##  * ohi_rasters() = function to load two rasters: global eez regions and ocean region
##  * region_data() = function to load 2 dataframes describing global regions 
##  * rgn_syns() = function to load dataframe of region synonyms (used to convert country names to OHI regions)
##  * low_pop() = function to load dataframe of regions with low and no human population
##  * UNgeorgn = function to load dataframe of UN geopolitical designations used to gapfill missing data

5 Import Raw Data: FAO Mariculture data

Mariculture production in tonnes.

## UPDATE THESE!
current_year <- 2023
last_year <- 2022 #last year of analysis

#last year of data available
#update if not two years behind the current assessment year
latest_data_year <- current_year-2 #most up to date year in the data aquaculture data
previous_latest_data_year <- last_year -2 #last year of data in the last assessment, for comparison plots

current_data_folder <- paste0("d", current_year) # assessment year for file paths
previous_data_folder <- paste0("d", last_year)  # previous assessment year

#update the data file name to the current last year of data 
mar <- read_csv(file.path(dir_M, "git-annex/globalprep/_raw_data/FAO_mariculture", current_data_folder, "FAO_GlobalAquacultureProduction_Quantity_1950_2021.csv")) ; head(mar)

6 Wrangle:

6.1 Tidy mariculture data

Filter freshwater mariculture, make long format, and clean FAO codes.

mar <- mar %>%
  dplyr::select(-`Unit Name`) %>% #should all be "Tonnes - live weight"
  rename(country = `Country Name En`,
         FAO_name = `ASFIS species Name En`, 
         fao = `FAO major fishing area Name En`, 
         environment = `Environment Name En`,
         family_scientific = `Family Scientific name`) %>%
  rename_at(vars(matches("\\[")), ~ str_remove(., "\\[")) %>%
  rename_at(vars(matches("\\]")), ~ str_remove(., "\\]"))


table(mar$environment) 

## Include only marine environments
mar <- mar %>%
filter(environment %in% c("Brackishwater", "Marine")) 


#As of 2023 we no longer need to run the fao_clean_data_new function
# NAs are no longer saved as ... and the flags are in a separate column
#however we should still replace columns where the flag is N w/.1
sub_N = 0.1

mar <- mar %>% mutate(row_id = row_number())


#pivot all of the year/value columns 
mar_values <- mar %>% 
  select(-c(paste("1950":latest_data_year, "Flag"))) %>% 
  pivot_longer(cols = paste0("1950":latest_data_year),
               names_to = "year",
               values_to = "value")

#pivot all of the flag columns   
mar_flags <- mar %>% select(-paste0("1950":latest_data_year)) %>% 
  pivot_longer(cols = paste("1950":latest_data_year, "Flag"),
               names_to = "flag_year",
               values_to = "flag") %>% 
  mutate(year = str_remove(flag_year, " Flag")) %>% 
  select(year, flag, row_id)
  
#combine flag and row id 
mar <- mar_values %>% left_join(mar_flags, by = c("row_id", "year"))

mar <- mar %>% 
  mutate(value = case_when(str_detect(flag, "N") ~ sub_N,
                               TRUE ~value)) %>%
  select(-row_id) %>% 
  mutate(FAO_name = ifelse(!is.na(FAO_name), FAO_name, family_scientific)) 

6.2 Update species names

Update species name in the raw/species_list.csv file with names in the mar dataset (exclude non-food species).

Instructions for updating the species list:

More information can be found in issues #8, #81, #241, #246 for reference.

The species_list.csv is a master list of all the mariculture species in the FAO data. Every year there are potential changes to the FAO data, additional species may be added or names can change, etc. When a species is in the FAO data, but not the species_list.csv it will need to be added to the species_list.csv. To do this, do a quick search within the species_list.csv to make sure it wasn’t some small change to the name (e.g. “long-back eel” vs. “longback eel”). Otherwise, do some research to see if any of the synonyms for the FAO species match the names in species_list.csv. If it is one of these scenarios, the information can be copied to a new row with the name modification. If it seems to be a new taxa, You need to add the species output from this (new.spp) to species_list.csv and fill out the corresponding information. Add the new taxa name, Taxon_code (categories defined in the README in the /raw folder), and family (general family from google) to the species_list.csv. Do some research to make sure the species is being used for food and is marine/brackish. If it is not used for food or is freshwater then it is excluded from the analysis (i.e., an 1 in the exclude column). I.e. add FAO_name, exclude (whether to include in assessment or not. If they are only harvested for medicinal purposes, do not include, or not eaten by humans do not include). Because seaweeds are generally used for both human and non-human consumption, we quantify whether to exclude or not for seaweed species as proportions. For example, if a seaweed species is 80% used for non-human consumption, we would exclude 80% of the harvest (exclude = 0.8). However, for fish species, we only classify as 0 or 1.

## Commented out 'read.csv' lines are different versions of the dataset, representing the three different seaweed exclusion methods (exclude some seaweed, all seaweed, all seaweed nei). Use at the end for data checking. The first one is the one which is saved to the "output" folder, and the main analysis to run. 
mar_sp <- read.csv('raw/species_list.csv', stringsAsFactors=FALSE) %>%
  select(FAO_name, exclude,Taxon_code, family, notes, notes_2)

new.spp <- setdiff(mar$FAO_name, mar_sp$FAO_name)

new.spp # Check: if dim has 0 rows it means all match
## If there is a list of species, hand check species_list.csv to see whether to keep (exclude seaweeds and species harvested for ornamental/medicinal). See instructions above.

### NOTE: if the list of species to add is huge, and it is unmanagable to do by hand, see the mar_dataprep.Rmd in the v2020 folder, and it will have code to use Taxize. Otherwise just do it by hand, because taxize is a pain, and most of them won't be found in taxize anyways... 

## REMOVE SPECIES not relevant to mariculture goal (i.e., non-food species)
mar <- mar %>% left_join(mar_sp, by="FAO_name")

## Filters out species that should be excluded from the MAR subgoal Searches for column name "exclude"
mar <- mar %>% filter(exclude < 1)

## create an "include" column (1-exclude) 
mar <- mar %>%
   mutate(include = 1-exclude)

#rename FAO_name to species
mar <- mar %>% rename(species = FAO_name)


## Multiply value by proportion of harvest to include (include column)
# value column is 
mar <- mar %>%
  mutate(value_include = value*include)

## Check to see if 85% of seaweed goes to human consumption... filter for seaweeds, sum the tonnes included, divide by total tonnes seaweed

# seaweed_check2 <- mar %>%
#   filter(!is.na(value)) %>%
#   filter(Taxon_code == "AL") %>%
#   group_by(year) %>%
#   summarise(include_total = sum(value_include),
#             overall_total = sum(value)) %>%
#   mutate(percent = include_total/overall_total)
# 
#  mean(seaweed_check2$percent) ## 81%.... matches pretty darn well.



##remove NA values 
mar <- mar %>%
  filter(!is.na(value_include)) %>%
  select(country, fao, environment, species, year, Taxon_code, family, value = value_include) 

## Eliminate country-species data with zero production throughout the time-series (1950-recent)
mar <- mar %>%
  group_by(country, species) %>%
  mutate(total_value = sum(value)) %>%
  filter(total_value > 0) %>%
  select(-total_value) %>%
  ungroup()

6.3 Convert country names to OHI regions

#update names before using name_to_rgn

mar <- mar %>%
  mar_split()  # function in mar_fxs.R 
#Divides mariculture from countries that we report as separate regions (assume equal production in all regions)
  # Netherlands Antilles: Conch restoration among Aruba, Bonaire, Curacao
  # Channel Islands: Jersey and Guernsey
  # Bonaire/S.Eustatius/Saba
  # Yugoslavia SFR: no longer a country after 1992, no values 



mar_rgn <- name_2_rgn(df_in = mar, 
                       fld_name='country', 
                       flds_unique=c('species', 'fao', 'environment', 'Taxon_code', 'year')) 


## Sum values of regions with multiple subregions
mar_rgn <- mar_rgn %>%
  group_by(fao, environment, species, year, Taxon_code, rgn_id, family) %>% 
  summarize(value = sum(value)) %>%
  ungroup()

Take a look at the tidied data for a single year and region

test <- data.frame(filter(mar_rgn, rgn_id==207) %>%
  filter(year==2020) %>%
  arrange(species))

7 Gapfilling

7.1 Fill in missing years after first year of harvest data with 0 values

For example: Production of blue shrimp in Maine starts in 1983 – don’t include years before that.

Checked to make sure that there weren’t instances in which it made more sense to carry the previous year’s data forward as a method of gapfilling. This didn’t seem to be the case.

## Spread mar_rgn to create a value for every year-species-region (if missing it will be given a NA)
mar_rgn_spread <- pivot_wider(mar_rgn, names_from = year, values_from = value)
dim(mar_rgn_spread)

## Turn data frame back into long format
mar_rgn_gf <- pivot_longer(mar_rgn_spread, cols = num_range("",paste0(1950:latest_data_year)), names_to = "year", values_to = "value") %>% arrange(rgn_id, species, year, Taxon_code, family, fao, environment)

## NA values are converted to zero.
mar_rgn_gf <- mar_rgn_gf %>%
  mutate(year = as.numeric(year)) %>%
  mutate(value_w_0 = ifelse(is.na(value), 0, value)) %>%
  group_by(fao, environment, species, Taxon_code, family, rgn_id) %>% 
  mutate(cum_value = cumsum(value_w_0)) %>%
  ungroup() %>%
  filter(cum_value > 0) %>% # Eliminates years before mariculture began
  mutate(gap_0_fill = ifelse(is.na(value), "NA_to_zero", "0")) %>% # Record gapfill
  mutate(value = ifelse(is.na(value), 0, value)) %>% # Finally, convert all NAs in original column to 0
  select(-cum_value, -value_w_0)

Check how may NA values were converted to 0

table(mar_rgn_gf$gap_0_fill)
## 3790 of these out of 27810+3790 cases had NA values converted to 0 - 2018 assessment
## 3763 of these out of 28694+4291 cases had NA values converted to 0 - 2019 assessment
## 5932 of these out of 29305 +5932 cases had NA values converted to 0 - 2020 assessment
## 6441 of these out of 30170 + 6441 cases had NA values converted to 0 - 2021 assessment
## 7037 of these out of 30975 + 7037 cases had NA values converted to 0 - 2022 assessment
##7367 of these out of 34187 + 7367 cases had NA values converted to 0 -2023 assessment

Remove family-region-environment time series with less than four years of mariculture production > 0 tonnes (assume these are not established mariculture programs).

mar_rgn_gf = mar_rgn_gf %>% 
  group_by(rgn_id, family, fao, environment) %>%
  mutate (not_0 = length(value[value>0])) %>% # Length of vector of years greater than 0
  filter (not_0>3) %>% # Filter for groups that have at least four years of mariculture production 
  ungroup() %>% 
  select(rgn_id, species, fao, environment, year, value, Taxon_code, family, gap_0_fill) 

maric <- mar_rgn_gf

8 Save mariculture file

Used to estimate total mariculture yield per country.

Saves the Mariculture-FP file

write.csv(maric, 'output/MAR_FP_data.csv', row.names=FALSE)

9 Sustainability Scores from Seafood Watch Data

9.1 Import data: Seafood Watch sustainability scores

These data describe the sustainability country/species combinations. In cases where these data were not available for a specific county/species, we averaged the data across taxonomic groups to gapfill the missing data. We also calculate the genetic escapes pressure layer from this, based on the column AqCriteria6.

## Load in Seafood Watch sustainability scores data from mazu:
#update with name of new data file
sw_sus <- read.csv(file.path(dir_M, "git-annex/globalprep/_raw_data/seafood_watch_mar_sustainability", current_data_folder, "SFW_Aquaculture_ratings_053123.csv"), check.names = FALSE, stringsAsFactors = FALSE, na.strings = c("NA", ""))

head(sw_sus)

#update with name of data file from previous year 
old_sw_sus <- read.csv(file.path(dir_M, "git-annex/globalprep/_raw_data/seafood_watch_mar_sustainability", previous_data_folder, "SFW_Aquaculture_ratings_062222.csv"), check.names = FALSE, stringsAsFactors = FALSE, na.strings = c("NA", ""))

9.2 Wrangle

9.2.1 Tidy Seafood Watch sustainability data

Rename columns to match with MAR data and fill in species column

## Rename columns
sw_sus <- sw_sus %>%
  rename(report_title = 'ReportTitle',
         published_date = 'PublishedDate',
         sw_species = 'CommonNames',
         fao_species = 'FAOCommonName',
         fda_species = 'FDACommonName',
         water_body = 'BOWs',
         country = 'Countries',
        scientific_name = 'ScientificName',
         method = 'Methods',
         score = 'AssessmentScore', 
         escapes_score = 'C6Score',
         rec = 'AssessmentColor' 
         ) %>% 
  dplyr::select(report_title, published_date, sw_species, scientific_name, fao_species, fda_species, country, water_body, method, escapes_score, score, rec) %>%
  filter(!(method %in% c("Freshwater net pen"))) %>% 
  mutate(score = case_when(score < 0 ~ 0,
            TRUE ~ score))

## Change species names using FAO species name (fao_species); if NA, use common name (sw_species)
sw_sus$species <- ifelse(!is.na(sw_sus$fao_species), sw_sus$fao_species, sw_sus$sw_species)

9.2.2 Keep NA countries

## These need to be re-added later (get cut when associated with region ids)
sw_sus_no_rgn <- filter(sw_sus, is.na(country)|country == "Worldwide")
# 118 entries with no country, in 2023 assessment there were 114 "Worldwide" rows and only 1 NA row

9.2.3 Convert country names to OHI region IDs.

## Change country names to match OHI region names (separate rows that are associated with multiple countries)
sw_sus_multiple <- sw_sus %>% 
  filter(str_detect(country, "\\|")) %>%
  separate_rows(country, sep = " \\| ")

sw_sus_df <- sw_sus %>%
  filter(!str_detect(country, "\\|")) %>%
  # rejoin the rows we just removed, but in clean format (1 row per country)
  rbind(sw_sus_multiple) %>%
  # remove all NA and 'Worldwide' rows in order to correctly rejoin those in the next steps, these rows are stored in the variable sw_sus_no_rgn
  filter(!is.na(country), country!="Worldwide")
  
## Convert country names to OHI region IDs. (ohicore/R/name_2_rgn.R)
sw_sus_rgn <- name_2_rgn(df_in = sw_sus_df, 
                       fld_name='country', 
                       flds_unique=c('fao_species', 'fda_species', 'sw_species', 'score'),
                       keep_fld_name = TRUE) # 'country' now shows the original Seafood Watch data name; 'rgn_name' is what we want to use from now on
# 151 entries

## Re-add NA countries
sw_sus_rgn <- bind_rows(sw_sus_rgn, sw_sus_no_rgn) %>%
  unique()
# 259 entries

#2023 switched to comparing common names because the scientific name column was two columns in the previous years data. Could compare scientific names next year. No differences in species detected. 

setdiff(sw_sus_rgn$sw_species, old_sw_sus$CommonNames)
setdiff(old_sw_sus$CommonNames, sw_sus_rgn$sw_species) 

Check that the non-matches between Seafood Watch sustainability data species (sw_sus_rgn) and the FAO mariculture species in the wrangled FAO Aquaculture Production data table (maric) are not due to spelling errors or slightly different names. We want to include as many species that have sustainability scores as possible

maric <- read_csv("output/MAR_FP_data.csv")

## Make sure same species are spelled the same in the two data tables (e.g. check that there are no extra spaces)
sort(setdiff(sw_sus_rgn$species, maric$species)) # Seafood Watch species with no FAO data
sort(setdiff(maric$species, sw_sus_rgn$species)) # FAO species with no Seafood Watch data - there will probably be a long list
#this is a baseline check, run the fixes in the next chunk from previous years before hand fixing any species

#instructions for updating speices:
## Hand check each of the species output here, check to make sure there aren't obvious cases when the species is the same, but the names are presented slightly differently in the lists. For example, some species might have an extra space in the name somewhere. If any of these obvious differences exist, we need to go in and change the names to match. 
## The FAO list has far more species than the sustainability list. But, go over them to make sure that we are getting sustainability scores for as many species as we can.

Fix the non matches

## Rename species in Seafood Watch data to match FAO species data
sw_sus_rgn$species <- gsub("Atlantic Bluefin tuna", "Atlantic bluefin tuna", sw_sus_rgn$species)
#sw_sus_rgn$species <- gsub("Scallops", "Scallops nei", sw_sus_rgn$species)

## rename any "mistakes" in the FAO data, such as species names whose first word is uncapitalized  
maric$species <- gsub("blood cockle", "Blood cockle", maric$species)
maric$species <- gsub("\\[Solea spp\\]", "Common sole", maric$species)

#rename FAO species to general categories when used in seafood watch data 
maric <- maric %>% 
  # Oysters
  mutate(species = case_when(
    family == "Cardiidae" ~ "Cockles",
    str_detect(species,"oyster") ~ "Oysters",
    #clams
    species == "CORBICULIDAE" ~ "Clams",
    str_detect(species, "clam") ~ "Clams",
    str_detect(species, "Clam") ~ "Clams",
    str_detect(species, "mussel") ~ "Mussels",
    Taxon_code == "AL" ~ "Seaweed",
    family == "Haliotidae" ~ "Abalones", 
    family == "Pectinidae" ~ "Scallops",
    TRUE ~ species)) %>% 
  mutate(family = case_when(species == "Clams" ~ NA,
                            species == "Oysters"~ NA,
                            TRUE ~family
         )) #setting family as NA when the group contains more than one family


#updating the names combines species and creates creates duplicate species which we will need to sum
#also sums duplicates due to multiple fao codes for the same region/species
maric <- maric %>%
  group_by(environment, species, year, Taxon_code, rgn_id, family) %>% 
  summarize(value = sum(value), gap_0_fill =first(gap_0_fill)) %>%
  ungroup()

#run this code to check that there are no duplicates for any species, year, rgn_id combinations. If there are they will be unintentionally deleted later, so you may need to change family or taxon codes to match and rerun the above group by
duplicate_check <- maric %>% group_by(environment, species, year, rgn_id) %>% 
  summarize(n = n())
unique(duplicate_check$n) #should be one


#Add a unique identifier per cultivated stock that describes each species, region, and environment grouping.
identifier = maric %>% 
  select(rgn_id, species, environment) %>% 
  unique() %>% 
  mutate(species_code = 1:n())

maric = left_join(maric, identifier)

#run again after cleaning to check there are no remaining mismatched species
sort(setdiff(sw_sus_rgn$species, maric$species)) # Seafood Watch species with no FAO data
sort(setdiff(maric$species, sw_sus_rgn$species))

10 FAO mariculture and Seafood Watch sustainability scores

## Add column "match_type" to categorize obs. that 1) have a country associated, 2) are "global" Seafood Watch data, or 3) only distinguished by water body
sw_sus_rgn <- sw_sus_rgn %>% 
  dplyr::mutate(match_type = 
                  dplyr::case_when(!is.na(rgn_name) ~ "sw_sp_c", # Add match type specific to species/country
                                  country == "Worldwide" ~ "sw_sp_g", # Add match type specific to species/global Seafood Watch data
                                  TRUE ~ "sw_sp_w" # Add match type specific to species/water body
  ))

table(sw_sus_rgn$match_type)

#sw_sp_c sw_sp_g sw_sp_w 
#    144     114       1

Assign a Mediterranean bordering rgn id to each waterbody row

## Add a line for each country that borders each water body
  # The only water body without a rgn is the Mediterranean; add all countries that border it (source: https://www.medqsr.org/mediterranean-marine-and-coastal-environment#:~:text=Today%2021%20countries%2C%20with%20surface,Syria%2C%20Tunisia%2C%20and%20Turkey.)

## filter for waterbody specific rows 
sw_sus_rgn_water <- sw_sus_rgn %>%
  filter(match_type == 'sw_sp_w')

## define Mediterranean rgns
med_rgns <- data.frame(rgn_id = c(82, 84, 232, 187, 81, 214, 179, 80, 79, 184, 78, 67, 68, 185, 186, 62, 188, 182, 77, 61, 76))

# create new function called expand.grid.df()
expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))

gf_sw_sp_w <- expand.grid.df(sw_sus_rgn_water, med_rgns) %>%
  dplyr::select(-rgn_id.x, rgn_id = rgn_id.y)
## Join data specific to species/country
sw_sp_c <- sw_sus_rgn %>%
  filter(match_type == 'sw_sp_c') %>%
  select(rgn_id, species, Sust_sw_sp_c = score, esc_sw_sp_c = escapes_score)

#there will be many to many relationship b/c maric has more than one of the same species for a rgn_id due to year and the sustainability may have more than one method for a species/rgn combination
mar_sw_sus <- maric %>%
  left_join(sw_sp_c, by = c("species", "rgn_id"))

## Join data specific to species/water body
sw_sp_w <- gf_sw_sp_w %>%
  select(rgn_id, species, Sust_sw_sp_w = score, esc_sw_sp_w = escapes_score)

mar_sw_sus <- mar_sw_sus %>%
  left_join(sw_sp_w, by= c("species", "rgn_id"))

## Join data specific to species/global Seafood Watch data
sw_sp_g <- sw_sus_rgn %>% 
  filter(match_type == 'sw_sp_g') %>%
  select(species, Sust_sw_sp_g = score, esc_sw_sp_g = escapes_score)

mar_sw_sus <- mar_sw_sus %>%
  left_join(sw_sp_g, by= c("species"))

# test3 <- mar_sw_sus %>%
#   filter(Taxon_code == "GAST")

Merge the three Seafood Watch type categories into a single sustainability score and genetic escapes score column in the following order:

  1. Sust_sw_sp_c: specific to species/country
  2. Sust_sw_sp_w: specific to species/water body
  3. Sust_sw_sp_g: specific to species/global Seafood Watch data

For example, if Sust_sw_sp_c is missing, use Sust_sw_sp_w and so on.

mar_sw_sus = mar_sw_sus %>% 
  dplyr::mutate(Sust = ifelse(!is.na(Sust_sw_sp_c), Sust_sw_sp_c, Sust_sw_sp_w)) %>%
  dplyr::mutate(Sust = ifelse(is.na(Sust), Sust_sw_sp_g, Sust)) %>%
  dplyr::mutate(gapfill_sus = case_when(Sust_sw_sp_c == 
                                        Sust ~ "none",
                                        is.na(Sust_sw_sp_c) & Sust_sw_sp_w == Sust ~ "water_body",
                                        is.na(Sust_sw_sp_c) & is.na(Sust_sw_sp_w) & Sust_sw_sp_g == Sust ~ "global_species")) %>%
  dplyr::mutate(esc_score = ifelse(!is.na(esc_sw_sp_c), esc_sw_sp_c, esc_sw_sp_w)) %>%
  dplyr::mutate(esc_score = ifelse(is.na(esc_score), esc_sw_sp_g, esc_score)) %>%
  dplyr::mutate(gapfill_esc = case_when(esc_sw_sp_c == esc_score ~ "none",
                                        is.na(esc_sw_sp_c) & esc_sw_sp_w == esc_score ~ "water_body",
                                        is.na(esc_sw_sp_c) & is.na(esc_sw_sp_w) & esc_sw_sp_g == esc_score ~ "global_species")) %>%
  dplyr::select(-Sust_sw_sp_c, -Sust_sw_sp_w, -Sust_sw_sp_g, -esc_sw_sp_c, -esc_sw_sp_w, -esc_sw_sp_g) 

summary(mar_sw_sus)

Since some regions have multiple sustainability scores for the same species due to multiple aquaculture methods, but we don’t know what proportions of which methods are used, we take the average of the sustainability scores in these instances.

Average sustainability scores within regions with more than one score (due to more than one aquaculture method):

# check which countries have duplicates
anyDuplicated(mar_sw_sus)
# 2022 scenario year: 532
#2023 280

mar_sw_sus <- mar_sw_sus %>% 
  dplyr::group_by(rgn_id, species) %>% 
  dplyr::mutate(Sust_avg = mean(Sust, na.rm=TRUE),
                esc_avg = mean(esc_score, na.rm = TRUE)) %>% 
  dplyr::ungroup()

# code NAN values in Sust_avg and esc_avg columns as NA 
mar_sw_sus$Sust_avg[is.nan(mar_sw_sus$Sust_avg)] <- NA
mar_sw_sus$esc_avg[is.nan(mar_sw_sus$esc_avg)] <- NA
# nrows = 48,748

Get rid of duplicates for region/species/year:

mar_sw_sus <- mar_sw_sus %>% 
  dplyr::distinct(rgn_id, species, environment, year, .keep_all = TRUE) %>%
  # remove old Sust column (which was used to produce Sust_avg column) &
  # rename Sust_avg to Sust &
  # remove old esc_score column (which was used to produce esc_avg column) &
  # rename esc_avg to esc_score :
  dplyr::select(-Sust, Sust = Sust_avg, -esc_score, esc_score = esc_avg)

Now look at a summary after appending all the Seafood Watch data

summary(mar_sw_sus) 
# 21215 NA values out of 32051 total obs; ~68% without scores (2022)
# 16378 NA values out of 33732 total obs ~ 49% without scores (2023)

Gapfilling based on families and taxon_code

Steps:
- Group by country and family and summarise to get mean sustainability/escapes values (gapfilled by country/family)
- Group by georegion and family and summarise to get mean sustainability/escapes values (gapfilled by georegion/family)
- Group by family (global family gapfilled)
- Gapfill by average of the taxon_code
- Assign any Algae with the overall SFW seaweeds score from SFW data (7.92) and the C6 Escapes score (4, a subcomponent of the overall seaweed score)
- link to the SFW Global Assessment for Seaweed available for download here
- Apply the global average to any remaining sustainability/escapes score NAs
- Finally, we rescale the scores to the highest sustainability score, under the assumption that the highest score is the best sustainability currently out there, and should be given a score of 100.

## First we need to add a georegion column that we can group by later on. 

mar_sw_sus <- mar_sw_sus %>%
  add_georegion_id() %>%
  dplyr::select(-level)

# ## Exploratory plots on whether regions or species drive the sustainability scores...
# plot(mar_sw_sus$georgn_id, mar_sw_sus$Sust)
#  plot(as.factor(mar_sw_sus$species), mar_sw_sus$Sust)
# # 
#  mod_spp <- lm(Sust ~ as.factor(georgn_id) + species, data = mar_sw_sus)
# # anova(mod_spp)
#  summary(mod_spp)
# # 
#  mod_rgn <- lm(Sust ~ as.factor(georgn_id), data = mar_sw_sus)
# # anova(mod_rgn)
# # summary(mod_rgn)
# # 
# mod_spp1 <- lm(Sust ~ family, data = mar_sw_sus)
# # anova(mod_rgn)
#  summary(mod_spp1)
# it makes more sense to use species/taxon gapfilling methods

#unique(mar_sw_sus$avg_global_esc)

mar_sw_sus <- mar_sw_sus %>%
  group_by(rgn_id, family) %>%
  mutate(avg_rgn_fam_sust = mean(Sust, na.rm = TRUE), # produced NaN values (2022 scenario year)
         avg_rgn_fam_esc = mean(esc_score, na.rm = TRUE)) %>% # produced NaN values (2022 secario year)
  ungroup() %>%
  group_by(georgn_id, family) %>%
  mutate(avg_gr_fam_sust = mean(Sust, na.rm = TRUE), # produced some NaN values (2023 scenario year)
         avg_gr_fam_esc = mean(esc_score, na.rm = TRUE)) %>% # produced NaN values (2023 scenario year)
  ungroup() %>%
  group_by(family) %>%
  mutate(avg_fam_sust = mean(Sust, na.rm = TRUE), # produced some NaN values (2022 scenario year)
         avg_fam_esc = mean(esc_score, na.rm = TRUE)) %>% # produced NaN values (2022 scen ario year)
  ungroup() %>%
  group_by(Taxon_code) %>%
  mutate(avg_taxon_code_sust = mean(Sust, na.rm = TRUE), # produced some NaN values (2022 scecario year)
         avg_taxon_code_esc = mean(esc_score, na.rm = TRUE)) %>% # produced some NaN values (2022 scecario year)
  ungroup() %>%
  mutate(avg_global = mean(Sust, na.rm = TRUE),
         avg_global_esc = mean(esc_score, na.rm = TRUE)) %>%
  ungroup()

# code NaN as NA for the columns that produced NaN, likely due to taking the mean of a set of all NA values 
mar_sw_sus$avg_rgn_fam_sust[is.nan(mar_sw_sus$avg_rgn_fam_sust)] <- NA
mar_sw_sus$avg_rgn_fam_esc[is.nan(mar_sw_sus$avg_rgn_fam_esc)] <- NA
mar_sw_sus$avg_gr_fam_sust[is.nan(mar_sw_sus$avg_gr_fam_sust)] <- NA
mar_sw_sus$avg_gr_fam_esc[is.nan(mar_sw_sus$avg_gr_fam_esc)] <- NA
mar_sw_sus$avg_fam_sust[is.nan(mar_sw_sus$avg_fam_sust)] <- NA
mar_sw_sus$avg_fam_esc[is.nan(mar_sw_sus$avg_fam_esc)] <- NA
mar_sw_sus$avg_taxon_code_sust[is.nan(mar_sw_sus$avg_taxon_code_sust)] <- NA
mar_sw_sus$avg_taxon_code_esc[is.nan(mar_sw_sus$avg_taxon_code_esc)] <- NA

Obtain a sustainability and escapes score for each record, and a book-keeping column of whether it’s actual or gapfilled

For missing sustainability and escapes scores:

Steps:

  1. Group by country and family and summarise to get mean sustainability and escapes values (gapfilled by country/family)
  2. Group by georegion and family and summarise to get mean sustainability and escapes values (gapfilled by georegion/family)
  3. Group by family to get mean sustainability and escapes values (global family gapfilled)
  4. Gapfill by average of the taxon_code to get mean sustainability and escapes values (taxon_group gapfilled)
  5. Assign any Algae with the seaweeds score from SFW data (6.72 for sustainability, 4 for escapes)
  6. Apply the global average to any remaining sustainability and escapes score NAs
# define gapfilling scores for global seaweed and escapes (a subcomponent of the global seaweed score) found in pdf here (updated occasionally): https://www.seafoodwatch.org/recommendation/seaweed/seaweed-31568
# seaweed_gf_score <- 6.72
# escapes_gf_score <- 4 
#in 2023 this wasn't necessary because these scores were already included in the seafood watch data, and I already applied them to all seaweed species when cleaning seafood watch data. Left the code above in case it is needed in future years. 

mar_sw_sus_final = mar_sw_sus %>%
  
  #### sustainability scores ####
  mutate(Sust = ifelse(is.na(Sust), avg_rgn_fam_sust, Sust)) %>% # Gapfill with region/family level
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_rgn_fam_sust, "region_family_avg", gapfill_sus)) %>% ## add in rgn family gapfill record
  mutate(Sust = ifelse(is.na(Sust), avg_gr_fam_sust, Sust)) %>% # Gapfill with georegion/family level
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_gr_fam_sust, "georegion_family_avg", gapfill_sus)) %>% # Add in georegion/family gapfill record
  mutate(Sust = ifelse(is.na(Sust),  avg_fam_sust, Sust)) %>% # Gapfill with family level
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_fam_sust, "family_avg", gapfill_sus)) %>% ## Add in family gapfill record
  mutate(Sust = ifelse(is.na(Sust), avg_taxon_code_sust, Sust)) %>%
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_taxon_code_sust, "taxon_group_avg", gapfill_sus)) %>%
  #mutate(Sust = ifelse(Taxon_code == "AL", seaweed_gf_score, Sust)) %>% ## assign any algae with the overall SFW seaweed score, see Steps above for reference to document
  #mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == seaweed_gf_score, "seaweed_score", gapfill_sus)) %>%
  mutate(Sust = ifelse(is.na(Sust), avg_global, Sust)) %>% 
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_global, "global_avg", gapfill_sus)) %>%
  
  #### genetic escapes scores ####
  mutate(esc_score = ifelse(is.na(esc_score), avg_rgn_fam_esc, esc_score)) %>% # Gapfill with region/family level
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_rgn_fam_esc, "region_family_avg", gapfill_esc)) %>% ## add in rgn family gapfill record
  mutate(esc_score = ifelse(is.na(esc_score), avg_gr_fam_esc, esc_score)) %>% # Gapfill with georegion/family level
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_gr_fam_esc, "georegion_family_avg", gapfill_esc)) %>% # Add in georegion/family gapfill record
  mutate(esc_score = ifelse(is.na(esc_score),  avg_fam_esc, esc_score)) %>% # Gapfill with family level
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_fam_esc, "family_avg", gapfill_esc)) %>% ## Add in family gapfill record
  mutate(esc_score = ifelse(is.na(esc_score), avg_taxon_code_esc, esc_score)) %>%
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_taxon_code_esc, "taxon_group_avg", gapfill_esc)) %>%
  #mutate(esc_score = ifelse(Taxon_code == "AL", escapes_gf_score, esc_score)) %>% ## assign any algae with the SFW seaweed score for C6 Escapes, see Steps above for reference to document
  #mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == escapes_gf_score, "seaweed_score", gapfill_esc)) %>%
  mutate(esc_score = ifelse(is.na(esc_score), avg_global_esc, esc_score)) %>% 
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_global_esc, "global_avg", gapfill_esc)) %>%

  #### final tidying ####
  mutate(taxa_code = paste(species, species_code, sep="_")) %>%
  select(rgn_id, species, species_code, taxa_code, taxa_group=Taxon_code, family, year, gapfill_sus, gapfill_esc, gapfill_fao = gap_0_fill, tonnes=value, Sust, esc_score) %>%
  mutate(Sust = round(Sust/10, 2),
         esc_score = 1 - round(esc_score/10, 2))

summary(mar_sw_sus_final)
filter(mar_sw_sus_final, gapfill_sus == "georegion_family_avg")
filter(mar_sw_sus_final, gapfill_sus == "family_avg")
filter(mar_sw_sus_final, gapfill_sus == "taxon_group_avg")

Now we need to prep the escapes pressure layer a bit more. Final formatting of the escapee data. This is used as a pressure layer.

For each region/year: take a weighted average of the genetic escape probability for each taxa based on tonnes of mariculture.

# subset the cleaned data to only include escapes data
tonnes_esc <- mar_sw_sus_final %>%
  dplyr::select(rgn_id, year, species, gapfill_esc, esc_score, tonnes)

# use "tonnes" values to weight the regional average of escapes
genEscapes <- tonnes_esc %>%
  group_by(rgn_id, year) %>%
  summarize(genEscapes = weighted.mean(esc_score, tonnes, na.rm=TRUE))

Obtain gapfill information from genEscapes

Gapfill values in output file are the proportion of data that is gapfilled.

## Obtain corresponding gapfilling information for each region (average of gapfilled data, weighted by tonnes of mariculture).
genEscapes_gf <- tonnes_esc %>%
  mutate(gapfill = ifelse(gapfill_esc == "none", 0, 1)) %>%
  group_by(rgn_id, year) %>%
  summarize(genEscapes = weighted.mean(gapfill, tonnes, na.rm=TRUE)) %>%
  ungroup() %>%
  filter(year == latest_data_year) %>%
  select(rgn_id, pressures.score=genEscapes) %>%
  mutate(pressures.score=ifelse(pressures.score == "NaN", NA, pressures.score)) %>%
  data.frame()

11 Save data: Genetic Escapes

11.1 Save gapfill data

## Saves the Genetic Escapes Gapfill file
write.csv(genEscapes_gf, 'output/GenEsc_gf.csv', row.names = FALSE)

Obtain escapee data layers from genEscapes

# Create the escapee data layers:
data <- genEscapes %>%
    select(rgn_id, year, pressure_score = genEscapes)

11.2 Save escapee pressure layer

## Saves the Genetic Escapes data file 
write.csv(data, 'output/GenEsc.csv', row.names=FALSE)  

12 Save Data: Mariculture tonnes and sustainability scores

## Save mariculture harvest data
mar_harvest_tonnes = mar_sw_sus_final %>%
  select(rgn_id, taxa_code, taxa_group, family, year, tonnes)

anyDuplicated(mar_harvest_tonnes) # Check for duplication

## Saves the appropriate Mariculture Harvest Tonnes file
write.csv(mar_harvest_tonnes, 'output/mar_harvest_tonnes.csv', row.names=F)


## Save gapfill data for mariculture harvest
mar_harvest_tonnes_gf = mar_sw_sus_final %>%
  select(rgn_id, taxa_code, taxa_group, family, year, tonnes=gapfill_fao)


## Saves the appropriate Mariculture Harvest Gapfill file 
write.csv(mar_harvest_tonnes_gf, 'output/mar_harvest_tonnes_gf.csv', row.names=F)

# define the maximum sustainability score, check with: summary(mar_sw_sus_final)
max_sus <- 0.77

## Save sustainability scores data for 2023 (SFW data year)
mar_sustainability_score = mar_sw_sus_final %>%
  mutate(year = current_year) %>% #change year if data not updated to current year
  select(rgn_id, year, taxa_code, sust_coeff = Sust) %>%
  unique() %>%
  mutate(sust_coeff = ifelse(sust_coeff > max_sus, 1, sust_coeff/max_sus)) # rescale to the highest sust score

anyDuplicated(mar_sustainability_score)


## Saves the appropriate Sustainability Score file 
write.csv(mar_sustainability_score, 'output/mar_sustainability.csv', row.names=F)


## Save gapfill data for sustainability scores
mar_sustainability_score_gf = mar_sw_sus_final %>%
  select(rgn_id, taxa_code, sust_coeff = gapfill_sus) %>%
  unique()


## Saves the appropriate Sustainability Score Gapfill file 
write.csv(mar_sustainability_score_gf, 'output/mar_sustainability_gf.csv', row.names=F)


## save mar sustainability for data checking; rgn_id, species, score
mar_sustainability_score = mar_sw_sus_final %>%
  mutate(year = current_year) %>% # Only 2023 sustainability scores exist (Seafood Watch)
  select(rgn_id, year, species, sust_coeff = Sust) %>%
  unique() %>% 
    mutate(sust_coeff = ifelse(sust_coeff > max_sus, 1, sust_coeff/max_sus)) # rescale to the highest sust score

## Saves the appropriate Sustainability Score file
write.csv(mar_sustainability_score, 'int/mar_sustainability_check.csv', row.names=F)

12.1 Data check mariculture and sustainability scores

Comparing this year’s data to previous year’s data. Expect small variation from year to year. Plot to view differences.

#read old and new data for plots
mar_old_data <- read.csv(file.path("..", paste0( "v", last_year), "output/MAR_FP_data.csv")) 
mar_new_data <- read.csv("output/MAR_FP_data.csv") 


## Compare mariculture yield data for all countries: 2021 versus 2020
mar_old <- mar_old_data %>% 
  filter(year == previous_latest_data_year) %>% 
  select(rgn_id, species, old_value = value, environment)
mar_new <- mar_new_data %>% 
  filter(year == latest_data_year) %>% 
  select(rgn_id, species, value, environment)


yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","environment"))#; View(yield)

comp_plot <- ggplot() + 
  geom_point(data = yield, aes(x = old_value, y = value, text = paste("species", species, "<br>Regionid:", rgn_id))) +
  labs(x = paste("previous value",previous_latest_data_year), y = paste("Current Value", latest_data_year)) +
  geom_abline(slope = 1, intercept = 0, col = "red")


ggplotly(comp_plot)



## Compare mariculture yield data for all countries: 2020 last year to 2020 this year
mar_old <- mar_old_data %>% 
  filter(year == previous_latest_data_year) %>% 
  select(rgn_id, species, old_value = value, environment, fao)
mar_new <- mar_new_data %>% 
  filter(year == previous_latest_data_year) %>% 
  select(rgn_id, species, value, environment, fao)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species", "environment", "fao")); View(yield)


comp_plot <- ggplot() + 
  geom_point(data = yield, aes(x = old_value, y = value, text = paste("species", species, "<br>Regionid:", rgn_id))) +
  labs(x = paste("Previous value",previous_latest_data_year), y = paste("Current Value", previous_latest_data_year)) +
  geom_abline(slope = 1, intercept = 0, col = "red")


ggplotly(comp_plot)


## Compare mariculture yield data for Russia: 2019 data from 2021 assessment to 2020 data from 2022 assessment
mar_old <- mar_old_data %>% 
  filter(rgn_id == 73, year == previous_latest_data_year) %>% 
  select(rgn_id, species, fao, old_value = value, environment)
mar_new <- mar_new_data %>% 
  filter(rgn_id == 73, year == latest_data_year) %>% 
  select(rgn_id, species, fao, value, environment)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)

plot(yield$old_value, yield$value);abline(0,1,col="red")


## Compare mariculture yield data for Iceland: 2019 data from 2021 assessment to 2020 data from 2022 assessment
mar_old <- mar_old_data %>% 
  filter(rgn_id == 143, year == previous_latest_data_year) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- mar_new_data %>% 
  filter(rgn_id == 143, year == latest_data_year) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)

plot(yield$old_value, yield$value);abline(0,1,col="red")


## Compare mariculture yield data for Vietnam
mar_old <- mar_old_data %>% 
  filter(rgn_id == 207, year == previous_latest_data_year) %>% 
  select(rgn_id, species, fao, environment, old_value = value)
mar_new <- mar_new_data %>% 
  filter(rgn_id == 207, year == latest_data_year) %>% 
  select(rgn_id, species, fao, environment, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao", "environment")); #View(yield)

plot(yield$old_value, yield$value)


## Compare yield data for Belize; yield of Whiteleg shrimp was halved between 2019 and 2020 assessment years.
mar_old <- mar_old_data%>% 
  filter(rgn_id == 164, year == previous_latest_data_year) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- mar_new_data %>% 
  filter(rgn_id == 164, year == latest_data_year) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)


## Compare yield data for tanzania
mar_old <- mar_old_data %>% 
  filter(rgn_id == 202, year == previous_latest_data_year) %>% 
  select(rgn_id, species, fao, old_value = value, environment)
mar_new <- mar_new_data %>% 
  filter(rgn_id == 202, year == previous_latest_data_year) %>% 
  select(rgn_id, species, fao, value, environment)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao", "environment"))

plot(yield$old_value, yield$value);abline(0,1,col="red")


## Compare yield data for denmark; pressure score decreased
mar_old <- mar_old_data %>% 
  filter(rgn_id == 175, year == previous_latest_data_year) %>% 
  select(rgn_id, species, fao, old_value = value, environment)
mar_new <- mar_new_data %>% 
  filter(rgn_id == 175, year == latest_data_year) %>% 
  select(rgn_id, species, fao, value, environment)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao", "environment")); View(yield)

plot(yield$old_value, yield$value);abline(0,1,col="red")

## Compare yield data for Ecuador, score increased by a lot
mar_old <- mar_old_data %>% 
  filter(rgn_id == 137, year == previous_latest_data_year) %>% 
  select(rgn_id, species, fao, old_value = value)
mar_new <- mar_new_data %>% 
  filter(rgn_id == 137, year == latest_data_year) %>% 
  select(rgn_id, species, fao, value)

yield <- mar_old %>% 
  full_join(mar_new, by = c("rgn_id","species","fao")); View(yield)

plot(yield$old_value, yield$value)

# ----Sustainability scores--------------------------------------------------------------------------


## Compare new/old Mariculture sustainability scores

sust_old <- read.csv(file.path("..", paste0( "v", last_year), "output/mar_sustainability.csv")) %>% 
  select(rgn_id, taxa_code, sust_old = sust_coeff)

  sust_old$species = str_remove(sust_old$taxa_code, "\\_[^.]*$")
sust_new <- read.csv("int/mar_sustainability_check.csv") %>% 
  select(rgn_id, species, sust_new = sust_coeff)

test <- sust_new %>% 
  full_join(sust_old, by=c("rgn_id", "species")) %>%
  mutate(difference = sust_new - sust_old) %>%
  group_by(rgn_id) %>%
  summarize(mean_rgn_sus_diff = mean(difference, na.rm = TRUE))

mean_diff_sus <- ggplot(test, aes(x = rgn_id, y = mean_rgn_sus_diff)) + 
  geom_point()

ggplotly(mean_diff_sus)

12.2 Data check genetic escapes

Comparing this year’s data to previous year’s data. Expect small variation from year to year. Plot to view differences.

## Compare genetic escapes pressure score; saw large changes between 2018 and 2017 assessment years. No big changes between 2019 and 2018 assessment years. Big changes from 2019 to 2020 due to a data update. Small changes between 2020 and 2021 due to updating the FAO data, gapfilling caused this. Also fixed some name errors. Small changes between 2021 and 2022, added a few name conversions. Some changes between 2022 and 2023, fixed a handfull of errors. 
old <- read.csv("output/GenEsc.csv") %>%
  filter(year == previous_latest_data_year) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("output/GenEsc.csv") %>%
  filter(year == latest_data_year) %>%
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id"))


comp_plot <- ggplot() + 
  geom_point(data = compare, aes(x = prs_score_old, y = pressure_score, text = paste("Regionid:", rgn_id))) +
  labs(x = paste("Previous pressure",previous_latest_data_year), y = paste("Current pressre", latest_data_year)) +
  geom_abline(slope = 1, intercept = 0, col = "red")


ggplotly(comp_plot)


#compare old genetic escape pressure scores for the same year 
old <- read.csv(file.path("..", paste0( "v", last_year), "output/GenEsc.csv")) %>%
  filter(year == previous_latest_data_year) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <-  read.csv("output/GenEsc.csv") %>% 
  filter(year == previous_latest_data_year) %>%
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id"))#; View(compare)

comp_plot <- ggplot() + 
  geom_point(data = compare, aes(x = prs_score_old, y = pressure_score, text = paste("Regionid:", rgn_id))) +
  labs(x = paste("Previous pressure",previous_latest_data_year), y = paste("Current pressre", previous_latest_data_year)) +
  geom_abline(slope = 1, intercept = 0, col = "red")


ggplotly(comp_plot)



## Compare genetic escapes pressure scores for Vietnam, data was updated by fao
old <- read.csv(file.path("..", paste0( "v", last_year), "output/GenEsc.csv")) %>%
  filter(rgn_id == 207) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("output/GenEsc.csv") %>%
  filter(rgn_id == 207) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); #View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")


## Compare genetic escapes pressure scores for Denmark
old <- read.csv(file.path("..", paste0( "v", last_year), "output/GenEsc.csv")) %>%
  filter(rgn_id == 175) %>% 
  select(rgn_id, year, prs_score_old=pressure_score)

new <- read.csv("output/GenEsc.csv") %>%
  filter(rgn_id == 175) %>% 
  select(rgn_id, year, pressure_score)

compare <- full_join(old, new, by=c("rgn_id","year")); View(compare)

plot(compare$prs_score_old, compare$pressure_score);abline(0,1,col="red")

NOTE: THIS DOES NOT NEED TO BE RUN ANYMORE. Below here is archival… unless we change the reference point.

13 Potential Aquaculture Calculations for New Reference Point

\[REFERENCE RMD FILE: <https://raw.githubusercontent.com/OHI-Science/ohiprep_v2019/gh-pages/globalprep/mar/v2019/reference_point/CountryProductionEstimate.Rmd>\]

14 Summary

This analysis produces potential tonnes of aquaculture from growth potential estimates of finfish and bivalves. These aquaculture numbers will be used for the reference point for the mariculture subgoal. This only needs to be run for the 2019 global assessment, as this data is static.


15 Data Source

15.1 Growth Potential (Phi) data

Reference:
https://knb.ecoinformatics.org/view/doi:10.5063/F1CF9N69 Rebecca Gentry, Halley Froehlich, Dietmar Grimm, Peter Kareiva, Michael Parke, et al. SNAPP - Mapping the Global Potential for Marine Aquaculture. Knowledge Network for Biocomplexity. doi:10.5063/F1CF9N69.

Downloaded: 7/3/2019

Description: Growth Potential estimate raster for global cells


16 Methods

knitr::opts_chunk$set(eval=FALSE)
#######regression inputs from VBGF_Fish.r

# This is a heavily modified script from KNB:
# https://knb.ecoinformatics.org/view/doi:10.5063/F1CF9N69

# From this paper:
# https://www.nature.org/content/dam/tnc/nature/en/documents/Mapping_the_global_potential_for_marine_aquaculture.pdf


### libraries useful for data wrangling
library(dplyr)
library(tidyr)
library(tidyverse)

## libraries useful for spatial data
library(raster)       
library(rgdal)        
library(sf)         
library(fasterize)

## data visualization
library(RColorBrewer)
library(ggplot2)
library(rasterVis)    
library(maps)

## path management
library(here)


## some OHI files
source('http://ohi-science.org/ohiprep_v2019/workflow/R/common.R')
dir_git <- file.path(here(), "globalprep/mar/v2019")

16.1 Establish variables and coefficients

## Variables needed to get from PHI to tonnes of production
## MRF: assumes 1 farm is 1km2  ##
# (seems easier to just units of km2, rather than farm)

F_estCoef <- c(7.6792, (-5.8198)) #from regression estimated in VBGF_Fish_Final.r
B_estCoef <- c(2.9959,(-1.6659)) #from regression estimate in VBGF_Bivalves.r
density <- 20 #juveniles per m3
cagesize <- 9000 #m3
cagesperfarm <- 24 #located atleast 1 km apart.. cagesperkm2
bivperfarm <- 130000000 #(100 longlines/km2) * 4 km * (100 bivalves seeded/0.0003 km) = 133333333 bivalve/km2
weight35cm <- 554.8  ## in grams see VBGF_Fish_Final. Paper reports 548 grams 

16.2 Finfish Production

## Global tiff file of PHI (Growth Potential) estimates
FishPhiALLConstraints <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishPhiALLConstraints95LT2.tiff"))
plot(FishPhiALLConstraints)
FishPhiVector=getValues(FishPhiALLConstraints)


## Convert Phi raster to number of years it takes to grow a 35 cm fish

LogFishYears <- calc(FishPhiALLConstraints, fun=function(x){F_estCoef[1]+F_estCoef[2]*log(x)})
LogFishYears
plot(LogFishYears)

FishYears <- calc(LogFishYears, fun=function(x){exp(x)})

FishYears
plot(FishYears)
writeRaster(FishYears,file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishYearsbyCell.tif"), overwrite=TRUE)

FishYears <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishYearsbyCell.tif"))
FishYearsVector=getValues(FishYears)



#OHI 2018 regions (original analysis used older regions file)

## call spatial file from sourced file
regions_shape()
OHIcountries <- regions %>%
  filter(type_w_ant == "eez")
OHIcountries <- st_transform(OHIcountries, crs(FishPhiALLConstraints))
OHIcountries_raster <- fasterize(OHIcountries, FishPhiALLConstraints, field="rgn_id")  
CountryVector=getValues(OHIcountries_raster)




### area of each cell (each cell is different given lat/long coordinate reference system)
areaPerCell <- area(FishPhiALLConstraints, weights=FALSE, na.rm=TRUE)
areaPerCellVector <- getValues(areaPerCell)

### Make a dataframe with raster values that includes cells: Country, area, Phi, and Years to Harvest
productionDF <- data.frame(CellID = 1:933120000,
                           Country = CountryVector,
                           AreaKM2 = areaPerCellVector,
                           PhiPrime = FishPhiVector, 
                           YearsToHarvest = FishYearsVector)

head(productionDF)

summary(FishYearsVector)
summary(areaPerCellVector) ##they seem to match



## calculate production for each cell
productionDFFishCells <- productionDF %>%
  filter(!is.na(YearsToHarvest)) %>%
  mutate(F_yieldperfarmMT = (weight35cm * density * cagesize * cagesperfarm)/1000000) %>%  # MRF: units yieldperkm2? 554.8 grams * 20 juv/m3 * 9000 m3 * 24 cages/km2 = grams/km2
  mutate(F_yieldpercellperyear = (F_yieldperfarmMT/YearsToHarvest) * AreaKM2) %>%
  arrange(YearsToHarvest) %>%
  mutate(YieldCumSum = cumsum(F_yieldpercellperyear)) %>%
  mutate(AreaCumSum = cumsum(AreaKM2))

write.csv(productionDFFishCells, file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/productionDFFishCells.csv")) #save to mazu because so large. This is functionally a raster file. 

productionDFFishCells <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/productionDFFishCells.csv"))

head(productionDFFishCells)
summary(productionDFFishCells)
str(productionDFFishCells)
##cumsum area is 11,402,629 km2 -  # matches the paper 
##cumprod is 15,950,000,000MT -   # matches the paper 

### how many of these cells are not in a country?
sum(is.na(productionDFFishCells$Country))
dim(productionDFFishCells)

## MRF: with new OHI regions: 544,569 are not in a country..probably a lot are in conflicted areas


## Calculate production if 1% of top production area is used:
productionByCountryFishDF <- productionDFFishCells %>%
  filter(!is.na(Country)) %>%
  dplyr::select(-YieldCumSum, -AreaCumSum, -X) %>%
  arrange(YearsToHarvest) %>%
  mutate(ID = Country) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  mutate(CountryYieldCumSum = cumsum(F_yieldpercellperyear)) %>%
  mutate(CountryAreaCumSum = cumsum(AreaKM2)) %>%
  mutate(MaxProdPerCountry = max(CountryYieldCumSum)) %>% 
  mutate(OnePercentDevPerCountry = .01*max(CountryAreaCumSum)) #calculating 1 percent of area

write.csv(productionByCountryFishDF,file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/FishProdByCountryByCell.csv"), row.names = FALSE) #save to mazu because so large. This is functionally a raster file. 

productionByCountryFishDF <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/FishProdByCountryByCell.csv"))

## MRF: For each area identify amount of area that corresponds to 1% of production area, 
## MRF: assume maximum production within the country for the 1% of area
CountryProdSummary <- productionByCountryFishDF %>%
  dplyr::arrange(YearsToHarvest) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  filter(CountryAreaCumSum <= OnePercentDevPerCountry) %>%
  mutate(ProdPerCountryOnePercent = max(CountryYieldCumSum)) %>%
  slice(1)


write.csv(CountryProdSummary, file.path(dir_git, "/int/FishProdByCountrySummary.csv"), row.names = FALSE) #save to github

CountryProdSummary <- read.csv(file.path(dir_git, "int/FishProdByCountrySummary.csv"))

# MRF: get fasted YearsToHarvest for each country
CountryProdSummaryNop <- productionByCountryFishDF %>%
  dplyr::arrange(YearsToHarvest) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  slice(1)



# Add country names

region_data()
CountryLabel <- rgns_eez %>%
  dplyr::select(ID = rgn_id, rgn_name)


## Final data
## I think the relevant value we want for the reference point is in this table:  ProdPerCountryOnePercent. Indonesia number for fish matches up with paper. > 24 million tonnes of fish if 1% of aquaculture potential developed. 
CountryProdSummaryFAO <- CountryProdSummary %>%
  ungroup %>%
  dplyr::select(ID:ProdPerCountryOnePercent) %>%
  full_join(CountryLabel, by= "ID")

write.csv(CountryProdSummaryFAO, file.path(dir_git, "/int/FishProdByCountrySummaryFAO.csv"), row.names = FALSE) #save to github

sum(CountryProdSummaryFAO$MaxProdPerCountry, na.rm = TRUE)
#15451023277 number matches paper. > 15 billion tonnes 

16.3 Bivalve Production

##Now for Bivalves 
BivalvePhiALLConstraints=raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalvePhiALLConstraints95LT1.tif"))

plot(BivalvePhiALLConstraints)
BivalvePhiVector <- getValues(BivalvePhiALLConstraints)


#OHI 2018 regions (original analysis used older regions file)

## call spatial file from sourced file
regions_shape()

OHIcountries <- regions %>%
  filter(type_w_ant == "eez")

OHIcountries <- st_transform(OHIcountries, crs(FishPhiALLConstraints))

OHIcountries_raster <- fasterize(OHIcountries, FishPhiALLConstraints, field="rgn_id")  

CountryVector <- getValues(OHIcountries_raster)

#make the value of each cell the years it takes to grow a 4 cm bivlave
LogBivalveYears <- calc(BivalvePhiALLConstraints, fun = function(x){B_estCoef[1] + B_estCoef[2]*(x)})
LogBivalveYears
plot(LogBivalveYears)

BivalveYears=calc(LogBivalveYears,fun=function(x){exp(x)})

BivalveYears
plot(BivalveYears)

writeRaster(BivalveYears,file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalveYearsByCell.tif"), overwrite=TRUE)

BivalveYears <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalveYearsByCell.tif"))

BivalveYearsVector <- getValues(BivalveYears)


###now load in area values for each cell
#areaPerCell=raster("Spatial_Data/MiddleFiles/AreaBivalveLT1.grd")
areaPerCell <- area(BivalvePhiALLConstraints, weights=FALSE, na.rm=TRUE)
areaPerCell

areaPerCellVector=getValues(areaPerCell)


productionDFBiv <- data.frame(CellID=1:933120000,
                           Country=CountryVector,
                           AreaKM2=areaPerCellVector,
                           PhiPrime=BivalvePhiVector, 
                           YearsToHarvest=BivalveYearsVector)

head(productionDFBiv)

productionDFBivCells <- productionDFBiv %>%
  filter(!is.na(YearsToHarvest)) %>%
  mutate(B_yieldperfarmInd = bivperfarm) %>%
  mutate(B_yieldpercellperyear = (B_yieldperfarmInd / YearsToHarvest) * AreaKM2) %>%
  arrange(YearsToHarvest) %>%
  mutate(YieldCumSum = cumsum(B_yieldpercellperyear)) %>%
  mutate(AreaCumSum = cumsum(AreaKM2))

head(productionDFBivCells)
summary(productionDFBivCells)
str(productionDFBivCells)

write.csv(productionDFBivCells, file.path(dir_M, "/git-annex/globalprep/mar/v2019/int/productionDFBivCells.csv")) #save to mazu because so large. This is functionally a raster file. 

productionDFBivCells <- read.csv(file.path(dir_M, "/git-annex/globalprep/mar/v2019/int/bivalve/productionDFBivCells.csv"))

productionByCountryBivDF <- productionDFBivCells %>%
  filter(!is.na(Country)) %>%
  dplyr::select(-YieldCumSum, -X) %>%
  dplyr::select(-AreaCumSum) %>%
  arrange(YearsToHarvest) %>%
  mutate(ID = Country) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  mutate(CountryYieldCumSum = cumsum(B_yieldpercellperyear)) %>%
  mutate(CountryAreaCumSum = cumsum(AreaKM2)) %>%
  mutate(MaxPhi = max(PhiPrime)) %>%
  mutate(averagePhi = mean(PhiPrime)) %>%
  mutate(averageWeightedPhi = sum(PhiPrime*AreaKM2)/(max(CountryAreaCumSum))) %>%
  mutate(MaxDevPerCountry = max(CountryAreaCumSum)) %>%
  mutate(MaxProdPerCountry = max(CountryYieldCumSum)) %>%
  mutate(OnePercentDevPerCountry = .01*max(CountryAreaCumSum))

head(productionByCountryBivDF)

write.csv(productionByCountryBivDF,file.path(dir_M, "git-annex/globalprep/mar/v2019/int/bivalve/BivProdByCountryByCell.csv"), row.names = FALSE) #save to mazu because so large. This is functionally a raster file. 

productionByCountryBivDF <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/bivalve/BivProdByCountryByCell.csv")) 


CountryProdSummary <- productionByCountryBivDF %>%
  filter(CountryAreaCumSum <= OnePercentDevPerCountry) %>%
  mutate(ProdPerCountryOnePercent = max(CountryYieldCumSum)) %>%
  slice(1)

write.csv(CountryProdSummary,file.path(dir_git, "int/BivalveProdByCountrySummary.csv"), row.names = FALSE) #save to github


region_data()
CountryLabel <- rgns_eez %>%
  dplyr::select(ID = rgn_id, rgn_name)

head(CountryLabel)

## I think the relevant value we want for the reference point is in this table:  ProdPerCountryOnePercent. Indonesia number for bivalves is higher than the paper, however the paper does use the phrase "over 3.9*10^11 tonnes". Data says 4.7 * 10^11 million tonnes of bivalves if 1% of aquaculture potential developed, which is greater than 3.9 * 10^11 tonnes.  
CountryProdSummaryFAO <- CountryProdSummary %>%
  ungroup %>%
  dplyr::select(ID:ProdPerCountryOnePercent) %>%
  full_join(CountryLabel, by = "ID")

write.csv(CountryProdSummaryFAO, file.path(dir_git, "int/BivalveProdByCountrySummaryFAO.csv"), row.names = FALSE) #save to github 

sum(CountryProdSummaryFAO$MaxDevPerCountry, na.rm = TRUE)
#1491404 km2. Matches paper ~1,500,000 km2

16.4 Unit Conversion and Gap Filling

To compare potential vs harvest, we need to convert bivalve units to metric tonnes, they are currently in units of individual bivalves. Some figures:

Averaging these gives about 27.5g per piece.

aq_mass_per_pc <- 0.0275 * 1e-3 ### mass of bivalve piece in tonnes

pot_b <- read_csv(file.path(dir_git, 'int/BivalveProdByCountrySummaryFAO.csv')) %>%
  mutate(potential_prod_one_percent_b = ProdPerCountryOnePercent * aq_mass_per_pc,
         potential_prod_max_b = MaxProdPerCountry * aq_mass_per_pc,
         aq_type = 'shellfish',
         units   = 'tonnes') %>%
  dplyr::select("rgn_id" = "ID", rgn_name, potential_prod_one_percent_b, potential_prod_max_b)

pot_f <- read_csv(file.path(dir_git, 'int/FishProdByCountrySummaryFAO.csv')) %>%
  mutate(aq_type = 'finfish',
         units   = 'tonnes') %>%
  dplyr::select("rgn_id" = "ID", rgn_name, "potential_prod_one_percent_f" = "ProdPerCountryOnePercent", "potential_prod_max_f" = "MaxProdPerCountry")

pot_aq_int <- full_join(pot_b, pot_f) ## makes sense. 1% of potential AREA... not 1% of potential PRODUCTION

write_csv(pot_aq, file.path(dir_git, 'int/aq_potential_int.csv')) ##intermediate file that might come in handy in the future

DT::datatable(pot_aq_int)


pot_aq_final <- pot_aq_int %>%
  mutate(potential_prod_one_percent_b = replace_na(potential_prod_one_percent_b, 0),
         potential_prod_one_percent_f = replace_na(potential_prod_one_percent_f, 0)) %>%
  mutate(potential_mar_tonnes = potential_prod_one_percent_b + potential_prod_one_percent_f) %>%
  arrange(rgn_id) %>%
  dplyr::select(rgn_id, potential_mar_tonnes) %>%
  mutate(year = 2019)

write_csv(pot_aq_final, file.path(dir_git, 'output/aq_potential_final.csv'))

DT::datatable(pot_aq_final)

## make gapfilling dataset 
pot_aq_final_gf <- pot_aq_final %>%
  mutate(gapfilled = case_when(
    potential_mar_tonnes == 0 ~ 1,
    potential_mar_tonnes > 0 ~ 0
  ), 
  method = case_when(
    potential_mar_tonnes == 0 ~ "missing regions given 0 value",
    potential_mar_tonnes > 0 ~ ""
  )) %>%
  dplyr::select(rgn_id, gapfilled, method, year)

write_csv(pot_aq_final_gf, file.path(dir_git, "output/aq_potential_gf.csv"))