ohi logo
OHI Science | Citation policy

Summary

This analysis converts FAO mariculture and seafood watch sustainability data into data used to calculate the OHI global mariculture status score. This also calculates the genetic escapee from mariculture pressure data.

2024 updates from 2023 assessment

  • fixed issues relating to system-dependent file paths by using {here} package

  • made some file names more reproducible with updated entries in setup chunks

  • updated formatting of code and comments using in-line/chunk comment headers

  • updated from read.csv and write.csv to {readr}’s read_csv and write_csv

  • renamed some variables so that iterations of data processing steps weren’t overwriting the same variable – allows for comparison between different stages of dataprep (with a few exceptions mentioned below)

  • generally rewrote data cleaning to follow tidyverse style more closely and be more reproducible & less prone to human error with the exception of some object names that were rewritten to save space (consider updating this in future years)

  • updated mar_split() function in mar_fxs.R to use pivot_longer() instead of gather(), and added the new grouped name of “Bonaire, Sint Eustatius and Saba” (make sure to source the current year’s mar_fxs.R)

  • fixed an issue in mar_fxs.R where updated data was not being properly bound to original data

  • updated README of FishStatJ data download to be more explicit and functional with updated program; created README with detailed instructions on how to download data from FAO Data Query Portal if next year decides to pivot to this method instead (will need to rewrite data cleaning code, refer to workflow/R/fao_query_data_tidy.R for guidance and examples of how we approached this in v2024)

  • added notes on what “escapes” are – Criterion 6 in Seafood Watch sustainability scoring theory guide https://www.seafoodwatch.org/globalassets/sfw/pdf/standards/aquaculture/seafood-watch-aquaculture-standard-version-a4.pdf


Data Source

Production data

Reference:
https://www.fao.org/fishery/en/collection/aquaculture?lang=en Release date: March 2023 FAO Global aquaculture production Quantity (1950 - 2022) Global aquaculture production (Value) (1950-2022)

Downloaded: July 26th, 2024. Last updated: 2024-03-29

Instructions on downloading the FAO aquaculture data:

Navigate to this link: https://www.fao.org/fishery/statistics-query/en/home.

Click browse on Global aquaculture production. Click browse on Global aquaculture production (Quantity). Scroll to the bottom of the dimensions section, and de-select all of the pre-selected years. Under the query definition section, drag Country Name En (if not already present), ASFIS species Name En, FAO major fishing area Name En, ASFIS species Family scientific name, and Environment Name En, from the possible fields column into the selected rows column. Click on show data and confirm that data is present for 1950- two years prior to current year. Click download and select yes to include Null Values. Rename the file in the format FAO_GlobalAquacultureProduction_Quantity_1950_last_data_year. Upload the file to the appropriate year folder in the FAO_mariculture folder on mazu.

Description: Quantity (tonnes) of mariculture for each country, species, year.

Time range: 1950-2022

Seafood Watch sustainability data

Obtained upon request of SFW.

Reference:
Seafood Watch Complete Recommendation documentation
Seafood Watch Scoring Theory
Seafood Watch Global Seaweed Assessment (provides gapfilling constants)

Downloaded: May 31, 2023

Description: Monterey Bay Aquarium Seafood Watch aquaculture recommendations. Sustainability scored from 0-10. Rescaled to 0-1 using the max score in the data.


Methods

Setup

knitr::opts_chunk$set(eval = FALSE)
  
# ---- Load packages ----
if (!require(librarian)){
  install.packages("librarian")
  library(librarian)
}
librarian::shelf(
  ohicore,  #devtools::install_github('ohi-science/ohicore@dev')
  dplyr,
  tidyverse, # loads readr, stringr, tidyr, etc.
  tictoc,
  knitr,
  kableExtra,
  here,
  plotly
)

#library(taxize) had issues installing taxize because its dependency phanghorn is not compatible with this version of R


# ---- Set directories & source functions ----
current_year <- 2024 # update this!
version_year <- paste0("v", current_year)
current_mar_dir <- here::here("globalprep", "mar", version_year)

# ---- Load FAO-specific user-defined functions ----
# functions specific to mariculture dealing with compound countries, updated in 2024
source(here::here("globalprep", "mar", version_year, "mar_fxs.R"))
# functions for cleaning fao data
source(here::here("workflow", "R", "fao_fxn.R")) 
#source(here::here("workflow", "R", "fao_online_portal_clean.R")) # fao cleaning function # switched to new version in 2024
source(here::here("workflow", "R", "fao_query_data_tidy.R")) # 2024 version of FAO data tidying function 
source(here::here("workflow", "R", "common.R")) # directory locations, OHI essentials
## This file makes it easier to process data for the OHI global assessment
##  by creating the following objects:
## 
##  * dir_M = identifies correct file path to Mazu (internal server) based on your operating system
##  * mollCRS = the crs code for the mollweide coordinate reference system we use in the global assessment
##  * regions_shape() = function to load global shapefile for land/eez/high seas/antarctica regions
##  * ohi_rasters() = function to load two rasters: global eez regions and ocean region
##  * region_data() = function to load 2 dataframes describing global regions 
##  * rgn_syns() = function to load dataframe of region synonyms (used to convert country names to OHI regions)
##  * low_pop() = function to load dataframe of regions with low and no human population
##  * UNgeorgn = function to load dataframe of UN geopolitical designations used to gapfill missing data
# general raw data directory in Mazu
raw_data_dir <- here::here(dir_M, "git-annex", "globalprep", "_raw_data")
# FAO data directory in Mazu
FAO_mariculture_dir <- here::here(raw_data_dir, "FAO_mariculture")
# Seafood Watch data directory in Mazu
seafood_watch_dir <- here::here(raw_data_dir, "seafood_watch_mar_sustainability")

Import Raw Data: FAO Mariculture data

Mariculture production in tonnes.

## UPDATE THESE!
current_year <- current_year # update in previous chunk
last_year <- 2023 #last year of analysis

#last year of data available
#update if not two years behind the current assessment year
latest_data_year <- current_year - 2 #most up to date year in the data aquaculture data
previous_latest_data_year <- last_year - 2 #last year of data in the last assessment, for comparison plots

current_data_folder <- paste0("d", current_year) # assessment year for file paths
previous_data_folder <- paste0("d", last_year)  # previous assessment year

#update the data file name to the current last year of data 
mar_raw <- readr::read_csv(here::here(FAO_mariculture_dir, current_data_folder,
                          paste0("FAO_GlobalAquacultureProduction_Quantity_1950_", latest_data_year, ".csv")))

head(mar_raw)

Wrangle:

Tidy mariculture data

Filter freshwater mariculture, make long format, and clean FAO codes.

# source 2024 FAO query data tidying function here if you skipped it in the setup chunk
# source(here::here("workflow", "R", "fao_query_data_tidy.R"))

# ---- preliminary tidying of FAO data ----
mar_tidy <- fao_query_data_tidy(fao = mar_raw, # FAO data frame
                                initial_data_year = 1950, # first year of data in mar df
                                last_data_year = latest_data_year, # most recent year of data in df
                                sub_N = 0.1 # what flagged values are substituted with (same as default value in function)
                                )

#names(mar_tidy) # run to easily copy+paste column names

#table(mar_tidy$unit_name)

# ---- clean tidied FAO data ----
mar_clean <- mar_tidy %>% 
  dplyr::select(-c(unit_name)) %>% # all are "Tonnes - live weight"
  dplyr::rename(
    country = country_name_en,
    FAO_name = asfis_species_name_en,
    fao = fao_major_fishing_area_name_en,
    environment = environment_name_en,
    family_scientific = family_scientific_name
  ) %>% 
  dplyr::mutate(FAO_name = ifelse(!is.na(FAO_name), FAO_name, # if species name is not NA, keep species name
                                  family_scientific)) #if species name is NA, use the family name


# investigate environment types
table(mar_clean$environment)

# filter out Freshwater, only keep Brackishwater and Marine
mar_filter <- mar_clean %>% 
  dplyr::filter(environment %in% c("Brackishwater", "Marine"))

# check
table(mar_filter$environment)

# rename cleaned, tidied, and filtered data to "mar"
mar <- mar_filter

Update species names

Update species name in the raw/species_list.csv file with names in the mar dataset (exclude non-food species).

Instructions for updating the species list:

More information can be found in issues #8, #81, #241, #246 for reference.

The species_list.csv is a master list of all the mariculture species in the FAO data. Every year there are potential changes to the FAO data, additional species may be added or names can change, etc. When a species is in the FAO data, but not the species_list.csv it will need to be added to the species_list.csv. To do this, do a quick search within the species_list.csv to make sure it wasn’t some small change to the name (e.g. “long-back eel” vs. “longback eel”). Otherwise, do some research to see if any of the synonyms for the FAO species match the names in species_list.csv. If it is one of these scenarios, the information can be copied to a new row with the name modification. If it seems to be a new taxa, You need to add the species output from this (new_spp) to species_list.csv and fill out the corresponding information. Add the new taxa name, Taxon_code (categories defined in the README in the /raw folder), and family (general family from Google) to the species_list.csv. Do some research to make sure the species is being used for food and is marine/brackish. If it is not used for food or is freshwater then it is excluded from the analysis (i.e., an 1 in the exclude column). I.e. add FAO_name, exclude (whether to include in assessment or not. If they are only harvested for medicinal purposes, do not include, or not eaten by humans do not include). Because seaweeds are generally used for both human and non-human consumption, we quantify whether to exclude or not for seaweed species as proportions. For example, if a seaweed species is 80% used for non-human consumption, we would exclude 80% of the harvest (exclude = 0.8). However, for fish species, we only classify as 0 or 1.

Some helpful resources for researching new species:

FAO SPECIES IDENTIFICATION GUIDE FOR FISHERY PURPOSES – THE LIVING MARINE RESOURCES OF THE WESTERN CENTRAL PACIFIC;

SeaLifeBase search tool

## Commented out 'read.csv' lines are different versions of the dataset, representing the three different seaweed exclusion methods (exclude some seaweed, all seaweed, all seaweed nei (nei = not elsewhere identified)). Use at the end for data checking. The first one is the one which is saved to the "output" folder, and the main analysis to run. 
# Read in species list
mar_sp <- readr::read_csv(here::here(current_mar_dir, "raw", "species_list.csv")) %>%
  dplyr::select(FAO_name, exclude, Taxon_code, family, notes, notes_2)

# to compare to previous year's csv (in v2024)
# old_mar_sp <- readr::read_csv(here::here("globalprep/mar/v2023/raw/species_list.csv")) %>%
#   dplyr::select(FAO_name, exclude, Taxon_code, family, notes, notes_2)
# old_test_spp <- setdiff(mar$FAO_name, old_mar_sp$FAO_name)
# old_test_spp

# find differences between species names in new FAO data vs. species list
new_spp <- setdiff(mar$FAO_name, mar_sp$FAO_name)

new_spp # Check: if dim has 0 rows it means all match
## If there is a list of species, hand check species_list.csv to see whether to keep (exclude seaweeds and species harvested for ornamental/medicinal). See instructions above.

### NOTE: if the list of species to add is huge, and it is unmanageable to do by hand, see the mar_dataprep.Rmd in the v2020 folder, and it will have code to use Taxize. Otherwise just do it by hand, because taxize is a pain, and most of them won't be found in taxize anyways... 

# Name fixing (v2024)
mar <- mar %>% 
  mutate(FAO_name = case_when(
    FAO_name %in% c("Mexican spiny loster") ~ "Mexican spiny lobster",
    .default = FAO_name
  ))

# run the new_spp code again to check if this update has been successfully implemented
#new_spp <- setdiff(mar$FAO_name, mar_sp$FAO_name)
#new_spp
#character(0) # good!

# Remove species not relevant to mariculture goal (i.e., non-food species)
mar_sp_join <- mar %>% dplyr::left_join(mar_sp, by = "FAO_name")

# Create intermediate table, perform data wrangling and prep for analysis
mar_join_int <- mar_sp_join %>%
  dplyr::filter(exclude < 1) %>% # filter out species that should be excluded from the MAR subgoal
  dplyr::mutate(include = 1 - exclude) %>% # create an "include" column (1-exclude) 
  dplyr::rename(species = FAO_name) %>%  # rename FAO_name to species
  dplyr::mutate(value_include = value * include) # multiply value by proportion of harvest to include

## Check to see if 85% of seaweed goes to human consumption... filter for seaweeds, sum the tonnes included, divide by total tonnes seaweed
# seaweed_check2 <- mar_join_int %>%
#   filter(!is.na(value)) %>%
#   filter(Taxon_code == "AL") %>%
#   group_by(year) %>%
#   summarise(include_total = sum(value_include),
#             overall_total = sum(value)) %>%
#   mutate(percent = include_total/overall_total)
# 
#  mean(seaweed_check2$percent)
#  ## v2023: 81% .... matches pretty darn well.
#  # v2024: 80.1% 

# ---- Eliminate country-species data with zero production across the entire time-series (1950-recent) ----
mar_value_filter <- mar_join_int %>%
  dplyr::filter(!is.na(value_include)) %>% # remove NA values 
  dplyr::select(country, fao, environment, species, year, # select relevant columns
                Taxon_code, family, value = value_include) %>% 
  dplyr::group_by(country, species) %>% # for each country-species combination
  dplyr::mutate(total_value = sum(value)) %>% # create total_value column
  dplyr::filter(total_value > 0) %>% # omit data with zero production
  dplyr::select(-total_value) %>% # drop temporary column
  dplyr::ungroup() # remove grouping

Convert country names to OHI regions

# ---- update names before using name_to_rgn ----
mar_name_split <- mar_value_filter %>%
  mar_split()  # function in mar_fxs.R 
# Divides mariculture from countries that we report as separate regions (assume equal production in all regions)
  # Netherlands Antilles: Conch restoration among Aruba, Bonaire, Curacao
  # Channel Islands: Jersey and Guernsey
  # Bonaire/S.Eustatius/Saba
  # Yugoslavia SFR: no longer a country after 1992, no values 
  # v2024: added new formatting of "Bonaire, Sint Eustatius and Saba"

# note: this function was updated in 2024 to use pivot_longer instead of gather, and the new grouped name of "Bonaire, Sint Eustatius and Saba" was accounted for

# ---- Run name_2_rgn ----
mar_rgn <- name_2_rgn(df_in = mar_name_split, 
                       fld_name ='country', 
                       flds_unique = c('species', 'fao', 'environment', 'Taxon_code', 'year')) 

# ---- Addressing duplicates, warnings ----
# v2024 notes:
## deletes Yugoslavia SFR -- this is okay because after 1991 (starting in 1992), the dataset differentiates between the countries that used to make up Yugoslavia (like Croatia, Serbia and Montenegro (Serbia is landlocked and we'll need to deal with the duplicates from this group), Bosnia and Herzegovina, Slovenia) starting in 1992.

# initially deleted bonaire sint eustatius and saba, but once the mar_split() function in mar_fxs.R was updated (and re-sourced), this problem was resolved.

## Serbia and Montenegro: separate data for Montenegro beginning in 2006; Serbia is landlocked, so these values only apply to Montenegro (data is from FAO major fishing area classifications: Mediterranean and Black Sea)

## China & Hong Kong: sum Hong Kong values within China OHI region classification

## Guadeloupe and Martinique: have data for each separately -- need to aggregate (sum) values for OHI grouped region

## Northern Mariana Islands and Guam: have data for Guam for full range, started having data for Northern Mariana Islands in 2007. Can sum both for OHI grouped region. 

## United Republic of Tanzania + United Republic of Tanzania, Zanzibar: According to Wikipedia, "Zanzibar is an insular semi-autonomous region which united with Tanganyika in 1964 to form the United Republic of Tanzania." 
### in past years, these values have been summed under the OHI region "Tanzania"


# ---- Aggregation: sum values of regions with multiple subregions ----
mar_rgn_agg <- mar_rgn %>%
  dplyr::group_by(fao, environment, species, year, Taxon_code, rgn_id, rgn_name, family) %>% 
  dplyr::summarize(value = sum(value)) %>%
  dplyr::ungroup()

Take a look at the tidied data for a single year and region

test <- data.frame(
  dplyr::filter(mar_rgn_agg, rgn_id == 207) %>% # Vietnam
  dplyr::filter(year == 2020) %>%
  arrange(species))
#View(test)

Gapfilling

Fill in missing years after first year of harvest data with 0 values

For example: Production of blue shrimp in Maine starts in 1983 – don’t include years before that.

Checked to make sure that there weren’t instances in which it made more sense to carry the previous year’s data forward as a method of gapfilling. This didn’t seem to be the case.

# # v2024 method [scrapped]
# tic()
# 
# # update dataframe -- fill out "years" column for all countries -- fill with NAs if no data for value
# # mar_rgn_agg with years (yrs) filled
# mar_rgn_yrs <- mar_rgn_agg %>% 
#   dplyr::mutate(year = as.numeric(year),
#          rgn_id = as.character(rgn_id)) %>% 
#   dplyr::group_by(rgn_id, rgn_name, species, family, Taxon_code, fao, environment) %>% 
#   tidyr::complete(
#     year = min(mar_rgn_agg$year):max(mar_rgn_agg$year),
#     fill = list(value = NA)) %>% 
#   dplyr::mutate(rgn_id = as.numeric(rgn_id)) %>% 
#   dplyr::arrange(rgn_id, species, year, Taxon_code, family, fao, environment) %>% 
#   dplyr::ungroup()
# 
# toc()
# # this may take a minute to run # 25.81 s
# ---- fill "years" column for gapfilling ----
# Spread mar_rgn to create a value for every year-species-region (if missing it will be given a NA)
mar_rgn_spread <- tidyr::pivot_wider(mar_rgn_agg,
                                     names_from = year,
                                     values_from = value)
dim(mar_rgn_spread)

# Turn data frame back into long format
mar_rgn_yrs_filled <- tidyr::pivot_longer(mar_rgn_spread,
                           cols = num_range("",paste0(1950:latest_data_year)),
                           names_to = "year",
                           values_to = "value") %>% 
  arrange(rgn_id, species, year, Taxon_code, family, fao, environment)


## NA values are converted to zero.
mar_rgn_gf <- mar_rgn_yrs_filled %>%
  mutate(year = as.numeric(year)) %>%
  mutate(value_w_0 = ifelse(is.na(value), 0, value)) %>%
  group_by(fao, environment, species, Taxon_code, family, rgn_id) %>% 
  mutate(cum_value = cumsum(value_w_0)) %>%
  ungroup() %>%
  filter(cum_value > 0) %>% # Eliminates years before mariculture began
  mutate(gap_0_fill = ifelse(is.na(value), "NA_to_zero", "0")) %>% # Record gapfill
  mutate(value = ifelse(is.na(value), 0, value)) %>% # Finally, convert all NAs in original column to 0
  dplyr::select(-cum_value, -value_w_0)

Check how may NA values were converted to 0

table(mar_rgn_gf$gap_0_fill)
## 3790 of these out of 27810+3790 cases had NA values converted to 0 - 2018 assessment
## 3763 of these out of 28694+4291 cases had NA values converted to 0 - 2019 assessment
## 5932 of these out of 29305 +5932 cases had NA values converted to 0 - 2020 assessment
## 6441 of these out of 30170 + 6441 cases had NA values converted to 0 - 2021 assessment
## 7037 of these out of 30975 + 7037 cases had NA values converted to 0 - 2022 assessment
##7367 of these out of 34187 + 7367 cases had NA values converted to 0 -2023 assessment
## 8201 of these out of 35029 + 8201 (43230) cases had NA values converted to 0 = 0.1897 ≈ 19% of values - 2024 assessment 

Remove family-region-environment time series with less than four years of mariculture production > 0 tonnes (assume these are not established mariculture programs).

mar_rgn_gf_filter <- mar_rgn_gf %>% 
  group_by(rgn_id, family, fao, environment) %>%
  mutate (not_0 = length(value[value > 0])) %>% # Length of vector of years greater than 0
  filter(not_0 > 3) %>% # Filter for groups that have at least four years of mariculture production 
  ungroup() %>% 
  dplyr::select(rgn_id, species, fao, environment, year, value, Taxon_code, family, gap_0_fill) 

maric <- mar_rgn_gf_filter

Save mariculture file

Used to estimate total mariculture yield per country.

Saves the Mariculture-FP file

readr::write_csv(maric, file = here::here(current_mar_dir, "output", "MAR_FP_data.csv"))

Sustainability Scores from Seafood Watch Data

Import data: Seafood Watch sustainability scores

These data describe the sustainability country/species combinations. In cases where these data were not available for a specific county/species, we averaged the data across taxonomic groups to gapfill the missing data. We also calculate the genetic escapes pressure layer from this, based on the column AqCriteria6.

# ---- Load in Seafood Watch sustainability scores data from Mazu: ----
# Read in new Seafood Watch data 
# sw_sus <- read.csv(file.path(seafood_watch_dir,
#                              #current_data_folder,
#                              previous_data_folder, # had to use old data from v2023 in v2024
#                              "SFW_Aquaculture_ratings_053123.csv"), # update with name of new data file
#                    check.names = FALSE, stringsAsFactors = FALSE, na.strings = c("NA", ""))

sw_sus_raw <- readr::read_csv(file.path(seafood_watch_dir,
                             #current_data_folder,
                             previous_data_folder, # had to use old data in v2024
                             "SFW_Aquaculture_ratings_053123.csv"), # update with name of new data file
                   na = c("NA", ""))

head(sw_sus_raw)

# Read in Seafood Watch data from previous year
# old_sw_sus <- read.csv(file.path(seafood_watch_dir, previous_data_folder,
#                                  "SFW_Aquaculture_ratings_053123.csv"), # update with name of data file from previous year 
#                        check.names = FALSE, stringsAsFactors = FALSE, na.strings = c("NA", ""))

# v2024: couldn't get new seafood watch data, so pulling in data from prior year

old_sw_sus <- readr::read_csv(file.path(seafood_watch_dir,
                                 #previous_data_folder,
                                 paste0("d", current_year - 2),
                                 "SFW_Aquaculture_ratings_062222.csv" #v2022 data
                               #  "SFW_Aquaculture_ratings_053123.csv" #v2023 data
                               ), # update with name of data file from previous year 
                       na = c("NA", ""))

Wrangle

Tidy Seafood Watch sustainability data

Rename columns to match with MAR data and fill in species column

Notes on AssessmentScore column:

This column represents the Overall Score, which is a combination of 7 primary criteria and 3 “exceptional criterion that may not be relevant to all aquaculture production, yet can be a significant concern for those production practices where it is relevant” (pg. 57). These criteria are named as follows:

  • Criterion 1 (C1): Data

  • C2: Effluent

  • C3: Habitat

  • C4: Chemical Use

  • C5: Feed

  • C6: Escapes

  • C7: Disease, pathogen and parasite interaction

  • C8X: Source of Stock – Independence from wild fish stocks

  • C9X: Wildlife Mortalities

  • C10X: Introduction of Secondary Species

The Final numerical score =

\[\mathrm{final \:score} = \frac{\sum{(C1–C7 \mathrm{\:scores})} + (C8X + C9X + C10X)}{7}\] with the final score having a range of 0-10.

Note on C6Score column – C6: Escapes

From pages 43-52 of the Aquaculture Sustainability Standards document

Criterion 6 (C6) final scores are a combination of Factor 6.1 and 6.2.

Factor 6.1: > “Factor 6.1 assigns a level of risk to each type of production system based on the ability of farmed species to escape the system and enter the surrounding ecosystem. Production system escape risks are categorized as Low to High based on openness, management practices, escape trends, and vulnerability to environmental factors (e.g., tsunami, flood, predator damage, etc.).” (pg. 44)

  • basically assigns a risk level based on the ability of farmed species to escape – “Risk of escape”
  • adjustments made based on factors such as recapture of “escapes” before they have an impact

Factor 6.2: > “Invasiveness, referred to as the risk of competitive and genetic interactions (CGI), is defined as “…the degree to which an organism is able to spread from site of primary introduction, to establish a viable population in the ecosystem, to negatively affect biodiversity on the individual, community, or ecosystem level and cause adverse socioeconomic consequence” (Panov et al. 2008). According to this definition, Factor 6.2 considers both the short-term and long-term ecological impacts of escape. This factor has been adapted (and greatly simplified) from the Marine Fish Invasiveness Screening Kit (MFISK) (and other similar tools developed by Copp et al. [2007, 2009]), and from the Global Aquaculture Performance Index (GAPI)’s similar use and adaptation of the same tools (Volpe et al. 2013).” (pg. 45)

  • basically another quantification of risk based on the threat escapees pose on the environment – “Competitiveness and genetic interactions”

“The risk of impacts resulting from repeated escapes of farmed stock (regardless of their ability to establish), or the risks resulting in the establishment of escapees differs according to species-specific characteristics, and particularly between native and non-native species…” (pg. 45)

A C6 score of 10 indicates a very low risk of escape with low estimated competitiveness and genetic interactions. An Escape Criterion (C6) score ≤ 1 is considered “Critical” (pg. 52), and may indicate a “High Concern” risk of escape and “High Concern” potential for competitiveness and genetic impact.

# ---- Tidy Seafood Watch (SFW) data ----
sw_sus <- sw_sus_raw %>% 
  # rename columns ----
  janitor::clean_names() %>% 
  dplyr::rename(sw_species = "common_names",
                fao_species = "fao_common_name", 
                fda_species = "fda_common_name",
                water_body = "bo_ws", # so curious as to why this is what janitor::clean_names does to BOWs...
                country = "countries",
                method = "methods",
                score = "assessment_score",
                escapes_score = "c6score", # 
                rec = "assessment_color"
                ) %>% 
  # select relevant columns
  dplyr::select(report_title, published_date, sw_species, scientific_name, fao_species, fda_species, country, water_body, method, escapes_score, score, rec) %>%
  # filter out freshwater aquaculture
  filter(!(method %in% c("Freshwater net pen"))) %>% 
  # set scores less than 0 equal to 0
  mutate(score = case_when(score < 0 ~ 0,
            TRUE ~ score))

## Change species names using FAO species name (fao_species); if NA, use common name (sw_species)
sw_sus$species <- ifelse(!is.na(sw_sus$fao_species), sw_sus$fao_species, sw_sus$sw_species)

Keep NA countries

## These need to be re-added later (get cut when associated with region ids)
sw_sus_no_rgn <- filter(sw_sus, is.na(country)|country == "Worldwide")
# 118 entries with no country, 
# in 2023 assessment there were 114 "Worldwide" rows and only 1 NA row
# 115 entries in 2024

Convert country names to OHI region IDs.

# ---- Change/update country names to match OHI region names ----
# Separate rows that are associated with multiple countries
sw_sus_multiple <- sw_sus %>% 
  filter(str_detect(country, "\\|")) %>%
  separate_rows(country, sep = " \\| ")

# Rejoin the rows we just removed, but in clean format (1 row per country)
sw_sus_df <- sw_sus %>%
  filter(!str_detect(country, "\\|")) %>%
  rbind(sw_sus_multiple) %>%
  # remove all NA and 'Worldwide' rows in order to correctly rejoin those in the next steps, these rows are stored in the variable sw_sus_no_rgn
  filter(!is.na(country), country!="Worldwide")
  
# Convert country names to OHI region IDs. (ohicore/R/name_2_rgn.R)
sw_sus_rgn <- ohicore::name_2_rgn(df_in = sw_sus_df, 
                       fld_name='country', 
                       flds_unique=c('fao_species', 'fda_species', 'sw_species', 'score'),
                       keep_fld_name = TRUE) # 'country' now shows the original Seafood Watch data name; 'rgn_name' is what we want to use from now on
# v2023: 151 entries
# v2024: (used same data, 151 entries)

# v2024 duplicates found: 
# 1 Chile         
# 2 Norway        
# 3 United Kingdom
# 4 United States 
# 5 Canada  
# after investigating the data further, these duplicates don't seem to be an issue -- each entry is an unique species and/or based on an unique report

# Add NA countries back in 
sw_sus_rgn <- bind_rows(sw_sus_rgn, sw_sus_no_rgn) %>%
  unique()
# 259 entries
# v2024: same (because same exact data)

#2023 switched to comparing common names because the scientific name column was two columns in the previous years data. Could compare scientific names next year. No differences in species detected. 

setdiff(sw_sus_rgn$sw_species, old_sw_sus$CommonNames) # character(0)
setdiff(old_sw_sus$CommonNames, sw_sus_rgn$sw_species) # character(0)

Check that the non-matches between Seafood Watch sustainability data species (sw_sus_rgn) and the FAO mariculture species in the wrangled FAO Aquaculture Production data table (maric) are not due to spelling errors or slightly different names. We want to include as many species that have sustainability scores as possible

maric <- readr::read_csv(here::here(current_mar_dir, "output", "MAR_FP_data.csv"))

## Make sure same species are spelled the same in the two data tables (e.g. check that there are no extra spaces)
sort(setdiff(sw_sus_rgn$species, maric$species)) # Seafood Watch species with no FAO data
sort(setdiff(maric$species, sw_sus_rgn$species)) # FAO species with no Seafood Watch data - there will probably be a long list
#this is a baseline check, run the fixes in the next chunk from previous years before hand fixing any species

#instructions for updating speices:
## Hand check each of the species output here, check to make sure there aren't obvious cases when the species is the same, but the names are presented slightly differently in the lists. For example, some species might have an extra space in the name somewhere. If any of these obvious differences exist, we need to go in and change the names to match. 
## The FAO list has far more species than the sustainability list. But, go over them to make sure that we are getting sustainability scores for as many species as we can.
# note -- Seafood Watch makes a distinction between Common sole and Dover sole

Fix the non-matches

# Rename species in Seafood Watch data to match FAO species data 
sw_sus_rgn$species <- gsub("Atlantic Bluefin tuna", "Atlantic bluefin tuna", sw_sus_rgn$species)
#sw_sus_rgn$species <- gsub("Scallops", "Scallops nei", sw_sus_rgn$species)

# ---- Fix FAO species names ----
# rename any "mistakes" in the FAO data, such as species names whose first word is uncapitalized  
maric$species <- gsub("blood cockle", "Blood cockle", maric$species)
maric$species <- gsub("\\[Solea spp\\]", "Common sole", maric$species)

# rename FAO species to general categories when used in Seafood Watch data 
maric <- maric %>% 
  # Oysters
  mutate(species = case_when(
    family == "Cardiidae" ~ "Cockles",
    str_detect(species,"oyster") ~ "Oysters",
    # Clams
    species == "CORBICULIDAE" ~ "Clams",
    str_detect(species, "clam") ~ "Clams",
    str_detect(species, "Clam") ~ "Clams",
    str_detect(species, "mussel") ~ "Mussels",
    Taxon_code == "AL" ~ "Seaweed",
    family == "Haliotidae" ~ "Abalones", 
    family == "Pectinidae" ~ "Scallops",
    TRUE ~ species)
    ) %>% 
  mutate(family = case_when(species == "Clams" ~ NA,
                            species == "Oysters"~ NA,
                            TRUE ~ family
  )) #setting family as NA when the group contains more than one family


#updating the names combines species and creates creates duplicate species which we will need to sum
#also sums duplicates due to multiple fao codes for the same region/species
maric <- maric %>%
  group_by(environment, species, year, Taxon_code, rgn_id, family) %>% 
  summarize(value = sum(value), gap_0_fill = first(gap_0_fill)) %>%
  ungroup()

#run this code to check that there are no duplicates for any species, year, rgn_id combinations. If there are they will be unintentionally deleted later, so you may need to change family or taxon codes to match and rerun the above group by
duplicate_check <- maric %>% group_by(environment, species, year, rgn_id) %>% 
  summarize(n = n())
unique(duplicate_check$n) #should be 1


#Add a unique identifier per cultivated stock that describes each species, region, and environment grouping.
identifier <- maric %>% 
  select(rgn_id, species, environment) %>% 
  unique() %>% 
  mutate(species_code = 1:n())

maric <- left_join(maric, identifier)

#run again after cleaning to check there are no remaining mismatched species
sort(setdiff(sw_sus_rgn$species, maric$species)) # Seafood Watch species with no FAO data
sort(setdiff(maric$species, sw_sus_rgn$species))

FAO mariculture and Seafood Watch sustainability scores

## Add column "match_type" to categorize obs. that 1) have a country associated, 2) are "global" Seafood Watch data, or 3) only distinguished by water body
sw_sus_rgn <- sw_sus_rgn %>% 
  dplyr::mutate(match_type = 
                  dplyr::case_when(!is.na(rgn_name) ~ "sw_sp_c", # Add match type specific to species/country
                                  country == "Worldwide" ~ "sw_sp_g", # Add match type specific to species/global Seafood Watch data
                                  TRUE ~ "sw_sp_w" # Add match type specific to species/water body
  ))

table(sw_sus_rgn$match_type)

#sw_sp_c sw_sp_g sw_sp_w 
#    144     114       1

Assign a Mediterranean bordering rgn id to each waterbody row

## Add a line for each country that borders each water body
  # The only water body without a rgn is the Mediterranean; add all countries that border it (source: https://www.medqsr.org/mediterranean-marine-and-coastal-environment#:~:text=Today%2021%20countries%2C%20with%20surface,Syria%2C%20Tunisia%2C%20and%20Turkey.)

# filter for waterbody specific rows 
sw_sus_rgn_water <- sw_sus_rgn %>%
  filter(match_type == 'sw_sp_w')

# define Mediterranean rgns
med_rgns <- data.frame(rgn_id = c(82, 84, 232, 187, 81, 214, 179, 80, 79, 184, 78, 67, 68, 185, 186, 62, 188, 182, 77, 61, 76))

# create new function called expand.grid.df()
expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))
# coming from MEDS, this ^^ was wild to read
# Reduce: uses a binary function to successively combine the elements of a given vector and a possibly given initial value.
# merge: Merge two data frames by common columns or row names, or do other versions of database join operations.

gf_sw_sp_w <- expand.grid.df(sw_sus_rgn_water, med_rgns) %>%
  dplyr::select(-rgn_id.x, rgn_id = rgn_id.y)
## Join data specific to species/country
sw_sp_c <- sw_sus_rgn %>%
  filter(match_type == 'sw_sp_c') %>%
  select(rgn_id, species, Sust_sw_sp_c = score, esc_sw_sp_c = escapes_score)

#there will be many to many relationship b/c maric has more than one of the same species for a rgn_id due to year and the sustainability may have more than one method for a species/rgn combination
mar_sw_sus <- maric %>%
  left_join(sw_sp_c, by = c("species", "rgn_id"))

## Join data specific to species/water body
sw_sp_w <- gf_sw_sp_w %>%
  select(rgn_id, species, Sust_sw_sp_w = score, esc_sw_sp_w = escapes_score)

mar_sw_sus <- mar_sw_sus %>%
  left_join(sw_sp_w, by= c("species", "rgn_id"))

## Join data specific to species/global Seafood Watch data
sw_sp_g <- sw_sus_rgn %>% 
  filter(match_type == 'sw_sp_g') %>%
  select(species, Sust_sw_sp_g = score, esc_sw_sp_g = escapes_score)

mar_sw_sus <- mar_sw_sus %>%
  left_join(sw_sp_g, by= c("species"))

# test3 <- mar_sw_sus %>%
#   filter(Taxon_code == "GAST")

Merge the three Seafood Watch type categories into a single sustainability score and genetic escapes score column in the following order:

  1. Sust_sw_sp_c: specific to species/country
  2. Sust_sw_sp_w: specific to species/water body
  3. Sust_sw_sp_g: specific to species/global Seafood Watch data

For example, if Sust_sw_sp_c is missing, use Sust_sw_sp_w and so on.

mar_sw_sus <- mar_sw_sus %>% 
  dplyr::mutate(Sust = ifelse(!is.na(Sust_sw_sp_c), Sust_sw_sp_c, Sust_sw_sp_w)) %>%
  dplyr::mutate(Sust = ifelse(is.na(Sust), Sust_sw_sp_g, Sust)) %>%
  dplyr::mutate(gapfill_sus = case_when(Sust_sw_sp_c == 
                                          Sust ~ "none",
                                        is.na(Sust_sw_sp_c) & Sust_sw_sp_w == Sust ~ "water_body",
                                        is.na(Sust_sw_sp_c) & is.na(Sust_sw_sp_w) & Sust_sw_sp_g == Sust ~ "global_species")) %>%
  dplyr::mutate(esc_score = ifelse(!is.na(esc_sw_sp_c), esc_sw_sp_c, esc_sw_sp_w)) %>%
  dplyr::mutate(esc_score = ifelse(is.na(esc_score), esc_sw_sp_g, esc_score)) %>%
  dplyr::mutate(gapfill_esc = case_when(esc_sw_sp_c == esc_score ~ "none",
                                        is.na(esc_sw_sp_c) & esc_sw_sp_w == esc_score ~ "water_body",
                                        is.na(esc_sw_sp_c) & is.na(esc_sw_sp_w) & esc_sw_sp_g == esc_score ~ "global_species")) %>%
  dplyr::select(-c(Sust_sw_sp_c, Sust_sw_sp_w, Sust_sw_sp_g, esc_sw_sp_c, esc_sw_sp_w, esc_sw_sp_g)) 

summary(mar_sw_sus)

Since some regions have multiple sustainability scores for the same species due to multiple aquaculture methods, but we don’t know what proportions of which methods are used, we take the average of the sustainability scores in these instances.

Average sustainability scores within regions with more than one score (due to more than one aquaculture method):

# check which countries have duplicates
anyDuplicated(mar_sw_sus)
# 2022 scenario year: 532
#2023 280
# 2024 scenario year: 316

mar_sw_sus <- mar_sw_sus %>% 
  dplyr::group_by(rgn_id, species) %>% 
  dplyr::mutate(Sust_avg = mean(Sust, na.rm = TRUE),
                esc_avg = mean(esc_score, na.rm = TRUE)) %>% 
  dplyr::ungroup()

# code NAN values in Sust_avg and esc_avg columns as NA 
mar_sw_sus$Sust_avg[is.nan(mar_sw_sus$Sust_avg)] <- NA
mar_sw_sus$esc_avg[is.nan(mar_sw_sus$esc_avg)] <- NA
# nrows = 48,748

Get rid of duplicates for region/species/year:

mar_sw_sus <- mar_sw_sus %>% 
  dplyr::distinct(rgn_id, species, environment, year, .keep_all = TRUE) %>%
  # remove old Sust column (which was used to produce Sust_avg column) &
  # rename Sust_avg to Sust &
  # remove old esc_score column (which was used to produce esc_avg column) &
  # rename esc_avg to esc_score :
  dplyr::select(-Sust, Sust = Sust_avg, -esc_score, esc_score = esc_avg)

Now look at a summary after appending all the Seafood Watch data

summary(mar_sw_sus) 
# 21215 NA values out of 32051 total obs; ~68% without scores (2022)
# 16378 NA values out of 33732 total obs ~ 49% without scores (2023)
# 17126 NA values out of 34920 total obs; ~ 49% without scores (2024)

Gapfilling based on families and taxon_code

Steps:
- Group by country and family and summarise to get mean sustainability/escapes values (gapfilled by country/family)
- Group by georegion and family and summarise to get mean sustainability/escapes values (gapfilled by georegion/family)
- Group by family (global family gapfilled)
- Gapfill by average of the taxon_code
- Assign any Algae with the overall SFW seaweeds score from SFW data (7.92) and the C6 Escapes score (4, a subcomponent of the overall seaweed score)
- link to the SFW Global Assessment for Seaweed available for download here
- Apply the global average to any remaining sustainability/escapes score NAs
- Finally, we rescale the scores to the highest sustainability score, under the assumption that the highest score is the best sustainability currently out there, and should be given a score of 100.

## First we need to add a georegion column that we can group by later on. 

mar_sw_sus <- mar_sw_sus %>%
  add_georegion_id() %>%
  dplyr::select(-level)

# ## Exploratory plots on whether regions or species drive the sustainability scores...
# plot(mar_sw_sus$georgn_id, mar_sw_sus$Sust)
#  plot(as.factor(mar_sw_sus$species), mar_sw_sus$Sust)
# # 
#  mod_spp <- lm(Sust ~ as.factor(georgn_id) + species, data = mar_sw_sus)
# # anova(mod_spp)
#  summary(mod_spp)
# # 
#  mod_rgn <- lm(Sust ~ as.factor(georgn_id), data = mar_sw_sus)
# # anova(mod_rgn)
# # summary(mod_rgn)
# # 
# mod_spp1 <- lm(Sust ~ family, data = mar_sw_sus)
# # anova(mod_rgn)
#  summary(mod_spp1)
# it makes more sense to use species/taxon gapfilling methods

#unique(mar_sw_sus$avg_global_esc)

mar_sw_sus <- mar_sw_sus %>%
  group_by(rgn_id, family) %>%
  mutate(avg_rgn_fam_sust = mean(Sust, na.rm = TRUE), # produced NaN values (2022 scenario year)
         avg_rgn_fam_esc = mean(esc_score, na.rm = TRUE)) %>% # produced NaN values (2022 secario year)
  ungroup() %>%
  group_by(georgn_id, family) %>%
  mutate(avg_gr_fam_sust = mean(Sust, na.rm = TRUE), # produced some NaN values (2023 scenario year)
         avg_gr_fam_esc = mean(esc_score, na.rm = TRUE)) %>% # produced NaN values (2023 scenario year)
  ungroup() %>%
  group_by(family) %>%
  mutate(avg_fam_sust = mean(Sust, na.rm = TRUE), # produced some NaN values (2022 scenario year)
         avg_fam_esc = mean(esc_score, na.rm = TRUE)) %>% # produced NaN values (2022 scen ario year)
  ungroup() %>%
  group_by(Taxon_code) %>%
  mutate(avg_taxon_code_sust = mean(Sust, na.rm = TRUE), # produced some NaN values (2022 scecario year)
         avg_taxon_code_esc = mean(esc_score, na.rm = TRUE)) %>% # produced some NaN values (2022 scecario year)
  ungroup() %>%
  mutate(avg_global = mean(Sust, na.rm = TRUE),
         avg_global_esc = mean(esc_score, na.rm = TRUE)) %>%
  ungroup()

# code NaN as NA for the columns that produced NaN, likely due to taking the mean of a set of all NA values 
mar_sw_sus$avg_rgn_fam_sust[is.nan(mar_sw_sus$avg_rgn_fam_sust)] <- NA
mar_sw_sus$avg_rgn_fam_esc[is.nan(mar_sw_sus$avg_rgn_fam_esc)] <- NA
mar_sw_sus$avg_gr_fam_sust[is.nan(mar_sw_sus$avg_gr_fam_sust)] <- NA
mar_sw_sus$avg_gr_fam_esc[is.nan(mar_sw_sus$avg_gr_fam_esc)] <- NA
mar_sw_sus$avg_fam_sust[is.nan(mar_sw_sus$avg_fam_sust)] <- NA
mar_sw_sus$avg_fam_esc[is.nan(mar_sw_sus$avg_fam_esc)] <- NA
mar_sw_sus$avg_taxon_code_sust[is.nan(mar_sw_sus$avg_taxon_code_sust)] <- NA
mar_sw_sus$avg_taxon_code_esc[is.nan(mar_sw_sus$avg_taxon_code_esc)] <- NA

Obtain a sustainability and escapes score for each record, and a book-keeping column of whether it’s actual or gapfilled

For missing sustainability and escapes scores:

Steps:

  1. Group by country and family and summarise to get mean sustainability and escapes values (gapfilled by country/family)
  2. Group by georegion and family and summarise to get mean sustainability and escapes values (gapfilled by georegion/family)
  3. Group by family to get mean sustainability and escapes values (global family gapfilled)
  4. Gapfill by average of the taxon_code to get mean sustainability and escapes values (taxon_group gapfilled)
  5. Assign any Algae with the seaweeds score from SFW data (6.72 for sustainability, 4 for escapes)
  6. Apply the global average to any remaining sustainability and escapes score NAs
# define gapfilling scores for global seaweed and escapes (a subcomponent of the global seaweed score) found in pdf here (updated occasionally): https://www.seafoodwatch.org/recommendation/seaweed/seaweed-31568
# seaweed_gf_score <- 6.72
# escapes_gf_score <- 4 
#in 2023 this wasn't necessary because these scores were already included in the seafood watch data, and I already applied them to all seaweed species when cleaning seafood watch data. Left the code above in case it is needed in future years. 

mar_sw_sus_final <- mar_sw_sus %>%
  
  
  #### sustainability scores ####
  mutate(Sust = ifelse(is.na(Sust), avg_rgn_fam_sust, Sust)) %>% # Gapfill with region/family level
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_rgn_fam_sust, "region_family_avg", gapfill_sus)) %>%
  ## add in rgn family gapfill record
  mutate(Sust = ifelse(is.na(Sust), avg_gr_fam_sust, Sust)) %>% # Gapfill with georegion/family level
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_gr_fam_sust, "georegion_family_avg", gapfill_sus)) %>% # Add in georegion/family gapfill record
  mutate(Sust = ifelse(is.na(Sust),  avg_fam_sust, Sust)) %>% # Gapfill with family level
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_fam_sust, "family_avg", gapfill_sus)) %>% ## Add in family gapfill record
  mutate(Sust = ifelse(is.na(Sust), avg_taxon_code_sust, Sust)) %>%
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_taxon_code_sust, "taxon_group_avg", gapfill_sus)) %>%
  #mutate(Sust = ifelse(Taxon_code == "AL", seaweed_gf_score, Sust)) %>% ## assign any algae with the overall SFW seaweed score, see Steps above for reference to document
  #mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == seaweed_gf_score, "seaweed_score", gapfill_sus)) %>%
  mutate(Sust = ifelse(is.na(Sust), avg_global, Sust)) %>% 
  mutate(gapfill_sus = ifelse(is.na(gapfill_sus) & Sust == avg_global, "global_avg", gapfill_sus)) %>%
  
  #### genetic escapes scores ####
  mutate(esc_score = ifelse(is.na(esc_score), avg_rgn_fam_esc, esc_score)) %>% # Gapfill with region/family level
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_rgn_fam_esc, "region_family_avg", gapfill_esc)) %>% ## add in rgn family gapfill record
  mutate(esc_score = ifelse(is.na(esc_score), avg_gr_fam_esc, esc_score)) %>% # Gapfill with georegion/family level
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_gr_fam_esc, "georegion_family_avg", gapfill_esc)) %>% # Add in georegion/family gapfill record
  mutate(esc_score = ifelse(is.na(esc_score),  avg_fam_esc, esc_score)) %>% # Gapfill with family level
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_fam_esc, "family_avg", gapfill_esc)) %>% ## Add in family gapfill record
  mutate(esc_score = ifelse(is.na(esc_score), avg_taxon_code_esc, esc_score)) %>%
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_taxon_code_esc, "taxon_group_avg", gapfill_esc)) %>%
  #mutate(esc_score = ifelse(Taxon_code == "AL", escapes_gf_score, esc_score)) %>% ## assign any algae with the SFW seaweed score for C6 Escapes, see Steps above for reference to document
  #mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == escapes_gf_score, "seaweed_score", gapfill_esc)) %>%
  mutate(esc_score = ifelse(is.na(esc_score), avg_global_esc, esc_score)) %>% 
  mutate(gapfill_esc = ifelse(is.na(gapfill_esc) & esc_score == avg_global_esc, "global_avg", gapfill_esc)) %>%
  
  #### final tidying ####
mutate(taxa_code = paste(species, species_code, sep="_")) %>%
  select(rgn_id, species, species_code, taxa_code, taxa_group = Taxon_code,
         family, year, gapfill_sus, gapfill_esc, gapfill_fao = gap_0_fill,
         tonnes = value, Sust, esc_score) %>%
  mutate(Sust = round(Sust/10, 2),
         esc_score = 1 - round(esc_score/10, 2))

summary(mar_sw_sus_final)
filter(mar_sw_sus_final, gapfill_sus == "georegion_family_avg")
filter(mar_sw_sus_final, gapfill_sus == "family_avg")
filter(mar_sw_sus_final, gapfill_sus == "taxon_group_avg")

Now we need to prep the escapes pressure layer a bit more. Final formatting of the escapee data. This is used as a pressure layer.

For each region/year: take a weighted average of the genetic escape probability for each taxa based on tonnes of mariculture.

# subset the cleaned data to only include escapes data
tonnes_esc <- mar_sw_sus_final %>%
  dplyr::select(rgn_id, year, species, gapfill_esc, esc_score, tonnes)

# use "tonnes" values to weight the regional average of escapes
genEscapes <- tonnes_esc %>%
  group_by(rgn_id, year) %>%
  summarize(genEscapes = weighted.mean(esc_score, tonnes, na.rm=TRUE))

Obtain gapfill information from genEscapes

Gapfill values in output file are the proportion of data that is gapfilled.

# Obtain corresponding gapfilling information for each region (average of gapfilled data, weighted by tonnes of mariculture).
genEscapes_gf <- tonnes_esc %>%
  mutate(gapfill = ifelse(gapfill_esc == "none", 0, 1)) %>%
  group_by(rgn_id, year) %>%
  summarize(genEscapes = weighted.mean(gapfill, tonnes, na.rm = TRUE)) %>%
  ungroup() %>%
  filter(year == latest_data_year) %>%
  select(rgn_id, pressures.score = genEscapes) %>%
  mutate(pressures.score = ifelse(pressures.score == "NaN", NA, pressures.score)) %>%
  data.frame()

Save data: Genetic Escapes

Save gapfill data

# Save the Genetic Escapes Gapfill file
readr::write_csv(genEscapes_gf, here(current_mar_dir, "output", "GenEsc_gf.csv"))

Obtain escapee data layers from genEscapes

# Create the escapee data layers:
data <- genEscapes %>%
    dplyr::select(rgn_id, year, pressure_score = genEscapes)

Save escapee pressure layer

## Saves the Genetic Escapes data file 
readr::write_csv(data, here(current_mar_dir, "output", "GenEsc.csv"))  

Save Data: Mariculture tonnes and sustainability scores

## Save mariculture harvest data
mar_harvest_tonnes <- mar_sw_sus_final %>%
  dplyr::select(rgn_id, taxa_code, taxa_group, family, year, tonnes)

anyDuplicated(mar_harvest_tonnes) # Check for duplication

## Saves the appropriate Mariculture Harvest Tonnes file
#write.csv(mar_harvest_tonnes, here(current_mar_dir, "output", "mar_harvest_tonnes.csv"), row.names = F)
readr::write_csv(mar_harvest_tonnes, here::here(current_mar_dir, "output", "mar_harvest_tonnes.csv"))


## Save gapfill data for mariculture harvest
mar_harvest_tonnes_gf <- mar_sw_sus_final %>%
  dplyr::select(rgn_id, taxa_code, taxa_group, family, year, tonnes = gapfill_fao)


## Saves the appropriate Mariculture Harvest Gapfill file 
#write.csv(mar_harvest_tonnes_gf, here(current_mar_dir, "output", "mar_harvest_tonnes_gf.csv"), row.names = F)
readr::write_csv(mar_harvest_tonnes_gf, here::here(current_mar_dir, "output", "mar_harvest_tonnes_gf.csv"))

# define the maximum sustainability score, check with: summary(mar_sw_sus_final)
max_sus <- 0.77

## Save sustainability scores data for 2023 (SFW data year)
mar_sustainability_score <- mar_sw_sus_final %>%
  mutate(year = current_year) %>% #change year if data not updated to current year
  dplyr::select(rgn_id, year, taxa_code, sust_coeff = Sust) %>%
  unique() %>%
  mutate(sust_coeff = ifelse(sust_coeff > max_sus, 1, sust_coeff/max_sus)) # rescale to the highest sust score

anyDuplicated(mar_sustainability_score)


## Saves the appropriate Sustainability Score file 
#write.csv(mar_sustainability_score, here(current_mar_dir, "output", "mar_sustainability.csv"), row.names = F)
readr::write_csv(mar_sustainability_score, here::here(current_mar_dir, "output", "mar_sustainability.csv"))


## Save gapfill data for sustainability scores
mar_sustainability_score_gf = mar_sw_sus_final %>%
  select(rgn_id, taxa_code, sust_coeff = gapfill_sus) %>%
  unique()


## Saves the appropriate Sustainability Score Gapfill file 
#write.csv(mar_sustainability_score_gf, here(current_mar_dir, "output", "mar_sustainability_gf.csv"), row.names = F)
readr::write_csv(mar_sustainability_score_gf, here::here(current_mar_dir, "output", "mar_sustainability_gf.csv"))


## save mar sustainability for data checking; rgn_id, species, score
mar_sustainability_score <- mar_sw_sus_final %>%
  mutate(year = current_year) %>% # Only 2023 sustainability scores exist (Seafood Watch)
  select(rgn_id, year, species, sust_coeff = Sust) %>%
  unique() %>% 
    mutate(sust_coeff = ifelse(sust_coeff > max_sus, 1, sust_coeff/max_sus)) # rescale to the highest sust score

## Saves the appropriate Sustainability Score file
#write.csv(mar_sustainability_score, here(current_mar_dir, "int", "mar_sustainability_check.csv"), row.names = F)
readr::write_csv(mar_sustainability_score, here::here(current_mar_dir, "int", "mar_sustainability_check.csv"))

Data check mariculture and sustainability scores

Comparing this year’s data to previous year’s data. Expect small variation from year to year. Plot to view differences.

options(scipen = 999)

# ============== Mariculture Yield =============================================

# ---- Read in old and new data for plots ----
mar_old_data <- readr::read_csv(here::here("globalprep", "mar", paste0( "v", last_year), "output", "MAR_FP_data.csv")) 
mar_new_data <- readr::read_csv(here::here(current_mar_dir, "output", "MAR_FP_data.csv"))


# ---- Compare mariculture yield data for all countries: ----
# previous data year (2021) vs. current data year (2022) 
mar_old_1 <- mar_old_data %>% 
  filter(year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, old_value = value, environment)

mar_new_1 <- mar_new_data %>% 
  filter(year == latest_data_year) %>% 
  dplyr::select(rgn_id, species, value, environment)


yield_1 <- mar_old_1 %>% 
  full_join(mar_new_1, by = c("rgn_id", "species", "environment")); View(yield_1)

# Plot
comp_plot_1 <- ggplot() + 
  geom_point(data = yield_1, aes(x = old_value, y = value, text = paste("species:", species, "<br>region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year), y = paste("Current Value", latest_data_year)) +
  geom_abline(slope = 1, intercept = 0, col = "red")

ggplotly(comp_plot_1)


# ---- Compare mariculture yield data for all countries: ----
# 2021 last year to 2021 this year (previous_latest_data_year)
mar_old_2 <- mar_old_data %>% 
  filter(year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, old_value = value, environment, fao)

mar_new_2 <- mar_new_data %>% 
  filter(year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, value, environment, fao)

yield_2 <- mar_old_2 %>% 
  full_join(mar_new_2, by = c("rgn_id","species", "environment", "fao")); View(yield_2)


comp_plot_2 <- ggplot() + 
  geom_point(data = yield_2, aes(x = old_value, y = value, text = paste("Species:", species, "<br>Region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year), y = paste("Current Value", previous_latest_data_year)) +
  geom_abline(slope = 1, intercept = 0, col = "red")


ggplotly(comp_plot_2)


# ---- Compare mariculture yield data for Russia: ----
# 2021 data from 2023 assessment to 2022 data from 2024 assessment
mar_old_3 <- mar_old_data %>% 
  filter(rgn_id == 73, year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, old_value = value, environment)

mar_new_3 <- mar_new_data %>% 
  filter(rgn_id == 73, year == latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, value, environment)

yield_3 <- mar_old_3 %>% 
  full_join(mar_new_3, by = c("rgn_id","species","fao")); View(yield_3)

# old plot code
plot(yield_3$old_value, yield_3$value); abline(0, 1, col = "red")
# new plot code
comp_plot_3 <- ggplot() + 
  geom_point(data = yield_3,
             aes(x = old_value, y = value,
                 text = paste("species:", species,
                              "<br>region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year),
       y = paste("Current Value", latest_data_year),
       title = "Russia Mariculture Yield Data Comparison") +
  geom_abline(slope = 1, intercept = 0, col = "red") +
  theme_bw()

ggplotly(comp_plot_3)


# ---- Compare mariculture yield data for Iceland: ----
# 2021 data from 2023 assessment to 2022 data from 2024 assessment
# one Atlantic Salmon observation was doubled (copied) in 2022 so I added "environment" which resolved the issue (Brackish vs Marine issue)
mar_old_4 <- mar_old_data %>% 
  filter(rgn_id == 143, year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, old_value = value, environment) %>% 
  ungroup()

mar_new_4 <- mar_new_data %>% 
  filter(rgn_id == 143, year == latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, value, environment) %>% 
  ungroup()

yield_4 <- mar_old_4 %>% 
  full_join(mar_new_4, by = c("rgn_id","species","fao", "environment")); View(yield_4)

# old plot code
plot(yield_4$old_value, yield_4$value);abline(0,1,col="red")
# new plot code
comp_plot_4 <- ggplot() + 
  geom_point(data = yield_4,
             aes(x = old_value, y = value,
                 text = paste("species:", species,
                              "<br>region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year),
       y = paste("Current Value", latest_data_year),
       title = "Iceland Mariculture Yield Data Comparison") +
  geom_abline(slope = 1, intercept = 0, col = "red") +
  theme_bw()

ggplotly(comp_plot_4)


# ---- Compare mariculture yield data for Vietnam ----
mar_old_5 <- mar_old_data %>% 
  filter(rgn_id == 207, year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, environment, old_value = value)
mar_new_5 <- mar_new_data %>% 
  filter(rgn_id == 207, year == latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, environment, value)

yield_5 <- mar_old_5 %>% 
  full_join(mar_new_5, by = c("rgn_id","species","fao", "environment")); View(yield_5)

plot(yield_5$old_value, yield_5$value,
     xlab = paste("Previous Value", previous_latest_data_year),
     ylab = paste("Current Value", latest_data_year),
     main = "Vietnam mariculture yield data comparison"); abline(0, 1, col = "red")

comp_plot_5 <- ggplot() + 
  geom_point(data = yield_5,
             aes(x = old_value, y = value,
                 text = paste("species:", species, "<br>region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year),
       y = paste("Current Value", latest_data_year),
       title = "Vietnam Mariculture Yield Data Comparison") +
  geom_abline(slope = 1, intercept = 0, col = "red") +
  theme_bw()

ggplotly(comp_plot_5)


# ---- Compare yield data for Belize; ----
# yield of Whiteleg shrimp was halved between 2019 and 2020 assessment years.
# Whiteleg shrimp doubled between 2021 and 2022
mar_old_6 <- mar_old_data %>% 
  filter(rgn_id == 164, year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, old_value = value)
mar_new_6 <- mar_new_data %>% 
  filter(rgn_id == 164, year == latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, value)

yield_6 <- mar_old_6 %>% 
  full_join(mar_new_6, by = c("rgn_id","species","fao")); View(yield_6)

comp_plot_6 <- ggplot() + 
  geom_point(data = yield_6,
             aes(x = old_value, y = value,
                 text = paste("species:", species, "<br>region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year),
       y = paste("Current Value", latest_data_year),
       title = "Belize Mariculture Yield Data Comparison") +
  geom_abline(slope = 1, intercept = 0, col = "red") +
  theme_bw()

ggplotly(comp_plot_6)


# ---- Compare yield data for Tanzania ----
mar_old_7 <- mar_old_data %>% 
  filter(rgn_id == 202, year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, old_value = value, environment)
mar_new_7 <- mar_new_data %>% 
  filter(rgn_id == 202, year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, value, environment)

yield_7 <- mar_old_7 %>% 
  full_join(mar_new_7, by = c("rgn_id","species","fao", "environment"))

plot(yield_7$old_value, yield_7$value,
     xlab = paste("Previous Value", previous_latest_data_year),
     ylab = paste("Current Value", latest_data_year),
     main = "Tanzania mariculture yield data comparison"); abline(0, 1, col = "red")

comp_plot_7 <- ggplot() + 
  geom_point(data = yield_7,
             aes(x = old_value, y = value,
                 text = paste("species:", species, "<br>region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year),
       y = paste("Current Value", latest_data_year),
       title = "Tanzania Mariculture Yield Data Comparison") +
  geom_abline(slope = 1, intercept = 0, col = "red") +
  theme_bw()

ggplotly(comp_plot_7)


# ---- Compare yield data for Denmark ----
# pressure score decreased (not sure which version this comment was about)
mar_old_8 <- mar_old_data %>% 
  filter(rgn_id == 175, year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, old_value = value, environment)
mar_new_8 <- mar_new_data %>% 
  filter(rgn_id == 175, year == latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, value, environment)

yield_8 <- mar_old_8 %>% 
  full_join(mar_new_8, by = c("rgn_id","species","fao", "environment")); View(yield_8)

plot(yield_8$old_value, yield_8$value); abline(0, 1, col = "red")

comp_plot_8 <- ggplot() + 
  geom_point(data = yield_8,
             aes(x = old_value, y = value,
                 text = paste("species:", species, "<br>region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year),
       y = paste("Current Value", latest_data_year),
       title = "Denmark Mariculture Yield Data Comparison") +
  geom_abline(slope = 1, intercept = 0, col = "red") +
  theme_bw()

ggplotly(comp_plot_8)


# ---- Compare yield data for Ecuador ----
# score increased by a lot (again, not sure which version this comment is from)
# v2024: (2022 data) whiteleg shrimp yield has notably increased 
mar_old_9 <- mar_old_data %>% 
  filter(rgn_id == 137, year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, old_value = value, environment)

mar_new_9 <- mar_new_data %>% 
  filter(rgn_id == 137, year == latest_data_year) %>% 
  dplyr::select(rgn_id, species, fao, value, environment)

yield_9 <- mar_old_9 %>% 
  full_join(mar_new_9, by = c("rgn_id","species","fao")); View(yield_9)

plot(yield_9$old_value, yield_9$value)

comp_plot_9 <- ggplot() + 
  geom_point(data = yield_9,
             aes(x = old_value, y = value,
                 text = paste("species:", species, "<br>region ID:", rgn_id))) +
  labs(x = paste("Previous Value", previous_latest_data_year),
       y = paste("Current Value", latest_data_year),
       title = "Ecuador Mariculture Yield Data Comparison") +
  geom_abline(slope = 1, intercept = 0, col = "red") +
  theme_bw()

ggplotly(comp_plot_9)


# ============= Sustainability scores ==========================================

# Compare new/old Mariculture sustainability scores

sust_old <- readr::read_csv(here::here("globalprep", "mar", paste0("v", last_year),
                                       "output", "mar_sustainability.csv")) %>% 
  dplyr::select(rgn_id, taxa_code, sust_old = sust_coeff)

# tidy up species names (such as "Abalones_1"), rename as species column
sust_old$species <- str_remove(sust_old$taxa_code, "\\_[^.]*$") 

sust_new <- readr::read_csv(here::here(current_mar_dir, "int", "mar_sustainability_check.csv")) %>% 
  dplyr::select(rgn_id, species, sust_new = sust_coeff)

sust_compare <- sust_new %>% 
  full_join(sust_old, by = c("rgn_id", "species")) %>%
  mutate(difference = sust_new - sust_old) %>%
  group_by(rgn_id) %>%
  summarize(mean_rgn_sus_diff = mean(difference, na.rm = TRUE))

mean_diff_sus <- ggplot(sust_compare, aes(x = rgn_id, y = mean_rgn_sus_diff)) + 
  geom_point() +
  theme_bw() +
  labs(title = paste("Mariculture Sustainability: Mean Difference <br>", latest_data_year, "Data - ", previous_latest_data_year, "Data"))

ggplotly(mean_diff_sus)

Data check genetic escapes

Comparing this year’s data to previous year’s data. Expect small variation from year to year. Plot to view differences.

# ==== Compare genetic escapes pressure score; ====
# saw large changes between 2018 and 2017 assessment years. No big changes between 2019 and 2018 assessment years. Big changes from 2019 to 2020 due to a data update. Small changes between 2020 and 2021 due to updating the FAO data, gapfilling caused this. Also fixed some name errors. Small changes between 2021 and 2022, added a few name conversions. Some changes between 2022 and 2023, fixed a handful of errors. 

current_gen_esc_data <- readr::read_csv(here::here(current_mar_dir, "output", "GenEsc.csv"))

gen_old <- current_gen_esc_data %>%
  filter(year == previous_latest_data_year) %>% 
  dplyr::select(rgn_id, year, prs_score_old = pressure_score)

gen_new <- current_gen_esc_data %>%
  filter(year == latest_data_year) %>%
  dplyr::select(rgn_id, year, pressure_score)

gen_compare <- full_join(gen_old, gen_new, by = c("rgn_id"))

# plot current data of previous and current year
comp_gen_years_plot <- ggplot() + 
  geom_point(data = gen_compare, aes(x = prs_score_old, y = pressure_score, text = paste("Region ID:", rgn_id))) +
  labs(x = paste("Previous Pressure", previous_latest_data_year), y = paste("Current Pressure", latest_data_year)) +
  geom_abline(slope = 1, intercept = 0, col = "red") +
  theme_bw()

ggplotly(comp_gen_years_plot)


# ---- Compare old genetic escape pressure scores for the same year ----

old_gen_esc_data <- readr::read_csv(here::here("globalprep", "mar", paste0("v", last_year),
                                  "output", "GenEsc.csv")) 

old_last_year <- old_gen_esc_data %>%
  filter(year == previous_latest_data_year) %>% 
  select(rgn_id, year, prs_score_old = pressure_score)

new_last_year <- current_gen_esc_data %>% 
  filter(year == previous_latest_data_year) %>%
  select(rgn_id, year, pressure_score)

compare_gen_data_years <- full_join(old_last_year, new_last_year, by = c("rgn_id")); View(compare_gen_data_years)

comp_gen_data_years_plot <- ggplot() + 
  geom_point(data = compare_gen_data_years, aes(x = prs_score_old, y = pressure_score, text = paste("Region ID:", rgn_id))) +
  labs(x = paste("Previous Pressure", previous_latest_data_year), y = paste("Current Pressure", previous_latest_data_year)) +
  geom_abline(slope = 1, intercept = 0, col = "red") + theme_bw()


ggplotly(comp_gen_data_years_plot)



# Compare genetic escapes pressure scores for Vietnam, data was updated by FAO
old_gen_vietnam <- old_gen_esc_data %>%
  filter(rgn_id == 207) %>% 
  select(rgn_id, year, prs_score_old = pressure_score)

new_gen_vietnam <- current_gen_esc_data %>%
  filter(rgn_id == 207) %>% 
  select(rgn_id, year, pressure_score)

compare_gen_vietnam <- full_join(old_gen_vietnam, new_gen_vietnam, by = c("rgn_id", "year")); View(compare_gen_vietnam)

plot(compare_gen_vietnam$prs_score_old, compare_gen_vietnam$pressure_score); abline(0, 1, col = "red")


# Compare genetic escapes pressure scores for Denmark
old_gen_denmark <- old_gen_esc_data %>%
  filter(rgn_id == 175) %>% 
  select(rgn_id, year, prs_score_old = pressure_score)

new_gen_denmark <- current_gen_esc_data %>%
  filter(rgn_id == 175) %>% 
  select(rgn_id, year, pressure_score)

compare_gen_denmark <- full_join(old_gen_denmark, new_gen_denmark, by = c("rgn_id","year")); View(compare_gen_denmark)

plot(compare_gen_denmark$prs_score_old, compare_gen_denmark$pressure_score); abline(0, 1, col = "red")

NOTE: THIS DOES NOT NEED TO BE RUN ANYMORE. Below here is archival… unless we change the reference point.

Potential Aquaculture Calculations for New Reference Point

\[REFERENCE RMD FILE: <https://raw.githubusercontent.com/OHI-Science/ohiprep_v2019/gh-pages/globalprep/mar/v2019/reference_point/CountryProductionEstimate.Rmd>\]

Summary

This analysis produces potential tonnes of aquaculture from growth potential estimates of finfish and bivalves. These aquaculture numbers will be used for the reference point for the mariculture subgoal. This only needs to be run for the 2019 global assessment, as this data is static.


Data Source

Growth Potential (Phi) data

Reference:
https://knb.ecoinformatics.org/view/doi:10.5063/F1CF9N69 Rebecca Gentry, Halley Froehlich, Dietmar Grimm, Peter Kareiva, Michael Parke, et al. SNAPP - Mapping the Global Potential for Marine Aquaculture. Knowledge Network for Biocomplexity. doi:10.5063/F1CF9N69.

Downloaded: 7/3/2019

Description: Growth Potential estimate raster for global cells


Methods

knitr::opts_chunk$set(eval=FALSE)
#######regression inputs from VBGF_Fish.r

# This is a heavily modified script from KNB:
# https://knb.ecoinformatics.org/view/doi:10.5063/F1CF9N69

# From this paper:
# https://www.nature.org/content/dam/tnc/nature/en/documents/Mapping_the_global_potential_for_marine_aquaculture.pdf


### libraries useful for data wrangling
library(dplyr)
library(tidyr)
library(tidyverse)

## libraries useful for spatial data
library(raster)       
library(rgdal)        
library(sf)         
library(fasterize)

## data visualization
library(RColorBrewer)
library(ggplot2)
library(rasterVis)    
library(maps)

## path management
library(here)


## some OHI files
source('http://ohi-science.org/ohiprep_v2019/workflow/R/common.R')
dir_git <- file.path(here(), "globalprep/mar/v2019")

Establish variables and coefficients

## Variables needed to get from PHI to tonnes of production
## MRF: assumes 1 farm is 1km2  ##
# (seems easier to just units of km2, rather than farm)

F_estCoef <- c(7.6792, (-5.8198)) #from regression estimated in VBGF_Fish_Final.r
B_estCoef <- c(2.9959,(-1.6659)) #from regression estimate in VBGF_Bivalves.r
density <- 20 #juveniles per m3
cagesize <- 9000 #m3
cagesperfarm <- 24 #located atleast 1 km apart.. cagesperkm2
bivperfarm <- 130000000 #(100 longlines/km2) * 4 km * (100 bivalves seeded/0.0003 km) = 133333333 bivalve/km2
weight35cm <- 554.8  ## in grams see VBGF_Fish_Final. Paper reports 548 grams 

Finfish Production

## Global tiff file of PHI (Growth Potential) estimates
FishPhiALLConstraints <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishPhiALLConstraints95LT2.tiff"))
plot(FishPhiALLConstraints)
FishPhiVector=getValues(FishPhiALLConstraints)


## Convert Phi raster to number of years it takes to grow a 35 cm fish

LogFishYears <- calc(FishPhiALLConstraints, fun=function(x){F_estCoef[1]+F_estCoef[2]*log(x)})
LogFishYears
plot(LogFishYears)

FishYears <- calc(LogFishYears, fun=function(x){exp(x)})

FishYears
plot(FishYears)
writeRaster(FishYears,file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishYearsbyCell.tif"), overwrite=TRUE)

FishYears <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/FishYearsbyCell.tif"))
FishYearsVector=getValues(FishYears)



#OHI 2018 regions (original analysis used older regions file)

## call spatial file from sourced file
regions_shape()
OHIcountries <- regions %>%
  filter(type_w_ant == "eez")
OHIcountries <- st_transform(OHIcountries, crs(FishPhiALLConstraints))
OHIcountries_raster <- fasterize(OHIcountries, FishPhiALLConstraints, field="rgn_id")  
CountryVector=getValues(OHIcountries_raster)




### area of each cell (each cell is different given lat/long coordinate reference system)
areaPerCell <- area(FishPhiALLConstraints, weights=FALSE, na.rm=TRUE)
areaPerCellVector <- getValues(areaPerCell)

### Make a dataframe with raster values that includes cells: Country, area, Phi, and Years to Harvest
productionDF <- data.frame(CellID = 1:933120000,
                           Country = CountryVector,
                           AreaKM2 = areaPerCellVector,
                           PhiPrime = FishPhiVector, 
                           YearsToHarvest = FishYearsVector)

head(productionDF)

summary(FishYearsVector)
summary(areaPerCellVector) ##they seem to match



## calculate production for each cell
productionDFFishCells <- productionDF %>%
  filter(!is.na(YearsToHarvest)) %>%
  mutate(F_yieldperfarmMT = (weight35cm * density * cagesize * cagesperfarm)/1000000) %>%  # MRF: units yieldperkm2? 554.8 grams * 20 juv/m3 * 9000 m3 * 24 cages/km2 = grams/km2
  mutate(F_yieldpercellperyear = (F_yieldperfarmMT/YearsToHarvest) * AreaKM2) %>%
  arrange(YearsToHarvest) %>%
  mutate(YieldCumSum = cumsum(F_yieldpercellperyear)) %>%
  mutate(AreaCumSum = cumsum(AreaKM2))

write.csv(productionDFFishCells, file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/productionDFFishCells.csv")) #save to mazu because so large. This is functionally a raster file. 

productionDFFishCells <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/productionDFFishCells.csv"))

head(productionDFFishCells)
summary(productionDFFishCells)
str(productionDFFishCells)
##cumsum area is 11,402,629 km2 -  # matches the paper 
##cumprod is 15,950,000,000MT -   # matches the paper 

### how many of these cells are not in a country?
sum(is.na(productionDFFishCells$Country))
dim(productionDFFishCells)

## MRF: with new OHI regions: 544,569 are not in a country..probably a lot are in conflicted areas


## Calculate production if 1% of top production area is used:
productionByCountryFishDF <- productionDFFishCells %>%
  filter(!is.na(Country)) %>%
  dplyr::select(-YieldCumSum, -AreaCumSum, -X) %>%
  arrange(YearsToHarvest) %>%
  mutate(ID = Country) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  mutate(CountryYieldCumSum = cumsum(F_yieldpercellperyear)) %>%
  mutate(CountryAreaCumSum = cumsum(AreaKM2)) %>%
  mutate(MaxProdPerCountry = max(CountryYieldCumSum)) %>% 
  mutate(OnePercentDevPerCountry = .01*max(CountryAreaCumSum)) #calculating 1 percent of area

write.csv(productionByCountryFishDF,file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/FishProdByCountryByCell.csv"), row.names = FALSE) #save to mazu because so large. This is functionally a raster file. 

productionByCountryFishDF <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/fish/FishProdByCountryByCell.csv"))

## MRF: For each area identify amount of area that corresponds to 1% of production area, 
## MRF: assume maximum production within the country for the 1% of area
CountryProdSummary <- productionByCountryFishDF %>%
  dplyr::arrange(YearsToHarvest) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  filter(CountryAreaCumSum <= OnePercentDevPerCountry) %>%
  mutate(ProdPerCountryOnePercent = max(CountryYieldCumSum)) %>%
  slice(1)


write.csv(CountryProdSummary, file.path(dir_git, "/int/FishProdByCountrySummary.csv"), row.names = FALSE) #save to github

CountryProdSummary <- read.csv(file.path(dir_git, "int/FishProdByCountrySummary.csv"))

# MRF: get fasted YearsToHarvest for each country
CountryProdSummaryNop <- productionByCountryFishDF %>%
  dplyr::arrange(YearsToHarvest) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  slice(1)



# Add country names

region_data()
CountryLabel <- rgns_eez %>%
  dplyr::select(ID = rgn_id, rgn_name)


## Final data
## I think the relevant value we want for the reference point is in this table:  ProdPerCountryOnePercent. Indonesia number for fish matches up with paper. > 24 million tonnes of fish if 1% of aquaculture potential developed. 
CountryProdSummaryFAO <- CountryProdSummary %>%
  ungroup %>%
  dplyr::select(ID:ProdPerCountryOnePercent) %>%
  full_join(CountryLabel, by= "ID")

write.csv(CountryProdSummaryFAO, file.path(dir_git, "/int/FishProdByCountrySummaryFAO.csv"), row.names = FALSE) #save to github

sum(CountryProdSummaryFAO$MaxProdPerCountry, na.rm = TRUE)
#15451023277 number matches paper. > 15 billion tonnes 

Bivalve Production

##Now for Bivalves 
BivalvePhiALLConstraints=raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalvePhiALLConstraints95LT1.tif"))

plot(BivalvePhiALLConstraints)
BivalvePhiVector <- getValues(BivalvePhiALLConstraints)


#OHI 2018 regions (original analysis used older regions file)

## call spatial file from sourced file
regions_shape()

OHIcountries <- regions %>%
  filter(type_w_ant == "eez")

OHIcountries <- st_transform(OHIcountries, crs(FishPhiALLConstraints))

OHIcountries_raster <- fasterize(OHIcountries, FishPhiALLConstraints, field="rgn_id")  

CountryVector <- getValues(OHIcountries_raster)

#make the value of each cell the years it takes to grow a 4 cm bivlave
LogBivalveYears <- calc(BivalvePhiALLConstraints, fun = function(x){B_estCoef[1] + B_estCoef[2]*(x)})
LogBivalveYears
plot(LogBivalveYears)

BivalveYears=calc(LogBivalveYears,fun=function(x){exp(x)})

BivalveYears
plot(BivalveYears)

writeRaster(BivalveYears,file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalveYearsByCell.tif"), overwrite=TRUE)

BivalveYears <- raster(file.path(dir_M, "git-annex/globalprep/mar/v2019/Spatial_Data/NewLayersWOHypoxia/BivalveYearsByCell.tif"))

BivalveYearsVector <- getValues(BivalveYears)


###now load in area values for each cell
#areaPerCell=raster("Spatial_Data/MiddleFiles/AreaBivalveLT1.grd")
areaPerCell <- area(BivalvePhiALLConstraints, weights=FALSE, na.rm=TRUE)
areaPerCell

areaPerCellVector=getValues(areaPerCell)


productionDFBiv <- data.frame(CellID=1:933120000,
                           Country=CountryVector,
                           AreaKM2=areaPerCellVector,
                           PhiPrime=BivalvePhiVector, 
                           YearsToHarvest=BivalveYearsVector)

head(productionDFBiv)

productionDFBivCells <- productionDFBiv %>%
  filter(!is.na(YearsToHarvest)) %>%
  mutate(B_yieldperfarmInd = bivperfarm) %>%
  mutate(B_yieldpercellperyear = (B_yieldperfarmInd / YearsToHarvest) * AreaKM2) %>%
  arrange(YearsToHarvest) %>%
  mutate(YieldCumSum = cumsum(B_yieldpercellperyear)) %>%
  mutate(AreaCumSum = cumsum(AreaKM2))

head(productionDFBivCells)
summary(productionDFBivCells)
str(productionDFBivCells)

write.csv(productionDFBivCells, file.path(dir_M, "/git-annex/globalprep/mar/v2019/int/productionDFBivCells.csv")) #save to mazu because so large. This is functionally a raster file. 

productionDFBivCells <- read.csv(file.path(dir_M, "/git-annex/globalprep/mar/v2019/int/bivalve/productionDFBivCells.csv"))

productionByCountryBivDF <- productionDFBivCells %>%
  filter(!is.na(Country)) %>%
  dplyr::select(-YieldCumSum, -X) %>%
  dplyr::select(-AreaCumSum) %>%
  arrange(YearsToHarvest) %>%
  mutate(ID = Country) %>%
  dplyr::arrange(ID) %>%
  group_by(ID) %>%
  mutate(CountryYieldCumSum = cumsum(B_yieldpercellperyear)) %>%
  mutate(CountryAreaCumSum = cumsum(AreaKM2)) %>%
  mutate(MaxPhi = max(PhiPrime)) %>%
  mutate(averagePhi = mean(PhiPrime)) %>%
  mutate(averageWeightedPhi = sum(PhiPrime*AreaKM2)/(max(CountryAreaCumSum))) %>%
  mutate(MaxDevPerCountry = max(CountryAreaCumSum)) %>%
  mutate(MaxProdPerCountry = max(CountryYieldCumSum)) %>%
  mutate(OnePercentDevPerCountry = .01*max(CountryAreaCumSum))

head(productionByCountryBivDF)

write.csv(productionByCountryBivDF,file.path(dir_M, "git-annex/globalprep/mar/v2019/int/bivalve/BivProdByCountryByCell.csv"), row.names = FALSE) #save to mazu because so large. This is functionally a raster file. 

productionByCountryBivDF <- read.csv(file.path(dir_M, "git-annex/globalprep/mar/v2019/int/bivalve/BivProdByCountryByCell.csv")) 


CountryProdSummary <- productionByCountryBivDF %>%
  filter(CountryAreaCumSum <= OnePercentDevPerCountry) %>%
  mutate(ProdPerCountryOnePercent = max(CountryYieldCumSum)) %>%
  slice(1)

write.csv(CountryProdSummary,file.path(dir_git, "int/BivalveProdByCountrySummary.csv"), row.names = FALSE) #save to github


region_data()
CountryLabel <- rgns_eez %>%
  dplyr::select(ID = rgn_id, rgn_name)

head(CountryLabel)

## I think the relevant value we want for the reference point is in this table:  ProdPerCountryOnePercent. Indonesia number for bivalves is higher than the paper, however the paper does use the phrase "over 3.9*10^11 tonnes". Data says 4.7 * 10^11 million tonnes of bivalves if 1% of aquaculture potential developed, which is greater than 3.9 * 10^11 tonnes.  
CountryProdSummaryFAO <- CountryProdSummary %>%
  ungroup %>%
  dplyr::select(ID:ProdPerCountryOnePercent) %>%
  full_join(CountryLabel, by = "ID")

write.csv(CountryProdSummaryFAO, file.path(dir_git, "int/BivalveProdByCountrySummaryFAO.csv"), row.names = FALSE) #save to github 

sum(CountryProdSummaryFAO$MaxDevPerCountry, na.rm = TRUE)
#1491404 km2. Matches paper ~1,500,000 km2

Unit Conversion and Gap Filling

To compare potential vs harvest, we need to convert bivalve units to metric tonnes, they are currently in units of individual bivalves. Some figures:

Averaging these gives about 27.5g per piece.

aq_mass_per_pc <- 0.0275 * 1e-3 ### mass of bivalve piece in tonnes

pot_b <- read_csv(file.path(dir_git, 'int/BivalveProdByCountrySummaryFAO.csv')) %>%
  mutate(potential_prod_one_percent_b = ProdPerCountryOnePercent * aq_mass_per_pc,
         potential_prod_max_b = MaxProdPerCountry * aq_mass_per_pc,
         aq_type = 'shellfish',
         units   = 'tonnes') %>%
  dplyr::select("rgn_id" = "ID", rgn_name, potential_prod_one_percent_b, potential_prod_max_b)

pot_f <- read_csv(file.path(dir_git, 'int/FishProdByCountrySummaryFAO.csv')) %>%
  mutate(aq_type = 'finfish',
         units   = 'tonnes') %>%
  dplyr::select("rgn_id" = "ID", rgn_name, "potential_prod_one_percent_f" = "ProdPerCountryOnePercent", "potential_prod_max_f" = "MaxProdPerCountry")

pot_aq_int <- full_join(pot_b, pot_f) ## makes sense. 1% of potential AREA... not 1% of potential PRODUCTION

write_csv(pot_aq, file.path(dir_git, 'int/aq_potential_int.csv')) ##intermediate file that might come in handy in the future

DT::datatable(pot_aq_int)


pot_aq_final <- pot_aq_int %>%
  mutate(potential_prod_one_percent_b = replace_na(potential_prod_one_percent_b, 0),
         potential_prod_one_percent_f = replace_na(potential_prod_one_percent_f, 0)) %>%
  mutate(potential_mar_tonnes = potential_prod_one_percent_b + potential_prod_one_percent_f) %>%
  arrange(rgn_id) %>%
  dplyr::select(rgn_id, potential_mar_tonnes) %>%
  mutate(year = 2019)

write_csv(pot_aq_final, file.path(dir_git, 'output/aq_potential_final.csv'))

DT::datatable(pot_aq_final)

## make gapfilling dataset 
pot_aq_final_gf <- pot_aq_final %>%
  mutate(gapfilled = case_when(
    potential_mar_tonnes == 0 ~ 1,
    potential_mar_tonnes > 0 ~ 0
  ), 
  method = case_when(
    potential_mar_tonnes == 0 ~ "missing regions given 0 value",
    potential_mar_tonnes > 0 ~ ""
  )) %>%
  dplyr::select(rgn_id, gapfilled, method, year)

write_csv(pot_aq_final_gf, file.path(dir_git, "output/aq_potential_gf.csv"))