1 Summary

This script takes prepped non-industrial v2022 SAUP data which goes up to 2019 (created in ao_catch_prep_saup.Rmd), and B/Bmsy estimates of those stocks to calculate a score for artisanal fishing species per OHI region. The FIS data prep will need to be completed prior to this data prep. Also make sure to run ao_catch_prep_saup before this script.

1.1 Updates from previous assessment

For v2023 and v2024 the Sea Around us Project (SAUP) data used in the ao_catch_prep_saup had not been updated, only the RAM data was updated for the FIS layers. Data files that were not newly updated and were stored in the repository were copied from the v2023 to v2024 folder for consistency.

2 Data Source

Reference: Pauly D., Zeller D., Palomares M.L.D. (Editors), 2020. Sea Around Us Concepts, Design and Data (seaaroundus.org).

Downloaded: September 27, 2022

Description: Tons per year and SAUP region with information on sector type, industry type, fishing entitity, reporting status and taxonomic information.

Time range: 1950 - 2019

Format: CSV

Additional Information: Methods

Reference: RAM Legacy Stock Assessment Database v4.65

Downloaded: 08/07/2024

Description: B/Bmsy value by stock and year (other data, which we do not use, are also available in the database)

Native data resolution: stock (fish stock, species and region specific)

Time range: 1800 - 2023 (we only use the year which matches our fisheries catch data (2019 for v2024))

Format: CSV format

Additional Information: We use the finalized b/bmsy layer from OHI-global for this data prep. We do not actually read in the raw RAM data here.

3 Methods

Steps: 1. Join the non-industrial catch data with the final B/Bmsy layer used in the FIS model 2. Convert the B/Bmsy values to scores (cap them at 1.. we wont penalize for underharvesting). 3. Take a catch weighted average of B/Bmsy scores for each region/year and gapfill those regions that are missing.

knitr::opts_chunk$set(message = FALSE, warning = FALSE, eval=FALSE)

library(tidyverse)
library(here)
library(htmlwidgets)
library(patchwork)
library(ggplot2)
source(here("workflow", "R", "common.R"))

3.1 Join with the final B/Bmsy layer from the fis model, convert the B/Bmsy values to scores (this is done in functions.R for FIS subgoal, but we do it in the dataprep for AO), and take a catch weighted average of B/Bmsy scores for each region/year.

#update these!! 
version_year <- "v2024"
previous_version_year <- "v2023"

fis_bbmsy <- read_csv(here::here("globalprep","fis", version_year, "output","fis_bbmsy.csv"))


catch_AO <- read_csv(here::here("globalprep","ao", version_year, "intermediate","mean_catch.csv")) 

length(unique(catch_AO$rgn_id)) # there are only 196 regions here... we want there to be 220 regions. Which regions are missing? 

region_data() # to have access to rgns_all and rgns_eez

test <- catch_AO %>%
  left_join(rgns_eez) 

length(setdiff(rgns_eez$rgn_id, test$rgn_id)) # v2024: 25 regions are within the rgns_eez but not in test

cat(paste(shQuote(setdiff(rgns_eez$rgn_name, test$rgn_name), type = "cmd"), collapse = ","))
cat(paste(shQuote(setdiff(rgns_eez$rgn_id, test$rgn_id), type = "cmd"), collapse = ","))

# v2024: "Macquarie Island","Wake Island","Glorioso Islands","Juan de Nova Island","Bassas da India","Ile Europa","Ile Tromelin","British Indian Ocean Territory","Gibraltar","South Georgia and the South Sandwich Islands","Prince Edward Islands","Crozet Islands","Amsterdam Island and Saint Paul Island","Kerguelen Islands","Heard and McDonald Islands","Bouvet Island","Clipperton Island","Jan Mayen","Jarvis Island","Palmyra Atoll","Howland Island and Baker Island","Johnston Atoll","Monaco","Antarctica","Oecussi Ambeno"  
# "4","12","30","33","34","35","36","38","60","89","90","91","92","93","94","105","107","144","149","150","158","159","185","213","237"

## All small islands.. weird. So we need to gapfill these regions somehow... do they have b/bmsy data?

missing <- c("Macquarie Island","Wake Island","Glorioso Islands","Juan de Nova Island","Bassas da India","Ile Europa","Ile Tromelin","British Indian Ocean Territory","Gibraltar","South Georgia and the South Sandwich Islands","Prince Edward Islands","Crozet Islands","Amsterdam Island and Saint Paul Island","Kerguelen Islands","Heard and McDonald Islands","Bouvet Island","Clipperton Island","Jan Mayen","Jarvis Island","Palmyra Atoll","Howland Island and Baker Island","Johnston Atoll","Monaco","Antarctica","Oecussi Ambeno")

missing_id <- as.numeric(c("4","12","30","33","34","35","36","38","60","89","90","91","92","93","94","105","107","144","149","150","158","159","185","213","237"))

test <- fis_bbmsy %>%
  left_join(rgns_eez) %>%
  filter(rgn_name %in% missing)

setdiff(missing, unique(test$rgn_name)) # bouvet island and Antarctica (scores arent calculated for this, that is okay) is missing?
## they do have b/bmsy data! Lets just use their overall b/bmsy scores (for industrial fishing), as their AO b/bsmy scores.. not perfect, but better than nothing!

bouvet_test <- fis_bbmsy %>%
  filter(rgn_id == 105)

bouvet_test <- read.csv("https://raw.githubusercontent.com/OHI-Science/ohi-global/draft/eez/scores.csv") %>%
  filter(region_id == 105,
         goal == "FIS") # bouvet has fisheries scores... so lets use that for their AO score

## First cap b/bmsy scores
b <- fis_bbmsy %>%
  dplyr::mutate(bbmsy = ifelse(bbmsy > 1, 1, bbmsy))

c <- catch_AO %>%
  dplyr::mutate(stock_id_taxonkey = as.character(stock_id_taxonkey)) %>%
  dplyr::mutate(taxon_key = stringr::str_sub(stock_id_taxonkey,-6,-1)) %>%
  dplyr::mutate(stock_id = substr(stock_id_taxonkey, 1, nchar(stock_id_taxonkey) -
                                    7)) %>%
  dplyr::mutate(catch = as.numeric(mean_catch)) %>%
  dplyr::mutate(year = as.numeric(as.character(year))) %>%
  dplyr::mutate(rgn_id = as.numeric(as.character(rgn_id))) %>%
  dplyr::mutate(taxon_key = as.numeric(as.character(taxon_key))) %>%
  dplyr::select(rgn_id, year, stock_id, taxon_key, mean_catch)


## read in fisheries mean catch so we can use to gapfill missing regions
fis_mean_catch <- read_csv(here::here("globalprep","fis", version_year, "int","mean_catch.csv")) %>% 
  dplyr::mutate(stock_id_taxonkey = as.character(stock_id_taxonkey)) %>%
  dplyr::mutate(taxon_key = sub('.*_', '', stock_id_taxonkey)) %>%
  dplyr::mutate(stock_id = sub('_[^_]*$', '', stock_id_taxonkey)) %>%
  dplyr::mutate(catch = as.numeric(mean_catch)) %>%
  dplyr::mutate(year = as.numeric(as.character(year))) %>%
  dplyr::mutate(rgn_id = as.numeric(as.character(rgn_id))) %>%
  dplyr::mutate(taxon_key = as.numeric(as.character(taxon_key))) %>%
  dplyr::select(rgn_id, year, stock_id, taxon_key, catch)

test <- fis_mean_catch %>%
  filter(rgn_id == 105) # yay!

b <- b %>%
  dplyr::mutate(bbmsy = as.numeric(bbmsy)) %>%
  dplyr::mutate(rgn_id = as.numeric(as.character(rgn_id))) %>%
  dplyr::mutate(year = as.numeric(as.character(year))) %>%
  dplyr::mutate(stock_id = as.character(stock_id)) # fix some classes

#  ===== Merge the b/bmsy data with catch data ====

data_fis <- c %>%
  dplyr::left_join(b, by = c('rgn_id', 'stock_id', 'year')) %>%
  dplyr::select(rgn_id, stock_id, year, taxon_key, mean_catch, bbmsy)


gapfill_missing <- b %>%
  filter(rgn_id %in% missing_id) %>% 
  left_join(fis_mean_catch, by = c("rgn_id", "stock_id", "year")) %>%
  dplyr::select(rgn_id, stock_id, year, taxon_key, mean_catch = catch, bbmsy) 

fix_bouvet <- fis_mean_catch %>%
  filter(rgn_id == 105) %>%
  dplyr::select(rgn_id, stock_id, year, taxon_key, mean_catch = catch) %>%
  mutate(bbmsy = NA)

data_fis_final <- rbind(data_fis, gapfill_missing, fix_bouvet)

length(unique(data_fis_final$rgn_id)) # 220 regions ; perfect

# ==== Estimate scores for taxa without b/bmsy values ====
# Mean score of other fish in the region is the starting point
# Then a penalty is applied based on the level the taxa are reported at

## this takes the mean score within each region and year
data_fis_gf <- data_fis_final %>%
  dplyr::group_by(rgn_id, year) %>%
  dplyr::mutate(mean_score = mean(bbmsy, na.rm = TRUE)) %>%
  dplyr::ungroup()

## this takes the mean score of the bbmsy across all regions within a year and replaces any regions that have no mean_score with the global mean score by year
# (when no stocks have scores within a region)
data_fis_gf <- data_fis_gf %>%
  dplyr::group_by(year) %>%
  dplyr::mutate(mean_score_global = mean(bbmsy, na.rm = TRUE)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(mean_score = ifelse(is.na(mean_score), mean_score_global, mean_score)) %>%
  dplyr::select(-mean_score_global)


data_fis_gf <- data_fis_gf %>%
  dplyr::mutate(TaxonPenaltyCode = as.numeric(substring(taxon_key, 1, 1))) %>%
  dplyr::mutate(score_gf = mean_score) %>%
  dplyr::mutate(method = ifelse(is.na(bbmsy), "Mean gapfilled", NA)) %>%
  dplyr::mutate(gapfilled = ifelse(is.na(bbmsy), 1, 0)) %>%
  dplyr::mutate(score = ifelse(is.na(bbmsy), score_gf, bbmsy)) %>%
  dplyr::mutate(method = ifelse(rgn_id %in% missing_id, "Used fisheries subgoal b/bmsy and catch", method)) %>%
  dplyr::mutate(gapfilled = ifelse(rgn_id %in% missing_id, 1, gapfilled))

test <- data_fis_gf %>%
  filter(rgn_id == 105)
# filter(method == "Used fisheries subgoal b/bmsy and catch") # perfect

# select only the columns we want
gap_fill_data <- data_fis_gf %>%
  dplyr::select(rgn_id,
                stock_id,
                taxon_key,
                year,
                mean_catch,
                score,
                gapfilled,
                method) 

write_csv(gap_fill_data, here::here("globalprep","ao", version_year, "output","AO_bbmsy_summary_gf.csv"))

# ===== Calculate status for each region =====

## Take a catch weighted average of B/Bmsy scores for each region/year.

score_data <- data_fis_gf %>%
  dplyr::select(rgn_id, stock_id, year, mean_catch, score) %>% 
  dplyr::group_by(rgn_id, year) %>%
  dplyr::summarize(score = weighted.mean(score, mean_catch)) %>% # use the mean_catch to get a weighted average of the scores
  dplyr::ungroup() 

summary(score_data) # v2024: no NAs, yay!

length(unique(score_data$rgn_id)) # 220 - perfect! 

test <- score_data %>%
  filter(rgn_id %in% missing_id) # perfect! 

write_csv(score_data, here::here("globalprep","ao", version_year, "output","ao_nind_scores.csv"))

3.2 Data check

region_data()

stk <- read_csv(here::here("globalprep","ao",version_year, "output","ao_nind_scores.csv")) %>%
  left_join(rgns_eez)

need <- read_csv(here::here("globalprep","ao", version_year, "output","wb_gdppcppp_rescaled_gf.csv")) %>%
  left_join(rgns_eez)

setdiff(need$rgn_name, stk$rgn_name) # v2024: character(0); perfect! 

#update these accordingly!! (v2024 still was using the most recent year of SAUP data, which was 2019)
latest_data_year <- 2019
previous_assessment_latest_data_year <- 2019

#look at the difference between this year and last
new <- read_csv(here::here("globalprep","ao",version_year, "output","ao_nind_scores.csv")) %>%
  left_join(rgns_eez)

old <- read_csv(here::here("globalprep","ao",previous_version_year, "output","ao_nind_scores.csv")) %>% 
  select(rgn_id, year, old_score = score)

compare <- new %>% 
  left_join(old, by = c("year", "rgn_id")) %>% 
  filter(year == previous_assessment_latest_data_year) # filter the data to only 2019

library(plotly)
library(ggplot2)

compare_plot <- ggplot(data = compare) + geom_point(aes(x = old_score, y = score, text = rgn_id, alpha = 0.7))  +
  geom_abline(color = "blue") + 
  labs(x = paste("score", previous_assessment_latest_data_year, previous_version_year),
       y = paste("score", previous_assessment_latest_data_year, version_year), title = paste(previous_assessment_latest_data_year, "data comparison")) +
  theme_minimal()
compare_plotly <- ggplotly(compare_plot, tooltip = c("text", "score", "old_score"))
compare_plotly

htmlwidgets::saveWidget(compare_plotly, here::here("globalprep","ao", version_year, "figs","v2023_v2024_compare_plot.html"))

#look at the difference between this year and last
new <- read_csv(here::here("globalprep","ao",version_year, "output","ao_nind_scores.csv")) %>%
  left_join(rgns_eez)

old <- read_csv(here::here("globalprep","ao",previous_version_year, "output","ao_nind_scores.csv")) %>% 
  select(rgn_id, year, old_score = score)

# ==== Percent change between this year and last year =====
compare_diff <- new %>% 
  dplyr::left_join(old, by = c("year", "rgn_id")) %>% 
  filter(year == previous_assessment_latest_data_year) %>%  # filter the data to only 2019
  mutate(diff = ((score - old_score)/old_score)*100) # using percent change equation

# interactive plotly of the percent change in ao_sust scores by region between v2023 and v2024 in 2019
diff_plot <- compare_diff %>%
  plot_ly(x = ~rgn_id, y = ~diff, 
          type = "scatter", mode = "lines") %>%
  layout(title = "AO Sustainability Score Percent Change by Region",
         xaxis = list(title = "Region ID"),
         yaxis = list(title = "Percent change v2023 vs v2024"))
diff_plot
# saveWidget(diff_plot, here::here("globalprep","ao", version_year, "figs","ao_sust_percent_change.html"))

# looking more into region 3, which had a large percent change in 2019
compare_diff_rgn_3 <- new %>% 
  left_join(old, by = c("year", "rgn_id")) %>%
  filter(rgn_id %in% "3") %>% 
  mutate(diff = ((score - old_score)/old_score)*100)

# plot of region 3's percent change over time 
rgn_3_plot <- compare_diff_rgn_3 %>%
  plot_ly(x = ~year, y = ~diff, 
          type = "scatter", mode = "lines") %>%
  layout(title = "AO Sustainability Score Percent Change for Norfolk Island",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Percent change v2023 vs v2024"))
rgn_3_plot
# saveWidget(rgn_3_plot, here::here("globalprep","ao", version_year, "figs","rgn_3_percent_change.html"))

# plot of region 3's v2024 ao_sust score over time
rgn_3_scores_plot <- compare_diff_rgn_3 %>%
  plot_ly(x = ~year, y = ~score, 
          type = "scatter", mode = "lines") %>%
  layout(title = "AO Sustainability Scores v2024 for Norfolk Island",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Scores over time v2024"))
rgn_3_scores_plot

# plot of region 3's v2023 ao_sust score over time
rgn_3_oldscores_plot <- compare_diff_rgn_3 %>%
  plot_ly(x = ~year, y = ~old_score, 
          type = "scatter", mode = "lines") %>%
  layout(title = "AO Sustainability Scores for Norfolk Island",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Scores over time v2023"))
rgn_3_oldscores_plot

# patched interactive plot of v2023 and v2024 ao_sust scores
patch_rgn_3 <- subplot(rgn_3_scores_plot, rgn_3_oldscores_plot, nrows = 2, shareX = TRUE, titleX = TRUE, shareY = FALSE, titleY = TRUE)
patch_rgn_3

# saveWidget(patch_rgn_3, here::here("globalprep","ao", version_year, "figs","rgn_3_score_comparison.html"))

region_data() # to have access to rgns_all and rgns_eez

fis_bbmsy <- read_csv(here::here("globalprep","fis", version_year, "output","fis_bbmsy.csv")) # v2024

test_rgn3 <- fis_bbmsy %>%
  left_join(rgns_eez) %>% 
  filter(rgn_id %in% "3")

length(unique(test_rgn3$stock_id)) # 12
length(test_rgn3$stock_id) # 228

fis_bbmsy_prev <- read_csv(here::here("globalprep","fis", previous_version_year, "output","fis_bbmsy.csv")) # v2023

test_rgn3_prev <- fis_bbmsy_prev %>%
  left_join(rgns_eez) %>% 
  filter(rgn_id %in% "3") %>% 
  rename(old_bbmsy = bbmsy)

length(unique(test_rgn3_prev$stock_id)) # 12
length(test_rgn3_prev$stock_id) # 228


# ======= catch =======
catch_AO <- read_csv(here::here("globalprep","ao", version_year, "intermediate","mean_catch.csv"))

catch_test_rgn3 <- catch_AO %>%
  left_join(rgns_eez) %>% 
  filter(rgn_id %in% "3")

length(unique(catch_test_rgn3$stock_id_taxonkey)) # 8
length(catch_test_rgn3$stock_id_taxonkey) # 152 observations

catch_AO_prev <- read_csv(here::here("globalprep","ao", previous_version_year, "intermediate","mean_catch.csv")) # v2023

catch_test_rgn3_prev <- catch_AO_prev %>%
  left_join(rgns_eez) %>% 
  filter(rgn_id %in% "3")

length(unique(catch_test_rgn3_prev$stock_id_taxonkey)) # 8
length(catch_test_rgn3_prev$stock_id_taxonkey) # 152

# bbmsy
compare_bbmsy <- test_rgn3 %>% 
  left_join(test_rgn3_prev, by = c("year", "rgn_id", "stock_id")) %>% 
  filter(year %in% 2019) %>%
  mutate(diff = as.numeric(((bbmsy - old_bbmsy)/old_bbmsy)*100))

ggplot(data = compare_bbmsy, aes(x = stock_id, y = diff)) +
  geom_point(color = "darkblue",alpha = 0.7) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)
  ) +
  labs(
    title = "Percent Change in B/Bmsy Estimates v2024 vs v2023",
    x = "Stock ID",
    y = "Percent change",
    caption = "% change: ((new - old) / old) * 100"
  ) +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10))

# it looks like Pagrus_auratus-81 has the greatest percent change, ~140%.  Kajikia_audax-81 has 50% change.

# lets look and see if Pagrus_auratus-81 is in any other region.
test_pag_aur <- fis_bbmsy %>%
  left_join(rgns_eez) %>% 
  filter(stock_id %in% "Pagrus_auratus-81")
unique(test_pag_aur$rgn_id) 
# [1]   3  16 162, however region 16 and 162 did not get very impacted by Pagrus_auratus-81: % change for 16 is ~0, % change for 162 is ~1.4

# lets look and see if Kajikia_audax-91 is in any other region.
test_kaj <- fis_bbmsy %>%
  left_join(rgns_eez) %>% 
  filter(stock_id %in% "Kajikia_audax-81")
unique(test_kaj$rgn_id) 
# [1]   3   5  16  18 146 147 153 155 162
# of this, 162 has the highest other percent change, at ~1.4

v2024 data check notes:

Looking further into Norfolk Island, it looks like the main change must have come from the changes in bbmsy score for multiple stocks found in the region due to v4.65 RAM data in v2024 vs v4.61 RAM data in v2023.

It looks like Pagrus_auratus-81 has the greatest percent change in 2019, ~140%. Kajikia_audax-81 has ~50% change between v2023 and v2024’s B/Bmsy estimates.

What is weird is that Pagrus_auratus-81 was also present in regions 16 and 162, but those did not have a high % change from v2023 to v2024. My intuition is that Norfolk Island may have a greater mean catch of Pagrus_auratus (Australasian snapper), so it was weighted more than regions 16 (Australia) and 162 (New Zealand) when calculating the ao_sust score.

Since Norfolk Island is rather small, it is possible that biomass for the Pagrus_auratus-81 population was under-estimated, and newer data shows that the stock actually has more biomass than previously thought. Therefore, the B/Bmsy is higher and the overall stock has more biomass than needed for maximum sustainable yield.

OHI 2024 - Artisanal Opportunities: Preparing stock status data