Summary

This script prepares the RAM B/Bmsy data: 1. Relevant data are collected from the RAM database 2. Missing years are gapfilled when appropriate 3. RAM and SAUP species names are harmonized in a few cases 4. RAM stocks are associated with the corresponding OHI and FAO regions

Updates from previous assessment

Using new SAUP catch data

Data

B/Bmsy values from stock assessments

Reference: RAM Legacy Stock Assessment Database v4.495

Downloaded: 06/03/2021
Description: B/Bmsy value by stock and year (other data, which we do not use, are also available in the database)
Native data resolution: stock (fish stock, species and region specific)
Time range: 1950 - 2024 (we only use the year which matches our fisheries catch data (2018 for v2021))
Format: R data files (.rds)
DOI: http://doi.org/10.5281/zenodo.4824192

Stock range data

Reference: Christopher M. Free. 2017. Mapping fish stock boundaries for the original Ram Myers stock-recruit database. https://marine.rutgers.edu/~cfree/mapping-fish-stock-boundaries-for-the-original-ram-myers-stock-recruit-database/. downloaded 9/25/2017.

Downloaded: 08/20/2018
Description: Shapefiles for each stock describing their distribution
Native data resolution: Spatial shapefiles
Format: Shapefiles

Setup

knitr::opts_chunk$set(fig.width = 6, fig.height = 4, fig.path = 'figs/',message = FALSE, warning = FALSE, echo = TRUE, eval=FALSE)

## Libraries
library(dplyr)
library(tidyr)
library(readr)
library(sf)
library(ggplot2)
library(here) 

## highlight out when knitting
setwd(here::here("globalprep/fis/v2021"))
source('../../../workflow/R/common.R')

Obtain RAM B/Bmsy data

The data is stored as a relational database in an R object. Check that the names of each element have not changed from last year! Update as appropriate in the below list.

The following tables are included (for full list, see loadDBdata.r in mazu):

timeseries
The time series data is a data frame containing all assessments conducted per stock with the following headers/columns:

assessid (2) stockid (3) stocklong (4) tsid (5) tsyear (6) tsvalue

bioparams
The time series data is a data frame with parameter values for all stocks and assessments. It has the following headers/columns:

assessid (2) stockid (3) stocklong (4) bioid (5) biovalue (6) bioyear (7) bionotes

timeseries_values_views
This stores the timeseries values, using the most recent assessment available, with timeseries type. The dataframe contains the following headers/columns: stockid, stocklong, year, TBbest, TCbest, ERbest, BdivBmsypref, UdivUmsypref, BdivBmgtpref, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmgt, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows.
timeseries_units_views
This stores the timeseries units (or time series source for touse time series), with timeseries type. The dataframe contains the following headers/columns: stockid, stocklong, TBbest, TCbest, ERbest, BdivBmgtpref, UdivUmsypref, BdivBmgtpret, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmsy, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows
timeseries_id_views
This stores the timeseries ids with timeseries id along the columns. The dataframe contains the following headers/columns: stockid, stocklong, TBbest, TCbest, ERbest, BdivBmsypref, UdivUmsypref, BdivBmgtpref, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmgt, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows.
bioparams_values_views
This stores the bioparams values, with bioparam type along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
bioparams_units_views
This stores the bioparams units, with bioparam type along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
bioparams_ids_views
This stores the bioparams ids, with bioparam id along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
metadata
This stores assorted metadata associated with the stock, with datatypes along the columns (assessid, stockid, stocklong, assessyear, scientificname, commonname, areaname, managementauthority, assessorfull, region, FisheryType, taxGroup, primary_FAOarea, primary_country) and stock by row.
tsmetrics Contains metadata, with columns tscategory, tsshort, tslong, tsunitsshort, tsunitslong, tsunique.

For this data prep we primarily use and consult timeseries_values_views, tsmetrics, and metadata

load(file.path(dir_M, "git-annex/globalprep/_raw_data/RAM/d2021/RAMLDB v4.495/R Data/DBdata[asmt][v4.495].RData"))

ram_bmsy_new <- timeseries_values_views %>%
  dplyr::select(stockid, stocklong, year, TBdivTBmsy, SSBdivSSBmsy, TBdivTBmgt, SSBdivSSBmgt) %>%
  mutate(ram_bmsy = 
           ifelse(!is.na(TBdivTBmsy), TBdivTBmsy, SSBdivSSBmsy)) %>%
  mutate(ram_bmsy =
           ifelse(is.na(TBdivTBmsy) & is.na(SSBdivSSBmsy), TBdivTBmgt, ram_bmsy)) %>%
  mutate(ram_bmsy = 
           ifelse(is.na(TBdivTBmsy) & is.na(SSBdivSSBmsy) & is.na(TBdivTBmgt), SSBdivSSBmgt, ram_bmsy)) %>%
  dplyr::filter(year > 1979) %>%
  filter(!is.na(ram_bmsy)) %>%
  dplyr::select(stockid, stocklong, year, ram_bmsy)

Gapfill RAM data when there are missing years

For each stock: - Missing years are gapfilled using a linear regression model that includes data from 2001 to 2018 (2018 is the final year of SAUP data). To be included in the gapfilling, there have to be 5 or more years of B/Bmsy data occuring over the last 11 years of data, from 2005 to 2015. - We convert any predicted RAM B/Bmsy value less than the minimum observed B/Bmsy value to that the minimum observed value, as there are some negative predicted values.

Summary: - There are 432 RAM stocks with at least 5 years of B/Bmsy data from 2005 to 2015. - 289 of these stocks have at least 1 year of gapfilled data.
- A few of the predicted B/Bmsy values go below zero. We convert anything with a RAM B/Bmsy value < 0.000001475954 to 0.000001475954, which is the minimum observed B/Bmsy value in the data.

## gap fill ram_bmsy
## based on this it seems reasonable to gap-fill missing values

ram_gf_check <- ram_bmsy_new %>%
   filter(year >= 2001) %>%
  spread(year, ram_bmsy) 

# identify stocks for gapfilling (those with 5 or more years of data since 2005).
# NOTE: we potentially gapfill to 2001, but we want stocks with adequate *recent* data 
ram_bmsy_gf <- ram_bmsy_new %>%
  filter(year >= 2001 & year <= 2018) %>%   # 2017 corresponds to the final year of SAUP catch data
  group_by(stockid) %>%
  mutate(years_data_2005_now = length(ram_bmsy[year >= 2005])) %>%
  mutate(years_data_2001_now = length(ram_bmsy[year >= 2001])) %>%
  ungroup() %>%
  filter(years_data_2005_now >= 5)


## Get rows for stocks/years with no B/Bmsy (identified as NA B/Bmsy value for now)
ram_bmsy_gf <- ram_bmsy_gf %>%
  spread(year, ram_bmsy) %>% 
  gather("year", "ram_bmsy", -stockid, -years_data_2005_now, -years_data_2001_now, - stocklong) %>%
  mutate(year = as.numeric(year)) 


## gapfilling record keeping
ram_bmsy_gf <- ram_bmsy_gf %>%   
  mutate(gapfilled = NA) %>%
  mutate(gapfilled = ifelse(years_data_2001_now == 18, gapfilled, 
                            paste(18 - years_data_2001_now, "years gf", sep = " ")))

## see unique values of stocks
tmp <- ram_bmsy_gf %>%
  dplyr::select(stockid, gapfilled) %>%
  unique()


## check out gapfilling stats
length(tmp$gapfilled)  # 397 stocks with at least 5 years of data in past 11 years - v2019
                      # 398 stocks with at least 5 years of data in past 11 years - v2020
                      # 432 stocks with at least 5 years of data in past 11 years - v2021
sum(table(tmp$gapfilled))  # 222 stocks have at least one year of B/Bmsy values gapfilled - v2019
                          # 314 stocks have at least one year of B/Bmsy values gapfilled; this is because there is an additional year of data in RAM... 2016 - v2020
# 289 stocks have at least one year of B/Bmsy values gapfilled; I assume this loss is because version 4.495 is more refined than the previous version - v2021


## regression model for prediction for each stock
ram_bmsy_gf <- ram_bmsy_gf %>%
  group_by(stockid) %>%
  do({
    mod <- lm(ram_bmsy ~ year, data=.)  
    ram_bmsy_predict <- predict(mod, newdata=.[c('year')])
    data.frame(., ram_bmsy_predict)
  }) %>%
  ungroup()

summary(ram_bmsy_gf) #1235 NAs for ram_bmsy
sum(ram_bmsy_gf$ram_bmsy_predict < 0 )  # 50 of the predicted B/Bmsy values go below zero.  
min(ram_bmsy_gf$ram_bmsy, na.rm = TRUE) #0.000001475954

## We convert anything with a RAM BBmsy value < 0.000001475954 to 0.000001475954, which is the minimum observed B/Bmsy value in the data; add method documentation
ram_bmsy_gf <- ram_bmsy_gf %>%
  mutate(ram_bmsy_predict = ifelse(ram_bmsy_predict < 0.000001475954, 0.000001475954, ram_bmsy_predict)) 

## gapfilling record keeping
ram_bmsy_gf_final <- ram_bmsy_gf %>%
  mutate(method = ifelse(is.na(ram_bmsy), paste0("lm, ", gapfilled), NA)) %>%
  mutate(gapfilled = ifelse(is.na(ram_bmsy), "1", "0")) %>%
  mutate(ram_bmsy = ifelse(is.na(ram_bmsy), ram_bmsy_predict, ram_bmsy)) %>%
  dplyr::select(stockid, year, ram_bmsy, gapfilled, method) 

write.csv(ram_bmsy_gf_final, "int/ram_stock_bmsy_gf.csv", row.names=FALSE)

Get a general idea of how well the model predicts missing data based on observed and model predicted values. This model appears to do fairly well.

plot(ram_bmsy_gf$ram_bmsy, ram_bmsy_gf$ram_bmsy_predict)
abline(0,1, col="red")

plot(log(ram_bmsy_gf$ram_bmsy), log(ram_bmsy_gf$ram_bmsy_predict))
abline(0,1, col="red")

mod <- lm(ram_bmsy ~ ram_bmsy_predict, data=ram_bmsy_gf)
summary(mod)

Identify FAO and OHI regions for RAM stocks

Identify the FAO/OHI regions where each RAM stock is located (fao and ohi regions are assigned to RAM Data in STEP4b_fao_ohi_rgns.Rmd. Run fao_ohi_rgns.Rmd now.

If there are many differences between RAM spatial file and RAM metadata, check the fao_ohi_rgns.Rmd prep again.

## Read in RAM spatial stocks file
ram_spatial <- read.csv("int/RAM_fao_ohi_rgns_final.csv", stringsAsFactors = FALSE)

ram_meta <- metadata %>% 
  dplyr::select(stockid, stocklong, scientificname)

setdiff(ram_spatial$stockid, ram_meta$stockid) # make sure all the spatial data has corresponding metadata (should be 0). It is not 0, probably because these are ones that have been removed from the RAM database since the 2017 assessment... delete these from the data frame below. 

# join with metadata to get scientific name
ram_spatial <- ram_spatial %>%
  dplyr::select(-stocklong) %>%
  left_join(ram_meta, by = c("stockid")) %>%
  rename(RAM_species = scientificname) %>%
  filter(!is.na(stocklong)) ## filtering out ones that didn't match above

setdiff(ram_spatial$stockid, ram_meta$stockid) # now it is 0

Standardize species names

In most cases, the RAM and SAUP data use the same species names, but there are a few exceptions. The following code identifies species in the RAM data that are not in the SAUP data. In these cases, different species names may be used (although not necessarily because some of the species may be present in RAM, but not SAUP, for other reasons). For these species, I used fishbase to explore synonyms and create a table to harmonize the RAM species names with the SAUP species names (saved as: int/RAM_species_to_SAUP.csv).

ram_bmsy_gf_final <- read_csv(file.path("int/ram_stock_bmsy_gf.csv"))

# get list of RAM species, scientific name
ram_sp <- ram_bmsy_gf_final %>%
  left_join(data.frame(metadata), by = "stockid") %>%
  dplyr::select(scientificname) %>%
  unique() %>%
  arrange(scientificname)


# SAUP species, sci name (read in the datatable that includes TaxonKey)
SAUP_sp <- read_csv(file.path(dir_M,'git-annex/globalprep/fis/v2021/int/stock_catch_by_rgn_taxa.csv')) %>% 
  dplyr::rename(SAUP_scientificname = TaxonName) %>%
  dplyr::select(SAUP_scientificname) %>%
  unique() %>%
  arrange(SAUP_scientificname)


# compare names - what's in RAM that's not in SAUP
tmp <- data.frame(scientificname = sort(setdiff(ram_sp$scientificname, SAUP_sp$SAUP_scientificname))) # 47 species names

# compare names - what's in SAUP that's not in RAM
tmp2 <- data.frame(scientificname = sort(setdiff(SAUP_sp$SAUP_scientificname, ram_sp$scientificname))) # 2070 species names.. unfortunately a lot.

write.csv(tmp, "int/unmatched_RAM_species.csv", row.names=FALSE)
write.csv(tmp2, "int/SAUP_species_no_RAM.csv", row.names=FALSE)

#setdiff(tmp, ram_name_corr$RAM_species)


## join ram spatial with RAM species on scientific name. We can use this to help check whether questionable species names across the ram and SAUP data match by region and fao id...
ram_sp_fao_ohi <- tmp %>%
  left_join(ram_spatial, by = c("scientificname" = "RAM_species")) %>%
  unique()

write_csv(ram_sp_fao_ohi, "int/new_ram_sp.csv")  

## get SAUP fao_ohi regions
SAUP_sp_fao_ohi <- read_csv(file.path(dir_M,'git-annex/globalprep/fis/v2021/int/stock_catch_by_rgn_taxa.csv')) %>% 
  dplyr::rename(SAUP_scientificname = TaxonName) %>%
   dplyr::filter(year > 1979) # %>%
  #filter(str_detect(SAUP_scientificname, "Lepidorhombus"))

RAM_species_to_SAUP <- read.csv("int/RAM_species_to_SAUP.csv", stringsAsFactors = FALSE)

# Then I hand-looked up each of the missing ones from "unmatched_RAM_species.csv", and added those new ones to "RAM_species_to_SAUP.csv" to generate this list - most still unmatched. See "RAM_species_to_SAUP_notes.csv" for our reasoning behind these changes. 

setdiff(tmp$scientificname, RAM_species_to_SAUP$SAUP_species) ## these are the new species to add to "RAM_species_to_SAUP.csv".

ram_name_corr <- read.csv("int/RAM_species_to_SAUP.csv", stringsAsFactors = FALSE) %>%
   filter(!is.na(SAUP_species))  # SAUP to RAM name conversion

ram_name_corr # matched species, unfortunately only 12

Final formatting

Harmonize names between RAM and SAUP data.

# correct names in a few cases to match with SAUP names
ram_name_corr <- read.csv("int/RAM_species_to_SAUP.csv", stringsAsFactors = FALSE) %>%
  filter(!is.na(SAUP_species)) # SAUP to RAM name conversion

ram_spatial <- ram_spatial %>%
  left_join(ram_name_corr, by="RAM_species") %>%
  dplyr::mutate(species = ifelse(!is.na(SAUP_species), SAUP_species, RAM_species)) %>%
  dplyr::select(rgn_id, fao_id, stockid, stocklong, species, RAM_area_m2)

length(unique(ram_spatial$stockid)) # 487 RAM stocks with B/Bmsy data - v2021
length(unique(ram_spatial$species)) #237

Re-name stockid column to stockid_ram and create new column stockid that matches with the stockid column in the CMSY data table prepared in STEP3_calculate_bbmsy.Rmd.

## Combine RAM spatial data with B/Bmsy data
ram_bmsy_gf <- read.csv("int/ram_stock_bmsy_gf.csv")


# check every stock has a location:
setdiff(ram_bmsy_gf$stockid, ram_spatial$stockid) # should be 0: every ram stock should have ohi/fao rgn
setdiff(ram_spatial$stockid, ram_bmsy_gf$stockid) # these are stocks that were dropped due to insufficient years of data

ram_data <- ram_bmsy_gf %>% 
  left_join(ram_spatial, by="stockid") %>%
  rename(stockid_ram = stockid) %>% 
  dplyr::mutate(stockid = paste(species, fao_id, sep="-")) %>%
  dplyr::mutate(stockid = gsub(" ", "_", stockid)) %>%
  dplyr::select(rgn_id, stockid, stockid_ram, stocklong, year, RAM_area_m2, ram_bmsy, gapfilled, method) %>%
  unique() %>% 
  filter(!is.na(rgn_id))

write.csv(ram_data, "int/ram_bmsy.csv", row.names=FALSE)
summary(ram_data)


ram_2020 <- read_csv(file.path("../v2020/int/ram_bmsy.csv"))

OHI 2021: Food Provision/Fisheries, Preparing RAM B/Bmsy data

Compiled on Fri Nov 12 14:04:14 2021 by sgclawson