Summary

This script prepares the RAM B/Bmsy data:
1. Relevant data are collected from the RAM database
2. Missing years are gapfilled when appropriate
3. RAM and Watson species names are harmonized in a few cases
4. RAM stocks are associated with the corresponding OHI and FAO regions

Updates from previous assessment

This year we have additional stocks without spatial information from Christopher Free (2017). We manually assigned OHI and FAO region ID information to the additional stocks in fao_ohi_rgns.Rmd using the best available information on stock distribution and saved the file in int/RAM_fao_ohi_rgns.csv. Watson species names should be similar, if not identical, to the SAUP species names used prior to the 2018 assessment. We are also using management target values for biomass, which adds more RAM species to our analysis.

Data

B/Bmsy values from stock assessments

Reference: RAM Legacy Stock Assessment Database v4.491

Downloaded: 06/10/2020
Description: B/Bmsy value by stock and year (other data, which we do not use, are also available in the database)
Native data resolution: stock (fish stock, species and region specific)
Time range: 1950 - 2016
Format: R data files (.rds)
DOI: 10.5281/zenodo.2542919

Stock range data

Reference: Christopher M. Free. 2017. Mapping fish stock boundaries for the original Ram Myers stock-recruit database. https://marine.rutgers.edu/~cfree/mapping-fish-stock-boundaries-for-the-original-ram-myers-stock-recruit-database/. downloaded 9/25/2017.

Downloaded: 08/20/2018
Description: Shapefiles for each stock describing their distribution
Native data resolution: Spatial shapefiles
Format: Shapefiles

Setup

knitr::opts_chunk$set(fig.width = 6, fig.height = 4, fig.path = 'figs/',message = FALSE, warning = FALSE, echo = TRUE, eval=FALSE)

## Libraries
library(dplyr)
library(tidyr)
library(readr)
library(sf)
library(ggplot2)
library(here) 

## comment out when knitting
setwd(here::here("globalprep/fis/v2020"))
source('../../../workflow/R/common.R')

## Paths for data
path_raw_data = file.path(dir_M, "git-annex/globalprep/fis/v2020/int/annual_catch")

Obtain RAM B/Bmsy data

The data is stored as a relational database in an R object. Check that the names of each element have not changed from last year! Update as appropriate in the below list.

The following tables are included (for full list, see loadDBdata.r in mazu):

timeseries
The time series data is a data frame containing all assessments conducted per stock with the following headers/columns:

(1) assessid (2) stockid (3) stocklong (4) tsid (5) tsyear (6) tsvalue

bioparams
The bioparams data is a data frame with parameter values for all stocks and assessments. It has the following headers/columns:

(1) assessid (2) stockid (3) stocklong (4) bioid (5) biovalue (6) bioyear (7) bionotes

timeseries_values_views
This stores the timeseries values, using the most recent assessment available, with timeseries type. The dataframe contains the following headers/columns: stockid, stocklong, year, TBbest, TCbest, ERbest, BdivBmsypref, UdivUmsypref, BdivBmgtpref, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmgt, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows.
timeseries_units_views
This stores the timeseries units (or time series source for touse time series), with timeseries type. The dataframe contains the following headers/columns: stockid, stocklong, TBbest, TCbest, ERbest, BdivBmsypref, UdivUmsypref, BdivBmgtpref, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmgt, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows.
timeseries_id_views
This stores the timeseries ids with timeseries id along the columns. The dataframe contains the following headers/columns: stockid, stocklong, TBbest, TCbest, ERbest, BdivBmsypref, UdivUmsypref, BdivBmgtpref, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmgt, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows.
bioparams_values_views
This stores the bioparams values, with bioparam type along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
bioparams_units_views
This stores the bioparams units, with bioparam type along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
bioparams_ids_views
This stores the bioparams ids, with bioparam id along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
metadata
This stores assorted metadata associated with the stock, with datatypes along the columns (assessid, stockid, stocklong, assessyear, scientificname, commonname, areaname, managementauthority, assessorfull, region, FisheryType, taxGroup, primary_FAOarea, primary_country) and stock by row.
tsmetrics
This stores metadata, with columns tscategory, tsshort, tslong, tsunitsshort, tsunitslong, tsunique.

For this data prep we primarily use and consult timeseries_values_views, tsmetrics, and metadata.
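Because element names can change between RAM releases, a quick check after loading can flag missing tables before any processing starts. The sketch below is only an illustration: check_ram_tables() is a hypothetical helper that is not part of this workflow, and it is run here against a toy environment. In practice, env would be the environment into which the .RData file was loaded.

```r
# Hypothetical helper (not in the original script): report which expected
# RAM tables are missing from an environment after load().
check_ram_tables <- function(expected, env = globalenv()) {
  missing_tbls <- setdiff(expected, ls(envir = env))
  if (length(missing_tbls) > 0) {
    warning("Missing RAM tables: ", paste(missing_tbls, collapse = ", "))
  }
  invisible(missing_tbls)
}

# Toy stand-in for the loaded database (tsmetrics deliberately left out)
toy_env <- new.env()
assign("timeseries_values_views", data.frame(), envir = toy_env)
assign("metadata", data.frame(), envir = toy_env)

missing_tbls <- suppressWarnings(
  check_ram_tables(c("timeseries_values_views", "tsmetrics", "metadata"),
                   env = toy_env)
)
missing_tbls
```

An empty result would indicate that all three tables used in this prep are present under their expected names.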

load(file.path(dir_M, "git-annex/globalprep/_raw_data/RAM/d2020/RAMLDB v4.491/DB Files With Assessment Data/R Data/DBdata[asmt][v4.491].RData"))

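## Select one B/Bmsy value per stock and year, in order of preference:
## TB/TBmsy, then SSB/SSBmsy, then TB/TBmgt, then SSB/SSBmgt
ram_bmsy_new <- timeseries_values_views %>%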
  dplyr::select(stockid, stocklong, year, TBdivTBmsy, SSBdivSSBmsy, TBdivTBmgt, SSBdivSSBmgt) %>%
  mutate(ram_bmsy = 
           ifelse(!is.na(TBdivTBmsy), TBdivTBmsy, SSBdivSSBmsy)) %>%
  mutate(ram_bmsy =
           ifelse(is.na(TBdivTBmsy) & is.na(SSBdivSSBmsy), TBdivTBmgt, ram_bmsy)) %>%
  mutate(ram_bmsy = 
           ifelse(is.na(TBdivTBmsy) & is.na(SSBdivSSBmsy) & is.na(TBdivTBmgt), SSBdivSSBmgt, ram_bmsy)) %>%
  dplyr::filter(year > 1979) %>%
  filter(!is.na(ram_bmsy)) %>%
  dplyr::select(stockid, stocklong, year, ram_bmsy)
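
The chained ifelse() calls above amount to taking the first non-missing value in a fixed order of preference. As a minimal sketch on toy data (the values and stock ids below are invented for illustration), dplyr::coalesce() expresses the same preference order in one step:

```r
library(dplyr)

# Toy data: each row has a different first non-missing ratio; row E has none
toy <- data.frame(
  stockid      = c("A", "B", "C", "D", "E"),
  TBdivTBmsy   = c(1.2, NA, NA, NA, NA),
  SSBdivSSBmsy = c(0.9, 0.8, NA, NA, NA),
  TBdivTBmgt   = c(NA, 0.7, 0.6, NA, NA),
  SSBdivSSBmgt = c(NA, NA, 0.5, 0.4, NA),
  stringsAsFactors = FALSE
)

# coalesce() returns the first non-NA value across its arguments, row by row
toy_bmsy <- toy %>%
  mutate(ram_bmsy = coalesce(TBdivTBmsy, SSBdivSSBmsy, TBdivTBmgt, SSBdivSSBmgt)) %>%
  filter(!is.na(ram_bmsy)) %>%
  select(stockid, ram_bmsy)

toy_bmsy
```

Stock E is dropped because none of the four ratios is available, which mirrors the filter(!is.na(ram_bmsy)) step above.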

Gapfill RAM data when there are missing years

For each stock:
- Missing years are gapfilled using a linear regression model that includes data from 2001 to 2017 (2017 is the final year of Watson data). To be included in the gapfilling, a stock must have 5 or more years of B/Bmsy data occurring from 2005 to 2017.
- We convert any predicted RAM B/Bmsy value less than the minimum observed B/Bmsy value to the minimum observed value, as there are some negative predicted values.

Summary:
- There are 398 RAM stocks with at least 5 years of B/Bmsy data from 2005 to 2017.
- 314 of these stocks have at least 1 year of gapfilled data.
- A few of the predicted B/Bmsy values go below zero. We convert anything with a RAM B/Bmsy value < 0.00263 to 0.00263, which is the minimum observed B/Bmsy value in the data.
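
To make the approach concrete, here is a minimal sketch on a single invented stock. The values are toy data, and the floor is taken from the toy series itself rather than from the minimum across all RAM stocks as in the actual prep.

```r
# Toy series for one hypothetical stock: 2003, 2006, 2009 and 2010 are missing
toy <- data.frame(
  year     = 2001:2010,
  ram_bmsy = c(0.10, 0.08, NA, 0.05, 0.04, NA, 0.02, 0.01, NA, NA)
)

# Fit a linear trend to the observed years (lm() drops NA rows by default)
mod <- lm(ram_bmsy ~ year, data = toy)
toy$ram_bmsy_predict <- predict(mod, newdata = toy["year"])

# Floor predictions at the minimum observed value so that none are negative
min_obs <- min(toy$ram_bmsy, na.rm = TRUE)
toy$ram_bmsy_predict <- pmax(toy$ram_bmsy_predict, min_obs)

# Keep observed values; use predictions only where data are missing
toy$gapfilled <- as.integer(is.na(toy$ram_bmsy))
toy$ram_bmsy  <- ifelse(is.na(toy$ram_bmsy), toy$ram_bmsy_predict, toy$ram_bmsy)

toy
```

In this toy series the declining trend predicts negative values for 2009 and 2010, so both are set to the floor, while 2003 and 2006 receive the fitted values.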

## gap fill ram_bmsy
## based on this it seems reasonable to gap-fill missing values

ram_gf_check <- ram_bmsy_new %>%
   filter(year >= 2001) %>%
  spread(year, ram_bmsy) 

# identify stocks for gapfilling (those with 5 or more years of data since 2005).
# NOTE: we potentially gapfill to 2001, but we want stocks with adequate *recent* data 
ram_bmsy_gf <- ram_bmsy_new %>%
  filter(year >= 2001 & year <= 2017) %>%   # 2017 corresponds to the final year of Watson catch data
  group_by(stockid) %>%
  mutate(years_data_2005_now = length(ram_bmsy[year >= 2005])) %>%
  mutate(years_data_2001_now = length(ram_bmsy[year >= 2001])) %>%
  ungroup() %>%
  filter(years_data_2005_now >= 5)


## Get rows for stocks/years with no B/Bmsy (identified as NA B/Bmsy value for now)
ram_bmsy_gf <- ram_bmsy_gf %>%
  spread(year, ram_bmsy) %>% 
  gather("year", "ram_bmsy", -stockid, -years_data_2005_now, -years_data_2001_now, - stocklong) %>%
  mutate(year = as.numeric(year)) 


## gapfilling record keeping
ram_bmsy_gf <- ram_bmsy_gf %>%   
  mutate(gapfilled = NA) %>%
  mutate(gapfilled = ifelse(years_data_2001_now == 17, gapfilled, 
                            paste(17 - years_data_2001_now, "years gf", sep = " ")))

## see unique values of stocks
tmp <- ram_bmsy_gf %>%
  dplyr::select(stockid, gapfilled) %>%
  unique()


## check out gapfilling stats
length(tmp$gapfilled)  # 397 stocks with at least 5 years of data in past 11 years - v2019
                      # 398 stocks with at least 5 years of data in past 11 years - v2020
sum(table(tmp$gapfilled))  # 222 stocks have at least one year of B/Bmsy values gapfilled - v2019
                          # 314 stocks have at least one year of B/Bmsy values gapfilled; this is because there is an additional year of data in RAM... 2016 - v2020


## regression model for prediction for each stock
ram_bmsy_gf <- ram_bmsy_gf %>%
  group_by(stockid) %>%
  do({
    mod <- lm(ram_bmsy ~ year, data=.)  
    ram_bmsy_predict <- predict(mod, newdata=.[c('year')])
    data.frame(., ram_bmsy_predict)
  }) %>%
  ungroup()

summary(ram_bmsy_gf) #1206 NAs for ram_bmsy
sum(ram_bmsy_gf$ram_bmsy_predict < 0 )  # 30 of the predicted B/Bmsy values go below zero.  
min(ram_bmsy_gf$ram_bmsy, na.rm = TRUE) #0.00263

## We convert anything with a RAM B/Bmsy value < 0.00263 to 0.00263, which is the minimum observed B/Bmsy value in the data; add method documentation
ram_bmsy_gf <- ram_bmsy_gf %>%
  mutate(ram_bmsy_predict = ifelse(ram_bmsy_predict < 0.00263, 0.00263, ram_bmsy_predict)) 

## gapfilling record keeping
ram_bmsy_gf_final <- ram_bmsy_gf %>%
  mutate(method = ifelse(is.na(ram_bmsy), paste0("lm, ", gapfilled), NA)) %>%
  mutate(gapfilled = ifelse(is.na(ram_bmsy), "1", "0")) %>%
  mutate(ram_bmsy = ifelse(is.na(ram_bmsy), ram_bmsy_predict, ram_bmsy)) %>%
  dplyr::select(stockid, year, ram_bmsy, gapfilled, method) 

write.csv(ram_bmsy_gf_final, "int/ram_stock_bmsy_gf.csv", row.names=FALSE)

Get a general idea of how well the model predicts missing data based on observed and model predicted values. This model appears to do fairly well.

plot(ram_bmsy_gf$ram_bmsy, ram_bmsy_gf$ram_bmsy_predict)
abline(0,1, col="red")

plot(log(ram_bmsy_gf$ram_bmsy), log(ram_bmsy_gf$ram_bmsy_predict))
abline(0,1, col="red")

mod <- lm(ram_bmsy ~ ram_bmsy_predict, data=ram_bmsy_gf)
summary(mod)
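
As a complement to the plots and regression summary, simple error metrics can be computed on the rows that have observed values. This is a minimal sketch on invented numbers. Note that the comparison is in-sample, because the per-stock models were fitted to the same observed points, so it describes fit rather than true predictive skill for the missing years.

```r
# Toy observed and predicted values; the third row has no observation
toy <- data.frame(
  ram_bmsy         = c(0.5, 1.0, NA, 1.5, 2.0),
  ram_bmsy_predict = c(0.6, 0.9, 1.2, 1.6, 1.9)
)

# Compare only rows where an observed value exists
obs <- toy[!is.na(toy$ram_bmsy), ]

# Root mean squared error and squared correlation between observed and predicted
rmse <- sqrt(mean((obs$ram_bmsy - obs$ram_bmsy_predict)^2))
r2   <- cor(obs$ram_bmsy, obs$ram_bmsy_predict)^2

c(rmse = rmse, r2 = r2)
```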

Identify FAO and OHI regions for RAM stocks

Identify the FAO/OHI regions where each RAM stock is located (FAO and OHI regions are assigned to RAM data in fao_ohi_rgns.Rmd).

If there are many differences between RAM spatial file and RAM metadata, check the fao_ohi_rgns.Rmd prep again.

## Read in RAM spatial stocks file
ram_spatial <- read.csv("int/RAM_fao_ohi_rgns_final.csv", stringsAsFactors = FALSE)

ram_meta <- metadata %>% 
  dplyr::select(stockid, stocklong, scientificname)

setdiff(ram_spatial$stockid, ram_meta$stockid) # make sure all the spatial data have corresponding metadata (should return 0 stocks). It does not, probably because these stocks have been removed from the RAM database since the 2017 assessment; they are filtered out of the data frame below.

# join with metadata to get scientific name
ram_spatial <- ram_spatial %>%
  dplyr::select(-stocklong) %>%
  left_join(ram_meta, by = c("stockid")) %>%
  rename(RAM_species = scientificname) %>%
  filter(!is.na(stocklong)) ## filtering out ones that didn't match above

setdiff(ram_spatial$stockid, ram_meta$stockid) #now it is 0

Standardize species names

In most cases, the RAM and Watson data use the same species names, but there are a few exceptions. The following code identifies species in the RAM data that are not in the Watson data. In these cases, different species names may be used (although not necessarily, because some species may be present in RAM but not in Watson for other reasons). For these species, I used FishBase to explore synonyms and created a table to harmonize the RAM species names with the Watson species names (saved as: int/RAM_species_to_Watson.csv).

ram_bmsy_gf_final <- read_csv(file.path("int/ram_stock_bmsy_gf.csv"))

# get list of RAM species, scientific name
ram_sp <- ram_bmsy_gf_final %>%
  left_join(data.frame(metadata), by = "stockid") %>%
  dplyr::select(scientificname) %>%
  unique() %>%
  arrange(scientificname)


# Watson species, sci name (read in the datatable that includes TaxonKey)
wat_sp <- read_csv(file.path(dir_M,'git-annex/globalprep/fis/v2020/int/stock_catch_by_rgn_taxa.csv')) %>% 
  dplyr::rename(wat_scientificname = TaxonName) %>%
  dplyr::select(wat_scientificname) %>%
  unique() %>%
  arrange(wat_scientificname)

# compare names - what's in RAM that's not in Watson
tmp <- data.frame(scientificname = sort(setdiff(ram_sp$scientificname, wat_sp$wat_scientificname))) # 52 species names

# compare names - what's in watson that's not in RAM
tmp2 <- data.frame(scientificname = sort(setdiff(wat_sp$wat_scientificname, ram_sp$scientificname))) # 1210 species names

write.csv(tmp, "int/unmatched_RAM_species.csv", row.names=FALSE)
write.csv(tmp2, "int/Watson_species_no_RAM.csv", row.names=FALSE)

#setdiff(tmp, ram_name_corr$RAM_species)


## join ram spatial with RAM species on scientific name. We can use this to help check whether questionable species names across the ram and watson data match by region and fao id...
ram_sp_fao_ohi <- tmp %>%
  left_join(ram_spatial, by = c("scientificname" = "RAM_species")) %>%
  unique()

write_csv(ram_sp_fao_ohi, "int/new_ram_sp.csv")  

## get watson fao_ohi regions
wat_sp_fao_ohi <- read_csv(file.path(dir_M,'git-annex/globalprep/fis/v2020/int/stock_catch_by_rgn_taxa.csv')) %>% 
  dplyr::rename(wat_scientificname = TaxonName) %>%
   dplyr::filter(year > 1979)

RAM_species_to_Watson <- read.csv("int/RAM_species_to_Watson.csv", stringsAsFactors = FALSE)

# Then I looked up each of the missing species from "unmatched_RAM_species.csv" by hand and added the new ones to "RAM_species_to_Watson.csv" to generate this list (most are still unmatched). See "RAM_species_to_watson_notes.csv" for our reasoning behind these changes.

setdiff(RAM_species_to_Watson$RAM_species,tmp$scientificname) ## these are the new species to add to "RAM_species_to_Watson.csv". For 2020 there is only one new one, just a new spelling of Lepidorhombus spp

ram_name_corr <- read.csv("int/RAM_species_to_Watson.csv", stringsAsFactors = FALSE) %>%
   filter(!is.na(Watson_species))  # Watson to RAM name conversion

ram_name_corr # matched species, only 16

Final formatting

Harmonize names between RAM and Watson data.

# correct names in a few cases to match with Watson names
ram_name_corr <- read.csv("int/RAM_species_to_Watson.csv", stringsAsFactors = FALSE) %>%
  filter(!is.na(Watson_species))  # Watson to RAM name conversion

ram_spatial <- ram_spatial %>%
  left_join(ram_name_corr, by="RAM_species") %>%
  dplyr::mutate(species = ifelse(!is.na(Watson_species), Watson_species, RAM_species)) %>%
  dplyr::select(rgn_id, fao_id, stockid, stocklong, species, RAM_area_m2)

length(unique(ram_spatial$stockid)) # 443 RAM stocks with B/Bmsy data - v2020
length(unique(ram_spatial$species)) #217
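
The name harmonization above can be seen on a small example. This is a sketch with placeholder names: "Genus oldname" and "Genus newname" are invented stand-ins, and the real lookup rows live in int/RAM_species_to_Watson.csv.

```r
library(dplyr)

# Toy RAM spatial table with placeholder species names
toy_spatial <- data.frame(
  stockid     = c("S1", "S2"),
  RAM_species = c("Genus oldname", "Genus same"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup row mapping a RAM name to its Watson equivalent
toy_corr <- data.frame(
  RAM_species    = "Genus oldname",
  Watson_species = "Genus newname",
  stringsAsFactors = FALSE
)

# Use the Watson name where a match exists, otherwise keep the RAM name
toy_harmonized <- toy_spatial %>%
  left_join(toy_corr, by = "RAM_species") %>%
  mutate(species = ifelse(!is.na(Watson_species), Watson_species, RAM_species)) %>%
  select(stockid, species)

toy_harmonized
```

Stocks whose RAM name has no entry in the lookup keep their original name, so an incomplete lookup table never drops rows.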

Rename the stockid column to stockid_ram and create a new stockid column that matches the stockid column in the CMSY data table prepared in calculate_bbmsy.Rmd.

## Combine RAM spatial data with B/Bmsy data
ram_bmsy_gf <- read.csv("int/ram_stock_bmsy_gf.csv")


# check every stock has a location:
setdiff(ram_bmsy_gf$stockid, ram_spatial$stockid) # should be 0: every ram stock should have ohi/fao rgn
setdiff(ram_spatial$stockid, ram_bmsy_gf$stockid) # these are stocks that were dropped due to insufficient years of data

ram_data <- ram_bmsy_gf %>% 
  left_join(ram_spatial, by="stockid") %>%
  rename(stockid_ram = stockid) %>% 
  dplyr::mutate(stockid = paste(species, fao_id, sep="-")) %>%
  dplyr::mutate(stockid = gsub(" ", "_", stockid)) %>%
  dplyr::select(rgn_id, stockid, stockid_ram, stocklong, year, RAM_area_m2, ram_bmsy, gapfilled, method) %>%
  unique()

write.csv(ram_data, "int/ram_bmsy.csv", row.names=FALSE)


ram_2019 <- read_csv(file.path("../v2019/int/ram_bmsy.csv"))
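
For reference, the stockid key built above combines the species name and the FAO area id, with spaces replaced by underscores. A minimal sketch follows; the stock ids and the species and area pairings are illustrative only and are not taken from the data.

```r
# Toy rows: illustrative species and FAO area pairings
toy <- data.frame(
  stockid_ram = c("S1", "S2"),
  species     = c("Gadus morhua", "Thunnus alalunga"),
  fao_id      = c(27, 77),
  stringsAsFactors = FALSE
)

# Same construction as in the prep: "<species>-<fao_id>", spaces to underscores
toy$stockid <- gsub(" ", "_", paste(toy$species, toy$fao_id, sep = "-"))

toy$stockid
```

Because the key depends only on species and FAO area, several RAM stocks of the same species within one FAO area share a stockid, which is why stockid_ram is kept alongside it.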

OHI 2020: Food Provision/Fisheries, Preparing RAM B/Bmsy data

Compiled on Fri Jun 26 12:59:02 2020 by sgclawson