This script prepares the RAM B/Bmsy data: 1. Relevant data are collected from the RAM database 2. Missing years are gapfilled when appropriate 3. RAM and SAUP species names are harmonized in a few cases 4. RAM stocks are associated with the corresponding OHI and FAO regions
Using new SAUP catch data
Reference: RAM Legacy Stock Assessment Database v4.495
Reference: Christopher M. Free. 2017. Mapping fish stock boundaries for the original Ram Myers stock-recruit database. https://marine.rutgers.edu/~cfree/mapping-fish-stock-boundaries-for-the-original-ram-myers-stock-recruit-database/. downloaded 9/25/2017.
::opts_chunk$set(fig.width = 6, fig.height = 4, fig.path = 'figs/',message = FALSE, warning = FALSE, echo = TRUE, eval=FALSE) knitr
## Libraries
library(dplyr)
library(tidyr)
library(readr)
library(sf)
library(ggplot2)
library(here)
## highlight out when knitting
setwd(here::here("globalprep/fis/v2021"))
source('../../../workflow/R/common.R')
The data is stored as a relational database in an R object. Check that the names of each element have not changed from last year! Update as appropriate in the below list.
The following tables are included (for full list, see loadDBdata.r in mazu):
timeseries_values_views
This stores the timeseries values, using the most recent assessment available, with timeseries type. The dataframe contains the following headers/columns: stockid, stocklong, year, TBbest, TCbest, ERbest, BdivBmsypref, UdivUmsypref, BdivBmgtpref, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmgt, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows.
timeseries_units_views
This stores the timeseries units (or time series source for touse time series), with timeseries type. The dataframe contains the following headers/columns: stockid, stocklong, TBbest, TCbest, ERbest, BdivBmgtpref, UdivUmsypref, BdivBmgtpret, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmsy, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows
timeseries_id_views
This stores the timeseries ids with timeseries id along the columns. The dataframe contains the following headers/columns: stockid, stocklong, TBbest, TCbest, ERbest, BdivBmsypref, UdivUmsypref, BdivBmgtpref, UdivUmgtpref, TB, SSB, TN, R, TC, TL, RecC, F, ER, TBdivTBmsy, SSBdivSSBmsy, NdivNmsy, FdivFmsy, ERdivERmsy, CdivMSY, CdivMEANC, TBdivTBmgt, SSBdivSSBmgt, NdivNmgt, FdivFmgt, ERdivERmgt, Cpair, TAC, Cadvised, survB, CPUE, EFFORT, and stocks along the rows.
bioparams_values_views
This stores the bioparams values, with bioparam type along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
bioparams_units_views
This stores the bioparams units, with bioparam type along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
bioparams_ids_views
This stores the bioparams ids, with bioparam id along the columns (TBmsybest, ERmsybest, MSYbest, TBmgtbest, ERmgtbest, TBmsy, SSBmsy, Nmsy, MSY, Fmsy, ERmsy, TBmgt, SSBmgt, Fmgt, ERmgt, TB0, SSB0, M, TBlim, SSBlim, Flim, ERlim) and stocks along the rows.
metadata
This stores assorted metadata associated with the stock, with datatypes along the columns (assessid, stockid, stocklong, assessyear, scientificname, commonname, areaname, managementauthority, assessorfull, region, FisheryType, taxGroup, primary_FAOarea, primary_country) and stock by row.
tsmetrics Contains metadata, with columns tscategory, tsshort, tslong, tsunitsshort, tsunitslong, tsunique.
For this data prep we primarily use and consult timeseries_values_views
, tsmetrics
, and metadata
load(file.path(dir_M, "git-annex/globalprep/_raw_data/RAM/d2021/RAMLDB v4.495/R Data/DBdata[asmt][v4.495].RData"))
<- timeseries_values_views %>%
ram_bmsy_new ::select(stockid, stocklong, year, TBdivTBmsy, SSBdivSSBmsy, TBdivTBmgt, SSBdivSSBmgt) %>%
dplyrmutate(ram_bmsy =
ifelse(!is.na(TBdivTBmsy), TBdivTBmsy, SSBdivSSBmsy)) %>%
mutate(ram_bmsy =
ifelse(is.na(TBdivTBmsy) & is.na(SSBdivSSBmsy), TBdivTBmgt, ram_bmsy)) %>%
mutate(ram_bmsy =
ifelse(is.na(TBdivTBmsy) & is.na(SSBdivSSBmsy) & is.na(TBdivTBmgt), SSBdivSSBmgt, ram_bmsy)) %>%
::filter(year > 1979) %>%
dplyrfilter(!is.na(ram_bmsy)) %>%
::select(stockid, stocklong, year, ram_bmsy) dplyr
For each stock: - Missing years are gapfilled using a linear regression model that includes data from 2001 to 2018 (2018 is the final year of SAUP data). To be included in the gapfilling, there have to be 5 or more years of B/Bmsy data occuring over the last 11 years of data, from 2005 to 2015. - We convert any predicted RAM B/Bmsy value less than the minimum observed B/Bmsy value to that the minimum observed value, as there are some negative predicted values.
Summary: - There are 432 RAM stocks with at least 5 years of B/Bmsy data from 2005 to 2015. - 289 of these stocks have at least 1 year of gapfilled data.
- A few of the predicted B/Bmsy values go below zero. We convert anything with a RAM B/Bmsy value < 0.000001475954 to 0.000001475954, which is the minimum observed B/Bmsy value in the data.
## gap fill ram_bmsy
## based on this it seems reasonable to gap-fill missing values
<- ram_bmsy_new %>%
ram_gf_check filter(year >= 2001) %>%
spread(year, ram_bmsy)
# identify stocks for gapfilling (those with 5 or more years of data since 2005).
# NOTE: we potentially gapfill to 2001, but we want stocks with adequate *recent* data
<- ram_bmsy_new %>%
ram_bmsy_gf filter(year >= 2001 & year <= 2018) %>% # 2017 corresponds to the final year of SAUP catch data
group_by(stockid) %>%
mutate(years_data_2005_now = length(ram_bmsy[year >= 2005])) %>%
mutate(years_data_2001_now = length(ram_bmsy[year >= 2001])) %>%
ungroup() %>%
filter(years_data_2005_now >= 5)
## Get rows for stocks/years with no B/Bmsy (identified as NA B/Bmsy value for now)
<- ram_bmsy_gf %>%
ram_bmsy_gf spread(year, ram_bmsy) %>%
gather("year", "ram_bmsy", -stockid, -years_data_2005_now, -years_data_2001_now, - stocklong) %>%
mutate(year = as.numeric(year))
## gapfilling record keeping
<- ram_bmsy_gf %>%
ram_bmsy_gf mutate(gapfilled = NA) %>%
mutate(gapfilled = ifelse(years_data_2001_now == 18, gapfilled,
paste(18 - years_data_2001_now, "years gf", sep = " ")))
## see unique values of stocks
<- ram_bmsy_gf %>%
tmp ::select(stockid, gapfilled) %>%
dplyrunique()
## check out gapfilling stats
length(tmp$gapfilled) # 397 stocks with at least 5 years of data in past 11 years - v2019
# 398 stocks with at least 5 years of data in past 11 years - v2020
# 432 stocks with at least 5 years of data in past 11 years - v2021
sum(table(tmp$gapfilled)) # 222 stocks have at least one year of B/Bmsy values gapfilled - v2019
# 314 stocks have at least one year of B/Bmsy values gapfilled; this is because there is an additional year of data in RAM... 2016 - v2020
# 289 stocks have at least one year of B/Bmsy values gapfilled; I assume this loss is because version 4.495 is more refined than the previous version - v2021
## regression model for prediction for each stock
<- ram_bmsy_gf %>%
ram_bmsy_gf group_by(stockid) %>%
do({
<- lm(ram_bmsy ~ year, data=.)
mod <- predict(mod, newdata=.[c('year')])
ram_bmsy_predict data.frame(., ram_bmsy_predict)
%>%
}) ungroup()
summary(ram_bmsy_gf) #1235 NAs for ram_bmsy
sum(ram_bmsy_gf$ram_bmsy_predict < 0 ) # 50 of the predicted B/Bmsy values go below zero.
min(ram_bmsy_gf$ram_bmsy, na.rm = TRUE) #0.000001475954
## We convert anything with a RAM BBmsy value < 0.000001475954 to 0.000001475954, which is the minimum observed B/Bmsy value in the data; add method documentation
<- ram_bmsy_gf %>%
ram_bmsy_gf mutate(ram_bmsy_predict = ifelse(ram_bmsy_predict < 0.000001475954, 0.000001475954, ram_bmsy_predict))
## gapfilling record keeping
<- ram_bmsy_gf %>%
ram_bmsy_gf_final mutate(method = ifelse(is.na(ram_bmsy), paste0("lm, ", gapfilled), NA)) %>%
mutate(gapfilled = ifelse(is.na(ram_bmsy), "1", "0")) %>%
mutate(ram_bmsy = ifelse(is.na(ram_bmsy), ram_bmsy_predict, ram_bmsy)) %>%
::select(stockid, year, ram_bmsy, gapfilled, method)
dplyr
write.csv(ram_bmsy_gf_final, "int/ram_stock_bmsy_gf.csv", row.names=FALSE)
Get a general idea of how well the model predicts missing data based on observed and model predicted values. This model appears to do fairly well.
plot(ram_bmsy_gf$ram_bmsy, ram_bmsy_gf$ram_bmsy_predict)
abline(0,1, col="red")
plot(log(ram_bmsy_gf$ram_bmsy), log(ram_bmsy_gf$ram_bmsy_predict))
abline(0,1, col="red")
<- lm(ram_bmsy ~ ram_bmsy_predict, data=ram_bmsy_gf)
mod summary(mod)
Identify the FAO/OHI regions where each RAM stock is located (fao and ohi regions are assigned to RAM Data in STEP4b_fao_ohi_rgns.Rmd. Run fao_ohi_rgns.Rmd now.
If there are many differences between RAM spatial file and RAM metadata, check the fao_ohi_rgns.Rmd prep again.
## Read in RAM spatial stocks file
<- read.csv("int/RAM_fao_ohi_rgns_final.csv", stringsAsFactors = FALSE)
ram_spatial
<- metadata %>%
ram_meta ::select(stockid, stocklong, scientificname)
dplyr
setdiff(ram_spatial$stockid, ram_meta$stockid) # make sure all the spatial data has corresponding metadata (should be 0). It is not 0, probably because these are ones that have been removed from the RAM database since the 2017 assessment... delete these from the data frame below.
# join with metadata to get scientific name
<- ram_spatial %>%
ram_spatial ::select(-stocklong) %>%
dplyrleft_join(ram_meta, by = c("stockid")) %>%
rename(RAM_species = scientificname) %>%
filter(!is.na(stocklong)) ## filtering out ones that didn't match above
setdiff(ram_spatial$stockid, ram_meta$stockid) # now it is 0
In most cases, the RAM and SAUP data use the same species names, but there are a few exceptions. The following code identifies species in the RAM data that are not in the SAUP data. In these cases, different species names may be used (although not necessarily because some of the species may be present in RAM, but not SAUP, for other reasons). For these species, I used fishbase to explore synonyms and create a table to harmonize the RAM species names with the SAUP species names (saved as: int/RAM_species_to_SAUP.csv).
<- read_csv(file.path("int/ram_stock_bmsy_gf.csv"))
ram_bmsy_gf_final
# get list of RAM species, scientific name
<- ram_bmsy_gf_final %>%
ram_sp left_join(data.frame(metadata), by = "stockid") %>%
::select(scientificname) %>%
dplyrunique() %>%
arrange(scientificname)
# SAUP species, sci name (read in the datatable that includes TaxonKey)
<- read_csv(file.path(dir_M,'git-annex/globalprep/fis/v2021/int/stock_catch_by_rgn_taxa.csv')) %>%
SAUP_sp ::rename(SAUP_scientificname = TaxonName) %>%
dplyr::select(SAUP_scientificname) %>%
dplyrunique() %>%
arrange(SAUP_scientificname)
# compare names - what's in RAM that's not in SAUP
<- data.frame(scientificname = sort(setdiff(ram_sp$scientificname, SAUP_sp$SAUP_scientificname))) # 47 species names
tmp
# compare names - what's in SAUP that's not in RAM
<- data.frame(scientificname = sort(setdiff(SAUP_sp$SAUP_scientificname, ram_sp$scientificname))) # 2070 species names.. unfortunately a lot.
tmp2
write.csv(tmp, "int/unmatched_RAM_species.csv", row.names=FALSE)
write.csv(tmp2, "int/SAUP_species_no_RAM.csv", row.names=FALSE)
#setdiff(tmp, ram_name_corr$RAM_species)
## join ram spatial with RAM species on scientific name. We can use this to help check whether questionable species names across the ram and SAUP data match by region and fao id...
<- tmp %>%
ram_sp_fao_ohi left_join(ram_spatial, by = c("scientificname" = "RAM_species")) %>%
unique()
write_csv(ram_sp_fao_ohi, "int/new_ram_sp.csv")
## get SAUP fao_ohi regions
<- read_csv(file.path(dir_M,'git-annex/globalprep/fis/v2021/int/stock_catch_by_rgn_taxa.csv')) %>%
SAUP_sp_fao_ohi ::rename(SAUP_scientificname = TaxonName) %>%
dplyr::filter(year > 1979) # %>%
dplyr#filter(str_detect(SAUP_scientificname, "Lepidorhombus"))
<- read.csv("int/RAM_species_to_SAUP.csv", stringsAsFactors = FALSE)
RAM_species_to_SAUP
# Then I hand-looked up each of the missing ones from "unmatched_RAM_species.csv", and added those new ones to "RAM_species_to_SAUP.csv" to generate this list - most still unmatched. See "RAM_species_to_SAUP_notes.csv" for our reasoning behind these changes.
setdiff(tmp$scientificname, RAM_species_to_SAUP$SAUP_species) ## these are the new species to add to "RAM_species_to_SAUP.csv".
<- read.csv("int/RAM_species_to_SAUP.csv", stringsAsFactors = FALSE) %>%
ram_name_corr filter(!is.na(SAUP_species)) # SAUP to RAM name conversion
# matched species, unfortunately only 12 ram_name_corr
Harmonize names between RAM and SAUP data.
# correct names in a few cases to match with SAUP names
<- read.csv("int/RAM_species_to_SAUP.csv", stringsAsFactors = FALSE) %>%
ram_name_corr filter(!is.na(SAUP_species)) # SAUP to RAM name conversion
<- ram_spatial %>%
ram_spatial left_join(ram_name_corr, by="RAM_species") %>%
::mutate(species = ifelse(!is.na(SAUP_species), SAUP_species, RAM_species)) %>%
dplyr::select(rgn_id, fao_id, stockid, stocklong, species, RAM_area_m2)
dplyr
length(unique(ram_spatial$stockid)) # 487 RAM stocks with B/Bmsy data - v2021
length(unique(ram_spatial$species)) #237
Re-name stockid
column to stockid_ram
and create new column stockid
that matches with the stockid
column in the CMSY data table prepared in STEP3_calculate_bbmsy.Rmd.
## Combine RAM spatial data with B/Bmsy data
<- read.csv("int/ram_stock_bmsy_gf.csv")
ram_bmsy_gf
# check every stock has a location:
setdiff(ram_bmsy_gf$stockid, ram_spatial$stockid) # should be 0: every ram stock should have ohi/fao rgn
setdiff(ram_spatial$stockid, ram_bmsy_gf$stockid) # these are stocks that were dropped due to insufficient years of data
<- ram_bmsy_gf %>%
ram_data left_join(ram_spatial, by="stockid") %>%
rename(stockid_ram = stockid) %>%
::mutate(stockid = paste(species, fao_id, sep="-")) %>%
dplyr::mutate(stockid = gsub(" ", "_", stockid)) %>%
dplyr::select(rgn_id, stockid, stockid_ram, stocklong, year, RAM_area_m2, ram_bmsy, gapfilled, method) %>%
dplyrunique() %>%
filter(!is.na(rgn_id))
write.csv(ram_data, "int/ram_bmsy.csv", row.names=FALSE)
summary(ram_data)
<- read_csv(file.path("../v2020/int/ram_bmsy.csv")) ram_2020