[REFERENCE RMD FILE: http://ohi-science.org/ohiprep_v2021/globalprep/np/v2021/STEP2_np_weighting_prep.html]

Summary

This analysis converts FAO commodities tonnes data for ornamental fish trade, SAUP fisheries tonnes data for FOFM, and FAO mariculture tonnes data for seaweeds into a weighting scheme per each ohi region, based off of 5 year averages of tonnes and values (in usd) for each product. This weighting scheme will be applied to the scores to determine how much of each natural products layer affects the overall NP score when updating in ohi global.

Updates from previous year

updated syntax to follow tidyverse style when reasonable; updated functions read.csv() and write.csv() to read_csv() and write_csv()
updated file paths to be more reproducible, used here() package and updated syntax to reduce system-dependent file paths
updated full_join(by = character()) superseded method to cross_join() when creating rgn_id x products x years dataframe
added optional code to write out tonnes objects created in weight calculation prep (fofm_tonnes.csv, sw_tonnes.csv, and orn_tonnes.csv) to the int folder
added extensive datacheck section, plotting

Data Source

FAO: Commodities

Reference:
https://www.fao.org/fishery/en/statistics/software/fishstatj App release date: March 2024

FAO raw commodities quantity 1976_2021 FAO raw commodities value 1976_2022 FAO metadata

Global aquatic trade - All partners aggregated - Quantities and Values - 1976-2022 (Release date: July 2024)

Downloaded: July, 26, 2024

Description: Quantity (tonnes) and value (USD) of raw commodities (Exports only) for each country, taxa, year. The FAO data is subset to include commodities in these categories: ornamental fish, fish oil, seaweed and plants (see: raw/commodities2products_weighting.csv for details).

Time range: 1976-2022

Seafood Watch sustainability data

Reference: https://www.seafoodwatch.org/globalassets/sfw/pdf/whats-new/seafood-watch-complete-recommendation-list.pdf Release date: March 4, 2023

Downloaded: May 31, 2023

Description: Monterrey Bay Aquarium Seafood Watch aquaculture recommendations. Sustainability scored from 0-10. Rescaled to 0-1. There is only one value for seaweeds in the data… 0.67

Sea Around Us Concepts

Reference: Pauly D., Zeller D., Palomares M.L.D. (Editors), 2020. Sea Around Us Concepts, Design and Data (seaaroundus.org).

Downloaded: September 27, 2022

Description: Tons per year and SAUP region with information on sector type, industry type, fishing entitity, reporting status and taxonomic information.

Time range: 1950 - 2019

Format: CSV

Additional Information: Methods

Avoiding the ecological limits of forage fish for fed aquaculture

Reference: Froehlich, H.E., Jacobsen, N.S., Essington, T.E., Clavelle, T., and Halpern, B.S. (2018). Avoiding the ecological limits of forage fish for fed aquaculture. Nature Sustainability 1, 298.

Downloaded: July 7, 2020. Obtained from Melanie Frazier (NCEAS).

Description: List of FOFM species from Watson v3 data.

Native data resolution:

Format: CSV format

RAM

RAM Legacy Data

Reference: RAM Legacy Stock Assessment Database v4.65, 06/17/2024

Downloaded: 08/07/2024

Description: B/Bmsy value by stock and year (other data, which we do not use, are also available in the database)

Native data resolution: stock (fish stock, species and region specific)

Time range: 1800 - 2023

Format: CSV format

Additional Information: We use the finalized b/bmsy layer from OHI-global for this data prep. We do not actually read in the raw RAM data here.

FAO: Production

Reference:
https://www.fao.org/fishery/statistics-query/en/aquaculture/aquaculture_quantity FAO Global Aquaculture Production Quantity 1950_2022 FAO metadata found here

Downloaded: July 26th, 2024

Last updated: March 29th, 2024

Description: Quantity (tonnes) of mariculture for each country, species, year.

Time range: 1950-2022

Setup

library(formatR) # for wrapping code when rendered

knitr::opts_chunk$set(eval=FALSE, tidy.opts = list(width.cutoff = 80), tidy = TRUE)

# Read in packages ----
library(here)
library(tidyverse)

# Set scenario year, reproducible file paths ----
scen_year_number <- 2024 # update this!!
scenario_year <- as.character(scen_year_number)
v_scen_year <- paste0("v", scenario_year)
# define file paths to simplify read and write operations
current_np_dir <- here::here("globalprep", "np", v_scen_year)
previous_np_dir <- here::here("globalprep", "np", paste0("v", scen_year_number - 1))

# Source scripts/functions ----
# directory locations
source(here::here("workflow", "R", "common.R"))

## This file makes it easier to process data for the OHI global assessment
##  by creating the following objects:
## 
##  * dir_M = identifies correct file path to Mazu (internal server) based on your operating system
##  * mollCRS = the crs code for the mollweide coordinate reference system we use in the global assessment
##  * regions_shape() = function to load global shapefile for land/eez/high seas/antarctica regions
##  * ohi_rasters() = function to load two rasters: global eez regions and ocean region
##  * region_data() = function to load 2 dataframes describing global regions 
##  * rgn_syns() = function to load dataframe of region synonyms (used to convert country names to OHI regions)
##  * low_pop() = function to load dataframe of regions with low and no human population
##  * UNgeorgn = function to load dataframe of UN geopolitical designations used to gapfill missing data

# function for handling FAO commodity data specific to NP
source(here::here(current_np_dir, "R", "np_fxn.R")) 

# Archived setup:
#prep_year <- 2023
#prep <- file.path(here(paste0("globalprep/np/v", prep_year, "/")))

Methods

Import prepped FAO commodities data set and find 5 year average $USD / Tonne average value per product per region. Gapfill these based on the UN geopolitical region/global values.

# Read in the prepped FAO commodities data set (created on line 453 in step 1a)
harvest_tonnes_usd <- read_csv(here(current_np_dir, "int", "np_harvest_tonnes_usd.csv")) %>% 
  filter(rgn_id != 213) # filter out Antarctica

# Make a data frame with every region and every product ----
# Define years
max_year <- max(harvest_tonnes_usd$year)
min_year <- max_year - 4 
years <- c(min_year:max_year) # get 5 year range
years_df <- data.frame(year = min_year:max_year) # so that we can get 5 year average

# Load OHI region data from common.R
region_data() # gets rgns_eez

# Define subset of OHI EEZ regions
rgns_eez <- rgns_eez %>%
  dplyr::select(rgn_id) %>%
  filter(rgn_id != 213) # filter out Antarctica

# Define products dataframe
products <- data.frame(product = c("seaweeds", "ornamentals", "fish_oil"))

# Create full rgn x product x years dataframe
region_product <- rgns_eez %>% 
  # select rgn_id from rgns_eez
  dplyr::select(rgn_id) %>%  
  # cross join with products (match every row in x with every row in y)
  cross_join(products) %>% 
  # cross join with years
  cross_join(years_df) 

# Archived method for joining rgns, products, and years dfs
# region_product <- full_join(rgns_eez, products, by = character()) %>%
#   full_join(years_df, by = character())


# Calculate $/tonne for each OHI region and product and gapfill based on UN geopolitical region ----

harvest_tonnes_usd_geo_rgn <- harvest_tonnes_usd %>%
  full_join(region_product, by = join_by("rgn_id", "product", "year")) %>%
  add_georegion_id() %>%
  dplyr::filter(year %in% years) %>%
  arrange(rgn_id)


# calculate georgn values (for each region, year, and product)
geo_rgn_values_df <- harvest_tonnes_usd_geo_rgn %>%
  dplyr::group_by(georgn_id, year, product) %>%
  # calculate values ($/tonnes)
  dplyr::mutate(values = usd/tonnes) %>%
    mutate(values = ifelse(values == "Inf", NA, values)) %>% ## assign Inf as NA
  summarise(georgn_values = mean(values, na.rm = TRUE)) %>% 
  ungroup() ## now we have a 5 year average value for geo region

# calculate global values (for each product across all regions for each year)
global_values_df <- harvest_tonnes_usd_geo_rgn %>%
  dplyr::group_by(product, year) %>%
  mutate(values = usd/tonnes) %>%
  mutate(values = ifelse(values == "Inf", NA, values)) %>% ## assign Inf as NA
  summarise(global_values = mean(values, na.rm = TRUE)) ## now we have a 5 year average global value


# Create final gapfilled data frame
harvest_tonnes_usd_values_gf <- harvest_tonnes_usd_geo_rgn %>%
  left_join(geo_rgn_values_df, by = join_by("georgn_id", "product", "year")) %>%
  left_join(global_values_df, by = join_by("product", "year")) %>%
  mutate(values = usd/tonnes) %>%
  mutate(values = ifelse(values == "Inf", NA, 
                         # assign Inf (n/0) and NaNs (0/0) as NA
                         ifelse(values == "NaN", NA, values))) %>% 
  # gapfill so that when there is no regional value, take the geo regional value, 
  mutate(values_final = ifelse(is.na(values) & georgn_values != "NaN", georgn_values, 
                               # and if there is no geo regional value, take the global value; else, keep original values. 
                               ifelse(is.na(values) & georgn_values == "NaN", global_values, values))) %>% 
  # make note of gapfilling if any occurred in the previous step
  mutate(values_gf_description = ifelse(is.na(values) & georgn_values != "NaN", "geo_region",
                                        ifelse(is.na(values) & georgn_values == "NaN", "global", "none"))) %>%
  mutate(values_gf = ifelse(is.na(values) & georgn_values != "NaN", 1,
                            ifelse(is.na(values) & georgn_values == "NaN", 1, 0)))
# now we have a dataset with gap-filled average values and calculated yearly values... 
# we will save this and then calculate the 5 year averages

# Save this data frame to int
readr::write_csv(harvest_tonnes_usd_values_gf,
                 here(current_np_dir, "int", "harvest_tonnes_usd_weighting_gf.csv"))

# Now calculate 5 year averages (and assign year = 2020 <- legacy comment, no longer what the code does)
weighting_usd_values_final <- harvest_tonnes_usd_values_gf %>%
  dplyr::select(rgn_id, product, year, values_final) %>%
  group_by(rgn_id, product) %>%
  summarise(value_per_tonne = mean(values_final)) %>%
  ungroup() 

#save this df
readr::write_csv(weighting_usd_values_final,
                 here(current_np_dir, "int", "harvest_weighting_values.csv"))

Now prep the tonnes data from each product to be per each region for the 6 year average (2013-2018). (<– legacy note. As of v2024, this is a 5 year range (from 2018:2022), however there are data limitations from the Sea Around Us catch data, which only goes until 2019 (2001-2019). It was last downloaded in 2022, with an end data year of 2019, and hasn’t been updated since. This impacts the Fish Oil/Fish Meal (FOFM) intermediate data product mean_catch_FOFM.csv used to create the fofm_tonnes object below.)

## Read in FOFM tonnes data
fofm_tonnes <- read_csv(here(current_np_dir, "int", "mean_catch_FOFM.csv")) %>%
  dplyr::filter(year %in% years) %>% # v2024 range of 2001:2019 to 2018:2019
  mutate(product = "fish_oil") %>%
  group_by(rgn_id, year, product) %>%
  # Note: multiply by 0.3 to account for water loss when converting fish to fish oil.. 
  ## about 30% of fish are water and 70% are fish oil
  summarise(tonnes = sum(catch) * 0.3) %>% 
  ungroup() %>%
  group_by(rgn_id, product) %>%
  # calculate 5 year mean of catch -- 
  ## note: in v2024, there were only 2 years in the former year column: 
  ## 2018 and 2019 -- could be due to data year cap of 2019 in SAUP 
  ## (downloaded in 2022 with data up to year 2019, hasn't been updated since)
  summarise(tonnes = mean(tonnes)) %>% 
  ungroup() %>%
  full_join(rgns_eez) %>%
  dplyr::select(rgn_id, product, tonnes) %>%
  mutate(product = "fish_oil") %>%
  mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes)) 


# # write out for troubleshooting, bookkeeping
# fofm_int <- fofm_tonnes %>% mutate(year = scen_year_number)
# write_csv(fofm_int, here(current_np_dir, "int", "fofm_tonnes.csv"))

## Read in ornamentals tonnes data
np_tonnes <- read_csv(here(current_np_dir, "int", "np_harvest_tonnes_usd.csv")) %>% 
  filter(rgn_id != 213) # has year range of 1976:2022 in v2024
# this object is equivalent to the harvest_tonnes_usd object created in the previous chunk

orn_fill_df <- region_product %>% # has years from 2018:2022 in v2024 (proper 5 year range)
  filter(product == "ornamentals")
  
orn_tonnes <- np_tonnes %>%
  # filter to 5 year range defined in the beginning of the first methods chunk
  dplyr::filter(year %in% years, 
                product == "ornamentals") %>% 
  # full join to fill in the rest of the regions with 0
  full_join(orn_fill_df, by = c("rgn_id", "year", "product")) %>% 
  # gapfill the NAs to be 0
  mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes),
         usd = ifelse(is.na(usd), 0, usd)) %>% 
  dplyr::select(rgn_id, year, product, tonnes) %>%
  group_by(rgn_id, product) %>%
  # calculate 5 year mean of catch
  summarise(tonnes = mean(tonnes)) %>% 
  ungroup()

# # write out for troubleshooting, bookkeeping
# orn_tonnes_int <- orn_tonnes %>% mutate(year = scen_year_number)
# write_csv(orn_tonnes_int, here(current_np_dir, "int", "orn_tonnes.csv"))
  
## Read in seaweed tonnes data
sw_fill_df <- region_product %>%
  filter(product == "seaweeds")

sw_tonnes_raw <- read_csv(here(current_np_dir, "int", "np_seaweeds_tonnes_weighting.csv")) 

sw_tonnes <- sw_tonnes_raw %>%
  mutate(product = "seaweeds") %>%
  group_by(rgn_id, year, product) %>% # per region, year, and product,
  summarise(tonnes = sum(tonnes, na.rm = TRUE)) %>% # sum across all species of seaweed
  dplyr::filter(year %in% years) %>% # filter to 5 year range 
  full_join(sw_fill_df, by = c("rgn_id", "year", "product")) %>%
    mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes)) %>% # gapfill the NAs to be 0
  dplyr::select(rgn_id, year, product, tonnes) %>%
  ungroup() %>%
  group_by(rgn_id, product) %>%
  summarise(tonnes = mean(tonnes)) %>% ## calculate 5 year average
  ungroup()


## write out for troubleshooting, bookkeeping
# sw_tonnes_int <- sw_tonnes %>% mutate(year = scen_year_number)
# write_csv(sw_tonnes_int, here(current_np_dir, "int", "sw_tonnes.csv"))

Now we will calculate the weights per each product. To do this we need to multiply our average $ value for each product * tonnes of each product, and then divide by the total per each region. We will also assign year = (YYYY, e.g., 2022) so that this can be read into OHI-global (YYYY corresponds to the year that these weights were calculated).

harvest_weighting_values <- read_csv(here(current_np_dir, "int", "harvest_weighting_values.csv")) %>%
    dplyr::select(rgn_id, product, value_per_tonne)

prod_weights <- orn_tonnes %>%
    bind_rows(sw_tonnes) %>%
    bind_rows(fofm_tonnes) %>%
    left_join(harvest_weighting_values, by = c("rgn_id", "product")) %>%
    arrange(rgn_id) %>%
    mutate(usd_product = tonnes * value_per_tonne) %>%
    group_by(rgn_id) %>%
    mutate(total_usd = sum(usd_product)) %>%
    ungroup() %>%
    # calculate weight: proportion that each product contributes to that
    # region's NP
mutate(weight = usd_product/total_usd) %>%
    dplyr::select(-3, -4, -5, -6) %>%
    mutate(year = scen_year_number) %>%
    dplyr::select(rgn_id, year, product, weight) %>%
    filter(rgn_id != 213)

# write.csv(prod_weights, file.path(prep, 'output/np_product_weights.csv'),
# row.names = FALSE)
readr::write_csv(prod_weights, here(current_np_dir, "output", "np_product_weights.csv"))

Datacheck

Note: we’re not comparing data for a specific year, e.g., 2019, as we do in many data checks. The data processing that creates np_product_weights.csv fills the year column with the scenario year (e.g., 2024), which means that the previous year’s np_product_weights.csv and the current year’s weights cannot be compared by filtering to a specific year. Instead, we compare all weights for all regions and find the difference between this year’s weights and last year’s (called diff). We also plot the changes by having one year’s weights as x-value and the other year’s weights as y-values in a few scatterplots.

# Read in previous and current years' np_product_weights to compare
# old_prod_weights <- read_csv(file.path(prep, paste0('../v', prep_year-3,
# '/output/np_product_weights.csv'))) old_prod_weights <-
# read_csv(file.path(prep, paste0('../v', prep_year-2,
# '/output/np_product_weights.csv')))
old_prod_weights <- read_csv(here(previous_np_dir, "output", "np_product_weights.csv"))
prod_weights <- read_csv(here(current_np_dir, "output", "np_product_weights.csv"))

check <- prod_weights %>%
    rename(new_weight = "weight") %>%
    left_join(old_prod_weights, by = c("rgn_id", "product")) %>%
    mutate(diff = new_weight - weight) %>%
    left_join(rgns_eez)

# note -- these are across all years...

plot(check$new_weight, check$weight, main = "All Products")
abline(0, 1, col = "red")

check_sw <- check %>%
    filter(product == "seaweeds")

plot(check_sw$new_weight, check_sw$weight, main = "Seaweeds")
abline(0, 1, col = "red")

check_orn <- check %>%
    filter(product == "ornamentals") %>%
    mutate(difference = new_weight - weight)

plot(check_orn$new_weight, check_orn$weight, main = "Ornamentals")
abline(0, 1, col = "red")


check_fofm <- check %>%
    filter(product == "fish_oil")

plot(check_fofm$new_weight, check_fofm$weight, main = "Fish Oil / Fish Meal (FOFM)")
abline(0, 1, col = "red")  ## these change because of new SAUP fisheries catch data... 

# max(check$diff, na.rm = TRUE) min(check$diff, na.rm = TRUE)

top_10_diffs <- check %>%
    arrange(desc(abs(diff))) %>%
    head(n = 10)
top_10_diffs %>%
    relocate(diff, .after = "rgn_id")
# rgn 164 = Belize ~ had a 'seaweeds' weight of 0.00 in 2023, now has a weight
# of 0.98 ≈ 0.9845 diff (v2024) ~ had a 'fish_oil' weight of 1.00 in 2023, now
# has a weight of 0.015 ≈ -0.9845 diff (v2024) 155 = Tonga ~ had an
# 'ornamentals' weight of 0.97 in 2023, now has a weight of 0.00 ≈ -9.77
# (v2024) 153 = Cook Islands 11 = Marshall Islands

# v2024: for Cook Islands and Marshall Islands, almost feels like values for
# `ornamentals` and `fish_oil` got flipped?

test_belize <- harvest_tonnes_usd %>%
    filter(rgn_id == 164)

## check saint martin, Niue, Bonaire, Sint Eustasius - should decrease in
## production check djibouti - should increase in production

test <- harvest_tonnes_usd %>%
    filter(rgn_id == 46)

harvest_tonnes_usd_old <- read_csv(here(previous_np_dir, "int", "np_harvest_tonnes_usd.csv")) %>%
    filter(rgn_id != 213)

test2 <- harvest_tonnes_usd_old %>%
    filter(rgn_id == 46)

# test test2 v2024: Djibouti (rgn 46) does increase in production

# gf_orn <- read_csv(here(current_np_dir, 'output',
# 'np_ornamentals_harvest_tonnes_gf.csv'))

Check more previous years:

library(kableExtra)
# Read in previous 2 years' np_product_weights to compare old_prod_weights <-
# read_csv(file.path(prep, paste0('../v', prep_year-3,
# '/output/np_product_weights.csv'))) old_prod_weights <-
# read_csv(file.path(prep, paste0('../v', prep_year-2,
# '/output/np_product_weights.csv')))
old_prod_weights <- read_csv(here(previous_np_dir, "output", "np_product_weights.csv"))
older_prod_weights <- read_csv(here("globalprep", "np", "v2022", "output", "np_product_weights.csv"))

check_older <- older_prod_weights %>%
    rename(older_weight = "weight") %>%
    left_join(old_prod_weights, by = c("rgn_id", "product")) %>%
    mutate(diff = weight - older_weight) %>%
    left_join(rgns_eez)

plot(check_older$weight, check_older$older_weight, main = "v2023 vs v2022")
abline(0, 1, col = "red")

plot(check$new_weight, check$weight, main = "v2024 vs v2023")
abline(0, 1, col = "red")

# show diff
options(scipen = 999)

top_10_older_diffs <- check_older %>%
    arrange(desc(abs(diff))) %>%
    relocate(diff, .after = product) %>%
    relocate(year.x, .before = older_weight) %>%
    head(n = 10)

top_10_older_diffs

top_10_diffs <- check %>%
    arrange(desc(abs(diff))) %>%
    head(n = 10) %>%
    relocate(diff, .after = "rgn_id")

top_10_diffs

check_older_prep <- check_older %>%
    rename(previous_diff = diff, weight_2023 = weight, weight_2022 = older_weight) %>%
    dplyr::select(rgn_id, rgn_name, product, previous_diff, weight_2023, weight_2022) %>%
    arrange(desc(abs(previous_diff)))


# Create combined table with all differences
combined_check <- check %>%
    rename(weight_2023 = weight, weight_2024 = new_weight) %>%
    dplyr::select(rgn_id, rgn_name, product, diff, weight_2024, weight_2023) %>%
    left_join(check_older_prep, by = c("rgn_id", "rgn_name", "product", "weight_2023")) %>%
    relocate(diff, .after = "product") %>%
    mutate(two_yr_diff = weight_2024 - weight_2022) %>%
    relocate(c(previous_diff, two_yr_diff), .after = "diff")

# Set and format diff table ---- sort
diff_tbl_full <- combined_check %>%
    arrange(desc(abs(diff)), desc(abs(previous_diff))) %>%
    head(n = 10)
# style
diff_tbl_styled <- diff_tbl_full %>%
    kableExtra::kbl() %>%
    kableExtra::kable_styling(bootstrap_options = "striped", full_width = F, position = "left",
        fixed_thead = T) %>%
    kableExtra::column_spec(4, color = "white", background = kableExtra::spec_color(diff_tbl_full$diff[1:10],
        end = 0.7)) %>%
    kableExtra::column_spec(c(7, 8), bold = T, border_left = T, border_right = T)  #%>% 
# kableExtra::column_spec(2, bold = T)


# Set and format two yr diff table ---- sort
two_yr_diff_tbl <- combined_check %>%
    arrange(desc(abs(two_yr_diff))) %>%
    head(n = 10)

# style
two_yr_diff_tbl_styled <- two_yr_diff_tbl %>%
    kableExtra::kbl() %>%
    kableExtra::kable_styling(bootstrap_options = "striped", full_width = F, position = "left",
        fixed_thead = T) %>%
    kableExtra::column_spec(6, color = "white", background = kableExtra::spec_color(two_yr_diff_tbl$two_yr_diff[1:10],
        end = 0.7)) %>%
    kableExtra::column_spec(c(7, 9), border_left = T, border_right = T, bold = T)

# view tables
diff_tbl_styled
two_yr_diff_tbl_styled

When looking at the two-year-difference trends, the most notable differences appear to be in Belize seaweeds – which went from a consistent two year streak of 0.00 to 0.98, indicating that seaweeds now make up 98% of Belize’s Natural Products. Similarly, the ornamentals weight in Belize dropped from 85 in 2022 to 0 in 2023, and remains at 0 in 2024. Tonga’s ornamentals category also saw a dramatic shift from two years of weights around 98, to 0 in 2024. The other notable change when considering the two year trends is in Tonga’s fish_oil weight, which has increased from 0.016 in 2022 to 0.023 in 2023, to now a staggering 0.71. These variations are likely due to a range of impacts, including changes in consumer demand trends, fishery and aquaculture status, and the fact that the reference point is a moving 5-year average, rather than a fixed point.

Compare regions of interest

in v2024, we noticed some extreme differences in Belize (rgn_id 164) and Tonga (rgn_id 155). The code below was used to investigate these differences in upstream data.

library(readr)
library(dplyr)
library(tidyr)
library(here)

harvest_tonnes_usd_old <- read_csv(here(previous_np_dir, "int", "np_harvest_tonnes_usd.csv")) %>%
  filter(rgn_id != 213)

View(np_tonnes %>% filter(rgn_id == 208, year %in% years))

# ---- Belize ----
# current
belize_current_harv <- harvest_tonnes_usd %>%
  filter(rgn_id == 164,
         year %in% years)

# previous
belize_prev_harv <- harvest_tonnes_usd_old %>% 
  filter(rgn_id == 164,
         year %in% years)

# harvest_tonnes_usd_old # this object only contains FAO commodity data

# check fish_oil
#fofm_tonnes <- read_csv(here(current_np_dir, "int", "mean_catch_FOFM.csv")) %>%
# read in old data
fofm_tonnes_old_full <- read_csv(here(previous_np_dir, "int", "mean_catch_FOFM.csv")) 
fofm_tonnes_current_full <- read_csv(here(current_np_dir, "int", "mean_catch_FOFM.csv"))

# prep old data how we prepped current data earlier
fofm_tonnes_old <- fofm_tonnes_old_full %>% 
  dplyr::filter(year %in% 2017:2019) %>% # v2024 range of 2001:2019 to 2018:2019
  mutate(product = "fish_oil") %>%
  group_by(rgn_id, year, product) %>%
  # Note: multiply by 0.3 to account for water loss when converting fish to fish oil.. 
  ## about 30% of fish are water and 70% are fish oil
  summarise(tonnes = sum(catch) * 0.3) %>% 
  ungroup() %>%
  group_by(rgn_id, product) %>%
  summarise(tonnes = mean(tonnes)) %>% 
  ungroup() %>%
  full_join(rgns_eez) %>%
  dplyr::select(rgn_id, product, tonnes) %>%
  mutate(product = "fish_oil") %>%
  mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes))



fofm_tonnes %>% filter(rgn_id == 164)
# 3.63 tonnes... fish_oil appears as 0 in the harvest_tonnes_usd because we use a different data source for it (SAUP)

fofm_tonnes_old %>% filter(rgn_id == 164)
#fofm_tonnes_current %>% filter(rgn_id == 164)
#  3.65

# check seaweeds
sw_tonnes %>% filter(rgn_id == 164)
# 7.56 tonnes

sw_tonnes_old_raw <- read_csv(here(previous_np_dir, "int", "np_seaweeds_tonnes_weighting.csv")) 

sw_tonnes_old <- sw_tonnes_old_raw %>%
  mutate(product = "seaweeds") %>%
  group_by(rgn_id, year, product) %>% # per region, year, and product,
  summarise(tonnes = sum(tonnes, na.rm = TRUE)) %>% # sum across all species of seaweed
  dplyr::filter(year %in% years) %>% # filter to 5 year range 
  full_join(sw_fill_df, by = c("rgn_id", "year", "product")) %>%
    mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes)) %>% ## gapfill the NAs to be 0
  dplyr::select(rgn_id, year, product, tonnes) %>%
  ungroup() %>%
  group_by(rgn_id, product) %>%
  summarise(tonnes = mean(tonnes)) %>% ## calculate 5 year average
  ungroup()

sw_tonnes_old %>% filter(rgn_id == 164)
# 0 tonnes!

# ornamentals -- still 0

# overall, these number make sense and can be explained by the introduction of seaweed farming in

# ---- Tonga ----
# current
tonga_current_harv <- harvest_tonnes_usd %>%
  filter(rgn_id == 155,
         year %in% years)
tonga_current_harv
# seaweeds and ornamentals are both 0 in 2022.

# previous
tonga_prev_harv <- harvest_tonnes_usd_old %>% 
  filter(rgn_id == 155,
         year %in% years)
tonga_prev_harv
# note: the current harvest data shows very different values for 
# seaweed and ornamental values from previous years' data for 2018-2021
# example:
# Tonga (rgn_id 155), year == 2021
#       in v2023's data, seaweeds have a value of 100.58 tonnes and 147 usd, 
# while in v2024's data, seaweeds have a value of 23.25 tonnes and 32.75 usd

plotly version of old/standard base plots:

library(plotly)

test_comp <- check_older %>% 
  plotly::plot_ly(type = "scatter", mode = "markers") %>% 
  # add points
  add_trace(
    x = ~older_weight, y = ~weight,
    text = ~paste("<b>Region ID</b>:", rgn_id), #, "<br><b>Product</b>:", product),
    marker = list(
      color = 'rgba(0,0,0,0.2)', # adds opacity as fourth argument
      line = list(
        color = 'rgba(0,0,0,1)',
        width = 1.5)
    ),
    hovertemplate = paste(
      '%{text}',
      # '<b>Product</b>: %{color}',
      '<br><b>Old Weight</b>: %{x:.2f}',
      '<br><b>New Weight</b>: %{y:.2f}<br>'
    ),
    showlegend = FALSE,
    hoverinfo = "none"
  ) %>% 
  # add AB line
  add_trace(
    x = c(0,1),
    y = c(0,1),
    mode = "lines",
    line = list(color = "red"),
    showlegend = FALSE,
    hoverinfo = "none"
  ) %>%
  
  layout(xaxis = list(hoverformat = '.3f'),
         yaxis = list(hoverformat = '.3f')) #%>% 
# style(hoverinfo = "none")

test_comp

RAM lost some stocks, so some stocks get removed in v4.65 (2024 data download); also got some new stocks

New datacheck: interactive plot

# ======== Setup ==========================================================
# (mostly copied from earlier setup with addition of library(plotly))
# Read in packages necessary for visualization (in case you cleared your environment)
library(ggplot2)
library(plotly)
library(dplyr)
library(tidyr)
library(here)

# Set scenario year, reproducible file paths
scen_year_number <- 2024 # update this!!
scenario_year <- as.character(scen_year_number)
v_scen_year <- paste0("v", scenario_year)

prev_year_number <- scen_year_number - 1
prev_2_year_num <- scen_year_number - 2
v_prev_year <- paste0("v", prev_year_number)
v_prev_2_year <- paste0("v", prev_2_year_num)

current_np_dir <- here::here("globalprep", "np", v_scen_year)
previous_np_dir <- here::here("globalprep", "np", paste0("v", scen_year_number - 1))

# ----- Read in NP weights data  -----
# Read in current, previous, and 2 years previous data
old_prod_weights <- read_csv(here(previous_np_dir, "output", "np_product_weights.csv"))
prod_weights <- read_csv(here(current_np_dir, "output", "np_product_weights.csv"))

check <- prod_weights %>%
  rename("new_weight" = "weight") %>%
  left_join(old_prod_weights, by = c("rgn_id", "product")) %>%
  mutate(diff = new_weight - weight) %>%
  left_join(rgns_eez)


#old_prod_weights <- read_csv(here(previous_np_dir, "output", "np_product_weights.csv"))
older_prod_weights <- read_csv(here("globalprep", "np",
                                    paste0("v", scen_year_number - 2),
                                    "output", "np_product_weights.csv"))

check_older <- older_prod_weights %>%
  rename("older_weight" = "weight") %>%
  left_join(old_prod_weights, by = c("rgn_id", "product")) %>%
  mutate(diff = weight - older_weight) %>%
  left_join(rgns_eez)


# ======== Plot new NP weights data vs. previous year's data ==================


# ---- Create ggplot object with `text` aes argument to prep for plotly ----- #
g_new <- ggplot(check, aes(x = weight, y = new_weight, color = product,
                             text = paste("<b>Region ID:</b>", rgn_id,
                                          "<br><b>Product:</b>", product #,
                                          #"<br>Old weight:", older_weight,
                                          # "<br>New weight:", weight
                                          ))) +
  geom_point(alpha = 0.8) +
  geom_abline(intercept = 0, slope = 1, alpha = 0.7,
              color = "red", linetype = "solid") +
  scale_color_manual(values = c("fish_oil" = "#edae49",
                                "seaweeds" = "#386641",
                                "ornamentals" = "#83c5be")) +
  labs(title = paste(v_scen_year, "vs", v_prev_year, "Natural Product Weights by Product"),
       x = paste(v_prev_year, "Weights"),
       y = paste(v_scen_year, "Weights"),
       color = "Product") +
  theme_minimal()

# ---- Convert to plotly ------------------------------ #
p_new <- ggplotly(g_new, tooltip = "text") %>%
  layout(
    hovermode = "closest",
    xaxis = list(hoverformat = ".4f"), # customize number formatting 
    yaxis = list(hoverformat = ".4f") # (number of places after decimal) 
  )

# ------- Create custom hover template ---------------- #
p_new$x$data <- lapply(p_new$x$data, function(trace) {
  if (trace$mode == "markers") {
    trace$hovertemplate <- paste(
      "%{text}<br>",
      "<b>Old Weight</b>: %{x:.4f}<br>",
      "<b>New Weight</b>: %{y:.4f}<br>"
      )
  }
  return(trace)
})

# Display customized plot
p_new


# ========= Plot old vs. older NP weights data ================================
# Plot previous year's NP weights data vs. year before previous year's data 

#color_scale <- c("fish_oils" = "#edae49", "seaweeds" = "#00798c", "ornamentals" = "#d1495b")

# ---- Create ggplot object with added `text` field to prep for plotly ------ #
g_old <- ggplot(check_older, aes(
  x = older_weight, y = weight, color = product,
  text = paste("<b>Region ID:</b>", rgn_id, "<br><b>Product:</b>", product #,
               #"<br>Old weight:", older_weight, "<br>New weight:", weight
               )
  )) +
  # add scatterplot points
  geom_point(alpha = 0.8) +
  # add ab line
  geom_abline(intercept = 0, slope = 1, alpha = 0.7,
              color = "red", linetype = "solid") +
  # customize colors
  scale_color_manual(values = c("fish_oil" = "#edae49",
                                "seaweeds" = "#386641",
                                "ornamentals" = "#83c5be")) +
  # update labels
  labs(title = paste(v_prev_year, "vs", v_prev_2_year, "Natural Products Weights by Product"),
       x = paste(v_prev_2_year, "Weights"),
       y = paste(v_prev_year, "Weights"),
       color = "Product") +
  # set base theme
  theme_minimal()

# ---- Convert to plotly ------------------------------ #
p_old <- ggplotly(g_old, tooltip = "text") %>%
  layout(
    hovermode = "closest",
    xaxis = list(hoverformat = ".4f"), # customize number formatting 
    yaxis = list(hoverformat = ".4f")
  )

# ------- Custom hover template ----------------------- #
p_old$x$data <- lapply(p_old$x$data, function(trace) {
  if (trace$mode == "markers") {
    trace$hovertemplate <- paste(
      "%{text}<br>",
      "<b>Old Weight</b>: %{x:.4f}<br>",
      "<b>New Weight</b>: %{y:.4f}<br>"
      )
  }
  return(trace)
})

# Display customized plot!
p_old

OHI 2024: Natural Products: weighting data prep

Compiled on Wed Aug 28 20:11:38 2024 by ramji