[REFERENCE RMD FILE: http://ohi-science.org/ohiprep_v2021/globalprep/np/v2021/STEP2_np_weighting_prep.html]
This analysis converts FAO commodities tonnes data for ornamental fish trade, SAUP fisheries tonnes data for FOFM, and FAO mariculture tonnes data for seaweeds into a weighting scheme per each ohi region, based off of 5 year averages of tonnes and values (in usd) for each product. This weighting scheme will be applied to the scores to determine how much of each natural products layer affects the overall NP score when updating in ohi global.
updated syntax to follow tidyverse style when reasonable; updated
functions read.csv()
and write.csv()
to
read_csv()
and write_csv()
updated file paths to be more reproducible, used
here()
package and updated syntax to reduce
system-dependent file paths
updated full_join(by = character())
superseded
method to cross_join()
when creating rgn_id x products x
years dataframe
added optional code to write out tonnes
objects
created in weight calculation prep (fofm_tonnes.csv
,
sw_tonnes.csv
, and orn_tonnes.csv
) to the
int
folder
added extensive datacheck section, plotting
Reference:
https://www.fao.org/fishery/en/statistics/software/fishstatj
App release date: March 2024
FAO raw commodities quantity 1976_2021 FAO raw commodities value 1976_2022 FAO metadata
Downloaded: July, 26, 2024
Description: Quantity (tonnes) and value (USD) of raw commodities (Exports only) for each country, taxa, year. The FAO data is subset to include commodities in these categories: ornamental fish, fish oil, seaweed and plants (see: raw/commodities2products_weighting.csv for details).
Time range: 1976-2022
Reference: https://www.seafoodwatch.org/globalassets/sfw/pdf/whats-new/seafood-watch-complete-recommendation-list.pdf Release date: March 4, 2023
Downloaded: May 31, 2023
Description: Monterrey Bay Aquarium Seafood Watch aquaculture recommendations. Sustainability scored from 0-10. Rescaled to 0-1. There is only one value for seaweeds in the data… 0.67
Reference: Pauly D., Zeller D., Palomares M.L.D. (Editors), 2020. Sea Around Us Concepts, Design and Data (seaaroundus.org).
Downloaded: September 27, 2022
Description: Tons per year and SAUP region with information on sector type, industry type, fishing entitity, reporting status and taxonomic information.
Time range: 1950 - 2019
Format: CSV
Additional Information: Methods
Reference: Froehlich, H.E., Jacobsen, N.S., Essington, T.E., Clavelle, T., and Halpern, B.S. (2018). Avoiding the ecological limits of forage fish for fed aquaculture. Nature Sustainability 1, 298.
Downloaded: July 7, 2020. Obtained from Melanie Frazier (NCEAS).
Description: List of FOFM species from Watson v3 data.
Native data resolution:
Format: CSV format
Reference: RAM Legacy Stock Assessment Database v4.65, 06/17/2024
Downloaded: 08/07/2024
Description: B/Bmsy value by stock and year (other data, which we do not use, are also available in the database)
Native data resolution: stock (fish stock, species and region specific)
Time range: 1800 - 2023
Format: CSV format
Additional Information: We use the finalized b/bmsy layer from OHI-global for this data prep. We do not actually read in the raw RAM data here.
Reference:
https://www.fao.org/fishery/statistics-query/en/aquaculture/aquaculture_quantity
FAO Global Aquaculture Production Quantity 1950_2022 FAO metadata found
here
Downloaded: July 26th, 2024
Last updated: March 29th, 2024
Description: Quantity (tonnes) of mariculture for each country, species, year.
Time range: 1950-2022
library(formatR) # for wrapping code when rendered
knitr::opts_chunk$set(eval=FALSE, tidy.opts = list(width.cutoff = 80), tidy = TRUE)
# Read in packages ----
library(here)
library(tidyverse)
# Set scenario year, reproducible file paths ----
scen_year_number <- 2024 # update this!!
scenario_year <- as.character(scen_year_number)
v_scen_year <- paste0("v", scenario_year)
# define file paths to simplify read and write operations
current_np_dir <- here::here("globalprep", "np", v_scen_year)
previous_np_dir <- here::here("globalprep", "np", paste0("v", scen_year_number - 1))
# Source scripts/functions ----
# directory locations
source(here::here("workflow", "R", "common.R"))
## This file makes it easier to process data for the OHI global assessment
## by creating the following objects:
##
## * dir_M = identifies correct file path to Mazu (internal server) based on your operating system
## * mollCRS = the crs code for the mollweide coordinate reference system we use in the global assessment
## * regions_shape() = function to load global shapefile for land/eez/high seas/antarctica regions
## * ohi_rasters() = function to load two rasters: global eez regions and ocean region
## * region_data() = function to load 2 dataframes describing global regions
## * rgn_syns() = function to load dataframe of region synonyms (used to convert country names to OHI regions)
## * low_pop() = function to load dataframe of regions with low and no human population
## * UNgeorgn = function to load dataframe of UN geopolitical designations used to gapfill missing data
Import prepped FAO commodities data set and find 5 year average $USD / Tonne average value per product per region. Gapfill these based on the UN geopolitical region/global values.
# Read in the prepped FAO commodities data set (created on line 453 in step 1a)
harvest_tonnes_usd <- read_csv(here(current_np_dir, "int", "np_harvest_tonnes_usd.csv")) %>%
filter(rgn_id != 213) # filter out Antarctica
# Make a data frame with every region and every product ----
# Define years
max_year <- max(harvest_tonnes_usd$year)
min_year <- max_year - 4
years <- c(min_year:max_year) # get 5 year range
years_df <- data.frame(year = min_year:max_year) # so that we can get 5 year average
# Load OHI region data from common.R
region_data() # gets rgns_eez
# Define subset of OHI EEZ regions
rgns_eez <- rgns_eez %>%
dplyr::select(rgn_id) %>%
filter(rgn_id != 213) # filter out Antarctica
# Define products dataframe
products <- data.frame(product = c("seaweeds", "ornamentals", "fish_oil"))
# Create full rgn x product x years dataframe
region_product <- rgns_eez %>%
# select rgn_id from rgns_eez
dplyr::select(rgn_id) %>%
# cross join with products (match every row in x with every row in y)
cross_join(products) %>%
# cross join with years
cross_join(years_df)
# Archived method for joining rgns, products, and years dfs
# region_product <- full_join(rgns_eez, products, by = character()) %>%
# full_join(years_df, by = character())
# Calculate $/tonne for each OHI region and product and gapfill based on UN geopolitical region ----
harvest_tonnes_usd_geo_rgn <- harvest_tonnes_usd %>%
full_join(region_product, by = join_by("rgn_id", "product", "year")) %>%
add_georegion_id() %>%
dplyr::filter(year %in% years) %>%
arrange(rgn_id)
# calculate georgn values (for each region, year, and product)
geo_rgn_values_df <- harvest_tonnes_usd_geo_rgn %>%
dplyr::group_by(georgn_id, year, product) %>%
# calculate values ($/tonnes)
dplyr::mutate(values = usd/tonnes) %>%
mutate(values = ifelse(values == "Inf", NA, values)) %>% ## assign Inf as NA
summarise(georgn_values = mean(values, na.rm = TRUE)) %>%
ungroup() ## now we have a 5 year average value for geo region
# calculate global values (for each product across all regions for each year)
global_values_df <- harvest_tonnes_usd_geo_rgn %>%
dplyr::group_by(product, year) %>%
mutate(values = usd/tonnes) %>%
mutate(values = ifelse(values == "Inf", NA, values)) %>% ## assign Inf as NA
summarise(global_values = mean(values, na.rm = TRUE)) ## now we have a 5 year average global value
# Create final gapfilled data frame
harvest_tonnes_usd_values_gf <- harvest_tonnes_usd_geo_rgn %>%
left_join(geo_rgn_values_df, by = join_by("georgn_id", "product", "year")) %>%
left_join(global_values_df, by = join_by("product", "year")) %>%
mutate(values = usd/tonnes) %>%
mutate(values = ifelse(values == "Inf", NA,
# assign Inf (n/0) and NaNs (0/0) as NA
ifelse(values == "NaN", NA, values))) %>%
# gapfill so that when there is no regional value, take the geo regional value,
mutate(values_final = ifelse(is.na(values) & georgn_values != "NaN", georgn_values,
# and if there is no geo regional value, take the global value; else, keep original values.
ifelse(is.na(values) & georgn_values == "NaN", global_values, values))) %>%
# make note of gapfilling if any occurred in the previous step
mutate(values_gf_description = ifelse(is.na(values) & georgn_values != "NaN", "geo_region",
ifelse(is.na(values) & georgn_values == "NaN", "global", "none"))) %>%
mutate(values_gf = ifelse(is.na(values) & georgn_values != "NaN", 1,
ifelse(is.na(values) & georgn_values == "NaN", 1, 0)))
# now we have a dataset with gap-filled average values and calculated yearly values...
# we will save this and then calculate the 5 year averages
# Save this data frame to int
readr::write_csv(harvest_tonnes_usd_values_gf,
here(current_np_dir, "int", "harvest_tonnes_usd_weighting_gf.csv"))
# Now calculate 5 year averages (and assign year = 2020 <- legacy comment, no longer what the code does)
weighting_usd_values_final <- harvest_tonnes_usd_values_gf %>%
dplyr::select(rgn_id, product, year, values_final) %>%
group_by(rgn_id, product) %>%
summarise(value_per_tonne = mean(values_final)) %>%
ungroup()
#save this df
readr::write_csv(weighting_usd_values_final,
here(current_np_dir, "int", "harvest_weighting_values.csv"))
Now prep the tonnes data from each product to be per each region for
the 6 year average (2013-2018). (<– legacy note. As of v2024, this is
a 5 year range (from 2018:2022), however there are data limitations from
the Sea Around Us catch data, which only goes until 2019 (2001-2019). It
was last downloaded in 2022, with an end data year of 2019, and hasn’t
been updated since. This impacts the Fish Oil/Fish Meal (FOFM)
intermediate data product mean_catch_FOFM.csv
used to
create the fofm_tonnes
object below.)
## Read in FOFM tonnes data
fofm_tonnes <- read_csv(here(current_np_dir, "int", "mean_catch_FOFM.csv")) %>%
dplyr::filter(year %in% years) %>% # v2024 range of 2001:2019 to 2018:2019
mutate(product = "fish_oil") %>%
group_by(rgn_id, year, product) %>%
# Note: multiply by 0.3 to account for water loss when converting fish to fish oil..
## about 30% of fish are water and 70% are fish oil
summarise(tonnes = sum(catch) * 0.3) %>%
ungroup() %>%
group_by(rgn_id, product) %>%
# calculate 5 year mean of catch --
## note: in v2024, there were only 2 years in the former year column:
## 2018 and 2019 -- could be due to data year cap of 2019 in SAUP
## (downloaded in 2022 with data up to year 2019, hasn't been updated since)
summarise(tonnes = mean(tonnes)) %>%
ungroup() %>%
full_join(rgns_eez) %>%
dplyr::select(rgn_id, product, tonnes) %>%
mutate(product = "fish_oil") %>%
mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes))
# # write out for troubleshooting, bookkeeping
# fofm_int <- fofm_tonnes %>% mutate(year = scen_year_number)
# write_csv(fofm_int, here(current_np_dir, "int", "fofm_tonnes.csv"))
## Read in ornamentals tonnes data
np_tonnes <- read_csv(here(current_np_dir, "int", "np_harvest_tonnes_usd.csv")) %>%
filter(rgn_id != 213) # has year range of 1976:2022 in v2024
# this object is equivalent to the harvest_tonnes_usd object created in the previous chunk
orn_fill_df <- region_product %>% # has years from 2018:2022 in v2024 (proper 5 year range)
filter(product == "ornamentals")
orn_tonnes <- np_tonnes %>%
# filter to 5 year range defined in the beginning of the first methods chunk
dplyr::filter(year %in% years,
product == "ornamentals") %>%
# full join to fill in the rest of the regions with 0
full_join(orn_fill_df, by = c("rgn_id", "year", "product")) %>%
# gapfill the NAs to be 0
mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes),
usd = ifelse(is.na(usd), 0, usd)) %>%
dplyr::select(rgn_id, year, product, tonnes) %>%
group_by(rgn_id, product) %>%
# calculate 5 year mean of catch
summarise(tonnes = mean(tonnes)) %>%
ungroup()
# # write out for troubleshooting, bookkeeping
# orn_tonnes_int <- orn_tonnes %>% mutate(year = scen_year_number)
# write_csv(orn_tonnes_int, here(current_np_dir, "int", "orn_tonnes.csv"))
## Read in seaweed tonnes data
sw_fill_df <- region_product %>%
filter(product == "seaweeds")
sw_tonnes_raw <- read_csv(here(current_np_dir, "int", "np_seaweeds_tonnes_weighting.csv"))
sw_tonnes <- sw_tonnes_raw %>%
mutate(product = "seaweeds") %>%
group_by(rgn_id, year, product) %>% # per region, year, and product,
summarise(tonnes = sum(tonnes, na.rm = TRUE)) %>% # sum across all species of seaweed
dplyr::filter(year %in% years) %>% # filter to 5 year range
full_join(sw_fill_df, by = c("rgn_id", "year", "product")) %>%
mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes)) %>% # gapfill the NAs to be 0
dplyr::select(rgn_id, year, product, tonnes) %>%
ungroup() %>%
group_by(rgn_id, product) %>%
summarise(tonnes = mean(tonnes)) %>% ## calculate 5 year average
ungroup()
## write out for troubleshooting, bookkeeping
# sw_tonnes_int <- sw_tonnes %>% mutate(year = scen_year_number)
# write_csv(sw_tonnes_int, here(current_np_dir, "int", "sw_tonnes.csv"))
Now we will calculate the weights per each product. To do this we need to multiply our average $ value for each product * tonnes of each product, and then divide by the total per each region. We will also assign year = (YYYY, e.g., 2022) so that this can be read into OHI-global (YYYY corresponds to the year that these weights were calculated).
harvest_weighting_values <- read_csv(here(current_np_dir, "int", "harvest_weighting_values.csv")) %>%
dplyr::select(rgn_id, product, value_per_tonne)
prod_weights <- orn_tonnes %>%
bind_rows(sw_tonnes) %>%
bind_rows(fofm_tonnes) %>%
left_join(harvest_weighting_values, by = c("rgn_id", "product")) %>%
arrange(rgn_id) %>%
mutate(usd_product = tonnes * value_per_tonne) %>%
group_by(rgn_id) %>%
mutate(total_usd = sum(usd_product)) %>%
ungroup() %>%
# calculate weight: proportion that each product contributes to that
# region's NP
mutate(weight = usd_product/total_usd) %>%
dplyr::select(-3, -4, -5, -6) %>%
mutate(year = scen_year_number) %>%
dplyr::select(rgn_id, year, product, weight) %>%
filter(rgn_id != 213)
# write.csv(prod_weights, file.path(prep, 'output/np_product_weights.csv'),
# row.names = FALSE)
readr::write_csv(prod_weights, here(current_np_dir, "output", "np_product_weights.csv"))
Note: we’re not comparing data for a specific year, e.g., 2019, as we
do in many data checks. The data processing that creates
np_product_weights.csv
fills the year
column
with the scenario year (e.g., 2024), which means that the previous
year’s np_product_weights.csv
and the current year’s
weights cannot be compared by filtering to a specific year. Instead, we
compare all weights for all regions and find the difference between this
year’s weights and last year’s (called diff
). We also plot
the changes by having one year’s weights as x-value and the other year’s
weights as y-values in a few scatterplots.
# Read in previous and current years' np_product_weights to compare
# old_prod_weights <- read_csv(file.path(prep, paste0('../v', prep_year-3,
# '/output/np_product_weights.csv'))) old_prod_weights <-
# read_csv(file.path(prep, paste0('../v', prep_year-2,
# '/output/np_product_weights.csv')))
old_prod_weights <- read_csv(here(previous_np_dir, "output", "np_product_weights.csv"))
prod_weights <- read_csv(here(current_np_dir, "output", "np_product_weights.csv"))
check <- prod_weights %>%
rename(new_weight = "weight") %>%
left_join(old_prod_weights, by = c("rgn_id", "product")) %>%
mutate(diff = new_weight - weight) %>%
left_join(rgns_eez)
# note -- these are across all years...
plot(check$new_weight, check$weight, main = "All Products")
abline(0, 1, col = "red")
check_sw <- check %>%
filter(product == "seaweeds")
plot(check_sw$new_weight, check_sw$weight, main = "Seaweeds")
abline(0, 1, col = "red")
check_orn <- check %>%
filter(product == "ornamentals") %>%
mutate(difference = new_weight - weight)
plot(check_orn$new_weight, check_orn$weight, main = "Ornamentals")
abline(0, 1, col = "red")
check_fofm <- check %>%
filter(product == "fish_oil")
plot(check_fofm$new_weight, check_fofm$weight, main = "Fish Oil / Fish Meal (FOFM)")
abline(0, 1, col = "red") ## these change because of new SAUP fisheries catch data...
# max(check$diff, na.rm = TRUE) min(check$diff, na.rm = TRUE)
top_10_diffs <- check %>%
arrange(desc(abs(diff))) %>%
head(n = 10)
top_10_diffs %>%
relocate(diff, .after = "rgn_id")
# rgn 164 = Belize ~ had a 'seaweeds' weight of 0.00 in 2023, now has a weight
# of 0.98 ≈ 0.9845 diff (v2024) ~ had a 'fish_oil' weight of 1.00 in 2023, now
# has a weight of 0.015 ≈ -0.9845 diff (v2024) 155 = Tonga ~ had an
# 'ornamentals' weight of 0.97 in 2023, now has a weight of 0.00 ≈ -9.77
# (v2024) 153 = Cook Islands 11 = Marshall Islands
# v2024: for Cook Islands and Marshall Islands, almost feels like values for
# `ornamentals` and `fish_oil` got flipped?
test_belize <- harvest_tonnes_usd %>%
filter(rgn_id == 164)
## check saint martin, Niue, Bonaire, Sint Eustasius - should decrease in
## production check djibouti - should increase in production
test <- harvest_tonnes_usd %>%
filter(rgn_id == 46)
harvest_tonnes_usd_old <- read_csv(here(previous_np_dir, "int", "np_harvest_tonnes_usd.csv")) %>%
filter(rgn_id != 213)
test2 <- harvest_tonnes_usd_old %>%
filter(rgn_id == 46)
# test test2 v2024: Djibouti (rgn 46) does increase in production
# gf_orn <- read_csv(here(current_np_dir, 'output',
# 'np_ornamentals_harvest_tonnes_gf.csv'))
Check more previous years:
library(kableExtra)
# Read in previous 2 years' np_product_weights to compare old_prod_weights <-
# read_csv(file.path(prep, paste0('../v', prep_year-3,
# '/output/np_product_weights.csv'))) old_prod_weights <-
# read_csv(file.path(prep, paste0('../v', prep_year-2,
# '/output/np_product_weights.csv')))
old_prod_weights <- read_csv(here(previous_np_dir, "output", "np_product_weights.csv"))
older_prod_weights <- read_csv(here("globalprep", "np", "v2022", "output", "np_product_weights.csv"))
check_older <- older_prod_weights %>%
rename(older_weight = "weight") %>%
left_join(old_prod_weights, by = c("rgn_id", "product")) %>%
mutate(diff = weight - older_weight) %>%
left_join(rgns_eez)
plot(check_older$weight, check_older$older_weight, main = "v2023 vs v2022")
abline(0, 1, col = "red")
plot(check$new_weight, check$weight, main = "v2024 vs v2023")
abline(0, 1, col = "red")
# show diff
options(scipen = 999)
top_10_older_diffs <- check_older %>%
arrange(desc(abs(diff))) %>%
relocate(diff, .after = product) %>%
relocate(year.x, .before = older_weight) %>%
head(n = 10)
top_10_older_diffs
top_10_diffs <- check %>%
arrange(desc(abs(diff))) %>%
head(n = 10) %>%
relocate(diff, .after = "rgn_id")
top_10_diffs
check_older_prep <- check_older %>%
rename(previous_diff = diff, weight_2023 = weight, weight_2022 = older_weight) %>%
dplyr::select(rgn_id, rgn_name, product, previous_diff, weight_2023, weight_2022) %>%
arrange(desc(abs(previous_diff)))
# Create combined table with all differences
combined_check <- check %>%
rename(weight_2023 = weight, weight_2024 = new_weight) %>%
dplyr::select(rgn_id, rgn_name, product, diff, weight_2024, weight_2023) %>%
left_join(check_older_prep, by = c("rgn_id", "rgn_name", "product", "weight_2023")) %>%
relocate(diff, .after = "product") %>%
mutate(two_yr_diff = weight_2024 - weight_2022) %>%
relocate(c(previous_diff, two_yr_diff), .after = "diff")
# Set and format diff table ---- sort
diff_tbl_full <- combined_check %>%
arrange(desc(abs(diff)), desc(abs(previous_diff))) %>%
head(n = 10)
# style
diff_tbl_styled <- diff_tbl_full %>%
kableExtra::kbl() %>%
kableExtra::kable_styling(bootstrap_options = "striped", full_width = F, position = "left",
fixed_thead = T) %>%
kableExtra::column_spec(4, color = "white", background = kableExtra::spec_color(diff_tbl_full$diff[1:10],
end = 0.7)) %>%
kableExtra::column_spec(c(7, 8), bold = T, border_left = T, border_right = T) #%>%
# kableExtra::column_spec(2, bold = T)
# Set and format two yr diff table ---- sort
two_yr_diff_tbl <- combined_check %>%
arrange(desc(abs(two_yr_diff))) %>%
head(n = 10)
# style
two_yr_diff_tbl_styled <- two_yr_diff_tbl %>%
kableExtra::kbl() %>%
kableExtra::kable_styling(bootstrap_options = "striped", full_width = F, position = "left",
fixed_thead = T) %>%
kableExtra::column_spec(6, color = "white", background = kableExtra::spec_color(two_yr_diff_tbl$two_yr_diff[1:10],
end = 0.7)) %>%
kableExtra::column_spec(c(7, 9), border_left = T, border_right = T, bold = T)
# view tables
diff_tbl_styled
two_yr_diff_tbl_styled
When looking at the two-year-difference trends, the most notable
differences appear to be in Belize seaweeds
– which went
from a consistent two year streak of 0.00 to 0.98, indicating that
seaweeds
now make up 98% of Belize’s Natural Products.
Similarly, the ornamentals
weight in Belize dropped from 85
in 2022 to 0 in 2023, and remains at 0 in 2024. Tonga’s
ornamentals
category also saw a dramatic shift from two
years of weights around 98, to 0 in 2024. The other notable change when
considering the two year trends is in Tonga’s fish_oil
weight, which has increased from 0.016 in 2022 to 0.023 in 2023, to now
a staggering 0.71. These variations are likely due to a range of
impacts, including changes in consumer demand trends, fishery and
aquaculture status, and the fact that the reference point is a moving
5-year average, rather than a fixed point.
in v2024, we noticed some extreme differences in Belize (rgn_id 164) and Tonga (rgn_id 155). The code below was used to investigate these differences in upstream data.
library(readr)
library(dplyr)
library(tidyr)
library(here)
harvest_tonnes_usd_old <- read_csv(here(previous_np_dir, "int", "np_harvest_tonnes_usd.csv")) %>%
filter(rgn_id != 213)
View(np_tonnes %>% filter(rgn_id == 208, year %in% years))
# ---- Belize ----
# current
belize_current_harv <- harvest_tonnes_usd %>%
filter(rgn_id == 164,
year %in% years)
# previous
belize_prev_harv <- harvest_tonnes_usd_old %>%
filter(rgn_id == 164,
year %in% years)
# harvest_tonnes_usd_old # this object only contains FAO commodity data
# check fish_oil
#fofm_tonnes <- read_csv(here(current_np_dir, "int", "mean_catch_FOFM.csv")) %>%
# read in old data
fofm_tonnes_old_full <- read_csv(here(previous_np_dir, "int", "mean_catch_FOFM.csv"))
fofm_tonnes_current_full <- read_csv(here(current_np_dir, "int", "mean_catch_FOFM.csv"))
# prep old data how we prepped current data earlier
fofm_tonnes_old <- fofm_tonnes_old_full %>%
dplyr::filter(year %in% 2017:2019) %>% # v2024 range of 2001:2019 to 2018:2019
mutate(product = "fish_oil") %>%
group_by(rgn_id, year, product) %>%
# Note: multiply by 0.3 to account for water loss when converting fish to fish oil..
## about 30% of fish are water and 70% are fish oil
summarise(tonnes = sum(catch) * 0.3) %>%
ungroup() %>%
group_by(rgn_id, product) %>%
summarise(tonnes = mean(tonnes)) %>%
ungroup() %>%
full_join(rgns_eez) %>%
dplyr::select(rgn_id, product, tonnes) %>%
mutate(product = "fish_oil") %>%
mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes))
fofm_tonnes %>% filter(rgn_id == 164)
# 3.63 tonnes... fish_oil appears as 0 in the harvest_tonnes_usd because we use a different data source for it (SAUP)
fofm_tonnes_old %>% filter(rgn_id == 164)
#fofm_tonnes_current %>% filter(rgn_id == 164)
# 3.65
# check seaweeds
sw_tonnes %>% filter(rgn_id == 164)
# 7.56 tonnes
sw_tonnes_old_raw <- read_csv(here(previous_np_dir, "int", "np_seaweeds_tonnes_weighting.csv"))
sw_tonnes_old <- sw_tonnes_old_raw %>%
mutate(product = "seaweeds") %>%
group_by(rgn_id, year, product) %>% # per region, year, and product,
summarise(tonnes = sum(tonnes, na.rm = TRUE)) %>% # sum across all species of seaweed
dplyr::filter(year %in% years) %>% # filter to 5 year range
full_join(sw_fill_df, by = c("rgn_id", "year", "product")) %>%
mutate(tonnes = ifelse(is.na(tonnes), 0, tonnes)) %>% ## gapfill the NAs to be 0
dplyr::select(rgn_id, year, product, tonnes) %>%
ungroup() %>%
group_by(rgn_id, product) %>%
summarise(tonnes = mean(tonnes)) %>% ## calculate 5 year average
ungroup()
sw_tonnes_old %>% filter(rgn_id == 164)
# 0 tonnes!
# ornamentals -- still 0
# overall, these number make sense and can be explained by the introduction of seaweed farming in
# ---- Tonga ----
# current
tonga_current_harv <- harvest_tonnes_usd %>%
filter(rgn_id == 155,
year %in% years)
tonga_current_harv
# seaweeds and ornamentals are both 0 in 2022.
# previous
tonga_prev_harv <- harvest_tonnes_usd_old %>%
filter(rgn_id == 155,
year %in% years)
tonga_prev_harv
# note: the current harvest data shows very different values for
# seaweed and ornamental values from previous years' data for 2018-2021
# example:
# Tonga (rgn_id 155), year == 2021
# in v2023's data, seaweeds have a value of 100.58 tonnes and 147 usd,
# while in v2024's data, seaweeds have a value of 23.25 tonnes and 32.75 usd
plotly
version of old/standard base plots:
library(plotly)
test_comp <- check_older %>%
plotly::plot_ly(type = "scatter", mode = "markers") %>%
# add points
add_trace(
x = ~older_weight, y = ~weight,
text = ~paste("<b>Region ID</b>:", rgn_id), #, "<br><b>Product</b>:", product),
marker = list(
color = 'rgba(0,0,0,0.2)', # adds opacity as fourth argument
line = list(
color = 'rgba(0,0,0,1)',
width = 1.5)
),
hovertemplate = paste(
'%{text}',
# '<b>Product</b>: %{color}',
'<br><b>Old Weight</b>: %{x:.2f}',
'<br><b>New Weight</b>: %{y:.2f}<br>'
),
showlegend = FALSE,
hoverinfo = "none"
) %>%
# add AB line
add_trace(
x = c(0,1),
y = c(0,1),
mode = "lines",
line = list(color = "red"),
showlegend = FALSE,
hoverinfo = "none"
) %>%
layout(xaxis = list(hoverformat = '.3f'),
yaxis = list(hoverformat = '.3f')) #%>%
# style(hoverinfo = "none")
test_comp
# ======== Setup ==========================================================
# (mostly copied from earlier setup with addition of library(plotly))
# Read in packages necessary for visualization (in case you cleared your environment)
library(ggplot2)
library(plotly)
library(dplyr)
library(tidyr)
library(here)
# Set scenario year, reproducible file paths
scen_year_number <- 2024 # update this!!
scenario_year <- as.character(scen_year_number)
v_scen_year <- paste0("v", scenario_year)
prev_year_number <- scen_year_number - 1
prev_2_year_num <- scen_year_number - 2
v_prev_year <- paste0("v", prev_year_number)
v_prev_2_year <- paste0("v", prev_2_year_num)
current_np_dir <- here::here("globalprep", "np", v_scen_year)
previous_np_dir <- here::here("globalprep", "np", paste0("v", scen_year_number - 1))
# ----- Read in NP weights data -----
# Read in current, previous, and 2 years previous data
old_prod_weights <- read_csv(here(previous_np_dir, "output", "np_product_weights.csv"))
prod_weights <- read_csv(here(current_np_dir, "output", "np_product_weights.csv"))
check <- prod_weights %>%
rename("new_weight" = "weight") %>%
left_join(old_prod_weights, by = c("rgn_id", "product")) %>%
mutate(diff = new_weight - weight) %>%
left_join(rgns_eez)
#old_prod_weights <- read_csv(here(previous_np_dir, "output", "np_product_weights.csv"))
older_prod_weights <- read_csv(here("globalprep", "np",
paste0("v", scen_year_number - 2),
"output", "np_product_weights.csv"))
check_older <- older_prod_weights %>%
rename("older_weight" = "weight") %>%
left_join(old_prod_weights, by = c("rgn_id", "product")) %>%
mutate(diff = weight - older_weight) %>%
left_join(rgns_eez)
# ======== Plot new NP weights data vs. previous year's data ==================
# ---- Create ggplot object with `text` aes argument to prep for plotly ----- #
g_new <- ggplot(check, aes(x = weight, y = new_weight, color = product,
text = paste("<b>Region ID:</b>", rgn_id,
"<br><b>Product:</b>", product #,
#"<br>Old weight:", older_weight,
# "<br>New weight:", weight
))) +
geom_point(alpha = 0.8) +
geom_abline(intercept = 0, slope = 1, alpha = 0.7,
color = "red", linetype = "solid") +
scale_color_manual(values = c("fish_oil" = "#edae49",
"seaweeds" = "#386641",
"ornamentals" = "#83c5be")) +
labs(title = paste(v_scen_year, "vs", v_prev_year, "Natural Product Weights by Product"),
x = paste(v_prev_year, "Weights"),
y = paste(v_scen_year, "Weights"),
color = "Product") +
theme_minimal()
# ---- Convert to plotly ------------------------------ #
p_new <- ggplotly(g_new, tooltip = "text") %>%
layout(
hovermode = "closest",
xaxis = list(hoverformat = ".4f"), # customize number formatting
yaxis = list(hoverformat = ".4f") # (number of places after decimal)
)
# ------- Create custom hover template ---------------- #
p_new$x$data <- lapply(p_new$x$data, function(trace) {
if (trace$mode == "markers") {
trace$hovertemplate <- paste(
"%{text}<br>",
"<b>Old Weight</b>: %{x:.4f}<br>",
"<b>New Weight</b>: %{y:.4f}<br>"
)
}
return(trace)
})
# Display customized plot
p_new
# ========= Plot old vs. older NP weights data ================================
# Plot previous year's NP weights data vs. year before previous year's data
#color_scale <- c("fish_oils" = "#edae49", "seaweeds" = "#00798c", "ornamentals" = "#d1495b")
# ---- Create ggplot object with added `text` field to prep for plotly ------ #
g_old <- ggplot(check_older, aes(
x = older_weight, y = weight, color = product,
text = paste("<b>Region ID:</b>", rgn_id, "<br><b>Product:</b>", product #,
#"<br>Old weight:", older_weight, "<br>New weight:", weight
)
)) +
# add scatterplot points
geom_point(alpha = 0.8) +
# add ab line
geom_abline(intercept = 0, slope = 1, alpha = 0.7,
color = "red", linetype = "solid") +
# customize colors
scale_color_manual(values = c("fish_oil" = "#edae49",
"seaweeds" = "#386641",
"ornamentals" = "#83c5be")) +
# update labels
labs(title = paste(v_prev_year, "vs", v_prev_2_year, "Natural Products Weights by Product"),
x = paste(v_prev_2_year, "Weights"),
y = paste(v_prev_year, "Weights"),
color = "Product") +
# set base theme
theme_minimal()
# ---- Convert to plotly ------------------------------ #
p_old <- ggplotly(g_old, tooltip = "text") %>%
layout(
hovermode = "closest",
xaxis = list(hoverformat = ".4f"), # customize number formatting
yaxis = list(hoverformat = ".4f")
)
# ------- Custom hover template ----------------------- #
p_old$x$data <- lapply(p_old$x$data, function(trace) {
if (trace$mode == "markers") {
trace$hovertemplate <- paste(
"%{text}<br>",
"<b>Old Weight</b>: %{x:.4f}<br>",
"<b>New Weight</b>: %{y:.4f}<br>"
)
}
return(trace)
})
# Display customized plot!
p_old