This script prepares scores (status and trend) for Iconic Species in each global coastal region. For each iconic marine species, the countries of occurrence are pulled from the IUCN API. Extinction risk categories for each species are pulled based on current and past assessments; by tracking the assessed extinction risk over time, we can understand the trends of extinction risk for iconic species directly rather than using the “population trend” method from prior OHI assessments.
The Iconic Species sub-goal model calculates a region’s status based upon an unweighted average of species health for all ‘iconic’ species found within each reporting region.
From Halpern et al (2012):
Iconic species are those that are relevant to local cultural identity through a species’ relationship to one or more of the following: 1) traditional activities such as fishing, hunting or commerce; 2) local ethnic or religious practices; 3) existence value; and 4) locally-recognized aesthetic value (e.g., touristic attractions/common subjects for art such as whales). Habitat-forming species are not included in this definition of iconic species, nor are species that are harvested solely for economic or utilitarian purposes (even though they may be iconic to a sector or individual).
Ultimately, almost any species can be iconic to someone, and so the intent with this goal was to focus on those species widely seen as iconic within a country, and iconic from a cultural or existence value (rather than for a livelihoods or extractive reason).
The reference point is to have the risk status of all assessed species as Least Concern (i.e., a goal score = 1.0)
The Status of this sub-goal (XICO) is then the % of iconic species in each threat category (as defined by the IUCN Red List), such that:
\[X_{ICO} = \frac{\displaystyle\sum_{category}S_{cat}*w_{cat}}{\displaystyle\sum_{category}S_{cat}}\]
where for each IUCN threat category:
ICO trend is calculated in a similar manner, but weightings are assigned according to IUCN population trend: ‘Decreasing’ = -0.5, ‘Stable’ = 0.0, ‘Increasing’ = +0.5.
Additional year of IUCN Red List data added. Note that the Red List v3 API will be deprecated soon, in lieu of the v4 API.
Additional year of IUCN Red List data added.
The iconic species list was updated by adding species from Reyes-García et al. 2023.
List of iconic species: Original List created in initial 2012 OHI assessment. Additional marine species were added from Reyes-García et al. 2023 in v2023.
Species native country and status information:
Using the IUCN API, we accessed the full IUCN species list at http://apiv3.iucnredlist.org/api/v3/speciescount?token=
iucn_sid | kingdom | phylum | class | order | family | genus | sciname | population | category
Get all pages and bind into total species list, using the IUCN API.
## this is the api key that we need to use to access the files in the api
api_file <- here(dir_data, 'api_key_gc.csv')
api_key <- scan(api_file, what = 'character')
## file path to save species list that is scraped from the web.
spp_list_from_api_file <- here::here(dir_here, 'raw', 'spp_list_from_api.csv')
reload <- TRUE
## scrapes IUCN website to get data for species list and saves as int/spp_list_from_api.csv
if(!file.exists(spp_list_from_api_file) | reload) {
message('Using API to create full species list from scratch')
## determine the number of species on IUCN
## then determine the number of cores we will be using to get the data from the website.
spp_npage_url <- sprintf('http://apiv3.iucnredlist.org/api/v3/speciescount?token=%s', api_key)
n_spp <- jsonlite::fromJSON(spp_npage_url) %>%
.$count %>% as.integer()
n_pages <- ceiling(n_spp/10000)
## send to cores
spp_page_url <- 'http://apiv3.iucnredlist.org/api/v3/species/page/%s?token=%s'
## mc_get_from_api function is defined in the ico_fxn.R script
spp_df_all <- mc_get_from_api(spp_page_url, c(0:(n_pages - 1)), api_key)
spp_df_all <- spp_df_all %>%
dplyr::select(-infra_rank, -infra_name, -count, -page) %>%
dplyr::rename(iucn_sid = taxonid, sciname = scientific_name) %>%
setNames(names(.) %>% stringr::str_replace('_name', ''))
readr::write_csv(spp_df_all, spp_list_from_api_file)
} else {
message('reading file of API species list: \n ', spp_list_from_api_file)
spp_df_all <- readr::read_csv(spp_list_from_api_file)
}
# explore `spp_list_from_api.csv`
spp_df_all <- read_csv(here::here(dir_here, "raw","spp_list_from_api.csv"))
The list of Iconic Species is based upon the original ICO list generated in 2011, using species identified as globally iconic (WWF Flagship species and Priority species) or regionally iconic (based upon WWF regional/local priority species and nation-specific lists).
2023 update: The iconic species list was updated by adding species from Reyes-García et al. 2023. The list should be read directly from the raw data folder. This file will be copied from year to year unless further updates are made to the list.
# ico_list_raw_updated can be copied from previous year (v2023)
# (if there are any changes to the main list then this needs to be updated)
ico_list_raw <- read_csv(here("globalprep","ico", version_year, "raw","ico_list_raw_updated.csv"))
Filtering the complete IUCN species list to include only the
identified Iconic Species, we then use the IUCN API to access the list
of countries in which each species occurs, from http://apiv3.iucnredlist.org/api/v3/species/countries/id/
# read in the file we scraped that contains a species list and their status
spp_df_all <- here::here(dir_here, 'raw', 'spp_list_from_api.csv') %>%
readr::read_csv()
# Filtering the IUCN list of species to just those that are in the list of iconic species
spp_ico <- spp_df_all %>%
dplyr::filter(sciname %in% ico_list_raw$sciname)
# Is everything we see in the iconic list in the IUCN list?
spp_missing <- ico_list_raw %>%
dplyr::filter(!sciname %in% spp_ico$sciname) # v2024: no data missing
nrow(spp_missing) # v2024: 0
#add info from iucn redlist to iconic list
ico_list <- ico_list_raw %>%
dplyr::left_join(spp_ico %>%
dplyr::select(iucn_sid, sciname, subpop = population, cat = category),
by = 'sciname',relationship = "many-to-many") %>%
dplyr::filter(!is.na(iucn_sid))
#many to many relationship because there is more than one population of a single species and there is more than one of the same species in ico_list_raw, since it is often by individual region.
readr::write_csv(ico_list, here::here(dir_here, 'int', 'ico_list_prepped.csv')) # 270 in v2021, 275 in v2022, #5698 in 2023 (increase due to new species being region specific rather than global)
# v2024: 7612 observations, 127 unique species by sciname.
For each of these species, use the IUCN API to gather a list of countries in which it is present.
## saves each species as a temp file in a new tmp folder then binds them together to form the full species list
ico_list <- here::here(dir_here, 'int', 'ico_list_prepped.csv') %>%
readr::read_csv()
ico_country_url <- 'http://apiv3.iucnredlist.org/api/v3/species/countries/id/%s?token=%s'
spp_ids <- unique(ico_list$iucn_sid)
n_chunks <- length(spp_ids)
cat_msg <- function(x, ...) {
if(is.null(knitr:::.knitEnv$input.dir)) {
### not in knitr environment, so use cat()
cat(x, ..., '\n')
} else {
### in knitr env, so use message()
message(x, ...)
}
return(invisible(NULL))
}
## for each species ID, get country list
for(j in 1:n_chunks){
chunk_file <- here::here(
dir_here, 'tmp', sprintf('spp_countries_chunk_%s.csv', spp_ids[j])
)
if(!file.exists(chunk_file)) {
cat_msg('Getting country info for species ', min(spp_ids[j]), ' row number ', j, ' out of ', n_chunks)
#spp_ids_chunk <- ico_list$iucn_sid[spp_ids[j]]
ico_spp_countries_chunk <- mc_get_from_api(url = ico_country_url, param_vec = spp_ids[j], api_key = api_key, delay = 0.5)
cat_msg('... found ', nrow(ico_spp_countries_chunk), ' countries for these species')
readr::write_csv(ico_spp_countries_chunk, chunk_file)
} else {cat_msg('Chunk file ', chunk_file, ' already exists; skipping these spp')}
}
spp_countries_chunk_files <- list.files(
here::here(dir_here, 'tmp'), pattern = 'spp_countries_chunk', full.names = TRUE)
fun_fun <- function(fun) {readr::read_csv(fun, col_types = cols()) %>% dplyr::mutate(code = as.character(code))}
### V2022 was trouble shooting and it turned out there was an NA file in the tmp folder. Deleted and worked
# same thing happened in v2023, sort by file size to find empty file
# no issue in v2024
#read in all of these files and combine them
spp_countries_df <- lapply(spp_countries_chunk_files, FUN = fun_fun) %>%
dplyr::bind_rows()
### save this as an intermediate data file since it takes so long to create the df
readr::write_csv(spp_countries_df, here::here(dir_here, 'int', 'ico_spp_countries.csv'))
Name to region function (in OHI core package) reports regions that don’t have a match in OHI region list. Here we report certain reported regions at a higher spatial scale, based on the listed regions in the error message.
# bring back in the reported regions associated with the iucn ids
ico_spp_countries <- here::here(dir_here, 'int', 'ico_spp_countries.csv') %>%
readr::read_csv()
# Report these regions at higher spatial resolution:
ico_spp_countries_final <- ico_spp_countries %>%
mutate(split_id = case_when(country ==
"Bonaire, Sint Eustatius and Saba" ~
"Bonaire, Saba, Sint Eustatius",
country == "French Southern Territories" ~
"Glorioso Islands, Juan de Nova Island, Bassas da India, Ile Europa, Ile Tromelin, Crozet Islands, Amsterdam Island and Saint Paul Island, Kerguelen Islands",
country == "United States Minor Outlying Islands" ~ "Wake Island, Jarvis Island, Palmyra Atoll, Howland Island and Baker Island, Johnston Atoll",
country == ("Saint Helena, Ascension and Tristan da Cunha") ~ "Saint Helena, Ascension, Tristan da Cunha")) %>%
separate_longer_delim(cols = split_id, delim = ", ") %>%
mutate(country= case_when(is.na(split_id) ~ country,
TRUE ~ split_id)) %>%
select(-split_id)
# change this back to align with ico_list for next step
ico_spp_countries_final <- ico_spp_countries_final %>%
dplyr::rename(iucn_sid = name)
# run name_2_rgn to match with OHI regions
ico_spp_rgn_raw <- name_2_rgn(
df_in = ico_spp_countries_final,
fld_name='country',
flds_unique = 'iucn_sid'
)
# v2023 - No match for Isle of Man (not reported in OHI), Saint Barthélemy (not reported in OHI), disputed territory, can't be matched to a country
# v2024 - ALSO no match for Isle of Man (not reported in OHI), Saint Barthélemy (not reported in OHI), disputed territory, can't be matched to a country
# Check that the country_split_data captures all the data from the longer country strings that were removed
# ---- Bonaire, Sint Eustatius and Saba ----
check_full <- ico_spp_countries %>%
dplyr::filter(country == "Bonaire, Sint Eustatius and Saba") # v2024 and v2023: 34 rows
check_split <- ico_spp_rgn_raw %>%
dplyr::filter(rgn_name == "Bonaire" | rgn_name == "Saba" | rgn_name == "Sint Eustatius") # v2024 and v2023: 102 rows
# 102 entries makes sense each of the 34 rows is listed three times, one for each country
# ---- French Southern Territories ----
check_full <- ico_spp_countries %>%
dplyr::filter(country == "French Southern Territories") #v2024: 25
check_split <- ico_spp_rgn_raw %>%
dplyr::filter(rgn_name %in% c("Glorioso Islands", "Juan de Nova Island", "Bassas da India", "Ile Europa", "Ile Tromelin", "Crozet Islands", "Amsterdam Island and Saint Paul Island", "Kerguelen Islands")) # v2024: 200
# 25(rows) * 8(regions) = 200
# ---- United States Minor Outlying Islands ----
check_full <- ico_spp_countries %>%
dplyr::filter(country == "United States Minor Outlying Islands") # v2024: 18
check_split <- ico_spp_rgn_raw %>%
dplyr::filter(rgn_name %in% c("Wake Island", "Jarvis Island", "Palmyra Atoll", "Howland Island and Baker Island", "Johnston Atoll")) # v2024: 90
# 18 rows * 5 regions = 90
# ---- Saint Helena, Ascension and Tristan da Cunha ----
check_full <- ico_spp_countries %>%
dplyr::filter(country == "Saint Helena, Ascension and Tristan da Cunha") #v2024: 33 rows
check_split <- ico_spp_rgn_raw %>%
dplyr::filter(rgn_name == "Saint Helena" | rgn_name == "Ascension" | rgn_name == "Tristan da Cunha") # v2024: 99 rows
# Check error message to make sure the unmatched countries don't have an OHI region match, that there are no duplicates, and that the unmatched regions make sense (not disputed, landlocked, not reported). If there are duplicates or unmatched regions that are reported in OHI, need to go back to fix and rerun the code.
# v2024 - returned the "duplicates found" message
# check for duplicates
duplicates <- ico_spp_rgn_raw %>% group_by(rgn_id, iucn_sid) %>% mutate(n = n()) %>% filter(n >1)
#remove duplicates that are not related to presence
#duplicates that have different presence status will get averaged later on
ico_spp_rgn_raw <- ico_spp_rgn_raw %>% select(-c(distribution_code, origin, code, country)) %>%
distinct()
############
summary(ico_spp_rgn_raw) # Make sure there are no NAs!
readr::write_csv(ico_spp_rgn_raw, here::here(dir_here, 'raw', 'ico_spp_rgn_raw.csv'))
Filter for globally iconic species
ico_spp_rgn_raw <- here::here(dir_here, 'raw', 'ico_spp_rgn_raw.csv') %>%
readr::read_csv()
ico_list <- here::here(dir_here, 'int', 'ico_list_prepped.csv') %>%
readr::read_csv()
ico_spp_rgn_prepped <- ico_spp_rgn_raw %>%
dplyr::left_join(ico_list, by = 'iucn_sid')
# check dimensions here; expect it to expand because each species in the iconic global list will be linked to the country it's listed in in the IUCN data
## filter this for species who are global (so all instances are iconic)
## *OR* ico_rgn_id matches rgn_id (so locally iconic matches with location)
ico_spp_rgn_prepped <- ico_spp_rgn_prepped %>%
dplyr::filter(ico_gl == TRUE | ico_rgn_id == rgn_id) %>%
dplyr::as_tibble()
# dimensions should shrink after running this line
# Adding as.tibble() simplified the structure, prevents ico_rgn_id from being coerced to logical structue
# v2024: 50,734 obs --> 3857 obs
readr::write_csv(ico_spp_rgn_prepped, here::here(dir_here, 'int', 'ico_spp_rgn_prepped.csv'))
ico_spp_rgn_prepped <- here::here(goal, version_year, 'int', 'ico_spp_rgn_prepped.csv') %>%
readr::read_csv(col_types = "ddcdcccldcc")
# Explicitly designating col_types retains all data !
ico_spp_rgn_prepped <- ico_spp_rgn_prepped %>%
dplyr::select(iucn_sid, sciname, comname, presence,
cat, ico_gl, rgn_id, rgn_name, subpop, ico_rgn_id)
DT::datatable(ico_spp_rgn_prepped, width = 400)
We accessed the IUCN API to determine past IUCN assessments for each
of the identified iconic species: http://apiv3.iucnredlist.org/api/v3/species/history/id/
Each assessment includes a year and an extinction risk, along with additional information on the assessment.
# get ready to access the extinction risk
ico_past_assess_url <- 'http://apiv3.iucnredlist.org/api/v3/species/history/id/%s?token=%s'
ico_list <- here::here(dir_here, 'int', 'ico_list_prepped.csv') %>%
readr::read_csv()
spp_ids <- unique(ico_list$iucn_sid)
n_chunks <- length(spp_ids)
cat_msg <- function(x, ...) {
if(is.null(knitr:::.knitEnv$input.dir)) {
### not in knitr environment, so use cat()
cat(x, ..., '\n')
} else {
### in knitr env, so use message()
message(x, ...)
}
return(invisible(NULL))
}
## for each species ID, get past assessments
for(j in 1:n_chunks){
# j = 2
chunk_file <- here::here(
dir_here, 'tmp', sprintf('spp_past_assess_chunk_%s.csv', spp_ids[j])
)
if(!file.exists(chunk_file)) {
cat_msg('Getting assessment info for species ', min(spp_ids[j]), ' row number ', j, ' out of ', n_chunks)
ico_spp_past_assess_chunk <- mc_get_from_api(
url = ico_past_assess_url, param_vec = spp_ids[j], api_key = api_key, delay = 0.5)
cat_msg('... found ', nrow(ico_spp_past_assess_chunk), ' past assessments for this species')
readr::write_csv(ico_spp_past_assess_chunk, chunk_file)
}else {
cat_msg('Chunk file ', chunk_file, ' already exists; skipping these spp')
}
}
# Read in all the files and bind together to make the ico_assess_raw file
spp_past_assess_chunk_files <- list.files(
here::here(dir_here, 'tmp'),
pattern = 'spp_past_assess_chunk',
full.names = TRUE
)
ico_assess_raw <- lapply(spp_past_assess_chunk_files, FUN = function(x) {
read.csv(x) %>%
dplyr::mutate(code = as.character(code))}) %>%
dplyr::bind_rows()
###############
# prep the dataframe and join by the iucn id and scientific name
ico_assess_raw <- ico_assess_raw %>%
dplyr::rename(iucn_sid = name) %>%
dplyr::mutate(iucn_sid = as.integer(iucn_sid),
year = as.integer(year)) %>% #add in scientific name
dplyr::left_join(ico_list %>%
dplyr::select(iucn_sid, sciname) %>%
dplyr::distinct(),
)
# 705 in v2018, 743 in v2019, 463 in v2020, 480 in v2021, 496 in v2022, #783 in v2023, #828 in v2024
##### v2020 error check - same logic goes for 2021...
# there's a difference of 280 past assessments which seems like a lot, I feel like past assessments should only be increasing unless for some reason they are no longer accessible on the website
# group by iucn_sid and compare the number of assessments for each species
current_grouped <- ico_assess_raw %>%
dplyr::group_by(iucn_sid) %>%
dplyr::count() %>%
dplyr::rename(!!version_year := "n")
previous_grouped <- here::here(goal, previous_scen, 'raw', 'ico_assessments_raw.csv') %>%
readr::read_csv() %>%
dplyr::group_by(iucn_sid) %>%
dplyr::count() %>%
dplyr::rename(!!previous_scen := "n")
compare_totals <- left_join(current_grouped, previous_grouped, by="iucn_sid") %>%
dplyr::filter(!!sym(previous_scen) != !!sym(version_year)) %>% # v2024 update: use `sym()` to have it recognize the columns correctly and filter for any rows in which v2023 does not equal v2024
dplyr::mutate(diff = !!sym(version_year) - !!sym(previous_scen)) # to see whether the assessment was added or dropped
# new dataframe that contains only the rows from the joined dataset where there are missing values in either this year's assessment OR last year's assessment
compare_na <- left_join(current_grouped, previous_grouped, by="iucn_sid") %>%
dplyr::filter(is.na(!!sym(version_year))|is.na(!!sym(previous_scen))) # v2024: 42 obs
# v2024: 7 species have one more assessment now than they did in the previous version, 4 species have one less, 42 were not assessed last year
previous <- here::here(goal, previous_scen, "raw", "ico_assessments_raw.csv") %>%
readr::read_csv() %>%
dplyr::filter(iucn_sid ==
84131194) %>%
dplyr::group_by(year) %>%
dplyr::count()
current <- ico_assess_raw %>%
dplyr::filter(iucn_sid ==
84131194) %>%
dplyr::group_by(year) %>%
dplyr::count()
# v2022 okay so the years of assessment are all the same (1996-2000-2004-2008-2015) but there were 9 assessments in every year for the 2019 data and only one assessment per year in the 2020 data, when I look at the tmp csv file it just shows one assessment for every year. After looking at specific species on the IUCN it doesn't look like there should be more than one assessment for any given year. Plus in the 'clean_up_category_list' chunk the code uses distinct() which should reduce the number. Will compare the 2020 to 2019 after this step and make sure it makes sense.
#v2023 it looks like the more recent assessment was removed in the data update for the two species where the number dropped
# v2024: no issue, was fixed between v2022 - v2023
######
readr::write_csv(ico_assess_raw, here(dir_here, 'raw', 'ico_assessments_raw.csv'))
These raw assessments are cleaned up to standardize codes and categories, and category scores are assigned based on a scale from Least Concern = 1 to Extinct = 0, as per the Species OHI subgoal. Note that past assessments used different coding, especially prior to 1994; we reclassified older codes according to this chart:
if (!require(librarian)){install.packages("librarian")}
if (!require(ohicore)){devtools::install_github('ohi-science/ohicore@dev')}
librarian::shelf(
tidyverse,
here,
janitor,
jsonlite,
plotly,
parallel,
devtools,
ohicore,
kableExtra,
magrittr
) # repasted for knitting
# use `kableExtra` to create a table that depicts the same information
classification_df <- data.frame(
Original = c(
"Lower risk/Near threatened (LR/NT)",
"Threatened (T)",
"Vulnerable (V)",
"Endangered (E)",
"Lower risk/Conservation dependent (LR/CD)",
"Very rare and believed to be decreasing in numbers",
"Less rare but believed to be threatened-Requires watching",
"Insufficiently known (K)",
"Indeterminate (I)",
"Status inadequately known-Survey required or data sought",
"Not recognized (NR)"
),
New = c("NT", "T", "VU", "EN", "LR/CD", "CR", "T", "DD", "DD", "DD", "NE")
)
classification_table <- classification_df %>%
kbl(col.names = c("Original Category/Description", "New Category")) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
classification_table
Original Category/Description | New Category |
---|---|
Lower risk/Near threatened (LR/NT) | NT |
Threatened (T) | T |
Vulnerable (V) | VU |
Endangered (E) | EN |
Lower risk/Conservation dependent (LR/CD) | LR/CD |
Very rare and believed to be decreasing in numbers | CR |
Less rare but believed to be threatened-Requires watching | T |
Insufficiently known (K) | DD |
Indeterminate (I) | DD |
Status inadequately known-Survey required or data sought | DD |
Not recognized (NR) | NE |
iucn_sid | year | code | category | sciname
ico_assess_raw <- here::here(dir_here, 'raw', 'ico_assessments_raw.csv') %>%
readr::read_csv()
ico_assess <- ico_assess_raw %>%
dplyr::rename(cat = code, cat_txt = category) %>%
dplyr::mutate(
cat = toupper(cat),
cat = str_replace(cat, 'LR/', ''),
cat = case_when(
cat %in% c('K', 'I') ~ 'DD',
cat == 'NR' ~ 'NE',
cat == 'V' ~ 'VU',
cat == 'E' ~ 'EN',
str_detect(toupper(cat_txt), 'VERY RARE') ~ 'CR',
str_detect(toupper(cat_txt), 'LESS RARE') ~ 'T',
str_detect(toupper(cat_txt), 'STATUS INADEQUATELY KNOWN') ~ 'DD',
T ~ cat))
pop_cat <- tibble::tribble(
~cat, ~cat_score,
"LC", 0.0,
"NT", 0.2,
"VU", 0.4,
"EN", 0.6,
"CR", 0.8,
"EX", 1.0,
"T" , 0.6,
"CD", 0.3,
"NE", NA,
"DD", NA
)
ico_assess <- ico_assess %>%
dplyr::left_join(pop_cat, by = 'cat') %>%
dplyr::filter(!is.na(cat_score)) %>% # to remove DD or NE (Data Deficient or Not Recognized)
dplyr::distinct() %>%
dplyr::arrange(iucn_sid, year)
# v2024: 828, there were new species added this year too
#v2023 595-makes sense since there were new species added
# v2021 358 rows in the list - compare to v2020 to double check
# v2020 had 341 rows, makes sense that v2021 is more since there are some newer years of assessments in the updated data
# v2022 358 rows still.....
previous_assess_clean <- here::here(goal, previous_scen, 'int', 'ico_assess_clean.csv') %>%
readr::read_csv()
readr::write_csv(ico_assess, here::here(dir_here, 'int', 'ico_assess_clean.csv'))
Using tidyr::complete()
and tidyr::fill()
,
we create a full time series for all species from the earliest
assessment to the most recent year.
ico_assess <- here::here(dir_here, 'int', 'ico_assess_clean.csv') %>% readr::read_csv()
ico_list <- here::here(dir_here, 'int', 'ico_list_prepped.csv') %>% readr::read_csv()
# Fill in category score for missing years based on previous year's data:
ico_assess_full <- ico_assess %>%
dplyr::mutate(eval_yr = year) %>%
dplyr::select(-sciname) %>%
dplyr::arrange(iucn_sid, year) %>%
tidyr::complete(year = full_seq(year, 1), nesting(iucn_sid)) %>%
dplyr::group_by(iucn_sid) %>%
tidyr::fill(cat, cat_txt, cat_score, eval_yr) %>% ## fills all the way to latest year
dplyr::ungroup()
ico_spp_cat <- ico_list %>%
dplyr::rename(cat_2022 = cat) %>%
dplyr::left_join(ico_assess_full, by = c('iucn_sid'))
## if no time series available, time series years will be NA. Assign a list to those NAs, then unnest it to create observations for those years.
ico_spp_cat <- ico_spp_cat %>%
dplyr::mutate(
year = ifelse(
is.na(year),
list(c(min(year, na.rm = TRUE):max(year, na.rm = TRUE))),
year)) %>%
tidyr::unnest(year)
# ico_rgn_id is classified as numeric
## NAs will be filled backward in time by starting from the most recent non-NA.
## To do this, we'll swap any current-year NAs with the cat_score (meaning no
## time series fill), and fill upwards instead of downwards.
ico_spp_cat <- ico_spp_cat %>%
dplyr::left_join(pop_cat %>%
dplyr::rename(cat_2022 = cat, cat_2022_score = cat_score),
by = 'cat_2022') %>%
dplyr::mutate(cat_score = ifelse(year == max(year, na.rm = TRUE) & is.na(cat),
cat_2022_score,
cat_score)) %>%
dplyr::arrange(iucn_sid, year) %>%
dplyr::group_by(iucn_sid) %>%
tidyr::fill(cat, cat_score, cat_txt, .direction = 'up') %>%
dplyr::ungroup() %>%
dplyr::distinct() #added in v2023
summary(ico_spp_cat)
# v2024: 85727-355351-88265 NAs for cat_score, eval_yr, cat_2022_score
#v2023 72906, 264791, 78242 NAs
# v2020 still has 4715-10118-1540 NAs for cat_score, eval_yr, cat_2016_score
# v2021 still has 448-10181-1120 NAs for cat_score, eval_yr, cat_2016_score; compare against v2020
# v2022 still has 348-10530-1044 NAs for cat_score, eval_yr, cat_2016_score
readr::write_csv(ico_spp_cat, here::here(dir_here, 'int', 'ico_spp_cat.csv'))
Using dplyr::full_join()
we combine the
ico_spp_rgn
dataframe (iconic species by OHI region) with
the ico_spp_cat
dataframe (iconic species by category and
year, with species info, year, and category info).
ico_cat_ts_abbr <- here::here(dir_here, 'int', 'ico_spp_cat.csv') %>%
readr::read_csv() %>%
dplyr::select(iucn_sid, sciname, year, cat, cat_score, eval_yr) %>%
dplyr::filter(year >= 1992) %>% # This goal uses 20 years of data to calculate trends (assessment started in 2012)
distinct() #remove duplicates-added v2023
ico_spp_rgn <- here::here(dir_here, 'int', 'ico_spp_rgn_prepped.csv') %>%
readr::read_csv() %>%
dplyr::select(rgn_id, rgn_name, iucn_sid, comname, sciname, ico_gl, ico_rgn_id, presence)
ico_spp_rgn_cat <- ico_cat_ts_abbr %>%
dplyr::full_join(ico_spp_rgn, by = c('iucn_sid', 'sciname'))
### No information on when species go extinct locally, so just set all years to extinct for that region
ico_spp_rgn_cat <- ico_spp_rgn_cat %>%
dplyr::mutate(cat = ifelse(str_detect(presence, '^Extinct'), 'EX', cat),
cat_score = ifelse(cat == 'EX', 1, cat_score)) %>%
dplyr::filter(ico_gl | ico_rgn_id == rgn_id) %>% # keep (all globally iconic) and (regionally iconic in region only)
dplyr::distinct()
readr::write_csv(ico_spp_rgn_cat, here::here(dir_here, 'int', 'ico_spp_rgn_cat.csv'))
# csv retaining data in ico_rgn_id column, but not when it is re-imported
The toolbox wants rgn_id
, species sciname
,
and extinction risk category
for the basic calculations.
Since some regions contain multiple subpops (or parent/subpop) we also
include iucn_sid
to differentiate. This information is
included for each year
, filtered back to the year 2000.
While the official calculations are performed in the toolbox, we perform the same basic calcs here to get a sense of the ICO status and trend ahead of time. Report and summarize estimate of regional iconic species status:
ico_spp_rgn_cat <- here::here(dir_here, 'int', 'ico_spp_rgn_cat.csv') %>%
readr::read_csv(col_types = "dcdcdddccldc")
## Report out for toolbox format (rgn_id | sciname | category or popn_trend for each species within a region).
## Note: in toolbox, group_by(rgn_id, sciname) and then summarize(category = mean(category)) to
## average any parent/subpop species listings before aggregating to overall average per region.
ico_status_raw <- ico_spp_rgn_cat %>%
dplyr::select(rgn_id, rgn_name, sciname, iucn_sid, cat, cat_score, year, eval_yr) %>%
dplyr::arrange(rgn_id, desc(year), sciname) %>%
dplyr::ungroup()
# Get a preview of status and trend:
ico_status_calc <- ico_status_raw %>%
dplyr::group_by(rgn_id, rgn_name, sciname, year) %>%
dplyr::filter(!is.na(cat_score)) %>% # remove any DDs (data deficient)/ NE (not recognized)
dplyr::summarize(cat_score = mean(cat_score)) %>%
dplyr::group_by(rgn_id, rgn_name, year) %>%
dplyr::summarize(mean_cat = round(mean(cat_score), 5),
ico_status = (1 - mean_cat) * 100,
n_spp = n()) %>% # n_spp represents the number of species for each unique combination of region ID, region name, and year
dplyr::ungroup()
ico_trend <- data.frame()
for (i in 1993:max(ico_status_calc$year, na.rm = TRUE)) { # i <- 2013
tmp_status <- ico_status_calc %>%
dplyr::filter(year <= i & year > (i - 10)) # trend based on 10-year average since assessments are sporadic
tmp_trend <- tmp_status %>%
dplyr::group_by(rgn_id) %>%
dplyr::do(trend_lm = lm(ico_status ~ year, data = .)$coefficients[2]) %>%
dplyr::mutate(year = i,
trend_lm = as.numeric(trend_lm)/100, # status is 0 - 100; trend should be +1 to -1
ico_trend = round(trend_lm * 5, 5)) %>% # trend prediction five years out
dplyr::ungroup()
ico_trend <- ico_trend %>%
dplyr::bind_rows(tmp_trend) # v2024: 6541 obs
}
ico_sum <- ico_status_raw %>%
dplyr::left_join(ico_status_calc, by = c('rgn_id', 'rgn_name', 'year')) %>%
dplyr::left_join(ico_trend, by = c('rgn_id', 'year'))
readr::write_csv(ico_sum, here::here(dir_here, 'summary', 'ico_summary.csv'))
### Report out for finalized status and trend values per region
ico_status_raw1 <- ico_status_raw %>%
dplyr::select(rgn_id, sciname, iucn_sid, year, eval_yr, category = cat)
# Create the files the toolbox will use:
readr::write_csv(ico_status_raw1, here::here(dir_here, 'output', 'ico_spp_iucn_status.csv'))
readr::write_csv(ico_status_calc, here::here(dir_here, 'output', 'ico_status_calc.csv'))
readr::write_csv(ico_trend, here::here(dir_here, 'output', 'ico_trend.csv'))
ico_status_raw1[duplicated(ico_status_raw1 ), ]
### NOTE: if iucn_sid were removed, this would show duplicates due to subpops with same category.
table(ico_status_raw1$category)
# v2024: sad changes :(
# CD CR DD EN EX LC NT VU
# 536 9513 9291 24544 800 25109 22677 30955
ico_status_raw1_prev <- read_csv(here::here("globalprep","ico", previous_scen, "output","ico_spp_iucn_status.csv"))
table(ico_status_raw1_prev$category)
# v2023:
# CD CR EN EX LC NT VU
# 614 8779 22554 589 26405 22516 33274
To examine results of the new methods (including API-based data scraping and trend based on time series), we plot the estimated status and trend over time.
status_ts_plot <- ggplot(
ico_sum %>% filter(!is.na(rgn_id)),
aes(x = year, y = ico_status, color = rgn_id, group = rgn_name)) +
geom_line(size = 1.2, alpha = .4) +
labs(x = 'year',
y = 'ICO status',
title = 'ICO status over time',
color = 'Region')
plotly::ggplotly(status_ts_plot)
# v2024
# interactive plotly of iconic species status over time by region
ico_status_plotly <- ico_sum %>%
plot_ly(x = ~year, y = ~ico_status, color = ~rgn_name,
type = "scatter", mode = "lines") %>%
layout(title = "ICO status over time",
xaxis = list(title = "Year"),
yaxis = list(title = "ICO status"))
ico_status_plotly
trend_ts_plot <- ggplot(
ico_sum %>% filter(!is.na(rgn_id) &!is.na(ico_trend)),
aes(x = year, y = ico_trend, color = rgn_id, group = rgn_name)) +
geom_line(size = 1.2, alpha = .4) +
labs(x = 'year',
y = 'ICO trend',
title = 'ICO trend over time',
color = 'Region')
plotly::ggplotly(trend_ts_plot)
# v2024
# interactive plot of status trend over time by region
trend_ts_plotly <- ico_sum %>%
plot_ly(x = ~year, y = ~ico_trend, color = ~rgn_name,
type = "scatter", mode = "lines") %>%
layout(title = "ICO trend over time",
xaxis = list(title = "Year"),
yaxis = list(title = "ICO trend"))
trend_ts_plotly
Plot the estimated status and trends from assessments v2023 and v2024 to compare.
v2023 code:
ico_sum <- here::here(dir_here, 'summary', 'ico_summary.csv') %>%
readr::read_csv()
status_trend <- ico_sum %>% # status and trend in this year's assessment
dplyr::filter(year == max(year)) %>%
dplyr::select(rgn_id, ico_status, ico_trend, n_spp) %>%
dplyr::distinct() # because removing iucn_sid, cat, cat_score, and species name, results in duplicates
ico_compare <- here::here(goal, previous_scen, 'summary', 'ico_summary.csv') %>%
readr::read_csv() %>%
dplyr::filter(year == max(year)) %>%
dplyr::select(rgn_id, rgn_name, ico_status, ico_trend, n_spp) %>%
dplyr::distinct() %>%
dplyr::rename(previous_status = ico_status, previous_trend = ico_trend, previous_nspp = n_spp) %>%
dplyr::left_join(status_trend, by = 'rgn_id') %>%
dplyr::mutate(d_nspp = as.factor(n_spp - previous_nspp)) %>% # represents the difference in the number of species between the current year's assessment and the previous assessment for each region
mutate(status_diff = ico_status - previous_status)
# four regions have NAs
compare_status_plot <- ggplot(ico_compare) +
geom_point(aes(previous_status, ico_status, color = d_nspp, label = rgn_id, text = rgn_name), alpha = 0.6) +
geom_abline(slope = 1, intercept = 0, color = 'red') +
labs(x = sprintf('ICO status %s', previous_scen),
y = sprintf('ICO status %s', version_year),
title = 'ICO Status Comparison') +
theme_minimal()
plotly::ggplotly(compare_status_plot)
trend_compare_plot <- ggplot(ico_compare) +
geom_point(
aes(x = previous_trend, y = ico_trend,
color = d_nspp, label = rgn_id, text = rgn_name),
alpha = .6) +
geom_abline(slope = 1, intercept = 0, color = 'red') +
labs(x = sprintf('ICO trend %s', previous_scen),
y = sprintf('ICO trend %s', version_year),
title = 'ICO Trend Comparison') +
theme_minimal()
plotly::ggplotly(trend_compare_plot)
v2021 No need to run this section of the code; for data exploration purposes only.
Let’s check rgn id 215