This script prepares scores (status and trend) for Iconic Species in each global coastal region. For each iconic marine species, the countries of occurrence are pulled from the IUCN API. Extinction risk categories for each species are pulled based on current and past assessments; by tracking the assessed extinction risk over time, we can understand the trends of extinction risk for iconic species directly rather than using the “population trend” method from prior OHI assessments.
The Iconic Species sub-goal model calculates a region’s status based upon an unweighted average of species health for all ‘iconic’ species found within each reporting region.
From Halpern et al (2012):
Iconic species are those that are relevant to local cultural identity through a species’ relationship to one or more of the following: 1) traditional activities such as fishing, hunting or commerce; 2) local ethnic or religious practices; 3) existence value; and 4) locally-recognized aesthetic value (e.g., touristic attractions/common subjects for art such as whales). Habitat-forming species are not included in this definition of iconic species, nor are species that are harvested solely for economic or utilitarian purposes (even though they may be iconic to a sector or individual). …
Ultimately, almost any species can be iconic to someone, and so the intent with this goal was to focus on those species widely seen as iconic within a country, and iconic from a cultural or existence value (rather than for a livelihoods or extractive reason). …
The reference point is to have the risk status of all assessed species as Least Concern (i.e., a goal score = 1.0)
The Status of this sub-goal (XICO) is then the % of iconic species in each threat category (as defined by the IUCN Red List), such that:
\[X_{ICO} = \frac{\displaystyle\sum_{category}S_{cat}*w_{cat}}{\displaystyle\sum_{category}S_{cat}}\]
where for each IUCN threat category:
ICO trend is calculated in a similar manner, but weightings are assigned according to IUCN population trend: ‘Decreasing’ = -0.5, ‘Stable’ = 0.0, ‘Increasing’ = +0.5.
Additional year of IUCN Red List data added.
List of iconic species:
Species native country information:
Using the IUCN API, we accessed the full IUCN species list at http://apiv3.iucnredlist.org/api/v3/speciescount?token=
iucn_sid | kingdom | phylum | class | order | family | genus | sciname | population | category
Get all pages and bind into total species list, using the IUCN API.
<- file.path(dir_server, 'api_key_gc.csv')
api_file <- scan(api_file, what = 'character')
api_key
# file path to save species list that is scraped from the web.
<- file.path(dir_github, 'raw/spp_list_from_api.csv')
spp_list_from_api_file <- TRUE
reload
# scrapes IUCN website to get data for species list and saves as int/spp_list_from_api.csv
if(!file.exists(spp_list_from_api_file) | reload) {
message('Using API to create full species list from scratch')
# determine the number of species on IUCN and then determine the number of cores we will
# be using to get the data from the website.
<- sprintf('http://apiv3.iucnredlist.org/api/v3/speciescount?token=%s', api_key)
spp_npage_url <- fromJSON(spp_npage_url) %>%
n_spp $count %>% as.integer()
.<- ceiling(n_spp/10000)
n_pages
# send to cores
<- 'http://apiv3.iucnredlist.org/api/v3/species/page/%s?token=%s'
spp_page_url <- mc_get_from_api(spp_page_url, c(0:(n_pages - 1)), api_key) # mc_get_from_api function is defined in the ico_fxn.R script
spp_df_all
<- spp_df_all %>%
spp_df_all ::select(-infra_rank, -infra_name, -count, -page) %>%
dplyrrename(iucn_sid = taxonid, sciname = scientific_name) %>%
setNames(names(.) %>%
str_replace('_name', ''))
write_csv(spp_df_all, spp_list_from_api_file)
else {
} message('reading file of API species list: \n ', spp_list_from_api_file)
<- read_csv(spp_list_from_api_file)
spp_df_all }
The list of Iconic Species is based upon the original ICO list generated in 2011, using species identified as globally iconic (WWF Flagship species and Priority species) or regionally iconic (based upon WWF regional/local priority species and nation-specific lists).
# ico master species list from Mazu: 'git-annex/globalprep/ico/ico/ico_global_list2016.csv'
# (if there are any changes to the master list then this needs to be udated)
# Need to have ohi-global repo cloned to run the following line (region ids are from ohi-global):
<- get_ico_list(reload = TRUE) # get_ico_list is defined in ico_fxn.R
ico_list_raw
## head(ico_list_raw)
## head of ico_list_raw for v2021:
# A tibble: 6 x 4
# comname sciname ico_gl ico_rgn_id
# <chr> <chr> <lgl> <dbl>
# 1 Blue Shark Prionace glauca TRUE NA
# 2 Whale Shark Rhincodon typus TRUE NA
# 3 Shortfin Mako Isurus oxyrinchus TRUE NA
# 4 Olive Ridley Turtle Lepidochelys olivacea TRUE NA
# 5 Irrawaddy Dolphin Orcaella brevirostris TRUE NA
# 6 Humphead wrasse, Napoleon Wrasse Cheilinus undulatus TRUE NA
## ico_rgn_id: rgn_id in which species is iconic by regional/national lists;
## if globally iconic, ico_rgn_id <- NA
write_csv(ico_list_raw, file.path(dir_github, 'raw/ico_list_raw.csv'))
::datatable(ico_list_raw, caption = 'Iconic Species List') DT
Filtering the complete IUCN species list to include only the identified Iconic Species, we then use the IUCN API to access the list of countries in which each species occurs, from http://apiv3.iucnredlist.org/api/v3/species/countries/id/
<- read_csv(file.path(dir_github, 'raw/spp_list_from_api.csv'))
spp_df_all <- read_csv(file.path(dir_github, 'raw/ico_list_raw.csv')) # static list; same each year
ico_list_raw
# Filtering the IUCN list of species to just those that are in the list of iconic species
<- spp_df_all %>%
spp_ico filter(sciname %in% ico_list_raw$sciname)
# Is everything we see in the iconic list in the IUCN list?
<- ico_list_raw %>%
spp_missing filter(!sciname %in% spp_ico$sciname) # no data here for v2020
<- ico_list_raw %>%
ico_list left_join(spp_ico %>%
select(iucn_sid, sciname, subpop = population, cat = category),
by = 'sciname') %>%
filter(!is.na(iucn_sid))
write_csv(ico_list, file.path(dir_github, 'int/ico_list_prepped.csv')) # 270 in v2021
For each of these species, use the IUCN API to gather a list of countries in which it is present.
# ico_country_url <- 'http://apiv3.iucnredlist.org/api/v3/species/countries/id/%s?token=%s'
#
# ico_spp_countries <- mc_get_from_api(url = ico_country_url, param_vec = unique(ico_list$iucn_sid), api_key = api_key, delay = 0.5)
# # sometimes this line takes a while to run! Make sure you have the project open........
# # v2020 had trouble with this line running for 8+ hours and never finishing, I think the server might have been timing out but wasn't sure, used the work around below instead
#
# write_csv(ico_spp_countries, file.path(dir_github, 'int/ico_spp_countries.csv')) # save this as an intermediate data file since it takes so long to create the df
#
# ico_spp_countries <- read_csv(here("globalprep/ico/v2019/int/ico_spp_countries.csv"))
################ workaround
## v2020 workaround - The following code is a work around that saves each species as a temp file in a new tmp folder then binds them together to form the full species list
<- read_csv(file.path(dir_github, 'int/ico_list_prepped.csv'))
ico_list
<- 'http://apiv3.iucnredlist.org/api/v3/species/countries/id/%s?token=%s'
ico_country_url
<- ceiling(unique(ico_list$iucn_sid))
n_chunks
<- function(x, ...) {
cat_msg if(is.null(knitr:::.knitEnv$input.dir)) {
### not in knitr environment, so use cat()
cat(x, ..., '\n')
else {
} ### in knitr env, so use message()
message(x, ...)
}return(invisible(NULL))
}
## for each species ID, get country list
for(j in 1:n_chunks){
#j = 1
<- file.path(dir_github, 'tmp',
chunk_file sprintf('spp_countries_chunk_%s.csv',
n_chunks[j]))
if(!file.exists(chunk_file)) {
cat_msg('Getting country info for species ', min(n_chunks[j]), ' row number ', j, ' out of ', 147)
#spp_ids_chunk <- ico_list$iucn_sid[n_chunks[j]]
<- mc_get_from_api(url = ico_country_url, param_vec = n_chunks[j], api_key = api_key, delay = 0.5)
ico_spp_countries_chunk cat_msg('... found ', nrow(ico_spp_countries_chunk), ' countries for these species')
write_csv(ico_spp_countries_chunk, chunk_file)
else {
}
cat_msg('Chunk file ', chunk_file, ' already exists; skipping these spp')
}
}
<- list.files(file.path(dir_github, 'tmp'),
spp_countries_chunk_files pattern = 'spp_countries_chunk',
full.names = TRUE)
<- lapply(spp_countries_chunk_files, FUN = function(x) {
spp_countries_df read.csv(x) %>%
mutate(code = as.character(code))}) %>%
bind_rows()
write_csv(spp_countries_df, file.path(dir_github, 'int/ico_spp_countries.csv')) # save this as an intermediate data file since it takes so long to create the df
Name to region function (in OHI core package) reports regions that don’t have a match in OHI region list. Here we report certain reported regions at a higher spatial scale, based on the listed regions in the error message.
<- read_csv(file.path(dir_github, 'int/ico_spp_countries.csv'))
ico_spp_countries
# Report these regions at higher spatial resolution:
<- data.frame(country = "Bonaire, Sint Eustatius and Saba", region = c('Bonaire', 'Saba', 'Sint Eustatius'))
country_split_1
<- data.frame(country = "French Southern Territories", region = c('Glorioso Islands', 'Juan de Nova Island', 'Bassas da India', 'Ile Europa', 'Ile Tromelin', 'Crozet Islands', 'Amsterdam Island and Saint Paul Island', 'Kerguelen Islands'))
country_split_2
<- data.frame(country = "Saint Helena, Ascension and Tristan da Cunha", region = c('Ascension', 'Saint Helena', 'Tristan da Cunha'))
country_split_3
<- data.frame(country = "United States Minor Outlying Islands",region = c('Wake Island', 'Jarvis Island', 'Palmyra Atoll', 'Howland Island and Baker Island', 'Johnston Atoll'))
country_split_4
<- rbind(country_split_1, country_split_2, country_split_3, country_split_4)
country_split <- country_split %>%
country_split_data left_join(ico_spp_countries) %>%
select(-country) %>%
rename(country = region)
# Join country split data with ico_spp_countries
<- rbind(ico_spp_countries, country_split_data)
ico_spp_countries <- ico_spp_countries %>%
ico_spp_countries mutate(country = ifelse(country == "Curaçao", "Curacao", country)) %>%
mutate(country = ifelse(country == "Côte d'Ivoire", "Ivory Coast", country)) %>%
mutate(country = ifelse(country == "Réunion", "Reunion", country)) %>%
mutate(country = ifelse(country == "Saint Martin (French part)", "Sint Maarten", country)) %>%
rename(iucn_sid = name) # changed this back to align with ico_list for next step
<- name_2_rgn(df_in = ico_spp_countries,
ico_spp_rgn_raw fld_name='country',
flds_unique = 'iucn_sid')
####### v2021 error checking:
# v2021 - No match for Isle of Man (not reported in OHI), Palestine (disputed), Saint Barthélemy (not reported in OHI), Disputed Territory (not associated with a country and can't be mapped to an OHI region) Macedonia (landlocked), and the four country strings above that were split to higher spatial resolutions
# Check that the country_split_data captures all the data from the longer country strings that were removed
<- ico_spp_countries %>%
check_full filter(country == "Bonaire, Sint Eustatius and Saba") #34 species
<- ico_spp_countries %>%
check_split filter(country == "Bonaire" | country == "Saba" | country == "Sint Eustatius")
#102 entries - makes sense each of the 34 species is listed three times, one for each country
<- ico_spp_countries %>%
check filter(country == "United States Minor Outlying Islands")
# Check error message to make sure the unmatched countries don't have an OHI region match, that there are no duplicates, and that the unmatched regions make sense (not disputed, landlocked, not reported). If there are duplicates or unmatched regions that are reported in OHI, need to go back to fix and rerun the code.
# v2021 - 26 countries removed for not being proper rgn_type or mismatched region names, all countries are not reported in OHI and aren't found in the rgns_all list
# v2021 - returned the "duplicates found" error
# check for duplicates by replicating the code at the end of the name_2_rgn function
<- ico_spp_rgn_raw[, c("iucn_sid", "country")]
dups <- duplicated(dups, fromLast = FALSE) | duplicated(dups,
i_dupes fromLast = TRUE)
sum(i_dupes) #returns 0
<- unique(ico_spp_rgn_raw[i_dupes, "country"]) #empty df - implies there are no duplicates, so I'm not sure why I was getting that error
df_dups
############
summary(ico_spp_rgn_raw) # Make sure there are no NAs!
write_csv(ico_spp_rgn_raw, file.path(dir_github, 'raw/ico_spp_rgn_raw.csv'))
Filter for globally iconic species
<- read_csv(file.path(dir_github, 'raw/ico_spp_rgn_raw.csv'))
ico_spp_rgn_raw <- read_csv(file.path(dir_github, 'int/ico_list_prepped.csv'))
ico_list
<- ico_spp_rgn_raw %>%
ico_spp_rgn_prepped left_join(ico_list, by = 'iucn_sid')
# check dimensions here; expect it to expand because each species in the iconic global list will be linked to the country it's listed in in the IUCN data
## filter this for species who are global (so all instances are iconic)
## OR ico_rgn_id matches rgn_id (so locally iconic matches with location)
<- ico_spp_rgn_prepped %>%
ico_spp_rgn_prepped filter(ico_gl == TRUE | ico_rgn_id == rgn_id) %>%
as_tibble()
# dimensions should shrink after running this line
# Adding as.tibble() simplified the structure, prevents ico_rgn_id from being coerced to logical structue
write_csv(ico_spp_rgn_prepped, here('globalprep/ico/v2021/int/ico_spp_rgn_prepped.csv'))
<- read_csv(here('globalprep/ico/v2021/int/ico_spp_rgn_prepped.csv'),
ico_spp_rgn_prepped col_types = "ddcccccdcccldcc")
# Explicitly designating col_types retains all data !
<- ico_spp_rgn_prepped %>%
ico_spp_rgn_prepped select(iucn_sid, sciname, comname, presence, origin, distribution_code, cat, ico_gl, rgn_id, rgn_name, subpop, ico_rgn_id)
::datatable(ico_spp_rgn_prepped, width = 400) DT
We accessed the IUCN API to determine past IUCN assessments for each of the identified iconic species: http://apiv3.iucnredlist.org/api/v3/species/history/id/
Each assessment includes a year and an extinction risk, along with additional information on the assessment.
<- 'http://apiv3.iucnredlist.org/api/v3/species/history/id/%s?token=%s'
ico_past_assess_url <- read_csv(file.path(dir_github, 'int/ico_list_prepped.csv'))
ico_list
## for each species ID, get past assessments
# ico_assess_raw <- mc_get_from_api(ico_past_assess_url, ico_list$iucn_sid, api_key)
# this line also sometimes takes a while to run
# v2020 - again had trouble running this line, adapted the above work around for this too
###############
# Workaround
<- ceiling(unique(ico_list$iucn_sid))
n_chunks
<- function(x, ...) {
cat_msg if(is.null(knitr:::.knitEnv$input.dir)) {
### not in knitr environment, so use cat()
cat(x, ..., '\n')
else {
} ### in knitr env, so use message()
message(x, ...)
}return(invisible(NULL))
}
## for each species ID, get past assessments
for(j in 1:n_chunks){
# j = 2
<- file.path(dir_github, 'tmp',
chunk_file sprintf('spp_past_assess_chunk_%s.csv',
n_chunks[j]))
if(!file.exists(chunk_file)) {
cat_msg('Getting assessment info for species ', min(n_chunks[j]), ' row number ', j, ' out of ', 146)
<- mc_get_from_api(url = ico_past_assess_url, param_vec = n_chunks[j],
ico_spp_past_assess_chunk api_key = api_key, delay = 0.5)
cat_msg('... found ', nrow(ico_spp_past_assess_chunk), ' past assessments for this species')
write_csv(ico_spp_past_assess_chunk, chunk_file)
else {
}
cat_msg('Chunk file ', chunk_file, ' already exists; skipping these spp')
}
}
# Read in all the files and bind together to make the ico_assess_raw file
<- list.files(file.path(dir_github, 'tmp'),
spp_past_assess_chunk_files pattern = 'spp_past_assess_chunk',
full.names = TRUE)
<- lapply(spp_past_assess_chunk_files, FUN = function(x) {
ico_assess_raw read.csv(x) %>%
mutate(code = as.character(code))}) %>%
bind_rows()
###############
<- ico_assess_raw %>%
ico_assess_raw rename(iucn_sid = name) %>%
mutate(iucn_sid = as.integer(iucn_sid),
year = as.integer(year)) %>%
left_join(ico_list %>%
select(iucn_sid, sciname) %>%
distinct(),
by = 'iucn_sid')
# 705 in v2018, 743 in v2019, 463 in v2020, 480 in v2021
##### v2020 error check - same logic goes for 2021...
# there's a difference of 280 past assessments which seems like a lot, I feel like past assessments should only be increasing unless for some reason they are no longer accessible on the website
# group by iucn_sid and compare the number of assessments for each species
<- ico_assess_raw %>%
v2020_grouped group_by(iucn_sid) %>%
count() %>%
rename(v2020 = "n")
<- read_csv(file = file.path(here(),"/globalprep/ico/v2019/raw/ico_assessments_raw.csv")) %>%
v2019_grouped group_by(iucn_sid) %>%
count() %>%
rename(v2019 = "n")
<- left_join(v2020_grouped, v2019_grouped, by="iucn_sid") %>%
compare_totals filter(v2019 != v2020) %>%
mutate(diff = v2019 - v2020)
# there are 29 species with a different number of past assessments between last year and this year
# of these, 11 have one more observation in 2020 compared to 2019
# but the rest have more past assessments in 2019 than today, by as much as 40 for some species (which adds up to the 280)
# for example spp 12419 has five assessments in the 2020 data but 45 in the 2019 data maybe this has something to do with different years or numbers of assessment per year?
<- read_csv(file = file.path(here(),"/globalprep/ico/v2019/raw/ico_assessments_raw.csv")) %>%
v2019 filter(iucn_sid == 12419) %>%
group_by(year) %>%
count()
<- ico_assess_raw %>%
v2020 filter(iucn_sid == 12419) %>%
group_by(year) %>%
count()
# okay so the years of assessment are all the same (1996-2000-2004-2008-2015) but there were 9 assessments in every year for the 2019 data and only one assessment per year in the 2020 data, when I look at the tmp csv file it just shows one assessment for every year. After looking at specific species on the IUCN it doesn't look like there should be more than one assessment for any given year. Plus in the 'clean_up_category_list' chunk the code uses distinct() which should reduce the number. Will compare the 2020 to 2019 after this step and make sure it makes sense.
######
write_csv(ico_assess_raw, file.path(dir_github, 'raw/ico_assessments_raw.csv'))
::datatable(ico_assess_raw, caption = 'ICO species and past IUCN assessments') DT
These raw assessments are cleaned up to standardize codes and categories, and category scores are assigned based on a scale from Least Concern = 1 to Extinct = 0, as per the Species OHI subgoal. Note that past assessments used different coding, especially prior to 1994; we reclassified older codes according to this chart:
\[\begin{eqnarray*} \textbf{New Category} & & \longleftarrow \textbf{Original Category/Description}\\ NT & & \longleftarrow \text{"Lower risk/Near threatened ($LR/NT$)"}\\ T & & \longleftarrow \text{"Threatened ($T$)" treat as "$EN$"}\\ VU & & \longleftarrow \text{"Vulnerable ($V$)"}\\ EN & & \longleftarrow \text{"Endangered ($E$)" }\\ LR/CD & & \longleftarrow \text{"Lower risk/Conservation dependent ($LR/CD$)" treat as between $VU$ and $NT$}\\ CR & & \longleftarrow \text{"Very rare and believed to be decreasing in numbers"}\\ T & & \longleftarrow \text{"Less rare but believed to be threatened-Requires watching"}\\ DD & & \longleftarrow \text{"Insufficiently known ($K$)"}\\ DD & & \longleftarrow \text{"Indeterminate ($I$)"}\\ DD & & \longleftarrow \text{"Status inadequately known-Survey required or data sought"}\\ NE & & \longleftarrow \text{"Not recognized ($NR$)"} \end{eqnarray*}\]
iucn_sid | year | code | category | sciname
<- read_csv(file.path(dir_github, 'raw/ico_assessments_raw.csv'))
ico_assess_raw
<- ico_assess_raw %>%
ico_assess rename(cat = code, cat_txt = category) %>%
mutate(cat = toupper(cat),
cat = str_replace(cat, 'LR/', ''),
cat = ifelse(cat %in% c('K', 'I'), 'DD', cat),
cat = ifelse(cat == 'NR', 'NE', cat),
cat = ifelse(str_detect(toupper(cat_txt), 'VERY RARE'), 'CR', cat),
cat = ifelse(str_detect(toupper(cat_txt), 'LESS RARE'), 'T', cat),
cat = ifelse(str_detect(toupper(cat_txt), 'STATUS INADEQUATELY KNOWN'), 'DD', cat),
cat = ifelse(cat == 'V', 'VU', cat),
cat = ifelse(cat == 'E', 'EN', cat))
<- data.frame(cat = c("LC", "NT", "VU", "EN", "CR", "EX", "T", "CD", "NE", "DD"),
pop_cat cat_score = c( 0, 0.2, 0.4, 0.6, 0.8, 1.0, 0.6, 0.3, NA, NA),
stringsAsFactors = FALSE)
<- ico_assess %>%
ico_assess left_join(pop_cat, by = 'cat') %>%
filter(!is.na(cat_score)) %>%
distinct() %>%
arrange(iucn_sid, year)
# v2021 358 rows in the list - compare to v2020 to double check
<- read_csv(file = file.path(here(),"/globalprep/ico/v2020/int/ico_assess_clean.csv"))
v2020_assess_clean # v2020 had 341 rows, makes sense that v2021 is more since there are some newer years of assessments in the updated data
write_csv(ico_assess, file.path(dir_github, 'int/ico_assess_clean.csv'))
Using tidyr::complete()
and tidyr::fill()
, we create a full time series for all species from the earliest assessment to the most recent year.
<- read_csv(file.path(dir_github, 'int/ico_assess_clean.csv'))
ico_assess <- read_csv(file.path(dir_github, 'int/ico_list_prepped.csv'))
ico_list
# Fill in category score for missing years based on previous year's data:
<- ico_assess %>%
ico_assess_full mutate(eval_yr = year) %>%
select(-sciname) %>%
arrange(iucn_sid, year) %>%
complete(year = full_seq(year, 1), nesting(iucn_sid)) %>%
group_by(iucn_sid) %>%
fill(cat, cat_txt, cat_score, eval_yr) %>% ## fills all the way to latest year
ungroup()
# does it make sense that there are 4818 NAs here for cat_score and eval_yr? Yes
# v2020 4907 NAs in cat_score and eval_yr: ex 2468 has earliest assessment in 1996 so all assessments after this should be filled using the 1996 data, but before 1996 would be NAs
<- ico_list %>%
ico_spp_cat rename(cat_2016 = cat) %>%
left_join(ico_assess_full, by = c('iucn_sid'))
## if no time series available, time series years will be NA. Assign a list to those NAs, then unnest it to create observations for those years.
<- ico_spp_cat %>%
ico_spp_cat mutate(year = ifelse(is.na(year),
list(c(min(year, na.rm = TRUE):max(year, na.rm = TRUE))),
%>%
year)) unnest(year)
# ico_rgn_id is classified as numeric
## NAs will be filled backward in time by starting from the most recent non-NA.
## To do this, we'll swap any current-year NAs with the cat_score (meaning no
## time series fill), and fill upwards instead of downwards.
<- ico_spp_cat %>%
ico_spp_cat left_join(pop_cat %>%
rename(cat_2016 = cat, cat_2016_score = cat_score),
by = 'cat_2016') %>%
mutate(cat_score = ifelse(year == max(year, na.rm = TRUE) & is.na(cat),
cat_2016_score, %>%
cat_score)) arrange(iucn_sid, year) %>%
group_by(iucn_sid) %>%
fill(cat, cat_score, cat_txt, .direction = 'up') %>%
ungroup() %>%
distinct()
summary(ico_spp_cat)
# v2021 still has 448-10181-1120 NAs for cat_score, eval_yr, cat_2016_score; compare against v2020
<- read_csv(file = file.path(here(),"/globalprep/ico/v2020/int/ico_spp_cat.csv")) #has 715-10118-1540 NAs
v2020_cat
write_csv(ico_spp_cat, file.path(dir_github, 'int/ico_spp_cat.csv'))
Using dplyr::full_join()
we combine the ico_spp_rgn
dataframe (iconic species by OHI region) with the ico_spp_cat
dataframe (iconic species by category and year, with species info, year, and category info).
<- read_csv(file.path(dir_github, 'int/ico_spp_cat.csv')) %>%
ico_cat_ts_abbr select(iucn_sid, sciname, year, cat, cat_score, eval_yr) %>%
filter(year >= 1992) # This goal uses 20 years of data to calculate trends (assessment started in 2012)
<- read_csv(here('globalprep/ico/v2021/int/ico_spp_rgn_prepped.csv'),
ico_spp_rgn col_types = "ddcccccdcccldcc") %>%
select(rgn_id, rgn_name, iucn_sid, comname, sciname, ico_gl, ico_rgn_id, presence)
# Explicitly designating col_types retains all data !
<- ico_cat_ts_abbr %>%
ico_spp_rgn_cat full_join(ico_spp_rgn, by = c('iucn_sid', 'sciname'))
### No information on when species go extinct locally, so just set all years to extinct for that region
<- ico_spp_rgn_cat %>%
ico_spp_rgn_cat mutate(cat = ifelse(str_detect(presence, '^Extinct'), 'EX', cat),
cat_score = ifelse(cat == 'EX', 1, cat_score)) %>%
filter(ico_gl | ico_rgn_id == rgn_id) %>% # keep (all globally iconic) and (regionally iconic in region only)
distinct()
write_csv(ico_spp_rgn_cat, file.path(dir_github, 'int/ico_spp_rgn_cat.csv'))
# csv retaining data in ico_rgn_id column, but not when it is re-imported
::kable(head(ico_spp_rgn_cat, 10)) knitr
The toolbox wants rgn_id
, species sciname
, and extinction risk category
for the basic calculations. Since some regions contain multiple subpops (or parent/subpop) we also include iucn_sid
to differentiate. This information is included for each year
, filtered back to the year 2000.
While the official calculations are performed in the toolbox, we perform the same basic calcs here to get a sense of the ICO status and trend ahead of time. Report and summarize estimate of regional iconic species status:
<- read_csv(file.path(dir_github, 'int/ico_spp_rgn_cat.csv'),
ico_spp_rgn_cat col_types = "dcdcdddccldc")
## Report out for toolbox format (rgn_id | sciname | category or popn_trend for each species within a region).
## Note: in toolbox, group_by(rgn_id, sciname) and then summarize(category = mean(category)) to
## average any parent/subpop species listings before aggregating to overall average per region.
<- ico_spp_rgn_cat %>%
ico_status_raw select(rgn_id, rgn_name, sciname, iucn_sid, cat, cat_score, year, eval_yr) %>%
arrange(rgn_id, desc(year), sciname) %>%
ungroup()
# Get a preview of status and trend:
<- ico_status_raw %>%
ico_status_calc group_by(rgn_id, rgn_name, sciname, year) %>%
filter(!is.na(cat_score)) %>% # remove any DDs (data deficient)
summarize(cat_score = mean(cat_score)) %>%
group_by(rgn_id, rgn_name, year) %>%
summarize(mean_cat = round(mean(cat_score), 5),
ico_status = (1 - mean_cat) * 100,
n_spp = n()) %>%
ungroup()
<- data.frame()
ico_trend for (i in 1993:max(ico_status_calc$year, na.rm = TRUE)) { # i <- 2013
<- ico_status_calc %>%
tmp_status filter(year <= i & year > (i - 10)) # trend based on 10-year average since assessments are sporadic
<- tmp_status %>%
tmp_trend group_by(rgn_id) %>%
do(trend_lm = lm(ico_status ~ year, data = .)$coefficients[2]) %>%
mutate(year = i,
trend_lm = as.numeric(trend_lm)/100, # status is 0 - 100; trend should be +1 to -1
ico_trend = round(trend_lm * 5, 5)) %>% # trend prediction five years out
ungroup()
<- ico_trend %>%
ico_trend bind_rows(tmp_trend)
}
<- ico_status_raw %>%
ico_sum left_join(ico_status_calc, by = c('rgn_id', 'rgn_name', 'year')) %>%
left_join(ico_trend, by = c('rgn_id', 'year'))
write_csv(ico_sum, file.path(dir_github, 'summary/ico_summary.csv'))
### Report out for finalized status and trend values per region
<- ico_status_raw %>%
ico_status_raw1 ::select(rgn_id, sciname, iucn_sid, year, eval_yr, category = cat)
dplyr
# Create the files the toolbox will use:
write_csv(ico_status_raw1, file.path(dir_github, 'output/ico_spp_iucn_status.csv'))
write_csv(ico_status_calc, file.path(dir_github, 'output/ico_status_calc.csv'))
write_csv(ico_trend, file.path(dir_github, 'output/ico_trend.csv'))
duplicated(ico_status_raw1 ), ]
ico_status_raw1[### NOTE: if iucn_sid were removed, this would show duplicates due to subpops with same category.
table(ico_status_raw1$category)
::datatable(ico_status_raw %>% filter(year == 2018)) DT
::datatable(ico_status_calc %>% filter(year == 2018), caption = 'ICO status')
DT
::datatable(ico_trend %>% filter(year == 2018) %>% select(-trend_lm), caption = 'ICO trend') DT
To examine results of the new methods (including API-based data scraping and trend based on time series), we plot the estimated status and trend over time.
library(ggplot2)
library(plotly)
<- ggplot(ico_sum %>%
status_ts_plot filter(!is.na(rgn_id)),
aes(x = year, y = ico_status, color = rgn_id, group = rgn_id)) +
geom_line(size = 1.2, alpha = .4) +
labs(x = 'year',
y = 'ICO status',
title = 'ICO status over time',
color = 'Region')
::ggplotly(status_ts_plot)
plotly
<- ggplot(ico_sum %>%
trend_ts_plot filter(!is.na(rgn_id) &!is.na(ico_trend)),
aes(x = year, y = ico_trend, color = rgn_id, group = rgn_id)) +
geom_line(size = 1.2, alpha = .4) +
labs(x = 'year',
y = 'ICO trend',
title = 'ICO trend over time',
color = 'Region')
::ggplotly(trend_ts_plot) plotly
Plot the estimated status and trends for 2016 from assessments v2020 and v2021.
<- read_csv(file.path(dir_github, 'summary/ico_summary.csv'))
ico_sum
<- ico_sum %>% # status and trend in this year's assessment
status_trend filter(year == max(year)) %>% # max year of ico_sum is 2019; want to compare the most recent shared year (also 2019 in v2019)
select(rgn_id, ico_status, ico_trend, n_spp) %>%
distinct() # because removing iucn_sid, cat, cat_score, and species name, results in duplicates
<- read_csv(here('globalprep/ico',previous_scen,'/summary/ico_summary.csv')) %>%
ico_compare filter(year == max(year)) %>% # 2019 data is max of last year's data
select(rgn_id, rgn_name, ico_status, ico_trend, n_spp) %>%
distinct() %>%
rename(previous_status = ico_status, previous_trend = ico_trend, previous_nspp = n_spp) %>%
left_join(status_trend, by = 'rgn_id') %>%
mutate(d_nspp = as.factor(n_spp - previous_nspp))
# four regions have NAs
<- ggplot(ico_compare) +
compare_status_plot geom_point(aes(previous_status, ico_status, color = d_nspp, label = rgn_id), alpha = 0.6) +
geom_abline(slope = 1, intercept = 0, color = 'red') +
labs(x = sprintf('ICO status %s', previous_scen),
y = sprintf('ICO status %s', scenario),
title = 'ICO Status Comparison')
::ggplotly(compare_status_plot)
plotly
<- ggplot(ico_compare) +
trend_compare_plot geom_point(aes(previous_trend, ico_trend, color = d_nspp, label = rgn_id), alpha = .6) +
geom_abline(slope = 1, intercept = 0, color = 'red') +
labs(x = sprintf('ICO trend %s', previous_scen),
y = sprintf('ICO trend %s', scenario),
title = 'ICO Trend Comparison')
::ggplotly(trend_compare_plot) plotly
v2021 No need to run this section of the code; for data exploration purposes only.
Let’s check rgn id 215