Chapter 8 Calculations: basic workflow
The purpose of Chapter 8 is to introduce you to the basic workflow for calculating OHI scores. This is a 2-hour hands-on training: you will be following along on your own computer and working with a copy of the demonstration repository that is used throughout this chapter.
8.1 Overview
Calculating scores with the OHI Toolbox requires a tailored repository operating with the OHI R package ohicore. The tailored repo has information specific to your assessment, most importantly the data and goal models, and ohicore will combine these with core operations to calculate OHI scores. You will always start with a tailored repository that has data and models extracted from the most recent Global OHI assessment.
This training will introduce the basic workflow for calculating scores. There are many ways to build from the ‘out-of-the-box’ tailored repo you have instead of starting an assessment from scratch. For example, you may want to just change underlying data sources within the models, or completely change the models which also requires new data layers and data sources.
We will repeat the basic workflow four times, each time adding complexity. We will calculate scores with:
- ‘out of the box’ data and models extracted from a recent global assessment
- tailored data (with ‘out of the box’ models)
- explore Configure Toolbox section
- explore functions.R
- tailored models (with ‘out of the box’ data layers)
- tailored data and models (adding a new data layer / model variable)
The workflow depends on the calculate_scores.Rmd file found in the scenario folder of any tailored repo. We’ll also dive deeper into the code itself, focusing particularly on developing goal models in functions.R.
Note: earlier Toolbox versions have calculate_scores.R, which calls configure_toolbox.R. Now, calculate_scores.Rmd has a section named “Configure Toolbox” with the equivalent code, so it is possible to follow along with this tutorial even if your repository does not have the .Rmd file.
This is a lot to cover in a 2-hour training, and the purpose is to give you big take home messages and experience for what you need to begin calculating scores. But the Toolbox has a lot of moving parts, and we cannot cover all of it here. There are a lot of details and other operations that we won’t get into here and that will be coming in future tutorials (including tailoring pressures & resilience, and how to change subgoals).
8.1.1 Prerequisites
Before the training, please make sure you have done the following:
- Have up-to-date versions of R and RStudio, and have RStudio configured with Git/GitHub
- Fork the toolbox-demo repository into your own GitHub account by going to https://github.com/OHI-Science/toolbox-demo, clicking “Fork” in the upper right corner, and selecting your account
- Clone the toolbox-demo repo from your GitHub account into RStudio into a folder called “github” in your home directory (filepath “~/github”)
- Get comfortable: be set up with two screens if possible. You will be following along in RStudio on your own computer while also watching an instructor’s screen or following this tutorial.
8.2 Review the Toolbox file ecosystem
Let’s quickly review some of the files you have in the toolbox-demo repo that we saw in Chapter 6. Remember that the ecosystem structure of any tailored repo is the same, so as you learn to navigate through and calculate scores in this repo you are also learning how to navigate through and calculate scores in any other OHI assessment repository — yours or anyone else’s.
This figure highlights the files we will focus on in this tutorial (others are grayed out).
In our toolbox-demo repo, here are a few additional things to mention:
- our scenario folder is called region2017
- goal models are R functions, all stored in conf/functions.R
- regions are listed (with area) in spatial/regions_list.csv. We have 8 here.
8.3 Calculate with ‘out-of-the-box’ data and models
The first time we go through the basic workflow will be with ‘out-of-the-box’ data and models from the global assessment.
calculate_scores.Rmd is the file that you’ll use a lot, mostly to run piece-by-piece as you develop your models. It takes inputs (data and models) from your repository and uses the OHI R package ohicore to compute OHI scores. It has several components, which we will explore in turn in the rest of the tutorial.
calculate_scores.Rmd will load the libraries you need, and ohicore will check your book-keeping and configuration and calculate OHI scores. Ultimately, it will save the scores for each goal and dimension in scores.csv. The ‘dimensions’ of OHI goal scores are Status, Trend, Pressures, Resilience, Likely Future State, and overall goal Score. Dimensions are calculated for each goal in a specific order, as we will see below. calculate_scores.Rmd will combine information from your tailored repository and calculate scores with OHI core functions from ohicore.
Open region2017/calculate_scores.Rmd and let’s have a look at its operations. We will then run it line-by-line.
calculate_scores.Rmd is an RMarkdown file, which combines simply formatted text and R code and is really amazing for communication, including our OHI websites (see a 1-minute video here). For now, we will focus on the .Rmd file within the RStudio pane, and see that written text appears with a white background and R code appears with a grey background as a “code chunk”. You can run R code line-by-line, or as a whole chunk by clicking the green triangle at the top-right corner of the code chunk’s grey box.
Each of the following steps is its own section and code chunk within calculate_scores.Rmd.
8.3.1 Install packages, including ohicore
Note: Previous versions of the Toolbox had install_ohicore.r as a separate file, but the effect is the same.
OHI requires packages created by others in the R community as well as one we developed ourselves. This is something that only needs to be done one time. I think of it as wiring a building for electricity: once it’s done, it’s done. Let’s run these line-by-line if you don’t have them installed already.
ohicore is an R package developed by the OHI team that has all the essential core functions and supporting packages you will use to develop your assessment and calculate scores.
## install packages from R community
install.packages("tidyverse")
install.packages("zoo")
install.packages("here")
install.packages("devtools")
## install the ohicore package from OHI team
devtools::install_github('ohi-science/ohicore@dev')
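If you are not sure which of these packages you already have, a quick check can save you from reinstalling them. The following is a minimal sketch and not part of calculate_scores.Rmd: the helper name missing_packages() is hypothetical, and it relies only on base R's installed.packages().

```r
## hypothetical helper: which of the requested packages are not yet installed?
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

## packages from the R community used in this chapter
cran_pkgs  <- c("tidyverse", "zoo", "here", "devtools")
to_install <- missing_packages(cran_pkgs)

## report what is missing; uncomment the install line when you are ready
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)
} else {
  message("All packages are already installed.")
}
```

The install line is left commented out so that running the sketch only reports what is missing.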
8.3.2 Load R packages
Next, you will load each R package as a library from within the toolbox-demo repository, which gives you access to all those functions and packages. That is like turning on the lights when you need to use them; you need to do this every time you open your assessment repository.
We will also set the working directory, because the ohicore package expects you to be inside your scenario folder (this will be improved further another time). We will use the new here package, which will identify the full filepath on your computer and will make collaborating easier between us.
## load package libraries
library(tidyverse)
library(stringr)
library(zoo)
library(here)
library(ohicore)
## set the working directory to a filepath we all have
setwd(here::here('region2017'))
8.3.3 Configure the Toolbox
Next, we will configure the toolbox from within calculate_scores.Rmd. Let’s run the whole code chunk by clicking the green arrow at the top-right.
There is output printed to the console that lists all of the layers registered, and ends with any warning messages about the layers themselves. We will explore what is happening here and how to interpret these warning messages further on; for now, let’s move on since we have not encountered an error.
conf <- ohicore::Conf('conf')
## check that scenario layers files in the \layers folder match layers.csv registration. Layers files are not modified.
ohicore::CheckLayers('layers.csv', 'layers', flds_id=conf$config$layers_id_fields)
## load scenario layers for ohicore to access. Layers files are not modified.
layers <- ohicore::Layers('layers.csv', 'layers')
## select corresponding data year to use for pressures and resilience
scenario_years <- 2016
layers$data$scenario_year <- scenario_years
# cc_acid
# cc_slr
# ...
# tr_travelwarnings
# Warning messages:
# 1: In ohicore::CheckLayers("layers.csv", "layers", flds_id = conf$config$layers_id_fields) :
# Unused fields...
# ico_spp_iucn_status: iucn_sid
# 2: In ohicore::CheckLayers("layers.csv", "layers", flds_id = conf$config$layers_id_fields) :
# Rows duplicated...
# ico_spp_iucn_status: 816
8.3.4 Calculate Scores
Now let’s continue with the next code chunk in calculate_scores.Rmd, which first runs CalculateAll(). Notice too that we are saving the output to a variable called scores. Instead of running the whole code chunk here, let’s just run this single line.
Note: the prefix ohicore:: is a way to be explicit that CalculateAll() is part of the ohicore package.
## calculate scenario scores
scores <- ohicore::CalculateAll(conf, layers)
8.3.4.1 Output: Status and Trend
CalculateAll() first calculates the Status and Trend for every goal and subgoal. These models are in your tailored repository’s functions.R (we will explore functions.R below). You can choose to add messages to print during calculation, as shown below for Mariculture (MAR).
# Running Setup()...
# Calculating Status and Trend for each region for FIS...
# Calculating Status and Trend for each region for MAR...
# 95th percentile for MAR ref pt is: 0.0758396517531756
# ...
8.3.4.2 Output: Pressures and Resilience
Next, we see output as CalculateAll() calculates Pressures and Resilience based on the pressures and resilience matrix tables in your tailored repository. For each, ohicore lists the subcategories that will be calculated, and identifies any mismatches between the goal-elements in the weighting data layers and those in the matrix tables. We will learn more about the pressures and resilience matrices in a different Chapter.
# Calculating Pressures for each region...
# There are 6 pressures subcategories: pollution, alien_species, habitat_destruction, fishing_pressure, climate_change, social
# These goal-elements are in the weighting data layers, but not included in the pressure_matrix.csv:
# LIV-aqf
# These goal-elements are in the pressure_matrix.csv, but not included in the weighting data layers:
# CP-coral, CP-mangrove, CP-saltmarsh, CS-mangrove, CS-saltmarsh, HAB-coral, HAB-mangrove, HAB-saltmarsh, HAB-seagrass, LIV-ph, LIV-tran, CP-seaice_shoreline, HAB-seaice_edge, ECO-wte, LIV-wte, LIV-sb
# Calculating Resilience for each region...
# There are 7 Resilience subcategories: ecological, alien_species, goal, fishing_pressure, habitat_destruction, pollution, social
# These goal-elements are in the resilience_matrix.csv, but not included in the weighting data layers:
# CP-coral, CP-saltmarsh, CS-saltmarsh, HAB-coral, HAB-saltmarsh, HAB-seagrass, CP-mangrove, CS-mangrove, HAB-mangrove, HAB-seaice_edge, CP-seaice_shoreline
8.3.4.3 Output: Combine Dimensions
Finally, we see output as CalculateAll() combines the dimensions above in several ways. It calculates the Goal Scores and Likely Future State for each goal and subgoal. Then, it calculates ‘supragoals’, which are goals that have subgoals, for example Food Provision (FP), which has the subgoals Wild-caught Fisheries (FIS) and Mariculture (MAR). Finally, it calculates the overall Index score for the entire Assessment Area using an area-weighted average.
# ...
# Calculating Goal Score and Likely Future for each region for FIS...
# Calculating Goal Score and Likely Future for each region for MAR...
# ...
# Calculating post-Index function for each region for FP...
# Calculating post-Index function for each region for LE...
# Calculating Index score for each region for supragoals using goal weights...
# Calculating Likely Future State for each region for supragoals using goal weights...
# Calculating scores for ASSESSMENT AREA (region_id=0) by area weighting...
# Calculating FinalizeScores function...
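To make the idea of an area-weighted average concrete, here is a minimal sketch in base R. The scores and areas are made up for illustration, and this is not ohicore's own code; it only shows how larger regions contribute more to the combined value.

```r
## hypothetical Index scores and areas (km2) for three regions
region_scores <- data.frame(
  region_id = c(1, 2, 3),
  score     = c(80, 60, 70),
  area_km2  = c(100, 300, 600)
)

## area-weighted average: each score is weighted by its region's area
index_region0 <- weighted.mean(region_scores$score, w = region_scores$area_km2)
index_region0
```

With these numbers the area-weighted value is 68, slightly below the simple mean of 70, because the two larger regions have the lower scores.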
8.3.4.4 Output: Warning Messages
Following all the calculations are the warning messages. These come from operations within functions.R, and you will be able to fix them as you tailor your goal models. They arise because we are using goal models from the global assessment with just the subset of global data that we have extracted for the toolbox-demo repository.
# Warning messages:
# 1: In left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) :
# joining factors with different levels, coercing to character vector
# ...
# 8: In max(d$x, na.rm = T) :
# no non-missing arguments to max; returning -Inf
8.3.5 Save scores variable as scores.csv
Finally, we will save the output from CalculateAll(), a variable called scores, as a comma-separated-value file called scores.csv. We will do this by running the second line of code in this code chunk.
## save scores as scores.csv
readr::write_csv(scores, 'scores.csv', na='')
We can inspect it and see that it is a long-formatted file with four columns for the goal, dimension, numeric region identifier, and score.
goal | dimension | region_id | score |
---|---|---|---|
AO | future | 0 | 92.85 |
AO | future | 1 | 92.85 |
AO | future | 2 | 92.85 |
… | … | … | … |
AO | pressures | 1 | 37.75 |
AO | pressures | 2 | 37.75 |
… | … | … | … |
We have 8 regions in the toolbox-demo repo. An additional region 0 is the area-weighted combination of all regions.
Note: each region in your assessment will have a numeric region identifier, called a region_id, or rgn_id for short. You can see a list of all regions and corresponding identifiers in toolbox-demo/region2017/spatial/regions_list.csv.
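Because scores.csv is in long format, pulling out the scores you care about is a matter of filtering rows. The sketch below uses a small made-up data frame as a stand-in for the real file (the values are hypothetical); with your own results you would read scores.csv instead.

```r
## hypothetical stand-in for the contents of scores.csv
scores_demo <- data.frame(
  goal      = c("AO", "AO", "AO", "AO", "FIS", "FIS"),
  dimension = c("status", "status", "score", "score", "score", "score"),
  region_id = c(0, 1, 0, 1, 0, 1),
  score     = c(90.1, 88.4, 91.5, 89.9, 70.2, 68.7),
  stringsAsFactors = FALSE
)

## overall goal scores ('score' dimension) for the whole assessment area (region 0)
area_scores <- subset(scores_demo, dimension == "score" & region_id == 0)
area_scores

## every dimension of one goal for a single region
ao_region1 <- subset(scores_demo, goal == "AO" & region_id == 1)
ao_region1
```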
8.3.6 Error messages
Hopefully this first time through calculate_scores.Rmd you did not encounter error messages, but you definitely will as you move ahead. Error messages are often due to typos or miscommunications between what you tell R versus what it expects. You will encounter error messages due to R itself, and due to ohicore. Error messages often have human-friendly messages to alert you to what went wrong, and we are continually improving error messages you’ll encounter when you use ohicore so you can try to solve them more easily. Some commonly occurring errors and how to fix them can be found in the Troubleshooting section of the manual. Copy-pasting error messages into Google is also one of the best places to start.
8.3.7 Create figures
Two common plots to represent scores are flower plots and maps. We will walk through an example of the flower plot code here. Note: you’ll see that we are sourcing code from another OHI repository, where this code was developed. After we finish testing it, we will add it as a function to ohicore and then you will not need to source it anymore.
## source script (to be incorporated into ohicore)
source('https://raw.githubusercontent.com/OHI-Science/arc/master/circle2016/plot_flower_local.R')
PlotFlower(assessment_name = "Toolbox Demo",
dir_fig_save = "reports/figures")
The default is to create a flower plot for every region and for region 0, although you can modify this. When we run this code now, you will see that the figures were in fact recreated (the timestamps for the figures in the Files pane of RStudio have updated), but they are not different from the previous ones, so they do not show up in the Git window.
8.3.8 Recap of first calculate_scores.Rmd run
We have just successfully run through the basic workflow to calculate OHI scores. It first loads necessary packages, configures data and models, and then it calculates all the components of OHI scores (status, trend, pressures, resilience, overall scores), and finally it saves the new OHI scores object in a .csv file.
We will build on this basic workflow, by exploring the operations above in more detail, and by updating the data, models, and configurations within the toolbox-demo repository.
8.4 Calculate with tailored data
Now let’s run through this basic workflow a second time, building on what we’ve learned.
Here, we will focus on one of the layers for the Artisanal Fishing Opportunity (AO) goal. We will prepare local data that will substitute global data for the data layer ao_access and recalculate scores without modifying the goal model itself.
It’s a good idea to go to RStudio’s Session menu and select Restart R to make sure you have a clean R session.
8.4.1 Prepare and save our data layer
While Chapter 7 shows in detail how to prepare data layers, save them in the “layers” folder, and register them in layers.csv and scenario_data_years.csv so the Toolbox knows where to find them, we have prepared a shorter example with AO for our purposes here.
Open toolbox-demo/prep/AO/access_prep.R and source it after reading it through. The result will be a new data file called “ao_access_demo2017.csv” saved to the “layers” folder, and you should see that there is a new file saved in your Git window.
8.4.2 Register in layers.csv
Now that we have prepared and saved our data layer, we’ll register it in layers.csv. layers.csv is a registry that will direct ohicore to appropriate data layers, and has information about each data layer: which goal it is used for, filename, column names, etc. For further detail see Chapter 7.
There is a data layer for ao_access that is already registered in layers.csv, but it is currently created from a file called “ao_access_gl2017.csv”. We will update this so the data layer is created from our new demo file (“ao_access_demo2017.csv”); this happens in the “filename” column of layers.csv.
Open region2017/layers.csv in spreadsheet software (e.g. Microsoft Excel or Open Office). Next, find ao_access in the “layer” column. Where it says “ao_access_gl2017.csv”, update this to say “ao_access_demo2017.csv”, the new data layer you just saved. Save this and close Excel.
IMPORTANT! Be sure to close Excel after you have made these edits. On a PC, having layers.csv open in Excel will prevent it from being accessed from R, and the Toolbox needs access to calculate scores!
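If you prefer to stay in R, the same edit can be made with a few lines of base R. The sketch below is self-contained: it writes a tiny stand-in registry to a temporary folder, with only the “layer” and “filename” columns, and the ao_need filename is hypothetical. The real layers.csv has more columns, and rewriting it from R may change quoting or how empty cells are written, so check the Git diff afterwards.

```r
## self-contained demo: write a tiny stand-in registry to a temporary folder
demo_dir   <- tempfile("layers_demo_")
dir.create(demo_dir)
layers_csv <- file.path(demo_dir, "layers.csv")
write.csv(
  data.frame(layer    = c("ao_access", "ao_need"),
             filename = c("ao_access_gl2017.csv", "ao_need_gl2017.csv")),
  layers_csv, row.names = FALSE
)

## read the registry, point ao_access at the new file, and save it again
registry <- read.csv(layers_csv, stringsAsFactors = FALSE)
registry$filename[registry$layer == "ao_access"] <- "ao_access_demo2017.csv"
write.csv(registry, layers_csv, row.names = FALSE)

## confirm the change
read.csv(layers_csv, stringsAsFactors = FALSE)
```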
8.4.3 Register in scenario_data_years.csv
Next let’s go to region2017/conf/scenario_data_years.csv. We can open this in RStudio: when you click on its name in the Files pane, select “View File”.
scenario_data_years.csv is a registry to organize year information for each layer, and helps set you up from the very beginning to be able to calculate repeated assessments. When you calculate OHI scores, you will be explicit about the year your completed assessment represents, and we call this the scenario_year. data_year is the most recent year of data available for that data layer.
Let’s look at the ao_access layer. It turns out that the same data_year, 2013, is used for all scenario_years 2008:2017. This means that this data source has not been updated through time, so the trends that are calculated will be flat. We can double-check our “ao_access_demo2017.csv” file to see that 2013 is the most recent data that we have. This means that our data layer is already registered here in scenario_data_years.csv and we do not need to make any changes.
Depending on your local data, registering in scenario_data_years.csv may be more like confirming the information that is already registered. Should you delete some of those previous years? Well, the Trend calculations require at least 5 years of data or the Toolbox will give errors. You can delete some of the earliest years to remove some clutter (left over from the global assessment), but definitely rerun calculate_scores.Rmd afterwards to make sure that there are no unexpected changes to scores.csv. (As a side note, this would be a good data layer to substitute if you had better local information through time.)
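Before trimming early years, it can help to confirm that at least five scenario years would remain for the layer. The following is a minimal sketch on a made-up excerpt of the registry; the column names layer_name, scenario_year, and data_year are assumptions here, so compare them with your own scenario_data_years.csv.

```r
## hypothetical excerpt of scenario_data_years.csv for one layer
## (column names are assumed; check them against your own file)
sdy <- data.frame(
  layer_name    = "ao_access",
  scenario_year = 2008:2017,
  data_year     = 2013
)

## keep only the five most recent scenario years for this layer
ao     <- sdy[sdy$layer_name == "ao_access", ]
recent <- ao[ao$scenario_year >= max(ao$scenario_year) - 4, ]

## confirm that at least five scenario years remain after trimming
n_years <- length(unique(recent$scenario_year))
n_years >= 5

## a single distinct data_year means the trend for this layer will be flat
unique(recent$data_year)
```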
8.4.4 Rerun calculate_scores.Rmd
Now, let’s rerun calculate_scores.Rmd. ohicore will now use your tailored data when it creates the “ao_access” layer because you’ve registered it in layers.csv and scenario_data_years.csv and the file is available in the layers folder.
8.4.5 Check our work, plot, and sync
Whenever there are changes made to your files (additions, deletions, and modifications), you will be notified in the Git window, since Git is tracking the files in this repo. This is a good place to confirm you have done the things you set out to do, and you can also see if you did anything by mistake.
So here, you added a new data layer, and after calculating scores you expect to see changes to AO scores in scores.csv. layers.csv will also change because ohicore will update fields in this file as it runs through its checks. But we don’t expect any other files to change at this point, so let’s make sure that’s true.
Now let’s recreate the flower plots with our updated data layer. You’ll see those .png’s show up in the Git tab as well. Although we can’t inspect the differences between the figures through RStudio here, we will be able to see them on GitHub.com.
Now is a good time to commit this work and sync to GitHub. That way, the work we’ve done is committed together and we will have a clean slate (from a Git sense) moving forward. I’ll use the commit message “toolbox-training: tailor ao_access layer and rerun calculate_scores.Rmd”
Now, we can inspect on GitHub.com:
8.4.6 Recap of second calculate_scores.Rmd run
One way to tailor your assessment is to substitute data for an existing data layer. We have just run through the basic workflow a second time. This time we successfully:
- substituted the global OHI data layer ao_access with new data, which includes saving it in the layers folder and registering it in layers.csv and scenario_data_years.csv
- reran calculate_scores.Rmd without modifying the goal model itself
- checked scores.csv changes in the Git tab to make sure all changes were expected
8.5 Explore Configure Toolbox
So now let’s take a closer look at the Configure Toolbox section, the first code chunk after the package installation and loading chunks in calculate_scores.Rmd.
This code chunk combines everything required to calculate OHI scores and checks that they are properly formatted and available, and will minimize potential errors later on. It makes sure that your data and goal models are ready to be used to calculate scores.
Important: Any time you make a change to a data layer or a goal model and want to recalculate scores, you will need to re-run the Configure Toolbox code chunk to have ohicore operate on the most up-to-date information. You can click the green triangle at the top right corner of the grey code chunk to run all the lines at the same time. We’ll walk through line-by-line now.
8.5.1 ohicore::Conf()
The Conf() function from ohicore (represented in code with the ohicore::Conf() syntax) prepares for the next steps of running the Toolbox, and calls forth everything you need to calculate scores:
- goal models
- other OHI parameters that determine how OHI scores are calculated
## load scenario configuration
conf <- ohicore::Conf('conf')
This function provides no output in the console, but does save a conf object that you can see in the Environment tab of RStudio.
8.5.2 ohicore::CheckLayers()
The CheckLayers() function from ohicore checks that data layers are properly formatted and registered (e.g., that each data layer in layers.csv exists in the layers folder), and prints a list of all of the registered layers to the console. Check to make sure ours is there. This is a gate-keeping step to make sure the data layers you’ve entered are in the right format and can be read by ohicore properly.
## check that layers in the layers folder match layers.csv registration.
ohicore::CheckLayers('layers.csv', 'layers', flds_id=conf$config$layers_id_fields)
In the R console, you will see a list of all data layers registered, and there will be additional warning information about specific layers at the end. You should not get an error at this point, but if you do, the list will stop printing where the error occurs, which will help you troubleshoot.
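To get a feel for the kind of check involved, here is a minimal base R sketch of one of them: confirming that every file named in a registry exists in the layers folder. This is an illustration only, not how CheckLayers() is implemented; the two-row registry, the ao_need filename, and the temporary folder are all made up so the example is self-contained.

```r
## self-contained demo: a temporary 'layers' folder with one file present
layers_dir <- tempfile("layers_")
dir.create(layers_dir)
invisible(file.create(file.path(layers_dir, "ao_access_demo2017.csv")))

## hypothetical two-row registry; the second file is deliberately missing
registry <- data.frame(
  layer    = c("ao_access", "ao_need"),
  filename = c("ao_access_demo2017.csv", "ao_need_gl2017.csv"),
  stringsAsFactors = FALSE
)

## flag registered layers whose file is not in the layers folder
registry$file_found <- file.exists(file.path(layers_dir, registry$filename))
missing_layers <- registry$layer[!registry$file_found]
missing_layers
```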
8.5.2.1 Warning messages
Warning messages alert you to problems with specific layers: this is showing that there are unused fields and duplicate rows. These warning messages are not a problem now (they are a byproduct of extracting this repo based on global assessments; you’ll be changing this layer anyways).
Unused fields...
    ico_spp_iucn_status: iucn_sid
    le_jobs_sector_year: analysis_year
    le_wage_sector_year: analysis_year
Rows duplicated...
    ico_spp_iucn_status: 952
    le_jobs_sector_year: 144
    le_wage_sector_year: 120
Layers missing data, ie all NA ...
    element_wts_cp_km2_x_protection: element_wts_cp_km2_x_protection_gl2017.csv
    element_wts_cs_km2_x_storage: element_wts_cs_km2_x_storage_gl2017.csv
    element_wts_hab_pres_abs: element_wts_hab_pres_abs_gl2017.csv
You will encounter error messages as you develop your own assessment. These messages intend to alert you that there are errors in data entry. Some common errors are:
- improper formatting or missing columns in your data layer
- typos or misnamed columns
Warning messages (and error messages) most often have information about what is wrong so that you can fix it. When in doubt, use Google! You’re not the first person to see an error or warning message. If a specific file or object isn’t named for you to inspect, you can copy-paste the messages directly into Google to see what the cause could be.
8.5.3 ohicore::Layers()
The next operation in the Configure Toolbox code chunk is the Layers() function from ohicore, which combines all the information from the layers files and layers.csv into a single R object called layers. This object will be used to calculate scores.
## load scenario layers for ohicore to access.
layers <- ohicore::Layers('layers.csv', 'layers')
We may see some warning messages due to the data being extracted from global assessments, but otherwise we do not expect output here.
Note: to inspect a specific layer in the layers object, you can use layers$data$LAYER_NAME. So to quickly inspect the ao_access layer, we can type layers$data$ao_access into the console. You can see there has been an additional column added to identify the layer name.
rgn_id year value layer
1 1 2013 0.09680860 ao_access
2 2 2013 0.07088916 ao_access
3 3 2013 0.05217424 ao_access
4 4 2013 0.17266718 ao_access
5 5 2013 0.06821116 ao_access
6 6 2013 0.09653261 ao_access
7 7 2013 0.09848508 ao_access
8 8 2013 0.14124868 ao_access
8.5.4 Assign scenario years
The final part of the Configure Toolbox code chunk is to assign the scenario year, which is required for ohicore to properly calculate pressures and resilience.
After assigning the scenario year, we add this to the layers object. This seems a bit redundant now, but it sets things up in case you want to assess multiple years at the same time (which we are not doing in this tutorial).
## select scenario year for the assessment
scenario_years <- 2017
layers$data$scenario_year <- scenario_years
There should not be any output in the console after running this code, but the objects in the Environment pane of RStudio have been updated.
8.5.5 Recap of Configure Toolbox
We have explored each component of the Configure Toolbox section, which sets up for calculations by creating objects after checking that your models and data layers are formatted and registered properly.
8.6 Explore functions.R goal models
Now, let’s explore a goal model. If we look in calculate_scores.Rmd, the code chunk that follows the Configure Toolbox section calculates scores using ohicore::CalculateAll(). This means that ohicore is running through the goal Status and Trend models, which are each R functions in the file functions.R. functions.R is in the conf folder of your tailored repo. We can navigate to it: region2017/conf/functions.R.
In functions.R, each goal’s Status and Trend model is represented as an R function. You will be able to modify the goal model within the confines of each function. You can run all of them at once or each individually.
Let’s look at the goal model for Artisanal Fishing Opportunity (AO) to continue our example. It has models developed from the most recent global assessment as a place for you to start ‘out-of-the-box’.
Tip: Clicking the bottom left corner of Console will show you a drop-down menu of all functions. It’s a shortcut to jump to the appropriate section or goal model.
When you modify an individual goal model, you will only work within that function’s curly braces { }.
The following things happen in each goal model:
- set scenario year variable and any other constant variables
- load specific data layers with ohicore::AlignDataYears() (recommended over the deprecated ohicore::SelectLayersData())
- calculate Status scores
- calculate Trend scores
- combine Status and Trend scores
- format and return the scores object
Throughout functions.R, you will see syntax from the tidyverse package that you installed and loaded. It contains the commonly used data-wrangling functions you’ll need in almost every analysis, and enables chaining with the pipe operator %>%. To learn more, take a look at tidyverse.org. This cheatsheet is also a helpful guide with quick references to each function.
Tip: changes must be saved before they are recorded by Git and reflected in the Git window. When there are unsaved changes, the title of your R script will be shown in red with an *. It will change back to black once the changes are saved.
Now that we’ve had this overview looking at this goal model, let’s run the code. Remember, we have already loaded the libraries we need, and run the Configure Toolbox code.
8.6.1 Load specific data layers with AlignDataYears()
AlignDataYears() is an ohicore function that calls the appropriate data layers by the layer names registered in layers.csv (e.g. ao_access). Note: previous versions of the Toolbox use the function SelectLayersData(), which still operates correctly, but only for assessments of single years. As we have updated the Toolbox to streamline repeated assessments, AlignDataYears() is the preferred function to use.
Run the first few lines of code and the ao_access and ao_need layers will be loaded, joined into an object called ry, and ready to be manipulated further:
Sustainability <- 1.0
scen_year <- layers$data$scenario_year
r <- AlignDataYears(layer_nm = "ao_access", layers_obj = layers) %>%
  rename(region_id = rgn_id, access = value) %>%
  select(-layer_name) %>%
  na.omit()
ry <- AlignDataYears(layer_nm = "ao_need", layers_obj = layers) %>%
  rename(region_id = rgn_id, need = value) %>%
  select(-layer_name) %>%
  left_join(r, by = c("region_id", "scenario_year"))
It’s always a good idea to check what your data look like and make sure there are no glaring errors. We can explore this ry object using functions like head(), summary(), and str(). We can write these in the console, or we can add them to functions.R directly (although I would probably comment them out after I’m done testing).
head(ry)
summary(ry)
str(ry)
At this point you have probably spent a lot of time preparing these data, but errors can still arise. Things that I would look for: are there NA’s? Do I expect them?
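One quick way to answer the question about NAs is to count them per column and then look at the affected rows. This is a minimal sketch on a made-up stand-in for ry (the values are hypothetical); with the real object you would apply the same two calls to ry.

```r
## hypothetical stand-in for the joined 'ry' object
ry_demo <- data.frame(
  region_id     = c(1, 2, 3),
  scenario_year = c(2017, 2017, 2017),
  need          = c(0.40, 0.55, NA),
  access        = c(0.90, NA, 0.80)
)

## number of NAs in each column
na_counts <- colSums(is.na(ry_demo))
na_counts

## rows with at least one NA, to decide whether they are expected
rows_with_na <- ry_demo[!complete.cases(ry_demo), ]
rows_with_na
```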
8.6.2 Goal models
The goal model that was developed for global assessments and described in Halpern et al. 2012 (see current Supplemental Information here) states that the status for this goal is represented by unmet demand (Du), which includes measures of opportunity for artisanal fishing, and the sustainability of the methods used.
\[ D_{U} = (1 - need) * (1 - access) \] \[ status = (1 - D_{U}) * sustainability \]
And this is how it looks in R:
## model
ry <- ry %>%
  mutate(Du = (1 - need) * (1 - access)) %>%
  mutate(status = (1 - Du) * Sustainability)
# head(ry); summary(ry)
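To see the arithmetic for a single region, here is a small worked example in base R. The need and access values are made up for illustration; Sustainability is 1.0, as in the code above.

```r
## hypothetical inputs for one region, both on a 0-1 scale
need           <- 0.4
access         <- 0.9
Sustainability <- 1.0

## unmet demand, then status
Du     <- (1 - need) * (1 - access)
status <- (1 - Du) * Sustainability

c(Du = Du, status = status)
```

Here unmet demand is 0.6 * 0.1 = 0.06, so status is 0.94 before it is rescaled to 0-100 in the next step.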
8.6.3 Calculate Status
The status operation in this model is largely a matter of filtering to just the most recent year from all the years you have calculated in the model above, and rescaling status to 0-100.
# status
ao_status <- ao_model %>%
  dplyr::filter(year==status_year) %>%
  dplyr::select(region_id, status) %>%
  dplyr::mutate(status=status*100)
8.6.4 Calculate Trend
Next are the Trend scores. They are typically based on a linear regression of status scores from the most recent five years (inspect the trend_years object below to confirm!). The trend is calculated with the CalculateTrend() function from ohicore.
# trend
trend_years <- (scen_year - 4):(scen_year)
r.trend <- CalculateTrend(status_data = ry,
                          trend_years = trend_years)
We can inspect r.trend: it is a dataframe with 3 columns: region_id, score, and dimension.
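To build intuition for what "linear regression of status scores over five years" means, the sketch below fits a line to invented status values for one region with base R's lm(). It is only an illustration of the idea: it is not the ohicore implementation, and CalculateTrend() may scale or bound the slope differently.

```r
# Illustration only: slope of status over the five trend years for ONE region.
# Status values are invented; this is not how ohicore::CalculateTrend() is coded.
scen_year   <- 2016
trend_years <- (scen_year - 4):scen_year   # five consecutive years

toy_status <- data.frame(
  scenario_year = trend_years,
  status        = c(0.80, 0.82, 0.84, 0.86, 0.88)
)

# Fit status as a linear function of year and extract the slope
fit   <- lm(status ~ scenario_year, data = toy_status)
slope <- unname(coef(fit)["scenario_year"])

print(trend_years)
print(slope)
```

A positive slope means status has been improving over the trend window; a negative slope means it has been declining.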
8.6.5 Scores variable: combining Status and Trend
Combining the Status and Trend into the scores variable involves selecting only the region_id
and score
columns, and adding two more columns identifying score dimension (Status or Trend) and goal name.
# return scores
scores <- rbind(r.status, r.trend) %>%
  mutate(goal = 'AO')
The scores variable is something that you’ll see at the end of every goal model. Each function ends by returning the scores variable, so that ohicore can combine all scores together when CalculateAll() runs (but we won’t run return(scores) now). The scores variable has a specific format, with four columns.
   region_id    score dimension goal
1          1 94.93492    status   AO
2          2 94.93492    status   AO
3          3 94.93492    status   AO
4          4 94.93492    status   AO
5          5 94.93492    status   AO
6          6 94.93492    status   AO
7          7 94.93492    status   AO
8          8 94.93492    status   AO
9          1  0.01270     trend   AO
10         2  0.01270     trend   AO
11         3  0.01270     trend   AO
12         4  0.01270     trend   AO
13         5  0.01270     trend   AO
14         6  0.01270     trend   AO
15         7  0.01270     trend   AO
16         8  0.01270     trend   AO
8.6.6 Recap of exploring functions.R
functions.R is a collection of goal models that calculate Status and Trend. Each goal is written inside an R function and can have the following steps:
- set the scenario year variable and any other constant variables
- load specific data layers with ohicore::AlignDataYears() (recommended over the deprecated ohicore::SelectLayersData())
- calculate Status scores
- calculate Trend scores
- combine Status and Trend scores
- format and return the scores object
8.7 Calculate with tailored models
Now let’s run through the basic workflow a third time, this time modifying a goal model while keeping all data layers unchanged. Tailoring a goal model involves editing the operations within that goal’s model in functions.R.
8.7.1 Restart R, Libraries, Configure Toolbox
First let’s restart R
and rerun the Load Libraries and Configure Toolbox sections in calculate_scores.Rmd
. Now, we are all set to dive into the models in functions.R
.
8.7.2 Tailor AO goal model
Now, let’s go to functions.R
, to the AO model. As an example, we will do something pretty simple to tailor the goal model. Let’s say we just wanted to divide the variable Du
by 2 in the equation.
# model
ry <- ry %>%
  mutate(Du = (1 - need) * (1 - access)) %>%
  mutate(status = (1 - Du/2) * Sustainability)
We can run the rest of the AO function line-by-line and inspect the scores variable at the end to see if everything looks OK.
8.7.3 Calculate scores, check, plot, and sync
Now let’s run the Calculate Scores chunk and save scores.csv
.
We can use Git’s differencing feature to see how our scores have changed. This is a great way to double-check and error-check that things are working the way you expected.
We can also recreate the flower plots with our updated goal model. Then let’s commit and sync so we can see the differences on GitHub.com.
My commit message here will be “toolbox-training: tailor AO goal model with original data layer”.
8.7.4 Recap of third calculate_scores.Rmd run
In this third time through the basic workflow, we updated the goal model without changing any of the data layers that it depends upon. Next up, we will add a new data layer for it to work with.
8.7.5 Troubleshooting
If you’ve tailored a goal model function, you need to make sure that its output is still a data frame, and one that is not grouped. Otherwise, when you run calculate_scores.Rmd
, you may get a cryptic error. Examples of some of the ones we’ve seen are:
# Error in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) :
# Can't join on 'region_id' x 'region_id' because of incompatible types (integer / list)
We are constantly improving ohicore
with more human-readable error messages, but it is still best practice to ensure your goal model output is returning a dataframe.
You can do this in a few ways. The way we commonly do this is by adding ungroup()
or as.data.frame()
as the final step before returning the scores variable at the end of a goal model function.
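One way to catch these problems before running the full calculation is a small check on the object a goal function is about to return. The helper below, check_scores(), is hypothetical (it is not part of ohicore) and is written in base R; it relies on the fact that grouped dplyr data frames carry the class "grouped_df".

```r
# Hypothetical helper, not part of ohicore: basic sanity checks on a scores object.
check_scores <- function(scores) {
  expected <- c("region_id", "score", "dimension", "goal")
  stopifnot(
    is.data.frame(scores),               # must be a data frame
    !inherits(scores, "grouped_df"),     # must not be a grouped dplyr object
    all(expected %in% names(scores))     # must have the four expected columns
  )
  invisible(TRUE)
}

# Invented example in the four-column scores format
toy_scores <- data.frame(
  region_id = c(1, 2, 1, 2),
  score     = c(94.9, 94.9, 0.01, 0.01),
  dimension = c("status", "status", "trend", "trend"),
  goal      = "AO"
)

check_scores(toy_scores)   # passes silently
```

Calling a check like this just before return(scores) while testing makes it easier to trace a problem back to the goal model rather than to a later join.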
8.8 Calculate with tailored data and models
The fourth and final example we will do in this chapter is to tailor a goal model by adding a new variable. This will mean that we will prepare, save, and register a new data layer and update the goal model in functions.R
. It will be a combination of what we’ve done previously in this chapter.
Let’s restart R
before proceeding.
Let’s say, as an example, that we want to tailor the AO goal model by adding a new variable for poverty into the equation.
\[ D_{U} = (1 - (need + poverty) / 2) * (1 - access) \] \[ status = (1 - D_{U}) * sustainability \]
8.8.1 Prepare and save our new data layer
We will create the new data layer for poverty by running a script in the prep folder.
Open toolbox-demo/prep/AO/poverty_prep.R
and source the file after reading it through. The result will be a new data layer saved to the “layers” folder, and you should see that there is a new file saved in your Git window.
8.8.2 Register in layers.csv
Now that we have prepared and saved our data layer, we’ll register it in layers.csv
. This time, since we have added an additional data layer that has not been previously registered, we need to add a new row.
Open region2017/layers.csv
in spreadsheet software (e.g. Microsoft Excel or OpenOffice). Add a new row for “ao_poverty”, and fill in the following information. We’ve added the row near the other AO data layers.
8.8.3 Register in scenario_data_years.csv
8.8.4 Configure Toolbox
To use this layer as we develop our goal model, we need to rerun the Configure Toolbox section. Before that, let’s restart R and reload the libraries. It’s good to restart R often so that your work does not silently depend on objects left over in the session, such as those created during our layer preparation.
8.8.5 Update the AO goal model
We will need to do two things to update the goal model in functions.R
.
First, we’ll have to load our data layer with AlignDataYears()
. You can copy-paste the following into your functions.R
to make sure everything is working properly:
Sustainability <- 1.0
scen_year <- layers$data$scenario_year
r <- AlignDataYears(layer_nm = "ao_access", layers_obj = layers) %>%
  rename(region_id = rgn_id, access = value) %>%
  select(-layer_name) %>%
  na.omit()
rp <- AlignDataYears(layer_nm = "ao_poverty", layers_obj = layers) %>%
  rename(region_id = rgn_id, poverty = value) %>%
  select(-layer_name) %>%
  left_join(r, by = c("region_id", "scenario_year"))
ry <-
  AlignDataYears(layer_nm = "ao_need", layers_obj = layers) %>%
  rename(region_id = rgn_id, need = value) %>%
  select(-layer_name) %>%
  left_join(rp, by = c("region_id", "scenario_year"))
Let’s run this and inspect the variables.
Note: if you forgot to add ao_poverty as a new layer to scenario_data_years.csv in the section above, you wouldn’t get an error, but your rp variable wouldn’t read any data: it would be a dataframe with 0 rows! That is why it’s important to inspect all these variables so you can trace back where the problem is as early as possible.
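A minimal sketch of such an inspection is shown below. The empty rp_example data frame is an invented stand-in for what rp would look like if the layer had not been registered; the message text is only a suggestion.

```r
# Invented stand-in for an rp object that read no data (0 rows).
rp_example <- data.frame(
  region_id     = integer(0),
  scenario_year = integer(0),
  poverty       = numeric(0)
)

# Flag an empty layer early instead of discovering it after the join
layer_is_empty <- nrow(rp_example) == 0
if (layer_is_empty) {
  message("ao_poverty returned 0 rows: check layers.csv and scenario_data_years.csv")
}
```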
Alright. Next, we’ll tailor the goal model itself. Here is how the goal model looks as an equation and in R
: you can copy-paste this model into functions.R
, replacing the existing model.
\[ D_{U} = (1 - (need + poverty) / 2) * (1 - access) \]
\[ status = (1 - D_{U}) * sustainability \]
## tailored goal model with poverty
ry <- ry %>%
  mutate(Du = (1 - (need + poverty) / 2 ) * (1 - access)) %>%
  mutate(status = (1 - Du) * Sustainability)
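To check that the tailored equation behaves as intended, it can help to compute it by hand for a single case and compare it with the original model. The values of need, poverty, and access below are invented for illustration.

```r
# Compare the original and poverty-tailored AO models on one invented case.
Sustainability <- 1.0
need    <- 0.4
poverty <- 0.6
access  <- 0.8

# Original model
Du_original     <- (1 - need) * (1 - access)
status_original <- (1 - Du_original) * Sustainability

# Tailored model: need is replaced by the mean of need and poverty
Du_tailored     <- (1 - (need + poverty) / 2) * (1 - access)
status_tailored <- (1 - Du_tailored) * Sustainability

print(c(original = status_original, tailored = status_tailored))
```

Here poverty is higher than need, so averaging the two lowers unmet demand and raises status slightly.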
8.8.6 Calculate scores, check, and sync
Everything is looking good in functions.R
and in the Git tab that we’re looking at as we go along.
Now let’s restart R
and recalculate scores in calculate_scores.Rmd
. We’ll see that scores.csv
will also update, and we can check that only AO dimensions (except pressures and resilience since we haven’t changed them) and Index scores are affected.
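Besides reading the Git diff, the same check can be sketched in R by joining an older copy of the scores to the new one and listing the rows whose score changed. The two small data frames below are invented, and the column names (goal, dimension, region_id, score) are assumed to match those in scores.csv.

```r
# Invented before/after scores; column names are assumed to match scores.csv.
old_scores <- data.frame(
  goal      = c("AO", "AO", "FIS"),
  dimension = c("status", "trend", "status"),
  region_id = c(1, 1, 1),
  score     = c(88, 0.01, 70)
)
new_scores <- old_scores
new_scores$score[1] <- 90   # only AO status changes in this toy example

# Join old and new scores on their identifying columns
cmp <- merge(old_scores, new_scores,
             by = c("goal", "dimension", "region_id"),
             suffixes = c("_old", "_new"))

# Keep only the rows where the score actually changed
changed <- cmp[abs(cmp$score_new - cmp$score_old) > 1e-9, ]
print(changed)
```

If rows for goals other than the one you tailored show up in changed, that is a signal to look for an unintended edit.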
Let’s commit and sync. My commit message will be “toolbox-training: tailor AO goal model with a new data layer”.
8.8.7 Recap of fourth calculate_scores.Rmd run
In this run we combined what we have practiced in the previous two runs, and we successfully:
- created a new data layer, which includes preparing the layer in the prep file, saving it in layers, and registering it in layers.csv and scenario_data_years.csv
- added this new model variable (i.e. new layer) to the AO model in functions.R
- reran calculate_scores.Rmd and saw the changes reflected in Git
8.9 Chapter Recap
We have completed Chapter 8 and successfully run through the basic workflow to calculate OHI scores with several variations using our toolbox-demo repository.
Each variation involves the same basic workflow of bookkeeping and running calculate_scores.Rmd
, and will enable you to begin tailoring the Toolbox for your assessment.
- ‘out of the box’ data and models extracted from the Global 2016 assessment
- tailored data and ‘out of the box’ models
- tailored models and ‘out of the box’ data layers
- tailored (new) data and models
Also, a few best practices we have used throughout this training that are good to remember:
- Compulsively restart R
- Always check Git window after each change for expected changes
- Commit, then Pull before Push
- Rerun the Configure Toolbox code chunk after any data layer or model changes
- Save functions.R so that your changes are reflected in the Git window
- Close Excel before returning to R (this is less important on a Mac but will cause errors on a PC)