Chapter 8 Calculations: basic workflow

The purpose of Chapter 8 is to introduce you to the basic workflow for calculating OHI scores. This is a 2-hour hands-on training: you will be following along on your own computer and working with a copy of the demonstration repository that is used throughout this chapter.

8.1 Overview

Calculating scores with the OHI Toolbox requires a tailored repository operating with the OHI R package ohicore. The tailored repo has information specific to your assessment — most importantly the data and goal models — and ohicore will combine these with core operations to calculate OHI scores. You will always start with a tailored repository that has data and models extracted from the most recent Global OHI assessment.

This training will introduce the basic workflow for calculating scores. Rather than starting an assessment from scratch, you will build on the ‘out-of-the-box’ tailored repo, and there are many ways to do so. For example, you may want to change only the underlying data sources within the models, or change the models completely, which also requires new data layers and data sources.

We will repeat the basic workflow four times, each time adding complexity. We will calculate scores with:

‘out of the box’ data and models extracted from a recent global assessment
tailored data (with ‘out of the box’ models)
- explore Configure Toolbox section
- explore functions.R
tailored models (with ‘out of the box’ data layers)
tailored data and models (adding a new data layer / model variable)

The workflow depends on the calculate_scores.Rmd file found in the scenario folder of any tailored repo. We’ll also dive deeper into the code itself, focusing particularly on developing goal models in functions.R. Note: earlier Toolbox versions have calculate_scores.R, which calls configure_toolbox.R. Now, calculate_scores.Rmd has a section named “Configure Toolbox” with the equivalent code, so you can follow along with this tutorial even if your repository does not have the .Rmd.

This is a lot to cover in a 2-hour training. The purpose is to give you the big take-home messages and the experience you need to begin calculating scores. The Toolbox has a lot of moving parts, and we cannot cover all of them here; the details and operations that we skip (including tailoring pressures & resilience, and how to change subgoals) will be covered in future tutorials.

8.1.1 Prerequisites

Before the training, please make sure you have done the following:

Have up-to-date versions of R and RStudio and have RStudio configured with Git/GitHub
Fork the toolbox-demo repository into your own GitHub account by going to https://github.com/OHI-Science/toolbox-demo, clicking “Fork” in the upper right corner, and selecting your account
Clone the toolbox-demo repo from your GitHub account into RStudio into a folder called “github” in your home directory (filepath “~/github”)
Get comfortable: be set up with two screens if possible. You will be following along in RStudio on your own computer while also watching an instructor’s screen or following this tutorial.

8.2 Review the Toolbox file ecosystem

Let’s quickly review some of the files you have in the toolbox-demo repo that we saw in Chapter 6. Remember that the ecosystem structure of any tailored repo is the same, so as you learn to navigate through and calculate scores in this repo you are also learning how to navigate through and calculate scores in any other OHI assessment repository — yours or anyone else’s.

This figure highlights the files we will focus on in this tutorial (others are grayed out).

In our toolbox-demo repo, here are a few additional things to mention:

our scenario folder is called region2017
goal models are R functions all stored in conf/functions.R
regions are listed (with area) in spatial/regions_list.csv. We have 8 here.

8.3 Calculate with ‘out-of-the-box’ data and models

The first time we go through the basic workflow will be with ‘out-of-the-box’ data and models from the global assessment.

calculate_scores.Rmd is the file that you’ll use a lot — mostly to run piece-by-piece as you develop your models. It takes inputs (data and models) from your repository and uses the OHI R package ohicore to compute OHI scores. It has several components which we will explore in turn in the rest of the tutorial.

calculate_scores.Rmd will load the libraries you need and ohicore will check your book-keeping and configuration, and calculate OHI scores. Ultimately, it will save the scores for each goal and dimension in scores.csv. The ‘dimensions’ of OHI goal scores are Status, Trend, Pressures, Resilience, Likely Future State, and overall goal Score. Dimensions are calculated for each goal in a specific order, as we will see below. calculate_scores.Rmd will combine information from your tailored repository and calculate scores with OHI core functions from ohicore.

Open region2017/calculate_scores.Rmd and let’s have a look at its operations. We will then run it line-by-line.

calculate_scores.Rmd is an RMarkdown file, which combines simply formatted text and R code and is really amazing for communication, including our OHI websites (see a 1-minute video here). For now, we will focus on the .Rmd file within the RStudio pane, and see that written text appears with a white background and R code appears with a grey background as a “code chunk”. You can run R code line-by-line, or as a whole chunk by clicking the green triangle at the top-right corner of the code chunk’s grey box.

Each of the following steps is its own section and code chunk within calculate_scores.Rmd.

8.3.1 Install packages, including `ohicore`

Note: Previous versions of the Toolbox had install_ohicore.r as a separate file, but the effect is the same.

OHI requires packages created by others in the R community as well as one we developed ourselves. This is something that only needs to be done one time. I think of it as wiring a building for electricity: once it’s done, it’s done. Let’s run these line-by-line if you don’t have them installed already.

ohicore is an R package developed by the OHI team that has all the essential core functions and supporting packages you will use to develop your assessment and calculate scores.

## install packages from R community
install.packages("tidyverse")
install.packages("zoo")
install.packages("here")
install.packages("devtools")

## install the ohicore package from OHI team
devtools::install_github('ohi-science/ohicore@dev')
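
If you are not sure which of these packages you already have, a small helper can report the missing ones before you install anything. This is only a sketch and not part of the Toolbox: the function name missing_packages is hypothetical, and the install line is left commented out so that nothing is installed by accident.

```r
## hypothetical helper (not part of ohicore): report which packages are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

## the R community packages installed in this section
to_install <- missing_packages(c("tidyverse", "zoo", "here", "devtools"))
to_install

## uncomment to install only what is missing
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result means all four packages are already available and you can move on to installing ohicore.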

8.3.2 Load R packages

Next, you will load each R package as a library to gain access to all of its functions. That is like turning on the lights when you need to use them; you need to do this every time you open your assessment repository (here, toolbox-demo).

We will also set the working directory, because the ohicore package expects you to be inside your scenario folder (this will be improved further another time). We will use the new here package, which will identify the full filepath on your computer and will make collaborating easier between us.

## load package libraries
library(tidyverse)
library(stringr)
library(zoo)
library(here)
library(ohicore)

## set the working directory to a filepath we all have
setwd(here::here('region2017'))

8.3.3 Configure the Toolbox

Next, we will configure the toolbox from within calculate_scores.Rmd. Let’s run the whole code chunk by clicking the green arrow at the top-right.

There is output printed to the console that lists all of the layers registered, and ends with any warning messages about the layers themselves. We will explore what is happening here and how to interpret these warning messages further on; for now, let’s move on since we have not encountered an error.

conf <- ohicore::Conf('conf')

## check that scenario layers files in the layers folder match layers.csv registration. Layers files are not modified.
ohicore::CheckLayers('layers.csv', 'layers', flds_id=conf$config$layers_id_fields)

## load scenario layers for ohicore to access. Layers files are not modified.
layers <- ohicore::Layers('layers.csv', 'layers')

## select corresponding data year to use for pressures and resilience
scenario_years <- 2016
layers$data$scenario_year <- scenario_years

#   cc_acid
#   cc_slr
# ...
#   tr_travelwarnings
# Warning messages:
# 1: In ohicore::CheckLayers("layers.csv", "layers", flds_id = conf$config$layers_id_fields) :
#   Unused fields...
#     ico_spp_iucn_status: iucn_sid
# 2: In ohicore::CheckLayers("layers.csv", "layers", flds_id = conf$config$layers_id_fields) :
#   Rows duplicated...
#     ico_spp_iucn_status: 816

8.3.4 Calculate Scores

Now let’s continue with the next code chunk in calculate_scores.Rmd, which first runs CalculateAll(). Notice too that we are saving the output to a variable called scores. Instead of running the whole code chunk here, let’s just run this single line.

Note: the prefix ohicore:: is a way to be explicit that CalculateAll() is part of the ohicore package.

## calculate scenario scores
scores <-  ohicore::CalculateAll(conf, layers)
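
To see what the package:: prefix does without running any OHI code, here is a small sketch using median() from the stats package, which ships with R. It stands in for ohicore::CalculateAll() purely for illustration.

```r
x <- c(3, 1, 2)

## explicit: name the package that the function comes from
explicit <- stats::median(x)

## implicit: the same function, found because stats is attached by default
implicit <- median(x)

identical(explicit, implicit)
```

Both calls run the same function. The explicit form simply makes it clear to a reader (and to R) which package is meant.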

8.3.4.1 Output: Status and Trend

CalculateAll() first calculates the Status and Trend for every goal and subgoal. These models are in your tailored repository’s functions.R (we will explore functions.R below). You can choose to add messages to print during calculation, as shown below for Mariculture (MAR).

# Running Setup()...
# Calculating Status and Trend for each region for FIS...
# Calculating Status and Trend for each region for MAR...
# 95th percentile for MAR ref pt is: 0.0758396517531756
# ...

8.3.4.2 Output: Pressures and Resilience

Next, we see output as CalculateAll() calculates Pressures and Resilience based on the pressures and resilience matrix tables in your tailored repository. For each, ohicore lists the subcategories that will be calculated, and flags any mismatches: goal-elements that appear in the weighting data layers but not in the matrix, or in the matrix but not in the weighting data layers. We will learn more about the pressures and resilience matrices in a different chapter.

# Calculating Pressures for each region...

# There are 6 pressures subcategories: pollution, alien_species, habitat_destruction, fishing_pressure, climate_change, social
# These goal-elements are in the weighting data layers, but not included in the pressure_matrix.csv:
# LIV-aqf
# These goal-elements are in the pressure_matrix.csv, but not included in the weighting data layers:
# CP-coral, CP-mangrove, CP-saltmarsh, CS-mangrove, CS-saltmarsh, HAB-coral, HAB-mangrove, HAB-saltmarsh, HAB-seagrass, LIV-ph, LIV-tran, CP-seaice_shoreline, HAB-seaice_edge, ECO-wte, LIV-wte, LIV-sb


# Calculating Resilience for each region...

# There are 7 Resilience subcategories: ecological, alien_species, goal, fishing_pressure, habitat_destruction, pollution, social
# These goal-elements are in the resilience_matrix.csv, but not included in the weighting data layers:
# CP-coral, CP-saltmarsh, CS-saltmarsh, HAB-coral, HAB-saltmarsh, HAB-seagrass, CP-mangrove, CS-mangrove, HAB-mangrove, HAB-seaice_edge, CP-seaice_shoreline
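
The mismatch messages above amount to comparing two sets of goal-element names. The sketch below shows that idea with setdiff(); it is not ohicore’s own code. The vectors hold a few names from the output above, plus one placeholder name (GOAL-shared) that is hypothetical.

```r
## goal-elements in the pressures matrix (two names from the output above, plus a placeholder)
in_matrix  <- c("CP-coral", "CP-mangrove", "GOAL-shared")

## goal-elements in the weighting data layers (one name from the output above, plus the placeholder)
in_weights <- c("LIV-aqf", "GOAL-shared")

## in the weighting data layers, but not in the matrix
in_weights_not_matrix <- setdiff(in_weights, in_matrix)

## in the matrix, but not in the weighting data layers
in_matrix_not_weights <- setdiff(in_matrix, in_weights)

in_weights_not_matrix
in_matrix_not_weights
```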

8.3.4.3 Output: Combine Dimensions

Finally, we see output as CalculateAll() combines the dimensions above in several ways. It calculates the Goal Scores and Likely Future State for each goal and subgoal. Then, it calculates ‘supragoals’, which are goals that have subgoals, for example Food Provision (FP), which has the subgoals Wild-caught Fisheries (FIS) and Mariculture (MAR). Finally, it calculates the overall Index score for the entire Assessment Area using an area-weighted average.

# ...
# Calculating Goal Score and Likely Future for each region for FIS...
# Calculating Goal Score and Likely Future for each region for MAR...
# ...
# Calculating post-Index function for each region for FP...
# Calculating post-Index function for each region for LE...
# Calculating Index score for each region for supragoals using goal weights...
# Calculating Likely Future State for each region for supragoals using goal weights...
# Calculating scores for ASSESSMENT AREA (region_id=0) by area weighting...
# Calculating FinalizeScores function...
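
To make the area weighting for region 0 concrete, here is a small sketch with made-up numbers. The areas and scores are hypothetical, and the code only illustrates what an area-weighted average is; it does not reproduce ohicore’s implementation.

```r
## hypothetical regions with areas and Index scores
regions_demo <- data.frame(
  region_id = 1:3,
  area_km2  = c(100, 300, 600),
  score     = c(80, 70, 60)
)

## area-weighted average: larger regions count for more
index_region0 <- weighted.mean(regions_demo$score, w = regions_demo$area_km2)

## a plain average, for comparison
index_unweighted <- mean(regions_demo$score)

index_region0
index_unweighted
```

Because the largest region here has the lowest score, the area-weighted value is lower than the plain average.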

8.3.4.4 Output: Warning Messages

Following all the calculations are the warning messages. These come from operations within functions.R, and you will be able to fix them as you tailor your goal models. They arise here because we are using goal models from the global assessment with only the subset of global data that we extracted for the toolbox-demo repository.

# Warning messages:
# 1: In left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) :
#   joining factors with different levels, coercing to character vector
# ...
# 8: In max(d$x, na.rm = T) :
#   no non-missing arguments to max; returning -Inf

8.3.5 Save scores variable as `scores.csv`

Finally, we will save the output from CalculateAll(), a variable called scores, as a comma-separated-value file called scores.csv. We will do this by running the second line of code in this code chunk.

## save scores as scores.csv
readr::write_csv(scores, 'scores.csv', na='')

We can inspect it and see that it is a long-formatted file with four columns for the goal, dimension, numeric region identifier, and score.

goal	dimension	region_id	score
AO	future	0	92.85
AO	future	1	92.85
AO	future	2	92.85
…	…	…	…
AO	pressures	1	37.75
AO	pressures	2	37.75
…	…	…	…

We have 8 regions in the toolbox-demo repo. An additional region 0 is the area-weighted combination of all regions.

Note: each region in your assessment will have a numeric region identifier, called a region_id or rgn_id for short. You can see a list of all regions and corresponding identifiers in toolbox-demo/region2017/spatial/regions_list.csv.
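
Because scores.csv is in long format, you will often filter it or spread the dimensions into columns to read it more easily. The sketch below does both with base R on a tiny data frame built from the AO rows shown above. The object names are hypothetical, and with the real file you would start from read.csv('scores.csv') instead.

```r
## a few rows in the same long format as scores.csv (values from the table above)
scores_demo <- data.frame(
  goal      = "AO",
  dimension = c("future", "future", "pressures", "pressures"),
  region_id = c(1, 2, 1, 2),
  score     = c(92.85, 92.85, 37.75, 37.75),
  stringsAsFactors = FALSE
)

## keep a single region
scores_rgn1 <- scores_demo[scores_demo$region_id == 1, ]

## one row per goal and region, one column per dimension
scores_wide <- reshape(scores_demo,
                       idvar     = c("goal", "region_id"),
                       timevar   = "dimension",
                       direction = "wide")
scores_wide
```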

8.3.6 Error messages

Hopefully this first time through calculate_scores.Rmd you did not encounter error messages, but you definitely will as you move ahead. Error messages are often due to typos or miscommunications between what you tell R versus what it expects. You will encounter error messages due to R itself, and due to ohicore. Error messages often have human-friendly messages to alert you to what went wrong, and we are continually improving error messages you’ll encounter when you use ohicore so you can try to solve them more easily. Some commonly occurring errors and how to fix them can be found in the Troubleshooting section of the manual. Copy-pasting an error message into Google is also one of the best ways to start.

8.3.7 Create figures

Two common plots to represent scores are flower plots and maps. We will walk through an example of the flower plot code here. Note: you’ll see that we are sourcing code from another OHI repository, where this code was developed. After we finish testing it, we will add it as a function to ohicore and then you will not need to source it anymore.

## source script (to be incorporated into ohicore)
source('https://raw.githubusercontent.com/OHI-Science/arc/master/circle2016/plot_flower_local.R')

PlotFlower(assessment_name = "Toolbox Demo",
           dir_fig_save    = "reports/figures")

By default, PlotFlower() creates a flower plot for every region and for region 0, although you can modify this. When we run this code now, you will see that the figures were in fact recreated (the timestamps for the figures in the Files pane of RStudio have updated), but they are not different from the previous ones, so they do not show up in the Git window.

8.3.8 Recap of first `calculate_scores.Rmd` run

We have just successfully run through the basic workflow to calculate OHI scores. It first loads necessary packages, configures data and models, and then it calculates all the components of OHI scores (status, trend, pressures, resilience, overall scores), and finally it saves the new OHI scores object in a .csv file.

We will build on this basic workflow, by exploring the operations above in more detail, and by updating the data, models, and configurations within the toolbox-demo repository.

8.4 Calculate with tailored data

Now let’s run through this basic workflow a second time, building on what we’ve learned.

Here, we will focus on one of the layers for the Artisanal Fishing Opportunity (AO) goal. We will prepare local data that will substitute global data for the data layer ao_access and recalculate scores without modifying the goal model itself.

It’s a good idea to go to RStudio’s Session menu and select Restart R to make sure you have a clean R session.

8.4.1 Prepare and save our data layer

While Chapter 7 shows in detail how to prepare data layers, save them in the “layers” folder, and register them in layers.csv and scenario_data_years.csv so the Toolbox knows where to find them, we have prepared a shorter example with AO for our purposes here.

Open toolbox-demo/prep/AO/access_prep.R and source it after reading it through. The result will be a new data file called “ao_access_demo2017.csv” saved to the “layers” folder, and you should see that there is a new file saved in your Git window.

8.4.2 Register in `layers.csv`

Now that we have prepared and saved our data layer, we’ll register it in layers.csv. layers.csv is a registry that directs ohicore to the appropriate data layers, and has information about each data layer: which goal it is used for, filename, column names, etc. For further detail see Chapter 7.

There is a data layer for ao_access that is already registered in layers.csv, but it is currently created from a file called “ao_access_gl2017.csv”. We will update this so the data layer is created from our new demo file (“ao_access_demo2017.csv”); this happens in the “filename” column of layers.csv.

Open region2017/layers.csv in spreadsheet software (e.g. Microsoft Excel or Open Office). Next, find ao_access in the “layer” column. Where it says “ao_access_gl2017.csv”, update this to say “ao_access_demo2017.csv”, the new data layer you just saved. Save this and close Excel.

IMPORTANT! Be sure to close Excel after you have made these edits. On a PC, having layers.csv open in Excel will prevent it from being accessed from R, and the Toolbox needs access to calculate scores!
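
If you would rather not use a spreadsheet, the same edit can be made from R. The sketch below works on a small stand-in file in a temporary folder so that it cannot touch your repository. The stand-in has only two columns (the real layers.csv has many more), and the ao_need filename is hypothetical.

```r
## create a small stand-in for layers.csv in a temporary folder
layers_csv <- file.path(tempdir(), "layers.csv")
write.csv(data.frame(layer    = c("ao_access", "ao_need"),
                     filename = c("ao_access_gl2017.csv", "ao_need_gl2017.csv"),
                     stringsAsFactors = FALSE),
          layers_csv, row.names = FALSE)

## read the registry, update the filename for ao_access only, and save it again
registry <- read.csv(layers_csv, stringsAsFactors = FALSE)
registry$filename[registry$layer == "ao_access"] <- "ao_access_demo2017.csv"
write.csv(registry, layers_csv, row.names = FALSE, na = "")

## confirm the change
read.csv(layers_csv, stringsAsFactors = FALSE)
```

If you try this on the real file, check the Git window afterwards to confirm that only the line you intended has changed.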

8.4.3 Register in `scenario_data_years.csv`

Next let’s go to region2017/conf/scenario_data_years.csv. We can open this in RStudio: when you click on its name in the Files pane, select “View File”.

scenario_data_years.csv is a registry to organize year information for each layer, and helps set you up from the very beginning to be able to calculate repeated assessments. When you calculate OHI scores, you will be explicit about the year your completed assessment represents, and we call this the scenario_year. data_year is the most recent year of data available for that data layer.

Let’s look at the ao_access layer. It turns out that the same data_year, 2013, is used for all scenario_years 2008:2017. This means that this data source has not been updated through time so the trends that are calculated will be flat. We can double-check our “ao_access_demo2017.csv” file to see that 2013 is the most recent data that we have. This means that our data layer is already registered here in scenario_data_years.csv and we do not need to make any changes.

Depending on your local data, registering in scenario_data_years.csv may be more like confirming the information that is already registered. Should you delete some of those previous years? Well, the Trend calculations require at least 5 years of data or the Toolbox will give errors. You can delete some of the earliest years to remove some clutter (left over from the global assessment). If you do, definitely rerun calculate_scores.Rmd afterwards to make sure that there are no unexpected changes to scores.csv.

(As a side note, this would be a good data layer to substitute if you had better local information through time.)
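
To see why a single data_year gives a flat trend, here is a small sketch of how scenario years map onto data years. It is an illustration only: the column names are assumed to resemble those in scenario_data_years.csv, the values are illustrative, and merge() stands in for whatever ohicore does internally.

```r
## every scenario year from 2008 to 2017 points at the same data year
scenario_data_years <- data.frame(
  layer_name    = "ao_access",
  scenario_year = 2008:2017,
  data_year     = 2013,
  stringsAsFactors = FALSE
)

## a layer with data for 2013 only (illustrative values for two regions)
ao_access_demo <- data.frame(
  rgn_id = 1:2,
  year   = 2013,
  value  = c(0.09680860, 0.07088916)
)

## align the layer with the scenario years
aligned <- merge(scenario_data_years, ao_access_demo,
                 by.x = "data_year", by.y = "year")

## each region has the same value in every scenario year, so its trend is flat
aggregate(value ~ rgn_id, data = aligned, FUN = function(v) length(unique(v)))
```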

8.4.4 Rerun `calculate_scores.Rmd`

Now, let’s rerun calculate_scores.Rmd. ohicore will now use your tailored data when it creates the “ao_access” layer because you’ve registered it in layers.csv and scenario_data_years.csv and the file is available in the layers folder.

8.4.5 Check our work, plot, and sync

Whenever there are changes made to your files (additions, deletions, and modifications), you will be notified in the Git window, since Git is tracking the files in this repo. This is a good place to confirm you have done the things you set out to do, and you can also see if you accidentally did anything you didn’t mean to.

So here, you added a new data layer and after calculating scores you expect to see changes to AO scores in scores.csv. layers.csv will also change because ohicore will update fields in this file as it runs through its checks. But we don’t expect any other files to change at this point, so let’s make sure that’s true.
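
Git shows you which lines of scores.csv changed. If you also want a quick summary of which goals changed, you could compare an old and a new copy of the scores in R. The sketch below uses two small made-up data frames; all values are hypothetical.

```r
## hypothetical scores before and after tailoring the ao_access layer
scores_old <- data.frame(
  goal      = c("AO", "AO", "FIS"),
  dimension = "status",
  region_id = c(1, 2, 1),
  score     = c(90, 90, 50),
  stringsAsFactors = FALSE
)
scores_new <- scores_old
scores_new$score[scores_new$goal == "AO"] <- c(94, 95)

## line up old and new scores by goal, dimension, and region
compared <- merge(scores_old, scores_new,
                  by = c("goal", "dimension", "region_id"),
                  suffixes = c("_old", "_new"))

## keep only the rows where the score changed
changed <- compared[compared$score_old != compared$score_new, ]
unique(changed$goal)
```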

Now let’s recreate the flower plots with our updated data layer. You’ll see those .png’s show up in the Git tab as well. Although we can’t inspect the differences between the figures through RStudio here, we will be able to see them on GitHub.com.

Now is a good time to commit this work and sync to GitHub. That way, the work we’ve done is committed together and we will have a clean slate (from a Git sense) moving forward. I’ll use the commit message “toolbox-training: tailor ao_access layer and rerun calculate_scores.Rmd”

Now, we can inspect the changes on GitHub.com.

8.4.6 Recap of second `calculate_scores.Rmd` run

One way to tailor your assessment is to substitute data for an existing data layer. We have just run through the basic workflow a second time. This time we successfully:

substituted the global OHI data layer ao_access with new data, which includes saving it in layers folder, registering it in layers.csv and scenario_data_years.csv, and
reran calculate_scores.Rmd without modifying the goal model itself
checked scores.csv changes in the Git tab to make sure all changes were expected

8.5 Explore Configure Toolbox

So now let’s take a closer look at the Configure Toolbox section, the first code chunk following library install and loading in calculate_scores.Rmd.

This code chunk combines everything required to calculate OHI scores and checks that they are properly formatted and available, and will minimize potential errors later on. It makes sure that your data and goal models are ready to be used to calculate scores.

Important: Any time you make a change to a data layer or a goal model and want to recalculate scores, you will need to re-run the Configure Toolbox code chunk to have ohicore operate on the most up-to-date information. You can click the green triangle at the top right corner of the grey code chunk to run all the lines at the same time. We’ll walk through line-by-line now.

8.5.1 `ohicore::Conf()`

The Conf() function from ohicore (represented in code with the ohicore::Conf() syntax) prepares for the next steps of running the Toolbox, and calls forth everything you need to calculate scores:

goal models
other OHI parameters that determine how OHI scores are calculated

## load scenario configuration
conf <- ohicore::Conf('conf')

This function provides no output in the console, but does save a conf object that you can see in the Environment tab of RStudio.

8.5.2 `ohicore::CheckLayers()`

The CheckLayers() function from ohicore checks that data layers are properly formatted and registered (e.g., that each data layer in layers.csv exists in the layers folder), and prints a list of all registered layers to the console. Check to make sure ours is there. This is a gate-keeping step to make sure the data layers you’ve entered are in the right format and can be read by ohicore properly.

## check that layers in the layers folder match layers.csv registration. 
ohicore::CheckLayers('layers.csv', 'layers', flds_id=conf$config$layers_id_fields)

In the R console, you will see a list of all data layers registered, and there will be additional warning information about specific layers at the end. You should not get an error at this point, but if you do, the list will stop printing where the error occurs, which will help you troubleshoot.
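
One of the checks described above is that every file registered in layers.csv actually exists in the layers folder. The sketch below shows that single idea on a stand-in registry in a temporary folder. It is not how CheckLayers() is implemented, and the ao_need filename is hypothetical.

```r
## a stand-in scenario folder with a layers subfolder
demo_dir <- file.path(tempdir(), "check_layers_demo")
dir.create(file.path(demo_dir, "layers"), recursive = TRUE, showWarnings = FALSE)

## a stand-in registry with two layers
registry <- data.frame(
  layer    = c("ao_access", "ao_need"),
  filename = c("ao_access_demo2017.csv", "ao_need_gl2017.csv"),
  stringsAsFactors = FALSE
)

## only one of the two registered files is actually saved
write.csv(data.frame(rgn_id = 1:2, year = 2013, value = c(0.1, 0.2)),
          file.path(demo_dir, "layers", "ao_access_demo2017.csv"),
          row.names = FALSE)

## which registered layers have no file in the layers folder?
registry$file_found <- file.exists(file.path(demo_dir, "layers", registry$filename))
missing_layers <- registry$layer[!registry$file_found]
missing_layers
```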

8.5.2.1 Warning messages

Warning messages alert you to problems with specific layers: here they show that there are unused fields and duplicate rows. These warning messages are not a problem now (they are a byproduct of extracting this repo from the global assessments; you’ll be changing these layers anyway).

Unused fields...
    ico_spp_iucn_status: iucn_sid
    le_jobs_sector_year: analysis_year
    le_wage_sector_year: analysis_year
Rows duplicated...
    ico_spp_iucn_status: 952
    le_jobs_sector_year: 144
    le_wage_sector_year: 120
Layers missing data, ie all NA ...
    element_wts_cp_km2_x_protection: element_wts_cp_km2_x_protection_gl2017.csv
    element_wts_cs_km2_x_storage: element_wts_cs_km2_x_storage_gl2017.csv
    element_wts_hab_pres_abs: element_wts_hab_pres_abs_gl2017.csv

You will encounter error messages as you develop your own assessment. These messages are intended to alert you to errors in data entry. Some common errors are:

improper formatting or missing columns in your data layer
typos or misnamed columns

Warning messages (and error messages) most often have information about what is wrong so that you can fix it. When in doubt, use Google! You’re not the first person to see an error or warning message. If a specific file or object isn’t named for you to inspect, you can copy-paste the messages directly into Google to see what the cause could be.

8.5.3 `ohicore::Layers()`

The next operation in the Configure Toolbox code chunk is the Layers() function from ohicore, which combines all the information from the layers files and layers.csv into a single R object called layers. This object will be used to calculate scores.

## load scenario layers for ohicore to access.
layers <- ohicore::Layers('layers.csv', 'layers')

We may see some warning messages due to the data being extracted from global assessments, but otherwise we do not expect output here.

Note: to inspect a specific layer in the layers object, you can do this with layers$data$LAYER_NAME. So to quickly inspect the ao_access layer we can type layers$data$ao_access into the console. You can see there has been an additional column added to identify the layername.

  rgn_id year      value     layer
1      1 2013 0.09680860 ao_access
2      2 2013 0.07088916 ao_access
3      3 2013 0.05217424 ao_access
4      4 2013 0.17266718 ao_access
5      5 2013 0.06821116 ao_access
6      6 2013 0.09653261 ao_access
7      7 2013 0.09848508 ao_access
8      8 2013 0.14124868 ao_access

8.5.4 Assign scenario years

The final part of the Configure Toolbox code chunk is to assign the scenario year, which is required for ohicore to properly calculate pressures and resilience.

After assigning the scenario year, we add this to the layers object. This seems a bit redundant now, but it is some set-up for if you were going to assess multiple years at the same time (which we are not doing in this tutorial).

## select scenario year for the assessment
scenario_years <- 2017
layers$data$scenario_year <- scenario_years

There should not be any output in the console after running this code, but the objects in the Environment pane of RStudio have been updated.

8.5.5 Recap of Configure Toolbox

We have explored each component of the Configure Toolbox section, which sets up for calculations by creating objects after checking that your models and data layers are formatted and registered properly.

8.6 Explore `functions.R` goal models

Now, let’s explore a goal model. If we look in calculate_scores.Rmd, the code chunk that follows the Configure Toolbox section is to calculate scores using ohicore::CalculateAll(). This means that ohicore is running through the goal Status and Trend models, which are each R functions in the file functions.R. functions.R is in the conf folder of your tailored repo. We can navigate to it: region2017/conf/functions.R.

In functions.R, each goal’s Status and Trend model is represented as an R function. You will be able to modify the goal model within the confines of each function. You can run all of them at once or each individually.

Let’s look at the goal model for Artisanal Fishing Opportunity (AO) to continue our example. It has models developed from the most recent global assessment as a place for you to start ‘out-of-the-box’.

Tip: Clicking the bottom left corner of the script editor pane (where functions.R is open) will show you a drop-down menu of all functions. It’s a shortcut to jump to the appropriate section or goal model.

When you modify an individual goal model, you will only work within that function’s curly braces { }.

The following things happen in each goal model:

set scenario year variable and any other constant variables
load specific data layers with ohicore::AlignDataYears() (recommended over the deprecated ohicore::SelectLayersData())
calculate Status scores
calculate Trend scores
combine Status and Trend scores
format and return the scores object

Throughout functions.R, you will see syntax from the tidyverse package that you installed and loaded. It contains the commonly used data-wrangling functions you’ll need in almost every analysis, and enables chaining: %>%. To learn more, take a look at tidyverse.org. This cheatsheet is also a helpful guide with quick references to each function.

Tip: changes must be saved before they are recorded by Git and reflected in the Git window. When there are unsaved changes, the title of your R script will be shown in red with an *. It will change back to black once the changes are saved.

Now that we’ve had this overview looking at this goal model, let’s run the code. Remember, we have already loaded the libraries we need, and run the Configure Toolbox code.

8.6.1 Load specific data layers with `AlignDataYears()`

AlignDataYears() is an ohicore function that calls the appropriate data layer by the layer name registered in layers.csv (e.g. ao_access). Note: previous versions of the Toolbox use the function SelectLayersData(), which still operates correctly, but only for single-year assessments. As we have updated the Toolbox to streamline repeated assessments, AlignDataYears() is the preferred function to use.

Run the first few lines of code and the ao_access and ao_need layers will be loaded, joined into an object called ry, and ready to be manipulated further:

  Sustainability <- 1.0

  scen_year <- layers$data$scenario_year

  r <- AlignDataYears(layer_nm = "ao_access", layers_obj = layers) %>%
    rename(region_id = rgn_id, access = value) %>%
    select(-layer_name) %>%
    na.omit()

  ry <-
    AlignDataYears(layer_nm = "ao_need", layers_obj = layers) %>%
    rename(region_id = rgn_id, need = value) %>%
    select(-layer_name) %>%
    left_join(r, by = c("region_id", "scenario_year"))

It’s always a good idea to check what your data looks like and make sure there are no glaring errors. We can explore this ry object using functions like head(), summary(), and str(). We can write this in the console, or we can add it to functions.R directly (although I would probably comment it out after I’m done testing).

head(ry)
summary(ry)
str(ry)

At this point you have probably spent a lot of time preparing these data, but errors can still arise. Things that I would look for: are there NA’s? Do I expect them?
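
A quick way to answer the NA question is to count missing values per column and list the affected rows. The sketch below uses a small made-up data frame with the same columns as ry. The values are hypothetical, with one missing access value of the kind a left join can leave behind when a region is absent from one layer.

```r
## hypothetical data with the same columns as ry
ry_demo <- data.frame(
  region_id     = c(1, 2, 3),
  scenario_year = 2017,
  need          = c(0.4, 0.5, 0.6),
  access        = c(0.8, NA, 0.7)
)

## how many NAs are in each column?
na_per_column <- colSums(is.na(ry_demo))
na_per_column

## which rows have at least one NA?
rows_with_na <- ry_demo[!complete.cases(ry_demo), ]
rows_with_na
```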

8.6.2 Goal models

The goal model that was developed for global assessments and described in Halpern et al. 2012 (see current Supplemental Information here) states that the status for this goal is represented by unmet demand (Du), which includes measures of opportunity for artisanal fishing, and the sustainability of the methods used.

\[ D_{U} = (1 - need) * (1 - access) \]
\[ status = (1 - D_{U}) * sustainability \]

And this is how it looks in R:

 ## model
  ry <- ry %>%
    mutate(Du = (1 - need) * (1 - access)) %>%
    mutate(status = (1 - Du) * Sustainability)
  # head(ry); summary(ry)
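
To get a feel for the model, it can help to work through it once with single numbers. The values of need and access below are hypothetical and chosen only to make the arithmetic easy to follow.

```r
## hypothetical inputs for a single region
Sustainability <- 1.0
need   <- 0.5
access <- 0.8

## unmet demand, then status (on a 0-1 scale at this point)
Du     <- (1 - need) * (1 - access)
status <- (1 - Du) * Sustainability

Du       # about 0.1
status   # about 0.9

## with full access there is no unmet demand, whatever the need
(1 - need) * (1 - 1)
```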

8.6.3 Calculate Status

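The status operation in this model mostly consists of filtering for the most recent year out of all the years calculated in the model above, rescaling status to 0-100, and labeling the result as the status dimension.

  # status
  r.status <- ry %>%
    dplyr::filter(scenario_year == scen_year) %>%   # keep only the scenario year
    dplyr::mutate(score = status * 100) %>%         # rescale from 0-1 to 0-100
    dplyr::select(region_id, score) %>%
    dplyr::mutate(dimension = 'status')             # matches the columns of r.trend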

8.6.4 Calculate Trend

Next are the Trend scores. They are typically based on a linear regression of status scores over the most recent five years (inspect the trend_years object below to confirm!). The trend is calculated with the CalculateTrend() function from ohicore.

  # trend
  trend_years <- (scen_year - 4):(scen_year)

  r.trend <- CalculateTrend(status_data = ry, 
                            trend_years = trend_years)

We can inspect r.trend: it returns a dataframe with 3 columns: region_id, score, and dimension.
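To build intuition for what a regression-based trend means, here is a simplified sketch for a single region using base R's lm(). It is not the ohicore implementation: CalculateTrend() works across all regions and does additional processing that is not reproduced here. The status values are invented.

```r
# Toy status values for one region over five years (invented)
status_toy <- data.frame(
  year   = 2013:2017,
  status = c(0.80, 0.82, 0.84, 0.86, 0.88)
)

# Fit status as a linear function of year
mdl <- lm(status ~ year, data = status_toy)

# The slope is the average change in status per year
slope <- unname(coef(mdl)["year"])
print(slope)
```

A positive slope means status has been improving over the trend years; a negative slope means it has been declining.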

8.6.5 Scores variable: combining Status and Trend

Combining the Status and Trend into the scores variable involves selecting only the region_id and score columns, and adding two more columns identifying score dimension (Status or Trend) and goal name.

# return scores
  scores <- rbind(r.status, r.trend) %>%
    mutate(goal = 'AO')

The scores variable is something that you’ll see at the end of every goal model. Each function ends by returning the scores variable, so that ohicore can combine all scores together when CalculateAll() runs (but we won’t run return(scores) now). The scores variable has a specific format, with four columns.

   region_id    score dimension goal
1          1 94.93492    status   AO
2          2 94.93492    status   AO
3          3 94.93492    status   AO
4          4 94.93492    status   AO
5          5 94.93492    status   AO
6          6 94.93492    status   AO
7          7 94.93492    status   AO
8          8 94.93492    status   AO
9          1  0.01270     trend   AO
10         2  0.01270     trend   AO
11         3  0.01270     trend   AO
12         4  0.01270     trend   AO
13         5  0.01270     trend   AO
14         6  0.01270     trend   AO
15         7  0.01270     trend   AO
16         8  0.01270     trend   AO
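While testing a goal model line by line, a small check like the one below can confirm that scores has the four expected columns. check_scores_format() is a hypothetical helper written for this illustration; it is not part of ohicore, and the toy values are invented.

```r
# Hypothetical helper (not part of ohicore): TRUE if `scores` is a data frame
# with exactly the four columns shown above, in any order
check_scores_format <- function(scores) {
  expected_cols <- c("region_id", "score", "dimension", "goal")
  is.data.frame(scores) && setequal(names(scores), expected_cols)
}

# Toy scores object with invented values
scores_toy <- data.frame(
  region_id = c(1, 2, 1, 2),
  score     = c(94.9, 94.9, 0.0127, 0.0127),
  dimension = c("status", "status", "trend", "trend"),
  goal      = "AO"
)

ok  <- check_scores_format(scores_toy)                             # all four columns
bad <- check_scores_format(scores_toy[, c("region_id", "score")])  # two columns missing
print(c(ok = ok, bad = bad))
```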

8.6.6 Recap of exploring `functions.R`

functions.R is a collection of goal models to calculate Status and Trend. Each goal is written inside an R function and can have the following steps:

set scenario year variable and any other constant variables
load specific data layers with ohicore::AlignDataYears() (recommended over the older ohicore::SelectLayersData())
calculate Status scores
calculate Trend scores
combine Status and Trend scores
format and return the scores object

8.7 Calculate with tailored models

Now let’s run through the basic workflow a third time, this time modifying a goal model but keeping all data layers the same. Tailoring a goal model involves editing the operations within that goal’s model in functions.R.

8.7.1 Restart R, Libraries, Configure Toolbox

First let’s restart R and rerun the Load Libraries and Configure Toolbox sections in calculate_scores.Rmd. Now, we are all set to dive into the models in functions.R.

8.7.2 Tailor AO goal model

Now, let’s go to functions.R, to the AO model. As an example, we will do something pretty simple to tailor the goal model. Let’s say we just wanted to divide the variable Du by 2 in the equation.

  # model
  ry <- ry %>%
    mutate(Du = (1 - need) * (1 - access)) %>%
    mutate(status = (1 - Du/2) * Sustainability)

We can run the rest of the AO function line-by-line and inspect the scores variable at the end to see if everything looks OK.

8.7.3 Calculate scores, check, plot, and sync

Now let’s run the Calculate Scores chunk and save scores.csv.

We can use Git’s differencing feature to see how our scores have changed. This is a great way to double-check and error-check that things are working the way you expected.

We can also recreate the flower plots with our updated goal model. Then let’s commit and sync so we can see the differences on GitHub.com.

My commit message here will be “toolbox-training: tailor AO goal model with original data layer”.

8.7.4 Recap of third `calculate_scores.Rmd` run

In this third time through the basic workflow, we updated the goal model without changing any of the data layers that it depends upon. Next up, we will add a new data layer for it to work with.

8.7.5 Troubleshooting

If you’ve tailored a goal model function, you need to make sure that its output is still a data frame, and one that is not grouped. Otherwise, when you run calculate_scores.Rmd, you may get a cryptic error. One example we’ve seen is:

# Error in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, check_na_matches(na_matches)) : 
#  Can't join on 'region_id' x 'region_id' because of incompatible types (integer / list)

We are constantly improving ohicore with more human-readable error messages, but it is still best practice to ensure your goal model output is returning a dataframe.

You can do this in a few ways. The way we commonly do this is by adding ungroup() or as.data.frame() as the final step before returning the scores variable at the end of a goal model function.
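As a minimal sketch of that final step, the example below builds a grouped object with dplyr, then removes the grouping and converts it to a plain data frame. The toy values are invented, and the object names are only for illustration.

```r
library(dplyr)

# Toy data with invented values
scores_toy <- data.frame(
  region_id = c(1, 1, 2, 2),
  status    = c(0.80, 0.90, 0.70, 0.75)
)

# A group_by() earlier in a goal model leaves the result grouped
grouped <- scores_toy %>%
  group_by(region_id) %>%
  mutate(status = status * 100)
print(is_grouped_df(grouped))

# Final step before returning: drop the grouping and return a plain data frame
final <- grouped %>%
  ungroup() %>%
  as.data.frame()
print(class(final))
```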

8.8 Calculate with tailored data and models

The fourth and final example we will do in this chapter is to tailor a goal model by adding a new variable. This will mean that we will prepare, save, and register a new data layer and update the goal model in functions.R. It will be a combination of what we’ve done previously in this chapter.

Let’s restart R before proceeding.

Let’s say, as an example, that we want to tailor the AO goal model by adding a new variable for poverty into the equation.

\[ D_{U} = (1 - (need + poverty) / 2) * (1 - access) \]

\[ status = (1 - D_{U}) * sustainability \]

8.8.1 Prepare and save our new data layer

We will create the new data layer for poverty by running a script in the prep folder.

Open toolbox-demo/prep/AO/poverty_prep.R and source the file after reading it through. The result will be a new data layer saved to the “layers” folder, and you should see that there is a new file saved in your Git window.

8.8.2 Register in `layers.csv`

Now that we have prepared and saved our data layer, we’ll register it in layers.csv. This time, since we have added an additional data layer that has not been previously registered, we need to add a new row.

Open region2017/layers.csv in spreadsheet software (e.g. Microsoft Excel or Open Office). Add a new row for “ao_poverty”, and fill in the following information. We’ve added the row near the other AO data layers.

8.8.3 Register in `scenario_data_years.csv`

8.8.4 Configure Toolbox

To use this layer as we develop our goal model, we need to rerun the Configure Toolbox section. Before that, let’s restart R and reload the libraries. It’s good to restart R often so that your work doesn’t silently depend on objects left over in memory from earlier steps, such as those created during our layer preparation.

8.8.5 Update the AO goal model

We will need to do two things to update the goal model in functions.R.

First, we’ll have to load our data layer with AlignDataYears(). You can copy-paste the following into your functions.R to make sure everything is working properly:

  Sustainability <- 1.0

  scen_year <- layers$data$scenario_year

  r <- AlignDataYears(layer_nm = "ao_access", layers_obj = layers) %>%
    rename(region_id = rgn_id, access = value) %>%
    select(-layer_name) %>%
    na.omit()
  
  rp <- AlignDataYears(layer_nm = "ao_poverty", layers_obj = layers) %>%
    rename(region_id = rgn_id, poverty = value) %>%
    select(-layer_name) %>%
    left_join(r, by = c("region_id", "scenario_year")) 

  ry <-
    AlignDataYears(layer_nm = "ao_need", layers_obj = layers) %>%
    rename(region_id = rgn_id, need = value) %>%
    select(-layer_name) %>%
    left_join(rp, by = c("region_id", "scenario_year"))

Let’s run this and inspect the variables.

Note: if you forgot to add ao_poverty as a new layer to scenario_data_years.csv in the section above, you wouldn’t get an error, but your rp variable wouldn’t read any data: it would be a data frame with 0 rows! That is why it’s important to inspect all these variables so you can trace back where the problem is as early as possible.
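A quick row-count check makes this kind of silent failure visible. The sketch below simulates the problem with rp_toy, an invented empty data frame standing in for rp; while testing, you could run the same nrow() check on your real rp.

```r
# Toy stand-in for a layer that loaded without error but contains no data
rp_toy <- data.frame(
  region_id     = integer(0),
  scenario_year = integer(0),
  poverty       = numeric(0)
)

# Flag the empty result right away instead of discovering it downstream
if (nrow(rp_toy) == 0) {
  msg <- "rp has 0 rows: check that ao_poverty is registered in scenario_data_years.csv"
  message(msg)
}
```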

Alright. Next, we’ll tailor the goal model itself. Here is how the goal model looks as an equation and in R: you can copy-paste this model into functions.R, replacing the existing model.

\[ D_{U} = (1 - (need + poverty) / 2) * (1 - access) \]

\[ status = (1 - D_{U}) * sustainability \]

## tailored goal model with poverty
ry <- ry %>%
    mutate(Du = (1 - (need + poverty) / 2 ) * (1 - access)) %>%
    mutate(status = (1 - Du) * Sustainability)
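To see how the new variable changes the result, this sketch computes both the original and the poverty-tailored model for two regions. All values are invented, and the column names Du_original, Du_poverty, status_original, and status_poverty exist only for this comparison.

```r
Sustainability <- 1.0

# Invented values for two regions, for illustration only
toy <- data.frame(
  region_id = c(1, 2),
  need      = c(0.4, 0.9),
  poverty   = c(0.6, 0.1),
  access    = c(0.8, 0.5)
)

# Original model: unmet demand from need and access only
toy$Du_original <- (1 - toy$need) * (1 - toy$access)

# Tailored model: need is averaged with poverty before combining with access
toy$Du_poverty <- (1 - (toy$need + toy$poverty) / 2) * (1 - toy$access)

toy$status_original <- (1 - toy$Du_original) * Sustainability
toy$status_poverty  <- (1 - toy$Du_poverty) * Sustainability

print(toy)
```

In region 1, poverty is higher than need, so the tailored status is higher than the original; in region 2 the opposite holds. Checking a few cases by hand like this is a useful sanity check before recalculating all scores.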

8.8.6 Calculate scores, check, and sync

Everything is looking good in functions.R and in the Git tab that we’re looking at as we go along.

Now let’s restart R and recalculate scores in calculate_scores.Rmd. We’ll see that scores.csv will also update, and we can check that only AO dimensions (except pressures and resilience since we haven’t changed them) and Index scores are affected.
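Besides reading the Git diff, you can also compare two versions of the scores in R. The sketch below uses two small invented data frames in place of the old and new scores.csv, and assumes the columns goal, dimension, region_id, and score; with real files you would read each version with read.csv() first.

```r
# Invented "before" scores, standing in for the previous scores.csv
old_scores <- data.frame(
  goal      = c("AO", "AO", "FIS", "Index"),
  dimension = c("status", "pressures", "status", "score"),
  region_id = c(1, 1, 1, 1),
  score     = c(88, 40, 60, 70)
)

# Invented "after" scores: only AO status and the Index change
new_scores <- old_scores
new_scores$score[new_scores$goal == "AO" & new_scores$dimension == "status"] <- 90
new_scores$score[new_scores$goal == "Index"] <- 70.2

# Line up old and new scores row by row, then keep the rows that differ
both <- merge(old_scores, new_scores,
              by = c("goal", "dimension", "region_id"),
              suffixes = c("_old", "_new"))
changed <- both[both$score_old != both$score_new, ]
print(changed)
```

If goals other than AO and Index show up in changed, something other than the AO model was modified.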

Let’s commit and sync. My commit message will be “toolbox-training: tailor AO goal model with a new data layer”.

8.8.7 Recap of fourth `calculate_scores.Rmd` run

In this run we combined what we have practiced in the previous two runs, and we successfully:

created a new data layer, which includes preparing the layer in the prep file, saving it in layers, and registering it in layers.csv and scenario_data_years.csv
added this new model variable (i.e. the new layer) to the AO model in functions.R
reran calculate_scores.Rmd and saw the changes reflected in Git

8.9 Chapter Recap

We have completed Chapter 8 and successfully run through the basic workflow to calculate OHI scores with several variations using our toolbox-demo repository.

Each variation involves the same basic workflow of bookkeeping and running calculate_scores.Rmd, and will enable you to begin tailoring the Toolbox for your assessment.

‘out of the box’ data and models extracted from the Global 2016 assessment
tailored data and ‘out of the box’ models
tailored data and models
tailored (new) data and models

Also, a few best practices we have used throughout this training that are good to remember:

Compulsively restart R
Always check the Git window after each change to confirm that you see the changes you expect
Commit, then Pull before Push
Rerun the Configure Toolbox code chunk after any data layer or model changes
Save functions.R after editing it; changes are not reflected in the Git window until the file is saved
Close Excel before returning to R (this is less important on a Mac but will cause errors on a PC)