
1 Summary

This data prep markdown calls a series of scripts to download and analyze sea ice data for the habitat subgoal and coastal protection goal. See data layers documentation for more information.

2 Updates from previous assessment

Version 1 of the data set, which had been used previously, was retired. Version 2 is distributed in NetCDF (.nc) format, whereas the data were previously stored in .bin files. The workflow was updated to accommodate the new file type and file naming system.
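
For orientation, here is a minimal sketch of reading one of the version 2 NetCDF files into R with the raster package. The filename below is hypothetical; real files follow the naming pattern described in the Data Source section.

## Minimal sketch, assuming a v2 monthly file has already been downloaded.
## Reading .nc files with raster requires the ncdf4 package.
library(raster)

nc_file <- "NSIDC0051_SEAICE_PS_N25km_20230101_v2.0.nc" # hypothetical filename
ice_conc <- raster::raster(nc_file) # reads the first variable (sea ice concentration)
raster::plot(ice_conc)              # quick visual check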


3 Data Source

Reference:

DiGirolamo, N., C. L. Parkinson, D. J. Cavalieri, P. Gloersen, and H. J. Zwally. (2022). Sea Ice Concentrations from Nimbus-7 SMMR and DMSP SSM/I-SSMIS Passive Microwave Data, Version 2 [Data Set]. Boulder, Colorado USA. NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/MPYG15WAA4WX Date Accessed 06-05-2023.

Downloaded: July 12, 2024

Description: Monthly sea ice extent data.

Data can be downloaded here: https://nsidc.org/data/nsidc-0051. The user guide for the data is located here.

3.1 Downloading the raw data

There are several options for downloading the raw data files. If the user is familiar with Python, see Option 1. If not, see Option 2. Option 3 is not fully developed, but it may be worth investing time in changing to this workflow.

Option 1 - Python

  1. To download the raw data, go to this website: https://nsidc.org/data/nsidc-0051 and make an account.

  2. After making an account, scroll down from the link above and navigate to the data access tool. Type _??????_v into the search bar. This works because all monthly files follow the format “NSIDC0051_SEAICE_PS_H25km_YYYYMMDD_v2.0.nc”, with H replaced by the hemisphere (N or S).

  3. Click the download script option just below the search results. It is important that you first enter the search term, as it will be placed into the Python script automatically to filter the results to the files we want.

  4. Place this file (e.g., nsidc-download_NSIDC-0051.001_2022-06-06.py) in the folder /home/shares/ohi/git-annex/globalprep/_raw_data/NSIDC_SeaIce/<version_year>

  5. Create “xml”, “north”, “south”, and “tmp” folders in the same directory as above

  • Open a terminal on Mazu in your R session and run:

    • cd /home/shares/ohi/git-annex/globalprep/_raw_data/NSIDC_SeaIce/<version_year>

    • mkdir north

    • mkdir south

    • mkdir xml

    • mkdir tmp

  6. Run the Python script
  • In your terminal on Mazu run:

    • python nsidc-download_NSIDC-0051.001_<download_date>.py

    • Notes:

      • You will be prompted to enter your username and password

      • This may take quite a while to run

      • If you experience server disconnection issues, download the data using option 2

  7. Next, we move the files to the folders we created.
  • You should ensure you are still in /home/shares/ohi/git-annex/globalprep/_raw_data/NSIDC_SeaIce/<version_year> and then run:

    • mv ./*.xml ./xml

    • mv ./*PS_N*.nc ./north

    • mv ./*PS_S*.nc ./south

  8. Your files should now be in the correct place, and you may proceed to the Methods section of this workflow.

Option 2 - Order Files

To download the raw data, you must go to this website: https://nsidc.org/data/nsidc-0051, and navigate to the data access tool. Type _??????_v into the search bar.

This is because all monthly files follow the format “NSIDC0051_SEAICE_PS_H25km_YYYYMMDD_v2.0.nc”, with H replaced by the hemisphere (N or S). Click “Order Files” and wait until your order is processed (this should take only a couple of minutes; you will receive an email with instructions). Once processed, download the zip file, place it into the git-annex/globalprep/_raw_data/NSIDC_SeaIce/<vyear> folder on Mazu, and extract it.

The zip file contains numbered folders, each with two files in them: an .xml file and an .nc file.

We need to create “xml”, “north”, and “south” folders and place the .nc and .xml files within them.

To do this, we will work in the terminal (alternatively, you could do it manually, but that takes much longer; see the v2020 script).

Open the terminal within RStudio on Mazu. Now we need to cd into the correct folder on Mazu, which is where we have placed our sea ice raw data. Enter cd /home/shares/ohi/git-annex/globalprep/_raw_data/NSIDC_SeaIce/<vyear>/<folder_number> to do this.

Now move all of the .xml files to the xml folder that you have created:

  • mv **/*.xml ../xml

And do the same with the North files and South files:

  • mv **/*PS_N*.nc ../north

  • mv **/*PS_S*.nc ../south

NOTE: remember to change the assessment year to whatever assessment you are conducting.

The final folder structure should now contain the new “north”, “south”, and “xml” folders alongside the numbered download folders.

Option 3 - Download through R (new and incomplete)

This option is not complete and will need some work, but it may be the best option moving forward. See the NOAA tutorial for downloading directly with R through the PolarWatch ERDDAP server.

This will download the data as NETCDF (.nc) files. This is the same file format that is currently used in version 2 of the data set.

See issue 206 for example code to get started.
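
For a rough idea of what that could look like, below is a hedged sketch using the rerddap package against the PolarWatch ERDDAP server. The dataset ID, time range, and storage path are placeholders (the actual NSIDC-0051 dataset ID must be looked up on the server), and this is not yet an established part of the workflow.

## Sketch only: dataset ID, time range, and storage path are placeholders.
library(rerddap)

pw_url <- "https://polarwatch.noaa.gov/erddap/"

## Find the relevant dataset ID on the server first, e.g.:
## rerddap::ed_search(query = "NSIDC-0051", url = pw_url)
dataset_id <- "PLACEHOLDER_DATASET_ID" # hypothetical; replace before running

ice_info <- rerddap::info(dataset_id, url = pw_url)
ice_nc <- rerddap::griddap(
  ice_info,
  time  = c("2023-01-01", "2023-12-31"), # placeholder time range
  fmt   = "nc",    # NetCDF, matching the format used by version 2 of the data
  store = rerddap::disk(path = "path/to/NSIDC_SeaIce") # hypothetical path
)
## Note: other dimension ranges (e.g., x/y for polar stereographic grids) may
## also need to be specified depending on the dataset.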

For complete documentation and more information about data access, please see:

http://nsidc.org/data/nsidc-0051.html

If you wish to be notified of updates or corrections to these data, please register with NSIDC User Services by sending e-mail to: nsidc@nsidc.org

Identify yourself as a user of “Sea Ice Concentrations from Nimbus-7 SMMR and DMSP SSM/I-SSMIS Passive Microwave Data (NSIDC-0051).” Include your name, e-mail address, postal address, and telephone number.

If you have questions, please contact NSIDC User Services.

CONTACT INFORMATION:
User Services
National Snow and Ice Data Center
CIRES, 449 UCB
University of Colorado
Boulder, CO USA 80309-0449
Phone: +1 303-492-6199
Fax: +1 303-492-2468
E-mail: nsidc@nsidc.org

Time range: 1979-2023


4 Methods

4.1 Setup

Load all relevant libraries and establish/define parameters and commonly used pathnames. In the setup code chunk, manually change the scenario and data years in the file pathname variables to reflect the most recent data year (d) and the current assessment year (v).

## Install R packages where necessary
pkg <- c("raster", "fasterize", "sf", "sp", "rgdal", "fields", "here", "tictoc", "dplyr", "ggplot2", "foreach", "parallel", "doParallel") # "fields" for colors in Status_Trend.R
new.pkg <- pkg[!(pkg %in% installed.packages())]
if (length(new.pkg)){install.packages(new.pkg)}
if (!("ohicore" %in% installed.packages())){devtools::install_github("ohi-science/ohicore")}

## Load libraries, set directories
lapply(c(pkg, "ohicore"), require, character.only = TRUE)

## UPDATE THESE!
current_year <- 2024
last_year <- 2023 # final year of data (all months)

assess_year <- paste0("v", current_year) # assessment year for file paths
previous_year <- paste0("v", last_year)  # previous assessment year
source(here::here("workflow/R/common.R"))
## This file makes it easier to process data for the OHI global assessment
##  by creating the following objects:
## 
##  * dir_M = identifies correct file path to Mazu (internal server) based on your operating system
##  * mollCRS = the crs code for the mollweide coordinate reference system we use in the global assessment
##  * regions_shape() = function to load global shapefile for land/eez/high seas/antarctica regions
##  * ohi_rasters() = function to load two rasters: global eez regions and ocean region
##  * region_data() = function to load 2 dataframes describing global regions 
##  * rgn_syns() = function to load dataframe of region synonyms (used to convert country names to OHI regions)
##  * low_pop() = function to load dataframe of regions with low and no human population
##  * UNgeorgn = function to load dataframe of UN geopolitical designations used to gapfill missing data

4.2 Location of Maps

These maps of the OHI regions were made by the PreparingSpatialFiles.R script. If there are changes to the OHI regions, the PreparingSpatialFiles.R script will need to be run. Additionally, it is critical to walk through the ObtainingData.R script if any of the spatial files have been modified (this saves the files as spatial points).

The original polygon files come from git-annex/globalprep/_raw_data/NSIDC_SeaIce/v2015. However, we use the fasterize package to rasterize the polygons (sketched after the code chunk below) and save them to git-annex/globalprep/_raw_data/NSIDC_SeaIce/v2023.

maps <- file.path(dir_M, "git-annex/globalprep/_raw_data/NSIDC_SeaIce", assess_year)
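
For reference, here is a hedged sketch of the fasterize step described above; the layer and field names are illustrative, and PreparingSpatialFiles.R holds the actual logic.

## Illustrative only -- see PreparingSpatialFiles.R for the real workflow.
library(sf)
library(fasterize)
library(raster)

## A layer name may need to be specified in st_read().
rgn_poly <- sf::st_read(file.path(dir_M, "git-annex/globalprep/_raw_data/NSIDC_SeaIce/v2015"), quiet = TRUE)

bb <- sf::st_bbox(rgn_poly)
template <- raster::raster(xmn = bb[["xmin"]], xmx = bb[["xmax"]],
                           ymn = bb[["ymin"]], ymx = bb[["ymax"]],
                           resolution = 25000) # 25 km pixels, matching the sea ice data
## A crs matching the data (see prj.n/prj.s in the next section) should also be set.
rgn_rast <- fasterize::fasterize(rgn_poly, template, field = "rgn_id") # hypothetical field name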

4.3 Establish Parameters

Establish: CRS, website to collect data, data selection parameters. The final monthly files follow the naming format NSIDC0051_SEAICE_PS_H25km_YYYYMMDD_v2.0.nc (the retired version 1 files used the format nt_YYYYMM_SSS_vVV_R.bin). These parameters will be used to take data from Mazu in the ObtainingData script.

pixel = 25000 # pixel dimension in meters for both x and y
prj.n = "+proj=stere +lat_0=90 +lat_ts=70 +lon_0=-45 +k=1 +x_0=0 +y_0=0 +a=6378273 +b=6356889.449 +units=m +no_defs"
prj.s = "+proj=stere +lat_0=-90 +lat_ts=-70 +lon_0=0 +k=1 +x_0=0 +y_0=0 +a=6378273 +b=6356889.449 +units=m +no_defs"
prj.mol = "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"

## Filepath (fp), filepath for the final monthly data in north and south folders
fp.n <- file.path(dir_M, "git-annex/globalprep/_raw_data/NSIDC_SeaIce", assess_year, "north")
fp.s <- file.path(dir_M, "git-annex/globalprep/_raw_data/NSIDC_SeaIce", assess_year, "south")

## Note: this may need modifying if data are not available for all 12 months of the most recent year. The ObtainingData.R and ObtainingData_Parallel.R scripts rely on a 12-month interval when looping, so a "last_year" with data only through September (9) will introduce problems (see the sketch after this chunk).
poles = c("n","s")
years = c(1979:last_year) # full range of data
months = 1:12
n.pym = length(poles)*length(years)*length(months)
i.pym = 0
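
If the final data year is ever incomplete, here is a hedged sketch of one way to build the pole/year/month combinations without assuming 12 months everywhere; `last_month` is an assumption you would set from the downloaded files, and the looping scripts would need to consume this table instead of the fixed counts above.

## Sketch, assuming `last_year` only has data through `last_month`:
last_month <- 12 # hypothetical: set to the last complete month available

pym <- expand.grid(pole = poles, year = years, month = months,
                   stringsAsFactors = FALSE) %>%
  dplyr::filter(!(year == last_year & month > last_month))
n.pym <- nrow(pym) # replaces length(poles)*length(years)*length(months)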

4.4 Function 1

Collects the data for each month/year from Mazu and adds it to a raster stack that is saved in the tmp folder as n_rasters_points.rdata or s_rasters_points.rdata. If it does not already exist, this also converts the region shapefile into a raster points file. See the ObtainingData.R script for more details.

## Soft limit of 5 cores per dataprep script. For this process, running additional cores did not decrease the runtime.
tic()
## Previously (<2022) update to some file paths with the correct assessment year in ObtainingData.R was needed
## Currently (>2023) this is automatic from this script with `assess_year` variable

## Attention: Because this is now done in parallel using the "ObtainingData_Parallel.R" script rather than sequentially using the "ObtainingData.R" script (saving ~2 hrs of processing time), this function will not display progress as it does when run sequentially. That is okay as long as you check Mazu and ensure that the tmp folder is updated about 2 minutes into the run; the v202X folder should have n_rasters_points.rdata and s_rasters_points.rdata populated in about 1.25-1.5 hours.

source(here("globalprep/hab_seaice/v2024/ObtainingData_Parallel.R")) 

toc()
## should take ~1.5 hrs to completely run with ObtainingData_Parallel.R 
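
For context, here is a hedged sketch of the general foreach/doParallel pattern the parallel script relies on; the names and loop body are illustrative, not the actual internals of ObtainingData_Parallel.R.

## Illustrative pattern only -- see ObtainingData_Parallel.R for the real code.
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(5) # soft limit of 5 cores, per the note above
doParallel::registerDoParallel(cl)

monthly_rasters <- foreach::foreach(i = seq_len(n.pym), .packages = "raster") %dopar% {
  ## read one pole/year/month NetCDF file and return it for stacking
  i # placeholder body
}

parallel::stopCluster(cl)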

4.5 Function 2

Using the data from the .rdata files created by Function 1 (via the ObtainingData_Parallel.R script), this function calculates status and trend for shoreline ice and ice edge habitat. Data are saved in the intermediate (int) folder:

  • Habitat: n_IceEdgeHabitat.csv, s_IceEdgeHabitat.csv
  • Coastal Protection: n_IceShoreProtection.csv, s_IceShoreProtection.csv
tic()
ref.years <- 1979:2000
source(here("globalprep/hab_seaice/v2024/Status_Trend.R")) # calculates status and trend for shoreline ice and ice edge habitat
toc()
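
To make the arithmetic concrete, here is a toy, self-contained illustration of a reference-based status and trend calculation of the kind Status_Trend.R performs; the ice-area series is fabricated, and this is not the script's actual code.

## Toy example only: fabricated ice-area series, not real data.
set.seed(1)
yrs <- 1979:2023
ice_km2 <- 1000 - 2 * (yrs - 1979) + rnorm(length(yrs), sd = 10) # declining series

ref_avg <- mean(ice_km2[yrs %in% ref.years]) # 1979-2000 reference average
status  <- ice_km2 / ref_avg                 # condition relative to reference

recent <- data.frame(year = 2019:2023, status = status[yrs %in% 2019:2023])
slope  <- coef(lm(status ~ year, data = recent))[["year"]]
slope * 5 # scaled to five years, mirroring the trend * 5 step in the output code below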

4.6 Final calculations and organization

Read in the ice edge habitat and ice shore protection csv-format datasets, remove anomalous eez regions with minimal ice cover, and remove disputed regions. Bind these datasets and convert to units of km^2. Save the sea ice health, trend, and extent data.

n_edge <- read.csv("globalprep/hab_seaice/v2024/int/n_IceEdgeHabitat_ref1979to2000.csv")
s_edge <- read.csv("globalprep/hab_seaice/v2024/int/s_IceEdgeHabitat_ref1979to2000.csv")
edge <- rbind(n_edge, s_edge)
edge  <- edge %>%
  dplyr::filter(Reference_avg1979to2000monthlypixels != 0) %>%
  dplyr::filter(!(rgn_id %in% c(59, 141, 219, 4, 172, 94))) %>%  # anomalous eez regions with very little ice cover
  dplyr::filter(!(rgn_id %in% c("248300","258510","258520","258600","258700"))) %>% # cut due to minimal ice (<200km2/yr - avg of months)
  dplyr::filter(rgn_nam != "DISPUTED") %>%
  dplyr::mutate(habitat="seaice_edge")

n_shore <- read.csv("globalprep/hab_seaice/v2024/int/n_IceShoreProtection_ref1979to2000.csv")
s_shore <- read.csv("globalprep/hab_seaice/v2024/int/s_IceShoreProtection_ref1979to2000.csv")
shore <- rbind(n_shore, s_shore)
shore <- shore %>%
  dplyr::filter(Reference_avg1979to2000monthlypixels != 0) %>%
  dplyr::filter(!(rgn_id %in% c(59, 89, 177, 178))) %>%  # anomalous eez regions with very little ice cover
  dplyr::filter(rgn_nam != "DISPUTED") %>%
  dplyr::mutate(habitat = "seaice_shoreline")

data <- rbind(edge, shore)
data  <- data %>%
  dplyr::mutate(km2 = Reference_avg1979to2000monthlypixels/12 * (pixel/1000)^2)

write.csv(data, "globalprep/hab_seaice/v2024/int/sea_ice.csv", row.names = FALSE)

## Health data
health <- data %>%
  dplyr::filter(rgn_typ == "eez") %>%
  dplyr::select(rgn_id, habitat, dplyr::starts_with("pctdevR")) %>%
  tidyr::gather("year", "health", -(1:2)) %>%
  dplyr::mutate(year = substring(year, 9, 12)) %>%
  dplyr::mutate(year = as.numeric(year)) %>%
  dplyr::mutate(health = ifelse(health > 1, 1, health))

write.csv(health, "globalprep/hab_seaice/v2024/output/hab_ice_health_eez.csv", row.names = FALSE) # save sea ice health data

## Trend data
trend <- data %>%
  dplyr::filter(rgn_typ == "eez") %>%
  dplyr::select(rgn_id, habitat, dplyr::starts_with("Trend")) %>%
  tidyr::gather("year", "trend", -(1:2)) %>%
  dplyr::mutate(year = substring(year, 13, 16)) %>%
  dplyr::mutate(year = as.numeric(year)) %>%
  dplyr::mutate(trend = trend * 5) %>%
  dplyr::mutate(trend = ifelse(trend > 1, 1, trend)) %>%
  dplyr::mutate(trend = ifelse(trend < (-1), -1, trend))

write.csv(trend, "globalprep/hab_seaice/v2024/output/hab_ice_trend_eez.csv", row.names = FALSE) # save sea ice trend data

## Sea ice extent data
extent <- data %>%
   dplyr::filter(rgn_typ == "eez") %>%
   dplyr::mutate(year = 2016) %>% # extent not updated each year (historic extent of the sea ice habitat); updated last 2016 bc of source methods
  dplyr::select(rgn_id, habitat, year, km2)

write.csv(extent, "globalprep/hab_seaice/v2024/output/hab_ice_extent_eez.csv", row.names = FALSE) # save sea ice extent data

4.7 Data Checks

  1. Compare to last year’s data. There should be a strong correlation between these data.
  2. Pair with region names for sanity check.
## Health comparison

## v2024: Because the last full year of data was 2022, 'previous_year' would reflect 2022 instead of 2023. Here we change the variable to the previous assessment year rather than the previous year of full data. This won't change anything for v2024 because the v2023 assessment used the 2022 data year (its last full data year) instead of 2023 data.

previous_assessment_full_data <- 2023
previous_year <- paste0("v", previous_assessment_full_data)

#compare same years with different version year 
health <- read.csv("globalprep/hab_seaice/v2024/output/hab_ice_health_eez.csv") # Current year

ice_health_eez <- read.csv(sprintf("globalprep/hab_seaice/%s/output/hab_ice_health_eez.csv", previous_year)) %>% # compare to last year's data
  dplyr::rename(health_prev_assess = health) %>%
  dplyr::left_join(health, by = c("rgn_id", "habitat", "year")) %>% 
  na.omit("health") 

comp_plot <- ggplot() + geom_point(data = ice_health_eez, aes(x = health_prev_assess, y = health)) + labs(x = "previous health (2012-2022)", y = "health (2012-2023)") + geom_abline(slope = 1, intercept = 0, col = "red")
plotly::ggplotly(comp_plot)


### let's compare the data year 2023 to data year 2022 (current data year vs the year before)

## Pair with region names
regions <- rgn_master %>%
  dplyr::select(rgn_id = rgn_id_2013, rgn_name = rgn_nam_2013) %>%
  unique()

## Current full year of data
current_year = 2023
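## NOTE: this intentionally overwrites the `current_year` (2024) set in the setup
## chunk; `assess_year` was already constructed there, so file paths are unaffected.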

health_2023 <- health %>%
  dplyr::filter(year == current_year)

## Last full year of data
last_year = 2022

health_2022 <- ice_health_eez %>%
  dplyr::filter(year == last_year) %>%
  dplyr::select(-year, -health)

health_old_new <- dplyr::left_join(health_2023, health_2022) %>%
  dplyr::mutate(difference = health - health_prev_assess) %>%
  dplyr::left_join(regions)


year_change_plot <- ggplot() +
  geom_point(data = health_old_new,
             aes(x = health_prev_assess, y = health,
                 text = paste("Region ID:", rgn_id, "<br>Habitat:", habitat))) +
  labs(x = "previous assessment health", y = "current health") +
  geom_abline(slope = 1, intercept = 0, col = "red")

plotly::ggplotly(year_change_plot, tooltip = "text")

## Trend comparison
trend <- read.csv("globalprep/hab_seaice/v2024/output/hab_ice_trend_eez.csv")
ice_trend_eez <- read.csv(sprintf("globalprep/hab_seaice/%s/output/hab_ice_trend_eez.csv", previous_year)) %>% # compare to last year's data
  dplyr::rename(trend_prev_assess = trend) %>%
  dplyr::left_join(trend, by = c("rgn_id", "habitat", "year")) %>% 
  na.omit("trend")

## update the label years for the current assessment
trend_comp <- ggplot() +
  geom_point(data = ice_trend_eez, aes(x = trend_prev_assess, y = trend)) +
  labs(x = "previous assessment trend (2016-2022)", y = "current assessment trend (2017-2023)") +
  geom_abline(slope = 1, intercept = 0, col = "red")

plotly::ggplotly(trend_comp)

### let's compare the data year 2023 to data year 2022 (current data year vs the year before)

trend_2023 <- trend %>%
  dplyr::filter(year == (current_year))

trend_2022 <- ice_trend_eez %>%
  dplyr::filter(year == last_year) %>%
  dplyr::select(-year, -trend)

trend_old_new <- dplyr::left_join(trend_2023, trend_2022) %>%
  dplyr::mutate(difference = trend - trend_prev_assess) %>%
  dplyr::left_join(regions)

year_trend_change_plot <- ggplot() +
  geom_point(data = trend_old_new,
             aes(x = trend_prev_assess, y = trend,
                 text = paste("Region ID:", rgn_id, "<br>Habitat:", habitat))) +
  labs(x = "previous data year trend (2022)", y = "current data year trend (2023)") +
  geom_abline(slope = 1, intercept = 0, col = "red")

plotly::ggplotly(year_trend_change_plot, tooltip = "text")

## Extent comparison
extent <- read.csv("globalprep/hab_seaice/v2024/output/hab_ice_extent_eez.csv", stringsAsFactors = FALSE)
ice_extent_eez <- read.csv(sprintf("globalprep/hab_seaice/%s/output/hab_ice_extent_eez.csv", previous_year), stringsAsFactors = FALSE) %>% # compare to last year's data
  dplyr::rename(km2_prev_assess = km2) %>% left_join(extent)

ggplot(data = ice_extent_eez) +
  geom_point(aes(x = km2_prev_assess, y = km2)) +
  geom_abline(slope = 1, intercept = 0, col = "red")


### let's compare the data year 2023 to data year 2022

extent_2023 <- extent 

extent_2022 <- ice_extent_eez

extent_old_new <- dplyr::left_join(extent_2023, extent_2022)

## extent doesn't change at all, because we use the same year for extent every year
ggplot(data = extent_old_new) +
  geom_point(aes(x = km2_prev_assess, y = km2)) +
  geom_abline(slope = 1, intercept = 0, col = "red")


## Pair with region names
regions <- rgn_master %>%
  dplyr::select(rgn_id = rgn_id_2013, rgn_name = rgn_nam_2013) %>%
  unique()

## Make sure sea ice health across regions makes intuitive sense...
data <- read.csv("globalprep/hab_seaice/v2024/output/hab_ice_health_eez.csv") %>%
  dplyr::left_join(regions, by = "rgn_id") %>%
  dplyr::arrange(habitat, health)

#### Check largest score changes for v2024 ####


## Bouvet Island, rgn_id == 105, HAB score +


# health 
bouv_health <- health_old_new %>%
  dplyr::filter(rgn_id == 105) %>%
  dplyr::mutate(diff = health - health_prev_assess) # -0.33

# trend 
bouv_trend <- trend_old_new %>%
  dplyr::filter(rgn_id == 105) %>%
  dplyr::mutate(diff = trend - trend_prev_assess) # trend got worse (-.01 previously to -0.14 in 2024)


## checked against previous conditions, and it looks like Bouvet Island and Jan Mayen (our two largest increases for 2021) have pretty volatile condition scores year after year, so I think this is ok. Nothing went wrong in the data processing on our part. 

4.8 Gapfill

There was no gapfilling for these data, but gapfill files are still created with values of 0. Note: all layers need a gf file even if there was no gapfilling; in this case the gapfill value is 0 for every region.

## Health gapfill
hab_ice_health_gf <- read.csv("globalprep/hab_seaice/v2024/output/hab_ice_health_eez.csv") %>%
  dplyr::mutate(gapfilled = 0) %>% 
  dplyr::select(rgn_id, year, gapfilled)

write.csv(hab_ice_health_gf, "globalprep/hab_seaice/v2024/output/hab_ice_health_eez_gf.csv", row.names=FALSE) # save sea ice health gapfill file

## Extent gapfill
hab_ice_extent_gf <- read.csv("globalprep/hab_seaice/v2024/output/hab_ice_extent_eez.csv") %>%
  dplyr::mutate(gapfilled = 0) %>% 
  dplyr::select(rgn_id, year, gapfilled)

write.csv(hab_ice_extent_gf, "globalprep/hab_seaice/v2024/output/hab_ice_extent_eez_gf.csv", row.names = FALSE) # save sea ice extent gapfill file

## Trend gapfill
hab_ice_trend_gf <- read.csv("globalprep/hab_seaice/v2024/output/hab_ice_trend_eez.csv") %>%
  dplyr::mutate(gapfilled = 0) %>% 
  dplyr::select(rgn_id, year, gapfilled)

write.csv(hab_ice_trend_gf, "globalprep/hab_seaice/v2024/output/hab_ice_trend_eez_gf.csv", row.names = FALSE) # save sea ice trend gapfill file