Chapter 1 Welcome

This training book will introduce you to open data science so you can work with data in an open, reproducible, and collaborative way. Open data science means that methods, data, and code are available so that others can access, reuse, and build on them without much fuss. Here you will learn a workflow with R, RStudio, Git, and GitHub, as we describe in Lowndes et al. 2017, Nature Ecology & Evolution: "Our path to better science in less time using open data science tools."

This is going to be fun, because learning these open data science tools and practices is empowering! This training book is written (and continually improved) so that you can use it for self-paced learning, or an instructor can use it to teach an in-person workshop while live-coding. Either way, work through everything hands-on, on your own computer, as you learn.

Before you begin, be sure you are all set up: see the prerequisites in Chapter 2.

Suggested breakdown for a 2-day workshop:

Time        | Day 1                                | Day 2
9:00-10:30  | Motivation, R & RStudio, R Markdown  | Data wrangling: tidyr
11:00-12:30 | GitHub                               | Programming
13:30-15:00 | Visualization: ggplot2               | Collaborating with GitHub
15:30-17:00 | Data wrangling: dplyr                | Practice; be a champion for open data science

This book has been used in the following workshops:

Open Data Science Training — 2-day workshop at the University of Queensland, Australia 2019-06-18

Software Carpentry — 2-day workshop at the Woods Hole Oceanographic Institution (WHOI) 2018-10-22

Data integration and team science — 4-day workshop at NCEAS, California, USA 2018-03-12

Software Carpentry — 2-day workshop at the Monterey Bay Aquarium Research Institute (MBARI) 2017-11-30

Data Carpentry — 2-day workshop at the University of California, Merced 2017-08-17

Software Carpentry: Reproducible Science with RStudio and GitHub — 2-day workshop at Oxford University 2016-07-12

Software Carpentry — 2-day workshop at UC Santa Barbara 2016-04-15

This work is licensed under a Creative Commons Attribution 4.0 International License.