Taking the Next Steps in R

Contributed by Auriel Fournier. First in a series of blog posts developed from workshops presented at AOS’s 2019 annual meeting in Anchorage, Alaska.

I have co-taught a workshop on R at every AOS meeting since the 2015 meeting in Oklahoma. The exact content has changed based on participant feedback, but the goal has remained the same: to help people who already have some R experience gain new skills that we feel are valuable to most ecologists who work with data in R as part of their science. We aim to introduce several skills and concepts and to leave participants better able to teach themselves whatever R task they need to tackle next.

If you haven’t been able to attend one of these workshops, you can still work through the lessons yourself online! The lessons are designed for someone who already has some R experience, so they don’t include an orientation to the R interface or instructions on how to use the console. If you’ve never used R before, the R lessons provided by Data Carpentry are a good starting point.

We start by covering two R packages, tidyr and dplyr, which provide functions for reshaping data into different forms and for generating summary information. There are multiple ways to reach most end goals in R, and these packages are just one of them. In our experience teaching R to folks without a programming background, the function names and syntax of these packages are more intuitive than the bracket notation used for the same tasks in base R. We cover subsetting, summarization, and joins in these lessons. View these lessons on GitHub.
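To give a flavor of what subsetting, joining, summarizing, and reshaping look like with these packages, here is a minimal sketch. The data frames, column names, and species codes are invented for illustration and are not taken from the lessons; the functions (`filter()`, `left_join()`, `group_by()`, `summarise()`, and tidyr’s `pivot_wider()`) are standard dplyr and tidyr functions, though the lessons themselves may use different examples or older tidyr verbs.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, invented for illustration (not real survey or eBird data)
counts <- data.frame(
  site    = c("A", "A", "B", "B", "C"),
  species = c("AMRO", "BCCH", "AMRO", "BCCH", "AMRO"),
  count   = c(3, 5, 2, 0, 4)
)
sites <- data.frame(
  site    = c("A", "B", "C"),
  habitat = c("forest", "wetland", "forest")
)

# Subsetting: keep only rows where at least one bird was counted
detected <- filter(counts, count > 0)

# Joining: attach habitat to each row using the shared "site" column
joined <- left_join(detected, sites, by = "site")

# Summarizing: total birds counted in each habitat
by_habitat <- joined %>%
  group_by(habitat) %>%
  summarise(total = sum(count))

# Reshaping with tidyr: one row per site, one column per species
wide <- pivot_wider(counts, names_from = species, values_from = count)

by_habitat
wide
```

The same steps are possible with bracket notation, for example `counts[counts$count > 0, ]` for the subset, but each dplyr verb names the operation it performs, which is part of why we find it easier for newcomers to read.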

Next we dive into graphing, which we teach through the ggplot2 package. There are three main ways to graph in R: through R’s base graphics, through the lattice package, and through ggplot2. Which is best is largely a matter of personal opinion. We teach ggplot2 because it is our preference, and we have found it to be flexible for creating graphics that meet the criteria of a variety of publications and other uses.

In these lessons we work with some eBird data to generate different kinds of graphs, from scatterplots to box plots, and explore some of the important “grammar” of how ggplot works, along with the unexpected results you will get if you don’t follow those rules. We work through a fun exercise in which we create a truly visually assaulting graph and then use the theme() function to turn it, bit by bit, into something closer to publication quality. This shows how much control ggplot gives you over each small element of a graph. We also talk about how to bring custom color schemes into your graphs and the importance of using color schemes that are friendly to people with red/green colorblindness. Lastly, we show how to combine multiple graphs into the same panel with the cowplot package. View these lessons on GitHub.
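As a rough sketch of the kind of refinement described above, the code below builds a box plot, applies a custom color scheme with `scale_fill_manual()`, and then adjusts individual elements with `theme()`. The data are invented for illustration (not the eBird data used in the lessons), and the three hex colors are taken from the Okabe-Ito palette, which is commonly recommended as colorblind friendly; the specific theme settings are just one possible choice.

```r
library(ggplot2)

# Hypothetical toy data, invented for illustration (not eBird data)
birds <- data.frame(
  habitat = rep(c("forest", "wetland", "grassland"), each = 5),
  count   = c(3, 5, 4, 6, 2, 8, 7, 9, 6, 10, 1, 2, 2, 3, 1)
)

# Three colors from the Okabe-Ito palette (orange, sky blue, bluish green)
cb_palette <- c("#E69F00", "#56B4E9", "#009E73")

p <- ggplot(birds, aes(x = habitat, y = count, fill = habitat)) +
  geom_boxplot() +
  scale_fill_manual(values = cb_palette) +   # custom, colorblind-friendly fills
  labs(x = "Habitat", y = "Birds counted") +
  theme_classic() +                          # start from a plain base theme
  theme(
    legend.position = "none",                # habitat is already labeled on the x axis
    axis.text  = element_text(size = 11, colour = "black"),
    axis.title = element_text(size = 12)
  )

p
```

Each argument to `theme()` controls one element of the graph, so you can keep adding or changing settings one at a time until the figure meets a journal’s requirements. Two plots built this way can then be combined into one panel with `cowplot::plot_grid()`.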

In our final set of lessons, we get into how to use R to automate tasks, so you can avoid copying and pasting code or doing other tedious analysis steps by hand. This can be done in several ways. The two we discuss are for loops and custom functions, and we work through some basic examples drawn from our own experience using these tools to analyze ecological datasets.
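Here is a minimal sketch of how a custom function and a for loop work together to replace copy-and-paste code. The helper `summarize_site()` and the toy data are hypothetical, written for illustration rather than taken from the lessons; only base R is used.

```r
# Hypothetical helper: summarize the counts for a single site
summarize_site <- function(df, site_name) {
  site_rows <- df[df$site == site_name, ]
  data.frame(
    site      = site_name,
    n_species = sum(site_rows$count > 0),  # species with at least one detection
    total     = sum(site_rows$count)       # all birds counted at the site
  )
}

# Hypothetical toy data, invented for illustration
counts <- data.frame(
  site    = c("A", "A", "B", "B", "C"),
  species = c("AMRO", "BCCH", "AMRO", "BCCH", "AMRO"),
  count   = c(3, 5, 2, 0, 4)
)

# Loop over sites instead of copying and pasting the same code for each one
results <- list()
for (s in unique(counts$site)) {
  results[[s]] <- summarize_site(counts, s)
}

# Stack the per-site results into a single data frame
site_summary <- do.call(rbind, results)
site_summary
```

Putting the per-site logic in a function means a fix or change only has to be made in one place, and the loop applies it the same way to every site, which is what makes this pattern less error-prone than duplicated code.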

We also cover functions such as paste() and paste0(), which are useful when building workflows that read in or write out large series of files. Our goal here is not to make participants experts in these topics, but to show them examples that may be relevant to their work and demonstrate the utility of these tools so that they can adapt them to their own projects. View these lessons on GitHub.
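The sketch below shows one way `paste0()` and `paste()` can generate a series of file names for writing and then reading files back in. The species codes, file-name pattern, and file contents are hypothetical, and the files are written to R’s temporary directory so the example does not touch your own folders.

```r
species <- c("AMRO", "BCCH", "NOCA")  # hypothetical species codes

# paste0() joins pieces with no separator; paste() uses sep = " " unless told otherwise
file_names <- paste0("counts_", species, ".csv")
run_label  <- paste("survey", 2019, sep = "_")

# Write one small (invented) file per species into a temporary folder
out_dir <- tempdir()
for (i in seq_along(species)) {
  df <- data.frame(species = species[i], count = c(1, 2, 3))
  write.csv(df, file.path(out_dir, file_names[i]), row.names = FALSE)
}

# Read the whole series back in and combine it into one data frame
all_counts <- do.call(rbind, lapply(file.path(out_dir, file_names), read.csv))

file_names
run_label
nrow(all_counts)
```

Because the file names are built from the `species` vector, adding a new species to that vector is enough to extend both the writing and reading steps.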

Our hope is that these lessons give folks a good start in exploring the wide range of tasks for which R can be used to help make our scientific workflows more efficient, less prone to human error, and more reproducible. If you have any questions about these R lessons, feel free to get in touch with me, and good luck!
