Required packages to install:
Session video recording
At the end of this session you will be able to:
For learning:
For help:
The ability to read, understand, modify, and write simple pieces of code is an essential skill for modern data analysis. Here we introduce some of the best practices to follow when writing code:
Managing your projects in a reproducible fashion doesn’t just make your science reproducible, it also makes your life easier! RStudio is here to help us with that by using projects!! RStudio projects make it straightforward to divide your work into multiple contexts, each with their own working directory, workspace, history, and source documents.
It is strongly recommended that you store all the files that will be used or sourced in your code in the same directory. You can then access them with paths relative to that directory. This makes the directory and R Project a “product”, or “bundle/package”. Like a tiny machine, it needs to have all its component parts in the same place.
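As a minimal sketch of what a relative path buys you, the snippet below builds a throwaway project folder in a temporary directory and reads a file from it using only a path relative to the project root. The folder name `my-project` and the file `data/heights.csv` are made up for illustration, and `setwd()` is used only to imitate what RStudio does when a project is opened.

```r
# Build a throwaway "project" in a temporary location (hypothetical names).
project_dir <- file.path(tempdir(), "my-project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Save a small example dataset inside the project.
heights <- data.frame(feet = c(5, 6), inches = c(4, 1))
write.csv(heights, file.path(project_dir, "data", "heights.csv"), row.names = FALSE)

# Imitate opening the project: RStudio sets the working directory to the
# project root. setwd() returns the previous directory so we can restore it.
old_wd <- setwd(project_dir)

# The relative path works no matter where the project folder lives on disk.
heights_read <- read.csv(file.path("data", "heights.csv"))

# Restore the previous working directory.
setwd(old_wd)
```

Because the script never mentions the full location of the project, the whole folder can be moved or shared and the code should still run.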
Let’s create our first project!
RStudio projects are associated with R working directories. You can create an RStudio project:
There are many ways one could organise a project folder. One option is to set up the project directory with the prodigenr package, which creates the following folders and files:
ProjectName
├── R
│ ├── README.md
│ ├── fetch_data.R
│ └── setup.R
├── data
│ └── README.md
├── doc
│ └── README.md
├── .Rbuildignore
├── .gitignore
├── DESCRIPTION
├── ProjectName.Rproj
└── README.md
This imposes a specific and consistent folder structure on all your work. Think of this like the “introduction”, “methods”, “results”, and “discussion” sections of your paper. Each project is then like a single manuscript or report that contains everything relevant to that specific project. There is a lot of power in something as simple as a consistent structure.
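To make the idea of a consistent skeleton concrete, here is a rough base R sketch that creates the three main folders, each with a README. It is not how prodigenr works internally, it leaves out files such as `DESCRIPTION` and the `.Rproj` file, and `create_skeleton()` is a hypothetical helper written only for this example.

```r
# Hypothetical helper: create a simplified project skeleton with base R only.
create_skeleton <- function(path) {
  folders <- file.path(path, c("R", "data", "doc"))
  for (folder in folders) {
    # Create the folder (and any missing parents), then an empty README in it.
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
    file.create(file.path(folder, "README.md"))
  }
  # Top-level README describing the project as a whole.
  file.create(file.path(path, "README.md"))
  invisible(path)
}

# Try it out in a temporary directory so nothing on your computer is changed.
project_path <- create_skeleton(file.path(tempdir(), "ProjectName"))
list.files(project_path, recursive = TRUE)
```

Running the same helper for every new project would give each one the same starting layout, which is the main point of the section above.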
The README in each folder explains a bit about what should be placed there. But briefly:
Documents belong in the doc/ directory.
Data belongs in the data/ directory (or data-raw/ for the very raw data).
R scripts belong in the R/ directory.
And make sure to use version control (Git! See the AUOC Git material for more details).
Time: 2 min
Think about these file names. Which file names should you use?
fit models.R
fit-models.R
foo.r
stuff.r
get_data.R
Manuscript version 10.docx
manuscript.docx
new version of analysis.R
trying.something.here.R
plotting-regression.R
utility_functions.R
code.R
Projects are used to make life easier. Once a project is opened within RStudio the following actions are taken:
The .Rprofile file in the project’s main directory (if any) is sourced by R.
Even though R doesn’t care about naming, spacing, and indenting, it really matters how your code looks. Coding is just like writing. Even though you may go through a brainstorming, note-taking stage of writing, you eventually need to write clearly so others can understand what you are trying to say. In coding, brainstorming is fine, but eventually you need to code in a readable way.
Time: 6 min
Before we go further into this section, try to make this code more readable. Edit the code so it’s easier to understand what is going on.
# Variable names
DayOne
dayone
T <- FALSE
c <- 9
mean <- function(x) sum(x)
# Spacing
x[,1]
x[ ,1]
x[ , 1]
mean (x, na.rm = TRUE)
mean( x, na.rm = TRUE )
function (x) {}
function(x){}
height<-feet*12+inches
mean(x, na.rm=10)
sqrt(x ^ 2 + y ^ 2)
df $ z
x <- 1 : 10
# Indenting
if (y < 0 && debug)
message("Y is negative")
You have organised it by hand, but it is also possible to do it automatically. The tidyverse style guide gives people a standard style to follow, and the R package styler can automatically re-style chunks of code to match it. The styler commands can be found in the Addins menu at the top of the RStudio window.
Now, let’s try using styler on the exercise code above.
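Besides the Addins menu, styler can also be called from code. The sketch below passes one line from the exercise to `styler::style_text()`. It assumes styler may not be installed, so the call is wrapped in a check, and the variable names are made up for this example.

```r
# Badly spaced code taken from the exercise, stored as text.
messy_code <- "height<-feet*12+inches"

# styler is an optional package, so only run it when it is installed.
if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() returns the re-styled code without running it.
  tidy_code <- styler::style_text(messy_code)
  print(tidy_code)
} else {
  message("Install styler with install.packages(\"styler\") to try this.")
}
```

For a saved script, `styler::style_file()` applies the same re-styling to the whole file.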
DRY, or “don’t repeat yourself”, is another way of saying “make your own functions”! That way you don’t need to copy and paste code you’ve used multiple times. Using functions can also make your code more readable and descriptive, since a function is a bundle of code that does a specific task, and the function name should usually describe what it does.
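A small sketch of the difference, reusing the feet-and-inches calculation from the spacing exercise. The function `height_in_inches()` and the example values are hypothetical and only meant to illustrate the idea.

```r
# Repetitive version: the same conversion is copied for each person.
anna_height <- 5 * 12 + 4
ben_height <- 6 * 12 + 1

# DRY version: the conversion lives in one descriptively named function.
# height_in_inches() is a hypothetical helper written for this example.
height_in_inches <- function(feet, inches = 0) {
  feet * 12 + inches
}

anna_height_dry <- height_in_inches(feet = 5, inches = 4)
ben_height_dry <- height_in_inches(feet = 6, inches = 1)
```

If the conversion ever needs to change, only the function body has to be edited, and the function name tells the reader what each line is doing.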
It is very important for your future self, and for anyone else who will read or use your code, to be able to understand what the code, function, or R Markdown document will generate. So it’s crucial to describe what the code does, acknowledge the author (if necessary), and give an example of how to execute it. If your function name is descriptive enough, then you don’t need to spend much time describing what the code does! In the AUOC session on creating functions for packages, we went into detail about function documentation and creation. Here we will briefly cover the core concepts.
Example:
# Code developed by Maria Izabel
# The following function outputs the sum of two numeric variables (a and b).
# usage: summing(a = 2, b = 3)
summing <- function(a, b) {
return(a + b)
}
summing(a = 2, b = 3)
## [1] 5
The example above is summing up two different numeric variables. Note that the name for this function was chosen as summing, instead of sum. This is because we know that R already has a built-in function called sum and so we don’t want to overwrite it!
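To see why reusing a built-in name is risky, here is a short sketch of what happens if we do define our own `sum`. The deliberately wrong function body is made up for this example.

```r
# Defining our own `sum` hides the built-in one in the current session.
sum <- function(a, b) {
  a - b # deliberately wrong, to make the problem visible
}
masked_result <- sum(2, 3)

# The original is still reachable through its namespace...
base_result <- base::sum(2, 3)

# ...and removing our version restores normal behaviour.
rm(sum)
restored_result <- sum(2, 3)
```

Here `masked_result` comes from our own function, while `base_result` and `restored_result` come from the built-in one. Choosing a distinct name such as summing avoids this kind of confusion altogether.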
At the top of each script, you should put all your library calls for loading your packages. Better yet, put all the library calls in a new file and source() that file in each R script.
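A minimal sketch of that pattern is shown below. The setup file is written to a temporary directory here so the example is self-contained; in a real project it might live somewhere like R/setup.R, though that location is an assumption. The packages stats and utils are used only because they ship with R.

```r
# Write a small setup script that holds all the library calls.
setup_file <- file.path(tempdir(), "setup.R")
writeLines(
  c(
    "# All package loading happens here, in one place.",
    "library(stats)",
    "library(utils)"
  ),
  setup_file
)

# At the top of each analysis script, load everything with one line.
source(setup_file)

# Functions from the loaded packages are now available.
median_value <- median(c(1, 3, 10))
```

With this approach, adding or removing a package means editing one file instead of every script.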
We’ll cover this more during the session, but mainly at the end.
Many of the best practices are taken from the “best practices” articles listed in the “Resources”.
This work is licensed under a Creative Commons Attribution 4.0 International License. See the licensing page for more details about copyright information.