Introduction to tidyverse
The R tidyverse is a collection of packages for data handling, analysis and visualization. To use the tidyverse, you first have to install the additional packages with the install.packages() function. Once they are installed, you have to tell R to make the tidyverse functions available in your current R session with library().
You only have to install a package once, but you have to load it every time you start a new R session. It is recommended to either leave install.packages() out of your script or comment it out, as below.
# install.packages("tidyverse")
library(tidyverse)
As you can see in the output, library(tidyverse) actually loads nine different packages. It also reports conflicting functions. Do not worry about that for now; we will get to it in time.
Why tidyverse?
- consistent syntax and workflows
- makes code more readable
- pipe operator %>% / |> can chain functions together
- tidy data approach
  - rows are observations
  - columns are variables / features
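As a small illustration of the tidy layout, the sketch below builds a data.frame by hand. The name aasee_demo is hypothetical, and the three rows are re-typed from the lake measurements shown later in this section, so the example runs without the CSV file.

```r
# Hypothetical demo table; values re-typed from the output shown in this section.
aasee_demo = data.frame(
  Datum = c("2021-05-31 23:57", "2021-06-01 00:09", "2021-06-01 00:19"),
  Wassertemperatur = c(17.98, 17.66, 18.03),
  pH.Wert = c(8.05, 8.04, 8.12)
)

nrow(aasee_demo)   # number of observations (rows)
names(aasee_demo)  # the variables (columns)
```

Each row is one measurement in time, and each column holds exactly one measured variable.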
data.frames with dplyr
- provides functions for data.frame manipulation
- can complement or replace base R functions
Of course, you can also load single packages from the tidyverse with the library() function.
library(dplyr)
aasee = read.csv("data/2021-06_aasee.csv")
slice
- returns a slice of the data, i.e. the specified rows
# keep only the first 8 rows
aasee = slice(aasee, seq(8))
select
- selects columns
select(aasee, Wassertemperatur)
Wassertemperatur
1 17.98
2 17.66
3 18.03
4 18.08
5 18.06
6 18.01
7 18.02
8 18.06
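select() also accepts several columns at once and can drop columns with a minus sign. The following is a minimal sketch; aasee_demo is a hypothetical stand-in with values re-typed from the output in this section.

```r
library(dplyr)

# Hypothetical demo table; values re-typed from the output shown in this section.
aasee_demo = data.frame(
  Datum = c("2021-05-31 23:57", "2021-06-01 00:09"),
  Wassertemperatur = c(17.98, 17.66),
  pH.Wert = c(8.05, 8.04)
)

# Several columns, returned in the order given
ph_by_date = select(aasee_demo, pH.Wert, Datum)

# A minus sign drops a column and keeps the rest
without_date = select(aasee_demo, -Datum)
```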
filter
- filters rows based on logical operators
filter(aasee, Wassertemperatur < 18)
Datum Wassertemperatur pH.Wert Sauerstoffgehalt
1 2021-05-31 23:57 17.98 8.05 10.53
2 2021-06-01 00:09 17.66 8.04 9.64
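filter() can combine several conditions. The sketch below assumes a hypothetical table aasee_demo with values re-typed from the output above; it is meant as an illustration, not as part of the original analysis.

```r
library(dplyr)

# Hypothetical demo table; values re-typed from the output above.
aasee_demo = data.frame(
  Wassertemperatur = c(17.98, 17.66, 18.03),
  Sauerstoffgehalt = c(10.53, 9.64, 11.30)
)

# Conditions separated by a comma must all be TRUE (logical AND)
cold_and_oxygen_rich = filter(aasee_demo, Wassertemperatur < 18, Sauerstoffgehalt > 10)

# Use | when one of the conditions is enough (logical OR)
cold_or_oxygen_rich = filter(aasee_demo, Wassertemperatur < 18 | Sauerstoffgehalt > 11)
```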
mutate
- adds new columns to the data.frame (or changes existing ones) and returns the result
mutate(aasee, t_kelvin = Wassertemperatur + 273.15)
Datum Wassertemperatur pH.Wert Sauerstoffgehalt t_kelvin
1 2021-05-31 23:57 17.98 8.05 10.53 291.13
2 2021-06-01 00:09 17.66 8.04 9.64 290.81
3 2021-06-01 00:19 18.03 8.12 11.30 291.18
4 2021-06-01 00:27 18.08 8.14 11.32 291.23
5 2021-06-01 00:39 18.06 8.12 11.06 291.21
6 2021-06-01 00:49 18.01 8.10 10.91 291.16
7 2021-06-01 00:59 18.02 8.10 10.96 291.17
8 2021-06-01 01:08 18.06 8.10 10.83 291.21
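Note that mutate() returns a new data.frame and leaves its input untouched, so the result has to be assigned if it should be kept. A minimal sketch, assuming a hypothetical table aasee_demo with temperatures re-typed from the output above:

```r
library(dplyr)

# Hypothetical demo table; values re-typed from the output above.
aasee_demo = data.frame(Wassertemperatur = c(17.98, 17.66, 18.03))

# mutate() returns a new data.frame; aasee_demo itself is unchanged
with_kelvin = mutate(aasee_demo, t_kelvin = Wassertemperatur + 273.15)
names(aasee_demo)  # still only "Wassertemperatur"

# A column created in the same call can be used right away
with_both = mutate(
  aasee_demo,
  t_kelvin = Wassertemperatur + 273.15,
  t_kelvin_rounded = round(t_kelvin)
)
```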
summarise
- summarises data
summarise(aasee, minimum_t = min(Wassertemperatur))
minimum_t
1 17.66
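summarise() can compute several summary values in one call, each of which becomes a column of a one-row result. The sketch below uses a hypothetical table aasee_demo with temperatures re-typed from the output above.

```r
library(dplyr)

# Hypothetical demo table; values re-typed from the output above.
aasee_demo = data.frame(Wassertemperatur = c(17.98, 17.66, 18.03))

temperature_summary = summarise(
  aasee_demo,
  minimum_t = min(Wassertemperatur),
  maximum_t = max(Wassertemperatur),
  mean_t = mean(Wassertemperatur),
  n_obs = n()  # n() counts the rows
)
```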
The functions above can also be written in base R:
# the same in base R
# select (returns a vector; aasee["Wassertemperatur"] keeps a data.frame)
aasee$Wassertemperatur
# filter (the condition goes before the comma to select rows)
aasee[aasee$Wassertemperatur < 18, ]
# mutate
aasee$t_kelvin = aasee$Wassertemperatur + 273.15
# summarise
min(aasee$Wassertemperatur)
The pipe operator
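The strength of dplyr is the possibility to chain functions with %>% or |>. The native pipe |> is built into R since version 4.1, while %>% comes from the magrittr package and is loaded together with the tidyverse.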
aasee |>
  filter(Wassertemperatur < 18) |>
  select(pH.Wert) |>
  max()
[1] 8.05
With base R functions this looks messy, because we have to use functions inside functions.
max(aasee$pH.Wert[aasee$Wassertemperatur < 18])
[1] 8.05
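To see what the pipe saves, the same dplyr calls can also be written as nested function calls. The sketch below assumes R 4.1 or newer for |> and uses a hypothetical table aasee_demo with values re-typed from the output above.

```r
library(dplyr)

# Hypothetical demo table; values re-typed from the output above.
aasee_demo = data.frame(
  Wassertemperatur = c(17.98, 17.66, 18.03),
  pH.Wert = c(8.05, 8.04, 8.12)
)

# Piped: read from top to bottom
piped = aasee_demo |>
  filter(Wassertemperatur < 18) |>
  select(pH.Wert) |>
  max()

# Nested: the same dplyr calls, read from the inside out
nested = max(select(filter(aasee_demo, Wassertemperatur < 18), pH.Wert))

identical(piped, nested)
```

Both versions do the same work; the pipe only changes the order in which the steps are written.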