grabbing the data
CSV data: a comma-separated values (CSV) file is a delimited text file that uses commas to separate values.
These are plain text files, just like .txt, meaning they contain no formatting (bold, italics, font sizes, etc.), which makes them portable and durable.
tidy data
- every row is exactly one observation
- every column is exactly one variable
- every table contains only one type of observation, i.e. all observations are of the same type/category
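The three rules above can be illustrated with a small reshaping sketch. The table, its column names (student, q1, q2) and the choice to treat each quarterly score as one observation are assumptions made for illustration; pivot_longer() comes from tidyr, which library(tidyverse) also loads.

```r
library(tidyr)   # pivot_longer(); also attached by library(tidyverse)
library(tibble)  # tibble()

# Hypothetical wide table: one row per student, one column per quarter.
wide <- tibble(
  student = c("a", "b"),
  q1 = c(10, 7),
  q2 = c(8, 9)
)

# If each quarterly score counts as one observation, the tidy form has
# one row per (student, quarter) pair and one column per variable.
long <- pivot_longer(
  wide,
  cols = c(q1, q2),
  names_to = "quarter",
  values_to = "score"
)

print(long)
```

Whether the wide or the long layout is the tidy one depends on what is counted as an observation in the analysis.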
tidyverse
- tibble: improved dataframes
- dplyr: manipulate dataframes
- readr: read in data files
- stringr: work with text data
- forcats: work with categorical data types
- lubridate: work with dates and times
- ggplot2: data visualization
- purrr: functional programming tools (apply a function to each element of a list or vector)
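As a small illustration of what purrr is used for, the map functions apply one function to every element of a list and collect the results. The list below is hypothetical; note that the iteration runs sequentially.

```r
library(purrr)

# Hypothetical list of score vectors, one per group.
scores <- list(a = c(10, 8), b = c(7, 9, 6))

# map_dbl() calls mean() on each element and returns a named numeric vector.
means <- map_dbl(scores, mean)

print(means)
```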
Exploratory data analysis
setting up environment
library(tidyverse)
reading the data
csvdata <- read_csv("~/data/csvdata")
read_csv() needs the path to the file as a string.
Relative paths are resolved against the current working directory, so a relative path can point to a file in the working directory or in one of its subdirectories.
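A minimal sketch of reading a CSV file follows. It writes a tiny made-up file to a temporary location first so that it runs anywhere; the file contents and the name example_data are assumptions, not the data used in these notes.

```r
library(readr)

# Write a tiny hypothetical CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("id,q1,q2", "1,10,8", "2,7,9"), path)

# The path is passed as a string; show_col_types = FALSE silences the
# column-specification message that read_csv() prints by default.
example_data <- read_csv(path, show_col_types = FALSE)

# A relative path such as "data/file.csv" would be resolved against this directory.
print(getwd())
print(example_data)
```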
exploring data
dim()
dimensions of an object (number of rows, then number of columns)
names()
column names
head(), tail()
first and last rows of an object (6 rows by default)
glimpse()
a transposed version of print(): columns run down the page, and data runs across.
summary()
summary statistics for each column
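The exploration functions above can be tried on a small table. The tibble below is made up for illustration; tibble() and glimpse() are available once dplyr (or the whole tidyverse) is loaded.

```r
library(dplyr)

# Hypothetical data: 8 rows, 3 columns.
example_data <- tibble(
  id = 1:8,
  q1 = c(10, 7, 9, 6, 8, 5, 10, 4),
  q2 = c(8, 9, 7, 6, 10, 5, 9, 3)
)

dims <- dim(example_data)            # number of rows, then number of columns
cols <- names(example_data)          # column names
first_rows <- head(example_data)     # first 6 rows by default
last_rows <- tail(example_data, 3)   # last 3 rows

glimpse(example_data)                # one line per column, values run across
print(summary(example_data))         # min, quartiles, mean, max per numeric column
```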
munging
mutate()
add a new column to the dataframe (returns a modified copy; assign the result to keep it)
mutate(csvdata, new_column = q1 + q2 + q3 + q4)
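A runnable sketch of the same mutate() call, using a made-up table with columns q1 to q4 in place of csvdata:

```r
library(dplyr)

# Hypothetical stand-in for csvdata.
example_data <- tibble(
  id = 1:2,
  q1 = c(10, 7),
  q2 = c(8, 9),
  q3 = c(9, 6),
  q4 = c(7, 8)
)

# mutate() does not change example_data in place; the result is assigned to a new name.
with_total <- mutate(example_data, new_column = q1 + q2 + q3 + q4)

print(with_total)
```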
select()
keep only the named columns; prefix a column name with - to remove an unwanted column instead
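Both ways of using select() can be sketched as follows; the table and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data with one column we do not need.
example_data <- tibble(
  id = 1:2,
  q1 = c(10, 7),
  q2 = c(8, 9),
  notes = c("x", "y")
)

kept <- select(example_data, id, q1)     # keep only the listed columns
dropped <- select(example_data, -notes)  # keep everything except notes

print(kept)
print(dropped)
```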
analysing
summarise()
collapse each group (or the whole dataframe, if ungrouped) into a single row of summary values
group_by()
indicate how to group observations in a dataframe
It takes in a dataframe and one or more columns to use as grouping variables.
ungroup()
return a copy of the dataframe without any previously applied grouping
n()
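counts the number of rows in the current group (or in the whole dataframe, if ungrouped); it takes no arguments and only works inside verbs such as summarise() and mutate()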
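The four functions above are usually used together. The sketch below assumes a made-up table with a region column and numeric import and export columns; the choice of sum() as the aggregate is also an assumption.

```r
library(dplyr)

# Hypothetical trade data.
trade <- tibble(
  region = c("north", "north", "south"),
  import = c(5, 7, 3),
  export = c(2, 4, 6)
)

# group_by() only records the grouping; the rows themselves are unchanged.
by_region <- group_by(trade, region)

# summarise() then produces one row per group; n() counts the rows in each group.
totals <- summarise(
  by_region,
  n_rows = n(),
  total_import = sum(import),
  total_export = sum(export)
)

# ungroup() drops the grouping so later verbs act on the whole table again.
ungrouped <- ungroup(by_region)

print(totals)
print(is_grouped_df(by_region))
print(is_grouped_df(ungrouped))
```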
putting it all together
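summarise(csvdata, import = sum(import), export = sum(export))
each summary needs a name and an aggregating function; sum() is an assumption here, use mean() or another function as needed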
relabeling
mutate(csvdata, d_or_s = ifelse(differential==TRUE, "d", "s"))
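A runnable sketch of the relabeling step, assuming differential is a logical column (the table is made up). It also shows dplyr's if_else() as a stricter alternative, whose missing argument gives an explicit label to NA values.

```r
library(dplyr)

# Hypothetical data; differential is assumed to be a logical column.
example_data <- tibble(
  id = 1:3,
  differential = c(TRUE, FALSE, NA)
)

# Base ifelse(): an NA in the condition gives an NA label.
relabeled <- mutate(
  example_data,
  d_or_s = ifelse(differential == TRUE, "d", "s")
)

# dplyr::if_else() checks that both outcomes have the same type and
# lets missing values be labeled explicitly.
relabeled_strict <- mutate(
  example_data,
  d_or_s = if_else(differential, "d", "s", missing = "unknown")
)

print(relabeled)
print(relabeled_strict)
```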