R language notes | Working with data using R and the Tidyverse

Article link: https://blog.csdn.net/weixin_39469297/article/details/109732366

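This post introduces how to process data with R and the Tidyverse. It starts with grabbing CSV data and stresses the tidy data principles: each row is one observation and each column is one variable. It then describes the Tidyverse tools, such as tibble, dplyr and readr. The exploratory data analysis part covers setting up the environment, reading the data and inspecting it. It also covers the core operations of data transformation (mutate, select), grouped analysis (group_by, ungroup) and summary statistics (summary, n).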


grabbing the data

CSV data: a comma-separated values file is a delimited text file that uses commas to separate values.
These are plain text files, just like .txt files, meaning they contain no formatting (bold, italics, font sizes, etc.), which makes them portable and durable.
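To make the format concrete, here is a minimal sketch that writes a few lines of comma-separated text to a temporary file, shows the raw text, and reads it back with readr. The column names and values (country, import, export) are invented for illustration and do not come from any real dataset.

```r
library(readr)

# Hypothetical CSV content: a header line, then one record per line.
csv_lines <- c(
  "country,import,export",
  "A,10,12",
  "B,7,3"
)

# A CSV file is plain text, so writeLines() is enough to create one.
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Reading it back as raw text returns the same lines, with no formatting.
readLines(csv_path)

# read_csv() splits each line on commas and guesses the column types.
toy <- read_csv(csv_path, show_col_types = FALSE)
print(toy)
```

Because the file is nothing more than text, it can be opened in any editor and by practically any data tool.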

tidy data

every row is exactly one observation
every column is exactly one variable
every table contains only one type of observation, i.e. all observations are of the same type/category
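As an illustration of these rules, the sketch below reshapes a small table that breaks them into a tidy one. It uses pivot_longer() from tidyr, a tidyverse package that these notes do not otherwise cover, and the table itself is made up.

```r
library(tibble)
library(tidyr)

# Hypothetical "wide" table: one row per country, one column per year.
# The year is a variable, but here it is hidden in the column names.
wide <- tribble(
  ~country, ~`2019`, ~`2020`,
  "A",      10,      12,
  "B",      7,       3
)

# pivot_longer() moves the year into its own column, so that each row
# is one observation (one country in one year).
tidy <- pivot_longer(
  wide,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "import"
)
print(tidy)
```

The tidy version has one row for each country and year, which is the shape that verbs such as mutate() and group_by() expect.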

tidyverse

tibble: improved dataframes
dplyr: manipulate dataframes
readr: read in data files
stringr: work with text data
forcats: work with categorical data types
lubridate: work with dates and times
ggplot2: data visualization
purrr: apply functions over lists and vectors (functional programming tools)
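A quick tour of a few of these packages might look like the sketch below. The data is made up, and each call is just one representative function from its package, not a summary of what the package can do.

```r
library(tidyverse)
# lubridate is attached explicitly in case the installed tidyverse
# version does not attach it by default.
library(lubridate)

# tibble: build a small data frame (all values are made up)
df <- tibble(
  name   = c("alpha", "beta", "gamma"),
  size   = c("small", "large", "small"),
  joined = c("2020-01-15", "2020-02-01", "2020-03-20")
)

# stringr: work with text
str_to_upper(df$name)

# forcats: treat a column as categorical and count its levels
fct_count(factor(df$size))

# lubridate: parse text into dates and extract parts of them
month(ymd(df$joined))

# purrr: apply a function to each element of a vector
map_int(df$name, nchar)
```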

Exploratory data analysis

setting up environment

library(tidyverse)

reading the data

csvdata <- read_csv("~/data/csvdata")

read_csv() needs the path to the file as a string.
Relative paths are resolved against the current working directory, so the file can sit in that directory or in one of its subdirectories.
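The sketch below shows one way to check the working directory and build a path before reading. The folder layout (a data subdirectory holding example.csv) is hypothetical, and a temporary directory stands in for the project folder so the example can run anywhere.

```r
library(readr)

# Relative paths are resolved against the working directory.
getwd()

# Hypothetical layout: a "data" subdirectory holding "example.csv".
project_dir <- tempfile("project")
dir.create(file.path(project_dir, "data"), recursive = TRUE)
writeLines(c("x,y", "1,2"), file.path(project_dir, "data", "example.csv"))

# Build the path from its parts instead of pasting strings together,
# and check that the file exists before reading it.
csv_path <- file.path(project_dir, "data", "example.csv")
stopifnot(file.exists(csv_path))
example <- read_csv(csv_path, show_col_types = FALSE)
```

Checking file.exists() first tends to give a clearer failure than a read error when the working directory is not what was expected.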

exploring data

dim()
dimension of an object
names()
column names
head(), tail()
first and last rows of the data (six by default)
glimpse()
a transposed version of print(): columns run down the page, and data runs across.
summary()
summary statistics for each column
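The functions above can be tried on a small table like the one in this sketch. The scores tibble and its columns are invented for the example.

```r
library(tibble)
library(dplyr)

# Hypothetical data: two question scores for five made-up respondents.
scores <- tibble(
  id = 1:5,
  q1 = c(3, 5, 2, 4, 1),
  q2 = c(4, 4, 3, 5, 2)
)

dim(scores)      # number of rows, then number of columns
names(scores)    # column names
head(scores, 2)  # first two rows (six by default)
tail(scores, 2)  # last two rows
glimpse(scores)  # one line per column, with its type and first values
summary(scores)  # minimum, quartiles, mean and maximum of each column
```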

munging

mutate()
add a new column to the dataframe

mutate(csvdata, new_column=q1+q2+q3+q4)
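A runnable version of this call, on made-up data, could look as follows. The tibble below is a stand-in that only assumes the four columns q1 to q4 named in the call above.

```r
library(tibble)
library(dplyr)

# Hypothetical data with the four columns used in the call above.
csvdata <- tibble(
  q1 = c(1, 2), q2 = c(3, 4), q3 = c(5, 6), q4 = c(7, 8)
)

# mutate() returns a new data frame; csvdata itself is unchanged
# unless the result is assigned back to it.
with_total <- mutate(csvdata, new_column = q1 + q2 + q3 + q4)

with_total$new_column
ncol(csvdata)
```

Note that the result has to be assigned to a name to be kept, since mutate() does not modify its input in place.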

select()
keep only the columns you name, or remove unwanted columns by prefixing them with a minus sign
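Both uses of select() are shown in the sketch below. The column names (id, q1, q2, notes) are invented for the example.

```r
library(tibble)
library(dplyr)

# Hypothetical data: the column names are made up for this example.
csvdata <- tibble(id = 1:2, q1 = c(1, 2), q2 = c(3, 4), notes = c("a", "b"))

# Keep only the named columns.
kept <- select(csvdata, id, q1)

# Drop a column by prefixing it with a minus sign.
dropped <- select(csvdata, -notes)

names(kept)
names(dropped)
```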

analysing

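summarise()
collapse a dataframe to one row of summary values, or to one row per group if the dataframe is grouped
group_by()
indicate how to group the observations in a dataframe
It takes a dataframe and one or more columns to use as grouping variables.
ungroup()
return a copy of the dataframe without any previously applied grouping
n()
count the number of rows in the current group (or in the whole dataframe if it is ungrouped); it can only be called inside verbs such as summarise() or mutate()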
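How these four functions interact can be seen in a small sketch. The trade tibble, its region column and its values are hypothetical.

```r
library(tibble)
library(dplyr)

# Hypothetical trade records; all names and values are made up.
trade <- tibble(
  region = c("north", "north", "south"),
  import = c(10, 20, 5)
)

# group_by() only marks the groups; the rows themselves are unchanged.
grouped <- group_by(trade, region)

# summarise() then produces one row per group, and n() counts the
# rows in each group.
per_region <- summarise(grouped, total_import = sum(import), rows = n())
print(per_region)

# ungroup() drops the grouping, so later verbs see the whole table.
is_grouped_df(grouped)
is_grouped_df(ungroup(grouped))
```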

putting it all together

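# summarise() needs an aggregation for each output column; sum() is an assumption here
summarise(csvdata, import = sum(import), export = sum(export))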
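One way the verbs from these notes might be chained with the pipe is sketched below. Only the import and export column names come from the notes; the region column, the balance calculation and all the values are assumptions made for the example.

```r
library(tibble)
library(dplyr)

# Hypothetical data: `region` and all values are made up.
csvdata <- tibble(
  region = c("north", "north", "south", "south"),
  import = c(10, 20, 5, 15),
  export = c(8, 12, 9, 25)
)

result <- csvdata %>%
  mutate(balance = export - import) %>%   # add a derived column
  select(region, balance) %>%             # keep only what is needed
  group_by(region) %>%                    # one group per region
  summarise(mean_balance = mean(balance), rows = n()) %>%
  ungroup()                               # make sure no grouping is left

print(result)
```

With a single grouping column, summarise() already returns an ungrouped result, so the final ungroup() is only there to make the intent explicit.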

relabelling

mutate(csvdata, d_or_s = ifelse(differential==TRUE, "d", "s"))
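The sketch below runs this relabelling on made-up data and, as an alternative, shows dplyr's if_else(). It assumes that differential is a logical column, which the notes do not state.

```r
library(tibble)
library(dplyr)

# Hypothetical data: `differential` is assumed to be a logical column.
csvdata <- tibble(differential = c(TRUE, FALSE, NA))

relabelled <- mutate(
  csvdata,
  # base R ifelse(), as in the call above; a missing value stays NA
  d_or_s = ifelse(differential == TRUE, "d", "s"),
  # dplyr's if_else() is stricter about types and can label missing values
  d_or_s_strict = if_else(differential, "d", "s", missing = "unknown")
)

print(relabelled)
```

Whether missing values should stay NA or get a label of their own depends on the analysis, so the "unknown" label here is only a placeholder.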