[R Language Notes (1/4)] Coursera R Programming

EcrayyarcE

Article link: https://blog.csdn.net/Echo__Yi/article/details/120277620

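These notes follow the R Programming course offered by Johns Hopkins University on Coursera (https://www.coursera.org/learn/r-programming); the course author has also compiled the lecture notes into a book (https://bookdown.org/rdpeng/rprogdatascience/).
Because the course is taught entirely in English, some terms that are hard to translate are left in the original English; please bear with me. If you spot a problem, feel free to point it out, and discussion in the comments is welcome.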

0 Overview and resources of R

Downloads: R (https://www.r-project.org/), RStudio (https://www.rstudio.com/)
History: the S language, Bell Labs
Websites: Bioconductor - Home, CRAN - Contributed Packages
Books/articles:

Title	Author	Year
Software for Data Analysis (textbook)	Chambers	2008
Programming with Data	Chambers	1998
Modern Applied Statistics with S	Venables & Ripley	2002
S Programming	Venables & Ripley	2000
Mixed-Effects Models in S and S-PLUS	Pinheiro & Bates	2000
R Graphics	Murrell	2005

Asking for help:

Describe the goal, not the step;
Be explicit about your question;
Do provide the minimum amount of information necessary;
Be courteous (it never hurts);
Follow up with the solution (if found);

1 Data types

1.1 Input and evaluation

Assign a value to an object with "<-", for example:

> x<-1
> x
[1] 1
> msg<-"hello world"
> msg
[1] "hello world"

- The [1] shows that x is a vector and that the value printed next to it (here 1) is its first element;

> x<-1:20
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

- The : operator creates an integer sequence that includes both endpoints;

Comment code with "#" (everything after it on the line is ignored; these notes write "##"), for example:

> x<-5 ## nothing printed
> x ## auto-printed
[1] 5

1.2 Objects and attributes

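Basic object types: character, numeric (real numbers), integer, complex, logical (TRUE or FALSE);
The most basic object: the vector
A vector can hold several objects of one type only; a list can hold objects of different types
Numeric objects are treated as double precision by default; Inf represents infinity, and NaN ("not a number") represents an undefined value such as 0/0, which R also counts as missing
Attributes: not fixed; an object may have names, dimnames, dimensions, length, class, etc.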
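
To make the points above concrete, here is a minimal sketch. The L suffix for integers and the matrix() call are not covered in the notes so far and are used only for illustration.

```r
x <- 1         # a bare number is stored as a double
y <- 1L        # the L suffix explicitly requests an integer
typeof(x)      # "double"
typeof(y)      # "integer"

1 / 0          # Inf
0 / 0          # NaN

# A matrix is a vector that carries a "dim" attribute
m <- matrix(1:6, nrow = 2)
attributes(m)  # $dim is 2 3
```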

1.3 Vectors, lists and matrices

c( ) function: builds a vector from the given objects; c can be read as "concatenate"

> x<-c(0.5,0.6) ##numeric
> x<-c(TRUE,FALSE) ##logical
> x<-c(T,F) ##logical
> x<-c("a","b","c") ##character
> x<-9:29 ##integer
> x<-c(1+0i,2-4i) ##complex

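- For background on complex numbers, see this article (https://blog.csdn.net/zincrain/article/details/89305234);
- When c() is given objects of different types, they are implicitly coerced to the least common denominator type (the most general one present, in the order logical, integer, numeric, complex, character); when coerced to a number, TRUE becomes 1 and FALSE becomes 0;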
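
A short sketch of implicit coercion in c(); the variable names are only for illustration.

```r
a  <- c(1.7, "a")   # numeric + character -> character
b  <- c(TRUE, 2)    # logical + numeric   -> numeric
c3 <- c("a", TRUE)  # character + logical -> character

class(a)   # "character"
class(b)   # "numeric"
class(c3)  # "character"
b          # 1 2   (TRUE was coerced to 1)
```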

vector( ) function: creates a vector of a given type and length

> x<-vector("numeric",length=10)
> x
 [1] 0 0 0 0 0 0 0 0 0 0

- By default, the vector is filled with the type's default value (0 for numeric);
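
The default value depends on the requested type. A small sketch with other modes (the variable names are illustrative):

```r
chr <- vector("character", length = 3)  # three empty strings
lgl <- vector("logical", length = 2)    # FALSE FALSE
lst <- vector("list", length = 2)       # a list of two NULL elements

chr
lgl
length(lst)
```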

as.* functions: explicit type coercion

> x<-0:6
> class(x)
[1] "integer"
> as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
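
Coercion that makes no sense yields NA rather than an error. A minimal sketch; suppressWarnings() is used here only to silence the "NAs introduced by coercion" warning that R would otherwise print.

```r
x <- c("a", "b", "1")
y <- suppressWarnings(as.numeric(x))  # "a" and "b" cannot be read as numbers
y                                     # NA NA 1
```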

list: a special kind of vector whose elements can be of different types; built with the list( ) function

> x<-list(1,"a",T)
> x
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE
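
The double brackets in the printout are also how a single element is extracted. A brief sketch of the difference between [[ ]] and [ ] on the same list:

```r
x <- list(1, "a", TRUE)

x[[2]]       # "a": the element itself
class(x[2])  # "list": single brackets return a sub-list
```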

1.4 Factors

Definition: Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.
- Factors are treated specially by modelling functions like lm() and glm();
Interpretation: a factor is a variable that represents categories, which may be ordered or unordered; the underlying integers are labels rather than numeric values. For more on factors, see this article (https://blog.csdn.net/hsdcc217/article/details/78510087).
Creating factors

> x<-factor(c("yes","no","yes","yes","no"))
> x
[1] yes no  yes yes no 
Levels: no yes
> table(x)
x
 no yes 
  2   3

- The factor x has two levels: no and yes;
- table() prints how many times each level occurs;

class() reports an object's class; unclass() strips the class attribute

> unclass(x)
[1] 2 1 2 2 1
attr(,"levels")
[1] "no"  "yes"

Use the levels argument to set the order of the levels; by default they are sorted alphabetically

> x<-factor(c("yes","no","yes","yes","no"),levels=c("yes","no"))
> x
[1] yes no  yes yes no 
Levels: yes no
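
The definition above mentions ordered factors. A minimal sketch, assuming the ordered = TRUE argument of factor(), which the notes do not show; the data are made up for illustration.

```r
sizes <- factor(c("low", "high", "medium"),
                levels = c("low", "medium", "high"),
                ordered = TRUE)

sizes[1] < sizes[2]  # TRUE: "low" comes before "high" in the level order
as.integer(sizes)    # 1 3 2: the underlying integer codes
```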

1.5 Missing values

Missing values are denoted by NA, and undefined numeric results by NaN; test for them with is.na() and is.nan():

> x<-c(1,2,NA,10,3)
> is.na(x)
[1] FALSE FALSE  TRUE FALSE FALSE
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE
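
A NaN counts as NA, but an NA is not a NaN. The sketch below puts both in one vector; the na.rm argument of mean() is an addition not covered in the notes.

```r
x <- c(1, 2, NaN, NA, 4)

is.na(x)   # FALSE FALSE  TRUE  TRUE FALSE
is.nan(x)  # FALSE FALSE  TRUE FALSE FALSE

# Many summary functions can skip missing values on request
mean(x, na.rm = TRUE)  # (1 + 2 + 4) / 3
```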

1.6 Data frames

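A data frame is a special kind of list in which every column has the same length; unlike a matrix, whose elements must all be of one type, the columns of a data frame may have different types.
Common functions: row.names(); read.table() or read.csv(). Convert to a matrix: data.matrix().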

> data.frame(foo=1:4,bar=c(T,F,T,T))
  foo   bar
1   1  TRUE
2   2 FALSE
3   3  TRUE
4   4  TRUE
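
A short sketch of inspecting the same data frame and converting it with data.matrix(); nrow(), ncol() and sapply() are additions for illustration.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, FALSE, TRUE, TRUE))

nrow(df)           # 4
ncol(df)           # 2
sapply(df, class)  # foo is "integer", bar is "logical"

# data.matrix() forces all columns to numbers; TRUE/FALSE become 1/0
m <- data.matrix(df)
m
```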

1.7 Names and attributes

Naming objects matters for code readability. names() returns NULL when no names have been set. Assigning a name to each element:

> x<-c(1:3)
> names(x)
NULL
> names(x)<-c("foo","bar","norf")
> x
 foo  bar norf 
   1    2    3 
> names(x)
[1] "foo"  "bar"  "norf"
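
Once names are set, elements can be selected by name. A minimal sketch; the matrix and its dimnames are an addition to illustrate the dimnames attribute mentioned in section 1.2.

```r
x <- c(foo = 1, bar = 2, norf = 3)
x["bar"]     # select an element by its name

m <- matrix(1:4, nrow = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))  # row names, then column names
m["a", "d"]  # 3
```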

Naming list elements

> x<-list(a=1,b=2,c=3)
> x
$a
[1] 1

$b
[1] 2
