Basic Usage of the R Language

hellopbc

Category: R. Tags: R, basic syntax

Article link: https://blog.csdn.net/qq_37774098/article/details/117366142

R_base

ref

w3school

base

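Basic syntax

R only has single-line comments, which start with #. Tip: wrap text in if (FALSE) { "content" } to get the effect of a multi-line comment.
Assignment: use the <- operator.
Output: print(var)

Running R code:

In a terminal: type R to start the R interpreter.
As a script: Rscript xxx.R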

if(FALSE) {
   "This is a demo for multi-line comments and it should be put inside either a single
      OR double quote"
}

myString <- "Hello, World!" # assign to the string variable myString
print(myString)

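Data types

An R variable is a reference: a name bound to an R object. The object, not the variable, carries the data type, and R objects come in several types.

R is a dynamically typed language, similar to Python.

A variable is simply a named storage location that holds a value.

Creating a variable therefore reserves some memory to store that value.

Variables are not declared with a data type.

Instead, a variable is assigned an R object, and the data type of that object becomes the data type of the variable.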
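
As a small illustration of dynamic typing, the sketch below reassigns one variable several times and records its class after each assignment. The variable names are made up for this example.

```r
# A variable takes the class of whatever object is currently assigned to it.
x <- "Hello"          # character
class_1 <- class(x)

x <- 34.5             # numeric (double)
class_2 <- class(x)

x <- 27L              # integer
class_3 <- class(x)

print(c(class_1, class_2, class_3))
# [1] "character" "numeric"   "integer"
```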

Commonly used R objects:

Vectors
Lists
Matrices
Arrays
Factors
Data frames

Atomic vectors come in six data types:

Logical
- TRUE, FALSE
Numeric
- 12.3, 5, 999
Integer
- 2L, 34L, 0L
Complex
- 3+2i
Character
- 'a', "good", "TRUE", "23.4"
Raw
- "Hello" is stored as 48 65 6c 6c 6f

To print the data type of a variable: print(class(var))

v <- charToRaw("Hello")
print(class(v))

# [1] "raw"

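Vectors: created with the c() function
- All elements of an atomic vector share one type; if you combine values of different types, R coerces them to a common type.
  - For example, c(TRUE, 1) mixes the logical and numeric classes, so the logical value is coerced to numeric and TRUE becomes 1.
- ```
apple <- c('red','green',"yellow")
print(apple)
# Get the class of the vector.
print(class(apple))
```
- ```
[1] "red"    "green"  "yellow"
[1] "character"
```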
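
To make the coercion rule concrete, here is a minimal sketch with three mixed-type vectors (names chosen for this example). The general order of coercion is logical, then integer, then double, then character.

```r
v1 <- c(TRUE, 1)        # logical + numeric   -> numeric
v2 <- c(1L, 2.5)        # integer + double    -> numeric (double)
v3 <- c(1, "a", TRUE)   # anything + character -> character

print(v1)          # [1] 1 1
print(class(v2))   # [1] "numeric"
print(v3)          # [1] "1"    "a"    "TRUE"
```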

Lists: can contain elements of many different types

# Create a list.
list1 <- list(c(2,5,3),21.3,sin)

# Print the list.
print(list1)

[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
function (x)  .Primitive("sin")

Matrices: two-dimensional rectangular data sets

# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)

    [,1] [,2] [,3]
[1,] "a"  "a"  "b" 
[2,] "c"  "b"  "a"

Arrays: can have any number of dimensions

# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

, , 1

     [,1]     [,2]     [,3]    
[1,] "green"  "yellow" "green" 
[2,] "yellow" "green"  "yellow"
[3,] "green"  "yellow" "green" 

, , 2

     [,1]     [,2]     [,3]    
[1,] "yellow" "green"  "yellow"
[2,] "green"  "yellow" "green" 
[3,] "yellow" "green"  "yellow"

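Factors: store a vector together with the distinct values of its elements as labels (levels); the set of levels is similar to a set

Steps: create a vector, then pass it to the factor() function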

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colors)

# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))

[1] green  green  yellow  red   red   red   green 
Levels: green red yellow
# applying the nlevels function we can know the number of distinct values
[1] 3

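Data frames: tabular data objects. They are defined column by column, and each column can hold a different data type

Use the data.frame() function to create a data frame.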

# Create the data frame.
BMI <- 	data.frame(
   gender = c("Male", "Male","Female"), 
   height = c(152, 171.5, 165), 
   weight = c(81,93, 78),
   Age = c(42,38,26)
)
print(BMI)

  gender height weight Age
1   Male  152.0     81  42
2   Male  171.5     93  38
3 Female  165.0     78  26

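Variables

Naming
- A name consists of letters, digits, dots and underscores.
- Both . and _ are allowed, but a name must start with a letter or a dot, and a leading dot must not be followed by a digit. Digits, dots and underscores are fine in any later position.

Assignment: use =, <- or ->

# All three are equivalent
var = c(1,0,2)
var <- c(1,0,2)
c(1,0,2) -> var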

Output: print(), cat()
- The cat() function combines several items into one continuous output.
  - ```
  print(var.1)
  cat("var.1 is ", var.1, "\n")
  cat("var.2 is ", var.2, "\n")
```

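Working with variables

Finding variables

To list all variables currently available in the workspace, use the ls() function. ls() can also match variable names against a pattern.

# Print all variables in the current environment (names starting with a dot are hidden)
print(ls())

# Pattern matching: print all variables whose names contain "var"
print(ls(pattern = "var"))

Pass all.names = TRUE to ls() to include the hidden variables as well

# Print all variables in the current environment (including names starting with a dot)
print(ls(all.names = TRUE))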

Deleting variables

Delete a single variable

rm(var.3)
print(var.3)

# result
Error in print(var.3) : object 'var.3' not found

Delete all variables

rm(list = ls())
print(ls())

# result
character(0)

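Operators

Arithmetic operators: act on each element of the vectors
- + , - , * : addition, subtraction, multiplication
- / : division; the result keeps the fractional part
- %% : remainder of dividing the first vector by the second
- %/% : integer (floor) division of the first vector by the second
- ^ : raises the first vector to the power of the second
Relational operators: act on each element of the vectors
- > , < , == , <= , >= , !=
- Each comparison returns TRUE or FALSE
Logical operators: apply only to logical, numeric or complex vectors. Zero counts as FALSE and every non-zero number counts as TRUE. Each element of the first vector is combined with the corresponding element of the second, and the result is a logical value.
- Element-wise, over the whole vector
  - & : and
  - | : or
  - ! : not
- Single value only, with short-circuit evaluation (before R 4.3.0 only the first element of a longer vector was used; since R 4.3.0 a longer operand is an error)
  - && : and
  - || : or
Assignment operators
- Left assignment: assigns to the name on the left
  - <- , <<- , =
- Right assignment: assigns to the name on the right
  - -> , ->>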
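
The operators listed above can be tried on two short vectors. This is only an illustrative sketch; the vectors a and b and the result names are invented for the example.

```r
a <- c(7, 10, 3)
b <- c(2, 4, 5)

quot    <- a / b     # element-wise division:      3.5 2.5 0.6
rem     <- a %% b    # element-wise remainder:     1 2 3
int_div <- a %/% b   # element-wise floor division: 3 2 0
pow     <- a ^ b     # element-wise power:         49 10000 243

gt <- a > b          # element-wise comparison:    TRUE TRUE FALSE

# & works element by element; && combines two single logical values.
and_vec    <- c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE
and_scalar <- TRUE && FALSE                                 # FALSE

print(rem)
print(and_vec)
```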

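Other operators: used for specific purposes rather than general arithmetic or logic.

: creates a sequence of consecutive numbers as a vector. Both endpoints are included
- ```
v <- 2:8
print(v) 

# result
[1] 2 3 4 5 6 7 8
```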

%in% tests whether an element belongs to a vector.

v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t) 
print(v2 %in% t)

```
[1] TRUE
[1] FALSE
```

%*% performs matrix multiplication. The example here multiplies a matrix by its transpose.

M = matrix( c(2,6,5,1,10,4), nrow = 2,ncol = 3,byrow = TRUE)
t = M %*% t(M)
print(t)

# result
      [,1] [,2]
[1,]   65   82
[2,]   82  117

Decision making: if

Syntax

if(boolean_expression) {
   # statement(s) will execute if the boolean expression is true.
}

demo

x <- 30L
if(is.integer(x)) {
   print("X is an Integer")
}

# result
[1] "X is an Integer"

if ... else if ... else ...

Syntax

if(boolean_expression_1) {
   # Executes when boolean expression 1 is true.
} else if(boolean_expression_2) {
   # Executes when boolean expression 1 is false and boolean expression 2 is true.
} else {
   # Executes when none of the expressions above is true.
}

demo

x <- c("what","is","truth")

if("Truth" %in% x) {
   print("Truth is found")
} else {
   print("Truth is not found")
}

# result
[1] "Truth is not found"

switch

Syntax

switch(expression, case1, case2, case3....)

demo

x <- switch(
   3,
   "first",
   "second",
   "third",
   "fourth"
)
print(x)

# result
[1] "third"

demo-2

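# runif(n) generates n random numbers drawn from the uniform distribution on the interval 0 to 1
switch(1,2*3,sd(1:5),runif(3))  # returns the first alternative of (2*3, sd(1:5), runif(3))
switch(2,2*3,sd(1:5),runif(3))  # returns the second alternative
switch(3,2*3,sd(1:5),runif(3))  # returns the third alternative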

# result
[1] 6
[1] 1.581139
[1] 0.31508117 0.04610938 0.19489747

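Packages

Get the library locations that contain R packages: .libPaths()

Get the list of all installed packages: library()

Get all packages currently attached in the R session: search()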

Loops

Loop statements

repeat

repeat { 
   commands 
   if(condition) {
      break
   }
}

demo

v <- c("Hello","loop")
cnt <- 2

repeat {
   print(v)
   cnt <- cnt+1
   
   if(cnt > 5) {
      break
   }
}

# result

[1] "Hello" "loop" 
[1] "Hello" "loop" 
[1] "Hello" "loop" 
[1] "Hello" "loop"

while

while (test_expression) {
   statement
}

demo

v <- c("Hello","while loop")
cnt <- 2

while (cnt < 7) {
   print(v)
   cnt = cnt + 1
}

# result
[1] "Hello"  "while loop"
[1] "Hello"  "while loop"
[1] "Hello"  "while loop"
[1] "Hello"  "while loop"
[1] "Hello"  "while loop"

for

for (value in vector) {
   statements
}

demo

v <- LETTERS[1:4]
for ( i in v) {
   print(i)
}

# result
[1] "A"
[1] "B"
[1] "C"
[1] "D"

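Loop control statements

break: exits the current loop
next: skips the rest of the current iteration without terminating the loop and moves on to the next iteration, like continue in other languages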
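
A minimal sketch of both control statements in one loop (the variable collected is just a name picked for this example): even numbers are skipped with next, and the loop stops with break once the counter passes 7.

```r
collected <- c()

for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers, continue with the next i
  if (i > 7) break        # leave the loop entirely once i exceeds 7
  collected <- c(collected, i)
}

print(collected)
# [1] 1 3 5 7
```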

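Data reshaping

Data reshaping means changing how data is organized into rows and columns

cbind() joins vectors, matrices or data frames column by column. With plain vectors the result is a matrix; pass the vectors to data.frame() or bind onto an existing data frame to get a data frame
rbind() stacks two data frames: the columns stay the same and rows are added
merge() joins two data frames on key columns. By default it uses the columns whose names appear in both data frames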
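
The three functions can be compared side by side. The sketch below uses small made-up data (city names, zip codes and states are illustrative only) and assumes nothing beyond base R.

```r
city <- c("Tampa", "Seattle")
zip  <- c(33602, 98104)

# cbind() on plain vectors gives a (character) matrix, not a data frame.
m <- cbind(city, zip)

# rbind() stacks data frames that have the same columns.
df1 <- data.frame(city = city, zip = zip, stringsAsFactors = FALSE)
df2 <- data.frame(city = "Boston", zip = 2110, stringsAsFactors = FALSE)
all_rows <- rbind(df1, df2)

# merge() joins on a shared key column; rows without a match are dropped by default.
states <- data.frame(city = c("Seattle", "Boston"), state = c("WA", "MA"),
                     stringsAsFactors = FALSE)
joined <- merge(all_rows, states, by = "city")

print(is.matrix(m))   # [1] TRUE
print(all_rows)
print(joined)
```

Tampa has no entry in states, so it does not appear in joined; passing all.x = TRUE to merge() would keep it with an NA state.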

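Functions

function_name <- function(arg_1, arg_2, ...) {
   # function body
}

Return value:

A function can return a value to its caller by calling return() explicitly. Without return(), the value of the last evaluated expression is returned.
The return value can be any object, including complex ones. To return several values, store them in a list or another container.

Built-in functions:

seq(), mean(), max(), sum(x), paste(...) and so on

Function arguments: as in Python, they can be passed by position or by name, and can have default values

Calling a function: function_name(arg_1, arg_2, ...)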
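
The points above (default arguments, positional and named calls, returning several values in a list) can be sketched with one small function. rect_stats is a hypothetical helper written for this example, not a built-in.

```r
# Hypothetical helper: returns two values packed into a named list.
rect_stats <- function(width, height = 1) {
  area <- width * height
  perimeter <- 2 * (width + height)
  # No explicit return(): the value of the last expression is returned.
  list(area = area, perimeter = perimeter)
}

s1 <- rect_stats(3, 4)                    # arguments matched by position
s2 <- rect_stats(height = 2, width = 5)   # arguments matched by name
s3 <- rect_stats(6)                       # height falls back to its default, 1

print(s1$area)        # [1] 12
print(s2$perimeter)   # [1] 14
print(s3$area)        # [1] 6
```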

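Strings

Using single and double quotes:

Any value written inside a pair of single or double quotes is treated as a string
R prints every string in double quotes, even one that was created with single quotes.
A double-quoted string can contain single quotes: "xxxx'xxxxxx"
A single-quoted string can contain double quotes: 'xxxxx"xxxxxx'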

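String operations:

Concatenating strings: the paste() function
- ```
paste(..., sep = " ", collapse = NULL)
```
  - ... is any number of arguments to be combined.
  - sep is the separator placed between the arguments. It is optional and defaults to a single space.
  - collapse is an optional string. When it is given, the elements of the resulting vector are joined into one string with collapse between them.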
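
The difference between sep and collapse is easiest to see on small inputs. A short sketch with made-up strings:

```r
# sep goes between the arguments of one call.
a <- paste("Hello", "R", "world")               # "Hello R world"
b <- paste("2024", "06", "06", sep = "-")       # "2024-06-06"

# With vector arguments, paste() works element by element and returns a vector.
c1 <- paste(c("x", "y", "z"), 1:3, sep = "_")   # "x_1" "y_2" "z_3"

# collapse joins the elements of that result into a single string.
d <- paste(c("x", "y", "z"), collapse = "+")    # "x+y+z"

print(c1)
print(d)
```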
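Formatting numbers and strings: the format() function
- ```
format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none")) 
```
  - x is the input vector.
  - digits is the number of significant digits to display.
  - nsmall is the minimum number of digits to the right of the decimal point.
  - scientific is set to TRUE to display scientific notation.
  - width is the minimum width of the output; shorter values are padded with blanks.
  - justify aligns strings to the left, right or centre, for example justify = "c".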
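
A few calls showing how the format() arguments behave; the input values are arbitrary examples. Note that format() always returns character strings.

```r
f1 <- format(23.123456789, digits = 9)           # 9 significant digits
f2 <- format(23.47, nsmall = 5)                  # at least 5 decimal places
f3 <- format(13.7, width = 6)                    # numbers are padded on the left
f4 <- format("Hello", width = 8)                 # strings are left-justified by default
f5 <- format("Hello", width = 8, justify = "right")

print(f1)   # [1] "23.1234568"
print(f2)   # [1] "23.47000"
print(f3)   # [1] "  13.7"
print(f4)   # [1] "Hello   "
print(f5)   # [1] "   Hello"
```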
Counting the characters in a string: the nchar() function
- Spaces are included in the count.
- ```
nchar(x)
```
  - x is the input vector.
Changing case: the toupper() and tolower() functions
- ```
toupper(x)
tolower(x)
```
  - x is the input vector.
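Extracting part of a string: the substring() function
- Positions are counted in characters
- ```
substring(x,first,last)
```
  - x is the input character vector.
  - first is the position of the first character to extract.
  - last is the position of the last character to extract.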
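
For reference, a brief sketch that exercises nchar(), toupper(), tolower() and substring() on sample strings chosen for this example:

```r
s <- "Count the number of characters"

n     <- nchar(s)                        # spaces are counted too
upper <- toupper("Changing To Upper")
lower <- tolower("Changing To Lower")
part  <- substring("Extract", 5, 7)      # characters 5 through 7

print(n)       # [1] 30
print(upper)   # [1] "CHANGING TO UPPER"
print(lower)   # [1] "changing to lower"
print(part)    # [1] "act"
```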

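Vectors

Vectors are the most basic R data objects. There are six types of atomic vectors: logical, integer, double, complex, character and raw.

Creating vectors

Single-element vectors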

demo

# Atomic vector of type character.
print("abc");

# Atomic vector of type double.
print(12.5)

# Atomic vector of type integer.
print(63L)

# Atomic vector of type logical.
print(TRUE)

# Atomic vector of type complex.
print(2+3i)

# Atomic vector of type raw.
print(charToRaw('hello'))

Multi-element vectors

demo

# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)

# Create vector with elements from 5 to 9 incrementing by 0.4.
print(seq(5, 9, by = 0.4))

# The logical and numeric values are converted to characters.
s <- c('apple','red',5,TRUE)
print(s)

# result
[1]  3.8  4.8  5.8  6.8  7.8  8.8  9.8 10.8
[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0
[1] "apple" "red"   "5"     "TRUE"

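Accessing vector elements

Elements of a vector are accessed by indexing with square brackets, as in t[ ]. Indexing starts at position 1. A logical index keeps the positions that are TRUE, a negative index drops the given positions, and a 0 in a numeric index selects nothing.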

# Accessing vector elements using position.
t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)

# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)

# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)

# Accessing vector elements using 0/1 indexing.
y <- t[c(0,0,0,0,0,0,1)]
print(y)

# result
[1] "Mon" "Tue" "Fri"
[1] "Sun" "Fri"
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
[1] "Sun"

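Vector operations

Vector arithmetic
- See the Operators section
Vector element recycling
- When an arithmetic operation is applied to two vectors of unequal length, the elements of the shorter vector are recycled to complete the operation.
```
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
# During the operation, v2 becomes c(4,11,4,11,4,11)
```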
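
Carrying the recycling example through to actual results gives the following sketch (the result names are chosen for this example). R issues a warning when the longer length is not a multiple of the shorter one; here 6 is a multiple of 2, so there is none.

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v2 <- c(4, 11)

# v2 is recycled to c(4, 11, 4, 11, 4, 11) before the element-wise operation.
add.result <- v1 + v2
sub.result <- v1 - v2

print(add.result)   # [1]  7 19  8 16  4 22
print(sub.result)   # [1] -1 -3  0 -6 -4  0
```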

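Sorting vector elements

The elements of a vector can be sorted with the sort() function

v <- c(3,8,4,5,0,11, -9, 304)
sort.result <- sort(v)
print(sort.result)
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)

# Character vectors sort the same way (this input vector is an assumed example)
v <- c("Red","Blue","yellow","violet")
print(sort(v))
print(sort(v, decreasing = TRUE))

# result
# numeric
[1]  -9   0   3   4   5   8  11 304
[1] 304  11   8   5   4   3   0  -9
# character
[1] "Blue"   "Red"    "violet" "yellow"
[1] "yellow" "violet" "Red"    "Blue"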

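Lists

A list is an R object created with the list() function.

It can contain elements of different types, such as numbers, strings, vectors and even another list. A list can also contain a matrix or a function as an element.

Creating a list

list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)

Naming list elements

List elements can be given names and can then be accessed by those names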

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Show the list.
print(list_data)

# result
$`1st Quarter`
[1] "Jan" "Feb" "Mar"

$A_Matrix
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8

$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3

Accessing list elements
- Elements of a list can be accessed by their index in the list.
- In a named list they can also be accessed by name.
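
A short sketch of the access forms, reusing the named list from the naming example. Single brackets return a sub-list, while double brackets and $ return the element itself.

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

first <- list_data[[1]]                     # the character vector itself
sub   <- list_data[1]                       # a list of length 1 holding that vector
m     <- list_data$A_Matrix                 # access by name with $
inner <- list_data[["A Inner list"]][[2]]   # names with spaces need [[ ]] or backticks

print(first)   # [1] "Jan" "Feb" "Mar"
print(inner)   # [1] 12.3
```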

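Manipulating list elements

List elements can be added, removed and updated
Assigning to the index just past the end appends an element, and assigning NULL to an element removes it
Any existing element can be updated by assigning to its index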

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])

# Remove the last element.
list_data[4] <- NULL

# Print the 4th Element.
print(list_data[4])

# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])

# result

[[1]]
[1] "New element"

$<NA>
NULL

$`A Inner list`
[1] "updated element"

Merging lists

Create each list with list(), then use c() to merge several lists into one

# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")

# Merge the two lists.
merged.list <- c(list1,list2)

# Print the merged list.
print(merged.list)

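Converting a list to a vector

A list can be converted to a vector, after which all the usual vector arithmetic can be applied
Use the **unlist()** function, which takes a list as input and produces a vector.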

# Create lists.
list1 <- list(1:5)
print(list1)

list2 <-list(10:14)
print(list2)

# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
print(v2)

# Now add the vectors
result <- v1+v2
print(result)

# result
[[1]]
[1] 1 2 3 4 5

[[1]]
[1] 10 11 12 13 14

[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19

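Matrices

A matrix is an R object whose elements are arranged in a two-dimensional rectangular layout

Syntax
- ```
matrix(data, nrow, ncol, byrow, dimnames)
```
  - data is the input vector whose elements become the matrix entries.
  - nrow is the number of rows to create.
  - ncol is the number of columns to create.
  - byrow is a logical flag. If TRUE, the input elements are filled in row by row; the default is column by column.
  - dimnames is a list holding the names assigned to the rows and columns.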

demo

# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

# result
     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14

Accessing matrix elements

Use the row and column indices of an element to access it

# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))

# Access the element at 3rd column and 1st row.
print(P[1,3])

# Access the element at 2nd column and 4th row.
print(P[4,2])

# Access only the  2nd row.
print(P[2,])

# Access only the 3rd column.
print(P[,3])

# result
[1] 5
[1] 13
col1 col2 col3 
   6    7    8 
row1 row2 row3 row4 
   5    8   11   14

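Matrix arithmetic
- The matrices involved in an operation should have the same dimensions (number of rows and columns).
- + , - , * and / all work element by element, on entries in the same position.
- In particular, * is not the matrix product; use %*% for matrix multiplication.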
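
The contrast between element-wise multiplication and the matrix product can be checked on two 2 x 2 matrices. The matrices A and B are made up for this sketch.

```r
# matrix() fills column by column, so A has columns (1, 2) and (3, 4).
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)

elementwise <- A * B     # multiplies entries in the same position
matprod     <- A %*% B   # true matrix product (rows of A times columns of B)

print(elementwise)
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

print(matprod)
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```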

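Arrays

Arrays are R data objects that can store data in more than two dimensions.

Use the array() function to create an array. It takes a vector as input and uses the values in the dim argument to shape the array.

For example, an array with dimensions (2, 3, 4) consists of 4 rectangular matrices, each with 2 rows and 3 columns. An array can store only one data type.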

demo

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)

# result
, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

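Note: the array is filled column by column, and the 9 input values are recycled to fill all 18 cells

Naming rows and columns

Use the dimnames argument to name the rows, columns and matrices of an array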

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,column.names,
   matrix.names))
print(result)

# result
, , Matrix1

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

, , Matrix2

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Accessing array elements: similar to slicing in Python

# Print the third row of the second matrix of the array.
print(result[3,,2])

# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])

# Print the 2nd Matrix.
print(result[,,2])

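Manipulating array elements
- Because an array is made up of matrices, operations on its elements are carried out by accessing the elements of those matrices

Calculations across array elements

Use the **apply()** function to run a calculation over the elements of an array
Syntax
```
apply(x, margin, fun)
```
- x is an array.
- margin selects the dimension(s) to apply the function over: 1 for rows, 2 for columns, c(1, 2) for both.
- fun is the function to apply to the array elements.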

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)

# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)

# result
, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

[1] 56 68 60
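# Explanation:
# c(1) is the margin: the function is applied over dimension 1, so each row is summed across all columns and both matrices
# 56 = (5 + 10 + 13) * 2, 68 = (9 + 11 + 14) * 2, 60 = (3 + 12 + 15) * 2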
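
To see how the margin changes the result, the sketch below applies sum() over each dimension of the same 3 x 3 x 2 array. The result names are chosen for this example.

```r
arr <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

row_sums   <- apply(arr, 1, sum)         # one value per row, across columns and layers
col_sums   <- apply(arr, 2, sum)         # one value per column
layer_sums <- apply(arr, 3, sum)         # one value per 3 x 3 matrix
cell_sums  <- apply(arr, c(1, 2), sum)   # 3 x 3 matrix: each cell summed across layers

print(row_sums)     # [1] 56 68 60
print(col_sums)     # [1] 34 66 84
print(layer_sums)   # [1] 92 92
print(cell_sums[1, 1])   # [1] 10
```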

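Factors (somewhat like a set)

Use the **factor()** function to create a factor from an input vector.

Factors are data objects used to categorize data and store it as levels.

They can store both strings and integers.

They are useful for columns that have a limited number of unique values.

They are useful in data analysis for statistical modeling.

demo

Basic usage (the levels are similar to the concept of a set)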

# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)
print(is.factor(data))

# Apply the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))

# result
[1] "East"  "West"  "East"  "North" "North" "East"  "West"  "West"  "West"  "East" "North"
[1] FALSE
[1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
[1] TRUE

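Factors in data frames (that is, tabular data)

Before R 4.0.0, data.frame() treated text columns as categorical data and converted them to factors by default. Since R 4.0.0 the default is stringsAsFactors = FALSE, so the conversion has to be requested explicitly.

# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Create the data frame, converting text columns to factors.
input_data <- data.frame(height,weight,gender, stringsAsFactors = TRUE)
print(input_data)

# Test if the gender column is a factor.
print(is.factor(input_data$gender))

# Print the gender column to see the levels.
print(input_data$gender)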

# result
  height weight gender
1    132     48   male
2    151     49   male
3    162     66 female
4    139     53 female
5    166     67   male
6    147     52 female
7    122     40   male
[1] TRUE
[1] male   male   female female male   female male  
Levels: female male

Changing the order of levels

Pass the levels = c( ) argument to factor() to specify the order

data <- c("East","West","East","North","North","East","West","West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with required order of the level.
new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)

# result
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East North West
 [1] East  West  East  North North East  West  West  West  East  North
Levels: East West North

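Generating factor levels

Use the **gl()** function to generate factor levels. It takes two integers as input: the number of levels and the number of times each level is repeated.
Syntax
```
gl(n, k, labels)
```
- n is an integer giving the number of levels.
- k is an integer giving the number of replications.
- labels is a vector of labels for the resulting factor levels.

demo

# 3 levels, each repeated 4 times; the level names come from labels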
v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)

# result
Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston Boston  Boston  Boston 
Levels: Tampa Seattle Boston

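Data frames

A data frame is a table, or two-dimensional array-like structure; simply think of it as a table

Characteristics
- Column names should be non-empty.
- Row names should be unique.
- The data stored in a data frame can be of numeric, factor or character type.
- Each column should contain the same number of data items.

Creating a data frame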

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.			
print(emp.data) 

# result
 emp_id    emp_name     salary     start_date
1     1     Rick        623.30     2012-01-01
2     2     Dan         515.20     2013-09-23
3     3     Michelle    611.00     2014-11-15
4     4     Ryan        729.00     2014-05-11
5     5     Gary        843.25     2015-03-27

Getting the structure of a data frame

Use the str() function to see the structure of a data frame.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Get the structure of the data frame.
str(emp.data)

# result
'data.frame':   5 obs. of  4 variables:
 $ emp_id    : int  1 2 3 4 5
 $ emp_name  : chr  "Rick" "Dan" "Michelle" "Ryan" ...
 $ salary    : num  623 515 611 729 843
 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...

Summary of the data in a data frame

Apply the **summary()** function to get a statistical summary and the nature of the data

# Create the data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the summary.
print(summary(emp.data)) 

# result
    emp_id    emp_name             salary        start_date        
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01  
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23  
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11  
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14  
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15  
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27

Extracting data from a data frame

# Extract rows 1 and 2
result <- emp.data[1:2,]
# Extract rows 3 and 5 with columns 2 and 4
result <- emp.data[c(3,5),c(2,4)]
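
Besides numeric positions, columns can be selected by name and rows by a logical condition. A minimal sketch, using a trimmed copy of the employee data (three columns only) so that it runs on its own:

```r
emp.data <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
   salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
   stringsAsFactors = FALSE
)

names_only <- emp.data$emp_name                      # one column as a vector
two_cols   <- emp.data[, c("emp_name", "salary")]    # columns selected by name
high_paid  <- emp.data[emp.data$salary > 700, ]      # rows where the condition is TRUE

print(two_cols)
print(high_paid)
```

The empty slot before or after the comma means "all rows" or "all columns" respectively.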

Expanding a data frame

A data frame can be expanded by adding columns and rows.

Add a column by assigning to a new column name

# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

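To add more rows to an existing data frame, use the **rbind()** function. The new rows must have the same structure as the existing data frame.

In other words, this combines several data frames that share the same structure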

# Create the first data frame.
emp.data <- data.frame(
   emp_id = c (1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   dept = c("IT","Operations","IT","HR","Finance"),
   stringsAsFactors = FALSE
)

# Create the second data frame
emp.newdata <- 	data.frame(
   emp_id = c (6:8), 
   emp_name = c("Rasmi","Pranab","Tusar"),
   salary = c(578.0,722.5,632.8), 
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","Operations","Fianance"),
   stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

# result
  emp_id     emp_name    salary     start_date       dept
1      1     Rick        623.30     2012-01-01       IT
2      2     Dan         515.20     2013-09-23       Operations
3      3     Michelle    611.00     2014-11-15       IT
4      4     Ryan        729.00     2014-05-11       HR
5      5     Gary        843.25     2015-03-27       Finance
6      6     Rasmi       578.00     2013-05-21       IT
7      7     Pranab      722.50     2013-07-30       Operations
8      8     Tusar       632.80     2014-06-17       Fianance
