R Study Notes: Basics

小白菜czl

Original link: https://www.w3cschool.cn/r/r_loops.html

R Basics

Comments

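Single-line comments start with #. R does not support multi-line comments, but a common workaround is to put the text inside a string in an if (FALSE) block, which is never executed: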

if(FALSE) {
   "This is a demo for multi-line comments and it should be put inside either a single
      OR double quote"
}
myString <- "Hello, World!"
print(myString)

Naming rules

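General rule: a name may contain only letters (case-sensitive), digits, "_" (underscore), and "." (period).
A name cannot start with a digit or an underscore.
A name that starts with a period cannot have a digit immediately after the period.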
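To check these rules quickly, base R's make.names() turns arbitrary strings into syntactically valid names, prepending "X" where a name would otherwise break them. A small sketch; the candidate names are made up for illustration:

```r
# Hypothetical candidate names, some valid and some not
candidates <- c("var_1", ".var", "2var", "_var", ".2var")

# make.names() leaves valid names unchanged and repairs invalid ones
fixed <- make.names(candidates)
print(fixed)
# "var_1" ".var" "X2var" "X_var" "X.2var"
```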

Data types

The most commonly used objects are vectors, lists, matrices, arrays, factors, and data frames.

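The simplest of these is the vector. Atomic vectors come in six data types, also called the six classes of vectors; the other R objects are built on top of atomic vectors.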

Logical: TRUE, FALSE

Numeric: 12.3, 5, 999

Integer: 2L, 34L, 0L

Complex: 3 + 2i

Character: 'a', "good", "TRUE", '23.4'

Raw: "Hello" is stored as 48 65 6c 6c 6f, obtained with charToRaw("Hello")
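class() reports which of these types a value has. A short check with one made-up value per type (typeof() would report "double" for the Numeric value):

```r
# One example value for each atomic type
values <- list(TRUE, 12.3, 2L, 3 + 2i, "a", charToRaw("Hello"))

# class() reports the type of each value
types <- sapply(values, class)
print(types)
# "logical" "numeric" "integer" "complex" "character" "raw"

print(charToRaw("Hello"))
# [1] 48 65 6c 6c 6f
```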

Vectors

c()

# Create a vector.
apple <- c('red','green',"yellow")
print(apple)

# Get the class of the vector.
print(class(apple))

Lists

list()

# Create a list.
list1 <- list(c(2,5,3),21.3,sin)

# Print the list.
print(list1)

Matrices

matrix()

# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)

Arrays

array()

# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

Factors

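A factor is created from a vector: its distinct values are stored as labels, called levels.
The **factor()** function creates a factor, and **nlevels()** returns the number of levels.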

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colors)

# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))
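To see the levels themselves and how often each occurs, levels() and table() can be applied to the same factor. A minimal sketch reusing the apple_colors vector:

```r
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)

# Levels are the distinct values, sorted alphabetically by default
print(levels(factor_apple))   # "green" "red" "yellow"

# table() counts how often each level occurs
counts <- table(factor_apple)
print(counts)                 # green 3, red 3, yellow 1
```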

Data Frames (the counterpart of pandas' pd.DataFrame)

The **data.frame()** function creates a data frame.

# Create the data frame.
BMI <- data.frame(
   gender = c("Male", "Male","Female"), 
   height = c(152, 171.5, 165), 
   weight = c(81,93, 78),
   Age = c(42,38,26)
)
print(BMI)

Variables

Variable assignment

# Assignment using equal operator.
var.1 = c(0,1,2,3)           

# Assignment using leftward operator.
var.2 <- c("learn","R")   

# Assignment using rightward operator.   
c(TRUE,1) -> var.3           

print(var.1)
cat("var.1 is ", var.1, "\n")
cat("var.2 is ", var.2, "\n")
cat("var.3 is ", var.3, "\n")

Note: the vector c(TRUE, 1) mixes logical and numeric values, so the logical value is coerced to numeric and TRUE becomes 1.

Data type of a variable (R, like Python, is dynamically typed, so a variable's class follows the value assigned to it)

var_x <- "Hello"
cat("The class of var_x is ", class(var_x), "\n")

var_x <- 34.5
cat("  Now the class of var_x is ", class(var_x), "\n")

var_x <- 27L
cat("   Next the class of var_x becomes ", class(var_x), "\n")

Finding variables

The ls() function lists all variables available in the current workspace.

List all variables

print(ls())

Match variable names: list the variables whose names contain "var" (pattern is a regular expression)

# List the variables whose names contain the pattern "var".
print(ls(pattern = "var"))

List hidden variables too (names starting with a period): ls(all.names = TRUE)

print(ls(all.names = TRUE))

Removing variables

rm(var.3)
print(var.3)   # Error: object 'var.3' not found, because var.3 has been removed

Operators

Arithmetic operators

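+, -, *, / work as usual; %% gives the remainder, %/% gives the integer quotient (division rounded down), and ^ raises the first vector to the power of the second vector.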
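A short sketch of the arithmetic operators on two made-up vectors, applied element by element. It also shows that %% and %/% round toward negative infinity, which differs from C for negative operands:

```r
a <- c(2, 5.5, 6)
b <- c(8, 3, 4)

print(a + b)     # 10.0  8.5 10.0
print(a %% b)    # 2.0 2.5 2.0   (remainder)
print(a %/% b)   # 0 1 1         (integer quotient)
print(a ^ b)     # 256.000  166.375 1296.000

# Division is rounded down, so for a negative left operand:
print((-7) %% 3)   # 2
print((-7) %/% 3)  # -3
```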

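Relational operators (>, <, ==, <=, >=, !=)

Each element of the first vector is compared with the corresponding element of the second vector, and the result is a logical vector.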
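A minimal sketch of the element-wise comparison, with two made-up vectors of equal length:

```r
a <- c(2, 5.5, 6, 9)
b <- c(8, 2.5, 14, 9)

print(a > b)    # FALSE  TRUE FALSE FALSE
print(a == b)   # FALSE FALSE FALSE  TRUE
print(a <= b)   #  TRUE FALSE  TRUE  TRUE
print(a != b)   #  TRUE  TRUE  TRUE FALSE
```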

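Logical operators (&, |, !)

They apply only to vectors of logical, numeric, or complex type. Every non-zero number is treated as TRUE and 0 as FALSE, and corresponding elements of the two vectors are combined.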
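A short sketch with made-up vectors; note that 0.5 and -2 are non-zero, so they count as TRUE even though they are below 1:

```r
a <- c(3, 0, 0.5, -2)
b <- c(4, 0, FALSE, 1)

print(a & b)   #  TRUE FALSE FALSE  TRUE
print(a | b)   #  TRUE FALSE  TRUE  TRUE
print(!a)      # FALSE  TRUE FALSE FALSE
```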

Assignment operators

Leftward assignment: <-, =, <<-
Rightward assignment: ->, ->>
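The difference between <- and <<- shows up inside functions: <- creates a local variable, while <<- assigns in an enclosing environment. A minimal sketch with a hypothetical counter, assuming the code runs at top level so that the enclosing environment of both functions is the global environment:

```r
counter <- 0

increment_local <- function() {
  counter <- counter + 1    # creates a local copy; the global counter is unchanged
}

increment_global <- function() {
  counter <<- counter + 1   # assigns to counter in the enclosing (here global) environment
}

increment_local()
print(counter)   # 0
increment_global()
increment_global()
print(counter)   # 2

5 -> x           # rightward assignment
print(x)         # 5
```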

Miscellaneous operators

: The colon operator creates a sequence of consecutive numbers for a vector.

v <- 2:8
print(v)

%in% tests whether an element belongs to a vector.

v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t) 
print(v2 %in% t)

%*% performs matrix multiplication; the example below multiplies a matrix by its transpose.

M <- matrix(c(2,6,5,1,10,4), nrow = 2, ncol = 3, byrow = TRUE)
result <- M %*% t(M)
print(result)

Decision making

if...else if...else statement

x <- c("what","is","truth")

if("Truth" %in% x) {
   print("Truth is found the first time")
} else if ("truth" %in% x) {
   print("truth is found the second time")
} else {
   print("No truth found")
}

switch statement

switch(expression, case1, case2, case3…)

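If the value of expression is not a character string, it is coerced to integer.
A switch can have any number of alternatives; when matching by string, each alternative is written as name = value.
If the integer is between 1 and nargs() - 1 (the number of alternatives), the corresponding alternative is evaluated and its result returned.
If expression evaluates to a character string, that string is matched against the names of the alternatives.
If there is more than one match, the first matching element is used.
There is no default argument.
If there is no match and there is an unnamed element among the ... arguments, its value is returned. (If there is more than one such argument, an error is signaled.)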

x <- switch(   
    3,   
    "first",   
    "second",   
    "third",   
    "fourth"
) 
print(x)
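The rules about string matching and the unnamed fallback are easier to see with a character expression. A sketch, where describe() and the animal names are hypothetical; an empty alternative such as cat = falls through to the next non-empty one:

```r
describe <- function(animal) {
  switch(animal,
    cat = ,              # empty alternative: falls through to the next value
    dog = "mammal",
    sparrow = "bird",
    "unknown"            # unnamed element, returned when nothing matches
  )
}

print(describe("cat"))      # "mammal"
print(describe("sparrow"))  # "bird"
print(describe("fish"))     # "unknown"

# An integer outside 1..(number of alternatives) returns NULL invisibly
print(is.null(switch(5, "a", "b")))  # TRUE
```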

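Packages

An R package is a collection of R functions, compiled code, and sample data, stored in a directory called "library".

Get the library locations that contain R packages: .libPaths()
List all installed packages: library()
List all packages currently loaded in the R session: search()
Install a new package:
1. Directly from CRAN: install.packages("Package Name")
2. Manually: save the package file under file_path and run install.packages(file_path, repos = NULL, type = "source"). Note that type = "source" is for source packages (.tar.gz files); a Windows binary package (.zip) is installed with type = "win.binary".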

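Loading a package into the session:

A package must be loaded into the current environment before its functions can be used.

library("package Name", lib.loc = "path to library")

# Install the package named "XML" from a local Windows binary (.zip) file, then load it
install.packages("E:/XML_3.98-1.3.zip", repos = NULL, type = "win.binary")
library("XML")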

Loops

repeat

v <- c("Hello","loop")
cnt <- 2

repeat {
   print(v)
   cnt <- cnt+1
   
   if(cnt > 5) {
      break
   }
}

while

v <- c("Hello","while loop")
cnt <- 2

while (cnt < 7) {
   print(v)
   cnt = cnt + 1
}

for

v <- LETTERS[1:4]
for ( i in v) {
   print(i)
}

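break and next

break terminates the innermost loop immediately (unlike in C, R's switch() does not need break).
next skips the rest of the current iteration and moves on to the next one; the example below uses next to skip "D":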

v <- LETTERS[1:6]
for ( i in v) {
  
  if (i == "D") {
    next
  }
  print(i)
}
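The example above only shows next. A minimal sketch of break on the same vector, stopping the loop at "D"; the collected variable is introduced only for illustration:

```r
v <- LETTERS[1:6]
collected <- character(0)

for (i in v) {
  if (i == "D") {
    break                       # leave the loop entirely when "D" is reached
  }
  collected <- c(collected, i)
}

print(collected)                # "A" "B" "C"
```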

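Data reshaping

Adding columns and rows

cbind() binds several vectors together column by column. Binding plain vectors returns a matrix, not a data frame, and because one of the vectors is character, every value is coerced to character.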

# Create vector objects.
city <- c("Tampa","Seattle","Hartford","Denver")
state <- c("FL","WA","CT","CO")
zipcode <- c("33602","98104","06161","80294")   # character, so the leading zero of 06161 is kept

# Combine above three vectors into one matrix.
addresses <- cbind(city,state,zipcode)

# Print the matrix.
print(addresses)
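To confirm what cbind() returns, and to get a real data frame instead, the same vectors can be passed to data.frame(). A small sketch; it also shows why a zip code typed as the number 06161 loses its leading zero:

```r
city <- c("Tampa", "Seattle", "Hartford", "Denver")
state <- c("FL", "WA", "CT", "CO")
zipcode <- c("33602", "98104", "06161", "80294")

m <- cbind(city, state, zipcode)
print(is.matrix(m))        # TRUE
print(is.data.frame(m))    # FALSE

df <- data.frame(city, state, zipcode)
print(is.data.frame(df))   # TRUE
print(dim(df))             # 4 3

print(as.character(06161)) # "6161": a numeric literal drops the leading zero
```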

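rbind() stacks data frames vertically (row by row), similar to pandas concat. Because one of its arguments below is a data frame, the result is a data frame even though addresses is a matrix.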

new.address <- data.frame(
   city = c("Lowry","Charlotte"),
   state = c("CO","FL"),
   zipcode = c("80230","33949"),
   stringsAsFactors = FALSE
)

# Print the new data frame.
print(new.address)

# Combine the rows of both objects.
all.addresses <- rbind(addresses,new.address)

# Print the result.
print(all.addresses)

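Merging data frames

library(MASS) loads the MASS package, which includes the Pima.te and Pima.tr datasets on diabetes in Pima Indian women.

merge() joins two data frames on key columns, much like pandas merge.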

library(MASS)

merged.Pima <- merge(x = Pima.te, y = Pima.tr,
   by.x = c("bp", "bmi"),
   by.y = c("bp", "bmi")
)
print(merged.Pima)
nrow(merged.Pima)
ncol(merged.Pima)
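The Pima datasets are large, so a sketch on two tiny made-up data frames may make the join types easier to see. by names the key column; all.x = TRUE and all = TRUE correspond roughly to pandas' how="left" and how="outer":

```r
left  <- data.frame(id = c(1, 2, 3), name = c("a", "b", "c"))
right <- data.frame(id = c(2, 3, 4), score = c(20, 30, 40))

inner     <- merge(left, right, by = "id")                # ids present in both: 2, 3
left_join <- merge(left, right, by = "id", all.x = TRUE)  # all rows of left; score is NA for id 1
outer     <- merge(left, right, by = "id", all = TRUE)    # ids 1 to 4

print(inner)
print(left_join)
print(outer)
```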

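melt() - reshaping wide data into long format

All columns other than the id columns are stacked: the variable column holds the original column name, and the value column holds the corresponding value. melt() and cast() come from the reshape package; the ships dataset comes from MASS.

library(MASS)
library(reshape)
molten.ships <- melt(ships, id = c("type","year"))
print(molten.ships)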

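cast() - reshaping long data back into wide format

For each type and year combination, each distinct value of variable becomes a column, aggregated with the given function. If aggregation is needed and no function is given, length (a count) is used; functions such as sum or mean can be passed instead.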

recasted.ship <- cast(molten.ships, type+year~variable,sum)
print(recasted.ship)

Functions

function_name <- function(arg_1, arg_2, ...) {
   # Function body
   return(value)   # optional: without return(), the value of the last expression is returned
}

Built-in functions

seq(), mean(), max(), sum(x), paste(...), and so on.

# Create a sequence of numbers from 32 to 44.
print(seq(32,44))

# Find mean of numbers from 25 to 82.
print(mean(25:82))

# Find sum of numbers from 41 to 68.
print(sum(41:68))

User-defined functions

# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for(i in 1:a) {
      b <- i^2
      print(b)
   }
}

# Call the function new.function supplying 6 as an argument.
new.function(6)
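One pitfall in new.function is 1:a: when a is 0, 1:0 evaluates to c(1, 0), so the loop still runs twice. A sketch of a vectorised variant using seq_len() and a default argument; the name squares and the default of 3 are illustrative assumptions:

```r
squares <- function(a = 3) {
  # seq_len(a) is empty when a is 0, unlike 1:a, which would give c(1, 0)
  seq_len(a)^2          # the last expression is the return value
}

print(squares(6))       # 1 4 9 16 25 36
print(squares())        # default a = 3: 1 4 9
print(squares(0))       # numeric(0)
print(1:0)              # 1 0
```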

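Strings

Strings can be written in single or double quotes, just as in Python.

paste() - concatenation

paste(..., sep = " ", collapse = NULL)

... stands for any number of arguments to combine.
sep is the separator placed between the arguments. It is optional and defaults to a single space.
collapse is optional: when it is given, the elements of the result are joined into a single string, separated by the collapse string.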
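A short sketch of sep and collapse with made-up strings; paste() is vectorised, so vectors are combined element by element before collapse joins the results:

```r
a <- "Hello"
b <- "How"
d <- "are you?"

print(paste(a, b, d))                                    # "Hello How are you?"
print(paste(a, b, d, sep = "-"))                         # "Hello-How-are you?"
print(paste(c("x", "y"), 1:2))                           # "x 1" "y 2"
print(paste(c("x", "y"), 1:2, sep = "", collapse = "+")) # "x1+y2"
print(paste0("x", 1:3))                                  # "x1" "x2" "x3" (paste0 uses sep = "")
```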

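format() - formatting

format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))

x is the input vector.
digits is the total number of digits displayed.
nsmall is the minimum number of digits to the right of the decimal point.
scientific is set to TRUE to display scientific notation.
width is the minimum width of the output; numbers are padded with blanks at the start.
justify sets the alignment of strings: left, right, or centre.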
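A few calls illustrating each parameter on made-up values; note that strings are left-justified by default, while numbers are padded on the left:

```r
print(format(23.123456789, digits = 9))              # "23.1234568"
print(format(123456, scientific = TRUE))             # "1.23456e+05"
print(format(23.47, nsmall = 5))                     # "23.47000"
print(format(13.7, nsmall = 3))                      # "13.700"
print(format(6, width = 4))                          # "   6"
print(format("Hello", width = 8))                    # "Hello   "
print(format("Hello", width = 8, justify = "right")) # "   Hello"
```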

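nchar()

Counts the characters in a string, like Python's len() on a str.

toupper()

Like Python's str.upper().

tolower()

Like Python's str.lower().

substring() - extraction

substring(x, first, last)

x is the character vector input.
first is the position of the first character to extract.
last is the position of the last character to extract.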
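A quick sketch of these functions on made-up strings. Unlike Python's len() on a list, nchar() on a character vector counts the characters of each element; length() counts the elements:

```r
print(nchar("Count the number"))     # 16
print(nchar(c("ab", "cde")))         # 2 3 (characters per element)
print(length(c("ab", "cde")))        # 2   (number of elements)
print(toupper("Changing To Upper"))  # "CHANGING TO UPPER"
print(tolower("Changing To Lower"))  # "changing to lower"
print(substring("Extract", 5, 7))    # "act"
```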

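Vectors

Creation

Single-element vectors
- "abc"
- 12.5
- 63L
- TRUE
- 2+3i
- charToRaw('hello')
Multi-element vectors
- v <- 5:13
- v <- 6.6:12.6
- v <- 3.8:11.4 (the sequence stops at 10.8, the last value that does not exceed 11.4)
- seq(5, 9, by=0.8)
The c() function
- If any element is a character, the non-character values are coerced to character.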

# The logical and numeric values are converted to characters.
s <- c('apple','red',5,TRUE)
print(s)
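A short check of the creation rules listed above. The last two lines show that c() coerces to the most general type present (logical to integer to double to character):

```r
v <- 3.8:11.4
print(v)                   # 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8

steps <- seq(5, 9, by = 0.8)
print(steps)               # 5.0 5.8 6.6 7.4 8.2 9.0

s2 <- c('apple', 'red', 5, TRUE)
print(s2)                  # "apple" "red" "5" "TRUE"

print(class(c(TRUE, 2L)))  # "integer": logical is coerced to integer
print(class(c(1L, 2.5)))   # "numeric": integer is coerced to double
```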

Accessing elements

t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")

u <- t[c(2,3,6)]
print(u)

v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)

x <- t[c(-2,-5)]
print(x)

# Zero indices are ignored, so this is the same as t[1]
y <- t[c(0,0,0,0,0,0,1)]
print(y)

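Vector manipulation

When an arithmetic operation is applied to two vectors of unequal length, the elements of the shorter vector are recycled to complete the operation.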
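A minimal sketch of recycling with made-up vectors. When the longer length is not a multiple of the shorter one, R still recycles but issues a warning:

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v2 <- c(4, 11)
print(v1 + v2)   # 7 19 8 16 4 22 (v2 is reused as 4 11 4 11 4 11)

# Lengths 3 and 2: recycled, with the warning suppressed here
r <- suppressWarnings(c(1, 2, 3) + c(1, 2))
print(r)         # 2 4 4
```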

sort()

sort(v, decreasing = FALSE)

v is the vector to sort.
decreasing is FALSE by default (ascending order); set it to TRUE for descending order.

# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort.result <- sort(v)
print(sort.result)

# Sorting character vectors in reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
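For numbers the behaviour is the same. A related function, order(), returns the positions that would sort the vector, which is what is usually needed to sort other objects by this vector. A sketch with made-up values:

```r
x <- c(3, 8, 4, 5, 0, 11, -9, 304)

print(sort(x))                     # -9 0 3 4 5 8 11 304
print(sort(x, decreasing = TRUE))  # 304 11 8 5 4 3 0 -9
print(order(x))                    # 7 5 1 3 4 2 6 8
print(x[order(x)])                 # same result as sort(x)
print(sort(c(2, NA, 1)))           # 1 2 (NA values are dropped by default)
```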

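Lists

Like Python lists, R lists can hold elements of different types, including vectors, functions, and other lists.

Creating a list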

list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)

Naming list elements

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

# Show the list.
print(list_data)

Accessing list elements

Elements of a list can be accessed by their index in the list. In a named list, they can also be accessed by name.

print(list_data[1])

print(list_data$A_Matrix)
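Single brackets, as in list_data[1], return a sub-list, while double brackets or $ return the element itself. A sketch rebuilding the same named list:

```r
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
   list("green",12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

print(class(list_data[1]))      # "list": single brackets return a sub-list
print(class(list_data[[1]]))    # "character": double brackets return the element
print(list_data[["A_Matrix"]])  # same as list_data$A_Matrix
print(list_data$`1st Quarter`)  # backticks are needed for non-syntactic names
```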

Modifying list elements

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])

# Remove the last element.
list_data[4] <- NULL
# Print the 4th Element.
print(list_data[4])

# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])

Merging lists

c() concatenates two lists into one list.
list() nests the two lists inside a new list (a list of lists).

# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")

# Merge the two lists.
exten.list <- c(list1,list2)
merged.list <- list(list1,list2)

# Print the combined list.
print(exten.list)
print(merged.list)

Converting a list to a vector

unlist() converts a list into a vector.

list1 <- list(1:5)
print(list1)

v1 <- unlist(list1)
print(v1)
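A short sketch contrasting c() and list() on two lists, and showing that unlist() coerces mixed elements to a common type and builds names from nested lists:

```r
list1 <- list(1, 2, 3)
list2 <- list("Sun", "Mon", "Tue")

print(length(c(list1, list2)))     # 6: the elements are concatenated
print(length(list(list1, list2)))  # 2: the two lists are nested

print(unlist(list(1:3, 4:5)))      # 1 2 3 4 5
print(unlist(list(1, "a", TRUE)))  # "1" "a" "TRUE": coerced to character
print(unlist(list(a = 1, b = list(c = 2, d = 3))))  # names: a, b.c, b.d
```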

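Matrices

A matrix is a two-dimensional arrangement of elements that all have the same atomic type.

Creation

matrix(data, nrow, ncol, byrow, dimnames)

data is the input vector whose elements become the matrix elements.
nrow is the number of rows to create.
ncol is the number of columns to create.
byrow is a logical value. If TRUE, the input vector elements are arranged by row; the default FALSE fills by column.
dimnames is a list of the names assigned to the rows and columns.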

# Elements are arranged sequentially by row.
M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)

# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)

# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")

P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

Accessing matrix elements

print(P[1,3])

print(P[c("row1", "row3"), 'col3'])

Elements can be accessed by index or by row and column name, and vectors of indices or names work as well.
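A few more access patterns on the same matrix P: leaving an index empty selects a whole row or column, logical indexing returns matching elements in column-major order, and drop = FALSE keeps the matrix shape:

```r
P <- matrix(3:14, nrow = 4, byrow = TRUE,
            dimnames = list(c("row1", "row2", "row3", "row4"),
                            c("col1", "col2", "col3")))

print(P[2, ])                 # whole second row: 6 7 8 (named by column)
print(P[, "col3"])            # whole third column: 5 8 11 14
print(P[P > 10])              # 12 13 11 14 (column-major order)
print(P[1, , drop = FALSE])   # still a 1 x 3 matrix
```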

Matrix arithmetic

matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)

matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)

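# Element-wise arithmetic; both matrices must have the same dimensions.
# Note that * is element-wise: use %*% for matrix multiplication.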
# result <- matrix1 + matrix2
# result <- matrix1 - matrix2
# result <- matrix1 * matrix2
result <- matrix1 / matrix2
print(result)

Arrays

Creation

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")

# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),dimnames = list(row.names,column.names,
   matrix.names))
print(result)

Accessing elements

result[a, b, c]: row a, column b, of matrix c.
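A sketch of this indexing on the array created above, rebuilt so the block runs on its own. As with matrices, an empty index selects everything along that dimension, and names can be used instead of positions:

```r
vector1 <- c(5, 9, 3)
vector2 <- c(10, 11, 12, 13, 14, 15)
result <- array(c(vector1, vector2), dim = c(3, 3, 2),
                dimnames = list(c("ROW1", "ROW2", "ROW3"),
                                c("COL1", "COL2", "COL3"),
                                c("Matrix1", "Matrix2")))

print(result[3, , 2])                     # third row of the second matrix: 3 12 15
print(result[1, 3, 1])                    # row 1, column 3 of the first matrix: 13
print(result["ROW2", "COL1", "Matrix2"])  # by name: 9
m2 <- result[, , 2]                       # the whole second matrix
print(m2 * 2)                             # extracted matrices behave like ordinary matrices
```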

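Manipulating array elements

Extract the elements first (for example a whole matrix); they can then be manipulated just like vectors and matrices.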

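apply() function

apply(x, margin, fun)

x is an array.
margin specifies the dimension(s) over which fun is applied:
- 1: over rows
- 2: over columns
- c(1,2): over rows and columns, i.e. over each element position
fun is the function to apply to the array elements.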

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)

# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)

# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)
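Changing the margin on the same array shows what each choice does; for a three-dimensional array, margin 3 (one result per matrix) is also available:

```r
arr <- array(c(5, 9, 3, 10, 11, 12, 13, 14, 15), dim = c(3, 3, 2))

print(apply(arr, 1, sum))        # 56 68 60 (each row, summed across both matrices)
print(apply(arr, 2, sum))        # 34 66 84 (each column, summed across both matrices)
print(apply(arr, 3, sum))        # 92 92    (total of each matrix)
print(apply(arr, c(1, 2), sum))  # 3 x 3 matrix: element-wise sum of the two matrices
```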

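Factors

Factors are data objects used to categorize data and store it as levels. They can store both strings and integers.

The **factor()** function creates a factor from a vector.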

# Create a vector as input.
data <- c("East","West","East","North","North","East","West","West","West","East","North")

print(data)
print(is.factor(data))

# Apply the factor function.
factor_data <- factor(data)

print(factor_data)
print(is.factor(factor_data))

Factors in a data frame

# Create the vectors for data frame.
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Create the data frame.
input_data <- data.frame(height,weight,gender)
print(input_data)

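# Test if the gender column is a factor.
# This prints FALSE in R >= 4.0.0, where data.frame() no longer converts strings to factors by default.
print(is.factor(input_data$gender))

# Print the gender column to see the levels.
print(input_data$gender)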

factor_data <- factor(input_data$gender)
print(factor_data)

Order of levels

data <- c("East","West","East","North","North","East","West","West","West","East","North")
# Create the factors
factor_data <- factor(data)
print(factor_data)

# Apply the factor function with required order of the level.
# Specify the order of the levels
new_order_data <- factor(factor_data,levels = c("East","West","North"))
print(new_order_data)
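The level order also determines the integer codes and the order in which table() reports counts. Passing ordered = TRUE additionally makes comparisons such as < meaningful. A sketch with made-up data:

```r
regions <- c("East", "West", "East", "North", "North", "East")
f <- factor(regions, levels = c("East", "West", "North"))

print(levels(f))      # "East" "West" "North"
print(as.integer(f))  # 1 2 1 3 3 1: codes follow the level order
print(table(f))       # East 3, West 1, North 2, in level order

sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
print(sizes < "large")  # TRUE FALSE TRUE
```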

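Generating factor levels

gl(n, k, labels)

n is an integer giving the number of levels.
k is an integer giving the number of replications of each level.
labels is a vector of labels for the resulting factor levels.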

v <- gl(3, 4, labels = c("Tampa", "Seattle","Boston"))
print(v)

Data frames

Creating a data frame

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5), 
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25), 
   
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
      "2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.			
print(emp.data)

Getting the structure of a data frame

# Get the structure of the data frame.
str(emp.data)

Statistical summary

The **summary()** function gives a statistical summary and the nature of the data.

# Print the summary.
print(summary(emp.data))

Extracting data from a data frame

# Extract Specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)
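Building a new data frame as above renames the columns (to emp.data.emp_name and emp.data.salary). Indexing with [rows, columns] keeps the original names and also selects rows. A sketch on a copy named emp, a hypothetical name chosen so that emp.data itself is not altered; the start_date column is omitted for brevity:

```r
emp <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

by_name <- emp[, c("emp_name", "salary")]       # keeps the original column names
print(by_name)
print(emp[1:2, ])                               # first two rows
print(emp[c(3, 5), c(2, 3)])                    # rows 3 and 5, columns 2 and 3
high <- emp[emp$salary > 600, "emp_name"]       # filter rows by a condition
print(high)                                     # "Rick" "Michelle" "Ryan" "Gary"
```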

Adding a column

# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

Adding rows

emp.newdata <- data.frame(
   emp_id = c(6:8), 
   emp_name = c("Rasmi","Pranab","Tusar"),
   salary = c(578.0,722.5,632.8), 
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","Operations","Finance"),
   stringsAsFactors = FALSE
)

# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
