Get the data dimensions (rows, columns)
> dim(df)
[1] 14 5
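As a minimal sketch, the row and column counts can also be read separately with nrow() and ncol(). The data frame `toy` below is hypothetical and only serves the illustration.

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(x = 1:3, y = c("a", "b", "c"))

dims   <- dim(toy)    # integer vector: c(rows, columns)
n_rows <- nrow(toy)   # number of rows only
n_cols <- ncol(toy)   # number of columns only
```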
Get the data structure
> str(df)
'data.frame': 14 obs. of 5 variables:
$ Outlook : Factor w/ 3 levels "overcast","rainy",..: 3 3 1 2 2 2 1 3 3 2 ...
$ Temperature: Factor w/ 3 levels "cool","hot","mild": 2 2 2 3 1 1 1 3 1 3 ...
$ Humidity : Factor w/ 2 levels "high","normal": 1 1 1 1 2 2 2 1 2 2 ...
$ Windy : logi FALSE TRUE FALSE FALSE FALSE TRUE ...
$ Play : Factor w/ 2 levels "no","yes": 1 1 2 2 2 1 2 1 2 2 ...
Get summary statistics
> summary(df)
     Outlook   Temperature   Humidity    Windy          Play
 overcast:4    cool:4       high  :7    Mode :logical   no :5
 rainy   :5    hot :4       normal:7    FALSE:8         yes:9
 sunny   :5    mild:6                   TRUE :6
Select specific columns
> ndata <- data.frame(df$Temperature, df$Outlook)
> ndata
   df.Temperature df.Outlook
1             hot      sunny
2             hot      sunny
3             hot   overcast
4            mild      rainy
5            cool      rainy
6            cool      rainy
7            cool   overcast
8            mild      sunny
9            cool      sunny
10           mild      rainy
11           mild      sunny
12           mild   overcast
13            hot   overcast
14           mild      rainy
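Columns can also be picked by name with bracket indexing, which keeps the original column names instead of prefixed ones such as df.Outlook. A minimal sketch, assuming a hypothetical data frame `toy`:

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(Outlook     = c("sunny", "rainy"),
                  Temperature = c("hot", "mild"),
                  Windy       = c(FALSE, TRUE))

# Select two columns by name; the order given here is the order returned
picked <- toy[, c("Temperature", "Outlook")]
picked_names <- names(picked)
```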
View part of the data:
head(df)
tail(df)
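Both functions return 6 rows by default; the n argument changes that. A small sketch on a hypothetical data frame `toy`:

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(id = 1:10, value = (1:10)^2)

first_three <- head(toy, n = 3)   # first 3 rows
last_two    <- tail(toy, n = 2)   # last 2 rows
```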
Assign new column names
> names(ndata) <- c("temp", "out")
> names(ndata)
[1] "temp" "out"
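To rename a single column without retyping all the names, one option is to index names() by the old name. A minimal sketch; `toy` and the new name "outlook" are hypothetical:

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(temp = c("hot", "cool"), out = c("sunny", "rainy"))

# Replace only the name that currently equals "out"
names(toy)[names(toy) == "out"] <- "outlook"
```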
Get the column names
> names(tdata)
[1] "Outlook" "Temperature" "Humidity" "Windy" "Play"
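Type conversion
# as.character() turns a logical column into "FALSE"/"TRUE" strings.
df$Windy <- as.character(df$Windy)
# The next two lines only have an effect if Windy was stored as 0/1.
df$Windy[df$Windy == "0"] <- "FALSE"
df$Windy[df$Windy == "1"] <- "TRUE"
df$Windy <- as.factor(df$Windy)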
> summary(df$Windy)
FALSE TRUE
8 6
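If the column really does arrive encoded as 0/1, a possible shortcut is factor() with explicit levels and labels. This is only a sketch; the vector `windy_raw` is hypothetical.

```r
# Hypothetical 0/1 encoding of a Windy-like column
windy_raw <- c(0, 1, 0, 0, 1)

# Map 0 -> "FALSE" and 1 -> "TRUE" in a single step
windy <- factor(windy_raw, levels = c(0, 1), labels = c("FALSE", "TRUE"))
counts <- summary(windy)   # counts per level
```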
Extract the first 2 rows
> result <- tdata[1:2,]
> result
Outlook Temperature Humidity Windy Play
1 sunny hot high FALSE no
2 sunny hot high TRUE no
# Extract the 3rd and 5th rows with the 2nd and 4th columns.
> result <- tdata[c(3,5),c(2,4)]
> result
Temperature Windy
3 hot FALSE
5 cool FALSE
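Besides numeric positions, the row index can be a logical condition and the column index a vector of names. A minimal sketch, assuming a hypothetical data frame `toy`:

```r
# Hypothetical toy data frame, used only for illustration
toy <- data.frame(Outlook     = c("sunny", "overcast", "rainy", "sunny"),
                  Temperature = c("hot", "hot", "mild", "cool"),
                  Windy       = c(FALSE, FALSE, TRUE, FALSE))

# Keep rows where Windy is FALSE, and only two named columns
calm <- toy[!toy$Windy, c("Outlook", "Temperature")]
```

The original row numbers (1, 2, 4) are kept as row names in the result.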
Combine columns
cbind
Combine rows
rbind
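A short sketch of both calls, using hypothetical data frames `a`, `b`, and `extra`: cbind() expects the same number of rows, while rbind() on data frames expects matching column names.

```r
# Hypothetical data frames, used only for illustration
a     <- data.frame(id = 1:2, temp = c("hot", "mild"))
b     <- data.frame(windy = c(FALSE, TRUE))
extra <- data.frame(id = 3L, temp = "cool")

wide <- cbind(a, b)       # columns placed side by side (2 rows, 3 columns)
long <- rbind(a, extra)   # rows stacked (3 rows, 2 columns)
```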
Remove rows containing NA
na.omit(df)
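Note that na.omit() returns a new data frame rather than modifying its argument, so the result has to be assigned. A minimal sketch on a hypothetical data frame `toy`:

```r
# Hypothetical toy data frame with missing values
toy <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

clean <- na.omit(toy)                  # keeps only rows with no NA in any column
n_dropped <- nrow(toy) - nrow(clean)   # how many rows were removed
```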
Add a new column
library(dplyr)  # mutate() is provided by the dplyr package
rs <- mutate(rs, flg = (beg_dif > 0 & end_dif > 0))
Or, in base R:
flg <- rs$beg_dif > 0 & rs$end_dif > 0  # logical vector, one value per row
rs <- cbind(rs, flg)                    # the new column is named "flg"
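A third option is direct assignment with `$`, which needs no extra package and no intermediate variable. A sketch with hypothetical values for `beg_dif` and `end_dif`:

```r
# Hypothetical values, used only for illustration
rs <- data.frame(beg_dif = c(1, -2, 3), end_dif = c(2, 5, -1))

# TRUE only where both differences are positive
rs$flg <- rs$beg_dif > 0 & rs$end_dif > 0
```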
Delete a column by negating it in the select argument of subset().
rs <- subset(rs, select = -flg)
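Two other common ways to drop a column, sketched on a hypothetical `rs`: assign NULL to it, or keep every name except the unwanted one.

```r
# Hypothetical data frame, used only for illustration
rs <- data.frame(beg_dif = c(1, -2), end_dif = c(3, 4), flg = c(TRUE, FALSE))

# Option 1: assigning NULL removes the column
rs1 <- rs
rs1$flg <- NULL

# Option 2: keep all columns whose name is not "flg"
rs2 <- rs[, setdiff(names(rs), "flg")]
```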
A fairly detailed reference on data frame operations:
https://www.datacamp.com/community/tutorials/15-easy-solutions-data-frame-problems-r