R Basics 1: Basic Data Operations and Matrix Usage

Load the data set of 20,000 observations into the R workspace:


source("http://www.openintro.org/stat/data/cdc.R")


View the names of the variables (columns):

names(cdc)


Look at the first or last few rows of the data with head() and tail():

head(cdc)
tail(cdc)


Summary statistics for a single column (replace some_column with a variable name such as weight):

summary(cdc$some_column)


mean(cdc$weight)


var(cdc$weight)


median(cdc$weight)
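
The summary functions above can be tried without downloading anything. The following is a minimal sketch on a hypothetical data frame named toy, whose values are invented for illustration and are not taken from cdc:

```r
# Hypothetical toy data standing in for cdc (values invented for illustration)
toy <- data.frame(
  weight = c(150, 180, 165, 200, 135),
  age    = c(25, 40, 33, 58, 21)
)

summary(toy$weight)             # minimum, quartiles, mean, maximum

w_mean   <- mean(toy$weight)    # arithmetic mean
w_var    <- var(toy$weight)     # sample variance (divides by n - 1)
w_median <- median(toy$weight)  # middle value of the sorted data
w_sd     <- sqrt(w_var)         # standard deviation, same as sd(toy$weight)
```

Note that var() returns the sample variance, so its square root matches sd().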


Categorical data: frequency table and bar plot


table(cdc$smoke100)
barplot(table(cdc$smoke100))


Or, storing the table first:

smoke = table(cdc$smoke100)
barplot(smoke)


Multiple variables: contingency table and mosaic plot


gender_smokers = table(cdc$gender, cdc$smoke100)


mosaicplot(gender_smokers)
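
To see what the two-way table holds before plotting it, here is a small sketch on a hypothetical data frame toy (invented values). It also uses prop.table(), which is not part of the notes above, to turn the counts into row proportions:

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  gender   = c("m", "f", "m", "f", "m", "f"),
  smoke100 = c(1, 0, 0, 0, 1, 1)
)

# Rows are gender levels, columns are smoke100 levels
gender_smokers <- table(toy$gender, toy$smoke100)
gender_smokers

# Proportion of each smoke100 level within each gender (each row sums to 1)
row_props <- prop.table(gender_smokers, margin = 1)
row_props
```

mosaicplot(gender_smokers) draws the same counts as tile areas.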


Numerical data: plots


boxplot(cdc$height ~ cdc$gender)  # formula is y-axis variable ~ x-axis variable


hist(cdc$age)

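# bmi is not defined in these notes; this is the formula used in the OpenIntro lab (weight in pounds, height in inches)
bmi = (cdc$weight / cdc$height^2) * 703
hist(bmi)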

hist(bmi, breaks = 50)
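
The effect of the breaks argument can be checked without drawing anything. This is a rough sketch on simulated values (x is hypothetical, not the real bmi), using plot = FALSE so that hist() only returns the bin information:

```r
# Simulated values standing in for bmi (hypothetical, for illustration only)
set.seed(1)
x <- rnorm(1000, mean = 26, sd = 4)

# plot = FALSE returns the histogram object instead of drawing it
h_default <- hist(x, plot = FALSE)
h_fine    <- hist(x, breaks = 50, plot = FALSE)

length(h_default$counts)  # number of bins with the default rule
length(h_fine$counts)     # number of bins when asking for about 50
```

When breaks is a single number, R treats it as a suggestion and picks nearby "pretty" cut points, so the number of bins may not be exactly 50.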


Shape of the data (number of rows, number of columns)


dim(cdc)


A single value, one column for several rows, and all columns for several rows:

cdc[row, column]
cdc[1:10, 6]
cdc[1:10, ]



Select a single variable, "weight":

cdc$weight

The result is a vector, so it can be indexed in turn: cdc$weight[some_row], cdc$weight[1:some_row]
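
The indexing forms above behave the same way on any data frame. A minimal sketch on a hypothetical data frame toy (invented values), showing that bracket indexing and $ reach the same data:

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  height = c(70, 64, 60, 66, 61),
  weight = c(175, 125, 105, 132, 150),
  gender = c("m", "f", "f", "f", "f")
)

dim(toy)                           # rows, columns

one_value  <- toy[2, 1]            # row 2, column 1 (height)
first_rows <- toy[1:3, 2]          # rows 1 to 3 of column 2 (weight), as a vector
whole_rows <- toy[1:3, ]           # rows 1 to 3, all columns, as a data frame
by_name    <- toy[1:3, "weight"]   # a column can also be picked by name
by_dollar  <- toy$weight[1:3]      # same values via $ and vector indexing
```

Leaving the column position empty, as in toy[1:3, ], keeps every column.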


Subsetting


Logical (TRUE/FALSE) conditions such as cdc$age > 30 or cdc$gender == "m" select the rows to keep:

mdata = subset(cdc, cdc$gender == "m")


Peek at the first several rows:

head(mdata)


m_and_over30 = subset(cdc, cdc$gender == "m" & cdc$age > 30)


m_or_over30 = subset(cdc, cdc$gender == "m" | cdc$age > 30)
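
The difference between & (both conditions must hold) and | (at least one must hold) is easy to see on a few rows. A minimal sketch on a hypothetical data frame toy (invented values); inside subset() the column names can be written without the toy$ prefix:

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  gender = c("m", "f", "m", "f", "m"),
  age    = c(25, 45, 38, 29, 31)
)

# Each condition is a logical vector with one entry per row
is_m   <- toy$gender == "m"
over30 <- toy$age > 30

# & keeps rows where both conditions are TRUE
m_and_over30 <- subset(toy, gender == "m" & age > 30)

# | keeps rows where at least one condition is TRUE
m_or_over30 <- subset(toy, gender == "m" | age > 30)

# Equivalent selection with bracket indexing and the logical vectors
same_rows <- toy[is_m & over30, ]

nrow(m_and_over30)
nrow(m_or_over30)
```

The "or" subset is never smaller than the "and" subset, which is a quick sanity check on real data as well.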












