R Study Notes 02 (Creating Datasets)

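  • Notes
    1. TRUE and FALSE are case-sensitive and must be written in uppercase.
    2. R does not support multi-line comments; every comment line starts with #.
    3. Variables cannot be declared; they come into existence on first assignment.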
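The notes above can be checked directly in a session. This is a minimal sketch; the variable name my_new_variable is hypothetical and assumed not to exist before the script runs.

```r
# There is no block-comment syntax in R:
# each comment line needs its own leading '#'.

# A variable does not exist until it is first assigned.
existed_before <- exists("my_new_variable")  # assumed FALSE in a fresh session
my_new_variable <- 10                        # first assignment creates it
exists_after <- exists("my_new_variable")

# TRUE and FALSE are the logical constants; lowercase 'true' is not defined
# unless you create an object with that name yourself.
flag <- TRUE
is.logical(flag)
```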

vector

Note: vectors are one-dimensional arrays; scalars are one-element vectors.


  • Creating

Use the c() function:

a<-c(1,2,3)  # a numeric vector
b<-c("one","two","three")  # a character vector
c<-c(TRUE,TRUE,FALSE)  # a logical vector
f<-3  # scalars are used to hold constants
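As a rough illustration of the note that scalars are one-element vectors, the sketch below inspects the mode and length of a few vectors and selects elements by position. The name mixed is introduced here only for the example.

```r
a <- c(1, 2, 3)                 # numeric vector
b <- c("one", "two", "three")   # character vector
f <- 3                          # a "scalar" is just a vector of length 1

mode(a)        # "numeric"
mode(b)        # "character"
length(f)      # 1
is.vector(f)   # TRUE

# c() coerces mixed inputs to one common mode (here, character).
mixed <- c(1, "two", TRUE)
mode(mixed)    # "character"

# Elements are selected with square brackets; indices start at 1.
a[2]           # 2
a[c(1, 3)]     # 1 3
```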

matrices


  • Creating

mymatrix<- matrix(vector,nrow=number_of_rows,ncol=number_of_columns,byrow=logical_value,dimnames=list(char_vector_rownames,char_vector_colnames))
Default: byrow=FALSE (the matrix is filled column by column)

> y<-matrix(1:20,nrow=5,ncol=4)
> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> cells<-c(1,2,3,4)
> rnames<-c("R1","R2")
> cnames<-c("C1","C2")
> mymatrix<-matrix(cells,nrow=2,ncol=2,byrow = TRUE,dimnames=list(rnames,cnames))
> mymatrix
   C1 C2
R1  1  2
R2  3  4
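To make the effect of byrow concrete, here is a small sketch that fills the same four cells both ways and compares the results. The names by_col and by_row are chosen only for this example.

```r
cells <- c(1, 2, 3, 4)

# Default (byrow = FALSE): values fill the first column, then the second.
by_col <- matrix(cells, nrow = 2, ncol = 2)

# byrow = TRUE: values fill the first row, then the second.
by_row <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE)

by_col[1, 2]   # 3
by_row[1, 2]   # 2

# The two matrices are transposes of each other.
identical(by_row, t(by_col))   # TRUE

# Elements are addressed as [row, column]; leaving one index empty
# selects a whole row or column.
by_row[2, ]    # 3 4
```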

arrays

Arrays are similar to matrices but can have more than two dimensions.
Like matrices, they must be of a single mode.
  • Creating

myarray<-array(vector,dimensions,dimnames)

dimensions is a numeric vector giving the maximal index for each dimension.
dimnames is an optional list of dimension labels.

> dim1<-c("a1","a2")
> dim2<-c("b1","b2","b3")
> dim3<-c("c1","c2","c3","c4")
> z<-array(1:24,c(2,3,4),dimnames = list(dim1,dim2,dim3))
> z
, , c1

   b1 b2 b3
a1  1  3  5
a2  2  4  6

, , c2

   b1 b2 b3
a1  7  9 11
a2  8 10 12

, , c3

   b1 b2 b3
a1 13 15 17
a2 14 16 18

, , c4

   b1 b2 b3
a1 19 21 23
a2 20 22 24
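Elements of an array are selected the same way as in a matrix, with one index per dimension. The sketch below rebuilds the 2 x 3 x 4 array from the session above and picks out single values and a whole slice; slice_c1 is a name introduced here for illustration.

```r
dim1 <- c("a1", "a2")
dim2 <- c("b1", "b2", "b3")
dim3 <- c("c1", "c2", "c3", "c4")
z <- array(1:24, c(2, 3, 4), dimnames = list(dim1, dim2, dim3))

dim(z)                  # 2 3 4

# One index per dimension, by position or by name.
z[1, 2, 3]              # 15
z["a2", "b3", "c4"]     # 24

# Leaving indices empty keeps the whole dimension:
# this is the 2 x 3 matrix printed under ", , c1".
slice_c1 <- z[, , "c1"]
dim(slice_c1)           # 2 3
```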

data frames

A data frame is more general than a matrix in that different columns can contain different modes of data.


  • Creating

mydata<- data.frame(col1,col2,col3,…)

> ID<-c(1,2,3,4)
> age<-c(25,34,28,52)
> diabetes<-c("type1","type2","type1","type1")
> status<-c("poor","improved","excellent","poor")
> patientdata<-data.frame(ID,age,diabetes,status)
> patientdata
  ID age diabetes    status
1  1  25    type1      poor
2  2  34    type2  improved
3  3  28    type1 excellent
4  4  52    type1      poor

Each column must have only one mode, but you can put columns of different modes together to form the data frame.
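A short sketch of how the per-column modes can be inspected and how columns and rows are selected. It passes stringsAsFactors = FALSE explicitly, because the default for that argument differs between R versions; that choice, and the names col_classes and over_30, are assumptions made for this example.

```r
ID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("type1", "type2", "type1", "type1")
status <- c("poor", "improved", "excellent", "poor")

# stringsAsFactors = FALSE keeps the text columns as character vectors.
patientdata <- data.frame(ID, age, diabetes, status,
                          stringsAsFactors = FALSE)

# Each column has its own class.
col_classes <- sapply(patientdata, class)
col_classes

# A single column is extracted with $.
mean(patientdata$age)     # 34.75

# Several columns are selected by name; rows by a logical condition.
patientdata[c("diabetes", "status")]
over_30 <- patientdata[patientdata$age > 30, ]
nrow(over_30)             # 2
```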

factor

Variables can be described as nominal, ordinal, or continuous.
Categorical (nominal) and ordered categorical (ordinal) variables in R are called factors.
nominal: diabetes (type1, type2) in the previous example; the values have no order.
ordinal: status (poor, improved, excellent) in the previous example; the values are ordered but do not represent amounts.

factor()
myfactor<-factor(factor_vector,order=TRUE,levels)

> ID<-c(1,2,3,4)
> age<-c(25,34,28,52)
> diabetes<-c("type1","type2","type1","type1")
> status<-c("poor","improved","excellent","poor")
> diabetes<-factor(diabetes)
> status<-factor(status,order=TRUE)
> patientdata<-data.frame(ID,age,diabetes,status)
> str(patientdata)
'data.frame':   4 obs. of  4 variables:
 $ ID      : num  1 2 3 4
 $ age     : num  25 34 28 52
 $ diabetes: Factor w/ 2 levels "type1","type2": 1 2 1 1
 $ status  : Ord.factor w/ 3 levels "excellent"<"improved"<..: 3 2 1 3
> summary(patientdata)
       ID            age         diabetes       status 
 Min.   :1.00   Min.   :25.00   type1:3   excellent:1  
 1st Qu.:1.75   1st Qu.:27.25   type2:1   improved :1  
 Median :2.50   Median :31.00             poor     :2  
 Mean   :2.50   Mean   :34.75                          
 3rd Qu.:3.25   3rd Qu.:38.50                          
 Max.   :4.00   Max.   :52.00                          

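Note: specifying order=TRUE for a factor and supplying levels makes the factor's sort order match the logical order. By default, the levels are created in alphabetical order.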
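The difference between the default alphabetical levels and explicitly supplied levels can be seen in a small sketch. It spells the argument out as ordered=, the full name of the factor() parameter that order= abbreviates; by_alpha and by_logic are names chosen for this example.

```r
status <- c("poor", "improved", "excellent", "poor")

# Without levels, the order is alphabetical: excellent < improved < poor.
by_alpha <- factor(status, ordered = TRUE)
levels(by_alpha)       # "excellent" "improved" "poor"

# With levels, the order follows the logical progression instead.
by_logic <- factor(status, ordered = TRUE,
                   levels = c("poor", "improved", "excellent"))
levels(by_logic)       # "poor" "improved" "excellent"

# The underlying integer codes follow the level order.
as.integer(by_alpha)   # 3 2 1 3
as.integer(by_logic)   # 1 2 3 1

# Ordered factors support comparisons: "improved" ranks above "poor".
by_logic[2] > by_logic[1]   # TRUE
```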

list

A list allows you to gather a variety of objects under one name. For example, a list may contain a combination of vectors, matrices, data frames, and even other lists.


  • Creating

mylist<-list(object1,object2,…)
# Optionally, you can name the objects in a list
mylist<-list(name1=object1,name2=object2,…)
> g<-"my first list"
> h<-c(1,2,3,4)
> j<-matrix(1:10,nrow=5)
> k<-c("one","two","three")
> mylist<-list(title=g,ages=h,j,k)
> mylist
$title
[1] "my first list"

$ages
[1] 1 2 3 4

[[3]]
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

[[4]]
[1] "one"   "two"   "three"

> mylist[[2]]
[1] 1 2 3 4
> mylist[["ages"]]
[1] 1 2 3 4
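Besides [[ ]], list components can be reached with $ when they are named. The sketch below rebuilds the list from the session above and contrasts single and double brackets; it is an illustration rather than a complete guide to list indexing.

```r
g <- "my first list"
h <- c(1, 2, 3, 4)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = h, j, k)

# Unnamed components get an empty name.
names(mylist)              # "title" "ages" "" ""

# $ and [[ ]] return the component itself.
mylist$ages                # 1 2 3 4
class(mylist[["ages"]])    # "numeric"

# Single brackets return a shorter list that still wraps the component.
class(mylist["ages"])      # "list"

# Indexing can be chained: row 2, column 2 of the matrix in slot 3.
mylist[[3]][2, 2]          # 7
```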

R refers to case identifiers as rownames and to categorical variables (nominal, ordinal) as factors.
A dataset is usually a rectangular array of data with rows representing observations and columns representing variables.
R has a wide variety of objects for holding data, including scalars, vectors, matrices, arrays, data frames, and lists. (R in Action)
