R Programming - Data Type

The definition of free software consists of four freedoms:

  • The freedom to run the program, for any purpose.
  • The freedom to study how the program works, and adapt it to your needs.
  • The freedom to redistribute copies so you can help your neighbor.
  • The freedom to improve the program, and release your improvements to the public, so that the whole community benefits.

Objects

R data types: objects and attributes

R has five basic or "atomic" classes of objects:

  • character
  • numeric (real numbers)
  • integer
  • complex (complex numbers)
  • logical (TRUE/FALSE)

The most basic object is a vector.

  • A vector can only contain objects of the same class.
  • BUT: the one exception is a list, which is represented as a vector but can contain objects of different classes (indeed, that's usually why we use them).

Empty vectors can be created with the vector() function.
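
As a quick check of the five atomic classes listed above, the sketch below calls class() on one value of each kind and then creates empty vectors with vector(). The variable names empty_chr and empty_lst are illustrative only.

```r
# One value of each atomic class; class() reports which one it is.
class("hello")   # "character"
class(3.14)      # "numeric"
class(2L)        # "integer"
class(1+2i)      # "complex"
class(TRUE)      # "logical"

# vector() creates an empty vector of a given mode and length.
empty_chr <- vector("character", length = 3)  # three empty strings
empty_lst <- vector("list", length = 2)       # a list holding two NULLs
length(empty_chr)  # 3
length(empty_lst)  # 2
```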

Numbers

  • Numbers in R are generally treated as numeric objects (i.e. double precision real numbers).
  • If you explicitly want an integer, you need to specify the L suffix.
    • Ex: Entering 1 gives you a numeric object; entering 1L explicitly gives you an integer.
  • There is also a special number Inf which represents infinity, e.g. 1/0. Inf can be used in ordinary calculations, e.g. 1/Inf is 0.
  • The value NaN represents an undefined value ("not a number"), e.g. 0/0. NaN can also be thought of as a missing value.
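
The points above can be tried directly at the console. This is a minimal sketch; x and y are placeholder names.

```r
x <- 1     # numeric (double precision)
y <- 1L    # integer, because of the L suffix
class(x)   # "numeric"
class(y)   # "integer"

1 / 0          # Inf
1 / Inf        # 0
0 / 0          # NaN
is.nan(0 / 0)  # TRUE
is.na(0 / 0)   # TRUE: NaN also counts as missing
```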

Attributes

R objects can have attributes:

  • names, dimnames (dimension names)
  • dimensions (e.g. matrices, arrays)
  • class
  • length
  • other user-defined attributes/metadata

Attributes of an object can be accessed using the attributes() function.
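
A small sketch of attributes() in use. The "units" attribute is a made-up example of user-defined metadata, not something R itself requires. Note that length is queried with length() rather than being listed by attributes().

```r
# A plain vector has no attributes until some are set.
v <- 1:3
attributes(v)                 # NULL

names(v) <- c("a", "b", "c")  # sets the names attribute
attr(v, "units") <- "cm"      # hypothetical user-defined metadata
attributes(v)                 # a list containing names and units

length(v)                     # 3
```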

Vectors

The c() function can be used to create vectors of objects.

> x <- c(0.5, 0.6)       ## numeric
> x <- c(TRUE, FALSE)    ## logical
> x <- c(T, F)           ## logical
> x <- c("a", "b", "c")  ## character
> x <- 9:29              ## integer
> x <- c(1+0i, 2+4i)     ## complex

Using the vector() function

> x <- vector("numeric", length=10)   ## default value is zero
> x
 [1] 0 0 0 0 0 0 0 0 0 0       
 

> y <- c(1.7, "a")     ## character (least common denominator)
> y <- c(TRUE, 2)      ## numeric (TRUE = 1, FALSE = 0)
> y <- c("a", TRUE)    ## character

When different objects are mixed in a vector, coercion occurs so that every element in the vector is of the same class, because all elements of a vector must share one class.

Explicit Coercion

Objects can be explicitly coerced from one class to another using the as.* functions, if available.

> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE   ## 0 becomes FALSE; every nonzero value becomes TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
 

Nonsensical coercion results in NAs.

> x <- c("a", "b", "c")
> as.numeric(x)
[1] NA NA NA
Warning message:
NAs introduced by coercion 
> as.logical(x)
[1] NA NA NA

Lists

Lists are a special type of vector that can contain elements of different classes. Lists are a very important data type in R and you should get to know them well.

> x <- list(1, "a", TRUE, 1+4i)
> x
[[1]]  ## index
[1] 1

[[2]]
[1] "a"  ## a character vector of length 1 containing the letter "a"

[[3]]
[1] TRUE

[[4]]
[1] 1+4i
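
To see that each list element keeps its own class, the sketch below compares the list with what c() does to the same values. It only uses the x defined above plus base R functions.

```r
x <- list(1, "a", TRUE, 1+4i)

# Each element of the list keeps its own class.
sapply(x, class)              # "numeric" "character" "logical" "complex"

# c() would instead coerce the same values to one common class.
class(c(1, "a", TRUE, 1+4i))  # "character"

length(x)                     # 4
x[[2]]                        # "a"
```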

Matrices

Matrices are vectors with a dimension attribute. The dimension attribute is itself an integer vector of length 2 (nrow, ncol).

> m <- matrix(nrow=2, ncol=3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3
 

Matrices are constructed column-wise, so entries can be thought of as starting in the "upper left" corner and running down the columns.

> m <- matrix(1:6, nrow=2, ncol=3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
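
Because a matrix is just a vector with a dimension attribute, one can also be made by assigning dim to an existing vector. The sketch below shows this, and also uses the byrow argument of matrix(), which the notes above do not cover, to contrast row-wise filling.

```r
# Start with an ordinary integer vector.
m <- 1:6

# Assigning a dim attribute turns it into a 2 x 3 matrix, filled column-wise.
dim(m) <- c(2, 3)
m
is.matrix(m)                               # TRUE

# Same values as the matrix() call shown above.
all(m == matrix(1:6, nrow = 2, ncol = 3))  # TRUE

# byrow = TRUE fills across the rows instead of down the columns.
r <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
r[1, ]                                     # 1 2 3
```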
 

cbind-ing and rbind-ing

Matrices can be created by column-binding or row-binding with cbind() and rbind().

> x <- 1:3
> y <- 10:12
> cbind(x, y)
     x  y
[1,] 1 10
[2,] 2 11
[3,] 3 12
> rbind(x,y)
  [,1] [,2] [,3]
x    1    2    3
y   10   11   12
 

Factors

Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.

  • Factors are treated specially by modelling functions like lm() and glm().
  • Using factors with labels is better than using integers because factors are self-describing; having a variable that has values "Male" and "Female" is better than a variable that has values 1 and 2.

> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x
[1] yes yes no  yes no 
Levels: no yes
> table(x)
x
 no yes 
  2   3 
> unclass(x)   ## yes=2, no=1
[1] 2 2 1 2 1
attr(,"levels")
[1] "no"  "yes"

By default the levels are sorted alphabetically, which is why "no" came before "yes" above. The order of the levels can be set using the levels argument to factor(). This can be important in linear modelling because the first level is used as the baseline level.

> x <- factor(c("yes","yes","no","yes","no"), levels=c("yes","no"))
> x
[1] yes yes no  yes no 
Levels: yes no
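
The examples so far are unordered factors. A minimal sketch of an ordered factor follows, using the ordered argument of factor(); the variable sizes and its values are invented for illustration.

```r
# An ordered factor: the levels argument also defines the ranking.
sizes <- factor(c("low", "high", "medium", "low"),
                levels = c("low", "medium", "high"),
                ordered = TRUE)
sizes
levels(sizes)      # "low" "medium" "high"

# Ordered factors support comparisons based on level order.
sizes < "high"     # TRUE FALSE TRUE TRUE

# The underlying integer codes follow the level order.
as.integer(sizes)  # 1 3 2 1
```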

Missing values

Missing values are denoted by NA, or by NaN for undefined mathematical operations.

  • is.na() is used to test whether objects are NA.
  • is.nan() is used to test for NaN.
  • NA values have a class also, so there are integer NA, character NA, etc.
  • A NaN value is also NA, but the converse is not true.

> x <- c(1,2,NA,10,3)
> is.na(x)
[1] FALSE FALSE  TRUE FALSE FALSE
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE

> x <- c(1,2,NaN,NA,4)
> is.na(x)
[1] FALSE FALSE  TRUE  TRUE FALSE
> is.nan(x)
[1] FALSE FALSE  TRUE FALSE FALSE
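
The point that NA values carry a class can be checked with the typed NA constants that base R provides. This is a short sketch, not an exhaustive list.

```r
class(NA)             # "logical"
class(NA_integer_)    # "integer"
class(NA_real_)       # "numeric"
class(NA_character_)  # "character"

# NA takes on the class of the vector it sits in.
class(c(1L, NA))      # "integer"
class(c("a", NA))     # "character"

# NaN counts as NA, but NA is not NaN.
is.na(NaN)            # TRUE
is.nan(NA)            # FALSE
```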

Data Frames

Data frames are used to store tabular data.

  • They are represented as a special type of list where every element of the list has to have the same length.
  • Each element of the list can be thought of as a column, and the length of each element of the list is the number of rows.
  • Unlike matrices, data frames can store different classes of objects in each column (just like lists); matrices must have every element be the same class.
  • Data frames also have a special attribute called row.names.
  • Data frames are usually created by calling read.table() or read.csv().
  • They can be converted to a matrix by calling data.matrix().

> x <- data.frame(foo=1:4, bar=c(T,T,F,F))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
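
A short sketch of the remaining points in the list: per-column classes, the row.names attribute, and conversion with data.matrix(). The name df is a placeholder for the same data frame as above.

```r
df <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))

# Each column keeps its own class.
sapply(df, class)  # foo is "integer", bar is "logical"

# A data frame is a list with a row.names attribute.
is.list(df)        # TRUE
row.names(df)      # "1" "2" "3" "4"

# data.matrix() forces every column to a number, so TRUE/FALSE become 1/0.
m <- data.matrix(df)
m
dim(m)             # 4 2
```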

Names Attribute

R objects can also have names, which is very useful for writing readable code and self-describing objects.

> x <- 1:3
> names(x)
NULL
> names(x) <- c("foo","bar","norf")
> x
 foo  bar norf 
   1    2    3 
> names(x)
[1] "foo"  "bar"  "norf"
 

Names

Lists can also have names.

> x <- list(a=1, b=2, c=3)
> x
$a
[1] 1

$b
[1] 2

$c
[1] 3
 

And so can matrices, through dimnames():

> m <- matrix(1:4, nrow=2, ncol=2)
> dimnames(m) <- list(c("a","b"), c("c","d"))
> m
  c d
a 1 3
b 2 4
