Type conversion in R

Types in R

1. character: "treatment", "22", "A"

2. numeric: 23.44, 120, NaN

3. integer: 4L, 1123L

4. factor: factor("HELLO")

5. logical: FALSE, TRUE, NA



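class() shows the type of a value; for example, class("") returns "character"

R's conversion functions are named as.<new type>():

as.character(2016) converts 2016 to the character type and returns "2016"


as.factor("something") returns a factor with Levels: something


as.logical(0) returns FALSE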
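
The conversions above can be tried in a short base-R session. This is only a sketch; the variable names (year, codes, values, not_a_number) are made up for illustration. It also shows one common surprise: calling as.numeric() on a factor returns the internal level codes, not the labels.

```r
# Check a type with class(), then convert with the as.*() family
year <- 2016
class(year)                     # "numeric"
year_text <- as.character(year)
class(year_text)                # "character"

# Zero becomes FALSE; any other number becomes TRUE
as.logical(c(0, 1, 2))          # FALSE TRUE TRUE

# A factor stores integer codes, so convert through character first
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                # 1 2 1 (level codes)
values <- as.numeric(as.character(f))  # 10 20 10 (the labels as numbers)

# Text that is not a number converts to NA (R also emits a warning)
not_a_number <- suppressWarnings(as.numeric("abc"))
is.na(not_a_number)             # TRUE
```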


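**Package: lubridate converts strings to dates

library(lubridate)

ymd("2015-08-23") parses the string as a date in UTC


ymd("2015 August 25") and

mdy("August 25,2015") both return "2015-08-25 UTC" (recent lubridate versions return a Date, which prints as "2015-08-25" without the UTC suffix)


hms("13:33:09") returns "13H 33M 9S"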


For example, the following type exercises:

# Make this evaluate to character
class("true")


# Make this evaluate to numeric
class(8484.00)  # no double quotes around the number


# Make this evaluate to integer
class(99L)  # the L suffix makes the value an integer


# Make this evaluate to factor
class(factor("factor"))


# Make this evaluate to logical
class(FALSE)  # TRUE and FALSE are written without double quotes


# Preview students2 with str()
str(students2)


# Load the lubridate package
library(lubridate)


# Parse as date
dmy("17 Sep 2015")


# Parse as date and time (with no seconds!)
mdy_hm("July 15, 2012 12:56")


# Coerce dob to a date (with no time)
students2$dob <- ymd(students2$dob)


# Coerce nurse_visit to a date and time
students2$nurse_visit <- ymd_hms(students2$nurse_visit)
    
# Look at students2 once more with str()
str(students2)
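
For readers without lubridate installed, roughly the same coercion can be sketched in base R with as.Date() and as.POSIXct(). This is an alternative to the lubridate calls above, not what the exercise itself uses. The data frame students_demo and its values are hypothetical stand-ins, since the contents of students2 are not shown here.

```r
# Hypothetical stand-in for students2: both columns start out as character
students_demo <- data.frame(
  dob         = c("2000-06-05", "1999-11-25"),
  nurse_visit = c("2014-04-10 14:59:54", "2015-03-12 14:59:54"),
  stringsAsFactors = FALSE
)
str(students_demo)

# Coerce dob to a date (with no time)
students_demo$dob <- as.Date(students_demo$dob, format = "%Y-%m-%d")

# Coerce nurse_visit to a date and time, fixing the time zone explicitly
students_demo$nurse_visit <- as.POSIXct(
  students_demo$nurse_visit,
  format = "%Y-%m-%d %H:%M:%S",
  tz = "UTC"
)

# The columns are now Date and POSIXct instead of character
str(students_demo)
```

Unlike ymd() and friends, the base functions need an explicit format string whenever the input does not follow the default ISO layout.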


**Package: stringr

str_trim("  this is a test    ") returns "this is a test"

# Pad a string with zeros

str_pad("24493", width=7, side="left", pad="0") left-pads the string with zeros to a width of 7 and returns "0024493"


friends <- c("Sarah","Tom","Alice")

str_detect(friends, "Alice") - detect a pattern

returns FALSE FALSE TRUE


str_replace()-Find and replace a pattern


tolower()-make all lowercase

toupper()-make all uppercase
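
The three functions just listed have no example in these notes, so here is a minimal sketch reusing the friends vector. The replacement value "David" is made up for illustration. Note that tolower() and toupper() are base R functions, while str_replace() and str_detect() come from stringr.

```r
library(stringr)

friends <- c("Sarah", "Tom", "Alice")

# Replace the first match of the pattern in each element
replaced <- str_replace(friends, "Alice", "David")
replaced                      # "Sarah" "Tom" "David"

# Change case with base R
lower <- tolower(friends)     # "sarah" "tom" "alice"
upper <- toupper(friends)     # "SARAH" "TOM" "ALICE"

# Matching is case-sensitive, so normalising case first can change the result
str_detect(friends, "a")      # TRUE FALSE FALSE
str_detect(lower, "a")        # TRUE FALSE TRUE
```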


# Load the stringr package
library(stringr)


# Trim all leading and trailing whitespace
c("   Filip ", "Nick  ", " Jonathan")
str_trim(c("   Filip ", "Nick  ", " Jonathan"))
# Pad these strings with leading zeros
c("23485W", "8823453Q", "994Z")


str_pad(c("23485W", "8823453Q", "994Z"),width=9,side="left",pad="0")


Missing values

May be random, but dangerous to assume

Sometimes associated with variable/outcome of interest

In R, represented as NA

 May appear in other forms

 #N/A (Excel)

Single dot (SPSS, SAS)

 Empty string

Inf - "Infinite value" (indicative of outliers?)

 1/0

 1/0 + 1/0

 33333^33333

 NaN - "Not a number" (rethink a variable?)

 0/0

 1/0 - 1/0
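
A short base-R sketch of how these special values behave and how to test for them. The CSV text and column names are invented for illustration; the na.strings argument of read.csv() is one way to turn the other missing-value spellings listed above into NA while reading.

```r
# The expressions from the notes above
inf_value <- 1/0 + 1/0      # Inf
overflow  <- 33333^33333    # Inf (too large for a double)
nan_value <- 1/0 - 1/0      # NaN

# Each special value has its own test
x <- c(1, NA, Inf, NaN, -Inf)
is.na(x)        # FALSE TRUE FALSE TRUE FALSE  (NaN also counts as NA)
is.nan(x)       # FALSE FALSE FALSE TRUE FALSE
is.infinite(x)  # FALSE FALSE TRUE FALSE TRUE
is.finite(x)    # TRUE FALSE FALSE FALSE FALSE

# Hypothetical file contents using Excel-style, SPSS/SAS-style and empty markers
raw <- "id,score\n1,10\n2,#N/A\n3,.\n4,\n"
scores <- read.csv(text = raw, na.strings = c("#N/A", ".", ""))
missing_count <- sum(is.na(scores$score))   # 3
```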






Dealing with outliers and obvious errors



When dealing with strange values in your data, you often must decide whether they are just extreme or actually erroneous. Extreme values show up all over the place, but you, the data analyst, must figure out when they are plausible and when they are not.

We have loaded a dataset called students3, which is another slight variation of the original students dataset. Two variables appear to have suspicious values: age and absences. Let's explore these values further.

Another look at strange values

Another useful way of looking at strange values is with boxplots. Simply put, boxplots draw a box around the middle 50% of values for a given variable, with a bolded horizontal line drawn at the median. Values that fall far from the bulk of the data points (i.e. outliers) are denoted by open circles. (If you're curious about the exact formula for determining what is "far", check out ?boxplot.)

In this situation, we are concerned about three things:

  1. Since this dataset is about students and the only student above the age of 22 is 38 years old, we must wonder whether this is an error in the data or just an older student (perhaps returning to school after working for several years)
  2. There are four values of -1 for the absences variable, which is either a mistake or an intentional coding meant to say, for example, "this value is missing"
  3. There are several extreme values of absences in the positive direction, with a maximum value of 75 (which is over 18 times the median value of 4)
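
The checks described above can be sketched in base R. The vector absences_demo is made-up data, not the real students3 values, and recoding -1 to NA is just one possible decision. By default boxplot.stats() flags points lying more than 1.5 times the box length beyond the box, which is the same rule boxplot() uses to draw the open circles.

```r
# Hypothetical absences, including a -1 code and one extreme value
absences_demo <- c(-1, 0, 2, 3, 4, 4, 5, 6, 8, 75)

# Numeric overview: min, quartiles, median, mean, max
summary(absences_demo)

# Values the boxplot rule treats as outliers
flagged <- boxplot.stats(absences_demo)$out
flagged                      # 75

# The rule does not flag -1 here, yet a negative count is impossible,
# so plausibility needs its own explicit check
impossible <- absences_demo < 0

# One option: treat the impossible values as missing
absences_clean <- absences_demo
absences_clean[impossible] <- NA

# Visual check; NA values are dropped from the plot
boxplot(absences_clean, main = "Absences (hypothetical data)")
```

In this made-up sample the statistical rule and the plausibility check catch different problems, which is why both are worth running.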

