R Type Conversion

huizhanding

Article link: https://blog.csdn.net/weixin_39684745/article/details/77543271

Types in R

1. character: "treatment", "22", "A"

2. numeric: 23.44, 120, NaN

3. integer: 4L, 1123L

4. factor: factor("HELLO")

5. logical: FALSE, TRUE, NA

class(x) shows the type of the variable x.
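As a quick check of the list above, the following sketch calls class() on one value of each type. It uses only base R; the variable names `examples` and `types` are introduced here purely for illustration.

```r
# One example value per type (values taken from the list above)
examples <- list(
  "treatment",      # character
  23.44,            # numeric
  NaN,              # numeric: NaN is a special numeric value
  4L,               # integer: the L suffix marks an integer literal
  factor("HELLO"),  # factor
  NA                # logical: a bare NA is logical by default
)

# Apply class() to each element and collect the results
types <- vapply(examples, class, character(1))
print(types)
```

Note that NaN reports as numeric and a bare NA reports as logical, which is why they appear under those types in the list.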

Conversion functions in R are named as.<new type>:

as.character(2016) converts 2016 to the character string "2016"

as.factor("something") returns a factor with a single level: something

as.logical(0) returns FALSE
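The conversions above can be tried directly. The sketch below uses only base R and also shows a common pitfall that the notes do not mention: calling as.numeric() on a factor returns the internal level codes rather than the labels. The variable names are illustrative.

```r
# Number to character
as.character(2016)

# Character to number; text that is not a number becomes NA (with a warning)
as.numeric("23.44")
suppressWarnings(as.numeric("abc"))

# 0 is FALSE, any other non-missing number is TRUE
as.logical(0)
as.logical(5)

# Pitfall: as.numeric() on a factor returns the internal level codes
f <- factor(c("10", "20", "10"))
codes  <- as.numeric(f)                 # level codes: 1 2 1
values <- as.numeric(as.character(f))   # the labels as numbers: 10 20 10
print(codes)
print(values)
```

Going through as.character() first is the usual way to recover the numbers a factor's labels represent.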

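**Package: lubridate converts strings to dates

library(lubridate)

ymd("2015-08-23") parses the string as a date (older versions of lubridate returned a date-time in UTC)

ymd("2015 August 25") and

mdy("August 25,2015") both return the date 2015-08-25 (printed as "2015-08-25 UTC" in older versions of lubridate)

hms("13:33:09") returns "13H 33M 9S"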
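A minimal sketch of the calls above, assuming the lubridate package is installed. The variable names d1, d2, d3 and t1 are illustrative. The exact printed form of the dates depends on the lubridate version, so the sketch compares the parsed values instead of relying on the printout.

```r
library(lubridate)

# Three spellings of the same calendar day
d1 <- ymd("2015-08-25")
d2 <- ymd("2015 August 25")
d3 <- mdy("August 25,2015")

# All three parse to the same day
same_day <- (d1 == d2) && (d2 == d3)
print(same_day)

# A time of day becomes a Period object with hour, minute and second parts
t1 <- hms("13:33:09")
print(t1)
print(c(hour(t1), minute(t1), second(t1)))
```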

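For example, the following type exercises:

# Make this evaluate to character
class("true")  # quoted, so it is a character string

# Make this evaluate to numeric
class(8484.00)  # no double quotes around the number

# Make this evaluate to integer
class(99L)  # the L suffix makes the value an integer

# Make this evaluate to factor
class(factor("factor"))

# Make this evaluate to logical
class(FALSE)  # TRUE and FALSE take no double quotes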

# Preview students2 with str()
str(students2)

# Load the lubridate package
library(lubridate)

# Parse as date
dmy("17 Sep 2015")

# Parse as date and time (with no seconds!)
mdy_hm("July 15, 2012 12:56")

# Coerce dob to a date (with no time)
students2$dob <- ymd(students2$dob)

# Coerce nurse_visit to a date and time
students2$nurse_visit <- ymd_hms(students2$nurse_visit)

# Look at students2 once more with str()
str(students2)
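The students2 dataset comes from the exercise environment and is not included in these notes. To try the same coercions, here is a sketch on a small made-up data frame; the name students2_demo and all of its values are hypothetical, and lubridate is assumed to be installed.

```r
library(lubridate)

# Hypothetical stand-in for students2: the dates arrive as plain strings
students2_demo <- data.frame(
  dob         = c("2000-06-05", "1999-11-25"),
  nurse_visit = c("2014-04-10 14:59:54", "2015-03-12 14:15:56"),
  stringsAsFactors = FALSE
)

# Before: both columns are character
str(students2_demo)

# Same coercions as in the exercise above
students2_demo$dob         <- ymd(students2_demo$dob)
students2_demo$nurse_visit <- ymd_hms(students2_demo$nurse_visit)

# After: dob holds dates and nurse_visit holds date-times
str(students2_demo)
```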

**Package: stringr

str_trim(" this is a test ") returns "this is a test"

# Pad a string with zeros

str_pad("24493", width=7, side="left", pad="0") pads the value to a 7-character field with leading zeros and returns "0024493"

friends <- c("Sarah","Tom","Alice")

str_detect(friends,"Alice") - detect a pattern

returns FALSE FALSE TRUE

str_replace() - find and replace a pattern

tolower() - make all lowercase

toupper() - make all uppercase
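The last three functions are listed without examples, so here is a short sketch that applies them to the friends vector, assuming stringr is installed. The replacement string "Alicia" is made up for illustration.

```r
library(stringr)

friends <- c("Sarah", "Tom", "Alice")

# Which elements contain the pattern?
found <- str_detect(friends, "Alice")

# Replace the first match of the pattern in each element
# ("Alicia" is a made-up replacement for illustration)
renamed <- str_replace(friends, "Alice", "Alicia")

# Base R case conversion
lower <- tolower(friends)
upper <- toupper(friends)

print(found)
print(renamed)
print(lower)
print(upper)
```

tolower() and toupper() are base R functions, so they work without loading stringr.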

# Load the stringr package
library(stringr)

# Trim all leading and trailing whitespace
c(" Filip ", "Nick ", " Jonathan")
str_trim(c(" Filip ", "Nick ", " Jonathan"))

# Pad these strings with leading zeros
c("23485W", "8823453Q", "994Z")
str_pad(c("23485W", "8823453Q", "994Z"), width = 9, side = "left", pad = "0")

Missing values

May be random, but dangerous to assume

Sometimes associated with variable/outcome of interest

In R, represented as NA

May appear in other forms: #N/A (Excel), single dot (SPSS, SAS), empty string

Inf - "Infinite value" (indicative of outliers?), for example 1/0, 1/0 + 1/0, 33333^33333

NaN - "Not a number" (rethink a variable?), for example 0/0, 1/0 - 1/0
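A small base R sketch of how these special values behave and how the other spellings of "missing" can be mapped to NA when reading data. The CSV text and the names csv_text, scores, n_missing and mean_score are made up for illustration.

```r
# NA marks a missing value; NaN and Inf are special numeric values
x <- c(1, NA, NaN, Inf, -Inf)

is.na(x)        # TRUE for NA and also for NaN
is.nan(x)       # TRUE only for NaN
is.infinite(x)  # TRUE for Inf and -Inf

# The expressions from the notes above
1/0 + 1/0       # infinite
1/0 - 1/0       # not a number
0/0             # not a number

# Made-up CSV text in which missing scores appear as "#N/A", "." or nothing
csv_text <- "name,score\nAnn,#N/A\nBob,.\nCal,\nDee,7"
scores <- read.csv(text = csv_text, na.strings = c("#N/A", ".", ""))

# Count the missing scores and average the ones that are present
n_missing  <- sum(is.na(scores$score))
mean_score <- mean(scores$score, na.rm = TRUE)
print(n_missing)
print(mean_score)
```

Declaring the alternative spellings in na.strings at read time keeps the column numeric; without it, values such as "#N/A" would force the whole column to be read as text.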

Dealing with outliers and obvious errors

When dealing with strange values in your data, you often must decide whether they are just extreme or actually erroneous. Extreme values show up all over the place, but you, the data analyst, must figure out when they are plausible and when they are not.

We have loaded a dataset called students3, which is another slight variation of the original students dataset. Two variables appear to have suspicious values: age and absences. Let's explore these values further.
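The students3 dataset itself is not included in these notes. As a sketch of a first look at a suspicious variable, the example below runs summary() and hist() on a made-up age vector; the values are hypothetical and only mimic the situation described (one student far older than the rest).

```r
# Hypothetical ages, made up for illustration (not the real students3 data)
age <- c(15, 16, 17, 18, 18, 19, 20, 22, 38)

# Five-number summary plus the mean: compare the maximum with the quartiles
summary(age)

# Histogram: a lone bar far to the right of the others stands out
hist(age)

# How many values lie above 22?
n_above_22 <- sum(age > 22)
print(n_above_22)
```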

Another look at strange values

Another useful way of looking at strange values is with boxplots. Simply put, boxplots draw a box around the middle 50% of values for a given variable, with a bolded horizontal line drawn at the median. Values that fall far from the bulk of the data points (i.e. outliers) are denoted by open circles. (If you're curious about the exact formula for determining what is "far", check out ?boxplot.)

In this situation, we are concerned about three things:

1. Since this dataset is about students and the only student above the age of 22 is 38 years old, we must wonder whether this is an error in the data or just an older student (perhaps returning to school after working for several years).

2. There are four values of -1 for the absences variable, which is either a mistake or an intentional coding meant to say, for example, "this value is missing".

3. There are several extreme values of absences in the positive direction, with a maximum value of 75 (which is over 18 times the median value of 4).
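The checks described above can be sketched in base R. The absences vector below is made up for illustration and is not the real students3 data; treating -1 as a missing-value code is one possible reading of those values, not a confirmed fact about the dataset.

```r
# Hypothetical absences, made up for illustration: four -1 codes and one extreme value
absences <- c(-1, 4, 2, -1, 75, 4, 6, -1, 3, 5, -1, 4)

# Draw the boxplot; points beyond the whiskers appear as open circles
boxplot(absences)

# boxplot.stats() returns the values the boxplot flags as outliers
flagged <- boxplot.stats(absences)$out
print(flagged)

# Assumption: -1 stands for "missing", so recode it as NA
absences_clean <- absences
absences_clean[absences_clean == -1] <- NA

# Summaries must now skip the missing values explicitly
n_recoded    <- sum(is.na(absences_clean))
median_clean <- median(absences_clean, na.rm = TRUE)
print(n_recoded)
print(median_clean)
```

Recoding works on a copy so the original values stay available if the -1 entries turn out to mean something else. Whether an extreme value such as 75 is an error or a real observation still has to be decided from knowledge of the data, not from the plot alone.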