R Language: DataCamp Notes, Introduction to R, Part 3: Factors

DataCamp, Introduction to R, Chapter 3: Factors

  1. What’s a factor and why would you use it?
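    (1) Variables are either continuous or categorical; R stores categorical variables as factors.
    Categorical variable: sex is a typical example, since it can only take a limited number of values.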
	# Assign to the variable theory what this chapter is about!
	theory <- "factors"
  1. What’s a factor and why would you use it? (2)
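    (1) To create a factor in R, use the factor() function:
    First create a vector that contains all the observations belonging to a limited number of categories. For example: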
   # Sex vector
    sex_vector <- c("Male", "Female", "Female", "Male", "Male")

This example has two categories, "Male" and "Female"; R calls them "factor levels".
The factor() function converts the vector into a factor:

   # Convert sex_vector to a factor
    factor_sex_vector <- factor(sex_vector)

Printing the factor gives:

   # Print out factor_sex_vector
	factor_sex_vector
> factor_sex_vector
[1] Male   Female Female Male   Male  
Levels: Female Male
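
To see what factor() actually stored, it can help to look inside the result. The following is a minimal sketch in base R that reuses sex_vector from above; the inspection calls (levels(), nlevels(), as.integer()) are not part of the course exercise and are added here only for illustration.

```r
# Sex vector from the notes above
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_sex_vector <- factor(sex_vector)

# The distinct categories; by default they are sorted (alphabetically here)
levels(factor_sex_vector)      # "Female" "Male"

# Number of levels
nlevels(factor_sex_vector)     # 2

# Internally each observation is an integer code that indexes into levels()
as.integer(factor_sex_vector)  # 2 1 1 2 2
```

The integer codes explain why the printed factor shows the original labels while the Levels line lists "Female" before "Male".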
  1. What’s a factor and why would you use it? (3)
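    (1) Categorical variables can be further divided into nominal categorical variables and ordinal categorical variables.
    A. A nominal categorical variable has no implied order, so its values cannot be ranked. For example, animals: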
    # Animals
	animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
	factor_animals_vector <- factor(animals_vector)
	factor_animals_vector

Output:

	[1] Elephant Giraffe  Donkey   Horse   
	Levels: Donkey Elephant Giraffe Horse

B. By contrast, an ordinal categorical variable has a natural order, so its values can be ranked. For example, temperature:

	# Temperature
	temperature_vector <- c("High", "Low", "High","Low", "Medium")
	factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
	factor_temperature_vector

Output:

	[1] High   Low    High   Low    Medium
	Levels: Low < Medium < High
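
Why the levels argument matters for an ordinal variable can be shown with a short comparison. This is a sketch under the assumption that the intended order is Low < Medium < High; the names wrong_order and right_order are hypothetical and do not appear in the course.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")

# Without explicit levels, the levels are sorted alphabetically,
# which gives High < Low < Medium
wrong_order <- factor(temperature_vector, ordered = TRUE)
levels(wrong_order)      # "High" "Low" "Medium"

# With explicit levels, the order matches the intended meaning
right_order <- factor(temperature_vector, ordered = TRUE,
                      levels = c("Low", "Medium", "High"))
levels(right_order)      # "Low" "Medium" "High"

# is.ordered() reports whether a factor carries an order
is.ordered(right_order)  # TRUE
```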
  1. Factor levels
    (1) Use the levels() function to change the default level names.
   # Code to build factor_survey_vector
	survey_vector <- c("M", "F", "F", "M", "M")
	factor_survey_vector <- factor(survey_vector)
	factor_survey_vector

Output:

[1] M F F M M
Levels: F M

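Rename the levels. The new names are assigned by position to the existing levels ("F" first, then "M"), so the order of the replacement vector matters: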

	# Specify the levels of factor_survey_vector
	levels(factor_survey_vector) <- c("Female", "Male")
	factor_survey_vector

Output:

	[1] Male   Female Female Male   Male  
	Levels: Female Male
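
Because levels() <- renames by position, a replacement vector in the wrong order silently swaps the labels. The sketch below illustrates that pitfall and one possible alternative, passing levels and labels together to factor(); the variable names swapped and explicit are hypothetical and not from the course.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# Pitfall: the existing levels are c("F", "M"), so this swaps the labels
swapped <- factor(survey_vector)
levels(swapped) <- c("Male", "Female")
as.character(swapped)   # "Female" "Male" "Male" "Female" "Female"

# Alternative: state the mapping explicitly with levels and labels
explicit <- factor(survey_vector,
                   levels = c("F", "M"),
                   labels = c("Female", "Male"))
as.character(explicit)  # "Male" "Female" "Female" "Male" "Male"
```

Spelling out both levels and labels makes the mapping visible in the code instead of relying on the default alphabetical order.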
  1. The summary() function
    (1) R's summary() function gives a quick overview of a variable.
	# Build factor_survey_vector with clean levels
	survey_vector <- c("M", "F", "F", "M", "M")
	factor_survey_vector <- factor(survey_vector)
	levels(factor_survey_vector) <- c("Female", "Male")
	factor_survey_vector
	# Generate summary for survey_vector
	summary(survey_vector)
	# Generate summary for factor_survey_vector
	summary(factor_survey_vector)

Output:

> factor_survey_vector
[1] Male   Female Female Male   Male  
Levels: Female Male

> summary(survey_vector)
   Length     Class      Mode 
        5 character character 
        
> summary(factor_survey_vector)
Female   Male 
     2      3 
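
The counts returned by summary() on a factor can also be used programmatically. The following sketch assumes the same survey data as above; the name counts is hypothetical, and table() and prop.table() are base R functions that the notes do not cover.

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")

# summary() on a factor returns a named integer vector of counts
counts <- summary(factor_survey_vector)
counts[["Female"]]  # 2
counts[["Male"]]    # 3

# table() produces the same counts per level
table(factor_survey_vector)

# prop.table() turns the counts into proportions (0.4 and 0.6 here)
prop.table(table(factor_survey_vector))
```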
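  1. Checking whether nominal categorical variables can be compared
    (1) In R, the values of a nominal categorical variable cannot be compared with > or <.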
	# Build factor_survey_vector with clean levels
	survey_vector <- c("M", "F", "F", "M", "M")
	factor_survey_vector <- factor(survey_vector)
	levels(factor_survey_vector) <- c("Female", "Male")
	
	# Male
	male <- factor_survey_vector[1]
	
	# Female
	female <- factor_survey_vector[2]
	
	# Battle of the sexes: Male 'larger' than female?
	male > female

Output:

> male > female
[1] NA
Warning message:
In Ops.factor(male, female) : ‘>’ not meaningful for factors
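
Although > is not meaningful for an unordered factor, equality tests still are. A minimal sketch, reusing the male and female values from above; the comparison against the string "Male" is an added illustration, not part of the course exercise.

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")

male <- factor_survey_vector[1]
female <- factor_survey_vector[2]

# Equality and inequality are defined for unordered factors
male == female  # FALSE
male != female  # TRUE

# Comparing the whole factor against a level name also works
factor_survey_vector == "Male"  # TRUE FALSE FALSE TRUE TRUE
```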
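  1. Ordered factors
    (1) In R, the values of an ordinal categorical variable can be compared.
    This requires two arguments to factor():
    ordered - whether the factor is ordered (set to TRUE)
    levels - the levels in order, listed left to right from smallest to largest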
	# Create speed_vector
	speed_vector <- c("medium", "slow", "slow", "medium", "fast")
	
	# Convert speed_vector to ordered factor vector
	factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))
	
	# Print factor_speed_vector
	factor_speed_vector
	summary(factor_speed_vector)

Output:

> factor_speed_vector
[1] medium slow   slow   medium fast  
Levels: slow < medium < fast
> summary(factor_speed_vector)
  slow medium   fast 
     2      2      1 

The values of an ordered factor can be compared:

	# Create factor_speed_vector
	speed_vector <- c("medium", "slow", "slow", "medium", "fast")
	factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))
	
	# Factor value for second data analyst
	da2 <- factor_speed_vector[2]
	
	# Factor value for fifth data analyst
	da5 <- factor_speed_vector[5]
	
	# Is data analyst 2 faster than data analyst 5?
	da2 > da5

Output:

> da2 > da5
[1] FALSE
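
The same comparison extends to the whole vector. The sketch below, which goes slightly beyond the course exercise, compares every analyst against a level name and uses min(), max() and sort(), all of which respect the level order of an ordered factor in base R.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# Which analysts are at least "medium"?
factor_speed_vector >= "medium"  # TRUE FALSE FALSE TRUE TRUE

# Slowest and fastest observed levels
min(factor_speed_vector)
max(factor_speed_vector)

# Sorting follows the level order (slow, medium, fast), not the alphabet
sort(factor_speed_vector)
```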