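Introduction to R
R Graphical User Interface
After downloading and installing R, we can use RStudio as the graphical user interface for R.
Note: install R first, then download RStudio; if needed, point RStudio to the R installation path.
The RStudio graphical user interface is shown in the figure above.
The top-left pane is the script area (Scripts), where we can type script code.
The bottom-left pane is the Console, where the results of running the scripts are displayed.
The top-right pane is the Workspace, where we can see the variables in the current project.
The bottom-right pane is the Plots area, where graphics produced by the program are displayed; Help and other features can also be found here.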
Help: help(lm) or ?lm
edit() and fix() allow updating the contents of an R variable
The save.image() function creates an .RData file
The load() function loads an .RData file
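A minimal sketch of the save.image()/load() round trip, assuming it runs as a top-level script so that the objects live in the global environment. The objects x and label and the temporary file path are hypothetical.

```r
# Hypothetical objects in the global environment
x <- c(10, 20, 30)
label <- "demo"

# Save the whole workspace to a temporary .RData file
rdata_path <- tempfile(fileext = ".RData")
save.image(file = rdata_path)

# Remove the objects, then restore them from the file
rm(x, label)
exists("x")        # FALSE
load(rdata_path)   # restores every object saved in the file
x                  # 10 20 30
label              # "demo"
```

In an interactive session, save.image() with no arguments writes to .RData in the working directory.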
Data Import and Export
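# specify the full path when importing
sales <- read.csv("c:/data/yearly_sales.csv")
# alternatively, set the default (working) directory first, then import
setwd("c:/data/")
sales <- read.csv("yearly_sales.csv")
# add a column with the average sales amount per order
sales$per_order <- sales$sales_total/sales$num_of_orders
# export the data as tab-separated text, without row names
write.table(sales,"sales_modified.txt", sep="\t", row.names=FALSE)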
For more details, see:
https://cran.r-project.org/doc/manuals/r-release/R-data.html
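The import/export steps above can be tried without the original file. This is a sketch that assumes a small hypothetical data frame standing in for yearly_sales.csv and writes to temporary files instead of c:/data/.

```r
# Hypothetical toy data standing in for yearly_sales.csv
sales <- data.frame(cust_id = 1:3,
                    sales_total = c(300, 120, 90),
                    num_of_orders = c(3, 2, 1))

# Write it to a temporary CSV file and read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)
sales <- read.csv(csv_path)

# Add the average sales amount per order as a new column
sales$per_order <- sales$sales_total / sales$num_of_orders
sales$per_order   # 100 60 90

# Export as tab-separated text without row names, then re-import
txt_path <- tempfile(fileext = ".txt")
write.table(sales, txt_path, sep = "\t", row.names = FALSE)
back <- read.delim(txt_path)   # read.delim() defaults to sep = "\t", header = TRUE
names(back)   # "cust_id" "sales_total" "num_of_orders" "per_order"
```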
Attributes and Data Types
Attributes: Nominal, Ordinal, Interval and Ratio
Data Types: Numeric, character, logical (and list)
Vectors
– A basic building block for data in R
– Simple R variables are actually vectors
– Can only consist of values of the same class
The vector() function creates a logical vector by default
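A short sketch of the vector properties listed above; the variable names are illustrative only.

```r
# A single number is already a vector of length 1
x <- 5
is.vector(x)    # TRUE
length(x)       # 1

# Mixing classes in c() coerces every element to one common class
mixed <- c(1, "a", TRUE)
class(mixed)    # "character"

# vector() with no arguments creates an empty logical vector
v <- vector()
class(v)        # "logical"
length(v)       # 0

# A mode and length can be given explicitly
vector("numeric", 3)   # 0 0 0
```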
Arrays and Matrices
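# the dimensions are 3 regions, 4 quarters, and 2 years
quarterly_sales <- array(0, dim=c(3,4,2))
# build a 3x3 matrix
M <- matrix(c(1,3,3,5,0,4,3,3,3),nrow = 3,ncol = 3)
# multiply M by inverse(M); matrix.inverse() comes from the matrixcalc package
library(matrixcalc)
M %*% matrix.inverse(M)
# base R equivalent: M %*% solve(M)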
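A sketch of indexing the array and checking the inverse with base R only. It uses solve() instead of matrixcalc::matrix.inverse() so no extra package is needed; the stored sales value is hypothetical.

```r
# A 3 x 4 x 2 array: 3 regions, 4 quarters, 2 years, initialised to 0
quarterly_sales <- array(0, dim = c(3, 4, 2))
dim(quarterly_sales)             # 3 4 2

# Hypothetical value for region 2, quarter 3, year 1
quarterly_sales[2, 3, 1] <- 150
quarterly_sales[2, 3, 1]         # 150

# The same 3x3 matrix; matrix() fills column by column
M <- matrix(c(1,3,3,5,0,4,3,3,3), nrow = 3, ncol = 3)
M_inv <- solve(M)                # base R matrix inverse

# The product equals the identity up to floating-point rounding
all.equal(M %*% M_inv, diag(3))  # TRUE
```

Comparing with all.equal() rather than == avoids false mismatches caused by rounding in the inverse.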
Data Frames
– A structure for storing and accessing several variables of possibly different data types
– Preferred input format for many R functions
Key points: a data frame holds variables of different types, and it is the preferred (most common) format.
Example:
# since R 4.0.0, read.csv() keeps strings as character unless stringsAsFactors = TRUE
sales <- read.csv("c:/data/yearly_sales.csv", stringsAsFactors = TRUE)
is.data.frame(sales) # returns TRUE
is.vector(sales$cust_id) # returns TRUE
is.vector(sales$sales_total) # returns TRUE
is.vector(sales$num_of_orders) # returns TRUE
is.vector(sales$gender) # returns FALSE
is.factor(sales$gender) # returns TRUE
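The same checks can be reproduced on a small hypothetical data frame. The sketch also shows how stringsAsFactors controls whether a text column is read as a factor; the column names mirror the sales example but the values are made up.

```r
# Hypothetical data frame with columns of different types
df <- data.frame(cust_id = c(1L, 2L, 3L),
                 sales_total = c(120.5, 80, 42.25),
                 gender = factor(c("F", "M", "F")))
is.data.frame(df)       # TRUE
is.vector(df$cust_id)   # TRUE
is.vector(df$gender)    # FALSE: a factor carries class and levels attributes
is.factor(df$gender)    # TRUE
sapply(df, class)       # "integer" "numeric" "factor"

# Reading CSV text: strings stay character unless stringsAsFactors = TRUE
csv_text <- "cust_id,gender\n1,F\n2,M"
class(read.csv(text = csv_text)$gender)                           # "character" in R >= 4.0.0
class(read.csv(text = csv_text, stringsAsFactors = TRUE)$gender)  # "factor"
```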
List: a collection of objects that can be of various types, including other lists
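# stringsAsFactors = TRUE keeps gender a factor in R >= 4.0.0
sales <- read.csv("c:/data/yearly_sales.csv", stringsAsFactors = TRUE)
class(sales) # returns "data.frame"
typeof(sales) # returns "list"
# build an assorted list of a string, a numeric,
# a list, a vector, and a matrix
housing <- list("own", "rent")
v <- 1:5 # assumed value: v is not defined earlier in these notes; any vector works
assortment <- list("football", 7.5, housing, v, M)
assortment
str(assortment)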
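A self-contained sketch of the list above, showing the difference between [[ ]] (extract an element) and [ ] (keep the list wrapper), and why typeof() reports "list" for a data frame. The value of v is an assumption, as noted above.

```r
housing <- list("own", "rent")
v <- 1:5   # assumed value for v
M <- matrix(c(1,3,3,5,0,4,3,3,3), nrow = 3, ncol = 3)
assortment <- list("football", 7.5, housing, v, M)

length(assortment)          # 5
class(assortment[[4]])      # "integer"
assortment[[3]][[2]]        # "rent": indexing into the nested list
class(assortment[3])        # "list": single brackets return a sub-list
is.matrix(assortment[[5]])  # TRUE

# A data frame is stored as a list with one element per column
df <- data.frame(a = 1:2, b = c("x", "y"))
typeof(df)                  # "list"
length(df)                  # 2
```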
Factors: a categorical variable, typically with a small, finite number of levels such as "F" and "M".
Factors can be ordered or unordered
The most common example is gender.
class(sales$gender) # returns "factor"
is.ordered(sales$gender) # returns FALSE
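A sketch contrasting an unordered factor with an ordered one. The gender values and the satisfaction rating are hypothetical examples.

```r
# Unordered factor: levels are sorted alphabetically by default
gender <- factor(c("F", "M", "M", "F", "F"))
levels(gender)        # "F" "M"
is.ordered(gender)    # FALSE
table(gender)         # F: 3, M: 2

# Ordered factor: the level order is given explicitly
rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"), ordered = TRUE)
is.ordered(rating)     # TRUE
rating[2] > rating[1]  # TRUE: "high" ranks above "low"
```

Comparisons such as > are only meaningful for ordered factors; on an unordered factor R returns NA with a warning.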
Contingency Tables
– A class of objects used to store the observed counts across the factors for a given dataset
– The basis for performing a statistical test on the independence of the factors
# build a contingency table based on gender and number of orders
sales_table <- table(sales$gender,sales$num_of_orders)
sales_table
class(sales_table) # returns "table"
typeof(sales_table) # returns "integer"
dim(sales_table) # returns 2 3
# chi-squared test of independence
summary(sales_table)
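A sketch of the same workflow on hypothetical data for eight customers. It shows that summary() on a table and chisq.test() compute the same Pearson chi-squared statistic (no continuity correction is applied to a 2x3 table).

```r
# Hypothetical data: gender and number of orders for 8 customers
gender <- factor(c("F", "M", "F", "M", "F", "M", "F", "F"))
orders <- c(1, 1, 2, 3, 1, 2, 2, 1)

tab <- table(gender, orders)
tab
dim(tab)   # 2 3: two genders x three distinct order counts
sum(tab)   # 8: every observation is counted once

# Chi-squared test of independence, two ways
summary(tab)
chisq.test(tab)   # warns that the approximation may be inaccurate with such small counts
```

With counts this small the test has little power; the example only illustrates the mechanics.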
Descriptive Statistics
– summary() function: min, quartiles, median, mean, max
– R also provides individual functions for descriptive statistics
# to simplify the function calls, assign the columns to x and y
x <- sales$sales_total
y <- sales$num_of_orders
cor(x,y) # returns 0.7508015 (correlation)
cov(x,y) # returns 345.2111 (covariance)
IQR(x) # returns 215.21 (interquartile range)
mean(x) # returns 249.4557 (mean)
median(x) # returns 151.65 (median)
range(x) # returns 30.02 7606.09 (min max)
sd(x) # returns 319.0508 (std. dev.)
var(x) # returns 101793.4 (variance)
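Because the values above depend on the sales dataset, here is a sketch on hypothetical toy data where the results can be checked by hand. y is defined as an exact linear function of x so that the relationships between var(), sd(), cov() and cor() are easy to see.

```r
# Hypothetical toy data; y is an exact linear function of x
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
y <- 2 * x + 1

summary(x)   # Min 2, 1st Qu. 4, Median 4.5, Mean 5, 3rd Qu. 5.5, Max 9
mean(x)      # 5
median(x)    # 4.5
range(x)     # 2 9
IQR(x)       # 1.5 (= 5.5 - 4)
var(x)       # 4.571429 (= 32 / 7, sample variance with n - 1)
sd(x)        # about 2.138 (square root of var(x))
cov(x, y)    # 9.142857 (= 2 * var(x), since y = 2x + 1)
cor(x, y)    # 1 (perfect positive linear relationship, up to rounding)
```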