Data Science and Big Data Analytics Study Notes 3: A Brief Introduction to R

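Introduction to R

R Graphical User Interface

After downloading and installing R, we can use RStudio as its graphical user interface.
Note: install R first, then download RStudio and point it at the R installation path.
[Figure: RStudio graphical user interface]
The figure above shows the RStudio graphical user interface.
The upper-left pane is the script area (Scripts), where we type script code.
The lower-left pane is the console (Console), where the output of the scripts is displayed.
The upper-right pane is the workspace (Workspace), where we can inspect the variables of the current project.
The lower-right pane is the plots area (Plots), where figures produced by the program are shown; help and similar features are also found here.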

Help: help(lm) or ?lm
edit() and fix() let you update the contents of an R variable
save.image() saves the current workspace to an .RData file
load() restores the objects stored in an .RData file
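
A minimal sketch of saving and restoring objects, under stated assumptions: the variable `demo_value` and the temporary file path are hypothetical and used only for illustration. It uses save() on a single object instead of save.image(), so that running it does not write out your whole workspace; load() works the same way for both.

```r
# Hypothetical variable used only for this illustration
demo_value <- c(1, 2, 3)

# Write the object to a temporary .RData file.
# save.image(file = rdata_path) would save every object in the workspace instead.
rdata_path <- tempfile(fileext = ".RData")
save(demo_value, file = rdata_path)

# Remove the variable, then restore it from the file
rm(demo_value)
load(rdata_path)

exists("demo_value")  # TRUE
demo_value            # 1 2 3
```

In an interactive session, help(save.image) or ?load opens the documentation for these functions.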

Data Import and Export

sales <- read.csv("c:/data/yearly_sales.csv")
# Specify the full path when importing

setwd("c:/data/")
sales <- read.csv("yearly_sales.csv")
# Or set the working directory first, then import by file name

sales$per_order <- sales$sales_total/sales$num_of_orders
# Add a column for the average sales amount per order

write.table(sales,"sales_modified.txt", sep="\t", row.names=FALSE)
# Export the data as tab-separated text without row names

For more details, see:
https://cran.r-project.org/doc/manuals/r-release/R-data.html
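
The import and export steps can be tried without the original file. The sketch below is only an illustration: the data frame `sales_demo` and its values are made up, and temporary files stand in for the c:/data/ paths.

```r
# Synthetic stand-in for yearly_sales.csv (values are made up)
sales_demo <- data.frame(
  cust_id = c(100001, 100002, 100003),
  sales_total = c(800.64, 217.53, 74.58),
  num_of_orders = c(3, 3, 2)
)

# Write it out as CSV, then import it the same way as in the notes
csv_path <- tempfile(fileext = ".csv")
write.csv(sales_demo, csv_path, row.names = FALSE)
sales_in <- read.csv(csv_path)

# Add the derived column: average sales amount per order
sales_in$per_order <- sales_in$sales_total / sales_in$num_of_orders

# Export as tab-separated text without row names, then read it back
txt_path <- tempfile(fileext = ".txt")
write.table(sales_in, txt_path, sep = "\t", row.names = FALSE)
sales_back <- read.delim(txt_path)

names(sales_back)  # "cust_id" "sales_total" "num_of_orders" "per_order"
```

read.delim() is read.table() with tab as the default separator and header = TRUE, which matches how the file was written.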

Attributes and Data Types

Attributes: Nominal, Ordinal, Interval and Ratio
Data Types: Numeric, character, logical (and list)
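
A short sketch of how the data types above can be inspected with class() and typeof(). The variable names and values are hypothetical.

```r
# Hypothetical values, one per basic data type
n <- 3.14
s <- "hello"
b <- TRUE
l <- list(n, s, b)

class(n)   # "numeric"
typeof(n)  # "double"
class(s)   # "character"
class(b)   # "logical"
class(l)   # "list"

# Integers need an explicit L suffix; plain numbers are stored as doubles
typeof(5L) # "integer"
typeof(5)  # "double"
```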

Vectors
– A basic building block for data in R
– Simple R variables are actually vectors
– Can only consist of values in the same class
– The vector() function creates a logical vector by default
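
The points about vectors can be checked with a few lines. This is an illustrative sketch; the variable names are assumptions.

```r
# vector() creates a logical vector by default
v_default <- vector(length = 3)
v_default          # FALSE FALSE FALSE
class(v_default)   # "logical"

# A simple variable is a vector of length 1
single <- 5
is.vector(single)  # TRUE
length(single)     # 1

# All values must share one class, so mixed input is coerced
mixed <- c(1, "a", TRUE)
mixed              # "1" "a" "TRUE"
class(mixed)       # "character"
```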

Arrays and Matrices
# the dimensions are 3 regions, 4 quarters, and 2 years
quarterly_sales <- array(0, dim=c(3,4,2))

# build a 3x3 matrix
M <- matrix(c(1,3,3,5,0,4,3,3,3),nrow = 3,ncol = 3)

library(matrixcalc) # matrix.inverse() comes from the matrixcalc package
M %*% matrix.inverse(M) # multiply M by inverse(M)
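
The same ideas can be tried with base R only. In this sketch the sales figure is made up, and solve() is used for the inverse so that no extra package is needed.

```r
# 3 regions x 4 quarters x 2 years, initialised to 0
quarterly_demo <- array(0, dim = c(3, 4, 2))

# Set region 2, quarter 1, year 1 (made-up figure)
quarterly_demo[2, 1, 1] <- 158000
dim(quarterly_demo)  # 3 4 2

# matrix() fills column by column, so the first column is 1, 3, 3
M <- matrix(c(1, 3, 3, 5, 0, 4, 3, 3, 3), nrow = 3, ncol = 3)

# solve(M) returns the inverse of a square, non-singular matrix
M_inv <- solve(M)

# The product is the identity matrix, up to floating-point rounding
round(M %*% M_inv, 10)
```

Note that %*% is matrix multiplication, while * multiplies element by element.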

Data Frames
– A structure for storing and accessing several variables of possibly different data types
– Preferred input format for many R functions
The key points: a data frame can hold data of different types, and it is the preferred, most commonly used format.
Example:

sales <- read.csv("c:/data/yearly_sales.csv", stringsAsFactors = TRUE) # since R 4.0.0, strings are not converted to factors by default
is.data.frame(sales) # returns TRUE
is.vector(sales$cust_id) # returns TRUE
is.vector(sales$sales_total) # returns TRUE
is.vector(sales$num_of_orders) # returns TRUE
is.vector(sales$gender) # returns FALSE
is.factor(sales$gender) # returns TRUE
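
The same checks can be reproduced on a small data frame built by hand. The data frame `sales_demo` and its values are made up for illustration; the column names follow the notes.

```r
# Synthetic data frame with numeric columns and one factor column
sales_demo <- data.frame(
  cust_id = c(100001, 100002, 100003),
  sales_total = c(800.64, 217.53, 74.58),
  num_of_orders = c(3, 3, 2),
  gender = factor(c("F", "F", "M"))
)

is.data.frame(sales_demo)          # TRUE
is.vector(sales_demo$sales_total)  # TRUE
is.vector(sales_demo$gender)       # FALSE, a factor carries extra attributes
is.factor(sales_demo$gender)       # TRUE

# Select rows by condition and columns by name
female_sales <- sales_demo[sales_demo$gender == "F", c("cust_id", "sales_total")]
nrow(female_sales)                 # 2
```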

List: a collection of objects that can be of various types, including other lists

sales <- read.csv("c:/data/yearly_sales.csv")
class(sales) # returns "data.frame"
typeof(sales) # returns "list"
# build an assorted list of a string, a numeric,
# a list, a vector, and a matrix
housing <- list("own", "rent")
v <- 1:5 # assumed example vector
assortment <- list("football", 7.5, housing, v, M)
assortment
str(assortment)
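
A self-contained sketch of how list elements are accessed. The vector `v` and the small matrix `M` here are assumed values chosen for illustration. Single brackets return a sub-list, while double brackets return the element itself.

```r
housing <- list("own", "rent")
v <- 1:5                          # assumed example vector
M <- matrix(1:4, nrow = 2)        # assumed example matrix
assortment <- list("football", 7.5, housing, v, M)

length(assortment)                # 5

# [ ] keeps the list wrapper
class(assortment[2])              # "list"

# [[ ]] extracts the element itself
class(assortment[[2]])            # "numeric"
is.matrix(assortment[[5]])        # TRUE

# Nested lists are reached by chaining [[ ]]
assortment[[3]][[2]]              # "rent"
```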

Factors: a categorical variable, typically with a few finite levels such as “F” and “M”.
Factors can be ordered or not ordered
The most common example is gender.

class(sales$gender) # returns "factor"
is.ordered(sales$gender) # returns FALSE
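
To illustrate the difference between unordered and ordered factors, here is a small sketch. The `size` factor and all values are hypothetical.

```r
# Unordered factor: the levels have no ranking
gender <- factor(c("F", "M", "F", "F"))
levels(gender)      # "F" "M"
is.ordered(gender)  # FALSE

# Ordered factor: the levels argument fixes the ranking
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)
is.ordered(size)    # TRUE

# Comparisons are meaningful only for ordered factors
size < "large"      # TRUE FALSE TRUE TRUE
```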

Contingency Tables
– A class of objects used to store the observed counts across the factors for a given dataset
– The basis for performing a statistical test on the independence of the factors

# build a contingency table based on gender and number of orders
sales_table <- table(sales$gender,sales$num_of_orders)
sales_table
class(sales_table) # returns "table"
typeof(sales_table) # returns "integer"
dim(sales_table) # returns 2 3
# summary() on a table performs a chi-squared test of independence
summary(sales_table)
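
A runnable sketch of the same steps on made-up data. The six observations below are hypothetical, and with so few counts the chi-squared approximation is not reliable; the point is only to show the structure of the result.

```r
# Hypothetical observations: gender and number of orders per customer
gender <- factor(c("F", "F", "M", "M", "F", "M"))
num_of_orders <- c(1, 2, 1, 3, 2, 1)

# Cross-tabulate the observed counts
demo_table <- table(gender, num_of_orders)
demo_table
class(demo_table)   # "table"
typeof(demo_table)  # "integer"
dim(demo_table)     # 2 3

# summary() returns the chi-squared statistic, degrees of freedom and p-value
test_result <- summary(demo_table)
test_result$n.cases    # 6
test_result$parameter  # 2, i.e. (2 - 1) * (3 - 1) degrees of freedom
test_result$p.value
```

chisq.test(demo_table) runs the same kind of test directly and warns when the expected counts are too small.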

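Descriptive Statistics

– summary() function: min, first quartile, median, mean, third quartile, max
– R also provides individual functions for descriptive statistics, such as cor(), cov(), IQR(), mean(), median(), range(), sd(), and var()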

# to simplify the function calls, assign the columns to short names
x <- sales$sales_total
y <- sales$num_of_orders
cor(x,y) # returns 0.7508015 (correlation)
cov(x,y) # returns 345.2111 (covariance)
IQR(x) # returns 215.21 (interquartile range)
mean(x) # returns 249.4557 (mean)
median(x) # returns 151.65 (median)
range(x) # returns 30.02 7606.09 (min max)
sd(x) # returns 319.0508 (std. dev.)
var(x) # returns 101793.4 (variance)
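
The values above depend on the yearly_sales.csv data. The sketch below runs the same functions on two short made-up vectors, so the results can be checked by hand.

```r
# Hypothetical data, small enough to verify manually
x <- c(10, 20, 30, 40, 50)
y <- c(1, 2, 2, 4, 5)

mean(x)    # 30
median(x)  # 30
range(x)   # 10 50
IQR(x)     # 20 (third quartile 40 minus first quartile 20)
var(x)     # 250
sd(x)      # square root of the variance, about 15.81
cov(x, y)  # 25
cor(x, y)  # about 0.962

# summary() reports min, quartiles, median, mean and max in one call
summary(x)
```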

References

  1. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, EMC Education Services, John Wiley & Sons, 27 Jan. 2015