Reading notes: the R language
[R语言 商务数据分析实战 (R Language: Business Data Analysis in Practice) 2]
Chapter 2
Market basket analysis for retail goods
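Workflow: use historical data as the training set to build the model >> analyze best-selling goods + incremental data >> Apriori association rule analysis >> apply the model >> analysis results + iterative model optimization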
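2.2 Analyzing product sales
Analyze best-selling goods >> analyze the product structure
Count how often each sold product appears and its share of all items sold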
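# Set the working directory and read the data
# setwd(...) # the path is omitted in these notes; calling setwd() with no argument raises an error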
GoodsOrder <- read.csv("./data/GoodsOrder.csv", stringsAsFactors = FALSE)
# Count best-selling goods
hotGoods <- data.frame(table(GoodsOrder[, 2]))
names(hotGoods) <- c("Goods", "Num")
hotGoods["Percent"] <- hotGoods$Num / sum(hotGoods$Num)
hotGoods <- hotGoods[order(hotGoods$Percent, decreasing = TRUE),]
write.csv(hotGoods, "./tmp/hotGoods.csv", row.names = FALSE)
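To make the shape of the result concrete, here is a minimal sketch of the same count-and-share calculation on a tiny made-up table. The data frame toyOrder and its contents are hypothetical and are not taken from GoodsOrder.csv.

```r
# Toy order lines (hypothetical data, not from GoodsOrder.csv)
toyOrder <- data.frame(
  ID = c(1, 1, 2, 2, 2, 3),
  Goods = c("milk", "bread", "milk", "yogurt", "bread", "milk"),
  stringsAsFactors = FALSE
)

# Count how often each product appears, then turn counts into shares
toyHot <- as.data.frame(table(toyOrder$Goods), stringsAsFactors = FALSE)
names(toyHot) <- c("Goods", "Num")
toyHot$Percent <- toyHot$Num / sum(toyHot$Num)

# Most frequent product first
toyHot <- toyHot[order(toyHot$Num, decreasing = TRUE), ]
print(toyHot)
```

The Percent column always sums to 1, so it reads as each product's share of all item lines rather than its share of orders.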
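Assign each sold product to a category and compute each category's share of the total
# Analyze the structure of sold goods by category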
GoodTypes <- read.csv("./data/GoodsTypes.csv", stringsAsFactors = FALSE)
Goods <- merge(GoodsOrder, GoodTypes, 'Goods', all.x = TRUE, all.y = TRUE)
hotTypes <- data.frame(table(Goods$Types))
names(hotTypes) <- c("Types", "Num")
hotTypes["Percent"] <- hotTypes[, 2] / sum(hotTypes[, 2])
hotTypes <- hotTypes[order(hotTypes$Percent, decreasing = TRUE),]
write.csv(hotTypes, "./tmp/hotTypes.csv", row.names = FALSE)
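One detail of the merge call may be worth checking: with all.y = TRUE, a product that is listed in the type table but was never sold still produces one row (with ID = NA), and table() then counts its category once. Whether this matters depends on whether GoodsTypes.csv contains such products, which these notes do not say. The sketch below uses hypothetical toy tables (orders, types) to show the difference.

```r
# Hypothetical toy tables, not the real CSV files
orders <- data.frame(
  ID = c(1, 1, 2),
  Goods = c("milk", "bread", "milk"),
  stringsAsFactors = FALSE
)
types <- data.frame(
  Goods = c("milk", "bread", "cola"),
  Types = c("dairy", "bakery", "drinks"),
  stringsAsFactors = FALSE
)

# Full outer join: keeps "cola" even though it was never sold
fullJoin <- merge(orders, types, by = "Goods", all.x = TRUE, all.y = TRUE)
# Left join: keeps only rows that come from the orders
leftJoin <- merge(orders, types, by = "Goods", all.x = TRUE)

print(fullJoin)
print(table(fullJoin$Types)) # "drinks" is counted once
print(table(leftJoin$Types)) # "drinks" does not appear
```

If unsold products should not contribute to the category counts, dropping all.y = TRUE (or filtering out rows where ID is NA) is one way to handle it.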
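Analyze how products are distributed within a category
# Analyze the internal structure of one category of sold goods
# "非酒精饮料" is the category value for non-alcoholic beverages in the data
Drink <- Goods[which(Goods$Types == "非酒精饮料"), ]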
hotDrink <- data.frame(table(Drink$Goods))
names(hotDrink) <- c("Goods", "Num")
hotDrink["Percent"] <- hotDrink$Num / sum(hotDrink$Num)
hotDrink <- hotDrink[order(hotDrink$Percent, decreasing = TRUE),]
write.csv(hotDrink, './tmp/hotDrink.csv', row.names = FALSE)
2.3 Building a market basket analysis model with Apriori association rules
Key terms: 1. confidence, support, and lift. 2. frequent itemsets (used to generate association rules)
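As a rough illustration of the three measures, the sketch below computes them by hand in base R for a single rule, {milk} => {bread}. Support is the share of baskets containing all items of the rule, confidence is that support divided by the support of the left-hand side, and lift is the confidence divided by the support of the right-hand side. The baskets and the helper function support() are made up for this example.

```r
# Hypothetical baskets, one character vector per transaction
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("milk", "bread", "yogurt"),
  c("bread"),
  c("milk", "bread")
)

# Share of baskets that contain every item in `items` (illustrative helper)
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "milk"
rhs <- "bread"

suppRule <- support(c(lhs, rhs), baskets)    # share of baskets with milk and bread
confRule <- suppRule / support(lhs, baskets) # share of milk baskets that also hold bread
liftRule <- confRule / support(rhs, baskets) # confidence relative to bread's overall share

print(c(support = suppRule, confidence = confRule, lift = liftRule))
```

A lift below 1, as in this toy data, means bread is slightly less common in baskets with milk than in baskets overall; a lift above 1 points to a positive association.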
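The Apriori algorithm is widely used, but it rescans the transaction data at every level to count candidate itemsets, so much of the work is repeated and the computation is expensive. When experimenting, adjust the parameter argument (for example support and confidence) until the number of association rules is reasonable.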
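To show where the repeated work comes from, here is a simplified base R sketch of the level-wise search for frequent itemsets. It is not how arules implements apriori(); the function frequentItemsets(), the helper support(), and the baskets are hypothetical and only meant to illustrate the idea that each level needs another pass over all transactions.

```r
# Hypothetical baskets, one character vector per transaction
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("milk", "bread", "yogurt"),
  c("bread"),
  c("milk", "bread")
)

# Share of baskets that contain every item in `items` (illustrative helper)
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Level-wise search: frequent k-itemsets seed the (k+1)-item candidates
frequentItemsets <- function(baskets, minSupport) {
  current <- as.list(sort(unique(unlist(baskets)))) # candidate 1-itemsets
  result <- list()
  while (length(current) > 0) {
    # One pass over all baskets per level: this is the repeated work
    supp <- vapply(current, support, numeric(1), baskets = baskets)
    keep <- current[supp >= minSupport]
    if (length(keep) == 0) break
    result <- c(result, keep)
    freqItems <- sort(unique(unlist(keep)))
    nextLevel <- list()
    for (s in keep) {
      # Join step: extend s with items that sort after its last element
      for (it in freqItems[freqItems > s[length(s)]]) {
        cand <- c(s, it)
        # Prune step: dropping any one item must leave a frequent itemset
        subsetsOk <- all(vapply(seq_along(cand), function(j) {
          any(vapply(keep, identical, logical(1), cand[-j]))
        }, logical(1)))
        if (subsetsOk) nextLevel[[length(nextLevel) + 1]] <- cand
      }
    }
    current <- nextLevel
  }
  result
}

fi <- frequentItemsets(baskets, minSupport = 0.3)
print(vapply(fi, paste, character(1), collapse = ", "))
```

Lowering minSupport lets more itemsets survive each level, which increases both the number of candidates and the number of rules; this is why the support threshold is usually the first parameter to tune.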
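# Set the working directory and read the data
# setwd(...) # the path is omitted in these notes; calling setwd() with no argument raises an error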
GoodsOrder <- read.csv("./data/GoodsOrder.csv", stringsAsFactors = FALSE)
library(arules) # load the required package
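# Convert the data: one vector of goods per transaction ID
# split() groups column 2 (the goods) by ID, so non-contiguous IDs leave no empty list slots
# unique() drops repeated goods, which the coercion to transactions does not accept within one basket
dataList <- lapply(split(GoodsOrder[, 2], GoodsOrder$ID), unique)
TransRep <- as(dataList, "transactions")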
RulesRep <- apriori(TransRep, parameter = list(support = 0.02, confidence = 0.25))
inspect(head(sort(RulesRep, by = "lift"), 25)) # show up to 25 rules, highest lift first