R Language
vshadow
R - Common data frame operations
Get the data dimensions:
> dim(df)
[1] 14 5
Get the data structure:
> str(df)
'data.frame': 14 obs. of 5 variables:
 $ Outlook : Factor w/ 3 levels "overcast","rainy",..: 3 3 1 2 2 2 1 3 3 2 ...
 $ Temperature: Factor w/ 3 lev...
Original · 2018-02-27 09:19:44 · 2228 views · 0 comments
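Information gain ratio
The entropy of a single random variable is the uncertainty of that variable. For attribute A, the entropy is computed with the following formula: (4) where P(a) is the probability distribution of attribute A. The entropy of the class information, H(class), is also computed with formula (4). The entropy of class conditioned on attribute A, the conditional entropy H(class|A), is computed with the following formula: (5) where P(l,a) is the joint probability distribution of class and A, and P(l|a) is the conditional probability distribution of class given A. Information gain is the uncertainty of the original random variable caused by another random variable...
Original · 2018-02-27 10:58:33 · 640 views · 0 comments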
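The formula images for (4) and (5) did not survive in the excerpt above. As a sketch, the standard Shannon definitions that match the symbols described there are given below. The base-2 logarithm is an assumption, and the last two lines (information gain and gain ratio in the usual C4.5 formulation) are not taken from the excerpt, which is truncated before that point.

```latex
\begin{align}
H(A) &= -\sum_{a} P(a)\,\log_2 P(a) \tag{4} \\
H(\mathrm{class} \mid A) &= -\sum_{l}\sum_{a} P(l,a)\,\log_2 P(l \mid a) \tag{5} \\
\mathrm{Gain}(A) &= H(\mathrm{class}) - H(\mathrm{class} \mid A) \notag \\
\mathrm{GainRatio}(A) &= \frac{\mathrm{Gain}(A)}{H(A)} \notag
\end{align}
```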
Model evaluation
https://en.wikipedia.org/wiki/Precision_and_recall
Precision: the proportion of instances predicted as positive that are actually positive.
Recall: $\text{Recall} = \frac{tp}{tp + fn}$, the proportion of actual positives that are predicted as positive.
Accuracy: predicted positive and...
Original · 2018-02-27 10:54:46 · 840 views · 0 comments
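To make the metrics in the model evaluation entry concrete, here is a minimal base-R sketch that counts the four outcomes and derives precision, recall, and accuracy. The `actual` and `predicted` vectors are invented for illustration. The accuracy formula is the standard definition from the linked Wikipedia article, since the excerpt is cut off before it states one.

```r
# Hypothetical labels, invented for illustration only.
actual    <- c("yes", "yes", "yes", "no",  "no", "no", "yes", "no")
predicted <- c("yes", "no",  "yes", "yes", "no", "no", "yes", "no")

tp <- sum(predicted == "yes" & actual == "yes")  # true positives
fp <- sum(predicted == "yes" & actual == "no")   # false positives
fn <- sum(predicted == "no"  & actual == "yes")  # false negatives
tn <- sum(predicted == "no"  & actual == "no")   # true negatives

precision <- tp / (tp + fp)                   # correct among predicted positives
recall    <- tp / (tp + fn)                   # found among actual positives
accuracy  <- (tp + tn) / (tp + fp + fn + tn)  # correct among all predictions

print(c(precision = precision, recall = recall, accuracy = accuracy))
```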
R - dplyr package
Adding a column: Mutate
Mutate is used to add new variables to the data. For example, let's add a new column that displays the temperature in Celsius.
mutate(airquality, TempInC = (Temp - 32) * 5 / 9)
Sampling: Sample
The sam...
Original · 2018-02-27 10:53:50 · 215 views · 0 comments
c50 code called exit with value 1
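Cause: a feature column of type factor contains empty values.
Inspect the levels of the affected column:
levels(train$Embarked)
Fix: rename the empty level to "missing":
levels(train$Embarked)[1] <- "missing"
Reference: http://www.mzan.com/article/22803310-c5-0-decision-tree-c50-code-called-exit-with-value-1.shtm...
Original · 2018-02-27 10:52:38 · 1595 views · 1 comment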
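The fix in the C5.0 entry above can be tried on a small stand-in data frame. This sketch assumes the "empty value" is an empty-string factor level, and the `train` data here is invented. It matches the level by value rather than by position `[1]`, a small variation on the original one-liner that does not depend on the empty string sorting first.

```r
# Hypothetical stand-in for the training data; the third value is empty.
train <- data.frame(Embarked = factor(c("S", "C", "", "Q", "S")))

# Inspect the levels; an empty-string level shows up as "".
print(levels(train$Embarked))

# Rename the empty level, located by value instead of by index.
is_empty <- levels(train$Embarked) == ""
levels(train$Embarked)[is_empty] <- "missing"

print(levels(train$Embarked))
```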
R - factor
Factor: a categorical attribute that can take only a limited number of values.
The term factor refers to a statistical data type used to store categorical variables. The difference between a categorical variable and a continuous variable is that a categor...
Original · 2018-02-27 10:32:14 · 353 views · 0 comments
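As a small illustration of the factor entry above, the following base-R sketch builds a factor from a character vector and inspects its levels and underlying integer codes. The `outlook` values are made up for the example.

```r
# Hypothetical categorical data, invented for illustration.
outlook <- factor(c("sunny", "rainy", "overcast", "sunny", "rainy"))

print(levels(outlook))      # the distinct categories, sorted alphabetically
print(nlevels(outlook))     # how many categories there are
print(as.integer(outlook))  # the integer code stored for each element
print(table(outlook))       # counts per category
```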
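R - Handling NA values
# Replace NA values
# rs$beg_dif is the column whose NA values are replaced.
rs$beg_dif[is.na(rs$beg_dif)] <- 0
# Find the row numbers of NA values
# which(is.na(rs$beg_dif))
# Remove all rows that contain NA values
# df <- na.omit(df)
Original · 2018-02-27 10:26:47 · 489 views · 0 comments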
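The three NA operations in the entry above can be run end to end on a toy data frame. The `rs` data below is invented; only the column name `beg_dif` comes from the original snippet.

```r
# Hypothetical data frame with two missing values in beg_dif.
rs <- data.frame(id = 1:5, beg_dif = c(2, NA, 7, NA, 1))

# Row numbers where beg_dif is NA.
na_rows <- which(is.na(rs$beg_dif))

# A copy with every row containing an NA removed.
complete <- na.omit(rs)

# Replace the NA values in place with 0.
rs$beg_dif[is.na(rs$beg_dif)] <- 0

print(na_rows)
print(complete)
print(rs)
```

Note that `na.omit` is applied before the replacement here; once the NA values are overwritten with 0 there is nothing left for it to drop.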
R - Sampling
library(dplyr)
df <- read.csv('R/play.csv')
# Get subsets with subset
ydf <- subset(df, Play == "yes")
ndf <- subset(df, Play == "no")
# Draw random samples with sample_n
ysample <- sample_n(ydf, 5)
nsample <- sample_n(ndf, 5)
#...
Original · 2018-02-27 10:25:32 · 281 views · 0 comments
R - sqldf
The sqldf package for R lets you run SQL operations on data frames, which is convenient for people who are used to SQL.
library(sqldf)
TA <- read.csv('table-A.csv', header = TRUE, sep = ",")
TB <- read.csv('table-B.csv', header = TRUE, sep = ",")
#mergedData <- mer...
Original · 2018-02-27 09:49:55 · 3668 views · 0 comments
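R - Basic concepts
1. Assignment: use "<-"; assigning with "=" is not recommended.
2. Comments: "#".
3. R is case-sensitive.
4. Indexing starts at 1, not 0.
Operators:
Addition: +
Subtraction: -
Multiplication: *
Division: /
Exponentiation: ^
Modulo: %%
Data types:
Decimal values like 4.5 are called numerics.
Natura...
Original · 2018-02-27 09:33:39 · 223 views · 0 comments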
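The points in the basic concepts entry above can be checked quickly in an R session. This is a minimal sketch with made-up values, covering assignment, 1-based indexing, two of the listed operators, and the numeric type.

```r
# Assignment with <- (the recommended form).
x <- c(10, 20, 30)

first     <- x[1]        # indexing starts at 1, so this is the first element
remainder <- 7 %% 3      # modulo
power     <- 2 ^ 3       # exponentiation
kind      <- class(4.5)  # decimal values are of class "numeric"

print(first)
print(remainder)
print(power)
print(kind)
```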
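R - Reading and writing CSV
Note: in CSV data, spaces after a "," are counted as part of the next field, so do not put spaces before or after ",".
Read a CSV file and load it as a data frame:
df <- read.csv('play.csv', header = TRUE, sep = ",")
View the contents:
> df
  Outlook Temperature Humidity Windy Play
1 sunny hot high...
Original · 2018-02-27 09:24:31 · 1221 views · 0 comments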
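The whitespace caveat in the CSV entry above can be reproduced without a file by passing inline text to `read.csv`. The CSV content here is invented. As an alternative to cleaning the file by hand, this sketch also shows the `strip.white = TRUE` argument of `read.csv`, which the original post does not mention.

```r
# Hypothetical CSV content with a space after each comma.
csv_text <- "Outlook, Play\nsunny, no\nrainy, yes"

# Default behaviour: the leading space stays inside the character field.
raw <- read.csv(text = csv_text, header = TRUE, sep = ",",
                stringsAsFactors = FALSE)

# strip.white = TRUE trims leading and trailing spaces of unquoted fields.
clean <- read.csv(text = csv_text, header = TRUE, sep = ",",
                  stringsAsFactors = FALSE, strip.white = TRUE)

print(raw[[2]])
print(clean[[2]])
```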
jiebaR - Chinese word segmentation
http://qinwenfeng.com/jiebaR/
library(jiebaR)
wkr = worker()
segment("今天天气好晴朗", wkr)
library(jiebaR)
library(sqldf)
TA = read.csv('R/table-A.csv', header = TRUE, sep = ",")
txtdf = TA$BAK_TXT
TA$BAK_TXT <-...
Original · 2018-02-28 17:25:30 · 615 views · 0 comments