Notes on R (array and matrix operations, traversing subfolders, etc.; work in progress)

trueman007

Categories: Presto, Storm, Hadoop, R, Big Data, Spark, Impala, Data Mining. Tags: R, data mining

Article link: https://blog.csdn.net/trueman007/article/details/39394417

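I first came across R around 2003. I learned and used it for a while, found its functionality similar to Matlab, which I already knew, and stopped using it, though I kept following it. Recently, while testing Impala, I came across the RImpala package, and I also had some ideas about automatic data recognition and processing, so I picked R up again. Here I record the bits I think are worth noting. They are not systematic, but I hope they help. Next I plan to write a few R packages as practice, and there will also be some application-oriented topics.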

(1) Array and matrix operations (multiplication as an example)

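I won't say much about ordinary matrix operations, since there are plenty of articles online. The main point is that two conformable matrices, for example a 2×3 matrix and a 3×1 matrix, multiplied with %*% give the usual matrix product (here a 2×1 matrix).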
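As a minimal sketch of the matrix product described above (the names A, B, P and the values are made up for illustration):

```r
# A is 2 x 3 and B is 3 x 1, so A %*% B is defined and has dimension 2 x 1.
A <- matrix(1:6, nrow = 2, ncol = 3)          # columns are (1,2), (3,4), (5,6)
B <- matrix(c(1, 0, 1), nrow = 3, ncol = 1)
P <- A %*% B
dim(P)   # 2 1
P        # rows: 1*1 + 3*0 + 5*1 = 6 and 2*1 + 4*0 + 6*1 = 8
```

Swapping the operands (B %*% A) would fail here, because a 3×1 matrix cannot be multiplied by a 2×3 matrix.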

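Multiplying a plain vector by a matrix with * follows a different rule: each element of the matrix is multiplied by the corresponding element of the vector, walking the matrix in column-major order and recycling the vector as needed. For example: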

> a <- c(1, 2, 3)
> b <- array(c(1, 0, 1, 0, 0, 1), c(2, 3))
> a * b
     [,1] [,2] [,3]
[1,]    1    3    0
[2,]    0    0    3

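Here a must be a plain vector, that is, dim(a) must be NULL, while b can be any matrix. If a carries a dim attribute that differs from dim(b), R stops with "Error in a * b : non-conformable arrays" (shown as “误于a * b : 非整合陈列” in the Chinese locale).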
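The recycling rule can be checked by expanding the vector by hand. The sketch below reuses a and b from the example; manual, same, a2 and msg are names introduced only for this illustration.

```r
a <- c(1, 2, 3)
b <- array(c(1, 0, 1, 0, 0, 1), c(2, 3))

# Recycle a to the length of b, multiply element by element, restore dim(b).
manual <- array(rep_len(a, length(b)) * as.vector(b), dim(b))
same <- isTRUE(all.equal(a * b, manual))   # TRUE

# Give a its own dim attribute that differs from dim(b): the product now fails.
a2 <- array(a, c(3, 1))
msg <- tryCatch({ a2 * b; "no error" },
                error = function(e) conditionMessage(e))
msg   # the error text; its wording depends on the session language
```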

(2) Looping over subdirectories in R to read and merge files

The use case is preparing a corpus for text mining in R. Reference: http://f.dataguru.cn/thread-46051-1-1.html

The data comes from Sogou Labs.

Data URL: http://download.labs.sogou.com/dl/sogoulabdown/SogouC.mini.20061102.tar.gz

File structure:

└─Sample
    ├─C000007 auto
    ├─C000008 finance
    ├─C000010 IT
    ├─C000013 health
    ├─C000014 sports
    ├─C000016 travel
    ├─C000020 education
    ├─C000022 jobs
    ├─C000023 culture
    └─C000024 military

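The original author attached a Python script for this processing. I was careless and did not read far enough down, assumed the author had forgotten to post it, and wrote one in R instead, treating it as practice.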

Code:

# Merge every text file under Sample/<category>/ into one tab-separated file.
setwd("D:/Programing/rtextmining/Sample/")

# Map each category folder to an English label.
type_map <- c(C000007 = "auto", C000008 = "finance", C000010 = "IT",
              C000013 = "health", C000014 = "sports", C000016 = "travel",
              C000020 = "education", C000022 = "jobs", C000023 = "culture",
              C000024 = "military")

out_file <- "Train.csv"
# Write the header once, overwriting any file left by a previous run.
cat("type\ttext\n", file = out_file)

# Only visit the known category folders, so Train.csv itself is skipped.
for (dirn in intersect(dir(), names(type_map))) {
  file_type <- type_map[[dirn]]
  for (path in dir(dirn, full.names = TRUE)) {
    # Read raw lines; read.table() can fail on quotes or '#' in free text.
    lines <- readLines(path, warn = FALSE)
    # Join the lines and drop tabs and line breaks so each document is one row.
    str1 <- gsub("[\t\r\n]", "", paste(lines, collapse = ""))
    # sep = "" keeps cat() from padding the tab with spaces.
    cat(file_type, "\t", str1, "\n", file = out_file, append = TRUE, sep = "")
  }
}
cat("Finish writing!\n")
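As an alternative to nested loops, list.files() with recursive = TRUE can walk all subfolders in one call. The sketch below is only an illustration: it builds a tiny stand-in directory under tempdir() so it runs anywhere, and the folder SampleDemo, the file names and the file contents are all made up.

```r
# Build a tiny stand-in for Sample/ in a temporary directory (hypothetical files).
root <- file.path(tempdir(), "SampleDemo")
unlink(root, recursive = TRUE)   # start clean if the demo was run before
dir.create(file.path(root, "C000007"), recursive = TRUE)
dir.create(file.path(root, "C000008"))
writeLines(c("line one", "line two"), file.path(root, "C000007", "10.txt"))
writeLines("single line", file.path(root, "C000008", "10.txt"))

# One call walks all subfolders; paths come back relative to root.
files <- list.files(root, recursive = TRUE)
# The parent folder name is the category code of each file.
category <- dirname(files)
# Collapse each file into a single string.
text <- vapply(file.path(root, files),
               function(p) paste(readLines(p, warn = FALSE), collapse = ""),
               character(1), USE.NAMES = FALSE)
corpus <- data.frame(type = category, text = text, stringsAsFactors = FALSE)
corpus
```

From here, write.table(corpus, "Train.csv", sep = "\t", row.names = FALSE, quote = FALSE) would write the merged table in one step instead of appending row by row.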