Data Import and Preprocessing

1. Importing data

R functions for importing data:

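. Datasets bundled with R packages: data()

. CSV files: read.table and its relatives (read.csv, read.csv2, read.delim)

. Irregularly formatted data: readLines

. Excel files: read.xlsx from the xlsx package

. SPSS files: read.spss from the foreign package

. SAS files: read.ssd from the foreign package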
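
As a rough illustration of how the base R readers in this list relate to each other, the sketch below writes a small made-up CSV file to a temporary location and reads it back three ways. The file contents and the variable names (csvFile, viaTable, viaCsv, rawLines) are assumptions for this example only.

```r
# Illustrative only: the file and its contents are made up for this sketch.
csvFile <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ann,90", "Bob,85", "Cid,77"), csvFile)

# read.table needs the header and the separator spelled out ...
viaTable <- read.table(csvFile, header = TRUE, sep = ",",
                       stringsAsFactors = FALSE)

# ... while read.csv is the same reader with those two defaults preset
viaCsv <- read.csv(csvFile, stringsAsFactors = FALSE)
str(viaCsv)

# readLines does no parsing: it returns one character string per line
rawLines <- readLines(csvFile)
rawLines

# A dataset bundled with a package (the datasets package ships with R)
data(iris)
head(iris)
```

read.xlsx, read.spss and read.ssd are left out because they depend on add-on packages and on external files that this sketch cannot create.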

2. Preprocessing irregular data

See the code:

#############download data from a website, unzip the data########
#############read data from multiple separate files
#raw open-government data published by the US Agency for International Development (USAID)
download.file(url="http://jaredlander.com/data/US_Foreign_Aid.zip",
              destfile="ForeignAid.zip")
unzip("ForeignAid.zip")
library(stringr)
dir()
theFiles=dir(pattern = "^US_Foreign_Aid")
theFiles
?regex
#loop through those files
for (a in theFiles)
{
  #build the object name from characters 12-18 of the file name,
  #so a file name beginning US_Foreign_Aid_00s yields Aid_00s
  nameToUse=str_sub(string=a,start=12,end=18)
  temp=read.table(a,header=TRUE,sep=",",stringsAsFactors=FALSE)
  #assign the data frame to that name in the workspace
  assign(x=nameToUse,value=temp)
}
head(Aid_00s)

#readLines(): for data files whose rows are not uniformly formatted
#step(1): Reading data
txt=readLines("data1.2.1.txt")
txt
#step(2): Selecting lines containing data
#lines starting with "%" are comments, so flag them and drop them
I=grepl("^%",txt)
I
dat=txt[!I]
dat
#step(3):Split lines into separate fields
(fieldList=strsplit(dat,split=","))
#step(4):Standardize rows
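assignFields=function(x){
  out=character(3)
  #name: the field that contains letters
  i=grepl("[[:alpha:]]",x)
  out[1]=x[i]
  #non-numeric fields become NA here, so the coercion warning is silenced
  num=suppressWarnings(as.numeric(x))
  #birth year (if any): a number before 1890
  i=which(num<1890)
  out[2]=ifelse(length(i)>0,x[i],NA)
  #death year (if any): a number after 1890
  i=which(num>1890)
  out[3]=ifelse(length(i)>0,x[i],NA)
  out
}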
standardFields=lapply(fieldList,assignFields) #apply a function over a list
standardFields
#step(5): Transform the list into a data.frame
#copy the fields into a matrix, which is then coerced into a data.frame
M=matrix(unlist(standardFields),nrow=length(standardFields),byrow=TRUE)
#unlist() flattens the list into a single vector holding all of its atomic components
colnames(M)=c("name","birth","death")
M
deltons=as.data.frame(M,stringsAsFactors=FALSE)
deltons
#step(6):Normalize and coerce to correct types
deltons$birth=as.numeric(deltons$birth)
deltons$death=as.numeric(deltons$death)
deltons
str(deltons)
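
The file data1.2.1.txt is not included with this post, so the readLines steps above cannot be run as they stand. A minimal sketch of the same six steps on an in-memory input might look like the following. The sample records and the names sampleTxt, records, toRow and people are made up for illustration, and vapply is swapped in for lapply plus matrix as one possible way to enforce three fields per row.

```r
# Hypothetical stand-in for data1.2.1.txt: comment lines start with "%",
# and the fields of each record appear in no fixed order
sampleTxt <- c("%% example records",
               "Gratt,1861,1892",
               "Bob,1892",
               "1871,Emmet,1937",
               "% name, birth year, death year")

# Steps 2-3: drop the comment lines, then split each record on commas
records <- strsplit(sampleTxt[!grepl("^%", sampleTxt)], split = ",")

# Step 4: reorder each record as (name, birth, death); a missing field
# becomes NA because indexing past an empty selection returns NA
toRow <- function(x) {
  num <- suppressWarnings(as.numeric(x))
  c(x[grepl("[[:alpha:]]", x)][1],
    x[which(num < 1890)][1],
    x[which(num > 1890)][1])
}

# Step 5: vapply checks that every row is a character vector of length 3
M <- t(vapply(records, toRow, character(3)))
colnames(M) <- c("name", "birth", "death")
people <- as.data.frame(M, stringsAsFactors = FALSE)

# Step 6: coerce the year columns to numeric
people$birth <- as.numeric(people$birth)
people$death <- as.numeric(people$death)
str(people)
```

As in the original function, the cutoff of 1890 only works for data in which every birth year falls before it and every death year after it.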


