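Two main difficulties in scraping data:
1. Analyzing the structure of the web page
2. Writing regular expressions
Beginner's version: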
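# RCurl is installed and loaded here, but readLines() below is base R and works without it
install.packages("RCurl")  # only needed once
library(RCurl)
url1 <- 'http://shenzhen.lashou.com/cate/meishi'
# Read the page source line by line
web <- readLines(url1, encoding = 'UTF-8')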
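# Keep only the lines that contain "goods-name"
goods_name <- web[grep("goods-name", web)]
# Take the text after the first '">' and drop the last 4 characters of each line
goods_name2 <- substr(goods_name, regexpr("\">", goods_name) + 2, nchar(goods_name) - 4)
goods_name2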
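# Check the extraction step by step on a single matched line
i <- 2
goods_name[i]
substr(goods_name[i], regexpr("\">", goods_name[i]) + 2, nchar(goods_name[i]) - 4)
# Inspect the first 10 lines of the raw page source
web[1:10]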
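
The page above may change or become unreachable, so here is a minimal offline sketch of the same extraction idea. The HTML lines are made up for illustration (they are not taken from the real site), and the assumption that each matched line ends with "</a>" is mine. The sketch runs the position-based substr() approach from the notes next to a sub() call with a capture group, which is one possible alternative that does not depend on counting trailing characters.

```r
# Hypothetical HTML lines standing in for the downloaded page (not real site data)
web <- c(
  '<html>',
  '<div class="goods">',
  '<a class="goods-name" href="/deal/1.html">Hot pot for two</a>',
  '<span class="price">88</span>',
  '<a class="goods-name" href="/deal/2.html">Sushi set</a>',
  '</html>'
)

# Keep only the lines that mention the marker string
goods_name <- web[grep("goods-name", web, fixed = TRUE)]

# Position-based approach: start 2 characters after the first '">'
# and drop the trailing "</a>" (4 characters, an assumption about the markup)
start <- regexpr("\">", goods_name, fixed = TRUE) + 2
by_position <- substr(goods_name, start, nchar(goods_name) - 4)

# Alternative: capture the text between the tag's '>' and '</a>'
by_regex <- sub(".*goods-name[^>]*>([^<]*)</a>.*", "\\1", goods_name)

print(by_position)
print(by_regex)
```

Both vectors should hold the same names here. The position-based version breaks as soon as a line carries extra text after the closing tag, while the capture-group version tolerates that, at the cost of a harder-to-read pattern.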