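This post shows a simple way to scrape air-quality data from the Tianqihoubao (天气后报) website with the rvest package.
The code is as follows: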
library(rvest)
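# Open a session to check that the site is reachable
# (html_session() was renamed session() in rvest 1.0.0)
session("http://www.tianqihoubao.com/aqi/chengdu-201612.html")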
# Monthly AQI pages for Chengdu, January to December 2016
url1 <- "http://www.tianqihoubao.com/aqi/chengdu-201601.html"
url2 <- "http://www.tianqihoubao.com/aqi/chengdu-201602.html"
url3 <- "http://www.tianqihoubao.com/aqi/chengdu-201603.html"
url4 <- "http://www.tianqihoubao.com/aqi/chengdu-201604.html"
url5 <- "http://www.tianqihoubao.com/aqi/chengdu-201605.html"
url6 <- "http://www.tianqihoubao.com/aqi/chengdu-201606.html"
url7 <- "http://www.tianqihoubao.com/aqi/chengdu-201607.html"
url8 <- "http://www.tianqihoubao.com/aqi/chengdu-201608.html"
url9 <- "http://www.tianqihoubao.com/aqi/chengdu-201609.html"
url10 <- "http://www.tianqihoubao.com/aqi/chengdu-201610.html"
url11 <- "http://www.tianqihoubao.com/aqi/chengdu-201611.html"
url12 <- "http://www.tianqihoubao.com/aqi/chengdu-201612.html"
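# Scrape one monthly page and return its table as a character matrix
fun <- function(x) {
  # html() is deprecated and has been removed from current rvest; read_html() replaces it
  web <- read_html(x, encoding = "gb2312")
  qq <- web %>% html_nodes("td") %>% html_text()
  # Each table row has 10 cells, so fill a 10-column matrix row by row
  p <- matrix(qq, ncol = 10, byrow = TRUE)
  # Convert to GBK for a Chinese-locale Windows console; skip this step on a UTF-8 system
  p <- iconv(p, "utf-8", "gbk")
  # Trim leading and trailing whitespace
  p <- gsub("^\\s+|\\s+$", "", p)
  # Drop the first row (the table's header row)
  p[-1, ]
}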
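# Stack the twelve monthly matrices into one
p <- rbind(fun(url1), fun(url2), fun(url3), fun(url4), fun(url5), fun(url6),
           fun(url7), fun(url8), fun(url9), fun(url10), fun(url11), fun(url12))
# Save the combined data to a text file in the working directory
write.table(p, file = "p.txt")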
The scraped results are shown in the figure below:
The batch of URLs above can also be generated with paste0() and a loop.
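As a minimal sketch of that idea, the twelve URLs could be built as follows. The names `base_url` and `urls` are hypothetical and chosen only for illustration; the URL prefix is copied from the addresses listed above. The block only builds strings and makes no network requests.

```r
# Common prefix of the monthly pages (taken from the URLs above)
base_url <- "http://www.tianqihoubao.com/aqi/chengdu-2016"

# Pre-allocate a character vector for the 12 months
urls <- character(12)

for (i in 1:12) {
  # sprintf("%02d", i) zero-pads the month number, e.g. 1 becomes "01"
  urls[i] <- paste0(base_url, sprintf("%02d", i), ".html")
}

print(urls)
```

With `fun` defined as above, the long rbind() call could then be written as `p <- do.call(rbind, lapply(urls, fun))`, which applies `fun` to every URL and stacks the results.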