Processing Web Data with R

weixin_30851867

Tags: R

Original link: http://www.cnblogs.com/amengduo/p/9587019.html

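Many websites publish data for their users to consume. For example, the World Health Organization (WHO) provides health and medical information reports as CSV, TXT, and XML files. With an R program, we can extract specific data from such websites programmatically. Several R packages are used for scraping data from the web: "RCurl", "XML", and "stringr". They are used to connect to URLs, identify the links to the required files, and download those files to the local environment.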

Installing the R packages

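The following packages are required for processing URLs and links to files. If they are not available in your R environment, you can install them with the commands below.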

install.packages("RCurl")
install.packages("XML")
install.packages("stringr")
install.packages("plyr")
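Since the packages only need installing when they are absent, it can help to check first. The following is a minimal sketch in base R; `missing_packages` is a hypothetical helper name introduced here for illustration, not part of any package.

```r
# Packages used in this article.
required <- c("RCurl", "XML", "stringr", "plyr")

# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
print(to_install)

# Uncomment to install whatever is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that the sketch can be run without network access.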

Input data

We will visit the weather data URL and use R to download the CSV files for the year 2015.

Example

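We will use the function getHTMLLinks() to gather the URLs of the files. Then we will use the function download.file() to save the files to the local system. Because the same code has to be applied again and again to download multiple files, we will create a function that can be called multiple times. The file names are passed to this function as parameters in the form of an R list object.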
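The link-gathering and filtering steps can be tried offline before touching the live site. The sketch below assumes the XML and stringr packages are installed; the HTML snippet and the file names in it are made up for illustration and only stand in for the real directory listing.

```r
library(XML)      # htmlParse(), getHTMLLinks()
library(stringr)  # str_detect(), str_c()

# Made-up HTML standing in for the directory listing (hypothetical file names).
html <- '<html><body>
  <a href="JCMB_2014.csv">JCMB_2014.csv</a>
  <a href="JCMB_2015_Jan.csv">JCMB_2015_Jan.csv</a>
  <a href="JCMB_2015_Feb.csv">JCMB_2015_Feb.csv</a>
  <a href="#top">Back to top</a>
</body></html>'

# Parse the string as HTML and collect the href of every <a> element.
doc <- htmlParse(html, asText = TRUE)
links <- getHTMLLinks(doc)

# Keep only the links whose text contains "JCMB_2015".
filenames <- links[str_detect(links, "JCMB_2015")]

# Build the full URL for each file by prefixing the base URL.
full_urls <- str_c("http://www.geos.ed.ac.uk/~weather/jcmb_ws/", filenames)
print(full_urls)
```

Parsing the string explicitly with `asText = TRUE` avoids relying on how a character argument is interpreted (file name, URL, or content).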

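# Load the packages that provide getHTMLLinks(), str_detect()/str_c(), and l_ply().
library(XML)
library(stringr)
library(plyr)

# Read the URL.
url <- "http://www.geos.ed.ac.uk/~weather/jcmb_ws/"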

# Gather the html links present in the webpage.
links <- getHTMLLinks(url)

# Identify only the links which point to the JCMB 2015 files.
filenames <- links[str_detect(links, "JCMB_2015")]

# Store the file names as a list.
filenames_list <- as.list(filenames)

# Create a function that downloads one file, given the base URL and a file name.
downloadcsv <- function(mainurl, filename) {
  filedetails <- str_c(mainurl, filename)
  download.file(filedetails, filename)
}

# Now apply the l_ply function to each file name and save the files into the current R working directory.
l_ply(filenames_list, downloadcsv, mainurl = url)
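When many files are downloaded in a loop, one failed request would otherwise stop the whole run. The following is a rough sketch of a more defensive variant, not the article's method: `download_all` and its `destdir` argument are hypothetical names, and it uses base `paste0()` and `vapply()` in place of `str_c()` and `l_ply()` purely to stay free of package dependencies.

```r
# Hypothetical helper: download each file into `destdir`, skipping files that
# already exist and recording failures instead of stopping at the first error.
download_all <- function(mainurl, filenames, destdir = ".") {
  vapply(filenames, function(filename) {
    dest <- file.path(destdir, basename(filename))
    if (file.exists(dest)) {
      return("skipped")
    }
    tryCatch({
      download.file(paste0(mainurl, filename), dest, quiet = TRUE)
      "downloaded"
    }, error = function(e) "failed")
  }, character(1))
}

# Example call (commented out because it needs network access):
# status <- download_all("http://www.geos.ed.ac.uk/~weather/jcmb_ws/", filenames)
# table(status)
```

The returned character vector holds one status per file, so failed downloads can be retried later without fetching everything again.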