Reading Website Files in R

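1. readLines()
    readLines() reads text files from the web, such as the HTML source of a web page. The example below reads the first ten lines of the HTML home page of Paris Diderot University (Université Paris 7) in France.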
> urlinternetaddr='http://www.univ-paris-diderot.fr/sc/site.php?bc=accueil&np=accueil'
> dlist1=readLines(urlinternetaddr,n=10)
> dlist1
 [1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\""                                                                                              
 [2] "\"http://www.w3.org/TR/html4/loose.dtd\">"                                                                                                                    
 [3] "<html lang=\"fr\">"                                                                                                                                           
 [4] "<HEAD>"                                                                                                                                                       
 [5] "<script type=\"text/javascript\" src=\"http://www.univ-paris-diderot.fr/fancyBox/lib/jquery-1.8.2.min.js\"></script>"                                         
 [6] "\t<!-- Add fancyBox main JS and CSS files -->"                                                                                                                
 [7] "\t<script type=\"text/javascript\" src=\"http://www.univ-paris-diderot.fr/fancyBox/source/jquery.fancybox.js?v=2.1.3\">"                                      
 [8] "    </script>"                                                                                                                                                
 [9] "\t<link rel=\"stylesheet\" type=\"text/css\" href=\"http://www.univ-paris-diderot.fr/fancyBox/source/jquery.fancybox.css?v=2.1.2\" media=\"screen\" />        "
[10] "\t  <script type=\"text/javascript\" src=\"./js/jquery.diaporama.js\"></script>"
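A note on reading this output: the backslashes are R's print escapes, not characters in the page itself. `\"` stands for a double quote and `\t` for a tab. The sketch below illustrates this with `writeLines()` and `nchar()`. The vector `sample_lines` is a hand-copied stand-in for two elements of `dlist1`, an assumption made so that the example runs without a network connection.

```r
# Hand-copied stand-in for two elements of dlist1 (assumption: lets the sketch run offline)
sample_lines <- c("<html lang=\"fr\">", "\t<!-- Add fancyBox main JS and CSS files -->")

# print() shows the escaped form; writeLines() writes the actual characters
print(sample_lines)
writeLines(sample_lines)

# Each escape counts as a single character: <html lang="fr"> has 16 characters
nchar(sample_lines[1])
```

On a live session, `writeLines(dlist1)` should display the fetched lines the same way.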
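Clearly, these first ten lines do not contain the content we want, such as admissions information. In this case, the simplest approach is to read more lines, changing n=10 to n=50.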
> dlist2=readLines(urlinternetaddr,n=50)
> dlist2
 [1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\""                                                                                                                                                                                               
 [2] "\"http://www.w3.org/TR/html4/loose.dtd\">"                                                                                                                                                                                                                     
 [3] "<html lang=\"fr\">"                                                                                                                                                                                                                                            
 [4] "<HEAD>"                                                                                                                                                                                                                                                        
 [5] "<script type=\"text/javascript\" src=\"http://www.univ-paris-diderot.fr/fancyBox/lib/jquery-1.8.2.min.js\"></script>"                                                                                                                                          
 [6] "\t<!-- Add fancyBox main JS and CSS files -->"                                                                                                                                                                                                                 
 [7] "\t<script type=\"text/javascript\" src=\"http://www.u
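
Raising `n` by hand can take several attempts before the wanted content appears. A possible alternative, sketched here under assumptions, is to read a larger number of lines and filter them with `grep()`. The helper name `find_lines` and the keyword are hypothetical, and the vector `page` is a made-up stand-in for the result of `readLines(urlinternetaddr)` so that the example runs offline. It does not reflect the actual content of the university's page.

```r
# Hypothetical helper: return the elements of `lines` that contain `keyword`
find_lines <- function(lines, keyword) {
  # fixed = TRUE treats the keyword as literal text, not a regular expression
  grep(keyword, lines, fixed = TRUE, value = TRUE)
}

# Made-up stand-in for readLines(urlinternetaddr); on a live page, call readLines() first
page <- c("<html lang=\"fr\">", "<HEAD>", "<title>Admission</title>", "</HEAD>")
find_lines(page, "Admission")
```

With a live connection, the same helper could be applied to `dlist2` or to the full page returned by `readLines(urlinternetaddr)` with no `n` argument.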