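1. readLines()
readLines() reads text files line by line, including web pages. Here we read the first ten lines of the HTML of the home page of Paris Diderot University (Université Paris 7) in France.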
> urlinternetaddr='http://www.univ-paris-diderot.fr/sc/site.php?bc=accueil&np=accueil'
> dlist1=readLines(urlinternetaddr,n=10)
> dlist1
[1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\""
[2] "\"http://www.w3.org/TR/html4/loose.dtd\">"
[3] "<html lang=\"fr\">"
[4] "<HEAD>"
[5] "<script type=\"text/javascript\" src=\"http://www.univ-paris-diderot.fr/fancyBox/lib/jquery-1.8.2.min.js\"></script>"
[6] "\t<!-- Add fancyBox main JS and CSS files -->"
[7] "\t<script type=\"text/javascript\" src=\"http://www.univ-paris-diderot.fr/fancyBox/source/jquery.fancybox.js?v=2.1.3\">"
[8] " </script>"
[9] "\t<link rel=\"stylesheet\" type=\"text/css\" href=\"http://www.univ-paris-diderot.fr/fancyBox/source/jquery.fancybox.css?v=2.1.2\" media=\"screen\" /> "
[10] "\t <script type=\"text/javascript\" src=\"./js/jquery.diaporama.js\"></script>"
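Clearly, these ten lines do not contain the content we are looking for, such as admissions information. The usual remedy is to read more lines by raising the limit from n=10 to n=50.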
> dlist2=readLines(urlinternetaddr,n=50)
> dlist2
[1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\""
[2] "\"http://www.w3.org/TR/html4/loose.dtd\">"
[3] "<html lang=\"fr\">"
[4] "<HEAD>"
[5] "<script type=\"text/javascript\" src=\"http://www.univ-paris-diderot.fr/fancyBox/lib/jquery-1.8.2.min.js\"></script>"
[6] "\t<!-- Add fancyBox main JS and CSS files -->"
[7] "\t<script type=\"text/javascript\" src=\"http://www.u
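Raising n by trial and error can take several attempts. As a minimal sketch of an alternative, the code below reads every line (n = -1) and keeps only the lines that match a keyword. The helper find_lines(), the keyword "admission", and the small html vector standing in for the downloaded page are assumptions made for illustration; they do not come from the session above.

```r
# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html <- c(
  "<html lang=\"fr\">",
  "<HEAD>",
  "<title>Accueil</title>",
  "</HEAD>",
  "<body>",
  "<a href=\"./admission.php\">Admission</a>",
  "</body>",
  "</html>"
)

# find_lines() is a hypothetical helper: it reads all lines from a
# connection (n = -1 means "read to the end") and returns the lines
# that match the keyword, together with their line numbers.
find_lines <- function(con, keyword) {
  on.exit(close(con))                       # release the connection when done
  lines <- readLines(con, n = -1, warn = FALSE)
  hits <- grep(keyword, lines, ignore.case = TRUE)
  data.frame(line = hits, text = lines[hits], stringsAsFactors = FALSE)
}

result <- find_lines(textConnection(html), "admission")
print(result)
```

For the live page, one would pass url(urlinternetaddr) instead of textConnection(html). What comes back then depends on the page's current content and on the keyword chosen, which may need to be in French.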