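Preamble
It has been a long time since I built anything from scratch, so on a whim I decided to tackle the proxy problem I ran into as soon as I started writing crawlers at my company.
Main text
Without a proxy in the way, the following code is enough to fetch a page's HTML source: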
import urllib.request

url = "http://wintersmilesb101.online/"
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# Build the request with a browser-like User-Agent header
req = urllib.request.Request(url, headers={
    'User-Agent': user_agent
})
response = urllib.request.urlopen(req)
# Read the response body and decode it as UTF-8
content = response.read().decode('utf-8')
print(content)
Run it:
It fails with the following error:
"C:\Program Files (x86)\Anaconda3\python.exe" D:/Alvin/PersonalProjects/Python/Spider/WinterSmileSB101Blog/main_error.py
Traceback (most recent call last):
  File "D:/Alvin/PersonalProjects/Python/Spider/WinterSmileSB101Blog/main_error.py", line 12, in <module>
    response = urllib.request.urlopen(req)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 407: Proxy Authentication Required ( Forefront TMG requires authorization to fulfill the request. Access to the Web Proxy filter is denied. )
Process finished with exit code 1
The message urllib.error.HTTPError: HTTP Error 407: Proxy Authentication Required ( Forefront TMG requires authorization to fulfill the request. Access to the Web Proxy filter is denied. )
tells us that we need to configure proxy authentication.
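Before changing any settings, it can help to tell a 407 apart from other HTTP errors in code. The following is a minimal sketch of that idea; the helper names describe_http_error and fetch are hypothetical and not part of the original scripts.

```python
import urllib.error
import urllib.request


def describe_http_error(err):
    """Return a short hint for an HTTPError raised by urlopen."""
    if err.code == 407:
        # The proxy, not the target site, is asking for credentials
        return "proxy authentication required"
    if err.code == 401:
        # The target site itself is asking for credentials
        return "server authentication required"
    return "HTTP error %d" % err.code


def fetch(url, timeout=10):
    """Fetch a URL and return (text, hint); hint is None on success."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8'), None
    except urllib.error.HTTPError as err:
        return None, describe_http_error(err)
```

Calling fetch(url) behind a proxy like the one above would be expected to return the "proxy authentication required" hint instead of a traceback.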
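We configure the proxy through ProxyHandler in urllib.request.
proxy = req.ProxyHandler({'https': 's1firewall:8080'}) is how my company's proxy is set up. The key is the connection scheme, http or https; I tried http and it did not work, so https is used here. The value is the proxy's address and port number. (In urllib the key names the scheme of the URL being requested, and an entry only applies to URLs with that scheme.)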
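To make the role of the dictionary keys concrete, here is a minimal sketch. The proxy address proxy.example:8080 and the helper name build_proxy_opener are placeholders, not taken from the original setup. It registers the same proxy for both schemes and returns the opener instead of installing it globally. Whether the http entry works depends on the proxy; as noted above, it did not in my environment.

```python
import urllib.request


def build_proxy_opener(proxy_address, user_agent):
    """Return an opener that routes http:// and https:// URLs through one proxy."""
    # Keys are the schemes of the URLs being requested; values are the proxy to use
    proxy = urllib.request.ProxyHandler({
        'http': proxy_address,
        'https': proxy_address,
    })
    opener = urllib.request.build_opener(proxy)
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


# Placeholder values; no request is sent when the opener is built
opener = build_proxy_opener('proxy.example:8080', 'Mozilla/5.0')
```

A page would then be fetched with opener.open(url), which leaves the default behaviour of urllib.request.urlopen untouched for the rest of the program.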
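Some proxies also require a username and password, which are then included in the proxy settings. My company's proxy does not need them, and adding them actually makes the connection to the proxy server fail.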
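For a proxy that does ask for a username and password, one possible shape is sketched below. It assumes the proxy accepts HTTP Basic authentication, which may not hold for every proxy; the host, username, password and the helper name build_authenticated_opener are all placeholders.

```python
import urllib.request


def build_authenticated_opener(proxy_host, username, password):
    """Build an opener that can answer a proxy's Basic authentication challenge."""
    proxy = urllib.request.ProxyHandler({
        'http': proxy_host,
        'https': proxy_host,
    })
    # realm=None means the credentials apply to whatever realm the proxy announces
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxy_host, username, password)
    # ProxyBasicAuthHandler responds to 407 challenges coming from the proxy
    proxy_auth = urllib.request.ProxyBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(proxy, proxy_auth)


# Placeholder values; no request is sent when the opener is built
auth_opener = build_authenticated_opener('proxy.example:8080', 'user', 'secret')
```

The credentials are only sent after the proxy replies with a 407 challenge, so a proxy that uses a different authentication scheme would still reject the request.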
The complete configuration code is as follows.
With the proxy configured:
import urllib.request as req

url = "http://wintersmilesb101.online/"
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# Set the proxy address; http did not work, so use https
proxy = req.ProxyHandler({'https': 's1firewall:8080'})
auth = req.HTTPBasicAuthHandler()
# Build the opener
opener = req.build_opener(proxy, auth, req.HTTPHandler)
# Add the header
opener.addheaders = [('User-Agent', user_agent)]
# Install the opener as the global default
req.install_opener(opener)
# Open the URL
conn = req.urlopen(url)
# Read the page content, decoded as UTF-8
content = conn.read().decode('utf-8')
# Print it
print(content)
Run it:
Final output:
"C:\Program Files (x86)\Anaconda3\python.exe" D:/Alvin/PersonalProjects/Python/Spider/WinterSmileSB101Blog/main.py
<!doctype html>
<html class ="theme-next mist use-motion" lang ="zh-Hans,zh-hk,en,fr-FR,ru,de,ja,id,ko,default" >
<head >
<meta charset ="UTF-8" />
<meta http-equiv ="X-UA-Compatible" content ="IE=edge" />
<meta name ="viewport" content ="width=device-width, initial-scale=1, maximum-scale=1" />
<meta http-equiv ="Cache-Control" content ="no-transform" />
<meta http-equiv ="Cache-Control" content ="no-siteapp" />
<link href ="/lib/fancybox/source/jquery.fancybox.css?v=2.1.5" rel ="stylesheet" type ="text/css" />
<link href ="//fonts.googleapis.com/css?family=Monda:300,300italic,400,400italic,700,700italic|Roboto Slab:300,300italic,400,400italic,700,700italic|Lobster Two:300,300italic,400,400italic,700,700italic|PT Mono:300,300italic,400,400italic,700,700italic&subset=latin,latin-ext" rel ="stylesheet" type ="text/css" >
<link href ="/lib/font-awesome/css/font-awesome.min.css?v=4.6.2" rel ="stylesheet" type ="text/css" />
<link href ="/css/main.css?v=5.1.0" rel ="stylesheet" type ="text/css" />
<meta name ="keywords" content ="Android,JAVA,Unity3D,C#,javaScript,开发者,程序猿,极客,编程,开源,IT网站,Developer,Programmer,Coder,Geek,html,用户体验" />
<link rel ="alternate" href ="http://blog.csdn.net/qq_21265915/rss/list" title ="WinterSmileSB101 的个人房间" type ="application/atom+xml" />
<link rel ="shortcut icon" type ="image/x-icon" href ="/images/myHeadImg.jpeg?v=5.1.0" />
<meta name ="description" content ="成都工业学院14级,学习了各种后台技能,对前端也甚是抱有好感,准备再入坑前端。" >
<meta property ="og:type" content ="website" >
<meta property ="og:title" content ="WinterSmileSB101 的个人房间" >
<meta property ="og:url" content ="http://WinterSmileSB101.online/index.html" >
<meta property ="og:site_name" content ="WinterSmileSB101 的个人房间" >
<meta property ="og:description" content ="成都工业学院14级,学习了各种后台技能,对前端也甚是抱有好感,准备再入坑前端。" >
<meta name ="twitter:card" content ="summary" >
<meta name ="twitter:title" content ="WinterSmileSB101 的个人房间" >
<meta name ="twitter:description" content ="成都工业学院14级,学习了各种后台技能,对前端也甚是抱有好感,准备再入坑前端。" >
<script type ="text/javascript" id ="hexo.configurations" >
var NexT = window.NexT || {};
var CONFIG = {
root: '/' ,
scheme: 'Mist' ,
sidebar: {"position" :"right" ,"display" :"post" ,"offset" :12 ,"offset_float" :0 ,"b2t" :true ,"scrollpercent" :true },
fancybox: true ,
motion: true ,
duoshuo: {
userId: '6376853978663093000' ,
author: 'WinterSmileSB101'
},
algolia: {
applicationID: '' ,
apiKey: '' ,
indexName: '' ,
hits: {"per_page" :10 },
labels: {"input_placeholder" :"Search for Posts" ,"hits_empty" :"We didn't find any results for the search: ${query}" ,"hits_stats" :"${hits} results found in ${time} ms" }
}
};
</script >
<link rel ="canonical" href ="http://WinterSmileSB101.online/" />
<title > WinterSmileSB101 的个人房间 </title >
</head >
<body itemscope itemtype ="http://schema.org/WebPage" lang ="zh-Hans" >
<div class ="container sidebar-position-right
page-home
" >
<div class ="headband" > </div >
<header id ="header" class ="header" itemscope itemtype ="http://schema.org/WPHeader" >
<div class ="header-inner" > <div class ="site-brand-wrapper" >
<div class ="site-meta " >
<div class ="custom-logo-site-title" >
<a href ="/" class ="brand" rel ="start" >
<span class ="logo-line-before" > <i > </i > </span >
<span class ="site-title" > WinterSmileSB101 的个人房间</span >
<span class ="logo-line-after" > <i > </i > </span >
</a >
</div >
<h1 class ="site-subtitle" itemprop ="description" > 胆小认生,不易相处</h1 >
</div >
<div class ="site-nav-toggle" >
<button >
<span class ="btn-bar" > </span >
<span class ="btn-bar" > </span >
<span class ="btn-bar" > </span >
</button >
</div >
</div >
<nav class ="site-nav" >
<ul id ="menu" class ="menu" >
<li class ="menu-item menu-item-home" >
<a href ="/" rel ="section" >
<i class ="menu-item-icon fa fa-fw fa-home" > </i > <br />
首页
</a >
</li >
<li class ="menu-item menu-item-categories" >
<a href ="/categories" rel ="section" >
<i class ="menu-item-icon fa fa-fw fa-th" > </i > <br />
分类
</a >
</li >
<li class ="menu-item menu-item-about" >
<a href ="/about" rel ="section" >
<i class ="menu-item-icon fa fa-fw fa-user" > </i > <br />
关于
</a >
</li >
<li class ="menu-item menu-item-archives" >
<a href ="/archives" rel ="section" >
<i class ="menu-item-icon fa fa-fw fa-archive" > </i > <br />
归档
</a >
</li >
<li class ="menu-item menu-item-tags" >
<a href ="/tags" rel ="section" >
<i class ="menu-item-icon fa fa-fw fa-tags" > </i > <br />
标签
</a >
</li >
<li class ="menu-item menu-item-commonweal" >
<a href ="/404.html" rel ="section" >
<i class ="menu-item-icon fa fa-fw fa-heartbeat" > </i > <br />
公益404
</a >
</li >
<li class ="menu-item menu-item-search" >
<a href ="javascript:;" class ="popup-trigger" >
<i class ="menu-item-icon fa fa-search fa-fw" > </i > <br />
搜索
</a >
</li >
</ul >
<div class ="site-search" >
<div class ="popup search-popup local-search-popup" >
<div class ="local-search-header clearfix" >
<span class ="search-icon" >
<i class ="fa fa-search" > </i >
</span >
<span class ="popup-btn-close" >
<i class ="fa fa-times-circle" > </i >
</span >
<div class ="local-search-input-wrapper" >
<input autocapitalize ="off" autocomplete ="off" autocorrect ="off"
placeholder ="搜索..." spellcheck ="false"
type ="text" id ="local-search-input" >
</div >
</div >
<div id ="local-search-result" > </div >
</div >
</div >
</nav >
</div >
</header >
<main id ="main" class ="main" >
<div class ="main-inner" >
<div class ="content-wrap" >
<div id ="content" class ="content" >
<section id ="posts" class ="posts-expand" >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/" itemprop ="url" >
Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup4 抓取解析网页
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-04-08T16:55:47+08:00" >
2017-04-08
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-04-09T14:17:52+08:00" >
2017-04-09
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 爬虫</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/Python-爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > Python 爬虫</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/" class ="leancloud_visitors" data-flag-title ="Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup4 抓取解析网页" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
开篇上一篇中我们通过原生的 re 模块已经完成了网页的解析,对于熟悉正则表达式的童鞋来说很好上手,但是对于萌新来说
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/" itemprop ="url" >
Python3.7 爬虫(一)使用 Urllib2 与正则表达式抓取
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-04-08T16:55:47+08:00" >
2017-04-08
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-04-09T10:25:07+08:00" >
2017-04-09
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 爬虫</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/Python-爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > Python 爬虫</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/" class ="leancloud_visitors" data-flag-title ="Python3.7 爬虫(一)使用 Urllib2 与正则表达式抓取" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
我们今天就一起来通过 Python3 自带库 Urllib 与正则表达式来抓取糗事百科。废话不多说,下面正题:分析
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/" itemprop ="url" >
Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup4 爬取网易云音乐歌单
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-04-08T16:55:47+08:00" >
2017-04-08
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-04-09T20:03:39+08:00" >
2017-04-09
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 爬虫</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/Python-爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > Python 爬虫</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/" class ="leancloud_visitors" data-flag-title ="Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup4 爬取网易云音乐歌单" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
废话在前面的的博客中我们已经能够使用 python3 配合自带的库或者第三方库抓取以及解析网页,我们今天来试试抓取
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/03/29/css-els/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/03/29/css-els/" itemprop ="url" >
Css 文字省略样式(单行/多行)
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-03-29T08:47:44+08:00" >
2017-03-29
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-03-29T09:03:16+08:00" >
2017-03-29
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/" itemprop ="url" rel ="index" >
<span itemprop ="name" > WEB</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/前端开发/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 前端开发</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/前端开发/CSS/" itemprop ="url" rel ="index" >
<span itemprop ="name" > CSS</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/03/29/css-els/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/03/29/css-els/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/03/29/css-els/" class ="leancloud_visitors" data-flag-title ="Css 文字省略样式(单行/多行)" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主转载文章,原文地址。
效果图
上面的效果实现代码如下:12345678910111213141516171819202122232425262728
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/03/29/css-els/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/03/28/mui-tab-pages/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/03/28/mui-tab-pages/" itemprop ="url" >
MUI 使用爬坑之路之 tab 多页面操作
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-03-28T13:08:23+08:00" >
2017-03-28
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-03-29T09:04:49+08:00" >
2017-03-29
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/" itemprop ="url" rel ="index" >
<span itemprop ="name" > WEB</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/前端开发/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 前端开发</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/前端开发/Hbuilder/" itemprop ="url" rel ="index" >
<span itemprop ="name" > Hbuilder</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/前端开发/Hbuilder/MUI/" itemprop ="url" rel ="index" >
<span itemprop ="name" > MUI</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/03/28/mui-tab-pages/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/03/28/mui-tab-pages/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/03/28/mui-tab-pages/" class ="leancloud_visitors" data-flag-title ="MUI 使用爬坑之路之 tab 多页面操作" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
最近想入坑前端开发,也是为了开发 App 更快更接地气。在各种前端框架的纠结中我还是决定先入坑 MUI ,开坑不易
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/03/28/mui-tab-pages/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/03/27/IOnic-first/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/03/27/IOnic-first/" itemprop ="url" >
Ionic2 的使用之坑
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-03-27T19:21:20+08:00" >
2017-03-27
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-03-27T22:20:20+08:00" >
2017-03-27
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/" itemprop ="url" rel ="index" >
<span itemprop ="name" > WEB</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/前端开发/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 前端开发</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/WEB/前端开发/IOnic-AngularJS/" itemprop ="url" rel ="index" >
<span itemprop ="name" > IOnic AngularJS</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/03/27/IOnic-first/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/03/27/IOnic-first/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/03/27/IOnic-first/" class ="leancloud_visitors" data-flag-title ="Ionic2 的使用之坑" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
在这里引用学习 IOnic 的地方,菜鸟驿站,不仅仅有 IOnic 还有很多其他的比如 Node.js、vue、Re
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/03/27/IOnic-first/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/03/24/use-phantomjs-dynamic/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/03/24/use-phantomjs-dynamic/" itemprop ="url" >
一起学爬虫 Node.js 爬虫篇(三)使用 PhantomJS 爬取动态页面
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-03-24T09:29:38+08:00" >
2017-03-24
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-03-24T12:57:00+08:00" >
2017-03-24
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 爬虫</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/Node-js-爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > Node.js 爬虫</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/03/24/use-phantomjs-dynamic/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/03/24/use-phantomjs-dynamic/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/03/24/use-phantomjs-dynamic/" class ="leancloud_visitors" data-flag-title ="一起学爬虫 Node.js 爬虫篇(三)使用 PhantomJS 爬取动态页面" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
今天我们来学习如何使用 PhantomJS 来抓取动态网页,至于 PhantomJS 是啥啊什么的,看这里 我们这
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/03/24/use-phantomjs-dynamic/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/03/24/get-phantomJS-start/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/03/24/get-phantomJS-start/" itemprop ="url" >
Node.js 动态网页爬取 PhantomJS 使用入门
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-03-24T08:43:25+08:00" >
2017-03-24
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-03-24T10:22:14+08:00" >
2017-03-24
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 爬虫</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/Node-js-爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > Node.js 爬虫</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/03/24/get-phantomJS-start/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/03/24/get-phantomJS-start/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/03/24/get-phantomJS-start/" class ="leancloud_visitors" data-flag-title ="Node.js 动态网页爬取 PhantomJS 使用入门" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
既然是入门,那我们就从人类的起源。。PhantomJS 来说起吧。1、PhantomJS是什么?PhantomJS
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/03/24/get-phantomJS-start/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/03/23/node-spider-scend/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/03/23/node-spider-scend/" itemprop ="url" >
一起学爬虫 Node.js 爬虫篇(二)
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-03-23T17:17:58+08:00" >
2017-03-23
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-03-24T10:22:12+08:00" >
2017-03-24
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 爬虫</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/Node-js-爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > Node.js 爬虫</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/03/23/node-spider-scend/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/03/23/node-spider-scend/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/03/23/node-spider-scend/" class ="leancloud_visitors" data-flag-title ="一起学爬虫 Node.js 爬虫篇(二)" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
上一篇中我们对百度首页进行了标题的爬取,本来打算这次直接对上次没有爬取到的推荐新闻进行爬取,谁知道网页加载出来没网
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/03/23/node-spider-scend/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
<article class ="post post-type-normal " itemscope itemtype ="http://schema.org/Article" >
<link itemprop ="mainEntityOfPage" href ="http://WinterSmileSB101.online/2017/03/23/node-spider-first/" >
<span hidden itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<meta itemprop ="name" content ="WinterSmileSB101" >
<meta itemprop ="description" content ="" >
<meta itemprop ="image" content ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg" >
</span >
<span hidden itemprop ="publisher" itemscope itemtype ="http://schema.org/Organization" >
<meta itemprop ="name" content ="WinterSmileSB101 的个人房间" >
</span >
<header class ="post-header" >
<h2 class ="post-title" itemprop ="name headline" >
<a class ="post-title-link" href ="/2017/03/23/node-spider-first/" itemprop ="url" >
一起学爬虫 Node.js 爬虫篇(一)
</a >
</h2 >
<div class ="post-meta" >
<span class ="post-time" >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-o" > </i >
</span >
<span class ="post-meta-item-text" > 发表于</span >
<time title ="创建于" itemprop ="dateCreated datePublished" datetime ="2017-03-23T14:16:38+08:00" >
2017-03-23
</time >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-calendar-check-o" > </i >
</span >
<span class ="post-meta-item-text" > 更新于</span >
<time title ="更新于" itemprop ="dateModified" datetime ="2017-03-24T10:22:55+08:00" >
2017-03-24
</time >
</span >
<span class ="post-category" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-folder-o" > </i >
</span >
<span class ="post-meta-item-text" > 分类于</span >
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > 爬虫</span >
</a >
</span >
,
<span itemprop ="about" itemscope itemtype ="http://schema.org/Thing" >
<a href ="/categories/爬虫/Node-js-爬虫/" itemprop ="url" rel ="index" >
<span itemprop ="name" > Node.js 爬虫</span >
</a >
</span >
</span >
<span class ="post-comments-count" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-comment-o" > </i >
</span >
<a href ="/2017/03/23/node-spider-first/#comments" itemprop ="discussionUrl" >
<span class ="post-comments-count ds-thread-count" data-thread-key ="2017/03/23/node-spider-first/" itemprop ="commentCount" > </span >
</a >
</span >
<span id ="/2017/03/23/node-spider-first/" class ="leancloud_visitors" data-flag-title ="一起学爬虫 Node.js 爬虫篇(一)" >
<span class ="post-meta-divider" > |</span >
<span class ="post-meta-item-icon" >
<i class ="fa fa-eye" > </i >
</span >
<span class ="post-meta-item-text" > 阅读次数 </span >
<span class ="leancloud-visitors-count" > </span >
</span >
</div >
</header >
<div class ="post-body" itemprop ="articleBody" >
版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
一看到爬虫或者一百度爬虫,那是铺天盖地的全是 Python 爬虫啊,不得不说爬虫的框架与资料,Python 基本是最
...
<div class ="post-button text-center" >
<a class ="btn" href ="/2017/03/23/node-spider-first/#more" rel ="contents" >
阅读全文 »
</a >
</div >
</div >
<div >
</div >
<div >
</div >
<div >
</div >
<footer class ="post-footer" >
<div class ="post-eof" > </div >
</footer >
</article >
</section >
<nav class ="pagination" >
<span class ="page-number current" > 1</span > <a class ="page-number" href ="/page/2/" > 2</a > <span class ="space" > …</span > <a class ="page-number" href ="/page/6/" > 6</a > <a class ="extend next" rel ="next" href ="/page/2/" > <i class ="fa fa-angle-right" > </i > </a >
</nav >
</div >
</div >
<div class ="sidebar-toggle" >
<div class ="sidebar-toggle-line-wrap" >
<span class ="sidebar-toggle-line sidebar-toggle-line-first" > </span >
<span class ="sidebar-toggle-line sidebar-toggle-line-middle" > </span >
<span class ="sidebar-toggle-line sidebar-toggle-line-last" > </span >
</div >
</div >
<aside id ="sidebar" class ="sidebar" >
<div class ="sidebar-inner" >
<section class ="site-overview sidebar-panel sidebar-panel-active" >
<div class ="site-author motion-element" itemprop ="author" itemscope itemtype ="http://schema.org/Person" >
<img class ="site-author-image" itemprop ="image"
src ="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg"
alt ="WinterSmileSB101" />
<p class ="site-author-name" itemprop ="name" > WinterSmileSB101</p >
<p class ="site-description motion-element" itemprop ="description" > </p >
</div >
<nav class ="site-state motion-element" >
<div class ="site-state-item site-state-posts" >
<a href ="/archives" >
<span class ="site-state-item-count" > 52</span >
<span class ="site-state-item-name" > 日志</span >
</a >
</div >
<div class ="site-state-item site-state-categories" >
<a href ="/categories/index.html" >
<span class ="site-state-item-count" > 26</span >
<span class ="site-state-item-name" > 分类</span >
</a >
</div >
<div class ="site-state-item site-state-tags" >
<a href ="/tags/index.html" >
<span class ="site-state-item-count" > 113</span >
<span class ="site-state-item-name" > 标签</span >
</a >
</div >
</nav >
<div class ="feed-link motion-element" >
<a href ="http://blog.csdn.net/qq_21265915/rss/list" rel ="alternate" >
<i class ="fa fa-rss" > </i >
RSS
</a >
</div >
<div class ="links-of-author motion-element" >
<span class ="links-of-author-item" >
<a href ="https://github.com/WinterSmileSB101" title ="Github" >
<i class ="fa fa-fw fa-github fa-lg" > </i >
</a >
</span >
<span class ="links-of-author-item" >
<a href ="http://weibo.com/5602632941/profile?rightmod=1&wvr=6&mod=personinfo&is_all=1" title ="微博" >
<i class ="fa fa-fw fa-weibo fa-lg" > </i >
</a >
</span >
<span class ="links-of-author-item" >
<a href ="http://www.jianshu.com/users/73344bc7e890/timeline" title ="简书" >
<i class ="fa fa-fw fa-bookmark fa-lg" > </i >
</a >
</span >
<br />
<span class ="links-of-author-item" >
<a href ="https://www.douban.com/people/159359470/" title ="豆瓣" >
<i class ="fa fa-fw fa-newspaper-o fa-lg" > </i >
</a >
</span >
<span class ="links-of-author-item" >
<a href ="http://blog.csdn.net/qq_21265915" title ="CSDN博客" >
<i class ="fa fa-fw fa-bug fa-lg" > </i >
</a >
</span >
</div >
</section >
<div class ="back-to-top" >
<i class ="fa fa-arrow-up" > </i >
<span id ="scrollpercent" > <span > 0</span > %</span >
</div >
</div >
</aside >
</div >
</main >
<footer id ="footer" class ="footer" >
<div class ="footer-inner" >
<div class ="copyright" >
© 2017.3.20 -
<span itemprop ="copyrightYear" > 2017</span >
<span class ="with-love" >
<i class ="fa fa-heart" > </i >
</span >
<span class ="author" itemprop ="copyrightHolder" > Powered By - WinterSmileSB101</span >
</div >
<div class ="powered-by" >
个人专属
</div >
<div class ="theme-info" >
博客 -
WinterSmileSB101
</div >
</div >
</footer >
</div >
<script type ="text/javascript" >
if (Object .prototype.toString.call(window.Promise) !== '[object Function]' ) {
window.Promise = null ;
}
</script >
<script type ="text/javascript" src ="/lib/jquery/index.js?v=2.1.3" > </script >
<script type ="text/javascript" src ="/lib/fastclick/lib/fastclick.min.js?v=1.0.6" > </script >
<script type ="text/javascript" src ="/lib/jquery_lazyload/jquery.lazyload.js?v=1.9.7" > </script >
<script type ="text/javascript" src ="/lib/velocity/velocity.min.js?v=1.2.1" > </script >
<script type ="text/javascript" src ="/lib/velocity/velocity.ui.min.js?v=1.2.1" > </script >
<script type ="text/javascript" src ="/lib/fancybox/source/jquery.fancybox.pack.js?v=2.1.5" > </script >
<script type ="text/javascript" src ="/lib/canvas-nest/canvas-nest.min.js" > </script >
<script type ="text/javascript" src ="/js/src/utils.js?v=5.1.0" > </script >
<script type ="text/javascript" src ="/js/src/motion.js?v=5.1.0" > </script >
<script type ="text/javascript" src ="/js/src/bootstrap.js?v=5.1.0" > </script >
<script type ="text/javascript" >
var duoshuoQuery = {short_name:"wintersmilesb101" };
(function () {
var ds = document.createElement('script' );
ds.type = 'text/javascript' ;ds.async = true ;
ds.id = 'duoshuo-script' ;
ds.src = (document.location.protocol == 'https:' ? 'https:' : 'http:' ) + '//static.duoshuo.com/embed.js' ;
ds.charset = 'UTF-8' ;
(document.getElementsByTagName('head' )[0 ]
|| document.getElementsByTagName('body' )[0 ]).appendChild(ds);
})();
</script >
<script src ="/lib/ua-parser-js/dist/ua-parser.min.js?v=0.7.9" > </script >
<script src ="/js/src/hook-duoshuo.js?v=5.1.0" > </script >
<script src ="/lib/ua-parser-js/dist/ua-parser.min.js?v=0.7.9" > </script >
<script src ="/js/src/hook-duoshuo.js" > </script >
<script type ="text/javascript" >
var isfetched = false ;
var search_path = "search.xml" ;
if (search_path.length == 0 ) {
search_path = "search.xml" ;
}
var path = "/" + search_path;
function proceedsearch () {
$("body" )
.append('<div class="search-popup-overlay local-search-pop-overlay"></div>' )
.css('overflow' , 'hidden' );
$('.popup' ).toggle();
}
var searchFunc = function (path, search_id, content_id) {
'use strict' ;
$.ajax({
url: path,
dataType: "xml" ,
async: true ,
success: function ( xmlResponse ) {
isfetched = true ;
$('.popup' ).detach().appendTo('.header-inner' );
var datas = $( "entry" , xmlResponse ).map(function () {
return {
title: $( "title" , this ).text(),
content: $("content" ,this ).text(),
url: $( "url" , this ).text()
};
}).get();
var $input = document.getElementById(search_id);
var $resultContent = document.getElementById(content_id);
$input.addEventListener('input' , function () {
var matchcounts = 0 ;
var str='<ul class=\"search-result-list\">' ;
var keywords = this .value.trim().toLowerCase().split(/[\s\-]+/ );
$resultContent.innerHTML = "" ;
if (this .value.trim().length > 1 ) {
datas.forEach(function (data) {
var isMatch = false ;
var content_index = [];
var data_title = data.title.trim().toLowerCase();
var data_content = data.content.trim().replace(/<[^>]+>/g ,"" ).toLowerCase();
var data_url = decodeURIComponent (data.url);
var index_title = -1 ;
var index_content = -1 ;
var first_occur = -1 ;
if (data_title != '' ) {
keywords.forEach(function (keyword, i) {
index_title = data_title.indexOf(keyword);
index_content = data_content.indexOf(keyword);
if ( index_title >= 0 || index_content >= 0 ){
isMatch = true ;
if (i == 0 ) {
first_occur = index_content;
}
}
});
}
if (isMatch) {
matchcounts += 1 ;
str += "<li><a href='" + data_url +"' class='search-result-title'>" + data_title +"</a>" ;
var content = data.content.trim().replace(/<[^>]+>/g ,"" );
if (first_occur >= 0 ) {
var start = first_occur - 20 ;
var end = first_occur + 80 ;
if (start < 0 ){
start = 0 ;
}
if (start == 0 ){
end = 50 ;
}
if (end > content.length){
end = content.length;
}
var match_content = content.substring(start, end);
keywords.forEach(function (keyword) {
var regS = new RegExp (keyword, "gi" );
match_content = match_content.replace(regS, "<b class=\"search-keyword\">" +keyword+"</b>" );
});
str += "<p class=\"search-result\">" + match_content +"...</p>"
}
str += "</li>" ;
}
})};
str += "</ul>" ;
if (matchcounts == 0 ) { str = '<div id="no-result"><i class="fa fa-frown-o fa-5x" /></div>' }
if (keywords == "" ) { str = '<div id="no-result"><i class="fa fa-search fa-5x" /></div>' }
$resultContent.innerHTML = str;
});
proceedsearch();
}
});}
$('.popup-trigger' ).click(function (e) {
e.stopPropagation();
if (isfetched == false ) {
searchFunc(path, 'local-search-input' , 'local-search-result' );
} else {
proceedsearch();
};
});
$('.popup-btn-close' ).click(function (e) {
$('.popup' ).hide();
$(".local-search-pop-overlay" ).remove();
$('body' ).css('overflow' , '' );
});
$('.popup' ).click(function (e) {
e.stopPropagation();
});
</script >
<script src ="https://cdn1.lncld.net/static/js/av-core-mini-0.6.1.js" > </script >
<script > AV.initialize("cOFi0858xVYxKW1wnErxqEra-gzGzoHsz" , "7LaqqR82XnjzTbkv5eCKw5aW" ); </script >
<script >
function showTime (Counter) {
var query = new AV.Query(Counter);
var entries = [];
var $visitors = $(".leancloud_visitors" );
$visitors.each(function () {
entries.push( $(this ).attr("id" ).trim() );
});
query.containedIn('url' , entries);
query.find()
.done(function (results) {
var COUNT_CONTAINER_REF = '.leancloud-visitors-count' ;
if (results.length === 0 ) {
$visitors.find(COUNT_CONTAINER_REF).text(0 );
return ;
}
for (var i = 0 ; i < results.length; i++) {
var item = results[i];
var url = item.get('url' );
var time = item.get('time' );
var element = document.getElementById(url);
$(element).find(COUNT_CONTAINER_REF).text(time);
}
for (var i = 0 ; i < entries.length; i++) {
var url = entries[i];
var element = document.getElementById(url);
var countSpan = $(element).find(COUNT_CONTAINER_REF);
if ( countSpan.text() == '' ) {
countSpan.text(0 );
}
}
})
.fail(function (object, error) {
console.log("Error: " + error.code + " " + error.message);
});
}
function addCount (Counter) {
var $visitors = $(".leancloud_visitors" );
var url = $visitors.attr('id' ).trim();
var title = $visitors.attr('data-flag-title' ).trim();
var query = new AV.Query(Counter);
query.equalTo("url" , url);
query.find({
success: function (results) {
if (results.length > 0 ) {
var counter = results[0 ];
counter.fetchWhenSave(true );
counter.increment("time" );
counter.save(null , {
success: function (counter) {
var $element = $(document.getElementById(url));
$element.find('.leancloud-visitors-count' ).text(counter.get('time' ));
},
error: function (counter, error) {
console.log('Failed to save Visitor num, with error message: ' + error.message);
}
});
} else {
var newcounter = new Counter();
var acl = new AV.ACL();
acl.setPublicReadAccess(true );
acl.setPublicWriteAccess(true );
newcounter.setACL(acl);
newcounter.set("title" , title);
newcounter.set("url" , url);
newcounter.set("time" , 1 );
newcounter.save(null , {
success: function (newcounter) {
var $element = $(document.getElementById(url));
$element.find('.leancloud-visitors-count' ).text(newcounter.get('time' ));
},
error: function (newcounter, error) {
console.log('Failed to create' );
}
});
}
},
error: function (error) {
console.log('Error:' + error.code + " " + error.message);
}
});
}
$(function () {
var Counter = AV.Object.extend("Counter" );
if ($('.leancloud_visitors' ).length == 1 ) {
addCount(Counter);
} else if ($('.post-title-link' ).length > 1 ) {
showTime(Counter);
}
});
</script >
</body >
</html >
Process finished with exit code 0
OK, now we can reach the URL correctly and get the page source back, so you can do whatever you like with it.
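As one example of what to do with the source, here is a minimal sketch that pulls the post links and titles out of the HTML printed above. It relies on the `post-title-link` class that appears in that output. The names `PostTitleParser` and `extract_post_titles` are made up for this illustration, and it uses the standard-library `html.parser` instead of BeautifulSoup so that it runs without extra installs. The `sample` string stands in for the `content` variable from the script.

```python
from html.parser import HTMLParser


class PostTitleParser(HTMLParser):
    """Collect (href, title) pairs from <a class="post-title-link"> tags."""

    def __init__(self):
        super().__init__()
        self.posts = []     # finished (href, title) pairs
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "post-title-link" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a matching link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.posts.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_post_titles(html):
    """Return a list of (href, title) tuples found in the given HTML string."""
    parser = PostTitleParser()
    parser.feed(html)
    parser.close()
    return parser.posts


# Stand-in for the fetched page; in the real script, pass `content` instead.
sample = (
    '<h2 class="post-title">'
    '<a class="post-title-link" href="/2017/03/23/node-spider-first/">\n'
    "一起学爬虫 Node.js 爬虫篇(一)\n"
    "</a></h2>"
)

for href, title in extract_post_titles(sample):
    print(href, title)
```

If BeautifulSoup is already installed, a CSS selector on the same class would likely be shorter. The standard-library parser is used here only to keep the sketch self-contained.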
If you spot any problems, corrections are very welcome.