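1 HTML basics
Notes
UnicodeEncodeError: 'gbk' codec can't encode character '\ufffd' in position 45078: illegal multibyte sequence
If you hit this error, the cause is that print() writes through standard output, whose default encoding here is gbk.
To fix it, open PyCharm's Settings, go to Editor, then File Encodings,
and change the encodings there to UTF-8, the encoding the code below uses.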
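A minimal sketch of an in-code alternative to the settings change: wrap a binary stream in io.TextIOWrapper with encoding='utf-8', which is what the commented-out line in the code further down does for sys.stdout. The helper name make_utf8_stream and the in-memory buffer are illustrative assumptions, not part of the original notes.

```python
import io


def make_utf8_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", errors="replace")


# Demonstrate on an in-memory buffer instead of the real console.
buffer = io.BytesIO()
stream = make_utf8_stream(buffer)
stream.write("\ufffd 爬虫")
stream.flush()
encoded = buffer.getvalue()

# The gbk codec has no mapping for U+FFFD, which is what the error above reports.
try:
    "\ufffd".encode("gbk")
    gbk_can_encode = True
except UnicodeEncodeError:
    gbk_can_encode = False

# For the real console one would write (not executed here):
#   sys.stdout = make_utf8_stream(sys.stdout.buffer)
```

On Python 3.7 or later, sys.stdout.reconfigure(encoding="utf-8") should have the same effect without replacing the stream object.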
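Easter egg
On the home pages of sites such as Zhihu and Baidu, the Console tab of the browser's developer tools shows recruitment easter eggs that front-end engineers left for their peers.
Applying through that channel might bring a surprise that is unexpected yet makes perfect sense.
Code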
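# Import the requests module
import requests

# Fetch the page source; res is a Response object.
res = requests.get('https://hyw200199.github.io/2022/02/08/0-chu-shi-pa-chong/')
# Alternative fix for the gbk error: change the default encoding of standard output (needs import io, sys)
# sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf8')
# Decode the response body as UTF-8
res.encoding = 'utf-8'
# Check whether the request got a proper response
print(res.status_code)
print(res.text)
# On a successful response, write the page to a file.
# Create html_test.html; with no path given, it is saved in the current working directory.
# A string is written in text mode, 'w'.
if res.status_code == 200:
    # Write as UTF-8 so the platform default encoding (gbk here) cannot break the write
    with open('html_test.html', 'w', encoding='utf-8') as file:
        # res.text is a string; write it into the file.
        file.write(res.text)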
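Why res.encoding matters: requests decodes the raw bytes in res.content using res.encoding to build res.text, so a wrong encoding yields garbled text. A small offline sketch, assuming a sample string in place of a real response body:

```python
# Sample text standing in for a page body (an assumption for illustration).
original = "初识爬虫"

# What the server would send: the text encoded as UTF-8 bytes.
raw = original.encode("utf-8")

# Decoding with the same encoding recovers the text.
decoded_ok = raw.decode("utf-8")

# Decoding with a different encoding produces garbled characters;
# errors="replace" keeps this from raising on invalid byte pairs.
decoded_wrong = raw.decode("gbk", errors="replace")

print(decoded_ok == original)
print(decoded_wrong == original)
```

The first comparison holds and the second does not, which is the same mismatch that setting res.encoding = 'utf-8' avoids.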
<!DOCTYPE HTML>
<html lang="zh-CN">
<head>
<meta charset="utf-8">
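<meta name="keywords" content="0 A First Look at Web Crawlers, love & peace">
<meta name="description" content="0 Getting to Know Web Crawlers. CSDN link.
Notes: What is a web crawler? How does a browser work? Crawlers can do many things: they can support business analysis or act as an everyday assistant. For example: what was the average sale price of second-hand homes in Beijing over the past two years? What is the average salary of a Python engineer in Shenzhen? Which restaurant in Beijing serves the best Cantonese food?">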
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no">
<meta name="renderer" content="webkit|ie-stand|ie-comp">
<meta name="mobile-web-app-capable" content="yes">
<meta name="format-detection" content="telephone=no">
<meta name="apple-mobile-web-app-capable" content="yes">
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
<meta name="referrer" content="no-referrer-when-downgrade">
<!-- Global site tag (gtag.js) - Google Analytics -->
<title>0 A First Look at Web Crawlers | Tokyo Ghoul</title>
<link rel="icon" type="image/jpeg" href="/jinmuyan.jpg">
<link rel="stylesheet" type="text/css" href="/libs/awesome/css/all.min.css">
<link rel="stylesheet" type="text/css" href="/libs/materialize/materialize.min.css">
<link rel="stylesheet" type="text/css" href="/libs/aos/aos.css">
<link rel="stylesheet" type="text/css" href="/libs/animate/animate.min.css">
<link rel="stylesheet" type="text/css" href="/libs/lightGallery/css/lightgallery.min.css">
<link rel="stylesheet" type="text/css" href="/css/matery.css">
<link rel="stylesheet" type="text/css" href="/css/my.css">
<script src="/libs/jquery/jquery-3.6.0.min.js"></script>
<meta name="generator" content="Hexo 6.0.0">
<style>.github-emoji { position: relative; display: inline-block; width: 1.2em; min-height: 1.2em; overflow: hidden; vertical-align: top; color: transparent; } .github-emoji > span { position: relative; z-index: 10; } .github-emoji img, .github-emoji .fancybox { margin: 0 !important; padding: 0 !important; border: none !important; outline: none !important; text-decoration: none !important; user-select: none !important; cursor: auto !important; } .github-emoji img { height: 1.2em !important; width: 1.2em !important; position: absolute !important; left: 50% !important; top: 50% !important; transform: translate(-50%, -50%) !important; user-select: none !important; cursor: auto !important; } .github-emoji-fallback { color: inherit; } .github-emoji-fallback img { opacity: 0 !important; }</style>
<link rel="alternate" href="/atom.xml" title="Tokyo Ghoul" type="application/atom+xml">
</head>
<style>
body{
background-image: url(https://cdn.jsdelivr.net/gh/Tokisaki-Galaxy/res/site/medias/background.jpg);
background-repeat:no-repeat;
background-size: 100% 100%;
background-attachment:fixed;
}
</style>
<body>
<header class="navbar-fixed">
<nav id="headNav" class="bg-color nav-transparent">
<div id="navContainer" class="nav-wrapper container">
<div class="brand-logo">
<a href="/" class="waves-effect waves-light">
<img src="/medias/jinmuyan.jpg" class="logo-img" alt="LOGO">
<span class="logo-span">Tokyo Ghoul</span>
</a>
</div>
<a href="#" data-target="mobile-nav" class="sidenav-trigger button-collapse"><i class="fas fa-bars"></i></a>
<ul class="right nav-menu">
<li class="hide-on-med-and-down nav-item">
<a href="/" class="waves-effect waves-light">
<i class="fas fa-home" style="zoom: 0.6;"></i>
<span>Home</span>
</a>
</li>
<li class="hide-on-med-and-down nav-item">
<a href="/tags" class="waves-effect waves-light">
<i class="fas fa-tags" style="zoom: 0.6;"></i>
<span>Tags</span>
</a>
</li>
<li class="hide-on-med-and-down nav-item">
<a href="/categories" class="waves-effect waves-light">
<i class="fas fa-bookmark" style="zoom: 0.6;"></i>
<span>Categories</span>
</a>
</li>
<li class="hide-on-med-and-down nav-item">
<a href="/archives" class="waves-effect waves-light">
<i class="fas fa-archive" style="zoom: 0.6;"></i>
<span>Archives</span>
</a>
</li>
<li class="hide-on-med-and-down nav-item">
<a href="/about" class="waves-effect waves-light">
<i class="fas fa-user-circle" style="zoom: 0.6;"></i>
<span>About</span>
</a>
</li>
<li class="hide-on-med-and-down nav-item">
<a href="/contact" class="waves-effect waves-light">
<i class="fas fa-comments" style="zoom: 0.6;"></i>
<span>Guestbook</span>
</a>
</li>
<li class="hide-on-med-and-down nav-item">
<a href="/friends" class="waves-effect waves-light">
<i class="fas fa-address-book" style="zoom: 0.6;"></i>
<span>Friend links</span>
</a>
</li>
<li class="hide-on-med-and-down nav-item">
<a href="/live" class="waves-effect waves-light">
<i class="fas fa-fan" style="zoom: 0.6;"></i>
<span>Live</span>
</a>
</li>
<li>
<a href="#searchModal" class="modal-trigger waves-effect waves-light">
<i id="searchIcon" class="fas fa-search" title="Search" style="zoom: 0.85;"></i>
</a>
</li>
</ul>
<div id="mobile-nav" class="side-nav sidenav">
<div class="mobile-head bg-color">
<img src="/medias/jinmuyan.jpg" class="logo-img circle responsive-img">
<div class="logo-name">Tokyo Ghoul</div>
<div class="logo-desc">
hi story (not history)
</div>
</div>
<ul class="menu-list mobile-menu-list">
<li class="m-nav-item">
<a href="/" class="waves-effect waves-light">
<i class="fa-fw fas fa-home"></i>
Home
</a>
</li>
<li class="m-nav-item">
<a href="/tags" class="waves-effect waves-light">
<i class="fa-fw fas fa-tags"></i>
Tags
</a>
</li>
<li class="m-nav-item">
<a href="/categories" class="waves-effect waves-light">
<i class="fa-fw fas fa-bookmark"></i>
Categories
</a>
</li>
<li class="m-nav-item">
<a href="/archives" class="waves-effect waves-light">
<i class="fa-fw fas fa-archive"></i>
Archives
</a>
</li>
<li class="m-nav-item">
<a href="/about" class="waves-effect waves-light">
<i class="fa-fw fas fa-user-circle"></i>
About
</a>
</li>
<li class="m-nav-item">
<a href="/contact" class="waves-effect waves-light">
<i class="fa-fw fas fa-comments"></i>
Guestbook
</a>
</li>
<li class="m-nav-item">
<a href="/friends" class="waves-effect waves-light">
<i class="fa-fw fas fa-address-book"></i>
Friend links
</a>
</li>
<li class="m-nav-item">
<a href="/live" class="waves-effect waves-light">
<i class="fa-fw fas fa-fan"></i>
Live
</a>
</li>
<li><div class="divider"></div></li>
<li>
<a href="https://github.com/blinkfox/hexo-theme-matery" class="waves-effect waves-light" target="_blank">
<i class="fab fa-github-square fa-fw"></i>Fork Me
</a>
</li>
</ul>
</div>
</div>
<style>
.nav-transparent .github-corner {
display: none !important;
}
.github-corner {
position: absolute;
z-index: 10;
top: 0;
right: 0;
border: 0;
transform: scale(1.1);
}
.github-corner svg {
color: #0f9d58;
fill: #fff;
height: 64px;
width: 64px;
}
.github-corner:hover .octo-arm {
animation: a 0.56s ease-in-out;
}
.github-corner .octo-arm {
animation: none;
}
@keyframes a {
0%,
to {
transform: rotate(0);
}
20%,
60% {
transform: rotate(-25deg);
}
40%,
80% {
transform: rotate(10deg);
}
}
</style>
<a href="https://github.com/blinkfox/hexo-theme-matery" class="github-corner tooltipped hide-on-med-and-down" target="_blank"
data-tooltip="Fork Me" data-position="left" data-delay="50">
<svg viewBox="0 0 250 250" aria-hidden="true">
<path d="M0,0 L115,115 L130,115 L142,142 L250,250 L250,0 Z"></path>
<path d="M128.3,109.0 C113.8,99.7 119.0,89.6 119.0,89.6 C122.0,82.7 120.5,78.6 120.5,78.6 C119.2,72.0 123.4,76.3 123.4,76.3 C127.3,80.9 125.5,87.3 125.5,87.3 C122.9,97.6 130.6,101.9 134.4,103.2"
fill="currentColor" style="transform-origin: 130px 106px;" class="octo-arm"></path>
<path d="M115.0,115.0 C114.9,115.1 118.7,116.5 119.8,115.4 L133.7,101.6 C136.9,99.2 139.9,98.4 142.2,98.6 C133.8,88.0 127.5,74.4 143.8,58.0 C148.5,53.4 154.0,51.2 159.7,51.0 C160.3,49.4 163.2,43.6 171.4,40.1 C171.4,40.1 176.1,42.5 178.8,56.2 C183.1,58.6 187.2,61.8 190.9,65.4 C194.5,69.0 197.7,73.2 200.1,77.6 C213.8,80.2 216.3,84.9 216.3,84.9 C212.7,93.1 206.9,96.0 205.4,96.6 C205.1,102.4 203.0,107.8 198.3,112.5 C181.9,128.9 168.3,122.5 157.7,114.1 C157.9,116.9 156.7,120.9 152.7,124.9 L141.0,136.5 C139.8,137.7 141.6,141.9 141.8,141.8 Z"
fill="currentColor" class="octo-body"></path>
</svg>
</a>
</nav>
</header>
<div class="bg-cover pd-header post-cover" style="background-image: url('/medias/featureimages/girl.jpg')">
<div class="container" style="right: 0px;left: 0px;">
<div class="row">
<div class="col s12 m12 l12">
<div class="brand">
<h1 class="description center-align post-title">0 A First Look at Web Crawlers</h1>
</div>
</div>
</div>
</div>
</div>
<main class="post-container content">
<link rel="stylesheet" href="/libs/tocbot/tocbot.css">
<style>
#articleContent h1::before,
#articleContent h2::before,
#articleContent h3::before,
#articleContent h4::before,
#articleContent h5::before,
#articleContent h6::before {
display: block;
content: " ";
height: 100px;
margin-top: -100px;
visibility: hidden;
}
#articleContent :focus {
outline: none;
}
.toc-fixed {
position: fixed;
top: 64px;
}
.toc-widget {
width: 345px;
padding-left: 20px;
}
.toc-widget .toc-title {
padding: 35px 0 15px 17px;
font-size: 1.5rem;
font-weight: bold;
line-height: 1.5rem;
}
.toc-widget ol {
padding: 0;
list-style: none;
}
#toc-content {
padding-bottom: 30px;
overflow: auto;
}
#toc-content ol {
padding-left: 10px;
}
#toc-content ol li {
padding-left: 10px;
}
#toc-content .toc-link:hover {
color: #42b983;
font-weight: 700;
text-decoration: underline;
}
#toc-content .toc-link::before {
background-color: transparent;
max-height: 25px;
position: absolute;
right: 23.5vw;
display: block;
}
#toc-content .is-active-link {
color: #42b983;
}
#floating-toc-btn {
position: fixed;
right: 15px;
bottom: 76px;
padding-top: 15px;
margin-bottom: 0;
z-index: 998;
}
#floating-toc-btn .btn-floating {
width: 48px;
height: 48px;
}
#floating-toc-btn .btn-floating i {
line-height: 48px;
font-size: 1.4rem;
}
</style>
<div class="row">
<div id="main-content" class="col s12 m12 l9">
<!-- Article content details -->
<div id="artDetail">
<div class="card">
<div class="card-content article-info">
<div class="row tag-cate">
<div class="col s7">
<div class="article-tag">
<a href="/tags/%E7%88%AC%E8%99%AB/">
<span class="chip bg-color">Web crawler</span>
</a>
<a href="/tags/python/">
<span class="chip bg-color">python</span>
</a>
<a href="/tags/requests/">
<span class="chip bg-color">requests</span>
</a>
<a href="/tags/%E5%8D%8F%E8%AE%AE/">
<span class="chip bg-color">Protocol</span>
</a>
</div>
</div>
<div class="col s5 right-align">
<div class="post-cate">
<i class="fas fa-bookmark fa-fw icon-category"></i>
<a href="/categories/%E7%88%AC%E8%99%AB/" class="post-category">
Web crawler
</a>
</div>
</div>
</div>
<div class="post-info">
<div class="post-date info-break-policy">
<i class="far fa-calendar-minus fa-fw"></i>Published:
2022-02-08
</div>
<div class="post-date info-break-policy">
<i class="far fa-calendar-check fa-fw"></i>Updated:
2022-02-09
</div>
<div class="info-break-policy">
<i class="far fa-file-word fa-fw"></i>Word count:
2.1k
</div>
<div class="info-break-policy">
<i class="far fa-clock fa-fw"></i>Reading time:
7 min
</div>
<div id="busuanzi_container_page_pv" class="info-break-policy">
<i class="far fa-eye fa-fw"></i>Views:
<span id="busuanzi_value_page_pv"></span>
</div>
</div>
</div>
<hr class="clearfix">
<div class="card-content article-card-content">
<div id="articleContent">
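<h1 id="0-认识爬虫"><a href="#0-认识爬虫" class="headerlink" title="0 认识爬虫"></a>0 Getting to Know Web Crawlers</h1><p><a target="_blank" rel="noopener" href="https://blog.csdn.net/supreme567/article/details/122831465">CSDN link</a></p>
<h2 id="笔记"><a href="#笔记" class="headerlink" title="笔记"></a>Notes</h2><p>What is a web crawler?<br>How does a browser work?<br>Crawlers can do many things: they can support business analysis or act as an everyday assistant. For example: what was the average sale price of second-hand homes in Beijing over the past two years? What is the average salary of a Python engineer in Shenzhen? Which restaurant in Beijing serves the best Cantonese food?<br>Those are things an individual can do with a crawler. Companies can likewise use crawlers to create enormous commercial value. The search engines you know well, Baidu and Google, count crawling among their core technologies, and they run crawlers on a massive scale.<br>Crawlers also give these search giants a path toward artificial intelligence, because AI depends on massive amounts of data. Hundreds of millions of users visit these search sites every day, so the data they generate is correspondingly hard to measure.<br><img src="https://img-blog.csdnimg.cn/2fb29fcae457437f8606931be9f0c8d4.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA6IOhIOiAgOaWhw==,size_20,color_FFFFFF,t_70,g_se,x_16" alt="How a browser works"></p>
<p><img src="https://img-blog.csdnimg.cn/b87a5a6fe4fa4481a5523173de2539ad.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA6IOhIOiAgOaWhw==,size_20,color_FFFFFF,t_70,g_se,x_16" alt=""></p>
<p>Step 0: fetch the data. Given the URL we provide, the crawler sends a request to the server, and the server returns data.<br>Step 1: parse the data. The crawler parses the data returned by the server into a format we can read.<br>Step 2: extract the data. The crawler then picks out the data we need.<br>Step 3: store the data. The crawler saves the useful data so it is easy to use and analyze later.</p>
<p>The requests library can download page source, text, images, and even audio for us. "Downloading" really means sending a request to a server and getting a response back.<br>Python is an object-oriented language, and in crawling it is extremely important to know what kind of object your data is, because only then do you know which attributes and methods are available for working with it.<br><img src="https://img-blog.csdnimg.cn/76ac042c658a44c9b000d1ffdee85db5.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA6IOhIOiAgOaWhw==,size_20,color_FFFFFF,t_70,g_se,x_16" alt=""><br><img src="https://img-blog.csdnimg.cn/94ca39b8c9fc4e61a4e11ec0ce82be58.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA6IOhIOiAgOaWhw==,size_20,color_FFFFFF,t_70,g_se,x_16" alt=""><br>Why does parsed text sometimes come out garbled?<br>Here is what happens: the target data has its own encoding,<br>and after fetching it you must know that encoding to decode it correctly.<br>Encoding and decoding must use the same scheme. It is like passing me a note written in pinyin: I have to read it as pinyin to understand it. If I assume it is English and reach for an English dictionary, I will not understand what you wrote.</p>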
<p>If Python raises requests.exceptions.ProxyError,<br>turning off the VPN resolves it.</p>
<p>A server is essentially a very powerful computer, and the company that owns it has a clear stance on crawlers.<br>Servers usually do not mind small crawlers, but they do reject large high-frequency crawlers and malicious ones, because these put heavy load on the server or harm it.<br>Servers generally welcome search engines, though (as noted above, crawling is one of the core technologies of Google and Baidu). That welcome comes with conditions, which are usually written in the robots protocol.</p>
<p>The robots protocol is a widely accepted ethical norm for web crawlers. Its full name is the Robots Exclusion Standard (robots exclusion protocol), and it tells crawlers which pages may be fetched and which may not.<br>We typically use it like this: having found content we want, we check whether the site permits crawling it. So it is enough to be able to locate a robots file and read it at a basic level.<br>A domain name carries information such as the site's country or field, so what do domains ending in .cn, .com, and .gov each stand for?<br>Here is an example: an excerpt from Taobao's robots protocol (<a target="_blank" rel="noopener" href="http://www.taobao.com/robots.txt">http://www.taobao.com/robots.txt</a>). The excerpt shows Taobao's access rules for the Baidu and Google crawlers, as well as its rules for other crawlers.</p>
<figure class="highlight xml"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td>
<td class="code"><pre><span class="line">User-agent: Baiduspider  # Baidu's crawler</span><br><span class="line">Allow: /article  # may access article</span><br><span class="line">Allow: /oshtml  # may access oshtml</span><br><span class="line">Allow: /ershou  # may access ershou</span><br><span class="line">Allow: /$  # may access the root, i.e. the Taobao home page</span><br><span class="line">Disallow: /product/  # may not access anything under the product folder, but the product folder itself may be accessed</span><br><span class="line">Disallow: /  # may not access any page other than those allowed above</span><br><span class="line"></span><br><span class="line">User-Agent: Googlebot  # Google's crawler</span><br><span class="line">Allow: /article</span><br><span class="line">Allow: /oshtml</span><br><span class="line">Allow: /product  # may access the product folder and everything under it</span><br><span class="line">Allow: /spu</span><br><span class="line">Allow: /dianpu</span><br><span class="line">Allow: /oversea</span><br><span class="line">Allow: /list</span><br><span class="line">Allow: /ershou</span><br><span class="line">Allow: /$</span><br><span class="line">Disallow: /  # may not access any page other than those allowed above</span><br><span class="line"></span><br><span class="line">……  # the file is long, so the rules for other crawlers are omitted; follow the link above for the full text</span><br><span class="line"></span><br><span class="line">User-Agent: *  # all other crawlers</span><br><span class="line">Disallow: /  # may not access any page</span><br><span class="line"></span><br></pre></td></tr></tbody></table></figure>
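<p>Notice that the robots protocol is organized in sections. Each section contains two kinds of fields: User-agent:, and Allow: or Disallow:.<br>User-agent names the type of crawler. The comments in the example above mark the Baidu and Google crawlers; for a crawler we write ourselves, the section to read is usually User-Agent: *, where * refers to every crawler not explicitly named.</p>
<p>Allow means access is permitted and Disallow means access is forbidden. The value of each field contains the path separator /, which determines which content, or which directory level, is allowed or forbidden. Compare the comment lines for Disallow: /product/ under the Baidu crawler and Allow: /product under the Google crawler to see the difference.<br>For example, Taobao forbids other crawlers from accessing any page, which means a crawler we write ourselves is not welcome to crawl any page under the <a target="_blank" rel="noopener" href="http://www.taobao.com/">www.taobao.com</a> domain.<br>Interestingly, Taobao restricts Baidu's crawling of product pages but allows Google's.</p>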
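<p>That is why, when you search Baidu for "淘宝网" (Taobao), you see the two lines of small print shown in the image below.</p>
<p><img src="https://img-blog.csdnimg.cn/6c805722d9924f498f69f8a3bc6a1785.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBA6IOhIOiAgOaWhw==,size_20,color_FFFFFF,t_70,g_se,x_16" alt=""></p>
<p>Because Baidu follows Taobao's robots.txt faithfully, you cannot find Taobao's individual product listings through Baidu either.</p>
<p>The internet is not a lawless place. Laws on crawling are still being drafted and refined, and for now the commonly accepted ethical standard is the robots protocol, which we should make a point of following when we crawl information from the web.</p>
<p>Heavy crawling also puts significant load on a site's servers, so major sites deploy anti-crawler measures. Of course, where there are anti-crawler measures, there are also ways of getting around them.</p>
<p>Crawling is like nuclear technology: people can use it to do useful things, or to cause damage.</p>
<p>Maliciously consuming someone else's server resources is unethical, and maliciously crawling data you are not permitted to take can also have serious legal consequences.</p>
<p>The tool is in your hands, and how you use it is your choice. Before you crawl a site's data, remember to check whether its robots protocol allows it.</p>
<p>Likewise, throttle your crawler, be grateful to the servers that provide the data, and avoid putting too much load on them. Keeping the internet orderly is also our job.</p>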
<h2 id="代码"><a href="#代码" class="headerlink" title="代码"></a>Code</h2>
<figure class="highlight cpp"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td>
<td class="code"><pre><span class="line">import requests</span><br><span class="line">from requests.exceptions import RequestException</span><br><span class="line"></span><br><span class="line">def get_one_page(url):</span><br><span class="line">    headers = {"user-agent": "Mozilla/5.0"}</span><br><span class="line">    try:</span><br><span class="line">        # The request itself can raise, so it belongs inside the try block</span><br><span class="line">        response = requests.get(url, headers=headers)</span><br><span class="line">    except RequestException:</span><br><span class="line">        return None</span><br><span class="line">    if response.status_code == 200:</span><br><span class="line">        print("Request succeeded")</span><br><span class="line">        return response</span><br><span class="line">    elif response.status_code == 100:</span><br><span class="line">        print("Continue the request")</span><br><span class="line">    elif response.status_code == 305:</span><br><span class="line">        print("Use a proxy")</span><br><span class="line">    elif response.status_code == 403:</span><br><span class="line">        print("Forbidden")</span><br><span class="line">    elif response.status_code == 503:</span><br><span class="line">        print("Service unavailable")</span><br><span class="line">    # Every non-200 status falls through to here</span><br><span class="line">    return None</span><br><span class="line"></span><br><span class="line">def main():</span><br><span class="line">    url = 'https://localprod.pandateacher.com/python-manuscript/crawler-html/sanguo.md'</span><br><span class="line">    html = get_one_page(url)</span><br><span class="line">    if html is None:</span><br><span class="line">        return</span><br><span class="line">    html.encoding = 'utf-8'</span><br><span class="line">    novel = html.text</span><br><span class="line">    print(novel[:100])</span><br><span class="line">    # Write as UTF-8 so the platform default encoding cannot break the write</span><br><span class="line">    with open('《三国演义》.txt', 'w', encoding='utf-8') as f:</span><br><span class="line">        f.write(novel)</span><br><span class="line"></span><br><span class="line">if __name__ == '__main__':</span><br><span class="line">    main()</span><br><span class="line"></span><br></pre></td></tr></tbody></table></figure>
<hr>
<figure class="highlight cpp"><table><tbody><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td>
<td class="code"><pre><span class="line">import requests</span><br><span class="line"></span><br><span class="line">url = 'https://cdn.jsdelivr.net/gh/hyw200199/Figurebed@master/my_image/安智焕21.jpg'</span><br><span class="line"></span><br><span class="line">headers = {"user-agent": "Mozilla/5.0"}</span><br><span class="line">res = requests.get(url, headers=headers)</span><br><span class="line">pic = res.content  # the body of the Response object as raw bytes</span><br><span class="line"># Create girl1.jpg; with no path given, it is saved in the current working directory.</span><br><span class="line"># Image data must be written in binary mode, 'wb'.</span><br><span class="line">photo = open('girl1.jpg', 'wb')</span><br><span class="line"># Write the binary content held in pic</span><br><span class="line">photo.write(pic)</span><br><span class="line"># Close the file</span><br><span class="line">photo.close()</span><br></pre></td></tr></tbody></table></figure>
<hr>
<p><img src="https://img-blog.csdnimg.cn/img_convert/cfa71f586475cf580ec807e5c91c644a.png" alt="that_show"><img src="/0%E5%88%9D%E8%AF%86%E7%88%AC%E8%99%AB/cfa71f586475cf580ec807e5c91c644a.png" alt="that_show"></p>
</div>
<hr/>
<div class="reprint" id="reprint-statement">
<div class="reprint__author">
<span class="reprint-meta" style="font-weight: bold;">
<i class="fas fa-user">
Author:
</i>
</span>
<span class="reprint-info">
<a href="/about" rel="external nofollow noreferrer">胡耀文</a>
</span>
</div>
<div class="reprint__type">
<span class="reprint-meta" style="font-weight: bold;">
<i class="fas fa-link">
Article link:
</i>
</span>
<span class="reprint-info">
<a href="https://hyw200199.github.io/2022/02/08/0-chu-shi-pa-chong/">https://hyw200199.github.io/2022/02/08/0-chu-shi-pa-chong/</a>
</span>
</div>
<div class="reprint__notice">
<span class="reprint-meta" style="font-weight: bold;">
<i class="fas fa-copyright">
Copyright notice:
</i>
</span>
<span class="reprint-info">
Unless otherwise stated, all articles on this blog are licensed under
<a href="https://creativecommons.org/licenses/by/4.0/deed.zh" rel="external nofollow noreferrer" target="_blank">CC BY 4.0</a>
. When reposting, please credit the source:
<a href="/about" target="_blank">胡耀文</a>
!
</span>
</div>
</div>
<script async defer>
document.addEventListener("copy", function (e) {
let toastHTML = '<span>Copied. Please follow the reposting rules for this article.</span><button class="btn-flat toast-action" onclick="navToReprintStatement()" style="font-size: smaller">View</button>';
M.toast({html: toastHTML})
});
function navToReprintStatement() {
$("html, body").animate({scrollTop: $("#reprint-statement").offset().top - 80}, 800);
}
</script>
<div class="tag_share" style="display: block;">
<div class="post-meta__tag-list" style="display: inline-block;">
<div class="article-tag">
<a href="/tags/%E7%88%AC%E8%99%AB/">
<span class="chip bg-color">Web crawler</span>
</a>
<a href="/tags/python/">
<span class="chip bg-color">python</span>
</a>
<a href="/tags/requests/">
<span class="chip bg-color">requests</span>
</a>
<a href="/tags/%E5%8D%8F%E8%AE%AE/">
<span class="chip bg-color">Protocol</span>
</a>
</div>
</div>
<div class="post_share" style="zoom: 80%; width: fit-content; display: inline-block; float: right; margin: -0.15rem 0;">
<link rel="stylesheet" type="text/css" href="/libs/share/css/share.min.css">
<div id="article-share">
<div class="social-share" data-sites="twitter,facebook,google,qq,qzone,wechat,weibo,douban,linkedin" data-wechat-qrcode-helper="<p>Scan with WeChat to share!</p>"></div>
<script src="/libs/share/js/social-share.min.js"></script>
</div>
</div>
</div>
<style>
#reward {
margin: 40px 0;
text-align: center;
}
#reward .reward-link {
font-size: 1.4rem;
line-height: 38px;
}
#reward .btn-floating:hover {
box-shadow: 0 6px 12px rgba(0, 0, 0, 0.2), 0 5px 15px rgba(0, 0, 0, 0.2);
}
#rewardModal {
width: 320px;
height: 350px;
}
#rewardModal .reward-title {
margin: 15px auto;
padding-bottom: 5px;
}
#rewardModal .modal-content {
padding: 10px;
}
#rewardModal .close {
position: absolute;
right: 15px;
top: 15px;
color: rgba(0, 0, 0, 0.5);
font-size: 1.3rem;
line-height: 20px;
cursor: pointer;
}
#rewardModal .close:hover {
color: #ef5350;
transform: scale(1.3);
-moz-transform:scale(1.3);
-webkit-transform:scale(1.3);
-o-transform:scale(1.3);
}
#rewardModal .reward-tabs {
margin: 0 auto;
width: 210px;
}
.reward-tabs .tabs {
height: 38px;
margin: 10px auto;
padding-left: 0;
}
.reward-content ul {
padding-left: 0 !important;
}
.reward-tabs .tabs .tab {
height: 38px;
line-height: 38px;
}
.reward-tabs .tab a {
color: #fff;
background-color: #ccc;
}
.reward-tabs .tab a:hover {
background-color: #ccc;
color: #fff;
}
.reward-tabs .wechat-tab .active {
color: #fff !important;
background-color: #22AB38 !important;
}
.reward-tabs .alipay-tab .active {
color: #fff !important;
background-color: #019FE8 !important;
}
.reward-tabs .reward-img {
width: 210px;
height: 210px;
}
</style>
<div id="reward">
<a href="#rewardModal" class="reward-link modal-trigger btn-floating btn-medium waves-effect waves-light red">Tip</a>
<!-- Modal Structure -->
<div id="rewardModal" class="modal">
<div class="modal-content">
<a class="close modal-close"><i class="fas fa-times"></i></a>
<h4 class="reward-title">thank you!!! mua~</h4>
<div class="reward-content">
<div class="reward-tabs">
<ul class="tabs row">
<li class="tab col s6 alipay-tab waves-effect waves-light"><a href="#alipay">Alipay</a></li>
<li class="tab col s6 wechat-tab waves-effect waves-light"><a href="#wechat">WeChat</a></li>
</ul>
<div id="alipay">
<img src="/medias/reward/my_alipay.jpg" class="reward-img" alt="Alipay tip QR code">
</div>
<div id="wechat">
<img src="/medias/reward/my_wechat.png" class="reward-img" alt="WeChat tip QR code">
</div>
</div>
</div>
</div>
</div>
</div>
<script>
$(function () {
$('.tabs').tabs();
});
</script>
</div>
</div>
<article id="prenext-posts" class="prev-next articles">
<div class="row article-row">
<div class="article col s12 m6" data-aos="fade-up">
<div class="article-badge left-badge text-color">
<i class="far fa-dot-circle"></i>&nbsp;This post
</div>
<div class="card">
<a href="/2022/02/08/0-chu-shi-pa-chong/">
<div class="card-image">
<img src="/medias/featureimages/girl.jpg" class="responsive-img" alt="0 A First Look at Web Crawlers">
<span class="card-title">0 A First Look at Web Crawlers</span>
</div>
</a>
<div class="card-content article-content">
<div class="summary block-with-text">
</div>
<div class="publish-info">
<span class="publish-date">
<i class="far fa-clock fa-fw icon-date"></i>2022-02-08
</span>
<span class="publish-author">
<i class="fas fa-bookmark fa-fw icon-category"></i>
<a href="/categories/%E7%88%AC%E8%99%AB/" class="post-category">
Web crawler
</a>
</span>
</div>
</div>
<div class="card-action article-tags">
<a href="/tags/%E7%88%AC%E8%99%AB/">
<span class="chip bg-color">Web crawler</span>
</a>
<a href="/tags/python/">
<span class="chip bg-color">python</span>
</a>
<a href="/tags/requests/">
<span class="chip bg-color">requests</span>
</a>
<a href="/tags/%E5%8D%8F%E8%AE%AE/">
<span class="chip bg-color">Protocol</span>
</a>
</div>
</div>
</div>
<div class="article col s12 m6" data-aos="fade-up">
<div class="article-badge right-badge text-color">
Next post&nbsp;<i class="fas fa-chevron-right"></i>
</div>
<div class="card">
<a href="/2022/02/07/csoj-han-jia-05/">
<div class="card-image">
<img src="/medias/featureimages/jinx10.jpg" class="responsive-img" alt="csoj Winter Break 05">
<span class="card-title">csoj Winter Break 05</span>
</div>
</a>
<div class="card-content article-content">
<div class="summary block-with-text">
</div>
<div class="publish-info">
<span class="publish-date">
<i class="far fa-clock fa-fw icon-date"></i>2022-02-07
</span>
<span class="publish-author">
<i class="fas fa-bookmark fa-fw icon-category"></i>
<a href="/categories/%E7%AE%97%E6%B3%95/" class="post-category">
Algorithms
</a>
</span>
</div>
</div>
<div class="card-action article-tags">
<a href="/tags/%E5%8C%BA%E9%97%B4dp/">
<span class="chip bg-color">Interval DP</span>
</a>
<a href="/tags/%E6%9C%9F%E6%9C%9Bdp/">
<span class="chip bg-color">Expected-value DP</span>
</a>
<a href="/tags/%E6%A0%91%E9%93%BE%E5%89%96%E5%88%86/">
<span class="chip bg-color">Heavy-light decomposition</span>
</a>
<a href="/tags/%E5%87%B8%E5%8C%85/">
<span class="chip bg-color">Convex hull</span>
</a>
</div>
</div>
</div>
</div>
</article>
</div>
<!-- 代码块功能依赖 -->
<script type="text/javascript" src="/libs/codeBlock/codeBlockFuction.js"></script>
<!-- 代码语言 -->
<script type="text/javascript" src="/libs/codeBlock/codeLang.js"></script>
<!-- Code block copy button -->
<script type="text/javascript" src="/libs/codeBlock/codeCopy.js"></script>
<!-- Code block collapse -->
<script type="text/javascript" src="/libs/codeBlock/codeShrink.js"></script>
</div>
<div id="toc-aside" class="expanded col l3 hide-on-med-and-down">
<div class="toc-widget card" style="background-color: white;">
<div class="toc-title"><i class="far fa-list-alt"></i> Contents</div>
<div id="toc-content"></div>
</div>
</div>
</div>
<!-- Floating TOC button. -->
<div id="floating-toc-btn" class="hide-on-med-and-down">
<a class="btn-floating btn-large bg-color">
<i class="fas fa-list-ul"></i>
</a>
</div>
<script src="/libs/tocbot/tocbot.min.js"></script>
<script>
$(function () {
tocbot.init({
tocSelector: '#toc-content',
contentSelector: '#articleContent',
headingsOffset: -($(window).height() * 0.4 - 45),
collapseDepth: Number('0'),
headingSelector: 'h2, h3, h4'
});
// modify the toc link href to support Chinese.
let i = 0;
let tocHeading = 'toc-heading-';
$('#toc-content a').each(function () {
$(this).attr('href', '#' + tocHeading + (++i));
});
// modify the heading title id to support Chinese.
i = 0;
$('#articleContent').children('h2, h3, h4').each(function () {
$(this).attr('id', tocHeading + (++i));
});
// Set scroll toc fixed.
let tocHeight = parseInt($(window).height() * 0.4 - 64);
let $tocWidget = $('.toc-widget');
$(window).scroll(function () {
let scroll = $(window).scrollTop();
/* add post toc fixed. */
if (scroll > tocHeight) {
$tocWidget.addClass('toc-fixed');
} else {
$tocWidget.removeClass('toc-fixed');
}
});
/* Fix the width of the post card div. */
let fixPostCardWidth = function (srcId, targetId) {
let srcDiv = $('#' + srcId);
if (srcDiv.length === 0) {
return;
}
let w = srcDiv.width();
if (w >= 450) {
w = w + 21;
} else if (w >= 350 && w < 450) {
w = w + 18;
} else if (w >= 300 && w < 350) {
w = w + 16;
} else {
w = w + 14;
}
$('#' + targetId).width(w);
};
// Toggle the TOC sidebar between expanded and collapsed.
const expandedClass = 'expanded';
let $tocAside = $('#toc-aside');
let $mainContent = $('#main-content');
$('#floating-toc-btn .btn-floating').click(function () {
if ($tocAside.hasClass(expandedClass)) {
$tocAside.removeClass(expandedClass).hide();
$mainContent.removeClass('l9');
} else {
$tocAside.addClass(expandedClass).show();
$mainContent.addClass('l9');
}
fixPostCardWidth('artDetail', 'prenext-posts');
});
});
</script>
</main>
<footer class="page-footer bg-color">
<div class="container row center-align"
style="margin-bottom: 0px !important;">
<div class="col s12 m8 l8 copy-right">
Copyright ©
<span id="year">2022</span>
<a href="/about" target="_blank">胡耀文</a>
| Powered by <a href="https://hexo.io/" target="_blank">Hexo</a>
| Theme <a href="https://github.com/blinkfox/hexo-theme-matery" target="_blank">Matery</a>
<br>
<i class="fas fa-chart-area"></i> Total site word count: <span
class="white-color">2.1k</span>
<span id="busuanzi_container_site_pv">
| <i class="far fa-eye"></i> Total page views:
<span id="busuanzi_value_site_pv" class="white-color"></span>
</span>
<span id="busuanzi_container_site_uv">
| <i class="fas fa-users"></i> Total visitors:
<span id="busuanzi_value_site_uv" class="white-color"></span>
</span>
<br>
<!-- Site uptime (days running) notice. -->
<br>
</div>
<div class="col s12 m4 l4 social-link social-statis">
<a href="https://github.com/hyw200199" class="tooltipped" target="_blank" data-tooltip="Visit my GitHub" data-position="top" data-delay="50">
<i class="fab fa-github"></i>
</a>
<a href="mailto:2372847321@qq.com" class="tooltipped" target="_blank" data-tooltip="Email me" data-position="top" data-delay="50">
<i class="fas fa-envelope-open"></i>
</a>
<a href="tencent://AddContact/?fromId=50&fromSubId=1&subcmd=all&uin=2372847321" class="tooltipped" target="_blank" data-tooltip="Contact me on QQ: 2372847321" data-position="top" data-delay="50">
<i class="fab fa-qq"></i>
</a>
<a href="https://weibo.com/huyaowen200199" class="tooltipped" target="_blank" data-tooltip="Follow me on Weibo: https://weibo.com/huyaowen200199" data-position="top" data-delay="50">
<i class="fab fa-weibo"></i>
</a>
<a href="https://www.zhihu.com/people/unravel-7-4-18" class="tooltipped" target="_blank" data-tooltip="Follow me on Zhihu: https://www.zhihu.com/people/unravel-7-4-18" data-position="top" data-delay="50">
<i class="fab fa-zhihu1">知</i>
</a>
<a href="/atom.xml" class="tooltipped" target="_blank" data-tooltip="RSS feed" data-position="top" data-delay="50">
<i class="fas fa-rss"></i>
</a>
</div>
</div>
</footer>
<div class="progress-bar"></div>
<!-- Search modal overlay -->
<div id="searchModal" class="modal">
<div class="modal-content">
<div class="search-header">
<span class="title"><i class="fas fa-search"></i> Search</span>
<input type="search" id="searchInput" name="s" placeholder="Enter search keywords"
class="search-input">
</div>
<div id="searchResult"></div>
</div>
</div>
<script type="text/javascript">
$(function () {
var searchFunc = function (path, search_id, content_id) {
'use strict';
$.ajax({
url: path,
dataType: "xml",
success: function (xmlResponse) {
// get the contents from search data
var datas = $("entry", xmlResponse).map(function () {
return {
title: $("title", this).text(),
content: $("content", this).text(),
url: $("url", this).text()
};
}).get();
var $input = document.getElementById(search_id);
var $resultContent = document.getElementById(content_id);
$input.addEventListener('input', function () {
var str = '<ul class=\"search-result-list\">';
var keywords = this.value.trim().toLowerCase().split(/[\s\-]+/);
$resultContent.innerHTML = "";
if (this.value.trim().length <= 0) {
return;
}
// perform local searching
datas.forEach(function (data) {
var isMatch = true;
var data_title = data.title.trim().toLowerCase();
var data_content = data.content.trim().replace(/<[^>]+>/g, "").toLowerCase();
var data_url = data.url;
data_url = data_url.indexOf('/') === 0 ? data.url : '/' + data_url;
var index_title = -1;
var index_content = -1;
var first_occur = -1;
// only match articles with non-empty titles and contents
if (data_title !== '' && data_content !== '') {
keywords.forEach(function (keyword, i) {
index_title = data_title.indexOf(keyword);
index_content = data_content.indexOf(keyword);
if (index_title < 0 && index_content < 0) {
isMatch = false;
} else {
if (index_content < 0) {
index_content = 0;
}
if (i === 0) {
first_occur = index_content;
}
}
});
}
// show search results
if (isMatch) {
str += "<li><a href='" + data_url + "' class='search-result-title'>" + data_title + "</a>";
var content = data.content.trim().replace(/<[^>]+>/g, "");
if (first_occur >= 0) {
// cut out 100 characters
var start = first_occur - 20;
var end = first_occur + 80;
if (start < 0) {
start = 0;
}
if (start === 0) {
end = 100;
}
if (end > content.length) {
end = content.length;
}
var match_content = content.substring(start, end);
// highlight all keywords
keywords.forEach(function (keyword) {
var regS = new RegExp(keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"), "gi");
match_content = match_content.replace(regS, "<em class=\"search-keyword\">" + keyword + "</em>");
});
str += "<p class=\"search-result\">" + match_content + "...</p>"
}
str += "</li>";
}
});
str += "</ul>";
$resultContent.innerHTML = str;
});
}
});
};
searchFunc('/search.xml', 'searchInput', 'searchResult');
});
</script>
<!-- Back-to-top button -->
<div id="backTop" class="top-scroll">
<a class="btn-floating btn-large waves-effect waves-light" href="#!">
<i class="fas fa-arrow-up"></i>
</a>
</div>
<script src="/libs/materialize/materialize.min.js"></script>
<script src="/libs/masonry/masonry.pkgd.min.js"></script>
<script src="/libs/aos/aos.js"></script>
<script src="/libs/scrollprogress/scrollProgress.min.js"></script>
<script src="/libs/lightGallery/js/lightgallery-all.min.js"></script>
<script src="/js/matery.js"></script>
<script type="text/javascript">
// Enable this effect only on the desktop version of the page
var windowWidth = $(window).width();
if (windowWidth > 768) {
document.write('<script type="text/javascript" src="/libs/others/sakura.js"><\/script>');
}
</script>
<!-- Snow effect -->
<script type="text/javascript">
// Enable this effect only on the desktop version of the page
var windowWidth = $(window).width();
if (windowWidth > 768) {
document.write('<script type="text/javascript" src="/libs/others/snow.js"><\/script>');
}
</script>
<!-- Mouse star-trail effect -->
<script type="text/javascript">
// Enable this effect only on the desktop version of the page
var windowWidth = $(window).width();
if (windowWidth > 768) {
document.write('<script type="text/javascript" src="/libs/others/star.js"><\/script>');
}
</script>
<script src="https://ssl.captcha.qq.com/TCaptcha.js"></script>
<script src="/libs/others/TencentCaptcha.js"></script>
<button id="TencentCaptcha" data-appid="xxxxxxxxxx" data-cbfn="callback" type="button" hidden></button>
<!-- Baidu Analytics -->
<!-- Baidu Push -->
<script>
(function () {
var bp = document.createElement('script');
var curProtocol = window.location.protocol.split(':')[0];
if (curProtocol === 'https') {
bp.src = 'https://zz.bdstatic.com/linksubmit/push.js';
} else {
bp.src = 'http://push.zhanzhang.baidu.com/push.js';
}
var s = document.getElementsByTagName("script")[0];
s.parentNode.insertBefore(bp, s);
})();
</script>
<script src="/libs/others/clicklove.js" async="async"></script>
<script async src="/libs/others/busuanzi.pure.mini.js"></script>
<!-- Tencent 兔小巢 (Tuxiaochao) -->
<script src="/libs/instantpage/instantpage.js" type="module"></script>
</body>
</html>
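The notes at the top report a UnicodeEncodeError for '\ufffd' under the gbk codec. The following is a minimal sketch, not the author's code, of one way such a character can end up in a page and why writing the file with an explicit UTF-8 encoding avoids the error. The sample string, the helper `can_encode`, and the temporary file location are assumptions made for illustration only.

```python
import os
import tempfile

# Illustrative sample (an assumption): a short Chinese string stored as GBK bytes.
sample = "爬虫"
gbk_bytes = sample.encode("gbk")

# Decoding GBK bytes as UTF-8 does not work; with errors="replace" the decoder
# substitutes U+FFFD (the replacement character) for bytes it cannot decode.
garbled = gbk_bytes.decode("utf-8", errors="replace")


def can_encode(text, codec):
    """Return True if `text` can be encoded with `codec` without an error."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(repr(garbled))
print("encodable as gbk:", can_encode(garbled, "gbk"))
print("encodable as utf-8:", can_encode(garbled, "utf-8"))

# Passing encoding="utf-8" to open() avoids relying on the platform default,
# which may be gbk on a Chinese-locale Windows system.
path = os.path.join(tempfile.mkdtemp(), "html_test.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(garbled)
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

In the script from the notes, the equivalent change would be to open `html_test.html` with `encoding='utf-8'`, so the write no longer depends on the default codec.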