Python web crawler with Excel output

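This article describes building a web crawler with Python 3.6.5 and PyCharm, focused on scraping stock information from the Xuangubao (选股宝) site. It uses the urllib module to open a connection, parses the HTTP response, collects every stock ID with its details, and stores the data as JSON objects. The data is then exported to an Excel file with the xlwt library, with a walkthrough of how the crawler works and the problems met along the way.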

Python version and IDE

I'm using Python 3.6.5 and PyCharm. I have to say, JetBrains makes very good IDEs, whether it's PyCharm or IntelliJ.
The parameter-name hints at call sites are especially handy.

Web crawler

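I'm a Java developer, and I wrote this crawler purely to hand in an assignment.
Anyone who has used Java's URL class will recognize the idea.
A so-called crawler simply opens a connection to a website,
obtains an input stream from that connection, and reads the site's content.
Underneath, it is plain socket input and output: the HTTP status code (usually 200) tells you whether the request succeeded, and response headers such as Content-Length tell you when the body has been fully received, after which the connection is closed.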
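To make the status-code and header check concrete, here is a minimal sketch that picks apart a raw HTTP response by hand. The sample bytes and the helper name `parse_response_head` are invented for illustration; in practice urllib does this work for you.

```python
def parse_response_head(raw: bytes):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# Hypothetical response, as it might arrive over the socket.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b'{"ok": true}\n'
)

status, headers, body = parse_response_head(sample)
# The body is complete once Content-Length bytes have been read.
complete = len(body) == int(headers["content-length"])
print(status, complete)
```

Responses sent with chunked transfer encoding carry no Content-Length, which is one more reason to leave this to the library.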

The urllib module

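The crawling library used here is part of the Python 3.6 standard library.
urllib depends on no third-party packages and needs no installation.
The following code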

req = urllib.request.Request(url, headers=header)

creates a Request object.
The next line sends that request and returns an HTTPResponse object:

res = urllib.request.urlopen(req, context=context)

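In the calls above, header and context can both be omitted for plain http: sites, in which case the defaults are used.
For https sites, urllib verifies the server certificate by default.
If that verification fails, one workaround is to create a context that skips it and pass that in, at the cost of the protection HTTPS normally provides: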

context = ssl._create_unverified_context()
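To see what this context actually changes, the sketch below compares it with the one returned by `ssl.create_default_context()`. The variable names `verified` and `unverified` are only for illustration, and nothing is sent over the network.

```python
import ssl

# Default context: verifies the certificate chain and the host name.
verified = ssl.create_default_context()

# Private helper used in this post: switches both checks off.
unverified = ssl._create_unverified_context()

print(verified.verify_mode == ssl.CERT_REQUIRED, verified.check_hostname)
print(unverified.verify_mode == ssl.CERT_NONE, unverified.check_hostname)
```

Since `_create_unverified_context` is a private function (note the leading underscore), it is probably best kept to throwaway scripts like this assignment.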

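The header is built as follows to set the browser identity the crawler presents,
and to request Chinese when the site supports multiple languages.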

header = {
    'Accept': 'text/html, application/xhtml+xml, */*',
    'Accept-Language': 'zh-CN',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    'DNT': '1',
    'Connection': 'Keep-Alive',
}
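As a quick check of how urllib treats such a dict, the sketch below builds a Request and inspects it without sending anything. The names `demo_header` and `demo_req` and the shortened User-Agent value are placeholders, not part of the crawler itself.

```python
import urllib.request

# Placeholder headers; the URL is the page crawled later in this post.
demo_header = {
    'Accept-Language': 'zh-CN',
    'User-Agent': 'Mozilla/5.0',
}
demo_req = urllib.request.Request('https://xuangubao.cn/dingpan/', headers=demo_header)

# Nothing is sent until urlopen() is called, so this runs offline.
print(demo_req.get_method())   # GET, because no data was attached
print(demo_req.host)
# urllib stores header names with only the first letter capitalized.
print(demo_req.get_header('User-agent'))
```

The capitalization detail matters only when reading headers back from the Request; the server still receives them as ordinary HTTP headers.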

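The response object's read method returns the page content as bytes,
but what we need is a string,
so we call decode to turn the bytes into a string.
The data on the wire is JSON, so we use the json module's loads function to convert it into Python objects (dicts and lists).
Putting these steps together, we can wrap the fetch in a function: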

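import json
import urllib.request

def getUrlJson(inUrl):
    # header and context are the module-level objects created above
    req = urllib.request.Request(url=inUrl, headers=header)
    webPage = urllib.request.urlopen(req, context=context)
    # read() returns bytes; decode to str before parsing
    data = webPage.read().decode('utf-8')
    data = json.loads(data)
    return data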

Calling the function

data = getUrlJson(Url)

gives us the page content.
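The bytes-to-string-to-object steps can be tried without any network access. In this sketch the payload stands in for what `webPage.read()` returns; its field names and values are invented, not taken from the real site.

```python
import json

# Hypothetical payload; the fields are made up for illustration.
raw = '{"symbol": "000001.SZ", "price": 10.5}'.encode('utf-8')

text = raw.decode('utf-8')    # bytes -> str
parsed = json.loads(text)     # str -> dict

print(type(raw).__name__, type(text).__name__, type(parsed).__name__)
print(parsed['symbol'], parsed['price'])
```

Note that `json.loads` takes a string, while `json.load` reads from a file-like object.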
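There is one bug: on some sites the crawl fails with an error saying the page contains the byte 0x8b, which cannot be decoded as a utf-8 character.
I searched Baidu for a long time and also read the related questions on Stack Overflow,
but found nothing that truly fixed it (removing the Accept-Encoding header, or keeping it and handling the gzip compression, did not help).
Then, when I ran the script inside PyCharm, the error simply stopped appearing…
If you know how to fix this bug, please contact me on QQ at 1183609515. Thanks.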
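One possible explanation, offered as a guess and not as a confirmed diagnosis of the case above: 0x1f 0x8b are the first two bytes of a gzip stream, so a body that starts with them has arrived compressed and cannot be decoded as utf-8 directly. The sketch below checks for those bytes before decoding. The helper name `decode_body` is hypothetical, and the compressed sample is produced locally to simulate a response.

```python
import gzip
import json

GZIP_MAGIC = b'\x1f\x8b'

def decode_body(body: bytes, encoding: str = 'utf-8') -> str:
    """Decompress the body if it starts with the gzip magic number, then decode."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(encoding)

# Simulate a compressed response body; no network needed.
compressed = gzip.compress(json.dumps({'ok': True}).encode('utf-8'))
print(compressed[:2] == GZIP_MAGIC)
print(json.loads(decode_body(compressed)))
print(decode_body(b'plain text'))
```

Under that assumption, `webPage.read().decode('utf-8')` in getUrlJson would become `decode_body(webPage.read())`.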

The site being crawled

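The target this time is Xuangubao (选股宝), a stock site, and the goal is the information on every stock it lists.
A stock site has to update in real time, so it is presumably a dynamic site.
Xuangubao: https://xuangubao.cn/dingpan/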
After fetching the page content, print it to take a look:

print(data)

The result is this:

<!doctype html>
<html data-n-head-ssr data-n-head="">
 <head> 
  <meta data-n-head="true" name="referrer" content="always">
  <meta data-n-head="true" name="renderer" content="webkit">
  <meta data-n-head="true" name="force-rendering" content="webkit">
  <meta data-n-head="true" name="baidu-site-verification" content="GFgkG2X61Y">
  <meta data-n-head="true" http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1">
  <meta data-n-head="true" charset="utf-8">
  <meta data-n-head="true" name="viewport" content="width=device-width, initial-scale=1">
  <meta data-n-head="true" data-hid="description" name="description" content="选股宝,一款主打“主题投资”的A股资讯神器,每日根据用户个性化关注主题,推送最新、最快、最狠的消息,帮助第一时间抓住机会。">
  <meta data-n-head="true" data-hid="keywords" name="keywords" content="选股宝,xuangubao.cn,主题投资,资讯,股票,板块,题材,产业链,主题库,今日机会,中长线机会,近期风口,提前埋伏">
  <title data-n-head="true">选题材抓龙头,就用选股宝 xuangubao.cn</title>
  <link data-n-head="true" rel="icon" type="image/x-icon" href="/img/favicon.ico">
  <link data-n-head="true" rel="stylesheet" type="text/css" href="//at.alicdn.com/t/font_117096_dawtptwmkjawnrk9.css">
  <link data-n-head="true" rel="stylesheet" type="text/css" href="//cdn.bootcss.com/minireset.css/0.0.2/minireset.min.css">
  <script data-n-head="true" src="/js/qrcode.js"></script>
  <script data-n-head="true" src="https://polyfillservice.wallstreetcn.com/v2/polyfill.js?features=default,es6,es7,fetch&amp;unknown=polyfill&amp;flags=gated"></script>
  <link rel="preload" href="https://static-alpha.wallstreetcn.com/clay/manifest.1f6b2202e79ed3bc8f71.js" as="script">
  <link rel="preload" href="https://static-alpha.wallstreetcn.com/clay/vendor.787b748b2aa791af3419.js" as="script">
  <link rel="preload" href="https://static-alpha.wallstreetcn.com/clay/app.a93c21ff1869eac54397.js" as="script">
  <link rel="preload" href="https://static-alpha.wallstreetcn.com/clay/app.a22dace7c29e562d6288b8728abc210f.css" as="style">
  <link rel="preload" href="https://static-alpha.wallstreetcn.com/clay/layouts/default.c8fc2ca668da67a5522f.js" as="script">
  <link rel="preload" href="https://static-alpha.wallstreetcn.com/clay/pages/dingpan/_id.a8362de5e835a04f692c.js" as="script">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/vip.cad641367f2be08dc49c.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/theme/_id.c2cc1b6f07508499d2a7.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/index.416454fc7d221d35ea69.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/zhangjiazaozhidao.66e109fd0a57268089a9.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/zaozhidao.fdf4f05456e0643e0f7d.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/tuoshuiyanbao.535cdaa72fdb33b1618c.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/tuoshuidiaoyan.914a90c524077a21b5c5.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/panzhongtufa.9b33803561c97b306e26.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/article/_id.3e1f7cf3ed25784ab36f.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/premium-article/_id.1d8c500639f65de2abc7.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/zhutiku.091e3e4f763c772dc931.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/purchased-message.362581f93699b6157764.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/subject/bkj/_id.cdc6888d383a44cb4697.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/yuanchuang.ecbabd7749a1994c6d4e.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/subject/_id.c5309899dee6e0d161e7.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/stock/_symbol.d82369e79268ab7033fb.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/chart.9d43c34c1e20b7540958.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/agreement.cbb1a7e2159afe928317.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/confirm-logout.2681979448cddb9bbea4.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/layouts/empty.3e708aa6c28c9df56ec8.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/zhuti/_id.580075df75bf6c81393a.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/ban/index.469b59d541cc54816be8.js">
  <link rel="prefetch" href="https://static-alpha.wallstreetcn.com/clay/pages/ban/_id.7ea69e8444dc15b61cb5.js">
  <link rel="stylesheet" href="https://static-alpha.wallstreetcn.com/clay/app.a22dace7c29e562d6288b8728abc210f.css">
  <style data-vue-ssr-id="0780e740:0">.nav[data-v-789bdac6]{
    min-width:1200px;height:56px;line-height:56px;-webkit-box-shadow:0 2px 3px hsla(0,0%,4%,.1);box-shadow:0 2px 3px hsla(0,0%,4%,.1);background-color:#30333f;color:#fff}.nav .container[data-v-789bdac6]{
    width:1300px;margin:0 auto}.nav .nav-left[data-v-789bdac6]{
    float:left;display:-webkit-box;display:-ms-flexbox;display:flex;-webkit-box-align:center;-ms-flex-align:center;align-items:center}.nav .nav-right[data-v-789bdac6]{
    float:right}.nav .nav-right .nav-item[data-v-789bdac6]{
    color:#fff}.nav .nav-right .go-login[data-v-789bdac6],.nav .nav-right .log-out[data-v-789bdac6]{
    cursor:pointer}.nav .nav-item[data-v-789bdac6]{
    position:relative;color:#d8d8d8;height:56px;line-height:56px;margin-right:40px;-webkit-transition:.3s;transition:.3s}.nav .nav-item[data-v-789bdac6]:hover{
    color:#fff}.nav .nav-item.is-active-route[data-v-789bdac6]{
    border-bottom:3px solid #e6394d;pointer-events:none;color:#fff}.nav .nav-item .is-hot[data-v-789bdac6]{
    width:25px}.nav .logo[data-v-789bdac6]{
    margin-right:81px}.nav .logo img[data-v-789bdac6]{
    width:108px;height:26px}.nav .slogan[data-v-789bdac6]{
    display:inline-block;text-indent:-9999px;width:0}@media screen and (max-width:1366px){
    .nav .container[data-v-789bdac6]{
    width:1200px}}</style>
  <style data-vue-ssr-id="17042f68:0">.nav-item[data-v-0a76a5aa]{
    cursor:pointer;margin:0 0 0 -13px!important;padding:0 40px 0 13px}.nav-item img[data-v-0a76a5aa]{
    vertical-align:middle;margin-top:-2px;margin-left:2px}.nav-dropdown[data-v-0a76a5aa]{
    display:none;position:absolute;top:56px;left:0;background-color:#fff;-webkit-box-shadow:0 0 12px 0 rgba(0,0,0,.3);box-shadow:0 0 12px 0 rgba(0,0,0,.3);z-index:1000}.nav-dropdown.is-active[data-v-0a76a5aa]{
    display:block}.nav-dropdown[data-v-0a76a5aa]:before{
    content:"";position:absolute;top:-10px;left:20px;border:5px solid #fff;border-color:transparent;border-bottom-color:#fff}.nav-dropdown li[data-v-0a76a5aa]{
    width:160px;height:48px;line-height:48px}.nav-dropdown li a[data-v-0a76a5aa]{
    display:block;padding-left:16px;color:#666}.nav-dropdown li:hover a[data-v-0a76a5aa]{
    color:#333;background-color:#f5f5f5}</style>
  <style data-vue-ssr-id="18e89e59:0">.ban[data-v-34069733]{
    display:-webkit-box;display:-ms-flexbox;display:flex;width:100%;min-width:1280px;background:#292c33;min-height:100vh}.ban-main[data-v-34069733]{
    display:block;width:75%;min-width:1000px}.ban.simple .ban-main[data-v-34069733]{
    width:100%}.ban-chart-out[data-v-34069733]{
    width:100%;position:relative;background:#292c33;z-index:20}</style>
  <style data-vue-ssr-id="2b055f8b:0">.ban.normal .ban-table-tab.fixed,.ban.normal .hit-pool__table.table.fixed{
    max-width:auto;width:75%}.ban.simple .ban-chart,.ban.simple .ban-chart-out{
    width:1200px;margin:0 auto}.ban.simple .ban-table-tab{
    width:100%}.ban.simple .ban-table-tab .ban-table-tab-container{
    width:1200px;margin:0 auto;position:relative}.ban.simple .hit-pool__table.table.fixed{
    left:50%;margin-left:-600px}.ban.simple .ban-table-main{
    width:1200px;margin:0 auto;position:relative}@media screen and (max-width:1280px){
    .ban.simple{
    min-width:1000px}.ban.simple .ban-chart{
    width:1000px;margin:0 auto}.ban.simple .ban-table-tab{
    width:100%}.ban.simple .ban-table-tab .ban-table-tab-container{
    width:1000px;margin:0 auto;position:relative}.ban.simple .hit-pool__table.table.fixed{
    left:50%;margin-left:-500px}.ban.simple .ban-table-main{
    width:1000px;margin:0 auto;position:relative}.ban.simple .ban-chart-out{
    width:1000px;margin:0 auto}}.guide{
    position:fixed;top:0;bottom:0;left:0;right:0;z-index:9999;background:#262c32;overflow:scroll}.guide-container{
    width:900px;margin:0 auto;padding:60px 0 0}.guide-container-logo{
    position:relative;margin-bottom:20px}.guide-container-desc{
    font-size:16px;color:#fff;letter-spacing:0;line-height:24px;margin-top:20px}.guide .guide-video{
    margin:32px 0}.guide-start{
    bottom:0;height:100px;position:fixed;left:0;background:#2d303b;right:0;padding-top:20px;-webkit-box-shadow:0 0 12px 0 rgba(0,0,0,.15);box-shadow:0 0 12px 0 rgba(0,0,0,.15)}.guide-start-btn{
    opacity:.8;background:#e6394d;-webkit-box-shadow:0 0 16px 0 rgba(0,0,0,.3);box-shadow:0 0 16px 0 rgba(0,0,0,.3);color:#fff;font-size:16px;line-height:48px;text-align:center;margin:0 auto;cursor:pointer;width:240px;height:48px}.guide .video-js .vjs-big-play-button{
    top:50%;left:50%;-webkit-transform:translate(-50%,-50%);transform:translate(-50%,-50%)}.guide .vjs-custom-skin{
    margin:32px 0}</style>
  <style data-vue-ssr-id="52e02d56:0">.ban-chart[data-v-563fd61e]{
    display:-webkit-box;display:-ms-flexbox;display:flex;max-width:1200px;margin:0 auto;position:relative;height:96px;-ms-flex-pack:distribute;justify-content:space-around;padding-left:80px}.ban-chart-date[data-v-563fd61e]{
    width:64px;height:82px;margin-top:12px;background-color:#3f4352;position:absolute;left:15px;display:block;z-index:2;cursor:pointer}.ban-chart-date .ivu-date-picker[data-v-563fd61e]{
    position:relative;width:100%}.ban-chart-date .ivu-date-picker .ivu-select-dropdown[data-v-563fd61e]{
    border-radius:0}.ban-chart-date[data-v-563fd61e]:before{
    content:"";display:block;position:absolute;top:0;right:0;left:0;height:2px;background:#f2564e}.ban-chart-date[data-v-563fd61e]:after{
    content:"";display:block;position:absolute;bottom:0;right:0;left:0;height:10px;background:#292c33}.ban-chart-date-container[data-v-563fd61e]{
    padding-bottom:15px}.ban-chart-date-month-week[data-v-563fd61e]{
    color:#bcbcbc;font-size:12px;padding-top:6px;text-align:center}.ban-chart-date-day[data-v-563fd61e]{
    color:#fff;font-size:36px;line-height:36px;text-align:center;display:block;position:relative}.ban-chart-date-day[data-v-563fd61e]:before{
    content:"";display:block;position:absolute;bottom:-5px;left:27px;width:0;height:0;border-left:6px solid transparent;border-right:6px solid transparent;border-bottom:6px solid #636a7f}</style>
  <style data-vue-ssr-id="492747dc:0">.ban-chart-date{
    font-family:Helvetica Neue,Helvetica,Arial,PingFang SC,Hiragino Sans GB,Heiti SC,Microsoft YaHei,WenQuanYi Micro Hei,sans-serif}.ban-chart-date .ivu-date-picker .ivu-select-dropdown{
    border-radius:0}.ban-chart-date .ivu-date-picker .ivu-date-picker-header{
    background:#3e4352;border:none;color:#fff}.ban-chart-date .ivu-date-picker .ivu-date-picker-header .ivu-date-picker-next-btn-arrow-double,.ban-chart-date .ivu-date-picker .ivu-date-picker-header .ivu-date-picker-prev-btn-arrow-double{
    display:none}.ban-chart-date .ivu-date-picker .ivu-picker-panel-body:before{
    content:"";display:block;position:absolute;top:-6px;left:27px;width:0;height:0;border-left:6px solid transparent;border-right:6px solid transparent;border-bottom:6px solid #636a7f}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content{
    background:#3e4352;padding:5px;position:relative}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content:before{
    content:"";display:block;top:0;left:0;right:0;height:39px;position:absolute;background:#363a47}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells{
    margin:0}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-header{
    background:#363a47;padding:3px 0;font-size:14px;position:relative}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-header span{
    color:#e6e6e6}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell{
    font-size:14px}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell em{
    color:#e6e6e6;border-radius:0}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell-next-month{
    display:none}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell-prev-month em{
    color:#3e4352}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell-selected em{
    color:#e6e6e6;position:relative;background:transparent}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell-selected em:before{
    content:"";display:block;bottom:0;left:3px;right:3px;height:2px;position:absolute;background:#f2564e}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell-today em{
    color:#e6e6e6;position:relative;background:transparent}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell-today em:after{
    content:"";display:none}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell:hover em{
    background:transparent;position:relative}.ban-chart-date .ivu-date-picker .ivu-picker-panel-content .ivu-date-picker-cells-cell:hover em:before{
    content:"";display:block;bottom:0;left:3px;right:3px;height:2px;position<