Crawler: Bank of China Card Merchant Discount Promotion Data (2018-11-15)

Author: 当法律与事业相遇

Category: Python crawlers. Tags: crawler, python, database

Article link: https://blog.csdn.net/qq_29622761/article/details/84106626

Crawl URL

http://www.boc.cn/sdbapp/rwmerchant/sra32/

Technologies used

requests to fetch the pages
re regular expressions
XPath expressions to parse the HTML tree
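
As a rough illustration of how XPath and re can work together, the sketch below parses a small HTML fragment with lxml and then pulls phone numbers out of the cell text with a regular expression. The sample markup, the `PHONE_RE` pattern, and the `find_phone_numbers` helper are assumptions made for this example; they are not taken from the original crawler or from the real page layout.

```python
import re

from lxml import etree

# Hypothetical fragment standing in for a fetched merchant page.
SAMPLE_HTML = """
<html><body>
  <table>
    <tr><td colspan="3">Example Merchant</td></tr>
    <tr><td colspan="5">1 Example Road, Beijing</td></tr>
    <tr><td colspan="5">Tel: 010-12345678</td></tr>
  </table>
</body></html>
"""

# Assumed phone format: 3-4 digit area code, a hyphen, 7-8 digits.
PHONE_RE = re.compile(r'\d{3,4}-\d{7,8}')


def find_phone_numbers(html_text):
    """Collect the text of every table cell, then regex-match phone numbers."""
    tree = etree.HTML(html_text)          # build an element tree from raw HTML
    cell_texts = tree.xpath('//td/text()')  # XPath selects the text nodes
    return PHONE_RE.findall(' '.join(cell_texts))
```

XPath narrows the search to the cells of interest, and the regular expression handles the free-form text inside them. A real page would likely need a stricter XPath and a pattern that covers mobile numbers as well.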

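Crawling approach

Start the crawl
Find the top-level categories first (the source post illustrated these with a screenshot, which is not reproduced here)
For each category, find the links to its result pages
From each result page, extract the links to the individual merchants
Fetch and parse each merchant link
End of the crawl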
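
The steps above can be sketched as a three-level loop: categories, then result pages, then merchants. This is only an outline under stated assumptions. The three XPath constants are hypothetical placeholders (the real selectors depend on the site's markup), and `crawl`, `extract_links`, `fetch`, and `parse_shop` are illustrative names rather than the original author's functions. `fetch` is passed in so the traversal can be exercised without network access.

```python
from urllib.parse import urljoin

from lxml import etree

BASE_URL = 'http://www.boc.cn/sdbapp/rwmerchant/sra32/'

# Hypothetical selectors; adjust them to the actual page structure.
CATEGORY_XPATH = '//div[@class="category"]//a/@href'
PAGE_XPATH = '//div[@class="pages"]//a/@href'
SHOP_XPATH = '//a[@class="shop"]/@href'


def extract_links(html_text, xpath_expr, base_url):
    """Return absolute URLs for every href selected by xpath_expr."""
    if not html_text:
        return []
    tree = etree.HTML(html_text)
    if tree is None:
        return []
    return [urljoin(base_url, href) for href in tree.xpath(xpath_expr)]


def crawl(fetch, parse_shop, start_url=BASE_URL):
    """Walk categories -> result pages -> merchants.

    fetch(url) returns the page HTML as text, or None on failure.
    parse_shop(url, html) returns whatever record should be kept.
    """
    results = []
    seen_shops = set()  # avoid parsing the same merchant twice
    for category_url in extract_links(fetch(start_url), CATEGORY_XPATH, start_url):
        category_html = fetch(category_url)
        for page_url in extract_links(category_html, PAGE_XPATH, category_url):
            page_html = fetch(page_url)
            for shop_url in extract_links(page_html, SHOP_XPATH, page_url):
                if shop_url in seen_shops:
                    continue
                seen_shops.add(shop_url)
                shop_html = fetch(shop_url)
                if shop_html is not None:
                    results.append(parse_shop(shop_url, shop_html))
    return results
```

With a real fetcher, `fetch` could wrap `requests.get` and return `response.text`. Keeping it as a parameter also makes it easy to add delays or retries later without touching the traversal.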

Crawler code

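# -*- coding: utf-8 -*-
import json
import os
import re
import time

import lxml
import requests
import xlrd
import xlwt
from lxml import etree
from xlutils.copy import copy


def get_page(url):
    """Fetch url and return the response, or None on any failure."""
    try:
        # A timeout keeps one stalled request from hanging the whole crawl.
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return response
    except requests.RequestException:
        pass
    return None


def parse_detail_page(detail_html):
    """Extract merchant fields from a parsed detail page (an lxml element)."""
    # Merchant name: text of the cell(s) spanning three columns.
    company_name = ''.join(detail_html.xpath('//td[@colspan="3"]/text()'))
    # Address and phone: first and second cells spanning five columns.
    try:
        company_address = detail_html.xpath('//td[@colspan="5"]/text()')[0]
    except IndexError:
        company_address = ''
    try:
        company_phone = detail_html.xpath('//td[@colspan="5"]/text()')[1]
    except IndexError:
        company_phone = ''
    # The rest of the listing is cut off in the source.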
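
The repeated try/except around indexed XPath results could be folded into one small helper. The sketch below is a suggestion, not part of the original listing: `xpath_text` is a hypothetical name, and unlike the original code it also strips surrounding whitespace from the returned text.

```python
from lxml import etree


def xpath_text(tree, expr, index=0, default=''):
    """Return the stripped text node at a non-negative index, or default."""
    nodes = tree.xpath(expr)
    if index < len(nodes):
        return nodes[index].strip()
    return default
```

With this helper, the address and phone lookups become `xpath_text(detail_html, '//td[@colspan="5"]/text()', 0)` and `xpath_text(detail_html, '//td[@colspan="5"]/text()', 1)`, and a missing cell yields an empty string instead of an exception.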
