Parameterizing crawler requests in Python with data read from Excel

SD_JZZ

Categories: Python学习, 测试小兵. Tags: python crawler, python read excel, python request parameterization, python url encoding

Article link: https://blog.csdn.net/sd_jzz/article/details/102961143

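Background:

The business side provided a batch of production data (search keywords) to be shown on the page as quick-filter items for an ad campaign. During testing we found that the data contained invalid entries (keywords whose search returns no results), and these junk entries had to be identified and filtered out.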

Approach:

Use a Python crawler to call the relevant search API, parse the data it returns, and decide whether the current keyword is valid.
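
The validity check itself can be sketched as a small pure function. This is only a minimal sketch in Python 3, assuming the response body is JSON with a `data.hotelListCount` field (the field the implementation below reads). The function name `is_valid_keyword` and the rule "a count greater than zero means valid" are illustrative assumptions, not part of the original script.

```python
import json


def is_valid_keyword(response_body):
    """Return True if the search response reports at least one hotel.

    Assumes a JSON body shaped like {"data": {"hotelListCount": N}}.
    Any malformed or unexpected body is treated as invalid.
    """
    try:
        payload = json.loads(response_body)
        count = int(payload["data"]["hotelListCount"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, missing field, or a non-numeric count
        return False
    return count > 0
```

Keeping the check separate from the HTTP call makes it possible to test the rule against saved responses without hitting the API.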

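Steps:

Read the Excel file with Python and get the keyword parameters
Build the request URL and send the crawler request with Python
Parse the API response and decide whether the current keyword is valid
Write the result back to the Excel file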
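
For the URL-building step, one possible alternative is to encode each query parameter separately with the standard library's `urlencode`. The sketch below is written for Python 3 and makes several assumptions: `BASE_URL` is a hypothetical placeholder (the real endpoint is masked in this article), `build_search_url` is an illustrative name, and only a few of the query parameters are shown.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; the real endpoint is not given in the article
BASE_URL = "https://example.invalid/gethotellist"


def build_search_url(city_id, keyword, base_url=BASE_URL):
    """Build the search URL with each parameter percent-encoded on its own."""
    params = {
        "city": city_id,
        "pageindex": 0,
        "pagesize": 20,
        "keywords": keyword,
    }
    # urlencode encodes non-ASCII text as UTF-8 and escapes '&' and '='
    return base_url + "?" + urlencode(params)
```

Encoding per parameter means a keyword that itself contains `&` or `=` cannot break the query string, which can happen when the whole URL is quoted at once with those characters marked as safe.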

Implementation:

# This Python file uses the following encoding: utf-8
# Note: this script targets Python 2 (urllib2, reload).
import json
from urllib import quote
import xlutils.copy
import xlrd
import urllib2
# The keywords contain Chinese characters, so the default encoding is set explicitly
import sys
reload(sys)
sys.setdefaultencoding("utf-8")


def getHotelCount():
    data = xlrd.open_workbook('data.xls')   # open the workbook for reading
    table = data.sheet_by_index(0)          # get the first worksheet

    ws = xlutils.copy.copy(data)   # writable copy of the workbook data
    tables_ws = ws.get_sheet(0)

    for rowNum in range(table.nrows):  # read the sheet row by row
        cityid = table.cell(rowNum, 1).value    # city ID, second column
        keywords = table.cell(rowNum, 3).value  # search keyword, fourth column

        # Parameterize the URL with the city ID and keyword
        url = "https://xxxxxxxx/yyyyyyyyyyy/zzzzzzzzz/gethotellist?indate=2019-11-07&outdate=2019-11-08&city=%s&pageindex=0&pagesize=20&lbstype=2&searchtype=4&sortmethod=1&sortdirection=1&isnear=0&startlat=&startlng=&placename=&areaid=&areatype=&starlevels=&lowprice=&highprice=&bedbreakfast=&sale=&saleId=&hotelbrandids=&rating=&facilityids=&demand=&themeids=&radius=&pageType=xcx&showTaginfos=true&keywords=%s" % (cityid, keywords)

        urls = quote(str(url), safe=";/?:@&=+$,")  # percent-encode the URL

        try:
            request = urllib2.Request(urls)
            response = urllib2.urlopen(request)
            res = json.loads(response.read())
            hotelListCount = res['data']['hotelListCount']
            print(hotelListCount)
            tables_ws.write(rowNum, 4, hotelListCount)  # write the count to the fifth column
        except Exception:
            # Request or parsing failed: mark the row instead of aborting the run
            tables_ws.write(rowNum, 4, 'error')
    ws.save('data.xls')   # save the modified copy over the original file

getHotelCount()
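
One detail worth checking when reusing the script: xlrd returns numeric cells as floats, so a city ID stored as a number would be formatted as `2.0` rather than `2` in the URL. Whether this affects the original spreadsheet is not known (the IDs may be stored as text). The helper below is a hedged Python 3 sketch, and the name `cell_to_text` is hypothetical.

```python
def cell_to_text(value):
    """Render a spreadsheet cell value as text for a URL parameter.

    Whole-number floats (how xlrd reports numeric cells) lose the
    trailing ".0"; everything else is converted with str() and stripped.
    """
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()
```

It could be applied to the values read from the sheet before they are substituted into the URL, for example `cell_to_text(table.cell(rowNum, 1).value)`.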

That's all.