Scraping and modifying data with Python: implementing a data-scraping feature in Python (Part 1)

Latest recommended article published on 2024-04-29 12:44:54

weixin_39568597



Tags: scraping and modifying data with Python

Preface

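This started as a request from my wife. She has an account on a certain website holding roughly 200,000 records, and she wants to move all of that data into another account. The site only lets you export 20 records to Excel at a time, and each Excel file then has to be imported into the other account. On top of that, searches are capped: a single query can return at most 10,000 rows.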

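As the programmer in the family, I felt obliged to take this pain point off her hands. So I spent a week of spare time plus a weekend learning Python and built an automatic data export. It also doubles as motivation for my own learning. Here is the result: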

Scraping code

hello.py

import datetime
import json
import logging
import os
import sys

import requests
import xlwt

# Output directory for the exported files
output_dir = "/Users/xxxx/"

# Maximum number of rows a single search can return
maxNumber = 10000

# Page size
pageSize = 200

# Search by page number, page size, and phone number
# (the parameters can be customized to fit your own use case).
def my_frist_func(page, size, phone):
    mkdir(phone)
    # Route warnings (such as the one triggered by verify=False) to logging
    # instead of printing them.
    logging.captureWarnings(True)
    fromIndex = (page - 1) * size
    # Request parameters captured with Charles; fill in your own.
    data = {}
    # Request headers captured with Charles.
    headers = {
        "Cookie": "CNZZDATA1261213312=270079913-1559008705-%7C1559524218; JSESSIONID=F042C28D5F0253C34E0EA5A003176565; last_login=xxxx; remember=1;",
    }
    url = 'https://xxxx'
    # verify=False skips TLS certificate verification, which makes the SSLError go away.
    response = requests.post(url, json=data, headers=headers, verify=False)
    if response.status_code != 200:
        print(response.status_code)
        return
    # Parse the response body as JSON.
    resp = json.loads(response.text)
    people_list = resp["result"]["resume"]["list"]
    total = resp["result"]["resume"]["total"]
    # Rebuild each record into the shape we want to export.
    res_list = []
    year = datetime.datetime.now().year  # unused in this trimmed listing
    for people in people_list:
        newPeople = {}
        basicInfo = people["basicInfo"]
        newPeople["姓名"] = basicInfo["Fname"]  # column header "姓名" means "Name"
        res_list.append(newPeople)
    # One file per page, e.g. "第1页.xls" ("page 1").
    file_full_path = output_dir + phone + "/第" + str(page) + "页.xls"
    # Write the rebuilt records to Excel.
    write_excel_people(file_full_path, res_list)
    next_from = fromIndex + size
    # Print progress, then fetch the next page if there is one.
    if next_from < total and next_from < maxNumber:
        process = (next_from / total) * 100
        sys.stdout.write(f"\r percent:{process:.2f}% [numbers starting with {phone}: {total} records in total]")
        sys.stdout.flush()
        my_frist_func(page=page + 1, size=size, phone=phone)
    elif next_from >= maxNumber:
        print(f"\nReached the 10,000-row limit [numbers starting with {phone}: {total} records in total]")
    else:
        print(f"\nExport finished [numbers starting with {phone}: {total} records in total]")

def mkdir(dirName):
    path = output_dir + dirName
    # Strip leading and trailing whitespace.
    path = path.strip()
    # Strip a trailing backslash.
    path = path.rstrip("\\")
    # Create the directory only if it does not exist yet.
    if not os.path.exists(path):
        os.makedirs(path)
        print('Created directory ' + str(dirName))
        return True
    # The directory already exists, so there is nothing to create.
    return False

def write_excel_people(path, rows=None):
    # Nothing to write for an empty page.
    if not rows:
        return
    xls = xlwt.Workbook()
    sht1 = xls.add_sheet('Sheet1')
    for row, data in enumerate(rows):
        for column, key in enumerate(data.keys()):
            # The keys of the first record become the header row.
            if row == 0:
                sht1.write(row, column, key)
            # Data rows start at row 1, below the header.
            sht1.write(row + 1, column, data[key])
    xls.save(path)

if __name__ == '__main__':
    phones = ['123', '123']
    for phone in phones:
        my_frist_func(page=1, size=pageSize, phone=phone)
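The recursive paging in my_frist_func can also be written as a plain loop. The following is a minimal sketch rather than the code from hello.py: `fetch_page` is a hypothetical callable standing in for the POST request, and `fake_fetch` serves made-up data so the example runs without network access. It stops when either the reported total or the 10,000-row search cap is reached, mirroring the conditions used in the script.

```python
def iter_pages(fetch_page, size=200, max_number=10000):
    """Yield (page, rows) until the total or the search cap is reached.

    fetch_page(page, size) is a hypothetical callable returning (rows, total).
    """
    page = 1
    while True:
        rows, total = fetch_page(page, size)
        yield page, rows
        next_from = page * size  # offset of the first row on the next page
        if next_from >= total or next_from >= max_number:
            break
        page += 1


def fake_fetch(page, size, total=450):
    """Made-up data source: `total` integer rows served one page at a time."""
    start = (page - 1) * size
    return list(range(start, min(start + size, total))), total


pages = list(iter_pages(fake_fetch, size=200))
print([(page, len(rows)) for page, rows in pages])
```

With 450 fake rows and a page size of 200 this prints `[(1, 200), (2, 200), (3, 50)]`. A loop avoids growing the call stack by one frame per page, although at 50 pages per query the recursion in the script stays well within Python's default limit.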

How to use

1. Open the project in PyCharm and run main directly.

2. Or open a terminal, cd to the directory containing the file, and run:

>> python3 hello.py

(If you get the error ModuleNotFoundError: No module named 'xlwt', run pip install xlwt to install the xlwt package, which is used to write Excel files.)
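To catch a missing package before the script fails halfway through, the dependencies can be checked up front. This is a small sketch using only the standard library; the helper name `missing_modules` is an assumption made for illustration and is not part of hello.py.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party modules imported by hello.py.
missing = missing_modules(["requests", "xlwt"])
if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```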
