Batch-downloading web pages with Python: how to implement a batch web page downloader

1

Import the urllib modules with import statements:

import urllib.request

import urllib.parse

2

Create a class named PiLiangXia (pinyin for "batch download"):

class PiLiangXia():

3

Next, add the functionality inside the class. Start by setting the base URL:

    def __init__(self):
        # Base search URL; the query string is appended later
        self.url = "https://sou.autohome.com.cn/zonghe?"

4

Next, write the method that sends the request:

    def send_request(self, url, page):
        # Fetch the page and pass the raw bytes on to write_file
        response = urllib.request.urlopen(url)
        self.write_file(response.read(), page)
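
The method above calls urlopen directly, so a slow server or an HTTP error would stop the whole run. A minimal sketch of a more defensive variant follows. The function name fetch_page, the User-Agent string, and the 10-second timeout are illustrative assumptions, not part of the original code.

```python
import urllib.request

# Hypothetical header value; some sites reject urllib's default User-Agent.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; batch-downloader-example)"}


def fetch_page(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except OSError as exc:  # covers URLError, HTTPError, and socket timeouts
        print("Request failed for {}: {}".format(url, exc))
        return None
```

Returning None instead of raising lets the calling loop skip a failed page and continue with the next one.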

5

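Because the goal is to save the pages locally, we also need a method that writes the content to a file:

    def write_file(self, content, page):
        print("Saving page {}".format(page))
        # content is bytes, so the file is opened in binary mode
        with open("car_{}.html".format(page), "wb") as f:
            f.write(content)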
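
write_file saves into the current working directory. If a dedicated download folder is preferred, something along these lines could be used. This is only a sketch: save_page and the directory name "pages" are hypothetical names chosen for illustration.

```python
from pathlib import Path


def save_page(content, page, out_dir="pages"):
    """Write the bytes of one page to <out_dir>/car_<page>.html and return the path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    target = directory / "car_{}.html".format(page)
    target.write_bytes(content)
    return target
```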

6

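Next, let the user enter the number of pages and build the URL for each one. Because several pages are involved, we first need to analyze how the site's URLs are structured:

    def start(self):
        page = int(input("Enter the number of pages to download: "))
        for i in range(1, page + 1):
            # "奔驰" (Mercedes-Benz) is the search keyword; pn is taken to be the page number
            q = {"q": "奔驰", "pn": i}
            res = urllib.parse.urlencode(q, encoding="gbk")
            url_full = self.url + res
            self.send_request(url_full, i)

Here q = {"q": "奔驰", "pn": i} holds the query parameters of the full URL. pn has to be the loop variable i rather than page, otherwise every request fetches the same page. The parameter names come from analyzing the site's URLs in advance; each site has its own URL scheme.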
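
To see what urlencode produces for these parameters, the sketch below builds the URLs for the first few pages without sending any request. The helper name build_url is an assumption for illustration; the base URL and parameter names are the ones used above, and pn is treated as the page number.

```python
import urllib.parse

BASE_URL = "https://sou.autohome.com.cn/zonghe?"


def build_url(keyword, page, encoding="gbk"):
    """Return the search URL for one page, percent-encoding the keyword in the given charset."""
    query = urllib.parse.urlencode({"q": keyword, "pn": page}, encoding=encoding)
    return BASE_URL + query


for page in range(1, 4):
    print(build_url("奔驰", page))
# First line printed:
# https://sou.autohome.com.cn/zonghe?q=%B1%BC%B3%DB&pn=1
```

With encoding="gbk" each of these Chinese characters becomes two percent-encoded bytes; with the default UTF-8 it would become three, so the encoding has to match what the target site expects.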

7

Finally, run the code from a main block:

if __name__ == '__main__':
    plx = PiLiangXia()
    plx.start()
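
Reading the page count with input() works for interactive use. For scripted runs a command-line argument may be more convenient. The following is a minimal sketch using argparse; the argument name pages and the helper names positive_int and parse_args are assumptions, not part of the original program.

```python
import argparse


def positive_int(text):
    """argparse type: accept only integers greater than zero."""
    value = int(text)  # a ValueError here is reported by argparse as an invalid value
    if value < 1:
        raise argparse.ArgumentTypeError("page count must be at least 1")
    return value


def parse_args(argv=None):
    """Parse the command line (or an explicit argument list) and return the namespace."""
    parser = argparse.ArgumentParser(description="Batch-download search result pages")
    parser.add_argument("pages", type=positive_int, help="number of pages to download")
    return parser.parse_args(argv)

# Example: parse_args(["3"]).pages is 3; parse_args(["0"]) exits with a usage error.
```

The parsed value could then replace the int(input(...)) call in start().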

8

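Complete code and its effect. Running it prompts for the number of pages and saves each page as car_1.html, car_2.html, and so on in the current directory:

import urllib.request
import urllib.parse


class PiLiangXia():

    def __init__(self):
        # Base search URL; the query string is appended in start()
        self.url = "https://sou.autohome.com.cn/zonghe?"

    def send_request(self, url, page):
        # Fetch the page and pass the raw bytes on to write_file
        response = urllib.request.urlopen(url)
        self.write_file(response.read(), page)

    def write_file(self, content, page):
        print("Saving page {}".format(page))
        # content is bytes, so the file is opened in binary mode
        with open("car_{}.html".format(page), "wb") as f:
            f.write(content)

    def start(self):
        page = int(input("Enter the number of pages to download: "))
        for i in range(1, page + 1):
            # Use the loop variable i for pn so each request fetches a different page
            q = {"q": "奔驰", "pn": i}
            res = urllib.parse.urlencode(q, encoding="gbk")
            url_full = self.url + res
            self.send_request(url_full, i)


if __name__ == '__main__':
    plx = PiLiangXia()
    plx.start()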
