Python Web Crawler (Python 3.6)

Unstable Element

Published 2021-02-03 06:28:38

Tags: Python 3.6 crawler

Article link: https://blog.csdn.net/weixin_30540871/article/details/113646034

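import os
import re
import urllib.error
import urllib.request

# Chapters 5301..5312, with up to 19 section pages per chapter.
FIRST_CHAPTER_ID = 5301
LAST_CHAPTER_ID = 5312
MAX_SECTION_ID = 19

save_dir = 'C:/Users/zybang/Desktop/gaoshu'
base_url = "http://netedu.xauat.edu.cn/jpkc/netedu/jpkc/gdsx/homepage/5jxsd/51/513/"

# NOTE: both regular expressions lost their HTML tags when this post was
# extracted. The two patterns below are assumed reconstructions, not the originals.
img_pattern = re.compile(r'<img[^>]*?src="(.*?)"', re.I)
title_pattern = re.compile(r'<title>(.*?)</title>', re.I | re.S)

os.makedirs(save_dir, exist_ok=True)

for chapter_id in range(FIRST_CHAPTER_ID, LAST_CHAPTER_ID + 1):
    # Build the chapter URL from the base on every pass; appending to the same
    # variable each time would make the path grow with every chapter.
    chapter_url = base_url + str(chapter_id) + "/"
    # A for loop restarts the section counter for each chapter and still
    # advances after a failed request, so `continue` cannot loop forever.
    for section_id in range(1, MAX_SECTION_ID + 1):
        # Section pages are named <chapter id><two-digit section id>.htm
        page_name = '%d%02d' % (chapter_id, section_id)
        request_url = chapter_url + page_name + '.htm'
        try:
            response = urllib.request.urlopen(request_url)
        except urllib.error.HTTPError as e:
            print(request_url)
            print(e.code)
            continue
        data = response.read()
        # The pages are GBK-encoded; search the decoded text, not str(bytes).
        text = data.decode('gbk', errors='replace')
        titles = title_pattern.findall(text)
        # Fall back to the page name when no title is found.
        title = titles[0].strip() if titles else page_name
        with open(os.path.join(save_dir, title + '.htm'), 'wb') as f:
            f.write(data)
        for image in img_pattern.findall(text):
            image_url = chapter_url + image
            try:
                img_response = urllib.request.urlopen(image_url)
            except urllib.error.URLError as e:
                print(image_url)
                print(e.reason)
                continue
            img_bytes = img_response.read()
            # Create the image's full parent directory before writing the file.
            img_path = os.path.join(save_dir, image)
            os.makedirs(os.path.dirname(img_path), exist_ok=True)
            with open(img_path, 'wb') as img_file:
                img_file.write(img_bytes)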
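
The URL arithmetic in the script (chapter folder, zero-padded section number, image path relative to the chapter) is easy to get subtly wrong, so it may help to isolate it in small pure functions that can be checked without touching the network. The following is only a sketch: the function names `section_url` and `image_url` and the sample image path `images/fig1.gif` are hypothetical and do not come from the original post.

```python
from urllib.parse import urljoin

# Base URL taken from the script above.
BASE_URL = "http://netedu.xauat.edu.cn/jpkc/netedu/jpkc/gdsx/homepage/5jxsd/51/513/"


def section_url(chapter_id: int, section_id: int, base_url: str = BASE_URL) -> str:
    """Return the page URL for one section of one chapter."""
    # {section_id:02d} zero-pads the section number to two digits,
    # which covers both the "< 10" and ">= 10" cases in one expression.
    return f"{base_url}{chapter_id}/{chapter_id}{section_id:02d}.htm"


def image_url(page_url: str, src: str) -> str:
    """Resolve an image src against the page that references it."""
    # urljoin handles relative paths such as "images/x.gif" or "../x.gif".
    return urljoin(page_url, src)


if __name__ == "__main__":
    page = section_url(5301, 1)
    print(page)
    # "images/fig1.gif" is a made-up src value used only for illustration.
    print(image_url(page, "images/fig1.gif"))
```

Using `urljoin` rather than plain string concatenation is a design choice, not something the original script does; it also copes with absolute or parent-relative `src` values, should the pages contain any.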
