Python 3 crawler: urlretrieve

Latest recommended article published on 2024-07-24 16:36:32

vicoqi

Category: Python 3 crawlers. Tags: small crawler example, python3

Article link: https://blog.csdn.net/vicoqi/article/details/52263226

There are very few Python 3 crawler tutorials online, so I wrote a small example myself.

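import re
import urllib.request

# Tested and working on Python 3.5.


def Schedule(a, b, c):
    '''
    Progress callback (reporthook) for urlretrieve.
    a: number of data blocks downloaded so far
    b: size of each data block, in bytes
    c: total size of the remote file, in bytes
    '''
    if c <= 0:
        # The server did not report a file size, so no percentage can be shown.
        return
    per = 100.0 * a * b / c
    if per >= 100:
        # The last block is usually only partly filled, so a * b can exceed c.
        per = 100
    print('%.2f%%' % per)
    if per == 100:
        print('Done!')


def getHtml(url):
    # Fetch the page and return its raw bytes.
    with urllib.request.urlopen(url) as page:
        return page.read()


def getImg(html):
    html = html.decode('utf-8')
    # Capture the src of every .jpg image that is followed by a width attribute.
    reg = r'src="(.*?\.jpg)" width'
    imgre = re.compile(reg)
    imglist = imgre.findall(html)
    # print(imglist)
    for x, imgurl in enumerate(imglist):
        # In Python 3, urlretrieve lives in urllib.request
        # (in Python 2 it was urllib.urlretrieve).
        # The target directory e:\test must already exist.
        urllib.request.urlretrieve(imgurl, 'e:\\test\\%s.jpg' % x, Schedule)


html = getHtml('http://tieba.baidu.com/p/741081023')
# getImg returns nothing, so call it directly instead of printing its result.
getImg(html)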
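
The two pieces of logic in the script above, the image regex and the progress percentage, can be tried without touching the network. The following is only a minimal sketch: the helper names `extract_image_urls` and `percent_done`, the sample HTML, and the `example.com` URLs are made up for illustration and are not part of the original script. It also uses `urllib.parse.urljoin` to resolve relative `src` values, which the original does not do.

```python
import re
from urllib.parse import urljoin

# Same pattern as in the script: a .jpg src followed by a width attribute.
IMG_PATTERN = re.compile(r'src="(.*?\.jpg)" width')


def extract_image_urls(html_text, base_url):
    """Return absolute URLs for every .jpg image matched by IMG_PATTERN."""
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return [urljoin(base_url, src) for src in IMG_PATTERN.findall(html_text)]


def percent_done(block_num, block_size, total_size):
    """Convert urlretrieve reporthook arguments into a percentage.

    Returns None when the total size is unknown (zero or negative).
    """
    if total_size <= 0:
        return None
    return min(100.0, 100.0 * block_num * block_size / total_size)


# Hypothetical page fragment used only for this demonstration.
sample = (
    '<img src="http://example.com/a.jpg" width="100">'
    '<img src="/b.jpg" width="50">'
    '<img src="c.png" width="1">'
)
urls = extract_image_urls(sample, 'http://example.com/page')
print(urls)
print(percent_done(1, 8192, 16384))
```

Keeping the percentage calculation separate from the printing makes it easy to check the edge cases, such as a server that does not report a file size, before wiring it into a real `urlretrieve` call.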