A Simple Beginner's Crawler for Douyu Images

Latest recommended article published 2021-11-28 22:22:11

7voyage

Column: Python3 Crawlers · Tags: Beginner

Article link: https://blog.csdn.net/qq_42776455/article/details/81300840

This is a fairly simple introductory crawler, written for Python 3.

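Python 2's urllib and urllib2 modules were reorganized in Python 3: the request functionality now lives in urllib.request, and it is used in essentially the same way.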
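As a quick reference, the sketch below lists the usual Python 2 to Python 3 renames in comments, then builds (but does not send) a request with a custom User-Agent header, which is the same pattern getHtml uses. The URL https://example.com/search and the query parameter q are placeholders chosen for illustration, not part of the original article.

```python
# Python 2 -> Python 3 equivalents (standard library only):
#   urllib2.Request(...)     -> urllib.request.Request(...)
#   urllib2.urlopen(...)     -> urllib.request.urlopen(...)
#   urllib.urlretrieve(...)  -> urllib.request.urlretrieve(...)
#   urllib.urlencode(...)    -> urllib.parse.urlencode(...)
#   urllib.quote(...)        -> urllib.parse.quote(...)
import urllib.request
from urllib.parse import urlencode

# Build a query string; urlencode() percent-encodes non-ASCII text as UTF-8.
query = urlencode({"q": "斗鱼"})

# Placeholder URL (hypothetical); headers are attached to the Request object.
req = urllib.request.Request(
    "https://example.com/search?" + query,
    headers={"User-Agent": "Mozilla/5.0"},
)

# Nothing has been sent yet; urllib.request.urlopen(req) would perform the request.
print(req.full_url)                  # https://example.com/search?q=%E6%96%97%E9%B1%BC
print(req.get_header("User-agent"))  # Mozilla/5.0 (urllib stores names via str.capitalize())
print(req.get_method())              # GET
```

Note that urllib normalizes header names with str.capitalize(), so the header added as "User-Agent" is looked up as "User-agent".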

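```python
# Python 3
import os
import random
import re
import time
import urllib.request
from urllib.parse import urljoin

# Send a browser User-Agent so the request looks like it comes from a browser.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
# Directory where images are saved; adjust this path for your own machine.
SAVE_DIR = '/home/hang/pythonLearning/Crawler/CrawlDouyuGirl'

def getHtml(url):
    # Attach the User-Agent header and return the raw response bytes.
    req = urllib.request.Request(url=url, headers=HEADERS)
    with urllib.request.urlopen(req) as page:
        return page.read()

def getImage(html, baseUrl):
    html = html.decode('utf-8')
    # Each match is a tuple (image URL, extension); [^"] keeps a match inside one src="..." value.
    imageList = re.findall(r'src="([^"]*?\.(jpg|png))"', html)
    os.makedirs(SAVE_DIR, exist_ok=True)
    # urlretrieve() takes no headers argument, so install a global opener that sends the User-Agent too.
    opener = urllib.request.build_opener()
    opener.addheaders = list(HEADERS.items())
    urllib.request.install_opener(opener)
    for x, (imageUrl, ext) in enumerate(imageList, start=1):
        # Resolve relative or protocol-relative URLs (e.g. "//host/a.jpg") against the page URL.
        imageUrl = urljoin(baseUrl, imageUrl)
        urllib.request.urlretrieve(imageUrl, os.path.join(SAVE_DIR, '%d.%s' % (x, ext)))
        print("Downloaded: %s" % imageUrl)
        # Wait a random 3 to 7 seconds between downloads to avoid hammering the server.
        time.sleep(random.randint(3, 7))

if __name__ == '__main__':
    url = "https://www.douyu.com/directory/game/yz"
    getImage(getHtml(url), url)
```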
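To see what the regular expression in getImage captures, here is a small offline sketch run against a made-up HTML snippet. The markup and the host img.example.com are hypothetical; the article does not show Douyu's real page markup, which may differ. It compares the article's pattern with a tightened variant that uses [^"] instead of ., and shows how urllib.parse.urljoin can turn relative or protocol-relative src values into absolute URLs before downloading.

```python
import re
from urllib.parse import urljoin

# The article's pattern and a tightened variant that cannot cross a closing quote.
LOOSE_PATTERN = r'src="(.*?\.(jpg|png))"'
PATTERN = r'src="([^"]*?\.(jpg|png))"'

# Hypothetical HTML snippet; img.example.com is a placeholder host.
sample = ('<img src="//img.example.com/a/1.jpg">'
          '<img src="/static/logo.png">'
          '<img src="icon.gif">')

# With two groups in the pattern, re.findall returns (full URL, extension) tuples.
matches = re.findall(PATTERN, sample)
print(matches)
# [('//img.example.com/a/1.jpg', 'jpg'), ('/static/logo.png', 'png')]

# urljoin resolves relative and protocol-relative URLs against the page URL.
page = "https://www.douyu.com/directory/game/yz"
absolute = [urljoin(page, url) for url, _ in matches]
print(absolute)
# ['https://img.example.com/a/1.jpg', 'https://www.douyu.com/static/logo.png']

# The loose pattern can run past the closing quote into a later attribute.
tricky = '<img src="a.gif" alt="b.jpg">'
print(re.findall(LOOSE_PATTERN, tricky))  # [('a.gif" alt="b.jpg', 'jpg')]
print(re.findall(PATTERN, tricky))        # []
```

Because findall yields one tuple per match when the pattern has more than one group, the download loop needs index 0 (or unpacking) for the URL and index 1 for the file extension.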