Scraping Douban with Python

Latest recommended article published on 2019-07-02 21:51:00

aibeng2705

Tags: python

Original link: http://www.cnblogs.com/mysterious-killer/p/10156985.html

...

import urllib.request
import time
from bs4 import BeautifulSoup

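def url_open(url):
    # Send a browser-like User-Agent; Douban may reject urllib's default one.
    # The header value below is an arbitrary, assumed choice.
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    response = urllib.request.urlopen(request)
    return response
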
def parse_html(response):
    # Print every movie entry on the page; return the next page URL, or None when done.
    html_content = response.read()
    html_soup = BeautifulSoup(html_content, 'html.parser', from_encoding='utf-8')
    tag_lis = html_soup.find_all('li')
    for li in tag_lis:
        em = li.find('em')
        title = li.find_all('span', class_='title')
        # other = li.find_all('span', class_='other')
        rating = li.find('span', class_='rating_num')
        if title:  # only movie entries contain a title span
            rank = em.get_text()
            print("Rank:" + rank + "------Rating:" + str(rating.get_text()) + "-------" + title[0].get_text())
            if int(rank) == 250:  # last entry of the list, nothing more to fetch
                return None
            if int(rank) % 25 == 0:  # last entry of this page, move on to the next page
                url = "https://movie.douban.com/top250?start=" + rank + "&filter="
                return url
    return None

url = "https://movie.douban.com/top250?start=0&filter="
if __name__ == '__main__':
    response = url_open(url)
    start_time = time.time()
    print("Start: " + str(start_time))
    while True:
        url = parse_html(response)
        if url is None:
            break
        response = url_open(url)
    end_time = time.time()
    print("End: " + str(end_time))
    print("Total time: " + str(end_time - start_time) + " seconds")
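
The script above derives each next page URL from the last rank it prints. As a possible alternative, the page URLs could be computed up front, since the `start` parameter advances in steps of 25. The sketch below is only an illustration under that assumption. `page_urls`, `BASE_URL`, `PAGE_SIZE` and `TOTAL` are hypothetical names that do not appear in the original script.

```python
# Hypothetical helper, not part of the original script.
BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25   # entries per page, as implied by the rank % 25 check in the script
TOTAL = 250      # total number of entries in the list


def page_urls(total=TOTAL, page_size=PAGE_SIZE):
    """Return the URL of every listing page, in order."""
    return [
        f"{BASE_URL}?start={start}&filter="
        for start in range(0, total, page_size)
    ]
```

Each returned URL could then be passed to `url_open` in a plain `for` loop, so the loop would no longer depend on what `parse_html` returns.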

Reposted from: https://www.cnblogs.com/mysterious-killer/p/10156985.html
