Web Scraping in Practice: Scraping the Douban Movie Top 250 with requests

hello鸡米花

Article tags: python, web crawler, request

Article link: https://blog.csdn.net/jiminghua/article/details/115015278

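This article shows how to use Python's requests library to scrape the Douban Movie Top 250, and explains in detail how yield is used and how the scraped data is stored.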
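Here is a hedged illustration of the two ideas the summary mentions, kept separate from any network code. It is a minimal sketch, not the author's code: `parse_items`, `save_jsonl`, and the `[index, title, rating]` row layout are hypothetical, and writing one JSON object per line (JSON Lines) is an assumed storage format.

```python
import json
from typing import Iterable, Iterator, List, TextIO


def parse_items(rows: Iterable[List[str]]) -> Iterator[dict]:
    """Hypothetical parser: turn raw [index, title, rating] rows into dicts."""
    for index, title, rating in rows:
        # `yield` hands back one record and pauses here until the caller
        # asks for the next one, so records are produced lazily.
        yield {"index": index, "title": title, "rating": rating}


def save_jsonl(items: Iterable[dict], fp: TextIO) -> int:
    """Write each item as one JSON object per line; return how many were written."""
    count = 0
    for item in items:
        # ensure_ascii=False writes Chinese titles as-is instead of escaping them
        fp.write(json.dumps(item, ensure_ascii=False) + "\n")
        count += 1
    return count
```

For example, calling `save_jsonl(parse_items(rows), f)` with `f = open("top250.jsonl", "w", encoding="utf-8")` streams records straight into the file. Because `parse_items` is a generator, no intermediate list of every movie is built, and any records the caller has already consumed with `next()` are not produced again.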

import time
import requests
from lxml import etree
import json

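def getPage(url):
    """Request the page and return its HTML text, or None on failure."""
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36 Edg/89.0.774.50'
        }
        # Send the request with a browser-like User-Agent; the timeout keeps it from hanging forever
        response = requests.get(url, headers=headers, timeout=10)
        # Check the response status
        if response.status_code == 200:
            return response.text
        return None
    except requests.RequestException:
        # Connection errors, timeouts, etc.: treat as "no page"
        return None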


def parsePage(html):
    """Parse the page HTML."""
    html = etree.HTML(html)
    items = html.xpath('//div[@class="item"]')
    # Iterate over the items, package each one's fields, and return them
    for item in items:
        res = {
            'index': item.xpath('.//div/em[@class=""]/text()'),
            'image': item.xpath('.//img[@width=…
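getPage and parsePage each handle a single page, so something has to call them once per page of the list. The sketch below is one possible driver, not the author's code. It assumes the Top 250 list is split into ten pages of 25 movies selected by a `start` query parameter (`https://movie.douban.com/top250?start=0`, `?start=25`, and so on); `page_urls`, `crawl`, and the one-second `delay` are hypothetical. `fetch` and `parse` are passed in so that functions shaped like getPage (returns HTML or None) and parsePage (yields dicts) can be plugged in.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional

# Assumption: the Top 250 list shows 25 movies per page, selected by ?start=N
BASE_URL = "https://movie.douban.com/top250?start={}"
PAGE_SIZE = 25
TOTAL = 250


def page_urls(total: int = TOTAL, page_size: int = PAGE_SIZE) -> List[str]:
    """Build one URL per page: start=0, 25, 50, ..., total - page_size."""
    return [BASE_URL.format(start) for start in range(0, total, page_size)]


def crawl(
    fetch: Callable[[str], Optional[str]],
    parse: Callable[[str], Iterable[dict]],
    delay: float = 1.0,
) -> Iterator[dict]:
    """Hypothetical driver: fetch each page, parse it, and yield every item."""
    for url in page_urls():
        html = fetch(url)
        if html is None:
            # Same convention as getPage: None means the request failed, so skip the page
            continue
        # Delegate to the parser so items stream out page by page
        yield from parse(html)
        # Pause between requests to avoid hammering the server
        time.sleep(delay)
```

With the article's functions this would read `for movie in crawl(getPage, parsePage): ...`. Passing the functions in as arguments also lets the loop be exercised with stub fetchers, without any network access.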
