Scraping Data and Writing It to an Excel Spreadsheet: Crawling Maoyan Movies

梅花14

Category: Crawlers. Tags: web crawler, data storage

Reposting is welcome, but please credit the source!

Article link: https://blog.csdn.net/qq_41621362/article/details/87368774


from bs4 import BeautifulSoup  # HTML parsing (used with the lxml parser)
import requests                # HTTP requests
import xlwt                    # writing .xls workbooks

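def get_page(pages):
    """Fetch one page of the Maoyan board; return the HTML text, or None on failure."""
    offset = pages * 10  # the script assumes 10 films per page
    url = "https://maoyan.com/board/4?offset=" + str(offset)
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36"
    }
    try:
        # A timeout keeps the script from hanging on a stalled connection.
        res = requests.get(url, headers=headers, timeout=10)
        if res.status_code == 200:
            return res.text
        print("Unexpected status code:", res.status_code)
    except requests.exceptions.RequestException as e:
        print("Error:", e.args)
    return None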
        
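def parse_page(html):
    """Yield [name, actors, release time, score] for each film on the page."""
    if not html:
        # get_page returns None when the request fails; yield nothing for that page.
        return
    soup = BeautifulSoup(html, "lxml")
    all_movies_details = soup.find_all(name="div", class_="board-item-content")
    for each in all_movies_details:
        name = each.find(class_="name").a.string
        # Drop the leading 3-character label ("主演：", i.e. "Starring:").
        actors = each.find(class_="star").string.strip(" \n")[3:]
        # Drop the leading 5-character label ("上映时间：", i.e. "Release time:").
        releasetime = each.find(class_="releasetime").string[5:]
        score = each.find(class_="score").get_text()
        details = [name, actors, releasetime, score]
        print(details)
        yield details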
        
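def write_to_excel():
    """Crawl 10 pages of the board and save the films to maoyan.xls."""
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet("sheet1")
    # Header row: film title, starring, release time, score.
    title = ["电影名", "主演", "上映时间", "评分"]
    for col, head in enumerate(title):
        sheet.write(0, col, head)
    for page in range(0, 10):
        html = get_page(page)
        if html is None:
            # Skip pages that could not be downloaded.
            continue
        details = parse_page(html)
        start = page * 10 + 1  # row 0 holds the header
        for row, con in enumerate(details, start):
            for col, art in enumerate(con):
                sheet.write(row, col, art)
    workbook.save("maoyan.xls")


def main():
    write_to_excel()


if __name__ == "__main__":
    main()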
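
The script's page and row arithmetic, and the way it strips label prefixes, can be checked without touching the network. The following is only a minimal sketch using the standard library: `page_url`, `first_row`, and `strip_label` are hypothetical helper names that do not appear in the original script, the page size of 10 is taken from the script's own `offset` calculation, and the sample strings are made up for illustration. `strip_label` splits on the full-width colon instead of slicing a fixed number of characters, which is one possible way to avoid cutting into the value if a label ever changes length.

```python
MAOYAN_BOARD_URL = "https://maoyan.com/board/4?offset="
PAGE_SIZE = 10  # assumption taken from the script: 10 films per page


def page_url(page):
    """Return the board URL for a zero-based page index."""
    return MAOYAN_BOARD_URL + str(page * PAGE_SIZE)


def first_row(page):
    """Return the first spreadsheet row used by a page (row 0 is the header)."""
    return page * PAGE_SIZE + 1


def strip_label(text):
    """Remove a leading label such as '主演：' by splitting on the full-width colon."""
    text = text.strip()
    label, sep, value = text.partition("：")
    # Without a colon there is no label, so return the text unchanged.
    return value if sep else text


if __name__ == "__main__":
    for page in range(3):
        print(page, page_url(page), first_row(page))
    # Hypothetical sample strings, not real scraped data.
    print(strip_label("\n  主演：演员甲,演员乙\n  "))
    print(strip_label("上映时间：2000-01-01"))
```

With these helpers, page 0 maps to `offset=0` and spreadsheet rows 1 to 10, page 1 to `offset=10` and rows 11 to 20, and so on, which matches the `start = page*10+1` calculation in the script.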

Results

[Image: result screenshot]
