Scraping Douban with Python: exporting the movies you have marked as watched

Latest recommended article published 2024-06-24 15:49:09

小豆君的干货铺


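Copyright notice: this is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_31201481/article/details/113678876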


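Here is the Python code:

```python
#coding=utf-8
import re

import requests
import xlwt
from requests.exceptions import RequestException

# Douban often rejects the default python-requests User-Agent, so a
# browser-like one is sent (an assumption; adjust if the site still refuses).
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def get_one_page(url):
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)  # fetch the page
        if response.status_code == 200:  # 200 means the response is OK
            return response.text  # return the page HTML
        return None  # any other status: return nothing
    except RequestException as e:  # on any request error, print it and return nothing
        print(e)
        return None


n = 1  # next Excel row to write (row 0 holds the headers)


def parse_one_page(html):
    global n
    if html is None:  # the request failed; nothing to parse
        return
    # The HTML tags inside this pattern were lost when the post was extracted;
    # only the fragment below survives, and it has four groups where the loop
    # reads five (title, info, rating, date watched, comment). Rebuild it
    # against the page's actual markup before running.
    pattern = re.compile('.*? (.*?).*?(.*?).*? .*? (.*?).*? (.*?)', re.S)
    items = re.findall(pattern, html)  # every match, as a list of tuples
    for item in items:
        sheet.write(n, 0, str(item[0]))  # title
        sheet.write(n, 1, str(item[2]))  # rating
        sheet.write(n, 2, str(item[3]))  # date watched
        sheet.write(n, 3, str(item[4]))  # comment
        # Release date, country, director, cast, etc. share one field;
        # split it on '/' into columns 4, 5, ...
        for i, piece in enumerate(item[1].split('/'), start=4):
            sheet.write(n, i, str(piece))
        n = n + 1
        print(n)


def main(start):
    # start is the offset of the first movie on the page
    url = ('https://movie.douban.com/people/7847299/collect?start=' + str(start)
           + '&sort=time&rating=all&filter=all&mode=grid')
    html = get_one_page(url)
    parse_one_page(html)


try:
    book = xlwt.Workbook(encoding='utf-8', style_compression=0)
    sheet = book.add_sheet('看过的电影', cell_overwrite_ok=True)  # "Movies watched"
    sheet.write(0, 0, '电影名')      # title
    sheet.write(0, 1, '评分')        # rating
    sheet.write(0, 2, '看过的时间')  # date watched
    sheet.write(0, 3, '评价')        # comment
    for b in range(0, 60):  # 60 pages
        m = b * 15  # each page lists 15 movies
        try:
            main(m)
            book.save(r'C:\Users\Administrator\Desktop\movie.xls')  # save after every page
        except Exception as e:
            print(e)
except Exception as e:
    print(e)
```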
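How parse_one_page turns each regex match into a spreadsheet row can be seen in isolation. This is a sketch under stated assumptions: the HTML snippet and its tags are hypothetical (the original pattern's tags did not survive), `rows_from_page` and `SAMPLE_HTML` are illustrative names, and rows are collected into plain lists instead of being written with xlwt, so the column order can be checked without Excel.

```python
import re

# Hypothetical markup standing in for one Douban "collect" page; the real tags differ.
SAMPLE_HTML = """
<li><em>Inception</em><p>2010-07-16 / USA / Christopher Nolan</p>
<span class="rating">5</span><span class="date">2020-01-02</span><span class="comment">Great</span></li>
<li><em>Up</em><p>2009-05-29 / USA</p>
<span class="rating">4</span><span class="date">2020-02-03</span><span class="comment">Sweet</span></li>
"""

# Five groups in the same order the post's code indexes them:
# item[0]=title, item[1]=info, item[2]=rating, item[3]=date, item[4]=comment.
PATTERN = re.compile(
    r'<em>(.*?)</em><p>(.*?)</p>.*?"rating">(.*?)</span>'
    r'.*?"date">(.*?)</span>.*?"comment">(.*?)</span>',
    re.S,  # let .*? cross line breaks, as in the original
)


def rows_from_page(html):
    """Return one list per movie: title, rating, date, comment, then the info pieces."""
    rows = []
    for item in PATTERN.findall(html):  # with groups, findall yields one tuple per match
        row = [item[0], item[2], item[3], item[4]]
        row.extend(item[1].split('/'))  # info pieces land in columns 4, 5, ...
        rows.append(row)
    return rows


for row in rows_from_page(SAMPLE_HTML):
    print(row)
```

Note that splitting on '/' alone leaves the surrounding spaces in each piece (for example `'2010-07-16 '` and `' USA '`), which is one reason the combined info field comes out untidy.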

The resulting Excel file looks like this:

[Screenshot of the resulting Excel sheet]

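Since no word-segmentation package is used, and the release date, country, director, cast and so on all sit in a single field, that part is not cleanly separated: the code only splits it on '/' into extra columns.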
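Some cleanup of that combined field is still possible without a word-segmentation package. The sketch below assumes, hypothetically, that the field is a '/'-separated string whose release-date pieces look like `YYYY-MM-DD`, optionally followed by a region in parentheses; `split_info` and `DATE_PIECE` are illustrative names. It strips the whitespace around each piece and separates the dates from everything else.

```python
import re

# Hypothetical shape of a release-date piece, e.g. "2010-07-16(USA)" or "2009".
DATE_PIECE = re.compile(r'^(\d{4}(?:-\d{2}){0,2})(?:\((.*?)\))?$')


def split_info(info):
    """Split a '/'-separated info string into (release dates, other pieces)."""
    dates, others = [], []
    for piece in info.split('/'):
        piece = piece.strip()  # drop the spaces around each '/'
        if not piece:
            continue
        match = DATE_PIECE.match(piece)
        if match:
            dates.append(match.group(1))  # keep the date, drop the region
        else:
            others.append(piece)
    return dates, others


dates, others = split_info('2010-09-01(China) / 2010-07-16(USA) / Leonardo DiCaprio / Joseph Gordon-Levitt')
print(dates)   # ['2010-09-01', '2010-07-16']
print(others)  # ['Leonardo DiCaprio', 'Joseph Gordon-Levitt']
```

Telling countries apart from names, or directors from cast, would still need extra rules or the markup itself; this only separates what has a recognizable shape.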

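Also, when a movie is marked as watched but has no comment, the comments no longer line up with the right movies: the pattern simply matches the next comment it finds. This only affects that individual movie, though, so it can be mostly ignored.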
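One way to keep a missing comment from pulling in a neighbor's is to cut the page into one block per movie first, then search for each field inside its own block and leave it empty when absent. The markup below (one `<li>` per movie, a `comment` span) is a hypothetical stand-in for Douban's real HTML, and `parse_movies`/`field` are illustrative names. The flat pattern is included for comparison: it hands the third movie's comment to the second and drops the third.

```python
import re

# Hypothetical page: the second movie has no comment.
SAMPLE_HTML = """
<li><em>Inception</em><span class="comment">Great</span></li>
<li><em>Up</em></li>
<li><em>Heat</em><span class="comment">Tense</span></li>
"""

# One flat pattern over the whole page, in the style of the original script.
FLAT = re.compile(r'<em>(.*?)</em>.*?class="comment">(.*?)</span>', re.S)

BLOCK = re.compile(r'<li>(.*?)</li>', re.S)  # one match per movie
TITLE = re.compile(r'<em>(.*?)</em>', re.S)
COMMENT = re.compile(r'class="comment">(.*?)</span>', re.S)


def field(pattern, block):
    """Return the first group of pattern inside block, or '' if it is missing."""
    match = pattern.search(block)
    return match.group(1) if match else ''


def parse_movies(html):
    """Parse each movie's block separately so fields cannot leak across movies."""
    return [(field(TITLE, block), field(COMMENT, block)) for block in BLOCK.findall(html)]


print(FLAT.findall(SAMPLE_HTML))   # [('Inception', 'Great'), ('Up', 'Tense')]
print(parse_movies(SAMPLE_HTML))   # [('Inception', 'Great'), ('Up', ''), ('Heat', 'Tense')]
```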
