Scraping the Douban Movie Chart with Python

C葭葭

Category: python. Tags: python

Article link: https://blog.csdn.net/yhb123___ahnd/article/details/134871251

import requests
import re
# Fetch the page (a browser-like User-Agent is sent with the request)
url = "https://movie.douban.com/chart"
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on a 4xx/5xx response
html_str = response.text

# Parse the data: capture the title attribute of each link marked "nbg"
pattern = re.compile(r'<a.*?nbg.*?title="(.*?)">', re.S)
items = pattern.findall(html_str)
# print(items)

# Save the results, one title per line
with open('douBan.txt', 'w', encoding='utf-8') as f:
    for item in items:
        f.write(item + '\n')
        print(item)

print('done!')
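
The parsing step can be tried offline, without sending a request to Douban. The sketch below reuses the same regular expression on a small hand-written HTML fragment. The fragment, the `example.com` links, the movie names, and the helper name `extract_titles` are all assumptions made for illustration; the real page markup may differ and can change at any time.

```python
import re

# Same pattern as in the script above: lazily skip to "nbg", then capture
# the value of the title attribute that follows it.
TITLE_PATTERN = re.compile(r'<a.*?nbg.*?title="(.*?)">', re.S)


def extract_titles(html):
    """Return the title attribute of every link marked with "nbg"."""
    return TITLE_PATTERN.findall(html)


# Hypothetical markup, written by hand to resemble a chart entry.
SAMPLE_HTML = '''
<table>
  <tr>
    <td><a class="nbg" href="https://example.com/subject/1/" title="Movie A">
      <img src="a.jpg" alt="Movie A"></a></td>
  </tr>
  <tr>
    <td><a class="nbg" href="https://example.com/subject/2/" title="Movie B">
      <img src="b.jpg" alt="Movie B"></a></td>
  </tr>
</table>
'''

if __name__ == "__main__":
    for title in extract_titles(SAMPLE_HTML):
        print(title)
```

Because the pattern depends on `title` being the last attribute before the closing `>`, it is fragile against markup changes. An HTML parser would likely be more robust, at the cost of an extra dependency.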