A simple crawler: getting the Chinese titles of the Douban Top 250 movies
1. The code is as follows:
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with each request.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0"
}

start_movie = 0
while start_movie < 250:
    # Each page lists 25 movies; "start" is the offset of the first one.
    url = "https://movie.douban.com/top250?start=" + str(start_movie) + "&filter="
    response = requests.get(url, headers=headers)
    content = response.text
    soup = BeautifulSoup(content, "html.parser")
    all_titles = soup.find_all("span", attrs={"class": "title"})
    for title in all_titles:
        # Title spans containing "/" hold the alternate title and are skipped,
        # so only the Chinese title is printed.
        if title.string and "/" not in title.string:
            print(title.string)
    start_movie += 25
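The two pieces of logic in the loop above, paging through the list and filtering the titles, can be sketched as small functions that run without any network access. This is only an illustration: `page_urls` and `keep_primary_titles` are hypothetical helper names, not part of the original script, and the sample titles are made up for the demo.

```python
# Sketch only: page_urls and keep_primary_titles are hypothetical helper names.
BASE_URL = "https://movie.douban.com/top250"


def page_urls(total=250, page_size=25):
    """Build one list-page URL per offset: start=0, 25, ..., up to total."""
    return [
        BASE_URL + "?start=" + str(start) + "&filter="
        for start in range(0, total, page_size)
    ]


def keep_primary_titles(titles):
    """Drop missing entries and any title text that contains a "/"."""
    return [t for t in titles if t and "/" not in t]


if __name__ == "__main__":
    urls = page_urls()
    print(len(urls))
    print(urls[0])
    print(urls[-1])
    # Illustrative sample data, not real scraped output.
    sample = ["Chinese title", "\xa0/\xa0Alternate title", None]
    print(keep_primary_titles(sample))
```

Keeping the filter separate from the request makes it possible to check the "/" rule on a few hand-written strings before running the crawler against the site.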
2. Two packages need to be installed:
# Install
pip install requests        # HTTP requests
pip install beautifulsoup4  # HTML parsing (imported as bs4)
# Import the two packages:
import requests
from bs4 import BeautifulSoup
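Before running the crawler, it may help to confirm that both packages can actually be found by the interpreter in use. The following is a minimal sketch assuming Python 3; `missing_modules` is a hypothetical helper name, and it relies only on the standard-library `importlib.util.find_spec`.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(["requests", "bs4"])
    if missing:
        print("Not installed: " + ", ".join(missing))
    else:
        print("requests and bs4 are both available")
```

If a name is reported as missing even after `pip install`, the likely cause is that `pip` installed into a different Python environment than the one running the script.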
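3. If the source file contains Chinese characters, Python 2 reports an error such as: SyntaxError: Non-ASCII character '\xe5' in file D:\pythonProjection\1_print_demo.py on line 2, but no encoding declared. Python 2 assumes ASCII source code unless the file declares another encoding.
Fix: put # coding:UTF-8 or # -*- coding: UTF-8 -*- on the first line of the file.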
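To see that both spellings of the declaration are recognised, the standard-library `tokenize.detect_encoding` function can be asked what encoding it reads from a file's first lines. This is a sketch assuming Python 3; `declared_encoding` is a hypothetical helper name, and the byte strings stand in for the start of a script.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python detects from the first two lines."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


if __name__ == "__main__":
    short_form = b"# coding:UTF-8\nprint('ok')\n"
    emacs_form = b"# -*- coding: UTF-8 -*-\nprint('ok')\n"
    print(declared_encoding(short_form))
    print(declared_encoding(emacs_form))
```

Note that Python 3 already treats source files as UTF-8 by default, so the declaration matters mainly when the script is run with Python 2.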