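I've finished a Python beginner course and, for practice, tried to scrape text from a Japanese website, but I've hit three problems I can't solve. I'd appreciate an answer from anyone more experienced. The code is below.
1. I want to scrape the text of the "p" tags under ('div', {'class': 'boxIn clearfix minH'}), but I get this error: AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
2. I tried changing find_boxin.find_all("p") to find_boxin[:-1]. That produces output, but all of it is garbled.
3. Building on 2, I changed encoding="utf-8" to encoding="shift_jis", and got this error:
UnicodeEncodeError: 'shift_jis' codec can't encode character '\x83' in position 4: illegal multibyte sequence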
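Points 2 and 3 are consistent with one cause: the page bytes being decoded with the wrong codec before the text ever reaches the CSV file. requests falls back to ISO-8859-1 for text responses whose Content-Type header names no charset, and a '\x83' character is what a Shift_JIS lead byte looks like after that kind of decoding. The sketch below reproduces both symptoms offline. The sample string is made up, and whether the real page is served as Shift_JIS is an assumption based on the attempt in point 3.

```python
# Hypothetical sample standing in for the page body; assumed to be Shift_JIS bytes.
sample = "カメラ"
raw = sample.encode("shift_jis")

# Decoding with the wrong codec yields mojibake (the garbled output in point 2).
wrong = raw.decode("iso-8859-1")

# Re-encoding the mojibake as Shift_JIS fails (the error in point 3),
# because characters such as '\x83' have no Shift_JIS representation.
try:
    wrong.encode("shift_jis")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Decoding with the codec the bytes were actually written in recovers the text.
right = raw.decode("shift_jis")

print(repr(wrong))
print(encode_failed)
print(right)
```

If this is what is happening, changing the output file's encoding cannot help, because the string is already damaged. Setting r.encoding = "shift_jis" before reading r.text, or passing r.content (the raw bytes) to BeautifulSoup, would be the place to fix it, and the CSV can stay encoding="utf-8".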
Beginner here, any help is appreciated!
"""
Fetch comments from Kakaku.com
"""
import requests
from bs4 import BeautifulSoup
import csv
def get_comment_div(url):
    get_comment_list = []
    r = requests.get(url, timeout=30)
    soup = BeautifulSoup(r.text, "lxml")
    # find_all() returns a ResultSet (a list of tags), not a single tag
    find_boxin = soup.find_all('div', {'class': 'boxIn clearfix minH'})
    # Problem 1: the AttributeError is raised on the next line
    find_comment = find_boxin.find_all("p")
    for comment in find_comment:
        user_comment = comment.text
        get_comment_list.append(user_comment)
    return get_comment_list
def main():
    """
    Main function
    """
    url = "http://bbs.kakaku.com/bbs/-/CategoryCD=6460/"
    # Fetch the comments
    get_comment = get_comment_div(url)
    with open("get_user_word.csv", "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for i, comment in enumerate(get_comment):
            if (i + 1) % 5 == 0:
                print("Reading item {} of {}".format(i+1, len(get_comment)))
            # Wrap in a list so the whole comment lands in one cell,
            # not one character per cell
            writer.writerow([comment])


if __name__ == '__main__':
    main()
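
On problem 1, the error message describes the cause: find_all() returns a ResultSet, which behaves like a list of tags, so find_all("p") has to be called on each element rather than on the list. Below is a minimal sketch of one way to restructure the extraction, run on made-up sample HTML instead of the live page. The name extract_comments, the SAMPLE_HTML content, and the choice of from_encoding="shift_jis" are assumptions for illustration; "html.parser" is used only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described in the question.
SAMPLE_HTML = """
<div class="boxIn clearfix minH"><p>画質が良い</p><p>電池の持ちが短い</p></div>
<div class="boxIn clearfix minH"><p>軽くて持ちやすい</p></div>
<div class="other"><p>無関係</p></div>
"""


def extract_comments(raw_bytes, encoding="shift_jis"):
    """Return the text of every <p> inside the matching <div> elements."""
    # Passing bytes plus from_encoding lets BeautifulSoup do the decoding.
    soup = BeautifulSoup(raw_bytes, "html.parser", from_encoding=encoding)
    comments = []
    # Loop over the ResultSet and search inside each <div> individually.
    for box in soup.find_all("div", {"class": "boxIn clearfix minH"}):
        for p in box.find_all("p"):
            comments.append(p.get_text(strip=True))
    return comments


# Simulate a Shift_JIS response body, such as r.content from requests.
page_bytes = SAMPLE_HTML.encode("shift_jis")
comments = extract_comments(page_bytes)
print(comments)
```

With requests, r.content would take the place of page_bytes. If only the first matching div is wanted, soup.find(...) returns a single tag (or None), and find_all("p") can then be called on it directly.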