Crawling the Anhui Provincial Museum website:
import requests
from bs4 import BeautifulSoup

url = 'http://www.ahm.cn/Service/Leaveword/zxzx#page='

def get_info(url, data=None):
    wd_data = requests.get(url)
    soup = BeautifulSoup(wd_data.text, 'lxml')
    # CSS selectors for the question, timestamp, and official reply of each entry
    questions = soup.select('#articles > ul > li > div.question.item')
    times = soup.select('#articles > ul > li > p > span:nth-child(2)')
    replys = soup.select('#articles > ul > li > div.answer.item')
    primary_class = '安徽省博物馆'
    for question, time, reply in zip(questions, times, replys):
        data = {
            'question': question.get_text(),
            'time': time.get_text(),
            'reply': reply.get_text(),
            'primary': primary_class
        }
        # Append each record to the output file. The original snippet is
        # truncated at this point, so the write call below is an assumed
        # completion of the "save to a fixed-format file" step.
        with open('安徽博物馆.txt', 'a', encoding='utf-8') as f:
            f.write(str(data) + '\n')

get_info(url)
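The selectors above can be tried out without hitting the live site. The sketch below runs the same three CSS selectors against a tiny inline HTML sample; the sample markup is hypothetical, written only to mirror the structure the selectors imply (an `#articles` list whose items hold a question `div`, a `p` with the timestamp in its second `span`, and an answer `div`):

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the message-board markup implied by the selectors
html = '''
<div id="articles"><ul>
  <li>
    <div class="question item">Q1</div>
    <p><span>visitor</span><span>2020-01-01</span></p>
    <div class="answer item">A1</div>
  </li>
</ul></div>
'''

soup = BeautifulSoup(html, 'html.parser')
# Same selectors as in the crawler above
question = soup.select('#articles > ul > li > div.question.item')[0].get_text()
time = soup.select('#articles > ul > li > p > span:nth-child(2)')[0].get_text()
reply = soup.select('#articles > ul > li > div.answer.item')[0].get_text()
```

Note that `span:nth-child(2)` picks the second `span` element inside the `p`, which is why it lands on the timestamp rather than the author label.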

This article shows how to use Python's BeautifulSoup library to crawl information from the Anhui Provincial Museum website, explains in detail how to parse the page data, and finally saves the collected data to a file in a fixed format.
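The URL used above ends with `#page=`, which suggests the message board is paginated. A minimal sketch of generating one URL per page, assuming pages are addressed simply by appending a page number (the exact pagination scheme is site-specific and not confirmed by the original article):

```python
def page_urls(base, n_pages):
    """Build one URL per page by appending the page number to the base URL."""
    return [f'{base}{i}' for i in range(1, n_pages + 1)]

# Hypothetical: crawl the first three pages of the message board
urls = page_urls('http://www.ahm.cn/Service/Leaveword/zxzx#page=', 3)
# Each URL could then be passed to get_info(url) in turn.
```

If the site loads pages via JavaScript from a fragment identifier like `#page=`, a plain `requests.get` may return the same HTML for every page; in that case the real data endpoint would need to be found in the browser's network tab.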