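The following Python crawler fetches three classic poems from the 古诗词网 poetry site (gushiwen.org) and writes them to a text file on the desktop. It uses the Requests and BeautifulSoup libraries: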
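```python
# Import the required libraries
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Target URL of the crawler
url = 'https://www.gushiwen.org/'
# Browser-like User-Agent; some sites reject the default requests client string
headers = {'User-Agent': 'Mozilla/5.0'}

# Send a GET request to the target URL and stop on HTTP errors
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
# Parse the HTML
soup = BeautifulSoup(response.content, 'html.parser')
# Use a CSS selector to get the list of poem links
# (the selector depends on the site's current markup and may need updating)
poem_list = soup.select('.main3 .left .sons .cont a')

# Collect the titles and texts of the first three poems
poem_titles = []
poem_contents = []
for link in poem_list[:3]:
    # Poem title
    poem_titles.append(link.text.strip())
    # Resolve the (usually relative) href against the site URL;
    # plain string concatenation can produce a double slash
    poem_url = urljoin(url, link.get('href'))
    # Send a GET request to the poem's own page
    poem_response = requests.get(poem_url, headers=headers, timeout=10)
    poem_response.raise_for_status()
    # Parse the HTML
    poem_soup = BeautifulSoup(poem_response.content, 'html.parser')
    # Poem text; select_one returns None when nothing matches
    content = poem_soup.select_one('.main3 .left .sons .contson')
    poem_contents.append(content.text.strip() if content else '')

# Write the poems to a text file on the desktop
desktop_path = os.path.join(os.path.expanduser('~'), 'Desktop')
file_path = os.path.join(desktop_path, 'poems.txt')
with open(file_path, 'w', encoding='utf-8') as f:
    for title, text in zip(poem_titles, poem_contents):
        f.write(title + '\n\n')
        f.write(text + '\n\n\n')
```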
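The script first sends a GET request to the site's homepage and parses the returned HTML with BeautifulSoup. It then uses a CSS selector to locate the list of poem links, follows the first three links to get each poem's title and text, and finally writes the three poems to poems.txt on the desktop. The CSS selectors are tied to the site's page structure. If the site changes its markup, poem_list may come back empty and the selectors will need updating.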
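You can check the parsing and file-writing steps without contacting the live site. To do that, split them into small functions and run them on hand-written HTML. The sketch below does this under several assumptions. The helper names parse_poem_links, parse_poem_text and save_poems are hypothetical. The sample markup and the /poem/N.html paths are also invented: they copy only the CSS selectors used above, not the site's real pages. The output goes to the system temp directory instead of the desktop so the example runs anywhere. The example also shows why urljoin is used instead of url + href.

```python
import os
import tempfile
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup that only mimics the selectors used by the crawler;
# the real site's HTML and link paths may differ.
LIST_HTML = """
<div class="main3"><div class="left"><div class="sons"><div class="cont">
  <a href="/poem/1.html">静夜思</a>
  <a href="/poem/2.html">春晓</a>
  <a href="/poem/3.html">登鹳雀楼</a>
  <a href="/poem/4.html">相思</a>
</div></div></div></div>
"""

POEM_HTML = """
<div class="main3"><div class="left"><div class="sons">
  <div class="contson">
    床前明月光，疑是地上霜。
  </div>
</div></div></div>
"""


def parse_poem_links(html, base_url, limit=3):
    """Return up to `limit` (title, absolute_url) pairs from a list page."""
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.select('.main3 .left .sons .cont a')[:limit]
    # urljoin avoids the '//' that url + '/poem/1.html' would produce
    return [(a.text.strip(), urljoin(base_url, a.get('href'))) for a in links]


def parse_poem_text(html):
    """Return the stripped poem text, or '' if the selector matches nothing."""
    soup = BeautifulSoup(html, 'html.parser')
    node = soup.select_one('.main3 .left .sons .contson')
    return node.text.strip() if node else ''


def save_poems(poems, file_path):
    """Write (title, text) pairs in the same layout as the crawler."""
    with open(file_path, 'w', encoding='utf-8') as f:
        for title, text in poems:
            f.write(title + '\n\n')
            f.write(text + '\n\n\n')


base = 'https://www.gushiwen.org/'
links = parse_poem_links(LIST_HTML, base)
text = parse_poem_text(POEM_HTML)
out_path = os.path.join(tempfile.gettempdir(), 'poems_demo.txt')
save_poems([(links[0][0], text)], out_path)
print(links[0])  # ('静夜思', 'https://www.gushiwen.org/poem/1.html')
```

The helpers contain no network calls. In the real crawler, the loop would call requests.get and pass response.text to them, so only the input changes, not the parsing logic.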