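Read web page URLs one by one from a column in an Excel sheet, extract each page's title and text, and save them to a TXT file; write the code first... - CSDN Blog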

Article link: https://blog.csdn.net/weixin_35756373/article/details/129074785

Sure. Here is an example that uses Python and an Excel library to complete this task. First, we need to install the required libraries:

!pip install openpyxl requests beautifulsoup4

Then we can use the following code to complete the task:

import openpyxl
import requests
from bs4 import BeautifulSoup

# Open the Excel file
wb = openpyxl.load_workbook('urls.xlsx')

# Select the first worksheet (by position, so it works whatever the sheet is named)
sheet = wb.worksheets[0]

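# Iterate over every row of the first column
for row in sheet.iter_rows(min_row=1, max_col=1):
    # Get the URL from the cell; skip empty cells
    url = row[0].value
    if not url:
        continue

    # Send an HTTP request with the requests library
    # (the 10-second timeout keeps an unresponsive server from hanging the loop)
    response = requests.get(url, timeout=10)

    # Parse the HTML response with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract the page title and text; a page may have no usable <title>
    title = soup.title.string if soup.title and soup.title.string else ''
    text = soup.get_text()

    # Append the title and text to the output file
    with open('output.txt', 'a', encoding='utf-8') as f:
        f.write(title + '\n')
        f.write(text + '\n')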

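In this code, we use the openpyxl library to open the Excel file, the requests library to fetch each page, and the BeautifulSoup library to parse the HTML response and extract the page's title and text. Finally, we use a with open statement to append the title and text to the output file.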
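The extraction step can be tried on its own, without any network access. The snippet below is a minimal sketch that parses a hard-coded HTML string; the sample page is hypothetical and exists only for illustration. It also passes `separator` and `strip` to `get_text()`, which is an optional refinement not used in the code above, to drop the blank lines that markup tends to leave behind.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for illustration
html = """
<html>
  <head><title>Example page</title></head>
  <body><h1>Heading</h1><p>First paragraph.</p></body>
</html>
"""

soup = BeautifulSoup(html, 'html.parser')

# soup.title is the <title> tag; .string is its text content
title = soup.title.string

# get_text() joins all text nodes in the document; separator and strip
# tidy up the whitespace left over from the markup
text = soup.get_text(separator='\n', strip=True)

print(title)
print(text)
```

Note that `get_text()` is called on the whole document, so the title text appears in `text` as well.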

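Note that this is only a simple example and may need to be adapted to your situation. For instance, you may need to catch exceptions raised by the network requests, or handle character-encoding problems.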
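One possible way to handle both concerns is sketched below. It is not part of the original example: the function name `fetch_title_and_text` and its `get` parameter are hypothetical, introduced here so the logic can be exercised without a real network connection. The sketch catches `requests.RequestException` (the base class of the exceptions requests raises) and hands the raw bytes in `response.content` to BeautifulSoup, which then tries to detect the page's encoding itself instead of relying on the encoding that requests guessed.

```python
import requests
from bs4 import BeautifulSoup


def fetch_title_and_text(url, get=requests.get):
    """Return (title, text) for url, or None if the request fails.

    `get` is a hypothetical hook for testing; it defaults to requests.get.
    """
    try:
        response = get(url, timeout=10)
        # Raise an exception for 4xx/5xx status codes
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs and HTTP errors
        print(f'Skipping {url}: {exc}')
        return None

    # Pass bytes rather than response.text so that BeautifulSoup can
    # detect the encoding (for example from a <meta charset> tag)
    soup = BeautifulSoup(response.content, 'html.parser')

    # A page may lack a <title>, or have one with no plain string inside
    title = soup.title.string if soup.title and soup.title.string else ''
    text = soup.get_text(separator='\n', strip=True)
    return title, text
```

Inside the loop, the call would look like `result = fetch_title_and_text(url)`, followed by `if result is None: continue`, so that one unreachable page does not stop the whole run.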