Prompt in Practice: Multi-Agent Rating System (2) Extracting and Preprocessing Review Data

Article link: https://blog.csdn.net/qq_42540492/article/details/147354907

Chapter 2: Extracting and Preprocessing Review Data

2.1 Normalization: the normalize() function explained

In practice, users may type the same restaurant name in different ways:

“Panda Express”
“panda-express”
“PandaExpress”

To avoid failed matches, we pass restaurant names through a normalize() function:

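def normalize(name: str) -> str:
    # Lowercase, turn '-' and '.' into spaces, then collapse whitespace.
    # str.split() with no argument splits on any run of whitespace and drops
    # leading/trailing whitespace; the original .replace('  ', ' ') left
    # longer runs of spaces (e.g. three or more) only partly collapsed.
    return " ".join(name.lower().replace('-', ' ').replace('.', ' ').split())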

📌 What it does:

Lowercases the text with lower()
Replaces - and . with spaces
Collapses repeated spaces into one
Strips leading and trailing whitespace

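✅ Effect: variants such as “Panda Express” and “panda-express” both normalize to “panda express” and therefore match the same reviews. Note that “PandaExpress” (no separator) becomes “pandaexpress” and still does not match; covering that case would need an extra rule, such as also comparing with all spaces removed.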
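A quick way to see what normalization does and does not cover is to run it on the variants listed above. This is a minimal sketch of the same steps (whitespace collapsed with split()); the extra padded variant is made up for illustration.

```python
def normalize(name: str) -> str:
    # Lowercase, map '-' and '.' to spaces, collapse any run of whitespace
    return " ".join(name.lower().replace("-", " ").replace(".", " ").split())


# Print each variant next to its normalized form
for variant in ["Panda Express", "panda-express", "PandaExpress", "  Panda   Express. "]:
    print(repr(variant), "->", repr(normalize(variant)))
```

The third variant comes out as 'pandaexpress', which is exactly the gap mentioned in the note above.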

2.2 Fetching data from text: how fetch_restaurant_data() works

The review data is stored in a local file, restaurant-data.txt, with one review per line in the following format:

Panda Express. The food was awesome and the staff were amazing.

We extract the reviews for the target restaurant with the following function:

from typing import Dict, List

def fetch_restaurant_data(restaurant_name: str) -> Dict[str, List[str]]:
    ...

⭐ Implementation steps:

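Normalize the user input with normalize()
Open the file and read all lines
Check each line:
- Normalize the line with normalize()
- Test whether it starts with the normalized restaurant name
- On a match, add it to the result dictionary
Return: {actual restaurant name: list of reviews}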
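The steps above can be sketched as follows. This is a minimal sketch, not the author's implementation (whose body is elided): the `path` parameter is a hypothetical addition so the function can be tested, and the rule that the “actual restaurant name” is the text before the first ". " is an assumption based on the sample line format.

```python
from typing import Dict, List


def normalize(name: str) -> str:
    # Lowercase, map '-' and '.' to spaces, collapse whitespace
    return " ".join(name.lower().replace("-", " ").replace(".", " ").split())


def fetch_restaurant_data(restaurant_name: str,
                          path: str = "restaurant-data.txt") -> Dict[str, List[str]]:
    # `path` is a hypothetical parameter, not part of the original signature
    target = normalize(restaurant_name)          # step 1: normalize the user input
    display_name = ""
    matches: List[str] = []
    try:
        with open(path, encoding="utf-8") as f:  # step 2: read the file line by line
            for raw in f:
                line = raw.strip()
                if not line:
                    continue
                # step 3: normalize the line and test the prefix
                if normalize(line).startswith(target):
                    if not display_name:
                        # Assumption: the name is the text before the first ". "
                        display_name = line.split(". ", 1)[0]
                    matches.append(line)
    except FileNotFoundError:
        print(f"Error: {path} not found")
    # step 4: {actual restaurant name: reviews}, or {} when nothing matched
    return {display_name: matches} if matches else {}
```

Two consequences of the plain prefix test are worth knowing: a shorter query such as "panda" also matches every "Panda Express" line, and an empty query matches every line, which is why the next section recommends rejecting empty input.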

🔒 Error handling:

If the file cannot be found, the function prints an error message:

except FileNotFoundError:
    print("Error: restaurant-data.txt not found")

2.3 Error handling and data structure design

Data structure design:

The output is always a dictionary of this shape:

{
  "Panda Express": [
    "Panda Express. The food was awesome and the staff were amazing.",
    "Panda Express. Pretty good noodles but customer service was average."
  ]
}

This makes it easy to hand the data to the next agent for analysis.
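Because the output is a plain dictionary, passing it to the next agent can be as simple as serializing it to text. This is a minimal sketch: `format_for_next_agent` is a hypothetical helper, and JSON is just one possible message format, not necessarily what the original system sends.

```python
import json
from typing import Dict, List


def format_for_next_agent(data: Dict[str, List[str]]) -> str:
    # Hypothetical helper: turn {restaurant: reviews} into a text message
    # that can be placed in the next agent's prompt.
    # ensure_ascii=False keeps non-ASCII review text readable.
    return json.dumps(data, ensure_ascii=False, indent=2)


reviews = {
    "Panda Express": [
        "Panda Express. The food was awesome and the staff were amazing.",
        "Panda Express. Pretty good noodles but customer service was average.",
    ]
}
message = format_for_next_agent(reviews)
print(message)
```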

Error-handling recommendations:

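Empty input? Raise an exception or prompt the user.
No matches found? Return an empty dictionary and tell the user.
Restaurant name in the data differs from the user's input? Use normalize() to make matching more tolerant.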
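The three recommendations can be combined into one guarded lookup. This is a minimal sketch under stated assumptions: `match_reviews` is a hypothetical name, it works on an in-memory list of lines instead of the file so it stays self-contained, and it reuses the assumed "name before the first '. '" rule for the dictionary key.

```python
from typing import Dict, List


def normalize(name: str) -> str:
    # Lowercase, map '-' and '.' to spaces, collapse whitespace
    return " ".join(name.lower().replace("-", " ").replace(".", " ").split())


def match_reviews(restaurant_name: str, lines: List[str]) -> Dict[str, List[str]]:
    # Recommendation 1: reject empty input, since normalize("") == ""
    # and every line starts with the empty string
    target = normalize(restaurant_name)
    if not target:
        raise ValueError("Restaurant name must not be empty")
    # Recommendation 3: normalize both sides before comparing
    matches = [line for line in lines if normalize(line).startswith(target)]
    # Recommendation 2: no match -> empty dict plus a message
    if not matches:
        print(f"No reviews found for '{restaurant_name}'")
        return {}
    return {matches[0].split(". ", 1)[0]: matches}


sample = ["Panda Express. Good noodles.", "McDonald's. Fast but bland."]
print(match_reviews("PANDA-EXPRESS", sample))
print(match_reviews("Taco Bell", sample))
```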

2.4 Building the Data Fetch Agent

We wrote a dedicated prompt for the data fetch agent:

def get_data_fetch_agent_prompt(restaurant_query: str) -> str:
    return f"""You are a data fetch agent...
    Analyze the user query: "{restaurant_query}"
    Extract the restaurant name...
    """

The agent's responsibilities:

Identify the restaurant name in the user's query
Call fetch_restaurant_data(restaurant_name)
Return the reviews

In the main controller, the function is connected to the system with register_function():

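from autogen import register_function  # AutoGen (pyautogen / AG2)

register_function(
    fetch_restaurant_data,
    caller=entrypoint_agent,        # agent whose LLM proposes the tool call
    executor=agents['data_fetch'],  # agent that actually runs the function
    name="fetch_restaurant_data",
    description="Fetches the reviews for a specific restaurant."
)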

✨ Tip: how can you check that data extraction works?

Quick test:

print(fetch_restaurant_data("panda express"))

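✅ Expected result: a dictionary mapping “Panda Express” to the list of all its reviews is printed.
⛔ Failure: an empty dictionary or an error. Check the file path and whether the restaurant name is normalized correctly.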

✅ Chapter summary

We examined how normalize and fetch_restaurant_data are implemented
We learned to turn unstructured text into usable structured data
We built and registered the data fetch agent

This part supplies the pipeline's “fuel”, the raw review data, and lays a solid foundation for the steps that follow.

🧠 Exercises

Fill in the blanks: complete the missing parts of the function below

def fetch_restaurant_data(restaurant_name: str):
    restaurant_name_normalized = _______(restaurant_name)
    ...
    if line_normalized.startswith(__________):
        ...

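Practice: write a new function, list_all_restaurants(), that reads restaurant-data.txt and returns every restaurant name that appears in it.
Advanced challenge: extend fetch_restaurant_data() to support fuzzy matching (for example, so that “熊猫快餐”, the Chinese name of the chain, also finds “Panda Express”).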
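One possible sketch for the advanced challenge, not a reference solution from the original: it replaces the prefix test with whole-name similarity using Python's standard `difflib.get_close_matches`, which tolerates spelling variants such as “Panda Expres”. String similarity cannot connect “熊猫快餐” to “Panda Express”, so the sketch also uses a hand-maintained alias table. `fuzzy_fetch`, `ALIASES`, the `cutoff=0.8` threshold, and the "name before the first '. '" rule are all illustrative assumptions.

```python
import difflib
from typing import Dict, List


def normalize(name: str) -> str:
    # Lowercase, map '-' and '.' to spaces, collapse whitespace
    return " ".join(name.lower().replace("-", " ").replace(".", " ").split())


# Hypothetical alias table: cross-language names are mapped explicitly
# because character similarity cannot relate them.
ALIASES: Dict[str, str] = {"熊猫快餐": "panda express"}


def fuzzy_fetch(query: str, lines: List[str], cutoff: float = 0.8) -> Dict[str, List[str]]:
    # Group reviews by normalized restaurant name (text before the first ". ")
    by_name: Dict[str, List[str]] = {}
    display: Dict[str, str] = {}
    for line in lines:
        raw_name = line.split(". ", 1)[0]
        key = normalize(raw_name)
        display.setdefault(key, raw_name)  # keep the first spelling seen
        by_name.setdefault(key, []).append(line)
    # Resolve aliases first, otherwise fall back to the normalized query
    target = ALIASES.get(query.strip(), normalize(query))
    # Pick the single most similar known name above the cutoff
    best = difflib.get_close_matches(target, list(by_name), n=1, cutoff=cutoff)
    if not best:
        return {}
    return {display[best[0]]: by_name[best[0]]}


sample = [
    "Panda Express. The food was awesome.",
    "McDonald's. Fast but bland.",
    "panda-express. Pretty good noodles.",
]
print(fuzzy_fetch("Panda Expres", sample))
print(fuzzy_fetch("熊猫快餐", sample))
```

Raising `cutoff` makes matching stricter; lowering it admits more typos but also more wrong restaurants.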