Powered by DeepSeek: How an Agent Can Precisely Mine FishBase to Answer Fish Questions

I recently spent some time learning about Agent development. Although I have only scratched the surface, I was itching to try building an Agent of my own.
An AI Agent is a software program that can interact with its environment, collect data, and use that data to perform tasks in pursuit of a specific goal. It is a computing paradigm grounded in artificial intelligence and automation, used to design and implement software or hardware components that can execute tasks autonomously, perceive their environment, and interact with other entities.
In this article, we use the DeepSeek large-language-model API together with two functions to automatically query FishBase for information about a given fish species, and then answer our questions based on the retrieved information.

Retrieving the relevant information from FishBase

We first fetch the relevant fish species information from FishBase, mainly with a small scraper:

import requests  
from bs4 import BeautifulSoup  
import pandas as pd  
import time  
import os  
import json  
  
os.makedirs('data', exist_ok=True)  # make sure the output directory exists  
  
pages = list(range(1, 333))  
  
def get_pages(page):  
    """Scrape one result page of the Chinese common-name list and save it as a CSV."""  
    url = f"https://fishbase.org/ComNames/ScriptList.php?resultPage={page}&script=Chinese"  
    headers = {  
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36'}  
    species = []  
    hrefs = []  
    try:  
        response = requests.get(url, headers=headers, timeout=30)  
        soup = BeautifulSoup(response.text, 'html.parser')  
        tbody = soup.find_all('tbody')[0]  
        rows = tbody.find_all('tr')  
        for row in rows:  
            columns = row.find_all('td')  
            # The second column holds the scientific name and a link whose  
            # query string ends with the FishBase species id.  
            second_td_content = columns[1].get_text(strip=True)  
            a_tag = columns[1].find('a')  
            link = a_tag.get('href').split('=')[1]  
            species.append(second_td_content)  
            hrefs.append(link)  
    except Exception:  
        # Log the failed page number, but still save whatever rows were parsed.  
        print(page)  
    data = pd.DataFrame({'species': species, 'hrefs': hrefs})  
    data.to_csv('data/' + str(page) + ".csv", index=False)  
  
for page in pages:  
    get_pages(page)  
    time.sleep(1)  # be polite to the server  
  
# Merge the per-page CSVs into a single species -> id mapping.  
D = []  
filenames = os.listdir('data')  
for filename in filenames:  
    data = pd.read_csv('data/' + filename)  
    D.append(data)  
D = pd.concat(D)  
D.drop_duplicates(inplace=True)  
D.reset_index(drop=True, inplace=True)  
D = {s: h for s, h in zip(D['species'].values.tolist(), D['hrefs'].values.tolist())}  
with open("data/species.json", "w") as f:  
    json.dump(D, f)
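FishBase occasionally drops connections during a 300-plus-page crawl, so a small retry wrapper around the request can make the scraper more robust. This is an optional sketch (the `fetch_with_retry` name is our own addition, not part of the original script); the network call is injected as a callable so the logic can be demonstrated offline:

```python
import time

def fetch_with_retry(fetch, retries=3, delay=2.0):
    """Call fetch() up to `retries` times, sleeping between attempts.

    `fetch` is any zero-argument callable (e.g. a lambda wrapping
    requests.get); the last exception is re-raised if all attempts fail.
    """
    last_err = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as e:
            last_err = e
            time.sleep(delay)
    raise last_err

# Offline demonstration with a stub that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated timeout")
    return "<html>ok</html>"

result = fetch_with_retry(flaky, retries=3, delay=0.0)
print(result)  # -> <html>ok</html>
```

In the real crawler you would pass something like `lambda: requests.get(url, headers=headers, timeout=30)` as the `fetch` argument.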

With this script, the scraped information about FishBase species is stored in the file species.json.
Each key-value pair in the JSON file has the form:

scientific (Latin) name of the species : its id on FishBase
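As a quick sanity check, the mapping can be round-tripped through JSON and queried like this. The snippet rebuilds a one-entry version of the mapping (the file name `species_demo.json` is just for this demo, to avoid touching the real file); the id 16239 for Aaptosyax grypus is the one that appears later in the agent's tool calls:

```python
import json

# A one-entry stand-in for the scraped mapping (scientific name -> FishBase id).
mapping = {"Aaptosyax grypus": "16239"}

# Round-trip through JSON exactly as the scraper does.
with open("species_demo.json", "w") as f:
    json.dump(mapping, f)

with open("species_demo.json", "r") as f:
    fishbase = json.load(f)

print(fishbase["Aaptosyax grypus"])  # -> 16239
```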

Tools

Next we need to write some tools for the Agent to call:

import requests  
from bs4 import BeautifulSoup  
import json

# Load the species -> id mapping produced in the previous step.
with open("data/species.json", "r") as f:  
    fishbase = json.load(f)

def get_id(species):  
    """  
    Look up the FishBase id of a species given its scientific (Latin) name.  
    :param species: the scientific name of the species, as a string  
    :return: the id of the species on FishBase  
    """    
    return fishbase[species]

def clean(text):  
    """Strip newlines, carriage returns, non-breaking spaces and tabs from scraped text."""  
    for ch in ("\n", "\r", "\xa0", "\t"):  
        text = text.replace(ch, "")  
    return text

def get_information(species_id):  
    """  
    Query FishBase for all kinds of information about a fish, given its FishBase id.  
    :param species_id: the FishBase id of a fish  
    :return: a dict of information about the fish, including its classification,  
             biology and life-history traits, as well as model-based estimates  
    """  
    information = {}  
    url = f"https://fishbase.org/summary/SpeciesSummary.php?id={species_id}&lang=English"  
    headers = {  
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36'  
        }  
    response = requests.get(url, headers=headers)  
    soup = BeautifulSoup(response.text, 'lxml')  
    target_div = soup.find('div', id='ss-main')  
    cata = target_div.find_all('div', class_='smallSpace', recursive=False)  
    information["Classification / Names"] = clean(cata[0].get_text())  
    information["Environment: milieu / climate zone / depth range / distribution range"] = clean(cata[1].get_text())  
    information["Distribution"] = clean(cata[2].get_text())  
    information["Size / Weight / Age"] = clean(cata[3].get_text()).replace("&nbsp", "")  
    information["Short description"] = clean(cata[4].get_text())  
    information["Biology"] = clean(cata[5].get_text())  
    information["Life cycle and mating behavior"] = clean(cata[6].get_text())  
    information["Human uses"] = clean(cata[8].get_text())  
    if information["Human uses"].startswith("FAO"):  
        information["Human uses"] = ""  
    # The last block is sometimes an empty placeholder; the model-based  
    # estimates then sit in the second-to-last block.  
    if cata[-1].get_text() == "\n":  
        information["Estimates based on models"] = clean(cata[-2].get_text())  
    else:  
        information["Estimates based on models"] = clean(cata[-1].get_text())  
    return information

We provide two functions:

  • get_id: get the FishBase id from the scientific (Latin) name
  • get_information: get the fish's information from its id
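Since the system prompt (below) asks the agent to say "I don't know" when a lookup fails, it can help to make the tools fail gracefully instead of raising a KeyError the agent cannot interpret. A minimal sketch of this idea (the `safe_get_id` name is ours, and the one-entry mapping stands in for the real species.json):

```python
# One-entry stand-in for the mapping loaded from species.json.
fishbase = {"Aaptosyax grypus": "16239"}

def safe_get_id(species):
    """Like get_id, but returns an explanatory string instead of raising
    a KeyError, so the agent can relay the failure to the user."""
    result = fishbase.get(species)
    if result is None:
        return f"No entry for '{species}' in the FishBase data."
    return result

print(safe_get_id("Aaptosyax grypus"))  # -> 16239
print(safe_get_id("Nonexistent fish"))  # -> No entry for 'Nonexistent fish' in the FishBase data.
```

Returning a readable message keeps the tool result inside the conversation, so the model can follow the instruction to admit it doesn't know rather than crashing the run.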

Building the Agent

# Use the DeepSeek API (OpenAI-compatible)  
from openai import OpenAI  
from swarm import Swarm, Agent

client = OpenAI(api_key="Your API key", base_url="https://api.deepseek.com")
swarm_client = Swarm(client)
# Build the agent
agent_fish = Agent(  
    name="fishbase",  
    model="deepseek-chat",  
    # "You are a fish biologist; we can ask you questions about fish."
    instructions="你是一个鱼类生物学家,我们可以问你关于鱼类的问题",  
    functions=[get_id, get_information],  
)
# System prompt: "Only answer using information retrieved from FishBase (returned
# as a dict). If it cannot be found, say you don't know; do not make things up;
# answer in Chinese. If the lookup fails or errors out, say you don't know."
system_message = "你只能通过fishbase中查询得到的信息来回答用户的问题,查询到的信息为字典类型,如果fishbase中的信息无法查询到,你可以说不知道,不要自己编造信息,要用中文回答用户问题。如果查询不到,或者查询过程中出错了,就说不知道"  
# User prompt: "Tell me about the distribution of the fish Aaptosyax grypus."
user_message = "请你告诉我一些关于Aaptosyax grypus的鱼类的分布"
response = swarm_client.run(agent_fish, [{"role": "system", "content": system_message}, {"role": "user", "content": user_message}])

# Inspect the result  
response

Let's take a look at the response:

Response(messages=[{'content': '', 'refusal': None, 'role': 'assistant', 'audio': None, 'function_call': None, 'tool_calls': [{'id': 'call_0_0e6fe3d8-c39e-4671-a463-0caf33ba6e31', 'function': {'arguments': '{"species":"Aaptosyax grypus"}', 'name': 'get_id'}, 'type': 'function', 'index': 0}], 'sender': 'fishbase'}, {'role': 'tool', 'tool_call_id': 'call_0_0e6fe3d8-c39e-4671-a463-0caf33ba6e31', 'tool_name': 'get_id', 'content': '16239'}, {'content': '', 'refusal': None, 'role': 'assistant', 'audio': None, 'function_call': None, 'tool_calls': [{'id': 'call_0_ff3bfc87-ff02-41c4-b356-237286d66087', 'function': {'arguments': '{"species_id":"16239"}', 'name': 'get_information'}, 'type': 'function', 'index': 0}], 'sender': 'fishbase'}, {'role': 'tool', 'tool_call_id': 'call_0_ff3bfc87-ff02-41c4-b356-237286d66087', 'tool_name': 'get_information', 'content': "{'Classification / Names': 'Teleostei (teleosts) > Cypriniformes (Carps) > Cyprinidae (Minnows or carps) > Cyprininae Etymology: Aaptosyax: Greek, aaptos =giant, terrible + Greek, ykis, yaina, yanis, syaina, syax and syakon = a kind of sole; but it is also related to yaina =hyena and wild pig;grypus: Named for its strongly curved jaws (Ref. 34719). ', 'Environment: milieu / climate zone / depth range / distribution range': 'Freshwater;  pelagic; potamodromous (Ref. 51243). Tropical (Ref. 71989)', 'Distribution': 'Asia:  Mekong River. ', 'Size / Weight / Age': 'Maturity: Lm? range ? - ? cm Max length : 130 cm SL male/unsexed; (Ref. 9497); max. published weight: 30.0 kg (Ref. 9497)', 'Short description': 'Well-developed adipose eye-lid, covering most of eye except pupil in large adults, less extensive in juveniles; presence of a large symphyseal knob in lower jaw fitting in a median notch in upper jaw (Ref. 43281).', 'Biology': 'Inhabits mainstreams of middle reaches in deep rocky rapids.  Juveniles occur in tributaries (Ref. 58784). A large fast-swimming predator, feeding on fish of the middle and the upper water levels. 
 Although most common along the Thai-Lao border at the mouth of the Mun River, its numbers have drastically decreased in recent years.  This is perhaps due to dam construction or excessive gill netting, to which active pursuit predators, like this species, are particularly vulnerable (Ref. 12693).  Undertakes upstream migration at the same time as Probarbus sp. in December-February (Ref. 37770) which may be related to spawning activity (Ref. 9497).  Attains over 30 kg (Ref. 9497).', 'Life cycle and mating behavior': '', 'Human uses': 'Fisheries: subsistence fisheries', 'Estimates based on models': 'Phylogenetic diversity index  (Ref. 82804):PD50 = 1.0000 [Uniqueness, from 0.5 = low to 2.0 = high].Bayesian length-weight: a=0.01122 (0.00521 - 0.02417), b=3.02 (2.85 - 3.19), in cm total length, based on LWR estimates for this (Sub)family-body shape (Ref. 93245).Trophic level  (Ref. 69278):4.5  ±0.80 se; based on food items.Resilience  (Ref. 120179):Very Low, minimum population doubling time more than 14 years (Preliminary K or Fecundity.).Fishing Vulnerability  (Ref. 59153):Very high vulnerability (90 of 100).Price category  (Ref. 80766):Unknown.'}"}, {'content': ' Aaptosyax grypus 是一种分布于亚洲湄公河的淡水鱼类。它主要栖息在湄公河中游的深水急流中,幼鱼则出现在支流中。这种鱼是一种大型的快速游动捕食者,以中层和上层水域的鱼类为食。尽管在泰国-老挝边境的湄公河入口处较为常见,但由于水坝建设和过度捕捞,其数量近年来急剧减少。这种鱼在12月至2月期间会进行上游迁徙,可能与产卵活动有关。Aaptosyax grypus 的最大体长可达130厘米,最大体重可达30公斤。', 'refusal': None, 'role': 'assistant', 'audio': None, 'function_call': None, 'tool_calls': None, 'sender': 'fishbase'}], agent=Agent(name='fishbase', model='deepseek-chat', instructions='你是一个鱼类生物学家,我们可以问你关于鱼类的问题', functions=[<function get_id at 0x0000013F1C0FF6A0>, <function get_information at 0x0000013F1B218E00>], tool_choice=None, parallel_tool_calls=True), context_variables={})

The answer is:
Aaptosyax grypus is a freshwater fish found in the Mekong River in Asia. It mainly inhabits the deep rapids of the middle reaches of the Mekong, while juveniles occur in tributaries. It is a large, fast-swimming predator that feeds on fish in the middle and upper water layers. Although relatively common at the mouth of the Mun River along the Thai-Lao border, its numbers have dropped sharply in recent years due to dam construction and overfishing. It undertakes an upstream migration between December and February, which may be related to spawning. Aaptosyax grypus can reach a maximum length of 130 cm and a maximum weight of 30 kg.

We can see that the agent indeed called the functions we provided, and determined their arguments automatically:

  • 'function': {'arguments': '{"species":"Aaptosyax grypus"}', 'name': 'get_id'}
  • 'function': {'arguments': '{"species_id":"16239"}', 'name': 'get_information'}
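Under the hood, each tool call like those above boils down to a plain Python call: the JSON arguments are parsed and dispatched to the matching function by name. A minimal sketch of that dispatch step (our own simplified version, not Swarm's actual implementation, with a one-entry stand-in for get_id):

```python
import json

# Stand-in for the real get_id tool, backed by a one-entry mapping.
def get_id(species):
    return {"Aaptosyax grypus": "16239"}.get(species, "unknown")

# Registry mapping tool names (as they appear in tool calls) to functions.
tools = {"get_id": get_id}

def dispatch(tool_call):
    """Resolve a tool-call dict of the shape seen in the response above."""
    fn = tools[tool_call["name"]]
    kwargs = json.loads(tool_call["arguments"])
    return fn(**kwargs)

call = {"name": "get_id", "arguments": '{"species":"Aaptosyax grypus"}'}
print(dispatch(call))  # -> 16239
```

The result of each dispatch is appended to the conversation as a `tool` message, which is how the id 16239 reached the second call, get_information, in the transcript above.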