AutoGen + Google Search for Automated Queries

This post introduces autogen-google-search, a Python library that combines Google Search with large language models such as gpt-3.5 for automated searching and information aggregation, addressing the problem of outdated model knowledge. The library includes a search function and a web-scraping function that can be called through an API for efficient research and report generation.


This library addresses the pain points of using Google Search from AutoGen. It can also be combined with LangChain or any other agent framework: just register the tool as one of the agent's functions/tools to get fully automated search and information aggregation, solving the problem of stale model knowledge.

Even with gpt-3.5 the results are surprisingly good.

Highly recommended. The package is available on PyPI:

https://pypi.org/project/autogen-google-search/0.0.5/
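For example, to plug it into a LangChain agent as a tool, a minimal sketch might look like the following. It assumes the google_search module shown in section 3 below and LangChain's classic agent API; the agent names and the query are illustrative only.

# Sketch: wrap the library's one-line research entry point as a LangChain tool.
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from autogen_product_withgoogle import google_search

search_tool = Tool(
    name="google_research",
    func=google_search.search,
    description="Run an automated Google-backed research report for a query",
)

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
agent = initialize_agent([search_tool], llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
agent.run("What is new in the AutoGen multi-agent framework?")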

1. Source code

The source code is as follows:

import os
from autogen import config_list_from_json
import autogen
import requests
from bs4 import BeautifulSoup
import json

from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain import PromptTemplate
import openai
from dotenv import load_dotenv
from google_prompt import REASEARCH_PROMPT
# Get API key
load_dotenv()
config_list3 = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": ["gpt-3.5-turbo"],
    },
)

# Define the Google search function (via the serper.dev API)
def search(query):
    url = "https://google.serper.dev/search"

    payload = json.dumps({
        "q": query
    })
    headers = {
        'X-API-KEY': 'xxxxx',  # your serper.dev API key
        'Content-Type': 'application/json'
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    return response.json()


def scrape(url: str):
    # Scrape the website at the given url; if the content is too long,
    # summarize it so it fits the model's context window.

    print("Scraping website...")
    # Define the headers for the request
    headers = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/json',
    }

    # Define the data to be sent in the request
    data = {
        "url": url
    }

    # Convert Python object to JSON string
    data_json = json.dumps(data)

    # Send the POST request to the browserless.io content API
    response = requests.post(
        "https://chrome.browserless.io/content?token=YOUR_BROWSERLESS_TOKEN", headers=headers, data=data_json)

    # Check the response status code
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        text = soup.get_text()
        print("CONTENTTTTTT:", text)
        if len(text) > 8000:
            output = summary(text)
            return output
        else:
            return text
    else:
        print(f"HTTP request failed with status code {response.status_code}")


def summary(content):
    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
    text_splitter = RecursiveCharacterTextSplitter(
        separators=["\n\n", "\n"], chunk_size=10000, chunk_overlap=500)
    docs = text_splitter.create_documents([content])
    map_prompt = """
    Write a detailed summary of the following text for a research purpose:
    "{text}"
    SUMMARY:
    """
    map_prompt_template = PromptTemplate(
        template=map_prompt, input_variables=["text"])

    summary_chain = load_summarize_chain(
        llm=llm,
        chain_type='map_reduce',
        map_prompt=map_prompt_template,
        combine_prompt=map_prompt_template,
        verbose=True
    )

    output = summary_chain.run(input_documents=docs)

    return output


def research(query):
    llm_config_researcher = {
        "functions": [
            {
                "name": "search",
                "description": "google search for relevant information",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "Google search query",
                        }
                    },
                    "required": ["query"],
                },
            },
            {
                "name": "scrape",
                "description": "Scraping website content based on url",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "url": {
                            "type": "string",
                            "description": "Website url to scrape",
                        }
                    },
                    "required": ["url"],
                },
            },
        ],
        "config_list": config_list3}

    researcher = autogen.AssistantAgent(
        name="researcher",
        # system_message="Research about a given query, collect as many information as possible, and generate detailed research results with loads of technique details with all reference links attached;Add TERMINATE to the end of the research report;",
        system_message=REASEARCH_PROMPT,
        llm_config=llm_config_researcher,
    )

    user_proxy = autogen.UserProxyAgent(
        name="User_proxy",
        code_execution_config={"last_n_messages": 2, "work_dir": "coding","use_docker": False,},
        # code_execution_config=False,
        is_termination_msg=lambda x: x.get("content", "") and x.get(
            "content", "").rstrip().endswith("TERMINATE"),
        human_input_mode="NEVER",
        function_map={
            "search": search,
            "scrape": scrape,
        }
    )

    user_proxy.initiate_chat(researcher, message=query, max_round=4)

    # set the receiver to be researcher, and get a summary of the research report
    user_proxy.stop_reply_at_receive(researcher)
    user_proxy.send(
        "Give me the research report that just generated again, return ONLY the report & reference links", researcher)

    # return the last message the expert received
    return user_proxy.last_message()["content"]
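
For a quick sanity check, the individual helpers above can also be exercised directly. A minimal sketch, assuming the serper.dev key and browserless token have been filled in, and assuming serper returns web results under its usual "organic" key (the exact response shape depends on the service):

if __name__ == "__main__":
    # Sketch only: exercise the helpers directly (the keys above must be configured).
    hits = search("AutoGen multi-agent framework")   # raw serper.dev JSON
    organic = hits.get("organic", [])                 # assumed key for web results
    if organic:
        print(scrape(organic[0]["link"]))             # plain text, summarized when very long
    print(research("Latest updates to the AutoGen multi-agent framework"))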

2. Dependencies

2.1 The OAI_CONFIG_LIST file

The file format is as follows:

[
    {
        "model": "gpt-3.5-turbo",
        "api_key": "sk-xxxxx"
    },
    {
        "model": "gpt-4",
        "api_key": "sk-xxxx",
        "base_url": "xxxxx"
    }
]
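
To run the researcher against the gpt-4 entry instead, only the filter passed to config_list_from_json needs to change. A small sketch, assuming the same OAI_CONFIG_LIST file sits in the working directory:

import autogen

# Sketch: select the gpt-4 entry from the same OAI_CONFIG_LIST file.
config_list_gpt4 = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model": ["gpt-4"]},
)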

2.2 The prompt file

The agent structure is fairly simple. You can adjust the agents' system_message to fit your own needs, or add more agents to handle more complex research tasks, for example appending a blog-writing agent that turns the search results into an article (see the sketch after the prompts below).

COMPLETION_PROMPT = "If everything looks good, respond with APPROVED"

REASEARCH_PROMPT = """
You are a specialist in online resource searching. You can search for resources and summarize them in a reasonable format for users based on various questions they pose. Research a given query, collect as much information as possible, and generate detailed research results with loads of technical details, all reference links attached. If product search is involved, please use the full name rather than abbreviations for related products and companies. Add "TERMINATE" to the end of the research report.
"""

3. Usage

Run:

pip install autogen-google-search==0.0.1

Test in a Python environment:

A single line of code does the whole job, and of course it can also serve as an external function for other agents; a sketch of that follows the example below.

from autogen_product_withgoogle import google_search

res = google_search.search("Find the latest information on multi-agent frameworks and write a 500-word blog post")
print(res)
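
To call it from another AutoGen agent rather than directly, the function can be registered like any other tool. A minimal sketch; the function schema, agent names, and query are illustrative, and the config list is loaded the same way as in section 1:

import autogen
from autogen_product_withgoogle import google_search

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={"model": ["gpt-3.5-turbo"]},
)

llm_config = {
    "config_list": config_list,
    "functions": [{
        "name": "google_research",
        "description": "Run an automated Google-backed research report for a query",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string", "description": "What to research"}},
            "required": ["query"],
        },
    }],
}

assistant = autogen.AssistantAgent(name="planner", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config=False,
    # Map the declared function onto the library's search entry point (sketch).
    function_map={"google_research": lambda query: google_search.search(query)},
)
user_proxy.initiate_chat(assistant, message="Summarize the newest AutoGen features in a short report")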

Sample output:

### How to combine AutoGen with RAG

#### Background

AutoGen is an application-development tool that can automatically generate conversation flows and quickly build web applications with Streamlit [^1]. RAG (retrieval-augmented generation) improves the quality and accuracy of generated content by bringing in an external knowledge base [^2]. Combining the two can significantly improve generation tasks, especially for natural-language understanding and generation in complex scenarios. The concrete approach is as follows:

---

#### Core integration ideas

Building on the respective strengths of GraphRAG and AutoGen agents, the integration can start from the following angles:

1. **Knowledge-augmented agents**
   Use GraphRAG as a knowledge source that supplies AutoGen with rich, structured data. This lets AutoGen draw on existing high-quality information during generation and reduces hallucinations caused by missing background knowledge [^3].

2. **Graph-aware validation**
   Integrate GraphRAG into AutoGen's internal validation logic and use the knowledge graph to fact-check generated content in real time. This improves the consistency and credibility of the final output and makes the whole system more robust.

3. **Continuous learning**
   Design a dynamic update strategy that lets AutoGen keep adjusting and optimizing its stored knowledge graph based on the latest input, which is essential for staying competitive in long-running environments.

---

#### Implementation outline

Rather than a traditional step-by-step recipe, the overall architecture can be sketched through its key components:

- Data layer: deploy UltraRAG or a similar open-source framework as the underlying infrastructure, responsible for managing large-scale document indexes and efficient queries.
- Control layer: write interfaces that let AutoGen call those resources seamlessly and trigger the retrieval modules when needed.
- Output layer: run the merged results through several rounds of filtering to make sure they meet the target users' requirements before being exposed.

---

```python
from autogen import AssistantAgent, UserProxyAgent
import graphrag_library as grl

def integrate_autogen_with_rag():
    # Initialize the GraphRAG instance
    knowledge_base = grl.GraphRAG()

    # Create the AutoGen agents
    assistant = AssistantAgent(name="Assistant", llm_config={"request_timeout": 60})
    user_proxy = UserProxyAgent(
        name="User",
        human_input_mode="TERMINATE",
        max_consecutive_auto_reply=5,
        code_execution_config={"work_dir": "coding"},
    )

    def query_knowledge(query_text):
        """Wrapper that sends a request to GraphRAG."""
        retrieved_data = knowledge_base.search(query=query_text)
        return retrieved_data

    # Register a callback so the control flow can step in when needed
    user_proxy.register_function(function_map={
        "search_kb": lambda q: query_knowledge(q),
    })

    conversation_history = []
    while True:
        message_from_user = input("Enter your question here:")
        response_generated_by_model = assistant.receive(message=message_from_user, sender=user_proxy)
        enriched_response = ""
        if isinstance(response_generated_by_model, str):
            potential_queries = extract_possible_search_terms(response_generated_by_model)
            additional_info = [query_knowledge(term) for term in potential_queries]
            combined_answer = merge_responses(original=response_generated_by_model, extra_infos=additional_info)
            enriched_response += f"{combined_answer}\n"
        print(enriched_response)

# Helper function definitions omitted...
```

---

#### Summary

The above shows that combining AutoGen with RAG is feasible and holds plenty of potential worth exploring. Note that a concrete implementation still has to weigh factors such as performance bottlenecks and cost-effectiveness.