Community Contribution Task: Supporting a New Search API in MindSearch

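We already deployed MindSearch to huggingface.co earlier, so the environment setup and the rest of the installation do not need to be repeated.

Only the latest lagent needs to be reinstalled.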

conda activate mindsearch
# Uninstall the old version of lagent
pip uninstall lagent -y
# Install the latest version of lagent
pip install git+https://github.com/InternLM/lagent.git
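
After reinstalling, it can be handy to confirm which lagent version the mindsearch environment actually sees. The snippet below is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration, and the version string reported for a git install depends on what the lagent repository declares.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints None when lagent is not installed in the active environment
print("lagent:", installed_version("lagent"))
```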

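Prerequisites

Create accounts

  • Get an API KEY from the Serper API page.

    If you do not have an account yet, register one on Serper first.
    A proxy/VPN is needed to reach the site.

    A newly created account must be verified by email first.

  • Create an API KEY on the SiliconFlow API page.

    If you do not have an account yet, register one on SiliconFlow first.
    Anyone who completed the MindSearch challenge should already have one.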

Running the GoogleSearch searcher on Codespaces

  • Next, set the API KEYs as environment variables in the Codespace

    conda activate mindsearch
    cd /workspaces/mindsearch/MindSearch
    export SILICON_API_KEY='<your SiliconFlow API KEY>'
    # The Serper key is passed through the BING_API_KEY variable in this setup
    export BING_API_KEY='<your Google Serper API KEY>'
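
A forgotten export is an easy mistake here, so a quick check before launching can save a confusing failure later. This is a minimal sketch, not part of MindSearch: the helper missing_env_vars is hypothetical, and the two variable names are simply the ones exported above.

```python
import os
from typing import Iterable, List, Mapping

# The two variables exported in the step above
REQUIRED_VARS = ("SILICON_API_KEY", "BING_API_KEY")


def missing_env_vars(names: Iterable[str], env: Mapping[str, str]) -> List[str]:
    """Return the names that are unset or blank in the given environment mapping."""
    return [name for name in names if not env.get(name, "").strip()]


# Report the result for the current shell environment without exiting
print("Missing variables:", missing_env_vars(REQUIRED_VARS, os.environ))
```

An empty list means both keys are visible to processes started from this terminal.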

Run MindSearch

  • Start the MindSearch backend

conda activate mindsearch
cd /workspaces/mindsearch/MindSearch
python -m mindsearch.app --lang cn --model_format internlm_silicon --search_engine GoogleSearch

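 If startup succeeds, the backend's startup output appears in the terminal.

  • Open a new terminal and run the MindSearch frontend (the streamlit version is used here)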

conda activate mindsearch
cd /workspaces/mindsearch/MindSearch
streamlit run frontend/mindsearch_streamlit.py

  • Codespaces forwards ports automatically, so just click the streamlit link shown in the terminal. (No need to open PowerShell and forward the port by hand.)
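
If the frontend shows connection errors right after startup, the backend may simply not be listening yet. The sketch below polls a TCP port until it accepts connections. It assumes the backend listens on localhost:8002, which is the address the Streamlit app in this post sends its requests to; wait_for_port is a hypothetical helper, not something MindSearch provides.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port("localhost", 8002) in the second terminal before starting streamlit returns True once the backend is reachable, or False after 30 seconds.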

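Deploying to HuggingFace (Streamlit + GoogleSearch searcher)

Create a new HF Space

  • Open the HF Spaces page and click Create new Space in the top-right corner

Prepare the files the HF Space needs

  • To use the GoogleSearch searcher on an HF Space, first build the lagent wheel by hand on the Codespace, then upload it to the HF Space together with the other files.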

conda activate mindsearch
cd /workspaces/mindsearch
git clone https://github.com/InternLM/lagent.git
cd lagent
git checkout b6bb4e0
# Build the lagent wheel
pip install wheel
python setup.py bdist_wheel

  • Clone the HF Space repository into the Codespace.

    conda activate mindsearch
    cd /workspaces/mindsearch
    git clone https://huggingface.co/spaces/<HF_username>/<HF_Space_repo_name>

  • Copy the required files into the HF Space repository.

    cp -r /workspaces/mindsearch/MindSearch/mindsearch /workspaces/mindsearch/<HF_Space_repo_name>
    cp /workspaces/mindsearch/MindSearch/requirements.txt /workspaces/mindsearch/<HF_Space_repo_name>
    cp /workspaces/mindsearch/lagent/dist/lagent-0.2.3-py3-none-any.whl /workspaces/mindsearch/<HF_Space_repo_name>
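
The wheel filename above hard-codes version 0.2.3, which only matches if the checked-out lagent commit builds exactly that version. As a hedged sketch, the helper below (find_wheel is a made-up name) looks up whatever lagent wheel actually landed in dist/, so the real filename can be confirmed before copying.

```python
from pathlib import Path


def find_wheel(dist_dir: str, prefix: str = "lagent-") -> Path:
    """Return the single wheel in dist_dir whose name starts with prefix."""
    wheels = sorted(Path(dist_dir).glob(f"{prefix}*.whl"))
    if len(wheels) != 1:
        # Zero wheels means the build did not run; several means stale builds
        raise FileNotFoundError(
            f"expected exactly one {prefix}*.whl in {dist_dir}, found {len(wheels)}")
    return wheels[0]
```

For example, find_wheel("/workspaces/mindsearch/lagent/dist").name gives the filename to use in the cp command and in the pip install line of app.py.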

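    Create an app.py to serve as the main entry point of the HF Space, and paste in the code below.

    cd /workspaces/mindsearch/<HF_Space_repo_name>
    touch app.py

    With a Space named MindSearch_Streamlit, as in this walkthrough, the equivalent command is:

    touch /workspaces/mindsearch/MindSearch_Streamlit/app.py
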
import json
import tempfile
import os
os.system("pip install lagent-0.2.3-py3-none-any.whl")
import requests
import streamlit as st
from lagent.schema import AgentStatusCode
from pyvis.network import Network

os.system("python -m mindsearch.app --lang cn --model_format internlm_silicon --search_engine GoogleSearch &")

# Function to create the network graph
def create_network_graph(nodes, adjacency_list):
    net = Network(height='500px',
                  width='60%',
                  bgcolor='white',
                  font_color='black')
    for node_id, node_data in nodes.items():
        if node_id in ['root', 'response']:
            title = node_data.get('content', node_id)
        else:
            title = node_data['detail']['content']
        net.add_node(node_id,
                     label=node_id,
                     title=title,
                     color='#FF5733',
                     size=25)
    for node_id, neighbors in adjacency_list.items():
        for neighbor in neighbors:
            if neighbor['name'] in nodes:
                net.add_edge(node_id, neighbor['name'])
    net.show_buttons(filter_=['physics'])
    return net

# Function to draw the graph and return the HTML file path
def draw_graph(net):
    # mkstemp creates the file atomically; tempfile.mktemp is deprecated
    fd, path = tempfile.mkstemp(suffix='.html')
    os.close(fd)
    net.save_graph(path)
    return path

def streaming(raw_response):
    for chunk in raw_response.iter_lines(chunk_size=8192,
                                         decode_unicode=False,
                                         delimiter=b'\n'):
        if chunk:
            decoded = chunk.decode('utf-8')
            if decoded == '\r':
                continue
            if decoded[:6] == 'data: ':
                decoded = decoded[6:]
            elif decoded.startswith(': ping - '):
                continue
            response = json.loads(decoded)
            yield (response['response'], response['current_node'])

# Initialize Streamlit session state
if 'queries' not in st.session_state:
    st.session_state['queries'] = []
    st.session_state['responses'] = []
    st.session_state['graphs_html'] = []
    st.session_state['nodes_list'] = []
    st.session_state['adjacency_list_list'] = []
    st.session_state['history'] = []
    st.session_state['already_used_keys'] = list()

# Set up page layout
st.set_page_config(layout='wide')
st.title('MindSearch-思索')

# Function to update chat
def update_chat(query):
    with st.chat_message('user'):
        st.write(query)
    if query not in st.session_state['queries']:
        # Mock data to simulate backend response
        # response, history, nodes, adjacency_list
        st.session_state['queries'].append(query)
        st.session_state['responses'].append([])
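        history = None
        # Defaults so the bookkeeping after the loop works even if no chunk is usable
        nodes, adjacency_list, graph_html = {}, {}, None
        # Multi-turn conversation is not supported yet
        message = [dict(role='user', content=query)]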

        url = 'http://localhost:8002/solve'
        headers = {'Content-Type': 'application/json'}
        data = {'inputs': message}
        raw_response = requests.post(url,
                                     headers=headers,
                                     data=json.dumps(data),
                                     timeout=20,
                                     stream=True)

        for resp in streaming(raw_response):
            agent_return, node_name = resp
            if node_name and node_name in ['root', 'response']:
                continue
            nodes = agent_return['nodes']
            adjacency_list = agent_return['adj']
            response = agent_return['response']
            history = agent_return['inner_steps']
            if nodes:
                net = create_network_graph(nodes, adjacency_list)
                graph_html_path = draw_graph(net)
                with open(graph_html_path, encoding='utf-8') as f:
                    graph_html = f.read()
            else:
                graph_html = None
            if 'graph_placeholder' not in st.session_state:
                st.session_state['graph_placeholder'] = st.empty()
            if 'expander_placeholder' not in st.session_state:
                st.session_state['expander_placeholder'] = st.empty()
            if graph_html:
                with st.session_state['expander_placeholder'].expander(
                        'Show Graph', expanded=False):
                    st.session_state['graph_placeholder']._html(graph_html,
                                                                height=500)
            if 'container_placeholder' not in st.session_state:
                st.session_state['container_placeholder'] = st.empty()
            with st.session_state['container_placeholder'].container():
                if 'columns_placeholder' not in st.session_state:
                    st.session_state['columns_placeholder'] = st.empty()
                col1, col2 = st.session_state['columns_placeholder'].columns(
                    [2, 1])
                with col1:
                    if 'planner_placeholder' not in st.session_state:
                        st.session_state['planner_placeholder'] = st.empty()
                    if 'session_info_temp' not in st.session_state:
                        st.session_state['session_info_temp'] = ''
                    if not node_name:
                        if agent_return['state'] in [
                                AgentStatusCode.STREAM_ING,
                                AgentStatusCode.ANSWER_ING
                        ]:
                            st.session_state['session_info_temp'] = response
                        elif agent_return[
                                'state'] == AgentStatusCode.PLUGIN_START:
                            thought = st.session_state[
                                'session_info_temp'].split('```')[0]
                            if agent_return['response'].startswith('```'):
                                st.session_state[
                                    'session_info_temp'] = thought + '\n' + response
                        elif agent_return[
                                'state'] == AgentStatusCode.PLUGIN_RETURN:
                            assert agent_return['inner_steps'][-1][
                                'role'] == 'environment'
                            st.session_state[
                                'session_info_temp'] += '\n' + agent_return[
                                    'inner_steps'][-1]['content']
                        st.session_state['planner_placeholder'].markdown(
                            st.session_state['session_info_temp'])
                        if agent_return[
                                'state'] == AgentStatusCode.PLUGIN_RETURN:
                            st.session_state['responses'][-1].append(
                                st.session_state['session_info_temp'])
                            st.session_state['session_info_temp'] = ''
                    else:
                        st.session_state['planner_placeholder'].markdown(
                            st.session_state['responses'][-1][-1] if
                            not st.session_state['session_info_temp'] else st.
                            session_state['session_info_temp'])
                with col2:
                    if 'selectbox_placeholder' not in st.session_state:
                        st.session_state['selectbox_placeholder'] = st.empty()
                    if 'searcher_placeholder' not in st.session_state:
                        st.session_state['searcher_placeholder'] = st.empty()
                    # st.session_state['searcher_placeholder'].markdown('')
                    if node_name:
                        selected_node_key = f"selected_node_{len(st.session_state['queries'])}_{node_name}"
                        if selected_node_key not in st.session_state:
                            st.session_state[selected_node_key] = node_name
                        if selected_node_key not in st.session_state[
                                'already_used_keys']:
                            selected_node = st.session_state[
                                'selectbox_placeholder'].selectbox(
                                    'Select a node:',
                                    list(nodes.keys()),
                                    key=f'key_{selected_node_key}',
                                    index=list(nodes.keys()).index(node_name))
                            st.session_state['already_used_keys'].append(
                                selected_node_key)
                        else:
                            selected_node = node_name
                        st.session_state[selected_node_key] = selected_node
                        if selected_node in nodes:
                            node = nodes[selected_node]
                            agent_return = node['detail']
                            node_info_key = f'{selected_node}_info'
                            if 'node_info_temp' not in st.session_state:
                                st.session_state[
                                    'node_info_temp'] = f'### {agent_return["content"]}'
                            if node_info_key not in st.session_state:
                                st.session_state[node_info_key] = []
                            if agent_return['state'] in [
                                    AgentStatusCode.STREAM_ING,
                                    AgentStatusCode.ANSWER_ING
                            ]:
                                st.session_state[
                                    'node_info_temp'] = agent_return[
                                        'response']
                            elif agent_return[
                                    'state'] == AgentStatusCode.PLUGIN_START:
                                thought = st.session_state[
                                    'node_info_temp'].split('```')[0]
                                if agent_return['response'].startswith('```'):
                                    st.session_state[
                                        'node_info_temp'] = thought + '\n' + agent_return[
                                            'response']
                            elif agent_return[
                                    'state'] == AgentStatusCode.PLUGIN_END:
                                thought = st.session_state[
                                    'node_info_temp'].split('```')[0]
                                if isinstance(agent_return['response'], dict):
                                    st.session_state[
                                        'node_info_temp'] = thought + '\n' + f'```json\n{json.dumps(agent_return["response"], ensure_ascii=False, indent=4)}\n```'  # noqa: E501
                            elif agent_return[
                                    'state'] == AgentStatusCode.PLUGIN_RETURN:
                                assert agent_return['inner_steps'][-1][
                                    'role'] == 'environment'
                                st.session_state[node_info_key].append(
                                    ('thought',
                                     st.session_state['node_info_temp']))
                                st.session_state[node_info_key].append(
                                    ('observation',
                                     agent_return['inner_steps'][-1]['content']
                                     ))
                            st.session_state['searcher_placeholder'].markdown(
                                st.session_state['node_info_temp'])
                            if agent_return['state'] == AgentStatusCode.END:
                                st.session_state[node_info_key].append(
                                    ('answer',
                                     st.session_state['node_info_temp']))
                                st.session_state['node_info_temp'] = ''
        if st.session_state['session_info_temp']:
            st.session_state['responses'][-1].append(
                st.session_state['session_info_temp'])
            st.session_state['session_info_temp'] = ''
        # st.session_state['responses'][-1] = '\n'.join(st.session_state['responses'][-1])
        st.session_state['graphs_html'].append(graph_html)
        st.session_state['nodes_list'].append(nodes)
        st.session_state['adjacency_list_list'].append(adjacency_list)
        st.session_state['history'] = history

def display_chat_history():
    for i, query in enumerate(st.session_state['queries'][-1:]):
        # with st.chat_message('assistant'):
        if st.session_state['graphs_html'][i]:
            with st.session_state['expander_placeholder'].expander(
                    'Show Graph', expanded=False):
                st.session_state['graph_placeholder']._html(
                    st.session_state['graphs_html'][i], height=500)
            with st.session_state['container_placeholder'].container():
                col1, col2 = st.session_state['columns_placeholder'].columns(
                    [2, 1])
                with col1:
                    st.session_state['planner_placeholder'].markdown(
                        st.session_state['responses'][-1][-1])
                with col2:
                    selected_node_key = st.session_state['already_used_keys'][
                        -1]
                    st.session_state['selectbox_placeholder'] = st.empty()
                    selected_node = st.session_state[
                        'selectbox_placeholder'].selectbox(
                            'Select a node:',
                            list(st.session_state['nodes_list'][i].keys()),
                            key=f'replay_key_{i}',
                            index=list(st.session_state['nodes_list'][i].keys(
                            )).index(st.session_state[selected_node_key]))
                    st.session_state[selected_node_key] = selected_node
                    if selected_node not in [
                            'root', 'response'
                    ] and selected_node in st.session_state['nodes_list'][i]:
                        node_info_key = f'{selected_node}_info'
                        for item in st.session_state[node_info_key]:
                            if item[0] in ['thought', 'answer']:
                                st.session_state[
                                    'searcher_placeholder'] = st.empty()
                                st.session_state[
                                    'searcher_placeholder'].markdown(item[1])
                            elif item[0] == 'observation':
                                st.session_state[
                                    'observation_expander'] = st.empty()
                                with st.session_state[
                                        'observation_expander'].expander(
                                            'Results'):
                                    st.write(item[1])
                        # st.session_state['searcher_placeholder'].markdown(st.session_state[node_info_key])

def clean_history():
    st.session_state['queries'] = []
    st.session_state['responses'] = []
    st.session_state['graphs_html'] = []
    st.session_state['nodes_list'] = []
    st.session_state['adjacency_list_list'] = []
    st.session_state['history'] = []
    st.session_state['already_used_keys'] = list()
    # Iterate over a snapshot of the keys so entries can be deleted safely
    for k in list(st.session_state.keys()):
        if k.endswith('placeholder') or k.endswith('_info'):
            del st.session_state[k]

# Main function to run the Streamlit app
def main():
    st.sidebar.title('Model Control')
    col1, col2 = st.columns([4, 1])
    with col1:
        user_input = st.chat_input('Enter your query:')
    with col2:
        if st.button('Clear History'):
            clean_history()
    if user_input:
        update_chat(user_input)
    display_chat_history()

if __name__ == '__main__':
    main()
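
The streaming() function in app.py is where the backend's server-sent events are decoded, and it is the first place to look when the page stays blank. The sketch below isolates that per-line logic so it can be tried without Streamlit or a running backend. It is a simplified variation, not the original: parse_sse_line is a hypothetical name, and it skips every blank or comment line rather than only the "\r" and ": ping - " cases.

```python
import json
from typing import Optional, Tuple


def parse_sse_line(raw: bytes) -> Optional[Tuple[dict, Optional[str]]]:
    """Decode one line of the backend stream; return None for lines to skip."""
    text = raw.decode("utf-8").strip()
    if not text or text.startswith(":"):
        # Blank keep-alives and ": ping - ..." comments carry no payload
        return None
    if text.startswith("data: "):
        text = text[len("data: "):]
    payload = json.loads(text)
    # Same pair that streaming() yields: the agent state and the active node
    return payload["response"], payload["current_node"]
```

Feeding it a captured line such as b'data: {"response": {"nodes": {}}, "current_node": null}' returns the response dict together with None for the node name.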

  • Delete the line git+https://github.com/InternLM/lagent.git from /workspaces/mindsearch/<HF_Space_repo_name>/requirements.txt. The final requirements.txt looks like this.

    duckduckgo_search==5.3.1b1
    einops
    fastapi
    
    gradio
    janus
    lmdeploy
    pyvis
    sse-starlette
    termcolor
    transformers==4.41.0
    uvicorn
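
    The same edit can be scripted instead of done by hand. This is a minimal sketch under the assumption that the only entry to drop is the one installing lagent from a git URL; drop_lagent_git is a hypothetical helper name.

```python
from typing import Iterable, List


def drop_lagent_git(lines: Iterable[str]) -> List[str]:
    """Remove blank lines and any requirement that installs lagent from a git URL."""
    kept = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        if entry.startswith("git+") and "lagent" in entry:
            # lagent is installed from the uploaded wheel instead
            continue
        kept.append(entry)
    return kept
```

    Reading requirements.txt with open(...).readlines(), passing the lines through drop_lagent_git, and writing the result back joined by newlines should produce the list shown above.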
    

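Upload the files to the HF Space

  • On the HF Access Tokens page, create a token with write permission and copy it.

  • Upload the prepared files to the HF Space. The token copied in the previous step is used for authorization.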

cd /workspaces/mindsearch/<HF_Space_repo_name>
git add .
git commit -m "create streamlit demo"
git remote set-url origin https://<HF_username>:<token_copied_in_previous_step>@huggingface.co/spaces/<HF_username>/<HF_Space_repo_name>
git push

 

This post came together with help from more experienced folks. Original article:

https://aicarrier.feishu.cn/wiki/DP3bwjkWPiGX38kKfFncVKxunMb

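Over the past couple of days another bug showed up, and it still occurred after deployment.

Solution

Open /workspaces/mindsearch/MindSearch_Streamlit/requirements.txt and add the pin named in the commit message below, class_registry==2.1.2, then commit and push: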

cd /workspaces/mindsearch/MindSearch_Streamlit
git add .

git commit -m "class_registry==2.1.2"

git push

 Then restart the Space.
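
For anyone who prefers to script the requirements change, here is a minimal sketch. It assumes the fix is the pin named in the commit message, class_registry==2.1.2, and it only recognises bare names or exact "==" pins; ensure_requirement is a hypothetical helper.

```python
from typing import List


def ensure_requirement(lines: List[str], requirement: str) -> List[str]:
    """Return the requirement lines with `requirement` present exactly once."""
    name = requirement.split("==")[0].strip().lower()
    kept = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        # Drop any existing bare or ==-pinned entry for the same package
        if entry.split("==")[0].strip().lower() == name:
            continue
        kept.append(entry)
    return kept + [requirement]
```

Running it twice gives the same result as running it once, so it is safe to reapply before each push.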

Reference:

ImportError: cannot import name 'AutoRegister' from 'class_registry' (/opt/conda/envs/mindsearch/lib/python3.10/site-packages/class_registry/__init__.py) · Issue #202 · InternLM/MindSearch (github.com)
