We have already deployed MindSearch to huggingface.co, so the environment setup and other installation steps do not need to be repeated.
Only the latest version of lagent needs to be reinstalled.
conda activate mindsearch
# Uninstall the old version of lagent
pip uninstall lagent -y
# Install the latest version of lagent
pip install git+https://github.com/InternLM/lagent.git
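After reinstalling, it is worth confirming which lagent version is actually active in the environment. A minimal, hedged check using only the standard library (the helper name is mine; it also degrades gracefully when the package is missing):

```python
# Report the installed version of a package, or "not installed" if absent.
from importlib.metadata import PackageNotFoundError, version


def installed_version(pkg: str) -> str:
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print("lagent:", installed_version("lagent"))
```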
Prerequisites
Create accounts
-
Get an API KEY on the Serper API page.
If you do not have an account yet, register one on Serper.
Note: accessing Serper requires a connection outside the firewall.
The first time you create a key, you will need to verify your email address.
-
Create an API KEY on the SiliconFlow API page.
If you do not have an account yet, register one on SiliconFlow.
Anyone who has completed the MindSearch challenge should already have one.
Run the GoogleSearch Searcher on Codespaces
-
Next, set the API KEYs as environment variables in the Codespace (note that the Serper key is exported under the name BING_API_KEY).
conda activate mindsearch
cd /workspaces/mindsearch/MindSearch
export SILICON_API_KEY='<your SILICONFLOW API KEY>'
export BING_API_KEY='<your GOOGLE SERPER API KEY>'
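Before starting the backend, it can help to verify that both variables are actually set, so that a missing key is caught early rather than at search time. A small sketch (the helper name is mine):

```python
import os


def missing_keys(required, env=None):
    """Return the names of required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys(["SILICON_API_KEY", "BING_API_KEY"])
    if missing:
        raise SystemExit(f"Missing environment variables: {missing}")
```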
Run MindSearch
-
Start the MindSearch backend
conda activate mindsearch
cd /workspaces/mindsearch/MindSearch
python -m mindsearch.app --lang cn --model_format internlm_silicon --search_engine GoogleSearch
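The backend serves a streaming /solve endpoint on port 8002, which the Streamlit frontend later in this post consumes as server-sent events. As a sanity check of the wire format, here is a sketch of decoding a single SSE line the same way the frontend's streaming() helper does (the function name is mine):

```python
import json


def parse_sse_line(decoded: str):
    """Decode one line from the /solve SSE stream.

    Returns the JSON payload as a dict, or None for keep-alive/ping lines.
    """
    if decoded == '\r' or decoded.startswith(': ping - '):
        return None
    if decoded.startswith('data: '):
        decoded = decoded[len('data: '):]
    return json.loads(decoded)
```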
If the startup succeeds, the terminal shows output like the following.
-
Open a new terminal and run the MindSearch frontend (the Streamlit version is used here).
conda activate mindsearch
cd /workspaces/mindsearch/MindSearch
streamlit run frontend/mindsearch_streamlit.py
-
Codespaces forwards ports automatically, so simply click the Streamlit link shown in the terminal. (There is no need to open PowerShell and forward the port manually.)
Deploy to HuggingFace (Streamlit + GoogleSearch Searcher)
Create a new HF Space
-
Open the HF Space page and click Create new Space in the upper-right corner.
Prepare the files needed for the HF Space
-
To use the GoogleSearch Searcher on an HF Space, you must first build the lagent wheel package manually in the Codespace and upload it to the HF Space along with the other files.
conda activate mindsearch
cd /workspaces/mindsearch
git clone https://github.com/InternLM/lagent.git
cd lagent
git checkout b6bb4e0
# Build the lagent wheel package
pip install wheel
python setup.py bdist_wheel
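The copy step below refers to the wheel by its exact filename (lagent-0.2.3-py3-none-any.whl in dist/). If your build produces a different version, the filename can be decoded with a small sketch (the helper name is mine; it follows the standard PEP 427 wheel naming convention):

```python
def parse_wheel_name(filename: str):
    """Split a wheel filename into (distribution, version).

    Wheel filenames follow PEP 427: dist-version-pytag-abitag-platform.whl
    """
    stem = filename[:-len(".whl")]
    dist, version = stem.split("-")[:2]
    return dist, version
```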
-
Clone the HF Space repository into the Codespace.
conda activate mindsearch
cd /workspaces/mindsearch
git clone https://huggingface.co/spaces/<HF_username>/<HF_Space_repo_name>
-
Copy the required files into the HF Space repository.
cp -r /workspaces/mindsearch/MindSearch/mindsearch /workspaces/mindsearch/<HF_Space_repo_name>
cp /workspaces/mindsearch/MindSearch/requirements.txt /workspaces/mindsearch/<HF_Space_repo_name>
cp /workspaces/mindsearch/lagent/dist/lagent-0.2.3-py3-none-any.whl /workspaces/mindsearch/<HF_Space_repo_name>
Create an app.py and paste the following code into it as the entry point of the HF Space (in this post, the Space repository is named MindSearch_Streamlit):
cd /workspaces/mindsearch/<HF_Space_repo_name>
touch app.py
import json
import tempfile
import os
os.system("pip install lagent-0.2.3-py3-none-any.whl")
import requests
import streamlit as st
from lagent.schema import AgentStatusCode
from pyvis.network import Network
os.system("python -m mindsearch.app --lang cn --model_format internlm_silicon --search_engine GoogleSearch &")
# Function to create the network graph
def create_network_graph(nodes, adjacency_list):
net = Network(height='500px',
width='60%',
bgcolor='white',
font_color='black')
for node_id, node_data in nodes.items():
if node_id in ['root', 'response']:
title = node_data.get('content', node_id)
else:
title = node_data['detail']['content']
net.add_node(node_id,
label=node_id,
title=title,
color='#FF5733',
size=25)
for node_id, neighbors in adjacency_list.items():
for neighbor in neighbors:
if neighbor['name'] in nodes:
net.add_edge(node_id, neighbor['name'])
net.show_buttons(filter_=['physics'])
return net
# Function to draw the graph and return the HTML file path
def draw_graph(net):
path = tempfile.mktemp(suffix='.html')
net.save_graph(path)
return path
def streaming(raw_response):
for chunk in raw_response.iter_lines(chunk_size=8192,
decode_unicode=False,
delimiter=b'\n'):
if chunk:
decoded = chunk.decode('utf-8')
if decoded == '\r':
continue
if decoded[:6] == 'data: ':
decoded = decoded[6:]
elif decoded.startswith(': ping - '):
continue
response = json.loads(decoded)
yield (response['response'], response['current_node'])
# Initialize Streamlit session state
if 'queries' not in st.session_state:
st.session_state['queries'] = []
st.session_state['responses'] = []
st.session_state['graphs_html'] = []
st.session_state['nodes_list'] = []
st.session_state['adjacency_list_list'] = []
st.session_state['history'] = []
st.session_state['already_used_keys'] = list()
# Set up page layout
st.set_page_config(layout='wide')
st.title('MindSearch-思索')
# Function to update chat
def update_chat(query):
with st.chat_message('user'):
st.write(query)
if query not in st.session_state['queries']:
# Mock data to simulate backend response
# response, history, nodes, adjacency_list
st.session_state['queries'].append(query)
st.session_state['responses'].append([])
history = None
# Multi-turn conversation is not supported yet
message = [dict(role='user', content=query)]
url = 'http://localhost:8002/solve'
headers = {'Content-Type': 'application/json'}
data = {'inputs': message}
raw_response = requests.post(url,
headers=headers,
data=json.dumps(data),
timeout=20,
stream=True)
for resp in streaming(raw_response):
agent_return, node_name = resp
if node_name and node_name in ['root', 'response']:
continue
nodes = agent_return['nodes']
adjacency_list = agent_return['adj']
response = agent_return['response']
history = agent_return['inner_steps']
if nodes:
net = create_network_graph(nodes, adjacency_list)
graph_html_path = draw_graph(net)
with open(graph_html_path, encoding='utf-8') as f:
graph_html = f.read()
else:
graph_html = None
if 'graph_placeholder' not in st.session_state:
st.session_state['graph_placeholder'] = st.empty()
if 'expander_placeholder' not in st.session_state:
st.session_state['expander_placeholder'] = st.empty()
if graph_html:
with st.session_state['expander_placeholder'].expander(
'Show Graph', expanded=False):
st.session_state['graph_placeholder']._html(graph_html,
height=500)
if 'container_placeholder' not in st.session_state:
st.session_state['container_placeholder'] = st.empty()
with st.session_state['container_placeholder'].container():
if 'columns_placeholder' not in st.session_state:
st.session_state['columns_placeholder'] = st.empty()
col1, col2 = st.session_state['columns_placeholder'].columns(
[2, 1])
with col1:
if 'planner_placeholder' not in st.session_state:
st.session_state['planner_placeholder'] = st.empty()
if 'session_info_temp' not in st.session_state:
st.session_state['session_info_temp'] = ''
if not node_name:
if agent_return['state'] in [
AgentStatusCode.STREAM_ING,
AgentStatusCode.ANSWER_ING
]:
st.session_state['session_info_temp'] = response
elif agent_return[
'state'] == AgentStatusCode.PLUGIN_START:
thought = st.session_state[
'session_info_temp'].split('```')[0]
if agent_return['response'].startswith('```'):
st.session_state[
'session_info_temp'] = thought + '\n' + response
elif agent_return[
'state'] == AgentStatusCode.PLUGIN_RETURN:
assert agent_return['inner_steps'][-1][
'role'] == 'environment'
st.session_state[
'session_info_temp'] += '\n' + agent_return[
'inner_steps'][-1]['content']
st.session_state['planner_placeholder'].markdown(
st.session_state['session_info_temp'])
if agent_return[
'state'] == AgentStatusCode.PLUGIN_RETURN:
st.session_state['responses'][-1].append(
st.session_state['session_info_temp'])
st.session_state['session_info_temp'] = ''
else:
st.session_state['planner_placeholder'].markdown(
st.session_state['responses'][-1][-1] if
not st.session_state['session_info_temp'] else st.
session_state['session_info_temp'])
with col2:
if 'selectbox_placeholder' not in st.session_state:
st.session_state['selectbox_placeholder'] = st.empty()
if 'searcher_placeholder' not in st.session_state:
st.session_state['searcher_placeholder'] = st.empty()
# st.session_state['searcher_placeholder'].markdown('')
if node_name:
selected_node_key = f"selected_node_{len(st.session_state['queries'])}_{node_name}"
if selected_node_key not in st.session_state:
st.session_state[selected_node_key] = node_name
if selected_node_key not in st.session_state[
'already_used_keys']:
selected_node = st.session_state[
'selectbox_placeholder'].selectbox(
'Select a node:',
list(nodes.keys()),
key=f'key_{selected_node_key}',
index=list(nodes.keys()).index(node_name))
st.session_state['already_used_keys'].append(
selected_node_key)
else:
selected_node = node_name
st.session_state[selected_node_key] = selected_node
if selected_node in nodes:
node = nodes[selected_node]
agent_return = node['detail']
node_info_key = f'{selected_node}_info'
if 'node_info_temp' not in st.session_state:
st.session_state[
'node_info_temp'] = f'### {agent_return["content"]}'
if node_info_key not in st.session_state:
st.session_state[node_info_key] = []
if agent_return['state'] in [
AgentStatusCode.STREAM_ING,
AgentStatusCode.ANSWER_ING
]:
st.session_state[
'node_info_temp'] = agent_return[
'response']
elif agent_return[
'state'] == AgentStatusCode.PLUGIN_START:
thought = st.session_state[
'node_info_temp'].split('```')[0]
if agent_return['response'].startswith('```'):
st.session_state[
'node_info_temp'] = thought + '\n' + agent_return[
'response']
elif agent_return[
'state'] == AgentStatusCode.PLUGIN_END:
thought = st.session_state[
'node_info_temp'].split('```')[0]
if isinstance(agent_return['response'], dict):
st.session_state[
'node_info_temp'] = thought + '\n' + f'```json\n{json.dumps(agent_return["response"], ensure_ascii=False, indent=4)}\n```' # noqa: E501
elif agent_return[
'state'] == AgentStatusCode.PLUGIN_RETURN:
assert agent_return['inner_steps'][-1][
'role'] == 'environment'
st.session_state[node_info_key].append(
('thought',
st.session_state['node_info_temp']))
st.session_state[node_info_key].append(
('observation',
agent_return['inner_steps'][-1]['content']
))
st.session_state['searcher_placeholder'].markdown(
st.session_state['node_info_temp'])
if agent_return['state'] == AgentStatusCode.END:
st.session_state[node_info_key].append(
('answer',
st.session_state['node_info_temp']))
st.session_state['node_info_temp'] = ''
if st.session_state['session_info_temp']:
st.session_state['responses'][-1].append(
st.session_state['session_info_temp'])
st.session_state['session_info_temp'] = ''
# st.session_state['responses'][-1] = '\n'.join(st.session_state['responses'][-1])
st.session_state['graphs_html'].append(graph_html)
st.session_state['nodes_list'].append(nodes)
st.session_state['adjacency_list_list'].append(adjacency_list)
st.session_state['history'] = history
def display_chat_history():
for i, query in enumerate(st.session_state['queries'][-1:]):
# with st.chat_message('assistant'):
if st.session_state['graphs_html'][i]:
with st.session_state['expander_placeholder'].expander(
'Show Graph', expanded=False):
st.session_state['graph_placeholder']._html(
st.session_state['graphs_html'][i], height=500)
with st.session_state['container_placeholder'].container():
col1, col2 = st.session_state['columns_placeholder'].columns(
[2, 1])
with col1:
st.session_state['planner_placeholder'].markdown(
st.session_state['responses'][-1][-1])
with col2:
selected_node_key = st.session_state['already_used_keys'][
-1]
st.session_state['selectbox_placeholder'] = st.empty()
selected_node = st.session_state[
'selectbox_placeholder'].selectbox(
'Select a node:',
list(st.session_state['nodes_list'][i].keys()),
key=f'replay_key_{i}',
index=list(st.session_state['nodes_list'][i].keys(
)).index(st.session_state[selected_node_key]))
st.session_state[selected_node_key] = selected_node
if selected_node not in [
'root', 'response'
] and selected_node in st.session_state['nodes_list'][i]:
node_info_key = f'{selected_node}_info'
for item in st.session_state[node_info_key]:
if item[0] in ['thought', 'answer']:
st.session_state[
'searcher_placeholder'] = st.empty()
st.session_state[
'searcher_placeholder'].markdown(item[1])
elif item[0] == 'observation':
st.session_state[
'observation_expander'] = st.empty()
with st.session_state[
'observation_expander'].expander(
'Results'):
st.write(item[1])
# st.session_state['searcher_placeholder'].markdown(st.session_state[node_info_key])
def clean_history():
st.session_state['queries'] = []
st.session_state['responses'] = []
st.session_state['graphs_html'] = []
st.session_state['nodes_list'] = []
st.session_state['adjacency_list_list'] = []
st.session_state['history'] = []
st.session_state['already_used_keys'] = list()
for k in st.session_state:
if k.endswith('placeholder') or k.endswith('_info'):
del st.session_state[k]
# Main function to run the Streamlit app
def main():
st.sidebar.title('Model Control')
col1, col2 = st.columns([4, 1])
with col1:
user_input = st.chat_input('Enter your query:')
with col2:
if st.button('Clear History'):
clean_history()
if user_input:
update_chat(user_input)
display_chat_history()
if __name__ == '__main__':
main()
-
Delete the git+https://github.com/InternLM/lagent.git line from
/workspaces/mindsearch/<HF_Space_repo_name>/requirements.txt
so that the final requirements.txt contains:
duckduckgo_search==5.3.1b1
einops
fastapi
gradio
janus
lmdeploy
pyvis
sse-starlette
termcolor
transformers==4.41.0
uvicorn
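Stripping the git+ line can also be done programmatically rather than by hand; a minimal sketch (the helper name is mine):

```python
def strip_git_requirements(lines):
    """Drop any requirement line that installs from a git URL (git+...)."""
    return [line for line in lines if not line.strip().startswith("git+")]


if __name__ == "__main__":
    path = "requirements.txt"
    with open(path) as f:
        kept = strip_git_requirements(f.read().splitlines())
    with open(path, "w") as f:
        f.write("\n".join(kept) + "\n")
```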
Upload the files to the HF Space
-
On the HF Access Tokens page, create a token with write permission and copy it.
-
Push the prepared files to the HF Space, authorizing with the token copied in the previous step.
-
cd /workspaces/mindsearch/<HF_Space_repo_name>
git add .
git commit -m "create streamlit demo"
git remote set-url origin https://<HF_username>:<token copied above>@huggingface.co/spaces/<HF_username>/<HF_Space_repo_name>
git push
This article was only possible thanks to guidance from the experts. Original link:
https://aicarrier.feishu.cn/wiki/DP3bwjkWPiGX38kKfFncVKxunMb
Over the last couple of days a bug appeared again while using the deployment: it still failed even after deploying.
Solution
Open /workspaces/mindsearch/MindSearch_Streamlit/requirements.txt and add the pin class_registry==2.1.2, then commit and push:
cd /workspaces/mindsearch/MindSearch_Streamlit
git add .
git commit -m "class_registry==2.1.2"
git push
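The fix boils down to making sure the class_registry==2.1.2 pin is present in requirements.txt. An idempotent sketch of that edit (the helper name is mine):

```python
def ensure_pin(lines, pin):
    """Append a version pin unless the package is already listed."""
    pkg = pin.split("==")[0]
    if any(line.strip().split("==")[0] == pkg for line in lines):
        return list(lines)
    return list(lines) + [pin]


if __name__ == "__main__":
    path = "requirements.txt"
    with open(path) as f:
        lines = f.read().splitlines()
    with open(path, "w") as f:
        f.write("\n".join(ensure_pin(lines, "class_registry==2.1.2")) + "\n")
```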
Restart the Space.
References: