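Background and Goals
Before the Dify agent platform is put into production use, stress testing is an indispensable step. By simulating extreme conditions, the test evaluates how the platform performs under high-concurrency requests and large numbers of simultaneous users, and verifies its stability and reliability.
This stress test has the following goals:
- Measure the TPS of a minimal 8-core / 16 GB deployment
- Measure the TPS achievable with 16 cores / 32 GB and find the best deployment architecture for it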
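Test Scenarios
Scenario 1: simple Chatflow
Stress test the chat-messages endpoint of the simplest possible Dify Chatflow app
Scenario 2: complex Chatflow
Stress test the chat-messages endpoint of a Dify app that involves all services
Scenario 3: file retrieval
Stress test Dify's retrieve endpoint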
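Test Tool
The tool used for this stress test is Locust, an open-source performance/load testing tool for HTTP and other protocols whose test scripts are written in Python.
Why this tool: Dify's chat-messages endpoint returns a streaming response. Our research showed that, compared with other tools (such as JMeter, k6 and wrk), Locust can call a streaming endpoint simply by setting a parameter in the test script, so we chose Locust for this stress test.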
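Test Preparation
Configuring Dify
Simple Chatflow scenario
Create a new app of type Chatflow whose orchestration contains only two nodes, Start and Direct Reply, and set any reply content in the Direct Reply node
Complex Chatflow scenario
Create a new app of type Chatflow whose orchestration contains nodes that exercise all of the related services
File retrieval scenario
Create a new knowledge base with Rerank left unchecked, and upload one file to it that contains text and related images
Installing Locust
Make sure Python is installed on the load-generating machine, then install Locust with the following command: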
pip install locust
Writing the Locust Script
The script is as follows:
from locust import HttpUser, TaskSet, task, between
import time


class ChatMessages(TaskSet):
    @task
    def chat_messages(self):
        url = "/chat-messages"
        headers = {
            "Authorization": "Bearer app-vUKfsDmFlRJCLkQcUSniuUYX",
            "Content-Type": "application/json",
        }
        payload = {
            "inputs": {},
            "query": "压测",
            "response_mode": "streaming",
            "user": "压测"
        }
        # Record the start time
        start_time = time.time()
        # Send the POST request; stream=True leaves the body unread so it can be consumed line by line
        with self.client.post(url, json=payload, headers=headers, stream=True, catch_response=True, timeout=60) as response:
            # The try block sits inside the with block so that response is always bound
            # and a failure is marked before the context manager exits
            try:
                if response.status_code != 200:
                    response.failure(f"Unexpected status code: {response.status_code}")
                    return
                # Record the time to first byte (TTFB)
                ttfb = time.time() - start_time
                print(f"TTFB: {ttfb:.2f}s")
                # Read the response line by line
                chunk_count = 0
                for line in response.iter_lines(decode_unicode=True):
                    if line.startswith("data:"):
                        chunk_count += 1
                        total_time = time.time() - start_time
                        print(f"Received chunk #{chunk_count}: {line} :{total_time:.2f}s")
                        if "message_end" in line:
                            total_time = time.time() - start_time
                            print(f"Total response time: {total_time:.2f}s")
                            break
            except Exception as e:
                response.failure(f"Request failed: {str(e)}")


class Retrieve(TaskSet):
    @task
    def retrieve(self):
        url = "/datasets/66fb8951-bdfb-457c-8267-66b9e822a4ee/retrieve"
        headers = {
            "Authorization": "Bearer dataset-pqrDBWoy9UILq7zbHnCkN3dY",
            "Content-Type": "application/json",
        }
        payload = {
            "query": "流程审批是什么,如果有图片,请一起返回",
            "retrieval_model": {
                "search_method": "hybrid_search",
                "reranking_enable": False,
                "reranking_mode": None,
                "reranking_model": {
                    "reranking_provider_name": "",
                    "reranking_model_name": ""
                },
                "weights": None,
                "score_threshold_enabled": False,
                "score_threshold": None
            }
        }
        # Record the start time
        start_time = time.time()
        try:
            # Send the POST request
            response = self.client.post(url, json=payload, headers=headers, timeout=60)
            print(response.text)
            if response.status_code != 200:
                print(f"Unexpected status code: {response.status_code}")
                return
            # Record the elapsed request time (printed under the TTFB label)
            ttfb = time.time() - start_time
            print(f"TTFB: {ttfb:.2f}s")
        except Exception as e:
            print(f"Request failed: {str(e)}")


class ChatMessagesTest(HttpUser):
    # Declare which task set this user runs
    tasks = [ChatMessages]
    # Wait time between tasks during the run
    # wait_time = between(1, 2)

    # Per-user stop hook, equivalent to teardown in pytest or unittest
    def on_stop(self):
        self.client.close()


class RetrieveTest(HttpUser):
    # Declare which task set this user runs
    tasks = [Retrieve]
    # Wait time between tasks during the run
    # wait_time = between(1, 2)

    # Per-user stop hook, equivalent to teardown in pytest or unittest
    def on_stop(self):
        self.client.close()
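The script detects the end of a stream by looking for the substring message_end anywhere in a line, which would also match if that text happened to appear inside an answer. A stricter variant, sketched below, parses each data: line as JSON and reads its event field. The helper name parse_sse_event is hypothetical, and the sketch assumes that every chunk is a single `data: {...}` line carrying an `event` key; this should be checked against the responses of the Dify version under test.

```python
import json
from typing import Optional


def parse_sse_event(line: str) -> Optional[str]:
    """Return the `event` value of one SSE `data:` line, or None.

    Hypothetical helper: assumes the payload after `data:` is a JSON object.
    """
    if not line.startswith("data:"):
        return None  # blank keep-alive lines, comments, other SSE fields
    payload = line[len("data:"):].strip()
    try:
        body = json.loads(payload)
    except json.JSONDecodeError:
        return None  # payload is not JSON, ignore it
    if not isinstance(body, dict):
        return None
    return body.get("event")


# Sample lines made up for illustration, not captured from a real run.
sample = [
    'data: {"event": "message", "answer": "message_end is just text here"}',
    "",
    'data: {"event": "message_end"}',
]
events = [parse_sse_event(line) for line in sample]
print(events)
```

In the task, the loop could then break when `parse_sse_event(line) == "message_end"` instead of testing for the substring.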
Test Procedure
Starting Locust
Start Locust with the following command (test.py is the test script from the previous section):
locust -f test.py
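Running the test
Open the Locust web page in a browser, enter the parameters, and start the test (concurrency levels of 50, 100 and 150 in turn, each run lasting 5 minutes)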
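The same runs can also be started without the web UI. The sketch below only builds the command lines for the three concurrency levels and does not execute them. The spawn rate of 10 users per second and the host URL are placeholders that this report does not specify, and build_locust_cmd is a hypothetical helper. Because test.py defines two user classes, the class to run is passed as a positional argument.

```python
import shlex


def build_locust_cmd(users, host, user_class, spawn_rate=10, run_time="5m", locustfile="test.py"):
    """Build a headless Locust command line as a list of arguments."""
    return [
        "locust", "-f", locustfile,
        "--headless",            # run without the web UI
        "-u", str(users),        # number of concurrent users
        "-r", str(spawn_rate),   # users started per second (assumed value)
        "-t", run_time,          # stop after this run time
        "--host", host,
        user_class,              # run only this user class from the locustfile
    ]


# Placeholder host; replace with the address of the Dify API under test.
HOST = "http://dify.example.com/v1"

commands = [build_locust_cmd(u, HOST, "ChatMessagesTest") for u in (50, 100, 150)]
for cmd in commands:
    print(shlex.join(cmd))
```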
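Metric-driven tuning
During the test, check whether the performance metrics of Dify's dify-api, dify-plugin-daemon, xinference and ollama services are abnormal. After adjusting for any abnormal metric, repeat the test to determine which metrics are Dify's performance bottlenecks. The metrics monitored are:
CPU usage | Memory usage | Network I/O | Disk I/O | Thread count |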
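Resource and Configuration Tuning (8 cores / 16 GB)
Agent configuration:
Concurrency: 100
Test duration: 3 minutes
First test run
Service deployment details:
Service | CPU (cores) | Memory (MB) |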
xinference | 1 | 2048 |
ollama | 1 | 2048 |
dify-sandbox | 0.8 | 1024 |
dify-nginx | 0.5 | 1024 |
dify-web | 0.5 | 1024 |
dify-ssrf | 0.3 | 300 |
dify-plugin-daemon | 0.8 | 2772 |
dify-worker | 0.8 | 1024 |
dify-api | 0.8 | 2048 |
dify-weaviate | 0.5 | 1024 |
dify-redis | 0.5 | 1024 |
dify-postgres | 0.5 | 1024 |
Total | 8 | 16384 |
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 5319 | 0 | 2200 | 4100 | 9000 | 2262.98 | 37 | 10843 | 0 | 21.9 | 0 |
During the test, the CPU of the dify-api service was fully saturated.
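As far as we understand, the Current RPS column in the Locust table is a reading taken over a short recent window rather than an average over the whole run. As a rough cross-check, and assuming the run lasted exactly the configured 3 minutes, the average throughput can be derived from the request count in the table above:

```python
def average_rps(total_requests: int, duration_seconds: float) -> float:
    """Average requests per second over a whole run."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return total_requests / duration_seconds


# 5319 requests come from the results table; 180 s assumes a run of exactly 3 minutes.
avg = average_rps(5319, 3 * 60)
print(f"average RPS: {avg:.1f}")
```

The two figures need not agree, since ramp-up time and the actual run length both affect the average.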
Second test run
Increase the CPU of the dify-api service. The configuration is as follows:
Service | CPU (cores) | Memory (MB) |
xinference | 1 | 2048 |
ollama | 1 | 2048 |
dify-sandbox | 0.8 | 1024 |
dify-nginx | 0.5 | 1024 |
dify-web | 0.5 | 1024 |
dify-ssrf | 0.3 | 300 |
dify-plugin-daemon | 0.8 | 2772 |
dify-worker | 0.8 | 1024 |
dify-api | 1 | 2048 |
dify-weaviate | 0.5 | 1024 |
dify-redis | 0.5 | 1024 |
dify-postgres | 0.5 | 1024 |
Total | 8.2 | 16384 |
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 5932 | 0 | 1500 | 3400 | 7900 | 1636.67 | 40 | 13931 | 0 | 31.2 | 0 |
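RPS rose from 21.9 to 31.2, which shows that adding CPU to dify-api raises RPS.
Third test run
Increase the CPU of dify-api again, and rebalance the other services so that the whole deployment fits the 8-core / 16 GB limit. The configuration is as follows:
Service | CPU (cores) | Memory (MB) | Environment variables |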
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 0.5 | 1024 | |
Total | 8 | 16384 | |
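The totals row above can be re-derived with a few lines of arithmetic. The sketch below sums the per-service allocations of this run and checks them against the 8-core / 16 GB limit; applied to the second run, whose CPU total is 8.2 cores, the same check would fail.

```python
# (CPU cores, memory in MB) per service, copied from the table above.
ALLOCATION = {
    "xinference": (0.5, 2048),
    "ollama": (0.5, 2048),
    "dify-sandbox": (0.6, 1024),
    "dify-nginx": (0.5, 1024),
    "dify-web": (0.5, 1024),
    "dify-ssrf": (0.3, 300),
    "dify-plugin-daemon": (0.8, 2772),
    "dify-worker": (0.8, 1024),
    "dify-api": (2, 2048),
    "dify-weaviate": (0.5, 1024),
    "dify-redis": (0.5, 1024),
    "dify-postgres": (0.5, 1024),
}

CPU_LIMIT = 8          # cores
MEMORY_LIMIT = 16384   # MB, i.e. 16 GB


def totals(allocation):
    """Return (total CPU cores, total memory in MB) for an allocation table."""
    cpu = round(sum(c for c, _ in allocation.values()), 2)  # round away float noise
    mem = sum(m for _, m in allocation.values())
    return cpu, mem


cpu, mem = totals(ALLOCATION)
within_budget = cpu <= CPU_LIMIT and mem <= MEMORY_LIMIT
print(cpu, mem, within_budget)
```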
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 6741 | 0 | 1600 | 4700 | 9000 | 1675.86 | 39 | 9897 | 0 | 49.5 | 0 |
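RPS rose to 49.5.
Fourth test run
Under the 8-core / 16 GB limit there was no room left to reallocate resources. During testing we also found that the endpoint ultimately reads and writes its data through PostgreSQL, so we tuned the PostgreSQL configuration as follows:
# Memory settings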
shared_buffers = 256MB
work_mem = 16MB
# Concurrent connections
max_connections = 200
# Checkpoints
checkpoint_timeout = 15min
# Log slow statements
log_min_duration_statement = 200ms
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 6688 | 0 | 1700 | 4600 | 13000 | 1708.36 | 40 | 18824 | 0 | 54.1 | 0 |
RPS did not improve significantly.
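To put the size of each step in numbers, the relative change in Current RPS between consecutive runs can be computed from the four result tables so far. This is plain arithmetic on the reported figures; the two CPU increases gave gains of roughly 42% and 59%, while the PostgreSQL configuration change gave roughly 9%.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100.0


# Current RPS reported for the first four runs.
rps_by_run = [21.9, 31.2, 49.5, 54.1]

changes = [relative_change(a, b) for a, b in zip(rps_by_run, rps_by_run[1:])]
for run, pct in enumerate(changes, start=2):
    print(f"run {run}: {pct:+.1f}% vs previous run")
```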
Fifth test run
Give PostgreSQL more resources. The configuration is as follows:
Service | CPU (cores) | Memory (MB) | Environment variables |
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 1 | 2048 | |
Total | 8.5 | 17408 | |
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 7748 | 0 | 1500 | 3900 | 6700 | 1437.3 | 41 | 7148 | 0 | 48.4 | 0 |
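RPS did not improve significantly. The PostgreSQL slow log showed a large number of slow queries and slow transactions.
The logged query is one that the endpoint under test executes. The table schema confirms that the query can hit an index, and the table contains no data, so we used the nfsiostat command to check the performance of the NFS volume in K8s: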
192.168.83.248:/mnt/nfs_share/dify-postgres mounted on /var/lib/kubelet/pods/4e22098e-0df1-4a90-9126-bf91e937772f/volumes/kubernetes.io~nfs/postgres-data:
op/s rpc bklog
28.87 0.00
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
4.509 62.329 13.823 0 (0.0%) 1.007 5.010
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
9.524 140.775 14.781 0 (0.0%) 5.562 160.093
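The average execution time of write operations (160 ms) is far higher than that of read operations (5 ms). This can hurt PostgreSQL performance, especially at transaction commit (COMMIT), so the most likely reason RPS did not rise after PostgreSQL was given more resources is NFS performance.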
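The comparison can be reproduced from the nfsiostat output quoted above. The sketch below pulls the avg exe (ms) value out of the read and write rows and computes their ratio. It assumes the flattened two-line layout shown in this report (a header line followed by a data line whose last field is avg exe); real nfsiostat output may be laid out differently, and avg_exe_ms is a hypothetical helper.

```python
NFSIOSTAT_SAMPLE = """\
read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
4.509 62.329 13.823 0 (0.0%) 1.007 5.010
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
9.524 140.775 14.781 0 (0.0%) 5.562 160.093
"""


def avg_exe_ms(report: str) -> dict:
    """Map 'read'/'write' to the avg exe (ms) value, the last field of the data row."""
    result = {}
    lines = report.strip().splitlines()
    # Pair every line with the line that follows it; only header lines match below.
    for header, values in zip(lines, lines[1:]):
        op = header.split(":", 1)[0].strip()
        if op in ("read", "write"):
            result[op] = float(values.split()[-1])
    return result


exe = avg_exe_ms(NFSIOSTAT_SAMPLE)
ratio = exe["write"] / exe["read"]
print(exe, f"write/read = {ratio:.1f}x")
```

On these figures a write takes roughly 32 times as long as a read.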
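Sixth test run
Move PostgreSQL's storage to a directly attached SSD
Test results
RPS did not improve noticeably, but the performance curve was smoother
The number and duration of slow transactions in the database slow log decreased
The CPU of the dify-api service was also found to be saturated
Tuning conclusion
Under the 8-core / 16 GB limit there is no further room for resource or configuration tuning, so the best deployment configuration under this limit is:
Service | CPU (cores) | Memory (MB) | Environment variables |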
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 0.5 | 1024 | |
Total | 8 | 16384 | |
Endpoint Test Results
Scenario 1
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 7748 | 0 | 1500 | 3900 | 6700 | 1437.3 | 41 | 7148 | 0 | 48.4 | 0 |
Scenario 2
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 2752 | 0 | 4900 | 11000 | 14000 | 5688.9 | 3552 | 14582 | 0 | 20.7 | 0 |
Scenario 3
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/datasets/da0bcf35-abc5-4c77-8e2b-4e890b93b61c/retrieve | 3342 | 0 | 5300 | 6700 | 7300 | 5287.1 | 450 | 8038 | 2625 | 17.6 | 0 |
Conclusions
Best deployment configuration for 8 cores / 16 GB
Service | CPU (cores) | Memory (MB) | Environment variables |
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 0.5 | 1024 | |
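Total | 8 | 16384 | |
TPS
- Simple chatflow scenario TPS: 48.4
- Complex chatflow scenario TPS: 20.7
- File retrieval scenario TPS: 17.6
Resource and Configuration Tuning (16 cores / 32 GB)
First test run
Increase the resources of the dify-api service. The configuration is as follows:
Service | CPU (cores) | Memory (MB) | Environment variables |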
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 4 | 4096 | SERVER_WORKER_AMOUNT=4 |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 0.5 | 1024 | |
Total | 10 | 18432 | |
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 3537 | 0 | 3800 | 7700 | 11000 | 4120.93 | 1518 | 12092 | 0 | 22.9 | 0 |
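RPS dropped to 22.9. The dify-plugin-daemon logs showed longer request response times and a large number of slow SQL statements.
Second test run
Increase the resources of PostgreSQL. The configuration is as follows:
Service | CPU (cores) | Memory (MB) | Environment variables |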
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 4 | 4096 | SERVER_WORKER_AMOUNT=4 |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 1 | 2048 | |
Total | 10.5 | 19456 | |
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 14167 | 0 | 910 | 1800 | 2000 | 1032.79 | 759 | 2807 | 0 | 91.3 | 0 |
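RPS rose to 91.3. Response times in the dify-plugin-daemon logs became shorter and no slow SQL was logged.
Third test run
Reduce the resources of each dify-api instance and increase the number of instances to 2. The configuration is as follows:
Service | CPU (cores) | Memory (MB) | Environment variables |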
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
dify-api | 2 | 2048 | |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 1 | 2048 | |
Total | 10.5 | 19456 | |
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 14222 | 0 | 940 | 2800 | 3800 | 1067.39 | 37 | 5519 | 0 | 95.1 | 0 |
RPS was 95.1, with a peak above 100 during the run, but the performance curve showed pronounced spikes.
Fourth test run
Set SERVER_WORKER_AMOUNT of dify-api to 3
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 12241 | 0 | 1000 | 2600 | 2800 | 1098.56 | 38 | 4751 | 0 | 59.9 | 0 |
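RPS was about 60 with a peak of 85, and the performance curve was smooth.
Fifth test run
Set the number of dify-api instances to 3. The configuration is as follows:
Service | CPU (cores) | Memory (MB) | Environment variables |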
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
dify-api | 2 | 2048 | |
dify-api | 2 | 2048 | |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 1 | 2048 | |
Total | 12.5 | 21504 | |
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 4980 | 0 | 2000 | 5300 | 9000 | 2423.31 | 126 | 12645 | 0 | 26.3 | 0 |
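RPS dropped to 26.3. The dify-plugin-daemon logs showed longer request response times and a large number of slow SQL statements.
Sixth test run
Increase the resources of PostgreSQL. The configuration is as follows:
Service | CPU (cores) | Memory (MB) | Environment variables |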
xinference | 0.5 | 2048 | |
ollama | 0.5 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
dify-api | 2 | 2048 | |
dify-api | 2 | 2048 | |
dify-weaviate | 0.5 | 1024 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 2 | 3072 | |
Total | 13.5 | 22528 | |
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 18552 | 0 | 410 | 2000 | 2500 | 667.39 | 34 | 3910 | 0 | 98.4 | 0 |
RPS was 98.4 with a peak of 124; the tuning goal for scenario 1 was reached.
Test Results
Scenario 1
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 18552 | 0 | 410 | 2000 | 2500 | 667.39 | 34 | 3910 | 0 | 98.4 | 0 |
Scenario 2
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/chat-messages | 5648 | 0 | 350 | 4200 | 5500 | 979.49 | 36 | 7128 | 0 | 61.2 | 0 |
Scenario 3
Test results
Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
/v1/datasets/da0bcf35-abc5-4c77-8e2b-4e890b93b61c/retrieve | 7312 | 0 | 1900 | 6100 | 8300 | 2411.12 | 91 | 15778 | 2625 | 44.3 | 0 |
Conclusions
Best deployment configuration for 16 cores / 32 GB
Service | CPU (cores) | Memory (MB) | Environment variables |
xinference | 1 | 2048 | |
ollama | 1 | 2048 | |
dify-sandbox | 0.6 | 1024 | |
dify-nginx | 0.5 | 1024 | |
dify-web | 0.5 | 1024 | |
dify-ssrf | 0.3 | 300 | |
dify-plugin-daemon | 0.8 | 2772 | |
dify-worker | 0.8 | 1024 | |
dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
dify-api | 2 | 2048 | |
dify-api | 2 | 2048 | |
dify-weaviate | 1 | 2048 | |
dify-redis | 0.5 | 1024 | |
dify-postgres | 2 | 3072 | |
Total | 15 | 23552 | |
TPS
- Simple chatflow scenario TPS: 98.4
- Complex chatflow scenario TPS: 61.2
- File retrieval scenario TPS: 44.3
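For a quick comparison of the two resource tiers, the per-scenario speedup can be computed from the TPS figures reported in the two conclusion sections. This is plain arithmetic on the reported numbers, not an additional measurement.

```python
# TPS per scenario, copied from the two conclusion sections of this report.
TPS_8C16G = {"simple chatflow": 48.4, "complex chatflow": 20.7, "file retrieval": 17.6}
TPS_16C32G = {"simple chatflow": 98.4, "complex chatflow": 61.2, "file retrieval": 44.3}

speedup = {name: TPS_16C32G[name] / TPS_8C16G[name] for name in TPS_8C16G}
for name, factor in speedup.items():
    print(f"{name}: {factor:.2f}x")
```

The simple scenario roughly doubles, while the complex chatflow and file retrieval scenarios gain somewhat more than that.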
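Tuning Suggestions for Higher TPS Requirements
- For the chatflow endpoint, try increasing the number of dify-api instances and improving the performance of PostgreSQL (deployed standalone)
- For complex orchestrations, analyze case by case which service is the bottleneck and tune that service
- For the file retrieval endpoint, try improving the performance of weaviate (deployed standalone)
- The results in this report do not cover LLM deployment; if models are to be deployed, raise the resources of xinference and ollama as needed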
Author: 道一云低代码