Dify Platform Load Testing


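Contents

Background and Goals
Load Test Scenarios
  Scenario 1: Simple Chatflow
  Scenario 2: Complex Chatflow
  Scenario 3: File Retrieval
Load Testing Tool
Test Material Preparation
  Configuring Dify
    Simple Chatflow scenario
    Complex Chatflow scenario
    File retrieval scenario
  Installing Locust
  Writing the Locust Script
Load Test Steps
  Starting Locust
  Running the Load Test
  Metric Tuning
Resource and Configuration Tuning (8 Cores, 16 GB)
  Test Run 1
  Test Run 2
  Test Run 3
  Test Run 4
  Test Run 5
  Test Run 6
  Tuning Conclusion
Interface Load Test Results
  Scenario 1
  Scenario 2
  Scenario 3
Load Test Conclusion
  Optimal Deployment Configuration for 8 Cores, 16 GB
  TPS
Resource and Configuration Tuning (16 Cores, 32 GB)
  Test Run 1
  Test Run 2
  Test Run 3
  Test Run 4
  Test Run 5
  Test Run 6
Load Test Results
  Scenario 1
  Scenario 2
  Scenario 3
Load Test Conclusion
  Optimal Deployment Configuration for 16 Cores, 32 GB
  TPS
Tuning Suggestions for Higher TPS Requirements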


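Background and Goals

Load testing is an essential step before the Dify agent platform goes into real use. By simulating extreme conditions, the test evaluates how the platform performs under high request concurrency and many simultaneous users, and verifies that it stays stable and reliable.

This load test aims to determine:

  • The TPS of a minimal deployment on 8 cores and 16 GB of memory
  • The TPS and the optimal deployment architecture on 16 cores and 32 GB of memory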

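Load Test Scenarios

Scenario 1: Simple Chatflow

Load test the chat-messages endpoint of the simplest possible Dify Chatflow application.

Scenario 2: Complex Chatflow

Load test the chat-messages endpoint of a Dify application that involves all services.

Scenario 3: File Retrieval

Load test Dify's retrieve endpoint.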

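Load Testing Tool

This test uses Locust, an open-source performance and load testing tool for HTTP and other protocols. Test scripts are written in Python.

Why Locust: Dify's chat-messages endpoint returns a streaming response. Our survey found that, compared with other tools such as JMeter, k6, and wrk, Locust can call a streaming endpoint simply by setting a parameter in the test script, so we chose it for this test.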

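Test Material Preparation

Configuring Dify

Simple Chatflow scenario

Create a new application of type Chatflow. The orchestration contains only two nodes, Start and Direct Reply, and the Direct Reply node is set to return any fixed content.

Complex Chatflow scenario

Create a new application of type Chatflow. The orchestration contains nodes that touch every related service.

File retrieval scenario

Create a new knowledge base with Rerank left unchecked, and upload one file to it. The file contains text and related images.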

Installing Locust

Make sure Python is installed on the load generator machine, then install Locust with the following command:

pip install locust

Writing the Locust Script

The script is as follows:

from locust import HttpUser, TaskSet, task, between
import time


class ChatMessages(TaskSet):
    @task
    def chat_messages(self):
        url = "/chat-messages"
        headers = {
            "Authorization": "Bearer app-vUKfsDmFlRJCLkQcUSniuUYX",
            "Content-Type": "application/json",
        }
        payload = {
            "inputs": {},
            "query": "压测",
            "response_mode": "streaming",
            "user": "压测",
        }

        # Record the start time
        start_time = time.time()

        # Send the POST request; stream=True keeps the connection open so the
        # body can be read line by line, and catch_response=True lets the task
        # decide whether the request counts as a failure
        with self.client.post(url, json=payload, headers=headers, stream=True, catch_response=True, timeout=60) as response:
            if response.status_code != 200:
                response.failure(f"Unexpected status code: {response.status_code}")
                return

            # Time until the response headers arrive, used here as TTFB
            ttfb = time.time() - start_time
            print(f"TTFB: {ttfb:.2f}s")

            try:
                # Read the response line by line
                chunk_count = 0
                for line in response.iter_lines(decode_unicode=True):
                    # iter_lines can still yield bytes when the response declares no charset
                    if isinstance(line, bytes):
                        line = line.decode("utf-8", errors="replace")
                    if line.startswith("data:"):
                        chunk_count += 1
                        total_time = time.time() - start_time
                        print(f"Received chunk #{chunk_count}: {line} :{total_time:.2f}s")
                    if "message_end" in line:
                        total_time = time.time() - start_time
                        print(f"Total response time: {total_time:.2f}s")
                        break
            except Exception as e:
                # response is always bound here, unlike in an except clause
                # wrapped around the whole with statement
                response.failure(f"Request failed: {e}")


class Retrieve(TaskSet):

    @task
    def retrieve(self):
        url = "/datasets/66fb8951-bdfb-457c-8267-66b9e822a4ee/retrieve"
        headers = {
            "Authorization": "Bearer dataset-pqrDBWoy9UILq7zbHnCkN3dY",
            "Content-Type": "application/json",
        }
        payload = {
            "query": "流程审批是什么,如果有图片,请一起返回",
            "retrieval_model": {
                "search_method": "hybrid_search",
                "reranking_enable": False,
                "reranking_mode": None,
                "reranking_model": {
                    "reranking_provider_name": "",
                    "reranking_model_name": "",
                },
                "weights": None,
                "score_threshold_enabled": False,
                "score_threshold": None,
            },
        }

        # Record the start time
        start_time = time.time()

        try:
            # Send the POST request (not streamed, so the full body is read)
            response = self.client.post(url, json=payload, headers=headers, timeout=60)
            print(response.text)
            if response.status_code != 200:
                print(f"Unexpected status code: {response.status_code}")
                return
            # Record the request time (printed under the TTFB label)
            ttfb = time.time() - start_time
            print(f"TTFB: {ttfb:.2f}s")
        except Exception as e:
            print(f"Request failed: {str(e)}")


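# Both user classes live in this file; to run a single scenario, pass the
# class name on the command line, for example: locust -f test.py ChatMessagesTest
class ChatMessagesTest(HttpUser):
    # Task set this user runs
    tasks = [ChatMessages]
    # Wait time between tasks (disabled for this test)
    # wait_time = between(1, 2)

    # Per-user cleanup, similar to teardown in pytest or unittest
    def on_stop(self):
        self.client.close()


class RetrieveTest(HttpUser):
    # Task set this user runs
    tasks = [Retrieve]
    # Wait time between tasks (disabled for this test)
    # wait_time = between(1, 2)

    # Per-user cleanup, similar to teardown in pytest or unittest
    def on_stop(self):
        self.client.close()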
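
The script ends a stream when the text message_end appears anywhere in a line, which would also trigger on a chat answer that happens to contain that text. A minimal sketch of a stricter check follows. It assumes each `data:` line carries a JSON object with an `event` field; that format, and the helper names `parse_sse_line` and `consume_stream`, are illustrative assumptions rather than part of the original script.

```python
import json
from typing import Iterable, Optional, Tuple


def parse_sse_line(line) -> Optional[dict]:
    """Return the JSON object carried by one `data:` line, or None for any other line."""
    if isinstance(line, bytes):
        line = line.decode("utf-8", errors="replace")
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if not body:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    return payload if isinstance(payload, dict) else None


def consume_stream(lines: Iterable) -> Tuple[int, bool]:
    """Count parsed data chunks and report whether a message_end event was seen."""
    chunks = 0
    for line in lines:
        event = parse_sse_line(line)
        if event is None:
            continue
        chunks += 1
        # Assumed field name: the event type is stored under "event"
        if event.get("event") == "message_end":
            return chunks, True
    return chunks, False


# Hypothetical sample stream, mixing str and bytes lines
sample = [
    'data: {"event": "message", "answer": "hi"}',
    "",
    b'data: {"event": "message_end"}',
    'data: {"event": "message", "answer": "ignored"}',
]
print(consume_stream(sample))
```

In the Locust task, `consume_stream(response.iter_lines())` could replace the manual loop, with the timing calls kept around it.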

Load Test Steps

Starting Locust

Start Locust with the following command (test.py is the test script from the previous subsection):

locust -f test.py

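Running the Load Test

Open the Locust web page in a browser, enter the parameters, and start the test (concurrent users of 50, 100, and 150 in turn; test duration of 5 minutes).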
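
The three concurrency levels can also be expressed as a schedule instead of being typed into the web page for each run. The sketch below assumes each level runs for 5 minutes and that users are spawned at 10 per second; the spawn rate and the names `STAGES` and `stage_for` are assumptions, not values from the test.

```python
from typing import Optional, Tuple

# (user count, stage duration in seconds): 50, 100, 150 users, assumed 5 minutes each
STAGES = [(50, 300), (100, 300), (150, 300)]
SPAWN_RATE = 10  # users started per second; assumed value


def stage_for(run_time: float) -> Optional[Tuple[int, int]]:
    """Return (users, spawn_rate) for the elapsed run time, or None once all stages are done."""
    elapsed = 0
    for users, duration in STAGES:
        elapsed += duration
        if run_time < elapsed:
            return users, SPAWN_RATE
    return None


for t in (0, 299, 300, 650, 900):
    print(t, stage_for(t))
```

In a Locust file, a `LoadTestShape` subclass could return `stage_for(self.get_run_time())` from its `tick()` method, so that one run steps through all three levels.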

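Metric Tuning

During the test, watch the performance metrics of the dify-api, dify-plugin-daemon, xinference, and ollama services for anomalies. Adjust whatever is abnormal and rerun the test to determine which metrics are Dify's performance bottleneck. The metrics are:

  • CPU utilization
  • Memory utilization
  • Network I/O
  • Disk I/O
  • Thread count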

Resource and Configuration Tuning (8 Cores, 16 GB)

Agent test configuration:

Concurrent users: 100

Test duration: 3 minutes

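Test Run 1

Service deployment details:

| Service | CPU (cores) | Memory (MB) |
| --- | --- | --- |
| xinference | 1 | 2048 |
| ollama | 1 | 2048 |
| dify-sandbox | 0.8 | 1024 |
| dify-nginx | 0.5 | 1024 |
| dify-web | 0.5 | 1024 |
| dify-ssrf | 0.3 | 300 |
| dify-plugin-daemon | 0.8 | 2772 |
| dify-worker | 0.8 | 1024 |
| dify-api | 0.8 | 2048 |
| dify-weaviate | 0.5 | 1024 |
| dify-redis | 0.5 | 1024 |
| dify-postgres | 0.5 | 1024 |
| Total | 8 | 16384 |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 5319 | 0 | 2200 | 4100 | 9000 | 2262.98 | 37 | 10843 | 0 | 21.9 | 0 |

During the test, the CPU of the dify-api service was fully saturated.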

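Test Run 2

Increase the CPU of the dify-api service. The configuration is:

| Service | CPU (cores) | Memory (MB) |
| --- | --- | --- |
| xinference | 1 | 2048 |
| ollama | 1 | 2048 |
| dify-sandbox | 0.8 | 1024 |
| dify-nginx | 0.5 | 1024 |
| dify-web | 0.5 | 1024 |
| dify-ssrf | 0.3 | 300 |
| dify-plugin-daemon | 0.8 | 2772 |
| dify-worker | 0.8 | 1024 |
| dify-api | 1 | 2048 |
| dify-weaviate | 0.5 | 1024 |
| dify-redis | 0.5 | 1024 |
| dify-postgres | 0.5 | 1024 |
| Total | 8.2 | 16384 |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 5932 | 0 | 1500 | 3400 | 7900 | 1636.67 | 40 | 13931 | 0 | 31.2 | 0 |

RPS rose from 21.9 to 31.2, which shows that giving dify-api more CPU affects RPS.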

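Test Run 3

Increase the CPU of the dify-api service again, and rebalance the other services so the total fits the 8-core, 16 GB requirement. The configuration is:

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 0.5 | 1024 | |
| Total | 8 | 16384 | |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 6741 | 0 | 1600 | 4700 | 9000 | 1675.86 | 39 | 9897 | 0 | 49.5 | 0 |

RPS rose to 49.5.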
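
As a quick check on how much each step helped, the relative gains can be worked out from the RPS figures reported for the first three runs. This is plain arithmetic on those figures; note that run 3 also set SERVER_WORKER_AMOUNT=2, so its gain is not attributable to CPU alone.

```python
# (run, dify-api CPU cores, reported Current RPS) from test runs 1 to 3
runs = [("run 1", 0.8, 21.9), ("run 2", 1.0, 31.2), ("run 3", 2.0, 49.5)]


def relative_gain(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


baseline_rps = runs[0][2]
for name, cpu, rps in runs:
    gain = relative_gain(baseline_rps, rps)
    print(f"{name}: {cpu} cores, {rps} RPS, {rps / cpu:.1f} RPS per core, {gain:+.1f}% vs run 1")
```

RPS per dify-api core stays in the same range across the three runs, which is consistent with dify-api CPU being the limiting resource at this stage.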

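Test Run 4

Within the 8-core, 16 GB limit there was no room left to shift resources. The test also showed that the endpoint ultimately reads and writes data through PostgreSQL, so the PostgreSQL configuration was tuned as follows:

# Memory settings
shared_buffers = 256MB
work_mem = 16MB
work_mem = 17MB
work_mem = 18MB
# Note: when a parameter is repeated, PostgreSQL applies the last entry (here 18MB)
# Concurrent connections
max_connections = 200
# Checkpoints
checkpoint_timeout = 15min
# Log slow statements
log_min_duration_statement = 200ms

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 6688 | 0 | 1700 | 4600 | 13000 | 1708.36 | 40 | 18824 | 0 | 54.1 | 0 |

RPS did not improve noticeably.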

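Test Run 5

Increase the resources of PostgreSQL. The configuration is:

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 1 | 2048 | |
| Total | 8.5 | 17408 | |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 7748 | 0 | 1500 | 3900 | 6700 | 1437.3 | 41 | 7148 | 0 | 48.4 | 0 |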

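RPS did not improve noticeably. The PostgreSQL slow log showed a large number of slow queries and slow transactions.

The slow query is one that the tested endpoint executes. The table structure confirms that the query can hit an index, and the table holds no data, so the query itself is unlikely to be the cause. We therefore checked the performance of the Kubernetes NFS volume with the nfsiostat command:

192.168.83.248:/mnt/nfs_share/dify-postgres mounted on /var/lib/kubelet/pods/4e22098e-0df1-4a90-9126-bf91e937772f/volumes/kubernetes.io~nfs/postgres-data:

   op/s    rpc bklog
  28.87    0.00
read:    ops/s    kB/s      kB/op    retrans     avg RTT (ms)   avg exe (ms)
         4.509    62.329    13.823   0 (0.0%)    1.007          5.010
write:   ops/s    kB/s      kB/op    retrans     avg RTT (ms)   avg exe (ms)
         9.524    140.775   14.781   0 (0.0%)    5.562          160.093

The average execution time of write operations (about 160 ms) is far higher than that of reads (about 5 ms). This can hurt PostgreSQL performance, especially at transaction commit (COMMIT), so the most likely reason RPS did not improve after giving PostgreSQL more resources is NFS performance.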
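
One way to cross-check this reading is to time small synchronous writes on the volume directly, since a commit has to wait for its write to reach stable storage. The sketch below is an illustrative micro-benchmark using only the standard library, not the method used in the test; the function name `fsync_latency_ms` and the 8 KB block size (PostgreSQL's default page size) are choices made for this example.

```python
import os
import statistics
import tempfile
import time


def fsync_latency_ms(directory: str, writes: int = 50, size: int = 8192) -> dict:
    """Write `writes` blocks of `size` bytes, fsync after each one, and report latency in ms."""
    block = b"\0" * size
    samples = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # wait until the data is on stable storage, as a commit must
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    ordered = sorted(samples)
    return {
        "avg_ms": statistics.mean(samples),
        "p95_ms": ordered[int(0.95 * (len(ordered) - 1))],
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Run once against the NFS-backed data directory and once against a local disk
    print(fsync_latency_ms(tempfile.gettempdir()))
```

Running it against both the NFS mount and a local SSD path gives a direct comparison of per-write latency on the two kinds of storage.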

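Test Run 6

Move PostgreSQL to a directly attached SSD.

Test results

RPS did not improve noticeably, but the performance curve became smoother.

The number and duration of slow transactions in the database slow log decreased.

The CPU of the dify-api service was again fully saturated.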

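Tuning Conclusion

Within the 8-core, 16 GB limit, no further resource or configuration tuning is possible, so the optimal deployment configuration under this limit is:

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 0.5 | 1024 | |
| Total | 8 | 16384 | |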
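
When services are rebalanced by hand like this, it helps to re-add the columns after every change. The sketch below sums the per-service values of this configuration and compares them with the budget; it assumes the memory column is in MB, and the variable names are illustrative.

```python
# (service, CPU cores, memory in MB) for the 8-core, 16 GB configuration
services = [
    ("xinference", 0.5, 2048),
    ("ollama", 0.5, 2048),
    ("dify-sandbox", 0.6, 1024),
    ("dify-nginx", 0.5, 1024),
    ("dify-web", 0.5, 1024),
    ("dify-ssrf", 0.3, 300),
    ("dify-plugin-daemon", 0.8, 2772),
    ("dify-worker", 0.8, 1024),
    ("dify-api", 2.0, 2048),
    ("dify-weaviate", 0.5, 1024),
    ("dify-redis", 0.5, 1024),
    ("dify-postgres", 0.5, 1024),
]

CPU_BUDGET = 8.0
MEMORY_BUDGET_MB = 16 * 1024

# Round the CPU sum to hide floating-point noise from adding values such as 0.3 and 0.6
total_cpu = round(sum(cpu for _, cpu, _ in services), 2)
total_memory = sum(mem for _, _, mem in services)
fits = total_cpu <= CPU_BUDGET and total_memory <= MEMORY_BUDGET_MB

print(f"CPU: {total_cpu} / {CPU_BUDGET} cores")
print(f"Memory: {total_memory} / {MEMORY_BUDGET_MB} MB")
print("fits budget:", fits)
```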

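Interface Load Test Results

Scenario 1

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 7748 | 0 | 1500 | 3900 | 6700 | 1437.3 | 41 | 7148 | 0 | 48.4 | 0 |

Scenario 2

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 2752 | 0 | 4900 | 11000 | 14000 | 5688.9 | 3552 | 14582 | 0 | 20.7 | 0 |

Scenario 3

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/datasets/da0bcf35-abc5-4c77-8e2b-4e890b93b61c/retrieve | 3342 | 0 | 5300 | 6700 | 7300 | 5287.1 | 450 | 8038 | 2625 | 17.6 | 0 |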

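Load Test Conclusion

Optimal Deployment Configuration for 8 Cores, 16 GB

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 0.5 | 1024 | |
| Total | 8 | 16384 | |

TPS

  • Simple Chatflow scenario TPS: 48.4
  • Complex Chatflow scenario TPS: 20.7
  • File retrieval scenario TPS: 17.6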

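Resource and Configuration Tuning (16 Cores, 32 GB)

Test Run 1

Increase the resources of the dify-api service. The configuration is:

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 4 | 4096 | SERVER_WORKER_AMOUNT=4 |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 0.5 | 1024 | |
| Total | 10 | 18422 | |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 3537 | 0 | 3800 | 7700 | 11000 | 4120.93 | 1518 | 12092 | 0 | 22.9 | 0 |

RPS dropped to 22.9. The dify-plugin-daemon logs showed longer request response times and a large number of slow SQL statements.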

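Test Run 2

Increase the resources of PostgreSQL. The configuration is:

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 4 | 4086 | SERVER_WORKER_AMOUNT=4 |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 1 | 2048 | |
| Total | 10.5 | 19446 | |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 14167 | 0 | 910 | 1800 | 2000 | 1032.79 | 759 | 2807 | 0 | 91.3 | 0 |

RPS rose to 91.3. Response times in the dify-plugin-daemon logs became shorter and no slow SQL was logged.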

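Test Run 3

Reduce the resources of each dify-api instance and increase the number of instances to 2. The configuration is:

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
| dify-api | 2 | 2048 | |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 1 | 2048 | |
| Total | 10.5 | 19456 | |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 14222 | 0 | 940 | 2800 | 3800 | 1067.39 | 37 | 5519 | 0 | 95.1 | 0 |

RPS was 95.1, with a peak above 100 during the test, but the performance curve showed obvious spikes.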

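Test Run 4

Set SERVER_WORKER_AMOUNT for dify-api to 3.

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 12241 | 0 | 1000 | 2600 | 2800 | 1098.56 | 38 | 4751 | 0 | 59.9 | 0 |

RPS was about 60 with a peak of 85, and the performance curve was smooth.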

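Test Run 5

Set the number of dify-api instances to 3. The configuration is:

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
| dify-api | 2 | 2048 | |
| dify-api | 2 | 2048 | |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 1 | 2048 | |
| Total | 12.5 | 21504 | |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 4980 | 0 | 2000 | 5300 | 9000 | 2423.31 | 126 | 12645 | 0 | 26.3 | 0 |

RPS dropped to 26.3. The dify-plugin-daemon logs showed longer request response times and a large number of slow SQL statements.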

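Test Run 6

Increase the resources of PostgreSQL. The configuration is:

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 0.5 | 2048 | |
| ollama | 0.5 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
| dify-api | 2 | 2048 | |
| dify-api | 2 | 2048 | |
| dify-weaviate | 0.5 | 1024 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 2 | 3072 | |
| Total | 13.5 | 22528 | |

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 18552 | 0 | 410 | 2000 | 2500 | 667.39 | 34 | 3910 | 0 | 98.4 | 0 |

RPS was 98.4 with a peak of 124, which meets the tuning goal for scenario 1.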

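Load Test Results

Scenario 1

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 18552 | 0 | 410 | 2000 | 2500 | 667.39 | 34 | 3910 | 0 | 98.4 | 0 |

Scenario 2

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/chat-messages | 5648 | 0 | 350 | 4200 | 5500 | 979.49 | 36 | 7128 | 0 | 61.2 | 0 |

Scenario 3

Test results

| Name | # Requests | # Fails | Median (ms) | 95%ile (ms) | 99%ile (ms) | Average (ms) | Min (ms) | Max (ms) | Average size (bytes) | Current RPS | Current Failures/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /v1/datasets/da0bcf35-abc5-4c77-8e2b-4e890b93b61c/retrieve | 7312 | 0 | 1900 | 6100 | 8300 | 2411.12 | 91 | 15778 | 2625 | 44.3 | 0 |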

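Load Test Conclusion

Optimal Deployment Configuration for 16 Cores, 32 GB

| Service | CPU (cores) | Memory (MB) | Environment settings |
| --- | --- | --- | --- |
| xinference | 1 | 2048 | |
| ollama | 1 | 2048 | |
| dify-sandbox | 0.6 | 1024 | |
| dify-nginx | 0.5 | 1024 | |
| dify-web | 0.5 | 1024 | |
| dify-ssrf | 0.3 | 300 | |
| dify-plugin-daemon | 0.8 | 2772 | |
| dify-worker | 0.8 | 1024 | |
| dify-api | 2 | 2048 | SERVER_WORKER_AMOUNT=2 |
| dify-api | 2 | 2048 | |
| dify-api | 2 | 2048 | |
| dify-weaviate | 1 | 2048 | |
| dify-redis | 0.5 | 1024 | |
| dify-postgres | 2 | 3072 | |
| Total | 15 | 24576 | |

TPS

  • Simple Chatflow scenario TPS: 98.4
  • Complex Chatflow scenario TPS: 61.2
  • File retrieval scenario TPS: 44.3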
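
To see how the three scenarios responded to the larger resource budget, the reported TPS figures for the two configurations can be compared directly. This is arithmetic on the numbers reported in the two conclusions; the names `tps` and `scaling_factor` are illustrative.

```python
# scenario -> (TPS on 8 cores / 16 GB, TPS on 16 cores / 32 GB), as reported in the conclusions
tps = {
    "simple chatflow": (48.4, 98.4),
    "complex chatflow": (20.7, 61.2),
    "file retrieval": (17.6, 44.3),
}


def scaling_factor(small: float, large: float) -> float:
    """How many times higher the TPS is on the larger configuration."""
    return large / small


for scenario, (small, large) in tps.items():
    print(f"{scenario}: {small} -> {large} TPS, x{scaling_factor(small, large):.2f}")
```

All three scenarios at least double their TPS, and the complex Chatflow scenario gains the most.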

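Tuning Suggestions for Higher TPS Requirements

  • For the Chatflow endpoint, try adding more dify-api instances and improving the performance of PostgreSQL (deployed separately).
  • For complex orchestrations, analyze case by case which service is the bottleneck and tune that service.
  • For the file retrieval endpoint, try improving the performance of Weaviate (deployed separately).
  • The results in this report do not cover large model deployment. If models are deployed, raise the resources of xinference and ollama as needed.

Author: 道一云低代码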
