Pipecat: Building Voice and Multimodal Conversational Agents

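I. About Pipecat

GitHub: https://github.com/pipecat-ai/pipecat
Official documentation: https://docs.pipecat.ai/
App: https://app.commanddash.io/agent/github_pipecat-ai_pipecat

Pipecat is an open-source Python framework for building voice and multimodal conversational agents.

It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, so you can focus on creating engaging experiences.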

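What you can build

Voice assistants: natural, real-time conversations with AI
Interactive agents: personal coaches and meeting assistants
Multimodal apps: combine voice, video, images, and text
Creative tools: storytelling experiences and social companions
Business solutions: customer intake flows and support bots
Complex conversational flows: see Pipecat Flows to learn more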

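Key features

Voice-first design: built-in speech recognition, TTS, and conversation handling
Flexible integration: works with popular AI services (OpenAI, ElevenLabs, etc.)
Pipeline architecture: build complex applications from simple, reusable components
Real-time processing: frame-based pipeline architecture for fluid interactions
Production ready: enterprise-grade WebRTC and WebSocket support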
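
To make the pipeline idea concrete, here is a toy sketch in plain Python. It does not use Pipecat and does not mirror its API: ToyTextFrame, ToyProcessor, SentenceSplitter, Uppercase, and run_toy_pipeline are hypothetical names chosen only for illustration. It shows the general pattern the list describes, assuming each component takes one frame and emits zero or more frames.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Iterable, List


@dataclass
class ToyTextFrame:
    """Hypothetical frame type: one unit of data moving through the pipeline."""
    text: str


class ToyProcessor:
    """Hypothetical base class: turns one input frame into zero or more frames."""

    async def process(self, frame: ToyTextFrame) -> AsyncIterator[ToyTextFrame]:
        yield frame  # default behaviour: pass the frame through unchanged


class SentenceSplitter(ToyProcessor):
    """Splits one text frame into one frame per sentence."""

    async def process(self, frame: ToyTextFrame) -> AsyncIterator[ToyTextFrame]:
        for part in frame.text.split(". "):
            if part:
                yield ToyTextFrame(part.strip(". "))


class Uppercase(ToyProcessor):
    """Upper-cases the text of each frame."""

    async def process(self, frame: ToyTextFrame) -> AsyncIterator[ToyTextFrame]:
        yield ToyTextFrame(frame.text.upper())


async def run_toy_pipeline(
    processors: List[ToyProcessor], frames: Iterable[ToyTextFrame]
) -> List[ToyTextFrame]:
    """Push frames through the processors in order and collect the output.

    For simplicity this runs stage by stage; a real-time framework would
    stream frames between stages instead of batching them.
    """
    current = list(frames)
    for processor in processors:
        produced: List[ToyTextFrame] = []
        for frame in current:
            async for out in processor.process(frame):
                produced.append(out)
        current = produced
    return current


if __name__ == "__main__":
    frames = [ToyTextFrame("Hello there. How are you?")]
    result = asyncio.run(run_toy_pipeline([SentenceSplitter(), Uppercase()], frames))
    print([frame.text for frame in result])
```

Because each processor only sees frames, stages can be reordered or swapped without touching the others, which is the reuse benefit the feature list refers to.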

💡 Looking to build structured conversations? Check out Pipecat Flows for managing complex conversational states and transitions.

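II. Getting Started

You can start by running Pipecat on your local machine, then move your agent processes to the cloud when you are ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.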

# Install the module
pip install pipecat-ai

# Set up your environment
cp dot-env.template .env

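To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies: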

pip install "pipecat-ai[option,...]"

Available services

Category	Services	Install Command Example
Speech-to-Text	AssemblyAI, Azure, Deepgram, Gladia, Whisper	`pip install "pipecat-ai[deepgram]"`
LLMs	Anthropic, Azure, Cerebras, DeepSeek, Fireworks AI, Gemini, Grok, Groq, NVIDIA NIM, Ollama, OpenAI, OpenRouter, Together AI	`pip install "pipecat-ai[openai]"`
Text-to-Speech	AWS, Azure, Cartesia, Deepgram, ElevenLabs, Fish, Google, LMNT, OpenAI, PlayHT, Rime, XTTS	`pip install "pipecat-ai[cartesia]"`
Speech-to-Speech	Gemini Multimodal Live, OpenAI Realtime	`pip install "pipecat-ai[openai]"`
Transport	Daily (WebRTC), FastAPI Websocket, WebSocket Server, Local	`pip install "pipecat-ai[daily]"`
Video	Tavus, Simli	`pip install "pipecat-ai[tavus,simli]"`
Vision & Image	Moondream, fal	`pip install "pipecat-ai[moondream]"`
Audio Processing	Silero VAD, Krisp, Koala, Noisereduce	`pip install "pipecat-ai[silero]"`
Analytics & Metrics	Canonical AI, Sentry	`pip install "pipecat-ai[canonical]"`
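
Extras can be combined in one command, for example the daily and cartesia extras listed in the table. The sketch below, using only the standard library, builds that command string and lists the extras an installed distribution declares. The helper names available_extras and pip_command are hypothetical; the set of extras actually returned depends on the installed version, so treat the table as a guide and not a guarantee.

```python
from importlib import metadata
from typing import List


def available_extras(distribution: str = "pipecat-ai") -> List[str]:
    """Return the extras declared by an installed distribution, or [] if it is not installed."""
    try:
        meta = metadata.metadata(distribution)
    except metadata.PackageNotFoundError:
        return []
    # "Provides-Extra" is the standard metadata field that names optional extras
    return sorted(meta.get_all("Provides-Extra") or [])


def pip_command(extras: List[str], distribution: str = "pipecat-ai") -> str:
    """Build a pip command for a set of extras, in the form used in the table."""
    if not extras:
        return f"pip install {distribution}"
    joined = ",".join(extras)
    return f'pip install "{distribution}[{joined}]"'


if __name__ == "__main__":
    print(pip_command(["daily", "cartesia"]))
    print(available_extras())
```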

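📚 View the full services documentation →

III. Code Examples

Foundational: small snippets that build on each other, introducing one or two concepts at a time
Example apps: complete applications that you can use as a starting point for development

1. A simple voice agent running locally

Here is a very basic Pipecat bot that greets a user when they join a real-time session. We will use Daily for real-time media transport and Cartesia for text-to-speech.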

import asyncio

from pipecat.frames.frames import TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

async def main():
  # Use Daily as a real-time media transport (WebRTC)
  transport = DailyTransport(
    room_url=...,
    token="", # leave empty. Note: token is _not_ your api key
    bot_name="Bot Name",
    params=DailyParams(audio_out_enabled=True))

  # Use Cartesia for Text-to-Speech
  tts = CartesiaTTSService(
    api_key=...,
    voice_id=...
  )

  # Simple pipeline that will process text to speech and output the result
  pipeline = Pipeline([tts, transport.output()])

  # Create a Pipecat runner that can run one or more pipeline tasks
  runner = PipelineRunner()

  # Assign the task callable to run the pipeline
  task = PipelineTask(pipeline)

  # Register an event handler to play audio when a
  # participant joins the transport WebRTC session
  @transport.event_handler("on_first_participant_joined")
  async def on_first_participant_joined(transport, participant):
    participant_name = participant.get("info", {}).get("userName", "")
    # Queue a TextFrame that will get spoken by the TTS service (Cartesia)
    await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))

  # Register an event handler to exit the application when the user leaves.
  @transport.event_handler("on_participant_left")
  async def on_participant_left(transport, participant, reason):
    await task.cancel()

  # Run the pipeline task
  await runner.run(task)

if __name__ == "__main__":
  asyncio.run(main())

Run it:

python app.py

Daily provides a prebuilt WebRTC user interface. While the app is running, you can visit https://<yourdomain>.daily.co/<room_url> and listen to the bot say hello!
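
The example leaves room_url, api_key, and voice_id elided. One common approach is to keep such values in the .env file created during setup and read them at start-up. The following is a minimal standard-library sketch of that idea, not the project's own loader: read_env_file and require_setting are hypothetical helpers, and the variable names in the usage comment are assumptions, so check dot-env.template for the names the project actually uses.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_setting(name: str, file_values: Dict[str, str]) -> str:
    """Prefer the process environment, fall back to the file, fail loudly otherwise."""
    value = os.environ.get(name) or file_values.get(name)
    if not value:
        raise RuntimeError(f"Missing required setting: {name}")
    return value


# Usage sketch (variable names are assumptions, not taken from the project):
# settings = read_env_file()
# room_url = require_setting("DAILY_ROOM_URL", settings)
# api_key = require_setting("CARTESIA_API_KEY", settings)
```

Failing at start-up with a named missing setting is usually easier to debug than an authentication error raised later inside the transport or TTS service.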

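2. WebRTC for production use

WebSockets are fine for server-to-server communication or for initial development. For production use, however, client-server audio needs a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see this post.)

One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.

Sign up here and create a room in the developer dashboard.

IV. Hacking on the framework itself

Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo: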

python3 -m venv venv
source venv/bin/activate

From the root of this repo, run the following:

pip install -r dev-requirements.txt

This installs the necessary development dependencies. Also, make sure you install the git pre-commit hooks:

pre-commit install

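The hooks save you time when you submit a PR by making sure your code follows the project rules.

To use the package locally (e.g. to run sample files), run: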

pip install --editable ".[option,...]"

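The --editable option means you do not have to run pip install again after each change; you can simply edit the project files locally.

If you want to use this package from another directory, you can run: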

pip install "path_to_this_repo[option,...]"

Running tests

From the root directory, run:

pytest
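
By default pytest collects functions named test_* from files named test_*.py or *_test.py. As a rough sketch of what such a file can look like for coroutine-based code, the example below tests a hypothetical greet coroutine; it is not part of Pipecat's test suite. It drives the coroutine with asyncio.run so that no pytest plugin is assumed.

```python
import asyncio


async def greet(name: str) -> str:
    """Hypothetical coroutine under test; stands in for real async code."""
    await asyncio.sleep(0)  # yield to the event loop once
    return f"Hello there, {name}!"


def test_greet_includes_name():
    # asyncio.run drives the coroutine to completion inside a plain test function
    assert asyncio.run(greet("Ada")) == "Hello there, Ada!"


def test_greet_handles_empty_name():
    assert asyncio.run(greet("")) == "Hello there, !"
```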

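V. Setting up your editor

This project uses strict PEP 8 formatting via Ruff.

Emacs

You can use use-package to install the emacs-lazy-ruff package and configure the ruff arguments: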

(use-package lazy-ruff
  :ensure t
  :hook ((python-mode . lazy-ruff-mode))
  :config
  (setq lazy-ruff-format-command "ruff format")
  (setq lazy-ruff-check-command "ruff check --select I"))

ruff is installed in the venv environment described earlier, so you should be able to use pyvenv-auto to load that environment automatically inside Emacs.

(use-package pyvenv-auto
  :ensure t
  :defer t
  :hook ((python-mode . pyvenv-auto-run)))

Visual Studio Code

Install the Ruff extension. Then edit the user settings (Ctrl-Shift-P, Open User Settings (JSON)), set Ruff as the default Python formatter, and enable formatting on save:

"[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true
}

PyCharm

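ruff is installed in the venv environment described earlier. To enable autoformatting on save, go to File -> Settings -> Tools -> File Watchers and add a new watcher with the following settings:

Name: Ruff formatter
File type: Python
Working directory: $ContentRoot$
Arguments: format $FilePath$
Program: $PyInterpreterDirectory$/ruff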
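
The editor settings in this section all come down to running ruff on the saved file. To check the setup outside an editor, the same commands can be run from a small script. This is a sketch: the two commands are the ones given in the Emacs configuration, while ruff_commands, run_ruff, and the 127 exit code for a missing executable are choices made here for illustration.

```python
import shutil
import subprocess
import sys
from typing import List


def ruff_commands(file_path: str) -> List[List[str]]:
    """Commands taken from the editor settings in this section, applied to one file."""
    return [
        ["ruff", "format", file_path],                  # reformat the file in place
        ["ruff", "check", "--select", "I", file_path],  # report import-sorting issues
    ]


def run_ruff(file_path: str) -> int:
    """Run each command in turn; return the first non-zero exit code, or 0."""
    if shutil.which("ruff") is None:
        print("ruff not found on PATH; activate the venv first", file=sys.stderr)
        return 127
    for command in ruff_commands(file_path):
        completed = subprocess.run(command)
        if completed.returncode != 0:
            return completed.returncode
    return 0


# Usage sketch: exit_code = run_ruff("app.py")
```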

2025-02-02 (Sun)