LangChain Study Notes: Model I/O

Author: 缓释多巴胺

Last modified: 2024-08-27 16:15:25


First published: 2024-08-18 21:31:39

Article link: https://blog.csdn.net/everyglow/article/details/141284973


I. Prompt

1.1 Filling in a prompt

To fill in only some of the variables, use partial; to fill in all of them, use format.

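from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(template="a:{foo},b:{bar}", input_variables=["foo", "bar"])
# Partial filling: bind foo first, then supply bar later
partial_prompt = prompt.partial(foo="foo")
full_prompt = partial_prompt.format(bar="baz")
# Full filling: pass every variable in one call
full_prompt = prompt.format(foo="foo", bar="baz")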
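To make the difference concrete, here is a rough stdlib-only sketch of what partial filling amounts to: binding some variables now and getting back a new template that still expects the rest. MiniTemplate is a hypothetical stand-in written for these notes; it is not how LangChain implements PromptTemplate.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MiniTemplate:
    """Hypothetical stand-in for a prompt template (not LangChain code)."""

    template: str
    bound: Dict[str, str] = field(default_factory=dict)

    def partial(self, **kwargs: str) -> "MiniTemplate":
        # Return a new template that remembers the values supplied so far.
        return MiniTemplate(self.template, {**self.bound, **kwargs})

    def format(self, **kwargs: str) -> str:
        # Merge the remembered values with the new ones, then fill the template.
        return self.template.format(**{**self.bound, **kwargs})


prompt = MiniTemplate("a:{foo},b:{bar}")
via_partial = prompt.partial(foo="foo").format(bar="baz")
direct = prompt.format(foo="foo", bar="baz")
print(via_partial)
print(direct)
```

Both paths produce the same string; partial only changes when each value is supplied.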

1.2 Prompt template classes

1. Basic: PromptTemplate

2. Few-shot examples: FewShotPromptTemplate + an example selector

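Although examples is supplied as a list, the selector picks only the examples that satisfy its criteria, and each selected example is formatted with example_prompt. The selector and example_prompt are then passed to FewShotPromptTemplate, which assembles the full prompt from the prefix, the formatted examples, and the suffix.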

from langchain_core.prompts import PromptTemplate
from langchain_core.prompts import FewShotPromptTemplate
from langchain_core.example_selectors import LengthBasedExampleSelector

# Some example input/output pairs of antonyms
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    examples=examples, 
    example_prompt=example_prompt, 
    # Maximum total length, counted in words by default, for the input plus the selected examples
    max_length=29
)
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    # Text placed before the examples
    prefix="Give the antonym of every input",
    # Text placed after the examples
    suffix="Input: {adjective}\nOutput:\n\n",
    input_variables=["adjective"],
)

# With a short input, there is room for all of the examples
#print(dynamic_prompt.format(adjective="big"))

# With a long enough input, only a few examples are included
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))

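How the length-based example selector works:

LengthBasedExampleSelector filters examples according to the following points:

Maximum length (max_length): the upper bound on the combined length of the user input and the selected examples. The selector ensures that this total does not exceed the limit.

Input variable length (input_variables): before selecting any examples, the selector measures the length of the user input by calling get_text_length on the input values joined into a single string. By default get_text_length counts words (it splits on spaces and newlines), not characters.

Remaining length: the input length is subtracted from max_length, giving the length still available for examples.

Example text lengths (example_text_lengths): the length of each example, obtained by formatting the example with example_prompt and measuring the result with get_text_length.

Selection logic: select_examples walks through the examples in order and checks each one's length. If the remaining length would still be greater than or equal to zero after adding the current example, the example is added to the selection and the remaining length is updated. If adding it would make the remaining length negative, selection stops and no further examples are considered.

Example selection: the selector therefore keeps exactly those leading examples that fit into the prompt without pushing the total past max_length.

Example formatting: examples are formatted with example_prompt, so each example's input and output are rendered through the template first, and the length is computed on the rendered text.

In summary, LengthBasedExampleSelector ensures that the total length of the selected examples plus the user input stays within the preset maximum. It decides dynamically which examples make it into the final prompt by measuring and comparing text lengths.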
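The selection steps above can be sketched in plain Python. This is a simplified re-implementation for illustration, not the library's source: select_by_length is a hypothetical helper name, the example template is hard-coded, and the word-counting rule is assumed to match the selector's default get_text_length.

```python
import re
from typing import Dict, List


def get_text_length(text: str) -> int:
    # Word count: split on single spaces and newlines (assumed default behavior).
    return len(re.split("\n| ", text))


def select_by_length(
    examples: List[Dict[str, str]], user_input: str, max_length: int
) -> List[Dict[str, str]]:
    template = "Input: {input}\nOutput: {output}"
    # Budget left for examples after accounting for the user input.
    remaining = max_length - get_text_length(user_input)
    selected = []
    for example in examples:
        new_remaining = remaining - get_text_length(template.format(**example))
        if new_remaining < 0:
            break  # the first example that does not fit ends the selection
        selected.append(example)
        remaining = new_remaining
    return selected


examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
long_input = (
    "big and huge and massive and large and gigantic and tall and "
    "much much much much much bigger than everything else"
)
short_pick = select_by_length(examples, "big", max_length=29)
long_pick = select_by_length(examples, long_input, max_length=29)
print(len(short_pick), len(long_pick))
```

Each formatted example counts as 4 words. With max_length=29, the one-word input leaves room for all five examples, while the 21-word input leaves a budget of 8 words, which fits only the first two.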

3. Chat: ChatPromptTemplate

Builds a list of chat messages; each message consists of a role plus content.

from langchain_core.prompts import ChatPromptTemplate

chat_template = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful AI bot. Your name is {name}."),
        ("human", "Hello, how are you doing?"),
        ("ai", "I'm doing well, thanks!"),
        ("human", "{user_input}"),
    ]
)
chat_template.format_messages(name="Bob", user_input="What is your name?")

II. OutputParser

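1. The parser type determines the data structure the output is parsed into; LangChain supports several output formats.

1️⃣ PydanticOutputParser: parses into a pydantic data model defined with BaseModel

2️⃣ XMLOutputParser: parses XML-formatted output

3️⃣ YamlOutputParser: parses YAML-formatted output

2. parser.get_format_instructions(): returns prompt text that instructs the model to produce well-formed structured output; add it to the prompt template as well.

3. Finally, parser.parse(output): validates the model output and parses it into the data model the parser specifies (a further safeguard).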
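The three steps can be illustrated with a stdlib-only sketch. Everything here is a hypothetical stand-in for what an output parser does, not LangChain's API: the Author dataclass plays the role of the data model, format_instructions() imitates step 2, parse() imitates step 3, and reply is a hard-coded string instead of a real model response.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Author:
    # Step 1: the target data structure.
    name: str
    book_names: List[str]


def format_instructions() -> str:
    # Step 2: text appended to the prompt so the model knows the target shape.
    schema = {
        "properties": {
            "name": {"type": "string"},
            "book_names": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["name", "book_names"],
    }
    return (
        "The output should be a JSON instance that conforms to this schema:\n"
        + json.dumps(schema)
    )


def parse(output: str) -> Author:
    # Step 3: decode the JSON, then validate keys and types before building the model.
    data = json.loads(output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("name must be a string")
    books = data.get("book_names")
    if not isinstance(books, list) or not all(isinstance(b, str) for b in books):
        raise ValueError("book_names must be a list of strings")
    return Author(name=data["name"], book_names=books)


prompt = "Name a well-known author and their works.\n\n" + format_instructions()
# Hard-coded stand-in for a model reply; no model is called here.
reply = '{"name": "William Shakespeare", "book_names": ["Romeo and Juliet", "Hamlet"]}'
author = parse(reply)
print(author)
```

The instructions only ask the model for the right shape; the validation in parse is what actually guarantees it, which is why the last step is described as a further safeguard.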

How the pydantic data model affects parser.get_format_instructions:

1. book_names: List[str] = Field(description="list of names of book they wrote")

text='Please answer the following question:\nRandomly generate a well-known author and their representative works\n\n
The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\n
As an example, for the schema
{"properties": {
        "foo": {
                "title": "Foo",
                "description": "a list of strings",
                "type": "array",
                "items": {"type": "string"}
        }
    },
    "required": ["foo"]
}\n
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema.
The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\n
Here is the output schema:\n```\n
{"properties": {
        "name": {
                "title": "Name",
                "description": "name of an author",
                "type": "string"
        },
        "book_names": {
                "title": "Book Names",
                "description": "list of names of book they wrote",
                "type": "array",
                "items": {"type": "string"}
        }
    },
    "required": ["name", "book_names"]
}\n```\n
If the output is a code block, do not include the leading and trailing ``` markers'

Output: {"name": "William Shakespeare", "book_names": ["Romeo and Juliet", "Hamlet"]}

2. book_names: str = Field(description="names of books they wrote, separated by a delimiter")

text='Please answer the following question:\nRandomly generate a well-known author and their representative works\n\nThe output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\n
Here is the output schema:\n```\n
{"properties": {
        "name": {
                "title": "Name",
                "description": "name of an author",
                "type": "string"
        },
        "book_names": {
                "title": "Book Names",
                "description": "names of books they wrote, separated by a delimiter",
                "type": "string"
        }
    },
    "required": ["name", "book_names"]
}\n```\n
If the output is a code block, do not include the leading and trailing ``` markers'

Output:

{
    "name": "Jane Austen",
    "book_names": "Pride and Prejudice, Sense and Sensibility"
}
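The only difference between the two schemas is the "type" entry for book_names, which follows from the field's type annotation. The sketch below mimics that mapping with the standard library; schema_for is a hypothetical helper covering just the two cases shown here, and it is not how pydantic generates schemas internally.

```python
from typing import List, get_args, get_origin

# Hypothetical lookup table for the primitive types used in this sketch.
_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_for(annotation) -> dict:
    """Map a type annotation to a JSON-schema-like fragment."""
    if annotation in _PRIMITIVES:
        return {"type": _PRIMITIVES[annotation]}
    if get_origin(annotation) is list:
        (item_type,) = get_args(annotation)
        return {"type": "array", "items": schema_for(item_type)}
    raise TypeError(f"unsupported annotation: {annotation!r}")


list_schema = schema_for(List[str])
str_schema = schema_for(str)
print(list_schema)
print(str_schema)

# With the str variant the caller has to split the delimited value afterwards.
raw = "Pride and Prejudice, Sense and Sensibility"
books = [title.strip() for title in raw.split(",")]
print(books)
```

Declaring the field as List[str] lets the parser hand back a list directly, whereas the str variant pushes the splitting, and the choice of delimiter, onto the calling code.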