[LangChain Series 13] Example Selectors

大白爱爬山

Tags: langchain prompt ai

Article link: https://blog.csdn.net/LClansefengbao/article/details/138246460

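In the previous article, [LangChain Series 8] Prompt Templates: Few-Shot Prompt Templates (Part 2), we introduced dynamic few-shot prompt templates, in which an example selector dynamically picks a subset of all available examples based on the input. The selected examples are more relevant to the input, so the language model can understand the prompt better and give a better answer.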

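This article covers how to use five kinds of example selectors:

Custom example selector
Length-based example selector
MMR example selector
Similarity example selector
n-gram overlap example selector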

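01 Custom example selector

Writing a custom example selector is common practice: business logic varies widely, and a custom selector lets you choose examples more flexibly.

An example selector must implement at least two methods:

1. add_example: takes an example and adds it to the example selector.

2. select_examples: takes the user's input variables and returns a list of examples for the few-shot prompt template to use.

Below we implement a selector that picks two examples at random.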


from langchain.prompts.example_selector.base import BaseExampleSelector
from typing import Dict, List
import numpy as np


class CustomExampleSelector(BaseExampleSelector):

    def __init__(self, examples: List[Dict[str, str]]):
        self.examples = examples

    def add_example(self, example: Dict[str, str]) -> None:
        """Add new example to store for a key."""
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""
        # np.random.choice returns a NumPy array; convert it to a plain list
        # so the return value matches the declared List[dict] type.
        return np.random.choice(self.examples, size=2, replace=False).tolist()

Once the selector is defined, it can be used:

examples = [
    {"foo": "1"},
    {"foo": "2"},
    {"foo": "3"}
]

# Initialize example selector.
example_selector = CustomExampleSelector(examples)

# Select examples (the choice is random, so your output may differ)
example_selector.select_examples({"foo": "foo"})
# -> e.g. [{'foo': '2'}, {'foo': '3'}]

# Add new example to the set of examples
example_selector.add_example({"foo": "4"})
example_selector.examples
# -> [{'foo': '1'}, {'foo': '2'}, {'foo': '3'}, {'foo': '4'}]

# Select examples (again random)
example_selector.select_examples({"foo": "foo"})
# -> e.g. [{'foo': '1'}, {'foo': '4'}]
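
Since the point of a custom selector is to encode your own business rule, here is a second, deterministic sketch. KeywordExampleSelector is a hypothetical name, not part of LangChain, and it is written as a plain class with the same two methods so it runs without any dependency; in a real project you would presumably subclass BaseExampleSelector as above. The rule (keep examples that share at least one word with the input) is only an assumption chosen for illustration.

```python
from typing import Dict, List


class KeywordExampleSelector:
    """Hypothetical selector: keeps examples that share a word with the input."""

    def __init__(self, examples: List[Dict[str, str]], k: int = 2):
        self.examples = list(examples)
        self.k = k  # maximum number of examples to return

    def add_example(self, example: Dict[str, str]) -> None:
        """Store one more example."""
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[Dict[str, str]]:
        """Return up to k examples that share at least one word with the input."""
        query_words = set(" ".join(input_variables.values()).lower().split())
        matches = [
            ex
            for ex in self.examples
            if query_words & set(" ".join(ex.values()).lower().split())
        ]
        return matches[: self.k]


selector = KeywordExampleSelector(
    [
        {"question": "reset password", "answer": "Use the account page."},
        {"question": "cancel order", "answer": "Open the orders tab."},
    ]
)
selected = selector.select_examples({"question": "how to cancel my order"})
print(selected)
```

Only the "cancel order" example shares words with the input, so it is the only one returned.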

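02 Length-based example selector

As the name suggests, the length-based example selector chooses examples according to length, which is useful when a long prompt might exceed the context window. When the user input is long it selects fewer examples; when the user input is short it selects more.

LengthBasedExampleSelector is the length-based selector that LangChain provides.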

from langchain.prompts import PromptTemplate
from langchain.prompts import FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector


# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    # The examples it has available to choose from.
    examples=examples, 
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt, 
    # The maximum length that the formatted examples should be.
    # Length is measured by the get_text_length function below.
    max_length=25,
    # The function used to get the length of a string, which is used
    # to determine which examples to include. It is commented out because
    # it is provided as a default value if none is specified.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:", 
    input_variables=["adjective"],
)

When the user input is short, all examples are selected.

# An example with small input, so it selects all examples.
print(dynamic_prompt.format(adjective="big"))


  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: tall
  Output: short
  
  Input: energetic
  Output: lethargic
  
  Input: sunny
  Output: gloomy
  
  Input: windy
  Output: calm
  
  Input: big
  Output:

When the user input is long, only one example is selected.

# An example with long input, so it selects only one example.
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))


  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
  Output:

You can also add examples dynamically.


# You can add an example to an example selector as well.
new_example = {"input": "big", "output": "small"}
dynamic_prompt.example_selector.add_example(new_example)
print(dynamic_prompt.format(adjective="enthusiastic"))

  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: tall
  Output: short
  
  Input: energetic
  Output: lethargic
  
  Input: sunny
  Output: gloomy
  
  Input: windy
  Output: calm
  
  Input: big
  Output: small
  
  Input: enthusiastic
  Output:
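
The behaviour in this section can be reproduced with a small budget loop in plain Python. This is a minimal sketch, not LangChain's actual implementation: select_by_length is a hypothetical helper, and it assumes the selector subtracts the input's length from max_length and then adds examples in order until the next one no longer fits. The length function is the same word-count rule shown in the commented-out default above.

```python
import re
from typing import Dict, List


def get_text_length(text: str) -> int:
    # Word count: split on newlines and spaces, as in the default shown earlier.
    return len(re.split("\n| ", text))


def select_by_length(
    examples: List[Dict[str, str]], user_input: str, max_length: int = 25
) -> List[Dict[str, str]]:
    """Hypothetical helper: greedily keep examples while the length budget allows."""
    remaining = max_length - get_text_length(user_input)
    selected = []
    for example in examples:
        formatted = f"Input: {example['input']}\nOutput: {example['output']}"
        cost = get_text_length(formatted)
        if cost > remaining:
            break  # the next example would exceed the budget
        selected.append(example)
        remaining -= cost
    return selected


examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
long_string = (
    "big and huge and massive and large and gigantic and tall and "
    "much much much much much bigger than everything else"
)
print(len(select_by_length(examples, "big")))        # short input
print(len(select_by_length(examples, long_string)))  # long input
```

Each formatted example counts as 4 words, so a 1-word input leaves room for all five examples under max_length=25, while the 21-word input leaves room for exactly one.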

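03 MMR example selector

The MMR (maximal marginal relevance) example selector chooses a set of examples that are similar to the user input while also keeping the selected examples diverse. It achieves this in two ways:

1. Similarity: it uses embeddings to compute the cosine similarity between each example and the user input, and prefers examples with high similarity, so the selected examples stay close to the user input.

2. Diversity: when a new example is considered, it is penalized if it is very similar to the examples already selected, which keeps the selection diverse.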
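
The two ideas above can be sketched in plain Python before turning to the LangChain API. This is a minimal illustration under stated assumptions, not the library's code: mmr_select and unit are hypothetical helpers, the 2-D vectors stand in for real embeddings, and the score is assumed to be lambda_mult * relevance - (1 - lambda_mult) * redundancy with lambda_mult = 0.5.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Hypothetical helper: greedily pick k candidate indices by MMR score."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Penalty: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def unit(degrees: float) -> List[float]:
    """Toy 'embedding': a unit vector at the given angle."""
    rad = math.radians(degrees)
    return [math.cos(rad), math.sin(rad)]


query = unit(0)
candidates = [unit(10), unit(15), unit(-50)]  # 0 and 1 are near-duplicates

by_similarity = sorted(
    range(len(candidates)),
    key=lambda i: cosine(query, candidates[i]),
    reverse=True,
)[:2]
by_mmr = mmr_select(query, candidates, k=2)
print(by_similarity, by_mmr)
```

Ranking by similarity alone keeps the two near-duplicates (indices 0 and 1), whereas the MMR score penalizes index 1 for being almost identical to index 0 and picks the less similar but more diverse index 2 instead.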

from langchain.prompts.example_selector import (
    MaxMarginalRelevanceExampleSelector,
    SemanticSimilarityExampleSelector,
)
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
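]

# Build the MMR selector the same way as the similarity selector in
# section 04 (assumed setup; only the selector class differs).
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class that is used to store the embeddings and do a similarity search over.
    FAISS,
    # The number of examples to produce.
    k=2,
)
mmr_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

# Input is a feeling, so should select the happy/sad example as the first one
print(mmr_prompt.format(adjective="worried"))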

Running the code produces the following output:


  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: windy
  Output: calm
  
  Input: worried
  Output:

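04 Similarity example selector

Unlike the MMR example selector, the similarity example selector chooses examples based on similarity alone. LangChain provides SemanticSimilarityExampleSelector for this purpose. Reusing the examples from the MMR section but switching to the similarity example selector, let's see how the selected examples differ.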


# Let's compare this to what we would just get if we went solely off of similarity,
# by using SemanticSimilarityExampleSelector instead of MaxMarginalRelevanceExampleSelector.
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class that is used to store the embeddings and do a similarity search over.
    FAISS,
    # The number of examples to produce.
    k=2,
)
similar_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)
print(similar_prompt.format(adjective="worried"))

Running the code produces the following output:


  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: sunny
  Output: gloomy
  
  Input: worried
  Output:
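
Conceptually, selecting by similarity alone is a top-k ranking by cosine similarity. The sketch below illustrates this without LangChain, OpenAI, or FAISS: top_k_similar is a hypothetical helper, and the 2-D vectors are made-up stand-ins for real embeddings, chosen only so that the result mirrors the output above.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(
    query_vec: Sequence[float], embeddings: Dict[str, Sequence[float]], k: int = 2
) -> List[str]:
    """Hypothetical helper: return the k keys whose vectors are closest to the query."""
    ranked = sorted(
        embeddings, key=lambda name: cosine(query_vec, embeddings[name]), reverse=True
    )
    return ranked[:k]


# Made-up 2-D "embeddings": first axis ~ mood, second axis ~ physical property.
toy_embeddings = {
    "happy": [0.9, 0.1],
    "tall": [0.1, 0.9],
    "sunny": [0.8, 0.3],
    "windy": [0.2, 0.7],
}
worried = [1.0, 0.2]

nearest = top_k_similar(worried, toy_embeddings, k=2)
print(nearest)
```

With these toy vectors the two nearest neighbours of "worried" are "happy" and "sunny". No diversity penalty is applied, so the second pick may be very close to the first.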

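05 n-gram overlap example selector

The n-gram overlap example selector is, in essence, also similarity-based, except that it measures similarity with an n-gram overlap score rather than cosine similarity. The score lies between 0 and 1, inclusive.

The selector accepts a threshold: examples whose overlap score with the user input is less than or equal to the threshold are excluded. The default threshold is -1.0, which excludes no examples.

NGramOverlapExampleSelector selects and sorts examples by their overlap score.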
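
To make the threshold rule concrete, here is a simplified sketch in plain Python. It is not the scoring function NGramOverlapExampleSelector uses: overlap_score is a hypothetical stand-in that assumes the score is the fraction of an example's unigrams and bigrams that also occur in the input. Only the sort-then-filter logic (keep scores strictly greater than the threshold) follows the description above.

```python
import re
from typing import List, Set, Tuple


def ngrams(text: str, max_n: int = 2) -> Set[Tuple[str, ...]]:
    """All 1..max_n-grams of the lowercased words in text."""
    tokens = re.findall(r"\w+", text.lower())
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i : i + n]))
    return grams


def overlap_score(example: str, query: str) -> float:
    """Simplified score in [0, 1]: share of the example's n-grams found in the query."""
    example_grams = ngrams(example)
    if not example_grams:
        return 0.0
    return len(example_grams & ngrams(query)) / len(example_grams)


def select_by_overlap(
    examples: List[str], query: str, threshold: float = -1.0
) -> List[str]:
    """Hypothetical helper: sort by score, drop examples with score <= threshold."""
    scored = [(overlap_score(example, query), example) for example in examples]
    kept = [pair for pair in scored if pair[0] > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [example for _, example in kept]


sentences = ["See Spot run.", "My dog barks.", "Spot can run."]
query = "Spot can run fast."

print(select_by_overlap(sentences, query))                        # keeps all, sorted
print(select_by_overlap(sentences, query, threshold=0.0))         # drops zero overlap
print(select_by_overlap(sentences, query, threshold=1.0 + 1e-9))  # drops everything
```

Under this simplified score, "Spot can run." scores 1.0, "See Spot run." scores 0.4, and "My dog barks." scores 0.0, so a negative threshold keeps all three in that order, a threshold of 0.0 drops the last one, and a threshold above 1.0 drops them all.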

from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# Examples of a fictional translation task.
examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]

example_selector = NGramOverlapExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The threshold, at which selector stops.
    # It is set to -1.0 by default.
    threshold=-1.0,
    # For negative threshold:
    # Selector sorts examples by ngram overlap score, and excludes none.
    # For threshold greater than 1.0:
    # Selector excludes all examples, and returns an empty list.
    # For threshold equal to 0.0:
    # Selector sorts examples by ngram overlap score,
    # and excludes those with no ngram overlap with input.
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the Spanish translation of every input",
    suffix="Input: {sentence}\nOutput:",
    input_variables=["sentence"],
)


# An example input with large ngram overlap with "Spot can run."
# and no overlap with "My dog barks."
print(dynamic_prompt.format(sentence="Spot can run fast."))

Running the code produces the following output:


  Give the Spanish translation of every input
  
  Input: Spot can run.
  Output: Spot puede correr.
  
  Input: See Spot run.
  Output: Ver correr a Spot.
  
  Input: My dog barks.
  Output: Mi perro ladra.
  
  Input: Spot can run fast.
  Output:

Because the threshold defaults to -1.0, all examples are selected and sorted. You can also add examples dynamically:


# You can add examples to NGramOverlapExampleSelector as well.
new_example = {"input": "Spot plays fetch.", "output": "Spot juega a buscar."}

example_selector.add_example(new_example)
print(dynamic_prompt.format(sentence="Spot can run fast."))

With the threshold set to 0.0:


# You can set a threshold at which examples are excluded.
# For example, setting threshold equal to 0.0
# excludes examples with no ngram overlaps with input.
# Since "My dog barks." has no ngram overlaps with "Spot can run fast."
# it is excluded.
example_selector.threshold = 0.0
print(dynamic_prompt.format(sentence="Spot can run fast."))


  Give the Spanish translation of every input
  
  Input: Spot can run.
  Output: Spot puede correr.
  
  Input: See Spot run.
  Output: Ver correr a Spot.
  
  Input: Spot plays fetch.
  Output: Spot juega a buscar.
  
  Input: Spot can run fast.
  Output:

With the threshold set to 0.09:

# Setting small nonzero threshold
example_selector.threshold = 0.09
print(dynamic_prompt.format(sentence="Spot can play fetch."))


  Give the Spanish translation of every input
  
  Input: Spot can run.
  Output: Spot puede correr.
  
  Input: Spot plays fetch.
  Output: Spot juega a buscar.
  
  Input: Spot can play fetch.
  Output:

With the threshold set to a value greater than 1:


# Setting threshold greater than 1.0
example_selector.threshold = 1.0 + 1e-9
print(dynamic_prompt.format(sentence="Spot can play fetch."))

  Give the Spanish translation of every input
    
  Input: Spot can play fetch.
  Output:

In this case every example is excluded.

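Summary

This article introduced how to use several example selectors and how they differ. Depending on the business scenario, you can choose a suitable example selector to improve the quality of your few-shot prompts.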
