[LangChain Series 13] Example Selectors

Quick overview

  • Custom example selector

  • Length-based example selector

  • MMR example selector

  • Similarity example selector

  • n-gram overlap example selector

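An earlier article, [LangChain Series 8] Prompt Templates: Few-Shot Prompt Templates (Part 2), introduced dynamic few-shot prompt templates. There, an example selector dynamically picks a subset of all available examples based on the input. Because the selected examples are more relevant to the input, the language model can interpret the prompt better and give a better answer.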

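This article covers how to use five example selectors:

  • Custom example selector

  • Length-based example selector

  • MMR example selector

  • Similarity example selector

  • n-gram overlap example selector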

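01 Custom example selector


Writing a custom example selector is a common task: business logic varies widely, and a custom selector lets you choose examples more flexibly.

An example selector must implement at least two methods:

1. add_example: takes an example and adds it to the selector's example store.

2. select_examples: takes the user's input variables and returns a list of examples for the few-shot prompt template to use.

Below we implement a selector that picks two examples at random.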


from typing import Dict, List

import numpy as np
from langchain.prompts.example_selector.base import BaseExampleSelector


class CustomExampleSelector(BaseExampleSelector):

    def __init__(self, examples: List[Dict[str, str]]):
        self.examples = examples

    def add_example(self, example: Dict[str, str]) -> None:
        """Add new example to store for a key."""
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""
        # np.random.choice returns an ndarray; convert it to the declared list type.
        return np.random.choice(self.examples, size=2, replace=False).tolist()

Once the selector is defined, it can be used:

examples = [
    {"foo": "1"},
    {"foo": "2"},
    {"foo": "3"}
]

# Initialize example selector.
example_selector = CustomExampleSelector(examples)

# Select examples (the picks are random, so your output may differ)
example_selector.select_examples({"foo": "foo"})
# -> [{'foo': '2'}, {'foo': '3'}]

# Add new example to the set of examples
example_selector.add_example({"foo": "4"})
example_selector.examples
# -> [{'foo': '1'}, {'foo': '2'}, {'foo': '3'}, {'foo': '4'}]

# Select examples (again random)
example_selector.select_examples({"foo": "foo"})
# -> [{'foo': '1'}, {'foo': '4'}]
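
One practical wrinkle with the random selector above: its picks change on every call, and asking for two examples fails when fewer than two are stored. As a minimal sketch, assuming plain Python with no LangChain dependency, the hypothetical class below (SeededRandomSelector and its k and seed parameters are illustrative names, not part of the original) keeps the same two-method shape but makes the selection reproducible and caps k at the number of stored examples.

```python
import random
from typing import Dict, List, Optional


class SeededRandomSelector:
    """Hypothetical, duck-typed selector with the same two methods as above."""

    def __init__(self, examples: List[Dict[str, str]], k: int = 2,
                 seed: Optional[int] = None):
        self.examples = list(examples)   # copy so the caller's list is not mutated
        self.k = k
        self._rng = random.Random(seed)  # private RNG; a fixed seed gives repeatable picks

    def add_example(self, example: Dict[str, str]) -> None:
        """Store one more example."""
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[Dict[str, str]]:
        """Return up to k distinct examples chosen at random (the input is ignored)."""
        k = min(self.k, len(self.examples))
        return self._rng.sample(self.examples, k)


selector = SeededRandomSelector([{"foo": "1"}, {"foo": "2"}, {"foo": "3"}], seed=0)
picked = selector.select_examples({"foo": "foo"})
print(picked)
```

Two selectors built with the same seed and the same examples return the same picks, which is convenient when a prompt needs to be reproduced in a test.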

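02 Length-based example selector


As the name suggests, the length-based example selector chooses examples according to length, which is useful when an overly long prompt could exceed the context window. The longer the user input, the fewer examples it selects; the shorter the input, the more it selects.

LengthBasedExampleSelector is the length-based selector that LangChain provides.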

from langchain.prompts import PromptTemplate
from langchain.prompts import FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector


# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = LengthBasedExampleSelector(
    # The examples it has available to choose from.
    examples=examples, 
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt, 
    # The maximum length that the formatted examples should be.
    # Length is measured by the get_text_length function below.
    max_length=25,
    # The function used to get the length of a string, which is used
    # to determine which examples to include. It is commented out because
    # it is provided as a default value if none is specified.
    # get_text_length: Callable[[str], int] = lambda x: len(re.split("\n| ", x))
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:", 
    input_variables=["adjective"],
)

When the user input is short, all examples are selected.

# An example with small input, so it selects all examples.
print(dynamic_prompt.format(adjective="big"))

  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: tall
  Output: short
  
  Input: energetic
  Output: lethargic
  
  Input: sunny
  Output: gloomy
  
  Input: windy
  Output: calm
  
  Input: big
  Output:

When the user input is long, only one example is selected.

# An example with long input, so it selects only one example.
long_string = "big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else"
print(dynamic_prompt.format(adjective=long_string))

  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: big and huge and massive and large and gigantic and tall and much much much much much bigger than everything else
  Output:

You can also add examples dynamically.


# You can add an example to an example selector as well.
new_example = {"input": "big", "output": "small"}
dynamic_prompt.example_selector.add_example(new_example)
print(dynamic_prompt.format(adjective="enthusiastic"))

  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: tall
  Output: short
  
  Input: energetic
  Output: lethargic
  
  Input: sunny
  Output: gloomy
  
  Input: windy
  Output: calm
  
  Input: big
  Output: small
  
  Input: enthusiastic
  Output:
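
The behaviour shown in this section can be approximated in a few lines. The sketch below is a simplified stand-in, not LangChain's source: it assumes length is counted in words separated by a space or a newline (mirroring the default get_text_length shown in the comment earlier), and that examples are taken in their original order until the remaining budget runs out. The names count_words and select_by_length are hypothetical.

```python
import re
from typing import Dict, List


def count_words(text: str) -> int:
    """Count tokens separated by a single space or a newline."""
    return len(re.split("\n| ", text))


def select_by_length(examples: List[Dict[str, str]], user_input: str,
                     max_length: int = 25) -> List[Dict[str, str]]:
    """Keep examples in order while their formatted text still fits the budget."""
    remaining = max_length - count_words(user_input)
    selected = []
    for example in examples:
        formatted = f"Input: {example['input']}\nOutput: {example['output']}"
        cost = count_words(formatted)
        if cost > remaining:
            break  # this example would overflow the budget, so stop here
        selected.append(example)
        remaining -= cost
    return selected


examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
long_string = ("big and huge and massive and large and gigantic and tall "
               "and much much much much much bigger than everything else")

print(len(select_by_length(examples, "big")))        # -> 5
print(len(select_by_length(examples, long_string)))  # -> 1
```

Each formatted example costs 4 words here, so the 1-word input leaves room for all five examples, while the 21-word input leaves a budget of exactly 4, enough for one.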

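03 MMR example selector


The MMR (maximal marginal relevance) example selector picks a set of examples that are similar to the user input while also keeping the set diverse. It does this in two ways:

1. Similarity: it uses embeddings to compute the cosine similarity between each example and the user input and favors examples with high similarity, so the selected examples stay close to the input.

2. Diversity: when a candidate example is very similar to examples that have already been selected, its score is penalized, which keeps the selection diverse.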
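
The two criteria above can be combined into a single greedy score. The sketch below is a minimal pure-Python illustration under stated assumptions, not LangChain's implementation: the 2-D vectors are made up to stand in for real embeddings, and the names mmr_select, lambda_mult, and unit are chosen for this example. At each step it picks the candidate that maximizes lambda_mult * (similarity to the query) minus (1 - lambda_mult) * (highest similarity to anything already picked).

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(query: Sequence[float], candidates: List[Sequence[float]],
               k: int = 2, lambda_mult: float = 0.5) -> List[int]:
    """Greedily return the indices of up to k candidates chosen by MMR."""
    selected: List[int] = []
    while len(selected) < min(k, len(candidates)):
        best_index, best_score = -1, -math.inf
        for i, vec in enumerate(candidates):
            if i in selected:
                continue
            relevance = cosine(query, vec)
            # Penalty: similarity to the closest already-selected candidate.
            redundancy = max((cosine(vec, candidates[j]) for j in selected),
                             default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)
    return selected


def unit(degrees: float) -> Tuple[float, float]:
    """Made-up 2-D 'embedding': a unit vector at the given angle."""
    rad = math.radians(degrees)
    return (math.cos(rad), math.sin(rad))


query = unit(0)
candidates = [unit(10), unit(12), unit(-30)]  # index 1 nearly duplicates index 0
print(mmr_select(query, candidates, k=2))     # -> [0, 2]
```

The candidate at 12 degrees is the second most similar to the query, but it is almost identical to the first pick, so the penalty pushes the selection to the more distinct candidate at -30 degrees. The LangChain code that follows relies on the library's own selector rather than on this sketch.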

from langchain.prompts.example_selector import (
    MaxMarginalRelevanceExampleSelector,
    SemanticSimilarityExampleSelector,
)
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# Examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

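example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class that is used to store the embeddings and do a similarity search over.
    FAISS,
    # The number of examples to produce.
    k=2,
)
mmr_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

# Input is a feeling, so should select the happy/sad example as the first one
print(mmr_prompt.format(adjective="worried"))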

Running the code prints:


  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: windy
  Output: calm
  
  Input: worried
  Output:

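04 Similarity example selector


Unlike the MMR example selector, the similarity selector chooses examples by similarity alone. LangChain provides SemanticSimilarityExampleSelector for this. Reusing the examples from the MMR section but switching to the similarity selector, let's see how the selected examples differ.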


# Let's compare this to what we would just get if we went solely off of similarity,
# by using SemanticSimilarityExampleSelector instead of MaxMarginalRelevanceExampleSelector.
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # The list of examples available to select from.
    examples,
    # The embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # The VectorStore class that is used to store the embeddings and do a similarity search over.
    FAISS,
    # The number of examples to produce.
    k=2,
)
similar_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)
print(similar_prompt.format(adjective="worried"))

Running the code prints:


  Give the antonym of every input
  
  Input: happy
  Output: sad
  
  Input: sunny
  Output: gloomy
  
  Input: worried
  Output:
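
Selecting by similarity alone amounts to ranking every example by its cosine similarity to the input and keeping the top k. The sketch below illustrates that idea only. The 3-dimensional vectors are invented for this example (real embeddings would come from an embedding model such as the one used above), and top_k_similar is a hypothetical helper, not a LangChain function.

```python
import math
from typing import Dict, List, Sequence

# Made-up "embeddings" for three example inputs; the values carry no real meaning.
TOY_VECTORS: Dict[str, Sequence[float]] = {
    "happy": (0.9, 0.1, 0.0),
    "tall": (0.0, 0.1, 0.9),
    "sunny": (0.6, 0.6, 0.1),
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query_vec: Sequence[float],
                  vectors: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the k keys whose vectors are most similar to the query vector."""
    ranked = sorted(vectors, key=lambda name: cosine(query_vec, vectors[name]),
                    reverse=True)
    return ranked[:k]


worried = (0.8, 0.2, 0.1)  # also made up
print(top_k_similar(worried, TOY_VECTORS))  # -> ['happy', 'sunny']
```

There is no diversity penalty here: if two examples were near-duplicates of each other and both close to the input, both would be returned.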

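05 n-gram overlap example selector


The n-gram overlap example selector is also similarity-based, but instead of cosine similarity it measures similarity with an n-gram overlap score, which lies between 0 and 1 inclusive.

The selector accepts a threshold: examples whose overlap score with the user input is less than or equal to the threshold are excluded. The default threshold is -1.0, which excludes nothing.

NGramOverlapExampleSelector selects and sorts examples by overlap score.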
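
To make the threshold rule concrete, here is a minimal sketch under loud assumptions: the score is a simple word-overlap fraction that I swapped in for illustration, not the score LangChain computes, so the exact values will differ from the real selector. The names overlap_score and select_by_overlap are hypothetical. What the sketch does mirror is the rule described above: drop examples whose score is less than or equal to the threshold, then sort the rest from highest to lowest score.

```python
import re
from collections import Counter
from typing import Dict, List


def overlap_score(source: str, candidate: str) -> float:
    """Fraction of the source's words that also occur in the candidate (0.0 to 1.0)."""
    source_words = Counter(re.findall(r"\w+", source.lower()))
    candidate_words = Counter(re.findall(r"\w+", candidate.lower()))
    total = sum(source_words.values())
    if total == 0:
        return 0.0
    shared = sum((source_words & candidate_words).values())  # multiset intersection
    return shared / total


def select_by_overlap(examples: List[Dict[str, str]], sentence: str,
                      threshold: float = -1.0) -> List[Dict[str, str]]:
    """Exclude examples scoring <= threshold, then sort by score, highest first."""
    scored = [(overlap_score(sentence, ex["input"]), ex) for ex in examples]
    kept = [(score, ex) for score, ex in scored if score > threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in kept]


examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]

for threshold in (-1.0, 0.0, 1.0 + 1e-9):
    picked = select_by_overlap(examples, "Spot can run fast.", threshold)
    print(threshold, [ex["input"] for ex in picked])
```

With a negative threshold nothing is excluded, with 0.0 the example that shares no words with the input is dropped, and with a threshold above 1.0 the result is empty, because no score can exceed 1.0.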

from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector

# Examples of a fictional translation task.
examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
example_selector = NGramOverlapExampleSelector(
    # The examples it has available to choose from.
    examples=examples,
    # The PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # The threshold, at which selector stops.
    # It is set to -1.0 by default.
    threshold=-1.0,
    # For negative threshold:
    # Selector sorts examples by ngram overlap score, and excludes none.
    # For threshold greater than 1.0:
    # Selector excludes all examples, and returns an empty list.
    # For threshold equal to 0.0:
    # Selector sorts examples by ngram overlap score,
    # and excludes those with no ngram overlap with input.
)
dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the Spanish translation of every input",
    suffix="Input: {sentence}\nOutput:",
    input_variables=["sentence"],
)

# An example input with large ngram overlap with "Spot can run."
# and no overlap with "My dog barks."
print(dynamic_prompt.format(sentence="Spot can run fast."))

Running the code prints:


  Give the Spanish translation of every input
  
  Input: Spot can run.
  Output: Spot puede correr.
  
  Input: See Spot run.
  Output: Ver correr a Spot.
  
  Input: My dog barks.
  Output: Mi perro ladra.
  
  Input: Spot can run fast.
  Output:

Because the threshold defaults to -1.0, every example is selected and sorted by overlap score. You can also add examples dynamically:


# You can add examples to NGramOverlapExampleSelector as well.
new_example = {"input": "Spot plays fetch.", "output": "Spot juega a buscar."}

example_selector.add_example(new_example)
print(dynamic_prompt.format(sentence="Spot can run fast."))

With the threshold set to 0:


# You can set a threshold at which examples are excluded.
# For example, setting threshold equal to 0.0
# excludes examples with no ngram overlaps with input.
# Since "My dog barks." has no ngram overlaps with "Spot can run fast."
# it is excluded.
example_selector.threshold = 0.0
print(dynamic_prompt.format(sentence="Spot can run fast."))

  Give the Spanish translation of every input
  
  Input: Spot can run.
  Output: Spot puede correr.
  
  Input: See Spot run.
  Output: Ver correr a Spot.
  
  Input: Spot plays fetch.
  Output: Spot juega a buscar.
  
  Input: Spot can run fast.
  Output:

With the threshold set to 0.09:

# Setting small nonzero threshold
example_selector.threshold = 0.09
print(dynamic_prompt.format(sentence="Spot can play fetch."))

  Give the Spanish translation of every input
  
  Input: Spot can run.
  Output: Spot puede correr.
  
  Input: Spot plays fetch.
  Output: Spot juega a buscar.
  
  Input: Spot can play fetch.
  Output:

With the threshold set to a value greater than 1:


# Setting threshold greater than 1.0
example_selector.threshold = 1.0 + 1e-9
print(dynamic_prompt.format(sentence="Spot can play fetch."))

  Give the Spanish translation of every input
  
  Input: Spot can play fetch.
  Output:

At this point every example is excluded.

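Summary

This article covered how to use several example selectors and how they differ. Depending on the use case, you can pick the selector that fits best to improve the quality of your few-shot prompts.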

WeChat official account: 大白爱爬山
