N-gram Overlap Example Selector in LangChain

https://python.langchain.com.cn/docs/modules/model_io/prompts/example_selectors/ngram_overlap

NGram Overlap Example Selector

This content is based on LangChain’s official documentation (langchain.com.cn) and explains, in simplified terms, the NGramOverlapExampleSelector: a tool that selects examples by their n-gram overlap with the input. The code, examples, and outputs follow the original documentation.

1. What is NGramOverlapExampleSelector?

NGramOverlapExampleSelector selects and sorts examples based on n-gram overlap with the user’s input.

  • An n-gram is a sequence of n consecutive words (e.g., “Spot can” and “run fast” are both 2-grams).
  • The n-gram overlap score ranges from 0.0 (no overlap) to 1.0 (full overlap).
  • It uses a threshold to filter examples:
    • Default threshold: -1.0 (no examples are excluded; they are only sorted by overlap score).
    • Threshold 0.0: Excludes examples with no n-gram overlap with the input.
    • Threshold >1.0: Excludes all examples (returns an empty list).
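
To make the idea of n-grams and overlap concrete, here is a minimal pure-Python sketch. The helper names word_ngrams and bigram_overlap are hypothetical, and the score is a deliberately simplified stand-in (the fraction of shared 2-grams), not the scoring that NGramOverlapExampleSelector performs internally.

```python
def word_ngrams(text, n):
    """Return the n-grams (tuples of n consecutive words) of a whitespace-split text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def bigram_overlap(source, example):
    """Fraction of the source's distinct 2-grams that also occur in the example.

    Simplified illustration only; this is not the library's scoring function.
    """
    source_grams = set(word_ngrams(source, 2))
    if not source_grams:
        return 0.0
    example_grams = set(word_ngrams(example, 2))
    return len(source_grams & example_grams) / len(source_grams)


print(word_ngrams("Spot can run fast.", 2))
print(bigram_overlap("Spot can run fast.", "Spot can run."))
print(bigram_overlap("Spot can run fast.", "My dog barks."))
```

Because the sketch splits on whitespace only, punctuation stays attached to words, so “run.” and “run” count as different tokens here.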

2. Step 1: Import Required Modules

The code below imports the required LangChain classes, using the import paths from the original documentation:

from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector

3. Step 2: Prepare Examples

The original text defines two sets of examples. Both are assigned to the same variable, examples, so the second set (a fictional Spanish translation task) replaces the first and is the one used in every test below:

Initial Antonym Examples (mentioned but not used later):

# These are a lot of examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

Final Translation Task Examples (used for all tests):

# These are examples of a fictional translation task.
examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]

4. Step 3: Create example_prompt

This PromptTemplate defines how each example is formatted (matching the original structure):

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
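
As a rough picture of what this template produces for a single example, the sketch below fills the same template string with Python’s built-in str.format. It is an illustration that does not call PromptTemplate itself.

```python
# Same template string as in example_prompt above.
template = "Input: {input}\nOutput: {output}"

# One of the translation examples from Step 2.
example = {"input": "See Spot run.", "output": "Ver correr a Spot."}

# Fill the placeholders from the example's keys.
formatted = template.format(**example)
print(formatted)
```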

5. Step 4: Initialize NGramOverlapExampleSelector

Configure the selector with examples, formatting prompt, and default threshold (-1.0). The comments are kept exactly as in the original:

example_selector = NGramOverlapExampleSelector(
    # These are the examples it has available to choose from.
    examples=examples,
    # This is the PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # This is the threshold, at which selector stops.
    # It is set to -1.0 by default.
    threshold=-1.0,
    # For negative threshold:
    # Selector sorts examples by ngram overlap score, and excludes none.
    # For threshold greater than 1.0:
    # Selector excludes all examples, and returns an empty list.
    # For threshold equal to 0.0:
    # Selector sorts examples by ngram overlap score,
    # and excludes those with no ngram overlap with input.
)
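
The threshold rules in the comments above can be sketched in plain Python. The helper select_by_threshold is hypothetical and the scores are made up for illustration; the sketch assumes that only examples scoring strictly above the threshold are kept, which agrees with the documented behavior at 0.0 (examples with no overlap are excluded).

```python
def select_by_threshold(scored_examples, threshold=-1.0):
    """Sort (example, score) pairs by descending score, keeping scores above threshold.

    Hypothetical helper for illustration; not part of LangChain.
    """
    ranked = sorted(scored_examples, key=lambda pair: pair[1], reverse=True)
    return [example for example, score in ranked if score > threshold]


# Made-up scores in the 0.0 to 1.0 range; the real selector computes its own.
scored = [("See Spot run.", 0.2), ("My dog barks.", 0.0), ("Spot can run.", 0.6)]

print(select_by_threshold(scored))                        # negative threshold: sort only
print(select_by_threshold(scored, threshold=0.0))         # drops the zero-overlap example
print(select_by_threshold(scored, threshold=1.0 + 1e-9))  # drops everything
```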

6. Step 5: Create Dynamic Prompt

Combine the selector with a prefix (instruction) and suffix (user input placeholder) using FewShotPromptTemplate:

dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the Spanish translation of every input",
    suffix="Input: {sentence}\nOutput:",
    input_variables=["sentence"],
)
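
Conceptually, the final prompt is the prefix, then each selected example formatted with example_prompt, then the suffix with the user input filled in. The sketch below imitates that assembly in plain Python. The function build_few_shot_prompt and its separator parameter are assumptions made for illustration (the separator is chosen to match the outputs as displayed in this article), not LangChain’s API.

```python
def build_few_shot_prompt(prefix, examples, suffix, sentence, separator="\n"):
    """Join prefix, formatted examples, and the filled-in suffix into one string.

    Hypothetical helper; the separator default is an assumption.
    """
    example_template = "Input: {input}\nOutput: {output}"
    formatted = [example_template.format(**ex) for ex in examples]
    pieces = [prefix, *formatted, suffix.format(sentence=sentence)]
    return separator.join(pieces)


prompt = build_few_shot_prompt(
    prefix="Give the Spanish translation of every input",
    examples=[{"input": "Spot can run.", "output": "Spot puede correr."}],
    suffix="Input: {sentence}\nOutput:",
    sentence="Spot can run fast.",
)
print(prompt)
```

When the selector returns no examples, the same assembly leaves only the prefix and the suffix.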

7. Step 6: Test the Selector (5 Scenarios)

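We test the selector with different thresholds and after adding a new example, following the code and outputs of the original documentation. The scenarios run in order on the same selector, so the example added in Scenario 2 remains available in Scenarios 3 to 5.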

Scenario 1: Default Threshold (-1.0)

No examples are excluded; they are sorted by n-gram overlap with the input ("Spot can run fast.").

Code:

# An example input with large ngram overlap with "Spot can run."
# and no overlap with "My dog barks."
print(dynamic_prompt.format(sentence="Spot can run fast."))

Output (exact as original):

Give the Spanish translation of every input
Input: Spot can run.
Output: Spot puede correr.
Input: See Spot run.
Output: Ver correr a Spot.
Input: My dog barks.
Output: Mi perro ladra.
Input: Spot can run fast.
Output:

Scenario 2: Add a New Example

Use add_example() to append a new translation example. The selector will now include it in sorting.

Code:

# You can add examples to NGramOverlapExampleSelector as well.
new_example = {"input": "Spot plays fetch.", "output": "Spot juega a buscar."}
example_selector.add_example(new_example)

print(dynamic_prompt.format(sentence="Spot can run fast."))

Output (exact as original, with the new example included):

Give the Spanish translation of every input
Input: Spot can run.
Output: Spot puede correr.
Input: See Spot run.
Output: Ver correr a Spot.
Input: Spot plays fetch.
Output: Spot juega a buscar.
Input: My dog barks.
Output: Mi perro ladra.
Input: Spot can run fast.
Output:

Scenario 3: Threshold = 0.0

Excludes examples with no n-gram overlap with the input ("Spot can run fast."). “My dog barks.” has no overlap and is excluded.

Code:

# You can set a threshold at which examples are excluded.
# For example, setting threshold equal to 0.0
# excludes examples with no ngram overlaps with input.
# Since "My dog barks." has no ngram overlaps with "Spot can run fast."
# it is excluded.
example_selector.threshold = 0.0

print(dynamic_prompt.format(sentence="Spot can run fast."))

Output (exact as original):

Give the Spanish translation of every input
Input: Spot can run.
Output: Spot puede correr.
Input: See Spot run.
Output: Ver correr a Spot.
Input: Spot plays fetch.
Output: Spot juega a buscar.
Input: Spot can run fast.
Output:

Scenario 4: Threshold = 0.09

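A small nonzero threshold also filters out examples with very low overlap. Note that the input changes to "Spot can play fetch." here. “See Spot run.” and “My dog barks.” are excluded; only “Spot can run.” and “Spot plays fetch.” have enough overlap to be included.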

Code:

# Setting small nonzero threshold
example_selector.threshold = 0.09

print(dynamic_prompt.format(sentence="Spot can play fetch."))

Output (exact as original):

Give the Spanish translation of every input
Input: Spot can run.
Output: Spot puede correr.
Input: Spot plays fetch.
Output: Spot juega a buscar.
Input: Spot can play fetch.
Output:
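
For a rough intuition about this result, the sketch below lists the words each example shares with the input. The helper shared_words is hypothetical, and a plain word count is not the score the selector computes, but it shows that the two retained examples share more with the input than “See Spot run.” does.

```python
def shared_words(source, example):
    """Return the whitespace-separated tokens found in both strings, sorted."""
    return sorted(set(source.split()) & set(example.split()))


query = "Spot can play fetch."
candidates = ["Spot can run.", "See Spot run.", "Spot plays fetch.", "My dog barks."]

for candidate in candidates:
    # Punctuation stays attached to tokens, so "fetch." matches only "fetch."
    print(candidate, "->", shared_words(query, candidate))
```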

Scenario 5: Threshold > 1.0

Any threshold greater than 1.0 excludes all examples (returns an empty list of examples).

Code:

# Setting threshold greater than 1.0
example_selector.threshold = 1.0 + 1e-9

print(dynamic_prompt.format(sentence="Spot can play fetch."))

Output (exact as original):

Give the Spanish translation of every input
Input: Spot can play fetch.
Output: