NGram Overlap Example Selector in LangChain

License: CC 4.0 BY-SA


https://python.langchain.com.cn/docs/modules/model_io/prompts/example_selectors/ngram_overlap

NGram Overlap Example Selector

This article is based on LangChain’s official documentation (langchain.com.cn) and explains, in simplified terms, the NGramOverlapExampleSelector, a tool that selects examples by their n-gram overlap with the input. The original source code, examples, and key points are preserved.

1. What is `NGramOverlapExampleSelector`?

NGramOverlapExampleSelector selects and sorts examples based on n-gram overlap with the user’s input.

An n-gram is a contiguous sequence of n words (e.g., “Spot can” and “run fast” are both 2-grams).
The n-gram overlap score ranges from 0.0 (no overlap) to 1.0 (full overlap).
It uses a threshold to filter examples:
- Default threshold: -1.0 (no examples excluded—only sorted by overlap score).
- Threshold 0.0: Excludes examples with no n-gram overlap with the input.
- Threshold >1.0: Excludes all examples (returns an empty list).
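To make the idea of n-gram overlap concrete, here is a minimal sketch in plain Python. It splits text on whitespace and reports the fraction of the input's n-grams that also occur in an example. The helper names `ngrams` and `overlap_fraction` are hypothetical, and this is a simplified illustration, not the selector's actual scoring function.

```python
from collections import Counter


def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Return the contiguous n-word sequences in text (whitespace tokens)."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_fraction(source: str, example: str, n: int) -> float:
    """Fraction of the source's n-grams that also appear in the example."""
    source_counts = Counter(ngrams(source, n))
    example_counts = Counter(ngrams(example, n))
    total = sum(source_counts.values())
    if total == 0:
        return 0.0
    # Clip each match count so a repeated n-gram is not counted more often
    # than it occurs in the example.
    matched = sum(
        min(count, example_counts[gram]) for gram, count in source_counts.items()
    )
    return matched / total


print(ngrams("Spot can run fast.", 2))
# [('Spot', 'can'), ('can', 'run'), ('run', 'fast.')]
print(overlap_fraction("Spot can run fast.", "Spot can run.", 1))  # 0.5
print(overlap_fraction("Spot can run fast.", "My dog barks.", 1))  # 0.0
```

Because the sketch splits on whitespace only, punctuation stays attached to words: "run" and "run." count as different tokens, which is why only "Spot" and "can" match in the first comparison.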

2. Step 1: Import Required Modules

The code below imports the required LangChain classes:

from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector
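Before running the imports, it can help to confirm that the packages they rely on are present. The sketch below uses only the standard library. The package list is an assumption: as far as we know, the selector computes its scores with NLTK and uses NumPy internally, so both need to be installed alongside langchain. The helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: these are the third-party packages the selector depends on.
missing = missing_packages(["langchain", "nltk", "numpy"])
if missing:
    print("Install before running the examples:", ", ".join(missing))
else:
    print("All assumed dependencies are available.")
```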

3. Step 2: Prepare Examples

The original text defines two sets of examples. Both are assigned to the same variable, examples, so the second assignment overwrites the first and only the fictional Spanish translation task is used from here on:

Initial Antonym Examples (mentioned but not used later):

# These are a lot of examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

Final Translation Task Examples (used for all tests):

# These are examples of a fictional translation task.
examples = [
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
]

4. Step 3: Create `example_prompt`

This PromptTemplate defines how each example is formatted (matching the original structure):

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
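To see what this template produces for one example, the following sketch uses plain `str.format` as a stand-in for the PromptTemplate. It assumes the template uses ordinary `{name}` placeholders, as it does above; no LangChain import is needed.

```python
example = {"input": "See Spot run.", "output": "Ver correr a Spot."}
template = "Input: {input}\nOutput: {output}"

# Plain-Python stand-in for formatting one example with example_prompt.
rendered = template.format(**example)
print(rendered)
# Input: See Spot run.
# Output: Ver correr a Spot.
```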

5. Step 4: Initialize `NGramOverlapExampleSelector`

Configure the selector with examples, formatting prompt, and default threshold (-1.0). The comments are kept exactly as in the original:

example_selector = NGramOverlapExampleSelector(
    # These are the examples it has available to choose from.
    examples=examples,
    # This is the PromptTemplate being used to format the examples.
    example_prompt=example_prompt,
    # This is the threshold, at which selector stops.
    # It is set to -1.0 by default.
    threshold=-1.0,
    # For negative threshold:
    # Selector sorts examples by ngram overlap score, and excludes none.
    # For threshold greater than 1.0:
    # Selector excludes all examples, and returns an empty list.
    # For threshold equal to 0.0:
    # Selector sorts examples by ngram overlap score,
    # and excludes those with no ngram overlap with input.
)
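The three threshold regimes described in the comments can be sketched without LangChain. The function below is a hypothetical illustration, assuming an example is kept only when its score is strictly greater than the threshold (which matches the rule that a threshold of 0.0 drops zero-overlap examples). The scores are made-up values chosen for the demonstration.

```python
def select_by_threshold(scores: dict[str, float], threshold: float) -> list[str]:
    """Return texts sorted by descending score, keeping scores above threshold."""
    # sorted() is stable, so texts with equal scores keep their original order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [text for text, score in ranked if score > threshold]


# Hypothetical scores, used only to illustrate the three regimes.
scores = {"See Spot run.": 0.08, "My dog barks.": 0.0, "Spot can run.": 0.17}

print(select_by_threshold(scores, -1.0))
# ['Spot can run.', 'See Spot run.', 'My dog barks.']
print(select_by_threshold(scores, 0.0))
# ['Spot can run.', 'See Spot run.']
print(select_by_threshold(scores, 1.0 + 1e-9))
# []
```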

6. Step 5: Create Dynamic Prompt

Combine the selector with a prefix (instruction) and suffix (user input placeholder) using FewShotPromptTemplate:

dynamic_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the Spanish translation of every input",
    suffix="Input: {sentence}\nOutput:",
    input_variables=["sentence"],
)
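How the final prompt is put together can be sketched in a few lines. The function `assemble_prompt` is hypothetical and assumes the pieces (prefix, formatted examples, suffix) are joined in that order with a blank line between them, which we understand to be the default separator.

```python
def assemble_prompt(
    prefix: str,
    formatted_examples: list[str],
    suffix: str,
    separator: str = "\n\n",  # assumption: a blank line between pieces
) -> str:
    """Join the non-empty pieces with the separator, in order."""
    pieces = [prefix, *formatted_examples, suffix]
    return separator.join(piece for piece in pieces if piece)


formatted = ["Input: Spot can run.\nOutput: Spot puede correr."]
prompt = assemble_prompt(
    "Give the Spanish translation of every input",
    formatted,
    "Input: {sentence}\nOutput:".format(sentence="Spot can run fast."),
)
print(prompt)
```

With an empty list of examples, the same function returns just the prefix and the suffix, which is the shape of the prompt when every example is filtered out.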

7. Step 6: Test the Selector (5 Scenarios)

We test the selector with different thresholds and after adding a new example—exactly as in the original documentation, including code and outputs.

Scenario 1: Default Threshold (`-1.0`)

No examples are excluded; they are sorted by n-gram overlap with the input ("Spot can run fast.").

Code:

# An example input with large ngram overlap with "Spot can run."
# and no overlap with "My dog barks."
print(dynamic_prompt.format(sentence="Spot can run fast."))

Output (exact as original):

Give the Spanish translation of every input

Input: Spot can run.
Output: Spot puede correr.

Input: See Spot run.
Output: Ver correr a Spot.

Input: My dog barks.
Output: Mi perro ladra.

Input: Spot can run fast.
Output:

Scenario 2: Add a New Example

Use add_example() to append a new translation example. The threshold is still -1.0, so the selector keeps the new example and sorts all four by overlap with the input.

Code:

# You can add examples to NGramOverlapExampleSelector as well.
new_example = {"input": "Spot plays fetch.", "output": "Spot juega a buscar."}
example_selector.add_example(new_example)

print(dynamic_prompt.format(sentence="Spot can run fast."))

Output (exact as original—new example added):

Give the Spanish translation of every input

Input: Spot can run.
Output: Spot puede correr.

Input: See Spot run.
Output: Ver correr a Spot.

Input: Spot plays fetch.
Output: Spot juega a buscar.

Input: My dog barks.
Output: Mi perro ladra.

Input: Spot can run fast.
Output:
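The effect of adding an example can be mimicked with a small stand-in class. `ToySelector` is hypothetical: it ranks examples by the number of whitespace-separated words they share with the input, which is much cruder than the real scoring, but it shows that a newly added example joins the same pool and is ranked on the next call.

```python
class ToySelector:
    """Hypothetical stand-in that ranks examples by words shared with the input."""

    def __init__(self, examples: list[dict[str, str]]) -> None:
        self.examples = list(examples)

    def add_example(self, example: dict[str, str]) -> None:
        # New examples join the pool that later selections draw from.
        self.examples.append(example)

    def select_examples(self, sentence: str) -> list[dict[str, str]]:
        words = set(sentence.split())
        # sorted() is stable, so ties keep their insertion order.
        return sorted(
            self.examples,
            key=lambda ex: len(words & set(ex["input"].split())),
            reverse=True,
        )


selector = ToySelector([
    {"input": "See Spot run.", "output": "Ver correr a Spot."},
    {"input": "My dog barks.", "output": "Mi perro ladra."},
    {"input": "Spot can run.", "output": "Spot puede correr."},
])
selector.add_example({"input": "Spot plays fetch.", "output": "Spot juega a buscar."})

order = [ex["input"] for ex in selector.select_examples("Spot can run fast.")]
print(order)
# ['Spot can run.', 'See Spot run.', 'Spot plays fetch.', 'My dog barks.']
```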

Scenario 3: Threshold = `0.0`

Excludes examples with no n-gram overlap with the input ("Spot can run fast."). “My dog barks.” has no overlap and is excluded.

Code:

# You can set a threshold at which examples are excluded.
# For example, setting threshold equal to 0.0
# excludes examples with no ngram overlaps with input.
# Since "My dog barks." has no ngram overlaps with "Spot can run fast."
# it is excluded.
example_selector.threshold = 0.0

print(dynamic_prompt.format(sentence="Spot can run fast."))

Output (exact as original):

Give the Spanish translation of every input

Input: Spot can run.
Output: Spot puede correr.

Input: See Spot run.
Output: Ver correr a Spot.

Input: Spot plays fetch.
Output: Spot juega a buscar.

Input: Spot can run fast.
Output:

Scenario 4: Threshold = `0.09`

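A small nonzero threshold also drops examples whose overlap with the input is too low. Note that the input changes to "Spot can play fetch." here. “See Spot run.” shares only the word “Spot” with it and “My dog barks.” shares nothing, so both score too low and are excluded.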

Code:

# Setting small nonzero threshold
example_selector.threshold = 0.09

print(dynamic_prompt.format(sentence="Spot can play fetch."))

Output (exact as original):

Give the Spanish translation of every input

Input: Spot can run.
Output: Spot puede correr.

Input: Spot plays fetch.
Output: Spot juega a buscar.

Input: Spot can play fetch.
Output:
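Why “See Spot run.” misses a threshold of 0.09 while “Spot plays fetch.” passes can be explored with a rough scoring sketch. It assumes the score behaves like a smoothed sentence-level BLEU: the geometric mean of 1-gram to 4-gram precisions, with a zero match count replaced by a small epsilon, plus a brevity penalty. We believe this is close to what the selector computes through NLTK, but the function `bleu_style_score` and its parameters are hypothetical, and the numbers are only as good as that assumption.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_style_score(
    source: str, example: str, max_n: int = 4, epsilon: float = 0.1
) -> float:
    """Geometric mean of smoothed n-gram precisions, with a brevity penalty."""
    hyp, ref = source.split(), example.split()
    if not hyp:
        return 0.0
    orders = min(max_n, len(hyp))  # no n-grams longer than the input itself
    log_total = 0.0
    for n in range(1, orders + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if n == 1 and matched == 0:
            return 0.0  # no shared words at all
        # Smoothing: a zero match count is replaced by a small epsilon.
        precision = (matched if matched else epsilon) / total
        log_total += math.log(precision) / orders
    # Penalize inputs that are shorter than the example.
    penalty = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return penalty * math.exp(log_total)


sentence = "Spot can play fetch."
for text in ["See Spot run.", "My dog barks.", "Spot can run.", "Spot plays fetch."]:
    score = bleu_style_score(sentence, text)
    print(f"{text!r}: {score:.3f} -> {'kept' if score > 0.09 else 'dropped'}")
# 'See Spot run.': 0.080 -> dropped
# 'My dog barks.': 0.000 -> dropped
# 'Spot can run.': 0.170 -> kept
# 'Spot plays fetch.': 0.096 -> kept
```

Under these assumptions the kept examples and their order agree with the output above: “Spot can run.” scores highest because it shares the bigram “Spot can” with the input, while the other candidates share at most single words.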

Scenario 5: Threshold > `1.0`

Any threshold greater than 1.0 excludes all examples (returns an empty list of examples).

Code:

# Setting threshold greater than 1.0
example_selector.threshold = 1.0 + 1e-9

print(dynamic_prompt.format(sentence="Spot can play fetch."))

Output (exact as original):

Give the Spanish translation of every input

Input: Spot can play fetch.
Output:
