Semantic Similarity Example Selector in langchain

Latest recommended article published on 2025-11-19 22:19:32

License: CC 4.0 BY-SA

Tags: #langchain #windows

https://python.langchain.com.cn/docs/modules/model_io/prompts/example_selectors/similarity

Semantic Similarity Example Selector

This content is based on LangChain’s official documentation (langchain.com.cn) and explains, in simplified terms, the SemanticSimilarityExampleSelector, a tool that selects few-shot examples by their semantic similarity to the input. The code, examples, and outputs from the original documentation are preserved as published.

1. What is `SemanticSimilarityExampleSelector`?

This selector chooses examples based on semantic similarity (meaning-based similarity) to the user’s input.

It converts both the input and examples into numerical representations called “embeddings” (using an embedding model like OpenAIEmbeddings).
It measures similarity using cosine similarity (the cosine of the angle between two embedding vectors; the closer the value is to 1, the more closely the two embeddings align).
It retrieves the top k (specified number) examples with the highest similarity to the input.
It uses a VectorStore (e.g., Chroma) to store and search embeddings efficiently.
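
The ranking step described above can be sketched in plain Python. This is only an illustration of cosine similarity and top-k selection, not the library's internals: the function names `cosine_similarity` and `top_k` are hypothetical, and the 2-D vectors are hand-made stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, example_vecs, k=1):
    """Return the names of the k examples most similar to the query."""
    ranked = sorted(
        example_vecs,
        key=lambda name: cosine_similarity(query_vec, example_vecs[name]),
        reverse=True,
    )
    return ranked[:k]

# Hand-made 2-D stand-ins for embeddings (assumed values, not model output).
example_vecs = {
    "happy": [0.9, 0.1],
    "tall": [0.1, 0.9],
}
query_vec = [0.8, 0.3]  # pretend embedding for "worried"

best = top_k(query_vec, example_vecs, k=1)
print(best)
```

A real vector store does the same kind of ranking over vectors with hundreds or thousands of dimensions, usually with an index so it does not have to compare the query against every stored vector.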

2. Step 1: Import Required Modules

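The code below imports all necessary LangChain classes, exactly as in the original documentation. These import paths match the older LangChain release that the documentation targets; newer releases may expose the same classes from different packages.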

from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

3. Step 2: Prepare Examples

We use the same “creating antonyms” example list from the original text:

# These are a lot of examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]
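
Each example is a dictionary, but an embedding model takes text. The sketch below assumes that the selector turns every example into one string by joining its values in key order before embedding it; treat that as an assumption about the library's behavior, and `example_to_text` as a hypothetical helper name.

```python
def example_to_text(example):
    """Join the example's values, ordered by key name, into one string."""
    return " ".join(example[key] for key in sorted(example))

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
]

# One string per example; these strings are what would be embedded.
texts = [example_to_text(example) for example in examples]
print(texts)
```

Under this assumption, both the input word and its antonym contribute to the stored embedding, which can influence which example ends up closest to a given query.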

4. Step 3: Create `example_prompt`

This PromptTemplate defines how each example is formatted (matching the original structure):

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
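
To see what this template produces without installing LangChain, the same format string can be filled in with Python's built-in `str.format`. This is a rough equivalent for illustration only; the `placeholders` helper is hypothetical and simply checks that the declared `input_variables` match the fields in the template.

```python
from string import Formatter

def placeholders(template):
    """Names of the {fields} that appear in a format string."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})

template = "Input: {input}\nOutput: {output}"
input_variables = ["input", "output"]

# The declared variables should match the fields used in the template.
fields = placeholders(template)

# Fill the template with one example from the list.
rendered = template.format(input="happy", output="sad")
print(rendered)
```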

5. Step 4: Initialize the Semantic Similarity Selector

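Configure the selector with the examples, an embedding model, a vector store, and the number of examples to select (k=1). Note that Chroma is passed as a class, not an instance: from_examples builds the store itself and fills it with the embedded examples. OpenAIEmbeddings() needs an OpenAI API key, typically supplied through the OPENAI_API_KEY environment variable. The code is identical to the original: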

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples, 
    # This is the embedding class used to produce embeddings for similarity measurement.
    OpenAIEmbeddings(), 
    # This is the VectorStore class that stores embeddings and enables similarity search.
    Chroma, 
    # This is the number of examples to retrieve (top k most similar).
    k=1
)

Running the code may print the following log (from the original documentation). It indicates that Chroma is running locally with an in-memory database, so the stored embeddings are discarded when the process exits:

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
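
The selector's behavior can be imitated with a small in-memory class. The sketch below is a toy stand-in, not LangChain code: `ToySelector` and `toy_embed` are hypothetical names, and the "embedding" is just a count of letters, so it measures spelling overlap rather than meaning. Only the method names (`from_examples`, `add_example`, `select_examples`) mirror the real selector.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in embedding: letter counts (spelling overlap, NOT semantics)."""
    return Counter(text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[ch] * b[ch] for ch in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

class ToySelector:
    def __init__(self, k=1):
        self.k = k
        self.store = []  # list of (vector, example) pairs, kept in memory only

    @classmethod
    def from_examples(cls, examples, k=1):
        selector = cls(k=k)
        for example in examples:
            selector.add_example(example)
        return selector

    def add_example(self, example):
        text = " ".join(example[key] for key in sorted(example))
        self.store.append((toy_embed(text), example))

    def select_examples(self, input_variables):
        text = " ".join(input_variables[key] for key in sorted(input_variables))
        query = toy_embed(text)
        ranked = sorted(self.store, key=lambda pair: cosine(query, pair[0]), reverse=True)
        return [example for _, example in ranked[: self.k]]

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
]
selector = ToySelector.from_examples(examples, k=1)
selected = selector.select_examples({"adjective": "tallest"})
print(selected)
```

Like the in-memory Chroma setup, this store lives only as long as the Python process. Swapping `toy_embed` for a real embedding model is what would make the selection semantic.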

6. Step 5: Create a Dynamic Prompt

Combine the selector with a prefix (instruction) and suffix (user input placeholder) using FewShotPromptTemplate:

similar_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of direct examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:", 
    input_variables=["adjective"],
)
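
How the final prompt text is put together can be sketched as a plain function. This assumes the pieces are joined with a blank line (`"\n\n"`), which is the default example separator as far as this sketch is concerned; `build_few_shot_prompt` is a hypothetical helper, not the library's implementation.

```python
def build_few_shot_prompt(prefix, selected, example_template, suffix,
                          separator="\n\n", **inputs):
    """Join prefix, formatted examples, and formatted suffix into one prompt."""
    pieces = [prefix]
    pieces += [example_template.format(**example) for example in selected]
    pieces.append(suffix.format(**inputs))
    return separator.join(pieces)

prompt = build_few_shot_prompt(
    prefix="Give the antonym of every input",
    # In the real pipeline this list comes from the example selector.
    selected=[{"input": "happy", "output": "sad"}],
    example_template="Input: {input}\nOutput: {output}",
    suffix="Input: {adjective}\nOutput:",
    adjective="worried",
)
print(prompt)
```

The point of passing a selector instead of a fixed list is that the `selected` argument changes with every input, while the prefix and suffix stay the same.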

7. Step 6: Test the Selector (3 Scenarios)

We test the selector with different inputs, using the code and outputs from the original documentation.

Scenario 1: Input is a Feeling (`"worried"`)

The input "worried" (a feeling) is most similar to "happy" (also a feeling). The selector retrieves the happy→sad example.

Code:

# Input is a feeling, so should select the happy/sad example
print(similar_prompt.format(adjective="worried"))

Output (exact as original):

Give the antonym of every input
    
    Input: happy
    Output: sad
    
    Input: worried
    Output:

Scenario 2: Input is a Measurement (`"fat"`)

The input "fat" (a physical measurement) is most similar to "tall" (also a physical measurement). The selector retrieves the tall→short example.

Code:

# Input is a measurement, so should select the tall/short example
print(similar_prompt.format(adjective="fat"))

Output (exact as original):

Give the antonym of every input
    
    Input: tall
    Output: short
    
    Input: fat
    Output:

Scenario 3: Add a New Example

You can add new examples to the selector using add_example(). The selector will now include the new example in similarity searches.

Code:

# You can add new examples to the SemanticSimilarityExampleSelector as well
similar_prompt.example_selector.add_example({"input": "enthusiastic", "output": "apathetic"})

# Test with a new feeling-related input ("joyful")
print(similar_prompt.format(adjective="joyful"))

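Output (exact as original). The new enthusiastic→apathetic example is now in the store, but with k=1 the selector still returns the single closest match, which for "joyful" remains happy→sad: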

Give the antonym of every input
    
    Input: happy
    Output: sad
    
    Input: joyful
    Output:
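
Why an added example does not necessarily change the result can be shown with a few hand-made vectors. The numbers below are assumed values chosen for illustration, not real embeddings, and `nearest` is a hypothetical helper: adding an entry only widens the candidate pool, and with k=1 the closest entry still wins.

```python
import math

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(query, store, k):
    """Names of the k stored vectors closest to the query."""
    return sorted(store, key=lambda name: cosine(query, store[name]), reverse=True)[:k]

# Assumed stand-in vectors; a real embedding model would supply these.
store = {"happy": (0.9, 0.1), "tall": (0.1, 0.9)}
joyful = (0.95, 0.05)

before = nearest(joyful, store, k=1)

# Equivalent of add_example: one more candidate in the store.
store["enthusiastic"] = (0.7, 0.4)
after_k1 = nearest(joyful, store, k=1)
after_k2 = nearest(joyful, store, k=2)

print(before, after_k1, after_k2)
```

In this toy setup the new entry only shows up once k is raised to 2, which mirrors the role of the k parameter in the selector.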
