Semantic Similarity Example Selector in LangChain

https://python.langchain.com.cn/docs/modules/model_io/prompts/example_selectors/similarity

This content is based on LangChain’s official documentation (langchain.com.cn) and explains, in simplified terms, the SemanticSimilarityExampleSelector: a tool that selects few-shot examples by their semantic similarity to the input. The LangChain code, examples, and outputs are reproduced from the original documentation.

1. What is SemanticSimilarityExampleSelector?

This selector chooses examples based on semantic similarity (meaning-based similarity) to the user’s input.

  • It converts both the input and examples into numerical representations called “embeddings” (using an embedding model like OpenAIEmbeddings).
  • It measures similarity using cosine similarity (the cosine of the angle between two embedding vectors; values closer to 1 mean the texts are closer in meaning).
  • It retrieves the top k (specified number) examples with the highest similarity to the input.
  • It uses a VectorStore (e.g., Chroma) to store and search embeddings efficiently.
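
To make the ranking step concrete, here is a minimal sketch in plain Python of cosine similarity and top-k selection. It does not use LangChain, and the 3-dimensional vectors are hypothetical values invented for illustration (real embeddings come from the embedding model and have far more dimensions). The helper names cosine_similarity and top_k are also assumptions of this sketch, not LangChain functions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, candidates, k=1):
    """Return the k candidate labels whose vectors are most similar to query_vec."""
    ranked = sorted(
        candidates,
        key=lambda label: cosine_similarity(query_vec, candidates[label]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy "embeddings", hand-written for this sketch only.
toy_vectors = {
    "happy": [0.9, 0.1, 0.0],
    "tall": [0.1, 0.9, 0.0],
    "windy": [0.0, 0.1, 0.9],
}

query = [0.8, 0.2, 0.1]  # made-up vector standing in for the input "worried"
best = top_k(query, toy_vectors, k=1)
print(best)
```

With these made-up vectors the query points in almost the same direction as "happy", so that label ranks first. A vector store performs the same kind of ranking, but over stored embeddings and with an index that avoids comparing against every entry.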

2. Step 1: Import Required Modules

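The code below imports all necessary LangChain classes, exactly as in the original documentation. These import paths match the LangChain version the documentation was written for; newer releases have moved some of these classes into separate packages, so the paths may need adjusting: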

from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

3. Step 2: Prepare Examples

We use the same “creating antonyms” example list from the original text:

# These are a lot of examples of a pretend task of creating antonyms.
examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "energetic", "output": "lethargic"},
    {"input": "sunny", "output": "gloomy"},
    {"input": "windy", "output": "calm"},
]

4. Step 3: Create example_prompt

This PromptTemplate defines how each example is formatted (matching the original structure):

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)
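
As a rough illustration of what this template does to a single example, the sketch below fills the same template string with Python's built-in str.format. This is an approximation of the result of formatting example_prompt with one example, assuming the default f-string style template format; it does not call LangChain.

```python
# Same template string as example_prompt above.
template = "Input: {input}\nOutput: {output}"

example = {"input": "happy", "output": "sad"}

# str.format fills each {name} placeholder from the matching dictionary key.
rendered = template.format(**example)
print(rendered)
```

Each selected example is rendered this way before being placed into the final prompt.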

5. Step 4: Initialize the Semantic Similarity Selector

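Configure the selector with the examples, the embedding model, the vector store class, and the number of examples to select (k=1). Note that OpenAIEmbeddings() needs an OpenAI API key, typically read from the OPENAI_API_KEY environment variable. The code is identical to the original: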

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples, 
    # This is the embedding class used to produce embeddings for similarity measurement.
    OpenAIEmbeddings(), 
    # This is the VectorStore class that stores embeddings and enables similarity search.
    Chroma, 
    # This is the number of examples to retrieve (top k most similar).
    k=1
)

When running the code, the following log (from the original documentation) may appear. It indicates that Chroma is running in local in-memory mode, so the stored embeddings are lost when the process exits:

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
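
The following is a simplified sketch of what a selector built this way does conceptually: turn each example into text, embed it, keep the vector next to the example, and at query time return the k stored examples closest to the embedded input. Everything here is a stand-in chosen for illustration: ToySelector, toy_embed, and the hand-assigned TOY_VECTORS table are hypothetical, and the real class delegates storage and search to the vector store rather than a Python list.

```python
import math

# Hypothetical hand-assigned vectors: axis 0 ~ emotion, axis 1 ~ size, axis 2 ~ weather.
TOY_VECTORS = {
    "happy": [1.0, 0.0, 0.0], "sad": [0.9, 0.1, 0.0],
    "tall": [0.0, 1.0, 0.0], "short": [0.1, 0.9, 0.0],
    "sunny": [0.1, 0.0, 1.0], "gloomy": [0.3, 0.0, 0.8],
    "worried": [0.8, 0.0, 0.1], "fat": [0.1, 0.8, 0.0],
}

def toy_embed(text):
    """Average the toy vectors of the words in text (stand-in for an embedding model)."""
    vecs = [TOY_VECTORS[word] for word in text.split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class ToySelector:
    """Illustrative selector; not the LangChain implementation."""

    def __init__(self, examples, embed, k=1):
        self.embed = embed
        self.k = k
        self.store = []  # (vector, example) pairs, playing the role of the vector store
        for example in examples:
            self.add_example(example)

    def add_example(self, example):
        # Join the example's values into one string and embed it.
        text = " ".join(example.values())
        self.store.append((self.embed(text), example))

    def select_examples(self, input_variables):
        # Embed the input and rank stored examples by cosine similarity.
        query = self.embed(" ".join(input_variables.values()))
        ranked = sorted(self.store, key=lambda pair: cosine(query, pair[0]), reverse=True)
        return [example for _, example in ranked[: self.k]]

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "sunny", "output": "gloomy"},
]
selector = ToySelector(examples, toy_embed, k=1)
picked_feeling = selector.select_examples({"adjective": "worried"})
picked_size = selector.select_examples({"adjective": "fat"})
print(picked_feeling, picked_size)
```

Because the toy vectors were chosen so that feelings and sizes point in different directions, "worried" retrieves the happy/sad example and "fat" retrieves tall/short. With a real embedding model the vectors are learned rather than hand-written, so results depend on the model.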

6. Step 5: Create a Dynamic Prompt

Combine the selector with a prefix (instruction) and suffix (user input placeholder) using FewShotPromptTemplate:

similar_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of direct examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Input: {adjective}\nOutput:", 
    input_variables=["adjective"],
)
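
To show how the pieces fit together, here is a minimal sketch of the assembly step in plain Python: the prefix, the formatted examples, and the filled-in suffix are joined with a blank line between them. The function build_few_shot_prompt and its separator parameter are hypothetical names for this sketch, and the selected example is hard-coded rather than chosen by a selector.

```python
def build_few_shot_prompt(prefix, selected_examples, example_template, suffix,
                          separator="\n\n", **inputs):
    """Join prefix, formatted examples, and the filled-in suffix into one prompt string."""
    example_strings = [example_template.format(**ex) for ex in selected_examples]
    pieces = [prefix, *example_strings, suffix.format(**inputs)]
    return separator.join(pieces)

prompt_text = build_few_shot_prompt(
    prefix="Give the antonym of every input",
    # Hard-coded here; in the tutorial this list comes from the example selector.
    selected_examples=[{"input": "happy", "output": "sad"}],
    example_template="Input: {input}\nOutput: {output}",
    suffix="Input: {adjective}\nOutput:",
    adjective="worried",
)
print(prompt_text)
```

The printed text has the same shape as the prompts in the test scenarios: instruction first, then the retrieved example, then the user's input with an empty Output line for the model to complete.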

7. Step 6: Test the Selector (3 Scenarios)

We test the selector with different inputs—exactly as in the original documentation, including code and outputs.

Scenario 1: Input is a Feeling ("worried")

The input "worried" (a feeling) is most similar to "happy" (also a feeling). The selector retrieves the happy→sad example.

Code:

# Input is a feeling, so should select the happy/sad example
print(similar_prompt.format(adjective="worried"))

Output (exact as original):

Give the antonym of every input

Input: happy
Output: sad

Input: worried
Output:

Scenario 2: Input is a Measurement ("fat")

The input "fat" (a physical measurement) is most similar to "tall" (also a physical measurement). The selector retrieves the tall→short example.

Code:

# Input is a measurement, so should select the tall/short example
print(similar_prompt.format(adjective="fat"))

Output (exact as original):

Give the antonym of every input

Input: tall
Output: short

Input: fat
Output:

Scenario 3: Add a New Example

You can add new examples to the selector using add_example(). The selector will now include the new example in similarity searches.

Code:

# You can add new examples to the SemanticSimilarityExampleSelector as well
similar_prompt.example_selector.add_example({"input": "enthusiastic", "output": "apathetic"})

# Test with a new feeling-related input ("joyful")
print(similar_prompt.format(adjective="joyful"))

Output (exact as original; "joyful" is still closest to "happy", so the happy→sad example is retrieved rather than the newly added one):

Give the antonym of every input

Input: happy
Output: sad

Input: joyful
Output:
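
The sketch below illustrates, with made-up numbers, why adding an example enlarges the candidate pool without guaranteeing that the new example is chosen. The 2-dimensional vectors and the helper names add_example and most_similar are hypothetical; a real selector would obtain vectors from the embedding model and search through the vector store.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical (vector, example) pairs standing in for the vector store contents.
store = [
    ([1.0, 0.1], {"input": "happy", "output": "sad"}),
    ([0.0, 1.0], {"input": "tall", "output": "short"}),
]

def add_example(vector, example):
    """Append a new (vector, example) pair so later searches can return it."""
    store.append((vector, example))

def most_similar(query_vec):
    """Return the stored example whose vector is closest to query_vec."""
    return max(store, key=lambda pair: cosine(query_vec, pair[0]))[1]

add_example([0.8, 0.4], {"input": "enthusiastic", "output": "apathetic"})

joyful_vec = [1.0, 0.15]  # made-up vector placed close to "happy"
eager_vec = [0.75, 0.4]   # made-up vector placed close to "enthusiastic"

print(most_similar(joyful_vec)["input"])
print(most_similar(eager_vec)["input"])
```

In this toy setup the first query still lands on "happy", while the second one picks the newly added "enthusiastic" example. Which example wins in practice depends entirely on the embeddings the model produces.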
