Thoughts on How LLMs Produce Formatted Output -> Building an Output Parser

Output Parsers

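Let's look at a use case for output parsers. Suppose we have a batch of product reviews and want to process them so the data is easier to analyze. For example, given a customer's product review as input, we want to get back the corresponding JSON data with the following fields:

gift: whether the product was bought as a gift, a bool;
delivery_days: how many days delivery took, -1 if the review doesn't say;
price_value: any sentences about the product's price or value.

{
  "gift": false,
  "delivery_days": 5,
  "price_value": "pretty affordable!"
}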
Next, let's look at the output parser solution that LangChain provides. At the end of this article, I'll walk you through building your own JSON output parser based on the same principle.
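Before going further, it can help to pin down what "structured output" means here. The following is only a sketch of the target record as a Python type: `ReviewInfo` and `review_info_from_json` are hypothetical names of my own, not part of LangChain, and the sample string is the example record from above.

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ReviewInfo:
    """Hypothetical container for the three fields extracted from a review."""
    gift: bool                          # was the item bought as a gift?
    delivery_days: int                  # -1 when the review does not say
    price_value: Union[str, List[str]]  # sentence(s) about price or value


def review_info_from_json(raw: str) -> ReviewInfo:
    """Parse a JSON string and check that the expected keys are present."""
    data = json.loads(raw)
    missing = {"gift", "delivery_days", "price_value"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return ReviewInfo(
        gift=data["gift"],
        delivery_days=int(data["delivery_days"]),
        price_value=data["price_value"],
    )


sample = '{"gift": false, "delivery_days": 5, "price_value": "pretty affordable!"}'
info = review_info_from_json(sample)
print(info)
```

Whatever parser we end up with, its job is to turn the model's text reply into something like `info`, where each field can be read directly.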
!pip install langchain
!pip install openai
!pip install langchain_openai
from langchain.llms import OpenAI
# from langchain.chat_models import ChatOpenAI
from langchain_openai import ChatOpenAI
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser
import os

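# Proxy settings (left empty here; fill them in if your network needs a proxy)
os.environ['HTTP_PROXY'] = ''
os.environ['HTTPS_PROXY'] = ''

# api_key is used below but was never defined in the original notebook;
# reading it from the OPENAI_API_KEY environment variable is an assumption
api_key = os.environ.get('OPENAI_API_KEY')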
customer_review = """\
This leaf blower is pretty amazing.  It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

review_template = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""
from langchain.prompts import ChatPromptTemplate

llm_model = 'gpt-3.5-turbo-0301'

# Create the ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_template(review_template)

messages = prompt_template.format_messages(text=customer_review)

# Create the LLM
chat = ChatOpenAI(temperature=0.0, model=llm_model, openai_api_key=api_key)

# Send the request
response = chat(messages)

input_variables=['text'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['text'], template='For the following text, extract the following information:\n\ngift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.\n\ndelivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.\n\nprice_value: Extract any sentences about the value or price,and output them as a comma separated Python list.\n\nFormat the output as JSON with the following keys:\ngift\ndelivery_days\nprice_value\n\ntext: {text}\n'))]
<class 'str'>
{
    "gift": true,
    "delivery_days": 2,
    "price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}

AttributeError                            Traceback (most recent call last)

Cell In[73], line 1
----> 1 response.content.get('gift')

AttributeError: 'str' object has no attribute 'get'
Error: we can't work with the returned response directly, because at this point its content is still just a string.
import json
output_dict = json.loads(response.content)
{'gift': True, 'delivery_days': 2, 'price_value': ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]}
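To see the difference between the two steps without calling the API, here is a small offline sketch. `raw_reply` is a hand-written stand-in for `response.content` (an assumption for illustration, shortened from the real reply).

```python
import json

# Stand-in for response.content: the model's reply is plain text
raw_reply = '{"gift": true, "delivery_days": 2, "price_value": ["worth it for the extra features"]}'

print(type(raw_reply))          # a str has no .get() method
parsed = json.loads(raw_reply)  # JSON true/false become Python True/False
print(type(parsed))
print(parsed.get("gift"), parsed.get("delivery_days"))
```

This only works when the whole reply is valid JSON. If the model wraps the JSON in extra text or a markdown code block, `json.loads` raises an error, which is the gap an output parser fills.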





Parse the LLM output string into a Python dictionary

from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser
gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased\
                             as a gift for someone else? \
                             Answer True if yes,\
                             False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days\
                                      did it take for the product\
                                      to arrive? If this \
                                      information is not found,\
                                      output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any\
                                    sentences about the value or \
                                    price, and output them as a \
                                    comma separated Python list.")

response_schemas = [gift_schema,
                    delivery_days_schema,
                    price_value_schema]


[ResponseSchema(name='gift', description='Was the item purchased                             as a gift for someone else?                              Answer True if yes,                             False if not or unknown.', type='string'),
 ResponseSchema(name='delivery_days', description='How many days                                      did it take for the product                                      to arrive? If this                                       information is not found,                                      output -1.', type='string'),
 ResponseSchema(name='price_value', description='Extract any                                    sentences about the value or                                     price, and output them as a                                     comma separated Python list.', type='string')]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()
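The string returned by `get_format_instructions()` (printed right after this sketch) is ordinary text that gets appended to the prompt. As a rough imitation of the idea, and not LangChain's actual implementation, such a string could be assembled from (name, description) pairs. `build_format_instructions` and `FENCE` are hypothetical names of my own.

```python
from typing import List, Tuple

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def build_format_instructions(fields: List[Tuple[str, str]]) -> str:
    """Assemble a format-instruction string from (name, description) pairs."""
    lines = [f'\t"{name}": string  // {description}' for name, description in fields]
    schema = "\n".join(lines)
    return (
        "The output should be a markdown code snippet formatted in the "
        f'following schema, including the leading and trailing "{FENCE}json" '
        f'and "{FENCE}":\n\n'
        f"{FENCE}json\n{{\n{schema}\n}}\n{FENCE}"
    )


fields = [
    ("gift", "Was the item purchased as a gift for someone else?"),
    ("delivery_days", "How many days did it take for the product to arrive?"),
]
instructions = build_format_instructions(fields)
print(instructions)
```

The point is that nothing magical happens here: the parser tells the model, in plain language, which wrapper and which keys to use, and later relies on the model having followed those instructions.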
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

	"gift": string  // Was the item purchased                             as a gift for someone else?                              Answer True if yes,                             False if not or unknown.
	"delivery_days": string  // How many days                                      did it take for the product                                      to arrive? If this                                       information is not found,                                      output -1.
	"price_value": string  // Extract any                                    sentences about the value or                                     price, and output them as a                                     comma separated Python list.
review_template_2 = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product\
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""

prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = prompt.format_messages(text=customer_review,
                                  format_instructions=format_instructions)
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the productto arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,and output them as a comma separated Python list.

text: This leaf blower is pretty amazing.  It has four settings:candle blower, gentle breeze, windy city, and tornado. It arrived in two days, just in time for my wife's anniversary present. I think my wife liked it so much she was speechless. So far I've been the only one using it, and I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

	"gift": string  // Was the item purchased                             as a gift for someone else?                              Answer True if yes,                             False if not or unknown.
	"delivery_days": string  // How many days                                      did it take for the product                                      to arrive? If this                                       information is not found,                                      output -1.
	"price_value": string  // Extract any                                    sentences about the value or                                     price, and output them as a                                     comma separated Python list.
response = chat(messages)
	"gift": true,
	"delivery_days": "2",
	"price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]


<class 'str'>
json_str = json.loads(response.content) 
{'gift': True, 'delivery_days': 2, 'price_value': ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]}
output_dict = output_parser.parse(response.content)
{'gift': True,
 'delivery_days': 2,
 'price_value': ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]}
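One thing to watch: every `ResponseSchema` above defaults to `type='string'`, so the model may return values such as `"2"` rather than `2`. A minimal sketch of normalizing the parsed dictionary afterwards could look like the following. `normalize_review` is a hypothetical helper of my own, not a LangChain function.

```python
from typing import Any, Dict


def normalize_review(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Coerce loosely typed fields into bool / int / list (hypothetical helper)."""
    gift = parsed.get("gift", False)
    if isinstance(gift, str):
        gift = gift.strip().lower() == "true"

    try:
        delivery_days = int(parsed.get("delivery_days", -1))
    except (TypeError, ValueError):
        delivery_days = -1  # same sentinel the prompt uses for "not found"

    price_value = parsed.get("price_value", [])
    if isinstance(price_value, str):
        price_value = [price_value]

    return {"gift": bool(gift), "delivery_days": delivery_days, "price_value": price_value}


print(normalize_review({"gift": "True", "delivery_days": "2", "price_value": "worth it"}))
```

Doing this in one place means the rest of the analysis code can rely on `gift` being a bool and `delivery_days` being an int, whatever the model happened to return.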


# Import the LLM
from langchain_openai import ChatOpenAI

# Used to build the list of chat messages
from langchain.schema import HumanMessage

# Create the LLM
chat = ChatOpenAI(temperature=0.0, model=llm_model, openai_api_key=api_key)
text = """\
This leaf blower is pretty amazing.  It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

format_instructions = """\
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"gift": string  // Was the item purchased as a gift for someone else?  Answer True if yes,  False if not or unknown.
	"delivery_days": string  // How many days did it take for the product to arrive? If this information is not found,                                      output -1.
	"price_value": string  // Extract any sentences about the value or price, and output them as a  comma separated Python list.
}
```
"""


review_template = f"""\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else?
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,
and output them as a comma separated Python list.

Format the output as JSON with the following keys:

text: {text}

{format_instructions}
"""


    For the following text, extract the following information:
    gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
    delivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.
    price_value: Extract any sentences about the value or price,and output them as a comma separated Python list.
    Format the output as JSON with the following keys:
    text: This leaf blower is pretty amazing.  It has four settings:candle blower, gentle breeze, windy city, and tornado. It arrived in two days, just in time for my wife's anniversary present. I think my wife liked it so much she was speechless. So far I've been the only one using it, and I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.
    The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":
    	"gift": string  // Was the item purchased as a gift for someone else?  Answer True if yes,  False if not or unknown.
    	"delivery_days": string  // How many days did it take for the product to arrive? If this information is not found,                                      output -1.
    	"price_value": string  // Extract any sentences about the value or price, and output them as a  comma separated Python list.

# Build the list of chat messages
messages = [HumanMessage(content=review_template)]
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,and output them as a comma separated Python list.

Format the output as JSON with the following keys:

text: This leaf blower is pretty amazing.  It has four settings:candle blower, gentle breeze, windy city, and tornado. It arrived in two days, just in time for my wife's anniversary present. I think my wife liked it so much she was speechless. So far I've been the only one using it, and I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

	"gift": string  // Was the item purchased as a gift for someone else?  Answer True if yes,  False if not or unknown.
	"delivery_days": string  // How many days did it take for the product to arrive? If this information is not found,                                      output -1.
	"price_value": string  // Extract any sentences about the value or price, and output them as a  comma separated Python list.

response = chat(messages)
	"gift": true,
	"delivery_days": 2,
	"price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]


import re

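def parse_json(input_str: str):
    # Regex that matches text starting with ```json and ending with ```
    pattern = r'```json(.*?)```'
    # Search the input for the pattern; re.DOTALL lets "." match newlines too
    match = re.search(pattern, input_str, re.DOTALL)

    if match:
        # Parse the captured JSON text into a Python object
        return json.loads(match.group(1))
    # No fenced JSON block was found
    return None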
input_str = parse_json(response.content)
{'gift': True, 'delivery_days': 2, 'price_value': ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]}
<class 'dict'>
["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
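The regex above assumes the model always wraps its answer in a json code fence. As one possible hardening, sketched under my own assumptions, the parser could fall back to treating the whole reply as JSON and raise a clear error when neither works. `parse_json_lenient` and `FENCED_JSON` are hypothetical names.

```python
import json
import re
from typing import Any

FENCE = "`" * 3  # three backticks
# Matches a fenced block, with or without the "json" language tag
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_lenient(text: str) -> Any:
    """Extract JSON from a fenced block if present, else try the whole text."""
    match = FENCED_JSON.search(text)
    candidate = match.group(1) if match else text
    try:
        return json.loads(candidate.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"could not parse model output as JSON: {exc}") from exc


fenced = "Sure!\n" + FENCE + 'json\n{"gift": true, "delivery_days": 2}\n' + FENCE
print(parse_json_lenient(fenced))
print(parse_json_lenient('{"gift": false}'))
```

Raising an exception instead of returning `None` makes a malformed reply fail loudly at the parsing step rather than later, when some field lookup breaks.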

And that's it! By combining the prompt-based approach with a regex extractor, we can implement our own LLM output parser.






