After settling on this approach, we processed the original PDF files and converted them fully into Word documents; tables, captions, and other content embedded as images were converted to text accordingly. We then moved on to the code implementation. There are three main implementation approaches:
1. The sample code in the Playground uses an adapter-style approach that lets you specify Cognitive Search as a data source directly in the completion call. This approach still needs further work: the information it returns is not assembled into content with natural semantics, and when that information is combined with the prompt and sent back to the completion endpoint for normal processing, the call fails with "The extensions chat completions operation must have at least one extension". The root cause is that the adapter changes the completion request's target URL; there is little sample code online showing how to adjust this further, so it needs separate investigation;
2. Query through Cognitive Search's SearchClient class (hybrid mode, vector plus keyword, is generally recommended), then send the query results together with the prompt to the completion endpoint for processing;
3. Use LangChain. LangChain supports many vector stores, so Cognitive Search is not required; this option is worth exploring separately.
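For reference on approach 1: it boils down to calling the extensions variant of the chat completions endpoint with a `dataSources` entry in the body. Below is a minimal sketch of the request URL and payload shape; the endpoint, key, and index names are placeholders, and the field names follow the 2023 preview API and may differ in later versions.

```python
import json

def build_extensions_request(api_base, deployment_id, api_version,
                             search_endpoint, search_key, search_index_name,
                             user_question):
    """Assemble the URL and JSON body for the extensions chat completions call.

    This sketches the request shape that approach 1 relies on; the schema
    is taken from the 2023 preview API and may change in later versions.
    """
    url = (f"{api_base}openai/deployments/{deployment_id}"
           f"/extensions/chat/completions?api-version={api_version}")
    body = {
        "dataSources": [{
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": search_endpoint,
                "key": search_key,
                "indexName": search_index_name,
            },
        }],
        "messages": [{"role": "user", "content": user_question}],
    }
    return url, json.dumps(body)

url, payload = build_extensions_request(
    "https://example.openai.azure.com/", "gpt4model", "2023-08-01-preview",
    "https://example.search.windows.net", "***", "index01",
    "sample question")
```

The "must have at least one extension" error mentioned above suggests the request reached this extensions URL without a valid `dataSources` array, which is why the adapter's URL rewriting matters.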
This article provides the code for the second approach:
import os
import json
import openai
import streamlit as st
import requests
from dotenv import load_dotenv
from tenacity import retry, wait_random_exponential, stop_after_attempt
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.models import Vector
from azure.search.documents.indexes.models import (
SearchIndex,
SearchField,
SearchFieldDataType,
SimpleField,
SearchableField,
SearchIndex,
SemanticConfiguration,
PrioritizedFields,
SemanticField,
SearchField,
SemanticSettings,
VectorSearch,
HnswVectorSearchAlgorithmConfiguration,
)
# References: https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/code/azure-search-vector-python-sample.ipynb
# Initialize openai; reading keys from Streamlit secrets here can be replaced with any other mechanism
openai.api_key = st.secrets["OPENAI_API_KEY"]
openai.api_type = "azure"
openai.api_version = "2023-08-01-preview"
openai.api_base = "https://***.openai.azure.com/"
deployment_id = "gpt4model"
search_endpoint = "https://***.search.windows.net"
search_key = st.secrets["SEARCH_KEY"]
search_index_name = "***index01"
credential = AzureKeyCredential(search_key)
search_client = SearchClient(endpoint=search_endpoint, index_name=search_index_name, credential=credential)
def generate_embeddings(text):
    # Embed the input text with the Azure OpenAI embedding deployment
    response = openai.Embedding.create(
        input=text, engine="embeddingmodel")
    embeddings = response['data'][0]['embedding']
    return embeddings
# Step 1: Query from Azure Cognitive Search
# Create query vector
prompt = "静脉留置针有什么特点?"  # "What are the characteristics of an indwelling IV catheter?"
vector = Vector(value=generate_embeddings(prompt), k=3, fields="contentVector")
results = search_client.search(
    search_text=prompt,     # keyword half of the hybrid query
    top=3,
    vectors=[vector],       # vector half of the hybrid query
    select=["title", "content"],
)
rawdata = ''
for result in results:
    rawdata += f"Title: {result['title']}\n"
    rawdata += f"Score: {result['@search.score']}\n"
    rawdata += f"Content: {result['content']}\n"
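One practical concern with the concatenation above: with `top=3` chunks, the combined context can grow beyond the model's context window. A simple character-budget guard is sketched below; the limit is an arbitrary placeholder, and a real implementation would count tokens rather than characters:

```python
def assemble_context(results, max_chars=6000):
    """Concatenate search hits into a context string, stopping at a budget."""
    parts, used = [], 0
    for result in results:
        piece = (f"Title: {result['title']}\n"
                 f"Content: {result['content']}\n")
        if used + len(piece) > max_chars:
            break  # drop lower-ranked hits rather than overflow the window
        parts.append(piece)
        used += len(piece)
    return "".join(parts)

# Illustrative hits: only the first fits under a tight budget
sample_hits = [
    {"title": "A", "content": "x" * 100},
    {"title": "B", "content": "y" * 100},
]
context = assemble_context(sample_hits, max_chars=150)
```

Because Cognitive Search returns hits in relevance order, truncating from the tail keeps the highest-scoring material.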
# Step 2: Query from OpenAI
# Append the retrieved context to the question, delimited by ### markers
prompt += '###\n' + rawdata + '\n###\n'
completion = openai.ChatCompletion.create(
    engine=deployment_id,
    messages=[{"role": "user", "content": prompt}],
)
rawdata = json.dumps(completion, ensure_ascii=False)
print(rawdata)
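Rather than dumping the raw JSON, the answer text can be pulled out of the response. The 0.x SDK's ChatCompletion response is dict-like, with the shape sketched below; the sample response here is a mock for illustration, not real model output:

```python
def extract_answer(completion):
    """Return the assistant's message text from a ChatCompletion response."""
    return completion["choices"][0]["message"]["content"]

# Mock of the response structure, for illustration only
mock_completion = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "placeholder answer"},
         "finish_reason": "stop"}
    ]
}
answer = extract_answer(mock_completion)
```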
How can this approach be optimized further? To be continued in the next article.