A Quick Introduction to the Chroma Vector Database

核桃AI编程

Last modified 2024-08-29 16:32:19

First published 2024-08-29 12:14:44

Article link: https://blog.csdn.net/combination1379/article/details/141674340

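Chroma is an open-source, AI-native vector database. It makes LLM applications easier to build by making knowledge, facts, and skills pluggable into LLMs. For background on vector databases in general, see other articles.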

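The figure below shows how Chroma works:
[Figure: Chroma workflow]
The query string from the app is first passed through an embedding model, which produces a vector. Chroma then searches the database with this vector and finds the stored vectors most similar to the query. Finally, it returns to the app the text that corresponds to those vectors.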
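The embed, search, and return-text flow described above can be sketched in a few lines of plain Python. This is only a toy illustration, not Chroma's implementation: the vocabulary, the word-count "embedding", the sample texts, and the choice of squared Euclidean distance are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical vocabulary; each word is one axis of the toy embedding space.
VOCAB = ["pineapple", "oranges", "hawaii", "fruit"]


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: count vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(query: str, store: Dict[str, List[float]], n_results: int) -> List[Tuple[str, float]]:
    """Embed the query, rank stored vectors by distance, return the closest texts."""
    query_vec = toy_embed(query)
    scored = sorted((squared_l2(query_vec, vec), doc) for doc, vec in store.items())
    return [(doc, dist) for dist, doc in scored[:n_results]]


documents = [
    "a document about pineapple fruit",
    "a document about oranges",
]
# "Database": each document text stored next to its vector.
store = {doc: toy_embed(doc) for doc in documents}

hits = search("pineapple fruit from hawaii", store, n_results=2)
print(hits)
```

A real embedding model places texts with related meaning close together even when they share no words, which is what makes the search useful in practice.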

Chroma provides tools to:

Store embeddings and their metadata
Embed documents and queries
Search embeddings

Chroma's main strengths are ease of use and fast queries. It provides client SDKs for Python and JavaScript/TypeScript.

The rest of this article uses Python to show basic Chroma usage.

First, install Chroma: pip install chromadb

import chromadb

# 1. Create a Chroma client
chroma_client = chromadb.Client()

# 2. Create a collection: the structure that stores embeddings, documents, and other metadata
collection = chroma_client.create_collection(name="collection")

# 3. Add some text documents to the collection.
# Chroma stores the text and handles embedding and indexing automatically. The embedding model can be customized.
collection.add(
    documents=[
        "This is a document about pineapple",
        "This is a document about oranges"
    ],
    ids=["id1", "id2"]
)


# 4. Query the collection. Pass a list of query texts, and Chroma returns the n most similar results for each
results = collection.query(
    query_texts=["This is a query document about hawaii"],
    n_results=2
)

print(results)

The output is:

{
  'documents': [[
      'This is a document about pineapple',
      'This is a document about oranges'
  ]],
  'ids': [['id1', 'id2']],
  'distances': [[1.0404009819030762, 1.243080496788025]],
  'uris': None,
  'data': None,
  'metadatas': [[None, None]],
  'embeddings': None,
}
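Each field in the result is a list of lists: the outer list has one entry per query text, and the inner list has up to n_results entries, ordered from most to least similar. The sketch below shows one way to flatten the result for a single query. The dictionary is copied from the output above (only the fields used here), and the helper name `rows_for_query` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Copied from the output above; in a real script this comes from collection.query(...).
results: Dict[str, Any] = {
    "ids": [["id1", "id2"]],
    "documents": [[
        "This is a document about pineapple",
        "This is a document about oranges",
    ]],
    "distances": [[1.0404009819030762, 1.243080496788025]],
}


def rows_for_query(results: Dict[str, Any], query_index: int = 0) -> List[Tuple[str, str, float]]:
    """Pair up id, document, and distance for one query."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    dists = results["distances"][query_index]
    return list(zip(ids, docs, dists))


for doc_id, doc, dist in rows_for_query(results):
    print(f"{doc_id}  {dist:.4f}  {doc}")
```

A smaller distance means a closer match, so the pineapple document ranks ahead of the oranges document for the query about hawaii.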