A Deep Dive into Google Vertex AI Vector Search: Building an Efficient Vector Retrieval System

afTFODguAKBF

Published 2024-10-02 03:21:25


Article link: https://blog.csdn.net/afTFODguAKBF/article/details/142677174


Introduction

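In modern applications, the ability to retrieve similar items quickly is critical. Google Vertex AI Vector Search (formerly Vertex AI Matching Engine, which is why the SDK classes below are named MatchingEngine*) provides high-performance, low-latency approximate nearest neighbor (ANN) search over embedding vectors at large scale. This article walks through creating a Vector Search index, deploying it to an endpoint, and using it as a LangChain vector store.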
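Vector Search answers approximate nearest neighbor queries. To make concrete what is being approximated, here is a minimal pure-Python sketch of exact (brute-force) k-nearest-neighbor search ranked by dot product, the same similarity the index below is configured with. The toy 3-dimensional vectors and all function names are illustrative assumptions, not part of the Vertex AI API.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length so that the dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def exact_top_k(query: list[float], corpus: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute force: score every stored vector and return the k highest dot products."""
    scored = [(doc_id, dot(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy, hand-made "embeddings" (hypothetical data)
corpus = {
    "cat": normalize([0.9, 0.1, 0.0]),
    "pizza": normalize([0.0, 1.0, 0.2]),
    "sunset": normalize([0.1, 0.0, 1.0]),
}
query = normalize([1.0, 0.2, 0.0])
print(exact_top_k(query, corpus, k=2))
```

Brute force scores every stored vector, so each query costs O(N·d) for N vectors of dimension d; an ANN index trades a little recall for scoring only a small subset of candidates. Because the vectors here are normalized, ranking by dot product is the same as ranking by cosine similarity.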

Main Content

Creating an Index and Deploying It to an Endpoint

Before you can run vector searches with Vertex AI, you need to create an index and deploy it to an endpoint.

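from google.cloud import aiplatform

# Project and storage constants (replace the placeholders with your own values)
PROJECT_ID = "<my_project_id>"
REGION = "<my_region>"
BUCKET = "<my_gcs_bucket>"
BUCKET_URI = f"gs://{BUCKET}"

# Create the Cloud Storage bucket first.
# The leading "!" is a Jupyter/Colab shell escape; in a terminal, run the gsutil command without it.
! gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI

# Initialize the Vertex AI SDK, using the bucket as the staging location
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

# Human-readable display name for the index
DISPLAY_NAME = "<my_matching_engine_index_id>"
# ID under which the index is deployed on the endpoint (not the endpoint's own ID)
DEPLOYED_INDEX_ID = "<my_matching_engine_endpoint_id>"
DIMENSIONS = 768  # vector dimensionality; must match the embedding model's output size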
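DEPLOYED_INDEX_ID is not a display name. As we understand the Vertex AI documentation, a deployed index ID must start with a letter and may contain only letters, digits and underscores, so a hyphenated name that is fine for DISPLAY_NAME can be rejected at deploy time. The helper below is a hypothetical pre-flight check encoding that assumed rule; it is not part of the aiplatform SDK.

```python
import re

# Assumed rule: first character is a letter; the rest are letters, digits or underscores.
DEPLOYED_INDEX_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_deployed_index_id(candidate: str) -> bool:
    """Return True if `candidate` satisfies the assumed deployed-index-ID format (hypothetical helper)."""
    return DEPLOYED_INDEX_ID_PATTERN.fullmatch(candidate) is not None


for candidate in ["my_index_v1", "my-index-v1", "1st_index"]:
    print(candidate, is_valid_deployed_index_id(candidate))
```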

Using VertexAIEmbeddings as the embedding model

from langchain_google_vertexai import VertexAIEmbeddings

# textembedding-gecko@003 produces 768-dimensional vectors, matching DIMENSIONS above
embedding_model = VertexAIEmbeddings(model_name="textembedding-gecko@003")
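The index is created with `dimensions=DIMENSIONS`, so every vector produced by the embedding model must have exactly that many components. A cheap guard is to check lengths before uploading anything. The sketch below uses a hypothetical `fake_embed_documents` in place of the real `embedding_model.embed_documents(...)` call (LangChain's standard Embeddings method) so that it runs offline; `check_dimensions` is likewise an illustrative helper, not part of either SDK.

```python
from typing import Sequence

DIMENSIONS = 768  # must equal the embedding model's output size


def fake_embed_documents(texts: Sequence[str]) -> list[list[float]]:
    """Hypothetical offline stand-in for embedding_model.embed_documents(texts)."""
    return [[float(len(t))] + [0.0] * (DIMENSIONS - 1) for t in texts]


def check_dimensions(vectors: Sequence[Sequence[float]], expected: int) -> None:
    """Raise ValueError if any vector's length differs from the index dimensionality."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(f"vector {i} has {len(vec)} dims; index expects {expected}")


vectors = fake_embed_documents(["The cat sat on", "the mat."])
check_dimensions(vectors, DIMENSIONS)  # passes: both vectors are 768-dimensional
```

Running the same check against the real model's output once, before creating the index, catches a mismatched model or DIMENSIONS value early.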

Creating and deploying the index

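# Create a Tree-AH (approximate nearest neighbor) index
my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=DISPLAY_NAME,
    dimensions=DIMENSIONS,  # must equal the embedding size
    approximate_neighbors_count=150,  # neighbors found approximately before exact reordering
    distance_measure_type="DOT_PRODUCT_DISTANCE",  # rank by inner product
    index_update_method="STREAM_UPDATE",  # accept incremental upserts instead of batch rebuilds
)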

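# Create a public endpoint, then deploy the index to it
# (deployment provisions serving resources and can take a while to finish)
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name=f"{DISPLAY_NAME}-endpoint", public_endpoint_enabled=True
)
my_index_endpoint.deploy_index(
    index=my_index, deployed_index_id=DEPLOYED_INDEX_ID
)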
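Tree-AH is Vertex AI's ANN index type based on Google's ScaNN: a tree partitions the vectors into leaves, candidates in the searched leaves are scored with compressed (asymmetric hashing) representations, and, per the Vertex AI documentation, `approximate_neighbors_count` is the number of neighbors found by approximate search before exact reordering. The toy sketch below mimics that three-stage pipeline under heavy simplifying assumptions: centroids are fixed by hand instead of learned, rounding stands in for asymmetric hashing, and `leaves_to_search` only loosely corresponds to the SDK's `leaf_nodes_to_search_percent` parameter. All names and data are hypothetical.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Inner product; larger means closer under DOT_PRODUCT_DISTANCE."""
    return sum(x * y for x, y in zip(a, b))


def quantize(v: list[float], step: float = 0.5) -> list[float]:
    """Crude stand-in for compressed vectors: round each component to a multiple of step."""
    return [round(x / step) * step for x in v]


def build_leaves(vectors: dict[str, list[float]], centroids: list[list[float]]) -> dict[int, list[str]]:
    """One-level 'tree': assign each vector ID to its best-scoring centroid (leaf)."""
    leaves: dict[int, list[str]] = {c: [] for c in range(len(centroids))}
    for vid, vec in vectors.items():
        best = max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))
        leaves[best].append(vid)
    return leaves


def approximate_search(
    query: list[float],
    vectors: dict[str, list[float]],
    centroids: list[list[float]],
    leaves: dict[int, list[str]],
    leaves_to_search: int,
    approx_count: int,
    k: int,
) -> list[str]:
    # 1) Tree stage: visit only the leaves whose centroids score highest against the query.
    ranked = sorted(range(len(centroids)), key=lambda c: dot(query, centroids[c]), reverse=True)
    candidates = [vid for c in ranked[:leaves_to_search] for vid in leaves[c]]
    # 2) Cheap scoring on quantized vectors; keep the top `approx_count` candidates.
    candidates.sort(key=lambda vid: dot(query, quantize(vectors[vid])), reverse=True)
    shortlist = candidates[:approx_count]
    # 3) Exact reordering of the shortlist with full-precision vectors.
    shortlist.sort(key=lambda vid: dot(query, vectors[vid]), reverse=True)
    return shortlist[:k]


centroids = [[1.0, 0.0], [0.0, 1.0]]
vectors = {"a": [0.9, 0.1], "b": [0.8, 0.3], "c": [0.1, 0.9], "d": [0.2, 0.7]}
leaves = build_leaves(vectors, centroids)
query = [1.0, 0.2]
print(approximate_search(query, vectors, centroids, leaves, leaves_to_search=1, approx_count=2, k=1))
print(approximate_search(query, vectors, centroids, leaves, leaves_to_search=1, approx_count=1, k=1))
```

With `approx_count=2`, exact reordering corrects the quantized ranking and returns `a`; shrinking the shortlist to 1 returns `b`, showing how a too-small candidate count can cost recall.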

Creating a vector store from texts

from langchain_google_vertexai import VectorSearchVectorStore

texts = [
    "The cat sat on",
    "the mat.",
    "I like to",
    "eat pizza for",
    "dinner.",
    "The sun sets",
    "in the west.",
]

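# Wrap the existing index, endpoint and bucket as a LangChain vector store
vector_store = VectorSearchVectorStore.from_components(
    project_id=PROJECT_ID,
    region=REGION,
    gcs_bucket_name=BUCKET,  # bucket name without the gs:// prefix
    index_id=my_index.name,  # resource ID of the index
    endpoint_id=my_index_endpoint.name,  # resource ID of the endpoint
    embedding=embedding_model,
    stream_update=True,  # must match index_update_method="STREAM_UPDATE"
)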

# Embed the texts and upsert them into the index
vector_store.add_texts(texts=texts)
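As we understand the langchain-google-vertexai implementation, `add_texts` embeds each string with the configured embedding model, stores the raw text in the GCS bucket, and upserts the vector into the index under a generated ID; LangChain's standard `similarity_search(query, k=...)` then embeds the query and maps the nearest IDs back to texts. The offline sketch below mimics that flow with a hypothetical `ToyVectorStore` and a word-count "embedding" in place of a real model. It also shows a consequence of the sample data: because each sentence was split into fragments, a query about pizza for dinner matches two separate entries.

```python
import math
import re
import uuid
from collections import Counter


def toy_embed(text: str) -> dict[str, float]:
    """Hypothetical bag-of-words 'embedding': L2-normalized word counts (not a real model)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def sparse_dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


class ToyVectorStore:
    """In-memory stand-in for the add_texts / similarity_search flow (hypothetical)."""

    def __init__(self) -> None:
        self.texts: dict[str, str] = {}  # plays the role of the GCS document store
        self.vectors: dict[str, dict[str, float]] = {}  # plays the role of the index

    def add_texts(self, texts: list[str]) -> list[str]:
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())  # generated ID shared by text and vector
            self.texts[doc_id] = text
            self.vectors[doc_id] = toy_embed(text)
            ids.append(doc_id)
        return ids

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        q = toy_embed(query)
        ranked = sorted(self.vectors, key=lambda i: sparse_dot(q, self.vectors[i]), reverse=True)
        return [self.texts[i] for i in ranked[:k]]


store = ToyVectorStore()
ids = store.add_texts(
    ["The cat sat on", "the mat.", "I like to", "eat pizza for", "dinner.", "The sun sets", "in the west."]
)
print(store.similarity_search("pizza for dinner", k=2))
```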