Weaviate




About Weaviate

Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.


Core features


Deployment options

Multiple deployment options are available to cater for different users and use cases.

All options offer vectorizer and RAG module integration.



Use cases

Weaviate is flexible and can be used in many contexts and scenarios.



Quick start (Python)

Reference: https://weaviate.io/developers/weaviate/quickstart


1. Create a Weaviate database

You can create a free cloud sandbox instance on Weaviate Cloud Services (WCS).

For instructions, see: https://weaviate.io/developers/wcs/quickstart

Copy the API key and the cluster URL from the Details tab in WCS.


2. Install the client

For the v4 client (requires Weaviate 1.23.7 or later):

pip install -U weaviate-client

For the v3 client:

pip install "weaviate-client==3.*"

3. Connect to Weaviate

Use the API key and URL from step 1, together with an OpenAI inference API key (sign up at https://platform.openai.com/signup).


Run the following code:

V4

import weaviate
import weaviate.classes as wvc
import os
import requests
import json

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    pass # Replace with your code. Close client gracefully in the finally block.

finally:
    client.close()  # Close client gracefully
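
The V4 snippet above reads its credentials from environment variables, so an unset variable only shows up as a `KeyError` or a failed connection. A minimal sketch of a pre-flight check follows. `missing_env_vars` is a hypothetical helper, not part of the Weaviate client, and the variable names are simply the ones used in the snippet above.

```python
import os

# Names taken from the connection snippet; adjust if you use different ones.
REQUIRED_VARS = ("WCS_CLUSTER_URL", "WCS_API_KEY", "OPENAI_APIKEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these variables before connecting:", ", ".join(missing))
else:
    print("All connection settings are present.")
```

Running this before `weaviate.connect_to_wcs(...)` makes it clearer which setting is absent.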

V3

import weaviate
import json

client = weaviate.Client(
    url = "https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Replace with your inference API key
    }
)

4. Define a data collection

Next, we define a data collection (a “class” in Weaviate) to store objects in.

This is analogous to creating a table in relational (SQL) databases.


The following code:

  • Configures a class object with:
    • Name Question
    • Vectorizer module text2vec-openai
    • Generative module generative-openai
  • Then creates the class.

V4

    questions = client.collections.create(
        name="Question",
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_openai(),  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
        generative_config=wvc.config.Configure.Generative.openai()  # Ensure the `generative-openai` module is used for generative queries
    )

V3

class_obj = {
    "class": "Question",
    "vectorizer": "text2vec-openai",  # If set to "none" you must always provide vectors yourself. Could be any other "text2vec-*" also.
    "moduleConfig": {
        "text2vec-openai": {},
        "generative-openai": {}  # Ensure the `generative-openai` module is used for generative queries
    }
}

client.schema.create_class(class_obj)
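
The V3 class definition above declares no properties, so with default settings Weaviate infers them from the imported objects. As a sketch, assuming you would rather declare the three text properties up front, the same dictionary can carry a `properties` list. `build_question_class` is a hypothetical helper; the code only builds the dictionary and does not contact a server.

```python
def build_question_class(property_names=("answer", "question", "category")):
    """Build a V3-style class definition with explicit text properties."""
    return {
        "class": "Question",
        "vectorizer": "text2vec-openai",
        "moduleConfig": {
            "text2vec-openai": {},
            "generative-openai": {},
        },
        # Every property is declared as plain text here (an assumption that
        # matches the quickstart data); change dataType for other field types.
        "properties": [
            {"name": name, "dataType": ["text"]} for name in property_names
        ],
    }


class_obj = build_question_class()
print([p["name"] for p in class_obj["properties"]])
```

The resulting `class_obj` can be passed to `client.schema.create_class(class_obj)` exactly as in the V3 snippet.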

5. Add objects

You can now add objects to Weaviate. You will use a batch import process for maximum efficiency.

The guide covers using the vectorizer defined for the class to create a vector embedding for each object.


The following code:

  • Loads objects, and
  • Adds objects to the target class (Question) one by one.

V4

    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)  # This uses batching under the hood

V3

import requests
import json
resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
data = json.loads(resp.text)  # Load data

client.batch.configure(batch_size=100)  # Configure batch
with client.batch as batch:  # Initialize a batch process
    for i, d in enumerate(data):  # Batch import data
        print(f"importing question: {i+1}")
        properties = {
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        }
        batch.add_data_object(
            data_object=properties,
            class_name="Question"
        )
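
Both import snippets map the capitalized keys in the source file to lowercase property names inline. A small sketch of that mapping as a separate function, with a check for missing keys, is shown here. `to_properties` is a hypothetical helper and the sample record is made up in the shape of the entries used above.

```python
def to_properties(record):
    """Map one raw record (capitalized keys) to lowercase property names."""
    required = ("Answer", "Question", "Category")
    missing = [key for key in required if key not in record]
    if missing:
        raise ValueError(f"record is missing keys: {missing}")
    return {key.lower(): record[key] for key in required}


# Invented record for illustration; real records come from jeopardy_tiny.json.
sample = {"Category": "SCIENCE", "Question": "What is H2O?", "Answer": "Water"}
print(to_properties(sample))
```

Validating records before they reach the batch keeps a malformed entry from failing partway through an import.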

6. Query

1)Semantic search

Let’s start with a similarity search. A nearText search looks for objects in Weaviate whose vectors are most similar to the vector for the given input text.

Run the following code to search for objects whose vectors are most similar to that of biology.
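
To make "most similar vectors" concrete, here is a toy sketch in plain Python that ranks stored vectors by cosine similarity to a query vector. It assumes cosine similarity as the metric and uses invented 3-dimensional vectors; real embeddings have far more dimensions, and Weaviate uses an index rather than the linear scan shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec, stored, limit=2):
    """Return the `limit` names whose vectors are most similar to query_vec."""
    ranked = sorted(
        stored,
        key=lambda name: cosine_similarity(query_vec, stored[name]),
        reverse=True,
    )
    return ranked[:limit]


# Toy vectors, invented for illustration only.
stored = {
    "dna": [0.9, 0.1, 0.0],
    "liver": [0.8, 0.3, 0.1],
    "elephant": [0.2, 0.9, 0.1],
    "guitar": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]
print(nearest(query, stored))  # ['dna', 'liver']
```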


V4

import weaviate
import weaviate.classes as wvc
import os

client = weaviate.connect_to_wcs(
    cluster_url=os.getenv("WCS_CLUSTER_URL"),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv("WCS_API_KEY")),
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your inference API key
    }
)

try:
    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2
    )

    print(response.objects[0].properties)  # Inspect the first object

finally:
    client.close()  # Close client gracefully

V3

import weaviate
import json

client = weaviate.Client(
    url = "https://some-endpoint.weaviate.network",  # Replace with your endpoint
    auth_client_secret=weaviate.auth.AuthApiKey(api_key="YOUR-WEAVIATE-API-KEY"),  # Replace w/ your Weaviate instance API key
    additional_headers = {
        "X-OpenAI-Api-Key": "YOUR-OPENAI-API-KEY"  # Replace with your inference API key
    }
)

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

The V3 query returns:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "DNA",
                    "category": "SCIENCE",
                    "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance"
                },
                {
                    "answer": "Liver",
                    "category": "SCIENCE",
                    "question": "This organ removes excess glucose from the blood & stores it as glycogen"
                }
            ]
        }
    }
}
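
The V3 response is a nested dictionary, so the objects sit under `data`, `Get`, and the class name. A minimal sketch of pulling them out safely follows; `extract_objects` is a hypothetical helper and the sample response is a shortened copy of the result above.

```python
def extract_objects(response, class_name="Question"):
    """Return the list of result objects, or an empty list if none exist."""
    data = response.get("data") or {}
    get = data.get("Get") or {}
    return get.get(class_name) or []


# Shortened copy of the result shown above.
response = {
    "data": {
        "Get": {
            "Question": [
                {"answer": "DNA", "category": "SCIENCE"},
                {"answer": "Liver", "category": "SCIENCE"},
            ]
        }
    }
}

for obj in extract_objects(response):
    print(f'{obj["category"]}: {obj["answer"]}')
```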

2) Semantic search with a filter

You can add Boolean filters to searches. For example, the search above can be modified to return only objects that have a “category” value of “ANIMALS”. Run the following code to see the results:


V4

    questions = client.collections.get("Question")

    response = questions.query.near_text(
        query="biology",
        limit=2,
        filters=wvc.query.Filter.by_property("category").equal("ANIMALS")
    )

    print(response.objects[0].properties)  # Inspect the first object

V3

response = (
    client.query
    .get("Question", ["question", "answer", "category"])
    .with_near_text({"concepts": ["biology"]})
    .with_where({
        "path": ["category"],
        "operator": "Equal",
        "valueText": "ANIMALS"
    })
    .with_limit(2)
    .do()
)

print(json.dumps(response, indent=4))

The V3 query returns:

{
    "data": {
        "Get": {
            "Question": [
                {
                    "answer": "Elephant",
                    "category": "ANIMALS",
                    "question": "It's the only living mammal in the order Proboseidea"
                },
                {
                    "answer": "the nose or snout",
                    "category": "ANIMALS",
                    "question": "The gavial looks very much like a crocodile except for this bodily feature"
                }
            ]
        }
    }
}
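
The V3 `where` filter is a plain dictionary, which makes it easy to build programmatically. The sketch below wraps the single-condition form used above and combines two conditions with an `And` operator and an `operands` list. The helper names `equal_text` and `all_of` are hypothetical, and the second condition is only an illustration; check the filter reference for your Weaviate version before relying on the combined form.

```python
def equal_text(prop, value):
    """Single condition: the text property `prop` equals `value`."""
    return {"path": [prop], "operator": "Equal", "valueText": value}


def all_of(*conditions):
    """Combine conditions so that every one of them must match."""
    return {"operator": "And", "operands": list(conditions)}


where = all_of(
    equal_text("category", "ANIMALS"),
    equal_text("answer", "Elephant"),  # illustrative second condition
)
print(where)
```

The resulting dictionary would be passed to `.with_where(where)` in place of the inline filter.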

For more, see: https://weaviate.io/developers/weaviate/quickstart


Example use cases

This page illustrates various use cases for vector databases by way of open-source demo projects. You can fork and modify any of them.

If you would like to contribute your own project to this page, please let us know by creating an issue on GitHub.


Similarity search

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#similarity-search

A vector database enables fast, efficient similarity searches on and across any modalities, such as text or images, as well as their combinations. Vector databases' similarity search capabilities can also be used for other complex use cases, such as recommendation systems in classical machine learning applications.

Title | Description | Modality | Code
Plant search | Semantic search over plants. | Text | JavaScript
Wine search | Semantic search over wines. | Text | Python
Book recommender system (Video, Demo) | Find book recommendations based on search query. | Text | TypeScript
Movie recommender system (Blog) | Find similar movies. | Text | JavaScript
Multilingual Wikipedia Search | Search through Wikipedia in multiple languages. | Text | TypeScript
Podcast search | Semantic search over podcast episodes. | Text | Python
Video Caption Search | Find the timestamp of the answer to your question in a video. | Text | Python
Facial Recognition | Identify people in images. | Image | Python
Image Search over dogs (Blog) | Find images of similar dog breeds based on uploaded image. | Image | Python
Text to image search | Find images most similar to a text query. | Multimodal | JavaScript
Text to image and image to image search | Find images most similar to a text or image query. | Multimodal | Python

LLMs and search

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#llms-and-search

Vector databases and LLMs go together like cookies and milk!

Vector databases help to address some limitations of large language models (LLMs), such as hallucinations, by retrieving relevant information to provide to the LLM as part of its input.

Title | Description | Modality | Code
Verba, the golden RAGtriever (Video, Demo) | Retrieval-Augmented Generation (RAG) system to chat with Weaviate documentation and blog posts. | Text | Python
HealthSearch (Blog, Demo) | Recommendation system of health products based on symptoms. | Text | Python
Magic Chat | Search through Magic The Gathering cards. | Text | Python
AirBnB Listings (Blog) | Generation of customized advertisements for AirBnB listings with Generative Feedback Loops. | Text | Python
Distyll | Summarize text or video content. | Text | Python

Learn more in our LLMs and Search blog post.
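
The retrieval step described in this section can be sketched without any LLM call: take the objects returned by a search and place them in the prompt as context. `build_prompt` and its prompt wording are hypothetical, and the retrieved objects are copied from the quickstart results earlier in this post.

```python
def build_prompt(question, retrieved):
    """Assemble a prompt that grounds the question in retrieved objects."""
    context = "\n".join(
        f"- Q: {obj['question']} A: {obj['answer']}" for obj in retrieved
    )
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Objects as returned by the semantic search in the quickstart.
retrieved = [
    {
        "question": "This organ removes excess glucose from the blood & stores it as glycogen",
        "answer": "Liver",
    },
]
prompt = build_prompt("Which organ stores glycogen?", retrieved)
print(prompt)
```

The prompt string would then be sent to whichever LLM you use; grounding the question in retrieved text is what helps reduce hallucinated answers.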


Classification

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#classification

Weaviate can leverage its vectorization capabilities to enable automatic, real-time classification of unseen, new concepts based on its semantic understanding.

Title | Description | Modality | Code
Toxic Comment Classification | Classify whether a comment is toxic or non-toxic. | Text | Python
Audio Genre Classification | Classify the music genre of an audio file. | Image | Python

Other use cases

https://weaviate.io/developers/weaviate/more-resources/example-use-cases#other-use-cases

Weaviate’s modular ecosystem unlocks many other use cases of the Weaviate vector database, such as Named Entity Recognition or spell checking.

Title | Description | Code
Named Entity Recognition (NER) | tbd | Python

2024-03-27 (Wed)
