In an era of growing data-privacy concerns, local large language model (LLM) applications offer an alternative to cloud-based solutions. Ollama provides a way to download and run LLMs locally. In this article, we'll explore how to use Ollama with LangChain and SingleStore, using a Jupyter notebook.

Building a Local LLM Application with Ollama and SingleStore

Introduction

We'll use a virtual machine running Ubuntu 22.04.2 as our test environment. An alternative would be to use a virtual environment (.venv).

Create a SingleStoreDB Cloud Account

We'll use Ollama Demo Group as the Workspace Group name and ollama-demo as the Workspace name. We'll make a note of our password and host name. For this article, we'll temporarily allow access from anywhere by configuring the firewall under Ollama Demo Group > Firewall. For production environments, firewall rules should be added to provide greater security.

Create a Database

From our SingleStore Cloud account, we'll use the SQL Editor to create a new database. We'll call it ollama_demo, as follows:

SQL

CREATE DATABASE IF NOT EXISTS ollama_demo;

Install Jupyter

From the command line, we'll install the classic Jupyter Notebook, as follows:

pip install notebook

Install Ollama

We'll install Ollama, as follows:

curl -fsSL https://ollama.com/install.sh | sh

Environment Variable

Using the password and host information we saved earlier, we'll create an environment variable pointing to our SingleStore instance, as follows:

export SINGLESTOREDB_URL="admin:<password>@<host>:3306/ollama_demo"

Replace <password> and <host> with the values for your environment.
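As a sanity check, the connection string can be split into its parts with Python's standard library. This is a sketch for illustration only: the helper name is made up, and the SingleStoreDB client parses this string itself.

```python
import os
from urllib.parse import urlparse

def parse_singlestore_url(url: str) -> dict:
    """Split a user:password@host:port/database connection string
    into its components (illustrative helper, not part of any SDK)."""
    # Prepend a scheme so urlparse treats the credentials/host correctly.
    parsed = urlparse(f"mysql://{url}")
    return {
        "user": parsed.username,
        "password": parsed.password,
        "host": parsed.hostname,
        "port": parsed.port,
        "database": parsed.path.lstrip("/"),
    }

# Example with placeholder credentials; in the notebook you would read
# os.environ["SINGLESTOREDB_URL"] instead.
parts = parse_singlestore_url("admin:secret@svc-example.singlestore.com:3306/ollama_demo")
print(parts["database"])  # ollama_demo
```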

Launch Jupyter

We are now ready to work with Ollama, so we'll launch Jupyter:

jupyter notebook

Fill Out the Notebook

First, some packages:

!pip install langchain langchain-community ollama --quiet --no-warn-script-location

Next, we'll import some libraries:

import ollama
from langchain_community.vectorstores import SingleStoreDB
from langchain_community.vectorstores.utils import DistanceStrategy
from langchain_core.documents import Document
from langchain_community.embeddings import OllamaEmbeddings

We'll create our embeddings using all-minilm:

ollama.pull("all-minilm")

Example output:

{'status': 'success'}

For our LLM, we'll use llama2 (3.8 GB at the time of writing):

ollama.pull("llama2")

Example output:

{'status': 'success'}

Next, we'll use the example text from the Ollama website:

documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]

embeddings = OllamaEmbeddings(
    model = "all-minilm",
)

dimensions = len(embeddings.embed_query(documents[0]))

docs = [Document(text) for text in documents]

This specifies all-minilm for the embeddings, determines the number of dimensions returned for the first document, and converts the documents to the format required by SingleStore.

Next, using LangChain:

docsearch = SingleStoreDB.from_documents(
    docs,
    embeddings,
    table_name = "langchain_docs",
    distance_strategy = DistanceStrategy.EUCLIDEAN_DISTANCE,
    use_vector_index = True,
    vector_size = dimensions
)

In addition to the documents and embeddings, we provide the name of the table to use for storage, the distance strategy, whether to use a vector index, and the vector size using the dimensions we determined earlier. These and other options are explained in more detail in the LangChain documentation.
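To make the distance strategy concrete, here is a minimal pure-Python sketch of what a Euclidean-distance similarity search does. The toy 3-dimensional vectors and document names are made up for illustration; the real embeddings have 384 dimensions, and SingleStore performs this search efficiently over its vector index.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two vectors: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional "embeddings" keyed by a made-up document name.
store = {
    "doc-camelid": [0.9, 0.1, 0.0],
    "doc-diet":    [0.0, 0.8, 0.2],
}

# A query embedding close to "doc-camelid".
query = [0.85, 0.15, 0.05]

# Nearest neighbour = document with the smallest distance to the query.
best = min(store, key=lambda name: euclidean(query, store[name]))
print(best)  # doc-camelid
```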

Using the SQL Editor in SingleStore Cloud, let's check the structure of the table that LangChain created:

SQL

USE ollama_demo;

DESCRIBE langchain_docs;

Example output:

+----------+------------------+------+------+---------+----------------+
| Field    | Type             | Null | Key  | Default | Extra          |
+----------+------------------+------+------+---------+----------------+
| id       | bigint(20)       | NO   | PRI  | NULL    | auto_increment |
| content  | longtext         | YES  |      | NULL    |                |
| vector   | vector(384, F32) | NO   | MUL  | NULL    |                |
| metadata | JSON             | YES  |      | NULL    |                |
+----------+------------------+------+------+---------+----------------+

We can see that a vector column with 384 dimensions was created to store the embeddings.

Let's also quickly check the stored data:

USE ollama_demo;

SELECT SUBSTRING(content, 1, 30) AS content, SUBSTRING(vector, 1, 30) AS vector FROM langchain_docs;

Example output:

+--------------------------------+--------------------------------+
| content                        | vector                         |
+--------------------------------+--------------------------------+
| Llamas weigh between 280 and 4 | [0.235754818,0.242168128,-0.26 |
| Llamas were first domesticated | [0.153105229,0.219774529,-0.20 |
| Llamas are vegetarians and hav | [0.285528302,0.10461951,-0.313 |
| Llamas are members of the came | [-0.0174482632,0.173883006,-0. |
| Llamas can grow as much as 6 f | [-0.0232818555,0.122274697,-0. |
| Llamas live to be about 20 yea | [0.0260244086,0.212311044,0.03 |
+--------------------------------+--------------------------------+

Finally, let's check the vector index:

USE ollama_demo;

SHOW INDEX FROM langchain_docs;

Example output:

+----------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------------+---------+---------------+---------------------------------------+
| Table          | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type       | Comment | Index_comment | Index_options                         |
+----------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------------+---------+---------------+---------------------------------------+
| langchain_docs |          0 | PRIMARY    |            1 | id          | NULL      |        NULL |     NULL |   NULL |      | COLUMNSTORE HASH |         |               |                                       |
| langchain_docs |          1 | vector     |            1 | vector      | NULL      |        NULL |     NULL |   NULL |      | VECTOR           |         |               | {"metric_type": "EUCLIDEAN_DISTANCE"} |
| langchain_docs |          1 | __SHARDKEY |            1 | id          | NULL      |        NULL |     NULL |   NULL |      | METADATA_ONLY    |         |               |                                       |
+----------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------------+---------+---------------+---------------------------------------+

We'll now ask a question, as follows:

prompt = "What animals are llamas related to?"
docs = docsearch.similarity_search(prompt)
data = docs[0].page_content
print(data)

Example output:

Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels

Next, we'll use the LLM, as follows:

output = ollama.generate(
    model = "llama2",
    prompt = f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output["response"])

Example output:

Llamas are members of the camelid family, which means they are closely related to other animals such as:

1. Vicuñas: Vicuñas are small, wild relatives of llamas and alpacas. They are native to South America and are known for their soft, woolly coats.
2. Camels: Camels are also members of the camelid family and are known for their distinctive humps on their backs. There are two species of camel: the dromedary and the Bactrian.
3. Alpacas: Alpacas are domesticated animals that are closely related to llamas and vicuñas. They are native to South America and are known for their soft, luxurious fur.

So, in summary, llamas are related to vicuñas, camels, and alpacas.

We've seen that we can connect to SingleStore, store documents and embeddings, ask questions about the data in the database, and harness the power of LLMs locally through Ollama.
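Putting the pieces together, the retrieve-then-generate flow above can be sketched as one small helper. The `search` and `generate` parameters stand in for `docsearch.similarity_search` and `ollama.generate` from the notebook; the stand-in functions below are hypothetical, so the sketch runs without a database or model.

```python
def answer(question, search, generate):
    """Minimal retrieval-augmented generation: fetch the most relevant
    document, then ask the LLM to respond using it as context."""
    data = search(question)[0]
    prompt = f"Using this data: {data}. Respond to this prompt: {question}"
    return generate(prompt)

# Stand-in functions for illustration only; in the notebook you would
# pass wrappers around docsearch.similarity_search and ollama.generate.
fake_search = lambda q: ["Llamas are members of the camelid family"]
fake_generate = lambda p: f"(model answer to: {p})"

print(answer("What animals are llamas related to?", fake_search, fake_generate))
```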