A Function Calling Example with Ollama, Llama 3.1, and Milvus


I came across an article showing that Llama function calling can be implemented in only a few lines of code. In the spirit of always learning something new, I ran the experiment myself; this post presents the example together with all of the rewritten code.

Combining an LLM with function calling is like giving your AI the ability to connect to the outside world. By integrating your LLM with external tools, such as user-defined functions or APIs, you can build applications that solve real-world problems. In this article, we explore how to integrate Llama 3.1 with external tools like Milvus and with APIs to build powerful, context-aware applications.

Introduction to Function Calling

LLMs like GPT-4, Mistral Nemo, and Llama 3.1 can now detect when they need to call a function and then output JSON containing the arguments for that call (see the illustrative payload after the list below). This makes your AI applications far more versatile and powerful.
Function calling enables developers to create:

  • LLM-powered data extraction and tagging solutions (e.g., extracting people's names from Wikipedia articles)
  • Applications that translate natural language into API calls or valid database queries
  • Conversational knowledge-retrieval engines that interact with a knowledge base
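Concretely, when the model decides that a function is needed, its reply carries a structured tool call instead of plain text. The sketch below mirrors the shape returned by the Ollama Python client used later in this post; treat it as illustrative, since exact field names vary across providers, and get_flight_times is the example function we define further down:

# A single tool call as surfaced by the Ollama client: a function name
# plus a dictionary of arguments, ready to be dispatched to real code
tool_call = {
    'function': {
        'name': 'get_flight_times',
        'arguments': {'departure': 'NYC', 'arrival': 'LAX'},
    }
}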

Tools Used

  • Ollama: brings the power of LLMs to your laptop, simplifying local operation.
    ref: https://ollama.com/
  • Milvus: our vector database of choice, used for efficient data storage and retrieval.
    ref: https://milvus.io/
  • Llama 3.1–8B: the upgraded 8B model is multilingual, raises the context length substantially to 128K, and can make use of tools.
    ref: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B

Using Llama 3.1 with Ollama

Llama 3.1 has been fine-tuned for function calling. It supports single, nested, and parallel function calls, as well as multi-turn calls, which means your AI can handle complex tasks that involve multiple steps or parallel processes (see the sketch after this paragraph). In our example, we implement two functions: one simulates an API call that fetches flight times, and the other runs a search in Milvus. Llama 3.1 will decide which function to call based on the user's query.
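For intuition, a parallel call simply surfaces as multiple entries in the model's tool-call list, each of which is dispatched independently. A hedged sketch, reusing the flight function defined later in this post:

# Two parallel tool calls: the model wants both directions of the route
tool_calls = [
    {'function': {'name': 'get_flight_times',
                  'arguments': {'departure': 'NYC', 'arrival': 'LAX'}}},
    {'function': {'name': 'get_flight_times',
                  'arguments': {'departure': 'LAX', 'arrival': 'NYC'}}},
]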

Installing Dependencies

  • Download Llama 3.1 with Ollama:
    ollama run llama3.1
    This downloads the model to your laptop and makes it available through Ollama.

  • Next, install the required dependencies:
    pip install ollama openai "pymilvus[model]"

    We are installing Milvus Lite with the model extension, which lets you embed data using the models bundled with Milvus.

    I recommend creating a separate conda environment for this experiment; see any of the conda tutorials online for details.

    [Screenshot: installing the dependencies]
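Before moving on, a quick sanity check that the packages import cleanly can save debugging time later. A minimal check, assuming the installation above succeeded:

# Verify the two libraries this post depends on are importable
import ollama
from pymilvus import MilvusClient, model

print("ollama and pymilvus[model] imported successfully")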

Inserting Data into Milvus

Now, let's insert some data into Milvus. Llama 3.1 will later decide to search it if it judges the data relevant!
I recommend putting this code in its own Python file and running it separately.

from pymilvus import MilvusClient, model

# Get the default embedding function (used to turn text into vectors)
embedding_fn = model.DefaultEmbeddingFunction()

# The documents whose text we want to vectorize
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

# Encode the documents into embedding vectors
vectors = embedding_fn.encode_documents(docs)

# The output vectors have 768 dimensions, matching the collection we create below
print("Dim:", embedding_fn.dim, vectors[0].shape)

# Each entity has an id, a vector representation, the raw text, and a subject label
data = [
    {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
    for i in range(len(vectors))
]

# Print basic info: each entity contains several fields
print("Data has", len(data), "entities, each with fields: ", data[0].keys())
print("Vector dim:", len(data[0]["vector"]))  # Dimension of each vector

# Create a Milvus client connected to a local database file
client = MilvusClient('./milvus_local.db')

# Create a collection to store the data
client.create_collection(
    collection_name="demo_collection",
    dimension=768,  # The vectors we use in this demo have 768 dimensions
)

# Insert the data into the collection
client.insert(collection_name="demo_collection", data=data)
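Two hedged notes on running this script. First, if you execute it a second time, create_collection may complain that demo_collection already exists; placing a guard built on has_collection and drop_collection (both part of the MilvusClient API) before create_collection keeps the script re-runnable. Second, a quick query confirms the insert landed. A minimal sketch of both:

# Place this guard BEFORE create_collection to make the script re-runnable
if client.has_collection(collection_name="demo_collection"):
    client.drop_collection(collection_name="demo_collection")

# After inserting, confirm the entities are queryable
res = client.query(
    collection_name="demo_collection",
    filter="subject == 'history'",
    output_fields=["text"],
)
print("Entities in collection:", len(res))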

Running the insert script in our environment produces output like the screenshot below, indicating that the data was inserted successfully.

[Screenshot: output of the insert script]

Defining the Functions to Use

In this example, we define two functions. The first simulates an API call that fetches flight times. The second runs a search query in Milvus.

import ollama
from pymilvus import model, MilvusClient
import json

# Get the default embedding function
embedding_fn = model.DefaultEmbeddingFunction()

# Connect to the local Milvus database we populated above
milvus_client = MilvusClient('./milvus_local.db')


# Simulate an API call that fetches flight times
# In a real application, this would fetch data from a live database or API
def get_flight_times(departure: str, arrival: str) -> str:
    flights = {
        'NYC-LAX': {'departure': '08:00 AM', 'arrival': '11:30 AM', 'duration': '5h 30m'},
        'LAX-NYC': {'departure': '02:00 PM', 'arrival': '10:30 PM', 'duration': '5h 30m'},
        'LHR-JFK': {'departure': '10:00 AM', 'arrival': '01:00 PM', 'duration': '8h 00m'},
        'JFK-LHR': {'departure': '09:00 PM', 'arrival': '09:00 AM', 'duration': '7h 00m'},
        'CDG-DXB': {'departure': '11:00 AM', 'arrival': '08:00 PM', 'duration': '6h 00m'},
        'DXB-CDG': {'departure': '03:00 AM', 'arrival': '07:30 AM', 'duration': '7h 30m'},
    }
    # Combine departure and arrival into a key and look up the flight info
    key = f'{departure}-{arrival}'.upper()
    return json.dumps(flights.get(key, {'error': 'Flight not found'}))


# Search the vector database for data related to Artificial Intelligence
def search_data_in_vector_db(query: str) -> str:
    # Convert the query into a query vector
    query_vectors = embedding_fn.encode_queries([query])

    # Run the vector database search
    res = milvus_client.search(
        collection_name="demo_collection",
        data=query_vectors,
        limit=2,  # Limit the number of results returned
        output_fields=["text", "subject"],  # Fields to return
    )

    print(res)  # Print the search results
    return json.dumps(res)
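Before handing these functions to the LLM, it is worth exercising them directly. This is just a sanity check built on the functions defined above:

# Direct call: should print the canned NYC-LAX flight info
print(get_flight_times("NYC", "LAX"))
# {"departure": "08:00 AM", "arrival": "11:30 AM", "duration": "5h 30m"}

# Direct search: should print the two closest documents from demo_collection
print(search_data_in_vector_db("Who conducted early AI research?"))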

Now, let's give the LLM instructions so that it can use the functions we have defined.

def run(model: str, question: str):
    client = ollama.Client()
    # Initialize conversation with a user query
    messages = [{'role': 'user', 'content': question}]
    # First API call: Send the query and function description to the model
    response = client.chat(
        model=model,
        messages=messages,
        tools=[
            {
                'type': 'function',
                'function': {
                    'name': 'get_flight_times',
                    'description': 'Get the flight times between two cities',
                    'parameters': {
                        'type': 'object',
                        'properties': {
                            'departure': {
                                'type': 'string',
                                'description': 'The departure city (airport code)',
                            },
                            'arrival': {
                                'type': 'string',
                                'description': 'The arrival city (airport code)',
                            },
                        },
                        'required': ['departure', 'arrival'],
                    },
                },
            },
            {
                'type': 'function',
                'function': {
                    'name': 'search_data_in_vector_db',
                    'description': 'Search about Artificial Intelligence data in a vector database',
                    'parameters': {
                        'type': 'object',
                        'properties': {
                            'query': {
                                'type': 'string',
                                'description': 'The search query',
                            },
                        },
                        'required': ['query'],
                    },
                },
            },
        ],
    )
    messages.append(response['message'])
    # Check if the model decided to use the provided function
    if not response['message'].get('tool_calls'):
        print("The model didn't use the function. Its response was:")
        print(response['message']['content'])
        return
    # Process function calls made by the model
    if response['message'].get('tool_calls'):
        available_functions = {
            'get_flight_times': get_flight_times,
            'search_data_in_vector_db': search_data_in_vector_db,
        }
        for tool in response['message']['tool_calls']:
            function_to_call = available_functions[tool['function']['name']]
            function_args = tool['function']['arguments']
            function_response = function_to_call(**function_args)
            # Add function response to the conversation
            messages.append(
                {
                    'role': 'tool',
                    'content': function_response,
                }
            )
    # Second API call: Get final response from the model
    final_response = client.chat(model=model, messages=messages)
    print(final_response['message']['content'])
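One optional hardening step, offered as a sketch rather than as part of the original flow: a model occasionally names a function that was never registered, and the plain dictionary lookup above would then raise a KeyError. A defensive variant of the dispatch loop:

# Defensive dispatch: skip tool calls whose names we never registered
for tool in response['message']['tool_calls']:
    name = tool['function']['name']
    function_to_call = available_functions.get(name)
    if function_to_call is None:
        print(f"Model requested an unknown function: {name}")
        continue
    function_response = function_to_call(**tool['function']['arguments'])
    messages.append({'role': 'tool', 'content': function_response})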

Final Usage Examples

Let's check whether we can retrieve the times for a specific flight:

question = "What is the flight time from New York (NYC) to Los Angeles (LAX)?"

run('llama3.1', question)

Example response:

(ollama) xdrshjr@xdrshjr:~/experience/JR-Agent/autogen/exps/ollama_function_call$ python ollama_function_call.py 
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
The flight time from New York (NYC) to Los Angeles (LAX) is approximately 5 hours. However, please note that this time may vary depending on several factors such as the airline, flight schedule, and weather conditions.

To give you a more accurate answer, I can suggest some flight routes with their estimated flight times:

* American Airlines: 5 hours and 10 minutes
* Delta Air Lines: 5 hours and 15 minutes
* United Airlines: 5 hours and 20 minutes

Please note that these times are approximate and may vary depending on the specific flight schedule. It's always best to check with your airline or a travel website like Expedia, Kayak, or Skyscanner for the most up-to-date and accurate information.


Now let's test whether vector search works:

question = "What is Artificial Intelligence?"

run('llama3.1', question)

The response:

(ollama) xdrshjr@xdrshjr:~/experience/JR-Agent/autogen/exps/ollama_function_call$ python ollama_function_call.py 
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
data: ["[{'id': 0, 'distance': 0.47026658058166504, 'entity': {'text': 'Artificial intelligence was founded as an academic discipline in 1956.', 'subject': 'history'}}, {'id': 1, 'distance': 0.2702861428260803, 'entity': {'text': 'Alan Turing was the first person to conduct substantial research in AI.', 'subject': 'history'}}]"] , extra_info: {'cost': 0}
Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. The term may also be applied to any machine or program that exhibits traits associated with a mind possessing consciousness, self-awareness, or the ability to reason. AI systems can perform tasks that typically require human intelligence, such as visual understanding, speech recognition, decision-making, and learning.

AI technology has been rapidly advancing in recent years and has found its way into various aspects of our lives, including but not limited to:

1. Virtual Assistants: Many people use virtual assistants like Siri, Alexa, or Google Assistant to perform tasks, set reminders, or play music.
2. Image Recognition: Facebook uses AI-powered image recognition to identify and flag suspicious activity on the platform.
3. Personalized Recommendations: Netflix and Amazon recommend movies and products based on your viewing history and purchase behavior.
4. Self-Driving Cars: Several companies like Waymo (formerly Google Self-Driving Car project) are working towards developing fully autonomous vehicles.
5. Healthcare: AI is being used in medical diagnosis, personalized treatment plans, and even robotic surgery.

The development of AI has led to significant improvements in efficiency, accuracy, and speed across various industries. However, it also raises concerns about job displacement, bias in decision-making, and the need for more regulation in certain areas.

As you can see, the model retrieved the correct data and produced a matching summary.

Conclusion

Function calling with large language models opens up a world of possibilities. By integrating Llama 3.1 with external tools and APIs such as Milvus, you can build powerful, context-aware applications tailored to specific use cases and practical problems.

Full Code

insert_data_milvus.py

from pymilvus import MilvusClient, model
embedding_fn = model.DefaultEmbeddingFunction()
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]
vectors = embedding_fn.encode_documents(docs)
# The output vectors have 768 dimensions, matching the collection we create below.
print("Dim:", embedding_fn.dim, vectors[0].shape)  # Dim: 768 (768,)
# Each entity has id, vector representation, raw text, and a subject label.
data = [
    {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
    for i in range(len(vectors))
]
print("Data has", len(data), "entities, each with fields: ", data[0].keys())
print("Vector dim:", len(data[0]["vector"]))
# Create a collection and insert the data
client = MilvusClient('./milvus_local.db')
client.create_collection(
    collection_name="demo_collection",
    dimension=768,  # The vectors we will use in this demo have 768 dimensions
)
client.insert(collection_name="demo_collection", data=data)

ollama_function_call.py

import ollama
from pymilvus import model, MilvusClient
import json

# Get the default embedding function
embedding_fn = model.DefaultEmbeddingFunction()

milvus_client = MilvusClient('./milvus_local.db')


# Simulate an API call that fetches flight times
# In a real application, this would fetch data from a live database or API
def get_flight_times(departure: str, arrival: str) -> str:
    flights = {
        'NYC-LAX': {'departure': '08:00 AM', 'arrival': '11:30 AM', 'duration': '5h 30m'},
        'LAX-NYC': {'departure': '02:00 PM', 'arrival': '10:30 PM', 'duration': '5h 30m'},
        'LHR-JFK': {'departure': '10:00 AM', 'arrival': '01:00 PM', 'duration': '8h 00m'},
        'JFK-LHR': {'departure': '09:00 PM', 'arrival': '09:00 AM', 'duration': '7h 00m'},
        'CDG-DXB': {'departure': '11:00 AM', 'arrival': '08:00 PM', 'duration': '6h 00m'},
        'DXB-CDG': {'departure': '03:00 AM', 'arrival': '07:30 AM', 'duration': '7h 30m'},
    }
    # Combine departure and arrival into a key and look up the flight info
    key = f'{departure}-{arrival}'.upper()
    return json.dumps(flights.get(key, {'error': 'Flight not found'}))


# Search the vector database for data related to Artificial Intelligence
def search_data_in_vector_db(query: str) -> str:
    # Convert the query into a query vector
    query_vectors = embedding_fn.encode_queries([query])

    # Run the vector database search
    res = milvus_client.search(
        collection_name="demo_collection",
        data=query_vectors,
        limit=2,  # Limit the number of results returned
        output_fields=["text", "subject"],  # Fields to return
    )

    print(res)  # Print the search results
    return json.dumps(res)


def run(model: str, question: str):
    client = ollama.Client()
    # Initialize conversation with a user query
    messages = [{'role': 'user', 'content': question}]
    # First API call: Send the query and function description to the model
    response = client.chat(
        model=model,
        messages=messages,
        tools=[
            {
                'type': 'function',
                'function': {
                    'name': 'get_flight_times',
                    'description': 'Get the flight times between two cities',
                    'parameters': {
                        'type': 'object',
                        'properties': {
                            'departure': {
                                'type': 'string',
                                'description': 'The departure city (airport code)',
                            },
                            'arrival': {
                                'type': 'string',
                                'description': 'The arrival city (airport code)',
                            },
                        },
                        'required': ['departure', 'arrival'],
                    },
                },
            },
            {
                'type': 'function',
                'function': {
                    'name': 'search_data_in_vector_db',
                    'description': 'Search about Artificial Intelligence data in a vector database',
                    'parameters': {
                        'type': 'object',
                        'properties': {
                            'query': {
                                'type': 'string',
                                'description': 'The search query',
                            },
                        },
                        'required': ['query'],
                    },
                },
            },
        ],
    )
    messages.append(response['message'])
    # Check if the model decided to use the provided function
    if not response['message'].get('tool_calls'):
        print("The model didn't use the function. Its response was:")
        print(response['message']['content'])
        return
    # Process function calls made by the model
    if response['message'].get('tool_calls'):
        available_functions = {
            'get_flight_times': get_flight_times,
            'search_data_in_vector_db': search_data_in_vector_db,
        }
        for tool in response['message']['tool_calls']:
            function_to_call = available_functions[tool['function']['name']]
            function_args = tool['function']['arguments']
            function_response = function_to_call(**function_args)
            # Add function response to the conversation
            messages.append(
                {
                    'role': 'tool',
                    'content': function_response,
                }
            )
    # Second API call: Get final response from the model
    final_response = client.chat(model=model, messages=messages)
    print(final_response['message']['content'])


if __name__ == '__main__':
    # 1. Function calling example
    # question = "What is the flight time from New York (NYC) to Los Angeles (LAX)?"
    # run('llama3.1', question)

    # 2. Vector search example
    question = "What is Artificial Intelligence?"
    run('llama3.1', question)

References

  • [1] Function Calling with Ollama, Llama 3.1 and Milvus, https://medium.com/@zilliz_learn/function-calling-with-ollama-llama-3-1-and-milvus-3fd268405ad7
  • [2] Ollama, https://ollama.com/
  • [3] Llama 3.1–8B, https://huggingface.co/meta-llama/Meta-Llama-3.1-8B