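I. About PandasAI
1. Project overview
PandasAI is a Python platform for asking questions of your data in natural language. It lets non-technical users interact with data more naturally, and it saves technical users time and effort when working with data.
2. Related links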
- GitHub: https://github.com/sinaptik-ai/pandas-ai
- Official documentation: https://pandas-ai.readthedocs.io/en/latest/
- Colab: https://colab.research.google.com/drive/1ZnO-njhL7TBOYPZaqvMvGtsjckZKrv2E?usp=sharing
- Demo video: https://github.com/sinaptik-ai/pandas-ai/raw/main/assets/demo.gif
- Examples: https://github.com/sinaptik-ai/pandas-ai/tree/main/examples
- Community support: Discord
- License: MIT
- Platform: https://app.pandabi.ai
- Enterprise edition inquiries: https://getpanda.ai/pricing
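3. Features
1. Natural language queries
Query your data through simple conversational prompts.
2. Visualization generation
Generate data visualization charts automatically.
3. Multi-table relationships
Run queries that relate data across multiple DataFrames.
4. Secure sandbox
Execute generated code safely inside a Docker sandbox environment.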
II. Installation and configuration
System requirements
- Python 3.8+ (<3.12)
# Install with pip
pip install "pandasai>=3.0.0b2"
# Install with poetry
poetry add "pandasai>=3.0.0b2"
# Docker sandbox support package
pip install "pandasai-docker"
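Because the supported Python range is bounded on both sides, a quick check before installing can save a failed build. The following is a minimal sketch that reads the requirement as Python 3.8 up to, but not including, 3.12. The helper name `is_supported_python` is made up for this illustration and is not part of PandasAI.

```python
import sys

# Assumed reading of the requirement: 3.8 <= version < 3.12.
MIN_VERSION = (3, 8)
MAX_VERSION_EXCLUSIVE = (3, 12)


def is_supported_python(version=None):
    """Return True if the (major, minor) version falls in the assumed range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Running the script prints whether the current interpreter falls inside the assumed range. Passing a tuple such as `(3, 12)` lets you check a version other than the one you are running.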
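Enterprise edition note
The pandasai/ee directory is covered by a separate license; see the EE License for details.
III. Usage examples
1. Asking questions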
import pandasai as pai
# Sample DataFrame
df = pai.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "revenue": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]
})
# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://app.pandabi.ai (you can also configure it in your .env file)
pai.api_key.set("your-pai-api-key")
df.chat('Which are the top 5 countries by sales?')
China, United States, Japan, Germany, United Kingdom
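The comment in the snippet above mentions that the key can also be configured outside the source code. A minimal sketch of reading it from the process environment follows. The variable name `PANDABI_API_KEY` and the helper `load_api_key` are assumptions made for this illustration, so check the official documentation for the name the library actually reads. The sketch also assumes the variable is already present in the environment, for example exported by your shell or loaded from `.env` by a separate tool.

```python
import os

# Assumed variable name; confirm it against the official PandasAI docs.
API_KEY_VAR = "PANDABI_API_KEY"


def load_api_key(env=None, var_name=API_KEY_VAR):
    """Return the API key from the environment, or raise if it is missing."""
    if env is None:
        env = os.environ
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it or load it from your .env file")
    return key


# Usage with the library (not executed here, since it needs pandasai installed):
#   import pandasai as pai
#   pai.api_key.set(load_api_key())
```

Failing loudly on a missing key keeps the placeholder string "your-pai-api-key" out of real runs.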
Or you can ask a more complex question:
df.chat(
    "What is the total sales for the top 3 countries by sales?"
)
The total sales for the top 3 countries by sales is 16500
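Because the answers come from LLM-generated code, it can be worth cross-checking them against a direct computation. The sketch below recomputes both questions from the sample data using only the standard library, so it runs without PandasAI or an API key. It treats "sales" in the prompts as the `revenue` column, which is an assumption.

```python
countries = ["United States", "United Kingdom", "France", "Germany", "Italy",
             "Spain", "Canada", "Australia", "Japan", "China"]
revenue = [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000]

# Pair each country with its revenue and sort from highest to lowest.
ranked = sorted(zip(countries, revenue), key=lambda pair: pair[1], reverse=True)

# Top 5 country names and the combined revenue of the top 3.
top5 = [name for name, _ in ranked[:5]]
top3_total = sum(value for _, value in ranked[:3])

print(top5)
print(top3_total)  # 16500
```

On this data the fifth entry is United Kingdom (3200), ahead of France (2900) and Australia (2600).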
2. Visualization charts
You can also ask PandasAI to generate charts for you:
df.chat(
    "Plot the histogram of countries showing for each one the revenue. Use different colors for each bar",
)
3. Working with multiple DataFrames
You can also pass multiple DataFrames to PandasAI and ask questions that relate them to one another.
import pandasai as pai
employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}
salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}
employees_df = pai.DataFrame(employees_data)
salaries_df = pai.DataFrame(salaries_data)
# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://app.pandabi.ai (you can also configure it in your .env file)
pai.api_key.set("your-pai-api-key")
pai.chat("Who gets paid the most?", employees_df, salaries_df)
Olivia gets paid the most.
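Answering this question requires relating the two DataFrames through their shared `EmployeeID` column. The sketch below performs that join by hand with plain Python dictionaries, as a stand-in for the kind of work the generated code has to do. It is an illustration only, not the code PandasAI actually produces.

```python
employees = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}
salaries = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

# Index names by EmployeeID, the key shared by both tables.
name_by_id = dict(zip(employees["EmployeeID"], employees["Name"]))

# Join: look up each salary row's employee name through the shared key.
joined = [
    (name_by_id[emp_id], salary)
    for emp_id, salary in zip(salaries["EmployeeID"], salaries["Salary"])
    if emp_id in name_by_id
]

# Pick the row with the highest salary.
top_name, top_salary = max(joined, key=lambda row: row[1])
print(f"{top_name} gets paid the most ({top_salary}).")
```

The result agrees with the answer shown above: Olivia, with a salary of 7000.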
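4. Docker sandbox
You can run PandasAI in a Docker sandbox, an isolated environment that executes generated code safely and reduces the risk of malicious attacks.
Python requirements
pip install "pandasai-docker"
Usage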
import pandasai as pai
from pandasai_docker import DockerSandbox
# Initialize the sandbox
sandbox = DockerSandbox()
sandbox.start()
employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}
salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}
employees_df = pai.DataFrame(employees_data)
salaries_df = pai.DataFrame(salaries_data)
# By default, unless you choose a different LLM, it will use BambooLLM.
# You can get your free API key by signing up at https://app.pandabi.ai (you can also configure it in your .env file)
pai.api_key.set("your-pai-api-key")
pai.chat("Who gets paid the most?", employees_df, salaries_df, sandbox=sandbox)
# Don't forget to stop the sandbox when done
sandbox.stop()
Olivia gets paid the most.
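In the example above, `sandbox.stop()` is skipped if `pai.chat` raises an exception. One way to guarantee cleanup is a small context manager around the `start()` and `stop()` calls. The sketch below is an illustration only: `running` and `FakeSandbox` are hypothetical names, and `FakeSandbox` stands in for `DockerSandbox` so the code runs without Docker.

```python
from contextlib import contextmanager


@contextmanager
def running(sandbox):
    """Start the sandbox, yield it, and always stop it afterwards."""
    sandbox.start()
    try:
        yield sandbox
    finally:
        sandbox.stop()


class FakeSandbox:
    """Hypothetical stand-in that records calls instead of starting Docker."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")


fake = FakeSandbox()
try:
    with running(fake) as sandbox:
        # With the real library this would be something like:
        #   pai.chat("Who gets paid the most?", employees_df, salaries_df, sandbox=sandbox)
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

print(fake.events)  # ['start', 'stop']
```

With the real library, the same pattern would be written as `with running(DockerSandbox()) as sandbox:`, assuming `DockerSandbox` exposes only the `start()` and `stop()` methods used in the example above.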
You can find more examples in the examples directory.
伊织 xAI, 2025-05-28 (Tue)