Test prompts
Import the required Python packages
Please provide a Python code snippet that imports the necessary libraries for data analysis, such as pandas, numpy and matplotlib.
Read the CSV
Now, could you provide a Python code snippet to load a CSV file into a pandas DataFrame? Assume the file is named ‘data.csv’.
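Note: the sentence "Assume the file is named ‘data.csv’." should be a default prompt that is built in automatically after a CSV file is uploaded through the front end.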
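For comparison when judging the model's reply, a reference answer to the CSV-loading prompt might look roughly like the sketch below. The `load_data` helper is hypothetical, and the snippet writes a tiny stand-in file to a temporary directory so it can run on its own; a real run would read the uploaded ‘data.csv’ instead.

```python
import os
import tempfile

import pandas as pd


def load_data(path="data.csv"):
    """Load a CSV file into a pandas DataFrame (hypothetical helper)."""
    return pd.read_csv(path)


# Stand-in for the uploaded file, so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
with open(csv_path, "w") as f:
    f.write("Name,Age,Income\nName0,30,50000\nName1,,62000\n")

df = load_data(csv_path)
print(df.head())
```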
Summary statistics
Could you provide a Python code snippet that gives the summary statistics of the DataFrame?
Missing values
Now, could you provide a Python code snippet to check for missing values in the DataFrame?
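Plausible reference answers to the summary-statistics and missing-value prompts could be as short as the sketch below. The small DataFrame is an illustrative stand-in for ‘data.csv’, not real test data.

```python
import numpy as np
import pandas as pd

# Small stand-in for data.csv; the values are illustrative only.
df = pd.DataFrame({
    "Name": ["Name0", "Name1", "Name2", "Name3"],
    "Age": [30.0, np.nan, 45.0, 52.0],
    "Income": [50000.0, 62000.0, np.nan, np.nan],
})

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df.describe()
print(summary)

# Number of missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)
```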
Querying a specific column
Could you provide a Python code snippet that shows the distribution of a specific column? Let’s say the column is named ‘Age’.
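One reasonable reading of "distribution" here is a frequency count per value, which a reply might produce as in this sketch. The sample values are made up for illustration; a model could equally well answer with `describe()` or a plot.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the 'Age' column of data.csv.
df = pd.DataFrame({"Age": [30.0, 30.0, 45.0, np.nan]})

# Count how often each age occurs; dropna=False keeps missing values visible.
age_counts = df["Age"].value_counts(dropna=False).sort_index()
print(age_counts)
```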
Plotting a histogram
Now, could you provide a Python code snippet to plot a histogram of the ‘Age’ column using matplotlib.pyplot?
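A reply to the histogram prompt would probably resemble the sketch below. The bin count of 5 and the sample ages are arbitrary assumptions, and the `Agg` backend is chosen only so the snippet runs without a display; in an interactive session one would call `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative stand-in for the 'Age' column of data.csv.
df = pd.DataFrame({"Age": [25.0, 31.0, 38.0, np.nan, 44.0, 52.0, 60.0]})

fig, ax = plt.subplots()
# Drop missing values before plotting; bins=5 is an arbitrary choice.
counts, bin_edges, _ = ax.hist(df["Age"].dropna(), bins=5, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Age")
plt.close(fig)
```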
Plotting a scatter plot
Can you provide a Python code snippet to plot a scatter plot between ‘Age’ and another numerical column, let’s say ‘Income’, using matplotlib.pyplot?
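For the scatter-plot prompt, an acceptable answer might look like this sketch. It assumes rows with a missing ‘Age’ or ‘Income’ are simply dropped before plotting, which the prompt does not specify; the sample values are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative stand-in for data.csv.
df = pd.DataFrame({
    "Age": [25.0, 31.0, np.nan, 44.0],
    "Income": [42000.0, np.nan, 58000.0, 71000.0],
})

# Keep only rows where both values are present (an assumed cleaning choice).
pairs = df[["Age", "Income"]].dropna()

fig, ax = plt.subplots()
points = ax.scatter(pairs["Age"], pairs["Income"])
ax.set_xlabel("Age")
ax.set_ylabel("Income")
ax.set_title("Age vs. Income")
plt.close(fig)
```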
Computing correlations and visualizing them as a heatmap
Can you provide a Python code snippet to compute the correlation between different numerical columns in the DataFrame and visualize it using a heatmap in seaborn?
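A reference answer for the correlation prompt could be sketched as follows, assuming seaborn is installed. The ‘Score’ column and all values are invented for illustration, and the color map and value range are stylistic choices rather than requirements of the prompt.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed for headless runs
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative stand-in for data.csv; 'Score' is a made-up extra column.
df = pd.DataFrame({
    "Name": ["Name0", "Name1", "Name2", "Name3"],
    "Age": [20.0, 30.0, 40.0, 50.0],
    "Income": [30000.0, 40000.0, 50000.0, 60000.0],
    "Score": [4.0, 3.0, 2.0, 1.0],
})

# Restrict to numeric columns, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()

fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")
plt.close(fig)
```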
Saving the cleaned data.csv
Finally, could you provide a Python code snippet to save the cleaned and processed DataFrame back to a CSV file, named ‘processed_data.csv’?
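The prompt leaves "cleaned and processed" undefined, so the sketch below assumes the simplest cleaning step, dropping rows with any missing value. It writes to a temporary directory to stay self-contained; a real answer would likely write ‘processed_data.csv’ to the working directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative stand-in for data.csv.
df = pd.DataFrame({
    "Name": ["Name0", "Name1", "Name2"],
    "Age": [30.0, np.nan, 45.0],
    "Income": [50000.0, 62000.0, 58000.0],
})

# Assumed cleaning step: drop rows that contain any missing value.
cleaned = df.dropna().reset_index(drop=True)

# Save without the index column, then read back to confirm the round trip.
out_path = os.path.join(tempfile.mkdtemp(), "processed_data.csv")
cleaned.to_csv(out_path, index=False)
reloaded = pd.read_csv(out_path)
print(reloaded)
```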
Script for generating the sample data.csv test file
import pandas as pd
import numpy as np
# Set the random seed so the results are reproducible
np.random.seed(0)
# Generate random data
names = ['Name' + str(i) for i in range(50)]
# Cast to float: an integer array cannot hold NaN, so assigning it would raise an error
ages = np.random.randint(18, 65, size=50).astype(float)
incomes = np.random.randint(30000, 80000, size=50).astype(float)
# Introduce missing values by setting the low values to NaN
ages[ages < 25] = np.nan
incomes[incomes < 40000] = np.nan
# Create the DataFrame
df = pd.DataFrame({'Name': names, 'Age': ages, 'Income': incomes})
# Set a further random 10% of each column to missing
df.loc[df.sample(frac=0.1).index, 'Age'] = np.nan
df.loc[df.sample(frac=0.1).index, 'Income'] = np.nan
# Save as a CSV file
df.to_csv('data.csv', index=False)