Basic Python usage (numpy, pandas, matplotlib): what are numpy, pandas, and matplotlib each used for?

Article link: https://blog.csdn.net/weixin_66547608/article/details/139201105

numpy, pandas, matplotlib

1. numpy

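numpy (short for Numerical Python) is an extension library for Python that supports multi-dimensional arrays and matrix operations, and provides a large collection of mathematical functions that operate on those arrays. Its main features are:

N-dimensional array object: stores a multi-dimensional array whose elements all share a single data type.

Fast element-wise operations: addition, subtraction, multiplication, and so on.

Broadcasting: a mechanism that lets arrays of different but compatible shapes take part in the same arithmetic operation.

Linear algebra, statistics, Fourier transforms, and more: a large set of higher-level mathematical functions.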
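
Broadcasting, listed above, is easiest to see on a small example. The sketch below is only an illustration; the names `column`, `row`, and `table` are made up for it and do not come from the article.

```python
import numpy as np

# A (3, 1) column and a (4,) row: the shapes are compatible because,
# compared from the right, each pair of dimensions matches or contains a 1.
column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3, 4])           # shape (4,)

# Both operands are treated as if stretched to shape (3, 4).
table = column + row
print(table.shape)  # (3, 4)
print(table)

# Shapes that do not line up raise a ValueError.
try:
    np.ones((3, 2)) + np.ones(3)
except ValueError as exc:
    print("broadcast failed:", exc)
```

Each row of `table` is `row` shifted by the matching entry of `column`, which is the same result a nested loop would give.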

Common code:

The following are some commonly used numpy code examples:

1. Importing numpy

import numpy as np

2. Creating arrays

	# One-dimensional array
	arr1d = np.array([1, 2, 3, 4, 5])

	# Two-dimensional array (matrix)
	arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

	# Create arrays of a given shape with zeros, ones, and empty
	zeros_arr = np.zeros((3, 3))
	ones_arr = np.ones((2, 4))
	empty_arr = np.empty((2, 3)) # Note: contents are uninitialized and may hold arbitrary values

	# Create one-dimensional arrays with arange and linspace
	arange_arr = np.arange(0, 10, 2) # Start at 0, stop before 10, step 2
	linspace_arr = np.linspace(0, 1, 5) # 5 evenly spaced values from 0 to 1, both endpoints included

	# Create random arrays with np.random
	random_arr = np.random.rand(3, 3) # Uniform random floats in [0, 1)
	randint_arr = np.random.randint(0, 10, (3, 3)) # Random integers from 0 to 9

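3. Array operations

	# Element-wise arithmetic
	result = arr1d + arr1d # Add corresponding elements
	result = arr2d * 2 # Multiply every element by 2

	# Indexing and slicing
	element = arr2d[0, 0] # Element at row 0, column 0
	row = arr2d[1, :] # Second row
	col = arr2d[:, 1] # Second column

	# Shape and size
	shape = arr2d.shape # Shape, here (3, 3)
	size = arr2d.size # Total number of elements

	# Data type (dtype)
	dtype = arr1d.dtype # Data type, e.g. dtype('int64'); the default integer type depends on the platform

	# Sorting
	sorted_arr = np.sort(arr1d) # Returns a sorted copy; arr1d is unchanged

	# Conditional selection
	mask = arr1d > 3 # Boolean array
	selected_elements = arr1d[mask] # Elements where mask is True

	# Reshaping
	reshaped_arr = arr1d.reshape((1, 5))

	# Concatenating arrays
	concat_arr = np.concatenate((arr1d, [6, 7]))

	# Transpose
	transposed_arr = arr2d.T

	# Matrix multiplication (for 2-D arrays, the same as arr2d @ arr2d.T)
	dot_product = np.dot(arr2d, arr2d.T)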

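4. Statistics and aggregation

	# Minimum, maximum, mean, and median
	min_val = np.min(arr1d)
	max_val = np.max(arr1d)
	mean_val = np.mean(arr1d)
	median_val = np.median(arr1d)

	# Standard deviation and variance (population versions by default, ddof=0)
	std_dev = np.std(arr1d)
	variance = np.var(arr1d)

	# Sum along a given axis
	sum_axis0 = np.sum(arr2d, axis=0) # One sum per column
	sum_axis1 = np.sum(arr2d, axis=1) # One sum per row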
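
The `axis` argument is a common source of confusion, so here is a small worked example on the same 3x3 matrix. The `keepdims` step and the name `row_share` are additions for illustration, not part of the original snippets.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, leaving one sum per column.
col_sums = np.sum(arr2d, axis=0)
print(col_sums)  # [12 15 18]

# axis=1 collapses the columns, leaving one sum per row.
row_sums = np.sum(arr2d, axis=1)
print(row_sums)  # [ 6 15 24]

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts against the original array (here: each value's share of its row).
row_share = arr2d / np.sum(arr2d, axis=1, keepdims=True)
print(row_share.sum(axis=1))  # every row sums to 1 (up to rounding)
```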

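5. Finding and searching

	# Indices of non-zero elements
	nonzero_indices = np.nonzero(arr1d)

	# Positions of a specific value (returns a tuple of index arrays, one per dimension)
	positions = np.where(arr1d == 3)

	# Unique values and their counts
	unique_values, counts = np.unique(arr1d, return_counts=True)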
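
Because `arr1d` above has no repeated values, the search functions are easier to follow on an array with duplicates. This is a sketch on made-up data; the three-argument form of `np.where` is shown as an extra that the article does not cover.

```python
import numpy as np

arr = np.array([3, 1, 3, 2, 3])  # illustrative data with repeats

# One-argument np.where returns a tuple with one index array per dimension.
positions = np.where(arr == 3)
print(positions[0])  # [0 2 4]

# np.unique returns the sorted unique values and, on request, their counts.
values, counts = np.unique(arr, return_counts=True)
print(values)  # [1 2 3]
print(counts)  # [1 1 3]

# Three-argument np.where chooses element by element: 2 where the
# condition holds, the original value elsewhere.
capped = np.where(arr > 2, 2, arr)
print(capped)  # [2 1 2 2 2]
```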

2. pandas

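pandas is a powerful data analysis toolkit that provides data structures and analysis tools for processing and analyzing large amounts of data. Its main features include:

DataFrame: a two-dimensional, size-mutable, tabular data structure whose columns can have different types.

Series: a one-dimensional labeled array that can hold any data type, together with an associated set of labels (the index).

Reading and writing data: data can be read from many formats (CSV, Excel, SQL databases, and others) and written back to them.

Data processing: cleaning, transforming, merging, reshaping, and similar operations.

Statistical analysis: a range of statistical functions and methods.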
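
To make the relationship between the two structures concrete, the sketch below builds a small DataFrame and pulls a Series out of it. The index labels `a`, `b`, `c` and the `bonus` Series are invented for this illustration.

```python
import pandas as pd

# A DataFrame whose columns have different dtypes.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],  # illustrative labels, not from the article
)

# Selecting a single column returns a Series that shares the index.
ages = df["Age"]
print(type(ages).__name__)   # Series
print(ages.index.tolist())   # ['a', 'b', 'c']

# Arithmetic between Series aligns on labels, not on position;
# labels missing from either side produce NaN.
bonus = pd.Series({"c": 1, "a": 100})
total = ages + bonus
print(total)
```

Label alignment is the main practical difference between a Series and a plain numpy array.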

Common pandas code

The following are some commonly used pandas code examples:

1. Importing pandas

import pandas as pd

2. Creating a DataFrame

	# Create a DataFrame from a dictionary
	data = {
		'Name': ['Alice', 'Bob', 'Charlie'],
		'Age': [25, 30, 35],
		'City': ['New York', 'San Francisco', 'Los Angeles']
	}
	df = pd.DataFrame(data)

	# Read a DataFrame from a CSV file (this replaces the df created above)
	df = pd.read_csv('data.csv')

	# Read a DataFrame from a SQL database
	# Requires sqlalchemy and a database driver (such as pymysql)
	from sqlalchemy import create_engine
	engine = create_engine('mysql+pymysql://user:password@localhost:3306/dbname')
	df = pd.read_sql_table('table_name', engine)

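3. Inspecting a DataFrame

	# Show the first rows (5 by default)
	print(df.head())

	# Show the last rows (5 by default)
	print(df.tail())

	# Show the structure (column names, data types, and non-null counts)
	df.info() # info() prints its report itself and returns None, so no print() is needed

	# Show the data type of each column
	print(df.dtypes)

	# Show descriptive statistics
	print(df.describe())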

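4. Selecting data

	# Select a column
	print(df['Age'])

	# Select multiple columns
	print(df[['Name', 'Age']])

	# Select rows with loc and iloc
	print(df.loc[0]) # Row whose index label is 0 (the first row when the default index is used)
	print(df.iloc[0]) # First row by integer position, whatever the index labels are

	# Select rows by condition
	print(df[df['Age'] > 30])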
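
`loc` and `iloc` only coincide when the index is the default 0, 1, 2, and so on. The sketch below uses a hypothetical index of 10, 20, 30 to show where they diverge.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical labels chosen for illustration
)

print(df.iloc[0]["Name"])   # Alice: first row by position
print(df.loc[10]["Name"])   # Alice: row whose index label is 10

# Label 0 does not exist in this index, so loc raises KeyError.
try:
    df.loc[0]
except KeyError:
    print("no row labelled 0")

# loc slices include the end label; iloc slices exclude the end position.
print(len(df.loc[10:20]))   # 2
print(len(df.iloc[0:1]))    # 1
```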

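5. Cleaning and transforming data

	# Handle missing values
	df.fillna(0, inplace=True) # Replace missing values with 0

	# Rename a column
	df.rename(columns={'Age': 'Age_Years'}, inplace=True)

	# Drop a column (returns a new DataFrame; df keeps 'City', which the grouping and sorting steps use)
	df_without_city = df.drop('City', axis=1)

	# Drop rows (here: rows where Age_Years is below 30)
	df.drop(df[df['Age_Years'] < 30].index, inplace=True)

	# Convert a column's data type
	df['Age_Years'] = df['Age_Years'].astype(int)

	# String operations (for example, convert to upper case)
	df['Name'] = df['Name'].str.upper()

	# Apply a function to each element of a column
	df['Age_Squared'] = df['Age_Years'].apply(lambda x: x**2)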

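6. Grouping and aggregation

	# Group with groupby (df must still contain the 'City' column)
	grouped = df.groupby('City')

	# Aggregate each group (for example, the mean age per city)
	agg_result = grouped['Age_Years'].mean()

	# Multiple aggregations at once
	agg_result = grouped.agg({'Age_Years': ['mean', 'count']})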
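
Grouping only becomes interesting when a group has more than one row, so the following sketch uses made-up data with repeated cities. The city names and ages are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with repeated cities so that groups have several rows.
df = pd.DataFrame({
    "City": ["Paris", "Paris", "London", "London", "London"],
    "Age_Years": [20, 30, 40, 50, 60],
})

# One aggregate per group: the result is a Series indexed by City.
mean_age = df.groupby("City")["Age_Years"].mean()
print(mean_age["Paris"])   # 25.0

# Several aggregates at once: the result has two-level column labels,
# addressed as (column, aggregate).
summary = df.groupby("City").agg({"Age_Years": ["mean", "count"]})
print(summary.loc["London", ("Age_Years", "count")])  # 3
```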

7. Sorting

	# Sort by one column
	df_sorted = df.sort_values(by='Age_Years')

	# Sort by several columns (df must still contain the 'City' column)
	df_sorted_multi = df.sort_values(by=['City', 'Age_Years'])

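8. Saving to a file

	# Save to a CSV file
	df.to_csv('output.csv', index=False)

	# Save to an Excel file (requires an Excel writer such as openpyxl)
	df.to_excel('output.xlsx', index=False)

	# Save to a SQL database (engine is the SQLAlchemy engine created earlier)
	df.to_sql('table_name', engine, if_exists='replace', index=False)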

3. matplotlib

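matplotlib is a Python 2D plotting library. It offers a MATLAB-like plotting framework and interface and can produce static, animated, and interactive visualizations. Its main features include:

Simple plotting syntax: commands modeled on MATLAB's, which makes it easy to get started.

Many chart types: line plots, scatter plots, bar charts, pie charts, and more.

Fine-grained control: colors, line styles, axis labels, and other details can all be adjusted.

Interactivity: figures can be zoomed in, zoomed out, and panned.

Integration: it works directly with numpy, pandas, and similar libraries, which makes data analysis and visualization convenient.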
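
As a brief illustration of the numpy integration mentioned above, the sketch below plots a sine curve computed with numpy. The `Agg` backend and the in-memory buffer are assumptions made so the example runs without a display; in an interactive session `plt.show()` would be used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

# numpy arrays can be passed to plotting functions directly.
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()
(line,) = ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# Render to an in-memory PNG instead of opening a window.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```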

The following are some commonly used matplotlib code examples:

1. Importing matplotlib

import matplotlib.pyplot as plt

2. Line plot

	x = [1, 2, 3, 4, 5]
	y = [2, 4, 6, 8, 10]

	plt.plot(x, y)
	plt.title('Line Plot')
	plt.xlabel('X Axis')
	plt.ylabel('Y Axis')
	plt.show()

3. Scatter plot

	x = [1, 2, 3, 4, 5]
	y = [2, 3, 5, 7, 11]

	plt.scatter(x, y)
	plt.title('Scatter Plot')
	plt.xlabel('X Axis')
	plt.ylabel('Y Axis')
	plt.show()

4. Bar chart

	x = ['A', 'B', 'C', 'D', 'E']
	y = [2, 4, 6, 8, 10]

	plt.bar(x, y)
	plt.title('Bar Plot')
	plt.xlabel('Category')
	plt.ylabel('Value')
	plt.show()

5. Pie chart

	labels = ['A', 'B', 'C', 'D', 'E']
	sizes = [15, 30, 45, 10, 5]

	plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90)
	plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
	plt.show()

6. Histogram

	data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

	plt.hist(data, bins=5, edgecolor='black')
	plt.title('Histogram')
	plt.xlabel('Value')
	plt.ylabel('Frequency')
	plt.show()

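7. Multiple subplots

	# Data shared by all three panels, defined here so the snippet runs on its own
	x = [1, 2, 3, 4, 5]
	y = [2, 4, 6, 8, 10]

	plt.figure(figsize=(10, 6))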

	plt.subplot(2, 2, 1) # 2 rows, 2 columns, first plot
	plt.plot(x, y)
	plt.title('First Plot')

	plt.subplot(2, 2, 2) # second plot
	plt.scatter(x, y)
	plt.title('Second Plot')

	plt.subplot(2, 2, 3) # third plot
	plt.bar(x, y)
	plt.title('Third Plot')

	plt.tight_layout() # Adjusts spacing between subplots
	plt.show()
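
The same grid can also be built with `plt.subplots`, which returns the figure and an array of axes objects. This is an alternative style, not the one used in the snippet above; the `Agg` backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# axes is a 2x2 numpy array of Axes objects, indexed as [row, column].
fig, axes = plt.subplots(2, 2, figsize=(10, 6))

axes[0, 0].plot(x, y)
axes[0, 0].set_title("First Plot")

axes[0, 1].scatter(x, y)
axes[0, 1].set_title("Second Plot")

axes[1, 0].bar(x, y)
axes[1, 0].set_title("Third Plot")

axes[1, 1].axis("off")  # hide the unused fourth cell

fig.tight_layout()
plt.close(fig)
```

Working with explicit axes objects avoids relying on which subplot is "current", which tends to matter more as figures grow.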

8. Adding a legend

	x = [1, 2, 3, 4, 5]
	y1 = [2, 4, 6, 8, 10]
	y2 = [3, 5, 7, 9, 11]

	plt.plot(x, y1, label='Line 1')
	plt.plot(x, y2, label='Line 2')
	plt.legend()
	plt.title('Line Plot with Legend')
	plt.xlabel('X Axis')
	plt.ylabel('Y Axis')
	plt.show()

9. Customizing colors, line styles, and markers

	x = [1, 2, 3, 4, 5]
	y = [2, 4, 6, 8, 10]

	plt.plot(x, y, color='red', linestyle='--', marker='o')
	plt.title('Customized Line Plot')
	plt.xlabel('X Axis')
	plt.ylabel('Y Axis')
	plt.show()

pandas


import pandas as pd


# Creating a DataFrame and a Series

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Paris', 'London']}
df = pd.DataFrame(data)

# Create a Series
s = pd.Series([1, 2, 3, 4], name='Numbers')

# Reading and writing data

# Read a CSV file (this replaces the df created above)
df = pd.read_csv('data.csv')

# Write a CSV file
df.to_csv('output.csv', index=False)

# Read an Excel file
df = pd.read_excel('data.xlsx')

# Write an Excel file
df.to_excel('output.xlsx', index=False)

# Selecting data

# Select a column
ages = df['Age']

# Select multiple columns
info = df[['Name', 'Age']]

# Select rows
first_row = df.iloc[0]  # By integer position
bob_row = df[df['Name'] == 'Bob']  # By condition

# Select specific rows and columns
selected_data = df.loc[df['Age'] > 30, ['Name', 'City']]

# Data processing

# Derive a new column from an existing one
df['AgeSquared'] = df['Age'] ** 2

# Replace values
df.replace({'City': {'New York': 'NYC'}}, inplace=True)

# Drop a column
df.drop('AgeSquared', axis=1, inplace=True)

# Drop rows by condition (keep only rows with Age > 20)
df = df[df['Age'] > 20]

# Sort
df_sorted = df.sort_values(by='Age')

# Group and aggregate (mean age per city)
grouped = df.groupby('City')['Age'].mean()

# Merging and concatenating

# Merge two DataFrames on the 'key' column
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'key': ['K0', 'K0', 'K1', 'K1'],
                    'C': ['C0', 'C1', 'C2', 'C3']})

df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'key': ['K0', 'K1', 'K0', 'K1'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

merged = pd.merge(df1, df2, on='key')  # Overlapping columns A and B get _x / _y suffixes

# Concatenate two DataFrames row-wise and renumber the index
concatenated = pd.concat([df1, df2], ignore_index=True)
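
`merge` and `concat` behave quite differently when keys repeat. The sketch below uses two smaller, made-up frames (`left` and `right`) so the row counts are easy to check by hand.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K0", "K1"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1", "K1"], "A": ["A4", "A5", "A6"]})

# merge joins on the values of 'key'. Repeated keys are paired
# many-to-many, and the clashing column 'A' gets _x / _y suffixes.
merged = pd.merge(left, right, on="key")
print(merged.columns.tolist())  # ['key', 'A_x', 'A_y']
print(len(merged))              # 4  (2 K0 pairs + 2 K1 pairs)

# concat stacks the frames vertically; ignore_index renumbers rows 0..n-1.
stacked = pd.concat([left, right], ignore_index=True)
print(stacked.index.tolist())   # [0, 1, 2, 3, 4, 5]
```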

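# Statistics

# Descriptive statistics
stats = df.describe()

# Count occurrences of each unique value
unique_counts = df['City'].value_counts()

# Count missing values per column
null_counts = df.isnull().sum()

# Visualization
# Plot a histogram (pandas plotting uses matplotlib by default)
df['Age'].plot(kind='hist', bins=20)

# Use seaborn for more complex visualizations (seaborn is a separate package)
import seaborn as sns

sns.barplot(x='City', y='Age', data=df)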
