NumPy, Pandas, and Matplotlib: A Quick Start

焦糖酒

Last modified 2024-05-31 21:18:20

Category: Machine Learning. Tags: data analysis, data mining

First published 2024-05-02 23:00:22

Article link: https://blog.csdn.net/2301_80272161/article/details/138401964

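This post briefly introduces a few libraries that come up constantly in data analysis. I strongly recommend working in Jupyter Notebook; when I get the chance I will write a quick tutorial on it. The output screenshots in the original post were all captured directly from Jupyter.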

Overview of the Core Libraries

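NumPy	Pandas	Matplotlib
Mainly used for multidimensional arrays and matrix operations	A library for data processing and analysis; provides the DataFrame data structure and data operations such as cleaning, transformation, and filtering	A library for data visualization; provides plotting functions and tools for creating many chart types, such as line charts, bar charts, and scatter plots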

NumPy

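Introduction: NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical and logical operations, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, and more.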

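The array is the central data structure of the NumPy library. An array is a grid of values, and it carries information about the raw data, how to locate an element, and how to interpret an element. The grid can be indexed in various ways, and its elements all share the same type, called the array dtype.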

In plain terms: it makes array and matrix operations convenient.
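To make the description of arrays concrete, here is a minimal sketch of the attributes and indexing styles mentioned above. The variable names (`grid`, `evens`, and so on) are illustrative only and do not come from the original post.

```python
import numpy as np

# A 2-D array: 3 rows, 4 columns, holding the values 0..11
grid = np.arange(12).reshape(3, 4)

print(grid.dtype)   # the single element type shared by every entry
print(grid.shape)   # (3, 4)
print(grid.ndim)    # 2

# A few ways to index the same grid
element = grid[1, 2]          # row 1, column 2
row = grid[0]                 # the whole first row
column = grid[:, 1]           # the whole second column
evens = grid[grid % 2 == 0]   # boolean mask: all even entries, as a 1-D array

# Arithmetic applies element-wise, with no explicit loop
doubled = grid * 2
print(element, row, column, evens)
```

Because every element has the same dtype, NumPy can store the grid in one contiguous block and run operations like `grid * 2` in compiled code, which is where the speed comes from.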

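SciPy

Introduction: SciPy is a Python scientific computing library built on top of NumPy. It provides more advanced mathematical, scientific, and engineering functionality.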
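The post does not show SciPy in action, so the following is only a small illustrative sketch of the kind of "more advanced" routine it adds on top of NumPy. The choice of numerical integration, and of this particular integrand, is my assumption rather than anything from the original.

```python
import numpy as np
from scipy import integrate

# Numerically integrate the unnormalized Gaussian exp(-x^2 / 2) over the real line.
# The exact value is sqrt(2 * pi), which gives us something to compare against.
value, abs_error = integrate.quad(lambda x: np.exp(-x**2 / 2), -np.inf, np.inf)

print(value)               # numerical result
print(np.sqrt(2 * np.pi))  # analytical result
print(abs_error)           # quad's own estimate of the absolute error
```

The code listings that follow in this section go back to plain NumPy.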

import numpy as np

# Create some initial arrays (matrices)
print("1d array")
a = np.arange(6)                    # 1d array
print(a)

print("2d array")
b = np.arange(12).reshape(4, 3)     # 2d array
print(b)

print("3d array")
c = np.arange(30).reshape(2, 3, 5)  # 3d array
print(c)

arr = np.random.normal(size=1000)
print(arr)

# Array reductions: maximum, minimum, sum, mean, product, standard deviation, and more
import matplotlib.pyplot as plt

plt.hist(arr)  # histogram of the samples; pass bins=15 to change the number of bins
print("max:", arr.max())
print("min:", arr.min())
print("mean:", arr.mean())
print("sum:", arr.sum())
print("std:", arr.std())

# Matrix operations
matrix=np.array([[1, 2], [5, 3], [4, 6]])
print(matrix)
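The listing above only builds the matrix. As a hedged sketch of what "matrix operations" can mean in NumPy, the example below contrasts element-wise arithmetic with a true matrix product on the same data. The names `scaled`, `squared`, and `gram` are mine.

```python
import numpy as np

matrix = np.array([[1, 2], [5, 3], [4, 6]])   # shape (3, 2)

scaled = matrix * 2         # element-wise: every entry is doubled
squared = matrix * matrix   # element-wise product, NOT a matrix product
gram = matrix.T @ matrix    # matrix product: (2, 3) @ (3, 2) -> (2, 2)

print(scaled)
print(squared)
print(gram)
# [[42 41]
#  [41 49]]
```

The `*` operator always works entry by entry; `@` (or `np.matmul`) is what performs linear-algebra multiplication, and it requires the inner dimensions to match.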

# Maximum along an axis: axis=0 gives the max of each column, axis=1 the max of each row
m0=matrix.max(axis=0)
m1=matrix.max(axis=1)
print(m0)
print(m1)
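The `axis` argument is easy to get backwards, so here is a small sketch using the same matrix. One way to remember it: `axis` names the dimension that gets collapsed. The extra reductions (`sum`, and `max` with no axis) are added for illustration.

```python
import numpy as np

matrix = np.array([[1, 2], [5, 3], [4, 6]])

col_max = matrix.max(axis=0)   # collapse the rows: one value per column -> [5 6]
row_max = matrix.max(axis=1)   # collapse the columns: one value per row -> [2 5 6]
col_sum = matrix.sum(axis=0)   # the same rule applies to other reductions -> [10 11]
overall = matrix.max()         # no axis: reduce over every element -> 6

print(col_max, row_max, col_sum, overall)
```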

# reshape changes the shape of a matrix: arr.reshape()
print("reshape")
print(matrix)
rmatrix=matrix.reshape(2,3)
print(rmatrix)

# Transpose: arr.transpose() or arr.T
print("transpose")
print(matrix.transpose())
print(matrix.T)
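Both `reshape(2, 3)` and `.T` turn this 3x2 matrix into a 2x3 one, but they are not the same operation. A minimal sketch of the difference, with illustrative variable names:

```python
import numpy as np

matrix = np.array([[1, 2], [5, 3], [4, 6]])   # shape (3, 2)

# reshape keeps the elements in their row-major reading order and re-cuts the rows
reshaped = matrix.reshape(2, 3)
# [[1 2 5]
#  [3 4 6]]

# transpose swaps the axes, so rows become columns
transposed = matrix.T
# [[1 5 4]
#  [2 3 6]]

# -1 asks NumPy to infer that dimension from the total size
flat = matrix.reshape(-1)   # [1 2 5 3 4 6]

print(reshaped)
print(transposed)
print(flat)
```

If the goal is to flip rows and columns, use the transpose; `reshape` is for regrouping the same flat sequence of values.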

Matplotlib

In plain terms: it draws plots.

import matplotlib.pyplot as plt
import numpy as np

x1=np.random.rand(10)
x2=np.random.rand(10)

fig = plt.figure()
ax = fig.add_subplot(221) 
ax.plot(x1)
ax = fig.add_subplot(222)
ax.plot(x2)
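The three-digit argument in `add_subplot(221)` is shorthand worth unpacking. The sketch below spells it out as rows, columns, and position. The `Agg` backend line is an assumption so the snippet runs without a display; leave it out inside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()

# add_subplot(221) is the same as add_subplot(2, 2, 1):
# a 2x2 grid, with positions counted from 1, left to right and top to bottom.
top_left = fig.add_subplot(2, 2, 1)
top_right = fig.add_subplot(2, 2, 2)

# Grids can be mixed: position 2 of a 2x1 grid is the whole bottom row.
bottom_wide = fig.add_subplot(2, 1, 2)

top_left.plot(np.random.rand(10))
top_right.plot(np.random.rand(10))
bottom_wide.plot(np.random.rand(10))

plt.show()
```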

You can set the basic properties one at a time:

fig = plt.figure()

# Creating subplot/axes
ax = fig.add_subplot(111)

# Setting plot title
ax.set_title('My plot title')

# Setting X-axis and Y-axis limits
ax.set_xlim([0, 10])
ax.set_ylim([-5, 5])

# Setting X-axis and Y-axis labels
ax.set_ylabel('My y-axis label')
ax.set_xlabel('My x-axis label')

# Showing the plot
plt.show()

Or set them all in a single call:

fig = plt.figure()

# Creating subplot/axes
ax = fig.add_subplot(111)

# Setting title and axes properties
ax.set(title='An Axes Title', xlim=[0, 10], ylim=[-5, 5], ylabel='My y-axis label', xlabel='My x-axis label')

plt.show()

The examples below mostly use plot, but Matplotlib actually offers a rich variety of chart types.
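As a quick taste of that variety, here is a hedged sketch with three other common chart types side by side. The data is random and purely illustrative, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the figure is reproducible

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))

# Bar chart: one bar per category
axes[0].bar(np.arange(10), rng.random(10))
axes[0].set(title="bar")

# Scatter plot: unconnected (x, y) points
axes[1].scatter(rng.random(50), rng.random(50))
axes[1].set(title="scatter")

# Histogram: distribution of 1000 normally distributed samples
axes[2].hist(rng.normal(size=1000), bins=30)
axes[2].set(title="hist")

plt.show()
```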

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(10)
print(x)

# Plot the values in x against their index
plt.plot(x)

# Label the x-axis and show the plot
plt.xlabel('X-axis Label')
plt.show()

# Create a figure with four subplots and shared axes
import matplotlib.pyplot as plt
import numpy as np
x=np.random.rand(10)

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)

axes[0, 0].set(title='Upper Left')
axes[0, 0].plot(x)

# Set the line color
axes[0, 1].set(title='Upper Right')
axes[0, 1].plot(x,'g')

# Set the marker and line style
axes[1, 0].set(title='Lower Left')
axes[1, 0].plot(x,'g*--')

# Use separate colors for the line and the markers
axes[1, 1].set(title='Lower Right')
axes[1, 1].plot(x,'g')
axes[1, 1].plot(x,'r*')

plt.show()
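Format strings such as `'g*--'` pack color, marker, and line style into one short code. As a sketch, the same line can be written with explicit keyword arguments, which is longer but easier to read back later. The `Agg` backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(10)
fig, ax = plt.subplots()

# Format string: 'g' = green, '*' = star markers, '--' = dashed line
(short_form,) = ax.plot(x, "g*--")

# The same styling spelled out with keyword arguments (shifted up by 1 to stay visible)
(long_form,) = ax.plot(x + 1, color="g", marker="*", linestyle="--")

plt.show()
```

Keyword arguments also accept values the short codes cannot express, for example any named or hex color.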

Pandas

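Pandas provides two basic data structures: Series and DataFrame.

A Series is a one-dimensional labeled array that can hold data of any type.

A DataFrame is a two-dimensional structure that stores data in a table with rows and columns.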
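A short sketch of how the two structures relate: each column of a DataFrame is itself a Series. The sample data and names below (`frame`, `score`) are hypothetical and chosen only for illustration.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])        # all integers -> an integer dtype
mixed = pd.Series([1, "two", 3.0])    # mixed types -> the generic object dtype

# Hypothetical two-column table
frame = pd.DataFrame({"name": ["a", "b"], "score": [0.5, 0.9]})
column = frame["score"]               # selecting one column yields a Series

print(numbers.dtype, mixed.dtype)
print(type(column).__name__)   # Series
print(frame.shape)             # (2, 2)
```

A Series can hold anything, but mixing types falls back to `object`, which gives up most of the speed advantage of a single numeric dtype.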

import pandas as pd

s1 = pd.Series([23,324,2,0,"ABC","DEF",-123])
print(s1)

s2 = pd.Series([23,324,2,0,"ABC","DEF",-123],index=["a","b","c","d","e","f","g"])

print(s2)
print("Lookup using the index we set:")
print(s2["b"])
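Once a Series has custom labels, there are two ways to address an element: by label or by position. The sketch below uses a shortened version of the data above; `.loc` and `.iloc` make the intent explicit.

```python
import pandas as pd

s = pd.Series([23, 324, 2], index=["a", "b", "c"])

by_label = s.loc["b"]        # look up by index label -> 324
by_position = s.iloc[1]      # look up by integer position -> 324
subset = s.loc[["a", "c"]]   # a list of labels returns a smaller Series
has_label = "b" in s         # membership tests check the index, not the values

print(by_label, by_position, has_label)
print(subset)
```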

import numpy as np

s3 = pd.Series(np.random.rand(100000))
print(s3)

# Use pandas' built-in plotting function
ax = s3.plot.hist(bins=100)
ax.set_xlabel("Number")
ax.set_ylabel("Entries per bin")
ax.set_title("Uniform distribution")

import matplotlib.pyplot as plt

# Matplotlib's own hist works as well
plt.hist(s3,bins=100)
plt.title("Uniform distribution")

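DataFrames

Some key features:
Data representation: stores data in a tabular format with rows and columns.
Heterogeneous data types: different columns can hold different data types (for example integers, floats, strings, booleans).
Labels: every row and column has a label (the index and the column names).
Mutable: data can be manipulated and modified.
Powerful operations: provides a wide range of functions and methods for data analysis, manipulation, and exploration.
Extensible: can be customized and extended with additional functionality through libraries and user-defined functions.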

df = pd.DataFrame(
{
    "Name": ["drunksweet", "jiaotangjiu","soubai","drunksweet", "jiaotangjiu","soubai",],
    "Age": [18, 19, 18, 18, 19, 18],
    "Sex": ["male", "male", "male","male", "male", "male"],
})
print(df)

df["Age"]
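To tie the listed features to code, here is a minimal sketch that rebuilds the same table and exercises a few of them: per-column dtypes, mutation, filtering, and grouping. The `AgeNextYear` column and the result variable names are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["drunksweet", "jiaotangjiu", "soubai", "drunksweet", "jiaotangjiu", "soubai"],
        "Age": [18, 19, 18, 18, 19, 18],
        "Sex": ["male", "male", "male", "male", "male", "male"],
    }
)

# Heterogeneous types: each column carries its own dtype
print(df.dtypes)

# Mutable: add a derived column (hypothetical, for illustration)
df["AgeNextYear"] = df["Age"] + 1

# Filtering: keep only the rows that satisfy a condition
nineteen_plus = df[df["Age"] >= 19]

# Grouping and aggregation: mean age per name, and row counts per name
mean_age = df.groupby("Name")["Age"].mean()
counts = df["Name"].value_counts()

print(nineteen_plus)
print(mean_age)
print(counts)
```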