NumPy Study Notes

我就懂点皮毛

Category: DataAnalysis. Tags: data analysis, python, numpy

Article link: https://blog.csdn.net/weixin_45515116/article/details/108892332

I. list vs. ndarray

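A list stores its elements scattered across memory and holds pointers to them;
An ndarray stores its elements in one contiguous block of memory with a uniform element type;
Prefer x *= 2 over y = x * 2 when the original values are no longer needed: the in-place form avoids an implicit copy.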
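The difference between the two forms can be checked directly. The following is a minimal sketch; the names alias, in_place and copied are illustrative only.

```python
import numpy as np

x = np.arange(5)      # [0 1 2 3 4]
alias = x             # a second name bound to the same array object

y = x * 2             # allocates a new array; x is left unchanged
x *= 2                # doubles x in place, reusing its existing buffer

in_place = alias is x             # x still refers to the original object
copied = np.shares_memory(x, y)   # y lives in a separate buffer

print(in_place, copied)
print(alias.tolist())             # the in-place update is visible through alias
```

Because x *= 2 modifies the existing buffer, every other reference to the array sees the new values, which is worth keeping in mind when the array is shared.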

II. Two important objects in NumPy

ndarray (N-dimensional array object): the multidimensional array
ufunc (universal function object): a function that operates on multidimensional arrays

III. ndarray

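ndarray simply means N-dimensional array. In NumPy, the number of dimensions of an array is called its rank: a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on. Each linear array (each dimension) is called an axis, so the rank is just the number of axes.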
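As a small illustration, the number of axes is exposed as the ndim attribute, and it always equals the length of shape. The array names v and m below are made up for the example.

```python
import numpy as np

v = np.array([1, 2, 3])                   # one axis
m = np.array([[1, 2, 3], [4, 5, 6]])      # two axes: 2 rows, 3 columns

print(v.ndim, v.shape)   # 1 (3,)
print(m.ndim, m.shape)   # 2 (2, 3)

# The number of axes is the length of the shape tuple.
same = (len(v.shape) == v.ndim) and (len(m.shape) == m.ndim)
```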

(1) Creating arrays

import numpy as np

a = np.array([1, 2, 3])

array([1, 2, 3])

a.shape

(3,)

a.dtype

dtype('int32')   # the default integer width is platform-dependent; many builds report int64

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

b.shape

(3, 3)

b.dtype

dtype('int32')

(2) Structured arrays

import numpy as np

person_type = np.dtype({
    'names': ['name', 'age', 'chinese', 'math', 'english'],
    'formats': ['S32', 'i', 'i', 'i', 'f']})

person_type

dtype([('name', 'S32'), ('age', '<i4'), ('chinese', '<i4'), ('math', '<i4'), ('english', '<f4')])

peoples = np.array([("ZhangFei",32,75,100, 90),
                    ("GuanYu",24,85,96,88.5),
                    ("ZhaoYun",28,85,92,96.5),
                    ("HuangZhong",29,65,85,100)],
                   dtype=person_type)

peoples

array([(b'ZhangFei', 32, 75, 100,  90. ), (b'GuanYu', 24, 85,  96,  88.5),
       (b'ZhaoYun', 28, 85,  92,  96.5),
       (b'HuangZhong', 29, 65,  85, 100. )],
      dtype=[('name', 'S32'), ('age', '<i4'), ('chinese', '<i4'), ('math', '<i4'), ('english', '<f4')])

peoples.shape

(4,)

peoples.dtype

dtype([('name', 'S32'), ('age', '<i4'), ('chinese', '<i4'), ('math', '<i4'), ('english', '<f4')])

names = peoples[:]['name']

names

array([b'ZhangFei', b'GuanYu', b'ZhaoYun', b'HuangZhong'], dtype='|S32')

names.dtype

dtype('S32')

peoples[0]['name']

b'ZhangFei'

ages = peoples[:]['age']

ages

array([32, 24, 28, 29], dtype=int32)

chineses = peoples[:]['chinese']

chineses

array([75, 85, 85, 65], dtype=int32)

maths = peoples[:]['math']

maths

array([100,  96,  92,  85], dtype=int32)

englishs = peoples[:]['english']

englishs

array([ 90. ,  88.5,  96.5, 100. ], dtype=float32)

np.mean(ages)

28.25

np.mean(chineses)

77.5

np.mean(maths)

93.25

np.mean(englishs)

93.75
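The per-field means above can also be computed in one pass. This is only a sketch: it indexes the array by field name directly (peoples['age'] gives the same result as peoples[:]['age']) and loops over dtype.names; the dictionary name means is illustrative.

```python
import numpy as np

person_type = np.dtype({
    'names': ['name', 'age', 'chinese', 'math', 'english'],
    'formats': ['S32', 'i', 'i', 'i', 'f']})

peoples = np.array([("ZhangFei", 32, 75, 100, 90),
                    ("GuanYu", 24, 85, 96, 88.5),
                    ("ZhaoYun", 28, 85, 92, 96.5),
                    ("HuangZhong", 29, 65, 85, 100)],
                   dtype=person_type)

# Mean of every numeric field, keyed by field name; 'name' is skipped.
means = {field: float(np.mean(peoples[field]))
         for field in peoples.dtype.names if field != 'name'}

print(means)
```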

IV. ufunc operations

(1) Creating evenly spaced arrays

x1 = np.arange(1,11,2)

x1

array([1, 3, 5, 7, 9])

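arange() works like the built-in range(): it builds a one-dimensional arithmetic sequence from a start value, a stop value, and a step. The stop value is not included.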

x2 = np.linspace(1,9,5)

x2

array([1., 3., 5., 7., 9.])

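linspace is short for "linear space", meaning a linearly spaced vector. linspace() builds a one-dimensional arithmetic sequence from a start value, a stop value, and the number of elements. The stop value is included by default.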
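The two functions can be put side by side to see how the stop value is treated. A minimal sketch follows; the variable names a, b and c are illustrative, and endpoint=False is the linspace() option that leaves the stop value out.

```python
import numpy as np

a = np.arange(1, 11, 2)                     # start, stop, step: stop 11 is excluded
b = np.linspace(1, 9, 5)                    # start, stop, count: stop 9 is included
c = np.linspace(1, 11, 5, endpoint=False)   # stop 11 excluded, so the values match a

print(a.tolist())   # [1, 3, 5, 7, 9]
print(b.tolist())   # [1.0, 3.0, 5.0, 7.0, 9.0]
print(c.tolist())   # [1.0, 3.0, 5.0, 7.0, 9.0]
```

Note that arange() with integer arguments returns integers, while linspace() returns floats.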

(2) Arithmetic operations

np.add(x1, x2)

array([ 2.,  6., 10., 14., 18.])

np.subtract(x1, x2)

array([0., 0., 0., 0., 0.])

np.multiply(x1, x2)

array([ 1.,  9., 25., 49., 81.])

np.divide(x1, x2)

array([1., 1., 1., 1., 1.])

np.power(x1, x2)   # exponentiation

array([1.00000000e+00, 2.70000000e+01, 3.12500000e+03, 8.23543000e+05,
       3.87420489e+08])

np.power(x1, 2)

array([ 1,  9, 25, 49, 81], dtype=int32)

np.remainder(x1, x2)   # remainder

array([0., 0., 0., 0., 0.])

np.mod(x1, x2)   # remainder (same operation as np.remainder)

array([0., 0., 0., 0., 0.])
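The arithmetic ufuncs above are what the ordinary Python operators call on arrays, so either spelling can be used. A short check, with illustrative flag names:

```python
import numpy as np

x1 = np.arange(1, 11, 2)
x2 = np.linspace(1, 9, 5)

# Each operator gives the same element-wise result as the named ufunc.
same_add = np.array_equal(x1 + x2, np.add(x1, x2))
same_sub = np.array_equal(x1 - x2, np.subtract(x1, x2))
same_mul = np.array_equal(x1 * x2, np.multiply(x1, x2))
same_div = np.array_equal(x1 / x2, np.divide(x1, x2))
same_pow = np.array_equal(x1 ** x2, np.power(x1, x2))
same_mod = np.array_equal(x1 % x2, np.remainder(x1, x2))

print(same_add, same_sub, same_mul, same_div, same_pow, same_mod)
```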

(3) Statistical functions

1. Maximum and minimum of an array / matrix: amax(), amin()

import numpy as np

a = np.array([[1,2,3], [4,5,6], [7,8,9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

np.amin(a)

1

np.min(a)

1

np.min(a, 0)

array([1, 2, 3])

np.min(a, 1)

array([1, 4, 7])

np.max(a)

9

np.max(a, 0)

array([7, 8, 9])

np.max(a, 1)

array([3, 6, 9])

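axis=None: the statistic is computed over the whole array
axis=0: the statistic is computed down each column (one result per column)
axis=1: the statistic is computed along each row (one result per row)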
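One way to remember the axis rule is that the given axis is the one that disappears from the result's shape. A sketch on a non-square array, where the two cases cannot be confused (names are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])            # shape (2, 3): 2 rows, 3 columns

overall = np.min(a)                  # axis=None: a single value
col_min = np.min(a, axis=0)          # axis 0 collapses: one value per column
row_min = np.min(a, axis=1)          # axis 1 collapses: one value per row

print(col_min.shape, col_min.tolist())   # (3,) [1, 2, 3]
print(row_min.shape, row_min.tolist())   # (2,) [1, 4]
```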

2. Range (maximum minus minimum): ptp()

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

np.ptp(a)

8

np.ptp(a, 0)

array([6, 6, 6])

np.ptp(a, 1)

array([2, 2, 2])

3. Percentiles of an array: percentile()

np.percentile(a, 50)

5.0

np.percentile(a, 50, axis=0)

array([4., 5., 6.])

np.percentile(a, 50, axis=1)

array([2., 5., 8.])

percentile() returns the p-th percentile, where p ranges from 0 to 100: p=0 gives the minimum, p=50 gives the median (not the mean), and p=100 gives the maximum.
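On the 3x3 example the median and the mean happen to coincide, so a skewed sample makes the distinction clearer. A minimal sketch with made-up data:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 100])   # one large outlier

p0 = np.percentile(data, 0)          # minimum
p50 = np.percentile(data, 50)        # median
p100 = np.percentile(data, 100)      # maximum
mean = np.mean(data)                 # pulled upward by the outlier

print(float(p0), float(p50), float(p100), float(mean))   # 1.0 3.0 100.0 22.0
```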

4. Median and mean of an array: median(), mean()

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

np.median(a)

5.0

np.median(a, axis=0)

array([4., 5., 6.])

np.median(a, axis=1)

array([2., 5., 8.])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

np.mean(a)

5.0

np.mean(a, axis=0)

array([4., 5., 6.])

np.mean(a, axis=1)

array([2., 5., 8.])

5. Weighted average of an array: average()

a = np.array([1,2,3,4])

wts = np.array([1,2,3,4])

wts

array([1, 2, 3, 4])

np.average(a)

2.5

np.average(a,weights=wts)

3.0

np.average(a, weights=wts) = (1*1 + 2*2 + 3*3 + 4*4) / (1 + 2 + 3 + 4) = 30 / 10 = 3.0
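The same arithmetic can be written out with array operations to confirm the result. This is just a check of the formula; the name manual is illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
wts = np.array([1, 2, 3, 4])

# Weighted average: sum of value * weight, divided by the sum of the weights.
manual = np.sum(a * wts) / np.sum(wts)
builtin = np.average(a, weights=wts)

print(float(manual), float(builtin))   # 3.0 3.0
```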

6. Standard deviation and variance of an array: std(), var()

a = np.array([[1,2,3], [4,5,6], [7,8,9]])

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

np.std(a)

2.581988897471611

np.std(a, axis=0)

array([2.44948974, 2.44948974, 2.44948974])

np.std(a, axis=1)

array([0.81649658, 0.81649658, 0.81649658])
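Only std() is run above, so here is a short sketch of how var() relates to it. By default both divide by N (the population form); passing ddof=1 divides by N - 1 instead. The variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

var_all = np.var(a)               # squared deviations sum to 60; 60 / 9
std_all = np.std(a)               # square root of the variance
sample_std = np.std(a, ddof=1)    # divides by N - 1 = 8 instead of N = 9

print(float(var_all), float(std_all), float(sample_std))
```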

7. Sorting in NumPy

a = np.array([[4,3,2],[2,4,1]])

array([[4, 3, 2],
       [2, 4, 1]])

np.sort(a)

array([[2, 3, 4],
       [1, 2, 4]])

np.sort(a, axis=None)

array([1, 2, 2, 3, 4, 4])

np.sort(a, axis=0)

array([[2, 3, 1],
       [4, 4, 2]])

np.sort(a, axis=1)

array([[2, 3, 4],
       [1, 2, 4]])

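This uses the sort function, sort(a, axis=-1, kind='quicksort', order=None). Quicksort is the default; kind can be set to quicksort, mergesort, or heapsort for quick sort, merge sort, and heap sort respectively. axis defaults to -1, meaning the sort runs along the last axis; a different axis can be chosen, and axis=None flattens the array and sorts it as a single vector. The order parameter lets a structured array be sorted by a given field.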
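The order parameter is not exercised above, so here is a small sketch of it. The dtype, field names and records are made up for the example.

```python
import numpy as np

# Hypothetical two-field record type, for illustration only.
dt = np.dtype([('name', 'U16'), ('score', 'i4')])
records = np.array([('a', 3), ('b', 1), ('c', 2)], dtype=dt)

by_score = np.sort(records, order='score')   # sort whole records by one field
names = by_score['name'].tolist()
scores = by_score['score'].tolist()

print(names, scores)   # ['b', 'c', 'a'] [1, 2, 3]
```

np.sort() returns a sorted copy and leaves records unchanged.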

V. Exercise: score statistics for the whole class

person_type = np.dtype({
    'names':['name', 'chinese', 'english', 'math'],
    'formats':['U32','f', 'f', 'f']
})

person_type

dtype([('name', '<U32'), ('chinese', '<f4'), ('english', '<f4'), ('math', '<f4')])

people = np.array([
    ('张飞', 66, 65, 30),
    ('关羽', 95, 85, 98),
    ('赵云', 93, 92, 96),
    ('黄忠', 90, 88, 77),
    ('典韦', 80, 90, 90)
], dtype=person_type)

people

array([('张飞', 66., 65., 30.), ('关羽', 95., 85., 98.),
       ('赵云', 93., 92., 96.), ('黄忠', 90., 88., 77.),
       ('典韦', 80., 90., 90.)],
      dtype=[('name', '<U32'), ('chinese', '<f4'), ('english', '<f4'), ('math', '<f4')])

np.mean(people[:]['chinese'])   # mean Chinese score

84.8

np.min(people[:]['chinese'])

66.0

np.max(people[:]['chinese'])

95.0

np.var(people[:]['chinese'])

114.96001

np.std(people[:]['chinese'])

10.72194

np.mean(people[:]['english'])   # mean English score

84.0

np.min(people[:]['english'])

65.0

np.max(people[:]['english'])

92.0

np.var(people[:]['english'])

95.6

np.std(people[:]['english'])

9.777525

np.mean(people[:]['math'])   # mean math score

78.2

np.min(people[:]['math'])

30.0

np.max(people[:]['math'])

98.0

np.var(people[:]['math'])

634.55994

np.std(people[:]['math'])

25.190474

totles = people[:]['chinese'] + people[:]['english'] + people[:]['math']

totles

array([161., 278., 281., 255., 260.], dtype=float32)

np.sort(totles)

array([161., 255., 260., 278., 281.], dtype=float32)

np.flip(np.sort(totles))

array([281., 278., 260., 255., 161.], dtype=float32)

order = np.argsort(totles)

order

array([0, 3, 4, 1, 2], dtype=int64)

order = np.flip(order)

order

array([2, 1, 4, 3, 0], dtype=int64)

people[:]['name']

array(['张飞', '关羽', '赵云', '黄忠', '典韦'], dtype='<U32')

people[:]['name'][order]

array(['赵云', '关羽', '典韦', '黄忠', '张飞'], dtype='<U32')

name_score = list(zip(people[:]['name'], totles))   # materialize the pairs: a bare zip() is exhausted after one pass

for name, score in name_score:
    print(name, ' ', score)

张飞   161.0
关羽   278.0
赵云   281.0
黄忠   255.0
典韦   260.0

name_score = [item for item in name_score]

name_score

[('张飞', 161.0), ('关羽', 278.0), ('赵云', 281.0), ('黄忠', 255.0), ('典韦', 260.0)]

orders = [name_score[i] for i in order]

orders

[('赵云', 281.0), ('关羽', 278.0), ('典韦', 260.0), ('黄忠', 255.0), ('张飞', 161.0)]
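The ranking can also be built more directly by sorting on the negated totals and indexing both arrays with the resulting order. This is an alternative sketch, not the method used above; the names totals and ranking are introduced here for illustration.

```python
import numpy as np

person_type = np.dtype({
    'names': ['name', 'chinese', 'english', 'math'],
    'formats': ['U32', 'f', 'f', 'f']
})

people = np.array([
    ('张飞', 66, 65, 30),
    ('关羽', 95, 85, 98),
    ('赵云', 93, 92, 96),
    ('黄忠', 90, 88, 77),
    ('典韦', 80, 90, 90)
], dtype=person_type)

totals = people['chinese'] + people['english'] + people['math']

# argsort sorts ascending, so negate the totals to rank from highest to lowest.
order = np.argsort(-totals)

# Index names and totals with the same order; convert to plain Python types.
ranking = [(str(n), float(t))
           for n, t in zip(people['name'][order], totals[order])]

for name, score in ranking:
    print(name, score)
```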

Specifying types this way is less convenient than using pandas.

VI. Takeaways

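Got clear on the two important objects in NumPy: ndarray and ufunc
Understood what axis means: the axes correspond one-to-one with the array's dimensions
Learned that NumPy can also define structured types and fetch a column by field name, although it is less convenient than pandas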
