Part 70: Data Processing and Analysis with NumPy

Laughing@me

Category: Data Analysis. Tags: numpy, data analysis

Article link: https://blog.csdn.net/qq_45503700/article/details/105545315

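NumPy API reference: https://www.numpy.org.cn/reference/arrays/ndarray.html

I. Introduction

NumPy is a third-party Python library, short for "Numerical Python", used mainly for mathematical and scientific computing. It consists of a multidimensional array object and a collection of routines for operating on arrays.

Installation: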

pip install numpy

Importing:

Avoid from numpy import *, because NumPy functions such as min and max would shadow Python's built-in functions of the same name.

import numpy as np
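
As a small illustration of why the star import is risky, the sketch below compares Python's built-in sum with np.sum. The same call means two different things, because the second argument is a start value for the built-in and an axis for NumPy. The variable names are only for illustration.

```python
import builtins

import numpy as np

# Built-in sum: the second argument is the start value, so -1 is added to the total.
builtin_result = builtins.sum(range(5), -1)   # (0+1+2+3+4) + (-1)

# np.sum: the second positional argument is the axis, so -1 means "last axis".
numpy_result = np.sum(range(5), -1)           # 0+1+2+3+4

print(builtin_result, numpy_result)
```

With import numpy as np, both functions stay available under separate names, so neither silently replaces the other.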

A simple example: creating a 2-D array

import numpy as np

lis = [1, 2, 3]
lis2 = [4, 5, 6]
a = np.array([lis, lis2])
print(a)

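When a list is converted directly to an array, its shape is (len,). To concatenate such arrays as columns, change the dimensions first, for example:
np.array(lis).reshape(-1,1)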

>>> np.array([i for i in range(10)])
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.array([i for i in range(10)]).shape
(10,)
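
The reshape step described above can be sketched as follows. This is a minimal example, assuming the goal is to place two lists side by side as columns; the names col, col2 and joined are made up for illustration.

```python
import numpy as np

lis = [1, 2, 3]
lis2 = [4, 5, 6]

# A plain list becomes a 1-D array of shape (3,).
flat = np.array(lis)

# reshape(-1, 1) turns it into a column of shape (3, 1); -1 lets NumPy infer the row count.
col = np.array(lis).reshape(-1, 1)
col2 = np.array(lis2).reshape(-1, 1)

# Two (3, 1) columns joined along axis=1 give a (3, 2) array.
joined = np.concatenate((col, col2), axis=1)
print(flat.shape, col.shape, joined.shape)
print(joined.tolist())
```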

To convert an array back to a list:

lis = array_obj.tolist()

II. Usage

1 Creating arrays

>>> a=np.array([[1,2,3],[4,5,6]])
>>> a
array([[1, 2, 3],
       [4, 5, 6]])

>>> b=np.arange(1.1,3.3,0.5)   # floats are allowed; the third argument is the step; start is included, stop is excluded
>>> b
array([1.1, 1.6, 2.1, 2.6, 3.1])

>>> c=np.linspace(1.1,3.3,10) # the third argument is the number of values, evenly spaced; both endpoints are included
>>> c
array([1.1       , 1.34444444, 1.58888889, 1.83333333, 2.07777778,
       2.32222222, 2.56666667, 2.81111111, 3.05555556, 3.3       ])

>>> d=np.zeros([3,4]) # 3 rows, 4 columns
>>> d
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

>>> e=np.ones([3,4]) # all ones
>>> e
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

>>> f=np.empty([3,4]) # uninitialized memory: the contents are arbitrary (here, leftover ones)
>>> f
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

>>> g=np.eye(3)
>>> g
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

2 Common array attributes

>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a.T
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> a.dtype
dtype('int32')
>>> a.size
6
>>> a.ndim
2
>>> a.shape
(2, 3)

3 Array data types

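The original table of data types is not preserved here. As a rough stand-in, the sketch below shows a few common dtypes and how to request them; the selection of types is my own choice, not the author's list.

```python
import numpy as np

# dtype can be given explicitly when the array is created.
ints = np.array([1, 2, 3], dtype=np.int64)
floats = np.array([1, 2, 3], dtype=np.float32)
flags = np.array([0, 1, 2], dtype=np.bool_)
text = np.array(["a", "bc"])   # inferred as a fixed-width Unicode type

for arr in (ints, floats, flags, text):
    # itemsize is the number of bytes used by one element.
    print(arr.dtype, arr.itemsize)
```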

4 Arithmetic on arrays of the same shape

>>> a
array([[1, 2, 3],
       [4, 5, 6]])
       
>>> a*2
array([[ 2,  4,  6],
       [ 8, 10, 12]])
>>> b=np.array([[1,1,1],[2,2,2]])

>>> b
array([[1, 1, 1],
       [2, 2, 2]])
>>> a+b
array([[2, 3, 4],
       [6, 7, 8]])

5 Reshaping arrays

reshape:

A reshape must keep the total number of elements unchanged. For example, an array n with n.shape equal to (2,6) can be converted with n.reshape((3,4)).

>>> b
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])  
>>> c=b.reshape(5,2) # 5 rows, 2 columns
>>> c
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> d=c.reshape(10)  # back to a 1-D array of 10 elements
>>> d
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
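
The size rule can be checked with a short sketch. It assumes a 12-element array named n, matching the (2,6) to (3,4) case mentioned above, and also shows the -1 placeholder, which the text does not cover.

```python
import numpy as np

n = np.arange(12).reshape(2, 6)      # 2 * 6 = 12 elements

# 3 * 4 = 12, so this reshape is valid.
ok = n.reshape((3, 4))

# -1 asks NumPy to infer that dimension from the total size (12 / 4 = 3).
inferred = n.reshape(-1, 4)

# 5 * 3 = 15 != 12, so NumPy raises ValueError.
try:
    n.reshape((5, 3))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(ok.shape, inferred.shape, type(mismatch_error).__name__)
```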

6 Indexing

>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>>
>>> a[1][2] # chained indexing, as with nested lists
6
>>> a[1,2]  # NumPy-style indexing with row and column in one bracket
6

7 Slicing

>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> a[0,1:2]  # [row, column]; the start index is included and the stop index is excluded
array([2])
>>> a[0,1:]
array([2, 3])

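Assigning a slice to a variable does not copy the data: the slice is a view, so writing to it changes the original array. Views are the default because copying large amounts of data would hurt performance. Use copy() when you need an independent copy that leaves the original data unchanged.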

>>> a
array([[1, 2, 3],
       [4, 5, 6]])
>>> b=a[0,1:]
>>> b
array([2, 3])
>>> b[0]=4      # modify the slice
>>> b
array([4, 3])
>>> a           # the original changes too: both refer to the same memory
array([[1, 4, 3],
       [4, 5, 6]])
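
The copy() alternative mentioned above is not shown in the session, so here is a minimal sketch of the difference between a view and a copy. The names view and copied are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

view = a[0, 1:]           # a view: shares memory with a
copied = a[0, 1:].copy()  # an independent copy

copied[0] = 99            # does not touch a
unchanged = a.tolist()

view[0] = 4               # writes through to a
changed = a.tolist()

print(np.shares_memory(a, view), np.shares_memory(a, copied))
print(unchanged, changed)
```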

8 Filtering data

>>> import random
>>> li=[random.randint(1,10) for i in range(30)]
>>> li
[5, 10, 3, 4, 2, 10, 10, 2, 10, 10, 9, 10, 10, 8, 5, 1, 1, 8, 8, 8, 6, 3, 10, 2, 6, 1, 1, 4, 8, 1]
>>> a=np.array(li)
>>> a
array([ 5, 10,  3,  4,  2, 10, 10,  2, 10, 10,  9, 10, 10,  8,  5,  1,  1,
        8,  8,  8,  6,  3, 10,  2,  6,  1,  1,  4,  8,  1])
>>>
>>> a>5
array([False,  True, False, False, False,  True,  True, False,  True,
        True,  True,  True,  True,  True, False, False, False,  True,
        True,  True,  True, False,  True, False,  True, False, False,
       False,  True, False])
>>> a[a>5]
array([10, 10, 10, 10, 10,  9, 10, 10,  8,  8,  8,  8,  6, 10,  6,  8])
# compares every element with 5 and returns the elements greater than 5

9 Concatenation

np.concatenate()
axis: 0 joins along the vertical axis (rows are stacked); 1 joins along the horizontal axis (columns are appended)

arr=np.array([[ 83, 105,  62,  94],
       [110, 111,  71,  64],
       [108,  91,  65,  73],
       [107, 101, 109, 106],
       [ 67, 101,  80, 113],
       [119,  74,  89, 109]])
np.concatenate((arr,arr),axis=1)
# Output
array([[ 83, 105,  62,  94,  83, 105,  62,  94],
       [110, 111,  71,  64, 110, 111,  71,  64],
       [108,  91,  65,  73, 108,  91,  65,  73],
       [107, 101, 109, 106, 107, 101, 109, 106],
       [ 67, 101,  80, 113,  67, 101,  80, 113],
       [119,  74,  89, 109, 119,  74,  89, 109]])
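
The example above only uses axis=1. To compare the two directions, here is a small sketch on an assumed 2x3 array; the array and variable names are illustrative.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# axis=0 stacks the arrays vertically: the row count grows.
vertical = np.concatenate((arr, arr), axis=0)

# axis=1 joins them horizontally: the column count grows.
horizontal = np.concatenate((arr, arr), axis=1)

print(vertical.shape)    # (4, 3)
print(horizontal.shape)  # (2, 6)
```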

Tiling one picture into a 3x3 grid of nine copies

import numpy as np
import matplotlib.pyplot as plt
img_arr = plt.imread('cat.jpg')
img_data=np.concatenate((img_arr,img_arr,img_arr),axis=1)
img_all=np.concatenate((img_data,img_data,img_data),axis=0)
plt.imshow(img_all)

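10 Sorting

Quicksort
Both np.sort() and ndarray.sort() can be used, but they differ:

np.sort() does not modify its input; it returns a sorted copy
ndarray.sort() sorts in place, so it needs no extra space for a copy, but it modifies the input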

arr = np.array([3,8,5,7,6])
arr
array([3, 8, 5, 7, 6])
np.sort(arr)
array([3, 5, 6, 7, 8])
arr.sort()
arr
array([3, 5, 6, 7, 8])

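Partial sorting
np.partition(a,k)

Sometimes we are not interested in all of the data, only in the smallest or largest few values.

With a positive k, the first k elements of the result are the k smallest values (in no particular order)
With a negative k, the last |k| elements of the result are the |k| largest values (in no particular order)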
```
np.partition(arr,kth=3)
array([3, 5, 6, 7, 8])
```
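
A short sketch of how the k smallest and k largest values can be pulled out with np.partition. The sample array and the value k = 3 are assumptions chosen for illustration.

```python
import numpy as np

arr = np.array([3, 8, 5, 7, 6, 1, 9])
k = 3

# The first k entries are the k smallest values, in unspecified order.
smallest = np.partition(arr, k)[:k]

# With a negative kth, the last k entries are the k largest values.
largest = np.partition(arr, -k)[-k:]

# Sort the small slices only if an ordered result is needed.
print(np.sort(smallest), np.sort(largest))
```

This avoids fully sorting a large array when only a handful of extreme values are needed.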

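11 Generating arrays with random

1 numpy.random.rand(n0,n1,…)
rand generates values in [0,1) for the given dimensions, including 0 and excluding 1
Returns data of the specified shape

2 numpy.random.randn(n0,n1,…)
randn returns one sample or an array of samples drawn from the standard normal distribution
Returns data of the specified shape

3 numpy.random.randint(low=1,high=10,size=(3,1))
Returns integers within the range, including low and excluding high

4 numpy.random.choice(b,size=(3,1))
b must be a 1-D array or an integer. With an integer, the values are drawn from 0 up to (but not including) that integer; with an array, the values are drawn from the array's elements

np.random.seed(1)
Once the seed is fixed, for example at 1, the values generated after seed(1) are the same on every run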

np.random.seed(1)
b=np.random.randn(4)
print(b)
a=np.random.choice(b,size=(2,10))
print(a)

Output

[ 1.62434536 -0.61175641 -0.52817175 -1.07296862]
[[-0.52817175 -0.61175641 -0.52817175  1.62434536 -0.52817175 -0.61175641
  -0.52817175  1.62434536 -1.07296862  1.62434536]
 [-0.52817175  1.62434536 -0.61175641 -0.52817175 -0.52817175  1.62434536
  -1.07296862 -1.07296862 -0.61175641 -0.61175641]]
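
The four generators and the effect of seeding can be tried together in one sketch. The seed value 0 and the shapes are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so that reruns give the same values

uniform = np.random.rand(2, 3)                          # floats in [0, 1)
normal = np.random.randn(2, 3)                          # standard normal samples
ints = np.random.randint(low=1, high=10, size=(3, 1))   # integers in [1, 10)
picks = np.random.choice(5, size=(3, 1))                # drawn from 0, 1, 2, 3, 4

print(uniform.shape, normal.shape, ints.shape, picks.shape)

# Re-seeding with the same value reproduces the same draws.
np.random.seed(0)
repeat = np.random.rand(2, 3)
print(np.array_equal(uniform, repeat))
```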

III. Common math functions

1 Common unary functions

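The original table of unary functions is not preserved here. As a stand-in, the sketch below applies a few widely used element-wise functions; which functions the author listed is unknown, so this selection is an assumption.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

roots = np.sqrt(x)                             # element-wise square root
squares = np.square(x)                         # element-wise square
magnitudes = np.abs(np.array([-1, 2, -3]))     # element-wise absolute value
exponentials = np.exp(np.zeros(2))             # e**0 for each element

print(roots, squares, magnitudes, exponentials)
```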

2 Binary functions

np.maximum(0,z)
Compares the two inputs element by element and takes the larger value at each position

np.maximum([1,2,3,4,5],[0,1,-1,2,7])
# Output
array([1, 2, 3, 4, 7])

3 Statistics

Summing a 2-D array

>>> x = np.array([[1, 2, 3, 4], [2, 4, 6, 8]])
>>> np.sum(x)
30
>>> np.sum(x,axis=0) # sum down each column
array([ 3,  6,  9, 12])
>>> np.sum(x,axis=1)  # sum across each row
array([10, 20])

Mean of a 2-D array

import numpy as np

x = np.array([[1, 2, 3, 4], [2, 4, 6, 8]])

X = np.mean(x, axis=0)  # mean of each column
Y = np.mean(x, axis=1)  # mean of each row
Z = np.mean(x)  # mean of all elements
print(X)
print(Y)
print(Z)

Result

[1.5 3.  4.5 6. ]
[2.5 5. ]
3.75

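IV. Frequently used functions

1. np.array and np.asarray both convert structured data into an ndarray

The main difference appears when the source is already an ndarray:
by default np.array still makes a copy that occupies new memory, whereas np.asarray does not.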
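
The copy behaviour can be observed directly. This is a minimal sketch with illustrative names; it relies on the default settings of both functions.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)      # makes a new array by default
shared = np.asarray(source)    # returns the same object when no conversion is needed

source[0] = 100

print(copied[0], shared[0])    # the copy keeps 1, the shared array sees 100
print(shared is source)
```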

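2. np.round(): rounds floating-point values

np.round(num,n) returns num rounded to n decimal places. Unlike schoolbook rounding, NumPy rounds values that lie exactly halfway between two candidates to the nearest even one.
The example below uses Python's built-in round(), which gives the same results as np.round() for these inputs:

print("round(80.23456, 2) :", round(80.23456, 2))
print("round(100.000056, 3) :", round(100.000056, 3))
print("round(-100.000056, 3) :", round(-100.000056, 3))
Running the program above produces the following result: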

round(80.23456, 2) : 80.23
round(100.000056, 3) : 100.0
round(-100.000056, 3) : -100.0
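
Since the example above calls the built-in round(), here is a short sketch that uses np.round itself and shows how it treats exact halves. The input values are chosen for illustration.

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# Ties go to the nearest even integer: 0.5 -> 0, 1.5 -> 2, 2.5 -> 2, 3.5 -> 4.
rounded = np.round(halves)

# The second argument is the number of decimal places.
two_places = np.round(80.23456, 2)

print(rounded, two_places)
```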

3. np_obj.astype('int'): converts the data type of a NumPy array (floats are truncated toward zero)

>>> a
array([[ 0.67996035, -1.76420626,  1.54116069,  0.33621728],
       [-1.29728681, -1.88959959, -1.12299914,  0.73344504]])
>>> a.dtype
dtype('float64')
>>> a.astype("int")
array([[ 0, -1,  1,  0],
       [-1, -1, -1,  0]])

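4. np.percentile(list, q): computes the q-th percentile of a list

For example, the 50th percentile of [1,2,3] is 2 and the 1st percentile is 1.02. The range from the minimum to the maximum is divided into 100 equal parts, and the value at the requested position is returned.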

>>> (3-1)/100
0.02
>>> np.percentile([1,2,3],0)
1.0
>>> np.percentile([1,2,3],1)
1.02
>>>
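
The 1.02 result can be reproduced by hand. The sketch below assumes NumPy's default linear interpolation and recomputes the percentile manually; the helper variable names are made up for illustration.

```python
import numpy as np

data = [1, 2, 3]
q = 1  # percentile to look up

# Default (linear) method: locate the fractional position q/100 * (n - 1)
# in the sorted data, then interpolate between the two neighbouring values.
sorted_data = np.sort(data)
position = q / 100 * (len(sorted_data) - 1)     # 0.02
lower = int(np.floor(position))
upper = min(lower + 1, len(sorted_data) - 1)
manual = sorted_data[lower] + (position - lower) * (sorted_data[upper] - sorted_data[lower])

print(manual, np.percentile(data, q))
```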

5. np.ceil(ndarray): the smallest integer greater than or equal to each value

The name is short for "ceiling".

>>> a = np.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
>>> np.ceil(a)
array([-1., -1., -0.,  1.,  2.,  2.,  2.])
>>> np.floor(a)
array([-2., -2., -1.,  0.,  1.,  1.,  2.])

# The counterpart is:
np.floor(a) rounds down; the name means "floor".

6. np.vstack() and np.hstack()

np.vstack: stacks arrays vertically (row-wise) to build a new array

a = np.array([[1,2,3]])
a.shape
Out[4]:
(1, 3)
b = np.array([[4,5,6]])
b.shape
Out[5]:
(1, 3)
c = np.vstack((a,b)) # stack two arrays of shape (1,3) vertically
print(c)
c.shape # the resulting shape is (2,3)
[[1 2 3]
 [4 5 6]]
Out[6]:
(2, 3)

np.hstack: stacks arrays horizontally (column-wise) to build a new array

a = np.array([[1,2,3]])
a.shape
Out[11]:
(1, 3)

In [12]:
b = np.array([[4,5,6]])
b.shape
Out[12]:
(1, 3)

In [16]:
c = np.hstack((a,b)) # stack two arrays of shape (1,3) horizontally
print(c)
c.shape  # the resulting shape is (1,6)
[[1 2 3 4 5 6]]
Out[16]:
(1, 6)

7. np.clip()

clip limits the elements of an array to the range between a_min and a_max: values greater than a_max are set to a_max, and values less than a_min are set to a_min.

import numpy as np
x=np.array([1,2,3,5,6,7,8,9])
np.clip(x,3,8)
Out[88]:
array([3, 3, 3, 5, 6, 7, 8, 8])

8. np.zeros_like()

This function creates an all-zero array with the same shape as the array a that you pass in

>>> np.zeros_like(np.random.rand(3,4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

9. np.random.shuffle

np.random.shuffle reorders a sequence randomly in place (it returns None), much like shuffling a deck of cards
>>> a
[2, 4, 3, 1]
>>> np.random.shuffle(a)
>>> a
[3, 2, 1, 4]
>>> np.random.shuffle(a)
>>> a
[4, 3, 2, 1]

10. np.where(condition,x,y)

Where the condition is satisfied, the output is taken from x; otherwise it is taken from y.


>>> aa = np.arange(10)
>>> np.where(aa,1,-1)
array([-1,  1,  1,  1,  1,  1,  1,  1,  1,  1])  # 0 counts as False, so the first output is -1
>>> np.where(aa > 5,1,-1)
array([-1, -1, -1, -1, -1, -1,  1,  1,  1,  1])

>>> np.where([[True,False], [True,True]],    # example from the official documentation
...          [[1,2], [3,4]],
...          [[9,8], [7,6]])
array([[1, 8],
       [3, 4]])

11. np.isnan()

Tests each element for NaN (np.nan): returns True where the value is NaN and False otherwise

>>> np.isnan([1,2,np.nan,])
array([False, False,  True])

# Replace every NaN in this vector with 0 (the replacement value goes second, the original values third)
only_nan_split_points=np.where(np.isnan(only_nan_split_points), 0, only_nan_split_points)
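
A self-contained sketch of replacing NaN with 0, using a small made-up array. In np.where the value for matching positions comes second, so the 0 must be the second argument; np.nan_to_num is shown as an alternative that NumPy provides for the same task.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan])

# Where the mask is True (NaN) take 0, otherwise keep the original value.
cleaned = np.where(np.isnan(values), 0, values)

# np.nan_to_num performs the same replacement in one call.
also_cleaned = np.nan_to_num(values, nan=0.0)

print(cleaned, also_cleaned)
```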

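12. np.argwhere()

Returns the indices of the elements that satisfy the condition

>>> a = np.random.rand(3,4)
>>> np.argwhere(a>0.1)
# each row is one index; for example [0, 0] is the element at row 0, column 0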
array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 2]])
>>> a
array([[0.88142372, 0.15743024, 0.86292155, 0.08841115],
       [0.55044943, 0.56144471, 0.69757748, 0.23467154],
       [0.86813517, 0.02331109, 0.46798467, 0.06975323]])

13. numpy.std(): standard deviation of an array

In [1]: import numpy as np

In [2]: a = np.array([[1, 2], [3, 4]])

In [3]: np.std(a) # standard deviation over all elements
Out[3]: 1.118033988749895

In [4]: np.std(a,axis=0) # axis=0: standard deviation of each column
Out[4]: array([1., 1.])

In [5]: np.std(a,axis=1) # axis=1: standard deviation of each row
Out[5]: array([0.5, 0.5])

14. np.squeeze()

squeeze() removes the redundant (length-1) dimensions from a NumPy array

>>> np.array([[[1,2,3]]])
array([[[1, 2, 3]]])
>>> a = np.array([[[1,2,3]]])
>>> a
array([[[1, 2, 3]]])
>>> a.shape
(1, 1, 3)
>>> np.squeeze(a)
array([1, 2, 3])
>>> np.squeeze(a).shape
(3,)

15. np.unique()

Removes duplicate values and returns the unique values in sorted order

>>> names=np.array(['Bob','Joe','Will','Bob','Will','Joe','Joe'])
>>> np.unique(names)
array(['Bob', 'Joe', 'Will'], dtype='<U4')

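V. Installing and using Jupyter

A notebook works like a Python notepad: it runs the Python code you write and also renders text written in Markdown.
Notebooks are not limited to Python; they also support more than 40 other languages, including R, Julia, and JavaScript.

Installation and usage: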

https://blog.csdn.net/qq_33619378/article/details/83037106?depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-1&utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromBaidu-1

Error: the win32api module cannot be found and cannot be imported
Fix:

pip uninstall pywin32
pip install pywin32

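Setting the Jupyter working directory:

Generate the configuration file; the command prints its path

jupyter notebook --generate-config

Then edit the line #c.NotebookApp.notebook_dir = '' (uncomment it) and set it to the directory you need. On Windows, write the path with doubled backslashes, as in \\Desktop\\, or as a raw string, as in r'\Desktop'. After startup, the working directory is the one you set.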

Enabling code completion in Jupyter

Install the extensions:

pip install jupyter_contrib_nbextensions
jupyter contrib nbextension install --user

Install the configurator:

pip install --user jupyter_nbextensions_configurator
jupyter nbextensions_configurator enable --user

After restarting, change the settings (the original screenshot showing them is not preserved).
