numpy

533_

Category: Python data analysis

Article link: https://blog.csdn.net/qq_14993591/article/details/84139568

Data analysis with NumPy

Array shapes

In[01]: import numpy as np

In[02]: t1 = np.arange(12)

In[03]: t1
Out[03]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In[04]: t1.shape  # check the array's shape
Out[04]: (12,)

In[05]: t2 = np.array([[1,2,3],[4,5,6]])

In[06]: t2
Out[06]: array([[1, 2, 3],
                [4, 5, 6]])

In[07]: t2.shape
Out[07]: (2, 3)



In[08]: t3 = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])

In[09]: t3
Out[09]: array([[[ 1,  2,  3],
                 [ 4,  5,  6]],

                [[ 7,  8,  9],
                 [10, 11, 12]]])


In[10]: t3.shape
Out[10]: (2, 2, 3)

In[11]: t4 = np.arange(12)

In[12]: t4
Out[12]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In[13]: t4.reshape((3,4))
Out[13]: array([[ 0,  1,  2,  3],
                [ 4,  5,  6,  7],
                [ 8,  9, 10, 11]])

In[14]: t5 = np.arange(24).reshape((2,3,4))

In[15]: t5
Out[15]: array([[[ 0,  1,  2,  3],
                 [ 4,  5,  6,  7],
                 [ 8,  9, 10, 11]],

                [[12, 13, 14, 15],
                 [16, 17, 18, 19],
                 [20, 21, 22, 23]]])

In[16]: t5.reshape((4,6))
Out[16]: array([[ 0,  1,  2,  3,  4,  5],
                [ 6,  7,  8,  9, 10, 11],
                [12, 13, 14, 15, 16, 17],
                [18, 19, 20, 21, 22, 23]])

In[17]: t5
Out[17]: array([[[ 0,  1,  2,  3],
                 [ 4,  5,  6,  7],
                 [ 8,  9, 10, 11]],

                [[12, 13, 14, 15],
                 [16, 17, 18, 19],
                 [20, 21, 22, 23]]])


In[18]: t5.reshape((24,))
Out[18]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
                17, 18, 19, 20, 21, 22, 23])  # note: this is one-dimensional


In[19]: t5.reshape((24,1))
Out[19]: 
array([[ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12],
       [13],
       [14],
       [15],
       [16],
       [17],
       [18],
       [19],
       [20],
       [21],
       [22],
       [23]])

In[20]: t5.reshape((1,24))
Out[20]: array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
                 16, 17, 18, 19, 20, 21, 22, 23]])  # note: this is two-dimensional

In[21]: t5.flatten()  # flatten to one dimension without needing to know t5's shape
Out[21]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In[22]: t5
Out[22]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

Array arithmetic


In[23]:t5
Out[23]: 
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In[24]:t5+2
Out[24]: 
array([[[ 2,  3,  4,  5],
        [ 6,  7,  8,  9],
        [10, 11, 12, 13]],

       [[14, 15, 16, 17],
        [18, 19, 20, 21],
        [22, 23, 24, 25]]])
        

In[25]:t5/2
Out[25]: 
array([[[ 0. ,  0.5,  1. ,  1.5],
        [ 2. ,  2.5,  3. ,  3.5],
        [ 4. ,  4.5,  5. ,  5.5]],

       [[ 6. ,  6.5,  7. ,  7.5],
        [ 8. ,  8.5,  9. ,  9.5],
        [10. , 10.5, 11. , 11.5]]])

In[26]: t5/0
__main__:1: RuntimeWarning: divide by zero encountered in true_divide
__main__:1: RuntimeWarning: invalid value encountered in true_divide
Out[26]: 
array([[[nan, inf, inf, inf],
        [inf, inf, inf, inf],
        [inf, inf, inf, inf]],

       [[inf, inf, inf, inf],
        [inf, inf, inf, inf],
        [inf, inf, inf, inf]]])
        
        
        
0/0 = nan (not a number)
1/0 = inf (infinity)
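
NumPy reports division by zero as a warning rather than an exception. The following is a minimal sketch, assuming you want to silence those warnings for one block of code with `np.errstate`; the array `a` is just an illustrative stand-in for `t5`.

```python
import numpy as np

a = np.arange(4)

# Suppress the divide-by-zero and invalid-value warnings inside this block only.
with np.errstate(divide="ignore", invalid="ignore"):
    result = a / 0

print(result)            # [nan inf inf inf]
print(np.isnan(result))  # [ True False False False]
print(np.isinf(result))  # [False  True  True  True]
```

Outside the `with` block the usual warnings are emitted again.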



In[27]: t5 = t5.reshape((4,6))

In[28]: t5
Out[28]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
       
In[29]: t6 = np.arange(100,124).reshape((4,6))
       
In[30]: t6
Out[30]: 
array([[100, 101, 102, 103, 104, 105],
       [106, 107, 108, 109, 110, 111],
       [112, 113, 114, 115, 116, 117],
       [118, 119, 120, 121, 122, 123]])

In[31]: t5+t6
Out[31]: 
array([[100, 102, 104, 106, 108, 110],
       [112, 114, 116, 118, 120, 122],
       [124, 126, 128, 130, 132, 134],
       [136, 138, 140, 142, 144, 146]])

In[32]: t5*t6
Out[32]: 
array([[   0,  101,  204,  309,  416,  525],
       [ 636,  749,  864,  981, 1100, 1221],
       [1344, 1469, 1596, 1725, 1856, 1989],
       [2124, 2261, 2400, 2541, 2684, 2829]])
       

In[33]: t7 = np.arange(0,6)

In[34]: t7
Out[34]: array([0, 1, 2, 3, 4, 5])

In[35]: t5
Out[35]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])


In[36]: t5-t7
Out[36]: 
array([[ 0,  0,  0,  0,  0,  0],
       [ 6,  6,  6,  6,  6,  6],
       [12, 12, 12, 12, 12, 12],
       [18, 18, 18, 18, 18, 18]])
       
       
       
In[37]: t8 = np.arange(4).reshape((4,1))

In[38]: t8
Out[38]: 
array([[0],
       [1],
       [2],
       [3]])

In[39]: t5
Out[39]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In[40]: t5-t8
Out[40]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 5,  6,  7,  8,  9, 10],
       [10, 11, 12, 13, 14, 15],
       [15, 16, 17, 18, 19, 20]])

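Broadcasting rules

Two arrays are broadcast-compatible if, comparing their shapes from the trailing (last) dimension backwards, each pair of axis lengths is either equal or one of them is 1. Broadcasting then happens along the axes that are missing or have length 1.

Number of dimensions = the number of entries in shape

An array with shape=(3,3,2) can be combined with an array of shape (3,2)

An array with shape=(3,3) can be combined with an array of shape (3,1)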
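
The two shape pairs above can be checked directly. This is a small sketch with made-up arrays (`a`, `b`, `c`, `d` are illustrative names), plus one incompatible pair to show what a failure looks like.

```python
import numpy as np

a = np.zeros((3, 3, 2))
b = np.ones((3, 2))               # trailing dimensions (3, 2) match those of a
print((a + b).shape)              # (3, 3, 2)

c = np.zeros((3, 3))
d = np.arange(3).reshape((3, 1))  # the length-1 axis is stretched across the columns
print((c + d).shape)              # (3, 3)

# Incompatible: the trailing dimensions are 2 and 3, and neither is 1.
try:
    a + np.ones((3, 3))
except ValueError as exc:
    print("not broadcastable:", exc)
```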

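Creating NumPy arrays

Function	Description
array	Converts input data (a list, tuple, array, or other sequence type) to an ndarray; the dtype is either inferred or given explicitly. Copies the input data by default
asarray	Converts the input to an ndarray; no copy is made if the input is already an ndarray
arange	Like the built-in range, but returns an ndarray
ones, ones_like	ones creates an array of all 1s with the given shape and dtype; ones_like takes another array and creates an all-1s array with that array's shape and dtype
zeros, zeros_like	zeros creates an array of all 0s with the given shape and dtype; zeros_like takes another array and creates an all-0s array with that array's shape and dtype
empty, empty_like	Creates a new array, allocating memory without filling in any values
eye, identity	Creates a square N x N identity matrix (1 on the diagonal, 0 elsewhere)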
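
A short sketch of a few of the functions in the table, mainly to show the copy behaviour that separates `array` from `asarray`. The array `base` is an arbitrary example.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])

same = np.asarray(base)   # already an ndarray, so no copy is made
copied = np.array(base)   # np.array copies by default
print(same is base)       # True
print(copied is base)     # False

ones = np.ones_like(base)          # same shape and dtype as base, filled with 1
print(ones.shape, ones.dtype == base.dtype)  # (2, 3) True
print(np.zeros((2, 3)).dtype)      # float64, the default dtype
print(np.array_equal(np.eye(3), np.identity(3)))  # True
```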

Creating a one-dimensional array

import numpy as np

t1 = np.array([1,2,3,])
print(t1) # [1 2 3]
print(type(t1)) # <class 'numpy.ndarray'>

t2 = np.array(range(10))
print(t2) #[0 1 2 3 4 5 6 7 8 9]

t3 = np.arange(10)
print(t3) # [0 1 2 3 4 5 6 7 8 9]

Creating multi-dimensional arrays

Create an array with shape (2, 2)

m = np.array([np.arange(2), np.arange(2)])

m
# array([[0, 1],
#       [0, 1]])

m.shape
# (2, 2)

m.dtype
# dtype('int32')  (the default integer dtype is platform-dependent, so this may be int64)

Create a one-dimensional array of length 10 filled with zeros

np.zeros(10)
# array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

Create a two-dimensional array of shape (3, 6) filled with zeros

np.zeros((3,6))

# array([[ 0.,  0.,  0.,  0.,  0.,  0.],
#        [ 0.,  0.,  0.,  0.,  0.,  0.],
#        [ 0.,  0.,  0.,  0.,  0.,  0.]])

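Create an uninitialized three-dimensional array of shape (2, 3, 2); its values are whatever happens to be in memory, so they are not guaranteed to be zeros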

np.empty((2,3,2))
# array([[[ 0.,  0.],
#         [ 0.,  0.],
#         [ 0.,  0.]],
# 
#        [[ 0.,  0.],
#         [ 0.,  0.],
#         [ 0.,  0.]]])

Selecting array elements

a = np.array([[1,2], [3,4]])  # create a two-dimensional array with the contents shown

a
# array([[1, 2],
#        [3, 4]])

a[0, 0]  # row 0, column 0
# 1

a[0, 1]  # row 0, column 1
# 2

a[1, 0]
# 3

a[1, 1]
# 4

NumPy data types

The data types are listed in the table below. To create a value of a given type, call np.<type>(value or array)

[Image: table of NumPy data types]

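# np.float and np.int (aliases for the Python builtins) were removed in NumPy 1.24,
# so the sized NumPy types and the builtins are used here instead.
np.float64(42)  # convert 42 to float64
# 42.0

np.int8(42.0)  # convert to an integer
# 42

np.bool_(42)  # convert to a boolean
# True

np.bool_(0)  # the boolean value of 0 is False
# False

np.float64(True)  # True converts to 1 in the target numeric type
# 1.0

np.float64(False)  # False converts to 0 in the target numeric type
# 0.0

np.arange(7, dtype=np.uint16)  # the dtype parameter sets the data type at creation
# array([0, 1, 2, 3, 4, 5, 6], dtype=uint16)

try:
    int(42.0 + 1.j)  # a complex number cannot be converted to an integer
except TypeError:
    print("TypeError")
# TypeError

float(42.0 + 1.j)  # a complex number cannot be converted to a float either
# raises TypeError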


t4 = np.arange(4,10,2)
print(t4) # [4 6 8]
print(t4.dtype) # int32

t5 = np.array(range(1,4),dtype=float)
print(t5) # [1. 2. 3.]
print(t5.dtype) # float64

t6 = np.array([1,1,0,1,0,0],dtype=bool)
print(t6) # [ True  True False  True False False]
print(t6.dtype) # bool

# change the data type
t7 = t6.astype("int8")
print(t7) # [1 1 0 1 0 0]
print(t7.dtype) # int8

import random

# a random float, e.g. 0.021318307656799207
print(random.random())

# floats in NumPy
t8 = np.array([random.random() for i in range(10)])
print(t8) # e.g. [0.2260555  0.4224169  0.64453813 0.27608059 0.80809452 0.79012125 0.13323894 0.79472302 0.3888746  0.45760489]

t9 = np.round(t8,2) # round to 2 decimal places
print(t9)


--------------------------------------------------------------------------------


print([i for i in range(10)]) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print([i+1 for i in range(10)]) # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(random.random()) # random() returns a random float in the range [0, 1)


l = []
for i in range(10):
    l.append(random.random())
print(l) # a plain Python list of 10 random floats

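Axes (axis)

A one-dimensional array has only axis 0

A two-dimensional array (shape (2,2)) has axis 0 and axis 1

axis=0 runs down the rows, so a reduction along it yields one result per column

axis=1 runs across the columns, so a reduction along it yields one result per row

A three-dimensional array (shape (2,2,2)) has axes 0, 1 and 2

To average a two-dimensional array along one direction, you must specify which axis to average over; without axis, the mean of all elements is returned

In np.arange(10).reshape((2,5)), the 2 is the length of axis 0 and the 5 is the length of axis 1; 2 x 5 gives 10 elements in total

[Figure: axes of a two-dimensional array]

[Figure: axes of a three-dimensional array]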
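
A minimal sketch of what the axis argument does, using the (2, 5) array mentioned above and `mean` as the reduction:

```python
import numpy as np

t = np.arange(10).reshape((2, 5))
# [[0 1 2 3 4]
#  [5 6 7 8 9]]

print(t.mean())         # 4.5, the mean of all elements
print(t.mean(axis=0))   # [2.5 3.5 4.5 5.5 6.5], one value per column
print(t.mean(axis=1))   # [2. 7.], one value per row
```

The axis you name is the one that disappears from the result: reducing a (2, 5) array along axis 0 leaves shape (5,), and along axis 1 leaves shape (2,).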

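Reading data with NumPy

CSV: Comma-Separated Values file

Displayed: as a table

Source file: formatted text in which newlines separate rows and commas separate columns; each line represents one record

Because CSV is easy to display, read and write, it is widely used to store and exchange small and medium-sized datasets. For teaching convenience we will often work with CSV files, although working with data from a database is just as straightforward.

np.loadtxt(fname, dtype=float, delimiter=None, skiprows=0, usecols=None, unpack=False)

Parameter meanings

Parameter	Description
fname	File, filename or path to read
dtype	Data type of the resulting array; defaults to float
delimiter	String that separates values; defaults to whitespace, so pass "," for CSV
skiprows	Number of leading lines to skip, typically the header
usecols	Which columns to read, as an index or a sequence of indices
unpack	If True, the result is transposed, so each column can be unpacked into its own variable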
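
A self-contained sketch of `np.loadtxt`, assuming a tiny made-up CSV held in memory (`csv_text` is hypothetical data, not the file used in the exercise). `np.loadtxt` accepts a file-like object, so `io.StringIO` stands in for a file on disk.

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "views,likes\n100,10\n200,25\n300,40\n"

# skiprows=1 skips the header row; each CSV row becomes one array row.
rows = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int, skiprows=1)
print(rows.shape)   # (3, 2)

# unpack=True transposes the result, so each column can be bound to a name.
views, likes = np.loadtxt(io.StringIO(csv_text), delimiter=",", dtype=int,
                          skiprows=1, unpack=True)
print(views)        # [100 200 300]
print(likes)        # [10 25 40]
```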

Exercise

Work with a CSV file holding the view, like, dislike and comment counts (["views","likes","dislikes","comment_total"]) of more than 1,000 YouTube videos from the US

The file is named US_video_data_numbers




### Code

```python
import numpy as np

us_file_path = "./US_video_data_numbers.csv"

# default layout: one row per video, one column per field
t1 = np.loadtxt(us_file_path,delimiter=",",dtype="int")
# unpack=True transposes the result: one row per field
t2 = np.loadtxt(us_file_path,delimiter=",",dtype="int",unpack=True)

print(t1)
print("*"*100)
print(t2)
```

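Transposing in NumPy

Transposing is a transformation that, for a NumPy array, swaps the data across the diagonal so that rows become columns; it is used to make the data easier to process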

In [1]: import numpy as np

In [4]: t1=np.arange(24).reshape((4,6))

In [5]: t1
Out[5]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])


In [6]: t1.transpose()
Out[6]:
array([[ 0,  6, 12, 18],
       [ 1,  7, 13, 19],
       [ 2,  8, 14, 20],
       [ 3,  9, 15, 21],
       [ 4, 10, 16, 22],
       [ 5, 11, 17, 23]])

In [7]: t1.T
Out[7]:
array([[ 0,  6, 12, 18],
       [ 1,  7, 13, 19],
       [ 2,  8, 14, 20],
       [ 3,  9, 15, 21],
       [ 4, 10, 16, 22],
       [ 5, 11, 17, 23]])

In [8]: t1.swapaxes(1,0)
Out[8]:
array([[ 0,  6, 12, 18],
       [ 1,  7, 13, 19],
       [ 2,  8, 14, 20],
       [ 3,  9, 15, 21],
       [ 4, 10, 16, 22],
       [ 5, 11, 17, 23]])

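NumPy slicing and indexing

Indexing and slicing one-dimensional arrays

One-dimensional arrays are indexed and sliced the same way as Python lists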

a = np.arange(9)
print(a) #[0 1 2 3 4 5 6 7 8]
print(a[1:4]) # [1 2 3]
print(a[:7:2]) # [0 2 4 6]
print(a[::-1]) # [8 7 6 5 4 3 2 1 0]

Slicing and indexing multi-dimensional arrays


b = np.arange(12).reshape(3,4)

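print(b.shape) # (3, 4)
print(b) 
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# select one row
print(b[2]) # [ 8  9 10 11]

# select several consecutive rows
print(b[1:])
# [[ 4  5  6  7]
#  [ 8  9 10 11]]

# select several non-consecutive rows
print(b[[0,2]])
# [[ 0  1  2  3]
#  [ 8  9 10 11]]

# select a column: the comma separates rows (before) from columns (after); a bare colon means "all"
print(b[:,1]) # second column: [1 5 9]

# select several consecutive columns
print(b[:,1:]) 
# [[ 1  2  3]
#  [ 5  6  7]
#  [ 9 10 11]]

# select several non-consecutive columns
print(b[:,[0,2]])
# [[ 0  2]
#  [ 4  6]
#  [ 8 10]]

# select the value in the 2nd row, 3rd column (index [1, 2])
print(b[1][2]) # 6
print(b[1,2]) # 6
print(type(b[1,2])) # <class 'numpy.int32'> (numpy.int64 on many platforms)

# select several rows and columns: 1st to 2nd row, 2nd to 4th column
print(b[0:2,1:4])
# [[1 2 3]
#  [5 6 7]]

# select several non-adjacent points: (0,1) (2,2) (2,3)
print(b[[0,2,2],[1,2,3]]) # [ 1 10 11]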

Boolean indexing in NumPy


In[2]: t = np.arange(24).reshape(4,6)

In[3]: t
Out[3]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In[4]: t<10
Out[4]: 
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True, False, False],
       [False, False, False, False, False, False],
       [False, False, False, False, False, False]])

In[5]: t[t<10] = 3

In[6]: t
Out[6]: 
array([[ 3,  3,  3,  3,  3,  3],
       [ 3,  3,  3,  3, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In[7]: t[t>10]
Out[7]: array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])

The ternary operator in NumPy


In[8]: t
Out[8]: 
array([[ 3,  3,  3,  3,  3,  3],
       [ 3,  3,  3,  3, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In[9]: np.where(t<10,0,10)  # elements less than 10 become 0, all others become 10
Out[9]: 
array([[ 0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, 10, 10],
       [10, 10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10, 10]])

Clipping with clip in NumPy

In[2]: t
Out[2]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In[3]: t = t.astype(float)

In[4]:t
Out[4]: 
array([[ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10., 11.],
       [12., 13., 14., 15., 16., 17.],
       [18., 19., 20., 21., 22., 23.]])

In[5]: t[3,3] = np.nan

In[6]: t
Out[6]: 
array([[ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10., 11.],
       [12., 13., 14., 15., 16., 17.],
       [18., 19., 20., nan, 22., 23.]])

In[7]: t.clip(10,18)  # values below 10 become 10, values above 18 become 18; nan is left unchanged
Out[7]: 
array([[10., 10., 10., 10., 10., 10.],
       [10., 10., 10., 10., 10., 11.],
       [12., 13., 14., 15., 16., 17.],
       [18., 18., 18., nan, 18., 18.]])

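nan and inf in NumPy

nan (NAN, Nan): not a number

When does nan appear in NumPy:

  When a local file is read as float and some values are missing, the missing entries become nan

  When an undefined calculation is performed (for example, infinity (inf) minus infinity)

inf (-inf, inf): infinity; inf is positive infinity and -inf is negative infinity

When does inf (either -inf or +inf) appear:

  For example, when a nonzero number is divided by 0 (plain Python raises an error, while NumPy returns inf or -inf)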

a = np.inf

print(type(a)) #<class 'float'>

a = np.nan

print(type(a)) # <class 'float'>
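
Because `np.nan` and `np.inf` are ordinary Python floats, their special comparison rules can be checked without any array. This is a short sketch of the properties that the `t!=t` trick for counting nan relies on:

```python
import numpy as np

nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)            # False: nan is not equal to anything, itself included

undefined = np.inf - np.inf
print(np.isnan(undefined))          # True: inf - inf has no defined value

print(np.inf > 1e308)               # True: inf is larger than any finite float
print(-np.inf < -1e308)             # True
```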

In[2]: t
Out[2]: 
array([[ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 6.,  7.,  8.,  9., 10., 11.],
       [12., 13., 14., 15., 16., 17.],
       [18., 19., 20., nan, 22., 23.]])

In[3]: t[:,0] = 0

In[4]: t
Out[4]: 
array([[ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  7.,  8.,  9., 10., 11.],
       [ 0., 13., 14., 15., 16., 17.],
       [ 0., 19., 20., nan, 22., 23.]])

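In[5]: np.count_nonzero(t) # number of nonzero elements (nan counts as nonzero)
Out[5]: 20

In[6]: np.count_nonzero(t!=t) # number of nan values (nan is the only value not equal to itself)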
Out[6]: 1

In[7]: t!=t
Out[7]: 
array([[False, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False, False, False,  True, False, False]])

In[8]: np.isnan(t)
Out[8]: 
array([[False, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False, False, False, False, False, False],
       [False, False, False,  True, False, False]])

In[9]: np.count_nonzero(np.isnan(t))
Out[9]: 1


In[10]: t1 = np.arange(12).reshape(3,4)

In[11]: t1
Out[11]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In[13]: np.sum(t1)
Out[13]: 66

In[14]: t
Out[14]: 
array([[ 0.,  1.,  2.,  3.,  4.,  5.],
       [ 0.,  7.,  8.,  9., 10., 11.],
       [ 0., 13., 14., 15., 16., 17.],
       [ 0., 19., 20., nan, 22., 23.]])

In[15]: np.sum(t)
Out[15]: nan

In[16]: np.sum(t1,axis=0)
Out[16]: array([12, 15, 18, 21])

In[17]: np.sum(t1,axis=1)
Out[17]: array([ 6, 22, 38])

In[18]: np.sum(t,axis=0)
Out[18]: array([ 0., 40., 44., nan, 52., 56.])

In[19]: np.sum(t,axis=1)
Out[19]: array([15., 45., 75., nan])

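Common statistical functions in NumPy

Statistic	Function
Sum	t.sum(axis=None)
Mean	t.mean(axis=None), strongly affected by outliers
Median	np.median(t,axis=None)
Maximum	t.max(axis=None)
Minimum	t.min(axis=None)
Range	np.ptp(t,axis=None), the difference between the maximum and the minimum
Standard deviation	t.std(axis=None)

By default each function returns a single statistic over the whole array; if axis is given, it returns one result per slice along that axis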

In[2]: t1
Out[2]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In[3]: t1.sum(axis=0)
Out[3]: array([12, 15, 18, 21])

In[4]: t1.mean(axis=0)
Out[4]: array([4., 5., 6., 7.])

In[5]: np.median(t1)
Out[5]: 5.5

In[6]: np.median(t1,axis=0)
Out[6]: array([4., 5., 6., 7.])

In[7]: t1.max()
Out[7]: 11

In[8]: t1.min(axis=0)
Out[8]: array([0, 1, 2, 3])

In[9]: np.ptp(t1)
Out[9]: 11

In[10]: np.ptp(t1,axis=1)
Out[10]: array([3, 3, 3])

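The standard deviation measures how far a set of values is spread around its mean. A large standard deviation means most values differ substantially from the mean; a small one means the values lie close to the mean

It reflects how stable the data is: the larger the value, the greater the fluctuation and the less stable the data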

In[11]:t1.std()
Out[11]: 3.452052529534663

In[12]:t1.std(axis=0)
Out[12]: array([3.26598632, 3.26598632, 3.26598632, 3.26598632])

Filling missing values in an ndarray with the mean

In[2]: t
Out[2]: 
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5., nan, nan],
       [ 8.,  9., 10., 11.]])

In[3]: t[t==t]
Out[3]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  8.,  9., 10., 11.])

In[4]: t[t!=t]
Out[4]: array([nan, nan])

In[5]: t[np.isnan(t)]
Out[5]: array([nan, nan])

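Exercise

Replace every nan element in an array with the mean of its column/row

import numpy as np

def fill_ndarray(t):
    """Replace each nan in t, in place, with the mean of the non-nan values in its column."""
    for i in range(t.shape[1]): # iterate over the columns: 0, 1, 2, 3
        temp_col = t[:,i] # current column (a view into t): [0. 4. 8.] [1. 5. 9.] [ 2. nan 10.] [ 3. nan 11.]
        nan_number = np.count_nonzero(temp_col!=temp_col) # number of nan values in this column
    
        if nan_number!=0: # nonzero means this column contains nan
            temp_non_nan_col = temp_col[temp_col==temp_col] # the non-nan values of this column: [2, 10] [3, 11]
            temp_col_mean = temp_non_nan_col.mean() # their mean: 6, 7
            # replace the nan values in this column with the mean; writing through the view updates t
            temp_col[temp_col!=temp_col] = temp_col_mean
    
    return t

if __name__ == '__main__':
    t = np.arange(12).reshape(3,4).astype("float")
    t[1,2:] = np.nan
    print(t)
    print('-'*30)
    
    t = fill_ndarray(t)
    print(t)

Output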

[[ 0.  1.  2.  3.]
 [ 4.  5. nan nan]
 [ 8.  9. 10. 11.]]
------------------------------
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
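
The same result can be reached without an explicit loop. This is a sketch of an alternative, not the approach used above: it assumes `np.nanmean` (which ignores nan when averaging) and `np.where`, and relies on broadcasting to line up the per-column means with the rows.

```python
import numpy as np

t = np.arange(12).reshape(3, 4).astype(float)
t[1, 2:] = np.nan

col_means = np.nanmean(t, axis=0)              # per-column mean, ignoring nan
filled = np.where(np.isnan(t), col_means, t)   # col_means broadcasts across the rows
print(filled)
# [[ 0.  1.  2.  3.]
#  [ 4.  5.  6.  7.]
#  [ 8.  9. 10. 11.]]
```

Unlike the loop version, this returns a new array and leaves `t` untouched. A column that is entirely nan has no mean to fill in, so `np.nanmean` warns and that column stays nan.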

Concatenating arrays

Swapping rows and columns of an array
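
A minimal sketch of both operations, assuming `np.vstack` and `np.hstack` for concatenation and fancy-index assignment for swapping; the post does not say which functions it had in mind, and the arrays here are made up for illustration.

```python
import numpy as np

t1 = np.arange(6).reshape(2, 3)
t2 = np.arange(6, 12).reshape(2, 3)

print(np.vstack((t1, t2)).shape)  # (4, 3): stacked vertically, more rows
print(np.hstack((t1, t2)).shape)  # (2, 6): stacked horizontally, more columns

t = np.arange(12).reshape(3, 4)
t[[0, 2], :] = t[[2, 0], :]       # swap rows 0 and 2
print(t[0])                       # [ 8  9 10 11]
t[:, [0, 1]] = t[:, [1, 0]]       # swap columns 0 and 1
print(t[0])                       # [ 9  8 10 11]
```

The swap works in one statement because indexing with a list on the right-hand side produces a copy before anything is assigned.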

More useful NumPy methods

1. Getting the position of the maximum and minimum

np.argmax(t,axis=0)

np.argmin(t,axis=1)

In [158]: t
Out[158]: 
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.]])
       
In [159]: np.argmax(t)
Out[159]: 11

In [160]: np.argmax(t,axis=0)
Out[160]: array([2, 2, 2, 2], dtype=int64)

In [161]: np.argmax(t,axis=1)
Out[161]: array([3, 3, 3], dtype=int64)

2. Create an array of all zeros: np.zeros((3,4))


array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

3. Create an array of all ones: np.ones((3,4))


array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

4. Create a square array (square matrix) with ones on the diagonal: np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

Generating random numbers with NumPy


In [162]: np.random.rand(2,3)
Out[162]: 
array([[0.77432552, 0.63033706, 0.09235116],
       [0.60838502, 0.90280121, 0.30387906]])
       
       
In [163]: np.random.randn(2,3)
Out[163]: 
array([[ 1.87953398,  0.01792738,  0.14404258],
       [-0.55321782, -1.31781243,  0.03661315]])

In [165]: np.random.randint(1,10,(2,3))
Out[165]: 
array([[4, 8, 3],
       [3, 2, 1]])
       

In [166]: np.random.uniform(1,10,(2,3))
Out[166]: 
array([[3.65153672, 1.79691893, 5.35748968],
       [8.44445194, 2.0842863 , 2.65346309]])


import numpy as np

np.random.seed(10)
t = np.random.randint(0,10,(3,4))
print(t)



[[9 4 0 1]
 [9 0 1 8]
 [9 0 8 6]]

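With the seed fixed, the same random numbers are generated on every run

More on distributions

1. Uniform distribution

Every sub-interval of the same width within the range is equally likely

2. Normal distribution

Bell-shaped: low at both ends, high in the middle, and symmetric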
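
The two distributions can be compared by drawing many samples and looking at summary statistics. This is a sketch with an arbitrary seed and sample size; the printed values vary slightly with the seed, so only their approximate size is noted.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

u = np.random.uniform(1, 10, 100_000)   # uniform on [1, 10)
n = np.random.randn(100_000)            # standard normal: mean 0, standard deviation 1

print(u.min() >= 1, u.max() < 10)       # True True
print(u.mean())                         # close to 5.5, the midpoint of the range
print(n.mean(), n.std())                # close to 0 and 1
```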

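A NumPy caveat: copy and view

a = b: nothing is copied; a and b are the same object, so a change made through one is visible through the other
a = b[:]: a view. Slicing creates a new object a, but its data is still held by b, so changes to the data show up in both
a = b.copy(): a copy; a and b do not affect each other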
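
The three cases can be told apart with `is` and `np.shares_memory`. A small sketch with an illustrative array (the name `v` for the view is chosen here to keep all three results side by side):

```python
import numpy as np

b = np.arange(5)

a = b            # same object, nothing copied
v = b[:]         # view: a new object that shares b's data
c = b.copy()     # independent copy

b[0] = 100
print(a is b, a[0])             # True 100
print(v is b, v[0])             # False 100
print(c[0])                     # 0
print(np.shares_memory(v, b))   # True
print(np.shares_memory(c, b))   # False
```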
