Python Data Analysis: NumPy Notes

Fade__d

Article link: https://blog.csdn.net/Fade__d/article/details/77150892

Basics

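NumPy is built around the N-dimensional array object (ndarray), which does not have to be a square matrix. Arithmetic on whole arrays uses the same syntax as arithmetic on scalars.
All elements of an ndarray must have the same type. Every ndarray has a shape (a tuple giving the size of each dimension) and a dtype (an object describing the array's data type). For example, if data is a 2*3 array of floats: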

In : data.shape
Out: (2,3)
In : data.dtype
Out: dtype('float64')
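
As a small runnable illustration of shape, dtype and scalar-like arithmetic, here is a sketch in Python 3. The values in data are arbitrary and are not taken from the original notes.

```python
import numpy as np

# A 2x3 array of floats; the values are arbitrary illustration data.
data = np.array([[0.5, 1.5, 2.5],
                 [3.5, 4.5, 5.5]])

print(data.shape)  # (2, 3): 2 rows, 3 columns
print(data.dtype)  # float64
print(data.ndim)   # 2

# Arithmetic uses the same syntax as for scalars and is applied element by element.
doubled = data * 2
shifted = data + 1.0
print(doubled)
print(shifted)
```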

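While creating an ndarray in PyCharm I hit an error. I had named my script numpy.py and then wrote import numpy as np in it, which failed with an AttributeError saying the module has no attribute 'array' (the original post showed a screenshot here).
It turned out that, because the file was called numpy.py, import numpy picked up my own numpy.py instead of the NumPy package. Renaming the script fixes it. See https://stackoverflow.com/questions/36530726/using-numpy-module-object-has-no-attribute-array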
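
One quick way to check which file an import actually resolved to is to look at the module's __file__ attribute. This is only a diagnostic sketch; the variable name module_path is made up for the example.

```python
import numpy as np

# For the real package this points into the installed numpy directory
# (a path ending in numpy/__init__.py). If it points at your own numpy.py,
# rename that script and delete any stale numpy.pyc next to it.
module_path = np.__file__
print(module_path)

# The real package exposes array(); a shadowing script usually does not.
print(hasattr(np, "array"))
```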

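1. A problem when changing the data type by assigning to dtype (the transcripts in these notes come from Python 2, hence the print statement and the L suffix in shapes such as (3L,)):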

Input:      a1 = np.array([1,2,3], dtype = np.int32)
            print a1,  a1.shape          # values and shape
Output:     [1 2 3] (3L,)

Input:      a1.dtype = np.int16          # force a new dtype onto the same bytes
            print a1,  a1.shape
Output:     [1 0 2 0 3 0] (6L,)          # the shape changed

Input:      a1.dtype = np.float32
            print a1,  a1.shape         # force another dtype
Output:     [  1.40129846e-45   2.80259693e-45   4.20389539e-45] (3L,)          # the values are wrong
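
The odd results come from reinterpreting the same bytes under a new dtype rather than converting the values. The sketch below, in Python 3, uses view() to get the same reinterpretation without modifying a1 (a substitution for assigning to dtype, not the code from these notes), and contrasts it with astype(), which converts each value.

```python
import numpy as np

a1 = np.array([1, 2, 3], dtype=np.int32)

# view() reinterprets the same 12 bytes; no values are converted.
as_int16 = a1.view(np.int16)      # 6 items of 2 bytes each
as_float32 = a1.view(np.float32)  # 3 items, integer bit patterns read as floats

# astype() converts each value and returns a new array.
converted = a1.astype(np.float32)

print(as_int16.shape)  # (6,)
print(as_float32)      # tiny denormal numbers, not 1.0, 2.0, 3.0
print(converted)       # [1. 2. 3.]
```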

2. The astype method converts the data type, but one example left me with the following question:

Input:      numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.string_)
            print numeric_strings.dtype
Output:     |S4

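S is short for string, so I first took |S4 to mean that the array itself was a string of length 4. But len(numeric_strings) returns 3, and the array contains no newline characters. After looking up the definition of string_, I found that S4 means each element of the array is a string and that the element length is 4, which is the length of the longest element. Another example makes this clear: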

Input:      numeric_strings = np.array(['1.25', '-9.6', '1.2345678'], dtype=np.string_)
            print numeric_strings.dtype
Output:     |S9
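
To separate the number of elements from the per-element width, here is a Python 3 sketch. It uses np.bytes_ because np.string_ was an alias for it and is no longer available in NumPy 2.0 and later; that substitution is mine, not part of the original notes.

```python
import numpy as np

numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)

print(numeric_strings.dtype)           # |S4: byte strings, the longest has 4 bytes
print(len(numeric_strings))            # 3: number of elements
print(numeric_strings.dtype.itemsize)  # 4: bytes reserved per element

# astype() parses each byte string into a float.
as_floats = numeric_strings.astype(np.float64)
print(as_floats)
```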

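3. When slicing arrays, note that a shape of (3,) denotes a one-dimensional array with 3 elements. It is tempting to read it as 1 row by 3 columns with the 1 left out, but the array has a single axis and is neither a row vector of shape (1, 3) nor a column vector of shape (3, 1). This matters most when working with row and column vectors, because it easily leads to wrong assumptions about matrices. For example: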

Input:      aa2 = np.array([1,2,3])
            print aa2.shape
Output:     (3L,)
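
The difference between a one-dimensional array and explicit row or column vectors can be seen directly from shape and ndim. A short Python 3 sketch, where the names row and col are introduced only for illustration:

```python
import numpy as np

aa2 = np.array([1, 2, 3])
row = aa2.reshape(1, 3)    # explicit row vector, two axes
col = aa2[:, np.newaxis]   # explicit column vector, two axes

print(aa2.shape, aa2.ndim)  # (3,) 1
print(row.shape, row.ndim)  # (1, 3) 2
print(col.shape, col.ndim)  # (3, 1) 2
```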

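4. Boolean arrays can be used as array indices, but the length of the boolean array must match the length of the axis being indexed (the number of rows). Rows where the value is True are returned and rows where it is False are dropped. For example, if names == 'Bob' evaluates to [True, False, False, True, False, False], then data[names == 'Bob'] returns only data[0] and data[3].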
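
A minimal Python 3 sketch of this kind of indexing follows. The contents of names other than 'Bob' and all values in data are assumptions made up for the example; only the pattern of True values matches the text above.

```python
import numpy as np

# Hypothetical names; 'Bob' appears at positions 0 and 3.
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe'])
# Hypothetical data: one row per name, 6 rows of 2 values.
data = np.arange(12).reshape(6, 2)

mask = names == 'Bob'   # boolean array with one entry per row of data
print(mask)             # True at positions 0 and 3
print(data[mask])       # rows 0 and 3 of data
```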

5. The transpose method
For ordinary low-dimensional arrays, the T attribute is enough to transpose the array. For example:

Input:      arr = np.array([[1, 2, 3],[4, 5, 6]])
Input:      print arr.shape
Input:      print arr.T.shape
Output:     (2, 3)   (3, 2)

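For higher-dimensional arrays the book says that transpose needs a tuple of axis numbers in order to permute those axes, which is hard to parse at first. Here is one way to read it. Take a 3-D array X and look at X.shape, which returns the tuple (2, 3, 4). The positions in that tuple are the axis numbers (0, 1, 2): axis 0 has length 2, axis 1 has length 3, and axis 2 has length 4. The argument to transpose is a tuple of these axis numbers. In the code below, the argument (1, 0, 2) puts axis 1 (length 3) first, axis 0 (length 2) second, and axis 2 (length 4) last, so the transposed array has shape (3, 2, 4).
Before the transpose every element has an index; for example, 13 sits at (1, 0, 1). After the change above, its index becomes (0, 1, 1).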

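Conclusion: transpose() changes the indices at which the elements of X are found; the values themselves are untouched.
In mathematical terms: if X has shape (a, b, c), any element can be reached as X[x, y, z] with $0 \le x \le a-1,\ 0 \le y \le b-1,\ 0 \le z \le c-1$. transpose((1, 0, 2)) maps the index (x, y, z) to (y, x, z), the shape becomes (b, a, c), and the element that was at X[x, y, z] is found at index [y, x, z] of the transposed array.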

Input:      X = np.arange(24).reshape((2, 3, 4))
Input:      print X
Output:     [[[ 0  1  2  3]
              [ 4  5  6  7]
              [ 8  9 10 11]]
             [[12 13 14 15]
              [16 17 18 19]
              [20 21 22 23]]]
Input:      print X.shape
Output:     (2L, 3L, 4L)
Input:      print X.transpose((1, 0, 2))
Output:     [[[ 0  1  2  3]
              [12 13 14 15]]
             [[ 4  5  6  7]
              [16 17 18 19]]
             [[ 8  9 10 11]
              [20 21 22 23]]]
Input:      print X.transpose((1, 0, 2)).shape
Output:     (3L, 2L, 4L)

In addition, for this 3-D array X.T gives the same result as X.transpose((2, 1, 0)).
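
The index rule can be checked element by element. The following Python 3 sketch uses the same X as the transcript; the names Xt and same are introduced only for the check.

```python
import numpy as np

X = np.arange(24).reshape((2, 3, 4))
Xt = X.transpose((1, 0, 2))   # swap axes 0 and 1, keep axis 2

print(X.shape, Xt.shape)        # (2, 3, 4) (3, 2, 4)
print(X[1, 0, 1], Xt[0, 1, 1])  # 13 13

# Every element obeys Xt[y, x, z] == X[x, y, z].
same = all(
    Xt[y, x, z] == X[x, y, z]
    for x in range(2) for y in range(3) for z in range(4)
)
print(same)  # True

# .T reverses the order of all axes.
print(np.array_equal(X.T, X.transpose((2, 1, 0))))  # True
```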

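6. The np.meshgrid function
np.meshgrid takes two one-dimensional arrays and produces two two-dimensional matrices that together hold every (x, y) pair formed from the two arrays.
For example: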

Input:      x = np.arange(-3,3)
            y = np.arange(-2,2)
            X, Y = np.meshgrid(x, y)
            print X, '\n', Y
Output:     [[-3 -2 -1  0  1  2]
             [-3 -2 -1  0  1  2]
             [-3 -2 -1  0  1  2]
             [-3 -2 -1  0  1  2]]         # X (4*6): x as a row, repeated for 4 rows
            [[-2 -2 -2 -2 -2 -2]
             [-1 -1 -1 -1 -1 -1]
             [ 0  0  0  0  0  0]
             [ 1  1  1  1  1  1]]         # Y (4*6): y as a column, repeated for 6 columns

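Both X and Y have shape (len(y), len(x)), here (4, 6).
While working on this I noticed something interesting: transpose has no effect on a one-dimensional array. To turn such an array into a column, change its shape instead, as in the following program: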

Input:      x = np.arange(-3,3)
            y = np.arange(-2,2)
            x0 = np.transpose(x)
            x1 = x.reshape(6,1)
            x2 = np.transpose(x1)
            print x0, x1, x2
Output:     [-3 -2 -1  0  1  2]           # x0, shape (6,)
            [[-3]
             [-2]
             [-1]
             [ 0]
             [ 1]
             [ 2]]                       # x1, shape (6, 1)
            [[-3 -2 -1  0  1  2]]        # x2, shape (1, 6)

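In other words, transpose does nothing to an array of shape (6,) but works on one of shape (1, 6), even though mathematically both look like the same row vector. To NumPy, (6,) is one-dimensional and (1, 6) is two-dimensional.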
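
A compact Python 3 check of this behaviour follows. np.atleast_2d is used here as one convenient way to get a (1, 6) array before transposing; it is an alternative to reshape, not the approach used in these notes.

```python
import numpy as np

x = np.arange(-3, 3)

# Transposing a 1-D array reverses a single axis, so nothing changes.
print(np.transpose(x).shape)  # (6,)

# Promote to 2-D first, then the transpose has two axes to swap.
row = np.atleast_2d(x)        # shape (1, 6)
col = row.T                   # shape (6, 1)
print(row.shape, col.shape)   # (1, 6) (6, 1)
```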

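Back to meshgrid. While reading around I found an expert who described what it really does with formulas. For x = -3:1:2 and y = -2:1:1 (MATLAB-style ranges, matching np.arange(-3, 3) and np.arange(-2, 2) above), xs, ys = meshgrid(x, y) can be written as follows:
$xs=xones*x$, where $xones$ is an all-ones matrix sized to fit $x$, here $[ones(size(y))]'$, a 4*1 matrix, so that $(4,1)*(1,6)=(4,6)$.
$ys=y'*yones$, where $yones$ is an all-ones matrix sized to fit $y$, here $[ones(size(x))]$, a 1*6 matrix, so that $(4,1)*(1,6)=(4,6)$.
A program following this description: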

Input:      x = np.arange(-3,3)
            y = np.arange(-2,2)
            y1 = np.ones(np.shape(y))
            y1T = y1.reshape(4,1)
            xs = y1T*x
            x1 = np.ones(np.shape(x))
            yT = y.reshape(4,1)
            ys = yT*x1
            print xs,'\n', ys
Output:     [[-3. -2. -1.  0.  1.  2.]
             [-3. -2. -1.  0.  1.  2.]
             [-3. -2. -1.  0.  1.  2.]
             [-3. -2. -1.  0.  1.  2.]]           # xs=[ones(size(y))]'*x
            [[-2. -2. -2. -2. -2. -2.]
             [-1. -1. -1. -1. -1. -1.]
             [ 0.  0.  0.  0.  0.  0.]
             [ 1.  1.  1.  1.  1.  1.]]          # ys=y'*[ones(size(x))]
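
The program above relies on the * operator broadcasting a (4, 1) array against a 1-D array, which for these shapes gives the same result as the matrix products in the formulas. The Python 3 sketch below compares both readings against np.meshgrid itself; np.outer and np.broadcast_arrays are my choice of tools for the comparison, and the names xb and yb are made up for the example.

```python
import numpy as np

x = np.arange(-3, 3)   # shape (6,)
y = np.arange(-2, 2)   # shape (4,)
X, Y = np.meshgrid(x, y)

# Outer products with vectors of ones:
# xs = ones(size(y))' * x and ys = y' * ones(size(x)).
xs = np.outer(np.ones_like(y), x)
ys = np.outer(y, np.ones_like(x))

# Broadcasting gives the same grids without building the ones vectors.
xb, yb = np.broadcast_arrays(x[np.newaxis, :], y[:, np.newaxis])

print(np.array_equal(xs, X), np.array_equal(ys, Y))  # True True
print(np.array_equal(xb, X), np.array_equal(yb, Y))  # True True

# Position [i, j] of the grids holds the pair (x[j], y[i]).
print(X[1, 4], Y[1, 4])  # 1 -1
```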