Numpy

NightCharm

Article link: https://blog.csdn.net/nightcharm/article/details/62041119

NumPy is the foundational package for high-performance scientific computing and data analysis.

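Its main features are:

1. ndarray, a fast, space-efficient multidimensional array with vectorized arithmetic and sophisticated broadcasting capabilities

2. Standard mathematical functions for fast operations on whole arrays of data, with no need to write loops

3. Linear algebra, random number generation, and Fourier transform capabilities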

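NumPy array attributes

Before looking at NumPy arrays in detail, here are their basic attributes. The number of dimensions of a NumPy array is called its rank: a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on. In NumPy each dimension is called an axis (plural: axes), and the rank is the number of axes. For example, a two-dimensional array can be viewed as a one-dimensional array whose elements are themselves one-dimensional arrays. The first axis runs over the outer array, and the second axis runs over the arrays inside it.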

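The more important attributes of an ndarray object are:

ndarray.ndim: the number of dimensions (axes) of the array, equal to its rank. Two-dimensional arrays (matrices) are the most common.
ndarray.shape: the dimensions of the array, as a tuple of integers giving the size of the array along each axis. For a two-dimensional array it holds the number of rows and the number of columns. The length of the shape tuple is the number of dimensions, ndim.
ndarray.size: the total number of elements in the array, equal to the product of the entries of shape.
ndarray.dtype: an object describing the type of the elements in the array. A dtype can be created or specified using standard Python types, and NumPy also provides types of its own, such as numpy.int32 and numpy.float64.
ndarray.itemsize: the size in bytes of each element of the array. For example, an array of float64 elements has itemsize 8 (float64 occupies 64 bits and a byte is 8 bits, so 64/8 = 8 bytes), and an array of complex64 elements also has itemsize 8 (64/8).
ndarray.data: the buffer containing the actual elements of the array. Elements are normally accessed through indexing, so this attribute is rarely needed.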
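
The attributes listed above can be checked on a small array. This is a minimal sketch; the array a and its 3x4 shape are chosen here only for illustration.

```python
import numpy as np

# A 3x4 array of float64 values, used only for illustration.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

print(a.ndim)      # 2: the number of axes (the rank)
print(a.shape)     # (3, 4): the size along each axis
print(a.size)      # 12: total number of elements, 3 * 4
print(a.dtype)     # float64
print(a.itemsize)  # 8: bytes per element, 64 / 8
```

Note that len(a.shape) equals a.ndim, and the product of the entries of a.shape equals a.size.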

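1. Array creation functions

Function	Description
array	Converts input data (a list, tuple, array, or other sequence type) to an ndarray. The dtype is either inferred or specified explicitly. The input data is copied by default
asarray	Converts the input to an ndarray, without copying if the input is already an ndarray
arange	Like the built-in range, but returns an ndarray instead of a list
ones, ones_like	Creates an array of all 1s with the given shape and dtype. ones_like takes another array as its argument and creates an all-1s array with the same shape and dtype
zeros, zeros_like	Like ones and ones_like, but produces arrays of all 0s
empty, empty_like	Creates a new array by allocating memory only, without filling in any values
eye, identity	Creates a square N×N identity matrix (1s on the diagonal, 0s elsewhere)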
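
The difference between array (copies by default) and asarray (no copy for an existing ndarray) is easy to miss, so here is a small sketch of it. The variable names are made up for this example.

```python
import numpy as np

src = np.array([1, 2, 3])

copied = np.array(src)     # array copies its input by default
aliased = np.asarray(src)  # asarray returns the same ndarray, no copy

src[0] = 99
print(copied[0])       # 1: the copy is unaffected
print(aliased[0])      # 99: aliased shares its data with src
print(aliased is src)  # True

ones = np.ones_like(copied)  # same shape and dtype as copied, filled with 1
ident = np.eye(3)            # 3x3 identity matrix
```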

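1. array

Let us start with creating arrays. There are many ways to create one. For example, the array function builds an array from a regular Python list or tuple, and the type of the resulting array is deduced from the types of the elements in the sequence.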

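import numpy as np
a = np.array([2, 3, 4])
print(a)
# [2 3 4]
a.dtype
# dtype('int64')  (dtype('int32') on some platforms)
b = np.array([1.2, 3.5, 5.1])
b.dtype
# dtype('float64')

A sequence of sequences produces a two-dimensional array, a sequence of sequences of sequences produces a three-dimensional array, and so on.

b = np.array([(1.5, 2, 3), (4, 5, 6)])
print(b)
# [[ 1.5  2.   3. ]
#  [ 4.   5.   6. ]]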

The element type can be specified explicitly when the array is created.

c = np.array([[1, 2], [3, 4]], dtype=complex)
print(c)
# [[ 1.+0.j  2.+0.j]
#  [ 3.+0.j  4.+0.j]]

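2. zeros and ones

Often the elements of an array are not known at first, but its size is. NumPy therefore offers several functions that create arrays with placeholder content. They reduce the need to grow arrays later, which is an expensive operation.
The function zeros creates an array full of 0s, ones creates an array full of 1s, and empty creates an array whose initial content is uninitialized and depends on the state of the memory. By default, the dtype of the created array is float64.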

d = np.zeros((3,4))
print(d)
print(d.dtype)
print(d.dtype.itemsize)
# [[ 0.  0.  0.  0.]
#  [ 0.  0.  0.  0.]
#  [ 0.  0.  0.  0.]]
# float64
# 8

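You can also specify the element type of the array yourself.

a = np.ones((2, 3, 4), dtype=int)
print(a)
# [[[1 1 1 1]
#   [1 1 1 1]
#   [1 1 1 1]]
#
#  [[1 1 1 1]
#   [1 1 1 1]
#   [1 1 1 1]]]
b = np.empty((2, 3))
print(b)
# The values are uninitialized, so the output varies from run to run, for example:
# [[  2.67276450e+185   1.69506143e+190   1.75184137e+190]
#  [  9.48819320e+077   1.63730399e-306   2.11392372e-307]]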

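3. arange

NumPy provides arange, a function similar to the built-in range that returns the sequence of numbers as an array rather than a list.

Argument 1: start value; argument 2: end value (exclusive); argument 3: step
c = np.arange(5, 30, 1)
print(c)
# [ 5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29]

The result is an arithmetic progression (floating-point arguments are accepted)
d = np.arange(0, 2, 0.5)
print(d)
# [ 0.   0.5  1.   1.5]
An arithmetic progression starting at 0 with a common difference of 0.5

Note: when arange is used with floating-point arguments, the finite precision of floating-point numbers generally makes it impossible to predict the number of elements obtained. It is therefore better to use linspace, which takes the number of elements wanted, instead of specifying a step with arange.

Argument 1: start value; argument 2: end value (inclusive); argument 3: number of results
a = np.linspace(0, 10, 5)
print(a)
# [  0.    2.5   5.    7.5  10. ]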
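
To make the note about floating-point steps concrete, the sketch below compares the two functions on the same interval. The interval 0.1 to 0.4 is just an illustrative choice. The length of the arange result is printed rather than stated, because it depends on rounding.

```python
import numpy as np

# With a float step, rounding error in (stop - start) / step can push the
# computed length up by one, so the stop value may unexpectedly appear.
by_step = np.arange(0.1, 0.4, 0.1)
print(len(by_step), by_step)

# linspace takes the number of points directly; both endpoints are included.
by_count = np.linspace(0.1, 0.4, 4)
print(len(by_count), by_count)
```

With linspace the length is always the requested count, which is why it is the safer choice for non-integer steps.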

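2. Basic data types in NumPy

Name	Description
bool	Boolean (True or False) stored as one byte
int_	Integer whose size depends on the platform (usually int32 or int64)
int8	One-byte integer, -128 to 127
int16	Integer, -32768 to 32767
int32	Integer, -2**31 to 2**31-1
int64	Integer, -2**63 to 2**63-1
uint8	Unsigned integer, 0 to 255
uint16	Unsigned integer, 0 to 65535
uint32	Unsigned integer, 0 to 2**32-1
uint64	Unsigned integer, 0 to 2**64-1
float16	Half-precision float: 16 bits, with 1 sign bit, 5 exponent bits, 10 mantissa bits
float32	Single-precision float: 32 bits, with 1 sign bit, 8 exponent bits, 23 mantissa bits
float64 or float	Double-precision float: 64 bits, with 1 sign bit, 11 exponent bits, 52 mantissa bits
complex64	Complex number, with the real and imaginary parts each stored as a 32-bit float
complex128 or complex	Complex number, with the real and imaginary parts each stored as a 64-bit float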
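
The ranges and bit layouts in the table do not have to be memorized. As a small sketch, np.iinfo and np.finfo report them for any integer or floating-point type.

```python
import numpy as np

i32 = np.iinfo(np.int32)
print(i32.min, i32.max)   # -2147483648 2147483647

u8 = np.iinfo(np.uint8)
print(u8.min, u8.max)     # 0 255

for t in (np.float16, np.float32, np.float64):
    info = np.finfo(t)
    # bits = 1 sign bit + nexp exponent bits + nmant mantissa bits
    print(t.__name__, info.bits, info.nexp, info.nmant)
```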

NumPy types can be used to convert values as follows:

print(np.float64(46))
# 46.0
print(np.int8(42.0))
# 42
print(np.bool_(46))
# True
print(np.bool_(24.0))
# True
print(np.float64(True))
# 1.0

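3. Printing arrays

When an array is printed, NumPy displays it in a layout similar to nested lists:

The last axis is printed from left to right
The second-to-last axis is printed from top to bottom
The remaining axes are also printed from top to bottom, with each slice separated from the next by an empty line
One-dimensional arrays are printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.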

a = np.arange(6)
print(a)
#[0 1 2 3 4 5]

b = np.arange(12).reshape(4,3)
print(b)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

c = np.arange(24).reshape(2,3,4)
print(c)
# [[[ 0  1  2  3]
#   [ 4  5  6  7]
#   [ 8  9 10 11]]
#
#  [[12 13 14 15]
#   [16 17 18 19]
#   [20 21 22 23]]]

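4. Methods of the ndarray object

ndarray.ptp(axis=None, out=None): returns the range (maximum minus minimum) of the whole array or along the given axis
ndarray.clip(a_min, a_max, out=None): elements smaller than a_min are set to a_min, and elements larger than a_max are set to a_max
ndarray.all(): returns True if all elements are true, otherwise False
ndarray.any(): returns True if at least one element is true
ndarray.swapaxes(axis1, axis2): returns the array with the two axes interchanged, for example:
>>> z = np.array([[2, 3], [4, 5], [6, 7], [8, 9]])
>>> z.swapaxes(0,1)
array([[2, 4, 6, 8],
       [3, 5, 7, 9]])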
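
A short sketch of the methods above on one small array. The array z is made up for this example. The function form np.ptp is used here because NumPy 2.0 removed the ndarray.ptp method, while np.ptp works on both old and new versions.

```python
import numpy as np

z = np.array([[2, 3], [4, 5], [6, 7], [8, 9]])

print(np.ptp(z))               # 7: overall max - min, 9 - 2
print(np.ptp(z, axis=0))       # [6 6]: max - min of each column
print(z.clip(4, 7))            # values below 4 become 4, above 7 become 7
print((z > 0).all())           # True: every element is positive
print((z > 8).any())           # True: at least one element exceeds 8
print(z.swapaxes(0, 1).shape)  # (2, 4)
```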

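Methods that change the shape and size of an array

ndarray.reshape(shape[, order]): returns an array with the same data in a new shape; the number of elements is unchanged.

ndarray.resize(new_shape[, refcheck]): changes the shape and size of the array in place (the number of elements may change).

ndarray.transpose(*axes): returns the array with its axes transposed; for a matrix this is the matrix transpose.

ndarray.swapaxes(axis1, axis2): returns the array with the two axes interchanged.

ndarray.flatten([order]): returns a copy of the array collapsed into one dimension.

ndarray.ravel([order]): returns the flattened one-dimensional array, as a view where possible.

ndarray.squeeze([axis]): removes axes of length 1.

ndarray.tolist(): converts the array to a (nested) Python list.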
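
The copy-versus-view distinction between flatten and ravel, and the in-place behaviour of resize, are easiest to see by writing into the results. The following is a minimal sketch with made-up variable names; refcheck=False is passed only so that resize does not fail in interactive sessions that hold extra references to the array.

```python
import numpy as np

a = np.arange(6)

r = a.reshape(2, 3)   # new shape, same 6 elements
flat = r.flatten()    # always a copy
rav = r.ravel()       # a view here, since r is contiguous

flat[0] = 100         # does not touch a
rav[1] = 200          # writes through to a
print(a)              # [  0 200   2   3   4   5]

b = np.arange(6)
b.resize((2, 4), refcheck=False)  # in place; the 2 new slots are zero-filled
print(b.shape)        # (2, 4)

print(np.zeros((1, 3, 1)).squeeze().shape)  # (3,)
print(r.transpose().shape)                  # (3, 2)
print(r.tolist())     # [[0, 200, 2], [3, 4, 5]]
```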

ndarray.take(indices, axis=None, out=None, mode='raise'): returns the elements at the given indices, for example:

>>> a=np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a.take([1,3],axis=1) # take the columns at indices 1 and 3
array([[ 1,  3],
       [ 5,  7],
       [ 9, 11]])

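numpy.put(a, ind, v, mode='raise'): replaces the elements of array a at the indices ind with the values v. mode may be 'raise', 'wrap', or 'clip'. With 'clip', an index beyond the end of the array is clipped to the last valid index, so the last element is replaced.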
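
A small sketch of numpy.put with the three kinds of index handling described above. The arrays and values are chosen only for illustration.

```python
import numpy as np

a = np.arange(5)
np.put(a, [0, 2], [-44, -55])   # replace the elements at indices 0 and 2
print(a)                        # [-44   1 -55   3   4]

b = np.arange(5)
np.put(b, 22, -5, mode='clip')  # index 22 is clipped to the last index, 4
print(b)                        # [ 0  1  2  3 -5]

c = np.arange(5)
np.put(c, 6, -5, mode='wrap')   # index 6 wraps around to 6 % 5 == 1
print(c)                        # [ 0 -5  2  3  4]
```

With the default mode='raise', the out-of-range indices used for b and c would raise an IndexError instead.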

numpy.repeat(a, repeats, axis=None): repeats the elements of an array, for example:

>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 2)
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4]])
>>> np.repeat(x, [1, 2], axis=0)
array([[1, 2],
       [3, 4],
       [3, 4]])

numpy.tile(A, reps): repeats the array A the number of times given by reps. Unlike repeat, which repeats individual elements, tile repeats the whole array.
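
The contrast between repeat and tile described above, as a minimal sketch with made-up inputs:

```python
import numpy as np

x = np.array([1, 2, 3])
print(np.repeat(x, 2))   # [1 1 2 2 3 3]: each element is repeated
print(np.tile(x, 2))     # [1 2 3 1 2 3]: the whole array is repeated

m = np.array([[1, 2], [3, 4]])
print(np.tile(m, (2, 2)))
# [[1 2 1 2]
#  [3 4 3 4]
#  [1 2 1 2]
#  [3 4 3 4]]
```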

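ndarray.var(axis=None, dtype=None, out=None, ddof=0): returns the variance of the array elements along the given axis.

ndarray.std(axis=None, dtype=None, out=None, ddof=0): returns the standard deviation of the array elements along the given axis.

ndarray.prod(axis=None, dtype=None, out=None): returns the product of the array elements along the given axis.

ndarray.cumprod(axis=None, dtype=None, out=None): returns the cumulative product of the elements along the given axis, for example:

>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a.cumprod(axis=1)  # cumulative product along each row
array([[   0,    0,    0,    0],
       [   4,   20,  120,  840],
       [   8,   72,  720, 7920]])

ndarray.mean(axis=None, dtype=None, out=None): returns the mean of the array elements along the given axis.

ndarray.cumsum(axis=None, dtype=None, out=None): returns the cumulative sum of the elements along the given axis, for example:

>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a.cumsum(axis=1)
array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])

ndarray.sum(axis=None, dtype=None, out=None): returns the sum of the array elements along the given axis.

ndarray.trace(offset=0, axis1=0, axis2=1, dtype=None, out=None): returns the sum of the elements along the diagonal of the array.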
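
The reductions above all take the same axis argument, and var and std additionally take ddof. The sketch below, on made-up data, shows the effect of both.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

print(a.sum())          # 66
print(a.sum(axis=0))    # [12 15 18 21]: column sums
print(a.mean(axis=1))   # [1.5 5.5 9.5]: row means
print(a.prod(axis=1))   # [   0  840 7920]
print(a.trace())        # 15: 0 + 5 + 10

row = np.array([1.0, 2.0, 3.0, 4.0])
print(row.var())        # 1.25: divides by N (ddof=0)
print(row.var(ddof=1))  # divides by N - 1, the sample variance
print(row.std())        # square root of row.var()
```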

ndarray.round(decimals=0, out=None): rounds each element to the given number of decimals. Values exactly halfway between two candidates are rounded to the nearest even value. The examples below use the equivalent function np.around:

>>> np.around([0.37, 1.64])
array([ 0., 2.])
>>> np.around([0.37, 1.64], decimals=1)
array([ 0.4, 1.6])
>>> np.around([.5, 1.5, 2.5, 3.5, 4.5]) # rounds to nearest even value
array([ 0., 2., 2., 4., 4.])
>>> np.around([1,2,3,11], decimals=1) # ndarray of ints is returned
array([ 1, 2, 3, 11])
>>> np.around([1,2,3,11], decimals=-1)
array([ 0, 0, 0, 10])

ndarray.conj(): returns the complex conjugate of every element, for example:

>>> b=np.array([[1+2j,3+0j],[3+4j,7+5j]])
>>> b
array([[ 1.+2.j,  3.+0.j],
       [ 3.+4.j,  7.+5.j]])
>>> b.conj()
array([[ 1.-2.j,  3.-0.j],
       [ 3.-4.j,  7.-5.j]])

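ndarray.argmin(axis=None, out=None): returns the indices of the minimum values along the given axis.

ndarray.min(axis=None, out=None): returns the minimum along the given axis.

ndarray.argmax(axis=None, out=None): returns the indices of the maximum values along the given axis.

ndarray.diagonal(offset=0, axis1=0, axis2=1): returns the elements of the specified diagonal.

ndarray.compress(condition, axis=None, out=None): returns the slices along the given axis that are selected by condition.

ndarray.nonzero(): returns the indices of the non-zero elements.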
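
The index-returning methods above can be tried on a small array. This is a sketch; the 2x3 array is made up for illustration.

```python
import numpy as np

a = np.array([[3, 0, 7],
              [9, 1, 0]])

print(a.argmin())        # 1: index into the flattened array
print(a.argmax(axis=1))  # [2 0]: position of the maximum in each row
print(a.min(axis=0))     # [3 0 0]: smallest value in each column
print(a.nonzero())       # (row indices, column indices) of non-zero entries
print(a[a.nonzero()])    # [3 7 9 1]
print(a.compress([True, False, True], axis=1))  # keeps columns 0 and 2
print(a.diagonal())      # [3 1]
```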

Random sampling (numpy.random)

Simple random data

rand(d0, d1, …, dn)	Random values in the given shape.
>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618], #random
       [ 0.37601032,  0.25528411], #random
       [ 0.49313049,  0.94909878]]) #random
randn(d0, d1, …, dn)	Returns a sample (or samples) from the standard normal distribution. For random samples from $N(\mu, \sigma^2)$, use:
sigma * np.random.randn(...) + mu
>>> np.random.randn()
2.1923875335537315 #random
Two-by-four array of samples from N(3, 6.25):
>>> 2.5 * np.random.randn(2, 4) + 3
array([[-4.49401501,  4.00950034, -1.81814867,  7.29718677], #random
       [ 0.39924804,  4.68456316,  4.99394529,  4.84057254]]) #random
randint(low[, high, size])	Returns random integers from the half-open interval [low, high).
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])
>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Generate a 2 x 4 array of ints between 0 and 4, inclusive:
>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])
random_integers(low[, high, size])	Returns random integers from the closed interval [low, high]. This function has been deprecated since NumPy 1.11; randint(low, high + 1) covers the same interval. To sample from N evenly spaced floating-point numbers between a and b, use:
a + (b - a) * (np.random.random_integers(N) - 1) / (N - 1.)
>>> np.random.random_integers(5)
4
>>> type(np.random.random_integers(5))
>>> np.random.random_integers(5, size=(3,2))
array([[5, 4],
       [3, 3],
       [4, 5]])
Choose five random numbers from the set of five evenly-spaced numbers between 0 and 2.5, inclusive (i.e., from the set $\{0, 5/8, 10/8, 15/8, 20/8\}$):
>>> 2.5 * (np.random.random_integers(5, size=(5,)) - 1) / 4.
array([ 0.625,  1.25 ,  0.625,  0.625,  2.5  ])
Roll two six sided dice 1000 times and sum the results:
>>> d1 = np.random.random_integers(1, 6, 1000)
>>> d2 = np.random.random_integers(1, 6, 1000)
>>> dsums = d1 + d2
Display results as a histogram:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(dsums, 11, density=True)
>>> plt.show()
random_sample([size])	Returns random floats in the half-open interval [0.0, 1.0). To sample $Unif[a, b), b > a$, multiply the output of random_sample by (b-a) and add a:
(b - a) * random_sample() + a
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428])
Three-by-two array of random numbers from [-5, 0):
>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
random([size])	Returns random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those of random_sample.)
ranf([size])	Returns random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those of random_sample.)
sample([size])	Returns random floats in the half-open interval [0.0, 1.0). (The official examples are identical to those of random_sample.)
choice(a[, size, replace, p])	Generates a random sample from a given one-dimensional array.
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4])
>>> #This is equivalent to np.random.randint(0,5,3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3,1,0])
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0])
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], dtype='<U11')
bytes(length)	Returns random bytes.
>>> np.random.bytes(10)
b' eh\x85\x022SZ\xbf\xa4' #random
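
The two scaling formulas quoted above, together with the half-open interval of randint, can be collected in one short sketch. The seed value and the variable names are arbitrary choices for this example, and the seed is set only so that repeated runs give the same numbers.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only to make the run repeatable

mu, sigma = 3.0, 2.5
# sigma * randn(...) + mu draws from N(mu, sigma^2), here N(3, 6.25)
normal_samples = sigma * np.random.randn(2, 4) + mu

a, b = -5.0, 0.0
# (b - a) * random_sample() + a draws uniformly from [a, b), here [-5, 0)
uniform_samples = (b - a) * np.random.random_sample((3, 2)) + a

dice = np.random.randint(1, 7, size=1000)      # high is exclusive: values 1..6
picks = np.random.choice(5, 3, replace=False)  # 3 distinct values from 0..4

print(normal_samples.shape, uniform_samples.shape)
print(dice.min(), dice.max(), picks)
```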

Permutations

shuffle(x)	Modifies a sequence in place by shuffling its contents (like shuffling a deck of cards).
>>> arr = np.arange(10)
>>> np.random.shuffle(arr)
>>> arr
[1 7 5 2 9 4 3 6 0 8]
This function only shuffles the array along the first index of a multi-dimensional array:
>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.shuffle(arr)
>>> arr
array([[3, 4, 5],
       [6, 7, 8],
       [0, 1, 2]])
permutation(x)	Returns a random permutation.
>>> np.random.permutation(10)
array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])
>>> np.random.permutation([1, 4, 9, 12, 15])
array([15,  1,  9,  4, 12])
>>> arr = np.arange(9).reshape((3, 3))
>>> np.random.permutation(arr)
array([[6, 7, 8],
       [0, 1, 2],
       [3, 4, 5]])
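
The practical difference between the two functions is that shuffle works in place and returns None, while permutation returns a new array. A minimal sketch, with an arbitrary seed and made-up variable names:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed

arr = np.arange(10)
shuffled_copy = np.random.permutation(arr)  # new array; arr is unchanged
print(arr)                                  # [0 1 2 3 4 5 6 7 8 9]

result = np.random.shuffle(arr)             # shuffles arr in place
print(result)                               # None

m = np.arange(9).reshape(3, 3)
np.random.shuffle(m)  # rows are reordered; each row keeps its contents
print(m)
```

A common mistake is writing arr = np.random.shuffle(arr), which replaces the array with None.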

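Distributions

beta(a, b[, size])	Samples from a Beta distribution, over [0, 1].
binomial(n, p[, size])	Samples from a binomial distribution.
chisquare(df[, size])	Samples from a chi-square distribution.
dirichlet(alpha[, size])	Samples from the Dirichlet distribution.
exponential([scale, size])	Exponential distribution
f(dfnum, dfden[, size])	Samples from an F distribution.
gamma(shape[, scale, size])	Gamma distribution
geometric(p[, size])	Geometric distribution
gumbel([loc, scale, size])	Gumbel distribution.
hypergeometric(ngood, nbad, nsample[, size])	Samples from a hypergeometric distribution.
laplace([loc, scale, size])	Samples from the Laplace or double exponential distribution
logistic([loc, scale, size])	Samples from a logistic distribution
lognormal([mean, sigma, size])	Log-normal distribution
logseries(p[, size])	Logarithmic series distribution.
multinomial(n, pvals[, size])	Multinomial distribution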
multivariate_normal(mean, cov[, size])	Multivariate normal distribution.
>>> mean = [0,0]
>>> cov = [[1,0],[0,100]] # diagonal covariance, points lie on x or y-axis
>>> import matplotlib.pyplot as plt
>>> x, y = np.random.multivariate_normal(mean, cov, 5000).T
>>> plt.plot(x, y, 'x'); plt.axis('equal'); plt.show()
negative_binomial(n, p[, size])	Negative binomial distribution
noncentral_chisquare(df, nonc[, size])	Noncentral chi-square distribution
noncentral_f(dfnum, dfden, nonc[, size])	Noncentral F distribution
normal([loc, scale, size])	Normal (Gaussian) distribution. The probability density for the Gaussian distribution is $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ the standard deviation. The square of the standard deviation, $\sigma^2$, is called the variance. The function has its peak at the mean, and its "spread" increases with the standard deviation (the function reaches 0.607 times its maximum at $x + \sigma$ and $x - \sigma$ [R217]).
Draw samples from the distribution:
>>> mu, sigma = 0, 0.1 # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
Verify the mean and the variance:
>>> abs(mu - np.mean(s)) < 0.01
True
>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...          np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
...          linewidth=2, color='r')
>>> plt.show()
pareto(a[, size])	Pareto (Lomax) distribution
poisson([lam, size])	Poisson distribution
power(a[, size])	Draws samples in [0, 1] from a power distribution with positive exponent a - 1.
rayleigh([scale, size])	Rayleigh distribution
standard_cauchy([size])	Standard Cauchy distribution
standard_exponential([size])	Standard exponential distribution
standard_gamma(shape[, size])	Standard gamma distribution
standard_normal([size])	Standard normal distribution (mean=0, stdev=1).
standard_t(df[, size])	Standard Student's t distribution with df degrees of freedom.
triangular(left, mode, right[, size])	Triangular distribution
uniform([low, high, size])	Uniform distribution
vonmises(mu, kappa[, size])	von Mises distribution
wald(mean, scale[, size])	Wald (inverse Gaussian) distribution
weibull(a[, size])	Weibull distribution
zipf(a[, size])	Zipf distribution

Random number generator

RandomState	Container for the Mersenne Twister pseudo-random number generator.
seed([seed])	Seed the generator.
get_state()	Return a tuple representing the internal state of the generator.
set_state(state)	Set the internal state of the generator from a tuple.
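
The four entries above are what make random results reproducible. The following is a minimal sketch of how they might be used; the seed values 42 and 7 are arbitrary.

```python
import numpy as np

# Two generators seeded identically produce identical streams.
rs1 = np.random.RandomState(42)  # 42 is an arbitrary seed
rs2 = np.random.RandomState(42)
print((rs1.rand(3) == rs2.rand(3)).all())  # True

# get_state / set_state let a stream be rewound to a saved point.
state = rs1.get_state()
first = rs1.randint(0, 100, size=5)
rs1.set_state(state)
second = rs1.randint(0, 100, size=5)
print((first == second).all())             # True

# seed() resets the global generator used by the np.random.* functions.
np.random.seed(7)
x = np.random.rand()
np.random.seed(7)
y = np.random.rand()
print(x == y)                              # True
```

Using a separate RandomState object keeps one part of a program from disturbing the random stream of another, which the global seed() cannot guarantee.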