Chapter 4-3: Data Processing with Arrays


Big__Boy


Column: Learning Python for Data Analysis. Tags: python, numpy, data processing with arrays

Article link: https://blog.csdn.net/u013162562/article/details/53711809


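'''
Chapter 4-3: Data processing with arrays
Contents:
1. The meshgrid function (ran into an "Undefined variable from import" error and solved it)
2. Expressing conditional logic as array operations (what np.where does)
3. Mathematical and statistical methods: the main functions
4. Methods for boolean arrays
5. Sorting: distinguish the in-place method arr.sort() from the top-level function np.sort(arr)
6. Unique values and set operations
Created on 2016-12-17
@author: Bigboy
'''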

Hands-on walkthrough:

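# -*- coding: utf-8 -*-
import numpy as np
#1-----------------------------------------------------------------------------

# A small point array to illustrate meshgrid:
point = np.arange(4)
# The book uses a much finer grid instead:
# point = np.arange(-5, 5, 0.01)
xs, ys = np.meshgrid(point, point)  # takes two 1-D arrays, returns two 2-D arrays
# xs repeats point along every row; ys repeats point down every column.
# The entries at the same position, xs[i][j] and ys[i][j], form the grid point
# (point[j], point[i]); e.g. xs[2][1] and ys[2][1] give the point (1, 2).
# Taken over all (i, j), these pairs cover every combination of elements of point.
print(xs)
print(ys)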

'''
out:
[[0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]
 [0 1 2 3]]

[[0 0 0 0]
 [1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]
'''
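The pairing rule in the comments above can be checked directly. This minimal sketch (assuming NumPy's default `indexing='xy'`) rebuilds the grid points from `xs` and `ys` position by position and compares them with every combination of elements of `point`, generated with the standard-library `itertools.product`.

```python
import itertools
import numpy as np

point = np.arange(4)
xs, ys = np.meshgrid(point, point)  # default indexing='xy'

# Pair up the two grids position by position: (xs[i, j], ys[i, j]).
grid_points = [(int(xs[i, j]), int(ys[i, j]))
               for i in range(xs.shape[0])
               for j in range(xs.shape[1])]

# Row i of the grid holds y = point[i]; column j holds x = point[j].
assert all(xs[i, j] == point[j] and ys[i, j] == point[i]
           for i in range(4) for j in range(4))

# Enumerate (x, y) with y in the outer loop and x in the inner loop,
# which matches the row-by-row order of grid_points.
all_combos = [(int(x), int(y)) for y, x in itertools.product(point, point)]
print(grid_points == all_combos)  # True
print(xs[2, 1], ys[2, 1])         # 1 2
```

Each of the 16 combinations appears exactly once, which is why a function such as `np.sqrt(xs**2 + ys**2)` evaluated on the two grids covers the whole plane region.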

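import matplotlib.pyplot as plt
z = np.sqrt(xs ** 2 + ys ** 2)  # distance of each grid point from the origin

plt.imshow(z, cmap=plt.cm.gray)
# Referencing plt.cm.gray made Eclipse report: Undefined variable from import matplotlib.
# (This is a PyDev code-analysis error, not a Python runtime error.)
# Fix: in Eclipse choose Window -> Preferences -> PyDev -> Interpreters -> Python Interpreter -> Forced Builtins,
# add a new builtin named "matplotlib", click Apply, then restart Eclipse.
# Reference: the "无名" blog, http://xdzw608.blog.51cto.com/4812210/1620403
plt.colorbar()
plt.title(r'picture of $\sqrt{x^2+y^2}$.')  # raw string keeps the backslash; inside $...$ matplotlib renders \sqrt as a root sign
# plt.show()  # display the figure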
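The same `z` can also be computed without materializing the full `xs`/`ys` grids, by letting NumPy broadcasting combine a row vector with a column vector. This is an alternative technique, not the one used in the post; the sketch below, on the small `point = np.arange(4)` grid, only checks that the routes agree. `np.meshgrid(..., sparse=True)` returns arrays of exactly these row and column shapes.

```python
import numpy as np

point = np.arange(4)
xs, ys = np.meshgrid(point, point)
z_grid = np.sqrt(xs ** 2 + ys ** 2)

# Broadcasting: a (1, 4) row of x values against a (4, 1) column of y values
# produces the same (4, 4) result without building xs and ys in full.
x_row = point[np.newaxis, :]
y_col = point[:, np.newaxis]
z_bcast = np.sqrt(x_row ** 2 + y_col ** 2)

# sparse=True asks meshgrid for the same row/column shapes directly.
xs_s, ys_s = np.meshgrid(point, point, sparse=True)
z_sparse = np.sqrt(xs_s ** 2 + ys_s ** 2)

print(xs_s.shape, ys_s.shape)        # (1, 4) (4, 1)
print(np.allclose(z_grid, z_bcast))  # True
print(np.allclose(z_grid, z_sparse)) # True
```

For the book's `np.arange(-5, 5, 0.01)` grid (1000 points per axis) the sparse form stores 2 x 1000 values instead of 2 x 1,000,000 before the final result is formed.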


#2-------------------------------------------------------------------------------
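# Conditional logic
# A pure-Python approach uses a list comprehension:
# result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
print(result)

# np.where does the same in one vectorized call, which is more concise and much faster on large arrays:
print(np.where(cond, xarr, yarr))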
'''
out:
[1.1000000000000001, 2.2000000000000002, 1.3, 1.3999999999999999, 2.5]
[ 1.1  2.2  1.3  1.4  2.5]
'''
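The list comprehension walks the arrays element by element in the Python interpreter, returns a plain list, and only works naturally on 1-D input. A minimal sketch reusing `xarr`, `yarr` and `cond` shows that both versions select the same values, and that `np.where` applies unchanged to a 2-D condition (the 2-D arrays `cond2`, `a`, `b` are made up for illustration).

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure-Python version: loops in the interpreter and returns a list.
result_list = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]

# Vectorized version: one call, returns an ndarray.
result_arr = np.where(cond, xarr, yarr)
print(np.array_equal(np.array(result_list), result_arr))  # True

# np.where works element-wise for any shape, e.g. a 2 x 3 condition.
cond2 = np.array([[True, False, True],
                  [False, True, False]])
a = np.ones((2, 3))
b = np.zeros((2, 3))
print(np.where(cond2, a, b))
```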

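# The second and third arguments of np.where need not be arrays; scalars work too
arr = np.random.randn(4, 4)
print(arr)
print(np.where(arr > 0, 2, -2))  # 2 where arr is positive, -2 elsewhere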
'''
out:
[[ 1.46828437  1.12557522 -0.62415282 -0.83070197]
 [-1.02569319  0.18433816  0.87384813 -0.01210176]
 [-1.15162067  0.141177    1.65110914  0.62002685]
 [ 0.65077927 -2.78828133  0.65450814  0.80931196]]

[[ 2  2 -2 -2]
 [-2  2  2 -2]
 [-2  2  2  2]
 [ 2 -2  2  2]]
'''
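The two value arguments can also mix a scalar with an array. As a sketch (the 2 x 3 input is made up so the result is reproducible), this replaces only the positive values with 2 and keeps every other element as it was:

```python
import numpy as np

arr = np.array([[1.5, -0.3, 0.0],
                [-2.0, 0.7, 4.2]])

# Where arr > 0 take the scalar 2, otherwise take the original element.
clipped = np.where(arr > 0, 2, arr)
print(clipped)

# arr itself is untouched: np.where always builds a new array.
print(arr[0, 0])  # 1.5
```

Zero is not positive, so the `0.0` entry is kept rather than replaced.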

#3------------------------------------------------------------------------
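# Practice with simple statistical methods
data = np.arange(16).reshape((4, 4))
print(data)
print(data.sum())     # sum of all elements; returns a single number
print(data.sum(0))    # sum along axis 0, i.e. the sum of each column
print(data.cumsum())  # note the difference: running sum over the flattened array
print(data.cumsum(0)) # running sum down each column (axis 0)
print(data.std(1))    # standard deviation of each row (axis 1)
print(data.std(0))    # standard deviation of each column (axis 0)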

'''
out:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
 
120

[24 28 32 36]

[  0   1   3   6  10  15  21  28  36  45  55  66  78  91 105 120]

[[ 0  1  2  3]
 [ 4  6  8 10]
 [12 15 18 21]
 [24 28 32 36]]

[ 1.11803399  1.11803399  1.11803399  1.11803399]

[ 4.47213595  4.47213595  4.47213595  4.47213595]
'''
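Other standard array statistics methods take the same axis argument. A minimal sketch on the same `data` array shows `mean`, `min`, `argmax` and `cumprod`, plus a manual check that `std` uses the population formula (dividing by n, `ddof=0`) by default, which is why the row values above are about 1.118.

```python
import numpy as np

data = np.arange(16).reshape((4, 4))

print(data.mean(1))     # mean of each row: 1.5, 5.5, 9.5, 13.5
print(data.mean(0))     # mean of each column: 6, 7, 8, 9
print(data.min(0))      # minimum of each column: 0, 1, 2, 3
print(data.argmax())    # index of the maximum in the flattened array: 15
print(data.cumprod(1))  # running product along each row

# std defaults to ddof=0: sqrt(mean((x - mean)**2)).
row = data[0]
manual = np.sqrt(((row - row.mean()) ** 2).mean())
print(np.isclose(manual, data.std(1)[0]))  # True

# Pass ddof=1 for the sample standard deviation (divide by n - 1).
print(data.std(1, ddof=1))
```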
#4-------------------------------------------------------------------------
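# In statistical methods such as sum, boolean values are coerced to 1 (True) and 0 (False).
# Conversely, any and all also accept numeric arrays, where every non-zero element counts as True.

arr = np.random.randn(10)
print(arr)
print(arr > 0)          # a boolean array
print((arr > 0).sum())  # True/False count as 1/0, so this counts the positive values
print((arr > 0).any())  # is at least one value True?
print((arr > 0).all())  # are all values True?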
'''
out:
[ 1.53279947  0.40557692 -0.83135121  0.41576791  2.0295505   1.84298965
  0.2902597   0.82986302  1.09950845 -0.97778626]

[ True  True False  True  True  True  True  True  True False]

8

True

False
'''
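Since the output above depends on random numbers, here is a deterministic sketch of the same boolean methods (the input values are made up). It adds `mean` on a boolean array, which gives the fraction of True values, and shows `any`/`all` on a numeric array.

```python
import numpy as np

arr = np.array([1.5, -0.8, 0.4, 2.0, -1.0])
mask = arr > 0

print(mask.sum())   # 3: True counts as 1, False as 0
print(mask.mean())  # 0.6: fraction of positive values
print(mask.any())   # True: at least one value is positive
print(mask.all())   # False: not every value is positive

# On a numeric array, any/all treat every non-zero element as True.
nums = np.array([0, 3, 0])
print(nums.any(), nums.all())  # True False
```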
#5------------------------------------------------------------------------------
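arr = np.random.randn(5)
arr3 = np.random.randn(5)
print(arr)
print(arr.sort())     # in-place sort: modifies arr and returns None
print(arr)            # arr itself is now sorted
print(arr3)
print(np.sort(arr3))  # top-level np.sort returns a sorted copy
print(arr3)           # arr3 itself is unchanged

print(arr[::-1])  # descending order via a reversed view (no data is copied)
print(arr)        # arr itself is still in ascending order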



'''
out:
[-1.01860266  0.1990631   0.19386477  0.72080222  2.39733426]

None

[-1.01860266  0.19386477  0.1990631   0.72080222  2.39733426]

[ 0.36441916 -0.59830668 -1.3190992   0.93019553  0.00647607]

[-1.3190992  -0.59830668  0.00647607  0.36441916  0.93019553]

[ 0.36441916 -0.59830668 -1.3190992   0.93019553  0.00647607]

[ 2.39733426  0.72080222  0.1990631   0.19386477 -1.01860266]

[-1.01860266  0.19386477  0.1990631   0.72080222  2.39733426]

'''
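Two details from this section can be checked with a deterministic sketch (the input values are made up): `arr.sort()` returns `None`, so its result should never be assigned back, and `arr[::-1]` is a view that shares memory with `arr` rather than a new array. Call `.copy()` when an independent descending array is needed. The sketch also sorts each row of a 2-D array in place by passing the axis.

```python
import numpy as np

arr = np.array([3.0, -1.0, 2.0, 0.5])
ret = arr.sort()  # in-place sort
print(ret)        # None
print(arr)        # sorted ascending

desc = arr[::-1]                          # descending order, but only a view
print(np.shares_memory(arr, desc))        # True: same underlying data
desc_copy = arr[::-1].copy()              # an independent descending array
print(np.shares_memory(arr, desc_copy))   # False

# In-place sort of each row of a 2-D array (axis=1).
m = np.array([[3, 1, 2],
              [9, 7, 8]])
m.sort(1)
print(m)
```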

#6----------------------------------------------------------------------------
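names = np.array(['Tom', 'Will', 'Joe', 'Tom', 'Eve', 'Joe'])
print(np.unique(names))  # removes duplicates and returns the sorted unique values

num1 = np.array([1, 2, 3, 4])
num2 = np.array([1, 3, 4, 5])
print(np.intersect1d(num1, num2))  # intersection
print(np.union1d(num1, num2))      # union
print(np.isin(num1, num2))         # is each element of num1 in num2? (np.in1d in older code; deprecated since NumPy 2.0)
print(np.setdiff1d(num1, num2))    # difference: elements of num1 not in num2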
'''
out:
['Eve' 'Joe' 'Tom' 'Will']

[1 3 4]

[1 2 3 4 5]

[ True False  True  True]

[2]
'''
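For comparison, a minimal sketch of the pure-Python equivalent of `np.unique` (deduplicate with a set, then sort), the `return_counts` option of `np.unique`, and `np.setxor1d`, which keeps the elements found in exactly one of the two arrays. It reuses the arrays from above.

```python
import numpy as np

names = np.array(['Tom', 'Will', 'Joe', 'Tom', 'Eve', 'Joe'])
num1 = np.array([1, 2, 3, 4])
num2 = np.array([1, 3, 4, 5])

# Pure-Python equivalent of np.unique: deduplicate, then sort.
print(sorted(set(names.tolist())))  # ['Eve', 'Joe', 'Tom', 'Will']

# return_counts=True also reports how often each unique value occurs.
uniq, counts = np.unique(names, return_counts=True)
print(dict(zip(uniq.tolist(), counts.tolist())))
# {'Eve': 1, 'Joe': 2, 'Tom': 2, 'Will': 1}

# Symmetric difference: in num1 or num2, but not in both.
print(np.setxor1d(num1, num2))  # [2 5]
```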

Function reference notes:
