July Online (七月在线): Python Data Processing
Common Python imports
from IPython.display import display
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
from PIL import Image
import matplotlib.pyplot as plt
%matplotlib inline
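%config ZMQInteractiveShell.ast_node_interactivity='all' # for notebook use
from numpy import interp # linear interpolation; scipy.interp was a deprecated alias of np.interp and is gone from recent SciPy releases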
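The linear interpolation mentioned in the last import comment can be tried with NumPy's own np.interp. The following is a minimal sketch; the sample points xp and fp are made up for illustration and do not come from the course material.

```python
import numpy as np

# Hypothetical sample points: x-coordinates must be increasing
xp = [1.0, 2.0, 3.0]
fp = [10.0, 20.0, 30.0]

# Interpolate a single point halfway between x=2 and x=3
y_mid = np.interp(2.5, xp, fp)

# Interpolate several points at once; values outside [1, 3]
# are clamped to the first/last entry of fp by default
y_many = np.interp([1.5, 3.5], xp, fp)

print(y_mid)   # 25.0
print(y_many)  # [15. 30.]
```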
ndarray
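Creating arrays with common NumPy functions
np.ones(shape, dtype=None, order='C')
# ones --> creates an array of the given length or shape, filled with 1s
Parameters:
shape: the dimensions of the array
dtype: data type; defaults to float
order: optional; sets the memory layout of the returned array. The source code lists two options, {'C', 'F'}:
'C' (the C language) is row-major; 'F' (Fortran, short for "Formula Translation", a programming language) is column-major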
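The parameters above can be tried out as follows. This is a small sketch rather than anything from the original notes; the variable names are chosen only for illustration.

```python
import numpy as np

# 1-D array of five ones; dtype defaults to float64
v = np.ones(5)

# 2x3 integer array in the default C (row-major) layout
c_arr = np.ones((2, 3), dtype=int)

# Same shape and values, stored in Fortran (column-major) layout
f_arr = np.ones((2, 3), dtype=int, order='F')

print(v.dtype)                       # float64
print(c_arr.flags['C_CONTIGUOUS'])   # True
print(f_arr.flags['F_CONTIGUOUS'])   # True
# order changes how the data sits in memory, not the values
print(np.array_equal(c_arr, f_arr))  # True
```

The order argument mostly matters for performance or when passing data to code that expects a particular layout; indexing works the same either way.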
Aggregation on ndarrays
Taking the minimum of a 3-D array along each axis as an example:
%config ZMQInteractiveShell.ast_node_interactivity='all'
a = np.random.randint(0,10,size = [3,2,3])
a.shape
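b = a.min(axis = 0) # axis 0: compare corresponding positions across the 3 blocks, result shape (2, 3)
c = a.min(axis = 1) # axis 1: compare the 2 rows inside each block, result shape (3, 3)
d = a.min(axis = 2) # axis 2: compare the 3 columns inside each row, result shape (3, 2)
e = a.min(axis = -1) # axis -1 is the last axis, so this is identical to axis 2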
display(b,c,d,e)
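Because the example above uses random data, the effect of each axis is easier to follow on a fixed array. The sketch below swaps in np.arange so the results are reproducible; that substitution is an assumption made for illustration only.

```python
import numpy as np

# 3 blocks, each with 2 rows and 3 columns, holding 0..17
a = np.arange(18).reshape(3, 2, 3)

b = a.min(axis=0)   # collapse the 3 blocks  -> shape (2, 3)
c = a.min(axis=1)   # collapse the 2 rows    -> shape (3, 3)
d = a.min(axis=2)   # collapse the 3 columns -> shape (3, 2)
e = a.min(axis=-1)  # last axis, same as axis=2

print(b.shape, c.shape, d.shape, e.shape)  # (2, 3) (3, 3) (3, 2) (3, 2)
print(d)
# [[ 0  3]
#  [ 6  9]
#  [12 15]]
```

A quick rule of thumb: the axis passed to the aggregation is the one that disappears from the result's shape.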
Other aggregation functions:
Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
np.power: element-wise exponentiation (an element-wise function, not an aggregation)
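The difference between a plain aggregation and its NaN-safe version shows up as soon as the data contains a missing value. A minimal sketch with a made-up array:

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0])

plain_sum = np.sum(x)          # nan: a single NaN propagates to the result
safe_sum = np.nansum(x)        # 7.0: NaN entries are ignored
safe_mean = np.nanmean(x)      # mean of the three non-NaN values
safe_argmax = np.nanargmax(x)  # 3: position of the largest non-NaN value

# np.power works element by element instead of reducing the array
squares = np.power(np.array([1, 2, 3]), 2)

print(plain_sum, safe_sum, safe_argmax)  # nan 7.0 3
print(squares)                           # [1 4 9]
```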
pandas
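pandas Series
Creating a Series
A Series is an object similar to a one-dimensional array, made up of two parts:
values: the data itself (an ndarray)
index: the index labels associated with the data
Indexing and slicing a Series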
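The two parts of a Series can be inspected directly. The sketch below builds a Series in three common ways; the data and labels are invented for illustration.

```python
import numpy as np
import pandas as pd

# From an ndarray: pandas supplies a default integer index 0, 1, 2
s1 = pd.Series(np.array([10, 20, 30]))

# From a list with explicit index labels
s2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# From a dict: keys become the index, values become the data
s3 = pd.Series({'x': 1.5, 'y': 2.5})

print(type(s2.values))    # <class 'numpy.ndarray'>
print(s2.index.tolist())  # ['a', 'b', 'c']
print(s1.index.tolist())  # [0, 1, 2]
```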
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
import matplotlib.pyplot as plt
%matplotlib inline
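# %matplotlib inline is an IPython magic command.
# It works directly in an IPython/Jupyter session and renders figures inline, so the explicit plt.show() call can be omitted.
nd = np.random.randint(0, 10, size=5)  # assumption: the original never defines nd; any 5-element array fits here
s = Series(nd, index=['a','b','c','d','e'])
# Explicit (label-based) indexing
s[['a','d']]
s.loc[['a','d']]
# Implicit (position-based) indexing
s[[0,3]]  # recent pandas versions warn that positional access through plain [] is deprecated; prefer .iloc
s.iloc[[0,3]]
# Using implicit indexing, take a contiguous run of elements
s.iloc[[0,1,2,3]]
# Taking a contiguous run by index means listing every position.
# Drop one pair of brackets and change the commas to a colon, and the index becomes a slice.
# Slicing
s['a':'d']  # label slice: both endpoints are included
s.loc['a':'d']
s[0:4]
s.iloc[0:4]  # position slice: start included, end excluded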
Result: the last four statements use slicing, and all four return the elements labeled 'a' through 'd'.
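The endpoint rules for label slices and position slices are easy to mix up, so here is a small reproducible sketch. It uses fixed values instead of the random array above, which is an assumption made purely so the output is predictable.

```python
import numpy as np
import pandas as pd

# Values 10, 20, 30, 40, 50 labeled a..e
s = pd.Series(np.arange(10, 60, 10), index=list('abcde'))

by_label = s.loc['a':'d']   # label slice: 'd' is included -> a, b, c, d
by_pos = s.iloc[0:4]        # position slice: 4 is excluded -> positions 0..3
by_pos_short = s.iloc[0:3]  # one fewer element: a, b, c
picked = s.loc[['a', 'd']]  # a list of labels picks individual elements

print(by_label.index.tolist())      # ['a', 'b', 'c', 'd']
print(by_pos.equals(by_label))      # True
print(by_pos_short.index.tolist())  # ['a', 'b', 'c']
print(picked.tolist())              # [10, 40]
```

Sticking to .loc for labels and .iloc for positions keeps the intent unambiguous, especially when the index itself holds integers.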
Visualization
Master Seaborn in Ten Minutes: Advanced Python Data Visualization and Analysis: https://zhuanlan.zhihu.com/p/49035741
matplotlib official site: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html#matplotlib.pyplot.plot
matplotlib Chinese documentation: https://www.matplotlib.org.cn/tutorials/introductory/usage.html
The 50 Most Valuable Matplotlib Visualizations: http://liyangbit.com/pythonvisualization/matplotlib-top-50-visualizations/