A Collection of Commonly Used Python Functions (Part 3)

Latest recommended article published on 2024-04-01 13:14:19

糖小豆子

Article link: https://blog.csdn.net/Sugar_girl/article/details/79725705

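1、strides
An ndarray refers to two objects: the data buffer and the dtype object. Concretely, its structure consists of dtype, dim count, dimensions, strides and data.
dim count is the number of dimensions; dimensions gives the size of each dimension.
strides gives, for each axis, the number of bytes the pointer into the data buffer advances when the index along that axis increases by 1. For example, in a 3*3 array whose element type is float32, each element takes 4 (32/8) bytes.
In C (row-major) order the second index varies faster than the first, so increasing the second index by 1 moves the pointer by 4 bytes, while increasing the first index moves it by 12 (4*3) bytes. The strides (the word literally means "steps") are therefore 12 and 4. This assumes the data is stored contiguously in memory.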
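The 3*3 float32 case described above can be checked directly. This is a minimal sketch; the array names `a` and `f` are chosen only for illustration.

```python
import numpy as np

# A 3x3 float32 array in C (row-major) order: each element takes 4 bytes.
a = np.zeros((3, 3), dtype=np.float32)
print(a.itemsize)   # 4
print(a.strides)    # (12, 4): next row = 12 bytes, next column = 4 bytes

# Transposing only swaps the strides; no data is copied.
print(a.T.strides)  # (4, 12)

# The same values laid out in Fortran (column-major) order.
f = np.asfortranarray(a)
print(f.strides)    # (4, 12)
```

Because a transpose is just a stride swap, it costs nothing regardless of the array size.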
numpy.ndarray.strides
Tuple of bytes to step in each dimension when traversing an array.
The byte offset of element (i[0], i[1], …, i[n]) in an array a is:
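offset = sum(np.array(i) * a.strides)
Note: the strides printed in the two examples below assume a 4-byte integer dtype (int32). Where np.arange defaults to int64, every stride doubles (for instance (96, 32, 8) for the first array), while the recovered element index stays the same. Under Python 3 the final `/` performs true division and yields a float (17.0); use `//` to get an integer.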

>>> y = np.reshape(np.arange(2*3*4), (2,3,4))
>>> y
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> y.strides
(48, 16, 4)
>>> y[1,1,1]
17
>>> offset=sum(y.strides * np.array((1,1,1)))
>>> offset/y.itemsize
17

>>> x = np.reshape(np.arange(5*6*7*8), (5,6,7,8)).transpose(2,3,1,0)
>>> x.strides
(32, 4, 224, 1344)
>>> i = np.array([3,5,2,2])
>>> offset = sum(i * x.strides)
>>> x[3,5,2,2]
813
>>> offset / x.itemsize
813

numpy.lib.stride_tricks.as_strided
numpy.lib.stride_tricks.as_strided(x, shape=None, strides=None, subok=False, writeable=True)
Create a view into the array with the given shape and strides.
Parameters:
x : ndarray. Array to create a new.
shape : sequence of int, optional. The shape of the new array. Defaults to x.shape.
strides : sequence of int, optional. The strides of the new array. Defaults to x.strides.
subok : bool, optional. New in version 1.10. If True, subclasses are preserved.
writeable : bool, optional. New in version 1.12. If set to False, the returned array will always be readonly. Otherwise it will be writable if the original array was. It is advisable to set this to False if possible (see Notes).
Returns: view : ndarray

import numpy as np

# A sudoku grid is a 9x9 two-dimensional array
# made up of nine 3x3 boxes
sudoku = np.array([
    [2, 8, 7, 1, 6, 5, 9, 4, 3],
    [9, 5, 4, 7, 3, 2, 1, 6, 8],
    [6, 1, 3, 8, 4, 9, 7, 5, 2],
    [8, 7, 9, 6, 5, 1, 2, 3, 4],
    [4, 2, 1, 3, 9, 8, 6, 7, 5],
    [3, 6, 5, 4, 2, 7, 8, 9, 1],
    [1, 9, 8, 5, 7, 3, 4, 2, 6],
    [5, 4, 2, 9, 1, 6, 3, 8, 7],
    [7, 3, 6, 2, 8, 4, 5, 1, 9]
])

# We want to turn it into a 3x3x3x3 four-dimensional array,
# but a plain reshape will not do: it would turn each row into one box
shape = (3, 3, 3, 3)

# Box rows are 27 elements apart, box columns are 3 elements apart
# Rows inside a box are 9 elements apart, columns inside a box are 1 element apart
strides = sudoku.itemsize * np.array([27, 3, 9, 1])

squares = np.lib.stride_tricks.as_strided(sudoku, shape=shape, strides=strides)
print(squares)

'''
[[[[2 8 7]    [9 5 4]    [6 1 3]]
  [[1 6 5]    [7 3 2]    [8 4 9]]
  [[9 4 3]    [1 6 8]    [7 5 2]]]

 [[[8 7 9]    [4 2 1]    [3 6 5]]
  [[6 5 1]    [3 9 8]    [4 2 7]]
  [[2 3 4]    [6 7 5]    [8 9 1]]]

 [[[1 9 8]    [5 4 2]    [7 3 6]]
  [[5 7 3]    [9 1 6]    [2 8 4]]
  [[4 2 6]    [3 8 7]    [5 1 9]]]]
'''

In other words, as_strided cuts the matrix into blocks according to the given shape.
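For this particular blocking the same result can also be reached without hand-written strides. The sketch below is not from the original post: it uses a hypothetical 9x9 grid `grid` filled with 0..80 as a stand-in for the sudoku, and compares as_strided against a reshape followed by a transpose.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical stand-in for the sudoku: a 9x9 grid holding 0..80.
grid = np.arange(81).reshape(9, 9)
step = grid.itemsize

# Same strides as in the text, expressed in bytes; read-only as the docs advise.
blocks_strided = as_strided(
    grid,
    shape=(3, 3, 3, 3),
    strides=(27 * step, 3 * step, 9 * step, step),
    writeable=False,
)

# Alternative: split both axes into (box, offset) pairs, then reorder the axes
# to (box row, box column, row in box, column in box).
blocks_reshaped = grid.reshape(3, 3, 3, 3).transpose(0, 2, 1, 3)

print(np.array_equal(blocks_strided, blocks_reshaped))  # True
print(blocks_reshaped[0, 1])  # the box covering rows 0-2, columns 3-5
```

The reshape/transpose form cannot read outside the array's memory, which wrong strides passed to as_strided can do.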

2、ord()
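ord() is the counterpart of chr() (for 8-bit ASCII strings) and, in Python 2, of unichr() (for Unicode objects); in Python 3, chr() covers both cases. It takes a single character (a string of length 1) as its argument and returns the corresponding ASCII value or Unicode code point. In Python 2, a Unicode character outside the range supported by your Python build raises a TypeError; in Python 3, a TypeError is raised when the argument is not a string of length 1.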

>>> ord('a')
97
>>> ord('b')
98
>>> ord('c')
99
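A short Python 3 sketch of how ord() and chr() pair up; the variable names are only illustrative.

```python
# ord() maps a character to its code point; chr() maps it back.
code = ord("a")
print(code)        # 97
print(chr(code))   # a

# Non-ASCII characters work the same way in Python 3.
print(ord("中"))   # 20013

# Shifting every letter by one code point.
shifted = "".join(chr(ord(c) + 1) for c in "abc")
print(shifted)     # bcd

# A string whose length is not 1 is rejected.
try:
    ord("ab")
    raised = False
except TypeError:
    raised = True
print(raised)      # True
```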

3、reduce()
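reduce() accumulates the elements of a sequence into a single value.

It processes a collection (list, tuple, etc.) as follows: the function passed to reduce (which takes two arguments) is first applied to the 1st and 2nd elements; the result is then combined with the third element using function, and so on until a single result remains.
reduce(function, iterable[, initializer])
function – a function that takes two arguments
iterable – an iterable object
initializer – optional, the initial value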

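>>> from functools import reduce   # needed in Python 3; reduce was a builtin in Python 2
>>> def add(x, y):             # add two numbers
...     return x + y
...
>>> reduce(add, [1,2,3,4,5])   # sum of the list: 1+2+3+4+5
15
>>> reduce(lambda x, y: x+y, [1,2,3,4,5])  # using an anonymous lambda function
15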
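The optional initializer is not exercised above. A minimal Python 3 sketch of what it changes, with illustrative variable names:

```python
from functools import reduce
import operator

# With an initializer the accumulation starts from it: (((1*1)*2)*3)*4
product = reduce(operator.mul, [1, 2, 3, 4], 1)
print(product)    # 24

# The initializer is also what comes back for an empty iterable.
empty_sum = reduce(operator.add, [], 0)
print(empty_sum)  # 0

# Without an initializer an empty iterable raises TypeError.
try:
    reduce(operator.add, [])
    raised = False
except TypeError:
    raised = True
print(raised)     # True
```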

4、join()
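Joins the elements of a sequence with a given separator string to produce a new string.
str.join(sequence)
sequence – the sequence of elements to join.
Returns the new string formed by joining the elements of the sequence with the separator.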

sep = "-"                # avoid naming a variable str: it shadows the builtin
seq = ("a", "b", "c")    # a sequence of strings
print(sep.join(seq))
# a-b-c
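join() only accepts strings, so other types have to be converted first. A small sketch with made-up sample data:

```python
numbers = [1, 2, 3]

# Convert each element to str before joining.
joined = ", ".join(str(n) for n in numbers)
print(joined)  # 1, 2, 3

# Joining non-string elements directly raises TypeError.
try:
    "-".join(numbers)
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# The separator can be any string, including the empty one.
word = "".join(["py", "th", "on"])
print(word)    # python
```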

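5、'__main__'
An intuitive way to read __name__ == '__main__': suppose you are 小明.py. In your friends' eyes you are Xiao Ming (__name__ == '小明'); in your own eyes you are yourself (__name__ == '__main__').

if __name__ == '__main__' means: when the .py file is run directly, the code block under if __name__ == '__main__' is executed; when the .py file is imported as a module, that block is not executed.
self can be understood as "oneself", similar to the this pointer in C++: it is the object itself. When a method is called on an object, that object is passed to self as the first argument.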
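Both ideas fit into one small module. The sketch below is purely illustrative; the class `Greeter` and the function `main` are hypothetical names, not part of the original post.

```python
class Greeter:
    def __init__(self, name):
        # self is the instance being initialised
        self.name = name

    def greet(self):
        # self is the object the method was called on
        return "Hello, " + self.name


def main():
    g = Greeter("Xiao Ming")
    print(g.greet())          # the usual call: g is passed as self implicitly
    print(Greeter.greet(g))   # the same call with self passed explicitly


# Runs only when the file is executed directly, not when it is imported.
if __name__ == "__main__":
    main()
```

Importing this file from another module makes `Greeter` available without printing anything, because `__name__` is then the module's own name rather than '__main__'.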

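6、pandas display options
import pandas as pd

pd.set_option('display.expand_frame_repr', False)
True lets the printed DataFrame wrap across multiple lines; False disables wrapping.

pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', 10)
The maximum number of rows and columns to display; anything beyond the limit is shown as an ellipsis. max_columns counts DataFrame columns. With many columns and wrapping disabled, the output becomes hard to read.

pd.set_option('display.precision', 5)
The number of digits shown after the decimal point.

pd.set_option('display.large_repr', 'truncate')
For a frame that exceeds the limits above, 'truncate' shows a truncated table and 'info' shows a summary instead; 'truncate' is the usual choice.

pd.set_option('display.max_colwidth', 5)
The maximum width of a column, in characters.

pd.set_option('display.chop_threshold', 0.5)
Values whose absolute value is below 0.5 are displayed as 0.0.

pd.set_option('display.colheader_justify', 'left')
Controls how column headers are aligned, for example 'left' or 'right'.

pd.set_option('display.width', 200)
The maximum number of characters per output line. The default of 80 is too narrow for a wide screen; 200 is a common choice.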
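set_option changes a setting for the rest of the session. When a setting is only needed for one printout, pd.option_context applies it temporarily. A minimal sketch with a made-up DataFrame:

```python
import pandas as pd

# Made-up sample data.
df = pd.DataFrame({"x": [0.123456789, 0.4, 12.5], "y": [1, 2, 3]})

before = pd.get_option("display.precision")

# The options apply only inside the with block.
with pd.option_context("display.precision", 3, "display.max_rows", 10):
    inside = pd.get_option("display.precision")
    text = repr(df)

after = pd.get_option("display.precision")

print(text)
print(before, inside, after)
```

pd.reset_option('display.precision') restores a single option to its default if it was changed with set_option.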

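7、corr and cov
DataFrame's corr and cov methods return the full correlation or covariance matrix as a DataFrame.

Covariance: whether two variables move in the same direction or in opposite directions as they change, and how strongly.
If one grows while the other also grows, the two variables move in the same direction and the covariance is positive; if one grows while the other shrinks, they move in opposite directions and the covariance is negative. Numerically, the larger the covariance, the stronger the tendency to move in the same direction, and vice versa.

$Cov(X,Y)=E[(X-\mu _{x})(Y-\mu _{y})]$
Given two variables X and Y, at each time point multiply "the deviation of X from its mean" by "the deviation of Y from its mean", then sum these products over all time points and take the mean (that is, take the expectation).

Correlation coefficient:

$\rho =\frac{Cov(X,Y)}{\sigma_{X}\sigma_{Y}}$
That is, the covariance of X and Y divided by the product of the standard deviation of X and the standard deviation of Y.
The correlation coefficient can also be seen as a special kind of covariance: one that has been standardized so that the units of the two variables no longer affect it.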
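The two formulas can be checked against DataFrame.cov and DataFrame.corr. The sketch below uses made-up sample data. One assumption to note: pandas computes the sample covariance, so the mean of the products is taken with n - 1 in the denominator rather than n.

```python
import numpy as np
import pandas as pd

# Made-up sample data: y grows roughly in step with x.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.1, 5.9, 8.2, 9.8],
})

# Deviations of each variable from its own mean.
dx = df["x"] - df["x"].mean()
dy = df["y"] - df["y"].mean()

# Sample covariance: sum of the products divided by n - 1.
cov_xy = (dx * dy).sum() / (len(df) - 1)

# Correlation: covariance divided by both (sample) standard deviations.
rho = cov_xy / (df["x"].std() * df["y"].std())

cov_matrix = df.cov()
corr_matrix = df.corr()

print(cov_xy, cov_matrix.loc["x", "y"])
print(rho, corr_matrix.loc["x", "y"])
```

Multiplying y by 100 scales the covariance by 100 but leaves the correlation unchanged, which is the unit independence described above.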

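8、Pandas.skew: computing skewness
Skewness measures the direction and degree of asymmetry of a data distribution; it is a numerical characteristic of how non-symmetric the distribution is. It is also called the skewness coefficient.
It characterizes how asymmetric the probability density curve is relative to the mean. Intuitively, it reflects the relative length of the tails of the density curve.

DataFrame.skew(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

Parameters:
axis : {index (0), columns (1)} the axis to compute along
skipna : boolean, default True whether to ignore missing values when computing; they are ignored by default
level : int or level name, default None (rarely used)
numeric_only : boolean, default None (rarely used)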
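A minimal sketch of how the sign of the skewness relates to the tails, using made-up columns: symmetric data gives a value near 0, a long right tail gives a positive value, and a long left tail gives a negative value.

```python
import pandas as pd

# Made-up sample data with three differently shaped columns.
df = pd.DataFrame({
    "symmetric":  [1, 2, 3, 4, 5],
    "right_tail": [1, 1, 1, 2, 10],      # one large value stretches the right tail
    "left_tail":  [-10, -2, -1, -1, -1], # mirror image of right_tail
})

# One skewness value per column (the default axis).
skewness = df.skew()
print(skewness)
```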