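Working with External Libraries: Python 7/7
Imports, operator overloading, and survival tips for venturing into the world of external libraries
In this lesson you will learn about imports in Python, pick up some tips for working with unfamiliar libraries, and dig into operator overloading.
Imports
- So far, we have talked about types and functions that are built into the language.
- But one of the best things about Python is the vast number of high-quality libraries that other people have already written for it.
- Some of these libraries are in the "standard library", meaning you can find them anywhere you run Python. Others can be added easily, even though they do not always ship with Python.
- Either way, we access this code with imports.
- We'll start by importing the math library: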
import math
print("It's math! It has type {}".format(type(math)))
It's math! It has type <class 'module'>
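math is a module. A module is just a collection of variables defined by someone else. We can see all the names defined in math using the built-in function dir():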
print(dir(math))
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
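The list above mixes internal double-underscore names with the ones we usually care about. A minimal sketch of filtering them out (the variable name public_names is only for illustration):

```python
import math

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names))
print(public_names[:5])
```

The exact count depends on the Python version, since new functions are added to math over time.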
- We can access these variables using dot syntax. Some of them refer to simple values, like math.pi:
print("pi to 4 significant digits = {:.4}".format(math.pi))
pi to 4 significant digits = 3.142
- But most of what we find in the module are functions, like math.log():
math.log(32, 2)
5.0
- Of course, if we don't know what math.log does, we can call help() on it:
help(math.log)
Help on built-in function log in module math:
log(...)
log(x, [base=math.e])
Return the logarithm of x to the given base.
If the base not specified, returns the natural logarithm (base e) of x.
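- We can also call help() on the module itself. This returns the combined documentation for all the functions and values in the module (as well as a high-level description of the module).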
help(math)
Help on module math:
NAME
math
MODULE REFERENCE
https://docs.python.org/3.7/library/math
The following documentation is automatically generated from the Python
source files. It may be incomplete, incorrect or include features that
are considered implementation detail and may vary between Python
implementations. When in doubt, consult the module reference at the
location listed above.
DESCRIPTION
This module provides access to the mathematical functions
defined by the C standard.
FUNCTIONS
acos(x, /)
Return the arc cosine (measured in radians) of x.
acosh(x, /)
Return the inverse hyperbolic cosine of x.
asin(x, /)
Return the arc sine (measured in radians) of x.
asinh(x, /)
Return the inverse hyperbolic sine of x.
atan(x, /)
Return the arc tangent (measured in radians) of x.
atan2(y, x, /)
Return the arc tangent (measured in radians) of y/x.
Unlike atan(y/x), the signs of both x and y are considered.
atanh(x, /)
Return the inverse hyperbolic tangent of x.
ceil(x, /)
Return the ceiling of x as an Integral.
This is the smallest integer >= x.
copysign(x, y, /)
Return a float with the magnitude (absolute value) of x but the sign of y.
On platforms that support signed zeros, copysign(1.0, -0.0)
returns -1.0.
cos(x, /)
Return the cosine of x (measured in radians).
cosh(x, /)
Return the hyperbolic cosine of x.
degrees(x, /)
Convert angle x from radians to degrees.
erf(x, /)
Error function at x.
erfc(x, /)
Complementary error function at x.
exp(x, /)
Return e raised to the power of x.
expm1(x, /)
Return exp(x)-1.
This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.
fabs(x, /)
Return the absolute value of the float x.
factorial(x, /)
Find x!.
Raise a ValueError if x is negative or non-integral.
floor(x, /)
Return the floor of x as an Integral.
This is the largest integer <= x.
fmod(x, y, /)
Return fmod(x, y), according to platform C.
x % y may differ.
frexp(x, /)
Return the mantissa and exponent of x, as pair (m, e).
m is a float and e is an int, such that x = m * 2.**e.
If x is 0, m and e are both 0. Else 0.5 <= abs(m) < 1.0.
fsum(seq, /)
Return an accurate floating point sum of values in the iterable seq.
Assumes IEEE-754 floating point arithmetic.
gamma(x, /)
Gamma function at x.
gcd(x, y, /)
greatest common divisor of x and y
hypot(x, y, /)
Return the Euclidean distance, sqrt(x*x + y*y).
isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0)
Determine whether two floating point numbers are close in value.
rel_tol
maximum difference for being considered "close", relative to the
magnitude of the input values
abs_tol
maximum difference for being considered "close", regardless of the
magnitude of the input values
Return True if a is close in value to b, and False otherwise.
For the values to be considered close, the difference between them
must be smaller than at least one of the tolerances.
-inf, inf and NaN behave similarly to the IEEE 754 Standard. That
is, NaN is not close to anything, even itself. inf and -inf are
only close to themselves.
isfinite(x, /)
Return True if x is neither an infinity nor a NaN, and False otherwise.
isinf(x, /)
Return True if x is a positive or negative infinity, and False otherwise.
isnan(x, /)
Return True if x is a NaN (not a number), and False otherwise.
ldexp(x, i, /)
Return x * (2**i).
This is essentially the inverse of frexp().
lgamma(x, /)
Natural logarithm of absolute value of Gamma function at x.
log(...)
log(x, [base=math.e])
Return the logarithm of x to the given base.
If the base not specified, returns the natural logarithm (base e) of x.
log10(x, /)
Return the base 10 logarithm of x.
log1p(x, /)
Return the natural logarithm of 1+x (base e).
The result is computed in a way which is accurate for x near zero.
log2(x, /)
Return the base 2 logarithm of x.
modf(x, /)
Return the fractional and integer parts of x.
Both results carry the sign of x and are floats.
pow(x, y, /)
Return x**y (x to the power of y).
radians(x, /)
Convert angle x from degrees to radians.
remainder(x, y, /)
Difference between x and the closest integer multiple of y.
Return x - n*y where n*y is the closest integer multiple of y.
In the case where x is exactly halfway between two multiples of
y, the nearest even value of n is used. The result is always exact.
sin(x, /)
Return the sine of x (measured in radians).
sinh(x, /)
Return the hyperbolic sine of x.
sqrt(x, /)
Return the square root of x.
tan(x, /)
Return the tangent of x (measured in radians).
tanh(x, /)
Return the hyperbolic tangent of x.
trunc(x, /)
Truncates the Real x to the nearest Integral toward 0.
Uses the __trunc__ magic method.
DATA
e = 2.718281828459045
inf = inf
nan = nan
pi = 3.141592653589793
tau = 6.283185307179586
FILE
/opt/conda/lib/python3.7/lib-dynload/math.cpython-37m-x86_64-linux-gnu.so
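Other import syntax
- If we know we'll be using a module very often, we can import it under a shorter alias to save some typing (though in this case math is already pretty short).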
import math as mt
mt.pi
3.141592653589793
You may have seen this pattern with popular libraries such as Pandas, NumPy, TensorFlow, or Matplotlib, for example:
import numpy as np
import pandas as pd
The as keyword simply renames the imported module. It has the same effect as the following:
import math
mt = math
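A quick way to convince ourselves that an alias is only a second name for the same module, sketched with the is operator:

```python
import math
import math as mt

# Both names refer to the very same module object.
print(mt is math)

# So anything reachable through one name is reachable through the other.
print(mt.sqrt(16) == math.sqrt(16))
```

One small difference worth knowing: import math as mt on its own binds only the name mt, whereas the two-line version also leaves the name math defined.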
- Wouldn't it be great if we could refer to all the variables in math by their bare names, i.e. just pi instead of math.pi or mt.pi? Good news: we can.
from math import *
print(pi, log(32, 2))
3.141592653589793 5.0
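import * makes all of the module's variables directly accessible, without any dotted prefix.
- Bad news: some purists might grumble at you for doing this.
- And they have a point: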
from math import *
from numpy import *
print(pi, log(32, 2))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_19/3018510453.py in <module>
1 from math import *
2 from numpy import *
----> 3 print(pi, log(32, 2))
TypeError: return arrays must be of ArrayType
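- What happened? It worked before!
- These "star imports" can occasionally lead to weird, difficult-to-debug situations.
- The problem here is that the math and numpy modules both have a function called log, but the two have different semantics. Because we imported from numpy second, its log overwrote (or "shadowed") the log we had imported from math.
- A good compromise is to import only the specific things we need from each module: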
from math import log, pi
from numpy import asarray
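The same shadowing can be reproduced with the standard library alone. The sketch below uses math and cmath (a swap made here so that no installation is needed; cmath also defines a log function) and explicit imports instead of star imports. The aliases real_log and complex_log are made up for illustration.

```python
import math
import cmath

from math import log     # binds the name "log" to math.log
from cmath import log    # rebinds "log"; the math version is no longer reachable by that name

print(log is cmath.log)
print(log is math.log)

# If both functions are really needed, distinct aliases keep them apart.
from math import log as real_log
from cmath import log as complex_log

print(real_log(32, 2))
print(type(complex_log(32, 2)))
```

Whichever import runs last wins, which is exactly why the order of two star imports can silently change what a name means.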
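Submodules
- We've seen that modules contain variables which can refer to functions or values. Something to be aware of is that they can also contain variables referring to other modules.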
import numpy
print("numpy.random is a", type(numpy.random))
print("it contains names such as...",
dir(numpy.random)[-15:]
)
numpy.random is a <class 'module'>
it contains names such as... ['seed', 'set_state', 'shuffle', 'standard_cauchy', 'standard_exponential', 'standard_gamma', 'standard_normal', 'standard_t', 'test', 'triangular', 'uniform', 'vonmises', 'wald', 'weibull', 'zipf']
- So if we import numpy as above, calling a function in the random submodule requires two dots.
# Roll 10 dice (high is exclusive, so high=7 gives values from 1 to 6)
rolls = numpy.random.randint(low=1, high=7, size=10)
rolls
array([5, 5, 3, 4, 5, 1, 2, 2, 1, 1])
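The two-dot pattern is not specific to numpy. As a sketch that runs without any third-party package, here is the same idea with os.path from the standard library (the file name is invented for the example):

```python
import os
import os.path as osp    # alias the submodule directly
from os import path      # or import the submodule by name

# os.path is itself a module, just like numpy.random.
print(type(os.path))

# Two dots: module, then submodule, then function.
print(os.path.basename("data/rolls.csv"))

# All three spellings refer to the same submodule object.
print(osp is path is os.path)
```

Which spelling to use is mostly a matter of taste; the alias forms simply save typing when a submodule is used often.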
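Oh the places you'll go, oh the objects you'll see
After six lessons, you should be a pro with ints, floats, bools, lists, strings, and dicts (right?).
- Even if that's true, it doesn't end there. As you work with libraries for specialized tasks, you'll find that they define their own types, which you'll have to learn to work with. For example, in the graphing library matplotlib you'll come into contact with objects representing Subplots, Figures, TickMarks, and Annotations. pandas functions will give you DataFrames and Series.
- In this section, I want to share a quick survival guide for working with strange types.
Three tools for understanding strange objects
- In the cell above, we saw that calling a numpy function gave us an "array", something we haven't met before in this course. But don't panic: we have three familiar built-in functions to help us.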
type() (what is this thing?)
type(rolls)
numpy.ndarray
dir() (what can I do with it?)
print(dir(rolls))
['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_function__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__init_subclass__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmatmul__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__xor__', 'all', 'any', 'argmax', 'argmin', 'argpartition', 'argsort', 'astype', 'base', 'byteswap', 'choose', 'clip', 'compress', 'conj', 'conjugate', 'copy', 'ctypes', 'cumprod', 'cumsum', 'data', 'diagonal', 'dot', 'dtype', 'dump', 'dumps', 'fill', 'flags', 'flat', 'flatten', 'getfield', 'imag', 'item', 'itemset', 'itemsize', 'max', 'mean', 'min', 'nbytes', 'ndim', 'newbyteorder', 'nonzero', 'partition', 'prod', 'ptp', 'put', 'ravel', 'real', 'repeat', 'reshape', 'resize', 'round', 'searchsorted', 'setfield', 'setflags', 'shape', 'size', 'sort', 'squeeze', 'std', 'strides', 'sum', 'swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']
# If I want the average roll, the "mean" method looks promising...
rolls.mean()
2.9
# Or maybe I just want to turn the array into a list, in which case I can use "tolist"
rolls.tolist()
[5, 5, 3, 4, 5, 1, 2, 2, 1, 1]
help() (tell me more)
# That "ravel" attribute sounds interesting. I'm a big classical music fan.
help(rolls.ravel)
Help on built-in function ravel:
ravel(...) method of numpy.ndarray instance
a.ravel([order])
Return a flattened array.
Refer to `numpy.ravel` for full documentation.
See Also
--------
numpy.ravel : equivalent function
ndarray.flat : a flat iterator on the array.
# Okay, just tell me everything there is to know about numpy.ndarray
help(rolls)
(Output omitted: it is very long. The online numpy documentation covers the same material.)
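The same three steps work on any unfamiliar object. A minimal sketch on fractions.Fraction from the standard library, picked here only as an example of a type this course has not covered:

```python
from fractions import Fraction

half = Fraction(1, 2)

# 1. type(): what is this thing?
print(type(half))

# 2. dir(): what can I do with it? (skipping the double-underscore names)
public = [name for name in dir(half) if not name.startswith("_")]
print(public)

# 3. help(): tell me more about one attribute that looked interesting.
help(Fraction.limit_denominator)
```

Filtering out the underscore names in step 2 is optional, but it usually makes the list short enough to skim.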
Operator overloading
- What is the value of the expression below?
[3, 4, 1, 2, 2, 1] + 10
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_19/2144087748.py in <module>
----> 1 [3, 4, 1, 2, 2, 1] + 10
TypeError: can only concatenate list (not "int") to list
- What a silly question. Of course it's an error.
- But what about this?
rolls + 10
array([15, 15, 13, 14, 15, 11, 12, 12, 11, 11])
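- We might think that Python strictly polices how pieces of its core syntax behave, such as +, <, in, ==, or square brackets for indexing and slicing. In fact, it takes a very hands-off approach. When you define a new type, you can choose how addition works for it, or what it means for an object of that type to be equal to something else.
- Here are a few more examples of how numpy arrays interact with Python operators in ways that differ from lists: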
# At which indices are the dice less than or equal to 3?
rolls <= 3
array([False, False, True, False, False, True, True, True, True,
True])
xlist = [[1,2,3],[2,4,6],]
# Create a 2-dimensional array
x = numpy.asarray(xlist)
print("xlist = {}\nx =\n{}".format(xlist, x))
xlist = [[1, 2, 3], [2, 4, 6]]
x =
[[1 2 3]
[2 4 6]]
# Get the last element of the second row of our numpy array
x[1,-1]
6
# Get the last element of the second sublist of our nested list?
xlist[1,-1]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_19/3020169379.py in <module>
1 # Get the last element of the second sublist of our nested list?
----> 2 xlist[1,-1]
TypeError: list indices must be integers or slices, not tuple
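- numpy's ndarray type is specialized for working with multi-dimensional data, so it defines its own indexing logic, which lets us index with a tuple that gives the position along each dimension.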
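To see how a type can accept a tuple inside square brackets, here is a minimal sketch. Grid is a hypothetical toy class written for this illustration; it is not how numpy is implemented.

```python
class Grid:
    """A tiny 2-D container, used only to illustrate tuple indexing."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __getitem__(self, key):
        # grid[1, -1] arrives here as the tuple (1, -1).
        if isinstance(key, tuple):
            row, col = key
            return self.rows[row][col]
        # grid[1] arrives as a plain integer and returns a whole row.
        return self.rows[key]


grid = Grid([[1, 2, 3], [2, 4, 6]])
print(grid[1, -1])   # 6
print(grid[1])       # [2, 4, 6]
```

Python itself attaches no meaning to the comma inside the brackets; it just builds a tuple and hands it to __getitem__, and the type decides what to do with it.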
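When does 1 + 1 not equal 2?
- Things can get weirder than this. You may have heard of (or even used) tensorflow, a popular Python library for deep learning. It makes extensive use of operator overloading: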
import tensorflow as tf
# Create two constants, each with value 1
a = tf.constant(1)
b = tf.constant(1)
# Add them together to get...
a + b
<tf.Tensor: shape=(), dtype=int32, numpy=2>
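a + b does not evaluate to the plain Python integer 2. As the output shows, it is a tf.Tensor. Tensorflow's documentation (written for its older, session-based API) describes such an object as "a symbolic handle to one of the outputs of an Operation. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow tf.Session."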
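- The important thing is to be aware that this is possible: libraries will often use operator overloading in non-obvious or magical-seeming ways.
- Understanding how Python's operators work on ints, strings, and lists is no guarantee that you will immediately understand what they do when applied to a tensorflow Tensor, a numpy ndarray, or a pandas DataFrame.
- Once you've had a little taste of DataFrames, for example, an expression like the one below starts to look appealingly intuitive: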
# Get the rows with population over 1m in South America
df[(df['population'] > 10**6) & (df['continent'] == 'South America')]
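- But why does it work? The example above features something like 5 different overloaded operators. What is each of them doing? Knowing the answer helps when things start going wrong.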
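One way to make that expression less mysterious is to rebuild it on a toy scale. The sketch below defines two hypothetical classes, Column and Table, which are stand-ins written for this illustration and not pandas' actual implementation. The population figures are rough, made-up round numbers.

```python
class Column:
    """Toy column of values, used only to illustrate overloaded operators."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, other):     # column > number gives a column of booleans
        return Column(v > other for v in self.values)

    def __eq__(self, other):     # column == value gives a column of booleans
        return Column(v == other for v in self.values)

    def __and__(self, other):    # mask & mask combines two boolean columns
        return Column(a and b for a, b in zip(self.values, other.values))


class Table:
    """Toy table mapping column names to Column objects."""

    def __init__(self, data):
        self.data = {name: Column(vals) for name, vals in data.items()}

    def __getitem__(self, key):
        if isinstance(key, str):     # table['population'] returns one column
            return self.data[key]
        # table[mask] keeps only the rows where the mask is True
        return Table({
            name: [v for v, keep in zip(col.values, key.values) if keep]
            for name, col in self.data.items()
        })


table = Table({
    "country": ["Brazil", "Suriname", "Iceland"],
    "continent": ["South America", "South America", "Europe"],
    "population": [200_000_000, 600_000, 350_000],
})

big = table[(table["population"] > 10**6) & (table["continent"] == "South America")]
print(big["country"].values)   # ['Brazil']
```

Each operator in the last expression ends up as an ordinary method call: square brackets call __getitem__ (with a string twice and with a mask once), > calls __gt__, == calls __eq__, and & calls __and__.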
Curious how it all works?
- Have you ever called help() or dir() on an object and wondered what all those names with double underscores were?
print(dir(list))
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
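- This turns out to be directly related to operator overloading.
- When Python programmers want to define how operators behave on their types, they do so by implementing methods with special names that begin and end with two underscores, such as __lt__, __setattr__, or __contains__. Generally, names that follow this double-underscore format have a special meaning to Python.
- For example, the expression x in [1, 2, 3] actually calls the list method __contains__ behind the scenes. It is equivalent to the much uglier [1, 2, 3].__contains__(x).
- If you're curious to learn more, Python's official documentation describes many, many more of these special methods.
- We won't be defining our own types in these lessons (if only there were time!), but I hope you'll get to experience the joy of defining your own wonderful, weird types later down the road.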
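For readers who want a preview anyway, here is a minimal sketch of a type that implements three of these special methods. Hand is a hypothetical class invented for this example, and comparing hands by their totals is an arbitrary choice made to keep it short.

```python
class Hand:
    """A hypothetical hand of dice rolls, used only to illustrate special methods."""

    def __init__(self, rolls):
        self.rolls = list(rolls)

    def __contains__(self, value):   # called for: value in hand
        return value in self.rolls

    def __add__(self, bonus):        # called for: hand + number
        return Hand(r + bonus for r in self.rolls)

    def __lt__(self, other):         # called for: hand < other_hand
        return sum(self.rolls) < sum(other.rolls)

    def __repr__(self):              # controls how the object is printed
        return "Hand({})".format(self.rolls)


hand = Hand([3, 4, 1])
print(4 in hand)          # True
print(hand + 10)          # Hand([13, 14, 11])
print(hand < hand + 10)   # True
```

Nothing forces __add__ to mean addition in the arithmetic sense; here it adds a bonus to every roll, much like rolls + 10 did for the numpy array earlier.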