Python 3.x Partial Migration Guide

詹欧骑士

Article link: https://blog.csdn.net/LUO_CAN/article/details/79220049

This post is based on an article from the 机器之心 (Synced) WeChat public account.

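By the end of 2019, NumPy and many other scientific computing tools will drop support for Python 2, and every NumPy feature release after 2018 will support only Python 3. To make the move from Python 2 to Python 3 a little easier, I have collected some Python 3 features that I hope you will find useful.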

1 Better path handling with pathlib

pathlib is a standard-library module in Python 3 (added in 3.4). It saves you from writing lots of os.path.join calls:

from pathlib import Path

dataset = 'wiki_images'
datasets_root = Path('/path/to/datasets/')

train_path = datasets_root / dataset / 'train'
test_path = datasets_root / dataset / 'test'

for image_path in train_path.iterdir():
    with image_path.open('rb') as f:  # note: open is a method of the Path object
        image_bytes = f.read()  # do something with an image

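In Python 2 it was always tempting to build paths by string concatenation (concise, but clearly bad). With pathlib, the code is safe, concise, and readable.
pathlib.Path also has a large number of methods, so new Python users don't have to look up each one separately: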

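p = Path('/path/to/datasets/wiki_images/train/image.png')  # example path
p.exists()
p.is_dir()
p.parts  # a property (tuple of path components), not a method
p.with_name('sibling.png')  # only change the name, but keep the folder
p.with_suffix('.jpg')  # only change the extension, but keep the folder and the name
p.chmod(0o644)  # change permission bits (the file must exist)
p.rmdir()  # remove a directory (it must exist and be empty)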
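Methods such as exists(), chmod() and rmdir() access the filesystem. The purely syntactic methods can be tried anywhere. The sketch below uses PurePosixPath, the I/O-free counterpart of Path, so that the output is identical on every OS. The path itself is made up:

```python
from pathlib import PurePosixPath

# A hypothetical path; PurePosixPath does no filesystem I/O, so this runs anywhere
p = PurePosixPath('/path/to/datasets/wiki_images/train/cat.png')

print(p.parts)                     # ('/', 'path', 'to', 'datasets', 'wiki_images', 'train', 'cat.png')
print(p.with_name('sibling.png'))  # /path/to/datasets/wiki_images/train/sibling.png
print(p.with_suffix('.jpg'))       # /path/to/datasets/wiki_images/train/cat.jpg
print(p.name, p.stem, p.suffix)    # cat.png cat .png
print(p.parent)                    # /path/to/datasets/wiki_images/train
```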

pathlib will save you a lot of time. For details, see:

Documentation: https://docs.python.org/3/library/pathlib.html
Further reading: https://pymotw.com/3/pathlib/
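The loop at the start of this section uses placeholder paths. Below is a self-contained sketch of the same pattern (the / operator, iterdir() and Path.open()). It builds a throwaway directory with tempfile, and the folder and file names are invented for illustration:

```python
import tempfile
from pathlib import Path

# Build a temporary directory tree so the loop has real files to read
with tempfile.TemporaryDirectory() as tmp:
    datasets_root = Path(tmp)
    train_path = datasets_root / 'wiki_images' / 'train'
    train_path.mkdir(parents=True)  # also creates the intermediate 'wiki_images' folder
    for i in range(3):
        (train_path / f'img_{i}.bin').write_bytes(bytes([i]) * 4)  # 4-byte dummy "images"

    sizes = {}
    for image_path in sorted(train_path.iterdir()):  # iterdir() order is arbitrary, so sort
        with image_path.open('rb') as f:
            sizes[image_path.name] = len(f.read())

print(sizes)  # {'img_0.bin': 4, 'img_1.bin': 4, 'img_2.bin': 4}
```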

2 Type hinting is now part of the language

An example of type hints in PyCharm:

class Greeter:
    def __init__(self, fmt: str):
        if fmt:
            self.fmt = fmt
        else:
            self.fmt = 'Hi, there {}'

    def greet(self, name: str) -> str:
        return self.fmt.format(name)

greeting = Greeter(None)  # PyCharm highlights this: None is not a str
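Here, Greeter(None) passes None where fmt is annotated as str, and PyCharm highlights that mismatch. If None is meant to be a valid input, typing.Optional is one way to declare it. A minimal sketch follows; the default value None is an assumption added for this example:

```python
from typing import Optional

class Greeter:
    def __init__(self, fmt: Optional[str] = None):
        # None (or an empty string) falls back to the default template
        self.fmt = fmt if fmt else 'Hi, there {}'

    def greet(self, name: str) -> str:
        return self.fmt.format(name)

print(Greeter(None).greet('Ann'))          # Hi, there Ann
print(Greeter('Hello, {}!').greet('Bob'))  # Hello, Bob!
```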

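Python is no longer just a language for small scripts. Today's data pipelines contain many steps, and each step may involve a different framework (and sometimes different logic).

Type hints were introduced into Python to help manage increasingly complex projects and to let tools verify code more effectively. Before that, each module had to specify types in docstrings using its own custom convention. (Note: PyCharm can convert old-style docstrings into the new type hints.)

The code below is a simple example that works with many different types of data, which is exactly what we like about the Python data stack.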

import numpy

def repeat_each_entry(data):
    """ Each entry in the data is doubled
    <blah blah nobody reads the documentation till the end>
    """
    index = numpy.repeat(numpy.arange(len(data)), 2)
    return data[index]

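For reference, numpy.repeat has the following signature. The examples below assume import numpy as np:

numpy.repeat(a, repeats, axis=None)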

>>> np.repeat(3, 4)
array([3, 3, 3, 3])
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 2)
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4]])
>>> np.repeat(x, [1, 2], axis=0)
array([[1, 2],
       [3, 4],
       [3, 4]])
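For comparison, here is a sketch of repeat_each_entry with type hints added. The np.ndarray annotation is an assumption made for this example, since the untyped version also accepts other array-like containers. The hints are stored on the function for tools such as PyCharm or mypy, and Python does not check them when the function runs:

```python
import numpy as np

def repeat_each_entry(data: np.ndarray) -> np.ndarray:
    """Each entry along the first axis is doubled."""
    index = np.repeat(np.arange(len(data)), 2)  # 0, 0, 1, 1, 2, 2, ...
    return data[index]

doubled = repeat_each_entry(np.array([[1, 2], [3, 4]]))
print(doubled.tolist())  # [[1, 2], [1, 2], [3, 4], [3, 4]]

# The annotations are plain metadata attached to the function object
print(repeat_each_entry.__annotations__['return'] is np.ndarray)  # True
```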

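3 Matrix multiplication with @

In NumPy, element-wise multiplication is written with '*', and matrix multiplication is written with ndarray.dot(ndarray). Since Python 3.5, '@' can be used for matrix multiplication. (In MATLAB, element-wise multiplication is '.*' and matrix multiplication is '*'.)
Below we implement linear regression with L2 regularization (ridge regression):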

    # L2-regularized linear regression: ||AX - b||^2 + alpha * ||X||^2 -> min
    # Python 2
    X = np.linalg.inv(np.dot(A.T, A) + alpha * np.eye(A.shape[1])).dot(A.T.dot(b))
    # Python 3
    X = np.linalg.inv(A.T @ A + alpha * np.eye(A.shape[1])) @ (A.T @ b)
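The sketch below checks on synthetic data that both spellings give the same ridge-regression solution. The shapes, random seed and alpha are arbitrary choices. It also shows how '*' differs from '@', and it uses np.linalg.solve, which computes the same X without forming an explicit inverse and is generally the numerically safer choice:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))  # 50 samples, 3 features (synthetic data)
b = rng.normal(size=50)
alpha = 0.1                   # regularization strength, chosen arbitrarily

X_dot = np.linalg.inv(np.dot(A.T, A) + alpha * np.eye(A.shape[1])).dot(A.T.dot(b))
X_at = np.linalg.inv(A.T @ A + alpha * np.eye(A.shape[1])) @ (A.T @ b)
# Solve (A^T A + alpha I) X = A^T b directly instead of inverting the matrix
X_solve = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ b)
print(np.allclose(X_dot, X_at), np.allclose(X_at, X_solve))  # True True

M = np.array([[1, 2], [3, 4]])
print((M * M).tolist())  # [[1, 4], [9, 16]]    element-wise product
print((M @ M).tolist())  # [[7, 10], [15, 22]]  matrix product
```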

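4 Using ** as a wildcard

Recursive folder wildcards are awkward in Python 2, which is why the custom glob2 module exists to work around the problem. The recursive flag has been supported since Python 3.5.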

    import glob
    # Python 2
    found_images = \
        glob.glob('/path/*.jpg') \
      + glob.glob('/path/*/*.jpg') \
      + glob.glob('/path/*/*/*.jpg') \
      + glob.glob('/path/*/*/*/*.jpg') \
      + glob.glob('/path/*/*/*/*/*.jpg') 
    # Python 3
    found_images = glob.glob('/path/**/*.jpg', recursive=True)

An even better choice in Python 3 is pathlib:

    # Python 3
    import pathlib
    found_images = pathlib.Path('/path/').glob('**/*.jpg')  # a lazy generator of Path objects
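To check that the approaches agree, the sketch below builds a small temporary tree with an invented layout. It then compares glob.glob(..., recursive=True), Path.glob('**/...') and Path.rglob(), which is shorthand for a glob prefixed with '**/':

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: images at several depths, plus one non-image file
    for rel in ['a.jpg', 'x/b.jpg', 'x/y/c.jpg', 'x/y/z/d.jpg', 'x/notes.txt']:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    # Normalize every result to a POSIX-style path relative to root for comparison
    with_glob = sorted(Path(p).relative_to(root).as_posix()
                       for p in glob.glob(os.path.join(tmp, '**', '*.jpg'), recursive=True))
    with_pathlib = sorted(p.relative_to(root).as_posix() for p in root.glob('**/*.jpg'))
    with_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.jpg'))

print(with_glob)                                # ['a.jpg', 'x/b.jpg', 'x/y/c.jpg', 'x/y/z/d.jpg']
print(with_glob == with_pathlib == with_rglob)  # True
```

Note that '**' also matches zero directories, so a.jpg at the top level is found as well.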

5 print is a function in Python 3

Because print is now an ordinary function, you can redefine it. The simplest form looks like this:

    # Python 3
    _print = print  # store the original print function
    def print(*args, **kwargs):
        pass  # do something useful, e.g. store output to some file

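In Jupyter, it is useful to log every output to a separate file so that you can trace what went wrong when an error occurs. Now that print is a function, you can override it to do this.
In the code below, a context manager temporarily overrides the behaviour of print: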

    import builtins
    import contextlib

    @contextlib.contextmanager
    def replace_print():
        _print = builtins.print  # save the original print function
        # or use some other function here
        builtins.print = lambda *args, **kwargs: _print('new printing', *args, **kwargs)
        try:
            yield
        finally:
            builtins.print = _print  # restore print even if the block raises

    with replace_print():
        print('hello')  # code here will invoke the replacement print function

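This approach is not recommended. Replacing a builtin affects all code that runs while the replacement is active, which can make the system unstable. (So what should one do instead?)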
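A less invasive option from the standard library is contextlib.redirect_stdout. This is a swap-in, not the approach shown above. It leaves print alone and temporarily points sys.stdout somewhere else. It still changes global state, so it is not safe with threads, but it is restored automatically when the block ends:

```python
import contextlib
import io

buffer = io.StringIO()
# Everything printed inside the block goes to `buffer` instead of the console
with contextlib.redirect_stdout(buffer):
    print('logged line 1')
    print('logged line 2')

print(repr(buffer.getvalue()))  # 'logged line 1\nlogged line 2\n'
```

To log to a file instead, pass an open file object, e.g. `with open('log.txt', 'w') as f, contextlib.redirect_stdout(f):` (log.txt is only an example name).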

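Because a call to print is an expression, it can now appear in conditional expressions, list comprehensions and other language constructs: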

    # Python 3
    result = process(x) if is_valid(x) else print('invalid item: ', x)
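Here is a runnable sketch of the same idea inside a list comprehension. process and is_valid are hypothetical stand-ins. Because print returns None, invalid items show up as None in the result:

```python
def is_valid(x):
    # hypothetical validity check, for illustration only
    return x >= 0

def process(x):
    # hypothetical processing step
    return x * 10

results = [process(x) if is_valid(x) else print('invalid item: ', x) for x in [1, -2, 3]]
print(results)  # [10, None, 30]
```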

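6 f-strings for simple and reliable formatting

The default formatting system offers flexibility that data experiments rarely need, and the resulting code is either too verbose or too fragile under any modification. Data science code typically prints logging information repeatedly in a fixed format, usually with code like this: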

    # Python 2
    print('{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {acc_mean:0.4f}±{acc_std:0.4f} time: {avg_time:3.2f}'.format(
        batch=batch, epoch=epoch, total_epochs=total_epochs, 
        acc_mean=numpy.mean(accuracies), acc_std=numpy.std(accuracies),
        avg_time=time / len(data_batch)
    ))
    # Python 2 (too error-prone during fast modifications, please avoid):
    print('{:3} {:3} / {:3}  accuracy: {:0.4f}±{:0.4f} time: {:3.2f}'.format(
        batch, epoch, total_epochs, numpy.mean(accuracies), numpy.std(accuracies),
        time / len(data_batch)
    ))

Sample output:

    120  12 / 300  accuracy: 0.8180±0.4649 time: 56.60

f-strings (formatted string literals) were introduced in Python 3.6:

    # Python 3.6+
    print(f'{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {numpy.mean(accuracies):0.4f}±{numpy.std(accuracies):0.4f} time: {time / len(data_batch):3.2f}')
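The snippets above rely on variables defined elsewhere. The self-contained sketch below uses invented values and shows that the f-string reuses the same format specifications (:3, :0.4f, :3.2f) as str.format. The variable time is renamed elapsed so that it does not shadow the time module:

```python
import numpy

batch, epoch, total_epochs = 120, 12, 300
accuracies = [0.5, 1.0]        # made-up numbers for illustration
elapsed, n_batches = 113.2, 2  # hypothetical total time and number of batches

line = (f'{batch:3} {epoch:3} / {total_epochs:3}  '
        f'accuracy: {numpy.mean(accuracies):0.4f}±{numpy.std(accuracies):0.4f} '
        f'time: {elapsed / n_batches:3.2f}')
print(line)  # 120  12 / 300  accuracy: 0.7500±0.2500 time: 56.60
```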

f-strings also make it very convenient to write queries:

    query = f"INSERT INTO STATION VALUES (13, '{city}', '{state}', {latitude}, {longitude})"
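Interpolating values into SQL with an f-string works for trusted, hard-coded values. With external input, however, it invites SQL injection and breaks on values that contain quotes. The sketch below shows the parameterized alternative using the standard sqlite3 module. The in-memory database and the STATION column names are assumptions chosen to match the query above:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
# Hypothetical schema with five columns, matching the five values in the query
conn.execute('CREATE TABLE STATION (ID INTEGER, CITY TEXT, STATE TEXT, LAT_N REAL, LONG_W REAL)')

city, state, latitude, longitude = "Coeur d'Alene", 'ID', 47.68, 116.78
# '?' placeholders let the driver quote the values, so the apostrophe in the city
# name cannot break the statement (the f-string version would produce invalid SQL)
conn.execute('INSERT INTO STATION VALUES (13, ?, ?, ?, ?)', (city, state, latitude, longitude))

print(conn.execute('SELECT CITY FROM STATION WHERE ID = 13').fetchone())  # ("Coeur d'Alene",)
```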

7 Unicode for natural language processing

    s = '您好'
    print(len(s))
    print(s[:2])
    # Python 2 output: 6, then ��  (only the first 2 of the 6 UTF-8 bytes)
    # Python 3 output: 2, then 您好

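Python 3 makes natural language processing of non-English text much easier. Here are some more differences between the two versions that matter for NLP: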

# Python 2
>>> file = open('test.txt', 'r')
>>> file.read(1)
'1'
>>> file.read(1)
'\n'
>>> file.read(1)
'\xe4'
>>> file.read(1)
'\xbd'
>>> file.read(1)
'\xa0'

# Python 3
>>> file = open('test.txt', 'r')
>>> file.read(1)
'1'
>>> file.read(1)
'\n'
>>> file.read(1)
'你'

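Here test.txt is saved as UTF-8, an encoding in which each Chinese character takes 3 bytes. Python 2's built-in open returns raw bytes, so read(1) reads one byte. Python 3's text-mode open decodes the file, so read(1) reads one character. Python 3's default encoding comes from the locale (locale.getpreferredencoding()) and is not UTF-8 on every system; many Chinese Windows installations use GBK, for example. It is therefore safer to pass encoding='utf-8' explicitly.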
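You can reproduce the Python 2 behaviour in Python 3 by opening the file in binary mode. The sketch below writes content like the test.txt above, assumed here to be '1', a newline and '你好' saved as UTF-8, and then reads it both ways:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'test.txt'
    path.write_bytes('1\n你好'.encode('utf-8'))  # exact UTF-8 bytes, no newline translation

    with open(path, 'rb') as f:  # binary mode: read(1) returns one byte, like Python 2
        raw = [f.read(1) for _ in range(5)]
    with open(path, 'r', encoding='utf-8') as f:  # text mode: read(1) returns one character
        chars = [f.read(1) for _ in range(3)]

print(raw)    # [b'1', b'\n', b'\xe4', b'\xbd', b'\xa0']
print(chars)  # ['1', '\n', '你']
```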

Converting between bytes and str in Python 3 (str holds Unicode text, and bytes holds encoded data):

# Convert a string to bytes
a = "吴文"
b = bytes(a, encoding='utf-8')
print(b)
b1 = bytes(a, encoding='gbk')
print(b1)

# Convert bytes back to a string
a1 = str(b, encoding='utf-8')
print(a1)
a2 = str(b1, encoding='gbk')
print(a2)
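The same conversions are more often written with the str.encode and bytes.decode methods. The short sketch below reuses the string from the example above. The lengths show that a str counts characters, while bytes count encoded bytes:

```python
text = '吴文'
data = text.encode('utf-8')  # equivalent to bytes(text, encoding='utf-8')
print(len(text), len(data), len(text.encode('gbk')))  # 2 6 4
print(data.decode('utf-8') == text)  # True; equivalent to str(data, encoding='utf-8')

# ASCII cannot represent these bytes, so decoding fails loudly
try:
    data.decode('ascii')
except UnicodeDecodeError as err:
    print('cannot decode:', err.reason)  # cannot decode: ordinal not in range(128)
```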

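8 Migration issues specific to data science

map(), zip() and filter() now return iterators, and dict's .keys(), .values() and .items() return view objects, instead of lists. The main problems are that neither can be sliced directly, and an iterator (unlike a view) can be traversed only once. Converting the result to a list solves almost all of these problems.
It is worth taking the time to understand what iterators are, why they cannot be sliced, concatenated, multiplied or iterated twice the way strings can, and how to deal with that.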
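The sketch below demonstrates these pitfalls and the usual workarounds: converting to a list, and using itertools.islice to take the first few items of an iterator without building the whole list:

```python
from itertools import islice

squares = map(lambda v: v * v, range(5))  # an iterator: values are produced lazily
first = list(squares)
second = list(squares)
print(first, second)  # [0, 1, 4, 9, 16] []  (the first pass exhausted the iterator)

d = {'a': 1, 'b': 2, 'c': 3}
keys = d.keys()  # a view object: can be iterated many times, but not indexed
try:
    keys[0]
except TypeError as err:
    print(err)  # 'dict_keys' object is not subscriptable
print(list(keys)[:2])  # ['a', 'b']  (dicts keep insertion order since Python 3.7)

# islice takes the first items of any iterator lazily
print(list(islice(map(str, range(10)), 3)))  # ['0', '1', '2']
```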
