Best questions about python at stackoverflow in 2016

clay小哈_

1. A Pythonic way to avoid "if x: return x" statements

Question:
I have a method that calls 4 other methods in sequence to check for specific conditions, and returns immediately (not checking the following ones) whenever one returns something Truthy.

def check_all_conditions():
    x = check_size()
    if x:
        return x
    x = check_color()
    if x:
        return x
    x = check_tone()
    if x:
        return x
    x = check_flavor()
    if x:
        return x
    return None

This seems like a lot of baggage code. Instead of each 2-line if statement, I’d rather do something like:

x and return x

But that is invalid Python. Am I missing a simple, elegant solution here? Incidentally, in this situation, those four check methods may be expensive, so I do not want to call them multiple times.
Answer:

Use a loop:

conditions = (check_size, check_color, check_tone, check_flavor)
for condition in conditions:
    result = condition()
    if result:
        return result

Alternatively, use a generator expression:

conditions = (check_size, check_color, check_tone, check_flavor)
checks = (condition() for condition in conditions)
return next((check for check in checks if check), None)

Chain the calls with or, which returns the first truthy value, or None if there is none:

def check_all_conditions():
    return check_size() or check_color() or check_tone() or check_flavor() or None
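The three variants above can be compared side by side in a small sketch. The check_* functions here are hypothetical stand-ins (the question does not show their bodies); each one records its call so that the short-circuit behavior is visible.

```python
calls = []  # records which checks actually ran


def make_check(name, result):
    """Build a hypothetical check that logs its call and returns `result`."""
    def check():
        calls.append(name)
        return result
    return check


check_size = make_check("size", None)
check_color = make_check("color", "bad color")
check_tone = make_check("tone", "bad tone")
check_flavor = make_check("flavor", None)

CONDITIONS = (check_size, check_color, check_tone, check_flavor)


def check_with_loop():
    for condition in CONDITIONS:
        result = condition()
        if result:
            return result
    return None


def check_with_next():
    # The inner generator is lazy, so checks after the first truthy
    # result are never called.
    return next((r for r in (c() for c in CONDITIONS) if r), None)


def check_with_or():
    return check_size() or check_color() or check_tone() or check_flavor() or None


for variant in (check_with_loop, check_with_next, check_with_or):
    calls.clear()
    print(variant.__name__, "->", variant(), "| called:", calls)
```

In each variant only the first two checks run, which matters when the checks are expensive.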

**2. How can I make sense of the "else" clause of Python loops?**

Question:
Many Python programmers are probably unaware that the syntax of while loops and for loops includes an optional else: clause:

for val in iterable:
    do_something(val)
else:
    clean_up()

The body of the else clause is a good place for certain kinds of clean-up actions, and is executed on normal termination of the loop: I.e., exiting the loop with return or break skips the else clause; exiting after a continue executes it. I know this only because I just looked it up (yet again), because I can never remember when the else clause is executed.

Always? On “failure” of the loop, as the name suggests? On regular termination? Even if the loop is exited with return? I can never be entirely sure without looking it up.

I blame my persisting uncertainty on the choice of keyword: I find else incredibly unmnemonic for this semantics. My question is not “why is this keyword used for this purpose” (which I would probably vote to close, though only after reading the answers and comments), but how can I think about the else keyword so that its semantics make sense, and I can therefore remember it?

I’m sure there was a fair amount of discussion about this, and I can imagine that the choice was made for consistency with the try statement’s else: clause (which I also have to look up), and with the goal of not adding to the list of Python’s reserved words. Perhaps the reasons for choosing else will clarify its function and make it more memorable, but I’m after connecting name to function, not after historical explanation per se.

The answers to this question, which my question was briefly closed as a duplicate of, contain a lot of interesting back story. My question has a different focus (how to connect the specific semantics of else with the keyword choice), but I feel there should be a link to this question somewhere.
Answer:
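An if statement runs its else clause if its condition evaluates to false. In the same way, a while loop runs its else clause if its condition evaluates to false.
This rule matches the behavior you described:

In normal execution, the while loop runs repeatedly until the condition evaluates to false, so exiting the loop naturally runs the else clause.
When a break statement executes, the loop exits without evaluating the condition, so the condition cannot evaluate to false and the else clause never runs.
When a continue statement executes, the condition is evaluated again, exactly as at the start of a normal loop iteration. If the condition is true, the loop continues; if it is false, the else clause runs.
Other methods of exiting the loop, such as return, do not evaluate the condition and therefore do not run the else clause.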

for loops behave the same way. Just consider the condition as true if the iterator has more elements, or false otherwise.
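The rule above can be checked with a short sketch. The function names are made up for illustration; the first function shows for/else with break, the second shows while/else with continue.

```python
def find_first_negative(numbers):
    """Describe the first negative number in `numbers`, if there is one."""
    for n in numbers:
        if n < 0:
            found = n
            break          # leaves the loop without testing the condition
    else:
        # Runs only when the iterator was exhausted, i.e. the loop
        # "condition" (more elements available) became false.
        return "no negative number"
    return "found {}".format(found)


def count_down(n):
    """Collect the even values reached while counting down from n."""
    log = []
    while n > 0:
        n -= 1
        if n % 2:
            continue       # re-tests the condition; else can still run
        log.append(n)
    else:
        log.append("done")
    return log


print(find_first_negative([3, 1, 2]))   # no negative number
print(find_first_negative([3, -1, 2]))  # found -1
print(count_down(5))                    # [4, 2, 0, 'done']
```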

**3. How do I avoid the "self.x = x; self.y = y; self.z = z" pattern in __init__?**

def __init__(self, x, y, z):
    self.x = x
    self.y = y
    self.z = z

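Is there a good way to avoid this parameter-initialization pattern? Should I inherit from namedtuple?
Answer:
First of all, the explicit pattern above is not wrong.
Here are some ways to avoid it:

If you only have keyword arguments, you can simply use setattr: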

class A:
    def __init__(self, **kwargs):
        for key in kwargs:
            setattr(self, key, kwargs[key])

a = A(l=1, n=2, m=0)
a.l  # returns 1
a.n  # returns 2
a.m  # returns 0

If you have both positional and keyword arguments, you can use a decorator:

import decorator  # third-party package (pip install decorator)
import inspect
import sys


@decorator.decorator
def simple_init(func, self, *args, **kws):
    """
    @simple_init
    def __init__(self,a,b,...,z):
        dosomething()

    behaves like

    def __init__(self,a,b,...,z):
        self.a = a
        self.b = b
        ...
        self.z = z
        dosomething()
    """

    #init_argumentnames_without_self = ['a','b',...,'z']
    if sys.version_info.major == 2:
        init_argumentnames_without_self = inspect.getargspec(func).args[1:]
    else:
        init_argumentnames_without_self = tuple(inspect.signature(func).parameters.keys())[1:]

    positional_values = args
    keyword_values_in_correct_order = tuple(kws[key] for key in init_argumentnames_without_self if key in kws)
    attribute_values = positional_values + keyword_values_in_correct_order

    for attribute_name,attribute_value in zip(init_argumentnames_without_self,attribute_values):
        setattr(self,attribute_name,attribute_value)

    # call the original __init__
    func(self, *args, **kws)


class Test():
    @simple_init
    def __init__(self,a,b,c,d=4):
        print(self.a,self.b,self.c,self.d)

# prints 1 3 2 4
t = Test(1, c=2, b=3)
# The decorator keeps the signature:
# prints ['self', 'a', 'b', 'c', 'd'] on Python 2
# and (self, a, b, c, d=4) on Python 3
if sys.version_info.major == 2:
    print(inspect.getargspec(Test.__init__).args)
else:
    print(inspect.signature(Test.__init__))
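As an alternative that is not part of the answers summarized above: on Python 3.7 and later, the standard-library dataclasses module generates this kind of __init__ automatically. A minimal sketch, in which the class Point and the attribute norm1 are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int
    z: int = 0   # defaults work like default arguments of __init__

    def __post_init__(self):
        # Called by the generated __init__ after x, y and z are assigned.
        self.norm1 = abs(self.x) + abs(self.y) + abs(self.z)


p = Point(1, y=-2)
print(p)        # Point(x=1, y=-2, z=0)
print(p.norm1)  # 3
```

Positional and keyword arguments are both accepted, and extra setup goes in __post_init__ instead of a hand-written __init__.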

4. Why does the floating-point value of 4*0.1 look nice in Python 3, but 3*0.1 doesn't?

Answer:
The simple answer is because 3*0.1 != 0.3 due to quantization (roundoff) error (whereas 4*0.1 == 0.4 because multiplying by a power of two is usually an “exact” operation).

You can use the .hex method in Python to view the internal representation of a number (basically, the exact binary floating point value, rather than the base-10 approximation). This can help to explain what’s going on under the hood.
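0.1 is stored as 0x1.999999999999a times 2^-4. The trailing "a" is the digit 10: in binary floating point, 0.1 is very slightly larger than the exact value 0.1, because the final 0x0.99 is rounded up to 0x0.a. Multiplying by 4, a power of two, only shifts the exponent (from 2^-4 to 2^-2) and leaves the number otherwise unchanged, so 4*0.1 == 0.4.
However, when you multiply by 3, the tiny difference between 0x0.99 and 0x0.a0 (0x0.07) is magnified into an error of 0x0.15, which shows up as a one-digit error in the last position. This makes 0.1*3 very slightly larger than the rounded value of 0.3.
Python 3's float repr is designed to be round-trippable, that is, the value shown should be exactly convertible back into the original value. Therefore it cannot display 0.3 and 0.1*3 in exactly the same way, or the two different numbers would end up the same after round-tripping. Consequently, Python 3's repr engine chooses to display one of them with a slight apparent error.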
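The screenshot from the original post is not available here, so the following short sketch reproduces the kind of check described above using float.hex(). The table layout is just one way to print it.

```python
samples = [
    ("0.1", 0.1),
    ("0.3", 0.3),
    ("3*0.1", 3 * 0.1),
    ("0.4", 0.4),
    ("4*0.1", 4 * 0.1),
]

for label, value in samples:
    # hex() shows the exact stored bits; repr() shows the shortest
    # decimal string that round-trips to the same float.
    print("{:>6}  {:<22} {!r}".format(label, value.hex(), value))

print(3 * 0.1 == 0.3)  # False
print(4 * 0.1 == 0.4)  # True
```

The hex strings for 0.3 and 3*0.1 differ only in the last digit (3 versus 4), while 0.4 and 4*0.1 are bit-for-bit identical.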

5. Can a line of Python code know its indentation nesting level?

print(get_indentation_level())
    print(get_indentation_level())
        print(get_indentation_level())

I would like to get something like this:

1
2
3

Can the code recognize itself in this way?
All I want is the output from the more nested parts of the code to be more nested. In the same way that this makes code easier to read, it would make the output easier to read.

Of course I could implement this manually, using e.g. .format(), but what I had in mind was a custom print function which would print(i*' ' + string) where i is the indentation level. This would be a quick way to make readable output on my terminal.

Is there a better way to do this which avoids painstaking manual formatting?
Answer:
If you want indentation in terms of nesting level rather than spaces and tabs, things get tricky. For example, in the following code:

if True:
    print(
        get_nesting_level())

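the call to get_nesting_level is actually nested one level deep, even though there is no leading whitespace on the line of the get_nesting_level call. Meanwhile, in the following code: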

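print(1,
      2,
      get_nesting_level())

the call to get_nesting_level is nested zero levels deep, despite the leading whitespace on its line.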
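Because inferring the nesting level from source text is tricky, one alternative is to track the level explicitly. The sketch below is not the approach of the answer summarized above; NestedPrinter and its methods are hypothetical names, and the level changes only where the code opens a `with` block.

```python
from contextlib import contextmanager


class NestedPrinter:
    """Hypothetical helper that indents output by an explicit nesting level."""

    def __init__(self, indent="    "):
        self.indent = indent
        self.level = 0
        self.lines = []          # kept so the output can be inspected later

    @contextmanager
    def nested(self):
        self.level += 1
        try:
            yield self
        finally:
            self.level -= 1      # restored even if the body raises

    def print(self, text):
        line = self.indent * self.level + str(text)
        self.lines.append(line)
        print(line)


out = NestedPrinter()
out.print("level 0")
with out.nested():
    out.print("level 1")
    with out.nested():
        out.print("level 2")
out.print("back to level 0")
```

The trade-off is that the output nesting follows the `with` blocks rather than every if or for statement, so the levels have to be marked by hand.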

6. Why are Python's arrays slow?

I expected array.array to be faster than lists, since arrays appear to be unboxed.

However, I get the following result:

In [1]: import array

In [2]: L = list(range(100000000))

In [3]: A = array.array('l', range(100000000))

In [4]: %timeit sum(L)
1 loop, best of 3: 667 ms per loop

In [5]: %timeit sum(A)
1 loop, best of 3: 1.41 s per loop

In [6]: %timeit sum(L)
1 loop, best of 3: 627 ms per loop

In [7]: %timeit sum(A)
1 loop, best of 3: 1.39 s per loop

What could be the cause of such a difference?
Answer:
The storage is “unboxed”, but every time you access an element Python has to “box” it (embed it in a regular Python object) in order to do anything with it. For example, your sum(A) iterates over the array, and boxes each integer, one at a time, in a regular Python int object. That costs time. In your sum(L), all the boxing was done at the time the list was created.

So, in the end, an array is generally slower, but requires substantially less memory.

Here’s the relevant code from a recent version of Python 3, but the same basic ideas apply to all CPython implementations since Python was first released.
Here’s the code to access a list item:

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    /* error checking omitted */
    return ((PyListObject *)op) -> ob_item[i];
}

There’s very little to it: somelist[i] just returns the i’th object in the list (and all Python objects in CPython are pointers to a struct whose initial segment conforms to the layout of a struct PyObject).

And here’s the getitem implementation for an array with type code l:

static PyObject *
l_getitem(arrayobject *ap, Py_ssize_t i)
{
    return PyLong_FromLong(((long *)ap->ob_item)[i]);
}

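The raw memory is treated as a vector of platform-native C long integers; the i'th C long is read out; and then PyLong_FromLong() is called to wrap ("box") the native C long in a Python long object (which, in Python 3, where the Python 2 distinction between int and long no longer exists, is actually shown as type int).

This boxing has to allocate new memory for a Python int object and copy the bits of the native C long into it. In the context of the original example, this object's lifetime is very brief (just long enough for sum() to add its contents to a running total), and then more time is needed to deallocate the new int object.

This is where the speed difference comes from in the CPython implementation.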
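The boxing described above can be observed from Python itself. The sketch below uses a much smaller N than the question so that it runs quickly; the identity check relies on CPython implementation details, and the timings depend on the interpreter version and machine, so treat the output as illustrative only.

```python
import array
import timeit

N = 100000  # far smaller than the 10**8 in the question, to keep the run short
L = list(range(N))
A = array.array('l', range(N))

# Same contents, same sum; only the storage differs.
assert sum(L) == sum(A)

# A list hands back the object it already stores ...
list_same_object = L[-1] is L[-1]
# ... while an array builds a fresh int on each access. In CPython this is
# typically False for values outside the small-int cache.
array_same_object = A[-1] is A[-1]
print("list returns the same object: ", list_same_object)
print("array returns the same object:", array_same_object)

# Rough timings; the ratio reported in the question may not reproduce.
print("sum(L):", timeit.timeit(lambda: sum(L), number=20))
print("sum(A):", timeit.timeit(lambda: sum(A), number=20))
```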

7. Is multiplying by 2 faster than bit-shifting?

I was looking at the source of sorted_containers and was surprised to see this line:

self._load, self._twice, self._half = load, load * 2, load >> 1

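Here load is an integer. Why use a bit shift in one place and multiplication in another? It seems reasonable that bit shifting may be faster than integer division by 2, but then why not replace the multiplication with a shift as well? I benchmarked the following cases:

1. (multiply, divide)
2. (shift, shift)
3. (multiply, shift)
4. (shift, divide)

and found that case 3 is consistently faster than the other alternatives: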

import pandas as pd
import timeit
import random

x = random.randint(10**3, 10**6)

def test_naive():
    a, b, c = x, 2 * x, x // 2

def test_shift():
    a, b, c = x, x << 1, x >> 1

def test_mixed():
    a, b, c = x, x * 2, x >> 1

def test_mixed_swaped():
    a, b, c = x, x << 1, x // 2

def observe(k):
    print(k)
    return {
        'naive': timeit.timeit(test_naive),
        'shift': timeit.timeit(test_shift),
        'mixed': timeit.timeit(test_mixed),
        'mixed_swapped': timeit.timeit(test_mixed_swaped),
    }

def get_observation():
    return pd.DataFrame([observe(k) for k in range(100)])

if __name__ == '__main__':
    get_observation()

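Question:
Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)? I am running Python 3.5 on Ubuntu 14.04.
The above is the original statement of the question. (For completeness, the original question also included an illustration for larger x, where the multiplication optimization does not apply.) Dan Getz provides an excellent explanation in his answer.
Answer:
This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result as part of the calculation, while for multiplications of the sort used in the test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.

Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per digit. So in the general case, an operation on integers is not a single operation but has to handle multiple "digits". In pyport.h, this bit limit is defined as 30 bits on 64-bit platforms and 15 bits otherwise. (I will use 30 from here on to keep the explanation simple. But note that if you are using a Python compiled for 32-bit, the result of your benchmark will depend on whether x is less than 32,768.) The beginning of the integer multiplication implementation is as follows: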

static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    CHECK_BINOP(a, b);

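    /* fast path for single-digit multiplication */
    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
        return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
        /* if we don't have long long then we're almost certainly
           using 15-bit digits, so v will fit in a long.  In the
           unlikely event that we're using 30-bit digits on a platform
           without long long, a large v will just cause us to fall
           through to the general multiplication code below. */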
        if (v >= LONG_MIN && v <= LONG_MAX)
            return PyLong_FromLong((long)v);
#endif
    }

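So when multiplying two integers that each fit in a single 30-bit digit, the CPython interpreter performs a direct multiplication instead of treating the integers as arrays. (MEDIUM_VALUE() called on a positive integer object simply gets its first 30-bit digit.) If the result fits in a single 30-bit digit, PyLong_FromLongLong() notices this in a relatively small number of operations and creates a single-digit integer object to store it.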
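The digit size discussed above can be inspected from Python through sys.int_info. This is a small sketch that assumes CPython; the exact byte counts printed by sys.getsizeof vary between versions, so only the difference between the two values matters.

```python
import sys

bits = sys.int_info.bits_per_digit     # 30 on typical 64-bit CPython builds
print("bits per digit: ", bits)
print("bytes per digit:", sys.int_info.sizeof_digit)

largest_one_digit = 2 ** bits - 1      # still fits in a single digit
smallest_two_digit = 2 ** bits         # needs a second digit

# The second value needs one more digit, so its object is larger.
print(sys.getsizeof(largest_one_digit), sys.getsizeof(smallest_two_digit))
```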

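In contrast, left shifts are not optimized this way, and every left shift treats the integer being shifted as an array. In particular, if you read the source code of long_lshift(), you will see that for a small but positive left shift, a 2-digit integer object is always created, if only to have its length truncated to 1 later: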

static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
    /*** ... ***/

    wordshift = shiftby / PyLong_SHIFT;   /*** zero for small w ***/
    remshift  = shiftby - wordshift * PyLong_SHIFT;   /*** w for small w ***/

    oldsize = Py_ABS(Py_SIZE(a));   /*** 1 for small v > 0 ***/
    newsize = oldsize + wordshift;
    if (remshift)
        ++newsize;   /*** here newsize becomes at least 2 for w > 0, v > 0 ***/
    z = _PyLong_New(newsize);

    /*** ... ***/
}

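Integer division
You did not ask about the worse performance of integer floor division compared to right shifts, because that fits your (and my) expectations. But dividing a small positive number by another small positive number is not as optimized as small multiplications either. Every // computes both the quotient and the remainder using the function long_divrem(). For a small divisor, this remainder is computed with a multiplication and stored in a newly allocated integer object, which in this situation is immediately discarded.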
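The benchmark from the question can be reproduced without pandas. The sketch below is a simplified variant, not the original script: it times the four statement pairs with timeit and reports the best of several repeats. The ranking reported in the question was observed on CPython 3.5 and may not reproduce on other versions or builds.

```python
import random
import timeit

x = random.randint(10**3, 10**6)   # same range as in the question

STATEMENTS = {
    "multiply, divide": "x * 2; x // 2",
    "shift, shift":     "x << 1; x >> 1",
    "multiply, shift":  "x * 2; x >> 1",
    "shift, divide":    "x << 1; x // 2",
}


def measure(number=200000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each statement pair."""
    results = {}
    for name, stmt in STATEMENTS.items():
        # x is looked up as a global at run time, so the expressions
        # cannot be folded into constants by the compiler.
        timer = timeit.Timer(stmt, globals={"x": x})
        results[name] = min(timer.repeat(repeat=repeat, number=number))
    return results


for name, seconds in sorted(measure().items(), key=lambda item: item[1]):
    print("{:<18} {:.4f} s".format(name, seconds))
```

Taking the minimum over repeats reduces noise from other processes, which matters because the differences being measured are small.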
