Python Performance 2 of n - Python Performance Tips, Part 1

Python Performance Tips, Part 1


To read the Zen of Python, type import this in your Python interpreter. A sharp reader new to Python will notice the word “interpreter” and realize that Python is another scripting language. “It must be slow!”

No question about it: a Python program does not run as fast or as efficiently as one written in a compiled language. Even Python advocates will tell you that performance is not Python’s strong point. However, YouTube has proven that Python is capable of serving 40 million videos per hour. All you have to do is write efficient code and seek an external (C/C++) implementation for speed if needed. Here are some tips to help you become a better Python developer:

  1. Go for built-in functions:
    You can write efficient code in Python, but it’s very hard to beat the built-in functions, which are written in C and are very fast. They are listed in the Python documentation.
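As a small illustration of the tip above, the sketch below adds up a list once with an explicit loop and once with the built-in sum(). The helper name total_manual and the sample data are made up for this example; no timings are claimed here.

```python
def total_manual(numbers):
    """Add numbers with an explicit Python-level loop."""
    total = 0
    for n in numbers:
        total += n
    return total


numbers = list(range(1000))      # illustrative data, not from the article
manual = total_manual(numbers)   # every iteration runs as Python bytecode
builtin = sum(numbers)           # the loop runs inside the built-in itself
largest = max(numbers)           # same idea: no hand-written loop needed
```

Both versions return the same total; the built-in simply moves the loop out of Python-level code.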
  2. Use join() to glue a large number of strings:
    You can use “+” to combine several strings. Since strings are immutable in Python, every “+” operation creates a new string and copies the old content, so the cost grows quickly with the number of pieces.
    A frequent idiom is to use Python’s array module to modify individual characters; when you are done, use the join() function to re-create your final string.
    >>> # This is good for gluing a large number of strings;
    >>> # chunks is any iterable of strings
    >>> my_string = ''.join(chunks)
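To make the difference between “+” and join() concrete, here is a minimal sketch that builds the same string both ways. The function names and the sample chunks are illustrative assumptions.

```python
def build_with_plus(chunks):
    """Concatenate with +; each step creates a brand-new string."""
    result = ""
    for chunk in chunks:
        result = result + chunk
    return result


def build_with_join(chunks):
    """Collect the pieces first and glue them in a single call."""
    return "".join(chunks)


chunks = ["chunk%d;" % i for i in range(5)]   # illustrative input
glued = build_with_join(chunks)
```

Note that join() is called on the separator (here the empty string) and takes the whole sequence of pieces as its argument.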
  3. Use Python multiple assignment to swap variables:
    This is elegant and faster in Python:
    >>> x, y = y, x
    This is slower:
    >>> temp = x
    >>> x = y
    >>> y = temp
  4. Use local variables if possible:
    Python retrieves a local variable faster than a global variable. That is, avoid the “global” keyword.
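One common way to apply this tip is to bind a frequently used global or attribute to a local name before a hot loop. The sketch below assumes two hypothetical helpers, roots_global and roots_local, that compute the same result.

```python
import math


def roots_global(values):
    """Looks up math and its sqrt attribute on every pass."""
    out = []
    for v in values:
        out.append(math.sqrt(v))
    return out


def roots_local(values):
    """Binds the lookups to local names once, before the loop."""
    sqrt = math.sqrt        # local alias for the module attribute
    out = []
    append = out.append     # local alias for the bound method
    for v in values:
        append(sqrt(v))
    return out
```

The gain is usually small per call, so this is worth doing only in loops that run very many times.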
  5. Use “in” if possible:
    To check membership in general, use the “in” keyword. It is clean and fast.
    >>> if key in sequence:
    >>>     print("found")
  6. Speed up by lazy importing:
    Move the “import” statement into a function so that the module is imported only when it is needed. In other words, if some modules are not needed right away, import them later. For example, you can speed up your program’s startup by not importing a long list of modules up front. This technique does not improve overall performance; it spreads the module loading time more evenly.
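A minimal sketch of a deferred import follows. The function name to_json is hypothetical, and json merely stands in for whatever heavier module you want to load late.

```python
import sys


def to_json(data):
    """Serialize data; the json module is loaded on the first call only."""
    import json   # deferred: nothing is imported until this line runs
    return json.dumps(data, sort_keys=True)


text = to_json({"b": 1, "a": 2})
loaded = "json" in sys.modules   # True once to_json() has been called
```

Repeated calls are cheap because Python caches imported modules in sys.modules, so the import statement inside the function becomes a quick lookup after the first call.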
  7. Use “while 1” for the infinite loop: (Edit: this may not be true in Python 3.0, where I see they generate the same bytecode)
    Sometimes you want an infinite loop in your program (for instance, a listening socket). Even though “while True” accomplishes the same thing, “while 1” is a single jump operation, because in Python 2 True is an ordinary name that has to be looked up on every pass. Apply this trick to your high-performance Python code.
    >>> while 1:
    >>>    # do stuff, faster with while 1
    >>> while True:
    >>>    # do stuff, slower with while True
  8. Use list comprehension:
    Since Python 2.0, you can use list comprehensions to replace many “for” and “while” blocks. A list comprehension is faster because the interpreter runs the loop with specialized bytecode instead of looking up and calling append() on every pass. As a bonus, a list comprehension can be more readable (functional programming), and in most cases it saves you one extra variable for counting. For example, let’s get the even numbers below 10 with one line:
    >>> # the good way to iterate a range
    >>> evens = [i for i in range(10) if i % 2 == 0]
    >>> evens
    [0, 2, 4, 6, 8]
    >>> # the following is not so Pythonic
    >>> i = 0
    >>> evens = []
    >>> while i < 10:
    >>>    if i % 2 == 0: evens.append(i)
    >>>    i += 1
    >>> evens
    [0, 2, 4, 6, 8]
  9. Use xrange() for a very long sequence:
    This could save you tons of system memory, because xrange() yields only one integer of the sequence at a time. By contrast, range() builds the entire list in memory, which is unnecessary overhead for looping. (In Python 3, range() is itself lazy and xrange() no longer exists.)
  10. Use Python generators to get values on demand:
    This could also save memory and improve performance. If you are streaming video, you can send one chunk of bytes at a time instead of the entire stream. For example,
    >>> chunk = ( 1000 * i for i in xrange(1000))
    >>> chunk
    <generator object <genexpr> at 0x7f65d90dcaa0>
    >>> chunk.next()
    0
    >>> chunk.next()
    1000
    >>> chunk.next()
    2000
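The streaming idea can also be written as a generator function. The sketch below is an assumption-laden illustration: read_in_chunks and the sample byte string are made up, and the code is written to run on Python 3.

```python
def read_in_chunks(data, chunk_size):
    """Yield successive slices of data, one chunk per request."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


stream = read_in_chunks(b"abcdefghij", 4)
first = next(stream)   # only the first chunk has been produced so far
rest = list(stream)    # drain whatever is left
```

Nothing is sliced until the consumer asks for the next chunk, which is what keeps memory use flat for long streams.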
  11. Learn the itertools module:
    The module is very efficient for iteration and combination. Let’s generate all permutations of the list [1, 2, 3] in three lines of Python code:
    >>> import itertools
    >>> perms = itertools.permutations([1,2,3])
    >>> list(perms)
    [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
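Two more itertools functions that come up often are sketched below: islice() for taking a bounded slice of an endless iterator, and combinations() for unordered pairs. The variable names are illustrative.

```python
import itertools

# count(1) never ends, so the squares are produced lazily ...
squares = (n * n for n in itertools.count(1))
# ... and islice() takes just the first five of them.
first_five = list(itertools.islice(squares, 5))

# All 2-element combinations of a short sequence.
pairs = list(itertools.combinations("abc", 2))
```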
  12. Learn the bisect module for keeping a list in sorted order:
    It is a free binary search implementation and a fast insertion tool for a sorted sequence. That is, you can use:
    >>> import bisect
    >>> bisect.insort(sorted_list, element)
    You’ve inserted an element into your list, and you don’t have to call sort() again to keep the container sorted, which can be very expensive on a long sequence.
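A short worked example, with made-up scores, shows both halves of the module: insort() for insertion and bisect_left() for binary search.

```python
import bisect

scores = [10, 20, 30, 40]        # must already be sorted
bisect.insort(scores, 25)        # inserted at the position that keeps order

# Binary search: index of the first element that is >= 30.
position = bisect.bisect_left(scores, 30)
```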
  13. Understand that a Python list is actually an array:
    A list in Python is not implemented as the usual singly linked list that people talk about in computer science. A list in Python is an array. That is, you can retrieve an element by index in constant time, O(1), without searching from the beginning of the list. What’s the implication of this? A Python developer should think for a moment before using insert() on a list object. For example:
    >>> list.insert(0, element)
    That is not efficient when inserting an element at the front, because every subsequent element in the list has to be shifted. You can, however, append an element to the end of the list efficiently using list.append(). Pick deque, however, if you want fast insertion or removal at both ends. It is fast because deque in Python is implemented as a doubly linked list. Say no more. :)
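For reference, a minimal sketch of collections.deque used at both ends; the values are illustrative.

```python
from collections import deque

queue = deque([2, 3, 4])
queue.appendleft(1)       # cheap insertion at the front
queue.append(5)           # cheap insertion at the back
first = queue.popleft()   # removes and returns the front element
last = queue.pop()        # removes and returns the back element
```

The trade-off is that indexing into the middle of a deque is slower than indexing into a list, so it suits queue-like access patterns best.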
  14. Use dict and set to test membership:
    Python is very fast at checking whether an element exists in a dictionary or in a set, because dict and set are implemented using hash tables. The lookup can be as fast as O(1), whereas a list has to be scanned element by element. Therefore, if you need to check membership very often, use a dict or set as your container.
    >>> mylist = ['a', 'b', 'c']  # Slower, check membership with list:
    >>> 'c' in mylist
    True
    >>> myset = set(['a', 'b', 'c'])  # Faster, check membership with set:
    >>> 'c' in myset
    True
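When the data arrives as a list but will be probed many times, a common pattern is to build a set once and test against it. The helper name common_items and the sample values are assumptions for illustration.

```python
def common_items(haystack, needles):
    """Return the needles that occur in haystack, in needle order."""
    lookup = set(haystack)   # one pass to build the hash table
    return [n for n in needles if n in lookup]


found = common_items(["a", "b", "c"], ["c", "x", "a"])
```

Building the set costs one pass over the data, so the conversion pays off when there are many lookups rather than one or two.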
  15. Use sort() with Schwartzian Transform:
    The native list.sort() function is extraordinarily fast. Python will sort the list in its natural order for you. Sometimes you need to sort things in a different order; for example, you may want to sort IP addresses based on your server location. Python supports custom comparison, so you can pass a comparison function to list.sort(), but that is much slower than a plain list.sort() because it introduces function call overhead on every comparison. If speed is a concern, you can apply the Guttman-Rosler Transform, which is based on the Schwartzian Transform. While it’s interesting to read the actual algorithm, the quick summary is that you transform the list so that its natural order is the order you want, call the built-in list.sort() (which is faster), and then transform the result back, instead of calling list.sort() with a comparison function (which is slower).
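The decorate, sort, undecorate idea can be sketched as follows. This is only an illustration: the ip_key helper, the sample addresses, and the choice to order them numerically (rather than by server location) are all assumptions. The last line shows the key= argument of sorted(), which applies the same idea for you.

```python
def ip_key(address):
    """Turn '10.0.0.2' into (10, 0, 0, 2) so it compares numerically."""
    return tuple(int(part) for part in address.split("."))


addresses = ["10.0.0.12", "10.0.0.2", "192.168.1.1", "9.255.0.1"]

# Decorate: pair each address with its precomputed sort key.
decorated = [(ip_key(a), a) for a in addresses]
# Sort: plain tuple comparison, no custom comparison function.
decorated.sort()
# Undecorate: throw the keys away again.
by_transform = [a for _, a in decorated]

# Equivalent result using the key= argument.
by_key = sorted(addresses, key=ip_key)
```

In both versions the key is computed once per element instead of once per comparison, which is where the saving comes from.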
  16. Cache results with a Python decorator:
    The symbol “@” is Python decorator syntax. Use it not only for tracing, locking or logging. You can decorate a Python function so that it remembers the results needed later. This technique is called memoization. Here is an example:
    >>> from functools import wraps
    >>> def memo(f):
    >>>    cache = { }
    >>>    @wraps(f)
    >>>    def wrap(*arg):
    >>>        # key the cache on the actual argument tuple
    >>>        if arg not in cache: cache[arg] = f(*arg)
    >>>        return cache[arg]
    >>>    return wrap
    And we can use this decorator on a Fibonacci function:
    >>> @memo
    >>> def fib(i):
    >>>    if i < 2: return 1
    >>>    return fib(i-1) + fib(i-2)
    The key idea here is simple: enhance (decorate) your function to remember each Fibonacci term you’ve calculated; if a term is already in the cache, there is no need to calculate it again.
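For comparison, Python 3.2 and later ship a ready-made memoizing decorator, functools.lru_cache. The sketch below swaps it in for a hand-written memo decorator; it assumes Python 3 and keeps the article's convention that fib(0) and fib(1) are both 1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)   # maxsize=None means the cache never evicts
def fib(i):
    if i < 2:
        return 1
    return fib(i - 1) + fib(i - 2)


tenth = fib(10)
thirtieth = fib(30)
stats = fib.cache_info()   # named tuple with hits, misses, maxsize, currsize
```

Each distinct argument is computed once, so fib(30) needs 31 real calls rather than over a million.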
  17. Understand the Python GIL (global interpreter lock):
    The GIL is necessary because CPython’s memory management is not thread-safe. You can’t simply create multiple threads and hope Python will run faster on a multi-core machine, because the GIL prevents multiple native threads from executing Python bytecode at once. In other words, the GIL serializes all your threads. You can, however, speed up your program by using threads to manage several forked processes, which run independently outside your Python code.
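One way to read “threads managing separate processes” is sketched below: each thread launches a child interpreter with subprocess and waits for its output, so the CPU-bound work runs outside the parent's GIL. This is an assumption-heavy illustration rather than a recommended architecture; run_worker and the toy workload are made up, and concurrent.futures requires Python 3.2 or later.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_worker(n):
    """Run a CPU-bound job in a separate interpreter process."""
    code = "print(sum(i * i for i in range(%d)))" % n
    output = subprocess.check_output([sys.executable, "-c", code])
    return int(output.decode().strip())


# Each thread mostly waits on its child process, so the parent's GIL
# is not the bottleneck; the children run in parallel on other cores.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_worker, [10, 100, 1000]))
```

Starting a process has a noticeable fixed cost, so this only pays off when each job does substantial work.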
  18. Treat Python source code as your documentation:
    Python has modules implemented in C for speed. When performance is critical and the official documentation is not enough, feel free to explore the source code yourself. You can find out the underlying data structures and algorithms. The Python repository is a wonderful place to stick around: http://svn.python.org/view/python/trunk/Modules

Conclusion:

There is no substitute for brains. It is the developer’s responsibility to peek under the hood so as not to quickly throw together a bad design. The Python tips in this article can help you get good performance. If speed is still not good enough, Python will need extra help: profiling and running external code. We will cover both in part 2 of this article.

To be continued…


From
http://blog.monitis.com/index.php/2012/02/13/python-performance-tips-part-1/
http://blog.monitis.com/index.php/2012/03/21/python-performance-tips-part-2/

Chinese version
http://www.oschina.net/question/1579_45822

