Python performance optimization

Performance optimization – making a program run faster – is closely related to refactoring. Refactoring makes existing source code more readable and less complex on the inside without changing its behavior on the outside. Refactoring is a costly operation. Typically, an 80/20 rule applies: about 20% of the effort is needed to implement the bulk of the program, and another 80% of the effort (and budget) is needed to optimize, refactor, extend and/or document it.

Do not worry about performance during development. You first need the program to produce correct results (with correct input) before you know what is making it slow. 

Not all parts of the source code will be optimizable. In Python, many operations such as dictionary lookups and regular expressions are as efficient as they can be. Some operations in your code simply need to happen to make the program work. In this case, micro-optimizations (e.g., making a function 1% faster) are a waste of time. 


Profiling

Typically, the slowest 20% of the program may benefit from optimization. To find the slow parts you need to profile the source code. The profile() function below uses Python's cProfile module to return a string of performance statistics. To use it, you can simply copy it into a test script:

def profile(function, *args, **kwargs):
    """ Returns performance statistics (as a string) for the given function.
    """
    def _run():
        function(*args, **kwargs)
    import cProfile as profile
    import pstats
    import os
    import sys; sys.modules['__main__'].__profile_run__ = _run
    id = function.__name__ + '()'
    profile.run('__profile_run__()', id)
    p = pstats.Stats(id)
    p.stream = open(id, 'w')
    p.sort_stats('time').print_stats(20)
    p.stream.close()
    s = open(id).read()
    os.remove(id)
    return s

The following example profiles Pattern's parse() command. The parameters of parse() are instead passed to profile(). Make sure that the program is busy for a few seconds to get a reliable profile. You could execute it several times in a loop or pass lots of data to it, for example.

>>> from pattern.en import parse
>>> txt = open('paper.txt').read()
>>> print profile(parse, txt, lemmata=True)

369318 function calls (369315 primitive calls) in 0.685 CPU seconds

   ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
       67    0.164    0.002    0.170    0.003  /pattern/en/brill.py:104(apply)
        1    0.098    0.098    0.265    0.265  /pattern/en/brill.py:171(load)
    93694    0.088    0.000    0.149    0.000  /pattern/en/brill.py:175(<genexpr>)
   102762    0.081    0.000    0.081    0.000  {method 'split' of 'str' objects}
...

Assessment of the output:

  • brill.py has a heavy apply() that needs closer inspection.
  • brill.py has a heavy load(); it probably loads a lexicon file. Not much we can do about that.
  • There appears to be a generator expression (<genexpr>) in brill.py that is called about 100,000 times.
  • The split() method of Python's str object is called often, but we probably can't optimize that.

Once we have isolated the slow parts, we can try to make them faster. To do this, we time them one by one. Below is an example setup that executes the apply() in brill.py multiple times and outputs the elapsed time. We can tinker with the source code and verify if the elapsed time increases or decreases.

>>> from time import time
>>> from pattern.en.parser.brill import Lexicon
>>> lexicon = Lexicon()
>>> t = time()
>>> for i in range(1000):
>>>     lexicon.apply(...)
>>> print time() - t
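
The same setup works for any slow part you have isolated. As a minimal sketch, the pattern can be wrapped in a small helper – the timed() function and its n parameter below are illustrative, not part of Pattern:

def timed(function, *args, **kwargs):
    """ Returns the elapsed time (in seconds) for calling the given function n times.
        Illustrative helper; n is taken from the keyword arguments (default 1000).
    """
    from time import time
    n = kwargs.pop('n', 1000)
    t = time()
    for i in range(n):
        function(*args, **kwargs)
    return time() - t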

 


Dictionaries

Membership testing is faster in a dict than in a list. Python dictionaries use hash tables, which means that a lookup operation (e.g., if x in y) is O(1). A lookup operation in a list means that the entire list needs to be iterated, resulting in O(n) for a list of length n.

Assume we have a list of stop words we want to filter from a given text:

>>> stopwords = [
>>>     'a', 'an', 'and', 'are', 'as', 'at', 'be', 'by',
>>>     'for', 'from', 'has', 'he', 'in', 'is', 'it', 'its',
>>>     'of', 'on', 'that', 'the', 'to', 'was', 'were', 'will', 'with'
>>> ]
>>>
>>> for i in range(1000000): # Do it many times to test performance.
>>>     filtered = []
>>>     for w in ['Mr.', 'Hat', 'is', 'feeding', 'the', 'black', 'cat', '.']:
>>>         if w not in stopwords:
>>>             filtered.append(w)

The algorithm takes 11.6 seconds to run. Adding more stop words makes it even slower. However, the list is easily converted to a dictionary. With a dict, lookup performance is constant, regardless of dictionary size:

>>> stopwords = dict.fromkeys(stopwords, True)

The dict.fromkeys() method takes a list of keys + a default value for all keys, and returns a new dictionary. Using this approach the algorithm takes 4.7 seconds to run, a x2.5 speedup.
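
To verify the speedup on your own machine, here is a minimal timing sketch along the lines of the earlier time() setup (it assumes stopwords still refers to the original list of stop words defined above):

>>> from time import time
>>> words = ['Mr.', 'Hat', 'is', 'feeding', 'the', 'black', 'cat', '.']
>>> for container in (stopwords, dict.fromkeys(stopwords, True)): # list first, then dict
>>>     t = time()
>>>     for i in range(1000000):
>>>         filtered = []
>>>         for w in words:
>>>             if w not in container:
>>>                 filtered.append(w)
>>>     print time() - t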

 


Dictionaries + caching

Here is a comparison of four implementations of the cosine distance algorithm, which measures similarity between vectors of features (words) with feature weights (word relevance). The first implementation represents vectors as ordered, equal-length lists of feature weights. The second represents vectors as sparse dictionaries of feature → weight items, where features with weight = 0 are omitted. The third subclasses a dictionary and uses caching for l2(). The fourth adds a distance cache.

 The first implementation takes 1.5 seconds to run (and more for longer vectors):

def distance(v1, v2):
    return sum(w1 * w2 for w1, w2 in zip(v1, v2)) / (l2(v1) * l2(v2) or 1)
def l2(v):
    return sum(w ** 2 for w in v) ** 0.5
>>> for i in range(100000):
>>>     d = distance([2, 1, 1, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 3, 1])

To make it faster we can leave out the zeros (which means fewer iterations), using dict.get() with a default value for missing features. The second implementation then takes 1.1 seconds, a x1.4 speedup:

def distance(v1, v2):
    return sum(v1[f] * v2.get(f, 0) for f in v1) / (l2(v1) * l2(v2) or 1)
def l2(v):
    return sum(w ** 2 for w in v.values()) ** 0.5
>>> for i in range(100000):
>>>     d = distance({0: 2, 1: 1, 2: 1}, {0: 1, 8: 3, 9: 1})

To make it faster we can cache the costly L2-norm math, since it always yields the same result for a given vector. The third implementation then takes 0.6 seconds, a x2.5 speedup:

class Vector(dict):
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.l2 = sum(w ** 2 for w in self.values()) ** 0.5
def distance(v1, v2):
    return sum(v1[f] * v2.get(f, 0) for f in v1) / (v1.l2 * v2.l2 or 1)
>>> for i in range(100000):
>>>     d = distance(Vector({0: 2, 1: 1, 2: 1}), Vector({0: 1, 8: 3, 9: 1}))

Finally, we cache the distances. Python's id() function returns a unique id for each object. When we calculate the distance between v1 and v2, we can store the result in a global CACHE dictionary, under CACHE[(id(v1), id(v2))] – since keys must be hashable we can't use (v1, v2) directly.

Next time, we can check if the result is already in the cache – a dictionary lookup is faster than the math.

CACHE = {}
def distance(v1, v2):
    k = id(v1), id(v2)
    if k not in CACHE:
        d = sum(v1[f] * v2.get(f, 0) for f in v1) / (v1.l2 * v2.l2 or 1)
        CACHE[k] = d
        CACHE[id(v2), id(v1)] = d # symmetric
    return CACHE[k]

This ensures that for n vectors the distance math is computed at most n x n times (once per ordered pair); any repeated comparison is reduced to a dictionary lookup.
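
Below is a short usage sketch with the Vector class and the cached distance() defined above; the first call does the math, the second call for the same pair (in either order) is a cache hit:

>>> v1 = Vector({0: 2, 1: 1, 2: 1})
>>> v2 = Vector({0: 1, 8: 3, 9: 1})
>>> d = distance(v1, v2) # computed, stored under (id(v1), id(v2)) and (id(v2), id(v1))
>>> d = distance(v2, v1) # retrieved from CACHE, no math

Note that the cache is keyed on object id's, so it only helps as long as the same Vector objects are reused, and it keeps growing as new pairs are compared.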

 


Sets

Set operations (union, intersection, difference) are faster than iterating over lists:

Syntax                    Operation      Description
set(list1) | set(list2)   union          New set with values from both list1 and list2.
set(list1) & set(list2)   intersection   New set with values common to list1 and list2.
set(list1) - set(list2)   difference     New set with values in list1 but not in list2.

You can use them to merge two lists, or to make a list of unique values for example. 

>>> print list(set([1, 2, 2, 3]))
>>> print list(set([1, 2]) | set([2, 3]))
>>> print list(set([1, 2]) & set([2, 3]))

[1, 2, 3]
[1, 2, 3]
[2]
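
The difference operator also gives a compact way to strip the stop words from the Dictionaries section. This is only a sketch: unlike the filtering loop shown earlier, it discards duplicates and word order:

>>> words = ['Mr.', 'Hat', 'is', 'feeding', 'the', 'black', 'cat', '.']
>>> print list(set(words) - set(stopwords))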

 


Inner for-loops

If your code has nested for-loops, all optimizations inside the inner loop count. Consider the following:

>>> v1 = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] * 10
>>> v2 = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] * 10
>>>
>>> for n in range(1000):
>>>     for i in range(len(v1)):
>>>         for j in range(len(v2)):
>>>             x = v1[i] + v2[j]

The algorithm takes 4.0 seconds to run. It is a hypothetical example, but the point is this: we can make it faster by moving v1[i] outside of the inner loop:

>>> for n in range(1000):
>>>     for i in range(len(v1)):
>>>         v1i = v1[i]
>>>         for j in range(len(v2)):
>>>             x = v1i + v2[j]

Now it takes 3.2 seconds to run. In the first example, v1[i] is evaluated in the innermost loop, i.e., 1,000 x 110 x 110 = 12,100,000 times. In this example, we look up number i in v1 once before iterating over v2, so v1[i] is evaluated only 1,000 x 110 = 110,000 times, making the algorithm x1.3 faster. Move everything you can outside of the inner loop.
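
If the indices themselves are not needed, a further variation of the same idea is to iterate over the values directly, so that no indexing is left in the inner loop at all (a sketch with the same hypothetical v1 and v2):

>>> for n in range(1000):
>>>     for v1i in v1:
>>>         for v2j in v2:
>>>             x = v1i + v2j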

 


Lazy if-evaluation

As in most programming languages, Python's and and or operators are evaluated lazily (short-circuited). This means that in if x and y, condition y will not be tested if x is already False. We can exploit this by checking a fast condition first before checking a slow condition.

For example, say we are looking for abbreviations in text:

>>> abbreviations = ['cf.', 'e.g.', 'ex.', 'etc.', 'fig.', 'i.e.', 'Mr.', 'vs.']
>>>
>>> for n in range(1000000):
>>>     for w in ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.'):
>>>         if w in abbreviations:
>>>             pass # Process abbreviation here.

The algorithm takes 4.3 seconds to run. However, since most of the words are not abbreviations we can optimize it by first checking if a word ends with a period, which is faster than iterating the list of known abbreviations:

>>> for n in range(1000000):
>>>     for w in ('Mr.', 'Hat', 'is', 'chasing', 'the', 'black', 'cat', '.'):
>>>         if w[-1] == '.' and w in abbreviations:
>>>             pass

Now it takes 3.1 seconds to run, a x1.4 speedup.

 


String methods & regular expressions

Regular expressions in Python are fast because the matching is done in C code. However, in many situations simple string methods are even faster. Below is a list of useful string methods. If you do use regular expressions, remember to add source code comments explaining what they do.

Method                        Description
str[-1] == 'x'                True if the last character is "x" (raises an IndexError if len(str) == 0).
str.isalpha()                 True if the string only contains a-z | A-Z characters.
str.isdigit()                 True if the string only contains 0-9 characters.
str.startswith(('x', 'yz'))   True if the first characters in the string are "x" or "yz".
str.endswith(('x', 'yz'))     True if the last characters in the string are "x" or "yz".
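
For example, both checks below test whether a word ends in "-ing"; the str.endswith() version is usually the faster and more readable one (the regular expression is just an illustrative equivalent):

>>> import re
>>> w = 'feeding'
>>> print w.endswith('ing')
>>> print re.search(r'ing$', w) is not None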

 


String concatenation

Format strings are often faster than concatenating values to strings:

>>> s = '<feature term="' + 'cat' + '" weight="' + str(1.0) + '" />'
>>> s = '<feature term="%s" weight="%s" />' % ('cat', 1.0) # Faster format string.

If you are constructing a large string (for example, XML output), it is faster to append the different parts to a list and collapse it at the end:

>>> features = [('cat', 1.0), ('dog', 0.0)]
>>>
>>> xml = []
>>> for term, weight in features:
>>>     xml.append('<feature term="%s" weight="%s" />' % (term, weight))
>>> xml = '\n'.join(xml)

 


List comprehension

List comprehensions are faster than building a new list in a for-loop. The first example below uses a loop and takes 6.6 seconds. The second example uses list comprehension and takes 5.3 seconds, a x1.2 speedup.

>>> words = ['Mr.', 'Hat', 'is', 'feeding', 'the', 'black', 'cat', '.']
>>>
>>> for n in range(1000000):
>>>     a = []
>>>     for w in words:
>>>         if w != '.':
>>>             a.append(w.lower())
>>> for n in range(1000000):
>>>     a = [w.lower() for w in words if w != '.']

 


If + None

if done is not None is faster than if done != None, which in turn is faster than if not done.
It's nitpicking but it matters inside inner loops. 
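
You can check this on your own machine with the standard timeit module; the exact numbers vary with the Python version and the type of object being tested:

>>> from timeit import timeit
>>> print timeit('done is not None', setup='done = 1')
>>> print timeit('done != None', setup='done = 1')
>>> print timeit('not done', setup='done = 1')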



