Effective Python学习笔记 - ch02 Using Comprehensions and Generators

8. Use list comprehensions instead of map and filter

List comprehensions are clearer and easier to understand than map and filter.

For example:

a = [1,2,3,4,5,6,7,8,9,10]

#using list comprehensions -- easy to understand
squares = [x**2 for x in a]
print(squares)
#Output: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

#using map function -- not easy to understand
squares = map(lambda x:x**2,a)
print(list(squares))
#Output: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]


#using list comprehensions -- easy to understand
squares = [x**2 for x in a if x%2 == 0]
print(squares)
#Output: [4, 16, 36, 64, 100]

#using map and filter -- not easy to understand
squares = map(lambda x:x**2, filter(lambda x:x%2==0,a))
print(list(squares))
#Output: [4, 16, 36, 64, 100]


#switch the key/value for a given dict
name_dict = {"Tom":1,"Peter":2,"George":3}
rank_dict = {rank : name for name,rank in name_dict.items()}
print(rank_dict)
#Output: {1: 'Tom', 2: 'Peter', 3: 'George'}

Additions: 

map(function,iterable): applies the function to every item of the iterable (in Python 3 it returns an iterator, not a list).

filter(function,iterable): keeps the elements for which the function returns true (also an iterator in Python 3).

reduce(function,iterable): applies a rolling computation to sequential pairs of values in the iterable (in Python 3 it lives in functools).
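
A minimal sketch of all three (my own example, not from the book): in Python 3, map and filter return iterators, so we wrap them in list() to see the results, and reduce must be imported from functools.

from functools import reduce  #reduce lives in functools in Python 3

nums = [1, 2, 3, 4, 5]

doubled = list(map(lambda x: x * 2, nums))        #apply a function to every item
evens = list(filter(lambda x: x % 2 == 0, nums))  #keep items where the function is true
total = reduce(lambda acc, x: acc + x, nums)      #rolling computation over pairs

print(doubled, evens, total)
#Output: [2, 4, 6, 8, 10] [2, 4] 15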

More details can be found at http://book.pythontips.com/en/latest/map_filter.html

9. Avoid more than two expressions in list comprehensions

Firstly, here are some reasonable examples of list comprehensions that are not too complicated. If a comprehension gets more complicated than these, just separate it into a plain loop or a helper function (a sketch of such a rewrite follows the examples below).

b = [x for x in a if x%2==0 if x%3 == 0]
b = [x**2 for x in a if x%2==0]
b = [x**2 for row in matrix for x in row]
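
As a sketch of "just separate it" (my own example, not from the book): a hypothetical comprehension that combines both conditions and the square is clearer as a plain loop.

a = list(range(1, 11))
b = []
for x in a:
    #same logic as [x**2 for x in a if x%2 == 0 if x%3 == 0], but easier to follow
    if x % 2 == 0 and x % 3 == 0:
        b.append(x**2)
print(b)
#Output: [36]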

Then, let's look at how list comprehensions nest.

Nested expressions in a list comprehension nest in the same order as the equivalent for loops and if statements would.

Let's see some examples. 

  • Two basic examples:
#transform a matrix to a list
matrix = [[1,2,3],[4,5,6],[7,8,9]]
flat = [x for row in matrix for x in row]
print(flat)
#Output: [1, 2, 3, 4, 5, 6, 7, 8, 9]

matrix = [[1,2,3],[4,5,6],[7,8,9]]
squared = [[x**2 for x in row] for row in matrix]
print(squared)
#Output: [[1, 4, 9], [16, 25, 36], [49, 64, 81]]
  • A more complicated example:
matrix = [
    [[1,2,3],[4,5,6]],
    [[7,8,9],[10,11,12]],
]
flat = [x for sublist1 in matrix
            for sublist2 in sublist1
                for x in sublist2]
print(flat)
#output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

flat = []
for sublist1 in matrix:
    for sublist2 in sublist1:
        flat.extend(sublist2)
print(flat)
#output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

More details about list.extend(), used in the loop version above, can be found at https://www.tutorialspoint.com/python/list_extend.html

10. Consider generator expressions for large comprehensions

10.1 Intro Generator in Python 

(ALL Content in Section 10.1 is referenced from: https://www.programiz.com/python-programming/generator)

  • What's a generator?

A generator is a function that returns an object (iterator) which we can iterate over (one value at a time).  

  • Create a generator in python

Generator functions use yield expressions. When called, they don't actually run their body; they immediately return an iterator (a generator object). For example:

# A simple generator function
def my_gen():
    n = 1
    print('This is printed first')
    # Generator function contains yield statements
    yield n

    n += 1
    print('This is printed second')
    yield n

    n += 1
    print('This is printed at last')
    yield n


>>> # It returns an object but does not start execution immediately.
>>> a = my_gen()

>>> # We can iterate through the items using next().
>>> next(a)
This is printed first
1
>>> # Once the function yields, the function is paused and the control is transferred to the caller.

>>> # Local variables and their states are remembered between successive calls.
>>> next(a)
This is printed second
2

>>> next(a)
This is printed at last
3

>>> # Finally, when the function terminates, StopIteration is raised automatically on further calls.
>>> next(a)
Traceback (most recent call last):
...
StopIteration
>>> next(a)
Traceback (most recent call last):
...
StopIteration
  • Differences between a generator function and a normal function

- A generator function contains one or more yield statements.

-  When called, it returns an object (iterator) but does not start execution immediately.

- Methods like __iter__() and __next__() are implemented automatically. So we can iterate through the items using next().

- Once the function yields, the function is paused and the control is transferred to the caller.

- Local variables and their states are remembered between successive calls.

- Finally, when the function terminates, StopIteration is raised automatically on further calls (see the short sketch after this list).
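
A short sketch (my own, not from the reference) of the last two points: a for loop calls next() for us and stops cleanly when StopIteration is raised, so we never see the exception.

#reusing my_gen() defined above
for item in my_gen():
    print(item)
#Output:
#This is printed first
#1
#This is printed second
#2
#This is printed at last
#3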

  • Python generator expression
# Initialize the list
my_list = [1, 3, 6, 10]

# square each term using list comprehension
# Output: [1, 9, 36, 100]
[x**2 for x in my_list]

# same thing can be done using generator expression
# Output: <generator object <genexpr> at 0x0000000002EBDAF8>
(x**2 for x in my_list)

The major difference between a list comprehension and a generator expression is that while list comprehension produces the entire list, generator expression produces one item at a time.

A generator expression is lazy, producing items only when asked for, so it can use far less memory than an equivalent list comprehension.
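
A small sketch of the memory difference (my own example, using sys.getsizeof as a rough illustration): the list holds every item, while the generator object stays tiny no matter how many items it will eventually produce.

import sys

big_list = [x**2 for x in range(1000000)]
big_gen = (x**2 for x in range(1000000))

print(sys.getsizeof(big_list))  #millions of bytes: the whole list is already in memory
print(sys.getsizeof(big_gen))   #only on the order of a hundred bytes: just the generator object
print(next(big_gen))            #0 -- items are produced one at a time, on demand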

  • Advantages of generators in Python

- Easy to implement:

Generators can be implemented in a clearer and more concise way than their iterator-class counterparts.

- Memory efficient:

A normal function that returns a sequence creates the entire sequence in memory before returning the result. This is overkill if the number of items in the sequence is very large.

A generator implementation of such a sequence is memory friendly and preferred, since it produces only one item at a time.

- Represent infinite streams and pipeline operations:

Generators can represent infinite streams of data, and they can be used to pipeline a series of operations. This is best illustrated with an example, sketched below.
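
A sketch of both ideas (my own example, not from the referenced article): an infinite generator of even numbers fed through a generator expression, consumed lazily with next().

def all_even():
    n = 0
    while True:     #an infinite stream: this generator never terminates on its own
        yield n
        n += 2

#pipeline: one generator feeding another; nothing runs until we ask for items
squares_of_evens = (x**2 for x in all_even())
print(next(squares_of_evens))
#Output: 0
print(next(squares_of_evens))
#Output: 4
print(next(squares_of_evens))
#Output: 16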

10.2 Examples given in the book

Firstly, let's see an example:

import random
with open("/tmp/my_file.txt",'w') as f:
    for _ in range(10):
        f.write("a" * random.randint(0,100)) #Noticed: str*number ==> 10 times of current str, e.g "a"*3 ==> "aaa"
        f.write('\n')

value = [len(x) for x in open("/tmp/my_file.txt")]
print(value)
#output: [93, 46, 33, 17, 21, 42, 34, 95, 51, 98]

In this example, there are two problems:

1. The value list is a dense, noisy snapshot: the whole result is computed eagerly, so it has to be rebuilt whenever we want a fresh result.

2. If the file contains too many lines, too many items will be stored in value, which may cause your program to run out of memory and crash.

To solve these problems, we use a generator expression instead of a list comprehension. The syntax is similar; just replace the [] with (). For example: value = (len(x) for x in open("/tmp/my_file.txt")).

Let's see the enhanced version.

value = (len(x) for x in open("/tmp/my_file.txt"))  #return a generator instead of a fixed list
print(value)
#output: <generator object <genexpr> at 0x10dbe84c0>
print(next(value))
#output: 93
print(next(value))
#output: 46

value = (len(x) for x in open("/tmp/my_file.txt"))  #return a generator instead of a fixed list
roots = ((x,x**0.5) for x in value)
print(roots)
#<generator object <genexpr> at 0x10dd3faf0>
print(next(roots))
#(93, 9.643650760992955)
print(next(roots))
#(46, 6.782329983125268)

The generator used in this example can solve these two problems.

Generator expressions are lazy. When we call next(roots), evaluation enters roots = ((x,x**0.5) for x in value), which in turn advances value = (len(x) for x in open("/tmp/my_file.txt")) to get the next length, and then returns to the roots expression to produce the next pair. Thus roots always reflects the current data, without noise, which solves the first problem.

On the other hand, generators save memory by keeping only the item at the current position in memory, which solves the second problem.

11. Consider generators instead of returning lists

In this section, the author suggests returning generators instead of lists, because of the two problems given in Section 10.2.

Example 1: returning a list

def index_words(text):
    result=[]
    if text:
        result.append(0)
    for index,letter in enumerate(text):
        if letter == ' ':
            result.append(index+1)
    return result

address = "Four score and seven years ago our fathers brought forth on this continent a new nation," \
          " conceived in liberty, and dedicated to the proposition that all men are created equal."

result = index_words(address)
print(result)
#Output:
#[0,5,11, 15, 21, 27, 31, 32, 36, 44, 52, 58, 61, 66, 67, 77, 79, 83, 91, 101, 104, 113, 114, 118, 128, 131, 135, 147, 152, 156, 160, 161, 165, 173]

This function has the same two problems described in 10.2. To solve them, we can return a generator instead of a list, saving memory and reducing noise.

Example 2: using a generator instead of returning a list

def index_words_by_generator(text):
    if text:
        yield 0
    for index,letter in enumerate(text):
        if letter == ' ':
           yield index + 1

address = "Four score and seven years ago our fathers brought forth on this continent a new nation," \
          " conceived in liberty, and dedicated to the proposition that all men are created equal."

it = index_words_by_generator(address) #return a generator
print(it)
#Output: <generator object index_words_by_generator at 0x10dbf6938>
print(next(it))
#Output: 0
print(next(it))
#Output: 5

Example 3: A similar function for file handle

def index_words_by_generator_for_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

address_lines = """Four score and seven years ago 
our fathers brought forth on this 
continent a new nation, conceived in liberty, 
and dedicated to the proposition that all men 
are created equal."""

with open("/tmp/address.txt",'w') as f:
    f.write(address_lines)

with open("/tmp/address.txt") as f:
    it = index_words_by_generator_for_file(f)
    print(it)
    print(next(it))
    print(next(it))
    print(list(it))

#output
#<generator object index_words_by_generator_for_file at 0x10dbf6888>
#0
#5
#[11, 15, 21, 27, 31, 32, 36, 44, 52, 58, 61, 66, 67, 77, 79, 83, 91, 101, 104, 113, 114, 118, 128, 131, 135, 147, 152, 156, 160, 161, 165, 173]

Note that in the last statement, print(list(it)), "it" is a generator, so the function only needs to hold one line of the source file in memory at a time while building the list, instead of loading the whole file at once.
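
As a related sketch (itertools.islice is my addition, not part of the book's example code): if we only want a slice of the generator's output, islice lets us take it without materializing the whole result.

from itertools import islice

with open("/tmp/address.txt") as f:
    it = index_words_by_generator_for_file(f)
    print(list(islice(it, 3)))  #take only the first three offsets, lazily
#Output: [0, 5, 11]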

12. Be defensive when iterating over arguments

12.1 Iterables, iterator, generator

Reference: https://www.datacamp.com/community/tutorials/python-iterator-tutorial

  • The relations between Generator, Iterable, and Iterator:

    • A Generator is always an Iterator

    • An Iterator is always an Iterable

    • iter(an Iterable) returns an Iterator

  • Iterable

An Iterable is any object that can return an iterator. In Python, an Iterable class must implement __iter__(), which returns a new iterator. When iter(some_iterable) is called, __iter__() is invoked to return an iterator.

  • Iterator

An Iterator is also an object; it must implement both __iter__() and __next__(). Generally, an iterator's __iter__() returns the iterator itself (see the sketch at the end of this subsection).

  • Generator

A generator is a function that returns an object (iterator) which we can iterate over (one value at a time).  More details can be found in Section 10.
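
A minimal sketch of these definitions (the class name is my own, not from the reference): an object that implements both __iter__() and __next__(), so it is an Iterator (and therefore also an Iterable).

class CountDown:
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        #an iterator's __iter__ returns itself
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

it = CountDown(3)
print(iter(it) is it)
#Output: True -- an iterator's iterator is itself
print(list(it))
#Output: [3, 2, 1]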

12.2 Examples given in the book

Example 1: using a list as the input of the function.

data = [15,80,35]

def normalize(numbers):
    total = sum(numbers)    #iterate the whole list -  the first time
    result = []
    for value in numbers:    #iterate the whole list - the second time
        percent = 100 * value/total
        result.append(percent)
    return result

output = normalize(data)
print(output)
#Output: [11.538461538461538, 61.53846153846154, 26.923076923076923]
print(sum(output))
#Output: 100.0

There are two potential problems with this approach:

1. The code is a bit dense and noisy.

2. It requires all values to be stored in a list before being processed, which may cause the program to run out of memory.

To solve these problems, use a generator.

Example 2: Using Generator

path = '/tmp/my_numbers.txt'
with open(path,'w') as f:
    for i in [15,80,35]:
        f.write("%d\n"%i)

def read_visits(data_paths):
    with open(data_paths) as f:
        for line in f:
            yield int(line)

it = read_visits(path)
print(it)
print(list(it))
#Output:
#<generator object read_visits at 0x10dbf6570>
#[15, 80, 35]

Example 3: Analyzing example 1 and example 2 together

When we run the following v1 and v2 code blocks, we expect them to produce the same correct results. In fact, only v1 produces the correct output; v2 produces an empty list. Why does this happen?

#v1
data = [15,80,35]
print(normalize(data))
#Output: [11.538461538461538, 61.53846153846154, 26.923076923076923]

#v2
it = read_visits(path)
print(normalize(it))
#Output: []

The reason is that normalize() from example 1 iterates over its input twice: once in sum() and once in the for loop. That's fine when the input is a list, because each pass creates a new iterator over the list.

However, when the input is a generator, things are different. After the first pass (sum()), the generator has already been exhausted and has raised StopIteration. In the second pass (the for loop), it cannot produce any more items, so nothing is appended to the result list.

Let's validate it in example 4.

Example 4: verify that a generator can only be iterated a single time.

#One way to demonstrate this problem
it = read_visits(path)
print(list(it))
print(list(it))
print(list(it))
print(list(it))
print(list(it))
#Output:
#[15, 80, 35]
#[]
#[]
#[]
#[]

Example 5: explicitly exhaust the input iterator to work around the single-pass limitation

You can explicitly exhaust an input iterator and keep a copy of its entire contents in a list. You can then iterate over that copy as many times as you need, without worrying about these iteration problems.

def normalize_enhanced_1(numbers):
    #copying the iterator into a list can also run out of memory
    numbers = list(numbers)  
    total = sum(numbers)    #iterate the whole list -  the first time
    result = []
    for value in numbers:    #iterate the whole list - the second time
        percent = 100 * value/total
        result.append(percent)
    return result

def read_visits(data_paths):
    with open(data_paths) as f:
        for line in f:
            yield int(line)


it = read_visits(path)
print(normalize_enhanced_1(it))
#Output: [11.538461538461538, 61.53846153846154, 26.923076923076923]

Although this solution avoids the single-pass problem, it defeats our purpose of saving memory. To fix that as well, we can pass a function, get_iter, as the input parameter of normalize(); every time the function is called, it returns a new generator. Let's see example 6.

Example 6: Using a function as the input parameter of normalize

def normalize_enhanced_2(get_iter):
    total = sum(get_iter())    #New iterator
    result = []
    for value in get_iter():    #New iterator
        percent = 100 * value/total
        result.append(percent)
    return result

get_iter = lambda : read_visits(path)
print(normalize_enhanced_2(get_iter))
#Output: [11.538461538461538, 61.53846153846154, 26.923076923076923]

In this example, both problems are solved. However, the code is clumsy to read: you have to pass get_iter as a function and remember that calling get_iter() means running read_visits again.

Is there a better way to solve these problems? Yes: define your own iterable container class. Let's see example 7.

Example 7: Defining an Iterable Class to solve the problems.

class ReadVisits:
    def __init__(self,data_path):
        self.data_path = data_path

    def __iter__(self):
        with open(self.data_path) as f:
            for line in f:
                yield int(line)

def normalize(numbers):
    total = sum(numbers)    #implicitly calls ReadVisits.__iter__ (first pass)
    result = []
    for value in numbers:    #implicitly calls ReadVisits.__iter__ again (second pass)
        percent = 100 * value/total
        result.append(percent)
    return result

visits = ReadVisits(path)
print(normalize(visits))
#[11.538461538461538, 61.53846153846154, 26.923076923076923]

In example 7, ReadVisits is an Iterable class: calling iter(ReadVisits(path)) returns an iterator. ReadVisits doesn't implement __next__, so it is not itself an iterator. Inside normalize(), both sum() and the for loop implicitly call ReadVisits.__iter__ to create a new iterator, so each pass reads the file from the beginning.

It seems that all the problems are solved, but there is still one issue: an iterator's iterator is itself. If we pass an iterator object to normalize(), the same iterator is used by both sum() and the for loop, which causes the same problem as in example 3, as sketched below. So we need to enhance normalize() to handle this situation. Let's see example 8.
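
A quick sketch of the failure mode, reusing ReadVisits and the plain normalize() from example 7:

visits = ReadVisits(path)
it = iter(visits)     #a generator object, not the ReadVisits container
print(normalize(it))
#Output: [] -- sum() exhausted the iterator, so the for loop saw nothing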

Example 8: enhance normalize() to reject iterator inputs.

#enhance normalize
def normalize_enhanced_3(numbers):
    if iter(numbers) is iter(numbers):
        raise  TypeError("Must supply a container")
    total = sum(numbers)    #first pass over the container
    result = []
    for value in numbers:    #second pass over the container
        percent = 100 * value/total
        result.append(percent)
    return result

visits = ReadVisits(path)
it = iter(visits)
print(normalize_enhanced_3(it))
#Traceback (most recent call last):
#  ...
#TypeError: Must supply a container

Now normalize_enhanced_3() rejects iterator inputs with a clear error, while still accepting iterable containers like ReadVisits.

Finally, here are some examples demonstrating that an iterator's iterator is itself.

it = iter(visits)
it2 = iter(it)
print(it,it2)
#<generator object ReadVisits.__iter__ at 0x10dbf66d0> <generator object ReadVisits.__iter__ at 0x10dbf66d0>
#it and it2 are the same object

it = iter(visits)
it2 = iter(visits)
print(it,it2)
#<generator object ReadVisits.__iter__ at 0x10dbf6990> <generator object ReadVisits.__iter__ at 0x10dbf6a98>
#it and it2 are different objects

 
