Python for Data Analysis (3rd Edition): Chapter 3 Notes

Link to this article: https://blog.csdn.net/Candylx/article/details/132174078

Chapter 3: Supplementary Notes

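A tuple can be created with or without parentheses, or by calling tuple on any sequence or iterator. Tuples can be concatenated with +. The slots of a tuple cannot be reassigned, but if an element is itself a mutable object such as a list, that object can still be modified in place.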

In [6]: tuple([4, 0, 2])
Out[6]: (4, 0, 2)

In [7]: tup = tuple('string')

In [8]: tup
Out[8]: ('s', 't', 'r', 'i', 'n', 'g')

In [10]: nested_tup = (4, 5, 6), (7, 8)

In [11]: nested_tup
Out[11]: ((4, 5, 6), (7, 8))

In [12]: nested_tup[0]
Out[12]: (4, 5, 6)

In [13]: nested_tup[1]
Out[13]: (7, 8)

In [14]: tup = tuple(['foo', [1, 2], True])

In [15]: tup[2] = False
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-b89d0c4ae599> in <module>
----> 1 tup[2] = False
TypeError: 'tuple' object does not support item assignment

In [16]: tup[1].append(3)

In [17]: tup
Out[17]: ('foo', [1, 2, 3], True)

In [18]: (4, None, 'foo') + (6, 0) + ('bar',)
Out[18]: (4, None, 'foo', 6, 0, 'bar')

In [19]: ('foo', 'bar') * 4
Out[19]: ('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Tuple unpacking

In [23]: tup = 4, 5, (6, 7)

In [24]: a, b, (c, d) = tup

In [25]: d
Out[25]: 7

Swapping variables

In [26]: a, b = 1, 2

In [27]: a
Out[27]: 1

In [28]: b
Out[28]: 2

In [29]: b, a = a, b

In [30]: a
Out[30]: 2

In [31]: b
Out[31]: 1

Iterating over sequences

In [32]: seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [33]: for a, b, c in seq:
   ....:     print(f'a={a}, b={b}, c={c}')
a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9

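In some cases you may want to "pluck" a few elements from the beginning of a tuple. The special syntax *rest does this; it is also used in function signatures to capture an arbitrarily long list of positional arguments: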
```
In [34]: values = 1, 2, 3, 4, 5

In [35]: a, b, *rest = values

In [36]: a
Out[36]: 1

In [37]: b
Out[37]: 2

In [38]: rest
Out[38]: [3, 4, 5]
```
Sometimes the remaining values are something you want to discard. By convention, many Python programmers use an underscore (_) for unwanted variables:
```
In [39]: a, b, *_ = values
```
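
The same star syntax appears in function signatures. A minimal sketch, where the function name `summarize` and its parameters are made up for illustration:

```python
def summarize(first, second, *rest):
    """Return the first two arguments and how many extra ones followed."""
    # rest is always a tuple holding the remaining positional arguments
    return first, second, len(rest)

values = 1, 2, 3, 4, 5
result = summarize(*values)   # unpack the tuple into positional arguments
print(result)                 # (1, 2, 3)
```

Note that in an assignment such as `a, b, *rest = values` the starred name is bound to a list, while inside a function it is bound to a tuple.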
Tuple methods

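Since the size and contents of a tuple cannot be modified, it has very few instance methods. A particularly useful one (also available on lists) is count, which counts the number of occurrences of a value: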
```
In [40]: a = (1, 2, 2, 2, 3, 4, 2)
In [41]: a.count(2)
Out[41]: 4
```

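List methods include append, insert, pop, remove, and extend (which adds multiple elements at once); membership is tested with in and not in. Note that append is cheaper than insert, because insert has to shift the references to all subsequent elements to make room for the new one.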

In [51]: b_list.append("dwarf")

In [52]: b_list
Out[52]: ['foo', 'peekaboo', 'baz', 'dwarf']

In [53]: b_list.insert(1, "red")

In [54]: b_list
Out[54]: ['foo', 'red', 'peekaboo', 'baz', 'dwarf']

In [55]: b_list.pop(2)
Out[55]: 'peekaboo'

In [56]: b_list
Out[56]: ['foo', 'red', 'baz', 'dwarf']

In [57]: b_list.append("foo")

In [58]: b_list
Out[58]: ['foo', 'red', 'baz', 'dwarf', 'foo']

In [59]: b_list.remove("foo")

In [60]: b_list
Out[60]: ['red', 'baz', 'dwarf', 'foo']
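
The remaining operations named above (extend, in, not in, pop) can be sketched as follows. The starting list mirrors the state of b_list at the end of the transcript; the extra strings are arbitrary sample values:

```python
b_list = ["red", "baz", "dwarf", "foo"]

# extend appends every element of another iterable in a single call
b_list.extend(["seven", "eight"])

# in / not in scan the list from left to right (linear time)
has_dwarf = "dwarf" in b_list
missing_peekaboo = "peekaboo" not in b_list

# pop removes the element at the given index and returns it
removed = b_list.pop(0)

print(b_list)    # ['baz', 'dwarf', 'foo', 'seven', 'eight']
print(removed)   # red
```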

Lists can be concatenated with +, sorted in place with sort, and sliced.

In [67]: a = [7, 2, 5, 1, 3]

In [68]: a.sort()

In [69]: a
Out[69]: [1, 2, 3, 5, 7]

In [70]: b = ["saw", "small", "He", "foxes", "six"]

In [71]: b.sort(key=len)

In [72]: b
Out[72]: ['He', 'saw', 'six', 'small', 'foxes']
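
Concatenation and slicing are mentioned above without an example. A short sketch with arbitrary sample data:

```python
seq = [7, 2, 3, 7, 5, 6, 0, 1]

middle = seq[1:5]        # start is included, stop is excluded
head = seq[:3]           # omitting start means "from the beginning"
tail = seq[-2:]          # negative indices count from the end
every_other = seq[::2]   # a step of 2 takes every second element

# + builds a brand-new list; extend would modify the left operand in place
combined = [4, None, "foo"] + [7, 8, (2, 3)]

print(middle, head, tail, every_other)
print(combined)
```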

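A dictionary stores a collection of key-value pairs, where keys and values are Python objects. Each key is associated with a value, so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key. One way to create a dictionary is to use curly braces {} with colons separating keys and values: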

In [83]: empty_dict = {}

In [84]: d1 = {"a": "some value", "b": [1, 2, 3, 4]}

In [85]: d1
Out[85]: {'a': 'some value', 'b': [1, 2, 3, 4]}

You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple:

In [86]: d1[7] = "an integer"

In [87]: d1
Out[87]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [88]: d1["b"]
Out[88]: [1, 2, 3, 4]

You can check whether a dictionary contains a key using the same syntax used for checking whether a list or tuple contains a value:

In [89]: "b" in d1
Out[89]: True

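You can delete values using either the del keyword or the pop method (which returns the value and deletes the key at the same time):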

In [90]: d1[5] = "some value"

In [91]: d1
Out[91]:
{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value'}

In [92]: d1["dummy"] = "another value"

In [93]: d1
Out[93]:
{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [94]: del d1[5]

In [95]: d1
Out[95]:
{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [96]: ret = d1.pop("dummy")

In [97]: ret
Out[97]: 'another value'

In [98]: d1
Out[98]: {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

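The keys and values methods give you iterators over the dictionary's keys and values, respectively. The order of the keys depends on the order in which they were inserted, and the two methods output the keys and values in the same respective order: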

In [99]: list(d1.keys())
Out[99]: ['a', 'b', 7]
In [100]: list(d1.values())
Out[100]: ['some value', [1, 2, 3, 4], 'an integer']

If you need to iterate over both the keys and the values, use the items method, which yields them as 2-tuples:

In [101]: list(d1.items())
Out[101]: [('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')]

You can merge one dictionary into another using the update method:

In [102]: d1.update({"b": "foo", "c": 12})
In [103]: d1
Out[103]: {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

The update method changes the dictionary in place, so any existing keys in the data passed to update will have their old values discarded.

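It is common to end up with two sequences that you want to pair up element-wise in a dictionary. As a first cut, you might write code like this: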

mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value

Since a dictionary is essentially a collection of 2-tuples, the dict function accepts a list of 2-tuples:

In [104]: tuples = zip(range(5), reversed(range(5)))

In [105]: tuples
Out[105]: <zip at 0x17d604d00>

In [106]: mapping = dict(tuples)

In [107]: mapping
Out[107]: {0: 4, 1: 3, 2: 2, 3: 1, 4: 0}
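
To tie the two approaches together, here is a small sketch that gives key_list and value_list concrete (made-up) contents and checks that the explicit loop and dict(zip(...)) build the same dictionary:

```python
key_list = ["a", "b", "c"]      # hypothetical sample data
value_list = [1, 2, 3]

# First cut: fill the dictionary one pair at a time
mapping_loop = {}
for key, value in zip(key_list, value_list):
    mapping_loop[key] = value

# Shorter: dict consumes the (key, value) pairs produced by zip directly
mapping_direct = dict(zip(key_list, value_list))

print(mapping_loop == mapping_direct)   # True
```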

Dictionary default values

It is common to have logic like this:

if key in some_dict:
    value = some_dict[key]
else:
    value = default_value

The dictionary methods get and pop can take a default value to be returned, so the if-else block above can be written simply as:

value = some_dict.get(key, default_value)
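
A small sketch of how get and pop behave with and without a default. The dictionary contents here are made up for illustration:

```python
some_dict = {"a": 1}   # hypothetical sample data

print(some_dict.get("a", 0))   # key present: its value is returned
print(some_dict.get("z", 0))   # key missing: the default is returned
print(some_dict.get("z"))      # key missing, no default given: None

print(some_dict.pop("z", 0))   # missing key with a default: the default
try:
    some_dict.pop("z")         # missing key without a default
except KeyError as exc:
    print("KeyError:", exc)
```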

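By default, get returns None if the key is not present, while pop raises an exception. When setting values, the values in a dictionary may themselves be another kind of collection, such as a list. For example, you could imagine categorizing a list of words by their first letters as a dictionary of lists: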

In [108]: words = ["apple", "bat", "bar", "atom", "book"]
In [109]: by_letter = {}
In [110]: for word in words:
   .....:     letter = word[0]
   .....:     if letter not in by_letter:
   .....:         by_letter[letter] = [word]
   .....:     else:
   .....:         by_letter[letter].append(word)
   .....:
In [111]: by_letter
Out[111]: {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The setdefault dictionary method can be used to simplify this workflow. The preceding for loop can be rewritten as:

In [112]: by_letter = {}
In [113]: for word in words:
   .....:     letter = word[0]
   .....:     by_letter.setdefault(letter, []).append(word)
   .....:
In [114]: by_letter
Out[114]: {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

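Valid dictionary key types. While the values of a dictionary can be any Python object, the keys generally have to be immutable objects such as scalar types (int, float, string) or tuples (all the objects in the tuple must be immutable too). The technical term here is hashability. You can check whether an object is hashable (and can therefore be used as a key in a dictionary) with the hash function: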

In [118]: hash("string")
Out[118]: 4022908869268713487

In [119]: hash((1, 2, (2, 3)))
Out[119]: -9209053662355515447

In [120]: hash((1, 2, [2, 3])) # fails because lists are mutable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-120-473c35a62c0b> in <module>
----> 1 hash((1, 2, [2, 3])) # fails because lists are mutable
TypeError: unhashable type: 'list'

To use a list as a key, convert it to a tuple, which can be hashed as long as its elements can be:

In [121]: d = {}

In [122]: d[tuple([1, 2, 3])] = 5

In [123]: d
Out[123]: {(1, 2, 3): 5}
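
The hashability check can be wrapped in a small helper. This is only a sketch; the name `is_hashable` is made up and is not part of the standard library:

```python
def is_hashable(obj):
    """Return True if obj can be used as a dictionary key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, (2, 3))))     # True
print(is_hashable((1, 2, [2, 3])))     # False: the tuple contains a list
print(is_hashable([1, 2, 3]))          # False
print(is_hashable(tuple([1, 2, 3])))   # True
```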

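A set is an unordered collection of unique elements, created with the set function or with a set literal in curly braces. Sets support mathematical set operations such as union, intersection, difference, and symmetric difference. Available methods include add, clear, remove, pop, union, update, intersection, intersection_update, and difference. The examples below first show how duplicates collapse, then work with two example sets a and b: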

In [124]: set([2, 2, 2, 1, 3, 3])
Out[124]: {1, 2, 3}

In [125]: {2, 2, 2, 1, 3, 3}
Out[125]: {1, 2, 3}

In [126]: a = {1, 2, 3, 4, 5}

In [127]: b = {3, 4, 5, 6, 7, 8}

In [128]: a.union(b)
Out[128]: {1, 2, 3, 4, 5, 6, 7, 8}

In [129]: a | b
Out[129]: {1, 2, 3, 4, 5, 6, 7, 8}

In [130]: a.intersection(b)
Out[130]: {3, 4, 5}

In [131]: a & b
Out[131]: {3, 4, 5}
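
Difference, symmetric difference, and the in-place variants are listed above but not shown. A short sketch using the same two sets:

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

only_in_a = a - b        # same as a.difference(b)
in_exactly_one = a ^ b   # same as a.symmetric_difference(b)

c = a.copy()
c |= b                   # in-place union, same as c.update(b)

d = a.copy()
d &= b                   # in-place intersection, same as d.intersection_update(b)

print(only_in_a, in_exactly_one, c, d)
```

The in-place forms avoid building a new set, which can matter for very large sets.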

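enumerate

It is common when iterating over a sequence to want to keep track of the index of the current item. A do-it-yourself approach would look like this:

index = 0
for value in collection:
   # do something with value
   index += 1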

Since this is so common, Python has a built-in function, enumerate, which returns a sequence of (i, value) tuples:

for index, value in enumerate(collection):
   # do something with value
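
A concrete sketch of enumerate, with made-up sample data. The optional start argument changes the first index:

```python
collection = ["foo", "bar", "baz"]   # hypothetical sample data

# Each item is an (index, value) tuple
pairs = list(enumerate(collection))

# start=1 is handy for human-readable numbering
numbered = [f"{i}. {value}" for i, value in enumerate(collection, start=1)]

print(pairs)
print(numbered)
```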

The sorted function returns a new sorted list from the elements of any sequence. It accepts the same arguments as the sort method on lists.

In [145]: sorted([7, 1, 2, 6, 0, 3, 2])
Out[145]: [0, 1, 2, 2, 3, 6, 7]
In [146]: sorted("horse race")
Out[146]: [' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']
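
Because sorted accepts the same arguments as list.sort, key and reverse work here too. A sketch reusing the word list from the sort example; unlike sort, the input is left untouched:

```python
words = ["saw", "small", "He", "foxes", "six"]

by_length = sorted(words, key=len)
longest_first = sorted(words, key=len, reverse=True)

print(by_length)       # ['He', 'saw', 'six', 'small', 'foxes']
print(longest_first)   # ['small', 'foxes', 'saw', 'six', 'He']
print(words)           # the original list is unchanged
```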

zip "pairs" up the elements of several lists, tuples, or other sequences to produce a sequence of tuples:

In [147]: seq1 = ["foo", "bar", "baz"]
In [148]: seq2 = ["one", "two", "three"]
In [149]: zipped = zip(seq1, seq2)
In [150]: list(zipped)
Out[150]: [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence:

In [151]: seq3 = [False, True]
In [152]: list(zip(seq1, seq2, seq3))
Out[152]: [('foo', 'one', False), ('bar', 'two', True)]

A common use of zip is iterating over multiple sequences simultaneously, possibly combined with enumerate:

In [153]: for index, (a, b) in enumerate(zip(seq1, seq2)):
   .....:     print(f"{index}: {a}, {b}")
   .....:
0: foo, one
1: bar, two
2: baz, three

reversed iterates over the elements of a sequence in reverse order:
```
In [154]: list(reversed(range(10)))
Out[154]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```
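Keep in mind that reversed is lazy, like a generator: it does not create the reversed sequence until it is materialized (for example, with list or a for loop).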
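
A short sketch of what this laziness means in practice: the object returned by reversed hands out values one at a time and can only be consumed once.

```python
rev = reversed([1, 2, 3])

print(rev)              # an iterator object, not a list
first = next(rev)       # values are produced one at a time
remaining = list(rev)   # materializes whatever is left
again = list(rev)       # the iterator is now exhausted

print(first, remaining, again)
```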

Comprehensions

List comprehensions
[expr for value in collection if condition]
In [155]: strings = ["a", "as", "bat", "car", "dove", "python"]

In [156]: [x.upper() for x in strings if len(x) > 2]
Out[156]: ['BAT', 'CAR', 'DOVE', 'PYTHON']

Dictionary comprehensions
dict_comp = {key-expr: value-expr for value in collection
             if condition}
In [160]: loc_mapping = {value: index for index, value in enumerate(strings)}

In [161]: loc_mapping
Out[161]: {'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

Set comprehensions
set_comp = {expr for value in collection if condition}

In [157]: unique_lengths = {len(x) for x in strings}

In [158]: unique_lengths
Out[158]: {1, 2, 3, 4, 6}
In [159]: set(map(len, strings))
Out[159]: {1, 2, 3, 4, 6}

Nested comprehensions

In [162]: all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
   .....:             ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

In [163]: names_of_interest = []

In [164]: for names in all_data:
   .....:     enough_as = [name for name in names if name.count("a") >= 2]
   .....:     names_of_interest.extend(enough_as)
   .....:

In [165]: names_of_interest
Out[165]: ['Maria', 'Natalia']

This is equivalent to the following single nested comprehension:

In [166]: result = [name for names in all_data for name in names
   .....:           if name.count("a") >= 2]

In [167]: result
Out[167]: ['Maria', 'Natalia']

In [168]: some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [169]: flattened = [x for tup in some_tuples for x in tup]

In [170]: flattened
Out[170]: [1, 2, 3, 4, 5, 6, 7, 8, 9]

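Functions take positional arguments and keyword arguments. Keyword arguments are optional because they have default values, but every positional argument must be specified when the function is called. Passing arguments by keyword is encouraged because it makes calls easier to read.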

def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)
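
A few ways of calling this function, as a sketch. The function is repeated so the block runs on its own; the argument values are arbitrary:

```python
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

r1 = my_function2(5, 6, z=0.7)     # z passed by keyword
r2 = my_function2(3.14, 7, 3.5)    # z passed positionally
r3 = my_function2(10, 20)          # z falls back to its default, 1.5
r4 = my_function2(x=5, y=6, z=7)   # every argument by keyword
r5 = my_function2(y=6, x=5, z=7)   # keyword order does not matter

print(r3, r4, r5)   # 45.0 77 77
```

The one restriction is that keyword arguments must come after any positional arguments in the call.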

Returning multiple values

def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()
return_value = f()

def f():
    a = 5
    b = 6
    c = 7
    return {"a" : a, "b" : b, "c" : c}
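
The two styles can be compared side by side. In this sketch the functions are renamed `f_tuple` and `f_dict` (hypothetical names) so that both can coexist in one script:

```python
def f_tuple():
    return 5, 6, 7   # really a single tuple object

def f_dict():
    return {"a": 5, "b": 6, "c": 7}

a, b, c = f_tuple()        # unpack into three names
return_value = f_tuple()   # or keep the tuple whole
by_name = f_dict()["b"]    # the dict form is looked up by key

print(a, b, c, return_value, by_name)
```

Returning a dictionary costs a little more typing but frees the caller from remembering the order of the values.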

Lambda

In [201]: def apply_to_list(some_list, f):
   .....:     return [f(x) for x in some_list]

In [202]: ints = [4, 0, 1, 5, 6]

In [203]: apply_to_list(ints, lambda x: x * 2)
Out[203]: [8, 0, 2, 10, 12]

In [204]: strings = ["foo", "card", "bar", "aaaa", "abab"]

In [205]: strings.sort(key=lambda x: len(set(x)))

In [206]: strings
Out[206]: ['aaaa', 'foo', 'abab', 'bar', 'card']

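A generator is a convenient way, similar to writing a normal function, to construct a new iterable object. Whereas a normal function executes and returns a single result at a time, a generator can return a sequence of multiple values by pausing and resuming execution each time it is used. To create a generator, use the yield keyword instead of return in a function: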

def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

In [213]: gen = squares()

In [214]: gen
Out[214]: <generator object squares at 0x17d5fea40>

In [215]: for x in gen:
   .....:     print(x, end=" ")
Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100
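
The pause-and-resume behavior can be observed directly with next. This sketch uses a trimmed copy of squares (the print call is left out) with a small n:

```python
def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
first = next(gen)      # runs the body up to the first yield
rest = list(gen)       # resumes and collects the remaining values
leftover = list(gen)   # the generator is exhausted at this point

print(first, rest, leftover)
```

A generator can be iterated only once; call the function again to get a fresh one.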

Generator expressions are analogous to list comprehensions, with parentheses () in place of the brackets []:

In [216]: gen = (x ** 2 for x in range(100))

In [217]: gen
Out[217]: <generator object <genexpr> at 0x17d5feff0>

This is equivalent to the following more verbose generator:

def _make_gen():
    for x in range(100):
        yield x ** 2
gen = _make_gen()

In some cases, a generator expression can be used in place of a list comprehension as a function argument:

In [218]: sum(x ** 2 for x in range(100))
Out[218]: 328350

In [219]: dict((i, i ** 2) for i in range(5))
Out[219]: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Exceptions

def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
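
A quick sketch of how attempt_float treats different inputs. The function is repeated so the block is self-contained; the inputs are arbitrary examples:

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

print(attempt_float("1.2345"))      # converts cleanly to a float
print(attempt_float("something"))   # ValueError is caught, input returned
print(attempt_float((1, 2)))        # TypeError is caught, input returned
```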

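In some cases you may not want to suppress an exception, but you do want some code to run regardless of whether the code in the try block succeeds. To do this, use finally: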

f = open(path, mode="w")

try:
    write_to_file(f)
finally:  # always runs, whether or not the try block succeeded
    f.close()
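
To see that the finally block really runs in both cases, here is a sketch that records events in a list instead of touching a file. The function `risky` and the `events` list are made up for illustration:

```python
events = []

def risky(fail):
    try:
        events.append("try")
        if fail:
            raise ValueError("simulated failure")
    finally:
        events.append("cleanup")   # runs on success and on failure

risky(fail=False)
try:
    risky(fail=True)
except ValueError:
    events.append("caught")        # the exception still propagates

print(events)   # ['try', 'cleanup', 'try', 'cleanup', 'caught']
```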

Exceptions in IPython

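If an exception is raised while you are running a script with %run or executing any statement, IPython will by default print a full call stack trace (traceback) with a few lines of context around the position at each point in the stack.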

Opening files

In [233]: path = "examples/segismundo.txt"

In [234]: f = open(path, encoding="utf-8")

for line in f:
    print(line)

In [235]: lines = [x.rstrip() for x in open(path, encoding="utf-8")]

In [236]: lines
Out[236]: 
['Sueña el rico en su riqueza,',
 'que más cuidados le ofrece;',
 '',
 'sueña el pobre que padece',
 'su miseria y su pobreza;',
 '',
 'sueña el que a medrar empieza,',
 'sueña el que afana y pretende,',
 'sueña el que agravia y ofende,',
 '',
 'y en el mundo, en conclusión,',
 'todos sueñan lo que son,',
 'aunque ninguno lo entiende.',
 '']
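
When you use open to create a file object, it is recommended to close the file when you are finished with it. Closing the file releases its resources back to the operating system:
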
In [237]: f.close()

One way to make cleaning up open files easier is to use the with statement:

In [238]: with open(path, encoding="utf-8") as f:
   .....:     lines = [x.rstrip() for x in f]
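
A runnable sketch of the same pattern. It writes a throwaway file first, so it does not depend on examples/segismundo.txt being present; the file name and contents are made up:

```python
import os
import tempfile

# Create a small temporary file to read back
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.txt")
with open(path, mode="w", encoding="utf-8") as f:
    f.write("first line   \nsecond line\n")

with open(path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]   # rstrip drops trailing whitespace and newlines

print(lines)      # ['first line', 'second line']
print(f.closed)   # True: the file was closed on leaving the with block
```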