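Working with data
Python's core data structures: tuples, lists, sets, dictionaries
2.1 Data types and data structures
Tuple unpacking: splitting an iterable into variables
Unpacking assigns the items of one structure to several separate variables.
- The number of variables must match the number of items.
- A starred name such as *rest or *_ collects any number of items (including none), so it can be used to skip the items you don't need; a plain _ skips exactly one.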
a = ('windows', 10, 25.1, (2017, 12, 29)) # pack
os_type, number, price, dat = a # unpack
# Ignore the first two items, keep only the remaining variables
*_, price, dat = a
# Go one step further and keep only the month and day
*_, price, (_, m, d) = a
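As a minimal sketch of the starred form, the example below reuses the tuple from above. The names `first`, `rest`, `month`, `day`, and `extra` are illustrative only. The point to notice is that a starred name always receives a list, even when nothing is left for it to collect.

```python
a = ('windows', 10, 25.1, (2017, 12, 29))

# A starred name collects "everything else" into a list
first, *rest = a

# Nested unpacking reaches inside the date tuple; _ discards the year
*_, (_, month, day) = a

# When the plain names use up every item, the starred name gets an empty list
x, y, *extra = (1, 2)

print(first, rest)
print(month, day)
print(extra)
```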
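Tuples vs. lists:
- A tuple is often described as an immutable list, but it usually holds items of different types (a single record).
- A list usually holds items of the same type (a collection of records).
Dictionaries
A dictionary maps keys to values; it is also called a hash table or an associative array.
- d.keys(): a view of the keys
- d.values(): a view of the values
- d.items(): a view of the (key, value) pairs
- d = dict(d.items()): builds a new dict from (key, value) pairs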
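A small sketch of that division of labor, with made-up sample values: a tuple holds one mixed-type record, a list holds many such records, and only the list can be changed in place.

```python
# A tuple as a single record with mixed types
holding = ('GOOG', 100, 490.1)

# A list as a collection of same-shaped records
portfolio = [holding, ('IBM', 50, 91.1)]
portfolio.append(('CAT', 150, 83.44))   # lists can grow

# Tuples cannot be modified in place
try:
    holding[1] = 75
except TypeError as exc:
    error = str(exc)

# "Changing" a tuple means building a new one
holding = (holding[0], 75, holding[2])

print(portfolio)
print(holding)
print(error)
```

Note that rebinding `holding` does not touch the tuple already stored in `portfolio`; the list still refers to the original record.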
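```python
>>> d
{'name': 'AA', 'shares': 100, 'price': 32.2}
```
The outputs below appear to come from later points in the same session, after `date` and `account` were added and `shares` was changed to 75, so the keys shown vary from block to block.

Get all keys:
```python
>>> list(d)
['name', 'shares', 'price', 'date', 'account']
>>> keys = d.keys()
>>> keys
dict_keys(['name', 'shares', 'price', 'date', 'account'])
```
Get the key-value pairs:
```python
>>> items = d.items()
>>> items
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
```
Iterate over the keys, or over the key-value pairs:
```python
>>> for k in d:
...     print(k, '=', d[k])
...
name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
account = 12345
>>>
>>> for k, v in d.items():
...     print(k, '=', v)
...
name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
```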
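One detail worth seeing in action: the objects returned by `keys()` and `items()` are views, so they track later changes to the dictionary, while `list(d)` takes a copy. A small sketch, using the same kind of sample record (the `account` value is just sample data):

```python
d = {'name': 'AA', 'shares': 100, 'price': 32.2}

keys = d.keys()          # a live view, not a copy
snapshot = list(d)       # a real list, frozen at this moment

d['account'] = 12345     # add a key after taking the view

print(list(keys))        # the view now includes 'account'
print(snapshot)          # the list does not

del d['account']
print('account' in keys) # False again
```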
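2.2 Containers
- Lists: ordered data
- Dictionaries: data looked up by key rather than by position (insertion order is preserved since Python 3.7)
- Sets: unordered, unique items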
Lists
records = []  # Initial empty list
# Use .append() to add more items
records.append(('GOOG', 100, 490.10))
Read a table and store each row as a tuple in a list:
records = []  # Initial empty list
with open('Data/portfolio.csv', 'rt') as f:
    next(f)  # Skip header
    for line in f:
        row = line.split(',')
        records.append((row[0], int(row[1]), float(row[2])))
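To try the pattern without the data file, here is a sketch that reads the same kind of rows from an in-memory string. The sample rows and the use of `io.StringIO` in place of `open()` are assumptions for illustration; the loop itself is the one above.

```python
import io

# Hypothetical sample data standing in for Data/portfolio.csv
sample = "name,shares,price\nAA,100,32.20\nIBM,50,91.10\n"

records = []
with io.StringIO(sample) as f:
    next(f)                      # skip the header line
    for line in f:
        row = line.split(',')
        # float() tolerates the trailing newline left on the last field
        records.append((row[0], int(row[1]), float(row[2])))

# Each record is a (name, shares, price) tuple, so it unpacks cleanly
total = sum(shares * price for _, shares, price in records)

print(records)
print(total)
```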
Dictionaries
① Entries are stored as key-value pairs: d[key] = value
prices = {}  # Initial empty dict
with open('Data/prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])
② Look up a key that may be missing: d.get() returns the value stored under key if it exists, otherwise default
name = d.get(key, default)
③ Keys can be numbers, strings, or tuples (any hashable value)
holidays = {
    (1, 1): 'New Years',
    (3, 14): 'Pi day',
    (9, 13): "Programmer's day",
}
>>> holidays[3, 14]
'Pi day'
>>>
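The lookup with a default and the tuple keys described above can be sketched together as follows. The price values and the `'SCOX'` symbol are made-up sample data.

```python
prices = {'IBM': 91.1, 'GOOG': 490.1}

# Key present: get() returns the stored value
ibm = prices.get('IBM', 0.0)

# Key missing: get() returns the default instead of raising KeyError
scox = prices.get('SCOX', 0.0)

# Without a default argument, a missing key yields None
missing = prices.get('SCOX')

# Tuple keys: the parentheses are optional inside the brackets
holidays = {(1, 1): 'New Years', (3, 14): 'Pi day'}
pi_day = holidays[3, 14]
unknown = holidays.get((12, 31), 'not a holiday')

print(ibm, scox, missing)
print(pi_day, unknown)
```

`get()` never inserts the missing key, so `prices` is unchanged after the lookups.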
Sets
① Two ways to write a set:
tech_stocks = { 'IBM','AAPL','MSFT' }
# Alternative syntax
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])
② Removing duplicates from a list:
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
unique = set(names)
③ Adding and removing items, and combining sets:
unique.add('CAT')        # Add an item
unique.remove('YHOO')    # Remove an item (raises KeyError if it is missing)
s1 | s2                  # Set union
s1 & s2                  # Set intersection
s1 - s2                  # Set difference
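A runnable sketch of the set operations above. The contents of `s1` and `s2` are sample values, and `discard()` is shown as an alternative to `remove()` that was not in the original notes: it does the same job but stays silent when the item is absent.

```python
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
unique = set(names)              # duplicates collapse

unique.add('CAT')
unique.discard('YHOO')           # like remove(), but no error if absent
unique.discard('YHOO')           # so it is safe to repeat

s1 = {'IBM', 'AAPL', 'MSFT'}
s2 = {'IBM', 'GOOG'}

union = s1 | s2
common = s1 & s2
only_s1 = s1 - s2

print(sorted(unique))            # sets are unordered; sort for stable display
print(sorted(union), sorted(common), sorted(only_s1))
```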
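2.3 Formatting
① f-strings
Advantage: fields can be aligned left, right, or centered within a fixed width.
>>> name = 'IBM'
>>> shares = 100
>>> price = 91.1
>>> f'{name:>10s} {shares:>10d} {price:>10.2f}'
'       IBM        100      91.10'
The % operator gives the same result:
>>> '%10s %10d %10.2f' % ('IBM', 100, 91.10)
'       IBM        100      91.10'
Format codes:
:>10d   Right-align in a 10-character field
:<10d   Left-align in a 10-character field
:^10d   Center in a 10-character field
:0.2f   Float with 2 digits of precision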
② Formatting from a dictionary
>>> s = {
...     'name': 'IBM',
...     'shares': 100,
...     'price': 91.1
... }
>>> '{name:>10s} {shares:10d} {price:10.2f}'.format_map(s)
'       IBM        100      91.10'
>>>
③ % formatting
# r must be a tuple with four values, one per field
'%10s %10d %10.2f %10.2f' % r
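To see the alignment codes side by side, here is a short sketch. The trailing `|` is only there to make the padding visible, and the sample values are the ones used above.

```python
name, shares, price = 'IBM', 100, 91.1

right = f'{name:>10s}|'     # padded on the left
left = f'{name:<10s}|'      # padded on the right
center = f'{name:^10s}|'    # padded on both sides

# The three styles from this section produce the same row
row_f = f'{name:>10s} {shares:>10d} {price:>10.2f}'
row_pct = '%10s %10d %10.2f' % (name, shares, price)
row_map = '{name:>10s} {shares:10d} {price:10.2f}'.format_map(
    {'name': name, 'shares': shares, 'price': price}
)

for line in (right, left, center, row_f):
    print(line)
```

Numbers are right-aligned by default, which is why `{shares:10d}` and `{shares:>10d}` give the same field.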
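2.4 Sequences
- string: ' '
- list: [ ]
- tuple: ( )
① Ordered; support indexing and slicing
② Reductions such as min(s) and sum(s)
③ Iterate with a for loop; break exits the loop, continue skips to the next iteration
④ enumerate()
- Use it when you need both the index and the item:
with open(filename) as f:
    for lineno, line in enumerate(f, start=1):
        ...
- Don't index with range(len(data)) just to walk a sequence; it reads poorly, is error prone, and is slower than iterating directly:
for n in range(len(data)):
    print(data[n])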
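The two loop styles can be compared directly. In this sketch the list `data` and the result names are sample values; both loops produce the same pairs, but `enumerate()` needs no manual indexing.

```python
data = ['AA', 'IBM', 'CAT']

# Index-based loop: works, but is the style to avoid
by_index = []
for n in range(len(data)):
    by_index.append((n, data[n]))

# enumerate() yields (index, item) pairs directly
by_enumerate = list(enumerate(data))

# start=1 is handy for human-facing numbering such as line numbers
numbered = [f'{lineno}: {item}' for lineno, item in enumerate(data, start=1)]

print(by_index == by_enumerate)
print(numbered)
```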
⑤ zip()
Combines multiple sequences into a single iterator of tuples.
- Pairing:
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]
pairs = zip(columns, values)
# ('name','GOOG'), ('shares',100), ('price',490.1)
- Consuming the pairs:
for column, value in pairs:
    ...
- Building a dictionary with zip:
d = dict(zip(columns, values))
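Two behaviors of `zip()` are easy to miss: the result is a one-shot iterator, and pairing stops at the shortest input. A minimal sketch with the same sample columns (the variable names `first_pass`, `second_pass`, and `short` are illustrative):

```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]

pairs = zip(columns, values)
first_pass = list(pairs)     # consumes the iterator
second_pass = list(pairs)    # nothing left the second time

# zip stops at the shortest input, silently dropping 'price'
short = list(zip(columns, ['GOOG', 100]))

# Call zip again whenever a fresh pairing is needed
record = dict(zip(columns, values))

print(first_pass)
print(second_pass)
print(short)
print(record)
```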
2.5 collections module
Counter: a dictionary that keeps a running tally for each key (missing keys start at 0)
from collections import Counter
counter = Counter()
Get the entries with the largest counts:
>>> # Get three most held stocks (after share totals per name have been tallied into counter)
>>> counter.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
>>> portfolio2 = read_portfolio('Data/portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
...     holdings2[s['name']] += s['shares']
...
>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>
Counters can be added together:
combined = counter1 + counter2
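Since `read_portfolio()` and the data files are not shown here, the following sketch tallies a small hand-written portfolio instead. The rows are hypothetical sample data with the same `name` and `shares` keys used above.

```python
from collections import Counter

# Hypothetical holdings, same shape as the rows used above
portfolio = [
    {'name': 'IBM', 'shares': 50},
    {'name': 'MSFT', 'shares': 200},
    {'name': 'IBM', 'shares': 100},
    {'name': 'MSFT', 'shares': 50},
    {'name': 'GE', 'shares': 95},
]

holdings = Counter()
for s in portfolio:
    holdings[s['name']] += s['shares']   # missing keys start at 0

# Adding two counters sums the tallies key by key
other = Counter({'GE': 125, 'MSFT': 25})
combined = holdings + other

top_two = holdings.most_common(2)
print(top_two)
print(combined)
```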
2.6 List comprehensions (wow, lists suddenly got simple)
[<expression> for <variable_name> in <sequence> if <condition>]
Equivalent to:
result = []
for variable_name in sequence:
    if condition:
        result.append(expression)
Sets work the same way (set comprehension):
>>> names = { s['name'] for s in portfolio }
Dictionaries too (dictionary comprehension):
>>> holdings = { name: 0 for name in names }
>>> holdings
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
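The equivalence between the comprehension and the explicit loop can be checked directly. The portfolio rows and the `shares >= 100` filter below are sample choices, not part of the original notes.

```python
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# List comprehension with a filter
big = [s['name'] for s in portfolio if s['shares'] >= 100]

# The same thing as an explicit loop
big_loop = []
for s in portfolio:
    if s['shares'] >= 100:
        big_loop.append(s['name'])

# Set comprehension: unique names
names = {s['name'] for s in portfolio}

# Dict comprehension: start every name at 0, then accumulate
holdings = {name: 0 for name in names}
for s in portfolio:
    holdings[s['name']] += s['shares']

print(big, big_loop)
print(sorted(names))
print(holdings)
```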
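2.7 Objects
Types such as str, int, and float are ordinary objects, so they can be stored in a list and called later to convert each column.
import csv
types = [str, int, float]
f = open('Data/portfolio.csv')
rows = csv.reader(f)
headers = next(rows)
for row in rows:
    # Pair each conversion function with its field and call it
    converted = [func(val) for func, val in zip(types, row)]
    # print(converted[1] * converted[2])
    print(converted)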
['AA', 100, 32.2]
['IBM', 50, 91.1]
['CAT', 150, 83.44]
['MSFT', 200, 51.23]
['GE', 95, 40.37]
['MSFT', 50, 65.1]
['IBM', 100, 70.44]
# The same zip can build a dict per row inside the loop (output shown for the first row)
dict_row = { name: func(val) for name, func, val in zip(headers, types, row) }
print(dict_row)
{'name': 'AA', 'shares': 100, 'price': 32.2}
Combining column-wise: pair each header with its converted value
f = open('Data/dowstocks.csv')
rows = csv.reader(f)
headers = next(rows)
row = next(rows)
types = [str, float, str, str, float, float, float, float, int]
# Convert each field to its type
converted = [func(val) for func, val in zip(types, row)]
# header: value
record = dict(zip(headers, converted))
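Putting the pieces together without the data files: the sketch below runs the conversion on an in-memory CSV. The sample rows and the use of `io.StringIO` are assumptions for illustration; `csv.reader` accepts any iterable of lines, so the rest matches the code above.

```python
import csv
import io

# Hypothetical sample standing in for Data/portfolio.csv
sample = 'name,shares,price\n"AA",100,32.20\n"IBM",50,91.10\n'

# One conversion function per column, in column order
types = [str, int, float]

records = []
with io.StringIO(sample) as f:
    rows = csv.reader(f)
    headers = next(rows)
    for row in rows:
        # zip lines up header, converter, and raw string for each column
        record = {name: func(val) for name, func, val in zip(headers, types, row)}
        records.append(record)

print(headers)
print(records)
```

Using a `with` block also closes the file when the loop finishes, which the bare `open()` calls above leave to the caller.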