Practical Python Programming (2): Working with Data

Working with Data


Python's core data structures: tuples, lists, sets, and dictionaries.

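2.1 Data Types and Data Structures

Tuple unpacking: unpacking iterable data

Unpacking assigns the items of a structure to several separate variables.

  • The number of variables must match the number of items
  • A starred name such as *_ absorbs one or more items you want to skip
a = ('windows', 10, 25.1, (2017, 12, 29)) # pack
os_type, number, price, dat = a # unpack
# Skip the first two items and keep only the last two
*_, price, dat = a
# Go one level deeper and pull out just the month and day
*_, price, (_, m, d) = a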
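
As a quick check of the starred forms above, the following sketch prints what each name ends up bound to. The names `rest`, `year`, `x` and `y` are introduced here for illustration only.

```python
a = ('windows', 10, 25.1, (2017, 12, 29))

# A starred name collects the leftover items into a list.
*rest, price, dat = a
print(rest)    # ['windows', 10]
print(price)   # 25.1
print(dat)     # (2017, 12, 29)

# A nested pattern unpacks the inner tuple in the same statement.
*_, (year, m, d) = a
print(year, m, d)   # 2017 12 29

# Too few names (with no starred name) raises ValueError.
try:
    x, y = a
except ValueError as e:
    print('ValueError:', e)
```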

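Tuples and lists

  • A tuple is often described as an immutable list, but it usually holds items of different types (the fields of one record)
  • A list usually holds items of the same type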
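
A minimal sketch of that distinction, with made-up sample values: a tuple as one fixed record of mixed types, and a list as a growable collection of like items. The names `record` and `price_list` are illustrative.

```python
# A tuple groups the fields of one record; it cannot be modified in place.
record = ('GOOG', 100, 490.1)
try:
    record[1] = 75
except TypeError as e:
    print('TypeError:', e)

# A list holds many items of the same kind and can grow.
price_list = [490.1, 91.1, 32.2]
price_list.append(83.44)
print(price_list)   # [490.1, 91.1, 32.2, 83.44]
```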

Dictionaries

A dictionary maps keys to values. It is also called a hash table or an associative array.

  • d.keys()
  • d.values()
  • d.items()
  • d = dict(d.items())
```python
>>> d
{'name': 'AA', 'shares': 100, 'price': 32.2 }
```

Get all keys:

```python
>>> list(d)
['name', 'shares', 'price', 'date', 'account']

>>> keys = d.keys()
>>> keys
dict_keys(['name', 'shares', 'price', 'date', 'account'])
```

Get key-value pairs:

```python
>>> items = d.items()
>>> items
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])
```

```python
>>> for k in d:
        print(k, '=', d[k])

name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
account = 12345
>>>

>>> for k, v in d.items():
        print(k, '=', v)

name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
```
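
The objects returned by keys() and items() are views, so they reflect later changes to the dictionary. A small sketch with sample values (the name `snapshot` is illustrative):

```python
d = {'name': 'AA', 'shares': 75, 'price': 32.2}

keys = d.keys()
print(list(keys))        # ['name', 'shares', 'price']

# The view is live: adding a key to d changes what keys shows.
d['account'] = 12345
print(list(keys))        # ['name', 'shares', 'price', 'account']

# dict(d.items()) rebuilds an independent copy from the pairs.
snapshot = dict(d.items())
snapshot['shares'] = 0
print(d['shares'])       # 75
```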

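2.2 Containers

  • Lists: ordered data
  • Dictionaries: key-value data looked up by key (insertion order is preserved since Python 3.7)
  • Sets: unordered collections of unique items

Lists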

records = []  
# Use .append() to add more items
records.append(('GOOG', 100, 490.10))

Read a table from a file, storing each row as a tuple in a list:

records = []  # Initial empty list
with open('Data/portfolio.csv', 'rt') as f:
    next(f) # Skip header
    for line in f:
        row = line.split(',')
        records.append((row[0], int(row[1]), float(row[2])))
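
To try the loop without the data file, here is a self-contained sketch. It assumes the file has a header line followed by name, shares, price columns (inferred from the conversions above); `io.StringIO` and the two sample rows stand in for the real file.

```python
import io

# Stand-in for open('Data/portfolio.csv', 'rt'); the rows are sample data.
text = 'name,shares,price\nAA,100,32.20\nIBM,50,91.10\n'

records = []
with io.StringIO(text) as f:
    next(f)                      # Skip the header line
    for line in f:
        row = line.split(',')
        # float() tolerates the trailing newline left on the last field
        records.append((row[0], int(row[1]), float(row[2])))

print(records)   # [('AA', 100, 32.2), ('IBM', 50, 91.1)]

# Each tuple unpacks cleanly in a loop.
total = 0.0
for name, shares, price in records:
    total += shares * price
print(round(total, 2))
```

Note that a plain split(',') does not handle quoted fields; the csv module used in section 2.7 does.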

Dictionaries

① Entries are stored as key-value pairs: prices[key] = value

prices = {} # Initial empty dict

with open('Data/prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])

② Look up a key that may be missing:
get() returns the value stored under key if it exists, otherwise default

name = d.get(key, default)
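
A short sketch of get() with and without a default. The dictionary contents are sample values.

```python
prices = {'IBM': 91.1, 'AA': 32.2}

print(prices.get('IBM', 0.0))    # 91.1
print(prices.get('SCOX', 0.0))   # 0.0

# Without a default, get() returns None instead of raising KeyError.
print(prices.get('SCOX'))        # None

# Membership can also be tested directly.
print('AA' in prices)            # True
```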

③ Keys can be numbers, strings, or tuples (any immutable, hashable value)

holidays = {
  (1, 1) : 'New Years',
  (3, 14) : 'Pi day',
  (9, 13) : "Programmer's day",
}

>>> holidays[3, 14]
'Pi day'
>>>

Sets

① Two ways to create a set:

tech_stocks = { 'IBM','AAPL','MSFT' }
# Alternative syntax
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])

② Removing duplicates from a list:

names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
unique = set(names)

③ Adding and removing items:

unique.add('CAT')        # Add an item
unique.remove('YHOO')    # Remove an item

s1 | s2                 # Set union
s1 & s2                 # Set intersection
s1 - s2                 # Set difference
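
The set operations can be tried end to end with the sketch below. The contents of `s1` and `s2` are made up for illustration, and `sorted()` is used only to make the printed order predictable.

```python
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
unique = set(names)          # duplicates collapse
unique.add('CAT')
unique.remove('YHOO')        # raises KeyError if the item is absent
unique.discard('YHOO')       # discard() is the silent variant

s1 = {'IBM', 'AAPL', 'MSFT'}
s2 = {'IBM', 'CAT'}
print(sorted(s1 | s2))   # ['AAPL', 'CAT', 'IBM', 'MSFT']
print(sorted(s1 & s2))   # ['IBM']
print(sorted(s1 - s2))   # ['AAPL', 'MSFT']
```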

2.3 Formatting

① f-strings:
Advantage: values can be left- or right-aligned within a fixed-width field

>>> name = 'IBM'
>>> shares = 100
>>> price = 91.1
>>> f'{name:>10s} {shares:>10d} {price:>10.2f}'
'       IBM        100      91.10'
>>> '%10s %10d %10.2f' % ('IBM', 100, 91.10)   # equivalent %-style formatting
'       IBM        100      91.10'

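:>10d Integer right-aligned in a 10-character field
:<10d Integer left-aligned in a 10-character field
:^10d Integer centered in a 10-character field
:0.2f Float with 2 digit precision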
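
The effect of each format code is easiest to see with visible field edges. In this sketch the square brackets are only markers and the values are samples.

```python
shares = 100
price = 91.1

print(f'[{shares:>10d}]')    # [       100]
print(f'[{shares:<10d}]')    # [100       ]
print(f'[{shares:^10d}]')    # [   100    ]
print(f'[{price:0.2f}]')     # [91.10]
```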

② Formatting values taken from a dictionary

>>> s = {
    'name': 'IBM',
    'shares': 100,
    'price': 91.1
}
>>> '{name:>10s} {shares:10d} {price:10.2f}'.format_map(s)
'       IBM        100      91.10'
>>>

③ The % operator

'%10s %10d %10.2f %10.2f' % r
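
The format string above expects a 4-tuple of a string, an integer and two floats. A sketch with a hypothetical row `r` (its field meanings are assumed here), alongside the equivalent f-string:

```python
# Hypothetical row: name, shares, price, change
r = ('IBM', 100, 91.1, -0.18)

line = '%10s %10d %10.2f %10.2f' % r
print(line)

# The same layout written as an f-string
name, shares, price, change = r
same = f'{name:>10s} {shares:>10d} {price:>10.2f} {change:>10.2f}'
print(same == line)   # True
```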

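2.4 Sequences

  • string: ' '
  • list: [ ]
  • tuple: ( )

① Ordered; support indexing and slicing
② Reductions such as min(s) and sum(s)
③ Iterate with a for loop; break exits the loop, continue skips to the next iteration

④ enumerate()

  1. Use it when you need both the index and the item:
    with open(filename) as f:
        for lineno, line in enumerate(f, start=1):
            ...
    
  2. Avoid looping over indices with range(len(data)); it reads poorly and is slower than iterating over the items directly:
    # Not recommended
    for n in range(len(data)):
        print(data[n])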
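
A runnable sketch of the preferred patterns, using a small sample list in place of a file:

```python
data = ['alpha', 'beta', 'gamma']   # sample data

# Preferred: iterate over the items directly.
for item in data:
    print(item)

# When the position is needed too, let enumerate() supply it.
for lineno, item in enumerate(data, start=1):
    print(lineno, item)

numbered = list(enumerate(data, start=1))
print(numbered)   # [(1, 'alpha'), (2, 'beta'), (3, 'gamma')]
```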
    

⑤ zip()
Combines several sequences into a single iterator of tuples.

  1. Pairing:

    columns = ['name', 'shares', 'price']
    values = ['GOOG', 100, 490.1 ]
    pairs = zip(columns, values)
    # ('name','GOOG'), ('shares',100), ('price',490.1)
    
  2. Iterate to consume the pairs:

    for column, value in pairs:
        ...
    
  3. Build a dictionary with zip():

    d = dict(zip(columns, values))
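
Two details of zip() are worth seeing in action: the result can be consumed only once, and pairing stops at the shortest input. A small sketch (the name `short` is illustrative):

```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]

pairs = zip(columns, values)
print(list(pairs))   # [('name', 'GOOG'), ('shares', 100), ('price', 490.1)]
print(list(pairs))   # [] because a zip object is exhausted after one pass

# zip() stops at the shortest input.
short = list(zip(columns, ['GOOG', 100]))
print(short)         # [('name', 'GOOG'), ('shares', 100)]

d = dict(zip(columns, values))
print(d)             # {'name': 'GOOG', 'shares': 100, 'price': 490.1}
```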
    

2.5 collections module

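Counter tallies values: each key maps to a running total. (Mapping one key to several values is the job of defaultdict(list).)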

from collections import Counter
counter = Counter()

Get the entries with the largest counts:

>>> # Get three most held stocks
>>> counter.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]
>>> portfolio2 = read_portfolio('Data/portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
          holdings2[s['name']] += s['shares']

>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>

Counters can be added together:

combined = counter1 + counter2
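
A self-contained sketch of tallying and adding. The two portfolios are hypothetical lists of (name, shares) pairs rather than the output of read_portfolio().

```python
from collections import Counter

# Hypothetical holdings as (name, shares) pairs
portfolio1 = [('IBM', 50), ('MSFT', 200), ('IBM', 100)]
portfolio2 = [('MSFT', 25), ('GE', 125)]

counter1 = Counter()
for name, shares in portfolio1:
    counter1[name] += shares     # missing keys start at 0

counter2 = Counter()
for name, shares in portfolio2:
    counter2[name] += shares

combined = counter1 + counter2
print(combined['MSFT'])          # 225
print(combined.most_common(2))   # [('MSFT', 225), ('IBM', 150)]
```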

2.6 List comprehensions: lists suddenly get simple

[<expression> for <variable_name> in <sequence> if <condition>]

Equivalent to:

result = []
for variable_name in sequence:
    if condition:
        result.append(expression)

Set comprehensions work the same way:

>>> names = { s['name'] for s in portfolio }

Dictionary comprehensions:

>>> holdings = { name: 0 for name in names }
>>> holdings 
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
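
The three comprehension forms side by side, as a sketch over a hypothetical portfolio (a list of dicts with made-up values; the name `big` is illustrative):

```python
# Hypothetical portfolio: a list of dicts
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'AA', 'shares': 25, 'price': 30.0},
]

# List comprehension with a filter
big = [s['name'] for s in portfolio if s['shares'] >= 50]
print(big)                 # ['AA', 'IBM']

# Set comprehension: unique names
names = {s['name'] for s in portfolio}
print(sorted(names))       # ['AA', 'IBM']

# Dict comprehension to initialize, then tally shares per name
holdings = {name: 0 for name in names}
for s in portfolio:
    holdings[s['name']] += s['shares']
print(holdings['AA'])      # 125
```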

2.7 Objects

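import csv

# Types are objects too, so they can be stored in a list and called later
types = [str, int, float]

f = open('Data/portfolio.csv')
rows = csv.reader(f)
headers = next(rows)

for row in rows:
    # Apply each conversion function to the matching column
    converted = [func(val) for func, val in zip(types, row)]
    # print(converted[1] * converted[2])
    print(converted)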

['AA', 100, 32.2]
['IBM', 50, 91.1]
['CAT', 150, 83.44]
['MSFT', 200, 51.23]
['GE', 95, 40.37]
['MSFT', 50, 65.1]
['IBM', 100, 70.44]

    # Inside the loop above, a dict comprehension can build a dictionary per row instead:
    dict_row = { name: func(val) for name, func, val in zip(headers, types, row) }
    print(dict_row)
{'name': 'AA', 'shares': 100, 'price': 32.2}

Combining headers and values column by column

f = open('Data/dowstocks.csv')
rows = csv.reader(f)
headers = next(rows)
row = next(rows)

types = [str, float, str, str, float, float, float, float, int]
# Convert each field to its type
converted = [func(val) for func, val in zip(types, row)]
# Map each header to its converted value
record = dict(zip(headers, converted))
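
The same conversion pattern can be run without the data files. In this sketch `io.StringIO` and the two sample rows stand in for Data/portfolio.csv, and `records` is an illustrative name.

```python
import csv
import io

# Stand-in for open('Data/portfolio.csv'); the rows are sample data.
f = io.StringIO('name,shares,price\nAA,100,32.20\nIBM,50,91.10\n')
rows = csv.reader(f)
headers = next(rows)

types = [str, int, float]    # types are callable objects
records = []
for row in rows:
    # Call each type on the matching column, then key the results by header
    converted = [func(val) for func, val in zip(types, row)]
    records.append(dict(zip(headers, converted)))

print(records[0])   # {'name': 'AA', 'shares': 100, 'price': 32.2}
```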