Practical Python Programming (2): Working with Data

林钟即望

Category: Practical Python Programming. Tags: python

Article link: https://blog.csdn.net/weixin_45177485/article/details/106781267

Working with data

Python's core data structures: tuples, lists, sets, and dictionaries

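2.1 Data types and data structures

Tuple unpacking: unpacking iterable data

Unpacking splits the items of one structure into several separate variables.

The number of variables must match the number of items.
A starred name such as *_ absorbs any number of items you want to skip.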

a = ('windows', 10, 25.1, (2017, 12, 29)) # pack
os_type, number, price, dat = a # unpack
# Ignore the first two items and keep only the later ones
*_, price, dat = a
# More fine-grained: keep only the month and day of the date
*_, price, (_, m, d) = a
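
As a quick check of the starred forms above, the following sketch prints what each name ends up holding. The names `rest` and `year` are illustrative and do not appear in the original post.

```python
a = ('windows', 10, 25.1, (2017, 12, 29))

# A starred name collects the skipped items into a list
*rest, price, dat = a
print(rest)         # ['windows', 10]
print(price, dat)   # 25.1 (2017, 12, 29)

# Nested unpacking reaches into the inner date tuple
*_, price, (year, m, d) = a
print(m, d)         # 12 29

# Without a starred name, a count mismatch raises ValueError
try:
    x, y = a
except ValueError as e:
    print('unpack failed:', e)
```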

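Tuples vs. lists:

A tuple is often viewed as an immutable list, but it usually holds items of different types (the fields of one record),
while a list usually holds items of the same type.

Dictionaries

A dictionary maps keys to values; it is also called a hash table or an associative array.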

d.keys()
d.values()
d.items()
d = dict(d.items())

```python
>>> d
{'name': 'AA', 'shares': 100, 'price': 32.2}
>>> # Updates implied by the outputs that follow
>>> d['shares'] = 75
>>> d['date'] = (6, 11, 2007)
>>> d['account'] = 12345
```

Get all keys:

```python
>>> list(d)
['name', 'shares', 'price', 'date', 'account']

>>> keys = d.keys()
>>> keys
dict_keys(['name', 'shares', 'price', 'date', 'account'])
```

Iterate over the keys:

```python
>>> for k in d:
...     print(k, '=', d[k])
...
name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
account = 12345
>>>
```

Get the key-value pairs (shown after the account key has been deleted):

```python
>>> del d['account']
>>> items = d.items()
>>> items
dict_items([('name', 'AA'), ('shares', 75), ('price', 32.2), ('date', (6, 11, 2007))])

>>> for k, v in d.items():
...     print(k, '=', v)
...
name = AA
shares = 75
price = 32.2
date = (6, 11, 2007)
```
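
One detail worth knowing about `keys()` and `items()`: they return views that track the dictionary rather than copies. A minimal sketch using the same sample dictionary (the name `snapshot` is illustrative):

```python
d = {'name': 'AA', 'shares': 100, 'price': 32.2}

keys = d.keys()            # a live view, not a copy
d['date'] = (6, 11, 2007)
print(list(keys))          # ['name', 'shares', 'price', 'date']

del d['date']
print(list(keys))          # ['name', 'shares', 'price']

# Wrap the view in list() when you need a fixed snapshot
snapshot = list(d.items())
print(snapshot)
```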

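2.2 Containers

Lists: ordered data
Dictionaries: key-value data looked up by key, not by position (insertion order is preserved since Python 3.7)
Sets: unordered, unique data

Lists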

records = []  
# Use .append() to add more items
records.append(('GOOG', 100, 490.10))

Read a table and store each row as a tuple in a list:

records = []  # Initial empty list
with open('Data/portfolio.csv', 'rt') as f:
    next(f) # Skip header
    for line in f:
        row = line.split(',')
        records.append((row[0], int(row[1]), float(row[2])))
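
To try the same loop without a file on disk, the sketch below feeds it an in-memory stand-in. The three-column layout (name, shares, price) and the unquoted fields are assumptions; the two sample rows reuse values that appear later in this post.

```python
import io

# Stand-in for Data/portfolio.csv (assumed layout: name,shares,price)
sample = io.StringIO('name,shares,price\nAA,100,32.2\nIBM,50,91.1\n')

records = []
next(sample)                        # Skip header
for line in sample:
    row = line.strip().split(',')   # strip() drops the trailing newline
    records.append((row[0], int(row[1]), float(row[2])))

# Each record unpacks cleanly because every tuple has the same shape
total = sum(shares * price for _, shares, price in records)
print(records)
print(round(total, 2))
```

If the real file quotes its fields, `csv.reader` (used in section 2.7) handles the quotes, whereas a plain `split(',')` keeps them in the string.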

Dictionaries

① Entries are stored as key-value pairs: d['key'] = value

prices = {} # Initial empty dict

with open('Data/prices.csv', 'rt') as f:
    for line in f:
        row = line.split(',')
        prices[row[0]] = float(row[1])

② Looking up a key that may be missing:
get() returns the value for the key if it exists, otherwise the default

name = d.get(key, default)
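
A small sketch of how `get()` behaves, using a made-up `prices` dictionary (the ticker 'SCOX' is just an example of a missing key):

```python
prices = {'IBM': 91.1, 'AA': 32.2}

found = prices.get('IBM', 0.0)      # key exists: its value is returned
missing = prices.get('SCOX', 0.0)   # key missing: the default is returned
no_default = prices.get('SCOX')     # without a default, get() returns None

print(found, missing, no_default)

# get() never inserts the missing key
print('SCOX' in prices)             # False
```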

③ Keys can be numbers, strings, or tuples (any hashable value)

holidays = {
  (1, 1) : 'New Years',
  (3, 14) : 'Pi day',
  (9, 13) : "Programmer's day",
}

>>> holidays[3, 14]
'Pi day'
>>>

Sets

① Two ways to create a set:

tech_stocks = { 'IBM','AAPL','MSFT' }
# Alternative syntax
tech_stocks = set(['IBM', 'AAPL', 'MSFT'])

② Removing duplicates from a list:

names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
unique = set(names)

③ Adding and removing elements, and set operations:

unique.add('CAT')        # Add an item (unique is the set built above; a list has no add())
unique.remove('YHOO')    # Remove an item

s1 | s2                 # Set union
s1 & s2                 # Set intersection
s1 - s2                 # Set difference
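
Put together, the set operations above look like this. The contents of `s1` and `s2` are illustrative; `sorted()` is used for printing because sets have no defined order.

```python
names = ['IBM', 'AAPL', 'GOOG', 'IBM', 'GOOG', 'YHOO']
unique = set(names)        # duplicates collapse

unique.add('CAT')
unique.remove('YHOO')      # raises KeyError if the item is absent
unique.discard('YHOO')     # discard() silently ignores a missing item
print(sorted(unique))      # ['AAPL', 'CAT', 'GOOG', 'IBM']

s1 = {'IBM', 'AAPL', 'MSFT'}
s2 = {'IBM', 'GOOG'}
print(sorted(s1 | s2))     # union
print(sorted(s1 & s2))     # intersection
print(sorted(s1 - s2))     # difference
```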

2.3 Formatting

① f-strings:
Advantage: values can be left- or right-aligned within a fixed-width field

>>> name = 'IBM'
>>> shares = 100
>>> price = 91.1
>>> f'{name:>10s} {shares:>10d} {price:>10.2f}'
'       IBM        100      91.10'
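>>> # The equivalent with the % operator
>>> '%10s %10d %10.2f' % ('IBM', 100, 91.10)
'       IBM        100      91.10'

:>10d right-aligned in a 10-character field
:<10d left-aligned in a 10-character field
:^10d centered in a 10-character field
:0.2f float with 2 digits after the decimal point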
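
The format codes can be compared side by side; `repr()` makes the padding visible. This is only a sketch with the same sample values as above.

```python
shares = 100
price = 91.1

right = f'{shares:>10d}'
left = f'{shares:<10d}'
center = f'{shares:^10d}'
money = f'{price:0.2f}'

print(repr(right))    # '       100'
print(repr(left))     # '100       '
print(repr(center))   # '   100    '
print(money)          # 91.10
```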

② Formatting values from a dictionary

>>> s = {
    'name': 'IBM',
    'shares': 100,
    'price': 91.1
}
>>> '{name:>10s} {shares:10d} {price:10.2f}'.format_map(s)
'       IBM        100      91.10'
>>>

③ The % operator (r must be a tuple with one item per format code)

'%10s %10d %10.2f %10.2f' % r
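
For illustration, here is the same format string applied to a concrete record. The tuple `r` below is hypothetical; the post does not show its actual value.

```python
# Hypothetical record: (name, shares, price, change)
r = ('IBM', 100, 91.1, 35.23)

line = '%10s %10d %10.2f %10.2f' % r
print(line)
print(len(line))   # four 10-wide fields plus three separating spaces

# A single value does not need a tuple
single = '%10.2f' % 91.1
print(repr(single))
```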

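2.4 Sequences

string: ' '
list: [ ]
tuple: ( )

① Ordered; support indexing and slicing
② Work with built-in functions such as min(s) and sum(s)
③ Can be iterated with a for loop; break exits the loop, continue skips to the next iteration

④ enumerate()

Use it when you need both the index and the item: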

with open(filename) as f:
    for lineno, line in enumerate(f, start=1):
        ...  # lineno is the 1-based line number, line is the text

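When you need an index, do not loop over range(len(data)); it is harder to read and slower than iterating directly or using enumerate():

# Avoid this pattern
for n in range(len(data)):
    print(data[n])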
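
A sketch of the preferred form, with a made-up `data` list:

```python
data = ['AA', 'IBM', 'CAT']

# enumerate() yields (index, item) pairs; the index starts at 0 by default
for n, name in enumerate(data):
    print(n, name)

# start=1 is handy for human-facing numbering such as line numbers
numbered = list(enumerate(data, start=1))
print(numbered)   # [(1, 'AA'), (2, 'IBM'), (3, 'CAT')]
```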

⑤ zip()
Combines several sequences into an iterator of tuples

Pairing:

columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1 ]
pairs = zip(columns, values)
# ('name','GOOG'), ('shares',100), ('price',490.1)

To consume the pairs:
```
for column, value in pairs:
    ...
```
Building a dictionary with zip:
```
d = dict(zip(columns, values))
```
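
Two behaviors of `zip()` are easy to trip over: the result is a one-shot iterator, and it stops at the shortest input. A minimal sketch (the names `first_pass`, `second_pass`, and `short` are illustrative):

```python
columns = ['name', 'shares', 'price']
values = ['GOOG', 100, 490.1]

pairs = zip(columns, values)
first_pass = list(pairs)
second_pass = list(pairs)    # the iterator is already exhausted
print(first_pass)            # [('name', 'GOOG'), ('shares', 100), ('price', 490.1)]
print(second_pass)           # []

# zip() stops at the shortest input, silently dropping the rest
short = list(zip(columns, ['GOOG', 100]))
print(short)                 # [('name', 'GOOG'), ('shares', 100)]

d = dict(zip(columns, values))
print(d)
```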

2.5 collections module

Counter keeps a running count (tally) for each key. To map one key to several values, use defaultdict(list) instead.

from collections import Counter
counter = Counter()

Get the n largest counts:

>>> # Get three most held stocks
>>> counter.most_common(3)
[('MSFT', 250), ('IBM', 150), ('CAT', 150)]

>>> portfolio2 = read_portfolio('Data/portfolio2.csv')
>>> holdings2 = Counter()
>>> for s in portfolio2:
          holdings2[s['name']] += s['shares']

>>> holdings2
Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
>>>

Counters can be added together:

combined = counter1 + counter2
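
A self-contained sketch of the tally-and-combine pattern. The `portfolio` list is a hand-built stand-in for what `read_portfolio()` returns (a list of dicts with 'name' and 'shares' is an assumption), and `holdings2` copies the Counter printed above.

```python
from collections import Counter

# Stand-in for read_portfolio() output (assumed shape: list of dicts)
portfolio = [
    {'name': 'IBM', 'shares': 50},
    {'name': 'MSFT', 'shares': 200},
    {'name': 'MSFT', 'shares': 50},
    {'name': 'IBM', 'shares': 100},
]

holdings = Counter()
for s in portfolio:
    holdings[s['name']] += s['shares']   # missing keys start at 0

print(holdings.most_common(1))           # [('MSFT', 250)]
print(holdings['CAT'])                   # 0, no KeyError for unseen keys

holdings2 = Counter({'HPQ': 250, 'GE': 125, 'AA': 50, 'MSFT': 25})
combined = holdings + holdings2          # counts are added key by key
print(combined['MSFT'])                  # 275
```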

2.6 List comprehensions: wow, lists suddenly got simple

[<expression> for <variable_name> in <sequence> if <condition>]

This is equivalent to:

result = []
for variable_name in sequence:
    if condition:
        result.append(expression)
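
To make the equivalence concrete, the sketch below builds the same list both ways. The `portfolio` records are a small sample assumed for illustration, using values that appear in section 2.7.

```python
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
]

# Comprehension: names of holdings with at least 100 shares
large = [s['name'] for s in portfolio if s['shares'] >= 100]

# The same result with an explicit loop
result = []
for s in portfolio:
    if s['shares'] >= 100:
        result.append(s['name'])

print(large)    # ['AA', 'CAT']

# Comprehensions combine well with reductions such as sum()
cost = sum([s['shares'] * s['price'] for s in portfolio])
print(round(cost, 2))
```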

Set comprehensions work the same way:

>>> names = { s['name'] for s in portfolio }

Dictionary comprehensions:

>>> holdings = { name: 0 for name in names }
>>> holdings 
{'AA': 0, 'GE': 0, 'IBM': 0, 'MSFT': 0, 'CAT': 0}
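
Dictionary comprehensions can also filter or look up values. A small sketch, assuming a `prices` dictionary like the one built in section 2.2 (the three prices are sample values from this post):

```python
prices = {'IBM': 91.1, 'MSFT': 51.23, 'GE': 40.37}
names = {'IBM', 'MSFT'}

# Keep only the prices for the names we hold
selected = {name: prices[name] for name in names}

# Filter with a condition, just like a list comprehension
cheap = {name: price for name, price in prices.items() if price < 60}

print(selected)
print(cheap)     # {'MSFT': 51.23, 'GE': 40.37}
```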

2.7 Objects

types = [str, int, float]
import csv
f = open('Data/portfolio.csv')
rows = csv.reader(f)
headers = next(rows)

for row in rows:
	converted = [func(val) for func, val in zip(types, row)]
	# print(converted[1] * converted[2])
	print(converted)

['AA', 100, 32.2]
['IBM', 50, 91.1]
['CAT', 150, 83.44]
['MSFT', 200, 51.23]
['GE', 95, 40.37]
['MSFT', 50, 65.1]
['IBM', 100, 70.44]

# Alternative body for the loop above: build a dict per row instead of a list
for row in rows:
    dict_row = { name: func(val) for name, func, val in zip(headers, types, row) }
    print(dict_row)
{'name': 'AA', 'shares': 100, 'price': 32.2}

Pairing column headers with converted values:

f = open('Data/dowstocks.csv')
rows = csv.reader(f)
headers = next(rows)
row = next(rows)

types = [str, float, str, str, float, float, float, float, int]
# Convert each field to its type
converted = [func(val) for func, val in zip(types, row)]
# Pair each header with its converted value
record = dict(zip(headers, converted))
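
The conversion trick works because types such as `str`, `int`, and `float` are ordinary objects that can be stored in a list and called later. The sketch below runs the whole pipeline on an in-memory stand-in for the portfolio file (the column layout is an assumption) and uses a `with` block so the input is closed when done.

```python
import csv
import io

# Stand-in for Data/portfolio.csv (assumed layout: name,shares,price)
sample = 'name,shares,price\nAA,100,32.2\nIBM,50,91.1\n'
types = [str, int, float]   # one converter per column

with io.StringIO(sample) as f:
    rows = csv.reader(f)
    headers = next(rows)
    records = [
        {name: func(val) for name, func, val in zip(headers, types, row)}
        for row in rows
    ]

print(records[0])   # {'name': 'AA', 'shares': 100, 'price': 32.2}
print(records[1]['shares'] * records[1]['price'])
```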
