Python 基础系列 18 - 字典 dict

引言

今天来梳理一下Python dict 类型的知识点,更多 Python 基础系列,请参考 Python 基础系列大纲

内容提要:

  1. dict 特征
  2. dict 创建
  3. dict 操作
  4. dict 遍历
  5. dict 其它方法
  6. dict 性能
  7. dict 应用
    单词计数
    Google 搜索给页面打分

dict 特征

dict 是键值对 item 的集合
在这里插入图片描述

dic 是可变的

dic 的 item 可以更新

Key 键是不可变类型而且是唯一的

Key 是一个可变类型
在这里插入图片描述
key 不可能是 list, set,因为他们是可变 mutable 类型,不可哈希 hash 的

d1 = {['script']:'java'}             # error unhashable type: 'list'
d2 = {{'A', 'B', 'C'}:'all grades'}  # error unhashable type: 'list'

key 可以为 bool, str, tuple,int, float

d_bool_key = {True : 5}
d_str_key = {'name': 'kelly'}
d_int_float_key = {9 : 'score', 3.7 : 'gpa'}
d_tuple_key = {('first', 'last') : ('kelly', 'xia')}

Keys 键值是唯一的 UNIQUE
键值相同的情况,最后一个值会覆盖前面的值。
在这里插入图片描述

d1 = {'name':'kelly','name':'peter','name':'bob' }
d2 = {'first':'Smith', 'last':'Smith'}
print(d1)
print(d2)

# output:
{'name': 'bob'}
{'first': 'Smith', 'last': 'Smith'}

键值可以是异类 Heterogeneous 类型**在这里插入图片描述

dict 对象 d 中的 key 有 str 和 tuple

d = {'ssn: 555-55-5555':['jane', 'doe'], (16,17,18):'age group'}
print(d)

# output:
# {'ssn: 555-55-5555': ['jane', 'doe'], (16, 17, 18): 'age group'}

Key 键是按插入顺序排的,并不是按字母数字排序

所以不能按 index 索引访问 dict item

dict 创建

在这里插入图片描述

  • empty dict
    d = dict()
    d = {}

  • d = {key : value}

d1 = {'key1':'value1', 'key2':'value2'}
d1

# output:
# {'key1': 'value1', 'key2': 'value2'}

不支持: dict(‘key’:‘value’)

d = dict('key':'value')
d

# output:
# File "<ipython-input-640-efaed8bcc2a7>", line 1
#    d = dict('key':'value')
#                  ^
# SyntaxError: invalid syntax
  • d=dict(key_name=value_object)
d2 = dict(key1='value1', key2='value2')
d2
# output:
{'key1': 'value1', 'key2': 'value2'}
  • d = dict([tuple, tuple])------via tuple list
d_2_tuple_list = dict([('script', 'python'), ('version', 3.8)])
d_1_tuple_list = dict([('script', 'python')])

print('d_2_tuple_list: {}'.format(d_2_tuple_list))
print('d_1_tuple_list: {}'.format(d_1_tuple_list))

# output:
# d_2_tuple_list: {'script': 'python', 'version': 3.8}
# d_1_tuple_list: {'script': 'python'}
  • 不支持: d = dict(tuple)
d = dict(('a', 'b'))
d
# output:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-633-423a7c460802> in <module>
----> 1 d = dict(('a', 'b'))
      2 d

ValueError: dictionary update sequence element #0 has length 1; 2 is required
  • {tuple}----set
d_2_tuple = {('script', 'python'), ('version', 3.8)}
d_1_tuple = {('script', 'python')}

print('d_2_tuple: {}'.format(d_2_tuple))
print('d_1_tuple: {}'.format(d_1_tuple))

# output:
d_2_tuple: {('script', 'python'), ('version', 3.8)}
d_1_tuple: {('script', 'python')}
  • 不支持: d = {[tuple, tuple]}
d = {[('script', 'python'), ('version', 3.8)]}
d

# output:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-612-db86a05588ef> in <module>
----> 1 d = {[('script', 'python'), ('version', 3.8)]}
      2 d

TypeError: unhashable type: 'list'
  • d=dict.fromkeys(key sequence, value)
keys = {'peter', 'bob', 'bruce'}
d1 = dict.fromkeys(keys, ['male', 'smoking', 30])
d2 = dict.fromkeys(keys, 30)
print('d1: {}'.format(d1))
print('d2: {}'.format(d2))

# output:
d1: {'bob': ['male', 'smoking', 30], 'peter': ['male', 'smoking', 30], 'bruce': ['male', 'smoking', 30]}
d2: {'bob': 30, 'peter': 30, 'bruce': 30}
  • d = dict(zip(collection1, collection2)
zip_3_3 = zip(['one', 'two', 'three'], [1,2,3])
zip_4_3 = zip(['one', 'two', 'three', 'four'], [1,2,3])
zip_3_4 = zip(['one', 'two', 'three'], [1,2,3,4])
d33 = dict(zip_3_3)
d43 = dict(zip_4_3)
d34 = dict(zip_3_4)
print('d33: {}'.format(d33))
print('d43: {}'.format(d43))
print('d34: {}'.format(d34))

只输出两个 collection 匹配成功的对数

# output:
d33: {'one': 1, 'two': 2, 'three': 3}
d43: {'one': 1, 'two': 2, 'three': 3}
d34: {'one': 1, 'two': 2, 'three': 3}
  • 字典的推导式 创建
dic = {str(i) : i*2  for i in range(4)}
print(dic)

# output:
{'0': 0, '1': 2, '2': 4, '3': 6}

作业: 哈哈,巩固成果的时候到了。
在这里插入图片描述
在这里插入图片描述

dict 操作

访问 Access

在这里插入图片描述
图例:
在这里插入图片描述

key 不存在会 unknown key error

info_dict = {}  # do not miss this statement, or name 'info_dict' is not defined
key = 'name' 
info_dict [key] = 'kelly' 
print(info_dict [key])
print(info_dict ['unkonwnKey'])

# output:
kelly

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-644-de1581161396> in <module>
      3 info_dict [key] = 'kelly'
      4 print(info_dict [key])
----> 5 print(info_dict ['unkonwnKey'])

KeyError: 'unkonwnKey'

get(key, [default]) key 找不到对应的 value 会用默认的 value,但是这个默认的 value 是不会加到原来的 dict 里

d = {'first': 'kelly', 'last':'Xu'}
print('d.keys:\n{} '.format(d.keys()))
print('d.values:\n{} '.format(d.values()))
print('d.items:\n{} '.format(d.items()))
print('d.get("first"):\n{} '.format(d.get('first')))
print('d.get("phone"):\n{} '.format(d.get('phone')))
print('d.get("phone", "111-222-333-444"):\n{} '.format(d.get('phone', '111-222-333-444')))
print('d content:\n{}'.format(d))
# output:
d.keys:
dict_keys(['first', 'last']) 
d.values:
dict_values(['kelly', 'xu']) 
d.items:
dict_items([('first', 'kelly'), ('last', 'xu')]) 
d.get("first"):
kelly 
d.get("phone"):
None 
d.get("phone", "111-222-333-444"):
111-222-333-444 
d content:
{'first': 'kelly', 'last': 'xia'} 

setdefault(key, default=None) 如果 key 存在则返回对应的 value,如果 key 不存在,则设置默认None 或 value, 会自动加入到 dict 里

d = {'first': 'kelly', 'last':'xu'}

print('d.setdefault("first", "lydia"):\n{}'.format(d.setdefault('first', 'lydia')))
print('d.setdefault("age", 18):\n{}'.format(d.setdefault('age', 18)))
print('d.setdefault("cellphone"):\n{}'.format(d.setdefault('cellphone')))
print('afer setting:\n{}'.format(d))
d.setdefault("first", "lydia"):
kelly
d.setdefault("age", 18):
18
d.setdefault("cellphone"):
None
afer setting:
{'first': 'kelly', 'last': 'xu', 'age': 18, 'cellphone': None}

作业:
在这里插入图片描述

查询

in 操作符:快速查询一个 Key 是否存在于一个 dict 里

d = {'script':'python'}
print('Is the key "script" in d:\n{}'.format('script' in d))
print('Is the key "version" in d:\n{}'.format('version' in d))
print('Is the value "python" in d:\n{}'.format('python' in d))

只应用到 key上,对 value 无效

# output:
Is the key "script" in d:
True
Is the key "version" in d:
False
Is the value "python" in d:
False

dict 遍历

for key, item in d.items()

d = {'first':'kelly', 'last':'xu'}
for key, item in d.items():
    print('{}:{}'.format(key, item))
    
# output:
first:kelly
last:xu

for key in d.keys()

d = {'first':'kelly', 'last':'xu'}
for key in d.keys():
    print('{}:{}'.format(key, d[key]))

# output:
first:kelly
last:xu

for key in d

d = {'first':'kelly', 'last':'xia'}
for key in d:
    print('{}:{}'.format(key, d[key]))
    
# output:
first:kelly
last:xia

for value in d.values(), 但是不能访问key

d = {'first':'kelly', 'last':'xia'}
for value in d.values():
    print('{}'.format(value))

# output:
kelly
xia

dict 其它方法

d = dict (name='John', age=20) 
MethodDescriptionResult
d.copy ()Return copy, without modifying the original dictionaryd.copy (name=‘John’, age=20)
d.clear ()Remove all items{}
del d[key]Remove an items with a given key deld[name]
d.pop (key, [default])Remove and return an element from a dictionaryd.pop (‘age’)
d.update ()Update the dictionary with elements from dictionary objectd.update (d1)
d.fromkeys (sequence, [value])Create a dictionary from sequence as keys w/ value provided keys = {‘1’, ‘2’, ‘3’}d.fromkeys (keys, 0)

dict.copy(d)
生成一个新的dict

d = dict (name='John', age=20) 
print('original dict:\n id:{}\t content:{}'.format(id(d), d))
d_copy = dict.copy(d)
print('copy dict:\n id:{}\t content:{}'.format(id(d_copy), d_copy))
# output:
original dict:
 id:2239127189312	 content:{'name': 'John', 'age': 20}
copy dict:
 id:2239126915072	 content:{'name': 'John', 'age': 20}

del d[key]—没有返回值

d = {'first':'kelly', 'last':'xu'}
del d['first']
d

# output:
{'last': 'xu'}

pop(key) - 移除和返回 value

d = {'first':'kelly', 'last':'xu'}
print('return d.pop("first"):{}'.format(d.pop('first')))
print('d content afer pop:{}'.format(d))

# output:
return d.pop("first"):kelly
d content afer pop:{'last': 'xu'}

d1.update(d2): key 相同时会将 d2 的 value替换 d1的 value,key 不同时,会将 d2的key value加入 d1 中

d1 = {'first':'kelly', 'last':'xu'}
d2 = {'first':'kelly', 'last':'wang', 'age':'18'}
d1.update(d2)
d1

# output:
{'first': 'kelly', 'last': 'wang', 'age': '18'}

dic 性能

在这里插入图片描述

作业:
在这里插入图片描述

dic 应用

计算单词个数:给定一个字符串,计算单词个数

解决方法:

  • 应用 .split() 方法将字符串按空格分隔成数组,应用 .replace() 对单词进行处理(可选)
  • 两种方式统计单词个数
    用 collections.Counter() 方法
    用 .setdefault() 方法
  • 按单词字符逆序,打印单词及对应的个数
  • 打印前 3 使用最频繁的单词

代码:

from collections import Counter as count_words

# import this
paragraph_str = """
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""

word_list = paragraph_str.split()
print ('Number of words: ' + str(len(word_list)))
# apply different pre-processing 
word_list = [x.strip().
             replace('.','').
             replace(',','').
             replace('!','').
             replace('?','').
             replace('*','').
             replace('--','').
             replace(':','').
             lower() for x in word_list]
print ('Number of words after pre-processing: ' + str(len(word_list)))
    
# Case 1: use collections.Counter to count identical words
word_counts_dict = count_words (word_list) 
print ('Number of unique words: ' + str(len(word_counts_dict)))
print ('\nWord \t Count')
for key in sorted (word_counts_dict.keys(), reverse=True):
    print (key + '\t : ' + str(word_counts_dict[key]))
   
# Case 2: use .setdefaults() method to count identical words
word_dict = {}
for word in word_list:
    word_dict.setdefault(word, 0)
    word_dict[word] += 1
    
print("word_dict:{}".format(str(len(word_dict))))   

print ('\nTop-3 Most Frequent Words in the Zen of Python:')  
print ('is: \t{} {}'.format(word_counts_dict['is'], word_dict['is']))    
print ('better: {} {}'.format(word_counts_dict['better'], word_dict['better']))
print ('than: \t{} {}'.format(word_counts_dict['than'], word_dict['than']))

输出:

Number of words: 137
Number of words after pre-processing: 137
Number of unique words: 81

Word 	 Count
you're	 : 1
way	 : 2
unless	 : 2
ugly	 : 1
to	 : 5
those	 : 1
there	 : 1
the	 : 5
that	 : 1
than	 : 8
temptation	 : 1
special	 : 2
sparse	 : 1
simple	 : 1
silently	 : 1
silenced	 : 1
should	 : 2
rules	 : 1
right	 : 1
refuse	 : 1
readability	 : 1
purity	 : 1
preferably	 : 1
practicality	 : 1
pass	 : 1
only	 : 1
one	 : 3
often	 : 1
of	 : 2
obvious	 : 2
now	 : 2
not	 : 1
never	 : 3
nested	 : 1
namespaces	 : 1
more	 : 1
may	 : 2
let's	 : 1
it's	 : 1
it	 : 2
is	 : 10
in	 : 1
implicit	 : 1
implementation	 : 2
if	 : 2
idea	 : 3
honking	 : 1
hard	 : 1
guess	 : 1
great	 : 1
good	 : 1
flat	 : 1
first	 : 1
face	 : 1
explicitly	 : 1
explicit	 : 1
explain	 : 2
errors	 : 1
enough	 : 1
easy	 : 1
dutch	 : 1
do	 : 2
dense	 : 1
counts	 : 1
complicated	 : 1
complex	 : 2
cases	 : 1
break	 : 1
better	 : 8
beautiful	 : 1
beats	 : 1
be	 : 3
bad	 : 1
at	 : 1
aren't	 : 1
are	 : 1
and	 : 1
ambiguity	 : 1
although	 : 3
a	 : 2
	 : 1
word_dict:81

Top-3 Most Frequent Words in the Zen of Python:
is: 	10 10
better: 8 8
than: 	8 8

Google 搜索给页面打分

Google 搜索引擎基于输入的 links 用网页排名 PageRank 算法给每个页面打分。
输入:页面 list, link 到其它页面,links = [(‘a’, ‘b’), (‘a’, ‘c’), (‘b’, ‘c’)]
输出:每个页面的重要性分数

工作原理:
每个页面初始分数设置为 1
多次迭代算法,一般为 10 次,每次迭代:
每页面计算分数:
page_score[to_page] += 0.85*page_score[from_page]/float(outdegree[from_page])
循环累积加分

代码:

links = [('a','b'),('a','c'),('b','c')]
pages = set()
outdegree = {}
for from_page, to_page in links: # list unpacking
    pages.add(from_page)
    pages.add(to_page)
    if from_page in outdegree:
        outdegree[from_page] += 1
    else:
        outdegree[from_page] = 1
print('\n\tPages: ' + str(pages))
print('\tOutdegree: ' + str(outdegree))

page_score = {}
for page in pages:
    page_score[page] = 1
    
for i in range(10):
    for from_page,to_page in links:
        page_score[to_page] += 0.85*page_score[from_page]/float(outdegree[from_page])

print('\n\tPage Scores: ' + str(page_score))

输出:

	Pages: {'a', 'b', 'c'}
	Outdegree: {'a': 2, 'b': 1}

	Page Scores: {'a': 1, 'b': 5.249999999999999, 'c': 33.61875}
  • 1
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值