#Python Study Notes (10)# Dictionaries: Counting the Words in a File

Latest recommended article published 2022-11-01 15:43:40

易烊万玺622

Article link: https://blog.csdn.net/weixin_38980061/article/details/120985616

1 Dictionary

2 Dictionary as a set of counters

3 Looping and dictionary

1 Dictionary

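A dictionary is like a list, but more general. In a list, the index positions have to be integers; in a dictionary, the indices can be (almost) any type.

Put another way: a dictionary resembles a list, but where a list's indices must be integers, a dictionary's indices (its keys) can be almost any type. The "almost" is there because a key must be hashable, so a mutable object such as a list cannot be used as a key.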

▲ The dict function creates a dictionary; {} denotes an empty dictionary

>>> eng2sp = dict()
>>> print(eng2sp)
{}

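▲ A dictionary holds keys and values, which form key-value pairs. A colon separates each key from its value, and commas separate the pairs from one another

>>> eng2sp['one'] = 'uno'  ### add an item to the dictionary: one key-value pair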
>>> print(eng2sp)
{'one': 'uno'}



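>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'} ### define the dictionary's contents
>>> print(eng2sp)
{'one': 'uno', 'two': 'dos', 'three': 'tres'}

### Since Python 3.7 a dictionary keeps its key-value pairs in insertion order. In older versions the order was not guaranteed, so the pairs could print in a different order, e.g. {'one': 'uno', 'three': 'tres', 'two': 'dos'}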

▲ Look up a value in the dictionary by its key

>>> print(eng2sp['two'])
dos

>>> print(eng2sp['four'])  ### a missing key raises an error
KeyError: 'four'
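
When a key might be missing, there are a few common ways to avoid the KeyError. The following is a minimal sketch reusing the eng2sp dictionary from above; the fallback string 'unknown' and the result_* variable names are illustrative choices, not part of the original notes.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# Option 1: test for the key before indexing
if 'four' in eng2sp:
    result_in = eng2sp['four']
else:
    result_in = 'unknown'

# Option 2: get returns the fallback instead of raising KeyError
result_get = eng2sp.get('four', 'unknown')

# Option 3: index directly and catch the exception
try:
    result_try = eng2sp['four']
except KeyError:
    result_try = 'unknown'

print(result_in, result_get, result_try)
```

All three options end up with the same fallback value; which one reads best depends on whether a missing key is expected or exceptional.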

▲ The len function returns the number of key-value pairs

>>> len(eng2sp)
3

▲ The in operator tests against the keys, not the values

>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False

▲ The values and keys methods return the dictionary's values and keys respectively; list() converts either result to a list

>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
>>> eng2sp.values()
dict_values(['uno', 'dos', 'tres'])
>>> eng2sp.keys()
dict_keys(['one', 'two', 'three'])
>>> list(eng2sp.values())
['uno', 'dos', 'tres']
>>> list(eng2sp.keys())
['one', 'two', 'three']


>>> vals = list(eng2sp.values())
>>> 'uno' in vals
True
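
Converting to a list is not strictly required for this test, because the view returned by values() supports in directly. A small sketch follows; find_key is a hypothetical helper introduced here for illustration, showing one way to go from a value back to its key.

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

# in works directly on the values view, without building a list first
print('uno' in eng2sp.values())

def find_key(d, target):
    """Return the first key whose value equals target, or None (hypothetical helper)."""
    for key, value in d.items():
        if value == target:
            return key
    return None

print(find_key(eng2sp, 'dos'))
```

Note that both the membership test on values and the reverse lookup scan the values one by one, whereas a lookup by key does not need to.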

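▲ Dictionary methods

Method	Description
clear()	Removes all items from the dictionary
copy()	Returns a shallow copy of the dictionary
fromkeys()	Returns a new dictionary with the specified keys, all set to the same value
get()	Returns the value for the specified key, or a default if the key is missing
items()	Returns a view of the (key, value) tuples
keys()	Returns a view of the dictionary's keys
pop()	Removes the item with the specified key and returns its value
popitem()	Removes and returns the most recently inserted key-value pair
setdefault()	Returns the value for the specified key; if the key does not exist, inserts it with the specified value
update()	Updates the dictionary with the specified key-value pairs
values()	Returns a view of the dictionary's values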
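
To make a few of these methods concrete, here is a short sketch. The dictionary contents and variable names are made up for illustration.

```python
counts = {'chuck': 1, 'annie': 42}

# setdefault inserts the key only if it is missing, then returns its value
jan = counts.setdefault('jan', 100)
chuck = counts.setdefault('chuck', 999)   # already present, so 1 is kept

# update merges another dictionary in, overwriting existing keys
counts.update({'annie': 43, 'tim': 7})

# pop removes a key and returns its value
removed = counts.pop('tim')

# items yields (key, value) tuples
pairs = list(counts.items())

# fromkeys builds a new dictionary with the same value for every key
zeros = dict.fromkeys(['a', 'b'], 0)

print(jan, chuck, removed)
print(pairs)
print(zeros)
```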

2 Dictionary as a set of counters

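▲ Counting with a dictionary (characters in a string):

word = 'brontosaurus'
d = dict()   ### create an empty dictionary

for c in word:
    if c not in d:  ### c is not yet a key: add it as a new key with value 1
        d[c] = 1
    else:
        d[c] = d[c] + 1   ### c is already a key: add 1 to its count

print(d)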

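▲ The get method simplifies the code. It takes two arguments: the first is the key to look up, and if the key is found get returns its value; the second is what to return when the key is not found. For example:

>>> counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
>>> print(counts.get('jan', 0))  ### jan is in counts, so its value 100 is returned
100
>>> print(counts.get('tim', 0))  ### tim is not in counts, so the second argument 0 is returned
0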

The simplified code:

word = 'brontosaurus'
d = dict()

for c in word:
    d[c] = d.get(c,0) + 1 ### if c is in d, add 1 to its current value; if not, d[c] = 0 + 1 = 1

print(d)
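
The standard library also provides collections.Counter, a dict subclass built for this kind of tallying. It is shown here only as an alternative to the get idiom; it is not what these notes use elsewhere.

```python
from collections import Counter

word = 'brontosaurus'
d = Counter(word)   # counts each character in the string

print(d['o'])       # number of times 'o' appears
print(d['z'])       # a missing key counts as 0 instead of raising KeyError
print(dict(d))      # convert back to a plain dictionary
```

Because a Counter is still a dictionary, the looping and sorting techniques in the rest of these notes apply to it unchanged.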

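▲ Counting with a dictionary (words in a file):

fname = input('Enter the file name: ')   ### read the file name
try:
    fhand = open(fname)
except OSError:                          ### catch only file-related errors
    print('File cannot be opened:', fname)
    exit()                               ### stop if the file cannot be opened

counts = dict()                          ### create an empty dictionary
for line in fhand:                       ### iterate over the lines of the file
    words = line.split()                 ### split each line into words
    for word in words:
        if word not in counts:
            counts[word] = 1             ### word is not yet a key: add it with value 1
        else:
            counts[word] += 1            ### word is already a key: add 1 to its value

print(counts)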
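
The same logic can be wrapped in functions so it is easier to reuse and to try out without a file on disk. This is a sketch under assumptions: count_words and count_words_in_file are hypothetical names, and the with statement is swapped in for the explicit open so the file is closed automatically.

```python
def count_words(lines):
    """Count whitespace-separated words in an iterable of lines."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def count_words_in_file(path):
    """Open path, count its words, and close the file automatically."""
    with open(path) as fhand:
        return count_words(fhand)

# Demo on in-memory lines, so the sketch runs without any file
sample = ['the quick fox\n', 'the lazy dog\n']
print(count_words(sample))
```

A file object yields its lines when iterated, so count_words accepts either an open file or a plain list of strings.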

▲ Counting that ignores case and punctuation

The string method translate is used to delete the punctuation

line.translate(str.maketrans(fromstr, tostr, deletestr))
Replace the characters in fromstr with the character in the same position in tostr and delete all characters that are in deletestr. The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.

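In short: characters in fromstr are replaced by the corresponding characters in tostr, and characters in deletestr are deleted

punctuation in the string module is a string containing the common (ASCII) punctuation characters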

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
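
A quick sketch of both forms of str.maketrans may help. The sample strings and the names swap and strip_punct are made up for illustration.

```python
import string

# Two-argument form: each character of the first string maps to the
# character at the same position in the second string
swap = str.maketrans('abc', 'xyz')
print('aabbcc'.translate(swap))

# Three-argument form with empty first and second strings: only delete
strip_punct = str.maketrans('', '', string.punctuation)
print("Hello, world! It's 5 o'clock.".translate(strip_punct))
```

Deleting the apostrophe turns "It's" into "Its", which is usually acceptable for word counting but worth keeping in mind.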

The string method lower is used to convert all characters to lowercase

import string ### load the string module

fname = input('Enter the file name: ') ### read the file name
try:
    fhand = open(fname)
except OSError:  ### catch only file-related errors
    print('File cannot be opened:', fname)
    exit()  ### stop if the file cannot be opened

counts = dict()  ### create an empty dictionary

for line in fhand:  ### iterate over the lines of the file
    line = line.rstrip() ### remove the trailing newline
    line = line.translate(str.maketrans('', '', string.punctuation)) ### delete punctuation
    line = line.lower()  ### convert to lowercase

    words = line.split() ### split into words
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1  ### count with the dictionary

print(counts)
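
To see the effect of the normalization without needing a file, the steps can be sketched as a function and run on a couple of in-memory lines. count_normalized is a hypothetical name, and the sample text is made up.

```python
import string

def count_normalized(lines):
    """Count words after deleting ASCII punctuation and lowercasing (sketch)."""
    table = str.maketrans('', '', string.punctuation)  # build the table once
    counts = {}
    for line in lines:
        for word in line.translate(table).lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts

sample = ['The cat sat.\n', 'the CAT, the hat!\n']
print(count_normalized(sample))
```

Here 'The', 'the' and 'CAT,' collapse into the keys 'the' and 'cat'. Since string.punctuation covers only ASCII characters, non-ASCII punctuation such as curly quotes would survive this step.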

3 Looping and dictionary

▲ Iterate over a dictionary by key

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    print(key, counts[key])


counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
for key in counts:
    if counts[key] > 10 :
        print(key, counts[key])

▲ Output sorted by key

counts = { 'chuck' : 1 , 'annie' : 42, 'jan': 100}
lst = list(counts.keys()) ### collect the keys into a list
print(lst)
lst.sort()           ### sort the list
for key in lst:      ### print in the order of the sorted list
    print(key, counts[key])
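
The built-in sorted function can shorten this, since it accepts the dictionary directly and returns a sorted list of its keys. The second half of the sketch sorts by value instead, which goes slightly beyond what the notes above cover; by_value is an illustrative name.

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}

# sorted() takes the dictionary itself and returns its keys in order
for key in sorted(counts):
    print(key, counts[key])

# Sort the (key, value) pairs by value, largest first
by_value = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_value)
```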
