p05

kmp_whkl

Article link: https://blog.csdn.net/u013679551/article/details/40077633

1. Tuples: immutable lists. A tuple is created with commas; the surrounding parentheses are optional.

e.g. t = 1, 'a', 3.2, True or t = (1, 'a', 3.2, True)

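2. Writing a tuple with a single element, the right way and the wrong way:

Right: a = 1, or a = (1,)

Wrong: a = (1) (this is just the integer 1, not a tuple)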
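The difference between the two spellings can be checked directly. This is a minimal sketch; the names a, b and c are only for illustration, and print is called with a single argument so the block behaves the same on Python 2 and Python 3.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
a = (1)      # just the integer 1
b = (1,)     # a one-element tuple
c = 1,       # also a one-element tuple

print(type(a).__name__)  # int
print(type(b).__name__)  # tuple
print(b == c)            # True
```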

3. Why tuples are needed: they guarantee that the contents cannot be modified
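A short sketch of what "cannot be modified" means in practice. The names point and distances are made up for this example. It also shows one consequence that the notes do not spell out: a tuple of hashable items can serve as a dictionary key.

```python
point = (3, 4)

try:
    point[0] = 10          # tuples do not support item assignment
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Because the tuple cannot change, it can be used as a dictionary key,
# which a list cannot.
distances = {point: 5.0}
print(distances[(3, 4)])   # 5.0
```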

4. Tuple assignment:

Swapping two values: a, b = b, a

5. Splitting an email address:

name, domain = 'pp@qq.com'.split('@')
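Both idioms rely on the same rule: the right-hand side is evaluated first, then unpacked into the names on the left, and the number of values must match the number of names. A small sketch, where the string 'no-at-sign' and the flag unpack_failed are invented for the demonstration:

```python
a, b = 1, 2
a, b = b, a                  # the tuple (2, 1) is built first, then unpacked
print(a)                     # 2
print(b)                     # 1

name, domain = 'pp@qq.com'.split('@')
print(name)                  # pp
print(domain)                # qq.com

# If the counts differ, unpacking raises ValueError.
unpack_failed = False
try:
    x, y = 'no-at-sign'.split('@')   # split returns only one piece here
except ValueError:
    unpack_failed = True
print(unpack_failed)         # True
```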

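6. Functions and tuples: returning both the largest and the smallest value of a list:

def max_min(lst):
    # Start from the first element; the list must not be empty.
    largest = smallest = lst[0]
    for i in lst:
        if i > largest:
            largest = i
        if i < smallest:
            smallest = i
    # The two values are packed into a single tuple.
    return largest, smallest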
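On the calling side, the returned tuple can be kept whole or unpacked into two names. The sketch below is a variant, not the function from the notes: the helper max_min_builtin is a hypothetical name, and it leans on the built-in max() and min() instead of an explicit loop.

```python
def max_min_builtin(lst):
    # Each built-in scans the list once; the results are packed into a tuple.
    return max(lst), min(lst)

result = max_min_builtin([3, 9, 1, 7])
print(result)                 # (9, 1)

largest, smallest = result    # unpack the returned tuple
print(largest)                # 9
print(smallest)               # 1
```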

7. The Decorate, Sort and Undecorate (DSU) pattern:

def sort_by_length(words):
    # decorate
    t = []
    for word in words:
        t.append((len(word),word))

    # sort
    t.sort(reverse = True)

    # undecorate
    res = []
    for length, word in t:
        res.append(word)

    return res

words = ['a', 'abde', 'acfbgi', 'ee']

print sort_by_length(words)
print words
words.sort(key = lambda x: len(x), reverse = True)
print words
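DSU works because tuples compare element by element: the first items decide, and later items are consulted only on a tie. The sketch below illustrates this with a made-up word list. It also shows a side effect worth knowing: with DSU, words of equal length end up ordered by the word itself, while a key-based sort keeps their original relative order.

```python
print((2, 'ee') < (4, 'abde'))   # True: 2 < 4 decides
print((2, 'ab') < (2, 'ee'))     # True: lengths tie, so 'ab' < 'ee' decides

words = ['bb', 'a', 'abde', 'ee']

# DSU: ties on length are broken by the word, in reverse order here.
decorated = sorted([(len(w), w) for w in words], reverse=True)
by_length = [w for _, w in decorated]
print(by_length)                 # ['abde', 'ee', 'bb', 'a']

# key-based sort: stable, so 'bb' stays ahead of 'ee'.
stable = sorted(words, key=len, reverse=True)
print(stable)                    # ['abde', 'bb', 'ee', 'a']
```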

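8. Dictionaries, similar to a map. Creating a dictionary:

Use {} to create a dictionary

Use : to mark each key: value pair: dict = {'anny':88661, 'bob':86541, 'mike':11256}

Keys must be immutable (hashable) and unique; values can be of any type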

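9. Accessing a dictionary and adding elements:

Use the [] operator with the key as the index

>>> print dict['anny']
88661

Accessing a key that does not exist raises a KeyError

Adding a new pair: dict['Tom'] = 56231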
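A sketch of the lookup behaviour described above, plus dict.get() as one way to avoid the error for missing keys. The variable name phone_book is chosen for this example so that the built-in name dict is not shadowed.

```python
phone_book = {'anny': 88661, 'bob': 86541, 'mike': 11256}

missing = False
try:
    phone_book['Tom']              # the key does not exist yet
except KeyError:
    missing = True
print(missing)                     # True

# get() returns None, or a supplied default, instead of raising.
print(phone_book.get('Tom'))       # None
print(phone_book.get('Tom', 0))    # 0

phone_book['Tom'] = 56231          # add a new key-value pair
print(phone_book['Tom'])           # 56231
```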

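10. Dictionary operators and methods:

len(dict): the number of key-value pairs in the dictionary

key in dict: quickly tests whether key is one of the dictionary's keys, in O(1) on average; equivalent to dict.has_key(key) in Python 2 (has_key was removed in Python 3)

for key in dict: iterates over the keys. Note: in Python 2 the keys have no guaranteed order

dict.items(): all key-value pairs

dict.keys(): all keys

dict.values(): all values

dict.clear(): empties the dictionary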

>>> dict = {'anny':88661, 'bob':86541, 'mike':11256}
>>> dict
{'bob': 86541, 'mike': 11256, 'anny': 88661}
>>> dict.items()
[('bob', 86541), ('mike', 11256), ('anny', 88661)]
>>> dict.keys()
['bob', 'mike', 'anny']
>>> dict.values()
[86541, 11256, 88661]
>>> key in dict

Traceback (most recent call last):
  File "<pyshell#165>", line 1, in <module>
    key in dict
NameError: name 'key' is not defined
>>> 'bob' in dict
True
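When the output order matters, one option is to sort the keys or the items before iterating. A small sketch, again using the illustrative name phone_book instead of dict:

```python
phone_book = {'anny': 88661, 'bob': 86541, 'mike': 11256}

# Iterating over the sorted keys gives a predictable order.
for name in sorted(phone_book):
    print(name + ': ' + str(phone_book[name]))

# items() yields (key, value) tuples, which sort by key first.
pairs = sorted(phone_book.items())
print(pairs)   # [('anny', 88661), ('bob', 86541), ('mike', 11256)]

print('bob' in phone_book)     # True
print('tom' in phone_book)     # False
```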

11. Counting how many times each letter occurs in a given string:

count = {}
for i in 'asdfjklasdjklsd':
    if i in count:
        count[i] += 1
    else:
        count[i] = 1
print count
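The if/else branch can be folded into a single line with dict.get(), which supplies 0 for a letter seen for the first time. This is an equivalent sketch of the same counting loop, not a different algorithm:

```python
count = {}
for ch in 'asdfjklasdjklsd':
    # get() returns the current count, or 0 if ch is not a key yet.
    count[ch] = count.get(ch, 0) + 1

for ch in sorted(count):
    print(ch + ' ' + str(count[ch]))
```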

12. Reading a file and printing the 10 most frequent words:

count = {}
f = open('emma.txt')

for line in f:
    line = line.strip()
    words = line.split()
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1

word_f = []
for word, freq in count.items():
    word_f.append((freq, word))

word_f.sort(reverse = True)

for freq, word in word_f[:10]:
    print word, freq

f.close()
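The standard library offers collections.Counter (Python 2.7 and later), which covers both the counting and the "top N" step. The sketch below swaps that in for the hand-written loop and, to stay self-contained, counts the words of a short made-up string instead of reading emma.txt.

```python
from collections import Counter

# Sample text standing in for the contents of a file.
text = """the cat sat on the mat
the dog sat by the log"""

count = Counter()
for line in text.splitlines():
    count.update(line.strip().split())   # add every word on the line

top = count.most_common(2)               # list of (word, frequency) pairs
print(top)                               # [('the', 4), ('sat', 2)]
```

For a real file, the same loop can run over the file object directly, as in the original code.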

13. Inverting a dictionary:

def reverse_dict(d):
    re = {}
    for k, v in d.items():
        if v in re:
            re[v].append(k)
        else:
            re[v] = [k]
    return re

d = {'A':28, 'B':30, 'C':28}
print reverse_dict(d)
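Since several keys may share one value, each value in the inverted dictionary maps to a list of keys. The same logic can be sketched more compactly with dict.setdefault(), which inserts an empty list the first time a value is seen. The name inverted is chosen for this example.

```python
def reverse_dict(d):
    inverted = {}
    for key, value in d.items():
        # setdefault returns the existing list, or stores and returns [].
        inverted.setdefault(value, []).append(key)
    return inverted

result = reverse_dict({'A': 28, 'B': 30, 'C': 28})
print(sorted(result[28]))   # ['A', 'C']
print(result[30])           # ['B']
```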

14. Sets:

Creating: x = set()

Adding and removing: x.add('body'); x.remove('body')

15. Set operators: - (difference); & (intersection); | (union); !=; ==; in; for key in set
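A brief sketch of these operators on two small example sets (the contents are arbitrary). Results are passed through sorted() before printing because sets have no defined order.

```python
a = set(['x', 'y', 'z'])
b = set(['y', 'z', 'w'])

print(sorted(a - b))    # ['x']                 difference
print(sorted(a & b))    # ['y', 'z']            intersection
print(sorted(a | b))    # ['w', 'x', 'y', 'z']  union

print(a == b)           # False
print(a != b)           # True
print('x' in a)         # True

a.add('body')
a.remove('body')        # remove() raises KeyError if the item is absent
print(len(a))           # 3
```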

16. Forward maximum matching:

def load_dic(filename):
    f = open(filename)
    word_dic = set()
    max_length = 1
    for line in f:
        word = unicode(line.strip(), 'utf-8')
        word_dic.add(word)
        if len(word) > max_length:
            max_length = len(word)
    f.close()
    return max_length, word_dic
def fmm_word_seg(sentence, word_dic, max_length):
    begin = 0
    words = []
    sentence = unicode(sentence, 'utf-8')

    while begin < len(sentence):
        for end in range(min(begin + max_length, len(sentence)), begin, -1):
            word = sentence[begin:end]
            if word in word_dic or end == begin + 1:
                words.append(word)
                break
        begin = end
    return words

max_len, word_dic = load_dic('lexicon.dic')
words = fmm_word_seg(raw_input(), word_dic, max_len)
for word in words:
    print word,
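To trace the algorithm without a lexicon file, the sketch below runs the same matching loop on a tiny in-memory dictionary. The dictionary contents and the input 'abcdx' are invented, and plain ASCII strings are used so that no unicode() conversion is needed. At each position the longest candidate is tried first, and a single character is accepted as a fallback when nothing matches.

```python
def fmm_word_seg(sentence, word_dic, max_length):
    begin = 0
    words = []
    while begin < len(sentence):
        # Try the longest candidate first, then shrink one character at a time.
        for end in range(min(begin + max_length, len(sentence)), begin, -1):
            word = sentence[begin:end]
            # Accept a dictionary word, or fall back to a single character.
            if word in word_dic or end == begin + 1:
                words.append(word)
                break
        begin = end
    return words

word_dic = set(['ab', 'abc', 'cd', 'd'])
max_length = max(len(w) for w in word_dic)

segments = fmm_word_seg('abcdx', word_dic, max_length)
print(segments)   # ['abc', 'd', 'x']
```

The result shows the greedy nature of the method: 'abc' is taken at the start even though 'ab' followed by 'cd' would also cover the first four characters.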

17. Comparison of data structures:

	string	list	tuple	set	dict
Mutable	N	Y	N	Y	Y
Sequential	Y	Y	Y	N	N
Sortable	Y	Y	Y	N	N
Sliceable	Y	Y	Y	N	N
Index/key type	int	int	int	immutable	immutable
Item/value type	char	any	any	none	any
Search	Y	Y	Y	Y	Y
Search complexity	O(n)	O(n)	O(n)	O(1)	O(1)
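The "immutable" entries in the key-type row can be checked with the built-in hash(): only objects that can be hashed are accepted as set elements or dictionary keys. A minimal sketch over one sample value of each type in the table:

```python
candidates = ['text', (1, 2), [1, 2], set([1]), {'k': 1}]

hashable = []
for value in candidates:
    try:
        hash(value)                        # raises TypeError for mutable containers
        hashable.append(type(value).__name__)
    except TypeError:
        pass

print(hashable)   # ['str', 'tuple']
```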