【pythonchallenge】【Level 2】

jiuyueguang

Category: python    Tags: python, pythonchallenge

Original link: https://blog.csdn.net/jiuyueguang/article/details/43348539

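Page title: ocr
Problem URL: http://www.pythonchallenge.com/pc/def/ocr.html
Description: the page shows an image that carries no useful information by itself.
Hint: "recognize the characters. maybe they are in the book, but MAYBE they are in the page source."
Analysis: viewing the page source reveals the comment "find rare characters in the mess below:", so the task is to find the least frequent characters in the jumble of symbols that follows it. Save that mess to a file (level_2.txt in the code below), then: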
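1. Count how often each character occurs.
2. Use a Python set to get the distinct characters, then sort them by first appearance so the original order is kept.
3. Compare the counts: the characters that occur least often are the answer.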
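
Saving the mess to a file can also be scripted. The sketch below is not the author's code: it assumes the mess is the last <!-- ... --> comment in the page source (check this against the real page), and extract_mess is a hypothetical helper name. For the live page, the HTML could be obtained with urllib.request.urlopen(url).read().decode(); the example uses a small stand-in page instead.

```python
import re


def extract_mess(html: str) -> str:
    """Return the text of the last HTML comment in `html`.

    Assumption: the mess is the final <!-- ... --> block, placed after
    the "find rare characters" comment.
    """
    # DOTALL lets '.' match newlines, since the mess spans many lines.
    comments = re.findall(r"<!--(.*?)-->", html, flags=re.DOTALL)
    if not comments:
        raise ValueError("no HTML comment found")
    return comments[-1]


# Stand-in page, NOT the real ocr.html source.
sample_html = """<html><body>
<!--
find rare characters in the mess below:
-->
<!--
%%$@_a$#b%%
-->
</body></html>"""

mess = extract_mess(sample_html)
print(repr(mess))
```

The result still contains the surrounding newlines. This matters because the output below counts '\n' as one of the characters.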

Implementation (Python 3):

def level_2():
    # http://www.pythonchallenge.com/pc/def/ocr.html
    # level_2.txt holds the mess copied from the page source
    with open('level_2.txt', 'r') as f:
        l2_content = f.read()
    ch_list = list(set(l2_content))  # distinct characters, order lost
    ch_list.sort(key=l2_content.index)  # restore order of first appearance
    print(ch_list)
    print([l2_content.count(x) for x in ch_list])  # look for the smallest counts
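
Step 2 (set plus a sort keyed on first index) rescans the text once per distinct character. A minimal alternative, assuming Python 3.7+ where dict keys keep insertion order, builds the same ordered list in one pass with dict.fromkeys. first_seen_order is a hypothetical helper name.

```python
def first_seen_order(text: str) -> list:
    """Distinct characters of `text`, in order of first appearance.

    dict keys keep insertion order (guaranteed since Python 3.7), so
    dict.fromkeys drops duplicates without a separate sort step.
    """
    return list(dict.fromkeys(text))


text = "%$%@e$q"
print(first_seen_order(text))  # ['%', '$', '@', 'e', 'q']
```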

Alternative implementation:

import collections

def a_level_2():
    with open('level_2.txt', 'r') as f:
        l2_content = f.read()
    order_dict = collections.OrderedDict()  # keeps order of first appearance
    for ch in l2_content:
        order_dict[ch] = order_dict.get(ch, 0) + 1  # get() returns 0 for unseen characters
    print(order_dict)
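
On Python 3, collections.Counter can do the same counting in one call. This sketch assumes Python 3.7+, where Counter (a dict subclass) also keeps keys in order of first appearance, so it can replace the OrderedDict loop. char_counts is a hypothetical name.

```python
from collections import Counter


def char_counts(text: str) -> Counter:
    """Count every character in `text`.

    Counter is a dict subclass, so on Python 3.7+ its keys stay in
    order of first appearance, like the OrderedDict version.
    """
    return Counter(text)


counts = char_counts("%%$e%q$")
print(dict(counts))
```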

Output of level_2():

['%', '$', '@', '_', '^', '#', ')', '&', '!', '+', ']', '*', '}', '[', '(', '{', '\n', 'e', 'q', 'u', 'a', 'l', 'i', 't', 'y']
[6104, 6046, 6157, 6112, 6030, 6115, 6186, 6043, 6079, 6066, 6152, 6034, 6105, 6108, 6154, 6046, 1219, 1, 1, 1, 1, 1, 1, 1, 1]

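The rare characters are 'e', 'q', 'u', 'a', 'l', 'i', 't', 'y', which spell "equality". Replace ocr in the URL with it (http://www.pythonchallenge.com/pc/def/equality.html) to reach the next level.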
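
The last step can be automated as well. This sketch assumes, as the output above shows, that the rare characters all share the lowest count. rare_chars and next_url are hypothetical helpers, and the sample string is made up rather than taken from the page.

```python
from collections import Counter


def rare_chars(text: str) -> str:
    """Join, in text order, every character that has the minimum count."""
    counts = Counter(text)
    fewest = min(counts.values())
    return "".join(ch for ch in text if counts[ch] == fewest)


def next_url(answer: str) -> str:
    # Swap the page name in the level URL for the answer word.
    return "http://www.pythonchallenge.com/pc/def/" + answer + ".html"


sample = "%%$e%$q#%#$u"  # made-up stand-in for the real mess
word = rare_chars(sample)
print(word)  # equ
print(next_url(word))
```

Walking `text` instead of `counts` keeps the letters in the order they appear in the mess, which is the order that spells the answer.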