- Page title: ocr
- Problem URL: http://www.pythonchallenge.com/pc/def/ocr.html
- Problem description: the page shows an image, which carries no useful information
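- Hint: recognize the characters. maybe they are in the book, but MAYBE they are in the page source.
- Analysis: viewing the page source reveals the comment "find rare characters in the mess below:", so the task is to find the least frequent characters in the block of garbage that follows it. Save that block to a file, then proceed as follows: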
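- Count character frequencies
- Use Python's set to find which characters occur, keeping them in order of first appearance
- Then count how often each of those characters occurs
- Code: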
    import collections

    def level_2():
        # http://www.pythonchallenge.com/pc/def/ocr.html
        with open('level_2.txt', 'r') as f:
            l2_content = f.read()
        origin_list = [x for x in l2_content]
        ch_list = list(set(l2_content))
        ch_list.sort(key=origin_list.index)  # restore the order of first appearance
        print(ch_list)
        print([l2_content.count(x) for x in ch_list])  # the least frequent characters are the answer

    def a_level_2():
        with open('level_2.txt', 'r') as f:
            l2_content = f.read()
        order_dict = collections.OrderedDict()  # preserves the order of first appearance
        for ch in l2_content:
            order_dict[ch] = order_dict.get(ch, 0) + 1  # get() supplies 0 for a character not seen yet
        print(order_dict)
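- Output:
The characters that occur only once are 'e', 'q', 'u', 'a', 'l', 'i', 't', 'y', which spell "equality"; substitute that word into the URL. The first list holds the distinct characters in order of first appearance and the second holds their counts:
['%', '$', '@', '_', '^', '#', ')', '&', '!', '+', ']', '*', '}', '[', '(', '{', '\n', 'e', 'q', 'u', 'a', 'l', 'i', 't', 'y']
[6104, 6046, 6157, 6112, 6030, 6115, 6186, 6043, 6079, 6066, 6152, 6034, 6105, 6108, 6154, 6046, 1219, 1, 1, 1, 1, 1, 1, 1, 1]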
- Next URL: http://www.pythonchallenge.com/pc/def/equality.html
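
The same counting idea can also be sketched with `collections.Counter`. This is only an illustrative variant, not the code used above: the helper name `rare_characters`, its `threshold` parameter, and the short `sample` string are assumptions made up for the example (the real input is the much longer block from the page source). It relies on Python 3.7+, where a `Counter` iterates in order of first insertion.

```python
from collections import Counter


def rare_characters(text, threshold=1):
    """Return the characters occurring at most `threshold` times,
    in order of first appearance (hypothetical helper)."""
    counts = Counter(text)
    # On Python 3.7+ a Counter iterates in first-insertion order,
    # so the rare characters come out in the order they appear in text.
    return "".join(ch for ch in counts if counts[ch] <= threshold)


if __name__ == "__main__":
    # Made-up stand-in for the mess in the page source: each symbol
    # repeats several times, each letter occurs exactly once.
    sample = "%$@e_^#q%$@u_^#a%$@l_^#i%$@t_^#y%$@_^#"
    word = rare_characters(sample)
    print(word)
    print("http://www.pythonchallenge.com/pc/def/{}.html".format(word))
```

Filtering on a count threshold avoids having to read the rare characters off the printed lists by eye, at the cost of assuming that "rare" means "occurs once", which matches the counts shown above.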
[pythonchallenge] [Problem 2]