Python Test: A Rumored Tencent Hands-On Coding Problem


维维手作


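Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Original link: https://blog.csdn.net/weixin_35029527/article/details/112991165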


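I came across a set of interview questions said to be from Tencent (http://www.xhttp.cn/2010/05/2) and solved the last hands-on problem in Python. As a Python newbie who hasn't written any Python in a long time, I have to admit the pressure was on.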

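Here is how I thought about it:

1. The problem requires the input to be given as "php example.php [word]". I can't scan the whole text every time the command runs: the text is 4 MB and about 30,000 lines, and traversing all of it on every run would be far too slow. So I decided to write a script that generates a cache file for each word.

2. Run that script once in advance to generate the cache for every word. The "php example.php [word]" invocation is then mostly for show: it only has to read the cache file for the requested word.

3. And the cache-generation job goes to Python!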
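With the cache in place, the query side reduces to a single file read. The problem specifies a PHP entry point; the sketch below shows the same lookup in Python instead, assuming the `index/<word>.txt` layout (lowercase word, one "Line N: ..." entry per line) that the cache-generation script below writes. The script name `lookup.py` and the function `lookup` are hypothetical.

```python
import os
import sys


def lookup(word, index_dir="index"):
    """Return the cached "Line N: ..." entries for `word`, or [] if it never occurs."""
    # Words contain only letters and digits; rejecting anything else also keeps
    # arguments such as "../x" from escaping the index directory.
    if not word.isalnum():
        return []
    # Assumed layout: one file per lowercase word, named <word>.txt.
    path = os.path.join(index_dir, word.lower() + ".txt")
    if not os.path.isfile(path):
        return []
    with open(path) as f:
        return [entry.rstrip("\n") for entry in f]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python lookup.py <word>   (the Python counterpart of "php example.php [word]")
    for entry in lookup(sys.argv[1]):
        print(entry)
```

Because every query is a lookup by file name, its cost does not depend on the size of the original 4 MB text.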

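The overall approach is as follows:

1. Read the file line by line and split each line into words. A word is defined as a string made up of English letters (upper- or lowercase) and digits (0-9).

2. Record the positions of all words in that line in a hash table. The keys are the lowercase forms of the words; each value is a list of the positions (0-based character offsets) at which that word occurs in the line.

3. For each word, create a text file named "<word>.txt" and write the word's positions in the current line into it; as later lines are processed, keep appending the word's positions to the same file.

4. On to the code!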
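Steps 1 and 2 can also be expressed compactly with a regular expression that encodes the word definition directly. This is an alternative sketch, not the author's code (which scans character by character); the name `index_line` is hypothetical, and positions are 0-based character offsets as in the script that follows.

```python
import re

# A word is a maximal run of ASCII letters or digits (the definition in step 1).
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def index_line(line):
    """Map each lowercase word in `line` to the list of its 0-based start offsets."""
    positions = {}
    for match in WORD_RE.finditer(line):
        positions.setdefault(match.group().lower(), []).append(match.start())
    return positions
```

For example, `index_line("In the beginning God created the heaven")["the"]` is `[3, 29]`. Note that an apostrophe splits a word: "God's" yields the words "god" and "s".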

```python
# bible_lib.py
import os
import string

# Per the problem statement, a word consists only of ASCII letters and digits.
WORD_CHARS = frozenset(string.ascii_letters + string.digits)


def is_letter(char):
    return char in WORD_CHARS


def write_index_file(filename, linenumber, val):
    # create the directory for the index files if it does not exist yet
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    # always append; bible.py clears the index directory before each run,
    # so stale entries from a previous run cannot accumulate
    with open(filename, 'a') as f:
        f.write("Line %d: %s\n" % (linenumber, ', '.join(val)))


def process_line(linenumber, line):
    print("processing line %d" % linenumber)
    words_hash = split_word(line)
    for key, val in words_hash.items():
        filename = os.path.join(os.getcwd(), 'index', key + '.txt')
        write_index_file(filename, linenumber, val)


def split_word(line):
    """Map each lowercase word in `line` to the list of its start offsets (as strings)."""
    words_hash = {}
    word_start, word_start_pos = False, 0
    # A trailing sentinel space closes a word that runs to the end of the line
    # (e.g. the last line of a file without a final newline).
    for i, char in enumerate(line + ' '):
        if is_letter(char) and not word_start:
            # a word begins here
            word_start, word_start_pos = True, i
        elif not is_letter(char) and word_start:
            # the word ended at the previous character
            word_start = False
            word = line[word_start_pos:i]
            words_hash.setdefault(word.lower(), []).append(str(word_start_pos))
    return words_hash
```

```python
#!/usr/bin/env python3
# bible.py
import os
import shutil
import time

from bible_lib import process_line

if __name__ == '__main__':
    cur_dir = os.getcwd()
    # remove index files left over from a previous run, since the writer only appends
    shutil.rmtree(os.path.join(cur_dir, 'index'), ignore_errors=True)
    start = time.time()
    with open(os.path.join(cur_dir, 'bbe.txt'), 'r') as handle:
        for linenumber, line in enumerate(handle, start=1):
            process_line(linenumber, line)
    end = time.time()
    print(end - start)
```

In the end, generating the cache took about 1057 seconds... a painfully slow result.

There should be a faster solution; I'll try writing one in Ruby later.
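One plausible (unmeasured) reason for the slow run is that the script opens, appends to, and closes a file once for every distinct word on every line, so a 30,000-line text turns into a very large number of small file operations. Under that assumption, a faster variant builds the whole index in memory in one pass and then writes each `<word>.txt` exactly once. The sketch below uses a regular expression for word splitting instead of the character scan above; `build_index` and `write_index` are hypothetical names, and the output imitates the script's "Line N: p1, p2" format.

```python
import os
import re
from collections import defaultdict

# A word is a maximal run of ASCII letters or digits.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def build_index(lines):
    """One pass over the text: word -> list of (line number, [offsets in that line])."""
    index = defaultdict(list)
    for linenumber, line in enumerate(lines, start=1):
        per_line = defaultdict(list)
        for m in WORD_RE.finditer(line):
            per_line[m.group().lower()].append(m.start())
        for word, offsets in per_line.items():
            index[word].append((linenumber, offsets))
    return index


def write_index(index, index_dir="index"):
    """Write each word's entries to <index_dir>/<word>.txt, opening each file once."""
    os.makedirs(index_dir, exist_ok=True)
    for word, entries in index.items():
        # "w" truncates, so re-running never leaves stale entries behind
        with open(os.path.join(index_dir, word + ".txt"), "w") as f:
            for linenumber, offsets in entries:
                f.write("Line %d: %s\n" % (linenumber, ", ".join(map(str, offsets))))
```

Usage: `with open("bbe.txt") as f: write_index(build_index(f))`. The trade-off is memory: the entire index is held in RAM before anything is written, which should be manageable for a 4 MB text.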
