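Today I (二营长) had a task: run a banned-word filter over a keyword list with millions of entries. I used to ask the dev guys to run it for me every time, and they probably got fed up with it, because, bam... they tossed the program over to me and told me to run it myself. When I first saw the script I nearly lost it: what the heck is this? I had never even seen the pyahocorasick library. Have a look:
import ahocorasick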
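import time


def main():
    t1 = time.time()  # start time, for timing the run
    # Build the Aho-Corasick automaton from the banned-word list.
    A = ahocorasick.Automaton()
    with open("D:\\seo-dev\\blackword\\blackword.properties", 'r') as fp:
        for line in fp:
            # Only the first tab-separated field of each line is used as the banned word.
            tok = line.strip("\n").split("\t")
            if not tok[0]:
                # split() always returns at least one element, so test for an
                # empty first field instead of len(tok) < 1.
                print line.decode('utf-8')
            else:
                A.add_word(tok[0], (1, tok[0]))
    # Finalize the automaton; it must be built before it can be searched.
    A.make_automaton()
    f1 = open("D:\\seo-dev\\blackword\\back_dangdang.csv", 'w')
    g1 = open("D:\\seo-dev\\blackword\\result_dangdang.csv", 'w')
    cnt = 0
    for line in open("D:\\seo-dev\\blackword\\dangdang.csv", 'r'):
        cnt +=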