One of the Fastest Search Engines in Python: ThreadSearch (Self-Developed) (abccdee1)


何荣基


Tags: python, algorithms

Article link: https://blog.csdn.net/walchina2017/article/details/125689587


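ThreadSearch uses a clever approach to search quickly: it splits the list into chunks and searches the chunks concurrently using ThreadPoolExecutor from the concurrent.futures module. concurrent.futures is part of Python's standard library (nothing to install), and it lets several tasks run at the same time in a pool of threads. Here is the code of the search module I developed: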

ThreadSearch.py

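import concurrent.futures
import math
def preload(list, div_num):
  # Split list into at most div_num consecutive chunks.
  # Rounding the chunk size up means no item is dropped, and max(1, ...)
  # keeps the size above 0 when list is shorter than div_num.
  size = max(1, math.ceil(len(list) / div_num))
  return [list[i:i + size] for i in range(0, len(list), size)]
def search(preloaded, obj, exactly=False):
  # Search each chunk in its own thread; return one list of matches per chunk
  def find(chunk):
    if exactly:
      # Keep only items identical to obj
      return [x for x in chunk if x == obj]
    # Keep items that contain obj as a substring
    return [x for x in chunk if obj in x]
  if not preloaded:
    # ThreadPoolExecutor rejects max_workers=0, so handle an empty input here
    return []
  with concurrent.futures.ThreadPoolExecutor(max_workers=len(preloaded)) as executor:
    # map() yields results in chunk order; list() also re-raises any
    # exception raised inside a worker thread
    return list(executor.map(find, preloaded))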
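
Collecting results by appending to a shared module-level list from worker threads has two drawbacks: the list keeps growing across calls, and results land in whatever order the threads happen to finish. ThreadPoolExecutor.map avoids both, because it returns each call's result in the order of the inputs. A minimal standalone sketch of that behaviour (the chunk data and the find_in_chunk helper are made up for illustration):

```python
import concurrent.futures

def find_in_chunk(chunk, obj):
    # Return the items of this chunk that contain obj as a substring
    return [x for x in chunk if obj in x]

chunks = [['apple', 'kiwi'], ['banana', 'plum'], ['grape', 'cherry']]

with concurrent.futures.ThreadPoolExecutor(max_workers=len(chunks)) as executor:
    # map() runs find_in_chunk concurrently but yields results in the same
    # order as the input chunks, regardless of which thread finishes first
    results = list(executor.map(find_in_chunk, chunks, ['a'] * len(chunks)))

print(results)  # [['apple'], ['banana'], ['grape']]
```

Because the results come back as return values, each call gets a fresh list and nothing leaks between searches.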

Here is some test code:

Test.py

import random
import string
from ThreadSearch import *
def load_examples(num):
  # num random strings, each 26 lowercase letters long
  return [''.join(random.choice(string.ascii_lowercase) for _ in range(26))
          for _ in range(num)]
div_num = 10
obj = 'a'
print('loading examples')
a = load_examples(10000)
print('loading examples finished')
pres = preload(a, div_num)
x = search(pres, obj)
# x holds one list of matches per chunk; print the total number of matches
print(sum(len(chunk) for chunk in x))

Parameters of preload:
1. list: the list to be searched
2. div_num: how many chunks to split list into
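
How div_num turns into chunk sizes is easy to get wrong. A chunk size computed as int(len(list) / div_num) truncates, which can produce more chunks than requested, or a size of 0 (and therefore no chunks at all) when the list is shorter than div_num. Rounding up with math.ceil gives at most div_num chunks. A small sketch comparing the two (chunk_sizes is a hypothetical helper that only reports the lengths of the slices):

```python
import math

def chunk_sizes(n, div_num, round_up=False):
    # Chunk size: n / div_num, either truncated like int(n / div_num)
    # or rounded up with math.ceil
    size = math.ceil(n / div_num) if round_up else int(n / div_num)
    if size == 0:
        # Fewer items than chunks: a truncated size of 0 yields no slices at all
        return []
    # Lengths of the consecutive slices [i:i + size] that cover all n items
    return [min(size, n - i) for i in range(0, n, size)]

print(chunk_sizes(10000, 10))               # ten chunks of 1000
print(chunk_sizes(10, 3))                   # [3, 3, 3, 1]: 4 chunks, not 3
print(chunk_sizes(10, 3, round_up=True))    # [4, 4, 2]
print(chunk_sizes(5, 10))                   # []
print(chunk_sizes(5, 10, round_up=True))    # [1, 1, 1, 1, 1]
```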

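Parameters of search:
1. preloaded: the chunked list returned by preload
2. obj: the character or string to search for
3. exactly: whether an item must be identical to obj, rather than merely contain it; defaults to False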
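
The description of exactly implies two matching rules: equality when exactly is True, and substring containment when it is False. One way to express them, as a hypothetical matches helper (an assumption about the intended behaviour, not code taken from the module):

```python
def matches(item, obj, exactly=False):
    # exactly=True: the item must be identical to obj
    # exactly=False: obj only has to appear somewhere inside the item
    return item == obj if exactly else obj in item

words = ['a', 'cat', 'dog', 'abc']
print([w for w in words if matches(w, 'a')])                # ['a', 'cat', 'abc']
print([w for w in words if matches(w, 'a', exactly=True)])  # ['a']
```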

load_examples: a helper I wrote to generate a random list for testing; it returns num strings, each made of 26 random lowercase letters.
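
If you want the same test list on every run (handy when comparing timings), a seeded variant is one option. A sketch with a hypothetical seed parameter, using random.Random and its choices method from the standard library:

```python
import random
import string

def load_examples_seeded(num, seed=None):
    # A dedicated Random instance; a fixed seed makes the list reproducible
    rng = random.Random(seed)
    # num strings, each made of 26 random lowercase letters
    return [''.join(rng.choices(string.ascii_lowercase, k=26)) for _ in range(num)]

a = load_examples_seeded(5, seed=42)
b = load_examples_seeded(5, seed=42)
print(a == b)  # True: same seed, same list
```

Using a separate Random instance keeps the seed from affecting any other code that uses the global random module.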

P.S. You can also print x, but the result isn't pretty...
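
Since search returns one list of matches per chunk, printing x dumps every match in nested form. A more readable option is to flatten the nested lists first and print only the count or the first few hits. A sketch using itertools.chain.from_iterable (the sample x here is made up):

```python
from itertools import chain

# Same shape as search's result: one list of matches per chunk (sample data)
x = [['apple', 'grape'], [], ['banana']]

flat = list(chain.from_iterable(x))
print(len(flat))  # 3
print(flat[:2])   # ['apple', 'grape']
```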

Next, let's test how fast it searches:

import random
import string
import time
from ThreadSearch import *
def load_examples(num):
  # num random strings, each 26 lowercase letters long
  return [''.join(random.choice(string.ascii_lowercase) for _ in range(26))
          for _ in range(num)]
div_num = 10
obj = 'a'
print('loading examples')
a = load_examples(10000)
print('loading examples finished')
preloaded = preload(a, div_num)
# perf_counter() is a high-resolution clock intended for measuring intervals
t1 = time.perf_counter()
x = search(preloaded, obj)
t2 = time.perf_counter()
print(t2 - t1)

We measure speed as the difference between the time just before and just after the call to search.
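
A single timing is more meaningful next to a baseline. In CPython the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so for pure-Python substring checks a thread pool may not beat a plain loop; timing both side by side is the easiest way to check on your own machine. A minimal sketch, assuming a single-threaded list comprehension as the baseline; plain_search, threaded_search and best_of are illustrative names, not part of ThreadSearch:

```python
import concurrent.futures
import random
import string
import time

def plain_search(items, obj):
    # Baseline: one pass over the whole list in the current thread
    return [x for x in items if obj in x]

def threaded_search(items, obj, div_num):
    # Same work, split into chunks and handed to a thread pool
    size = max(1, -(-len(items) // div_num))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(chunks))) as ex:
        parts = ex.map(plain_search, chunks, [obj] * len(chunks))
        # Flatten the per-chunk results; map() keeps them in chunk order
        return [x for part in parts for x in part]

def best_of(fn, *args, repeat=5):
    # Smallest of several perf_counter() deltas; less noisy than one reading
    times = []
    for _ in range(repeat):
        t1 = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - t1)
    return min(times)

items = [''.join(random.choices(string.ascii_lowercase, k=26)) for _ in range(10000)]
print('plain   :', best_of(plain_search, items, 'a'))
print('threaded:', best_of(threaded_search, items, 'a', 10))
```

Both functions return the same matches in the same order, so any difference in the printed numbers comes from the threading overhead or gain alone; the actual values depend on the machine and Python version.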

Let's run it: