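Since you haven't gotten any answers yet, I figured I would at least contribute some thoughts. I have used a Python k-d tree module for quickly searching for nearest-neighbor points: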
http://code.google.com/p/python-kdtree/downloads/detail?name=kdtree.py
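It accepts points of any length (number of dimensions), as long as they are all the same size.
I'm not sure how you would want to apply the "importance" weights, but here is a brainstorm on how to use the kdtree module to at least find, for each point in a given person's set, the closest point belonging to another "person":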
import numpy
from kdtree import KDTree
from itertools import chain


class PersonPoint(object):
    """A point tagged with its owner and an importance factor."""

    def __init__(self, person, point, factor):
        self.person = person
        self.point = point
        self.factor = factor

    def __repr__(self):
        return '<%s: %s, %0.2f>' % (
            self.person, ['%0.2f' % p for p in self.point], self.factor)

    # Sequence protocol, so the tree can treat the object as a plain point.
    def __iter__(self):
        return iter(self.point)

    def __len__(self):
        return len(self.point)

    def __getitem__(self, i):
        return self.point[i]


# Six random 3-D points per person, each with a random factor.
people = {}
for name in ('bill', 'john', 'mary', 'jenny', 'phil', 'george'):
    factors = numpy.random.rand(6)
    points = numpy.random.rand(6, 3).tolist()
    people[name] = [PersonPoint(name, p, f) for p, f in zip(points, factors)]

bill_points = people['bill']
others = list(chain(*[people[name] for name in people if name != 'bill']))

tree = KDTree.construct_from_data(others)

for point in bill_points:
    # t=1 means only return the 1 closest.
    # You could set it higher to return more.
    print("%r => %r" % (point, tree.query(point, t=1)[0]))
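Result (the points are random, so the values differ on every run; there is one line per point of bill, in this form):
<one of bill's points> => <closest point belonging to someone else>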
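With the results, I figure you could look at the most frequently matched "person", or then take the weights into account. Or maybe you could sum up the importance factors in the results and take the highest rated. That way, if mary only matched once but had a factor of 10, and phil had 3 matches that only totaled 5, mary might be more relevant?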
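Here is a minimal sketch of those two ways of ranking, just to make the idea concrete. The function name `rank_matches` and the input shape (a list of `(name, factor)` pairs, one per nearest-neighbor hit) are my own assumptions and not part of the kdtree module. With the code above you could probably build that list from the `person` and `factor` attributes of each `tree.query(point, t=1)[0]` result.

```python
from collections import defaultdict


def rank_matches(matches):
    """Rank people by number of matches and by summed factor.

    `matches` is a list of (person_name, factor) pairs, one per
    nearest-neighbor hit. This input shape is hypothetical.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for name, factor in matches:
        counts[name] += 1
        totals[name] += factor
    # Highest value first; ties are broken alphabetically by name.
    by_count = sorted(counts, key=lambda n: (-counts[n], n))
    by_total = sorted(totals, key=lambda n: (-totals[n], n))
    return by_count, by_total, dict(counts), dict(totals)


# The case described above: mary matches once with a factor of 10,
# phil matches three times but his factors only total 5.
matches = [("mary", 10.0), ("phil", 2.0), ("phil", 2.0), ("phil", 1.0)]
by_count, by_total, counts, totals = rank_matches(matches)
print("by count: %s" % by_count)
print("by total: %s" % by_total)
```

Ranking by count puts phil first, while ranking by summed factor puts mary first, so which one you want depends on what "importance" is supposed to mean for your data.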
I know you have a more robust function for creating an index, but it is going to require going through every point in your collection.
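For comparison, this is roughly what checking every point looks like: a plain linear scan, which is the work a k-d tree tries to avoid on each query. It is only a sketch, assuming ordinary Euclidean distance; `squared_distance` and `nearest_linear` are names I made up for illustration. It can also be handy for sanity-checking the tree's answers on a small data set.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_linear(query, candidates):
    """Return the candidate closest to `query` by checking every one."""
    return min(candidates, key=lambda c: squared_distance(query, c))


candidates = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.2, 0.1, 0.0)]
closest = nearest_linear((0.25, 0.1, 0.1), candidates)
print("closest: %s" % (closest,))
```

Every query here costs one distance computation per candidate, so the total work grows with the number of queries times the number of points.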