Implementing an MVPtree in Java: building the core algorithm code

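My project needed MVPtree, a rather obscure data structure, implemented in Java, but there was no ready-made Java implementation online. I am used to reading C++, yet C++ implementations of complex structures still left me baffled. Fortunately, the client provided a VPtree written in Java by a GitHub user whose name I don't recall exactly, and its overall structure could be adapted into an MVPtree.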

For other background on MVPtree, please look it up yourself; this post only covers the algorithm implementation.

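A tree structure for point search has two main problems to solve: how to avoid searching points that cannot match, and how to reduce the number of distance computations. The first is the easier one to think of: split the point set into two symmetric rectangular halves or, more creatively, partition it by distance into a ball and a surrounding shell (very efficient, because every query is itself based on point distances). The second applies less widely; the usual optimization is to precompute each point's distances to a few vantage points and use them to filter points at query time.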
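The precomputed-distance filter rests on the triangle inequality: for any vantage point v, |d(q, v) - d(x, v)| <= d(q, x). A minimal sketch, assuming the distances from the query and from a candidate point to the same pivots are already stored as `List<Double>` in the same order; `PivotFilter` is a hypothetical name, not taken from the repository:

```java
import java.util.List;

// Hypothetical helper: lower bounds on d(q, x) from precomputed pivot distances.
final class PivotFilter {
    private PivotFilter() {}

    /** Best lower bound on d(q, x): max over pivots v of |d(q, v) - d(x, v)|. */
    static double lowerBound(List<Double> queryToPivots, List<Double> pointToPivots) {
        int n = Math.min(queryToPivots.size(), pointToPivots.size());
        double best = 0.0;
        for (int i = 0; i < n; i++) {
            best = Math.max(best, Math.abs(queryToPivots.get(i) - pointToPivots.get(i)));
        }
        return best;
    }

    /** True when x is provably farther than radius from q, so d(q, x) need not be computed. */
    static boolean canSkip(List<Double> queryToPivots, List<Double> pointToPivots, double radius) {
        return lowerBound(queryToPivots, pointToPivots) > radius;
    }
}
```

For example, with pivot distances (10, 4) for the query and (3, 5) for a point, the bound is 7, so at radius 5 the point is discarded without ever calling the (possibly expensive) distance function.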

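VPtree is an example of partitioning a point set by distance. Each node holds a point set; an arbitrary point is picked as the vantage point, and the set is split by distance to that vantage point into 2 subsets of equal size, which go into the node's two children. Looking up points during a query follows the same path down the tree. VPtree does nothing about the second problem (reducing distance computations), so it suits cases where the distance function is a cheap one such as Manhattan or Euclidean distance.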
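A minimal sketch of one VP-tree split, assuming 2-D points stored as `double[]` and Euclidean distance; `VpSplit` is a hypothetical name, and the vantage point is passed in rather than chosen at random:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of one VP-tree split on 2-D points with Euclidean distance.
final class VpSplit {
    final double threshold;                          // lower median of d(v, x)
    final List<double[]> inner = new ArrayList<>();  // d(v, x) <= threshold (the ball)
    final List<double[]> outer = new ArrayList<>();  // d(v, x) >  threshold (the shell)

    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    VpSplit(List<double[]> points, double[] vantagePoint) {
        if (points.isEmpty()) {
            throw new IllegalArgumentException("cannot split an empty set");
        }
        double[] d = new double[points.size()];
        for (int i = 0; i < d.length; i++) {
            d[i] = distance(vantagePoint, points.get(i));
        }
        double[] sorted = d.clone();
        Arrays.sort(sorted);
        threshold = sorted[(sorted.length - 1) / 2];
        // Points at distance <= median go to the ball, the rest to the shell.
        for (int i = 0; i < d.length; i++) {
            if (d[i] <= threshold) {
                inner.add(points.get(i));
            } else {
                outer.add(points.get(i));
            }
        }
    }
}
```

Because ties at the median go to the inner side, the two halves are only roughly equal when several points share the median distance.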

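MVPtree inherits VPtree's distance-based partitioning, except that each node splits its points into 4 sets, and a path array limits how often the distance function has to run. Splitting into 4 sets instead of 2 gives a finer partition with fewer irrelevant points. Keeping a bounded number of vantage-point distances per point cuts distance computations when queries are frequent, and because those vantage points are usually spread far apart, large irrelevant regions are ruled out at once, which works remarkably well. This structure suits distance functions that are costly to evaluate, such as Chebyshev-type functions.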
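The four-way split can be sketched generically as below. This is an illustration, not the repository's code: `MvpSplit` is a hypothetical name, both vantage points are supplied by the caller (the article's code picks v1 at random and v2 from the outer half), medians serve as thresholds, and every half is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of one MVP-tree node split into children.get(i).get(j).
final class MvpSplit<T> {
    final double m1;                                        // threshold around v1
    final double[] m2 = new double[2];                      // one threshold around v2 per half
    final List<List<List<T>>> children = new ArrayList<>(); // i = side of v1, j = side of v2

    MvpSplit(List<T> points, T v1, T v2, ToDoubleBiFunction<T, T> dist) {
        m1 = median(points, v1, dist);
        List<List<T>> halves = split(points, v1, m1, dist);
        for (int i = 0; i < 2; i++) {
            // The same v2 splits both halves, each at its own median.
            m2[i] = median(halves.get(i), v2, dist);
            children.add(split(halves.get(i), v2, m2[i], dist));
        }
    }

    /** Lower median of the distances from v to every point. */
    static <U> double median(List<U> pts, U v, ToDoubleBiFunction<U, U> dist) {
        if (pts.isEmpty()) {
            throw new IllegalArgumentException("cannot split an empty set");
        }
        double[] d = new double[pts.size()];
        for (int i = 0; i < d.length; i++) {
            d[i] = dist.applyAsDouble(v, pts.get(i));
        }
        Arrays.sort(d);
        return d[(d.length - 1) / 2];
    }

    /** Returns [inner, outer], where inner holds the points with dist(v, x) <= m. */
    static <U> List<List<U>> split(List<U> pts, U v, double m, ToDoubleBiFunction<U, U> dist) {
        List<U> inner = new ArrayList<>();
        List<U> outer = new ArrayList<>();
        for (U p : pts) {
            if (dist.applyAsDouble(v, p) <= m) {
                inner.add(p);
            } else {
                outer.add(p);
            }
        }
        return List.of(inner, outer);
    }
}
```

For the points 0.0 to 7.0 on a line with `v1 = 0.0` and `v2 = 7.0`, the thresholds come out as m1 = 3 and m2 = {5, 1}, and each of the four children holds two points.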

Next comes the code.

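Points in an MVPtree differ from those in a VPtree in one respect: each MVPtree point also carries an array of its distances to the vantage points. That calls for a dedicated point class for the MVPtree, MVPTreePoint.

The core code is as follows: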


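```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// DistanceFunction<P> (double getDistance(P a, P b)) comes from the VP-tree project this code builds on.
public class MVPTreePoint<P> {
    // Distances from this point to the vantage points on its path from the root;
    // the node at depth d stores its two distances at indices 2d and 2d + 1.
    private final List<Double> path;
    private final P point;
    // Upper bound on the number of stored distances.
    private final int maxLevel;

    public MVPTreePoint(final P point, final int maxLevel) {
        this.point = point;
        this.maxLevel = maxLevel;
        this.path = new ArrayList<>();
    }

    public void addDistanceToSelf(final MVPTreePoint<P> vantagePointElement,
                                  final DistanceFunction<? super P> distanceFunction) {
        if (this.path.size() < this.maxLevel) {
            this.path.add(distanceFunction.getDistance(this.point, vantagePointElement.point));
        }
    }

    public void addDistanceToSelf(final P vantagePoint, final DistanceFunction<? super P> distanceFunction) {
        if (this.path.size() < this.maxLevel) {
            this.path.add(distanceFunction.getDistance(this.point, vantagePoint));
        }
    }

    public void addDistanceToSelf(final double distance) {
        if (this.path.size() < this.maxLevel) {
            this.path.add(distance);
        }
    }

    public void removeDistanceToSelf(final int position) {
        if (position < this.path.size()) {
            this.path.remove(position); // List.remove(int): removes by index
        }
    }

    public double getDistanceToSelf(final int i) {
        return this.path.get(i);
    }

    public int size() {
        return this.path.size();
    }

    public void clearPath() {
        this.path.clear();
    }

    public P getPoint() {
        return this.point;
    }

    // Two wrappers are equal when they wrap equal points; the distance path is ignored.
    @Override
    public boolean equals(final Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof MVPTreePoint)) {
            return false;
        }
        return Objects.equals(this.point, ((MVPTreePoint<?>) o).point);
    }

    // equals() is overridden, so hashCode() must agree with it.
    @Override
    public int hashCode() {
        return Objects.hashCode(this.point);
    }
}
```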


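Storing the distance array on the point class, rather than folding it into the tree-node class, keeps the structure clearer and makes a point's distances easy to read back.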
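In the code in this post, the node at depth d appends two distances, so they sit at indices 2d and 2d + 1 of a point's path (capped at maxLevel), and the query methods undo a level by removing index 2d + 1 and then index 2d. A small hypothetical `DistancePath` class illustrating that push/pop discipline in isolation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the per-point distance path used by MVPTreePoint.
final class DistancePath {
    private final List<Double> path = new ArrayList<>();
    private final int maxLevel;

    DistancePath(int maxLevel) {
        this.maxLevel = maxLevel;
    }

    /** Appends a distance unless maxLevel entries are already stored. */
    void push(double distance) {
        if (path.size() < maxLevel) {
            path.add(distance);
        }
    }

    /** Undoes the two entries of the node at the given depth, higher index first. */
    void popLevel(int depth) {
        removeIfPresent(2 * depth + 1);
        removeIfPresent(2 * depth);
    }

    private void removeIfPresent(int position) {
        if (position < path.size()) {
            path.remove(position); // remove(int) removes by index
        }
    }

    int size() {
        return path.size();
    }

    double get(int i) {
        return path.get(i);
    }
}
```

With maxLevel = 3, pushing four distances keeps only three, and popLevel(1) then removes only index 2, because index 3 was never stored.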

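MVPtree differs from VPtree in many places, but most of the differences amount to renaming types, replacing P and E with MVPTreePoint<P> and MVPTreePoint<E>. This post concentrates on the core algorithms: building the tree and querying points.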

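Building an MVPtree requires choosing one extra vantage point per node, partitioning the array two more times, and storing the distance from each vantage point to every point.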

capacity is the capacity of a leaf node. Set it to a moderate value chosen according to the size of the data set.

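The original paper removes the vantage points from the point set and stores them separately. In my implementation, however, a vantage point is treated purely as a reference point and stays in the point set during construction. The structure then holds only about quantityOfPoint/capacity extra point references (the vantage points, which also remain in the point set), but the code becomes much simpler to write.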


```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;

// Inside MVPTreeNode<P, E extends P>. Fields (capacity, maxLevel, distanceFunction,
// thresholdSelectionStrategy, pointNodes, children, vantagePoint, firstThreshold,
// secondThreshold) and the helpers size(), addAllPointsToCollection(),
// getChildNodeForPoint() and partitionPoints() are not shown.
@SuppressWarnings("unchecked")
public MVPTreeNode(final Collection<MVPTreePoint<E>> pointNodes,
                   final DistanceFunction<P> distanceFunction,
                   final MVPThresholdSelectionStrategy<P, E> thresholdSelectionStrategy,
                   final int capacity, final int maxLevel) {
    if (capacity < 1) {
        throw new IllegalArgumentException("Capacity must be positive.");
    }
    if (pointNodes.isEmpty()) {
        throw new IllegalArgumentException("Cannot create a MVPTreeNode with an empty list of points.");
    }
    this.capacity = capacity;
    this.maxLevel = maxLevel;
    this.distanceFunction = distanceFunction;
    this.thresholdSelectionStrategy = thresholdSelectionStrategy;
    this.pointNodes = new ArrayList<>(pointNodes);
    this.children = new MVPTreeNode[2][2]; // children[i][j]: i = side of vantagePoint[0], j = side of vantagePoint[1]
    this.vantagePoint = (E[]) new Object[2];
    this.secondThreshold = new double[2];
    this.anneal();
}

@SuppressWarnings("unchecked")
protected void anneal() {
    if (this.pointNodes == null) {
        // Internal node: check whether any child has become empty.
        final int[][] childrenSize = new int[2][2];
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                childrenSize[i][j] = this.children[i][j].size();
            }
        }
        if (childrenSize[0][0] == 0 || childrenSize[0][1] == 0
                || childrenSize[1][0] == 0 || childrenSize[1][1] == 0) {
            // One of the child nodes has become empty and needs to be pruned:
            // pull every point back into this node and partition again.
            this.pointNodes = new ArrayList<>(childrenSize[0][0] + childrenSize[0][1]
                    + childrenSize[1][0] + childrenSize[1][1]);
            this.addAllPointsToCollection(this.pointNodes);
            for (final MVPTreePoint<E> pointNode : this.pointNodes) {
                // Caveat: this also drops the distances stored by ancestor nodes, so the
                // 2d / 2d + 1 indexing only stays aligned when rebuilding from the root.
                pointNode.clearPath();
            }
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) {
                    this.children[i][j] = null;
                }
            }
            this.anneal();
        } else {
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) {
                    this.children[i][j].anneal();
                }
            }
        }
    } else {
        // Leaf (or node being rebuilt): pick the first vantage point at random.
        final int firstVantagePointIndex = new Random().nextInt(this.pointNodes.size());
        this.vantagePoint[0] = this.pointNodes.get(firstVantagePointIndex).getPoint();
        this.firstThreshold = this.thresholdSelectionStrategy.selectThreshold(
                this.pointNodes, this.vantagePoint[0], this.distanceFunction);
        final int firstIndexPastThreshold;
        try {
            firstIndexPastThreshold = MVPTreeNode.partitionPoints(
                    this.pointNodes, this.vantagePoint[0], this.firstThreshold, this.distanceFunction);
        } catch (final PartitionException e) {
            this.storeInOneNode();
            return;
        }
        if (this.pointNodes.size() > this.capacity) {
            final List<MVPTreePoint<E>>[] subTreeList = new List[2];
            subTreeList[0] = this.pointNodes.subList(0, firstIndexPastThreshold);
            subTreeList[1] = this.pointNodes.subList(firstIndexPastThreshold, this.pointNodes.size());
            if (subTreeList[0].isEmpty() || subTreeList[1].isEmpty()) {
                // Defensive: both sides must hold points before a second split.
                this.storeInOneNode();
                return;
            }
            // The points can be divided into 2 parts: take the second vantage point
            // from the outer part and try to split each part around it.
            final int secondVantagePointIndex = new Random().nextInt(subTreeList[1].size());
            this.vantagePoint[1] = subTreeList[1].get(secondVantagePointIndex).getPoint();
            final int[] splitPosition = new int[2];
            for (int i = 0; i < 2; i++) {
                this.secondThreshold[i] = this.thresholdSelectionStrategy.selectThreshold(
                        subTreeList[i], this.vantagePoint[1], this.distanceFunction);
                try {
                    splitPosition[i] = MVPTreeNode.partitionPoints(
                            subTreeList[i], this.vantagePoint[1], this.secondThreshold[i], this.distanceFunction);
                } catch (final PartitionException e) {
                    this.storeInOneNode();
                    return;
                }
                if (splitPosition[i] == 0 || splitPosition[i] == subTreeList[i].size()) {
                    // Defensive: the constructor rejects empty point lists.
                    this.storeInOneNode();
                    return;
                }
            }
            // Record both vantage-point distances in every point's path before recursing.
            for (final MVPTreePoint<E> pointNode : this.pointNodes) {
                pointNode.addDistanceToSelf(this.distanceFunction.getDistance(pointNode.getPoint(), this.vantagePoint[0]));
                pointNode.addDistanceToSelf(this.distanceFunction.getDistance(pointNode.getPoint(), this.vantagePoint[1]));
            }
            for (int i = 0; i < 2; i++) {
                this.children[i][0] = new MVPTreeNode<>(subTreeList[i].subList(0, splitPosition[i]),
                        this.distanceFunction, this.thresholdSelectionStrategy, this.capacity, this.maxLevel);
                this.children[i][1] = new MVPTreeNode<>(subTreeList[i].subList(splitPosition[i], subTreeList[i].size()),
                        this.distanceFunction, this.thresholdSelectionStrategy, this.capacity, this.maxLevel);
            }
            this.pointNodes = null;
        } else {
            this.storeInOneNode();
        }
    }
}

// Leaf: keep every point here; the point farthest from vantagePoint[0] becomes vantagePoint[1].
private void storeInOneNode() {
    int maxIndex = 0;
    double maxDistance = this.distanceFunction.getDistance(this.pointNodes.get(0).getPoint(), this.vantagePoint[0]);
    for (int i = 1; i < this.pointNodes.size(); i++) {
        final double curDistance = this.distanceFunction.getDistance(this.pointNodes.get(i).getPoint(), this.vantagePoint[0]);
        if (maxDistance < curDistance) {
            maxDistance = curDistance;
            maxIndex = i;
        }
    }
    this.vantagePoint[1] = this.pointNodes.get(maxIndex).getPoint();
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            this.children[i][j] = null;
        }
    }
}
```

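The construction code calls MVPTreeNode.partitionPoints, which this post does not show. A minimal in-place sketch of what such a helper needs to do, assuming the distance to the current vantage point is supplied as a function; `Partition.partition` is a hypothetical name, and the PartitionException case is not reproduced. Because it only swaps elements, it also works on the subList views passed in during construction, and the vantage point simply stays in the list:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical stand-in for the partitionPoints helper (not the repository's code).
final class Partition {
    private Partition() {}

    /**
     * Reorders points in place so that those with distance <= threshold come first,
     * and returns the index of the first point past the threshold.
     */
    static <T> int partition(List<T> points, ToDoubleFunction<? super T> distanceToVantage, double threshold) {
        int boundary = 0; // points[0, boundary) are within the threshold
        for (int i = 0; i < points.size(); i++) {
            if (distanceToVantage.applyAsDouble(points.get(i)) <= threshold) {
                Collections.swap(points, i, boundary++);
            }
        }
        return boundary;
    }
}
```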

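The original author provides two kinds of queries: find the k points nearest to the query point, and find all points within distance u of the query point.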

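The k-nearest search can reuse the approach from VPtree queries: first search the child node that would contain the query point, then the other child nodes. The pruning test has two parts. First check whether the collector is full; if it is not, any point is accepted. Otherwise, compare the collector's current farthest distance from the query point (for the second query type, a fixed distance) with the smallest possible distance between the query point and the point or point set; a candidate is only examined when that lower bound does not exceed the collector's radius. This check is used at both internal nodes and leaf nodes.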


```java
public void collectNearestNeighbors(final NearestNeighborCollector<P, E> collector, final int depth) {
    final MVPTreePoint<P> queryPoint = collector.getQueryPoint();
    if (this.pointNodes == null) {
        // O1-Q: d(vantagePoint[0], query)
        final double distanceFromFirstVantagePointToQueryPoint =
                this.distanceFunction.getDistance(this.vantagePoint[0], queryPoint.getPoint());
        // O2-Q: d(vantagePoint[1], query)
        final double distanceFromSecondVantagePointToQueryPoint =
                this.distanceFunction.getDistance(this.vantagePoint[1], queryPoint.getPoint());
        // Push this level's distances (indices 2 * depth and 2 * depth + 1).
        queryPoint.addDistanceToSelf(distanceFromFirstVantagePointToQueryPoint);
        queryPoint.addDistanceToSelf(distanceFromSecondVantagePointToQueryPoint);

        // Search the child that would contain the query point first.
        final MVPTreeNode<P, E> index = this.getChildNodeForPoint(queryPoint.getPoint());
        index.collectNearestNeighbors(collector, depth + 1);

        // O1-Q minus O1-S1: lower bound on the distance to the inner side (i = 0);
        // the sign flip gives S1 - (O1-Q) for the outer side (i = 1).
        double basicDistance = distanceFromFirstVantagePointToQueryPoint - this.firstThreshold;
        for (int i = 0; i < 2; i++) {
            if (!collector.isFull() || basicDistance <= collector.getRadius()) {
                // O2-Q minus O2-S2: the same bound for the second vantage point.
                double touchDistance = distanceFromSecondVantagePointToQueryPoint - this.secondThreshold[i];
                for (int j = 0; j < 2; j++) {
                    if (index != this.children[i][j]
                            && (!collector.isFull() || touchDistance <= collector.getRadius())) {
                        this.children[i][j].collectNearestNeighbors(collector, depth + 1);
                    }
                    touchDistance *= -1;
                }
            }
            basicDistance *= -1;
        }
        // Pop this level's distances, higher index first.
        queryPoint.removeDistanceToSelf(depth + depth + 1);
        queryPoint.removeDistanceToSelf(depth + depth);
    } else {
        // Leaf: accept anything while the collector is not full, otherwise filter first.
        for (final MVPTreePoint<E> pointNode : this.pointNodes) {
            if (!collector.isFull()
                    || this.isAbleToInsert(collector.getRadius(), queryPoint, pointNode)) {
                collector.offerPoint(pointNode.getPoint());
            }
        }
    }
}
```

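collectNearestNeighbors relies on a collector whose isFull(), getRadius() and offerPoint() are not shown in this post. A minimal sketch of such a bounded k-nearest collector, assuming the caller passes each candidate's distance (the collector used above takes only the point); `KnnCollector` and its method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical bounded collector keeping the k closest candidates seen so far.
final class KnnCollector<T> {
    private static final class Candidate<U> {
        final U point;
        final double distance;

        Candidate(U point, double distance) {
            this.point = point;
            this.distance = distance;
        }
    }

    private final int k;
    // Max-heap on distance: the head is the farthest of the current candidates.
    private final PriorityQueue<Candidate<T>> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Candidate<T> c) -> c.distance).reversed());

    KnnCollector(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
    }

    boolean isFull() {
        return heap.size() >= k;
    }

    /** Distance of the current k-th nearest candidate; infinite until the collector is full. */
    double getRadius() {
        return isFull() ? heap.peek().distance : Double.POSITIVE_INFINITY;
    }

    /** Adds the candidate, evicting the farthest one if more than k are held. */
    void offer(T point, double distance) {
        heap.add(new Candidate<>(point, distance));
        if (heap.size() > k) {
            heap.poll();
        }
    }

    /** Collected points, nearest first. */
    List<T> sorted() {
        List<Candidate<T>> all = new ArrayList<>(heap);
        all.sort(Comparator.comparingDouble((Candidate<T> c) -> c.distance));
        List<T> out = new ArrayList<>();
        for (Candidate<T> c : all) {
            out.add(c.point);
        }
        return out;
    }
}
```

Until k candidates are held, getRadius() reports infinity, which matches the rule that an unfilled collector accepts any point.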

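The within-distance-u query is the algorithm described in the paper. Its steps resemble the k-nearest search, except that the bound is a fixed value that must be checked at every step, and there is no limit on how many points are collected.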


```java
public void collectAllWithinDistance(final MVPTreePoint<P> queryPoint, final double maxDistance,
                                     final Collection<E> collection, final int depth) {
    if (this.pointNodes == null) {
        final double distanceFromFirstVantagePointToQueryPoint =
                this.distanceFunction.getDistance(this.vantagePoint[0], queryPoint.getPoint());
        final double distanceFromSecondVantagePointToQueryPoint =
                this.distanceFunction.getDistance(this.vantagePoint[1], queryPoint.getPoint());
        queryPoint.addDistanceToSelf(distanceFromFirstVantagePointToQueryPoint);
        queryPoint.addDistanceToSelf(distanceFromSecondVantagePointToQueryPoint);

        // We want to search any of this node's children that intersect with the query region.
        if (distanceFromFirstVantagePointToQueryPoint <= this.firstThreshold + maxDistance) {
            if (distanceFromSecondVantagePointToQueryPoint <= this.secondThreshold[0] + maxDistance) {
                this.children[0][0].collectAllWithinDistance(queryPoint, maxDistance, collection, depth + 1);
            }
            if (distanceFromSecondVantagePointToQueryPoint + maxDistance >= this.secondThreshold[0]) {
                this.children[0][1].collectAllWithinDistance(queryPoint, maxDistance, collection, depth + 1);
            }
        }
        if (distanceFromFirstVantagePointToQueryPoint + maxDistance >= this.firstThreshold) {
            if (distanceFromSecondVantagePointToQueryPoint <= this.secondThreshold[1] + maxDistance) {
                this.children[1][0].collectAllWithinDistance(queryPoint, maxDistance, collection, depth + 1);
            }
            if (distanceFromSecondVantagePointToQueryPoint + maxDistance >= this.secondThreshold[1]) {
                this.children[1][1].collectAllWithinDistance(queryPoint, maxDistance, collection, depth + 1);
            }
        }

        // Pop this level's distances, higher index first.
        queryPoint.removeDistanceToSelf(depth + depth + 1);
        queryPoint.removeDistanceToSelf(depth + depth);
    } else {
        for (final MVPTreePoint<E> pointNode : this.pointNodes) {
            if (this.isAbleToInsert(maxDistance, queryPoint, pointNode)) {
                collection.add(pointNode.getPoint());
            }
        }
    }
}
```

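The four if-conditions in collectAllWithinDistance can be isolated into a small pure function, which makes them easy to check by hand. A sketch with the hypothetical name `RangeChildren.childrenToVisit`, where d1 and d2 are the query's distances to the two vantage points, m1 and m2 are the thresholds, and r is the search radius:

```java
// Hypothetical helper mirroring the child-selection tests of the range query.
final class RangeChildren {
    private RangeChildren() {}

    /** visit[i][j] is false only when the triangle inequality rules out child [i][j]. */
    static boolean[][] childrenToVisit(double d1, double d2, double m1, double[] m2, double r) {
        boolean[][] visit = new boolean[2][2];
        // i = 0: ball of the first vantage point; i = 1: the shell outside it.
        boolean[] firstLevel = {d1 <= m1 + r, d1 + r >= m1};
        for (int i = 0; i < 2; i++) {
            visit[i][0] = firstLevel[i] && d2 <= m2[i] + r;
            visit[i][1] = firstLevel[i] && d2 + r >= m2[i];
        }
        return visit;
    }
}
```

For example, with d1 = 10, m1 = 4 and r = 1, every point inside the first ball is at least 10 - 4 = 6 from the query, so both [0][*] children are skipped.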

Both query types compare precomputed distances, so that comparison is factored out into a single function:


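```java
// Pivot filter: by the triangle inequality |d(Q, Oi) - d(S, Oi)| <= d(Q, S), so if any stored
// pair differs by more than limitDistance, S cannot qualify and the real distance is never computed.
public boolean isAbleToInsert(final double limitDistance,
                              final MVPTreePoint<P> queryPoint,
                              final MVPTreePoint<E> pointNode) {
    // Defensive bound: only compare indices that both paths actually hold.
    final int n = Math.min(queryPoint.size(), pointNode.size());
    for (int i = 0; i < n; i++) {
        final double disOffset = queryPoint.getDistanceToSelf(i) - pointNode.getDistanceToSelf(i);
        if (Math.abs(disOffset) > limitDistance) {
            return false;
        }
    }
    return this.distanceFunction.getDistance(pointNode.getPoint(), queryPoint.getPoint()) <= limitDistance;
}
```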


Other functions also need changes, but none of them is restructured as heavily as these three.

------------------------------------------------------------------------

Code repository: https://coding.net/u/funcfans/p/MVPtree-for-Java/git
