M-Tree for Similarity Search

My current area of research is similarity search. As with ordinary search, we need suitable data structures to make similarity search effective and efficient, and they should support at least range queries and k-nearest-neighbor (KNN) queries. In this essay, I would like to summarize my recent study of the M-Tree, which is a kind of metric tree (it relies only on the relative distances between objects).

First, let us look at an example taken from the book "Similarity Search: The Metric Space Approach" by Pavel Zezula, Giuseppe Amato, Vlastislav Dohnal and Michal Batko.


The example lives in a two-dimensional space, and the objects O1 to O11 can be viewed as points in a vector space. Euclidean distance is commonly used to compute the distances between them. In this data structure, each entry of an internal node (including the root) stores a radius that bounds the region it covers, together with the distance between its own object and its parent object (0 for entries in the root). For entries in leaf nodes, the radius is always 0.
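To make the entry layout concrete, here is a minimal Python sketch. The names `Entry`, `Node`, `make_leaf_entry` and `make_routing_entry` are my own illustrative choices and are not taken from the book or from any M-Tree library. The radius is simply set to the largest distance from the routing object to its members, which is an assumption that only holds for this one-level toy case.

```python
from dataclasses import dataclass, field
from math import dist  # Euclidean distance between two points (Python 3.8+)
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    is_leaf: bool
    entries: List["Entry"] = field(default_factory=list)


@dataclass
class Entry:
    obj: Point                    # routing object (internal node) or data object (leaf)
    radius: float = 0.0           # radius of the covered region; always 0 in a leaf
    dist_to_parent: float = 0.0   # distance to the parent object; 0 for root entries
    child: Optional[Node] = None  # covered subtree; None for leaf entries


def make_leaf_entry(obj: Point, parent_obj: Point) -> Entry:
    """Leaf entry: radius stays 0, only the distance to the parent is stored."""
    return Entry(obj=obj, dist_to_parent=dist(obj, parent_obj))


def make_routing_entry(obj: Point, members: List[Point]) -> Entry:
    """Hypothetical helper: a root-level routing entry over a single leaf node."""
    leaf = Node(is_leaf=True, entries=[make_leaf_entry(m, obj) for m in members])
    radius = max(e.dist_to_parent for e in leaf.entries)
    return Entry(obj=obj, radius=radius, dist_to_parent=0.0, child=leaf)


entry = make_routing_entry((0.0, 0.0), [(3.0, 4.0), (1.0, 0.0)])
print(entry.radius)
print([e.dist_to_parent for e in entry.child.entries])
```

The point of storing `dist_to_parent` in every entry is that it is computed once at insertion time and can be reused by every later query.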

The features of the M-Tree can be summarized as follows:

1. Balanced.

2. All of its objects are listed in the leaf nodes.

3. Dynamic, meaning that objects can be inserted without reorganizing the whole tree.

4. Most importantly, it is designed for secondary memory, so it can handle large data sets.

To further improve performance, the M-Tree also applies the triangle inequality to reduce the number of distance computations, because computing distances in a high-dimensional space is rather time-consuming. This pruning relies entirely on the distances already stored in the entries, so making full use of them is what yields the saving.
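The pruning idea can be sketched as a small range query. This is a simplified illustration under my own assumptions, not the reference implementation: `Entry`, `range_query` and the `stats` counter are hypothetical names, and the tree is built by hand. For a query Q with radius r_q, an entry with object O, radius r and parent object P can be skipped without computing d(O, Q) whenever |d(P, Q) - d(O, P)| > r_q + r, because the triangle inequality then guarantees d(O, Q) > r_q + r.

```python
from math import dist  # Euclidean distance (Python 3.8+)


class Entry:
    def __init__(self, obj, radius=0.0, dist_to_parent=0.0, child=None):
        self.obj = obj
        self.radius = radius                  # 0 for leaf entries
        self.dist_to_parent = dist_to_parent  # stored at insertion time
        self.child = child                    # list of entries, or None for a leaf entry


def range_query(entries, query, r_q, d_parent_q=None, stats=None):
    """Collect all objects within r_q of query; stats counts real distance calls."""
    if stats is None:
        stats = {"computed": 0}
    result = []
    for e in entries:
        # Cheap test: uses only stored distances, no new distance computation.
        if d_parent_q is not None and abs(d_parent_q - e.dist_to_parent) > r_q + e.radius:
            continue
        d = dist(e.obj, query)
        stats["computed"] += 1
        if d > r_q + e.radius:
            continue  # the region of this entry cannot intersect the query ball
        if e.child is None:
            result.append(e.obj)
        else:
            result.extend(range_query(e.child, query, r_q, d, stats))
    return result


# Hand-built two-level tree: two routing entries, three leaf objects each.
a = Entry((0.0, 0.0), radius=2.0, child=[
    Entry((0.0, 0.0), dist_to_parent=0.0),
    Entry((1.0, 0.0), dist_to_parent=1.0),
    Entry((0.0, 2.0), dist_to_parent=2.0),
])
b = Entry((10.0, 0.0), radius=1.5, child=[
    Entry((10.0, 0.0), dist_to_parent=0.0),
    Entry((11.0, 0.0), dist_to_parent=1.0),
    Entry((10.0, 1.5), dist_to_parent=1.5),
])

stats = {"computed": 0}
found = range_query([a, b], (0.5, 0.0), 1.0, stats=stats)
print(sorted(found), stats["computed"])
```

In this toy tree the query needs 4 distance computations, whereas a linear scan over the six stored objects would need 6: the object (0.0, 2.0) is rejected by the cheap test alone, and the whole subtree under `b` is rejected after a single computation.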

Note that Euclidean distance is not the only way to measure distance. A distance function can be used in an M-Tree only if it satisfies non-negativity, symmetry and the triangle inequality.
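As a quick illustration of these requirements, the sketch below checks them on a small sample for two functions: edit (Levenshtein) distance on strings, which is a well-known metric, and squared Euclidean distance on numbers, which breaks the triangle inequality. The helper name `violates_metric_axioms` is hypothetical, and a sample-based check can only find counterexamples; it cannot prove that a function is a metric.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def squared_euclidean(x: float, y: float) -> float:
    return (x - y) ** 2


def violates_metric_axioms(d, sample):
    """Return the name of the first violated property on the sample, or None."""
    for x, y in product(sample, repeat=2):
        if d(x, y) < 0:
            return "non-negativity"
        if d(x, y) != d(y, x):
            return "symmetry"
    for x, y, z in product(sample, repeat=3):
        if d(x, z) > d(x, y) + d(y, z):
            return "triangle inequality"
    return None


print(violates_metric_axioms(levenshtein, ["kitten", "sitting", "mitten", ""]))
print(violates_metric_axioms(squared_euclidean, [0.0, 1.0, 2.0]))
```

The second call fails because the squared distance from 0 to 2 is 4, while going through 1 costs only 1 + 1 = 2. A tree that prunes with the triangle inequality could therefore discard valid results if it were given such a function.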

Several useful materials are listed below:

1. http://www-db.deis.unibo.it/Mtree/ (the most relevant items from it are listed below)

2. P. Ciaccia, M. Patella, F. Rabitti, and P. Zezula. Indexing metric spaces with M-tree. In Atti del Quinto Convegno Nazionale SEBD, Verona, Italy, June 1997.

3. P. Ciaccia and M. Patella. Bulk loading the M-tree. In Proceedings of the 9th Australasian Database Conference (ADC'98), Perth, Australia, February 1998.

4. M. Patella. Similarity Search in Multimedia Databases. PhD thesis, Dipartimento di Elettronica Informatica e Sistemistica, Università degli Studi di Bologna, Bologna, Italy, February 1999.

