Vantage-point tree

A vantage-point tree, or VP tree is a BSP tree which segregates data in a metric space by choosing a position in the space (the "vantage point") and dividing the data points into two partitions: those that are nearer to the vantage point than a threshold, and those that are not. By repeatedly applying this procedure to partition the data into smaller and smaller sets, a tree data structure is created where neighbors in the tree are likely to be neighbors in the space.[1]

One of its declination is called the multi-vantage point tree, or MVP tree: an abstract data structure for indexing objects from large metric spaces for similarity search queries. It uses more than one point to partition each level.[2]

History

Peter Yianilos claimed that the VP-tree was discovered independently by him (Peter Yianilos) and by Jeffrey Uhlmann.[1] Yet, Uhlmann published this method before Yianilos in 1991.[3] Uhlmann called the data structure a metric tree, the name VP-tree was proposed by Yianilos. Vantage point trees have been generalized to non-metric spaces using Bregman divergences by Nielsen et al.[4]


This iterative partitioning process is similar to that of a k-d tree, but uses circular (or spherical, hyperspherical, etc.) rather than rectilinear partitions. In 2D Euclidean space, this can be visualized as a series of circles segregating the data.

The VP tree is particularly useful in dividing data in a non-standard metric space into a BSP tree.

Understanding a VP tree

The way a VP tree stores data can be represented by a circle.[5] First, understand that each node of this tree contains an input point and a radius. All the left children of a given node are the points inside the circle and all the right children of a given node are outside of the circle. The tree itself does not need to know any other information about what is being stored. All it needs is the distance function that satisfies the properties of the metric space.[5] Just imagine a circle with a radius. The left children are all located inside the circle and the right children are located outside the circle.

Searching through a VP tree

Suppose there is a need to find the two nearest targets from a given point (The point will be placed relatively close to distance). Since there are no points yet, it is assumed that the middle point (center) is the closest target. Now a variable is needed to keep track of the distance X (This will change if another distance is greater). To determine whether we go to left or right child will depend on the given point.[5] If the point is closer to the radius than the outer shell, search the left child. Otherwise, search the right child. Once the point (the neighbor) is found, the variable will be updated because the distance has increased.

From here, all the points within the radius have been considered. To complete the search, we will now find the closest point outside the radius (right child) and determine the second neighbor. The search method will be the same, but it will be for the right child.[5]

Advantages of a VP tree

  1. Instead of inferring multidimensional points for domain before the index being built, we build the index directly based on the distance.[5] Doing this, avoids pre-processing steps.
  2. Updating a VP tree is relatively easy compared to the fast-map approach. For fast maps, after inserting or deleting data, there will come a time when fast-map will have to rescan itself. That takes up too much time and it is unclear to know when the rescanning will start.
  3. Distance based methods are flexible. It is “able to index objects that are represented as feature vectors of a fixed number of dimensions."[5]

Implementation examples

  1. In Python
  2. In C
  3. In Java
  4. In Java (alternative implementation)
  5. In JavaScript
  6. In Ruby

References

  1. 1 2 Yianilos (1993). "Data structures and algorithms for nearest neighbor search in general metric spaces". Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms. Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 311–321. pny93. Retrieved 2008-08-22.
  2. CiteSeerX — Indexing Large Metric Spaces For Similarity Search Queries
  3. Uhlmann, Jeffrey (1991). "Satisfying General Proximity/Similarity Queries with Metric Trees". Information Processing Letters 40 (4). doi:10.1016/0020-0190(91)90074-r.
  4. Nielsen, Frank (2009). "Bregman vantage point trees for efficient nearest Neighbor Queries". Proceedings of Multimedia and Exp (ICME). IEEE. pp. 878–881.
  5. 1 2 3 4 5 6 Fu, Ada Wai-chee; Polly Mei-shuen Chan; Yin-Ling Cheung; Yiu Sang Moon (2000). "Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances". The VLDB Journal — The International Journal on Very Large Data Bases. Springer-Verlag New York, Inc. Secaucus, NJ, USA. pp. 154–173. vp. Retrieved 2012-10-02.

External links

This article is issued from Wikipedia - version of the Wednesday, March 30, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.