Graph edit distance
In mathematics and computer science, graph edit distance (GED) is a measure of similarity (or dissimilarity) between two graphs. The concept of graph edit distance was first formalized mathematically by Alberto Sanfeliu and King-Sun Fu in 1983.[1] A major application of graph edit distance is in inexact graph matching, such as error-tolerant pattern recognition in machine learning.[2]
The graph edit distance between two graphs is related to the string edit distance between strings. With the interpretation of strings as connected directed acyclic graphs of maximum degree one, classical definitions of edit distance such as Levenshtein distance,[3][4] Hamming distance[5] and Jaro–Winkler distance may be interpreted as graph edit distances between suitably constrained graphs. Likewise, graph edit distance is also a generalization of tree edit distance between rooted trees.[6][7][8][9]
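The string case can be made concrete with a short Python sketch. The helper names `string_to_path_graph` and `levenshtein` are chosen for this illustration only. The first function encodes a string as a labeled path graph, and the second computes the classical Levenshtein distance by dynamic programming with unit costs. The sketch does not enforce the constraints under which the two notions coincide; it only shows the two objects side by side.

```python
def string_to_path_graph(s):
    """Encode a string as a labeled path: vertex i carries s[i], edge i -> i+1."""
    labels = {i: ch for i, ch in enumerate(s)}
    edges = {(i, i + 1) for i in range(len(s) - 1)}
    return labels, edges


def levenshtein(a, b):
    """Unit-cost edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


labels, edges = string_to_path_graph("abc")
distance = levenshtein("kitten", "sitting")
```

Each character becomes a vertex label, so a character substitution corresponds to a vertex substitution, while a character insertion or deletion corresponds to a vertex insertion or deletion together with the adjustment of the neighbouring edges.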
Formal definitions and properties
The mathematical definition of graph edit distance depends on the definitions of the graphs over which it is defined, i.e. whether and how the vertices and edges of the graph are labeled and whether the edges are directed. Generally, given a set of graph edit operations (also known as elementary graph operations), the graph edit distance between two graphs $g_1$ and $g_2$, written as $GED(g_1, g_2)$, can be defined as
$$GED(g_1, g_2) = \min_{(e_1, \ldots, e_k) \in \mathcal{P}(g_1, g_2)} \sum_{i=1}^{k} c(e_i)$$
where $\mathcal{P}(g_1, g_2)$ denotes the set of edit paths transforming $g_1$ into (a graph isomorphic to) $g_2$ and $c(e) \ge 0$ is the cost of each graph edit operation $e$.
The set of elementary graph edit operators typically includes:
- vertex insertion to introduce a single new labeled vertex to a graph.
- vertex deletion to remove a single (often disconnected) vertex from a graph.
- vertex substitution to change the label (or color) of a given vertex.
- edge insertion to introduce a new colored edge between a pair of vertices.
- edge deletion to remove a single edge between a pair of vertices.
- edge substitution to change the label (or color) of a given edge.
Additional but less common operators include edge splitting, which introduces a new vertex into an edge (also creating a new edge), and edge contraction, which eliminates vertices of degree two between edges (of the same color). Although such complex edit operators can be defined in terms of more elementary transformations, their use allows finer parameterization of the cost function when the operator is cheaper than the sum of its constituents.
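The effect of a cheaper composite operator on the cost of an edit path can be illustrated with a minimal Python sketch. The `EditOp` class, the operation names and all numeric costs below are hypothetical values chosen for illustration and are not taken from any cited work. The example splits an edge (u, v) with a new vertex w in two ways: once with elementary operations only, and once with a single edge-splitting operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOp:
    kind: str      # name of the edit operation
    target: tuple  # the vertex or edge the operation acts on


# Hypothetical cost table: elementary operations cost 1.0,
# while the composite edge-splitting operator is priced lower.
COSTS = {
    "vertex_insertion": 1.0,
    "vertex_deletion": 1.0,
    "vertex_substitution": 1.0,
    "edge_insertion": 1.0,
    "edge_deletion": 1.0,
    "edge_substitution": 1.0,
    "edge_splitting": 1.5,
}


def path_cost(path, costs=COSTS):
    """Cost of an edit path: the sum of the costs of its operations."""
    return sum(costs[op.kind] for op in path)


# Splitting edge (u, v) with elementary operations only.
elementary_path = [
    EditOp("edge_deletion", ("u", "v")),
    EditOp("vertex_insertion", ("w",)),
    EditOp("edge_insertion", ("u", "w")),
    EditOp("edge_insertion", ("w", "v")),
]

# The same transformation expressed as one composite operation.
composite_path = [EditOp("edge_splitting", ("u", "v", "w"))]

elementary_cost = path_cost(elementary_path)
composite_cost = path_cost(composite_path)
```

Under these assumed costs the elementary path costs 4.0 and the composite path costs 1.5, so a minimum-cost edit path would prefer the composite operator wherever it applies.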
Applications
Graph edit distance finds applications in handwriting recognition,[10] fingerprint recognition[11] and cheminformatics.[12]
Algorithms
Exact algorithms for computing the graph edit distance between a pair of graphs typically transform the problem into one of finding the minimum cost edit path between the two graphs. The computation of the optimal edit path is cast as a pathfinding search or shortest path problem, often implemented as an A* search algorithm.
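The search formulation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an implementation from the cited literature: graphs are simple, undirected and vertex-labeled, every elementary operation costs 1, and the heuristic is identically zero, so the A* search reduces to uniform-cost search. The function name `exact_ged` and the data layout are hypothetical. Each search state assigns a prefix of the vertices of the first graph either to distinct vertices of the second graph or to deletion.

```python
import heapq
from itertools import count


def exact_ged(labels1, edges1, labels2, edges2):
    """Exact unit-cost GED for simple undirected vertex-labeled graphs.

    labels*: dict mapping vertex id -> label (ids must not be None).
    edges*:  set of frozenset({u, v}) pairs, no self-loops.
    """
    order = list(labels1)  # fixed processing order for vertices of graph 1
    tie = count()          # tie-breaker so the heap never compares mappings
    # A state is (cost so far, tie, images of order[:k], finished flag).
    heap = [(0, next(tie), (), False)]
    while True:
        cost, _, images, finished = heapq.heappop(heap)
        if finished:
            return cost  # costs are non-negative, so this state is optimal
        used = {v for v in images if v is not None}
        k = len(images)
        if k == len(order):
            # Insert every unmatched vertex of graph 2 and its incident edges.
            extra = len(labels2) - len(used)
            extra += sum(1 for e in edges2 if not e <= used)
            heapq.heappush(heap, (cost + extra, next(tie), images, True))
            continue
        u = order[k]
        for v in [x for x in labels2 if x not in used] + [None]:
            # Vertex cost: 1 for a deletion or a label change, 0 for a match.
            step = 1 if v is None or labels1[u] != labels2[v] else 0
            # Edge cost against every vertex that is already processed.
            for u_prev, v_prev in zip(order, images):
                in1 = frozenset((u, u_prev)) in edges1
                in2 = (v is not None and v_prev is not None
                       and frozenset((v, v_prev)) in edges2)
                step += in1 != in2  # one edge deletion or insertion
            heapq.heappush(heap, (cost + step, next(tie), images + (v,), False))


# Path a-b-c versus path a-b: delete edge {2, 3}, then delete vertex 3.
path3 = ({1: "a", 2: "b", 3: "c"}, {frozenset((1, 2)), frozenset((2, 3))})
path2 = ({"x": "a", "y": "b"}, {frozenset(("x", "y"))})
distance = exact_ged(*path3, *path2)
```

The number of states grows exponentially with the number of vertices, so a sketch of this kind is only practical for small graphs. A non-trivial admissible heuristic that estimates the cost of the remaining assignments would let the search discard many partial mappings earlier.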
In addition to exact algorithms, a number of efficient approximation algorithms are also known.[13][14]
References
- ↑ Sanfeliu, Alberto; Fu, King-Sun (1983). "A distance measure between attributed relational graphs for pattern recognition". IEEE Transactions on Systems, Man and Cybernetics 13 (3): 353–363. doi:10.1109/TSMC.1983.6313167.
- ↑ Gao, Xinbo; Xiao, Bing; Tao, Dacheng; Li, Xuelong (2010). "A survey of graph edit distance". Pattern Analysis and Applications 13: 113–129. doi:10.1007/s10044-008-0141-y.
- ↑ Влади́мир И. Левенштейн (1965). Двоичные коды с исправлением выпадений, вставок и замещений символов [Binary codes capable of correcting deletions, insertions, and reversals]. Доклады Академии Наук СССР (in Russian) 163 (4): 845–848.
- ↑ Levenshtein, Vladimir I. (February 1966). "Binary codes capable of correcting deletions, insertions, and reversals". Soviet Physics Doklady 10 (8): 707–710.
- ↑ Hamming, Richard W. (1950). "Error detecting and error correcting codes" (PDF). Bell System Technical Journal 29 (2): 147–160. doi:10.1002/j.1538-7305.1950.tb00463.x. MR 0035935.
- ↑ Shasha, D; Zhang, K (1989). "Simple fast algorithms for the editing distance between trees and related problems". SIAM J. Comput. 18 (6): 1245–1262. doi:10.1137/0218082.
- ↑ Zhang, K (1996). "A constrained edit distance between unordered labeled trees". Algorithmica 15 (3): 205–222. doi:10.1007/BF01975866.
- ↑ Bille, P (2005). "A survey on tree edit distance and related problems". Theor. Comput. Sci. 337 (1-3): 22–34. doi:10.1016/j.tcs.2004.12.030.
- ↑ Demaine, Erik D.; Mozes, Shay; Rossman, Benjamin; Weimann, Oren (2010). "An optimal decomposition algorithm for tree edit distance". ACM Transactions on Algorithms 6 (1): A2. doi:10.1145/1644015.1644017. MR 2654906.
- ↑ Fischer, Andreas; Suen, Ching Y.; Frinken, Volkmar; Riesen, Kaspar; Bunke, Horst (2013), "A Fast Matching Algorithm for Graph-Based Handwriting Recognition", Graph-Based Representations in Pattern Recognition, Lecture Notes in Computer Science 7877, pp. 194–203, doi:10.1007/978-3-642-38221-5_21, ISBN 978-3-642-38220-8
- ↑ Neuhaus, Michel; Bunke, Horst (2005), "A Graph Matching Based Approach to Fingerprint Classification using Directional Variance", Audio- and Video-Based Biometric Person Authentication, Lecture Notes in Computer Science 3546, pp. 191–200, doi:10.1007/11527923_20, ISBN 978-3-540-27887-0
- ↑ Birchall, Kristian; Gillet, Valerie J.; Harper, Gavin; Pickett, Stephen D. (Jan 2006). "Training Similarity Measures for Specific Activities: Application to Reduced Graphs". Journal of Chemical Information and Modeling 46 (2): 557–586. doi:10.1021/ci050465e.
- ↑ Neuhaus, Michel; Bunke, Horst (Nov 2007). Bridging the Gap between Graph Edit Distance and Kernel Machines. Machine Perception and Artificial Intelligence 68. World Scientific. ISBN 978-9812708175.
- ↑ Riesen, Kaspar (Feb 2016). Structural Pattern Recognition with Graph Edit Distance: Approximation Algorithms and Applications. Advances in Computer Vision and Pattern Recognition. Springer. ISBN 978-3319272511.