Reducing the Overestimation of Graph Edit Distance Approximation

Time 2014
FundingSwiss National Science Foundation
ResearchersKaspar Riesen

Abstract: Graph based representation has found widespread applications in science and engineering during the last decades. One of the most flexible methods for graph dissimilarity computation is the edit distance of graphs (GED). The applicant of the present proposal recently developed a two step procedure for fast approximations of GED . First, an optimal node mapping m is computed by means of a fast bipartite optimization procedure. Second, the GED corresponding to m is derived on the graphs. It has been empirically verified that this framework approximates small distances with a high precision while larger distances are often overestimated. This is due to the first step, which considers the nodes and their local structure only. Therefore, when the actual edit distance is derived from mapping m (2nd step), often additional edge edit operations have to be performed (which would not be necessary in an optimal graph matching). The major objective of the present project is to omit these additional edge operations and thus substantially reduce the overestimation in our framework.
In fact, almost all of the node mappings in m correspond to node mappings that are also carried out by exact GED computation algorithms. Conversely, only very few node mappings of the first step are incorrect and cause additional edge operations on the graphs in the second step. This observation motivates the definition of t variations of the initial mapping m from which t different approximations of the GED can be derived. Given that the t variations of m are small but consider crucial node mappings, it is likely to further improve the approximation quality. One of the most important open research questions in this project is how to define the constructive variations of the initial mapping m. Among other strategies we will adopt sophisticated node query strategies recently developed for this crucial task.
The overall question of the present project is whether this novel approximation scheme is beneficial in terms of accuracy and how the planned extension affects the time complexity of the complete framework. In order to answer these questions, exhaustive experimental evaluations on several graph data sets have to be carried out.