# Current projects

**Novel State-of-the-Art Graph Matching Algorithms (2020 – 2024)**

Swiss National Science Foundation*Abstract:* *Due to fast developments in both storage media and data acquisition, we observe rapidly increasing amounts of data available in diverse areas in both science and industry. Simultaneously, we observe that in many applications the underlying data is inherently complex, making graphs the most useful and adequate data structure available to date. These two developments evoke the need for ongoing research of robust and efficient methods that assist humans in understanding and handling their pools of big sets of graph data.A large amount of graph based methods for pattern recognition and related fields have been proposed. One of these methods is graph edit distance – a powerful and flexible graph dissimilarity measure and actually one of the main subjects of this proposal. Regarding graph edit distance (or more generally graph matching) we observe two substantial gaps in research that we aim to research and bridge. Formally, within the present project we will research (a) encodings of matching information in a novel data structure to formalize the stable cores of specific classes by means of graphs. The rationale of this matching-graph representation is that it can be beneficial to focus on stable/important parts of graphs during algorithmic comparisons (rather than on complete graphs). (b) hierarchical graph representations in conjunction with linear time graph embedding. This procedure is motivated by the fact that hierarchical representations (including fast and expressive graph embeddings) can be exploited in filter-and-verify strategies in order to substantially speed up and improve the matching processes.By verifying both hypotheses we plan to make significant advances in the field of structural pattern recognition and establishing novel paradigms that go beyond the current understanding. In particular, the overall objective is the development and research of novel, robust graph edit distance methods that outperform the current state-of-the-art in graph matching on existing and novel data sets stemming from different real world scenarios. Hence, the proposed project involves both research on fundamental algorithms and solving concrete problems in applications. Last but not least, we aim at thorough comparisons of structural methods with recent deep learning methods in order to gain insights on where graph based methods actually rank to date.*

# Past Projects

**Knowledge via Graph Reasoning (2018 – 2020)**

Commission for Technology and Innovation*Abstract: The idea of the present project is to research novel algorithms for interacting with graph databases. In particular, we aim at implementing a next generation search and recommendation engine on top of a graph representation. For decades, researchers and companies have tried to accommodate connected, semi structured data into relational databases. But whereas relational databases are perfectly suited to model rigid and tabular structures, they fail when attempting to model the ad hoc, exceptional relationships that dynamically occur in the real world. Actually, graph databases can overcome this and other major limitations. The first focus is the development of algorithmic procedures that build an interface from a graph database to diverse data repositories covering disjoint and overlapping domains (big data with respect to variety/volume). This interface will automatically carry out both extractions of information and subsequent translations into nodes and edges of the graph. The second building block is the development of a novel paradigm for searching and browsing knowledge. As our data representation is based on a native graph, we are able to implement several innovative features. First, by means of explicit/implicit relationships in the graph model, we can answer non-trivial queries and find relevant connections in the first place. Second, graphs offer a natural way of representing rich content, which in turn increases the possibility for exploratory interactions. Third, topological descriptors allow dynamic rankings of nodes and empower us to find relevant data with high precision. Finally, the schema free nature of graph databases allows for dynamic evolvements of the data basis.*

**14 Weeks of Java – A Novel Approach for Teaching Java (2018)**Hasler Foundation

*Abstract: Understanding the basic concepts of programming is hardly possible without actually programming. Likewise, producing source code is difficult or actually impossible without a fundamental understanding of programming. This vicious circle makes learning to program to a complex and often frustrating task. The aim of the present project is to develop and implement a novel approach for learning an object oriented programming language, viz. Java, during a one semester course with a special focus for non-technical oriented students. The major deliverable of the present project is a new textbook ready to be submitted to an international publisher including widespread accompanying material (in particular, video tutorials to enable students to imitate software development tasks and potentially help to break the vicious circle defined above).*

**Graph Based Signature Verification (2016 – 2019)**Swiss National Science Foundation

*Abstract: Handwritten signatures are widely used and well-accepted biometrics for personal authentication. Based on only a few genuine specimens, signature verification is a challenging task even for humans. Automatic verification systems are still far from human performance but their accuracy has improved significantly in the past decade, making it possible to rely on machines in some restricted cases or to support human experts.*

Most of the current methods for automatic signature verification are based on feature vector representation and statistical classification, either taking into account local or global features. Known limitations of this approach include the inability to capture the global structure of the signatures and the relations between their subparts in a standardized way. Graph based representation and structural matching can overcome this limitation by providing a more intuitive, powerful, and flexible mechanism for formally representing and comparing handwritten signatures.

However, one of the main challenges in graph based pattern recognition is the high computational complexity involved in structural matching. Recently, we have introduced a promising approximation framework for graph edit distance based on substructure assignment. This framework is able to compare arbitrary graphs efficiently with cubic or, more recently, even quadratic time complexity, which makes graph matching applicable to a wide range of real-world pattern recognition problems.

Based on this efficient graph matching framework, we propose in this project to explore the use of graphs and the potential benefits of structural pattern recognition methods for signature verification. First, we will derive graph models for signatures and adapt the matching framework to signature verification. In order to further improve the verification accuracy, we will also attempt to embed signature graphs in vector spaces, which allows statistical classification, and the combination of multiple classifiers profiting from the complementary properties of the statistical and structural approach.

A central aspect of the envisaged structural verification system will be to include a signature stability model that distinguishes stable from variable subparts based on the analysis of genuine reference samples. Besides potential improvements in the verification accuracy, such a stability model provides a novel graph based interpretation of signatures, which can be valuable for supporting human experts.

Most of the current methods for automatic signature verification are based on feature vector representation and statistical classification, either taking into account local or global features. Known limitations of this approach include the inability to capture the global structure of the signatures and the relations between their subparts in a standardized way. Graph based representation and structural matching can overcome this limitation by providing a more intuitive, powerful, and flexible mechanism for formally representing and comparing handwritten signatures.

However, one of the main challenges in graph based pattern recognition is the high computational complexity involved in structural matching. Recently, we have introduced a promising approximation framework for graph edit distance based on substructure assignment. This framework is able to compare arbitrary graphs efficiently with cubic or, more recently, even quadratic time complexity, which makes graph matching applicable to a wide range of real-world pattern recognition problems.

Based on this efficient graph matching framework, we propose in this project to explore the use of graphs and the potential benefits of structural pattern recognition methods for signature verification. First, we will derive graph models for signatures and adapt the matching framework to signature verification. In order to further improve the verification accuracy, we will also attempt to embed signature graphs in vector spaces, which allows statistical classification, and the combination of multiple classifiers profiting from the complementary properties of the statistical and structural approach.

A central aspect of the envisaged structural verification system will be to include a signature stability model that distinguishes stable from variable subparts based on the analysis of genuine reference samples. Besides potential improvements in the verification accuracy, such a stability model provides a novel graph based interpretation of signatures, which can be valuable for supporting human experts.

**User Authentication based on Freehand Sketches (2015 – 2017)**Commission for Technology and Innovation

*Abstract: The vast majority of user authentication in digital applications is based on alphanumeric passwords. Yet, due to severe problems that might arise with this approach, various efforts have been made in the last decade to replace this authentication paradigm. One candidate for the prospective paradigm shift might be found in the field of graphical passwords. The present project introduces and researches a novel framework for user authentication based on freehand sketches. The basic idea is that during the registration phase a user draws an arbitrary sketch in a specific drawing canvas (rather than typing a password). Registered users can then be authenticated whenever they are able to reproduce their personal sketch with sufficient precision. The major challenge of such a system is twofold. First, it has to provide a certain degree of error-tolerance such that the authentication of genuine users can be smoothly accomplished. Secondly, the system should detect even subtle forgeries and reject possible intruders. The main contributions of the present projects are as follows. First, we formally represent the underlying sketches by means of strings and present a general authentication algorithm that is based on structural pattern recognition. Second, we present a novel cost model that is particularly useful in conjunction with string matching. Third, by means of an exhaustive empirical investigation using both random and skilled forgeries (stemming from several hundreds of users) we empirically confirm the feasibility of this particular authentication framework in a real world scenario.*

**Graph Based Keyword Spotting in Handwritten Historical Documents (2015 – 2018)**Hasler Foundation

*Abstract: Many libraries all around the world have started digitizing their most valuable old handwritings in order to preserve world’s cultural heritage. The large number of available handwritten document images evoked the need to make them amenable to searching and browsing. Yet, the automatic transcription of text images is still a widely unsolved problem (especially for degraded historical manuscripts). Therefore, keyword spotting (KWS) has been proposed instead of a complete transcriptions. KWS refers to the process of retrieving all instances of a given keyword or a key phrase from a document. A large variety of algorithms have been developed for KWS during the last two decades. Yet, graph based representations and graph matching have been very rarely used for this specific task. Most probably this is due to well known problems arising with graphs in the field of unconstrained handwriting recognition. Yet, KWS is not necessarily based on handwriting recognition. In fact, it turns out that the paradigm of graph matching is able to meet the requirements of KWS. That is, through the representation of isolated words with graphs, the search and retrieval in documents can be regarded as matching of an input graph (keyword) with a large set of various graphs or with one large graph (document). The major objective of the present project is to develop specific graph representations, graph matching technologies as well as novel graph embedding and kernel techniques within the field of graph based KWS. The main focus of the project is to get a deep understanding of the advantages and limitations of graph based methods in the field of KWS. For testing our novel algorithms, two historical documents will be used, viz. the George Washington and the Parcival Manuscript. These historical documents are well known in the community of document analysis, and moreover, benchmark tests for KWS are available on them.*

**Reducing the Overestimation of Graph Edit Distance Approximation (2014)**Swiss National Science Foundation (SNSF)

*Abstract: Graph based representation has found widespread applications in science and engineering during the last decades. One of the most flexible methods for graph dissimilarity computation is the edit distance of graphs (GED). The applicant of the present proposal recently developed a two step procedure for fast approximations of GED . First, an optimal node mapping m is computed by means of a fast bipartite optimization procedure. Second, the GED corresponding to m is derived on the graphs. It has been empirically verified that this framework approximates small distances with a high precision while larger distances are often overestimated. This is due to the first step, which considers the nodes and their local structure only. Therefore, when the actual edit distance is derived from mapping m (2nd step), often additional edge edit operations have to be performed (which would not be necessary in an optimal graph matching). The major objective of the present project is to omit these additional edge operations and thus substantially reduce the overestimation in our framework.*

In fact, almost all of the node mappings in m correspond to node mappings that are also carried out by exact GED computation algorithms. Conversely, only very few node mappings of the first step are incorrect and cause additional edge operations on the graphs in the second step. This observation motivates the definition of t variations of the initial mapping m from which t different approximations of the GED can be derived. Given that the t variations of m are small but consider crucial node mappings, it is likely to further improve the approximation quality. One of the most important open research questions in this project is how to define the constructive variations of the initial mapping m. Among other strategies we will adopt sophisticated node query strategies recently developed for this crucial task.

The overall question of the present project is whether this novel approximation scheme is beneficial in terms of accuracy and how the planned extension affects the time complexity of the complete framework. In order to answer these questions, exhaustive experimental evaluations on several graph data sets have to be carried out.

In fact, almost all of the node mappings in m correspond to node mappings that are also carried out by exact GED computation algorithms. Conversely, only very few node mappings of the first step are incorrect and cause additional edge operations on the graphs in the second step. This observation motivates the definition of t variations of the initial mapping m from which t different approximations of the GED can be derived. Given that the t variations of m are small but consider crucial node mappings, it is likely to further improve the approximation quality. One of the most important open research questions in this project is how to define the constructive variations of the initial mapping m. Among other strategies we will adopt sophisticated node query strategies recently developed for this crucial task.

The overall question of the present project is whether this novel approximation scheme is beneficial in terms of accuracy and how the planned extension affects the time complexity of the complete framework. In order to answer these questions, exhaustive experimental evaluations on several graph data sets have to be carried out.

**Improving Graph Edit Distance (2013 – 2016)**Hasler Foundation

*Abstract: One of the most flexible methods for graph dissimilarity computation is based on the edit distance of graphs. Yet, the computation of graph edit distance is known to be exponential in the number of nodes. The applicant of the present project developed a novel approximate graph edit distance framework recently. The basic idea is that on optimal assignments of the nodes of both graphs, the edit distance is eventually derived. It has been empirically verified that our framework approximates small distances with a high precision while relatively large distances are often overestimated. The reason for this overestimation is that our procedure considers the nodes and their local structure only in the initial node assignment. The major objective of the present project is to substantially reduce this overestimation. Three lines of research are planned to meet this goal The first line of research considers the integration of topological descriptors in the node labels in order to improve the initial matching process. There is eligible confidence that if this information is considered in the initial node matching a reduction of overestimations can be expected. The second line of research pursues the idea of matching subgraphs rather than single nodes in order to prevent unnecessary edge operations in the derived edit distance computation. To this end, the graphs under consideration are partitioned with standard graph clustering algorithms. Given that the subgraphs are densely structured by means of edges, this procedure substantially increases the influence of edge structure in the initial node mapping. Thus, with this extension it is likely to improve our matching precision and reduce the overestimation. The third line of research consists of answering several theoretical questions about our framework which are still open. Answers to these question potentially lead to heuristics for further reductions of the overestimations.*

**Development of a Graph Matching Toolkit (2012)**Hasler Foundation

*Abstract: Graph based data structures have found widespread applications in science and engineering during the past years. The applicant of the present proposal developed a novel approximate graph distance framework during his PhD, allowing the computation of graph distances on large graphs in a substantial faster way than traditional methods. Though the algorithms for graph distance computation are available in principle, they are not ready to use for other researchers. All adoptations of the algorithm to a given problem domain as well as the specification of the available options have to be made directly in the source code. This is cumbersome and a severe restriction of potential users. The main objective of this project is to build a software tool based on the established algorithms. This requires an exhaustive refactoring and documentation of the existing*