Supervised Learning of Configurations of an N-way Model Matching Algorithm

Co-Supervised by: Prof. Kehrer and PD. Kaspar Riesen

Software models such as UML-like diagrams are frequently used to describe a program’s structure and behavior in graph-like structure. Identifying the common and specific parts of a set of such model graphs is an essential prerequisite for many development tasks, notably for merging several variants of a model graph into a unified one. To that end, Schultheiss et al. proposed a generic algorithm for n-way model matching which scales for large model graphs. This is achieved by indexing the graph nodes in a multi-dimensional search tree which allows for efficient range queries to find the most suitable matching candidates.

A major drawback of the approach of Schultheiss et al. is that it relies on the manual definition of the vectorization of model graphs in order to build the multi-dimensional search trees, which has to be handcrafted for every type of model and requires sophisticated domain knowledge. The goal of this thesis is to learn such vectorizations automatically.

In recent years, graph neural networks (GNNs) emerged as a versatile method for node and or graph embeddings. In various practical applications, GNNs have shown superior performance when compared to other graph-based pattern recognition technologies. The very basic idea of GNNs is to train an ANN on the feature vectors stored as labels in individual nodes in the graph. Thereby, the aim is to learn a holistic vector representation of the complete graph. In the present project, we aim at training standard GNNs with different architectures in order to verify whether or not the manually defined vectorization can be automatically learned by means of neural networks operating on graphs.

Good programming skills, particularly working with different APIs
Basic statistics and a bit of graph theory and algorithms

If you are interested in this topic or have further questions, do not hesitate to contact kaspar.riesen@unibe.ch or timo.kehrer@unibe.ch.

Further Reading

Schultheiß, A., Bittner, P. M., Boll, A., Grunske, L., Thüm, T., & Kehrer, T. (2022). RaQuN: a generic and scalable n-way model matching algorithm. Software and Systems Modeling, 1-23.

A Gentle Introduction to Graph Neural Networks: https://distill.pub/2021/gnn-intro/

Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi: Learning to Represent Programs with Graphs. ICLR 2018

Kechi Zhang, Wenhan Wang, Huangzhao Zhang, Ge Li, Zhi Jin: Learning to represent programs with heterogeneous graphs. ICPC 2022: 378-389