Pattern-based Characterization of Evolving Variable Software

Co-supervised by: Prof. Kehrer and PD Kaspar Riesen

Feature annotations are key to representing variability in annotative software product lines. This variability information, like the source code itself, is subject to continuous evolution. To date, few studies characterize this kind of evolution or help us understand it better. Bittner et al. proposed so-called variation diffs, a graph structure that represents the difference between two versions of a code base. It is based on the nesting hierarchy of source code blocks and their surrounding feature annotations, as well as their logical interrelations.
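To make the idea of such a graph structure more concrete, the following minimal Python sketch shows one way a variation diff could be held in memory. It is an illustration under stated assumptions, not the data model of Bittner et al.: the names `DiffType`, `NodeKind`, `Node`, `VariationDiff` and `annotations_of` are hypothetical, and the sketch assumes that every node is either a feature annotation or a code artifact, carries a marker for added, removed or unchanged, and may have a different parent before and after the edit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class DiffType(Enum):
    ADD = "add"  # node exists only after the edit
    REM = "rem"  # node exists only before the edit
    NON = "non"  # node is unchanged


class NodeKind(Enum):
    ANNOTATION = "annotation"  # e.g. an #ifdef carrying a feature formula
    ARTIFACT = "artifact"      # a block or line of source code


@dataclass(frozen=True)
class Node:
    id: int
    kind: NodeKind
    label: str
    diff: DiffType


@dataclass
class VariationDiff:
    # Hypothetical container: nesting is stored as child id -> parent id,
    # separately for the version before and the version after the edit.
    nodes: dict[int, Node] = field(default_factory=dict)
    parent_before: dict[int, int] = field(default_factory=dict)
    parent_after: dict[int, int] = field(default_factory=dict)

    def add_node(self, node, parent_before=None, parent_after=None):
        self.nodes[node.id] = node
        if parent_before is not None:
            self.parent_before[node.id] = parent_before
        if parent_after is not None:
            self.parent_after[node.id] = parent_after

    def annotations_of(self, node_id, after):
        """Labels of the annotations enclosing a node, innermost first."""
        parents = self.parent_after if after else self.parent_before
        result = []
        current = parents.get(node_id)
        while current is not None:
            node = self.nodes[current]
            if node.kind is NodeKind.ANNOTATION:
                result.append(node.label)
            current = parents.get(current)
        return result


# Example edit: an unchanged line of code is wrapped into a new "#ifdef FEATURE_A".
diff = VariationDiff()
diff.add_node(Node(0, NodeKind.ANNOTATION, "true", DiffType.NON))  # root
diff.add_node(Node(1, NodeKind.ANNOTATION, "FEATURE_A", DiffType.ADD), parent_after=0)
diff.add_node(Node(2, NodeKind.ARTIFACT, "foo();", DiffType.NON),
              parent_before=0, parent_after=1)
```

In this toy example, `diff.annotations_of(2, after=False)` and `diff.annotations_of(2, after=True)` give the annotations that enclose the code line before and after the edit, which is the kind of information a pattern over variation diffs would capture.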

While variation diffs are well suited to describing the difference between two versions, the goal of this thesis is to detect frequently occurring patterns (i.e., variation diff subgraphs) within a large set of variation diffs that have already been extracted from various open-source projects hosted on GitHub. Such patterns represent typical editing operations on variable software such as annotative software product lines. The patterns found shall be compiled into a catalog of editing operations, which contributes to the general body of knowledge in the field and can be exploited in future use cases such as product-line analysis and testing.

In this application, mining frequent patterns in variation diffs is an instance of the frequent subgraph mining problem. Finding frequent subgraphs in large sets of graphs is a common and well-studied problem in the field of structural pattern recognition. Recently, Fuchs and Riesen proposed a novel and efficient algorithm for extracting stable and frequent subgraphs from sets of graphs. In this project, the framework developed by Fuchs and Riesen needs to be adapted to the problem of mining variation diffs and evaluated on a real-world set of variation diffs.
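As a rough illustration of what "frequent" means here, the sketch below counts the support of the smallest possible patterns, single labeled edges, across a set of graphs. It is deliberately not the matching-graph approach of Fuchs and Riesen; it only shows the support-counting step that frequent subgraph mining builds on. The graph encoding (a dictionary of node labels plus a set of child-to-parent edges), the label strings, and the function names `edge_patterns` and `frequent_edge_patterns` are assumptions made for this example.

```python
from collections import Counter


def edge_patterns(graph):
    """Return the set of labeled edges (child label, parent label) in one graph."""
    labels, edges = graph
    return {(labels[child], labels[parent]) for child, parent in edges}


def frequent_edge_patterns(graphs, min_support):
    """Keep edge patterns that occur in at least `min_support` graphs."""
    support = Counter()
    for graph in graphs:
        # A set is used so each pattern counts at most once per graph.
        support.update(edge_patterns(graph))
    return {pattern: n for pattern, n in support.items() if n >= min_support}


# Three toy graphs; labels combine a diff marker and a node kind (hypothetical).
g1 = ({0: "non:root", 1: "add:annotation", 2: "non:artifact"},
      {(1, 0), (2, 1)})
g2 = ({0: "non:root", 1: "add:annotation", 2: "non:artifact", 3: "add:artifact"},
      {(1, 0), (2, 1), (3, 1)})
g3 = ({0: "non:root", 1: "rem:artifact"},
      {(1, 0)})

frequent = frequent_edge_patterns([g1, g2, g3], min_support=2)
```

With a minimum support of 2, only the two edge patterns shared by `g1` and `g2` remain. A full miner would grow such frequent patterns into larger subgraphs and would need subgraph isomorphism checks, which this sketch leaves out.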

Requirements

Good programming skills, particularly in working with different APIs
Basic statistics and some graph theory and algorithms

If you are interested in this topic or have further questions, do not hesitate to get in touch.

Further Reading

Bittner, P. M., Tinnes, C., Schultheiß, A., Viegener, S., Kehrer, T., & Thüm, T.: Classifying edits to variability in source code. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 196-208 (2022)

Mathias Fuchs, Kaspar Riesen: A novel way to formalize stable graph cores by using matching-graphs. Pattern Recognit. 131: 108846 (2022)