Benchmarking and Analyzing Graph Machine Learning

Supervised by: Linlin Jia

If you are interested in this topic or have further questions, do not hesitate to contact me.

Context/Background/Current State

Graph structures have emerged as a powerful tool in machine learning, capable of representing real-world data with complex relational structure. Combining graph structures with machine learning algorithms opens up new possibilities. Key graph machine learning methodologies include Graph Neural Networks (GNNs), graph kernels, and graph edit distances. GNNs, as deep learning models, excel at handling non-Euclidean data, while graph kernels and graph edit distances provide measures of similarity and dissimilarity between graphs, respectively.
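
To make the notions of similarity and dissimilarity concrete, here is a minimal pure-Python sketch, not taken from graphkit-learn. It assumes a hypothetical graph representation (a node count plus an undirected edge list) and two deliberately simple stand-ins: a degree-histogram kernel as the similarity, and a brute-force edit distance with unit edge costs, restricted to unlabeled graphs with the same number of nodes, as the dissimilarity. All function names are illustrative.

```python
from collections import Counter
from itertools import permutations


def degree_histogram(n_nodes, edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree)


def degree_histogram_kernel(graph_a, graph_b):
    """Similarity: dot product of the two degree histograms (illustrative kernel)."""
    hist_a = degree_histogram(*graph_a)
    hist_b = degree_histogram(*graph_b)
    return sum(count * hist_b[deg] for deg, count in hist_a.items())


def brute_force_edit_distance(graph_a, graph_b):
    """Dissimilarity: fewest edge insertions/deletions over all node mappings.

    Assumption: unlabeled graphs with equal node counts and unit edge costs.
    The search is factorial in the number of nodes, so this is only for tiny graphs.
    """
    n_a, edges_a = graph_a
    n_b, edges_b = graph_b
    if n_a != n_b:
        raise ValueError("this sketch only handles graphs with equal node counts")
    target = {frozenset(edge) for edge in edges_b}
    best = None
    for perm in permutations(range(n_a)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges_a}
        cost = len(mapped ^ target)  # edges present in exactly one graph
        if best is None or cost < best:
            best = cost
    return best


# Hypothetical toy graphs: a 3-node path and a triangle.
path = (3, [(0, 1), (1, 2)])
triangle = (3, [(0, 1), (1, 2), (0, 2)])

print(degree_histogram_kernel(path, triangle))    # similarity score
print(brute_force_edit_distance(path, triangle))  # number of edge edits
```

Practical graph kernels and edit distance solvers also handle node and edge labels, differing graph sizes, and use approximations to stay tractable; the sketch only shows the kind of quantity each family of methods produces.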

However, despite these advancements, significant challenges remain. The current landscape lacks a comprehensive comparison and critical analysis of these methodologies, leaving a gap in understanding their relative strengths and weaknesses. Furthermore, although libraries such as graphkit-learn offer a repository of methods, they do not yet cover the full breadth and diversity of graph machine learning techniques. There is also a scarcity of small graph datasets, especially for regression problems, which play a crucial role in validating and comparing these techniques in real-world scenarios.

To address these issues, our aim is to implement state-of-the-art machine learning methods on graph structures, including but not limited to graph kernels, graph edit distances, and GNNs. In conjunction with this, we plan to collect and establish benchmark small graph datasets to stimulate further research in the field. We intend to provide a comprehensive comparison and thorough analyses of these methods to fill the existing knowledge gap. Through these concerted efforts, we aspire to advance the field of machine learning on graph structures, offering more structured, inclusive, and effective solutions.

Goal(s)

  • Implement state-of-the-art machine learning methods on graphs, e.g., graph kernels, graph edit distances, and GNNs.
  • Collect benchmark graph datasets to support various analytical tasks.
  • Design and develop data loaders and transformers for efficient data processing.
  • Conduct comprehensive comparisons and analyses of these methods.
  • (optional) Optimize graph methods using C++ and Cython to enhance processing speed.
  • (optional) Draft and submit a conference paper detailing the findings of this research.

Approach

This project is primarily focused on practical applications and implementations. We have already implemented a range of graph machine learning methods and assembled several benchmark datasets in the graphkit-learn library [1]. Building upon this foundation, we will sequentially implement the planned methods, perform experimental evaluations, and conduct detailed analyses of the results.

Required Skills

  • Good programming skills, with a preference for Python and potentially C++.
  • Basic knowledge of statistics and some familiarity with graph algorithms.
  • Communication in English (or Chinese 😉).

Further Reading

1. Jia, L., Gaüzère, B., & Honeine, P. (2021). graphkit-learn: A Python library for graph kernels based on linear patterns. Pattern Recognition Letters, 143, 113-121.
2. Jia, L. (2021). Bridging graph and kernel spaces: a pre-image perspective (Doctoral dissertation, Normandie).