GAUGE: Graph Attention with Universal and Global Extensions
Co-Supervised by: Merlin Streilein
If you are interested in this topic or have further questions, do not hesitate to contact merlin.streilein@unibe.ch.
Background / Context
Graph Attention Networks (GATs) have become a central architecture for learning on graph-structured data by enabling nodes to attend to their neighbors with learnable attention weights. However, standard GATs restrict attention to direct neighbors, which can limit their ability to capture long-range dependencies or incorporate global context. Moreover, all neighbors are scored by the same mechanism, with no explicit way to “dilute” a neighbor's influence based on distance, edge type, or external criteria. Extending GATs with diluted attention and global tokens could improve representational power, particularly in applications where global structure or controlled attenuation of neighbor influence is crucial.
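Concretely, a standard GAT layer (Veličković et al., 2018) computes attention coefficients and node updates as

```latex
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}[\,\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j\,]\right),
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})},
\qquad
\mathbf{h}_i' = \sigma\Big(\textstyle\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, \mathbf{W}\mathbf{h}_j\Big),
```

where N(i) denotes the set of direct neighbors of node i. The extensions studied in this project would act exactly here: dilution modifies the scores e_{ij}, while global attention enlarges the set of nodes (or tokens) that node i may attend to.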
Research Question(s) / Goals
- How can diluted attention be incorporated into GATs to regulate the influence of neighbors (e.g., distance-based decay or edge-type weighting)? One possible formalization is sketched after this list.
- How can global attention mechanisms (e.g., global context tokens) be integrated into GATs to provide nodes with access to non-local information?
- What are the effects of these extensions on model performance, stability, and interpretability across benchmark datasets?
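As an illustrative formalization of the first question (one possible design among several, not a fixed choice), a distance-based dilution could rescale the unnormalized attention scores from above by a decay factor:

```latex
\tilde{\alpha}_{ij}
= \frac{\exp(e_{ij})\,\gamma^{\,d(i,j)}}{\sum_{k \in \mathcal{N}_K(i)} \exp(e_{ik})\,\gamma^{\,d(i,k)}},
\qquad 0 < \gamma \le 1,
```

where d(i,j) is the hop distance, N_K(i) is the K-hop neighborhood of i, and the decay γ is either fixed or learned. Replacing γ^{d(i,j)} with a learned factor indexed by the edge type of (i,j) gives the edge-conditioned variant, and a learned global token added to every node's candidate set is one way to address the second question; a code sketch of both appears under "Approach / Methods".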
Approach / Methods
- Implement baseline GAT layers.
- Design and implement diluted attention modules (e.g., distance decay, edge-conditioned weights, or learnable dilution factors).
- Integrate a global attention mechanism (e.g., global token, pooled representation, or global-to-local gating); a minimal code sketch combining this with the diluted attention module follows this list.
- Evaluate performance on benchmark datasets.
- Analyze performance gains, computational costs, and qualitative behavior.
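The sketch below is a possible starting point that combines both extensions in a single dense PyTorch layer. It is a draft under assumptions, not a committed design: the class name DilutedGlobalGATLayer, the learnable decay parameter, and the single global token are placeholders introduced here, and a real implementation would more likely build on PyTorch Geometric's sparse message passing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DilutedGlobalGATLayer(nn.Module):
    """GAT-style layer with distance-diluted attention and a global context token (illustrative sketch)."""

    def __init__(self, in_dim, out_dim, max_hops=2):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)          # shared node projection
        self.a_src = nn.Parameter(torch.randn(out_dim))          # attention vector, source part
        self.a_dst = nn.Parameter(torch.randn(out_dim))          # attention vector, target part
        self.raw_decay = nn.Parameter(torch.zeros(1))            # learnable dilution rate (positive after softplus)
        self.global_token = nn.Parameter(torch.randn(out_dim))   # learnable global context token
        self.max_hops = max_hops

    def forward(self, x, hop_dist):
        # x: [N, in_dim] node features.
        # hop_dist: [N, N] shortest-path hop counts (use a large value for unreachable pairs).
        h = self.W(x)                                                        # [N, out_dim]
        # Standard GAT-style pairwise logits: LeakyReLU(a^T [h_i || h_j]).
        logits = F.leaky_relu(
            (h @ self.a_dst).unsqueeze(1) + (h @ self.a_src).unsqueeze(0), 0.2
        )                                                                    # [N, N]
        # Diluted attention: a distance-proportional penalty on the logits,
        # i.e. multiplying the unnormalized attention by exp(-decay * d(i, j)).
        decay = F.softplus(self.raw_decay)
        logits = logits - decay * hop_dist
        # Only nodes within max_hops (including self, d = 0) may be attended to.
        logits = logits.masked_fill(hop_dist > self.max_hops, float("-inf"))
        # Global attention: the global token joins every node's softmax as a virtual neighbor.
        g_logit = F.leaky_relu(h @ self.a_dst + self.global_token @ self.a_src, 0.2)  # [N]
        logits = torch.cat([logits, g_logit.unsqueeze(1)], dim=1)            # [N, N + 1]
        attn = torch.softmax(logits, dim=1)
        values = torch.cat([h, self.global_token.unsqueeze(0)], dim=0)       # [N + 1, out_dim]
        return attn @ values                                                 # [N, out_dim]


# Toy usage on a 4-node path graph with precomputed hop distances.
x = torch.randn(4, 8)
hop_dist = torch.tensor([[0., 1., 2., 3.],
                         [1., 0., 1., 2.],
                         [2., 1., 0., 1.],
                         [3., 2., 1., 0.]])
layer = DilutedGlobalGATLayer(in_dim=8, out_dim=16, max_hops=2)
out = layer(x, hop_dist)  # -> shape [4, 16]
```

The distance penalty is applied additively to the logits, which is equivalent to multiplying the unnormalized attention by exp(-decay · d(i, j)), i.e., the γ^{d(i,j)} dilution above with γ = exp(-decay); the global token acts as a virtual node that every node can attend to.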
Expected Contributions / Outcomes
- A prototype implementation of diluted and global attention extensions to GATs.
- Experimental evaluation on multiple graph datasets with quantitative performance comparisons.
- Insights into when and why diluted and global attention improve GNN performance.
- A comparative study highlighting strengths and trade-offs relative to the baseline GAT.
Required Skills / Prerequisites
- Programming experience in Python.
- Basic knowledge of experimental design and performance evaluation.
- Familiarity with deep learning frameworks (PyTorch, TensorFlow, or ideally PyTorch Geometric) is advantageous but not necessary.
- Understanding of graph neural networks, especially GATs, is highly advantageous but not necessary.
Possible Extensions
- Explore hierarchical global attention (multiple global tokens for different substructures).
- Apply the methods to domain-specific problems (e.g., drug discovery, recommendation systems, social network analysis).
- Analyze interpretability: can diluted/global attention make GNN predictions more transparent?
Further Reading / Starting Literature
- Veličković, P. et al. (2018). Graph Attention Networks. ICLR. https://arxiv.org/abs/1710.10903.
- Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The Long-Document Transformer. https://arxiv.org/abs/2004.05150.
- Puny, O., Ben-Hamu, H., & Lipman, Y. (2020). Global Attention Improves Graph Networks Generalization. https://arxiv.org/abs/2006.07846.
