DeepSchemata: Towards Unified Graph Extraction for Technical Drawings

Co-Supervised by: Dr. Linlin Jia

If you are interested in this topic or have further questions, do not hesitate to contact linlin.jia@unibe.ch.

Background / Context

Extracting structural graph representations from technical drawings is a challenging yet essential problem for visual understanding, pattern analysis, and downstream reasoning tasks. Recent progress in deep learning introduces new possibilities by bridging visual grounding with structured reasoning, enabling the generation of explicit node–edge representations and interpretable graph descriptions. However, most state-of-the-art methods still adopt a modular, step-by-step pipeline for the extraction process, which often accumulates errors and lacks sub-task alignment.

To address these limitations, this project aims to improve graph extraction by building a unified framework that integrates symbol detection, line detection, and symbol-level graph extraction modules. In particular, we propose to apply a topology-aware network to mitigate edge prediction errors, especially gaps between predicted edges and their endpoint symbols, which can otherwise cause cascading failures in the logical structure of the extracted graphs. In addition, the project will investigate multi-aspect evaluation metrics and interpretability analysis for the extraction procedure, laying a foundation for robust and explainable visual-structural understanding.
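To make the gap problem concrete, the following minimal sketch shows a naive post-processing baseline, not the topology-aware network proposed here: each endpoint of a detected line segment is attached to the nearest symbol bounding box if it lies within a tolerance, and segments that cannot be attached at both ends are reported as dangling. The function names, the box format (x_min, y_min, x_max, y_max), and the max_gap parameter are assumptions made for illustration only.

```python
from math import hypot

# Assumed formats (hypothetical): a box is (x_min, y_min, x_max, y_max),
# a point is (x, y), a line segment is a pair of points.

def point_to_box_distance(point, box):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    dx = max(x_min - x, 0.0, x - x_max)
    dy = max(y_min - y, 0.0, y - y_max)
    return hypot(dx, dy)

def nearest_symbol(point, symbol_boxes, max_gap):
    """Return the id of the closest symbol within max_gap pixels, else None."""
    best_id, best_dist = None, max_gap
    for symbol_id, box in symbol_boxes.items():
        dist = point_to_box_distance(point, box)
        if dist <= best_dist:
            best_id, best_dist = symbol_id, dist
    return best_id

def attach_edges(symbol_boxes, line_segments, max_gap=5.0):
    """Convert line segments into symbol-level edges.

    Returns (edges, dangling): a set of sorted symbol-id pairs and the
    indices of segments that could not be attached at both ends.
    """
    edges, dangling = set(), []
    for index, (start, end) in enumerate(line_segments):
        a = nearest_symbol(start, symbol_boxes, max_gap)
        b = nearest_symbol(end, symbol_boxes, max_gap)
        if a is None or b is None or a == b:
            dangling.append(index)  # a gap broke the logical connection
        else:
            edges.add(tuple(sorted((a, b))))
    return edges, dangling

# Toy example: the second segment stops 20 px short of symbol "L1".
boxes = {"R1": (0, 0, 10, 10), "C1": (50, 0, 60, 10), "L1": (100, 0, 110, 10)}
segments = [((12, 5), (48, 5)), ((63, 5), (80, 5))]
edges, dangling = attach_edges(boxes, segments)
print(edges, dangling)
```

In this toy case a single short gap removes the C1 to L1 connection from the graph entirely, which is the kind of cascading failure that a fixed distance threshold cannot reliably repair.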

Research Question(s) / Goals

In this project, we aim to answer the following research questions:

Q1: Can a topology-aware network improve edge connectivity accuracy in graph extraction from technical drawings?
Q2: Can an integrated end-to-end learning framework improve the graph extraction procedure for technical drawings?
Q3: How should the extraction procedure be evaluated?

To answer these questions, we define the following goals:

G1: Construct or refine datasets of technical drawings with required annotations.
G2: Design multi-level and multi-aspect metrics to evaluate the quality of the extracted graphs.
G3: Design an integrated graph extraction framework and train it with multi-task losses.
G4: Evaluate and compare the performance of the proposed methods against the baselines.
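As one possible graph-level ingredient for G2, the sketch below computes precision, recall, and F1 over undirected edges. It assumes that predicted and ground-truth nodes already share identifiers, which in practice requires a prior symbol matching step; the function name edge_prf is hypothetical.

```python
def edge_prf(predicted_edges, true_edges):
    """Precision, recall, and F1 over undirected edges.

    Assumes node ids are already aligned between prediction and ground truth.
    Each edge is a pair of node ids; direction and duplicates are ignored.
    """
    pred = {frozenset(edge) for edge in predicted_edges}
    true = {frozenset(edge) for edge in true_edges}
    true_positives = len(pred & true)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(true) if true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy example: one spurious edge, no missed edges.
predicted = [("R1", "C1"), ("C1", "L1"), ("R1", "L1")]
ground_truth = [("C1", "R1"), ("C1", "L1")]
precision, recall, f1 = edge_prf(predicted, ground_truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

An edge-level score of this kind says nothing about symbol localization or line pixel accuracy, which is why the goal calls for metrics at several levels.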

Approach / Methods

Prepare and preprocess technical drawing datasets, extracting annotations such as edge masks, keypoint masks, symbol types, bounding boxes, and ground-truth graphs.
Survey the state-of-the-art research on deep learning methods for technical drawing graph extraction.
Design and implement evaluation metrics that cover symbol detection, edge detection, graph construction, and downstream tasks.
Develop novel end-to-end graph extraction approaches by integrating representations constructed from symbol detection, edge detection, and topology network modules.
Benchmark the designed approach against baselines.
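For the symbol detection perspective of the evaluation, one common option is to match predicted and annotated symbols by class label and bounding-box overlap. The sketch below uses greedy one-to-one matching on intersection over union (IoU); the 0.5 threshold, the (label, box) format, and the function names are illustrative assumptions rather than choices fixed by this project.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_symbols(predicted, annotated, iou_threshold=0.5):
    """Greedy one-to-one matching of symbols with equal labels.

    predicted, annotated: lists of (label, box).
    Returns a list of (predicted_index, annotated_index) pairs,
    ordered from the highest to the lowest overlap.
    """
    candidates = []
    for i, (pred_label, pred_box) in enumerate(predicted):
        for j, (true_label, true_box) in enumerate(annotated):
            if pred_label != true_label:
                continue
            score = iou(pred_box, true_box)
            if score >= iou_threshold:
                candidates.append((score, i, j))
    candidates.sort(reverse=True)  # best overlaps are matched first
    used_pred, used_true, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_pred and j not in used_true:
            used_pred.add(i)
            used_true.add(j)
            matches.append((i, j))
    return matches

# Toy example: two detections compete for the same annotated resistor.
annotated = [("resistor", (0, 0, 10, 10)), ("capacitor", (50, 0, 60, 10))]
predicted = [
    ("resistor", (1, 0, 11, 10)),
    ("capacitor", (80, 0, 90, 10)),
    ("resistor", (0, 0, 10, 9)),
]
matches = match_symbols(predicted, annotated)
precision = len(matches) / len(predicted)
recall = len(matches) / len(annotated)
print(matches, precision, recall)
```

The resulting index pairs also give a node correspondence between prediction and annotation, which graph-level metrics need before edges can be compared.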

Expected Contributions / Outcomes

A novel integrated end-to-end framework for extracting graph-structured representations of technical drawings.
A multi-level, multi-aspect suite of quality evaluation metrics.
A comprehensive benchmark of the proposed methods against state-of-the-art approaches.
Reproducible code, model checkpoints, experimental settings, and results.
A thesis and potentially a research paper submission.

Required Skills / Prerequisites

Solid knowledge of statistics and machine learning; an understanding of deep learning, learning on graphs, and computer vision is a plus.
Strong programming skills, preferably in Python and PyTorch.
Proficiency in English for effective communication and presentations at the research level.

Possible Extensions

Refine the symbol detection and edge detection modules, potentially training a single unified backbone with a dedicated head for each sub-task.
Generalize the framework to more challenging cases, such as very thin lines and dotted lines.

Further Reading / Starting Literature

Jamieson, L., Moreno-García, C. F., & Elyan, E. (2024). A review of deep learning methods for digitisation of complex documents and engineering diagrams. Artificial Intelligence Review, 57(6), 136.
Thoma, F., Bayer, J., Li, Y., & Dengel, A. (2021, September). A public ground-truth dataset for handwritten circuit diagram images. In International Conference on Document Analysis and Recognition (pp. 20-27). Cham: Springer International Publishing.
Hetang, C., Xue, H., Le, C., Yue, T., Wang, W., & He, Y. (2024). Segment anything model for road network graph extraction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2556-2566).