Exploring Research Opportunities for Automatic Text Document Summarization

Time	2023 — 2024
Funding	Hasler Foundation
Researchers	Mathias Fuchs, Kaspar Riesen

Abstract: The proposed project serves as a preparatory project to lay the groundwork for a large-scale research project possibly funded by the Swiss National Science Foundation (SNSF). We use the following terminology in the present document: The terms “present application” and “preparatory project” refer to the requested “Hasler Foundation Grant”, while “principal application” and “principal project” refer to the actual application and project that will potentially be financed by the SNSF.
Roughly speaking, the principal project focuses on substantially advancing and thoroughly researching novel paradigms and algorithms for automatic document analysis based on machine learning and natural language processing. The motivation and rationale for this project stem from the study of historical textual documents (e.g., Federal Council Protocols or similar), which often involves time-consuming work by human experts. The principal project aims to establish an in-depth and lasting collaboration between the Pattern Recognition Group of the Institute of Computer Science at the University of Bern and the research center Diplomatic Documents of Switzerland (Dodis). Using large-scale document sets provided by Dodis, we aim to thoroughly research the possible benefits and limitations of state-of-the-art methods as well as novel methods to (partially) automate and/or support the study, understanding, selection, and editing of documents at Dodis.
The goals of the present preparatory project are threefold.

1. We aim to implement a structural pilot with well-defined pipelines and procedures to access the large-scale corpus of documents at Dodis by means of the specialized infrastructure of the University of Bern (in particular, the high-performance computing cluster for scientific computing).
2. We aim at an initial exploration of existing machine learning and natural language processing methods in conjunction with Dodis’ documents. That is, we aim to build a general framework of tools that are directly applicable to the preliminary data set from Dodis (defined in Goal 1) and to conduct preliminary empirical evaluations. In the preparatory project, we plan to focus on one specific task only, viz. text document summarization.
3. Last but not least, we strive for a principal application ready to be submitted to the SNSF. To this end, we aim to distill interesting, feasible and, above all, unresolved research questions. The ultimate question to be answered is whether or not research into novel paradigms and algorithms for the task of text document summarization in the context of Dodis’ documents is actually necessary. Moreover, together with the researchers of Dodis, we aim to identify other research questions that can be addressed with machine learning and natural language processing algorithms.

The relevance of both the preparatory and the principal project is high, as they aim at researching state-of-the-art and also novel methods on a unique corpus of historical textual documents. Moreover, leveraging access to large amounts of complex data is a key issue in the 21st century. The planned lines of research can be interpreted as a first, yet significant, step towards a more natural human-computer interaction in searching, browsing, interacting with, and interpreting documents in large-scale repositories.
In order to achieve the three goals of the preparatory project, a 50% position for an academic with a doctoral degree (postdoctoral researcher) will be created for one year. The postdoctoral researcher will address Goal 1 and Goal 2 of the present project. The applicant of the present application (who is not funded through the Hasler grant) will address Goal 3 and will supervise and mentor the postdoctoral researcher.