Automated Metadata Extraction and Reconstruction of Case Progression in Swiss Court Decisions

If you are interested in this topic or have further questions, do not hesitate to contact kaspar.riesen@unibe.ch.

Background / Context

Many Swiss court decisions are publicly available through various legal information platforms. However, these documents are typically published as unstructured text, and the relevant legal metadata must be identified manually. This makes systematic analysis, large-scale search, and downstream applications such as legal analytics or case comparison difficult.

Important metadata such as the legal domain, factual background, dispute object or charges, cited laws, and the outcome of the decision are often embedded in complex legal language and heterogeneous document structures. Extracting these elements automatically would significantly improve the usability and analytical value of the available datasets.

A second major limitation of existing public datasets is that related decisions from different judicial instances are not linked. In Switzerland, legal cases frequently proceed through multiple levels of jurisdiction (e.g., cantonal court -> appellate court -> Federal Supreme Court). The decisions belonging to the same case therefore form a procedural chain across several instances. Currently, these decisions are typically published independently, making it difficult to reconstruct the procedural history of a case or analyse how judgments evolve across instances.

Combining automated metadata extraction with methods for identifying and linking related decisions would enable the reconstruction of the full procedural trajectory of legal cases and support new forms of legal analysis.

Research Question(s) / Goals

The thesis investigates how natural language processing (NLP) methods can be used to structure and connect Swiss court decisions.

Possible research questions include:

Metadata Extraction:
- How accurately can NLP methods (e.g., Named Entity Recognition or LLM-based structured extraction) extract key legal metadata from heterogeneous Swiss court decisions?
- How reliably can elements such as the factual background, dispute object or charges, and the decision outcome be identified and classified?
- Which linguistic patterns or structural cues in the documents are most useful for identifying these elements?
- How does extraction performance vary across different court types, judicial instances, or language regions?
Reconstruction of Case Progression:
- How can decisions that belong to the same case but originate from different judicial instances be automatically identified and linked?
- How can the hierarchy of judicial instances be modelled in order to reconstruct the procedural path of a case?
- How can it be determined automatically whether a higher court confirms, modifies, or overturns the decision of a lower court?
Optional Extension:
- Can the dispute object or charges be extracted for each decision and compared across instances to analyse how the focus of the case evolves during the appeals process?
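
To make the question about confirming, modifying, or overturning a lower-court decision more concrete, the following is a minimal rule-based baseline, not a proposed solution. It assumes German-language decisions and that the operative part (dispositive) has already been isolated. The label names and the phrase list are illustrative assumptions; French and Italian decisions, as well as variant formulations, are not covered.

```python
import re

# Hypothetical mapping from German dispositive phrases to outcome labels.
# Order matters: the more specific pattern ("teilweise gutgeheissen")
# must be tested before the more general one ("gutgeheissen").
OUTCOME_PATTERNS = [
    ("partially_upheld", re.compile(r"teilweise\s+gutgeheissen", re.IGNORECASE)),
    ("upheld", re.compile(r"gutgeheissen", re.IGNORECASE)),
    ("inadmissible", re.compile(r"nicht\s+eingetreten", re.IGNORECASE)),
    ("dismissed", re.compile(r"abgewiesen", re.IGNORECASE)),
]


def classify_outcome(dispositive: str) -> str:
    """Return the first matching outcome label, or "unknown" if none matches."""
    for label, pattern in OUTCOME_PATTERNS:
        if pattern.search(dispositive):
            return label
    return "unknown"


print(classify_outcome("Die Beschwerde wird abgewiesen, soweit darauf einzutreten ist."))
print(classify_outcome("Die Beschwerde wird teilweise gutgeheissen. Im Übrigen wird sie abgewiesen."))
```

A dismissed appeal roughly corresponds to the lower-court decision being confirmed, while an upheld appeal means it is set aside or changed. Distinguishing modification from reversal requires more than keyword matching, which is part of what the thesis would investigate.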

Approach / Methods

The work will involve the design and evaluation of an automated pipeline for analysing court decisions.

Possible components include:

Data collection and preprocessing of Swiss court decisions from publicly available sources.
Metadata extraction using NLP techniques, such as:
- Named Entity Recognition (NER)
- rule-based pattern matching
- large language model (LLM)-based structured extraction
Document similarity and linkage methods to identify decisions belonging to the same case, for example using:
- semantic embeddings
- textual similarity measures
- metadata-based heuristics
Graph or hierarchical modelling of case relationships to represent the procedural chain across instances.
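
As an illustration of the linkage and modelling components, the sketch below combines a metadata-based heuristic with a simple chain reconstruction. It assumes that a higher-instance decision mentions the docket number of the decision it reviews and that each decision carries a known instance level. The `Decision` class, the level numbering, the docket numbers, and the example texts are all invented for illustration; real data would need fuzzy matching and fallback signals such as dates or embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    docket: str  # docket number of this decision (made-up format)
    court: str
    level: int   # assumed: 1 = first instance, 2 = appellate, 3 = Federal Supreme Court
    text: str


def find_lower_decision(decision, by_docket):
    """Return the closest lower-instance decision whose docket number is cited in the text."""
    matches = [
        candidate
        for candidate in by_docket.values()
        if candidate.level < decision.level and candidate.docket in decision.text
    ]
    # If several lower decisions are cited, prefer the directly preceding instance.
    return max(matches, key=lambda c: c.level, default=None)


def reconstruct_chain(top, decisions):
    """Follow docket references downwards and return the chain from lowest to highest instance."""
    by_docket = {d.docket: d for d in decisions}
    chain = [top]
    current = top
    # The level strictly decreases at every step, so the loop terminates.
    while (lower := find_lower_decision(current, by_docket)) is not None:
        chain.append(lower)
        current = lower
    return list(reversed(chain))


# Invented example corpus with three linked decisions.
first = Decision("DG180001", "District Court", 1, "Judgment of the district court.")
appeal = Decision("SB190002", "High Court", 2,
                  "Appeal against the judgment DG180001 of the district court.")
federal = Decision("6B_0000/2020", "Federal Supreme Court", 3,
                   "Appeal against the judgment SB190002 of the High Court.")

chain = reconstruct_chain(federal, [first, appeal, federal])
print(" -> ".join(d.docket for d in chain))
```

A list is sufficient here because each decision reviews at most one lower decision. Joined proceedings or remands to a lower court would call for a proper graph structure.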

Expected Contributions / Outcomes

The thesis is expected to produce:

A method for extracting structured legal metadata from Swiss court decisions.
A prototype pipeline for linking related decisions and reconstructing the progression of legal cases across judicial instances.
An evaluation of the performance of different NLP approaches for legal document analysis.
Insights into the linguistic and structural characteristics of Swiss court decisions that enable automated analysis.

The resulting system could serve as a foundation for improved legal search systems, legal analytics tools, or research on judicial decision-making.
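
For the evaluation mentioned among the expected outcomes, one plausible starting point is set-based precision, recall, and F1 per decision, for instance over the cited legal provisions. The sketch below assumes that gold annotations exist as sets of normalised strings; the example values are invented and do not stem from any real decision or experiment.

```python
def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Compute set-based precision, recall, and F1 for one document."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: two of three predicted citations are correct.
gold = {"Art. 139 StGB", "Art. 47 StGB", "Art. 10 StPO"}
predicted = {"Art. 139 StGB", "Art. 47 StGB", "Art. 41 OR"}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Free-text fields such as the factual background would need a different measure, since exact set matching only suits discrete items.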

Required Skills / Prerequisites

Recommended background includes:

Programming experience in Python
Basic knowledge of Natural Language Processing (NLP) or machine learning
Interest in legal texts or legal informatics
Familiarity with libraries such as spaCy, Hugging Face, or similar NLP frameworks is beneficial but not required.
Experience with large language models, embeddings, or text similarity methods is a plus but can also be acquired during the thesis.