A PROSE – Advanced PROofreading SErvices

Time: 2022 to 2026
Funding: Innosuisse
Researchers: Corina Masanti, Kaspar Riesen

Abstract: Many proofreading services currently focus on orthography, grammar, punctuation, and typography. Even in these areas, automated proofreading faces unresolved challenges: detecting so-called real-word errors (correctly spelled words used in the wrong context), complying with specific typesetting rules and company-specific spelling guidelines, and ensuring the use of gender-neutral language. Further tasks that currently require better solutions are the automatic detection of subjective language and of incorrect information (e.g. dates, addresses) in customer documents. The major challenge results from linguistic problems (e.g. ambiguities) combined with a lack of training data, which together have hindered scientific progress and the maturity of software solutions.
The aim of this project is therefore to research and solve the identified scientific obstacles and to build a reliable, advanced (semi-)automated proofreading and editing solution. The solution will be integrated into the proofreading process at Rotstift so that human effort is substantially reduced. This will allow Rotstift to scale up its services to meet still-growing demand and to offer them at a lower price to new customer segments.
Since 2019, Rotstift has been collecting manually corrected documents in digital form. As a result, a large and unique collection of around 80’000 documents is now available for training and evaluating machine learning algorithms. The trained models will be complemented by an active learning approach in which human proofreaders are proactively involved whenever the solution lacks confidence in its error detection.
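
The active learning idea described above can be illustrated with a minimal sketch. It is not the project's implementation: the `Suggestion` type, the `route` function, and the 0.9 threshold are hypothetical names and values chosen for illustration. It only assumes that each suggested correction comes with a confidence score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """A proposed correction with the model's confidence (hypothetical type)."""
    original: str
    correction: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def route(suggestions, threshold=0.9):
    """Split suggestions into auto-applied ones and ones sent to a human.

    The threshold is an illustrative assumption, not a value from the project.
    """
    auto_applied, needs_review = [], []
    for s in suggestions:
        if s.confidence >= threshold:
            auto_applied.append(s)
        else:
            needs_review.append(s)  # a human proofreader decides
    return auto_applied, needs_review


if __name__ == "__main__":
    batch = [
        Suggestion("there", "their", 0.97),  # confident real-word correction
        Suggestion("form", "from", 0.55),    # uncertain, so ask a proofreader
    ]
    auto_applied, needs_review = route(batch)
    print("applied automatically:", [s.correction for s in auto_applied])
    print("sent to proofreader:", [s.correction for s in needs_review])
```

In such a setup, the decisions that proofreaders make on the low-confidence cases could be added to the training data, so that the solution gradually needs human input less often.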