Detection of False Information in Texts

Supervised by: Corina Masanti

If you are interested in this topic or have further questions, do not hesitate to contact corina.masanti@unibe.ch.

Context

With the advent of digitalization and the widespread use of social media, it has become easy to share information rapidly across the globe. While this enables knowledge and information to reach a wide audience instantly, it also means that false or misleading information can spread just as quickly, leading to confusion and distrust. False information can range from simple inaccuracies, such as incorrect dates, addresses, or names, to more complex cases involving false statements, misleading information, or misquotations.

Goal(s)

This project aims to detect incorrect information in text. The first goal is to identify inaccuracies in dates, addresses, and names of companies, products, and people. This requires a dataset that contains such errors, which can be created by selecting a text corpus (e.g., Wikipedia) and injecting errors into it. The second goal is to expand the detection system to identify more complex misinformation. For this purpose, publicly available datasets and language models can be used.
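As an illustration of the error-injection idea, the following minimal Python sketch corrupts one year in a sentence and records where the error was placed. It is only one possible approach under simplifying assumptions: the function name inject_year_error, the max_shift parameter, the label format, and the example sentence are all invented for this sketch, and "date" is reduced to a four-digit year.

```python
import random
import re

# Matches four-digit years from 1000 to 2099; a deliberately narrow notion of "date".
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")


def inject_year_error(text, rng, max_shift=10):
    """Replace one randomly chosen year in `text` with a nearby wrong year.

    Returns (corrupted_text, label). The label records the character span of
    the injected year plus the original and injected values. If the text
    contains no year, the text is returned unchanged with label None.
    """
    matches = list(YEAR_PATTERN.finditer(text))
    if not matches:
        return text, None
    target = rng.choice(matches)
    original = int(target.group())
    # Exclude a shift of 0 so the injected year always differs from the original.
    shift = rng.choice([s for s in range(-max_shift, max_shift + 1) if s != 0])
    wrong = str(original + shift)
    corrupted = text[:target.start()] + wrong + text[target.end():]
    label = {
        "start": target.start(),
        "end": target.start() + len(wrong),
        "original": target.group(),
        "injected": wrong,
    }
    return corrupted, label


# Hypothetical example sentence, not taken from any real corpus.
rng = random.Random(0)  # fixed seed so the corruption is reproducible
sentence = "The company was founded in 1998 and moved to a new office in 2005."
corrupted, label = inject_year_error(sentence, rng)
print(corrupted)
print(label)
```

Keeping the label next to each corrupted text gives the ground truth needed to evaluate a detector later. Addresses and names would need their own injection rules, for example swapping an entity for another one of the same type.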

Approach

  • Prepare a dataset containing false information in dates, addresses, and names of companies, products, and people. This list of inaccuracies can be expanded as needed.
  • Develop a system to detect incorrect information in texts and test it on the prepared dataset.
  • Extend the system to identify more complex misinformation in texts, such as false statements, misleading information, or misquotations.
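
The testing step in the second point could, for instance, compare the system's output with the labels of the prepared dataset. The sketch below assumes a sentence-level setup in which each sentence is labeled 1 if it contains an injected error and 0 otherwise; this setup, the function name precision_recall_f1, and the example labels are assumptions made for illustration, not a prescribed evaluation protocol.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 for binary labels (1 = contains an error)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # errors correctly flagged
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # correct text wrongly flagged
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # errors that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels for six sentences, for illustration only.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting precision and recall separately shows whether a system misses injected errors or flags correct text, which a single accuracy figure would hide when most sentences are error-free.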

Required Skills

  • Good programming skills
  • Basic understanding of machine learning concepts or willingness to learn them during the project

Further Reading(s)

Guo, Zhijiang, Michael Schlichtkrull, and Andreas Vlachos. “A survey on automated fact-checking.” Transactions of the Association for Computational Linguistics 10 (2022): 178-206.

List of existing datasets for fact-checking: https://github.com/Cartus/Automated-Fact-Checking-Resources?tab=readme-ov-file#datasets