Robopsy @DIGRA

We are delighted to present our research project "Robopsy" and the accompanying paper at this year's DIGRA conference in Ireland.

Monday, 15.06.26, 10:00

Collective Memory Lapse: LLMs Omissions in Roleplay. (Margarete Jahrmann, Thomas Brandstetter and Stefan Glasauer)

PRESENTER: Margarete Jahrmann

ABSTRACT. In this study we present a personal case for criticizing Large Language Models (LLMs) in a game setting and point to their systemic omissions and distortions of history. The larger context is an ongoing artistic/Digital Humanities research project about potential distortions of collective memory through role-playing with LLMs. We designed experimental games to collect data and investigate how LLMs represent historical events. As the first case study, we created and evaluated a role-playing game about the murder of Moritz Schlick, founder of the Vienna Circle, on the stairs of the University of Vienna in 1936. In the game, players could switch between five different LLMs and then, with the LLM acting as game master, try to investigate the circumstances of the murder in a time travel scenario. After public play sessions, which included highly distorted versions of the historical facts, we conducted qualitative interviews and debriefings with the players. These led us to conclude that more dramatic scenarios would need even more guided contextualization to avoid traumatic bleed after the role play.

METHOD. Our method was to exhibit a self-designed art game, a text-based historical role-play scenario with a choice of different LLMs as game master, as an interactive installation in a major public exhibition in Vienna for three months in 2025. Data collection and qualitative analysis continue in 2026. In a quantitative analysis, 115 role-play texts generated by the LLMs were examined with several natural language processing methods, including semantic similarity and sentiment analysis. While the qualitative player feedback allowed us to distinguish three distinct types of users, the quantitative text analysis showed significant differences in how the different LLMs presented the historical content. Part of this analysis has been published as a preprint (Jahrmann et al. 2025).
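For readers unfamiliar with comparing texts across models, the following is a minimal sketch of the general idea, not the analysis pipeline used in the study. It computes pairwise cosine similarity on simple word counts, which is a lexical stand-in for the embedding-based semantic similarity a real analysis would more likely use. The model names and sample texts are hypothetical placeholders, not data from the project.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-zäöüß]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarity(texts_by_model):
    """Compare every pair of models; returns {(model_a, model_b): score}."""
    vectors = {m: bag_of_words(t) for m, t in texts_by_model.items()}
    return {
        (m1, m2): cosine_similarity(vectors[m1], vectors[m2])
        for m1, m2 in combinations(sorted(vectors), 2)
    }


# Hypothetical placeholder texts, NOT outputs collected in the study.
samples = {
    "model_a": "The philosopher was shot on the university stairs in 1936.",
    "model_b": "In 1936 the philosopher was shot on the stairs of the university.",
    "model_c": "A mysterious fog covers the city and nobody remembers anything.",
}

scores = pairwise_similarity(samples)
for pair, score in scores.items():
    print(pair, round(score, 2))
```

In this toy setup, two models that retell the same event in similar words score higher with each other than with a model that drifts into unrelated content. Word counts cannot detect paraphrase, which is why sentence embeddings would be the more appropriate representation for a semantic comparison.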