Organized by CRC 1646 Project C02, Bielefeld: 1-2 Dec 2025

The workshop is organized by CRC 1646 Project C02. Its broad topic is how humans employ speech and gesture creatively in communicatively challenging situations, e.g., when conventionalized or commonly used communicative resources are insufficient. The workshop will address empirical, conceptual, and computational modelling challenges surrounding this question.
| Time | Speaker | Title |
| --- | --- | --- |
| 09:00-09:30 | Registration | |
| 09:30-09:45 | Opening remarks | |
| 09:45-10:30 | Lisa Gottschalk, Alon Fishman, Joana Cholin & Stefan Kopp | Creativity in speech and gesture and its perception in humans and AI |
| 10:30-10:45 | Break | |
| 10:45-11:30 | Angela Grimminger | Multimodal (feedback) behavior and understanding in explanatory interactions |
| 11:30-11:45 | Break | |
| 11:45-12:30 | Luyao Wang | Multimodal strategies for resolving lexical challenges: The role of creative gestures in four- and five-year-old children |
| 12:30-13:30 | Lunch | |
| 13:30-14:15 | James Trujillo | Adaptation of multimodal communication strategies to noise and failure |
| 14:15-15:00 | Anna Palmann | Multimodal communication in free and task-based dialogue between hard-of-hearing and hearing dyads |
| 15:00-15:30 | Break | |
| 15:30-16:15 | Lotta Heidemann | Coding creative gesture use by people with neurogenic language and communication disorders |
| 16:15-16:30 | Break | |
| 16:30-17:30 | All | Discussion: Operationalizing and measuring multimodal creativity |
| 17:30-17:45 | Wrap Up | |
| 19:00 | Workshop Dinner at La Cucina (Rathausstraße 1, 33602 Bielefeld) | |
| Time | Speaker | Title |
| --- | --- | --- |
| 09:30-10:15 | Andy Lücking | Formal and Computational Semantics of Iconic Gestures |
| 10:15-10:30 | Break | |
| 10:30-11:15 | Jiahao Yang | Individuals Rapidly Create Communicatively Efficient Gestural Symbols |
| 11:15-11:30 | Break | |
| 11:30-12:30 | All | Discussion: Cognitive and computational modeling of multimodal creativity. |
| 12:30-13:30 | Lunch | |
| 13:30-14:00 | Wrap Up | |
Multimodal creativity is extremely under-researched. Few studies address this topic at all, even fewer engage with the “standard definition” of creativity based on originality and effectiveness, and fewer still offer an operationalization or model of the creativity of multimodal data. In this study we take steps to address these gaps. We first discuss challenges, and potential solutions, for operationalizing multimodal creativity in experimental data elicited in a multimodal reference game. We then turn to modeling multimodal creativity, focusing on intelligent virtual agents (IVAs). We raise the question of whether patterns of multimodal creativity observed in humans can be perceived in a similar way when expressed by IVAs. We report on an ongoing study investigating this question and provide a first look at the results. In addition, we explore the interpretive capabilities of vision models and how they can contribute to multimodal agent design. Together, these insights highlight both the opportunities and challenges in modeling human-like creative behaviors across modalities.
In this talk, I focus on multimodal communicative means that interlocutors use in interactions oriented toward understanding, namely, explanatory interactions. Such interactions can be characterized as being, at the beginning, asymmetric with respect to knowledge about or understanding of a certain concept or entity, and as having the goal of increasing understanding on the side of the explainee (i.e., the person addressed by the explanation). In dyadic explanatory interactions, both interlocutors – the explainer and the explainee – contribute to this goal (Rohlfing et al., 2021): the explainer by explaining but also by monitoring the explainee (Clark & Krych, 2004), and the explainee by providing feedback and contributing actively to the explanation. All these processes involve not only spoken language but also nonverbal and multimodal forms of communicative behavior, such as co-speech gestures, head gestures, gaze behavior, and vocal backchannels (e.g., Bavelas et al., 2000; Clark & Krych, 2004; Malisz et al., 2016). Such multimodal forms serve as communicative resources that convey meaning beyond propositional information.
To investigate how multimodal forms of communicative behavior are related to different levels of explainees’ understanding, we built a rich multimodal corpus, MUNDEX (MUltimodal UNderstanding of Explanations; Türk et al., 2023), which comprises a total of 87 dyadic board game explanations between German native speakers in which 32 explainers (who were familiar with the game) interact with 2–3 explainees each (who were not familiar with the game), enabling the study of adaptations to individual explainees. Each dialog consisted of three phases – game absent, game present, and game play – that differ with respect to the shared referential space between the interlocutors. Explainees’ understanding was assessed via a retrospective video-recall task that both interaction partners carried out separately after the interaction session. In this task, the explainees comment on their level of understanding, and the explainers on their beliefs about the explainees’ level of understanding.
I will present some of our research findings that shed light on the connection between multimodal forms of communicative behavior and understanding. This work addresses the question of which communicative resources, in addition to spoken means, communication partners use to create a shared basis for understanding and to jointly reach a common goal, which for explanations is understanding.
This presentation introduces a framework for identifying creative gestures in early childhood and demonstrates how multimodal methods reveal children’s inventive strategies for constructing meaning when confronted with lexical gaps. The first part summarizes key findings from the Ecogest study, which informed a set of categories that define what constitutes a creative gesture. The framework is then applied to new data from the C04 project, and an adapted coding scheme is presented along with examples of creative gestures observed in this context.
When communicating with others in a challenging environment, such as in the presence of disruptive background noise, we tend to speak louder, which helps the listener to pick up what is said amidst all the noise. Yet face-to-face communication involves not only spoken language, but visual information as well. Our lip movements provide phonological cues to the speech, and hand gestures provide both emphatic and timing cues as well as complementary semantic and pragmatic meaning. How do speakers adapt this whole multimodal system of signals to the presence of noise? And what do they do when their addressee fails to pick up what was said? I will discuss a study involving a dyadic communication game in which participants (n=48) were tasked with communicating single Dutch action verbs to one another in the presence of varying levels of multi-speaker babble. During this relatively unconstrained task, we used audio, video, and motion tracking recordings to capture lip kinematics, gesture kinematics, speech acoustics, and gesture representation strategy. Several patterns of behavior emerged, including general preferences for modality of communication. However, the level of background noise and an initial failure to communicate further shaped both the acoustic and kinematic features, as well as the gesture strategy.
Face-to-face communication is inherently multimodal, with interlocutors combining signals from the auditory and the visual domain to ensure mutual understanding, joint meaning making, and ultimately communicative success. Hard-of-hearing individuals who face difficulties in the auditory domain might therefore rely more on visual communicative signals, such as gestures and facial expressions – especially in situations with background noise. To date, it has not been empirically investigated whether multimodal communicative behavior (i.e., the kinematic properties of gestures and the acoustic properties of speech) differs between hard-of-hearing and hearing individuals. We investigate this by analyzing audio and video recordings of dyads engaging in free and task-based dialogue with changing types of background noise (no noise, social noise, non-social noise). Besides differences in multimodal communicative behavior across hearing status and background noise type, we also investigate whether there are types of behavior that are associated with communicative success, which we operationalize as a combination of self-report measures (questionnaires) and task measures (accuracy and reaction time). With this research, we aim to develop informed suggestions on how to create a more inclusive and pleasant communicative experience for hard-of-hearing people.
Research has indicated that individuals diagnosed with aphasia following a left hemisphere lesion have the capacity to utilise gestures in a creative manner, thereby achieving their communicative intentions despite their verbal limitations (Magno Caldognetto & Poggi, 1995). The extent to which individuals with cognitive and/or communicative impairments following a right hemisphere lesion are also capable of creative gesture use as a communicative strategy remains unclear. To date, the characteristics of creative gesture use in individuals with brain lesions have not been studied in depth, and the coding and description of creatively used gestures consequently lack consensus. This talk will present a new coding scheme for the creative gesture use of people with neurogenic speech and communication disorders.
Visual communication means such as manual gestures interact with speech meaning. At the same time, it is well known that gestures are sublinguistic in the sense that their linguistic contribution (possibly contrary to their visual contribution) is neither at the at-issue nor at the non-at-issue level. The talk introduces spatial gesture semantics, which aims to unravel this puzzle. First, the sublinguistic status of gestures is captured by spatially enriched semantic models. Second, it will be explained how it is possible to talk about gestures, for instance, as part of clarification interaction. To this end, computational semantic models based on perceptual classification (with some affinity to the philosophy of Nelson Goodman) will be presented.
Humans can create a new communication system within a remarkably short timescale, and an individual’s ability to generate new symbols is a crucial foundation for this process. In this talk, I present experimental work examining how hearing adults create silent gestures for concepts under a strict four-second time constraint. We showed that people tended to produce communicatively efficient gestures. Namely, the gesture produced by the majority of participants was the one effective for comprehension. Furthermore, more empathetic participants estimated gestures’ communicative efficacy more accurately and produced gestures with higher communicative efficacy, indicating that they imagined how the recipient may interpret the gestures when deciding what gesture to produce. Participants also tended to produce gestures with lower production costs. Together, these findings indicate that humans can create a communicatively efficient communicative symbol for a concept within a few seconds, which may partly explain why a new language can emerge quickly in a community.

The workshop will be held at the Center for Interdisciplinary Research (short: ZiF = Zentrum für interdisziplinäre Forschung), close to Bielefeld University’s main campus (campus map) and right next to the Teutoburg Forest. You can find detailed travel information for the ZiF below.
From Bielefeld Hbf (main station), take subway/tram line 4 (destination Universität or Lohmannshof, approx. 7 minutes). From the tram stop Universität or Bültmannshof you can reach the ZiF by walking up the hill behind the main building of the university.
During the day, a bus goes from Bielefeld main station to ZiF (lines 61 to Werther/Halle or 62 to Borgholzhausen); the exit stop is Universität/Studentenwohnheim.
Timetable information for the Bielefeld public transport lines
Network map of the Bielefeld public transport lines
Taxis are always available directly in front of the main station (it takes approx. 10 minutes from the main station to ZiF). The fare to the university is currently around 16 euros.
From the north:
Motorway A2: Exit Bi-Ost, Detmolder Str. direction Zentrum (6 km, approx. 10 min). Route via Kreuzstr., Oberntorwall, Stapenhorststr., Wertherstr. until ZiF is signposted.
From the south:
Motorway A2: At the Bielefeld junction, take the A33 towards Bi-Zentrum, exit at Bi-Zentrum, follow the signs to the city centre on Ostwestfalendamm (B61), exit at Universität, follow Stapenhorststr., Wertherstr. until ZiF is signposted.