(2021-2024, DFG, Petra Wagner and Reinhold Häb-Umbach (Paderborn University))
Together with Reinhold Häb-Umbach (Paderborn University), we are investigating how recent progress in speech technology can be leveraged as a tool for basic phonetics research. Speech technology methods, in particular speech synthesis and voice conversion tools, have a long tradition in phonetics research, but have always suffered from audible artifacts which severely affect the validity of findings. With the improvements brought about by deep learning, however, some speech synthesis methods achieve a naturalness which is nearly indistinguishable from human speech. What is still needed to make them a useful tool for phonetics research is a mechanism which gives the phonetician control over the synthetic speech stimuli, both at a low level (manipulation of formants, pitch etc.) and at more abstract levels (e.g., speaker identity, variety, emotional state). In our project, we are investigating whether deep representation learning, as an emerging technique, can be used to reveal compact, disentangled and human-interpretable latent factors in speech, and whether it has the potential to enable the dedicated manipulation of speech cues at different levels of abstraction. This needs to be researched in an interdisciplinary fashion, whereby phonetics experts are in the design loop to determine the relevance of the categories to be learnt, and to evaluate the systems continuously.
The project investigates laughter in spontaneous conversations through the prism of its form, its function, and its multimodal realization. It aims to offer a more connected view of the use of laughter in conversation, one which takes into account these three aspects, together with their interplay, in an interactive setting. First, we will investigate the laughter dimensions in which interlocutors become more similar during their interaction, whether this process varies with conversational context, and whether it has a functional role in conversation. Next, we will complement existing knowledge on multimodal laughter production with an analysis of the co-occurring gestures and their relation to the type of laughter. Then, we will study acoustic-prosodic cues that can discriminate laughter from speech and that are robust to the sources of variability present in laughter. Finally, we will perform an exploratory investigation to evaluate the findings of the project, by means of interaction experiments with a dialogue system implementing laughter.
More information about the findings of the project can be found here.
(2018-ongoing, Bielefeld University, Simon Betz)
The long-term goal of this project is to analyze differences in the online processing of two types of hesitations: lengthening and fillers. The method of analysis is a variant of mouse tracking embedded in a drag-and-drop game environment, in which users have to move a spaceship around following audio instructions (hence the working title of this project). Using gamification and mouse tracking requires some preliminary basic research. At Interspeech 2019 in Graz, we presented a first study from this project, showing that the placement of lengthening within a word influences uncertainty perception, which must be taken into account when creating the stimuli for the full experiment. At ESSV 2020 in Magdeburg, we presented a pilot study of the first functional version of the GUI and the data it provides. The full experiment was planned for Q1 and Q2 2020, but has been postponed due to the pandemic. Currently (2021), we are conducting spin-off experiments which are online-compatible, investigating potential mappings between hesitations and unclear color terms. This project is led by Simon Betz, in cooperation with Petra Wagner, Marin Schröer and Leonie Schade from this workgroup, and with Éva Székely (KTH Stockholm) and Sina Zarrieß (Computational Linguistics, Bielefeld University).
(2009-ongoing, Bielefeld University, Petra Wagner)
In this project, we investigate the multimodal production and perception of prosodic prominence in speech. We specifically focus on the functions of prosody in conversational interactions, and pay special attention to (temporal) cross-modal coordination as well as to the coupling of speech production and perception. Among other things, our previous research in this area investigated the multimodal display of conversational feedback and its integration into a virtual agent capable of showing active listening behaviour, showed listeners' ability to encode their perceived patterns of prosodic impressions in simple manual movements such as drumming or tapping, and analyzed the function-specific multimodal expression of prosodic accents in simple task-oriented spontaneous conversations.
(2020-2022, Swedish Research Council, Petra Wagner)
Together with colleagues from Stockholm University (Dr. Marcin Wlodarczak and Prof. Dr. Mattias Heldner) and KTH Stockholm (Prof. Dr. Johan Sundberg), we are addressing voice-quality-related aspects affecting conversational interaction. Voice quality (VQ) is a colouring of the voice determined largely by the mode of vocal fold vibration. It is a feature that changes continuously as we speak and varies from modal to, for instance, breathy, pressed or creaky voice. In this project, we aim to understand the role of VQ by exploring two prosodic functions of VQ dynamics in spontaneous conversation: management of speaking turns and marking of prosodic prominence. This will be achieved by adopting automatic methods for the identification of turn-taking events and prosodic prominence expression in order to process large conversational datasets. The outcomes of the project will open up potential for improving existing speech and interaction technology solutions. Hence, the project provides another contribution to the long-term goal "to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is human-like" formulated in previous projects run by members of the research group.
(2019-2022, Bielefeld University, Jana Voße)
In my PhD project, I aim to develop a speech synthesis system that is able to motivate humans to adopt healthier routines in their everyday life. This includes, for example, choosing a healthier menu for lunch, taking the stairs instead of the elevator, or going to the gym instead of relaxing on the sofa. The project focusses on the acoustic-phonetic expression of motivation, which will subsequently be implemented in a speech synthesis system. In a first step, the acoustic-phonetic parameters will be analyzed on holistic and local levels and contrasted with respect to variables like gender and culture. After validating the results with perception experiments, the motivating parameter constellation will be implemented in a speech synthesis system to test whether the parameters that have a motivating effect in human-human interaction also convey motivation in human-machine interaction.
The project is supervised by Petra Wagner (Bielefeld University) and Oliver Niebuhr (University of Southern Denmark).
(Horizon 2020, 2018-2020, Bogdan Ludusan)
This project investigates conversational laughter, both from a fundamental research perspective and from an application viewpoint. The research question addressed concerns the context in which laughter occurs and, in particular, the use of acoustic-prosodic cues in marking it. The findings of this investigation will directly feed into a spoken dialogue system, with the goal of increasing its perceived naturalness. In a first step, analyses are performed at different linguistic levels to identify the range of cues employed by speakers to mark laughter itself, as well as its introduction into conversation. This will feed into the second part of the project, which aims to combine state-of-the-art signal processing methods and prosodic information in order to automatically detect and segment laughter. Finally, a laughter-enhanced dialogue system will be implemented, and its perceived naturalness and friendliness will be tested through interactions between the system and human participants.
Simon Betz (2019). This PhD project investigated whether hesitations are suitable additions to conversational spoken dialogue systems. In human communication, hesitations like fillers (uhm…), silences or lengthened words provide information for the listener and buy dialogue time for the speaker. En route to implementing a model for hesitation insertion in a dialogue system situated in a smart-home environment, this thesis provided basic research on hesitations in human and machine communication. It became apparent that hesitation lengthening is an under-researched phenomenon which is very versatile in conversational dialogue systems, as it is capable of buying time without the user noticing it. Several new research questions regarding lengthening opened up during this project, which will be addressed in future projects in this workgroup.
After a nomination by KTH Stockholm, Petra Wagner has been awarded the 27th Swedish Research Award for outstanding German scholars by the Swedish Riksbankens Jubileumsfond for 2018. The award is based on a collaboration between Riksbankens Jubileumsfond and the Alexander von Humboldt Foundation, and is intended to support scientific work in Sweden. Each year, it is typically awarded to one or two outstanding German scholars from the humanities or social sciences. The prize sum is ca. 60,000 EUR.