Publications

Saund, C. (2022). Modelling the relationship between gesture motion and meaning (Doctoral dissertation, University of Glasgow).

Abstract: There are many ways to say “Hello,” be it a wave, a nod, or a bow. We greet others not only with words, but also with our bodies. Embodied communication permeates our interactions. A fist bump, thumbs-up, or pat on the back can be even more meaningful than hearing “good job!” A friend crossing their arms with a scowl, turning away from you, or stiffening up can feel like a harsh rejection. Social communication is not exclusively linguistic, but is a multi-sensory affair. It’s not that communication without these bodily cues is impossible, but it is impoverished. Embodiment is a fundamental human experience.


Expressing ourselves through our bodies provides a powerful channel for conveying a wealth of meta-social information. Integral to communication, expression, and social engagement is our use of conversational gesture. We use gestures to express extra-linguistic information, to emphasize our point, and to embody mental and linguistic metaphors that add depth and color to social interaction.

Compared to human-human conversation, the gesture behaviour of virtual humans is limited, to a degree that depends on the approach taken to automate these characters’ performances. Approaches to generating nonverbal behaviour for virtual humans can be roughly classified as either: 1) data-driven approaches that learn a mapping from aspects of the verbal channel, such as prosody, to gesture; or 2) rule-based approaches that are often tailored by designers for specific applications.

This thesis is an interdisciplinary exploration that bridges these two approaches and brings data-driven analyses to observational gesture research. By marrying a rich history of gesture research in behavioral psychology with data-driven techniques, this body of work brings rigorous computational methods to gesture classification, analysis, and generation. It addresses how researchers can exploit computational methods to make virtual humans gesture with the same richness, complexity, and apparent effortlessness as you and I. Throughout this work the central focus is on metaphoric gestures. These gestures are capable of conveying rich, nuanced, multi-dimensional meaning, and they raise several challenges in their generation, including establishing and interpreting a gesture’s communicative meaning and selecting a performance to convey it. As such, effectively utilizing these gestures remains an open challenge in virtual agent research. This thesis explores how metaphoric gestures are interpreted by an observer, how one can generate such rich gestures using a mapping between utterance meaning and gesture, and how one can use data-driven techniques to explore the mapping between utterance and metaphoric gesture.

The thesis begins in Chapter 1 by outlining the interdisciplinary space of gesture research in psychology and gesture generation in virtual agents. It then presents several studies that address assumptions about the need for rich, metaphoric gestures and the risk of false implicature when gestural meaning is ignored in gesture generation. In Chapter 2, two studies on metaphoric gestures that embody multiple metaphors argue three critical points that inform the rest of the thesis: that people form rich inferences from metaphoric gestures; that these inferences are informed by cultural context; and, most importantly, that any approach to analyzing the relation between utterance and metaphoric gesture needs to take into account that multiple metaphors may be conveyed by a single gesture. A third study, presented in Chapter 3, highlights the risk of false implicature and discusses it in the context of current subjective evaluations of the qualitative influence of gesture on viewers.

Chapters 4 and 5 then present a data-driven analysis approach to recovering an interpretable, explicit mapping from utterance to metaphor. The approach, described in detail in Chapter 4, clusters gestural motion and relates those clusters to a semantic analysis of the associated utterances. Chapter 5 then demonstrates how this approach can serve both as a framework for data-driven techniques in the study of gesture and as the basis of a gesture generation approach for virtual humans.

The framework used in the last two chapters ties together the main themes of this thesis: how we can use observational behavioral gesture research to inform data-driven analysis methods, how embodied metaphor relates to fine-grained gestural motion, and how to exploit this relationship to generate rich, communicatively nuanced gestures on virtual agents. While gestures show huge variation, the goal of this thesis is to start to characterize and codify that variation using modern data-driven techniques.

The final chapter of this thesis reflects on the many challenges and obstacles the field of gesture generation continues to face. The potential for virtual agent applications to have a broad impact on our daily lives grows with the increasing pervasiveness of digital interfaces, continuing technical breakthroughs, and collaborative interdisciplinary research efforts. The thesis concludes with an optimistic vision of applications for virtual agents with deep models of non-verbal social behaviour, and of their potential to encourage multi-disciplinary collaboration.
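A minimal, illustrative sketch of the Chapter 4-5 mapping idea described above (not the thesis's actual pipeline; the semantic tags and gesture clip names are invented placeholders): given motion clusters already learned from data, map an utterance's semantic tags to the cluster whose gestures most often co-occurred with those tags, then sample an exemplar gesture.

import random
from collections import Counter, defaultdict

# Hypothetical training pairs: (semantic tags of an utterance, id of the
# motion cluster of its co-speech gesture), as if produced by a Chapter 4
# style clustering step.
training_pairs = [
    ({"container", "progress"}, 0),
    ({"container"}, 0),
    ({"dismissal"}, 1),
    ({"progress", "path"}, 2),
]

# Count how often each semantic tag co-occurs with each motion cluster.
tag_cluster_counts = defaultdict(Counter)
for tags, cluster in training_pairs:
    for tag in tags:
        tag_cluster_counts[tag][cluster] += 1

def select_cluster(utterance_tags):
    """Vote for the motion cluster most associated with the utterance's tags."""
    votes = Counter()
    for tag in utterance_tags:
        votes.update(tag_cluster_counts.get(tag, Counter()))
    return votes.most_common(1)[0][0] if votes else None

# Invented exemplar clips per motion cluster.
cluster_exemplars = {0: ["cup_hands.bvh"], 1: ["sweep_away.bvh"], 2: ["trace_arc.bvh"]}

cluster = select_cluster({"container", "progress"})
print(cluster, random.choice(cluster_exemplars[cluster]))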


@phdthesis{saund2022modelling,
  title={Modelling the relationship between gesture motion and meaning},
  author={Saund, Carolyn},
  year={2022},
  school={University of Glasgow}
}

2022 10th International Conference on Human-Agent Interaction (HAI)

Saund, C., Matuszak, H., Weinstein, A., & Marsella, S. (2022, December). Motion and Meaning: Data-Driven Analyses of The Relationship Between Gesture and Communicative Semantics. In Proceedings of the 10th International Conference on Human-Agent Interaction (pp. 227-235).

Abstract: Gestures convey critical information within social interactions. As such, the success of virtual agents (VA) in both building social relationships and achieving their goals is heavily dependent on the information conveyed within their gestures. Because of the precision required for effective gesture behavior, it is prudent to retain some designer control over these conversational gestures. However, in order to exercise that control practically we must first understand how gestural motion conveys meaning. One consideration in this relationship between motion and meaning is the notion of Ideational Units, meaning that only parts of a gesture’s motion at a point in time may convey meaning, while other parts may be held from the previous gesture. In this paper, we develop, demonstrate, and release a set of tools that help quantify the relationship between the semantics conveyed in a gesture’s co-speech utterance and the fine-grained motion of that gesture. This allows us to explore insights into the complex relationship between motion and meaning. In particular, we use spectral motion clustering to discern patterns of motion that tend to be associated with semantic concepts, on both an aggregate and individual-speaker level. We then discuss the potential for these tools to serve as a framework for both automated gesture generation and interpretation in virtual agents. These tools can ideally be used within approaches to automating VA gesture performances as well as serve as an analysis framework for fundamental gesture research.
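As a hedged illustration of the spectral motion clustering described above (invented features and concept labels standing in for the released tools and real data), one might cluster per-gesture motion features and tabulate how each cluster co-occurs with semantic concepts:

import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Toy per-gesture "motion features" (e.g., wrist-velocity statistics):
# 60 gestures drawn from three synthetic motion regimes.
features = np.vstack([rng.normal(loc=m, scale=0.3, size=(20, 4)) for m in (0.0, 1.5, 3.0)])
# Hypothetical semantic concept tagged on each gesture's co-speech utterance.
concepts = np.array(["container"] * 20 + ["path"] * 20 + ["dismissal"] * 20)

labels = SpectralClustering(n_clusters=3, random_state=0).fit_predict(features)

# How often does each motion cluster co-occur with each semantic concept?
for c in np.unique(labels):
    counts = {tag: int(np.sum(concepts[labels == c] == tag)) for tag in np.unique(concepts)}
    print(f"cluster {c}: {counts}")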


@inproceedings{saund2022motion,
  title={Motion and Meaning: Data-Driven Analyses of The Relationship Between Gesture and Communicative Semantics},
  author={Saund, Carolyn and Matuszak, Haley and Weinstein, Anna and Marsella, Stacy},
  booktitle={Proceedings of the 10th International Conference on Human-Agent Interaction},
  pages={227--235},
  year={2022}
}

The Handbook on Socially Interactive Agents: 20 years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics Volume 1: Methods, Behavior, Cognition (2021).

Saund, C., & Marsella, S. (2021). Gesture generation. In The Handbook on Socially Interactive Agents: 20 years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics Volume 1: Methods, Behavior, Cognition (pp. 213-258).

Abstract: Gestures accompany our speech in ways that punctuate, augment, substitute for, and even contradict verbal information. Such co-speech gestures draw listeners’ attention to specific phrases, indicate the speaker’s feelings toward a subject, or even convey “off-the-record” information that is excluded from our spoken words. The study of co-speech gesture stretches at least as far back as the work of Quintilian in 50 AD, and draws from the disciplines of cognitive science, performance...


@incollection{saund2021gesture,
  title={Gesture generation},
  author={Saund, Carolyn and Marsella, Stacy},
  booktitle={The Handbook on Socially Interactive Agents: 20 years of Research on Embodied Conversational Agents, Intelligent Virtual Agents, and Social Robotics Volume 1: Methods, Behavior, Cognition},
  pages={213--258},
  year={2021}
}

2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)

C. Saund and S. Marsella, "The Importance of Qualitative Elements in Subjective Evaluation of Semantic Gestures," 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India, 2021, pp. 1-8, doi: 10.1109/FG52635.2021.9667023.

Abstract: Gestures play a vital role in face-to-face interactions, from conveying speaker attitudes to relaying information not present in speech. Gestures have been widely shown to be linked to the meaning, form, and timing of the co-speech context in which they are produced. Producing convincing, relevant, and informative semantic gestures is an ongoing challenge in the field of gesture generation for embodied conversational agents. In this paper, we put forward a novel technique to select semantically related gestures from a gesture database, and present two experiments which highlight the importance of measuring the qualitative impact of semantically related gestures on the viewer. In the first experiment, we demonstrate a strong correlation between subjective perception of the speaker’s energy level and perception of the semantic relatedness of the co-speech transcript. In the second experiment, we attempt to measure the semantic information conveyed in gesture along specific qualitative dimensions. We then discuss the implications and impacts of these findings, including the limitations on the strength of claims we can make when using the original gesture that accompanies an utterance as an evaluative baseline.
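A hedged sketch of the general retrieval idea, selecting a gesture from a database by semantic relatedness to the input utterance; the database, clip ids, and the toy bag-of-words similarity below are invented stand-ins for the paper's actual technique:

import math
from collections import Counter

# Hypothetical database: gesture clip id -> transcript of the utterance the
# clip originally accompanied.
gesture_db = {
    "g01": "the idea keeps growing bigger and bigger",
    "g02": "we pushed the proposal aside",
    "g03": "it all comes together in the end",
}

def bow(text):
    """Bag-of-words counts; a stand-in for a real semantic representation."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def select_gesture(utterance):
    """Return the clip whose stored transcript is most similar to the input."""
    query = bow(utterance)
    return max(gesture_db, key=lambda g: cosine(query, bow(gesture_db[g])))

print(select_gesture("the plan grew bigger every day"))  # -> g01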


@inproceedings{9667023,
  author={Saund, Carolyn and Marsella, Stacy},
  booktitle={2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021)},
  title={The Importance of Qualitative Elements in Subjective Evaluation of Semantic Gestures},
  year={2021},
  pages={1-8},
  doi={10.1109/FG52635.2021.9667023}
}


2021 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS)

C. Saund, A. Bîrlădeanu and S. Marsella, "CMCF: An Architecture for Realtime Gesture Generation by Clustering Gestures by Motion and Communicative Function," 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), Online, May 3-7, 2021.

Abstract: Gestures augment speech by performing a variety of communicative functions in humans and virtual agents, and are often related to speech by complex semantic, rhetorical, prosodic, and affective elements. In this paper we briefly present an architecture for human-like gesturing in virtual agents that is designed to realize complex speech-to-gesture mappings by exploiting existing machine-learning-based parsing tools and techniques to extract these functional elements from speech. We then deeply explore the rhetorical branch of this architecture, objectively assessing whether existing rhetorical parsing techniques can classify gestures into classes with distinct movement properties. To do this, we take a corpus of spontaneously generated gestures and correlate their movement to co-speech utterances. We cluster gestures based on their rhetorical properties, and then by their movement. Our objective analysis suggests that some rhetorical structures are identifiable by our movement features while others require further exploration. We explore possible explanations for these findings and propose future experiments that may further reveal nuances of the richness of the mapping between speech and motion. This work builds towards a real-time gesture generator which performs gestures that effectively convey rich communicative functions.
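As a minimal sketch of the paper's objective analysis (with invented data, not the CMCF pipeline itself), one can cluster gestures by movement features and measure how well that partition agrees with a grouping by rhetorical class, here using the adjusted Rand index as the agreement measure:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)
# Toy movement features for 40 gestures drawn from two synthetic motion regimes.
movement = np.vstack([rng.normal(0.0, 0.5, (20, 3)), rng.normal(2.0, 0.5, (20, 3))])
# Hypothetical rhetorical class per gesture (e.g., output of a rhetorical parser).
rhetorical = np.array([0] * 20 + [1] * 20)

motion_clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(movement)

# 1.0 means the movement clusters perfectly recover the rhetorical classes;
# values near 0.0 mean chance-level agreement.
print("agreement (ARI):", adjusted_rand_score(rhetorical, motion_clusters))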


@inproceedings{saund2021cmcf,
  author={Saund, Carolyn and Bîrlădeanu, Andre and Marsella, Stacy},
  booktitle={20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021)},
  title={CMCF: An Architecture for Realtime Gesture Generation by Clustering Gestures by Motion and Communicative Function},
  year={2021}
}

2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)

C. Saund, M. Roth, M. Chollet and S. Marsella, "Multiple metaphors in metaphoric gesturing," 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), Cambridge, UK, 2019, pp. 524-530, doi: 10.1109/ACII.2019.8925435.

Abstract: The use of metaphoric gestures by speakers has long been known to influence thought in the viewer. What is less clear is the extent to which the expression of multiple metaphors in a single gesture reliably affects viewer interpretation. Additionally, gestures which express only one metaphor are not sufficient to explain the broad array of metaphoric gestures and metaphoric scenes that human speakers naturally produce. In this paper we address three issues related to the implementation of metaphoric gestures in virtual humans. First, we break down naturally occurring examples of multiple-metaphor gestures, as well as metaphoric scenes created by gesture sequences. Then, we show the importance of capturing multiple metaphoric aspects of gesture with a behavioral experiment using crowdsourced judgements of videos of alterations of the naturally occurring gestures. Finally, we discuss the challenges for computationally modeling metaphoric gestures that are raised by our findings.


@inproceedings{8925435,
  author={Saund, Carolyn and Roth, Marion and Chollet, Mathieu and Marsella, Stacy},
  booktitle={2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII)},
  title={Multiple metaphors in metaphoric gesturing},
  year={2019},
  pages={524-530},
  doi={10.1109/ACII.2019.8925435}
}