Sign language plays a crucial role in facilitating effective communication for individuals with hearing impairments. As technology becomes increasingly integrated into our lives, it becomes imperative to create inclusive platforms that cater to the needs of sign language users, particularly in remote communication and collaboration settings. This thesis focuses on the specific challenge of sign language detection within the context of Microsoft Teams, a widely utilized communication and collaboration tool. By tackling this challenge, we aim to enhance the accessibility and inclusivity of Microsoft Teams for individuals who rely on sign language as their primary mode of communication.
We begin our work by establishing our evaluation metrics: we use unweighted average recall (UAR) instead of accuracy, as it better captures performance on unbalanced datasets, and we resort to a qualitative evaluation of our best-performing model by visualising the classification output and analysing the attention activation weights, similarly to the approach used in the paper introducing InfoGCN. We also define the datasets that we use throughout our work, namely Signing in the Wild, the DGS-Corpus, and the Teams dataset.
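For concreteness, UAR is the unweighted mean of the per-class recalls, so a degenerate majority-class predictor cannot score well on it; a minimal sketch, illustrative rather than the thesis evaluation code:

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls: every class weighs equally,
    no matter how many samples it has."""
    classes = np.unique(y_true)
    recalls = [np.mean(np.asarray(y_pred)[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

# A classifier that always predicts the majority class reaches 90%
# accuracy on a 9:1 split, but only 0.5 UAR.
y_true = np.array([0] * 9 + [1])
y_pred = np.zeros(10, dtype=int)
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```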
Our experimentation begins by setting a VGG16+RNN approach, defined and explored in Borg et al., as the baseline model for sign language detection. The baseline model combines the VGG16 convolutional neural network for feature extraction with a recurrent neural network that leverages temporal information in video segments. The VGG16+RNN baseline is trained and evaluated to establish a performance benchmark for the task.
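As a reference point for what such a baseline looks like, here is a minimal PyTorch sketch of a frame-level CNN feeding a GRU; the hidden size, pooling, and two-class head are illustrative assumptions, not the exact configuration of Borg et al.:

```python
import torch
import torch.nn as nn
from torchvision import models

class VGG16RNNBaseline(nn.Module):
    """Illustrative VGG16+RNN detector: per-frame CNN features,
    a GRU over time, and a signing / not-signing head."""
    def __init__(self, hidden_size=256, num_classes=2):
        super().__init__()
        vgg = models.vgg16(weights="IMAGENET1K_V1")  # ImageNet-pretrained
        self.backbone = vgg.features                 # convolutional trunk
        self.pool = nn.AdaptiveAvgPool2d(1)          # -> (B*T, 512, 1, 1)
        self.rnn = nn.GRU(512, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                        # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.pool(self.backbone(clips.flatten(0, 1)))
        feats = feats.flatten(1).view(b, t, 512)     # (B, T, 512) per-frame features
        _, h = self.rnn(feats)                       # final hidden state summarises the clip
        return self.head(h[-1])                      # per-clip logits
```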
To explore the potential of human skeleton-based approaches, we introduce the Hierarchical Co-occurrence Network (HCN) architecture as a baseline for skeleton-based sign language detection. The HCN model leverages the hierarchical composition of co-occurrence features extracted from human skeletons. The HCN baseline is trained and evaluated to assess its effectiveness in capturing sign language.
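The distinctive HCN ingredient is that, after point-level convolutions, the joint axis is transposed into the channel axis, so that subsequent convolutions aggregate features across all joints at once (global co-occurrences rather than local neighbourhoods); a simplified sketch with assumed channel sizes:

```python
import torch
import torch.nn as nn

class CoOccurrenceBlock(nn.Module):
    """Sketch of the HCN co-occurrence idea; channel sizes are
    illustrative, not the exact HCN configuration."""
    def __init__(self, in_ch=3, point_ch=32, cooc_ch=64, num_joints=25):
        super().__init__()
        # Point level: convolve over the (time, joint) grid, per joint.
        self.point = nn.Sequential(
            nn.Conv2d(in_ch, point_ch, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
        )
        # Co-occurrence level: joints now sit in the channel axis,
        # so this convolution mixes features of *all* joints.
        self.cooc = nn.Sequential(
            nn.Conv2d(num_joints, cooc_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):            # x: (B, 3, T, J) coordinate channels
        x = self.point(x)            # (B, point_ch, T, J)
        x = x.permute(0, 3, 2, 1)    # (B, J, T, point_ch): joints -> channels
        return self.cooc(x)          # (B, cooc_ch, T, point_ch)
```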
Furthermore, we propose a revisited version of the InfoGCN architecture, tailored to the specificities of sign language, as an advanced model for sign language detection. The InfoGCN model combines attention-based graph convolutions with an information bottleneck framework to achieve its state-of-the-art performance on action recognition benchmarks. We optimize the performance of the InfoGCN model through several approaches: augmenting the human skeleton graph with additional landmarks, incorporating direct cross-modal connections (e.g., hands, face contours, eyebrows, and mouth), and integrating a graph convolution step into the encoding block of the InfoGCN architecture. We report the UAR for each of these experiments.
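As an illustration of the last modification, a single normalised graph-convolution step of the generic GCN form could be inserted as follows; this is a sketch of the generic operator, not the exact InfoGCN layer (whose adjacency is attention-based and learned), and the augmented skeleton graph, extra landmarks and cross-modal edges included, would enter through the adjacency matrix adj:

```python
import torch
import torch.nn as nn

class GraphConvStep(nn.Module):
    """Generic normalised graph convolution, X' = D^-1/2 (A+I) D^-1/2 X W.
    `adj` would be the augmented skeleton adjacency: body joints plus
    extra hand/face landmarks, with direct cross-modal edges added."""
    def __init__(self, adj, in_feats, out_feats):
        super().__init__()
        a = adj + torch.eye(adj.size(0))          # add self-loops
        d = a.sum(dim=1).rsqrt().diag()           # D^-1/2 as a diagonal matrix
        self.register_buffer("a_hat", d @ a @ d)  # symmetric normalisation
        self.lin = nn.Linear(in_feats, out_feats)

    def forward(self, x):                             # x: (B, T, J, in_feats)
        return torch.relu(self.lin(self.a_hat @ x))   # mix joints, then project
```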
Our final model achieves a detection UAR of 0.920 on the test split of Signing in the Wild and 0.825 on the test split of the DGS-Corpus, improving on the baseline by 70% and 57% in terms of relative error reduction on the respective test splits.
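Here, relative error reduction is taken in its standard sense, (E_baseline - E_ours) / E_baseline with error E = 1 - UAR; the baseline UARs in the quick check below (about 0.733 and 0.593) are back-calculated from the reported numbers, not quoted from the thesis:

```python
def relative_error_reduction(uar_base, uar_new):
    """RER = (E_base - E_new) / E_base, with error E = 1 - UAR."""
    e_base, e_new = 1.0 - uar_base, 1.0 - uar_new
    return (e_base - e_new) / e_base

# Implied baselines: ~0.733 (Signing in the Wild), ~0.593 (DGS-Corpus)
print(round(relative_error_reduction(0.733, 0.920), 2))  # 0.70
print(round(relative_error_reduction(0.593, 0.825), 2))  # 0.57
```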