Sign language plays a crucial role in facilitating effective communication for individuals with hearing impairments. As technology becomes increasingly integrated into our lives, it becomes imperative to create inclusive platforms that cater to the needs of sign language users, particularly in remote communication and collaboration settings. This thesis focuses on the specific challenge of sign language detection within the context of Microsoft Teams, a widely utilized communication and collaboration tool. By tackling this challenge, we aim to enhance the accessibility and inclusivity of Microsoft Teams for individuals who rely on sign language as their primary mode of communication.
We begin our work by establishing our evaluation metrics: we use unweighted average recall (UAR) instead of accuracy, as it better captures performance on unbalanced datasets, and we resort to a qualitative evaluation of our best-performing model by visualising the classification output and analysing the attention activation weights, similarly to the approach used in the paper introducing InfoGCN. We also define the datasets that we use throughout our work, namely Signing in the Wild, the DGS-Corpus, and the Teams dataset.
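For concreteness, UAR is the unweighted mean of the per-class recalls, so a degenerate majority-class predictor cannot score well on it; a minimal sketch, illustrative rather than the thesis evaluation code:

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls: every class weighs equally,
    no matter how many samples it has."""
    classes = np.unique(y_true)
    recalls = [np.mean(np.asarray(y_pred)[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

# A classifier that always predicts the majority class reaches 90%
# accuracy on a 9:1 split, but only 0.5 UAR.
y_true = np.array([0] * 9 + [1])
y_pred = np.zeros(10, dtype=int)
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```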
Our experimentation begins by setting a VGG16+RNN approach, defined and explored in Borg et al., as the baseline model for sign language detection. The baseline model combines the VGG16 convolutional neural network for feature extraction with a recurrent neural network that leverages temporal information in video segments. The VGG16+RNN baseline is trained and evaluated to establish a performance benchmark for the task.
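As a reference point for what such a baseline looks like, here is a minimal PyTorch sketch of a frame-level CNN feeding a GRU; the hidden size, pooling, and two-class head are illustrative assumptions, not the exact configuration of Borg et al.:

```python
import torch
import torch.nn as nn
from torchvision import models

class VGG16RNNBaseline(nn.Module):
    """Illustrative VGG16+RNN detector: per-frame CNN features,
    a GRU over time, and a signing / not-signing head."""
    def __init__(self, hidden_size=256, num_classes=2):
        super().__init__()
        vgg = models.vgg16(weights="IMAGENET1K_V1")  # ImageNet-pretrained
        self.backbone = vgg.features                 # convolutional trunk
        self.pool = nn.AdaptiveAvgPool2d(1)          # -> (B*T, 512, 1, 1)
        self.rnn = nn.GRU(512, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                        # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.pool(self.backbone(clips.flatten(0, 1)))
        feats = feats.flatten(1).view(b, t, 512)     # (B, T, 512) per-frame features
        _, h = self.rnn(feats)                       # final hidden state summarises the clip
        return self.head(h[-1])                      # per-clip logits
```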
To explore the potential of human skeleton-based approaches, we introduce the Hierarchical Co-occurrence Network (HCN) architecture as a baseline for skeleton-based sign language detection. The HCN model leverages the hierarchical composition of co-occurrence features extracted from human skeletons. The HCN baseline is trained and evaluated to assess its effectiveness in capturing sign language.
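The distinctive HCN ingredient is that, after point-level convolutions, the joint axis is transposed into the channel axis, so that subsequent convolutions aggregate features across all joints at once (global co-occurrences rather than local neighbourhoods); a simplified sketch with assumed channel sizes:

```python
import torch
import torch.nn as nn

class CoOccurrenceBlock(nn.Module):
    """Sketch of the HCN co-occurrence idea; channel sizes are
    illustrative, not the exact HCN configuration."""
    def __init__(self, in_ch=3, point_ch=32, cooc_ch=64, num_joints=25):
        super().__init__()
        # Point level: convolve over the (time, joint) grid, per joint.
        self.point = nn.Sequential(
            nn.Conv2d(in_ch, point_ch, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
        )
        # Co-occurrence level: joints now sit in the channel axis,
        # so this convolution mixes features of *all* joints.
        self.cooc = nn.Sequential(
            nn.Conv2d(num_joints, cooc_ch, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):            # x: (B, 3, T, J) coordinate channels
        x = self.point(x)            # (B, point_ch, T, J)
        x = x.permute(0, 3, 2, 1)    # (B, J, T, point_ch): joints -> channels
        return self.cooc(x)          # (B, cooc_ch, T, point_ch)
```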
Furthermore, we propose a revisited version of the InfoGCN architecture, tailored to the specificities of sign language, as an advanced model for sign language detection. The InfoGCN model combines attention-based graph convolutions with an information bottleneck framework to achieve its state-of-the-art performance on action recognition benchmarks. We optimize the performance of the InfoGCN model through several approaches: augmenting the human skeleton graph with additional landmarks, incorporating direct cross-modal connections (e.g., hands, face contours, eyebrows, and mouth), and integrating a graph convolution step into the encoding block of the InfoGCN architecture. We report the UAR for each of these experiments.
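As an illustration of the last modification, a single normalised graph-convolution step of the generic GCN form could be inserted as follows; this is a sketch of the generic operator, not the exact InfoGCN layer (whose adjacency is attention-based and learned), and the augmented skeleton graph, extra landmarks and cross-modal edges included, would enter through the adjacency matrix adj:

```python
import torch
import torch.nn as nn

class GraphConvStep(nn.Module):
    """Generic normalised graph convolution, X' = D^-1/2 (A+I) D^-1/2 X W.
    `adj` would be the augmented skeleton adjacency: body joints plus
    extra hand/face landmarks, with direct cross-modal edges added."""
    def __init__(self, adj, in_feats, out_feats):
        super().__init__()
        a = adj + torch.eye(adj.size(0))          # add self-loops
        d = a.sum(dim=1).rsqrt().diag()           # D^-1/2 as a diagonal matrix
        self.register_buffer("a_hat", d @ a @ d)  # symmetric normalisation
        self.lin = nn.Linear(in_feats, out_feats)

    def forward(self, x):                             # x: (B, T, J, in_feats)
        return torch.relu(self.lin(self.a_hat @ x))   # mix joints, then project
```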
Our final model achieves a detection UAR of 0.920 on the test split of Signing in the Wild and 0.825 on the test split of the DGS-Corpus, improving on the baseline by 70% and 57% in terms of relative error reduction on the respective test splits.
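Here, relative error reduction is taken in its standard sense, (E_baseline - E_ours) / E_baseline with error E = 1 - UAR; the baseline UARs in the quick check below (about 0.733 and 0.593) are back-calculated from the reported numbers, not quoted from the thesis:

```python
def relative_error_reduction(uar_base, uar_new):
    """RER = (E_base - E_new) / E_base, with error E = 1 - UAR."""
    e_base, e_new = 1.0 - uar_base, 1.0 - uar_new
    return (e_base - e_new) / e_base

# Implied baselines: ~0.733 (Signing in the Wild), ~0.593 (DGS-Corpus)
print(round(relative_error_reduction(0.733, 0.920), 2))  # 0.70
print(round(relative_error_reduction(0.593, 0.825), 2))  # 0.57
```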