With the general progress in artificial neural networks, sign language detection from live video feeds has become a popular research field in recent years. Right now, sign language detection is where spoken language detection from live audio feeds was ten years ago: restricted in context, vocabulary, grammatical diversity, and user-friendliness. But with the rise of efficient, lightweight applications for real-time human pose estimation on mobile devices, a major improvement in the quality and usability of sign language detection is possible in the coming years. In this thesis I explain why preprocessing is necessary for sign language detection, and how the structure of sign language itself places special requirements on that preprocessing. I describe general computer vision algorithms used for preprocessing and give a brief overview of the history and development of sign language detection. For the main part of the thesis, I examine MediaPipe Holistic, a machine-learning-based real-time human pose estimation application that reaches unprecedented accuracy on mobile devices, and conclude that it has the potential to make sign language detection with machine learning viable for real-time applications. Through analysis of its code and through experiments, I conclude that it achieves its accuracy and low latency by intelligently reusing computation results and by not queuing up frames that can no longer be analyzed in real time. Additionally, I conclude that preprocessing based on human skin colour is very helpful for sign language detection, but that MediaPipe Holistic's own skin-colour-based preprocessing actually degrades its accuracy on sign language input. In summary, MediaPipe Holistic demonstrates the benefits human pose estimation can offer in making sign language detection real-time viable even on low-end hardware, by analyzing a frame only when the model is not busy computing prior frames. However, it cannot be deployed as-is for preprocessing, because it fails to accurately track poses typical of sign language.
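As a minimal illustration of the frame-skipping strategy described above, consider the following Python sketch. It is my own illustration, not MediaPipe Holistic's actual code; the Camera class and estimate_pose function are hypothetical stand-ins for frame capture and pose estimation. The key point is that the loop always fetches only the newest frame, so frames arriving during inference are dropped rather than queued, and latency cannot accumulate:

    import time

    class Camera:
        """Hypothetical camera that always returns the newest frame,
        implicitly discarding any frames captured since the last read."""
        def latest_frame(self):
            return object()  # placeholder for real image data

    def estimate_pose(frame):
        """Hypothetical heavy model call; may take longer than one frame."""
        time.sleep(0.05)     # simulate inference cost
        return {"keypoints": []}

    def run_realtime(camera, model, max_frames=10):
        """Analyze only the latest frame; no queue means no latency build-up.
        Every frame that arrives while the model is busy is simply dropped."""
        for _ in range(max_frames):
            frame = camera.latest_frame()  # newest frame only
            result = model(frame)          # blocks; meanwhile frames are dropped
            print(result)

    run_realtime(Camera(), estimate_pose)

Because the capture step always overwrites older frames instead of appending to a buffer, the pipeline trades per-frame completeness for bounded latency, which is the behaviour attributed to MediaPipe Holistic above.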