Physiological studies have shown that facial dynamics can serve as biomarkers for assessing depression severity. Accordingly, this paper develops a Dual Attention and Element Recalibration (DAER) network that extracts facial changes to predict depression level. The model comprises two proposed blocks: a Dual Attention (DA) block and an Element Recalibration (ER) block. The DA block uses self-attention to capture the dynamic changes in the representation sequence of a facial video segment, and bilinear attention to examine how the feature components of that sequence influence depression level prediction. To improve the representation ability of the network, the ER block gathers global information and uses it to recalibrate each element of the tensor. For the depression level prediction task, we first divide the long-term video into fixed-length segments and use a trained ResNet50 to encode each frame, generating a representation sequence for each video segment. Second, the representation sequences are fed into the DAER network to obtain depression level scores. Finally, these scores are averaged to yield the prediction for the long-term video. Experiments on the publicly available AVEC 2013 and AVEC 2014 depression databases demonstrate the effectiveness of our method.
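The three-step inference procedure (segment the video, score each segment, average the scores) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_into_segments` and `predict_video_score` are hypothetical, the per-frame features are assumed to have already been produced by the frame encoder, the segment scorer is a stand-in for the DAER network, and dropping trailing frames that do not fill a whole segment is an assumption the abstract does not specify.

```python
import numpy as np


def split_into_segments(features, seg_len):
    """Split per-frame features of shape (num_frames, dim) into fixed-length
    segments of shape (num_segments, seg_len, dim).

    Assumption: trailing frames that do not fill a whole segment are dropped.
    """
    num_segments = features.shape[0] // seg_len
    if num_segments == 0:
        raise ValueError("video is shorter than one segment")
    usable = features[: num_segments * seg_len]
    return usable.reshape(num_segments, seg_len, features.shape[1])


def predict_video_score(features, seg_len, segment_scorer):
    """Score every segment and average the scores into one video-level score.

    `segment_scorer` is a placeholder for the trained segment-level model:
    any callable mapping a (seg_len, dim) array to a scalar.
    """
    segments = split_into_segments(features, seg_len)
    scores = np.array([segment_scorer(seg) for seg in segments], dtype=float)
    return float(scores.mean())
```

Any callable can stand in for the segment model when trying this out, for example `predict_video_score(features, 16, lambda seg: seg.mean())`, where the segment length of 16 is an arbitrary illustrative value.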