Quantitative Comparisons of Real-time Emotion Recognitions

This thesis primarily focuses on quantifying and analysing the impact of image degradations on the predictions of Automatic Affect Estimators.

Completed Bachelor Thesis

With the recent growth of digitalization, there is an increasing need for better human-computer interaction. One such attempt is to equip the computer with the ability to recognise human affective states, which is part of the field of affective computing [4,5]. This has recently been realised through the development of computer-based emotion recognition (ER), particularly using the facial modality as input data due to its intuitive nature [1].

Even though current facial emotion recognition approaches have been shown to work well in their respective experimental settings, their robustness outside their standard protocols has not yet been intensively analysed. This can cause problems, especially if these models are expected to be reliable during deployment, where there is some risk of artificial image alteration (from a system attacker, for instance). The impact of such alterations has indeed been shown in other related facial analysis applications [2][3].

The first aim of this thesis is to quantitatively evaluate the robustness of several facial emotion recognition models under different image conditions (such as synthetic noise) [2]; the second is to benchmark the run-time efficiency of the evaluated models; the third is to use the findings on the models' behaviour during the evaluation to reduce their current limitations, if any [3].
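The first two aims can be illustrated with a minimal sketch, assuming additive Gaussian noise as the synthetic degradation and the mean absolute change in predicted valence/arousal as the robustness measure. The names `add_gaussian_noise`, `toy_estimator` and `robustness_sweep` are hypothetical, and `toy_estimator` is only a stand-in for a pretrained model, not any of the models evaluated in the thesis.

```python
import random
import time


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a grayscale image given as rows of 0-255 values."""
    return [
        [min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def toy_estimator(image):
    """Hypothetical stand-in for a pretrained model.

    Maps image brightness and contrast to (valence, arousal) in [-1, 1].
    """
    pixels = [px for row in image for px in row]
    mean = sum(pixels) / len(pixels)
    var = sum((px - mean) ** 2 for px in pixels) / len(pixels)
    valence = mean / 127.5 - 1.0
    arousal = min(1.0, var ** 0.5 / 127.5) * 2.0 - 1.0
    return valence, arousal


def robustness_sweep(estimator, images, sigmas, seed=0):
    """For each noise level, return (mean prediction shift, mean latency in seconds)."""
    rng = random.Random(seed)
    clean = [estimator(img) for img in images]  # reference predictions
    results = {}
    for sigma in sigmas:
        shift = 0.0
        elapsed = 0.0
        for img, (v0, a0) in zip(images, clean):
            noisy = add_gaussian_noise(img, sigma, rng)
            start = time.perf_counter()
            v, a = estimator(noisy)
            elapsed += time.perf_counter() - start
            # Average absolute change over the two affect dimensions.
            shift += (abs(v - v0) + abs(a - a0)) / 2.0
        results[sigma] = (shift / len(images), elapsed / len(images))
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    images = [[[rng.randint(0, 255) for _ in range(16)] for _ in range(16)]
              for _ in range(10)]
    for sigma, (shift, latency) in robustness_sweep(
            toy_estimator, images, [0.0, 10.0, 25.0, 50.0]).items():
        print(f"sigma={sigma:5.1f}  shift={shift:.4f}  latency={latency * 1e6:.1f} us")
```

In an actual evaluation, `toy_estimator` would be replaced by a call into one of the pretrained models, and the shift would more naturally be measured against ground-truth annotations rather than against the clean-image prediction.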

We will primarily target pretrained deep learning models that predict valence and arousal, such as [10] and [11], given their state-of-the-art accuracy. We will also consider a comparison with classical machine learning approaches, such as LBP/SIFT features combined with SVM/KNN classifiers. At least one large dataset will be used for intensive evaluation (for instance, AffectNet [4,7] or AFF in the wild [6]), while a smaller dataset [8,9] could be used for faster evaluations.
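As a rough illustration of the classical baseline, the sketch below computes a basic 3x3 LBP histogram and classifies it with a k-nearest-neighbour vote. It is a simplified, pure-Python version under stated assumptions: the function names and the two toy labels are hypothetical, the images are synthetic patterns rather than faces, and discrete labels are used only for brevity even though the thesis targets continuous valence/arousal.

```python
from collections import Counter

# Clockwise 8-neighbourhood, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 3x3 LBP codes (interior pixels only)."""
    height, width = len(image), len(image[0])
    hist = [0] * 256
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = (height - 2) * (width - 2)
    return [count / total for count in hist]


def knn_predict(train_features, train_labels, query, k=1):
    """Majority label among the k training features closest to the query."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(feat, query)), label)
        for feat, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def flat_image(value, size=6):
    """Synthetic image with a single intensity."""
    return [[value] * size for _ in range(size)]


def striped_image(low, high, size=6):
    """Synthetic image with alternating vertical stripes."""
    return [[low if x % 2 == 0 else high for x in range(size)] for _ in range(size)]


if __name__ == "__main__":
    features = [lbp_histogram(flat_image(128)), lbp_histogram(striped_image(0, 255))]
    labels = ["flat", "striped"]
    print(knn_predict(features, labels, lbp_histogram(striped_image(10, 200))))
```

LBP codes depend only on the ordering of neighbouring intensities, which is why the query with different stripe intensities still matches the striped training pattern; this invariance to monotonic brightness changes is one reason such features are an interesting point of comparison under image degradations.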



The following background is necessary to complete the thesis:

1. Knowledge in Machine Learning.

2. Knowledge in Image Processing.

3. Introduction to Deep Learning.


[1] Aspandi, Decky, et al. "An Enhanced Adversarial Network with Combined Latent Features for Spatio-temporal Facial Affect Estimation in the Wild." (2021).

[2] Zhou, Yuqian, Ding Liu, and Thomas Huang. "Survey of face detection on low-quality images." 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018). IEEE, 2018.

[3] Aspandi, Decky, et al. "Robust facial alignment with internal denoising auto-encoder." 2019 16th Conference on Computer and Robot Vision (CRV). IEEE, 2019.

[4] Mollahosseini, Ali, Behzad Hasani, and Mohammad H. Mahoor. "AffectNet: A New Database for Facial Expression, Valence, and Arousal Computation in the Wild." IEEE Transactions on Affective Computing, 2017.

[5] Kossaifi, Jean, et al. "SEWA DB: A rich database for audio-visual emotion and sentiment research in the wild." IEEE Transactions on Pattern Analysis and Machine Intelligence (2019).

[6] Kossaifi, Jean, et al. "AFEW-VA database for valence and arousal estimation in-the-wild." Image and Vision Computing 65 (2017): 23-36.

[7] AffectNet: http://mohammadmahoor.com/affectnet/

[8] AFEW-VA: https://ibug.doc.ic.ac.uk/resources/afew-va-database/

[9] SEWA: https://db.sewaproject.eu/

[10] https://github.com/dkollias/Aff-Wild-models

[11] https://github.com/serengil/deepface

