
Facial Expression Recognition
Emotional Clustering
Speech Emotion Recognition

Facial Expression Recognition

Computers nowadays try to interpret certain human characteristics, such as facial expressions, eye gaze, body gait and speech, in order to react more appropriately. Many applications, such as virtual reality, videoconferencing, user profiling, customer satisfaction studies for broadcast and web services, and interfaces for people with special needs, require efficient facial expression recognition in order to achieve the desired results.

Six basic facial expressions are defined: anger, disgust, fear, happiness, sadness and surprise. A set of muscle movements (Facial Action Units, FAUs) that produce these facial expressions forms the Facial Action Coding System (FACS).


An example of each facial expression for a poser from the Cohn-Kanade database

Facial expressions are generally hard to recognize because:

  • Every person expresses emotions in a different way; no universal patterns are available.
  • The recording conditions must be ideal, meaning that a full frontal pose of the poser has to be available.
  • The neutral state has to be located in the video in order to define the fully expressive frame and thus perform facial expression recognition.
  • Few proper databases are available, and creating a new one is difficult, as supervision from psychologists is required.

 

Our Method

A novel method for facial expression recognition has been developed that:

  • Introduces a new class of Support Vector Machines (SVMs).
  • Introduces a subset of the Candide grid used for facial expression recognition.
  • Introduces a new set of simplified rules for facial expression synthesis.
  • Takes into account the geometrical information of the Candide grid nodes between the first and the last video frame, which represent the neutral state and the fully expressive state, respectively.
  • Uses SVMs as a classifier for the extracted geometrical information (a rough sketch is given after this list).
  • Classifies deformed grids directly into facial expressions, or detects the FAUs activated in the grid under examination and subsequently performs facial expression recognition using the detected FAUs.
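To make the pipeline above concrete, the following sketch (in Python with NumPy and scikit-learn, which are assumptions and not necessarily the tools used in this work) flattens the node displacements between the neutral and the fully expressive frame of a tracked Candide grid into a feature vector and classifies it with a multi-class SVM. The number of grid nodes, the random data and the kernel settings are placeholders for illustration only.

    import numpy as np
    from sklearn.svm import SVC

    def displacement_features(neutral_nodes, apex_nodes):
        """Flatten the per-node displacement between the neutral frame and the
        fully expressive (apex) frame of a tracked Candide grid.
        Both inputs are arrays of shape (num_nodes, 2) holding 2-D node coordinates."""
        return (apex_nodes - neutral_nodes).reshape(-1)

    # Hypothetical training set: one tracked grid pair per video plus a label
    # in {0..5} for the six basic facial expressions.
    rng = np.random.default_rng(0)
    num_videos, num_nodes = 60, 104          # 104 Candide nodes is an assumption
    X = np.stack([displacement_features(rng.normal(size=(num_nodes, 2)),
                                        rng.normal(size=(num_nodes, 2)))
                  for _ in range(num_videos)])
    y = rng.integers(0, 6, size=num_videos)  # anger, disgust, fear, happiness, sadness, surprise

    # Multi-class SVM on the displacement features (kernel choice is illustrative).
    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, y)
    print(clf.predict(X[:5]))

In practice the displacement vectors would come from the grid tracker shown in the figures below, not from random numbers.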


Examples of grids depicting the 6 basic facial expressions


System architecture for facial expression recognition in facial videos

The accuracy achieved is 99.7% when multi-class SVMs are used for direct facial expression recognition, and 95.1% when two-class SVMs are used for FAU detection followed by facial expression recognition.
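The two-class variant can be pictured as one binary SVM per FAU followed by a rule lookup. The sketch below uses a placeholder FAU subset and placeholder rules (they are not the simplified rules introduced in the publications) simply to illustrate the two-stage decision.

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical FAU subset and placeholder FAU-to-expression rules,
    # NOT the simplified rule set proposed in the publications.
    FAUS = ["AU4", "AU6", "AU12", "AU15"]
    RULES = {"happiness": {"AU6", "AU12"}, "sadness": {"AU4", "AU15"}}

    def detect_faus(detectors, features):
        """Run one two-class SVM per FAU and return the set of activated FAUs."""
        return {fau for fau, svm in detectors.items() if svm.predict([features])[0] == 1}

    def expression_from_faus(active):
        """Pick the expression whose rule is best covered by the detected FAUs."""
        scores = {expr: len(active & faus) / len(faus) for expr, faus in RULES.items()}
        return max(scores, key=scores.get)

    # Toy per-FAU detectors trained on random displacement features.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 20))
    detectors = {fau: SVC(kernel="linear").fit(X, rng.integers(0, 2, size=40)) for fau in FAUS}
    print(expression_from_faus(detect_faus(detectors, X[0])))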


Downloads

Example of grid tracking for the 6 basic facial expressions


Relevant Publications

I. Kotsia and I. Pitas, "Real time facial expression recognition from video sequences using Support Vector Machines", in Proc. of Visual Communications and Image Processing (VCIP 2005), Beijing, China, 12-15 July, 2005.

I. Kotsia and I. Pitas, "Real time facial expression recognition from image sequences using Support Vector Machines", in Proc. of IEEE Int. Conf. on Image Processing (ICIP 2005), Genova, Italy, 11-14 September, 2005.

I. Kotsia and I. Pitas, "Facial Expression Recognition in Image Sequences using Geometric Deformation Features and Support Vector Machines", IEEE Transactions on Image Processing, January, 2007.


Research Projects

SIMILAR - The European research taskforce creating human-machine interfaces SIMILAR to human-human communication, IST, FP6

PENED 01 - Virtual Reality tools for education on natural disasters


Emotional Clustering

Speech differs between persons and also depends on the emotional state of the speaker. The primitive emotional states are anger, happiness, neutral, sadness, and surprise. Emotional clustering refers to uniquely assigning emotional feature vectors to these emotional states.


Our Methods

A) Two well-known variants of the self-organizing map (SOM) that are based on order statistics are the marginal median SOM (MMSOM) and the vector median SOM (VMSOM). We employ the MMSOM and the VMSOM to redistribute emotional speech patterns from the Danish Emotional Speech database that were originally classified as neutral into four emotional states, namely hot anger, happiness, sadness, and surprise. The latter experiment is motivated by the following facts:

  • There are emotional facial expression databases such as the Action-Unit coded Cohn-Kanade database, where the neutral emotional class is not represented adequately. Accordingly, facial expression feature vectors are not clustered to the neutral emotional class.
  • For the emotional speech databases, the utterances are regularly classified as neutral. Accordingly, when the neutral class is not represented in one modality it is difficult to develop multimodal emotion recognition algorithms.
  • Frequently, the ground truth information related to emotions that is provided by the human evaluators is biased towards the neutral class.

It was shown that the marginal median SOM and the vector median SOM perform better than the standard SOM.
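A minimal batch-mode sketch of the marginal median SOM idea follows (in Python with NumPy): each unit weight is recomputed as the component-wise, i.e. marginal, median of the samples falling in its neighbourhood, instead of a mean-based update. The chain topology, the neighbourhood schedule and the random data are assumptions made purely for illustration; this is not the code used in the cited experiments.

    import numpy as np

    def batch_mmsom(X, num_units=4, iters=20, sigma=1.0, rng=None):
        """Batch marginal median SOM on a 1-D chain of units.

        At every iteration each sample is assigned to its best-matching unit (BMU),
        and each unit's weight is set to the component-wise (marginal) median of
        the samples whose BMU lies in its chain neighbourhood, which makes the
        update robust to outliers compared with the mean-based standard SOM."""
        if rng is None:
            rng = np.random.default_rng(0)
        W = X[rng.choice(len(X), num_units, replace=False)].astype(float)
        for _ in range(iters):
            bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
            for j in range(num_units):
                mask = np.abs(bmu - j) <= max(int(round(sigma)), 0)
                if mask.any():
                    W[j] = np.median(X[mask], axis=0)   # marginal median update
            sigma *= 0.9                                 # neighbourhood shrinks over time
        return W, bmu

    # Toy emotional feature vectors redistributed to four clusters
    # (standing in for hot anger, happiness, sadness and surprise).
    X = np.random.default_rng(2).normal(size=(200, 6))
    weights, assignments = batch_mmsom(X)
    print(np.bincount(assignments, minlength=4))

Conceptually, the vector median SOM would replace the marginal median in the update with the vector median of the assigned samples.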

B) Another important issue arises when the emotional feature vectors are represented as points on the (N - 1)-dimensional simplex, the elements of these vectors being the posterior probabilities of the N classes. Such patterns form N clusters on the (N - 1)-dimensional simplex. The challenge is to reduce the number of clusters to N - 1, i.e. to redistribute the features classified into a particular class among the remaining clusters on the simplex, according to the maximum a posteriori probability principle, in an optimal manner using a SOM. We have derived mathematically the training algorithm for a SOM that reduces the number of clusters by one on a simplex subspace.
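The following toy sketch illustrates the setting only: posterior vectors of the class to be removed are redistributed among N - 1 SOM units whose weights are renormalised back onto the simplex after each update. Learning rates, schedules and data below are placeholders; this is not the training algorithm derived in the ICASSP 2006 paper.

    import numpy as np

    def simplex_som_redistribute(P, iters=50, lr=0.3, sigma=1.0, rng=None):
        """Redistribute posterior-probability vectors (points on the (N-1)-simplex)
        among N - 1 clusters using a small 1-D SOM whose weights are kept on the
        simplex by renormalisation after every update."""
        if rng is None:
            rng = np.random.default_rng(0)
        n_classes = P.shape[1]
        n_units = n_classes - 1                                # one cluster fewer than classes
        W = rng.dirichlet(np.ones(n_classes), size=n_units)    # weights start on the simplex
        for _ in range(iters):
            x = P[rng.integers(len(P))]
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))
            h = np.exp(-((np.arange(n_units) - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)                     # standard SOM update ...
            W = np.clip(W, 1e-12, None)
            W /= W.sum(axis=1, keepdims=True)                  # ... projected back onto the simplex
            lr *= 0.98
            sigma *= 0.98
        return np.argmin(((P[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)

    # Toy posterior vectors for N = 5 emotional classes, originally labelled neutral.
    P = np.random.default_rng(3).dirichlet(np.ones(5), size=100)
    print(np.bincount(simplex_som_redistribute(P), minlength=4))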


Downloads

-

Relevant Publications

V. Moschou, D. Ververidis, and C. Kotropoulos, "On the Variants of the Self-Organizing Map That Are Based on Order Statistics ", in Proc. 2006 Int. Conf. Artificial Neural Networks, Athens, Sep. 2006.

C. Kotropoulos and V. Moschou, "Self-Organizing Maps for Reducing the Number of Clusters by One on Simplex Subspaces", in Proc. 2006 IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 5, pp. 725-728, May 2006.


Research Projects

MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)

Pythagoras II - Funded by the Hellenic Ministry of Education in the framework of the Pythagoras II research program


Speech Emotion Recognition

Affect recognition aims at automatically identifying the emotional or physical state of a human being from his or her face and voice. The emotional and physical states of a speaker belong to the so-called paralinguistic aspects of speech. Although the emotional state does not alter the linguistic content, it is an important factor in human communication, because it provides feedback information in many applications.

Affect Recognition is related to the following tasks:

  • Data collection procedures, the kind of speech (natural, simulated, or elicited), the content, and other physiological signals that may accompany the emotional speech.
  • Short-term features (i.e. features extracted on a speech-frame basis) that are related to the emotional content of speech. Emotions affect the characteristics of these feature contours, such as their statistics and trends.
  • Emotion classification techniques that exploit timing information and other techniques that ignore time context.


Our Method

  • A data collection is under construction. Subjects are a) children trying to imitate an actor, and b) children immersed in a VR environment. Two cameras, a professional condenser microphone, a sweat sensor, and a heart rate sensor are used.
  • Two databases are used, namely: a) Danish Emotional Speech (DES), and b) Speech Under Simulated and Actual Stress (SUSAS).
  • Feature extraction algorithms were developed for: a) the fundamental frequency (pitch), b) formants derived from the reflection coefficients of the linear prediction model, and c) cepstral coefficients.
  • Feature selection algorithms are improved; namely, the Sequential Floating Forward Selection (SFFS) algorithm is improved by statistically comparing feature sets using confidence intervals of the prediction error achieved by each feature set.
  • Several classifiers are developed: a) a Bayes classifier using Gaussian mixture models (GMMs), b) support vector machines, c) a Bayes classifier using Parzen windows, d) neural networks (self-organizing maps); e) the Brunswik model for emotion perception is under development. A rough sketch of the GMM-based Bayes classifier is given after this list.
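As a rough sketch of item (a), the Bayes classifier with Gaussian mixture class-conditional densities can be pictured as follows (in Python with scikit-learn, which is an assumption; the features, mixture order and data are placeholders rather than the configuration used on DES or SUSAS).

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_gmm_bayes(X, y, n_components=2):
        """Fit one GMM per emotional class; the class-conditional likelihoods
        together with the class priors give a Bayes classifier."""
        models, priors = {}, {}
        for c in np.unique(y):
            models[c] = GaussianMixture(n_components=n_components, covariance_type="diag",
                                        random_state=0).fit(X[y == c])
            priors[c] = np.mean(y == c)
        return models, priors

    def predict_gmm_bayes(models, priors, X):
        classes = sorted(models)
        # log p(x | c) + log P(c) for every class, maximised per utterance.
        scores = np.column_stack([models[c].score_samples(X) + np.log(priors[c]) for c in classes])
        return np.array(classes)[scores.argmax(axis=1)]

    # Toy utterance-level feature vectors (e.g. pitch/formant/cepstral statistics)
    # for five emotional states.
    rng = np.random.default_rng(4)
    X = rng.normal(size=(250, 12))
    y = rng.integers(0, 5, size=250)  # anger, happiness, neutral, sadness, surprise
    models, priors = train_gmm_bayes(X, y)
    print((predict_gmm_bayes(models, priors, X) == y).mean())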

Downloads

-

Relevant Publications

D. Ververidis and C. Kotropoulos "Fast Sequential Floating Forward Selection applied to emotional speech features estimated on DES and SUSAS data collections", in Proc. of European Signal Processing Conf. (EUSIPCO 2006), Florence, Italy, 4-8 September, 2006.

M. Haindl, P. Somol, D. Ververidis and C. Kotropoulos, "Feature Selection Based on Mutual Correlation", in Proc. 11th Iberoamerican Congress on Pattern Recognition (CIAPR) , Mexico, 2006.

V. Moschou, D. Ververidis, and C. Kotropoulos, "On the Variants of the Self-Organizing Map That Are Based on Order Statistics ", in Proc. 2006 Int. Conf. Artificial Neural Networks, Athens, Sep. 2006.

D. Ververidis and C. Kotropoulos, "Emotional speech classification using gaussian mixture models and the sequential floating forward selection algorithm", in Proc. of 2005 IEEE Int. Conf. on Multimedia and Expo (ICME 2005), Amsterdam, 6-8 July, 2005.

D. Ververidis and C. Kotropoulos, "Emotional speech classification using Gaussian mixture models", in Proc. of2005 IEEE Int. Symposium Circuits and Systems (ISCAS 2005), pp. 2871-2874, Kobe, Japan, May, 2005.

D. Ververidis, C. Kotropoulos and I. Pitas, "Automatic emotional speech classification", in Proc. of ICASSP 2004, vol. I, pp. 593-596, Montreal, Canada, May, 2004.

D. Ververidis and C. Kotropoulos "Automatic Speech Classification to five emotional states based on gender information", in Proc. of 12th European Signal Processing Conf. (EUSIPCO '04), pp. 341-344, Vienna, Austria, September, 2004.

D. Ververidis and C. Kotropoulos, "A Review of Emotional Speech Databases", in Proc. of 9th Panhellenic Conf. on Informatics (PCI `03) , pp. 560-574, Thessaloniki, Greece, 21-23 November, 2003.

D. Ververidis and C. Kotropoulos, "A State of the Art Review on Emotional Speech Databases", in Proc. of 1st Richmedia Conf., pp. 109-119, Laussane, Switzerland, October, 2003.

D. Ververidis and C. Kotropoulos, "Emotional Speech Recognition: Resources, features and methods", Elsevier Speech communication, vol. 48, no. 9, pp. 1162-1181, September, 2006.



Research Projects

PENED 2003 - “Use of Virtual Reality for training pupils to deal with earthquakes” (01ED312)

MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)
