Nowadays, computers try to interpret certain human characteristics, such as facial expressions, eye gaze, body gait, and speech, in order to react better. Many applications, such as virtual reality, videoconferencing, user profiling, customer satisfaction studies for broadcast and web services, and interfaces for people with special needs, require efficient facial expression recognition in order to achieve the desired results. The six basic facial expressions are anger, disgust, fear, happiness, sadness, and surprise. A set of muscle movements, the Facial Action Units (FAUs), was defined to produce these facial expressions, forming the Facial Action Coding System (FACS). Facial expressions are generally hard to recognize automatically.
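For illustration, here is a minimal sketch of how prototypic FAU combinations map to the six basic expressions. The AU sets below follow common EMFACS-style conventions and vary across the literature, so they are indicative only, not the definition used in our work.

```python
# Illustrative sketch: prototypic FAU combinations for the six basic
# expressions (EMFACS-style conventions; indicative only, exact AU sets
# vary across the literature).
BASIC_EXPRESSIONS = {
    "happiness": {6, 12},              # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},           # inner brow raiser, brow lowerer, lip corner depressor
    "surprise":  {1, 2, 5, 26},        # brow raisers, upper lid raiser, jaw drop
    "fear":      {1, 2, 4, 5, 20, 26},
    "anger":     {4, 5, 7, 23},        # brow lowerer, lid tighteners, lip tightener
    "disgust":   {9, 15, 16},          # nose wrinkler, lip corner/lower lip depressors
}

def match_expression(active_aus):
    """Return the first basic expression whose prototypic AU set is active."""
    for name, aus in BASIC_EXPRESSIONS.items():
        if aus <= active_aus:
            return name
    return None

print(match_expression({1, 2, 5, 26}))  # -> "surprise"
```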
Our Method
A novel method for facial expression recognition has been developed.
The accuracy achieved is 99.7% when multi-class SVMs are used directly for facial expression recognition, and 95.1% when two-class SVMs are used first to detect FAUs and the facial expression is recognized afterwards.
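The following is a minimal sketch of the multi-class SVM stage, assuming geometric deformation features (e.g., displacements of tracked grid nodes) have already been extracted. The data, dimensions, and SVM hyperparameters are placeholders, not the published system.

```python
# Minimal sketch: multi-class SVM classification of facial expressions from
# geometric deformation features. Placeholder random data stands in for real
# grid-node displacement vectors; not the authors' implementation.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

rng = np.random.default_rng(0)
# Hypothetical dataset: one deformation vector per video sequence
# (x/y displacement of each tracked grid node between neutral and apex frame).
X = rng.normal(size=(600, 208))             # e.g. 104 grid nodes x 2 coordinates
y = rng.integers(0, len(EXPRESSIONS), 600)  # expression label per sequence

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_tr)
# Multi-class SVM (one-vs-one by default in scikit-learn) with an RBF kernel.
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(scaler.transform(X_tr), y_tr)

y_hat = clf.predict(scaler.transform(X_te))
print(f"accuracy: {accuracy_score(y_te, y_hat):.3f}")
```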
[Figure: Example of grid tracking for the six basic facial expressions]
- I. Kotsia and I. Pitas, "Real time facial expression recognition from video sequences using Support Vector Machines", in Proc. Visual Communications and Image Processing (VCIP 2005), Beijing, China, 12-15 July 2005.
- I. Kotsia and I. Pitas, "Real time facial expression recognition from image sequences using Support Vector Machines", in Proc. IEEE Int. Conf. on Image Processing (ICIP 2005), Genova, Italy, 11-14 September 2005.
- I. Kotsia and I. Pitas, "Facial Expression Recognition in Image Sequences using Geometric Deformation Features and Support Vector Machines", IEEE Transactions on Image Processing, January 2007.
- SIMILAR - "The European research taskforce creating human-machine interfaces SIMILAR to human-human communication" (IST, FP6)
- PENED 01 - "Virtual Reality tools for education on natural disasters"
Speech differs between persons, and it also depends on the emotional state of the speaker. The primitive emotional states are anger, happiness, neutral, sadness, and surprise. The task of emotional clustering refers to uniquely assigning emotional feature vectors to the emotional states.
A) Two well-known variants of the self-organizing map (SOM) that are based on order statistics are the marginal median SOM (MMSOM) and the vector median SOM (VMSOM). We employ the MMSOM and the VMSOM to redistribute emotional speech patterns from the Danish Emotional Speech database that were originally classified as neutral into four emotional states, namely hot anger, happiness, sadness, and surprise.
It was shown that the marginal median SOM and the vector median SOM perform better than the standard SOM (see the sketch of the MMSOM update after part B below).
B) Another important issue arises when the emotional feature vectors are represented as points on the (N - 1)-dimensional simplex, with the elements of each pattern being the posterior class probabilities for N classes. Such patterns form N clusters on the (N - 1)-dimensional simplex. The challenge is to reduce the number of clusters to N - 1, i.e., to redistribute the features classified into a particular class over the remaining clusters on the simplex, according to the maximum a posteriori probability principle, in an optimal manner using a SOM. We have mathematically derived the training algorithm for a SOM that reduces the number of clusters by one on a simplex subspace.
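Below is a minimal batch-style sketch of the MMSOM idea from part A: each neuron's weight is the component-wise (marginal) median of the patterns assigned to it, rather than the running mean of the standard SOM. The map topology and neighborhood shrinking are omitted for brevity, and the data are synthetic; this is not the authors' implementation.

```python
# Minimal sketch of the marginal median SOM (MMSOM): the weight of each
# neuron is the component-wise median of the patterns for which it is the
# best-matching unit. Neighborhood function omitted for brevity.
import numpy as np

def mmsom(X, n_units=4, n_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].copy()  # init weights
    for _ in range(n_epochs):
        # best-matching unit (closest weight vector) for every pattern
        bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(n_units):
            assigned = X[bmu == j]
            if len(assigned):
                # marginal median: median taken independently per coordinate
                W[j] = np.median(assigned, axis=0)
    return W, bmu

# Toy 2-D data with one gross outlier; the median update resists it, which
# is the robustness argument for order-statistics SOM variants.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [3, 0], [0, 3], [3, 3])]
              + [np.array([[30.0, 30.0]])])
W, labels = mmsom(X)
print(np.round(W, 2))
```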
- V. Moschou, D. Ververidis, and C. Kotropoulos, "On the Variants of the Self-Organizing Map That Are Based on Order Statistics", in Proc. 2006 Int. Conf. Artificial Neural Networks, Athens, Greece, September 2006.
- C. Kotropoulos and V. Moschou, "Self Organizing Maps for Reducing the Number of Clusters by One on Simplex Subspaces", in Proc. 2006 IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP 2006), vol. 5, pp. 725-728, May 2006.
- MUSCLE - "Multimedia Understanding through Semantics, Computation and LEarning" (FP6-507752)
- Pythagoras II - funded by the Hellenic Ministry of Education
Affect recognition aims at automatically identifying the emotional or physical state of a human being from his or her face and voice. The emotional and physical states of a speaker are known as emotional aspects of speech and are included in the so-called paralinguistic aspects. Although the emotional state does not alter the linguistic content, it is an important factor in human communication, because it provides feedback information in many applications. Affect recognition is related to several tasks, including emotional speech classification and facial expression recognition.
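As an illustration of one such task, here is a minimal sketch of emotional speech classification with Gaussian mixture models, the scheme used in several of the publications below: one GMM is fitted per emotional state, and an utterance is assigned to the state whose model yields the highest likelihood. The features here are random placeholders standing in for prosodic/spectral statistics; this is not the authors' implementation.

```python
# Minimal sketch: emotional speech classification with one Gaussian mixture
# model (GMM) per emotional state; an utterance is assigned to the state
# whose GMM scores the highest log-likelihood. Placeholder features only.
import numpy as np
from sklearn.mixture import GaussianMixture

STATES = ["anger", "happiness", "neutral", "sadness", "surprise"]

rng = np.random.default_rng(0)
# Hypothetical utterance-level feature vectors (stand-ins for pitch/energy
# statistics and similar prosodic/spectral features).
train = {s: rng.normal(loc=i, size=(100, 12)) for i, s in enumerate(STATES)}

# Fit one GMM per emotional state on that state's training features.
models = {s: GaussianMixture(n_components=4, random_state=0).fit(feats)
          for s, feats in train.items()}

def classify(x):
    """Return the state whose GMM assigns x the highest log-likelihood."""
    return max(STATES, key=lambda s: models[s].score(x.reshape(1, -1)))

print(classify(rng.normal(loc=3, size=12)))  # likely "sadness" (index 3)
```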
Downloads - Relevant Publications
- D. Ververidis and C. Kotropoulos, "Fast Sequential Floating Forward Selection applied to emotional speech features estimated on DES and SUSAS data collections", in Proc. European Signal Processing Conf. (EUSIPCO 2006), Florence, Italy, 4-8 September 2006.
- M. Haindl, P. Somol, D. Ververidis, and C. Kotropoulos, "Feature Selection Based on Mutual Correlation", in Proc. 11th Iberoamerican Congress on Pattern Recognition (CIARP 2006), Mexico, 2006.
- V. Moschou, D. Ververidis, and C. Kotropoulos, "On the Variants of the Self-Organizing Map That Are Based on Order Statistics", in Proc. 2006 Int. Conf. Artificial Neural Networks, Athens, Greece, September 2006.
- D. Ververidis and C. Kotropoulos, "Emotional speech classification using Gaussian mixture models and the sequential floating forward selection algorithm", in Proc. 2005 IEEE Int. Conf. on Multimedia and Expo (ICME 2005), Amsterdam, The Netherlands, 6-8 July 2005.
- D. Ververidis and C. Kotropoulos, "Emotional speech classification using Gaussian mixture models", in Proc. 2005 IEEE Int. Symposium on Circuits and Systems (ISCAS 2005), pp. 2871-2874, Kobe, Japan, May 2005.
- D. Ververidis, C. Kotropoulos, and I. Pitas, "Automatic emotional speech classification", in Proc. ICASSP 2004, vol. I, pp. 593-596, Montreal, Canada, May 2004.
- D. Ververidis and C. Kotropoulos, "Automatic Speech Classification to five emotional states based on gender information", in Proc. 12th European Signal Processing Conf. (EUSIPCO '04), pp. 341-344, Vienna, Austria, September 2004.
- D. Ververidis and C. Kotropoulos, "A Review of Emotional Speech Databases", in Proc. 9th Panhellenic Conf. on Informatics (PCI '03), pp. 560-574, Thessaloniki, Greece, 21-23 November 2003.
- D. Ververidis and C. Kotropoulos, "A State of the Art Review on Emotional Speech Databases", in Proc. 1st Richmedia Conf., pp. 109-119, Lausanne, Switzerland, October 2003.
- D. Ververidis and C. Kotropoulos, "Emotional Speech Recognition: Resources, features and methods", Speech Communication, vol. 48, no. 9, pp. 1162-1181, September 2006.
- I. Kotsia and I. Pitas, "Facial Expression Recognition in Image Sequences using Geometric Deformation Features and Support Vector Machines", IEEE Transactions on Image Processing, January 2007.
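Several of the publications above rely on the sequential floating forward selection (SFFS) algorithm: greedily add the single best feature, then keep removing features while removal improves the criterion. A minimal sketch follows; cross-validated accuracy with a naive Bayes classifier is an assumed criterion here, not necessarily the one used in the cited work.

```python
# Minimal sketch of sequential floating forward selection (SFFS): a forward
# step adds the best feature, then floating backward steps drop features
# while that strictly improves the criterion. Toy data and criterion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def criterion(X, y, subset):
    """Assumed selection criterion: 3-fold cross-validated accuracy."""
    return cross_val_score(GaussianNB(), X[:, sorted(subset)], y, cv=3).mean()

def sffs(X, y, k):
    selected, remaining = set(), set(range(X.shape[1]))
    while len(selected) < k:
        # forward step: add the feature that best improves the criterion
        best = max(remaining, key=lambda f: criterion(X, y, selected | {f}))
        selected.add(best)
        remaining.discard(best)
        # floating backward steps: never drop the feature just added
        # (this also keeps the sketch from oscillating)
        while len(selected) > 2:
            candidates = selected - {best}
            worst = max(candidates,
                        key=lambda f: criterion(X, y, selected - {f}))
            if criterion(X, y, selected - {worst}) > criterion(X, y, selected):
                selected.discard(worst)
                remaining.add(worst)
            else:
                break
    return sorted(selected)

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
print(sffs(X, y, k=5))
```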
- PENED 2003 - "Use of Virtual Reality for training pupils to deal with earthquakes" (01ED312)
- MUSCLE - "Multimedia Understanding through Semantics, Computation and LEarning" (FP6-507752)
Speech processing has proved to be an excellent tool for voice disorder detection. Among the most interesting recent works are those concerned with Parkinson's Disease (PD), multiple sclerosis (MS), and other neuro-degenerative diseases that affect patients' speech, motor, and cognitive capabilities. Such studies are based on the special speech characteristics of persons who exhibit voice and/or speech disorders. They aim either at evaluating the performance of special treatments (e.g., the Lee Silverman Voice Treatment [LSVT]) or at developing accessible communication services for all persons. Thus, it is of great significance to develop systems able to classify incoming voice samples as normal or pathological before other procedures are further applied.
Our Method
We have developed detection algorithms (classifiers) for two voice pathologies: vocal fold paralysis and vocal fold edema.
Both pathologies are associated with communication deficits that affect the perceptual characteristics of pitch, loudness, quality, intonation, the voiced-voiceless contrast, etc., and they have symptoms similar to those of PD and other neuro-degenerative diseases. The main causes of vocal fold paralysis are usually surgical (iatrogenic) injuries, a malfunction of the recurrent laryngeal nerve, or possibly lung cancer, while vocal fold malfunction due to edema is usually caused by more trivial reasons, such as mild laryngeal injuries, common infectious diseases that affect the respiratory system, or drug allergies. In particular, we have assessed the performance of several classifiers, including linear discriminant classifiers, using a multitude of frame-based features or utterance-averaged frame-based features, on the database of disordered speech recorded by the Voice and Speech Lab of the Massachusetts Eye and Ear Infirmary (MEEI) [sustained vowel "Ah" /a/].
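Here is a minimal sketch of such a pipeline: utterance-averaged frame-based features fed to a linear discriminant classifier (the classifier family named in the second publication below). The MFCC features and the synthetic sustained-vowel signals are assumptions for illustration; they are not the MEEI recordings, nor necessarily the features used in our work.

```python
# Minimal sketch: normal vs. pathological voice detection from
# utterance-averaged frame-based features with a linear discriminant
# classifier (LDA). Synthetic sustained-vowel-like signals stand in for
# the MEEI recordings; the feature choice (MFCCs) is an assumption.
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

SR = 16000
rng = np.random.default_rng(0)

def synthetic_vowel(f0, jitter, seconds=1.0):
    """Crude sustained-vowel-like harmonic signal; the jitter parameter
    mimics the pitch instability often seen in pathological voices."""
    t = np.arange(int(SR * seconds)) / SR
    f = f0 * (1 + jitter * rng.standard_normal(len(t)).cumsum() / len(t))
    return np.sin(2 * np.pi * np.cumsum(f) / SR).astype(np.float32)

def utterance_features(signal):
    """Frame-based MFCCs averaged over the utterance -> one feature vector."""
    mfcc = librosa.feature.mfcc(y=signal, sr=SR, n_mfcc=13)
    return mfcc.mean(axis=1)

# 30 "normal" (low-jitter) and 30 "pathological" (high-jitter) utterances.
X = np.array([utterance_features(synthetic_vowel(120, j))
              for j in [0.001] * 30 + [0.05] * 30])
y = np.array([0] * 30 + [1] * 30)  # 0 = normal, 1 = pathological

print(cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean())
```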
Downloads - Relevant Publications
- M. Marinaki, C. Kotropoulos, I. Pitas, and N. Maglaveras, "Automatic detection of vocal fold paralysis and edema", in Proc. 8th Int. Conf. Spoken Language Processing (INTERSPEECH 2004), Jeju, Korea, October 2004.
- E. Ziogas and C. Kotropoulos, "Detection of vocal fold paralysis and edema using linear discriminant classifiers", in Proc. 4th Panhellenic Artificial Intelligence Conf. (SETN-06), LNAI vol. 3966, pp. 454-464, Heraklion, Greece, 19-20 May 2006.
- MUSCLE - "Multimedia Understanding through Semantics, Computation and LEarning" (FP6-507752)
© 2006