Nowadays, computers try to interpret certain human characteristics, such as facial expressions, eye gaze, body gait, and speech, in order to react better. Many applications, such as virtual reality, videoconferencing, user profiling, customer satisfaction studies for broadcast and web services, and interfaces for people with special needs, require efficient facial expression recognition in order to achieve the desired results. The six basic facial expressions are anger, disgust, fear, happiness, sadness, and surprise. A set of muscle movements, the Facial Action Units (FAUs), was defined to produce these facial expressions, forming the Facial Action Coding System (FACS). Facial expressions are generally hard to recognize automatically.
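For illustration, here is a minimal sketch of how prototypic FAU combinations map to the six basic expressions. The AU sets below follow common EMFACS-style conventions and vary across the literature, so they are indicative only, not the definition used in our work.

```python
# Illustrative sketch: prototypic FAU combinations for the six basic
# expressions (EMFACS-style conventions; indicative only, exact AU sets
# vary across the literature).
BASIC_EXPRESSIONS = {
    "happiness": {6, 12},              # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},           # inner brow raiser, brow lowerer, lip corner depressor
    "surprise":  {1, 2, 5, 26},        # brow raisers, upper lid raiser, jaw drop
    "fear":      {1, 2, 4, 5, 20, 26},
    "anger":     {4, 5, 7, 23},        # brow lowerer, lid tighteners, lip tightener
    "disgust":   {9, 15, 16},          # nose wrinkler, lip corner/lower lip depressors
}

def match_expression(active_aus):
    """Return the first basic expression whose prototypic AU set is active."""
    for name, aus in BASIC_EXPRESSIONS.items():
        if aus <= active_aus:
            return name
    return None

print(match_expression({1, 2, 5, 26}))  # -> "surprise"
```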
Our Method
A novel method for facial expression recognition has been developed.
The accuracy achieved is 99.7% when multi-class SVMs are used directly for facial expression recognition, and 95.1% when two-class SVMs are used first to detect FAUs and the facial expression is recognized afterwards.
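The following is a minimal sketch of the multi-class SVM stage, assuming geometric deformation features (e.g., displacements of tracked grid nodes) have already been extracted. The data, dimensions, and SVM hyperparameters are placeholders, not the published system.

```python
# Minimal sketch: multi-class SVM classification of facial expressions from
# geometric deformation features. Placeholder random data stands in for real
# grid-node displacement vectors; not the authors' implementation.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

EXPRESSIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

rng = np.random.default_rng(0)
# Hypothetical dataset: one deformation vector per video sequence
# (x/y displacement of each tracked grid node between neutral and apex frame).
X = rng.normal(size=(600, 208))             # e.g. 104 grid nodes x 2 coordinates
y = rng.integers(0, len(EXPRESSIONS), 600)  # expression label per sequence

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_tr)
# Multi-class SVM (one-vs-one by default in scikit-learn) with an RBF kernel.
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(scaler.transform(X_tr), y_tr)

y_hat = clf.predict(scaler.transform(X_te))
print(f"accuracy: {accuracy_score(y_te, y_hat):.3f}")
```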
[Figure: Example of grid tracking for the six basic facial expressions]
- I. Kotsia and I. Pitas, "Real time facial expression recognition from video sequences using Support Vector Machines", in Proc. Visual Communications and Image Processing (VCIP 2005), Beijing, China, 12-15 July 2005.
- I. Kotsia and I. Pitas, "Real time facial expression recognition from image sequences using Support Vector Machines", in Proc. IEEE Int. Conf. on Image Processing (ICIP 2005), Genova, Italy, 11-14 September 2005.
- I. Kotsia and I. Pitas, "Facial Expression Recognition in Image Sequences using Geometric Deformation Features and Support Vector Machines", IEEE Transactions on Image Processing, January 2007.
- SIMILAR - "The European research taskforce creating human-machine interfaces SIMILAR to human-human communication" (IST, FP6)
- PENED 01 - "Virtual Reality tools for education on natural disasters"
Speech differs between persons, and it also depends on the emotional state of the speaker. The primitive emotional states are anger, happiness, neutral, sadness, and surprise. The task of emotional clustering refers to uniquely assigning emotional feature vectors to the emotional states.
A) Two well-known variants of the self-organizing map (SOM) that are based on order statistics are the marginal median SOM (MMSOM) and the vector median SOM (VMSOM). We employ the MMSOM and the VMSOM to redistribute emotional speech patterns from the Danish Emotional Speech database that were originally classified as neutral into four emotional states, namely hot anger, happiness, sadness, and surprise.
It was shown that the marginal median SOM and the vector median SOM perform better than the standard SOM (see the sketch of the MMSOM update after part B below).
B) Another important issue arises when the emotional feature vectors are represented as points on the (N - 1)-dimensional simplex, with the elements of each pattern being the posterior class probabilities for N classes. Such patterns form N clusters on the (N - 1)-dimensional simplex. The challenge is to reduce the number of clusters to N - 1, i.e., to redistribute the features classified into a particular class over the remaining clusters on the simplex, according to the maximum a posteriori probability principle, in an optimal manner using a SOM. We have mathematically derived the training algorithm for a SOM that reduces the number of clusters by one on a simplex subspace.
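Below is a minimal batch-style sketch of the MMSOM idea from part A: each neuron's weight is the component-wise (marginal) median of the patterns assigned to it, rather than the running mean of the standard SOM. The map topology and neighborhood shrinking are omitted for brevity, and the data are synthetic; this is not the authors' implementation.

```python
# Minimal sketch of the marginal median SOM (MMSOM): the weight of each
# neuron is the component-wise median of the patterns for which it is the
# best-matching unit. Neighborhood function omitted for brevity.
import numpy as np

def mmsom(X, n_units=4, n_epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].copy()  # init weights
    for _ in range(n_epochs):
        # best-matching unit (closest weight vector) for every pattern
        bmu = np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(n_units):
            assigned = X[bmu == j]
            if len(assigned):
                # marginal median: median taken independently per coordinate
                W[j] = np.median(assigned, axis=0)
    return W, bmu

# Toy 2-D data with one gross outlier; the median update resists it, which
# is the robustness argument for order-statistics SOM variants.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [3, 0], [0, 3], [3, 3])]
              + [np.array([[30.0, 30.0]])])
W, labels = mmsom(X)
print(np.round(W, 2))
```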
- V. Moschou, D. Ververidis, and C. Kotropoulos, "On the Variants of the Self-Organizing Map That Are Based on Order Statistics", in Proc. 2006 Int. Conf. Artificial Neural Networks, Athens, Greece, September 2006.
- C. Kotropoulos and V. Moschou, "Self Organizing Maps for Reducing the Number of Clusters by One on Simplex Subspaces", in Proc. 2006 IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP 2006), vol. 5, pp. 725-728, May 2006.
- MUSCLE - "Multimedia Understanding through Semantics, Computation and LEarning" (FP6-507752)
- Pythagoras II - funded by the Hellenic Ministry of Education
Affect recognition aims at automatically identifying the emotional or physical state of a human being from his or her face and voice. The emotional and physical states of a speaker are known as emotional aspects of speech and are included in the so-called paralinguistic aspects. Although the emotional state does not alter the linguistic content, it is an important factor in human communication, because it provides feedback information in many applications. Affect recognition is related to several tasks, including emotional speech classification and facial expression recognition.
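As an illustration of one such task, here is a minimal sketch of emotional speech classification with Gaussian mixture models, the scheme used in several of the publications below: one GMM is fitted per emotional state, and an utterance is assigned to the state whose model yields the highest likelihood. The features here are random placeholders standing in for prosodic/spectral statistics; this is not the authors' implementation.

```python
# Minimal sketch: emotional speech classification with one Gaussian mixture
# model (GMM) per emotional state; an utterance is assigned to the state
# whose GMM scores the highest log-likelihood. Placeholder features only.
import numpy as np
from sklearn.mixture import GaussianMixture

STATES = ["anger", "happiness", "neutral", "sadness", "surprise"]

rng = np.random.default_rng(0)
# Hypothetical utterance-level feature vectors (stand-ins for pitch/energy
# statistics and similar prosodic/spectral features).
train = {s: rng.normal(loc=i, size=(100, 12)) for i, s in enumerate(STATES)}

# Fit one GMM per emotional state on that state's training features.
models = {s: GaussianMixture(n_components=4, random_state=0).fit(feats)
          for s, feats in train.items()}

def classify(x):
    """Return the state whose GMM assigns x the highest log-likelihood."""
    return max(STATES, key=lambda s: models[s].score(x.reshape(1, -1)))

print(classify(rng.normal(loc=3, size=12)))  # likely "sadness" (index 3)
```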
Downloads - Relevant Publications
- D. Ververidis and C. Kotropoulos, "Fast Sequential Floating Forward Selection applied to emotional speech features estimated on DES and SUSAS data collections", in Proc. European Signal Processing Conf. (EUSIPCO 2006), Florence, Italy, 4-8 September 2006.
- M. Haindl, P. Somol, D. Ververidis, and C. Kotropoulos, "Feature Selection Based on Mutual Correlation", in Proc. 11th Iberoamerican Congress on Pattern Recognition (CIARP 2006), Mexico, 2006.
- V. Moschou, D. Ververidis, and C. Kotropoulos, "On the Variants of the Self-Organizing Map That Are Based on Order Statistics", in Proc. 2006 Int. Conf. Artificial Neural Networks, Athens, Greece, September 2006.
- D. Ververidis and C. Kotropoulos, "Emotional speech classification using Gaussian mixture models and the sequential floating forward selection algorithm", in Proc. 2005 IEEE Int. Conf. on Multimedia and Expo (ICME 2005), Amsterdam, The Netherlands, 6-8 July 2005.
- D. Ververidis and C. Kotropoulos, "Emotional speech classification using Gaussian mixture models", in Proc. 2005 IEEE Int. Symposium on Circuits and Systems (ISCAS 2005), pp. 2871-2874, Kobe, Japan, May 2005.
- D. Ververidis, C. Kotropoulos, and I. Pitas, "Automatic emotional speech classification", in Proc. ICASSP 2004, vol. I, pp. 593-596, Montreal, Canada, May 2004.
- D. Ververidis and C. Kotropoulos, "Automatic Speech Classification to five emotional states based on gender information", in Proc. 12th European Signal Processing Conf. (EUSIPCO '04), pp. 341-344, Vienna, Austria, September 2004.
- D. Ververidis and C. Kotropoulos, "A Review of Emotional Speech Databases", in Proc. 9th Panhellenic Conf. on Informatics (PCI '03), pp. 560-574, Thessaloniki, Greece, 21-23 November 2003.
- D. Ververidis and C. Kotropoulos, "A State of the Art Review on Emotional Speech Databases", in Proc. 1st Richmedia Conf., pp. 109-119, Lausanne, Switzerland, October 2003.
- D. Ververidis and C. Kotropoulos, "Emotional Speech Recognition: Resources, features and methods", Speech Communication, vol. 48, no. 9, pp. 1162-1181, September 2006.
- I. Kotsia and I. Pitas, "Facial Expression Recognition in Image Sequences using Geometric Deformation Features and Support Vector Machines", IEEE Transactions on Image Processing, January 2007.
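Several of the publications above rely on the sequential floating forward selection (SFFS) algorithm: greedily add the single best feature, then keep removing features while removal improves the criterion. A minimal sketch follows; cross-validated accuracy with a naive Bayes classifier is an assumed criterion here, not necessarily the one used in the cited work.

```python
# Minimal sketch of sequential floating forward selection (SFFS): a forward
# step adds the best feature, then floating backward steps drop features
# while that strictly improves the criterion. Toy data and criterion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def criterion(X, y, subset):
    """Assumed selection criterion: 3-fold cross-validated accuracy."""
    return cross_val_score(GaussianNB(), X[:, sorted(subset)], y, cv=3).mean()

def sffs(X, y, k):
    selected, remaining = set(), set(range(X.shape[1]))
    while len(selected) < k:
        # forward step: add the feature that best improves the criterion
        best = max(remaining, key=lambda f: criterion(X, y, selected | {f}))
        selected.add(best)
        remaining.discard(best)
        # floating backward steps: never drop the feature just added
        # (this also keeps the sketch from oscillating)
        while len(selected) > 2:
            candidates = selected - {best}
            worst = max(candidates,
                        key=lambda f: criterion(X, y, selected - {f}))
            if criterion(X, y, selected - {worst}) > criterion(X, y, selected):
                selected.discard(worst)
                remaining.add(worst)
            else:
                break
    return sorted(selected)

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
print(sffs(X, y, k=5))
```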
- PENED 2003 - "Use of Virtual Reality for training pupils to deal with earthquakes" (01ED312)
- MUSCLE - "Multimedia Understanding through Semantics, Computation and LEarning" (FP6-507752)
Speech processing has proved to be an excellent tool for voice disorder detection. Among the most interesting recent works are those concerned with Parkinson's Disease (PD), multiple sclerosis (MS), and other neuro-degenerative diseases that affect patients' speech, motor, and cognitive capabilities. Such studies are based on the special speech characteristics of persons who exhibit voice and/or speech disorders. They aim either at evaluating the performance of special treatments (e.g., the Lee Silverman Voice Treatment [LSVT]) or at developing accessible communication services for all persons. Thus, it is of great significance to develop systems able to classify incoming voice samples as normal or pathological before other procedures are further applied.
Our Method
We have developed detection algorithms (classifiers) for two voice pathologies: vocal fold paralysis and vocal fold edema.
Both pathologies are associated with communication deficits that affect the perceptual characteristics of pitch, loudness, quality, intonation, the voiced-voiceless contrast, etc., and they have symptoms similar to those of PD and other neuro-degenerative diseases. The main causes of vocal fold paralysis are usually surgical (iatrogenic) injuries, a malfunction of the recurrent laryngeal nerve, or possibly lung cancer, while vocal fold malfunction due to edema is usually caused by more trivial reasons, such as mild laryngeal injuries, common infectious diseases that affect the respiratory system, or drug allergies. In particular, we have assessed the performance of several classifiers, including linear discriminant classifiers, using a multitude of frame-based features or utterance-averaged frame-based features, on the database of disordered speech recorded by the Voice and Speech Lab of the Massachusetts Eye and Ear Infirmary (MEEI) [sustained vowel "Ah" /a/].
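Here is a minimal sketch of such a pipeline: utterance-averaged frame-based features fed to a linear discriminant classifier (the classifier family named in the second publication below). The MFCC features and the synthetic sustained-vowel signals are assumptions for illustration; they are not the MEEI recordings, nor necessarily the features used in our work.

```python
# Minimal sketch: normal vs. pathological voice detection from
# utterance-averaged frame-based features with a linear discriminant
# classifier (LDA). Synthetic sustained-vowel-like signals stand in for
# the MEEI recordings; the feature choice (MFCCs) is an assumption.
import numpy as np
import librosa
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

SR = 16000
rng = np.random.default_rng(0)

def synthetic_vowel(f0, jitter, seconds=1.0):
    """Crude sustained-vowel-like harmonic signal; the jitter parameter
    mimics the pitch instability often seen in pathological voices."""
    t = np.arange(int(SR * seconds)) / SR
    f = f0 * (1 + jitter * rng.standard_normal(len(t)).cumsum() / len(t))
    return np.sin(2 * np.pi * np.cumsum(f) / SR).astype(np.float32)

def utterance_features(signal):
    """Frame-based MFCCs averaged over the utterance -> one feature vector."""
    mfcc = librosa.feature.mfcc(y=signal, sr=SR, n_mfcc=13)
    return mfcc.mean(axis=1)

# 30 "normal" (low-jitter) and 30 "pathological" (high-jitter) utterances.
X = np.array([utterance_features(synthetic_vowel(120, j))
              for j in [0.001] * 30 + [0.05] * 30])
y = np.array([0] * 30 + [1] * 30)  # 0 = normal, 1 = pathological

print(cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean())
```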
Downloads - Relevant Publications
- M. Marinaki, C. Kotropoulos, I. Pitas, and N. Maglaveras, "Automatic detection of vocal fold paralysis and edema", in Proc. 8th Int. Conf. Spoken Language Processing (INTERSPEECH 2004), Jeju, Korea, October 2004.
- E. Ziogas and C. Kotropoulos, "Detection of vocal fold paralysis and edema using linear discriminant classifiers", in Proc. 4th Panhellenic Artificial Intelligence Conf. (SETN-06), LNAI vol. 3966, pp. 454-464, Heraklion, Greece, 19-20 May 2006.
- MUSCLE - "Multimedia Understanding through Semantics, Computation and LEarning" (FP6-507752)
© 2006