Digital Media
MPEG-7 has emerged as the standard for multimedia content description. As the standard is still in its early stages, it is evolving in a direction in which semantic content description can be implemented. Although many descriptors (Ds) and description schemes (DSs) provided by the MPEG-7 standard can help describe the semantics of a medium, grouping several MPEG-7 classes together can provide better results in video production and video analysis tasks.
We provide classes that extend the MPEG-7 standard so that it can handle video media data in a more uniform way. Several classes are proposed in this context, and we show that such schemes provide more flexible tools for video production and analysis; a serialization sketch is given below.
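To make the idea concrete, here is a minimal sketch of how such an extended description might be serialized as MPEG-7-style XML. The element names DialogueScene, ActorInstance and MediaTime are illustrative placeholders, not the exact Ds/DSs defined in the cited papers.

```python
# Minimal sketch: serializing a hypothetical anthropocentric MPEG-7-style
# description. Element names such as DialogueScene and ActorInstance are
# illustrative placeholders, not the exact Ds/DSs defined in the papers.
import xml.etree.ElementTree as ET

mpeg7 = ET.Element("Mpeg7", xmlns="urn:mpeg:mpeg7:schema:2001")
description = ET.SubElement(mpeg7, "Description")

# A hypothetical description scheme grouping actor appearances in a scene.
scene = ET.SubElement(description, "DialogueScene", id="scene_01")
for actor, (start, end) in {"actor_A": (0, 120), "actor_B": (40, 180)}.items():
    inst = ET.SubElement(scene, "ActorInstance", actorRef=actor)
    ET.SubElement(inst, "MediaTime", start=str(start), end=str(end))

print(ET.tostring(mpeg7, encoding="unicode"))
```

Grouping actor- and scene-centric elements in one scheme is what allows queries such as "all scenes where actor_A appears" to be answered uniformly.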
Relevant Publications
- N. Vretos, V. Solachidis and I. Pitas, "An Anthropocentric Description Scheme for Movies Content Classification and Indexing", in Proc. of European Signal Processing Conf. (EUSIPCO 2005), Antalya, Turkey, 4-8 September 2005.
- N. Vretos, V. Solachidis and I. Pitas, "An MPEG-7 Based Description Scheme for Video Analysis Using Anthropocentric Video Content Descriptors", in Lecture Notes in Computer Science, Advances in Informatics: 10th Panhellenic Conf. on Informatics (PCI 2005), vol. 3746, pp. 725-734, Volos, Greece, 11-13 November 2005.
NM2 - “New media for a new millennium” (IST-004124), FP6
Digital movie archives have become commonplace nowadays, and research on movie content analysis is very active. A dialogue scene can be defined as a set of consecutive shots that contain conversations between people. However, a dialogue scene may also include shots that do not contain any conversation, or even any person.
Our lab is active in dialogue detection. In our work, we investigate a novel framework for dialogue detection based on actor indicator functions, which are assumed to be error-free. An indicator function indicates whether a particular actor is present at each time instant. Two dialogue detection rules are developed on top of these indicator functions; a sketch of one plausible rule follows.
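The following is a minimal sketch of one plausible rule over such indicator functions; the window length and the requirement that at least two actors alternate within a window are illustrative assumptions, not the exact rules of the cited paper.

```python
import numpy as np

def detect_dialogue(indicators: np.ndarray, window: int = 25) -> np.ndarray:
    """Sketch of an indicator-function dialogue rule.

    indicators: (num_actors, num_frames) binary matrix; indicators[a, t] = 1
    when actor a is present at time instant t (assumed error-free).
    Returns a binary vector flagging windows where at least two actors
    appear and actor presence alternates, a plausible cue for dialogue.
    """
    num_actors, num_frames = indicators.shape
    flags = np.zeros(num_frames, dtype=bool)
    for t in range(0, num_frames - window, window):
        chunk = indicators[:, t:t + window]
        active = np.flatnonzero(chunk.any(axis=1))
        # At least two actors present, and each actor's presence toggles
        # (appears and disappears) within the window.
        alternates = sum(np.abs(np.diff(chunk[a])).sum() >= 2 for a in active)
        if len(active) >= 2 and alternates >= 2:
            flags[t:t + window] = True
    return flags
```

In practice the decision thresholds of such rules are learned from data, which is exactly what the cross-validation experiment below does.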
A total of 25 dialogue scenes and 8 non-dialogue scenes were extracted from six movies: “Analyze That”, “Cold Mountain”, “Jackie Brown”, “Lord of the Rings I”, “Platoon”, and “Secret Window”. The total duration of the 33 recordings is 31 min and 7 sec. The probabilities of false alarm and detection are estimated by cross-validation, where 70% of the available scenes are used to learn the thresholds employed in the dialogue detection rules and the remaining 30% are used for testing. Almost perfect dialogue detection is reported for every distinct threshold.
Database
Our lab has developed a dialogue database consisting of the 33 recordings above, 25 of which correspond to dialogue scenes, while the remaining 8 do not contain any dialogue. The audio track was digitized in PCM at a sampling rate of 48 kHz with 16-bit two-channel samples. For each recording, the ground truth information, that is, the actors that appear in the scene, was determined.
Relevant Publications
- M. Kotti, C. Kotropoulos, B. Ziółko, I. Pitas, and V. Moschou, "A Framework for Dialogue Detection in Movies", in Int. Workshop on Multimedia Content Representation, Classification, and Security, Istanbul, Turkey, 2006.
MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)
Indexing and retrieval of digital video is a very active research area. Temporal video segmentation is an important step in many video processing applications. The growing amount of digital video footage is driving the need for more effective methods for shot classification, summarization, efficient access, retrieval, and browsing of large video databases. Shot boundary detection is the first step towards further analysis of the video content.
Two methods for shot boundary detection have been developed. The first approach, which operates in the uncompressed image domain, is based on the mutual information and the joint entropy between two consecutive video frames; a minimal sketch of the mutual information computation is given below.
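The sketch below estimates the mutual information between two frames from their joint grayscale histogram; a cut is hypothesized when the value drops sharply. The grayscale simplification and the threshold ratio are assumptions for illustration (the published method works per colour component and with learned thresholds).

```python
import numpy as np

def mutual_information(frame_a: np.ndarray, frame_b: np.ndarray,
                       bins: int = 256) -> float:
    """Mutual information between two grayscale frames (uint8 arrays)."""
    joint, _, _ = np.histogram2d(frame_a.ravel(), frame_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()                 # joint probability
    px = pxy.sum(axis=1, keepdims=True)       # marginal of frame_a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of frame_b
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# A cut is declared when the MI between consecutive frames drops far below
# its local average; this ratio test is illustrative, not the paper's rule.
def is_cut(mi_value: float, local_mean: float, ratio: float = 0.3) -> bool:
    return mi_value < ratio * local_mean
```

Low mutual information means the two frames carry almost no shared content, which is exactly what happens across an abrupt shot cut.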
The detection technique was tested on the TRECVID2003 video test set, which contains different types of shots as well as significant object and camera motion inside the shots. These entropy-based techniques were experimentally shown to be very effective for shot cut detection, producing false acceptance rates very close to zero.
The second approach to automated shot boundary detection uses singular value decomposition (SVD). We use SVD for its ability to derive a refined low-dimensional feature space from a high-dimensional raw feature space, in which pattern similarity can be detected more easily.
The method can detect cuts and gradual transitions such as dissolves, fades and wipes. The detection technique was tested on TV video sequences with various types of shots and significant object and camera motion inside the shots. The experiments demonstrated that, using the projected feature space, we can efficiently differentiate gradual transitions from cuts, pans, and object or camera motion, whereas most histogram-based methods fail to characterize these types of video transitions. A sketch of the SVD projection follows.
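The sketch below shows the core projection step: per-frame histograms are mapped into a low-rank space via SVD, and distances between consecutive projected frames serve as the transition signal. The choice of rank and the cut/gradual decision thresholds are assumptions, not the published parameters.

```python
import numpy as np

def reduced_frame_features(histograms: np.ndarray, rank: int = 10) -> np.ndarray:
    """Project per-frame histograms into a low-rank space via SVD.

    histograms: (num_frames, num_bins) matrix, one colour histogram per frame.
    Returns (num_frames, rank) coordinates in the refined feature space,
    where frame similarity is easier to measure than in the raw bins.
    """
    centered = histograms - histograms.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :rank] * s[:rank]   # scale left vectors by singular values

# Distances between consecutive projected frames: sharp isolated peaks
# suggest cuts, while sustained ramps suggest gradual transitions
# (dissolves, fades, wipes).
def consecutive_distances(features: np.ndarray) -> np.ndarray:
    return np.linalg.norm(np.diff(features, axis=0), axis=1)
```

The low-rank projection suppresses the bin-level noise that makes raw histogram comparisons confuse motion with transitions.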
Relevant Publications
- Z. Cernekova, I. Pitas and C. Nikou, "Information theory-based shot cut/fade detection and video summarization", IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 82-91, January 2006.
- Z. Cernekova, C. Kotropoulos and I. Pitas, "Video Shot Segmentation using Singular Value Decomposition", in Proc. of 2003 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), vol. III, pp. 181-184, Hong Kong, April 2003 (also in Proc. of IEEE Int. Conf. on Multimedia and Expo (ICME 2003), pp. 301-304, Baltimore, July 2003).
- Z. Cernekova, C. Kotropoulos and I. Pitas, "Video Shot Boundary Detection using Singular Value Decomposition", in Proc. of 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2003), London, April 2003.
MOUMIR - "Models for Unified Multimedia Information Retrieval", RTN, EC
MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)
VISNET - European Network of Excellence, funded under the European Commission IST FP6 programme
COST211 - "Redundancy Reduction Techniques and Content Analysis for Multimedia Services"
The ever-growing amount of digital information has created a critical need for the development of assisting data management algorithms. Scene change detection is employed in order to manage large volumes of audio-visual data. Typically it is a tool that groups audio-visual data into meaningful categories and thus provides fast browsing and retrieval capabilities. Video shot and scene detection is essential to automatic content-based video segmentation. A video shot is a collection of video frames obtained through a continuous camera recording. The frames within a shot typically share similar background and motion patterns. Video shots usually lead to a far too fine segmentation in terms of the semantic audio-visual data representation. In order to provide effective non-linear access to video information, the data are grouped into scenes, where scenes are defined as sequences of related shots chosen according to certain semantic rules.
A novel scene change detection method has been developed in which audio and video information are integrated: audio cues derived from principal component analysis (PCA) of audio frames are combined with visual shot boundary information in order to locate scene changes. A hedged sketch of the audio PCA step is given below.
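Following the "eigen-audioframes" idea in the cited papers, the sketch below projects short-term audio feature vectors onto the leading eigenvectors of their covariance; the choice of input features and the fusion rule with the video stream are assumptions not reproduced here.

```python
import numpy as np

def eigen_audioframe_projection(frames: np.ndarray,
                                num_components: int = 8) -> np.ndarray:
    """Sketch of audio PCA for scene change cues.

    frames: (num_frames, dim) matrix of short-term audio features
    (the exact features are an assumption here, e.g. magnitude spectra).
    Returns the projection of each frame onto the leading eigenvectors
    ("eigen-audioframes") of the frame covariance.
    """
    centered = frames - frames.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    basis = eigvecs[:, -num_components:]     # keep the leading components
    return centered @ basis

# Large jumps in the projected trajectory between neighbouring shots can be
# fused with video shot boundaries to hypothesize scene changes; the actual
# fusion rule of the published method is not reproduced here.
```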
The method has been tested on the well-established TRECVID2003 database. The results are very promising, as higher recall and precision rates were attained than those recorded by the contemporary algorithms against which our algorithm was compared.
Relevant Publications
- M. Kyperountas, Z. Cernekova, C. Kotropoulos, M. Gavrielides, and I. Pitas, "Audio PCA in a novel multimedia scheme for scene change detection", in Proc. of ICASSP 2004, Montreal, May 2004.
- M. Kyperountas, Z. Cernekova, C. Kotropoulos, M. Gavrielides, and I. Pitas, "Scene change detection using audiovisual clues", in Proc. of Norwegian Conf. on Image Processing and Pattern Recognition (NOBIM 2004), Stavanger, Norway, 27-28 May 2004.
- M. Kyperountas, C. Kotropoulos and I. Pitas, "Enhanced eigen-audioframes for audiovisual scene change detection", IEEE Transactions on Multimedia, accepted in 2006.
MOUMIR - "Models for Unified Multimedia Information Retrieval", RTN, EC
MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)
VISNET - European Network of Excellence, funded under the European Commission IST FP6 programme
Personalized content is content that matches a particular context or the user's preferences. Nowadays, the huge amount of multimedia information in industrial and home use creates the need for more accurate and user-specific indexing and retrieval applications. The main question is how users can interact with multimedia and thus enjoy a personal version of the media at hand. Examples of content personalization include interactive TV, automatic home video editing, etc. In our laboratory, we concentrate our efforts on providing solutions to this issue. Manual multimedia annotation is the straightforward approach to creating the basis for content personalization, but its main bottleneck is that it is labour-intensive and subjective. Our approach is based on the outputs of low-level feature extraction algorithms (e.g., face detection and clustering, face tracking, shot type identification), which are organized into high-level semantic entities, as the sketch below illustrates.
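As a toy illustration of organizing low-level outputs into semantic entities, the sketch below greedily groups face tracks into actor clusters. The FaceTrack/Actor structures, the similarity function and the threshold are hypothetical placeholders; the cited papers use mutual-information and SIFT-based clustering algorithms rather than this greedy rule.

```python
from dataclasses import dataclass, field

@dataclass
class FaceTrack:
    shot_id: int
    descriptor: tuple  # low-level face feature vector (assumed precomputed)

@dataclass
class Actor:
    tracks: list = field(default_factory=list)

def cluster_tracks(tracks, similarity, threshold=0.8):
    """Greedy sketch: merge each face track into the first actor cluster it
    resembles, otherwise start a new actor. The similarity function and
    threshold stand in for the cited clustering algorithms."""
    actors: list[Actor] = []
    for track in tracks:
        for actor in actors:
            if similarity(track.descriptor,
                          actor.tracks[0].descriptor) >= threshold:
                actor.tracks.append(track)
                break
        else:
            actors.append(Actor(tracks=[track]))
    return actors

# Each resulting Actor, together with the shots it appears in, is a
# higher-level semantic entity (e.g. "show me every scene with this actor")
# that personalization applications can build on.
```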
It should be noted that our aim is to concentrate not only on low-level features but also on high-level ones. Although low-level features are easier to extract, most of them are meaningless to the majority of both commercial and home users. On the other hand, high-level features are more useful, since they are closer to common user requests.
Relevant Publications
- N. Vretos, V. Solachidis and I. Pitas, "An Anthropocentric Description Scheme for Movies Content Classification and Indexing", in Proc. of European Signal Processing Conf. (EUSIPCO 2005), Antalya, Turkey, 4-8 September 2005.
- N. Vretos, V. Solachidis and I. Pitas, "An MPEG-7 Based Description Scheme for Video Analysis Using Anthropocentric Video Content Descriptors", in Lecture Notes in Computer Science, Advances in Informatics: 10th Panhellenic Conf. on Informatics (PCI 2005), vol. 3746, pp. 725-734, Volos, Greece, 11-13 November 2005.
- N. Vretos, V. Solachidis and I. Pitas, "A Mutual Information Based Algorithm for Face Clustering", in Proc. of Int. Conf. on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, 9-12 July 2006.
- I. Cherif, V. Solachidis and I. Pitas, "A Tracking Framework for Accurate Face Localization", in Proc. of Int. Federation for Information Processing Conf. on Artificial Intelligence (IFIP AI 2006), Santiago, Chile, 21-24 August 2006.
- P. Antonopoulos, N. Nikolaidis and I. Pitas, "Hierarchical Face Clustering Using SIFT Image Features", submitted to IEEE Symposium on Computational Intelligence in Image and Signal Processing (CIISP 2007), Honolulu, HI, USA, 2007.
- I. Cherif, V. Solachidis and I. Pitas, "Shot Type Identification of Movie Content", in Int. Symposium on Signal Processing and its Applications, Sharjah, United Arab Emirates, 12-15 February 2007 (accepted for publication).
NM2 - “New media for a new millennium” (IST-004124), FP6