
 

Shot Boundary Detection
Video Retrieval and Fingerprinting
Audio-visual Scene Change Detection

Shot Boundary Detection

Indexing and retrieval of digital video is a very active research area. Temporal video segmentation is an important step in many video processing applications. The growing amount of digital video footage is driving the need for more effective methods for shot classification, summarization, efficient access, retrieval, and browsing of large video databases. Shot boundary detection is the first step towards further analysis of the video content.


Our Method

Two methods for shot boundary detection have been developed.

The first approach we developed for shot transition detection in the uncompressed image domain is based on the mutual information and the joint entropy between two consecutive video frames.

  • Mutual information (MI) is a measure of the information transported from one frame to the next.
  • Within this method, MI is used for detecting abrupt cuts, where the image intensity or color changes abruptly, leading to a low mutual information value.
  • Joint entropy is used for detecting fades.
    • During a fade-out, the visual intensity usually decreases towards a black image, so the decreasing inter-frame joint entropy is used for detection.
    • During a fade-in, the increasing joint entropy is used for detection.
  • The entropy measure produces good results because it exploits the inter-frame information flow in a more compact way than frame subtraction.
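As a concrete illustration of these measures, a minimal NumPy sketch (our own, not the published implementation; 8-bit gray-level frames and the bin count are assumptions) computes MI and joint entropy from the joint histogram of two consecutive frames:

```python
import numpy as np

def mutual_information(f1, f2, bins=256):
    """Return (MI, joint entropy) in bits for two 8-bit gray-level frames.

    MI(X;Y) = H(X) + H(Y) - H(X,Y); an abrupt cut yields a low MI value,
    while a fade-out shows up as a decreasing joint entropy H(X,Y).
    """
    # Joint probability p(x, y) of co-occurring gray levels in the two frames.
    p, _, _ = np.histogram2d(f1.ravel(), f2.ravel(),
                             bins=bins, range=[[0, 256], [0, 256]])
    p /= p.sum()
    px, py = p.sum(axis=1), p.sum(axis=0)
    h_xy = -np.sum(p[p > 0] * np.log2(p[p > 0]))      # joint entropy H(X,Y)
    h_x = -np.sum(px[px > 0] * np.log2(px[px > 0]))   # marginal entropy H(X)
    h_y = -np.sum(py[py > 0] * np.log2(py[py > 0]))   # marginal entropy H(Y)
    return h_x + h_y - h_xy, h_xy
```

A cut detector would then threshold the MI time series: a frame pair straddling an abrupt cut gives an MI value close to zero.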
 
Time series of the MI from “ABC news” video sequence showing abrupt cuts and one fade
 

The joint entropy signal from “CNN news” video sequence showing a fade-out and fade-in to the next shot

The detection technique was tested on the TRECVID2003 video test set, which contains different types of shots with significant object and camera motion inside the shots. These entropy-based techniques were experimentally shown to be very effective for shot cut detection, producing false acceptance rates very close to zero.

The second approach to automated shot boundary detection uses singular value decomposition (SVD). We chose SVD for its ability to derive a refined low-dimensional feature space from a high-dimensional raw feature space, in which pattern similarity can be easily detected.

  • The method relies on performing SVD on a matrix created from 3D color histograms of single frames.
  • After performing SVD, we preserve only the 10 largest singular values.
  • In order to detect the video shots, the feature vectors from SVD are processed using a dynamic clustering method.
  • To avoid false detections, every pair of consecutive clusters obtained by the clustering procedure is tested in a second phase for possible merging.
  • Merging is performed in two steps applied consecutively.
    • The first step uses a ratio cosine similarity measure between clusters.
    • The second step is based on statistical hypothesis testing using the von Mises-Fisher distribution, which can be considered as the equivalent of the Gaussian distribution for directional data.
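The feature extraction at the core of this approach (per-frame 3D color histograms, SVD, truncation to the 10 largest singular values) can be sketched as follows; the 4x4x4 histogram resolution and the function name are our illustrative choices, not the published configuration:

```python
import numpy as np

def frame_features(frames, bins=4, rank=10):
    """Map frames to coordinates in a low-dimensional SVD subspace.

    frames: iterable of HxWx3 uint8 arrays. Each frame becomes a normalized
    3D color histogram (one column of A); SVD of A then yields the refined
    low-dimensional space, keeping only the `rank` largest singular values.
    Returns one coordinate row per frame.
    """
    hists = []
    for f in frames:
        h, _ = np.histogramdd(f.reshape(-1, 3),
                              bins=(bins, bins, bins),
                              range=[(0, 256)] * 3)
        hists.append(h.ravel() / h.sum())
    A = np.array(hists).T                      # columns = frame histograms
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(s))
    return Vt[:k].T * s[:k]                    # one coordinate row per frame
```

Frames from the same shot map to nearby points in this subspace, which is what the subsequent dynamic clustering and the cosine-similarity merging test operate on.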

 

 

Projected frame histograms on the subspace defined by the fifth and sixth singular vectors reveal a dissolve pattern between two shots

 

Fade detection in the sequence “basketball” visualized on the subspace defined by the first and second left singular vectors

The method can detect cuts and gradual transitions, such as dissolves, fades and wipes. The detection technique was tested on TV video sequences containing various types of shots and significant object and camera motion inside the shots. The experiments demonstrated that the projected feature space efficiently differentiates gradual transitions from cuts, pans, and object or camera motion, whereas most histogram-based methods fail to characterize these types of video transitions.




Relevant Publications

Z. Cernekova, I. Pitas and C. Nikou, "Information theory-based shot cut/fade detection and video summarization", IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 82-91, January 2006.

Z. Cernekova, C. Kotropoulos and I. Pitas, "Video Shot Segmentation using Singular Value Decomposition", in Proc. of 2003 IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), vol. III, pp. 181-184, Hong Kong, April 2003 (appears also in Proc. IEEE Multimedia and Expo 2003 (ICME), pp. 301-304, Baltimore, July 2003).

Z. Cernekova, C. Kotropoulos and I. Pitas, "Video Shot Boundary Detection using Singular Value Decomposition", in Proc. of 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS-2003), London, April 2003.


Research Projects

MOUMIR - "Models for Unified Multimedia Information Retrieval", RTN, EC

MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)

VISNET - European Network of Excellence, funded under the European Commission IST FP6 programme

COST211 - "Redundancy Reduction Techniques and Content Analysis for Multimedia Services"


© 2006

Video Retrieval and Fingerprinting

One of the most fundamental technologies necessary for the management of digital video is the retrieval (from a video database) of one or more video segments that the user is interested in. The methods used for approaching video retrieval are similar to those used for the retrieval of other types of multimedia objects, such as images. Retrieval usually follows one of two paradigms:

  • Query-by-keyword: A video database is annotated with keywords or other metadata. The user then enters the keywords that best describe what he is searching for or some other appropriate metadata. These metadata are then used to perform a textual or symbolic search in the database.
  • Query-by-example: The videos in a database are characterized with an appropriate set of features, which constitute a representation of the digital item. We call this representation a signature. The user then inputs a video similar to the one that he is searching for. Then, a set of features is extracted from the user video and used to find images or videos with similar features.

Another technology which is useful for the management of video, particularly with respect to rights protection, is fingerprinting. This is defined as the identification of a video segment using a representation called a fingerprint, which is extracted from the video content. The fingerprint must uniquely identify a video segment and must be invariant to manipulation of the video.


Our Method

The general idea of our approach is that the existence of faces of specific individuals can be used to characterize a video segment. Assuming that the faces in the video have been detected and identified, the video signature (or fingerprint) consists of quartets of the following:

  • Appearance time/frame a
  • Disappearance time/frame b
  • Identity of person s (denoted below by color)
  • Certainty of appearance F


Given the above representation, we compute the similarity of two videos, for a certain displacement d:

 

where Fi(n,m) is the certainty that person m appears in frame n of video segment i.
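The similarity formula itself appeared as an image on the original page. A correlation-style form consistent with the definition of Fi(n,m), summing certainty products over the frames where the two displaced segments overlap, is sketched below; this exact form is our assumption, not necessarily the published expression:

```python
import numpy as np

def similarity(F1, F2, d):
    """Similarity of two face-certainty maps at displacement d.

    F1, F2: arrays of shape (num_frames, num_persons), where F[n, m] is the
    certainty that person m appears in frame n. The sum of products over the
    overlapping frame range is an illustrative choice, not the published formula.
    """
    lo = max(0, d)                        # overlap range, in video-1 frame indices
    hi = min(len(F1), len(F2) + d)
    if lo >= hi:
        return 0.0                        # no overlap at this displacement
    return float(np.sum(F1[lo:hi] * F2[lo - d:hi - d]))
```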

Using this representation, our algorithm for retrieval is as follows:

  1. Find, in the query segment, the quartet with the greatest area.
  2. Find, through an index, all database quartets that refer to the same person as the above quartet.
  3. Calculate, for each quartet found, the range of displacements that can result in a match.
  4. Match query quartets with compatible (same person) quartets in the database in that range of displacements.
  5. For each such pair, compute the area of the overlap of the pulses.
  6. The optimal matching location is at the maximum overall overlap.
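The steps above can be sketched as follows; the Quartet type, the rectangular-pulse overlap measure, and the start-aligned candidate displacements are illustrative readings of steps 1-6, not the exact published procedure:

```python
from dataclasses import dataclass

@dataclass
class Quartet:
    a: int        # appearance frame
    b: int        # disappearance frame
    person: int   # identity of the person
    F: float      # certainty of appearance

def pulse_overlap(q, r, d):
    """Overlap area of query pulse q shifted by d with database pulse r (step 5)."""
    if q.person != r.person:
        return 0.0
    lo, hi = max(q.a + d, r.a), min(q.b + d, r.b)
    return max(0, hi - lo) * min(q.F, r.F)

def best_displacement(query, db):
    """Steps 1-6: anchor on the largest-area query quartet, look up same-person
    database quartets, and keep the displacement with maximum total overlap."""
    anchor = max(query, key=lambda q: (q.b - q.a) * q.F)       # step 1
    candidates = [r for r in db if r.person == anchor.person]  # step 2
    displacements = {r.a - anchor.a for r in candidates}       # step 3 (start-aligned)
    return max(displacements,                                  # steps 4-6
               key=lambda d: sum(pulse_overlap(q, r, d)
                                 for q in query for r in db))
```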

 


Relevant Publications

C. Cotsaces, N. Nikolaidis and I. Pitas, "The use of face indicator functions for video indexing and fingerprinting", in Proc. of Int. Workshop on Content-Based Multimedia Indexing (CBMI 2005), Riga, Latvia, 21-23 June, 2005.

C. Cotsaces, N. Nikolaidis and I. Pitas, "Video Indexing by Face Occurrence-based Signatures", in Proc. of 2006 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Toulouse, France, 14-19 May, 2006.


Research Projects

MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)


Audio-visual Scene Change Detection

The ever-growing amount of digital information has created a critical need for the development of assisting data management algorithms. Scene change detection is employed in order to manage large volumes of audio-visual data. Typically it is a tool aiming to group audio-visual data into meaningful categories and thus provide fast browsing and retrieval capabilities.

Video shot and scene detection is essential to automatic content-based video segmentation. A video shot is a collection of video frames obtained through a continuous camera recording. Similar background and motion patterns typify the set of frames within a shot. Video shots usually lead to a far too fine segmentation in terms of the semantic audio-visual data representation. In order to acquire an effective non-linear access to video information, the data are grouped into scenes, where scenes are defined as sequences of related shots chosen according to certain semantic rules.


Our Method

A novel scene change detection method has been developed that

  • processes and fuses audio and video information,
  • projects audio frames onto a set of enhanced eigenframes that 'discover' the variations of background noise,
  • finds scene changes by comparison to a reference noise frame,
  • uses video information to align the audio-detected scene changes, reduce the false alarm rates and identify fading effects, which are typically used to separate scenes.

In order to integrate audio and video information:

  • If an audio scene change indication is 'near' a shot change, a scene-cut is set. The rest are rejected as false indications.
  • The valid indications are further validated by comparing various acoustic features.
  • Each qualified scene-cut is set to the location of the relevant shot change in order to mend audiovisual asynchrony.
  • Video fade effects independently indicate scene changes.
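These integration rules can be sketched as follows; the nearness tolerance and all names are illustrative assumptions, and the acoustic-feature validation step is omitted for brevity:

```python
def fuse_scene_changes(audio_changes, shot_changes, fades, tolerance=2.0):
    """Fuse audio-detected scene changes with video shot changes.

    An audio indication is kept only if a shot change lies within `tolerance`
    seconds (the rest are rejected as false indications), and each kept
    scene-cut is snapped to that shot change to mend audiovisual asynchrony.
    Video fades independently indicate scene changes.
    """
    scene_cuts = []
    for t in audio_changes:
        nearest = min(shot_changes, key=lambda s: abs(s - t), default=None)
        if nearest is not None and abs(nearest - t) <= tolerance:
            scene_cuts.append(nearest)    # snap to the shot-change location
    scene_cuts.extend(fades)              # fades mark scene changes on their own
    return sorted(set(scene_cuts))
```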

The method has been tested on the well-established TRECVID2003 database. The results are very promising, as the method attained higher recall and precision rates than all contemporary algorithms it was compared against.

Example of a detected scene change




Relevant Publications

M. Kyperountas, Z. Cernekova, C. Kotropoulos, M. Gavrielides, and I. Pitas, “Audio PCA in a novel multimedia scheme for scene change detection”, in Proc. of ICASSP 2004, Montreal, May 2004.

M. Kyperountas, Z. Cernekova, C. Kotropoulos, M. Gavrielides, and I. Pitas, “Scene change detection using audiovisual clues”, in Proc. of Norwegian Conference on Image Processing and Pattern Recognition (NOBIM 2004), Stavanger, Norway, 27-28 May 2004.

M. Kyperountas, C. Kotropoulos and I. Pitas, “Enhanced eigen-audioframes for audiovisual scene change detection”, IEEE Transactions on Multimedia, accepted in 2006.  


Research Projects

MOUMIR - "Models for Unified Multimedia Information Retrieval", RTN, EC

MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)

VISNET - European Network of Excellence, funded under the European Commission IST FP6 programme
