Shot Boundary Detection

Indexing and retrieval of digital video is a very active research area. Temporal video segmentation is an important step in many video processing applications. The growing amount of digital video footage is driving the need for more effective methods for shot classification, summarization, efficient access, retrieval, and browsing of large video databases. Shot boundary detection is the first step towards further analysis of the video content.


Our Method

Two methods for shot boundary detection have been developed.

The first approach, which we developed for shot transition detection in the uncompressed image domain, is based on the mutual information and the joint entropy between two consecutive video frames.

  • Mutual information (MI) is a measure of the information transported from one frame to the next.
  • MI is used within the context of this method for detecting abrupt cuts, where the image intensity or color changes abruptly, leading to a low mutual information value.
  • Joint entropy is used for detecting fades.
    • For a fade-out, where the visual intensity usually decreases to a black image, the decreasing inter-frame joint entropy is used for detection.
    • For a fade-in, the increasing inter-frame joint entropy is used for detection.
  • The entropy measure produces good results because it exploits the inter-frame information flow in a more compact way than frame subtraction does.
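
The two quantities named in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: it assumes grey-level frames stored as NumPy integer arrays, estimates the joint distribution from the co-occurrence of grey levels at the same pixel position, and the function names (`joint_histogram`, `joint_entropy`, `mutual_information`, `mi_series`) are hypothetical.

```python
import numpy as np

def joint_histogram(frame_a, frame_b, levels=256):
    """Normalized co-occurrence of grey levels at the same pixel position.

    Assumption: both frames hold integer grey levels in [0, levels).
    """
    a = np.asarray(frame_a).ravel()
    b = np.asarray(frame_b).ravel()
    hist, _, _ = np.histogram2d(
        a, b, bins=levels, range=[[0, levels], [0, levels]]
    )
    return hist / hist.sum()

def joint_entropy(p_ab):
    """Joint entropy H(A, B) in bits; empty cells contribute nothing."""
    nz = p_ab[p_ab > 0]
    return float(-np.sum(nz * np.log2(nz)))

def mutual_information(p_ab):
    """Mutual information I(A; B) in bits, from the joint distribution."""
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of the first frame
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of the second frame
    outer = p_a * p_b                       # product of the marginals
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / outer[mask])))

def mi_series(frames, levels=256):
    """MI between each pair of consecutive frames in a sequence."""
    return [
        mutual_information(joint_histogram(a, b, levels))
        for a, b in zip(frames, frames[1:])
    ]
```

In such a sketch, a sharp dip in the output of `mi_series` would mark a candidate cut, while a steadily falling or rising `joint_entropy` over several frames would suggest a fade-out or fade-in. How the dips and trends are thresholded is not specified here and would have to be chosen separately.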


Time series of the MI from the “ABC news” video sequence, showing abrupt cuts and one fade

The joint entropy signal from the “CNN news” video sequence, showing a fade-out and a fade-in to the next shot


The detection technique was tested on the TRECVID2003 video test set, which contains different types of shots and significant object and camera motion inside the shots. Experiments showed these entropy-based techniques to be very effective for shot cut detection, producing false acceptance rates very close to zero.

The second approach to automated shot boundary detection uses singular value decomposition (SVD). We used SVD for its ability to derive a refined low-dimensional feature space from a high-dimensional raw feature space, in which pattern similarity can be detected easily.

  • The method relies on performing SVD on a matrix created from 3D color histograms of single frames.
  • After performing SVD we preserved only the 10 largest singular values.
  • In order to detect the video shots, the feature vectors from SVD are processed using a dynamic clustering method.
  • To avoid false detections, every pair of consecutive clusters produced by the clustering procedure is tested in a second phase for possible merging.
  • Merging is performed in two steps applied consecutively.
    • The first step uses a ratio cosine similarity measure between clusters.
    • The second step is based on statistical hypothesis testing using the von Mises-Fisher distribution, which can be considered as the equivalent of the Gaussian distribution for directional data.
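
The feature-extraction part of the steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the published method: frames are assumed to be RGB arrays of shape (height, width, 3) with values in 0..255, the histogram resolution (`bins=4` per channel) is an arbitrary choice, and the function names are hypothetical. The dynamic clustering and the von Mises-Fisher merging test are not reproduced; a plain cosine similarity between consecutive projected frames stands in for them.

```python
import numpy as np

def color_histogram(frame, bins=4):
    """Flattened, normalized 3D RGB histogram of one frame."""
    pixels = np.asarray(frame).reshape(-1, 3)
    hist, _ = np.histogramdd(
        pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3
    )
    return hist.ravel() / pixels.shape[0]

def project_frames(frames, k=10, bins=4):
    """Project frame histograms onto the k largest singular directions.

    Returns an array of shape (n_frames, k'), with k' = min(k, rank bound).
    """
    # One histogram per column: shape (bins**3, n_frames).
    A = np.stack([color_histogram(f, bins) for f in frames], axis=1)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Row i is frame i expressed in the reduced feature space.
    return (s[:k, None] * Vt[:k]).T

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def consecutive_similarities(features):
    """Cosine similarity between each pair of consecutive frames."""
    return [cosine_similarity(a, b) for a, b in zip(features, features[1:])]
```

Frames from the same shot would be expected to point in nearly the same direction in the reduced space, so a low value in `consecutive_similarities` hints at a boundary. A clustering stage, as described in the list, would group the vectors rather than only comparing neighbours.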



Projected frame histograms on the subspace defined by the fifth and sixth singular vectors reveal a dissolve pattern between two shots


Fade detection in the sequence “basketball” visualized on the subspace defined by the first and second left singular vectors

The method can detect cuts and gradual transitions, such as dissolves, fades and wipes. The detection technique was tested on TV video sequences containing various types of shots and significant object and camera motion inside the shots. The experiments demonstrated that, by using the projected feature space, we can efficiently differentiate between gradual transitions and cuts, pans, and object or camera motion, whereas most histogram-based methods fail to characterize these types of video transitions.





Relevant Publications

Z. Cernekova, I. Pitas and C. Nikou, "Information theory-based shot cut/fade detection and video summarization", IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 82-91, January 2006.

Z. Cernekova, C. Kotropoulos and I. Pitas, "Video Shot Segmentation using Singular Value Decomposition", in Proc. of 2003 IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), vol. III, pp. 181-184, Hong Kong, April 2003 (also appears in Proc. IEEE Multimedia and Expo 2003 (ICME), pp. 301-304, Baltimore, July 2003).

Z. Cernekova, C. Kotropoulos and I. Pitas, "Video Shot Boundary Detection using Singular Value Decomposition", in Proc. of 4th European Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS-2003), London, April 2003.


Research Projects

MOUMIR - "Models for Unified Multimedia Information Retrieval", RTN, EC

MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)

VISNET - European Network of Excellence, funded under the European Commission IST FP6 programme

COST211 - "Redundancy Reduction Techniques and Content Analysis for Multimedia Services"


© 2006