Speaker Segmentation

Speaker segmentation aims to find the speaker change points in an audio stream. It is a prerequisite for audio indexing, speaker identification/verification/tracking, automatic transcription, and dialogue detection in movies. A popular approach is metric-based segmentation, which decides whether adjacent portions of the audio stream were produced by the same speaker by evaluating a distance between the models fitted to them.
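To make metric-based segmentation concrete, the sketch below slides two adjacent windows over a sequence of feature vectors (for example, MFCC frames), fits a Gaussian to each window, and records a distance between the two. Peaks in the resulting distance curve are candidate speaker change points. The distance used here (a symmetric Kullback-Leibler divergence between diagonal-covariance Gaussians), the window length `win`, the peak `threshold`, and all function names are illustrative assumptions. They are not the settings or code of our systems.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def fit_diag_gaussian(frames: Sequence[Frame]) -> Tuple[List[float], List[float]]:
    """Maximum-likelihood means and variances of a diagonal-covariance Gaussian."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(d)]
    # Floor the variances so a constant dimension cannot cause division by zero.
    variances = [max(sum((f[k] - means[k]) ** 2 for f in frames) / n, 1e-6)
                 for k in range(d)]
    return means, variances


def symmetric_kl(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """KL(a||b) + KL(b||a) between diagonal Gaussians fitted to windows a and b."""
    m1, v1 = fit_diag_gaussian(a)
    m2, v2 = fit_diag_gaussian(b)
    return sum(0.5 * (s1 / s2 + s2 / s1 - 2.0 + (u1 - u2) ** 2 * (1.0 / s1 + 1.0 / s2))
               for u1, s1, u2, s2 in zip(m1, v1, m2, v2))


def distance_curve(frames: Sequence[Frame], win: int) -> Dict[int, float]:
    """Distance between windows [t - win, t) and [t, t + win) for every valid t."""
    return {t: symmetric_kl(frames[t - win:t], frames[t:t + win])
            for t in range(win, len(frames) - win + 1)}


def pick_change_points(curve: Dict[int, float], threshold: float) -> List[int]:
    """Local maxima of the distance curve that exceed the threshold."""
    ts = sorted(curve)
    peaks = []
    for i, t in enumerate(ts):
        left = curve[ts[i - 1]] if i > 0 else -math.inf
        right = curve[ts[i + 1]] if i + 1 < len(ts) else -math.inf
        if curve[t] > threshold and curve[t] >= left and curve[t] > right:
            peaks.append(t)
    return peaks
```

In practice the threshold would have to be tuned on development data, since the scale of the distance depends on the features and the window length.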


Our Method

Our lab uses the Bayesian Information Criterion (BIC) for speaker segmentation.
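As commonly formulated in the literature, the BIC test compares modelling a window of N frames with a single Gaussian against splitting it at frame i into two Gaussians fitted to the N1 and N2 frames on either side: ΔBIC(i) = (N/2) log|Σ| - (N1/2) log|Σ1| - (N2/2) log|Σ2| - λP, where P is half the number of free parameters of one Gaussian times log N, and λ is a tunable penalty weight. A positive ΔBIC favours a speaker change at i. The sketch below is a minimal, dependency-free version that assumes diagonal covariances, so P = d log N for d-dimensional features rather than the full-covariance count (1/2)(d + d(d+1)/2) log N. The value of λ, the `margin` parameter, and the function names are hypothetical and do not describe our systems.

```python
import math
from typing import Optional, Sequence, Tuple

Frame = Sequence[float]


def log_det_diag(frames: Sequence[Frame]) -> float:
    """log|Sigma| of a diagonal-covariance Gaussian fitted by maximum likelihood."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for k in range(d):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-6))  # variance floor avoids log(0)
    return total


def delta_bic(frames: Sequence[Frame], i: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting frames into frames[:i] and frames[i:]; > 0 favours a change."""
    n, d = len(frames), len(frames[0])
    x, y = frames[:i], frames[i:]
    # Penalty: half the parameter count of one extra Gaussian
    # (d means + d variances) times log N, weighted by lambda.
    penalty = 0.5 * (2 * d) * math.log(n)
    return (0.5 * n * log_det_diag(frames)
            - 0.5 * len(x) * log_det_diag(x)
            - 0.5 * len(y) * log_det_diag(y)
            - lam * penalty)


def best_split(frames: Sequence[Frame], margin: int,
               lam: float = 1.0) -> Optional[Tuple[int, float]]:
    """Best candidate change point, keeping `margin` frames on each side, or None."""
    scores = {i: delta_bic(frames, i, lam)
              for i in range(margin, len(frames) - margin + 1)}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else None
```

In a complete segmenter, a function like `best_split` could be applied to a growing or sliding analysis window, restarting the window after each detected change.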

Two different systems have been developed.

  • The first is a multiple-pass method that uses a fusion scheme
  • The second employs auxiliary second-order statistics and Hotelling's T² statistic
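Hotelling's T² statistic measures how far apart the mean vectors of two samples are, relative to their pooled covariance: T² = (N1 N2 / (N1 + N2)) (μ1 - μ2)ᵀ S⁻¹ (μ1 - μ2). The sketch below computes this standard two-sample form for two windows of feature vectors, using a small Gaussian-elimination solver to stay dependency-free. How the statistic is combined with BIC and with the auxiliary second-order statistics in our second system is not shown, and all function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]


def mean_vec(frames: Sequence[Frame]) -> List[float]:
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]


def scatter(frames: Sequence[Frame], mu: List[float]) -> List[List[float]]:
    """Sum of outer products of deviations, i.e. (n - 1) times the sample covariance."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for f in frames:
        dev = [f[k] - mu[k] for k in range(d)]
        for r in range(d):
            for c in range(d):
                s[r][c] += dev[r] * dev[c]
    return s


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_t2(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Two-sample Hotelling's T^2 between windows a and b (pooled covariance)."""
    n1, n2 = len(a), len(b)
    mu1, mu2 = mean_vec(a), mean_vec(b)
    s1, s2 = scatter(a, mu1), scatter(b, mu2)
    d = len(mu1)
    pooled = [[(s1[r][c] + s2[r][c]) / (n1 + n2 - 2) for c in range(d)]
              for r in range(d)]
    diff = [mu1[k] - mu2[k] for k in range(d)]
    w = solve(pooled, diff)  # w = S^{-1} (mu1 - mu2)
    return n1 * n2 / (n1 + n2) * sum(diff[k] * w[k] for k in range(d))
```

A large T² indicates that the two windows differ in their means by much more than their within-window spread, which is the kind of evidence a change detector can use.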

A third system is currently under development.


A demo file can be found here.


Relevant Publications

M. Kotti, E. Benetos and C. Kotropoulos, "Automatic Speaker Change Detection with the Bayesian Information Criterion using MPEG-7 Features and a Fusion Scheme", in Proc. of the IEEE International Symposium on Circuits and Systems (ISCAS 2006), Island of Kos, Greece, 21-24 May 2006.

M. Kotti, E. Benetos, C. Kotropoulos and L. G. Martins, "Speaker Change Detection using BIC: A Comparison on Two Datasets", in Proc. of the International Symposium on Communications, Control and Signal Processing, 2006.

M. Kotti, L. G. P. M. Martins, E. Benetos, J. S. Cardoso and C. Kotropoulos, "Automatic Speaker Segmentation using Multiple Features and Distance Measures: A Comparison of Three Approaches", in Proc. of the IEEE International Conference on Multimedia and Expo (ICME 2006), Toronto, Ontario, Canada, 9-12 July 2006.


Research Projects

MUSCLE - “Multimedia Understanding through Semantics, Computation and LEarning” (FP6-507752)

VISNET - European Network of Excellence, funded under the European Commission IST FP6 programme


© 2006