In particular, the present invention provides a method and system for conferencing, the method including the steps of: connecting at least two sites to a conference; receiving at least two video signals and two audio signals from the connected sites; consecutively analyzing the audio data from the at least two sites connected in the conference by converting at least a part of the audio data to acoustical features and extracting keywords and speech parameters from the acoustical features using speech recognition; comparing said extracted keywords to predefined words; deciding, based on said speech parameters, whether extracted keywords that match the predefined words are to be considered a call for attention; defining an image layout based on said decision; processing the received video signals to provide a composite video signal according to the defined image layout; and transmitting the composite video signal to at least one of the at least two connected sites.
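
By way of illustration only, the comparing, deciding and layout-defining steps recited above may be sketched as follows. The sketch is not the claimed implementation: the names `PREDEFINED_WORDS`, `ExtractedKeyword`, `call_for_attention` and `define_layout`, the single `emphasis` speech parameter, the 0.6 threshold, the mapping of each predefined word to a site, and the two layout modes are all assumptions introduced for the example. The speech recognition step itself is taken as given, so the sketch starts from keywords that have already been extracted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: each predefined word maps to the site it should draw attention to.
# The description above does not specify the words or any such mapping.
PREDEFINED_WORDS: Dict[str, str] = {"jenny": "site_b", "oslo": "site_c"}


@dataclass
class ExtractedKeyword:
    text: str        # keyword produced by the speech recognizer
    emphasis: float  # assumed speech parameter in [0, 1], e.g. a stress score


def call_for_attention(kw: ExtractedKeyword, threshold: float = 0.6) -> Optional[str]:
    """Return the targeted site if the keyword matches a predefined word
    and its speech parameter reaches the (assumed) threshold; else None."""
    target = PREDEFINED_WORDS.get(kw.text.lower())
    if target is not None and kw.emphasis >= threshold:
        return target
    return None


def define_layout(sites: List[str], keywords: List[ExtractedKeyword]) -> dict:
    """Define an image layout from the call-for-attention decision."""
    for kw in keywords:
        target = call_for_attention(kw)
        if target in sites:
            # A call for attention was detected: give the targeted site the main view.
            others = [s for s in sites if s != target]
            return {"mode": "focus", "main": target, "others": others}
    # No call for attention: show all connected sites side by side.
    return {"mode": "continuous_presence", "sites": list(sites)}
```

For example, with sites `site_a`, `site_b` and `site_c` connected, an emphasized utterance of "Jenny" would select a layout focused on `site_b`, while the same word spoken with low emphasis, or a word that is not predefined, would leave the default layout in place. The resulting layout description would then drive the processing of the received video signals into the composite video signal.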