Provided are a device and a method for determining a speech segment with a high degree of accuracy from a sound
signal in which different sounds coexist. Directional points indicating the
direction of arrival of the sound
signal are connected in the temporal direction to detect a speech segment. In this configuration, pattern classification is performed in accordance with directional characteristics with respect to the
direction of arrival, and a directionality pattern and a null
beam pattern are generated from the classification results. In addition, an average null
beam pattern is generated by averaging the null beam patterns obtained while a non-speech-like
signal is input. Further, a threshold set slightly below the average null
beam pattern is calculated for use in detecting, in each null beam pattern, the local minimum point corresponding to the
direction of arrival, and a local minimum point equal to or lower than the threshold is determined to be the point corresponding to the direction of arrival.
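
The thresholding step described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that each null beam pattern is a list of values sampled over evenly spaced candidate directions, that "slightly lower" can be modeled as a fixed offset (the hypothetical parameter `margin`), and that the non-speech-like decision is supplied as one flag per frame. The function names are likewise illustrative, and wrap-around at the ends of the direction axis is ignored.

```python
from typing import List, Sequence


def average_null_pattern(patterns: Sequence[Sequence[float]],
                         non_speech_like: Sequence[bool]) -> List[float]:
    """Average the null beam patterns of frames flagged as non-speech-like."""
    selected = [p for p, flag in zip(patterns, non_speech_like) if flag]
    if not selected:
        raise ValueError("no non-speech-like frames to average")
    n_dirs = len(selected[0])
    return [sum(p[d] for p in selected) / len(selected) for d in range(n_dirs)]


def directional_points(pattern: Sequence[float],
                       avg_pattern: Sequence[float],
                       margin: float) -> List[int]:
    """Return direction indices that are local minima at or below the threshold.

    The threshold is the average null beam pattern lowered by `margin`
    (an assumed, fixed offset standing in for "slightly lower").
    """
    points = []
    # Endpoints are skipped: this sketch does not treat the axis as circular.
    for d in range(1, len(pattern) - 1):
        threshold = avg_pattern[d] - margin
        # Strict on the left, non-strict on the right, so that a flat-bottomed
        # minimum yields a single point.
        is_local_min = pattern[d] < pattern[d - 1] and pattern[d] <= pattern[d + 1]
        if is_local_min and pattern[d] <= threshold:
            points.append(d)
    return points


# Illustrative data only: three frames, five candidate directions.
patterns = [
    [0.0, -1.0, 0.0, -1.0, 0.0],  # non-speech-like frame, shallow dips
    [0.0, -1.0, 0.0, -1.0, 0.0],  # non-speech-like frame, shallow dips
    [0.0, -1.0, 0.0, -6.0, 0.0],  # frame with a deep null at index 3
]
flags = [True, True, False]

avg = average_null_pattern(patterns, flags)
points = directional_points(patterns[2], avg, margin=2.0)
```

In this toy data the shallow dips match the average pattern and are rejected, while the deep null at index 3 falls below the lowered threshold and is kept as a directional point. Connecting such points across successive frames, as the abstract describes, would be a separate step.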