It is a robotics visual and auditory system provided with an auditory module (20), a face module (30), a stereo module (37), a motor control module (40), and an association module (50) to control these respective modules. The auditory module (20) collects sub-bands having interaural phase difference (IPD) or interaural intensity difference (IID) within a predetermined range by an active direction pass filter (23a) having a pass range which, according to auditory characteristics, becomes minimum in the frontal direction, and larger as the angle becomes wider to the left and right, based on an accurate sound source directional information from the association module (50), and conducts sound source separation by restructuring a wave shape of a sound source, conducts speech recognition of separated sound signals from respective sound sources using a plurality of acoustic models (27d), integrates speech recognition results from each acoustic model by a selector, and judges the most reliable speech recognition result among the speech recognition results.