A Speaker Localization Method Based on Sound and Image Fusion

An image-fusion positioning technology, applied in the fields of positioning, image communication, instruments, etc. It addresses the low accuracy and the susceptibility to noise and reverberation of existing methods, and achieves good robustness, fast focusing speed, and low space and time complexity.

Active Publication Date: 2021-04-06
杭州晨安科技股份有限公司
Cites: 4 | Cited by: 0

AI Technical Summary

Problems solved by technology

② Sound source localization based on a high-resolution spectrum estimates the sound source position from the spatial spectrum of the correlation matrix of the signals received by the microphone array elements. This method is fast to compute, but its accuracy is low and it is easily affected by factors such as noise and reverberation.
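To make the idea of a spatial spectrum computed from a correlation matrix concrete, here is a minimal Python sketch. It uses the conventional (Bartlett) estimator, which is a simpler relative of high-resolution estimators such as MUSIC and is not necessarily the method the patent refers to. The uniform in-line array, the narrowband signal model, and all function names are assumptions made for illustration.

```python
import cmath
import math

def steering_vector(n, spacing, wavelength, angle_deg):
    """Narrowband far-field response of n in-line elements (assumed model).
    angle_deg is measured from the array axis."""
    phase = 2 * math.pi * spacing * math.cos(math.radians(angle_deg)) / wavelength
    return [cmath.exp(1j * phase * k) for k in range(n)]

def correlation_matrix(snapshots):
    """Sample correlation matrix R = mean(x x^H) over complex snapshots."""
    n = len(snapshots[0])
    total = [[0j] * n for _ in range(n)]
    for x in snapshots:
        for i in range(n):
            for j in range(n):
                total[i][j] += x[i] * x[j].conjugate()
    return [[v / len(snapshots) for v in row] for row in total]

def bartlett_spectrum(R, spacing, wavelength, angles_deg):
    """Conventional spatial spectrum P(theta) = a^H R a for each candidate angle."""
    n = len(R)
    spectrum = []
    for angle in angles_deg:
        a = steering_vector(n, spacing, wavelength, angle)
        power = sum(a[i].conjugate() * R[i][j] * a[j]
                    for i in range(n) for j in range(n))
        spectrum.append(power.real)
    return spectrum

def estimate_angle(R, spacing, wavelength, angles_deg):
    """Pick the candidate angle at which the spatial spectrum peaks."""
    spectrum = bartlett_spectrum(R, spacing, wavelength, angles_deg)
    best = max(range(len(spectrum)), key=spectrum.__getitem__)
    return angles_deg[best]
```

With half-wavelength spacing, `estimate_angle` searches a grid of candidate angles and returns the one where the spectrum peaks. In a real room, noise and reverberation flatten or shift that peak, which is the weakness described above.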

Method used


Examples

Detailed Description of Embodiments

[0049] The present invention is described in further detail below through examples. The following examples explain the present invention, and the present invention is not limited to them.

[0050] Referring to Figure 1, the embodiment of the present invention includes the following steps:

[0051] Step 1. Perform sound source localization with a microphone array, as follows:

[0052] 1) The quality of sound localization depends on whether the shape of the microphone array is well designed. Common array shapes include an in-line (linear) array for horizontal-angle localization, a cross-shaped or circular array for horizontal- and vertical-angle localization, and a 3D spherical array for horizontal-angle, vertical-angle, and distance localization. The present invention selects an in-line microphone array, which is defined as A.

[0053] 2) Microphone array A is a set of microphone array elements, $A = \{A_1, A_2, \ldots, A_k, \ldots, A_n\}$, where n ...
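As a rough illustration of why an in-line array yields a horizontal angle, the following Python sketch models the array elements as points on a line and relates the arrival-time difference between two elements to the direction of a far-field source. This is textbook plane-wave geometry, not the procedure of the patent (whose remaining steps are not shown here). The element spacing, the speed of sound, and all function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def element_positions(n, spacing):
    """x-coordinates (m) of n in-line elements, centred on the origin."""
    offset = (n - 1) * spacing / 2.0
    return [k * spacing - offset for k in range(n)]

def far_field_delay(x_a, x_b, angle_deg):
    """How much later (s) the element at x_a hears a plane wave than the
    element at x_b. angle_deg is the source direction measured from the
    positive array axis, so 90 degrees is broadside."""
    return (x_b - x_a) * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND

def angle_from_delay(x_a, x_b, delay):
    """Invert far_field_delay: recover the horizontal angle in degrees."""
    cos_theta = delay * SPEED_OF_SOUND / (x_b - x_a)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))
```

For example, with four elements spaced 5 cm apart, `far_field_delay` between the two outermost elements for a source at 60 degrees can be passed to `angle_from_delay` to recover the same angle. A single line of elements cannot distinguish elevation, which is consistent with the in-line shape being used for horizontal-angle localization only.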


Abstract

The present invention provides a speaker positioning method based on sound and image fusion, which can accurately locate the speaker and smoothly switch to a close-up image of the speaker area. The method comprises the following steps: step 1, performing sound source localization with a microphone array; step 2, performing face detection with an improved YOLO V3 neural network; step 3, setting up 2 zoom cameras and 1 fixed-focus camera, and locating the speaker through the sound source localization of step 1 and the face detection of step 2. The two zoom cameras are defined as camera 1 and camera 2.
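The abstract does not state how the sound source estimate and the face detections are combined. One plausible reading, sketched below in Python purely as an assumption, is to convert each detected face box to a horizontal angle with a pinhole camera model and select the face closest to the sound source angle. The `FaceBox` type, the field-of-view parameter, the tolerance, and the assumption that the microphone array and camera share the same angular origin are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FaceBox:
    """Hypothetical face detection result in pixels (top-left origin)."""
    x: float
    y: float
    w: float
    h: float

def face_azimuth_deg(face: FaceBox, image_width: int, hfov_deg: float) -> float:
    """Horizontal angle of the face centre relative to the optical axis,
    using a pinhole model. Positive values are right of the image centre."""
    focal_px = (image_width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    centre_x = face.x + face.w / 2.0
    return math.degrees(math.atan((centre_x - image_width / 2.0) / focal_px))

def pick_speaker(faces: List[FaceBox], sound_azimuth_deg: float,
                 image_width: int, hfov_deg: float,
                 tolerance_deg: float = 10.0) -> Optional[FaceBox]:
    """Return the face whose azimuth is closest to the sound source angle,
    or None if no face lies within tolerance_deg of it."""
    best, best_err = None, tolerance_deg
    for face in faces:
        err = abs(face_azimuth_deg(face, image_width, hfov_deg) - sound_azimuth_deg)
        if err <= best_err:
            best, best_err = face, err
    return best
```

Returning `None` when no face is near the sound direction lets the caller keep the current shot instead of switching to an unrelated participant.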

Description

Technical Field

[0001] The invention relates to a speaker positioning method based on sound and image fusion, which is applied in the field of video conference cameras.

Background Art

[0002] In recent years, in enterprise-level video conferencing, a function that has attracted much attention is the precise positioning of the speaker in the scene and smooth switching to close-up shots of the speaker.

[0003] Consider a medium-to-large meeting room, generally 5-10 meters long and 4-8 meters wide, with many participants, as shown in Figure 1. During the meeting, participants take turns speaking, and everyone other than the current speaker only listens. For the video output by the video conference camera, the best experience is to focus only on the close-up area of the current speaker, without paying attention to the listeners. The close-up area of t...

Claims


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/00, G01S5/18, H04N7/18, G06N3/04
CPC: G01S5/18, H04N7/181, G06V40/161, G06V40/10, G06N3/045
Inventor: 王全强, 刘红艳, 毛海滨
Owner: 杭州晨安科技股份有限公司