
Voice retrieval apparatus and voice retrieval method

A speech and speech-signal technology, applied in the field of voice retrieval devices, that addresses the problem of poor retrieval accuracy, among others.

Active Publication Date: 2016-06-29
CASIO COMPUTER CO LTD
Cites: 5; Cited by: 7

AI Technical Summary

Problems solved by technology

[0006] In the technique disclosed in Non-Patent Document 1, retrieval accuracy deteriorates when the speech rate of the voice being searched differs from that of the person who input the query.

Method used



Examples


Embodiment 1

[0027] As shown in Figure 1, the voice search device 100 of Embodiment 1 physically includes a ROM (Read Only Memory) 1, a RAM (Random Access Memory) 2, an external storage device 3, an input device 4, an output device 5, a CPU (Central Processing Unit) 6, and a bus 7.

[0028] The ROM 1 stores a voice search program. The RAM 2 is used as a work area of the CPU 6.

[0029] The external storage device 3 is constituted by, for example, a hard disk. It stores as data the voice signal to be searched, a monophone model, a triphone model, and the phoneme time lengths described later.

[0030] The input device 4 is composed of, for example, a keyboard and a voice recognition device. The input device 4 supplies the search word input by the user to the CPU 6 as text data. The output device 5 includes, for example, a screen such as a liquid crystal display, a speaker, and the like. The output device 5 displays text data output by the CPU 6 on a screen,...

Embodiment 2

[0102] Embodiment 1 described the case in which the speech rate is assumed to be fixed and only one piece of speech rate information is set, so only a single speech rate can be handled. In actual speech, however, the same word is not always pronounced at the same speed. For example, the word "カテゴリ" ("category") may be uttered at an average speed, or slowly with emphasis. To cope with this, Embodiment 2 derives a plurality of utterance time lengths from a plurality of pieces of speech rate information. Embodiment 2 describes a case that uses three kinds of speech rate information (the change rate of the duration): 0.7 (fast), 1.0 (normal), and 1.4 (slow).
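
As a rough illustration of [0102], the sketch below scales assumed average phoneme durations by each of the three change rates (0.7, 1.0, 1.4). It then sums them into one candidate utterance time length per speech rate. The phoneme string, the millisecond values, and the function name `utterance_lengths` are hypothetical. For brevity, the sketch also collapses each phoneme to a single duration, whereas the device works with phoneme models that have several states.

```python
from typing import Dict, Sequence

# Hypothetical average phoneme durations in milliseconds. These are
# placeholders for illustration only; the device stores its own phoneme
# time lengths in the external storage device 3.
AVERAGE_DURATION_MS: Dict[str, float] = {
    "k": 60, "a": 90, "t": 55, "e": 85, "g": 50, "o": 95, "r": 40, "i": 80,
}

# Speech rate information used in Embodiment 2 (change rate of the duration).
SPEECH_RATES: Dict[str, float] = {"fast": 0.7, "normal": 1.0, "slow": 1.4}


def utterance_lengths(
    phonemes: Sequence[str],
    durations: Dict[str, float],
    rates: Dict[str, float],
) -> Dict[str, float]:
    """Return one candidate utterance time length (ms) per speech rate.

    Every phoneme's average duration is scaled by the same change rate,
    and the scaled durations are summed over the whole phoneme string.
    """
    return {
        name: sum(durations[p] * rate for p in phonemes)
        for name, rate in rates.items()
    }


if __name__ == "__main__":
    # "カテゴリ" (kategori) as a simplified phoneme string (an assumption).
    query = ["k", "a", "t", "e", "g", "o", "r", "i"]
    for name, length in utterance_lengths(query, AVERAGE_DURATION_MS, SPEECH_RATES).items():
        print(f"{name}: {length:.1f} ms")
```

Each speech rate yields its own utterance time length, so the later search can be run once per candidate length.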

[0103] The voice search device of Embodiment 2 has the same physical configuration as the voice search device 100 of Embodiment 1, shown in Figure 1. In addition, regarding the functional structure and Figure 2, the stru...

Modified example 1

[0131] Embodiments 1 and 2 described the case in which the voice search device 100 multiplies the duration of every state of a phoneme uniformly by the change rate. However, the present invention is not limited to this. As an example, a case in which the change rate differs for each state of a phoneme is described below.

[0132] The case in which the change rate differs for each state of a phoneme is described with reference to Figure 12. Let α1 be the change rate for the duration T1 of state 1 of the phoneme, α2 the change rate for the duration T2 of state 2, and α3 the change rate for the duration T3 of state 3.
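
Read literally, [0132] replaces the single change rate of Embodiments 1 and 2 with one rate per state. Assume that a phoneme's duration is the sum of its three state durations; this is our reading, not a formula stated in the text. The varied phoneme duration T' would then be:

```latex
T' = \alpha_1 T_1 + \alpha_2 T_2 + \alpha_3 T_3,
\qquad\text{which reduces to}\qquad
T' = \alpha\,(T_1 + T_2 + T_3) \quad\text{when } \alpha_1 = \alpha_2 = \alpha_3 = \alpha .
```

The second form is the uniform scaling of Embodiments 1 and 2.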

[0133] In this modified example, when the duration is extended, the change rates for vowels are set to 1.3 for state 1, 1.6 for state 2, and 1.3 for state 3. For consonants, the change rate for state 1 is set to 1.1, the change rate for state 2 to 1.2, and the rate of c...
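
A minimal sketch of the per-state variation in [0132] and [0133] follows. It uses the vowel rates given above (1.3, 1.6, 1.3). The consonant rate for state 3 is cut off in the source, so the sketch deliberately defines no consonant table; callers would have to supply those rates themselves. The function names and the example state durations are hypothetical. Summing the states into a phoneme duration is an assumption carried over from the uniform case.

```python
from typing import List, Sequence

# Per-state change rates for vowels when the duration is extended ([0133]).
# Consonant rates are not defined here because the source text is truncated
# before the state-3 value; pass them in explicitly if they are known.
VOWEL_EXTEND_RATES = (1.3, 1.6, 1.3)


def vary_state_durations(
    state_durations: Sequence[float], rates: Sequence[float]
) -> List[float]:
    """Scale each state duration T_i of one phoneme by its own rate alpha_i."""
    if len(state_durations) != len(rates):
        raise ValueError("expected exactly one change rate per state")
    return [t * a for t, a in zip(state_durations, rates)]


def varied_phoneme_duration(
    state_durations: Sequence[float], rates: Sequence[float]
) -> float:
    """Total phoneme duration, assumed to be the sum of its varied states."""
    return sum(vary_state_durations(state_durations, rates))


if __name__ == "__main__":
    # Hypothetical state durations (ms) of one vowel; not values from the patent.
    vowel_states = [20.0, 40.0, 30.0]
    print(vary_state_durations(vowel_states, VOWEL_EXTEND_RATES))
    print(varied_phoneme_duration(vowel_states, VOWEL_EXTEND_RATES))
```

With these rates, state 2 of a vowel is stretched more than states 1 and 3. This differs from a single uniform rate, which stretches all three states equally.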


Abstract

The invention relates to a voice retrieval apparatus and a voice retrieval method. A conversion unit (112) converts a retrieval character string into a phoneme string. A speech rate information acquisition unit (114) acquires speech rate information corresponding to the speech rate of the retrieval-target voice signal. A time length variation unit (115) varies the average duration of each phoneme based on the speech rate information. A time length derivation unit (116) uses the varied durations to derive the utterance time length of the voice corresponding to the retrieval character string. An interval specifying unit (117) specifies a plurality of likelihood acquisition intervals in the retrieval-target voice signal. For each likelihood acquisition interval, a likelihood acquisition unit (121) acquires a likelihood indicating how likely it is that the interval is one in which the voice corresponding to the retrieval character string is uttered. A determining unit (127) determines, based on the likelihoods acquired for selected likelihood acquisition intervals, an estimated interval in which the voice corresponding to the retrieval character string is uttered in the retrieval-target voice signal.
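
The abstract does not say how the likelihood acquisition intervals are laid out. One common approach, assumed here, works in three steps:

1. Slide a window as long as the derived utterance time length across the signal's frames with a fixed shift.
2. Score each window.
3. Keep the best-scoring window.

The frame shift, the `likelihood` callable, and the function names are hypothetical. The device itself scores intervals with its acoustic models (monophone and triphone), not with the toy scorer used below. It also determines the result from *selected* intervals, whereas this sketch simply takes the maximum over all of them.

```python
from typing import Callable, List, Sequence, Tuple

# (start_frame, end_frame) with the end frame excluded.
Interval = Tuple[int, int]


def specify_intervals(
    num_frames: int, utterance_frames: int, shift_frames: int
) -> List[Interval]:
    """Lay windows as long as the derived utterance length across the signal.

    A new window starts every `shift_frames` frames; windows that would run
    past the end of the signal are dropped.
    """
    if utterance_frames <= 0 or shift_frames <= 0:
        raise ValueError("utterance and shift lengths must be positive")
    return [
        (start, start + utterance_frames)
        for start in range(0, num_frames - utterance_frames + 1, shift_frames)
    ]


def determine_estimated_interval(
    intervals: Sequence[Interval], likelihood: Callable[[Interval], float]
) -> Interval:
    """Return the interval whose likelihood is highest."""
    if not intervals:
        raise ValueError("no intervals: the signal is shorter than the utterance")
    return max(intervals, key=likelihood)


if __name__ == "__main__":
    windows = specify_intervals(num_frames=10, utterance_frames=4, shift_frames=2)
    # Toy scorer standing in for the acoustic-model likelihood (hypothetical).
    best = determine_estimated_interval(windows, lambda iv: -abs(iv[0] - 4))
    print(windows, best)
```

When Embodiment 2 produces several utterance time lengths, the same windowing could be repeated once per length before the best interval is chosen.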

Description

[0001] This application claims priority based on Japanese Patent Application No. 2014-259418 filed on December 22, 2014, the contents of which are incorporated herein by reference.

Technical Field

[0002] The invention relates to a voice retrieval device and a voice retrieval method.

Background Art

[0003] With the expansion and popularization of multimedia content such as audio and video, high-precision multimedia retrieval technology is required. One such technology under study is voice retrieval, which identifies, within a voice signal, the position at which the voice corresponding to a given search term (query) is uttered.

[0004] No voice retrieval method has yet been established whose performance is comparable to that of character retrieval using image recognition. Techniques for realizing voice retrieval with sufficient performance have therefore been studied intensively.

[0005] For example,...

Claims


Application Information

IPC(8): G06F17/30; G10L25/54
CPC: G06F16/60; G06F16/367; G06F16/683; G10L2015/025; G10L25/54
Inventor: 富田宽基
Owner: CASIO COMPUTER CO LTD