Method and device for generating training corpus, equipment and storage medium
A training corpus generation technology, applied in the field of data processing, that addresses the long iteration cycle and high resource consumption of speech recognition models, and achieves the effects of saving resources, shortening the iteration cycle, and improving the speech recognition effect.
Examples
Embodiment 1
[0042] Figure 1 is a flowchart of a method for generating training corpus provided in Embodiment 1 of the present invention. This embodiment is applicable to generating training corpus for speech recognition. The method can be executed by the device for generating training corpus provided in the embodiments of the present invention; the device can be implemented in software and/or hardware, and can generally be integrated into equipment for generating training corpus. Such equipment includes, but is not limited to, computers. As shown in Figure 1, the method of this embodiment specifically includes:
[0043] Step 101. Mine multiple pieces of corpus data to be labeled from the user behavior logs associated with the target application. The corpus data includes: the first behavior log containing the user's voice and the corresponding speech recognition result, and the first behavior log time Associated...
Embodiment 2
[0056] Figure 2a is a flowchart of a method for generating training corpus provided in Embodiment 2 of the present invention. This embodiment can be combined with each optional solution in one or more of the above embodiments. In this embodiment, determining the user voice and the corresponding speech recognition result in each piece of corpus data as positive feedback corpus or negative feedback corpus, according to the association relationship between the first behavior log and the second behavior log in each piece of corpus data to be labeled, may include: obtaining the user's expected behavior corresponding to the first behavior log according to the log type of the first behavior log; and, when the expected behavior matches the second behavior log, determining the user voice in the corpus data and the corresponding speech recognition result as positive feedback corpus.
[0057] Correspondingly, as shown in Figure 2a, the method of this embodiment includes: ...
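The positive-feedback rule described in this embodiment can be illustrated with a minimal Python sketch. It is not the patent's implementation: the log type names, the `EXPECTED_BEHAVIOR` table, and the `BehaviorLog` and `CorpusItem` classes are hypothetical and chosen only to show the idea of looking up an expected behavior by log type and comparing it with the second behavior log.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table: for each type of first behavior log, the behavior the
# user is expected to perform next if the speech was recognized correctly.
# These names are illustrative assumptions, not taken from the patent.
EXPECTED_BEHAVIOR = {
    "voice_search_location": "start_navigation",
    "voice_route_query": "select_route",
}


@dataclass
class BehaviorLog:
    log_type: str  # hypothetical type label of the logged user behavior


@dataclass
class CorpusItem:
    user_voice: str          # e.g. an identifier of the recorded audio
    recognition_result: str  # text produced by the speech recognizer
    first_log: BehaviorLog   # log that carries the voice and its result
    second_log: BehaviorLog  # associated log of the user's next behavior


def label_positive(item: CorpusItem) -> Optional[str]:
    """Return "positive" when the second log matches the expected behavior."""
    expected = EXPECTED_BEHAVIOR.get(item.first_log.log_type)
    if expected is not None and item.second_log.log_type == expected:
        return "positive"
    # No match: this rule alone cannot label the item.
    return None


item = CorpusItem(
    user_voice="audio-0001",
    recognition_result="central park",
    first_log=BehaviorLog("voice_search_location"),
    second_log=BehaviorLog("start_navigation"),
)
print(label_positive(item))
```

In this sketch an item that does not match is left unlabeled rather than marked negative, since the paragraph above only states the condition for positive feedback.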
Embodiment 3
[0074] Figure 3a is a flowchart of a method for generating training corpus provided in Embodiment 3 of the present invention. This embodiment can be combined with each optional solution in one or more of the above embodiments. In this embodiment, determining the user voice and the corresponding speech recognition result in each piece of corpus data as positive feedback corpus or negative feedback corpus, according to the association relationship between the first behavior log and the second behavior log in each piece of corpus data to be labeled, may include: if it is determined that the user behavior corresponding to the second behavior log is a correction behavior for the first behavior log within a set time period, determining the user voice in the corpus data and the corresponding speech recognition result as negative feedback corpus.
[0075] And, after determining the user's speech in the corpus data and the corresponding speech recognition result as the nega...
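The negative-feedback rule in paragraph [0074] can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the set of correction log types, the 30-second window, and the function name `is_negative_feedback` are all hypothetical, since the text only says that the second behavior must be a correction of the first within a set time period.

```python
# Hypothetical log types that count as the user correcting a recognition
# result; the patent text shown here does not enumerate them.
CORRECTION_TYPES = {"manual_text_reinput", "edit_recognition_result"}

# Hypothetical length of the "set time period", in seconds.
CORRECTION_WINDOW_SECONDS = 30.0


def is_negative_feedback(
    first_log_time: float,
    second_log_time: float,
    second_log_type: str,
    window: float = CORRECTION_WINDOW_SECONDS,
) -> bool:
    """True when the second log is a correction made within the window."""
    elapsed = second_log_time - first_log_time
    # The correction must come after the first log and inside the window.
    within_window = 0.0 <= elapsed <= window
    return within_window and second_log_type in CORRECTION_TYPES


# A correction 12 seconds after the recognition result is negative feedback.
print(is_negative_feedback(100.0, 112.0, "edit_recognition_result"))
# The same correction 5 minutes later falls outside the assumed window.
print(is_negative_feedback(100.0, 400.0, "edit_recognition_result"))
```

Keeping the window as a parameter reflects that the text describes it only as a set time period; a real system would tune this value for the target application.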