Corpus processing method and device, electronic equipment and computer readable storage medium

A corpus processing method and related technology, applied in the field of data processing, that addresses the labor, time, and financial costs of manually labeling a corpus.

Pending Publication Date: 2021-05-18
TENCENT TECH (SHENZHEN) CO LTD
Cites: 5. Cited by: 2.

AI Technical Summary

Problems solved by technology

[0005] The application provides a corpus processing method, device, electronic equipment, and computer-readable storage medium that address the high labor, time, and financial costs of manually labeling a corpus.

Embodiment Construction

[0120] Embodiments of the present application are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the figures are exemplary and serve only to explain the present application; they should not be construed as limiting it.

[0121] Those skilled in the art will understand that, unless otherwise stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the specification of the present application refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be u...

Abstract

The invention provides a corpus processing method and device, electronic equipment, and a computer-readable storage medium, and relates to the field of data processing. The method comprises: obtaining a multimedia file that meets a preset condition; obtaining the audio data of the multimedia file; obtaining a subtitle file of the multimedia file and processing it according to a preset first rule to obtain a processed subtitle file containing at least one subtitle; cutting the audio data based on the at least one subtitle to obtain at least one audio data segment; and taking each subtitle together with its corresponding audio data segment as a first audio-subtitle pair, so as to obtain at least one first audio-subtitle pair. With this method and device, a labeled corpus is obtained automatically, without manual participation. This saves a large amount of labor and time, greatly improves corpus labeling efficiency, reduces spending on purchased corpora, and saves a large amount of financial cost.
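The pipeline in the abstract (process the subtitle file, then cut the audio by subtitle) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the subtitle file is assumed to be in SRT format with millisecond timestamps, the "preset first rule" is stood in for by a hypothetical `clean_subtitles` step that strips bracketed annotations and drops empty entries, and the audio is assumed to be an already decoded sequence of mono samples. The names `Subtitle`, `parse_srt`, `clean_subtitles`, and `cut_audio` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Subtitle:
    start_ms: int
    end_ms: int
    text: str


# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
_TIME = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)"
)


def _to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text):
    """Parse SRT text into Subtitle entries (assumes 3-digit milliseconds)."""
    subs = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, ln in enumerate(lines):
            m = _TIME.match(ln)
            if m:
                g = m.groups()
                text = " ".join(lines[i + 1:])
                subs.append(Subtitle(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return subs


def clean_subtitles(subs):
    """Hypothetical stand-in for the 'preset first rule': strip bracketed
    annotations such as [music] and drop entries left empty or zero-length."""
    cleaned = []
    for s in subs:
        text = re.sub(r"\[[^\]]*\]|\([^)]*\)", "", s.text)
        text = re.sub(r"\s+", " ", text).strip()
        if text and s.end_ms > s.start_ms:
            cleaned.append(Subtitle(s.start_ms, s.end_ms, text))
    return cleaned


def cut_audio(samples, sample_rate, subs):
    """Slice the sample sequence by each subtitle's time span and return
    a list of (audio_segment, subtitle_text) pairs."""
    pairs = []
    for s in subs:
        lo = s.start_ms * sample_rate // 1000
        hi = min(len(samples), s.end_ms * sample_rate // 1000)
        if lo < hi:
            pairs.append((samples[lo:hi], s.text))
    return pairs


SRT = """1
00:00:00,000 --> 00:00:01,000
[music]

2
00:00:01,000 --> 00:00:02,500
Hello there
"""

samples = list(range(8000 * 3))  # 3 s of placeholder mono audio at 8 kHz
pairs = cut_audio(samples, 8000, clean_subtitles(parse_srt(SRT)))
```

In this toy input the first subtitle contains only an annotation and is discarded by the cleaning step, so a single audio-subtitle pair remains. A real pipeline would also need to decode the audio track from the multimedia file, which this sketch leaves out.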

Description

Technical field

[0001] The present application relates to the technical field of data processing, and in particular to a corpus processing method, device, electronic equipment, and computer-readable storage medium.

Background

[0002] Currently, corpus labeling on the market is done manually, in two main ways. In the first, a suitable text corpus is prepared that covers as much of the target language's pronunciation as possible, and speakers then read the text aloud to produce audio-text pairs. In the second, existing audio files are found and transcribed by manual dictation to mark the correct text, again yielding audio-text pairs. Most corpora currently on the market are generated by the first method.

[0003] Although the first method yields a relatively clean corpus, because it is obtained by reading aloud, its pronunciation deviates from th...

Application Information

IPC(8): G06F40/289, G06N3/04, G06N3/08, H04N21/488, G10L15/26, G10L15/02, G10L15/06, G10L15/16
CPC: G06F40/289, G06N3/049, G06N3/08, H04N21/4884, G10L15/02, G10L15/063, G10L15/16, G10L2015/025, G10L2015/0631, G06N3/045, Y02D10/00
Inventor 彭俊石吴飞彭艺
Owner TENCENT TECH (SHENZHEN) CO LTD