
Data processing method and device

A data processing technology, applied in the field of data processing, that addresses the single-mode problem of mask training samples and the resulting bottleneck in the natural language understanding ability of pre-trained language models (PLMs).

Active Publication Date: 2020-09-01
HUAWEI TECH CO LTD
Cites 3 documents; cited by 10.

AI Technical Summary

Problems solved by technology

Mask training samples obtained in the conventional way (with a fixed mask proportion) suffer from a single mode. Training a PLM on such mask training samples therefore creates a bottleneck in the PLM's natural language understanding ability.
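One way to make the single-mode problem concrete, assuming it refers at least in part to a fixed mask proportion (the excerpt does not define the term precisely), is to count how many distinct masked-token counts a masking scheme produces for sequences of one length. This is an illustrative sketch only: the function name `distinct_mask_counts`, the fixed 0.15 ratio, and the uniform 0.05 to 0.40 range are hypothetical values, not taken from the patent.

```python
import random

def distinct_mask_counts(seq_len, ratio_sampler, num_samples=1000, seed=0):
    """Collect the distinct numbers of masked tokens across many samples.

    ratio_sampler(rng) returns the mask proportion for one sample
    (a hypothetical interface used only for this illustration).
    """
    rng = random.Random(seed)
    counts = set()
    for _ in range(num_samples):
        ratio = ratio_sampler(rng)
        # Mask at least one token per sample.
        counts.add(max(1, round(seq_len * ratio)))
    return counts

# Hypothetical settings: a fixed 15% versus a proportion drawn per sample.
fixed = distinct_mask_counts(128, lambda rng: 0.15)
varied = distinct_mask_counts(128, lambda rng: rng.uniform(0.05, 0.40))
print(sorted(fixed))    # [19]
print(len(varied) > 1)  # True
```

With the fixed ratio, every 128-token sample has exactly 19 masked tokens, whereas the per-sample draw spreads the counts across a range.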




Detailed Description of Embodiments

[0070] The technical solution in this application will be described below with reference to the accompanying drawings.

[0071] Natural language processing (NLP) is a technology that enables computers to understand and process human natural language, and is an important technical means of realizing artificial intelligence (AI). For example, NLP covers a variety of downstream tasks such as sentiment analysis, part-of-speech analysis, intent analysis, named entity recognition, reading comprehension, logical reasoning, machine translation, and conversational robots. The pre-trained language model (PLM) is an important general-purpose model that has emerged in the NLP field in recent years, and PLMs perform well on most downstream NLP tasks.

[0072] A commonly used training scheme for PLMs is the masked language model (MLM). The training principle of MLM is to enable the PLM to learn to capture the contextual information of text.
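As a reference point, the conventional MLM masking step can be sketched as follows: a fixed fraction of tokens is replaced by a mask token, and the original tokens become the prediction targets, so the model must recover them from context. This is a simplified, BERT-style illustration under assumptions: the `[MASK]` string, the 0.15 default ratio, and the name `mlm_mask_fixed` are hypothetical, and BERT's rule of sometimes keeping or randomly replacing selected tokens is omitted.

```python
import random

MASK = "[MASK]"  # hypothetical mask-token string

def mlm_mask_fixed(tokens, mask_ratio=0.15, seed=None):
    """Mask a fixed fraction of tokens for MLM training.

    Returns (masked_tokens, labels): labels[i] is the original token at a
    masked position and None elsewhere, so the model must predict each
    masked token from the unmasked context around it.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Same proportion for every sample; mask at least one, at most all tokens.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels

masked, labels = mlm_mask_fixed("the cat sat on the mat".split(), seed=0)
print(masked, labels)
```

Because `mask_ratio` is the same for every sample, all sequences of a given length receive the same number of masked tokens.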

[0073]...



Abstract

The invention provides a data processing method and device, relating to the field of artificial intelligence and in particular to natural language processing. The method comprises: determining an original text sample, where the original text sample has not been subjected to mask processing; and performing mask processing on the original text sample to obtain a mask training sample, where the mask processing makes the mask proportion of the mask training samples unfixed, and the mask training sample is used to train a pre-trained language model (PLM). Because the PLM is trained with mask training samples whose mask proportion is not fixed, the method can enhance the mode diversity of the PLM's training samples, diversify the features the PLM learns, improve the PLM's generalization ability, and thereby improve the natural language understanding ability of the trained PLM.
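The abstract's central idea, a mask proportion that is not fixed across mask training samples, can be sketched by drawing a fresh proportion for each original text sample before masking it. This is a minimal sketch under assumptions: the abstract does not say how the proportion is chosen, so the uniform draw from a hypothetical range of 0.05 to 0.40, the `[MASK]` token, and the name `mask_with_unfixed_ratio` are illustrative only.

```python
import random

MASK = "[MASK]"  # hypothetical mask-token string

def mask_with_unfixed_ratio(tokens, rng, low=0.05, high=0.40):
    """Mask one original text sample using a per-sample mask proportion.

    The proportion is drawn uniformly from [low, high] on every call
    (a hypothetical choice), so different mask training samples end up
    with different mask proportions. Returns (masked, labels, ratio).
    """
    ratio = rng.uniform(low, high)
    # Mask at least one token (when any exist) and never more than all of them.
    n_mask = min(len(tokens), max(1, round(len(tokens) * ratio)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels, ratio

rng = random.Random(42)
sample = [f"tok{i}" for i in range(50)]
for _ in range(3):
    masked, labels, ratio = mask_with_unfixed_ratio(sample, rng)
    print(f"ratio={ratio:.2f}, masked tokens={masked.count(MASK)}")
```

Passing the random generator in explicitly keeps the per-sample draws reproducible while still giving each call its own proportion.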

Description

Technical field

[0001] This application relates to the field of artificial intelligence, and in particular to a data processing method and device.

Background

[0002] Natural language processing (NLP) is a technology that enables computers to understand and process human natural language, and is an important technical means of realizing artificial intelligence. The pre-trained language model (PLM) is an important general-purpose model that has emerged in the NLP field in recent years, and PLM training schemes are a research hotspot in this field. Work on PLM training schemes follows two directions of improvement: first, improving the natural language understanding ability of the PLM; second, speeding up model training (that is, accelerating model convergence). A commonly used training scheme for PLMs is the masked language model (MLM).

[0003] The training principle of MLM is to enable the PLM to learn the ability to capture text context informa...
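The MLM training principle (learning to use context to recover masked tokens) is commonly realized by scoring the model only at masked positions. This sketch computes an average cross-entropy over those positions; the dictionary-based probability format, the `eps` floor, and the name `mlm_loss` are assumptions for illustration, not details from the patent.

```python
import math

def mlm_loss(pred_probs, labels, eps=1e-12):
    """Average cross-entropy over masked positions only.

    pred_probs[i]: dict mapping candidate tokens to predicted probabilities
    at position i (hypothetical format). labels[i]: the original token at a
    masked position, or None if position i was not masked.
    """
    total, count = 0.0, 0
    for probs, label in zip(pred_probs, labels):
        if label is None:
            continue  # unmasked positions are not scored
        # eps guards against log(0) when the label gets zero probability
        total += -math.log(max(probs.get(label, 0.0), eps))
        count += 1
    return total / count if count else 0.0

labels = [None, "cat", None, None, None, None]
pred = [{}, {"cat": 0.5, "dog": 0.5}, {}, {}, {}, {}]
print(mlm_loss(pred, labels))  # 0.6931471805599453 (ln 2)
```

Only the masked position contributes, so the loss reflects how well the model recovers hidden tokens from their surroundings.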


Application Information

IPC (8): G06F40/211, G06F40/295, G06F40/30
CPC: G06F40/211, G06F40/295, G06F40/30, G06F40/242, G06F40/237, G06F40/274, G06N3/0895, G06N3/096, G06F40/103, G06F40/279, G06N3/08
Inventors: 廖亿 (Liao Yi), 李博文 (Li Bowen), 郑豪 (Zheng Hao), 蒋欣 (Jiang Xin), 刘群 (Liu Qun)
Owner: HUAWEI TECH CO LTD