Data processing method and device

A data processing and corpus technology, applied in the field of data processing. It addresses the problems that entity recognition models can only identify specific types of entities, that existing corpus is not fully used, and that relabeling wastes resources. The training corpus can be expanded conveniently and quickly, without duplicated labeling work or wasted resources.

Pending Publication Date: 2020-09-22
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

However, such entities are generally Chinese names of people (such as Zhang San), place names (such as Hangzhou City), and organization names (such as Alibaba Co., Ltd.). For entities outside this range, such as names of people of other ethnic groups (such as ... Mai Ti), Japanese company organizations (such as ** Co., Ltd.), and British place names (such as Manchester County), recognition accuracy is often greatly reduced.
[0003] To solve the above problems, a batch of corpus containing such entities is usually relabeled for model training according to the required scenario. However, relabeling corpus requires a lot of manpower and financial resources; the existing corpus cannot be fully utilized, so resources are wasted; and generalization is relatively weak, because the resulting model can only recognize specific types of entities.
[0004] The training corpus of the entity recognition model in the related technology contains only entities such as Chinese names, place names, and organization names, so relabeling the names of people of other ethnic groups or the translated names of entities from other countries is costly and resource-intensive.



Examples


Embodiment 1

[0031] According to an embodiment of the present application, an embodiment of a data processing method is provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

[0032] The method embodiment provided in Embodiment 1 of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Figure 1 shows a block diagram of the hardware structure of a computer terminal (or mobile device) for implementing the data processing method. As shown in Figure 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n in the figure) (the processor 102 may include but not limit...

Embodiment 2

[0078] According to an embodiment of the present application, an embodiment of a data processing method is also provided. It should be noted that the steps shown in the flowcharts of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

[0079] Figure 5 is a flowchart of a data processing method according to Embodiment 2 of the present application. As shown in Figure 5, the method may include the following steps:

[0080] Step S502, obtaining the original corpus and entity set, wherein the entity set includes: multiple entities of the target category, and the multiple entities are different from the entities in the original corpus.

[0081] Optionally, the above target category may include at least one of the following: person na...
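
Step S502 can be illustrated with a minimal Python sketch. It assumes that a new labeled sentence is produced by substituting each target-category entity in the original corpus with an entity from the entity set; the visible text does not confirm that the patent builds the first corpus this way. The function name `substitute_entities`, the `(start, end, category)` span format, and the `PER`/`LOC` category labels are hypothetical choices made for illustration.

```python
import random
from typing import List, Tuple

# One labeled entity: (start offset, end offset, category) over the sentence text.
Span = Tuple[int, int, str]


def substitute_entities(
    text: str,
    spans: List[Span],
    entity_set: List[str],
    target_category: str,
    rng: random.Random,
) -> Tuple[str, List[Span]]:
    """Swap every target-category entity for one drawn from entity_set.

    Offsets are recomputed so the existing labels carry over unchanged.
    Substitution is an assumption about how the new corpus is built.
    """
    pieces: List[str] = []
    new_spans: List[Span] = []
    cursor = 0  # position in the original text
    length = 0  # length of the new text built so far
    for start, end, category in sorted(spans):
        prefix = text[cursor:start]
        if category == target_category:
            surface = rng.choice(entity_set)
        else:
            surface = text[start:end]
        pieces += [prefix, surface]
        new_start = length + len(prefix)
        new_spans.append((new_start, new_start + len(surface), category))
        length = new_start + len(surface)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), new_spans


sentence = "Zhang San works in Hangzhou City"
labels = [(0, 9, "PER"), (19, 32, "LOC")]
new_sentence, new_labels = substitute_entities(
    sentence, labels, ["Manchester County"], "LOC", random.Random(0)
)
print(new_sentence)
print(new_labels)
```

Because the labels are carried over with recomputed offsets, no manual relabeling is needed for the generated sentence, which is the saving the application describes.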

Embodiment 3

[0099] According to an embodiment of the present application, a data processing device for implementing the above data processing method is also provided. As shown in Figure 6, the apparatus 600 includes: an acquisition module 602, a processing module 604, and a determination module 606.

[0100] The acquisition module 602 is used to acquire the first corpus, wherein the first corpus is obtained at least according to the entities of the target category in the original corpus. The processing module 604 is used to process the first corpus using the text classification model to obtain the probability value of the first corpus, wherein the probability value characterizes the degree of matching between the first corpus and the original corpus. The determination module 606 is used to determine the first corpus as the training corpus when the probability value is greater than or equal to the preset probability value.
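
The decision made by the determination module 606 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text classification model is replaced by a toy scorer (the share of a sentence's tokens that also occur in the original corpus), and the names `toy_match_probability` and `select_training_corpus`, as well as the threshold 0.8, are hypothetical. The application does not say which classifier or which preset probability value is used.

```python
from typing import Callable, Iterable, List


def toy_match_probability(original_corpus: Iterable[str]) -> Callable[[str], float]:
    """Build a stand-in scorer; a real system would use a trained text classifier."""
    vocab = {token for sentence in original_corpus for token in sentence.split()}

    def score(sentence: str) -> float:
        tokens = sentence.split()
        if not tokens:
            return 0.0
        # Share of tokens that also appear in the original corpus.
        return sum(token in vocab for token in tokens) / len(tokens)

    return score


def select_training_corpus(
    first_corpus: Iterable[str],
    match_probability: Callable[[str], float],
    preset_probability: float,
) -> List[str]:
    """Keep candidates whose probability value is >= the preset probability value."""
    return [s for s in first_corpus if match_probability(s) >= preset_probability]


original = ["Zhang San works in Hangzhou City"]
candidates = ["Tanaka works in Hangzhou City", "qwe rty uio"]
scorer = toy_match_probability(original)
kept = select_training_corpus(candidates, scorer, preset_probability=0.8)
print(kept)
```

Passing the scorer in as a callable keeps the threshold rule separate from the model, so the toy scorer can be swapped for a real classifier without changing the selection step.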

[0101] Optionally, the above target catego...


Abstract

The invention discloses a data processing method and device. The method comprises: obtaining a first corpus, wherein the first corpus is obtained at least according to entities of a target category in an original corpus; processing the first corpus with a text classification model to obtain a probability value of the first corpus, wherein the probability value represents the degree of matching between the first corpus and the original corpus; and, when the probability value is greater than or equal to a preset probability value, determining the first corpus as a training corpus. The method and device solve the technical problem in the related technology that, because the training corpus of the entity recognition model contains only entities such as Chinese names, place names, and organization names, the names of people of other ethnic groups or the translated names of entities from other countries must be labeled again, which consumes considerable cost and resources.

Description

Technical field

[0001] The present application relates to the field of data processing, and in particular to a data processing method and device.

Background

[0002] Entity recognition refers to the recognition of entities with specific meanings in text, generally including names of people, places, and institutions, proper nouns, and so on. It is one of the basic tasks of natural language processing and usually includes two parts: entity boundary recognition and entity category determination. In entity recognition tasks, the commonly used entities are names of people, places, and organizations, and the public Chinese entity recognition training corpus contains all three types. However, such entities are generally Chinese names of people (such as Zhang San), place names (such as Hangzhou City), and organization names (such as Alibaba Co., Ltd.). For entities outside this range, such as names of people of other ethnic groups (such as ... Maiti), Japanese company organizations (such as ** Co., Ltd.), British place names (such as Manchester County),...
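
The two parts of entity recognition named in the background, boundary recognition and category determination, are often encoded together in a per-token tag sequence. The Python sketch below assumes the common BIO tagging scheme, which the application itself does not mention; the function name `decode_bio` and the `PER`/`LOC` labels are hypothetical.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn BIO tags into (entity text, category) pairs.

    "B-X" opens an entity of category X (boundary), "I-X" continues it,
    and "O" marks a token outside any entity.
    """
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    category: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), category))
            current, category = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == category:
            current.append(token)
        else:
            # "O" or an "I-" tag that does not continue the open entity.
            if current:
                entities.append((" ".join(current), category))
            current, category = [], None
    if current:
        entities.append((" ".join(current), category))
    return entities


tokens = ["Zhang", "San", "works", "in", "Hangzhou", "City"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "I-LOC"]
print(decode_bio(tokens, tags))
```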


Application Information

IPC(8): G06F16/36, G06F16/35
CPC: G06F16/367, G06F16/35
Inventor: 马春平, 谢朋峻, 王潇斌, 李林琳
Owner: ALIBABA GRP HLDG LTD