Word vector configuration method and device, storage medium and electronic device

A word vector configuration technology, applied in the field of neural networks, that addresses problems such as reduced accuracy in training tasks, with the effect of reducing training time and improving accuracy.

Pending Publication Date: 2019-11-05
PING AN TECH (SHENZHEN) CO LTD
Citations: 7; cited by: 1

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide a word vector configuration method and device, a storage medium, and an electronic device, to at least solve the problem in the prior art that the accuracy of subsequent training tasks is reduced when the word vectors of unregistered words are configured by random allocation.



Examples


Embodiment 1

[0024] This embodiment provides a method for configuring word vectors, which can be applied to mobile terminals, handheld terminals, or similar computing devices. Running the method on different computing devices changes only the execution subject of the solution; those skilled in the art can foresee that it produces the same technical effect on each of them.

[0025] The method for configuring word vectors provided in this embodiment disassembles a word that is not registered in the word vector dictionary into its strokes, searches the dictionary for the registered word whose strokes are closest, and configures that word's vector as the initial word vector of the unregistered word. This solves the technical problem in related technologies that the accuracy of subsequent training tasks decreases when the word vectors of unregistered words are allocated randomly, and uses the semantic information carried in Chinese st...
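The stroke-matching idea in [0025] can be sketched as follows. This is a minimal sketch under stated assumptions: the five-class stroke encoding (1 horizontal, 2 vertical, 3 left-falling, 4 dot/right-falling, 5 turning) and the tiny `STROKES` table are illustrative, and the normalized Levenshtein similarity is an assumed choice; this excerpt does not say which similarity measure the patent uses.

```python
# Illustrative stroke table (assumption): a real system would load a full
# stroke-order database. Codes: 1 horizontal, 2 vertical, 3 left-falling,
# 4 dot/right-falling, 5 turning.
STROKES = {
    "大": "134",
    "太": "1344",
    "天": "1134",
    "木": "1234",
    "本": "12341",
    "人": "34",
}


def disassemble(word: str) -> str:
    """Concatenate the stroke codes of each character in the word."""
    return "".join(STROKES[ch] for ch in word)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def stroke_similarity(a: str, b: str) -> float:
    """Map edit distance into [0, 1]; 1.0 means identical stroke sequences."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

For example, 太 (1344) and 大 (134) differ by one stroke and score 0.75, while 太 and 天 (1134) differ in two positions and score 0.5.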

Embodiment 2

[0063] This embodiment also provides a device for configuring word vectors, which implements Embodiment 1 and its preferred implementations. For terms or implementation details not described here, refer to the relevant descriptions in Embodiment 1; content already explained there is not repeated.

[0064] As used below, the term "module" refers to a combination of software and/or hardware that can realize a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also conceivable.

[0065] Figure 2 is a schematic diagram of a device for configuring word vectors according to an embodiment of the present invention. As shown in Figure 2, the device includes: a first determination module 10, a judgment module 20, a disassembly module 30, a calculation module ...
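The module split in [0065] can be pictured as one class with one method per module. This is a hypothetical sketch: the class and method names are illustrative, the modules after the calculation module are elided in the source, so the final `configure` step is assumed from the abstract, and `difflib.SequenceMatcher.ratio()` stands in for the unspecified stroke-sequence similarity.

```python
from difflib import SequenceMatcher
from typing import Dict, List


class WordVectorConfigDevice:
    """Hypothetical device; each method mirrors one module named in [0065]."""

    def __init__(self, dictionary: Dict[str, List[float]], strokes: Dict[str, str]):
        self.dictionary = dictionary  # registered word -> word vector
        self.strokes = strokes        # character -> stroke codes (illustrative)

    def determine(self, tokens: List[str]) -> List[str]:
        """First determination module 10: words that need an initial vector."""
        return list(dict.fromkeys(tokens))  # drop duplicates, keep order

    def judge(self, word: str) -> bool:
        """Judgment module 20: is the word registered in the dictionary?"""
        return word in self.dictionary

    def disassemble(self, word: str) -> str:
        """Disassembly module 30: word -> concatenated stroke sequence."""
        return "".join(self.strokes[ch] for ch in word)

    def calculate(self, word: str) -> Dict[str, float]:
        """Calculation module: stroke similarity to every registered word."""
        target = self.disassemble(word)
        return {
            known: SequenceMatcher(None, self.disassemble(known), target).ratio()
            for known in self.dictionary
        }

    def configure(self, word: str) -> List[float]:
        """Assumed final step (from the abstract): copy the best match's vector."""
        if self.judge(word):
            return list(self.dictionary[word])
        scores = self.calculate(word)
        best = max(scores, key=scores.get)
        return list(self.dictionary[best])


device = WordVectorConfigDevice(
    dictionary={"大": [0.1, 0.2], "木": [0.3, 0.4]},
    strokes={"大": "134", "木": "1234", "太": "1344"},
)
for w in device.determine(["太", "木", "太"]):
    print(w, device.configure(w))
```

Here the unregistered 太 (1344) inherits the vector of 大 (134), whose ratio of 6/7 beats 木 (1234) at 0.75, while the registered 木 keeps its own vector.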

Embodiment 3

[0074] An embodiment of the present invention also provides a storage medium storing a computer program, wherein the computer program is configured to perform, when run, the steps in any one of the above method embodiments.

[0075] In this embodiment, a word that is not registered in the word vector dictionary is disassembled into strokes, the registered word whose strokes are closest to it is found in the dictionary, and that word's vector is configured as the initial word vector of the unregistered word. This solves the technical problem in related technologies that the accuracy of subsequent training tasks decreases when the word vectors of unregistered words are allocated randomly. Because the semantic information carried in Chinese strokes is used to assign initial word vectors to unregistered words, the time consumed by subsequent training tasks can be reduced and their accuracy improved.

[0076] Optionally, in this ...



Abstract

The invention provides a word vector configuration method and device, a storage medium, and an electronic device. The word vector configuration method comprises the following steps: determining a first word for which an initial word vector is to be configured; judging whether the first word is in a word vector dictionary, where the word vector dictionary stores a one-to-one correspondence between multiple words and multiple word vectors; if the first word is judged not to be in the word vector dictionary, performing stroke disassembly on the first word to obtain a stroke sequence; calculating the similarity between the stroke sequence of each word in the word vector dictionary and the stroke sequence of the first word; and determining the word vector corresponding to the word whose stroke sequence is most similar to that of the first word, and configuring that word vector as the initial word vector of the first word. The method and device solve the technical problem in related technologies that the precision of subsequent training tasks is reduced when the word vectors of unregistered words are configured by random allocation.
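The steps in the abstract map onto a single function. This is a minimal sketch, assuming a tiny hand-written stroke table and a normalized Levenshtein similarity (the abstract does not name the similarity measure); `configure_initial_vector` and the other names are hypothetical. Step 1, determining which word needs an initial vector, is left to the caller.

```python
from typing import Dict, List

Vector = List[float]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two stroke sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed metric: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def configure_initial_vector(word: str,
                             dictionary: Dict[str, Vector],
                             strokes: Dict[str, str]) -> Vector:
    # Step 2: judge whether the word is in the word vector dictionary.
    if word in dictionary:
        return list(dictionary[word])

    # Step 3: stroke disassembly (concatenate per-character stroke codes).
    def to_strokes(w: str) -> str:
        return "".join(strokes[ch] for ch in w)

    target = to_strokes(word)
    # Steps 4-5: score every registered word, then copy the best match's vector.
    # A copy is returned so later training updates leave the dictionary intact.
    best = max(dictionary, key=lambda known: similarity(to_strokes(known), target))
    return list(dictionary[best])


# Illustrative data (assumption): 木 and 太 are unregistered.
strokes = {"大": "134", "太": "1344", "木": "1234", "本": "12341"}
dictionary = {"大": [0.1, 0.2], "本": [0.5, 0.6]}
print(configure_initial_vector("太", dictionary, strokes))
print(configure_initial_vector("木", dictionary, strokes))
```

In the demo, 木 (1234) is closest to 本 (12341, similarity 0.8) rather than 大 (134, similarity 0.75), so it inherits 本's vector, while 太 inherits 大's. Returning a copy keeps later training updates from silently changing the dictionary entry it came from.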

Description

Technical field

[0001] The present invention relates to the field of neural networks, and in particular to a word vector configuration method and device, a storage medium, and an electronic device.

Background

[0002] When processing text data, the most basic steps are usually word segmentation and word vector training (for example, with word2vec), after which subsequent tasks such as text comparison and classification are performed on the word vectors. In practice, the text to be processed often contains new words (unregistered words) that fall outside the scope of the word vector dictionary. The usual approach is to assign random word vectors to these unregistered words. However, a randomly assigned word vector does not use the semantic information of the new word, which reduces the accuracy of subsequent tasks.

[0003] Aiming at the above-mentioned problems existing in relate...
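For contrast, the random-assignment baseline criticized in [0002] can be sketched as below. This is a hypothetical illustration: the uniform range, the seed, and the function names are assumptions, not taken from the patent or from any particular word2vec implementation.

```python
import random
from typing import Dict, List


def random_initial_vector(dim: int, rng: random.Random,
                          scale: float = 0.5) -> List[float]:
    """Baseline: ignore the word entirely and draw each component
    uniformly from [-scale, scale] (the range is an assumption)."""
    return [rng.uniform(-scale, scale) for _ in range(dim)]


def assign_random_vectors(tokens: List[str],
                          dictionary: Dict[str, List[float]],
                          dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Registered tokens keep their vectors; each new token gets a random one."""
    rng = random.Random(seed)
    out: Dict[str, List[float]] = {}
    for tok in tokens:
        if tok in dictionary:
            out[tok] = list(dictionary[tok])
        elif tok not in out:
            out[tok] = random_initial_vector(dim, rng)
    return out


vectors = assign_random_vectors(["大", "太", "太"], {"大": [0.1, 0.2]}, dim=2)
print(vectors)
```

Here 太 receives a vector unrelated to that of 大, even though their stroke sequences differ by a single stroke; this is the unused semantic information that [0002] describes.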


Application Information

IPC(8): G06F17/27; G06N3/08
CPC: G06N3/08
Inventors: 郑立颖 (Zheng Liying), 徐亮 (Xu Liang), 阮晓雯 (Ruan Xiaowen)
Owner: PING AN TECH (SHENZHEN) CO LTD