Network context text recognition method and device and storage medium

A text recognition technology, applied in biological neural network models, semantic analysis, natural language data processing, and related fields, that addresses problems such as slow model convergence, complex semantic composition, and neglect of the semantic information carried by Chinese characters.

Active Publication Date: 2020-08-25
XIAMEN MEIYA PICO INFORMATION

AI Technical Summary

Problems solved by technology

However, compared with English, the semantic composition of Chinese is more complicated. Chinese words are composed of Chinese characters, and the semantics of a Chinese character are generally related to the meaning of its component parts. If the CBOW model is used directly to learn Chinese word vectors, the latent semantic information of Chinese characters is ignored and the resulting word vector model generalizes poorly. Text recognition is then not necessarily effective, and the model converges slowly during training. A new Chinese word vector model is therefore urgently needed to address one or more of these technical defects.
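To make the limitation concrete, here is a minimal sketch of a standard CBOW forward pass, written in plain Python. It is not the patent's model: the function name `cbow_predict` and the toy vectors are assumptions for illustration. Each word is looked up as a single atomic index, so nothing about the characters or radicals inside a word reaches the model.

```python
import math

def cbow_predict(context_ids, in_vecs, out_vecs):
    """Return a probability distribution over the vocabulary for the center word.

    context_ids: indices of the context words (each word is an atomic unit).
    in_vecs:     input embedding per vocabulary word (list of lists).
    out_vecs:    output embedding per vocabulary word (list of lists).
    """
    dim = len(in_vecs[0])
    # Hidden layer: element-wise mean of the context word vectors.
    hidden = [
        sum(in_vecs[i][d] for i in context_ids) / len(context_ids)
        for d in range(dim)
    ]
    # Score every vocabulary word by its dot product with the hidden vector.
    scores = [sum(h * w for h, w in zip(hidden, out)) for out in out_vecs]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of three words with two-dimensional embeddings (made-up values).
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = cbow_predict([0, 1], toy_vecs, toy_vecs)
```

Because the lookup is keyed only by word index, two words that share a character or radical start with unrelated vectors, which is the gap the passage above describes.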



Examples


Embodiment Construction

[0079] The application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the invention.

[0080] It should be noted that, where no conflict arises, the embodiments in the present application and the features in those embodiments can be combined with each other. The present application is described in detail below with reference to the accompanying drawings and embodiments.

[0081] Figure 1 shows a text recognition method in a network context according to the present invention. The method includes:

[0082] Modeling step S101, constructing a style semantic model based on a long text window, and constructing a radical-level semantic model based on a sh...



Abstract

The invention provides a text recognition method, device, and storage medium for a network context. The method comprises: constructing a style semantic model based on a long text window, and constructing a component semantic model based on a short text window; training the style semantic model and the component semantic model on a corpus of the network context to obtain a Chinese word vector model of the network context; and using that Chinese word vector model to recognize an input text of the network context and output a recognition result. The method uses two different windows during word segmentation: the long window extracts semantic information about networked style, and the short window extracts semantic features at different granularities. The two windows are combined in the training stage to obtain a more accurate word vector representation, which improves the text recognition rate in the network context. The method also optimizes the objective function to speed up model training, and establishes a radical meaning transfer method during training, which further improves the text recognition rate.
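The two-window idea can be sketched as follows. This is only an illustration of extracting a long and a short context around the same center token. The function names and the window sizes are assumptions, and the abstract does not say how the two contexts are weighted or merged during training, so that part is left out.

```python
def window_context(tokens, center, size):
    """Return the tokens within `size` positions of `center`, excluding the center."""
    lo = max(0, center - size)
    hi = min(len(tokens), center + size + 1)
    return tokens[lo:center] + tokens[center + 1:hi]

def dual_window_contexts(tokens, center, long_size=5, short_size=2):
    """Return (long_context, short_context) for one center token.

    long_size and short_size are hypothetical defaults, not values from the patent.
    """
    return (
        window_context(tokens, center, long_size),
        window_context(tokens, center, short_size),
    )

# Placeholder tokens standing in for a segmented sentence.
tokens = list("abcdefg")
long_ctx, short_ctx = dual_window_contexts(tokens, center=3, long_size=3, short_size=1)
```

In this sketch the long context would feed the style semantic model and the short context the component semantic model, under the reading of the abstract given above.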

Description

Technical field

[0001] The invention relates to the technical field of text data processing, in particular to a text recognition method, device, and storage medium in a network context.

Background

[0002] Text vector representation has long been an important research direction in computer technology and artificial intelligence, and it remains a major challenge in natural language analysis and processing. The quality of the vectorized representation of text directly affects the performance of subsequent natural language analysis models. Vectorized text representation first used the one-hot encoding model, which gradually evolved into the Bag of Words model. These representation methods are conceptually simple and clear, and they solve the problem of representing text in computers reasonably well, but they do not consider the semantic correlation between word contexts or the sequential characteristics of language, which splits the...
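As a small illustration of the two baseline representations named in the background, the sketch below builds one-hot and bag-of-words vectors in plain Python. The helper names and the toy vocabulary are assumptions for this example. It shows that a bag-of-words vector discards word order, so two sentences with different meanings can map to the same vector.

```python
from collections import Counter

def one_hot(word, vocab):
    """Return a vector with a 1 at the word's vocabulary position and 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]

def bag_of_words(tokens, vocab):
    """Return per-word counts in vocabulary order; word order is not kept."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

vocab = ["dog", "bites", "man"]
first = bag_of_words(["dog", "bites", "man"], vocab)
second = bag_of_words(["man", "bites", "dog"], vocab)
```

Here `first` and `second` are identical even though the sentences differ, and any two distinct one-hot vectors are orthogonal, so neither representation encodes similarity between words.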


Application Information

IPC(8): G06F40/289, G06F40/30, G06F40/126, G06F40/216, G06N3/04
CPC: G06F40/289, G06F40/30, G06F40/126, G06F40/216, G06N3/047, G06N3/045
Inventor: 陈思萌, 何星, 赵建强, 陈诚, 邓叶勋, 郑伟斌, 刘晓芳, 张辉极, 杜新胜
Owner: XIAMEN MEIYA PICO INFORMATION