
A method and apparatus for converting text into word embeddings and for text classification

Word segmentation and text processing technology, applied in the field of text processing. It addresses the difficulty of converting Chinese text into word embeddings and achieves better classification results with improved accuracy.

Active Publication Date: 2019-02-01
ADVANCED NEW TECH CO LTD
Cites: 5; Cited by: 8

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present application provide a method and device for converting text into word embeddings and for text classification. They solve the problem that, when Chinese text is converted into word embeddings, it is difficult to obtain the word embeddings that are actually required.



Detailed Description of Embodiments

[0061] In natural language processing, Chinese text is converted into word embeddings as follows: first, the Chinese text is segmented to obtain multiple word segments; second, the word segments are converted into word embeddings; finally, the resulting word embeddings are used as the word embeddings corresponding to the Chinese text.

[0062] In the prior art, to make it easier to convert Chinese text into word embeddings, an open-source word embedding library can be built that stores the word embeddings corresponding to different word segments. When Chinese text is converted into word embeddings, the text is segmented into multiple word segments, the word embeddings corresponding to those segments are looked up in the word embedding library, and the embeddings found are used as the word embeddings of the Chinese text.
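The prior-art flow in [0061] and [0062] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: forward maximum matching stands in for the unspecified segmenter, the vocabulary and the `SHARED_LIBRARY` table with its two-dimensional vectors are made-up toy values, and skipping segments that have no library entry is likewise an assumption.

```python
from typing import Dict, List, Set

Vector = List[float]

# Hypothetical toy vocabulary and a single, scenario-independent embedding library.
VOCAB: Set[str] = {"研究", "研究生", "生命", "起源"}
SHARED_LIBRARY: Dict[str, Vector] = {
    "研究": [0.1, 0.3],
    "研究生": [0.7, 0.2],
    "生命": [0.4, 0.9],
    "起源": [0.5, 0.5],
}


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Segment text by forward maximum matching (an assumed stand-in segmenter)."""
    segments: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is always accepted.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                segments.append(piece)
                i += size
                break
    return segments


def text_to_word_embeddings(text: str) -> List[Vector]:
    # Step 1: segment the Chinese text into word segments.
    segments = forward_max_match(text, VOCAB)
    # Step 2: look each segment up in the shared library (assumption: misses are skipped).
    # Step 3: the embeddings found are used as the word embeddings of the text.
    return [SHARED_LIBRARY[s] for s in segments if s in SHARED_LIBRARY]
```

With this toy data, `forward_max_match("研究生命起源", VOCAB)` yields `["研究生", "命", "起源"]`, so only two embeddings come back even though the library also holds entries for 研究 and 生命. This is one way a single library plus a single segmentation can miss the embeddings a given use actually needs.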

[0063] However, in...


Abstract

A method and apparatus for converting text into word embeddings and for text classification. The method includes: obtaining text to be processed; according to the application scenario corresponding to the text to be processed, segmenting the text with the word segmentation method corresponding to that application scenario to obtain a plurality of word segments; searching a predetermined word embedding library for the word embeddings corresponding to both the application scenario and the plurality of word segments, where the word embedding library stores word embeddings corresponding to different word segments under different application scenarios; and using the word embeddings found as the word embeddings corresponding to the text to be processed.
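The control flow described in the abstract can be sketched as below. Everything concrete is a hypothetical assumption: the scenario names `scenario_a` and `scenario_b`, the choice of forward versus backward maximum matching as the two per-scenario segmentation methods, and the toy library keyed by `(scenario, word)`. The sketch only shows that the scenario selects both the segmenter and the part of the library that is searched.

```python
from typing import Callable, Dict, List, Set, Tuple

Vector = List[float]
Segmenter = Callable[[str], List[str]]

# Hypothetical toy vocabulary shared by both segmenters.
VOCAB: Set[str] = {"研究", "研究生", "生命", "起源"}


def forward_max_match(text: str, max_len: int = 4) -> List[str]:
    """Greedy longest match, scanning left to right."""
    out: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCAB:
                out.append(piece)
                i += size
                break
    return out


def backward_max_match(text: str, max_len: int = 4) -> List[str]:
    """Greedy longest match, scanning right to left."""
    out: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in VOCAB:
                out.insert(0, piece)
                j -= size
                break
    return out


# Hypothetical mapping from application scenario to segmentation method.
SEGMENTERS: Dict[str, Segmenter] = {
    "scenario_a": forward_max_match,
    "scenario_b": backward_max_match,
}

# Hypothetical library storing embeddings per (scenario, word segment).
LIBRARY: Dict[Tuple[str, str], Vector] = {
    ("scenario_a", "研究生"): [0.7, 0.2],
    ("scenario_a", "起源"): [0.5, 0.5],
    ("scenario_b", "研究"): [0.1, 0.3],
    ("scenario_b", "生命"): [0.4, 0.9],
    ("scenario_b", "起源"): [0.6, 0.4],
}


def convert(text: str, scenario: str) -> List[Vector]:
    # Segment with the method that corresponds to the scenario.
    segments = SEGMENTERS[scenario](text)
    # Search for embeddings matching both the scenario and each segment.
    return [LIBRARY[(scenario, s)] for s in segments if (scenario, s) in LIBRARY]
```

Here `convert("研究生命起源", "scenario_b")` segments the text as 研究 / 生命 / 起源 and returns three embeddings, while `scenario_a` segments it as 研究生 / 命 / 起源 and returns two. Keying the library on `(scenario, word)` also lets one word carry different vectors in different scenarios, as 起源 does in this toy table.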

Description

Technical field

[0001] This application relates to the field of text processing technology, and in particular to a method and device for converting text into word embeddings, and to a text classification method and device.

Background

[0002] Word embedding, also called word vector, is a collective name for a set of language modeling and representation learning techniques in natural language processing (NLP). When text is processed, it is usually first converted into word embeddings, and the text is then processed on the basis of those embeddings.

[0003] For Chinese text, the existing approach is usually to segment the text and then convert it into word embeddings on the basis of the segmentation. In practice, however, Chinese text can be segmented in many ways, and different segmentation methods produce different segmentation results. Different segmentation results can correspond to differen...
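As an illustration of processing text "on the basis of word embeddings" as [0002] puts it, the sketch below turns a text's word embeddings into a class label. The source does not describe its classifier here; mean pooling followed by a nearest-centroid match under cosine similarity is a swapped-in, commonly used technique, and the label names and centroid values in the usage are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_pool(embeddings: List[Vector]) -> Vector:
    """Average word embeddings component-wise into a single text vector."""
    if not embeddings:
        raise ValueError("need at least one word embedding")
    dim = len(embeddings[0])
    return [sum(e[k] for e in embeddings) / len(embeddings) for k in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(embeddings: List[Vector], centroids: Dict[str, Vector]) -> str:
    """Return the label whose centroid is most similar to the pooled text vector."""
    text_vec = mean_pool(embeddings)
    return max(centroids, key=lambda label: cosine(text_vec, centroids[label]))


# Hypothetical usage with two-dimensional toy vectors.
centroids = {"label_x": [1.0, 0.0], "label_y": [0.0, 1.0]}
label = classify([[0.1, 0.3], [0.4, 0.9], [0.6, 0.4]], centroids)
```

Because the pooled vector depends on which embeddings were retrieved, a different segmentation of the same text can change the pooled vector and therefore the predicted label.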


Application Information

IPC (8): G06F16/35; G06F17/27
CPC: G06F40/289
Inventors: Yuan Jincheng (袁锦程), Wang Weiqiang (王维强), Xu Liaosa (许辽萨), Zhao Wenbiao (赵闻飙), Yi Can (易灿), Ye Yun (叶芸)
Owner: ADVANCED NEW TECH CO LTD