Text information extraction method and device, server and storage medium

A text information extraction technology, applied in instruments, special data processing applications, electrical digital data processing, etc. It addresses problems such as low processing efficiency and incomplete or inaccurate acquisition of keywords or abstracts, and achieves the effect of improving processing efficiency.

Pending Publication Date: 2019-03-01
RUN TECH CO LTD BEIJING
6 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0007] The embodiments of the present invention provide a text information extraction method, device, server, and storage medium. They address the incomplete and inaccurate acquisition of keywords or abstracts, and the low processing efficiency, that arise when the TextRank algorithm is used for information extraction.

Examples

Embodiment 1

[0031] Figure 1 is a flowchart of a text information extraction method provided in Embodiment 1 of the present invention. The technical solution of this embodiment is applicable to extracting key information, such as keywords, from text. The method can be executed by a text information extraction device, which can be implemented in software and/or hardware and integrated into a server. The method specifically includes the following operations:

[0032] S110. Determine word vectors of candidate words in the text through the Word2Vec model, and determine similarity values between different word vectors.
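The similarity calculation in S110 can be illustrated with a minimal sketch. The text shown here does not state which similarity measure is used, so cosine similarity is an assumption, and the three-dimensional vectors are hypothetical stand-ins for real Word2Vec output. The function names are illustrative only.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)


def pairwise_similarities(word_vectors):
    """Map each unordered pair of candidate words to its similarity value."""
    return {
        (w1, w2): cosine_similarity(word_vectors[w1], word_vectors[w2])
        for w1, w2 in combinations(sorted(word_vectors), 2)
    }


# Hypothetical toy vectors; a trained Word2Vec model would supply these.
toy_vectors = {
    "network": [1.0, 0.0, 0.0],
    "internet": [1.0, 1.0, 0.0],
    "weather": [0.0, 0.0, 1.0],
}
similarities = pairwise_similarities(toy_vectors)
```

Sorting the words before pairing gives each pair a single canonical key, so a later graph-building step does not see the same pair twice.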

[0033] Specifically, the text to be processed is collected by a web crawler; it may be news text from different fields. The text is then cleaned: non-text information such as punctuation marks is removed to obtain plain text, and the plain text is split into complete sentences. Use the word segmen...
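The cleaning and sentence-splitting steps described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the order of operations (split on sentence-ending punctuation first, then strip the remaining punctuation) and the regular expressions are assumptions, and the function name `clean_and_split` is hypothetical.

```python
import re

# Sentence-ending punctuation, covering both Latin and CJK marks (assumed set).
SENTENCE_END = re.compile(r"[.!?。！？]+")
# Anything that is neither a word character nor whitespace counts as non-text.
NON_TEXT = re.compile(r"[^\w\s]")


def clean_and_split(raw_text):
    """Split raw text into sentences, then strip punctuation from each one."""
    sentences = []
    for chunk in SENTENCE_END.split(raw_text):
        plain = NON_TEXT.sub(" ", chunk)   # drop remaining punctuation
        plain = " ".join(plain.split())    # collapse runs of whitespace
        if plain:
            sentences.append(plain)
    return sentences


sentences = clean_and_split("Markets fell today. Analysts, however, were calm!")
```

A pattern this simple also splits on decimal points and abbreviations, so real news text would likely need a more careful sentence splitter.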

Embodiment 2

[0053] Figure 2 is a flowchart of a text information extraction method provided by Embodiment 2 of the present invention. This embodiment is further optimized on the basis of the foregoing embodiments; details not described here can be found in Embodiment 1. As shown in Figure 2, the text information extraction method provided in Embodiment 2 of the present invention specifically includes the following steps:

[0054] S210. Determine word vectors of candidate words in the text through the Word2Vec model, and determine similarity values between different word vectors.

[0055] S220. Use the word vectors as nodes, and construct edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word atlas.
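A minimal sketch of the graph construction in S220 follows. The text shown here does not say exactly how similarity values decide which edges exist, so the cutoff `threshold` is an assumption, as are the function name and the adjacency-map representation. Each node is keyed by its word rather than by the vector itself, purely for readability.

```python
def build_candidate_graph(similarities, threshold=0.5):
    """Return an adjacency map {word: {neighbor: weight}} for an undirected graph.

    `threshold` is a hypothetical cutoff: pairs below it get no edge.
    """
    graph = {}
    for (w1, w2), value in similarities.items():
        # Register both words as nodes even if they end up with no edges.
        graph.setdefault(w1, {})
        graph.setdefault(w2, {})
        if value >= threshold:
            graph[w1][w2] = value
            graph[w2][w1] = value
    return graph


# Hypothetical similarity values for three candidate words.
toy_similarities = {
    ("internet", "network"): 0.9,
    ("internet", "weather"): 0.1,
    ("network", "weather"): 0.2,
}
graph = build_candidate_graph(toy_similarities)
```

Keeping the similarity value as the edge weight lets a weighted ranking step use it directly.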

[0056] S230. Determine the candidate word weights according to the candidate word atlas through the TextRank algorithm.
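The weighting step in S230 can be sketched with the standard weighted TextRank update, in which each node's score is (1 - d) plus d times the sum of its neighbors' scores, each scaled by the share of that neighbor's total edge weight pointing at the node. The damping factor 0.85, the iteration limit, and the tolerance are conventional defaults assumed here, not values taken from the patent. The toy graph is hypothetical.

```python
def textrank(graph, damping=0.85, max_iter=200, tol=1e-6):
    """Iterate the weighted TextRank update until the scores stop changing.

    `graph` is an adjacency map {node: {neighbor: weight}} for an
    undirected graph, so every edge appears in both directions.
    """
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    # Total edge weight at each node, used to normalize its contribution.
    out_weight = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(max_iter):
        new_scores = {}
        for node, nbrs in graph.items():
            rank_sum = sum(
                weight / out_weight[nbr] * scores[nbr]
                for nbr, weight in nbrs.items()
                if out_weight[nbr] > 0
            )
            new_scores[node] = (1 - damping) + damping * rank_sum
        converged = max(abs(new_scores[n] - scores[n]) for n in graph) < tol
        scores = new_scores
        if converged:
            break
    return scores


# Hypothetical graph: "a" is linked to "b" and "c"; "d" is isolated.
toy_graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0},
    "c": {"a": 1.0},
    "d": {},
}
scores = textrank(toy_graph)
```

In this toy graph the hub node "a" ends up with the highest score, and the isolated node "d" keeps only the baseline (1 - d) term.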

[0057] S240. Determine the keywords of the text according to the candi...
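How keywords are chosen from the weights is cut off in the text above, so the following is only one plausible reading: keep the k highest-weighted candidate words. The parameter `k`, the function name, and the toy weights are all hypothetical.

```python
import heapq


def top_keywords(weights, k=3):
    """Return the k highest-weighted candidate words, heaviest first."""
    return heapq.nlargest(k, weights, key=weights.get)


# Hypothetical candidate word weights, e.g. as produced by a ranking step.
toy_weights = {"network": 1.46, "internet": 0.77, "article": 0.60, "weather": 0.15}
keywords = top_keywords(toy_weights, k=2)
```

A weight cutoff instead of a fixed count would be an equally reasonable selection rule under the same assumption.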

Embodiment 3

[0071] Figure 3 is a schematic structural diagram of a text information extraction device provided by Embodiment 3 of the present invention. As shown in Figure 3, the device includes:

[0072] The first determination module 310, configured to determine the word vectors of candidate words in the text through the Word2Vec model, and to determine the similarity values between different word vectors;

[0073] The first building module 320, configured to use the word vectors as nodes and construct edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word atlas;

[0074] The first weight determination module 330, configured to determine the candidate word weights according to the candidate word atlas through the TextRank algorithm;

[0075] The keyword determination module 340, configured to determine the keywords of the text according to the candidate word weights.
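The four modules above form a linear pipeline, which can be sketched as a small class that wires four callables together. The class name, field names, and type aliases are hypothetical, and the lambdas in the usage example are placeholder stubs rather than real implementations of each module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Similarities = Dict[Tuple[str, str], float]
Graph = Dict[str, Dict[str, float]]
Weights = Dict[str, float]


@dataclass
class TextExtractionDevice:
    """Chains the four modules; each field is a pluggable callable."""

    determine_similarities: Callable[[str], Similarities]  # module 310
    build_graph: Callable[[Similarities], Graph]           # module 320
    determine_weights: Callable[[Graph], Weights]          # module 330
    determine_keywords: Callable[[Weights], List[str]]     # module 340

    def extract(self, text: str) -> List[str]:
        similarities = self.determine_similarities(text)
        graph = self.build_graph(similarities)
        weights = self.determine_weights(graph)
        return self.determine_keywords(weights)


# Placeholder stubs standing in for the real modules.
device = TextExtractionDevice(
    determine_similarities=lambda text: {("a", "b"): 1.0},
    build_graph=lambda sims: {"a": {"b": 1.0}, "b": {"a": 1.0}},
    determine_weights=lambda graph: {w: float(len(n)) for w, n in graph.items()},
    determine_keywords=lambda weights: sorted(weights),
)
keywords = device.extract("placeholder text")
```

Passing the modules in as callables keeps each stage replaceable, which mirrors the module-per-step structure of the device.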

[0076] Optionally, the first building module 320 is specifically configured to:

[...

Abstract

The embodiments of the invention provide a text information extraction method and device, a server, and a storage medium. The method comprises: determining word vectors of candidate words in a text through a Word2Vec model, and determining similarity values between different word vectors; taking the word vectors as nodes and constructing edges between the nodes according to the similarity values between the word vectors, to obtain a candidate word atlas (a graph of candidate words); determining candidate word weights according to the candidate word atlas through the TextRank algorithm; and determining keywords of the text according to the candidate word weights. Converting candidate words into word vectors with the Word2Vec model allows the candidate words to be represented by low-dimensional vectors, which improves processing efficiency. The similarity calculation and atlas construction make the association relationships between candidate words explicit, and the weights computed by the TextRank algorithm allow the keywords of the text to be determined more accurately and comprehensively.

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of text extraction, and in particular to a text information extraction method, device, server, and storage medium.

Background

[0002] With the rapid development of the Internet, network functions are becoming more comprehensive, and the volume of online articles is growing rapidly. However, many online articles are long, and readers usually need a lot of time to read an entire article to obtain the key news information. Editors or network monitors who need to extract article information must likewise spend a lot of time reading long articles, which greatly reduces work efficiency. Therefore, the automatic extraction of text keywords and text abstracts greatly shortens the time for people to obtain key information from long Internet articles, and also...

Application Information

IPC(8): G06F17/27
CPC: G06F40/284
Inventor: 谢永恒, 段小文, 万月亮
Owner: RUN TECH CO LTD BEIJING