
Method and system for extracting process information in steel material patent text based on improved TextRank algorithm

A process-information text extraction technology, applied in the field of steel material knowledge graphs, that addresses problems such as the large number of professional terms and the neglect of text structure, sentence position, and semantic information.

Status: Inactive
Publication Date: 2021-09-03
SHANGHAI UNIV

AI Technical Summary

Problems solved by technology

However, the existing graph-model approach considers only the similarity between sentence nodes in the text. When constructing the edges between nodes in the graph model, it directly compares the number of words two sentences have in common to judge how closely they are related, while ignoring the discourse structure of the text and the position and semantic information of sentences within it.
[0006] At the same time, steel material patent text differs from text in other fields: the process description is relatively concentrated within the text and contains many professional terms, so existing text information extraction methods cannot be applied directly to process extraction.
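The common-word comparison criticized above can be illustrated with a minimal sketch. It assumes the widely used classic TextRank sentence similarity (shared words divided by the sum of the log sentence lengths). The function name `overlap_similarity` and the sample sentences are hypothetical and are not taken from the patent.

```python
import math


def overlap_similarity(s1, s2):
    """Classic TextRank-style similarity: shared words over log sentence lengths."""
    if not s1 or not s2:
        return 0.0
    common = len(set(s1) & set(s2))
    denom = math.log(len(s1)) + math.log(len(s2))
    if common == 0 or denom == 0:
        return 0.0
    return common / denom


# Hypothetical tokenized sentences, for illustration only.
a = ["steel", "is", "quenched", "at", "high", "temperature"]
b = ["the", "alloy", "undergoes", "hardening", "by", "rapid", "cooling"]
c = ["steel", "is", "tempered", "at", "low", "temperature"]

# a and b describe related processes but share no word, so they get no edge weight.
print(overlap_similarity(a, b))
# a and c share surface words and therefore score higher.
print(overlap_similarity(a, c))
```

Sentences `a` and `b` are semantically related yet receive a similarity of zero, which is the kind of blind spot that position and semantic information are meant to address.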



Examples


Embodiment 1

[0069] In this embodiment, referring to Figure 1, a method for extracting process information from steel material patent texts based on the improved TextRank algorithm includes the following steps:

[0070] Step A: Preprocess the text of steel material process patent documents, mainly by word segmentation, stop-word removal, and part-of-speech tagging, to obtain the initially screened subject word set $w = \{w_1, w_2, \ldots, w_n\}$;

[0071] Step B: Calculate the TF*IDF value of each word in the subject word set. First calculate the term frequency (TF) value by counting the number of times each word in the set w appears in the text; then calculate the inverse document frequency (IDF) value. Here TF is the frequency value of each word in the subject word set, and IDF is the inverse document frequency value, obtained by dividing the total number of texts by the number of texts containing the word and then taking the logarithm to the base 10 of t...
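The TF*IDF calculation in Step B can be sketched as follows. This is a minimal illustration, assuming raw counts for TF and a base-10 logarithm of (total texts / texts containing the word) for IDF, as the paragraph describes. Any further normalization in the truncated text is not reproduced. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs, target_index):
    """Return {word: TF*IDF} for docs[target_index]; docs is a list of token lists."""
    n_docs = len(docs)
    # Document frequency: number of texts that contain each word at least once.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    # Term frequency: raw count of each word in the target text (an assumption).
    tf = Counter(docs[target_index])
    return {
        word: count * math.log10(n_docs / doc_freq[word])
        for word, count in tf.items()
    }


# Hypothetical toy corpus of already-filtered subject words.
docs = [
    ["quenching", "quenching", "steel"],
    ["steel", "rolling"],
    ["steel", "annealing"],
]
scores = tf_idf(docs, 0)
print(scores)
```

A word that occurs in every text, such as "steel" here, gets an IDF of log10(1) = 0 and therefore drops out, while a word concentrated in one text is weighted up.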

Embodiment 2

[0080] This embodiment is basically the same as Embodiment 1, and differs in particular in the following:

[0081] In this embodiment, see Figure 1.

[0082] In an optional embodiment of the present invention, after step A above obtains the input text, the preprocessing is carried out as follows:

[0083] Step A1: Word segmentation uses jieba, a Chinese word segmentation tool that performs well, to segment the characters contained in the text;

[0084] Step A2: Compile a stop-word list according to the characteristics of process text in the field of steel materials, and use this list to remove uninformative words from the process text. These words are mainly prepositions, particles, and conjunctions;

[0085] Step A3: Use the jieba toolkit to perform part-of-speech tagging on the process text, remove all non-nouns from the text, and obtain the process text subject word set $w = \{w_1, w_2, \ldots, w_n\}$;
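Steps A2 and A3 amount to a filter over tagged tokens, which can be sketched as below. To stay self-contained, the sketch takes (word, tag) pairs as input rather than calling jieba. In practice such pairs would come from a tagger such as jieba's part-of-speech module. It assumes ICTCLAS-style tags in which noun tags begin with "n". The function name `filter_subject_words`, the deduplication step, the hand-written tags, and the stop-word list are all hypothetical.

```python
def filter_subject_words(tagged_tokens, stop_words):
    """Keep nouns that are not stop words, preserving first-occurrence order."""
    seen = set()
    subject_words = []
    for word, pos in tagged_tokens:
        if word in stop_words:
            continue  # Step A2: drop prepositions, particles, conjunctions, etc.
        if not pos.startswith("n"):
            continue  # Step A3: drop non-nouns (assumes noun tags start with "n")
        if word not in seen:  # deduplicate, since w is described as a set (assumption)
            seen.add(word)
            subject_words.append(word)
    return subject_words


# Hand-written sample tags for illustration; not real tagger output.
tagged = [
    ("钢", "n"),      # steel
    ("的", "uj"),     # particle
    ("淬火", "v"),    # quench (tagged as a verb here)
    ("温度", "n"),    # temperature
    ("在", "p"),      # preposition
    ("钢", "n"),      # repeated noun
]
stop_words = {"的", "在"}

subject_words = filter_subject_words(tagged, stop_words)
print(subject_words)
```

Keeping only nouns is what the paragraph specifies. A real pipeline would also need a domain dictionary so that the segmenter does not split multi-character steel terms, which this sketch does not cover.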

[0086] Step B is specifically as follows: first calculate the word...


Abstract

The invention discloses a method and a system, in the field of steel materials, for extracting process information from steel material patents based on an improved TextRank algorithm. The method comprises the following steps: preprocessing the input process-related text to obtain a word set; calculating the TF*IDF value of each word in the set; converting the words in the set into vector representations with the word2vec tool; adding word position information and merging semantically similar words to obtain the final text keyword set; constructing a matrix representation for each sentence in the text; and creating the graph model of the improved TextRank algorithm and iterating until convergence. The improvement is that the position of each sentence in the text and the subject word information obtained in the fourth step are fused into the vertex calculation of the graph model, and the edge weights in the graph model are obtained by calculating the cosine similarity of the sentence matrices. The top-K sentences by final model score are then ordered according to the sorting principle and stripped of redundant information, so that the retained process information is coherent. The method is simple, easy to operate, and effective.
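The graph-model step can be sketched roughly as follows. This is not the patent's formula, which does not appear in this excerpt. It is a standard weighted TextRank iteration with cosine-similarity edge weights, plus a hypothetical per-sentence `bias` standing in for the fused position and subject word information. Each sentence is assumed to be reduced to a single flat vector, negative similarities are clipped to zero, and the damping factor 0.85 and the restoration of original sentence order are further assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_sentences(vectors, bias=None, damping=0.85, tol=1e-6, max_iter=200):
    """Iterate a weighted TextRank until the largest score change falls below tol."""
    n = len(vectors)
    # Edge weights: clipped cosine similarity, with no self-loops.
    w = [
        [max(0.0, cosine(vectors[i], vectors[j])) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in w]
    # Hypothetical prior per sentence (e.g. position or keyword weight); uniform by default.
    if bias is None:
        bias = [1.0] * n
    total = sum(bias)
    prior = [b / total for b in bias]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            incoming = sum(
                w[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0
            )
            new_scores.append((1 - damping) * prior[i] + damping * incoming)
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def top_k_in_order(scores, k):
    """Pick the k highest-scoring sentences, then restore original text order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Hypothetical sentence vectors for illustration only.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
scores = rank_sentences(vectors)
print(scores, top_k_in_order(scores, 2))
```

Passing a non-uniform `bias`, for example larger values for sentences near the start of a process section, shifts the scores toward those sentences. How the patent actually combines position and subject word information into the vertex score is not stated in this excerpt.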

Description

Technical field

[0001] The invention belongs to the field of steel material knowledge graphs, and in particular relates to a method for extracting process information from steel material patent texts based on an improved TextRank algorithm.

Background art

[0002] The continuous improvement of steel material processing technology has made process knowledge in the field of steel materials varied in form and complex in content. Extracting steel material processing information from patent texts is a prerequisite for ultimately integrating process knowledge in this domain.

[0003] However, existing text information extraction methods have the following shortcomings:

[0004] Text information extraction using neural network algorithms requires a large corpus and suffers from problems such as long training time and slow extraction of process information, which makes it unsuitable for practical application.

[0005] Text information extraction using...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F40/284, G06F40/30
CPC: G06F16/3346, G06F40/284, G06F40/30
Inventors: 魏晓, 钱权, 赵睿, 丁聪, 陈永琪
Owner: SHANGHAI UNIV