Key information identification method based on hierarchical attention and label guide learning

A key information identification technology, applied to character and pattern recognition, instruments, unstructured text data retrieval, and related fields, with broad application prospects

Pending Publication Date: 2022-03-04
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome the deficiencies of existing methods for identifying key information in documents and to solve the technical problem of identifying such information efficiently. To this end, it proposes a key information identification method based on hierarchical attention and label-guided learning.

Examples

Embodiment

[0068] Taking the 2230 papers on the subject of "covid-19" on the biomedical literature website PubMed as an example, the key information identification method based on hierarchical attention and label-guided learning, as shown in Figure 1, includes the following steps:

[0069] Step 1: Literature data collection.

[0070] Use a crawler built on the Selenium web automation toolkit to collect the papers published on the PubMed platform, and save them locally in PDF format;
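The collection step can be illustrated with a minimal sketch. It is not the crawler used in the embodiment: the helper names `search_url` and `collect_article_links`, the `term`/`page` query parameters, and the `a.docsum-title` selector are assumptions made for illustration, and the PDF download itself is left out. The function accepts any object that exposes Selenium's `get` and `find_elements` methods.

```python
from urllib.parse import urlencode

PUBMED = "https://pubmed.ncbi.nlm.nih.gov/"
CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def search_url(term, page=1):
    """Build a search-result URL (query parameter names are an assumption)."""
    return PUBMED + "?" + urlencode({"term": term, "page": page})


def collect_article_links(driver, term, pages=1, selector="a.docsum-title"):
    """Visit `pages` result pages and return the unique article links found.

    `driver` is expected to behave like a Selenium WebDriver.
    `selector` is a hypothetical CSS selector for result titles.
    """
    links = []
    for page in range(1, pages + 1):
        driver.get(search_url(term, page))
        for element in driver.find_elements(CSS_SELECTOR, selector):
            href = element.get_attribute("href")
            if href and href not in links:  # keep first occurrence only
                links.append(href)
    return links
```

With Selenium installed, a real browser session could be passed in, for example `collect_article_links(webdriver.Chrome(), "covid-19", pages=3)`. Keeping the driver as a parameter also makes the logic easy to exercise with a stub object.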

[0071] Step 2: Document deconstruction and storage.

[0072] This step includes the following sub-steps:

[0073] Step 2.1: First, use the fitz toolkit to read English documents page by page, and segment the content of the document at the paragraph level according to the distance between paragraphs to obtain text blocks.

[0074] Then, merge blocks that were split abnormally by page breaks or by inserted tables / pictures, remove irrelevant information such as headers and footers, record their coordinates ...
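The gap-based segmentation and header/footer removal described in Step 2.1 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented procedure: the `drop_margins` and `group_paragraphs` helpers, the 50-point margin, and the 6-point gap threshold are hypothetical. Lines are represented as `(x0, y0, x1, y1, text)` tuples, the same leading fields that PyMuPDF's `page.get_text("blocks")` returns, so the code runs without a PDF at hand.

```python
from typing import List, Tuple

Line = Tuple[float, float, float, float, str]  # (x0, y0, x1, y1, text)


def drop_margins(lines: List[Line], page_height: float,
                 margin: float = 50.0) -> List[Line]:
    """Discard lines that sit in the top or bottom margin (headers, footers)."""
    return [ln for ln in lines
            if ln[1] >= margin and ln[3] <= page_height - margin]


def group_paragraphs(lines: List[Line], max_gap: float = 6.0) -> List[str]:
    """Start a new paragraph whenever the vertical gap exceeds `max_gap`."""
    paragraphs: List[str] = []
    current: List[str] = []
    prev_bottom = None
    for x0, y0, x1, y1, text in sorted(lines, key=lambda ln: ln[1]):
        if prev_bottom is not None and y0 - prev_bottom > max_gap:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text.strip())
        prev_bottom = y1
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


page = [
    (72, 20, 500, 30, "Journal header"),
    (72, 100, 500, 112, "First line"),
    (72, 114, 500, 126, "second line."),
    (72, 140, 500, 152, "New paragraph."),
    (72, 780, 500, 790, "Page 3"),
]
body = drop_margins(page, page_height=800)
print(group_paragraphs(body))
# ['First line second line.', 'New paragraph.']
```

A fixed threshold is the simplest choice. In practice the gap would likely need tuning per layout, for example relative to the line height.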

Abstract

The invention relates to a key information identification method based on hierarchical attention and label-guided learning, and belongs to the technical field of text mining and information processing. The method adopts a key information recognition framework that fuses hierarchical attention with label-guided learning, overcoming the limitation of directly applying text representation models to text mining. A word encoding layer and a sentence encoding layer capture the organizational structure of the text: important words are aggregated into sentence vectors, and important sentence vectors are then aggregated into a text vector. A word attention layer and a sentence attention layer apply attention mechanisms at the word level and the sentence level respectively, so that content of greater or lesser importance receives differentiated attention during text representation. A label-guided learning layer performs label-based attention encoding and maps the text representation to the label space; this layer can be learned jointly with the context encoding. The method has broad application prospects in fields such as citation analysis, information retrieval, and fine-grained knowledge services.
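The two-level aggregation in the abstract (words into sentence vectors, sentences into a text vector, then a mapping to the label space) can be sketched in plain Python. This is a generic hierarchical-attention sketch under assumptions, not the model claimed in the patent: the encoders are omitted, the scoring function `tanh(v) . context`, the context vectors, and the dot-product label scoring in `label_scores` are all illustrative choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, context):
    """Weighted sum of `vectors`; weights are softmax(tanh(v) . context)."""
    scores = [dot([math.tanh(x) for x in v], context) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors))
              for d in range(dim)]
    return pooled, weights


def encode_document(doc, word_ctx, sent_ctx):
    """doc: list of sentences, each a list of word vectors."""
    sent_vecs = [attention_pool(words, word_ctx)[0] for words in doc]
    return attention_pool(sent_vecs, sent_ctx)  # (doc_vec, sentence weights)


def label_scores(doc_vec, label_embs):
    """Map the text vector to the label space (assumed dot-product scoring)."""
    return softmax([dot(doc_vec, e) for e in label_embs])


doc = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]]]  # two toy sentences
doc_vec, sent_weights = encode_document(doc, [1.0, 0.0], [1.0, 0.0])
probs = label_scores(doc_vec, [[1.0, 0.0], [0.0, 1.0]])
```

In a trained model the context vectors and label embeddings would be learned parameters; here they are fixed toy values so that the weighting behaviour is easy to inspect.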

Description

Technical field

[0001] The invention relates to an information identification method, in particular to a key information identification method based on hierarchical attention and label-guided learning, and belongs to the technical fields of text mining and information processing.

Background art

[0002] Literature analysis plays an important role in promoting scientific and technological innovation and helps researchers to fully understand the development of science and technology. For example, the amount of biomedical literature has continued to grow rapidly in recent years, with an average of more than 3,000 new articles published in peer-reviewed journals every day, excluding preprints and technical reports (such as clinical trial reports) in various archives. As of January 2019, the biomedical literature database PubMed (https://pubmed.ncbi.nlm.nih.gov/) alone has 29 million articles, and reports containing new discoveries and insights are constantly being ad...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/194, G06F40/289, G06K9/62, G06N3/04, G06V30/41, G06V30/413, G06F16/35
CPC: G06F40/289, G06F40/194, G06N3/044, G06F18/214
Inventor: 牛振东, 何慧, 张春霞, 白思萌, 易坤
Owner: BEIJING INSTITUTE OF TECHNOLOGY