Automatic extraction method for key fields in network academic reports

A key-field extraction technique for network academic reports, applied in the field of text processing in information technology. It addresses the drawbacks of existing collection approaches, which are time-consuming, labor-intensive, impractical, and not necessarily universal across sites, and aims to provide a more generally applicable method.

Active Publication Date: 2017-05-24
黄山市开发投资集团有限公司

AI Technical Summary

Problems solved by technology

Collecting academic reports from the Internet by hand is simple, but it demands a great deal of manpower and effort. It is time-consuming and laborious, it cannot cover academic reports across the entire network, and it is therefore impractical.
A web crawler written for one specific academic report site can extract the key information of that site's reports effectively, but the same crawler is not necessarily applicable to other academic report sites.

Examples

[0071] Perform a null-value check on the content T_C of the HTML tag that matches the combined selector expression TE. If T_C is empty, go to step 14; otherwise, use T_C as the content of the report title, thereby obtaining the quasi-report instance R, and go to step 15;

[0072] Step 14. Set the content corresponding to the report title keyword in the preliminary report instance R' to the content of the web page title, thereby obtaining the quasi-report instance R;

[0073] Step 15. Perform a null-value check on the content corresponding to the report holding time keyword in the quasi-report instance R. If it is empty, the quasi-report instance R is not a network academic report and is discarded; if it is not empty, continue to determine whether the content corr...
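
The title fallback and the holding-time check described in the steps above can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the function name `finalize_report`, the dictionary keys `title` and `holding_time`, and the use of a plain `dict` for the report instance are all assumptions made for the example. The further checks that follow in step 15 are truncated in the source text and are deliberately left out.

```python
from typing import Optional


def finalize_report(preliminary: dict,
                    selector_content: Optional[str],
                    page_title: str) -> Optional[dict]:
    """Turn a preliminary report instance R' into a quasi-report instance R.

    `selector_content` stands for T_C, the text of the HTML tag matched by
    the combined selector expression TE (hypothetical parameter name).
    Returns None when the instance is judged not to be an academic report.
    """
    report = dict(preliminary)  # copy R' so the caller's data is untouched

    # Null-value check on T_C: use it as the title if it has content.
    if selector_content and selector_content.strip():
        report["title"] = selector_content.strip()
    else:
        # Step 14: fall back to the web page title.
        report["title"] = page_title.strip()

    # Step 15 (first part only): an empty holding time means the page is
    # not a network academic report, so the instance is discarded.
    holding_time = report.get("holding_time")
    if not holding_time or not holding_time.strip():
        return None

    return report
```

For example, calling `finalize_report({"holding_time": "2016-10-12 14:30"}, "", "Seminar: Graph Mining")` would fall back to the page title, while an instance without a holding time would be dropped.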

Abstract

The invention discloses an automatic extraction method for key fields in network academic reports. The method comprises the following steps: 1, academic report sites are collected to form an academic report site database, which serves as the crawling seed of a web crawler; 2, the web crawler crawls the reports on each report site; 3, content is extracted from each crawled academic report detail page, the extracted content comprising a report title, a report introduction, a reporter, a reporter introduction, a report holding time, a report holding place and a report holding organization, and the content is encapsulated and structured; 4, the structured report content is persisted; and 5, the steps are repeated until all collected sites have been crawled. By organizing network academic report site information and processing the HTML tags in the report content, the method can effectively extract the key information in network academic reports.
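
The five steps in the abstract can be pictured as a small pipeline. The sketch below is an illustration under stated assumptions, not the patented method: the `Report` class, the function names, the `reports` table layout, and the choice of SQLite for persistence are all hypothetical, and the crawler and the extractor are passed in as callables because the source does not specify them at this point.

```python
import sqlite3
from dataclasses import dataclass, fields
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Report:
    """Structured report with the seven key fields named in the abstract."""
    title: str = ""
    introduction: str = ""
    reporter: str = ""
    reporter_introduction: str = ""
    holding_time: str = ""
    holding_place: str = ""
    holding_organization: str = ""


FIELD_NAMES = [f.name for f in fields(Report)]


def init_store(conn: sqlite3.Connection) -> None:
    # One TEXT column per key field, keyed by the detail-page URL.
    cols = ", ".join(f"{name} TEXT" for name in FIELD_NAMES)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS reports (url TEXT PRIMARY KEY, {cols})"
    )


def persist(conn: sqlite3.Connection, url: str, report: Report) -> None:
    # Step 4: write the structured report to the store.
    placeholders = ", ".join("?" for _ in range(len(FIELD_NAMES) + 1))
    values = [url] + [getattr(report, name) for name in FIELD_NAMES]
    conn.execute(f"INSERT OR REPLACE INTO reports VALUES ({placeholders})", values)


def run_pipeline(
    seed_sites: Iterable[str],                                # step 1: site database
    crawl_site: Callable[[str], Iterable[Tuple[str, str]]],   # yields (url, html)
    extract: Callable[[str], Optional[Report]],               # html -> Report or None
    conn: sqlite3.Connection,
) -> int:
    """Crawl every seed site, extract reports, persist them; return the count."""
    init_store(conn)
    stored = 0
    for site in seed_sites:                  # step 5: repeat for all sites
        for url, html in crawl_site(site):   # step 2: crawl detail pages
            report = extract(html)           # step 3: extract and structure
            if report is not None:
                persist(conn, url, report)
                stored += 1
    conn.commit()
    return stored
```

Injecting `crawl_site` and `extract` keeps the sketch runnable without network access and leaves room for site-independent extraction logic to be swapped in.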

Description

Technical field

[0001] The invention belongs to the field of text processing in information technology, and mainly relates to a method for automatically extracting the key fields of online academic report announcements.

Background

[0002] With the rapid development of Internet technology, human society has entered the information age, and a large amount of academic report information is buried in the vast and complex Internet. An academic report addresses a defined subject topic: to better exchange professional knowledge, academic achievements and experience, and to jointly discuss, analyze and solve problems, researchers and learners take part in academic activities for discussion, demonstration and research. As an important part of academic exchange, academic reports play a major role in the dissemination and development of science and technology, and are also an important means of cultivating talent.

[0003] Universities and scientific research i...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/955, G06F16/9577
Inventor: 薛峰, 许剑东, 王健伟, 夏帅, 孙健, 陈思洋
Owner: 黄山市开发投资集团有限公司