Index keyword extraction method and system based on big geological data

A keyword extraction method and technology, applied to the indexing of geological big data, that addresses problems such as keywords lacking specificity to the article content, indiscriminately assigned tags, and missing keywords, and thereby improves retrieval efficiency.

Inactive Publication Date: 2015-10-28
CHENGDU UNIVERSITY OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

However, these two kinds of keywords are prone to several problems. First, some authors are unaware of the importance of keyword indexing and pick a few words from the article at random, which leads to missing keywords and indiscriminate labeling. Second, some keywords are arranged arbitrarily, with no distinction between primary and secondary terms and no hierarchy or logic. Third, some of the chosen keywords have no independent retrieval meaning and are not specific to the content of the article. Keywords supplied in this way make it difficult for a retrieval system to find the required information accurately in the database.

Examples

Embodiment Construction

[0028] It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0029] As described in the background, the problems to be solved by the present invention are that indexing keywords in literature retrieval are missing or indiscriminately assigned, that keywords are arranged at random without hierarchy or logic, and that the selected keywords lack specificity to the content of the article. In view of these problems, the technical solution of the present invention provides an indexing keyword extraction method based on geological big data that automatically completes document segmentation, keyword mining and refinement, keyword matching and association-based exclusion, and keyword ranking optimization, providing a quick and easy solution for the sorting of complicated and...
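The document segmentation step named above can be illustrated with a minimal sketch. It is not the patent's implementation: the `Sentence` record, the section names ("title", "body") used as position marks, and the punctuation-based splitting rule are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sentence:
    text: str     # the sentence itself
    section: str  # hypothetical position mark, e.g. "title" or "body"
    index: int    # order of the sentence inside its section


def split_sentences(text: str) -> List[str]:
    # Assumed rule: split after Western end punctuation followed by
    # whitespace, or directly after Chinese end punctuation.
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text)
    return [p.strip() for p in parts if p.strip()]


def segment_document(sections: Dict[str, str]) -> List[Sentence]:
    # Attach a position mark (section name and sentence index) to each
    # sentence so that later steps can weight it by where it occurs.
    marked = []
    for name, text in sections.items():
        for i, sentence in enumerate(split_sentences(text)):
            marked.append(Sentence(sentence, name, i))
    return marked
```

Keeping the position mark alongside each sentence is one way to let a later weighting step treat a title sentence differently from a body sentence. A real system would also need word segmentation and part-of-speech tagging, which this sketch omits.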

Abstract

The invention relates to an index keyword extraction method and system based on geological big data. The method comprises the following steps: importing a document and performing geological data format conversion, sentence division, part-of-speech tagging and position marking; adding weight coefficients to the segmented document; performing initial mining and extraction of keywords with a keyword mining algorithm; performing a weighted operation on the extracted keyword phrases to obtain a comprehensive weight value for each phrase; performing an initial screening according to the comprehensive weight values to reduce the number of candidate phrases; matching the phrases against a lexicon to find the terms that match the phrases, or relatively standardized terms, in the lexicon; calculating the degree of association between each term and the document, and screening again according to the degree of association; and finally ordering the terms according to industry characteristics, degree of correlation, features and the like to determine the order of the keywords. The method takes the document's background and relevance into account, discards vague keywords as far as possible, and provides keywords that comprehensively reflect the core content of the document, arranged in a logical order. This facilitates document retrieval and improves retrieval efficiency, making the method an efficient index keyword extraction method for geological big data.
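The weighting, screening, lexicon matching and ordering steps of the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the patented algorithm: the position weights, the thresholds `min_weight` and `min_assoc`, the single-word tokenizer, the definition of association degree as the share of sections that mention a term, and the final sort key are all hypothetical stand-ins, since the source does not specify them.

```python
import re
from collections import Counter, defaultdict

# Hypothetical weight coefficients per document position.
POSITION_WEIGHT = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def tokenize(text):
    # Stand-in for real word segmentation: lowercase alphabetic words.
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(sections, lexicon, min_weight=2.0, min_assoc=0.5):
    tokens = {name: tokenize(text) for name, text in sections.items()}

    # Comprehensive weight: sum of position weights over all occurrences.
    weights = Counter()
    for name, toks in tokens.items():
        for tok in toks:
            weights[tok] += POSITION_WEIGHT.get(name, 1.0)

    # Initial screening by comprehensive weight.
    candidates = [w for w, v in weights.items() if v >= min_weight]

    # Lexicon matching: map each candidate to its standard term, if any.
    term_weight = Counter()
    variants = defaultdict(set)
    for word in candidates:
        term = lexicon.get(word)
        if term is not None:
            term_weight[term] += weights[word]
            variants[term].add(word)

    # Association degree (assumed): share of sections mentioning the term.
    kept = []
    for term, weight in term_weight.items():
        hits = sum(1 for toks in tokens.values() if variants[term] & set(toks))
        assoc = hits / len(tokens)
        if assoc >= min_assoc:
            kept.append((term, weight, assoc))

    # Final ordering (assumed): association first, then weight.
    kept.sort(key=lambda item: (-item[2], -item[1]))
    return [term for term, _, _ in kept]


sections = {
    "title": "Copper ore survey near granite",
    "abstract": "Copper ore and cu anomalies in the survey area",
    "body": "The area hosts copper ore. Outcrops are rare.",
}
lexicon = {"copper": "copper", "cu": "copper", "ore": "ore deposit", "granite": "granite"}
print(extract_keywords(sections, lexicon))  # ['copper', 'ore deposit']
```

In this example "cu" is merged into the standard term "copper" by the lexicon, common words such as "the" are dropped because they have no lexicon entry, and "granite" is dropped at the second screening because it appears in only one of the three sections.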

Description

Technical field

[0001] The invention relates to the technical field of indexing geological big data, and in particular to a method and system for extracting indexing keywords based on geological big data.

Background art

[0002] Geological data in China mainly comprise geological archive data, geological literature, geological databases and geology-related data from the Internet. The total amount of geological data nationwide is 430,000, of which 128,000 are preserved in the National Geological Archives, covering 32 provinces, municipalities, autonomous regions, and sea areas, and involving China's surrounding areas (countries), the polar regions, the oceans, etc., including regional geological data and mineral exploration data accumulated since the National Geological Archives was established in 1952. The National Geological Archives currently holds 100,000 types of electronic data, about 4 million electronic files, and a total of about 62.59 million electronic data files. The electronic data st...

Application Information

IPC(8): G06F17/27, G06F17/30
Inventors: 梁元, 郭科, 唐菊兴
Owner: CHENGDU UNIVERSITY OF TECHNOLOGY