
Geological ontology-based geological report text information extraction method

A text information extraction method, applied in the fields of instruments, digital data processing and computing. It addresses the difficulty of managing unstructured data, which lowers the efficiency of answering queries and retrieving statistical information and makes data harder to retrieve and mine. It achieves the effect of saving manpower.

Active Publication Date: 2020-02-14
CHINA UNIV OF GEOSCIENCES (WUHAN)
Cites: 11; Cited by: 4

AI Technical Summary

Problems solved by technology

Structured data is usually stored and managed in relational or spatial databases, but the nature of unstructured data makes it difficult to manage through virtual applications.
There are many types of unstructured data, and the information they hold is scattered, yet it is often richer than structured data and has greater potential value. Managing these data with traditional file systems reduces the efficiency of answering queries and retrieving statistical information and makes the data harder to retrieve and mine.




Embodiment Construction

[0025] In order to make the purpose, technical solution and advantages of the present invention clearer, the embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

[0026] Referring to Figure 1, an embodiment of the present invention provides a method for extracting geological report text information based on geological ontology, comprising the following steps:

[0027] S1. Document preprocessing: convert the collected geological report documents into the data source format, then use natural language processing tools to perform sentence segmentation, word segmentation, stop-word removal and part-of-speech tagging on them, obtaining sequence text for information extraction.

[0028] The specific process of step S1 is as follows: the original geological report documents in various formats are converted into text documents (txt format), and the graphics in the do...
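Step S1 names its four operations but not the natural language processing tools behind them. As a minimal sketch, assuming a hypothetical toy lexicon in place of a real segmenter and tagger, the code below shows how sentence segmentation, word segmentation, stop-word removal and part-of-speech tagging chain together into per-sentence token sequences. Word segmentation here uses forward maximum matching, a standard dictionary-based technique for Chinese, chosen for illustration rather than taken from the patent; `LEXICON`, `STOP_WORDS` and the tag names are assumptions.

```python
import re

# Hypothetical toy lexicon: word -> part-of-speech tag (ns = place name, n = noun,
# v = verb, u = particle). Glosses: 武汉 = Wuhan, 地区 = area, 出露 = crop out,
# 花岗岩 = granite, 志留系 = Silurian System, 地层 = strata.
LEXICON = {
    "武汉": "ns",
    "地区": "n",
    "出露": "v",
    "花岗岩": "n",
    "志留系": "n",
    "地层": "n",
    "的": "u",
    "了": "u",
}
STOP_WORDS = {"的", "了"}  # hypothetical stop-word list
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def split_sentences(text):
    """Sentence segmentation on Chinese sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[。！？；\n]", text) if s.strip()]


def segment(sentence):
    """Forward maximum matching: take the longest lexicon word at each position,
    falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in LEXICON:
                words.append(candidate)
                i += size
                break
    return words


def preprocess(text):
    """Sentence split -> word segmentation -> stop-word removal -> POS tagging.
    Unknown words receive the placeholder tag "x"."""
    result = []
    for sentence in split_sentences(text):
        tokens = [w for w in segment(sentence) if w not in STOP_WORDS]
        result.append([(w, LEXICON.get(w, "x")) for w in tokens])
    return result
```

On the hypothetical input `武汉地区出露了花岗岩。志留系地层。` ("Granite crops out in the Wuhan area. Silurian strata."), the particle 了 is dropped as a stop word and each remaining token carries a tag, for example `("武汉", "ns")` for the place name.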


Abstract

The invention provides a geological report text information extraction method based on geological ontology, comprising the following steps: S1, preprocessing the geological report documents by converting their file type into the data source format and performing sentence segmentation, word segmentation, stop-word removal and part-of-speech tagging; S2, constructing a place-name dictionary library and a geological entity dictionary library from structured information, and extending the existing geological domain ontology to form a place-name ontology and a geological time ontology; and S3, extracting geological entity information, spatiotemporal relationship information and attribute information from the geological report text through a pattern matching method and a rule matching method. The method does not require a large training data set to be labeled manually, which saves manpower and material resources; it also abstracts and improves an existing information extraction model, offering a reference approach for information extraction in other fields.
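The abstract does not publish the dictionaries, ontology fragments or matching rules behind steps S2 and S3. The sketch below is a hypothetical illustration of how those steps could fit together: small dictionaries and a child-to-parent geological time ontology stand in for S2, and dictionary lookup plus regular-expression rules stand in for the pattern and rule matching of S3. Every term, rule and relation name (`located_in`, `formed_in`, `thickness_m`) is an assumption, not the inventors' implementation.

```python
import re

# Step S2 (hypothetical data): dictionaries built from structured sources.
# Glosses: 黄陂区 = Huangpi District, 武汉 = Wuhan, 花岗岩 = granite,
# 志留系 = Silurian System.
PLACE_NAMES = {"武汉", "黄陂区"}
GEO_ENTITIES = {"花岗岩": "rock", "志留系": "stratum"}

# Hypothetical geological time ontology fragment, stored as child -> parent:
# 志留纪 (Silurian) -> 古生代 (Paleozoic) -> 显生宙 (Phanerozoic).
TIME_ONTOLOGY = {"志留纪": "古生代", "古生代": "显生宙"}
TIME_TERMS = set(TIME_ONTOLOGY) | set(TIME_ONTOLOGY.values())


def alternation(terms):
    """Regex alternation that tries longer terms first to avoid partial matches."""
    return "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))


# Step S3 (hypothetical rules): "A 位于 B" (A is located in B),
# "X 形成于 T" (X formed in T), "厚度约 N 米" (thickness about N metres).
LOCATED_IN = re.compile(f"({alternation(PLACE_NAMES)})位于({alternation(PLACE_NAMES)})")
FORMED_IN = re.compile(f"({alternation(GEO_ENTITIES)})形成于({alternation(TIME_TERMS)})")
THICKNESS = re.compile(r"厚度约?(\d+(?:\.\d+)?)米")


def time_ancestors(term):
    """Walk up the time ontology, e.g. to answer a query posed at the era level."""
    chain = []
    while term in TIME_ONTOLOGY:
        term = TIME_ONTOLOGY[term]
        chain.append(term)
    return chain


def find_terms(sentence, vocab):
    """Dictionary (pattern) matching: vocabulary terms in order of first occurrence."""
    return sorted((t for t in vocab if t in sentence), key=sentence.index)


def extract(sentence):
    """Combine dictionary matches with rule matches for one sentence."""
    relations = [(a, "located_in", b) for a, b in LOCATED_IN.findall(sentence)]
    relations += [(e, "formed_in", t) for e, t in FORMED_IN.findall(sentence)]
    attributes = {}
    m = THICKNESS.search(sentence)
    if m:
        attributes["thickness_m"] = float(m.group(1))
    return {
        "entities": [(t, GEO_ENTITIES[t]) for t in find_terms(sentence, GEO_ENTITIES)],
        "places": find_terms(sentence, PLACE_NAMES),
        "relations": relations,
        "attributes": attributes,
    }
```

For the hypothetical sentence `黄陂区位于武汉，花岗岩形成于志留纪，厚度约120米。`, `extract` would report granite as a rock entity, a `located_in` relation from Huangpi District to Wuhan, a `formed_in` relation to the Silurian and a thickness attribute of 120 m; `time_ancestors("志留纪")` then lifts the Silurian to the Paleozoic and Phanerozoic. Keeping the dictionaries separate from the rules mirrors the S2/S3 split, so either can be extended without touching the other.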

Description

Technical field

[0001] The invention relates to the field of geological information retrieval, and in particular to a geological report text information extraction method based on geological ontology.

Background art

[0002] Over a long period, through a series of geological survey projects and the mechanisms for collecting and submitting geological data, the geological survey field has accumulated a large volume of geological survey data and has gradually formed a system of geological professional databases together with a geological "content library" composed of unstructured data. As a typical example of big data, geological big data consists mainly of two parts. One is the structured spatial data sets, with normalized definitions and well-defined structures, held in the professional databases already established in the geological field; this type of data usually has a predefined schema for storage and retrieval. The other is an unstructured text data set composed of texts...


Application Information

Patent Type & Authority: Applications (China)
IPC (8th edition): G06F40/279; G06F40/242; G06F40/151
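Inventors: 邱芹军 (Qiu Qinjun), 谢忠 (Xie Zhong), 吴亮 (Wu Liang), 陶留峰 (Tao Liufeng), 罗菁 (Luo Jing), 李孜轩 (Li Zixuan), 曹豪豪 (Cao Haohao)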
Owner CHINA UNIV OF GEOSCIENCES (WUHAN)