Text importance calculation method, device, equipment and storage medium

A text importance calculation technology, applied in computing, text database querying, unstructured text data retrieval and related fields. It addresses inaccurate target files and biased judgments of text importance, with the effects of avoiding judgment bias, acquiring the required files efficiently, and improving hit accuracy.

Active Publication Date: 2019-04-23
RUN TECH CO LTD BEIJING
Cites: 5; Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] In the process of realizing the present invention, the inventor found that the prior art has the following defect: the importance of a text is judged entirely from the text content itself, so the files obtained after screening are often not the target files the user wants, and the judgment of text importance is biased.
For example, if a user wants to obtain legal documents and defines "law" as the keyword, the documents obtained may be of one or more other types, returned only because the keyword "law" appears many times in their text. The target files obtained are therefore inaccurate.
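The defect can be illustrated with a minimal sketch of content-only ranking, assuming the score is simply the number of times the keyword occurs in each document. The function name `keyword_frequency_rank` and both sample documents are hypothetical and do not come from the patent.

```python
import re

def keyword_frequency_rank(docs: dict[str, str], keyword: str) -> list[tuple[str, int]]:
    """Rank documents only by how often `keyword` occurs in their text (content-only scoring)."""
    kw = keyword.lower()
    counts = {name: re.findall(r"\w+", text.lower()).count(kw) for name, text in docs.items()}
    # Highest keyword count first; layout/format plays no role at all.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical documents: a real statute and an unrelated blog post that repeats "law".
docs = {
    "statute": "Contract Law of the People's Republic of China. Chapter 1 General Provisions.",
    "blog_post": "My law school diary: law exams, law books, and more law trivia.",
}
print(keyword_frequency_rank(docs, "law"))  # [('blog_post', 4), ('statute', 1)]
```

The blog post outranks the statute even though the statute is the document the user wants. This is the judgment bias the invention sets out to avoid.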

Method used

Figure 1 is a flow chart of the text importance calculation method of Embodiment 1; Figure 2 is a flow chart of the method of Embodiment 2; Figure 3 is a flow chart of the method of Embodiment 3.


Examples


Embodiment 1

[0028] Figure 1 is a flow chart of a text importance calculation method provided by Embodiment 1 of the present invention. This embodiment applies to calculating the importance of text files. The method can be executed by a text importance calculation device, which can be implemented in software and/or hardware and can generally be integrated into a computing device that calculates text importance for one or more text files. The method specifically includes the following steps:

[0029] S110. Acquire a plurality of text files, and perform natural language processing on the plurality of text files to obtain the text content and text format of each text file.

[0030] A text file is a computer file that consists of lines of characters and is stored in a computer file system. Generally, an end-of-file marker placed after the last line indicates where the file ends. Corresponding to different application software, there ar...
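Step S110 produces two things for each file: its text content and its text format. The patent does not say how format is represented, so the sketch below makes an assumption: a hypothetical lightweight markup in which lines starting with `# ` are titles and lines starting with `## ` are headings. The `TextFile` class and `parse_text_file` helper are illustrative names only.

```python
from dataclasses import dataclass, field

@dataclass
class TextFile:
    name: str
    content: str  # plain text with the assumed markup stripped
    layout: list[tuple[str, str]] = field(default_factory=list)  # (role, text) pairs

def parse_text_file(name: str, raw: str) -> TextFile:
    """Split raw text into content and layout, assuming '# ' marks a title and '## ' a heading."""
    layout: list[tuple[str, str]] = []
    plain_lines: list[str] = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry neither content nor layout here
        if stripped.startswith("## "):
            role, text = "heading", stripped[3:]
        elif stripped.startswith("# "):
            role, text = "title", stripped[2:]
        else:
            role, text = "body", stripped
        layout.append((role, text))
        plain_lines.append(text)
    return TextFile(name=name, content="\n".join(plain_lines), layout=layout)

tf = parse_text_file("contract.txt", "# Contract Law\n\n## Chapter 1\nThis law governs contracts.")
print(tf.layout)
```

A real implementation would read format from the file type itself, such as styles in a word-processor document. The markup here only stands in for that.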

Embodiment 2

[0055] Figure 2 is a flow chart of a text importance calculation method provided by Embodiment 2 of the present invention. This embodiment builds on the embodiment above. In this embodiment, only text files that include at least one keyword are taken as the target file set, and both the content score and the layout score are calculated from the keyword set. Correspondingly, the method in this embodiment specifically includes the following operations:

[0056] S210. Obtain a plurality of text files in the text file library, and perform natural language processing on the plurality of text files to obtain the text content and text format of each text file.

[0057] S220. Acquire a keyword set, where the keyword set includes at least one keyword; add each text file whose text content includes at least one of the keywords to a target file set.

[0058] For example, suppose the keyword set includes only one keyword, "law", and the total number of text files is 100, ...
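The filtering in S220 can be sketched as follows. The sketch assumes single-word keywords and splits text into tokens with a simple regular expression. Chinese text would instead need a word segmenter. `contains_any_keyword` and `build_target_set` are hypothetical names.

```python
import re

def contains_any_keyword(content: str, keywords: set[str]) -> bool:
    """True if any single-word keyword appears as a whole token in the content."""
    tokens = set(re.findall(r"\w+", content.lower()))
    return any(k.lower() in tokens for k in keywords)

def build_target_set(files: dict[str, str], keywords: set[str]) -> list[str]:
    """S220: keep only files whose text content contains at least one keyword."""
    return [name for name, content in files.items() if contains_any_keyword(content, keywords)]

files = {
    "a.txt": "The new law applies from January.",
    "b.txt": "Weather report for Beijing.",
    "c.txt": "Lawful conduct is expected.",
}
print(build_target_set(files, {"law"}))  # ['a.txt']
```

Matching whole tokens keeps "lawful" from counting as a hit for "law". Whether the patent matches tokens or substrings is not stated.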

Embodiment 3

[0066] Figure 3 is a flow chart of a text importance calculation method provided by Embodiment 3 of the present invention. This embodiment builds on the embodiments above. In this embodiment, all text files are taken as the target file set, the content score is calculated from the acquired keyword set, and the layout score is calculated according to preset rules. Correspondingly, the method in this embodiment specifically includes the following operations:

[0067] S310. Obtain a plurality of text files in the text file library, and perform natural language processing on the plurality of text files to obtain the text content and text format of each text file.

[0068] S320. Acquire a keyword set; take all of the plurality of text files as the target file set.

[0069] The selection of the target file set is independent of the keywords. For example, the above keyword set only includes one keyword "...
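In Embodiment 3, every file enters the target set and the layout score follows preset rules rather than the keywords. The patent's actual rules are not given in this excerpt. The sketch below assumes a hypothetical rule that adds a fixed weight for each title or heading element. `ROLE_WEIGHTS` and its values are illustrative only.

```python
# Hypothetical preset layout rule for Embodiment 3: it ignores keywords entirely.
ROLE_WEIGHTS = {"title": 3.0, "heading": 1.0, "body": 0.0}

def select_all(files: dict[str, list[tuple[str, str]]]) -> list[str]:
    """S320: every text file joins the target file set, whatever the keywords are."""
    return list(files)

def layout_score_by_rule(layout: list[tuple[str, str]]) -> float:
    """Sum a fixed weight per layout element; unknown roles contribute nothing."""
    return sum(ROLE_WEIGHTS.get(role, 0.0) for role, _ in layout)

files = {
    "report": [("title", "Annual Report"), ("heading", "Summary"), ("body", "Revenue grew.")],
    "memo": [("body", "Please review the attached law.")],
}
target = select_all(files)
print({name: layout_score_by_rule(files[name]) for name in target})
# {'report': 4.0, 'memo': 0.0}
```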



Abstract

The invention discloses a text importance calculation method, device, equipment and storage medium. The method comprises the following steps: obtaining the text content and text format of each of a plurality of text files in a text file library; extracting a target file set from the plurality of text files, wherein the target file set is all or part of the plurality of text files; calculating a content importance score for each text file in the target file set according to a content score rule; calculating a layout importance score for each text file in the target file set according to a layout score rule; and ranking the text files in the target file set by importance according to the content importance score and the layout importance score. The technical solution provided by the embodiments of the invention analyzes the importance of each text from both its content and its format. As a result, the required text files are obtained effectively, and screening accuracy improves while manual screening is avoided.
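The steps in the abstract can be sketched end to end. The abstract does not define the content score rule, the layout score rule, or how the two scores are combined. This sketch therefore assumes three things:

- The content score is the share of content tokens that are keywords.
- The layout score is the weight of the most prominent layout element containing a keyword, scaled to [0, 1].
- The two scores are combined as a weighted sum with a hypothetical parameter `alpha`.

All names and weights are illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical layout rule: how prominent each layout role is.
LAYOUT_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}
MAX_LAYOUT = max(LAYOUT_WEIGHTS.values())

@dataclass
class TextFile:
    name: str
    content: str
    layout: list[tuple[str, str]]  # (role, text) pairs, e.g. ("title", "Contract Law")

def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def content_score(f: TextFile, keywords: set[str]) -> float:
    """Hypothetical content rule: share of content tokens that are keywords."""
    toks = tokens(f.content)
    return sum(t in keywords for t in toks) / len(toks) if toks else 0.0

def layout_score(f: TextFile, keywords: set[str]) -> float:
    """Hypothetical layout rule: weight of the most prominent element holding a keyword, in [0, 1]."""
    best = 0.0
    for role, text in f.layout:
        if keywords & set(tokens(text)):
            best = max(best, LAYOUT_WEIGHTS.get(role, 1.0))
    return best / MAX_LAYOUT

def rank_by_importance(files: list[TextFile], keywords: set[str], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Combine both scores with an assumed weighted sum and sort in descending order."""
    kw = {k.lower() for k in keywords}
    scored = [(f.name, alpha * content_score(f, kw) + (1 - alpha) * layout_score(f, kw)) for f in files]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

statute = TextFile("statute", "Contract Law\nThis law governs contracts.",
                   [("title", "Contract Law"), ("body", "This law governs contracts.")])
blog = TextFile("blog", "law law law law notes", [("body", "law law law law notes")])
print(rank_by_importance([blog, statute], {"law"}))
```

The blog post has the higher content score. However, the keyword appears in the statute's title, so the statute ranks first once layout is taken into account. This is the kind of correction the abstract describes.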

Description

Technical field
[0001] The embodiments of the present invention relate to the fields of information retrieval and information classification, and in particular to a text importance calculation method, device, equipment and storage medium.
Background art
[0002] With the rapid development of science and technology and the spread of Internet technology, the number of text files keeps growing. How to filter out the one or more most important files from a large number of text files has therefore become particularly important.
[0003] In the prior art, information retrieval and information classification technologies are usually used to screen text files and determine their importance. Take the most common method, term frequency-inverse document frequency (TF-IDF), as an example: keywords related to industry type, business type, classification type, etc. are constructed, the frequency of occurrence of each keyword in each text file is calculated, and the propor...
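For reference, the sketch below implements one common textbook formulation of TF-IDF for a single term. TF is the term count divided by the document length, and IDF is log(N / df). The source text is truncated before it states its own formula, so this is not necessarily the variant the background refers to. `tf_idf` is a hypothetical name.

```python
import math
import re

def tf_idf(docs: dict[str, str], term: str) -> dict[str, float]:
    """Classic TF-IDF of one term in each document: (count / doc length) * log(N / df)."""
    term = term.lower()
    tokenized = {name: re.findall(r"\w+", text.lower()) for name, text in docs.items()}
    df = sum(1 for toks in tokenized.values() if term in toks)  # documents containing the term
    if df == 0:
        return {name: 0.0 for name in docs}
    idf = math.log(len(docs) / df)
    return {
        name: (toks.count(term) / len(toks)) * idf if toks else 0.0
        for name, toks in tokenized.items()
    }

docs = {"a": "law law court", "b": "weather today", "c": "court ruling"}
print(tf_idf(docs, "law"))
```

Like the prior art described above, this score looks only at text content. A document that repeats a keyword scores high regardless of its layout.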

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/27; G06F16/33
CPC: G06F40/216
Inventors: 万月亮 (Wan Yueliang), 韩石磊 (Han Shilei), 火一莽 (Huo Yimang)
Owner: RUN TECH CO LTD BEIJING