Data classification method based on deep learning and graph establishment method

A deep-learning-based data classification technology, applied to the classification of article data, that addresses the low efficiency of existing approaches while reducing the amount of calculation and releasing storage space.

Pending Publication Date: 2022-01-04
HANGZHOU FANEWS TECH

AI Technical Summary

Problems solved by technology

[0004] To solve the problem of low efficiency when clustering algorithms cluster articles in the same industry, this application provides an article classification method that classifies articles using keyword weights and the degree of matching between articles and the model.



Examples


Embodiment 1

[0036] This application provides a data classification method based on deep learning. First, articles are randomly selected to build a model; other articles are then used to let the model self-learn; the final article classification model is obtained through iteration and is used to judge whether an article belongs to the industry. The method specifically includes the following steps:

[0037] Data is collected with crawler technology, stored in an Elasticsearch cluster, and made available for full-text retrieval with the HanLP tokenizer. Several basic articles in the same industry are obtained from the data of industry websites, the TextRank algorithm is used to extract several core keywords from these basic articles, the weight value of each core keyword is calculated, and a first weight correspondence table is established from the core keywords and their weight values.
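The keyword extraction and weight table in this step can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the articles have already been tokenized (for example by HanLP), it uses a plain-Python TextRank over a co-occurrence graph, and it takes the normalized TextRank score as the weight value, since the weight formula is not given in this excerpt. The names `textrank_scores`, `build_weight_table`, `window`, and `top_k` are hypothetical.

```python
from collections import defaultdict


def textrank_scores(articles, window=3, damping=0.85, iterations=50):
    """Score words by running PageRank over a word co-occurrence graph.

    `articles` is a list of token lists; words co-occur when they appear
    within `window` positions of each other inside the same article.
    """
    neighbors = defaultdict(set)
    for tokens in articles:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + window]:
                if other != word:
                    neighbors[word].add(other)
                    neighbors[other].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    return scores


def build_weight_table(articles, top_k=5):
    """Return {core keyword: weight} for the top_k TextRank words.

    Assumption: weights are the TextRank scores of the selected words,
    normalized so that they sum to 1.
    """
    scores = textrank_scores(articles)
    top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    total = sum(score for _, score in top)
    return {word: score / total for word, score in top}
```

Calling `build_weight_table(basic_articles, top_k=20)` on the tokenized basic articles would give a dictionary that plays the role of the first weight correspondence table.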

[0038] Taking the forestry industry as an example, we selected 5,000 articles from China Forestry Info...

Embodiment 2

[0067] This embodiment provides a graph establishment method, specifically a method for establishing a knowledge map of a target industry. The article classification method of Embodiment 1 is used to obtain article data for a given industry, and the knowledge map of the target industry is built from that article data. The method specifically includes the following steps:

[0068] Articles are sampled according to keyword correlation to obtain sampled articles, and several keywords are extracted from the sampled articles. Derivative words of the keywords are obtained through mutual information entropy calculation, and the weight values of all keywords and derivative words are calculated. The keywords and derivative words are sorted by weight value, and a topological relationship is established between the keywords and the derivative words to form a network structure graph, thereby obtaining the industry knowledge map.
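One way to picture the step from keywords to a network structure graph is the sketch below. It rests on several assumptions: document-level pointwise mutual information (PMI) stands in for the "mutual information entropy calculation", derivative words are ranked by PMI rather than by the weight value of this embodiment (whose formula is truncated in this excerpt), and the graph is a plain adjacency dictionary. The names `pmi_table`, `build_knowledge_map`, `min_pmi`, and `max_derived` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(articles):
    """Document-level pointwise mutual information for every word pair.

    PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), with probabilities
    estimated from how many articles contain each word or pair.
    """
    n_docs = len(articles)
    word_df = Counter()
    pair_df = Counter()
    for tokens in articles:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        pair: math.log((count * n_docs) / (word_df[pair[0]] * word_df[pair[1]]))
        for pair, count in pair_df.items()
    }


def build_knowledge_map(articles, keywords, min_pmi=0.0, max_derived=3):
    """Link each keyword to its strongest derivative words.

    Returns {keyword: [(derivative word, PMI), ...]} sorted by PMI,
    which serves as a simple adjacency-list view of the graph.
    """
    pmi = pmi_table(articles)
    graph = {}
    for keyword in keywords:
        related = []
        for (a, b), score in pmi.items():
            if score > min_pmi and keyword in (a, b):
                related.append((b if a == keyword else a, score))
        related.sort(key=lambda item: (-item[1], item[0]))
        graph[keyword] = related[:max_derived]
    return graph
```

The adjacency dictionary could be loaded into a graph library or graph database for visualization; that choice is outside what the text specifies.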

[0069] The method of calculating the weight value is as follows:

[00...

Embodiment 3

[0081] This embodiment provides an article classification device comprising a memory and a processor. The memory stores a data processing program, and when the processor reads and executes that program, it performs the deep-learning-based data classification method of Embodiment 1.



Abstract

The invention provides a data classification method based on deep learning and a graph establishment method. The data classification method comprises: extracting core keywords from basic articles, calculating the weight values of the core keywords, and establishing a first weight correspondence table; extracting the keywords of each basic article and calculating the industry matching degree of each article according to the first weight correspondence table to obtain a first matching degree threshold; iterating the basic articles according to the first matching degree threshold; repeating these steps on the iterated articles to obtain a second weight correspondence table and a second matching degree threshold; and using the second matching degree threshold to judge whether a new article belongs to the target industry. In this method, keywords are extracted from the basic articles and divided into title keywords and body keywords, which are given different adjustment factors, so that the industry matching degree can be calculated more effectively. Articles with a higher matching degree replace existing ones during iteration, which makes the iteration more accurate, releases the space used to store historical data, and reaches the optimal model more quickly.
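The matching and iteration steps summarized above can be illustrated with a short sketch. The exact formulas are not given in this excerpt, so everything numeric here is an assumption: the matching degree is taken as a factor-weighted sum of table weights for title and body keywords, the threshold is taken as the lowest matching degree among the basic articles, and one iteration replaces the weakest basic article with a better-matching candidate. All function names and the `title_factor` and `body_factor` defaults are hypothetical.

```python
def matching_degree(article, weights, title_factor=2.0, body_factor=1.0):
    """Factor-weighted sum of table weights for title and body keywords."""
    title_score = sum(weights.get(w, 0.0) for w in set(article["title_keywords"]))
    body_score = sum(weights.get(w, 0.0) for w in set(article["body_keywords"]))
    return title_factor * title_score + body_factor * body_score


def matching_threshold(basic_articles, weights):
    """Assumed rule: the lowest matching degree among the basic articles."""
    return min(matching_degree(a, weights) for a in basic_articles)


def iterate_basic_articles(basic_articles, candidate, weights):
    """Swap out the weakest basic article if the candidate matches better."""
    ranked = sorted(basic_articles, key=lambda a: matching_degree(a, weights))
    if matching_degree(candidate, weights) > matching_degree(ranked[0], weights):
        return ranked[1:] + [candidate]
    return list(basic_articles)


def belongs_to_industry(article, weights, threshold):
    """Judge a new article against the current matching degree threshold."""
    return matching_degree(article, weights) >= threshold
```

Because the basic set stays the same size after each replacement, only the current basic articles and the weight table need to be kept, which is consistent with the stated goal of releasing the space used for historical data.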

Description

Technical field

[0001] This application relates to a data classification method based on deep learning and to a method for establishing an industry knowledge map based on that data classification method, and specifically to a self-learning deep learning method for classifying article data.

Background technique

[0002] Text clustering technology can be applied to industry data analysis. A system can collect massive amounts of article data from various fields through web crawlers every day and use algorithms to classify these articles effectively, which helps users quickly understand current industry information and carry out further analysis efficiently.

[0003] At present, for data aggregation and data model establishment in specific industries, clustering algorithms are generally used to cluster the data, and the clustered data is then manually counted and classified. However, the clustering algorithm needs to save all historical document information, w...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F16/332, G06F16/36
CPC: G06F16/35, G06F16/367, G06F16/3329
Inventor: 姚洲鹏
Owner: HANGZHOU FANEWS TECH