Data classification method based on deep learning and graph establishment method

A data classification and deep learning technology, applied to deep-learning-based classification of article data, that addresses problems such as low efficiency and achieves reduced computation and freed storage space.

Pending Publication Date: 2022-01-04

HANGZHOU FANEWS TECH

2 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0004] To address the low efficiency of clustering algorithms when they cluster articles from the same industry, this application provides an article classification method that uses keyword weights and the matching degree between articles and models to classify...

Examples

Embodiment 1

[0036] This application provides a data classification method based on deep learning. Articles are first selected at random to build a model, and other articles are then used to let the model self-learn. The final article classification model is obtained through iteration and is used to judge whether an article belongs to the industry. The method specifically includes the following steps:

[0037] Data is collected with web crawlers, stored in an Elasticsearch cluster, and made available for full-text retrieval using the HanLP tokenizer. Several basic articles from the same industry are obtained from the data of industry websites, and the TextRank algorithm is used to extract several core keywords from these basic articles. The weight value of each core keyword is calculated, and a first weight correspondence table is established from the core keywords and their weight values.
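The keyword extraction step can be illustrated with a minimal sketch. It assumes the articles are already tokenized (the source uses HanLP for this), and it implements a plain unweighted TextRank over a word co-occurrence graph. The function names `textrank_keywords` and `build_weight_table`, the window size, and the rule of normalizing summed TextRank scores into weights are all assumptions for illustration; the source does not show how the patent computes the weight values.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=3, damping=0.85, iterations=50, top_k=5):
    """Rank words with TextRank: PageRank over a word co-occurrence graph."""
    # Link each word to the next (window - 1) tokens that follow it.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Repeat the PageRank update a fixed number of times.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


def build_weight_table(articles, top_k=5):
    """Assumed weighting rule: sum TextRank scores per word, then normalize to 1."""
    totals = defaultdict(float)
    for tokens in articles:
        for word, score in textrank_keywords(tokens, top_k=top_k):
            totals[word] += score
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {word: score / norm for word, score in totals.items()}


# Hypothetical, pre-tokenized sample articles.
sample_articles = [
    ["forest", "fire", "prevention", "forest", "management"],
    ["timber", "forest", "harvest", "timber", "market"],
]
weight_table = build_weight_table(sample_articles)
```

Here `weight_table` plays the role of the first weight correspondence table: a mapping from core keyword to weight value. A production pipeline would likely filter stop words and restrict candidates by part of speech before ranking.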

[0038] Taking the forestry industry as an example, we selected 5,000 articles from China Forestry Info...

Embodiment 2

[0067] This embodiment provides a graph establishment method, specifically a method for establishing a knowledge graph of a target industry. The article classification method of Embodiment 1 is used to obtain article data for a given industry, and a knowledge graph of the target industry is built from that article data. The method specifically includes the following steps:

[0068] Articles are sampled according to keyword correlation to obtain sampled articles. Several keywords are extracted from the sampled articles, and derivative words of the keywords are obtained through mutual information entropy calculation. The weight values of all keywords and derivative words are calculated, and the keywords and derivative words are sorted by weight value. A topological relationship is then established between the keywords and derivative words to form a network structure graph, which yields the industry knowledge graph.
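The derivative-word step can be sketched as follows. This is only an illustration under stated assumptions: it interprets the mutual information calculation as pointwise mutual information (PMI) over document-level co-occurrence, and it represents the network structure graph as a plain adjacency dictionary. The names `pmi_table` and `build_graph` and the `per_keyword` cutoff are hypothetical, and the sketch ranks derivative words by PMI rather than by the patent's weight values, which the source does not show.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(articles):
    """Pointwise mutual information of word pairs, counted per article."""
    n_docs = len(articles)
    word_df = Counter()  # number of articles containing each word
    pair_df = Counter()  # number of articles containing both words of a pair
    for tokens in articles:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        (a, b): math.log((joint * n_docs) / (word_df[a] * word_df[b]))
        for (a, b), joint in pair_df.items()
    }


def build_graph(articles, keywords, per_keyword=2):
    """Link each keyword to its highest-PMI co-occurring words (derivative words)."""
    keyword_set = set(keywords)
    pmi = pmi_table(articles)
    graph = {}
    for kw in keywords:
        candidates = []
        for (a, b), score in pmi.items():
            if kw in (a, b):
                other = b if a == kw else a
                if other not in keyword_set:
                    candidates.append((score, other))
        # Highest PMI first; ties are broken alphabetically for determinism.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        graph[kw] = [word for _, word in candidates[:per_keyword]]
    return graph


# Hypothetical, pre-tokenized sampled articles.
sampled = [
    ["forest", "fire", "smoke"],
    ["forest", "timber"],
    ["forest", "fire"],
    ["timber", "market"],
]
knowledge_graph = build_graph(sampled, keywords=["fire"])
```

Each key in `knowledge_graph` is a keyword node, and its list holds the derivative-word nodes it connects to. Sorting nodes by the patent's own weight values would be a separate step applied on top of this structure.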

[0069] The method of calculating the weight value is as follows:

[00...

Embodiment 3

[0081] This embodiment provides an article classification device, including a memory and a processor. The memory stores a data processing program, and when the processor reads and executes the data processing program, it performs the deep-learning-based data classification method of Embodiment 1.

Abstract

The invention provides a data classification method based on deep learning and a graph establishment method. The data classification method comprises the following steps: extracting core keywords from basic articles, calculating the weight values of the core keywords, and establishing a first weight correspondence table; extracting the keywords of each basic article and calculating the industry matching degree of each article according to the first weight correspondence table to obtain a first matching degree threshold; iterating the basic articles according to the first matching degree; repeating these steps on the iterated articles to obtain a second weight correspondence table and a second matching degree threshold; and using the second matching degree threshold to judge whether a new article belongs to the target industry. In this method, keywords are extracted from the basic articles and divided into title keywords and body keywords, each assigned a different adjustment factor, so that the industry matching degree can be calculated more effectively. Articles with a higher matching degree are used for more accurate replacement iteration, which releases the space used to store historical data and reaches the optimal model more quickly.
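The matching and iteration steps described in this abstract can be sketched as below. The source does not give the formulas, so several choices here are assumptions made for illustration only: the matching degree is a factor-weighted sum of keyword weights, the threshold is the mean matching degree of the basic articles, and iteration replaces below-threshold basic articles with the best-matching candidates. All function names and the factor values are hypothetical.

```python
def matching_degree(article, weights, title_factor=2.0, body_factor=1.0):
    """Assumed formula: factor-weighted sum of the weights of matched keywords."""
    title_score = sum(weights.get(w, 0.0) for w in set(article["title"]))
    body_score = sum(weights.get(w, 0.0) for w in set(article["body"]))
    return title_factor * title_score + body_factor * body_score


def threshold_from(base, weights):
    """Assumed rule: the threshold is the mean matching degree of the basic articles."""
    degrees = [matching_degree(a, weights) for a in base]
    return sum(degrees) / len(degrees)


def iterate_base(base, candidates, weights):
    """Swap below-threshold basic articles for the best-matching candidates."""
    threshold = threshold_from(base, weights)
    kept = [a for a in base if matching_degree(a, weights) >= threshold]
    better = sorted(
        (c for c in candidates if matching_degree(c, weights) >= threshold),
        key=lambda c: matching_degree(c, weights),
        reverse=True,
    )
    # Discarded articles are not retained, so no history needs to be stored.
    return kept + better[: len(base) - len(kept)]


def belongs_to_industry(article, weights, threshold):
    """Judge a new article against a matching degree threshold."""
    return matching_degree(article, weights) >= threshold


# Hypothetical weight table and tokenized articles.
weights = {"forest": 0.5, "timber": 0.3, "fire": 0.2}
base = [
    {"title": ["forest"], "body": ["timber", "fire"]},
    {"title": ["market"], "body": ["forest"]},
]
candidates = [
    {"title": ["forest", "timber"], "body": ["fire"]},
    {"title": ["sports"], "body": []},
]
new_base = iterate_base(base, candidates, weights)
```

In the full method, a second weight correspondence table would be recomputed from `new_base`, and the threshold derived from that table would be the one passed to `belongs_to_industry`.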

Description

Technical field

[0001] This application relates to a data classification method based on deep learning and to a method for establishing an industry knowledge graph based on that data classification method. Specifically, it relates to a self-learning, deep-learning-based method for classifying article data.

Background

[0002] Text clustering technology can be applied to industry data analysis. A system can collect massive amounts of article data from various fields through web crawlers every day and use algorithms to classify these articles effectively, which helps users quickly understand current industry information and carry out further analysis efficiently.

[0003] At present, for data aggregation and data model establishment in specific industries, clustering algorithms are generally used to cluster the data, and the clustered data is then classified statistically by hand. However, the clustering algorithm needs to save all historical document information, w...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/35, G06F16/332, G06F16/36

CPC: G06F16/35, G06F16/367, G06F16/3329

Inventor: 姚洲鹏

Owner: HANGZHOU FANEWS TECH
