Multilevel Email classification method based on Email content

A multi-level mail classification method based on mail content, in the field of network communication. It addresses the problem that sample imbalance degrades mail classification, and aims to increase the balance between positive and negative samples, achieve good classification results, and improve accuracy.

Status: Inactive
Publication date: 2017-02-22
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 5; Cited by: 22

AI Technical Summary

Problems solved by technology

[0008] The purpose of the present invention is to solve the problem in the prior art that sample imbalance significantly degrades mail classification, by proposing a multi-level mail classification method based on mail content.

Embodiment Construction

[0023] Embodiments of the present invention will be further described below in conjunction with the accompanying drawings.

[0024] The invention provides a multi-level mail classification method based on mail content, which, as shown in Figure 1, includes the following steps:

[0025] S1. Preprocess the original email data set to generate a new email data set, Email-Matrix-SVD.
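The data set name Email-Matrix-SVD suggests that S1 ends by compressing an email-term matrix with singular value decomposition, as in latent semantic analysis. This excerpt does not show that sub-step, so the following is only a sketch under that assumption. It finds the top-k singular triplets with power iteration and deflation in plain Python and represents each email (one matrix row) by its k coordinates in U_k Σ_k. The names `truncated_svd` and `reduce_emails` and the parameters `k`, `iters`, and `seed` are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def _matvec(a: Matrix, v: list[float]) -> list[float]:
    """Multiply matrix a (a list of rows) by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def _transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def _norm(v: list[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def truncated_svd(a: Matrix, k: int, iters: int = 200, seed: int = 0):
    """Top-k singular triplets of a (m x n) via power iteration with deflation.

    Returns (us, ss, vs): left singular vectors (length m),
    singular values, and right singular vectors (length n).
    """
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    residual = [row[:] for row in a]
    us, ss, vs = [], [], []
    for _ in range(k):
        rt = _transpose(residual)
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            # Power step on residual^T residual; converges to the top right singular vector.
            w = _matvec(rt, _matvec(residual, v))
            norm_w = _norm(w)
            if norm_w == 0.0:
                break
            v = [x / norm_w for x in w]
        av = _matvec(residual, v)
        sigma = _norm(av)
        if sigma == 0.0:
            break  # remaining matrix is zero, so the rank is below k
        u = [x / sigma for x in av]
        # Deflate: subtract the component just found so the next pass finds the next one.
        residual = [[residual[i][j] - sigma * u[i] * v[j] for j in range(n)] for i in range(m)]
        us.append(u)
        ss.append(sigma)
        vs.append(v)
    return us, ss, vs


def reduce_emails(email_term: Matrix, k: int) -> Matrix:
    """Represent each email (row) by its k-dimensional coordinates U_k * Sigma_k."""
    us, ss, _ = truncated_svd(email_term, k)
    return [[s * u[i] for u, s in zip(us, ss)] for i in range(len(email_term))]
```

For example, `reduce_emails([[1.0, 2.0], [2.0, 4.0]], k=1)` maps each row of this rank-1 matrix to a single coordinate. Power iteration keeps the sketch dependency-free; on real data a library routine such as `numpy.linalg.svd` would be the usual choice.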

[0026] As shown in Figure 2, this step specifically includes the following sub-steps:

[0027] S11. Email parsing.

[0028] Because an email is a semi-structured document that cannot be processed directly as plain text, it must first be parsed. Based on the email format, the content, subject, sender address, sending time, recipient address, and attachment information are extracted from each email in the original data set, and the extraction results are stored to generate the email data set EmailDatas.
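Sub-step S11 can be sketched with Python's standard `email` package, assuming the raw emails are stored as RFC 5322 (.eml-style) bytes. `EmailRecord` and `parse_email` are hypothetical names, and attachment information is reduced to file names here; the source lists the extracted fields but not the storage format of EmailDatas.

```python
from dataclasses import dataclass, field
from email import policy
from email.parser import BytesParser


@dataclass
class EmailRecord:
    """One parsed email; a hypothetical row of the EmailDatas set."""
    subject: str
    sender: str
    sent_time: str
    recipients: str
    content: str
    attachments: list[str] = field(default_factory=list)


def parse_email(raw: bytes) -> EmailRecord:
    """Extract the fields listed in S11 from one raw RFC 5322 message."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer the plain-text body; fall back to HTML if that is all the message has.
    body = msg.get_body(preferencelist=("plain", "html"))
    content = body.get_content() if body is not None else ""
    # Attachment file names stand in for "attachment information".
    attachments = [part.get_filename() or "" for part in msg.iter_attachments()]
    return EmailRecord(
        subject=str(msg.get("Subject", "")),
        sender=str(msg.get("From", "")),
        sent_time=str(msg.get("Date", "")),
        recipients=str(msg.get("To", "")),
        content=content,
        attachments=attachments,
    )
```

Applying `parse_email` to every message of the original data set and collecting the records would yield an EmailDatas-like list.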

[0029] S12. Generate the email vector space model.

[0030] Segment the content and subject of the email in the...
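Sub-step S12 turns each email's subject and content into a term vector. The sketch below assumes a simple regex tokenizer in place of real word segmentation (Chinese text would need a dedicated segmenter) and one common TF-IDF variant, tf × log(N/df); the excerpt does not state which tokenizer or weighting the method uses, and `tokenize` and `build_vsm` are hypothetical names.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Stand-in segmenter: lower-cased alphanumeric runs."""
    return TOKEN_RE.findall(text.lower())


def build_vsm(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return (vocabulary, email-term matrix) with TF-IDF weights, one row per email."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: j for j, t in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: in how many emails each term occurs at least once.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    matrix = []
    for tokens in token_lists:
        row = [0.0] * len(vocab)
        total = len(tokens) or 1
        for term, count in Counter(tokens).items():
            tf = count / total                    # term frequency within this email
            idf = math.log(n_docs / df[term])     # rarer terms get larger weights
            row[index[term]] = tf * idf
        matrix.append(row)
    return vocab, matrix
```

Each email would be passed in as its subject and content joined into one string; each row of the returned matrix is that email's vector over the shared vocabulary, and a term that occurs in every email gets weight 0.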

Abstract

The invention discloses a multilevel email classification method based on email content. To address the problem that sample imbalance significantly affects email classification, the method builds a three-level classifier. The first-level classifier combines a naive Bayes algorithm, a support vector machine, and the C4.5 decision-tree algorithm by voting; the second-level classifier uses a random forest algorithm; and the third-level classifier uses the LIBLINEAR algorithm. By adopting a multilevel classification scheme with layer-by-layer filtering, the method gradually improves the accuracy of the classifiers while maintaining recall, and continuously increases the balance between positive and negative samples until the email data are approximately balanced at the last level. As a result, the final classification is less affected by sample imbalance, and relatively good email classification performance is achieved.
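The layer-by-layer filtering described in the abstract can be read as a classifier cascade. The sketch below assumes that each level passes on only the emails it labels positive and finalizes the rest as negative; the source does not spell out this rule. To stay dependency-free, the levels are plain functions, and the demo uses toy threshold rules in place of the trained naive Bayes/SVM/C4.5 vote, random forest, and LIBLINEAR models. `majority_vote`, `cascade_predict`, `balance_per_level`, and `threshold` are hypothetical names.

```python
from typing import Callable, Sequence

# Any function mapping a feature vector to 1 (positive) or 0 (negative) counts as a classifier here.
Classifier = Callable[[Sequence[float]], int]


def majority_vote(*members: Classifier) -> Classifier:
    """Binary majority vote, as the first level combines its three member classifiers."""
    def vote(x: Sequence[float]) -> int:
        return int(2 * sum(m(x) for m in members) > len(members))
    return vote


def cascade_predict(levels: Sequence[Classifier], x: Sequence[float]) -> int:
    """Assumed rule: a level's negative verdict is final; positives go on to the next level."""
    for level in levels:
        if level(x) == 0:
            return 0
    return 1


def balance_per_level(levels, xs, ys) -> list[tuple[int, int]]:
    """(positives, negatives) entering each level, plus those accepted by the last level."""
    remaining = list(zip(xs, ys))
    counts = []
    for level in levels:
        pos = sum(y for _, y in remaining)
        counts.append((pos, len(remaining) - pos))
        remaining = [(x, y) for x, y in remaining if level(x) == 1]
    pos = sum(y for _, y in remaining)
    counts.append((pos, len(remaining) - pos))
    return counts


def threshold(t: float) -> Classifier:
    """Toy stand-in for a trained model: positive when the first feature is >= t."""
    return lambda x: int(x[0] >= t)


if __name__ == "__main__":
    xs = [[float(v)] for v in range(10)]
    ys = [1 if v >= 8 else 0 for v in range(10)]  # 2 positives, 8 negatives
    levels = [
        majority_vote(threshold(3), threshold(4), threshold(5)),  # stands in for the NB + SVM + C4.5 vote
        threshold(6),  # stands in for the random forest
        threshold(8),  # stands in for the LIBLINEAR model
    ]
    print(balance_per_level(levels, xs, ys))  # [(2, 8), (2, 4), (2, 2), (2, 0)]
```

With the toy data, both positives survive every level (recall is preserved) while negatives drop from 8 to 4 to 2, so the last level receives a balanced 2:2 set. In a real implementation each level would be a trained model, for example scikit-learn's `VotingClassifier`, `RandomForestClassifier`, and `LinearSVC` (which is built on LIBLINEAR); note that scikit-learn's `DecisionTreeClassifier` implements CART rather than C4.5.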

Description

Technical Field

[0001] The invention belongs to the technical field of network communication and, in particular, relates to the design of a multi-level mail classification method based on mail content.

Background

[0002] As the network has entered the lives of countless people, network communication has become increasingly frequent, and email is an important means of it. However, while email has become an indispensable tool for exchanging information, the growing volume of email has also brought great trouble to people's lives and work. This is especially true for companies and government agencies, where recommendation emails and greeting emails are mixed with important and urgently needed ones; handling them requires a huge workload and causes considerable waste of labor and economic losses.

[0003] To deal with these problems, email filtering is usually used at present, but the co...

Application Information

IPC(8): H04L12/58; G06K9/62
CPC: H04L51/42; G06F18/2411; G06F18/24155
Inventors: 盛泳潘 (Sheng Yongpan), 张艳 (Zhang Yan), 赵鹏 (Zhao Peng), 谢盈 (Xie Ying), 王璐 (Wang Lu)
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA