A training method for horizontally federated xgboost decision trees

A training method and decision tree technology, applied in machine learning, digital transmission systems, instruments, etc. It addresses the problems that existing approaches leak too much of the data owners' original information, require a high number of communication rounds, or cannot effectively protect the data, thereby strengthening privacy protection and providing high data protection strength, high performance, and practicality.

Active Publication Date: 2022-05-17
神州融安数字科技(北京)有限公司

AI Technical Summary

Problems solved by technology

[0002] When data is scattered across different data sources and multiple parties need to jointly train an xgboost decision tree to improve model accuracy, the data owners would ordinarily pool their data and then perform conventional decision tree training. Trained in this way, on plaintext data, the data cannot be effectively protected. When data privacy and confidentiality are involved, privacy-preserving computation techniques can be used to complete joint decision tree training over multi-party data; for example, by drawing on the concept of federated learning, multiple data owners can jointly train a machine learning model while protecting their own data privacy.
[0003] At present, existing horizontal federated learning solutions fall into two classes. One class uploads only plaintext gradients to the aggregator and is typically used for joint training of linear models and neural networks. This class protects data privacy under the assumption that gradients cannot reveal the original data; however, decision tree training involves the gradient statistics of a large number of sub-data sets, so applying this class of method to joint decision tree training leaks too much of the data owners' original information. The other class uses secure multi-party computation (MPC) to secret-share the data and then execute the corresponding training circuit, but the number of communication rounds required during training is too high.
As described above, whether horizontal federated learning is based on plaintext gradients or on secure multi-party computation, it has unavoidable deficiencies when training decision trees over multiple data sources.
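For concreteness, the gradient statistics at stake are the standard xgboost first- and second-order gradients (for logistic loss, g = p − y and h = p(1 − p), with p the predicted probability); split finding repeatedly sums them over candidate sub-data sets. The sketch below is illustrative only and is not taken from the patent; the function names and toy data are assumptions.

```python
# Illustrative only (not from the patent): the xgboost gradient statistics
# that a plaintext-gradient scheme would expose for every candidate sub-data set.
import math

def logistic_grad_hess(pred_margin, label):
    # Standard xgboost logistic-loss gradients: g = p - y, h = p * (1 - p).
    p = 1.0 / (1.0 + math.exp(-pred_margin))
    return p - label, p * (1.0 - p)

# Toy local samples at one tree node: (current prediction margin, label).
samples = [(0.0, 1), (0.3, 0), (-0.8, 1), (0.1, 0)]
grads = [logistic_grad_hess(m, y) for m, y in samples]

# Split finding needs the sums (G, H) over each candidate left/right subset;
# revealing these sums in plaintext, node after node, is the leakage described above.
G = sum(g for g, _ in grads)
H = sum(h for _, h in grads)
print(round(G, 4), round(H, 4))
```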

Method used




Embodiment Construction

[0047] Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present invention will be more thoroughly understood and its scope fully conveyed to those skilled in the art.

[0048] The horizontal federated decision tree constructed in the embodiments of the present invention means that multiple data owners jointly train a decision tree model while keeping each owner's original data private. Each data owner holds a data set with the same feature types and the same label type. Assuming that the samples in these data sets do not overlap, the goal of the horizontal federated decision tr...
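As a purely hypothetical illustration of this horizontal (sample-partitioned) setting, the sketch below shows two data owners holding disjoint samples over an identical feature/label schema; the column names and values are invented for the example and do not come from the patent.

```python
# Hypothetical illustration of the horizontal partition assumed in [0048]:
# both owners share the same feature/label schema but hold disjoint samples.
owner_a = {
    "sample_id": [1, 2, 3],
    "age":       [34, 51, 29],
    "income":    [52.0, 71.5, 38.2],
    "label":     [0, 1, 0],
}
owner_b = {
    "sample_id": [4, 5],            # no overlap with owner_a's sample_ids
    "age":       [45, 23],
    "income":    [64.3, 41.0],
    "label":     [1, 0],
}

assert list(owner_a) == list(owner_b)                             # identical schema
assert not set(owner_a["sample_id"]) & set(owner_b["sample_id"])  # disjoint samples
```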



Abstract

The invention discloses a training method for horizontal federated xgboost decision trees, relating to the technical fields of machine learning and federated learning decision trees. A horizontal federated decision tree scheme is constructed using threshold homomorphic encryption; this encryption technique strengthens privacy protection, reduces the number of communication rounds and the traffic volume, and improves the performance of the scheme when run over the Internet. The main technical scheme of the present invention is: a system structure comprising an aggregator and a plurality of participants is constructed in advance; threshold homomorphic encryption is then used to build the horizontal federated learning process, encrypting the gradient data exchanged between the aggregator and each participant during training; finally, the aggregated ciphertext data is jointly decrypted, yielding a jointly trained decision tree model while protecting the data privacy of all parties.
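The abstract gives no code; the following minimal sketch conveys the general idea of homomorphically aggregating per-bucket gradient sums at the aggregator. A toy, single-key Paillier cipher stands in for the patent's threshold homomorphic encryption (joint decryption is not modeled), and every function name, key size, and data value is an illustrative assumption.

```python
# Minimal sketch: homomorphic aggregation of per-bucket gradient sums.
# A toy single-key Paillier cipher stands in for the threshold homomorphic
# scheme of the patent; key size and all names/values are illustrative only.
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p=10007, q=10009):
    # Tiny demo primes; a real deployment needs >= 2048-bit moduli.
    n = p * q
    return {"n": n, "n2": n * n, "g": n + 1, "lam": lcm(p - 1, q - 1)}

def encrypt(pk, m):
    n, n2 = pk["n"], pk["n2"]
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(pk["g"], m % n, n2) * pow(r, n, n2)) % n2

def decrypt(pk, c):
    n, n2, lam = pk["n"], pk["n2"], pk["lam"]
    L = lambda x: (x - 1) // n
    mu = pow(L(pow(pk["g"], lam, n2)), -1, n)
    m = (L(pow(c, lam, n2)) * mu) % n
    return m if m < n // 2 else m - n        # map back to signed integers

def add_cipher(pk, c1, c2):
    # Homomorphic addition: the product of ciphertexts decrypts to the sum.
    return (c1 * c2) % pk["n2"]

SCALE = 1000  # fixed-point scaling so float gradients become integers

def encrypted_bucket_sums(pk, grads, bucket_ids, n_buckets):
    """A participant sums its gradients per feature bucket and encrypts
    the bucket totals before uploading them to the aggregator."""
    sums = [0.0] * n_buckets
    for g, b in zip(grads, bucket_ids):
        sums[b] += g
    return [encrypt(pk, int(round(s * SCALE))) for s in sums]

# Demo: two participants, one feature discretized into 3 buckets.
pk = keygen()
enc_a = encrypted_bucket_sums(pk, [0.2, -0.5, 0.1], [0, 1, 1], 3)
enc_b = encrypted_bucket_sums(pk, [0.4, -0.3], [1, 2], 3)

# The aggregator combines ciphertexts without seeing any plaintext gradient.
aggregated = [add_cipher(pk, a, b) for a, b in zip(enc_a, enc_b)]

print([decrypt(pk, c) / SCALE for c in aggregated])   # -> [0.2, 0.0, -0.3]
```

In the patented scheme the decryption step would instead be performed jointly by the participants under a threshold key, so no single party ever holds the full private key.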

Description

technical field
[0001] The invention relates to the technical fields of federated learning and machine learning decision trees, and in particular to a training method for horizontal federated xgboost decision trees.
Background technique
[0002] When data is scattered across different data sources and multiple parties need to jointly train an xgboost decision tree to improve model accuracy, the data owners would ordinarily pool their data and then perform conventional decision tree training; trained in this way, on plaintext data, the data cannot be effectively protected. When data privacy and confidentiality are involved, privacy-preserving computation techniques can be used to complete joint decision tree training over multi-party data; for example, by drawing on the concept of federated learning, multiple data owners can jointly train a machine learning model while protecting their own data privacy.
[0003] At present, among the existing horiz...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06N20/00; H04L9/00
CPC: G06N20/00; H04L9/008
Inventor: 李登峰
Owner: 神州融安数字科技(北京)有限公司