Graph-based large-scale embedding model training method and system for click rate prediction

A model training and click-through rate prediction technology, applied in the fields of prediction, neural learning methods, and biological neural network models. It addresses the problems that existing click-through rate prediction techniques cannot be applied to deep learning models or embedding model training and incur expensive network communication overhead. It achieves reduced communication overhead, good locality and load balancing, and good scalability.

Active Publication Date: 2022-04-15
PEKING UNIV

AI Technical Summary

Problems solved by technology

[0009] In summary, existing click-through rate prediction techniques cannot be applied to deep learning models and incur expensive network communication overhead in large-scale distributed training; existing graph processing algorithms are not suited to training the embedding models used for click-through rate prediction; and existing traditional consensus protocols and training systems do not take into account the update dependencies between embeddings, which leads to high overhead and low efficiency.



Examples


Embodiment Construction

[0038] The present invention is further described below through embodiments, with reference to the accompanying drawings, without limiting its scope in any way.

[0039] The present invention provides a graph-based large-scale embedding model training method and system for click-through rate prediction. It designs a new graph-based system method and proposes a new bipartite graph representation to manage input data and embedding parameters, which improves the scalability of training large embedding models.
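As a minimal sketch of the graph representation described above, the following Python builds a two-sided graph with input samples on one side and embedding vectors on the other, linking each sample to the embeddings of its categorical feature values. The sample format, the feature names such as `user:1`, and the helper names `build_sample_embedding_graph` and `embedding_degrees` are illustrative assumptions, not taken from the patent text.

```python
from collections import defaultdict


def build_sample_embedding_graph(samples):
    """Build adjacency lists for a sample/embedding graph.

    `samples` is assumed to be a list of lists of categorical feature
    values; each distinct value stands for one embedding vector.
    """
    sample_to_emb = {}
    emb_to_samples = defaultdict(list)
    for sample_id, features in enumerate(samples):
        unique = sorted(set(features))
        sample_to_emb[sample_id] = unique
        for feature in unique:
            emb_to_samples[feature].append(sample_id)
    return sample_to_emb, dict(emb_to_samples)


def embedding_degrees(emb_to_samples):
    """Degree of an embedding vertex = number of samples that use it."""
    return {emb: len(ids) for emb, ids in emb_to_samples.items()}


# Hypothetical toy data: three click records with a user and an item field.
samples = [["user:1", "item:7"], ["user:1", "item:9"], ["user:2", "item:7"]]
sample_to_emb, emb_to_samples = build_sample_embedding_graph(samples)
degrees = embedding_degrees(emb_to_samples)
```

The degree table makes skew visible: frequently used embeddings have many incident samples, which is the kind of structure a partitioner can exploit.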

[0040] The newly constructed bipartite graph must be partitioned so as to reduce embedding and gradient communication between different worker nodes while keeping the workload as balanced as possible. To reduce communication overhead and balance the workload, the present invention designs a hybrid graph partition mechanism based on the embedding model; and the vertex partition method used by the hybrid graph p...
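The description of the hybrid partition mechanism is cut off in this listing, so the following is not the patented method. It is a generic greedy heuristic, sketched under stated assumptions, that illustrates the two goals named above: place each sample on the worker that already owns most of its embeddings (less communication) while capping the number of samples per worker (balanced load). The names `greedy_partition`, `remote_accesses`, and the `slack` parameter are hypothetical.

```python
import math


def greedy_partition(samples, num_workers, slack=1.0):
    """Assign samples and embeddings to workers with a greedy heuristic.

    Each sample goes to the non-full worker that owns the most of its
    embeddings; ties go to the less loaded worker. An embedding is owned
    by the first worker whose sample touches it. `slack` should be >= 1.
    """
    capacity = math.ceil(len(samples) / num_workers * slack)
    load = [0] * num_workers
    emb_owner = {}
    sample_owner = []
    for features in samples:
        best_worker, best_key = None, None
        for worker in range(num_workers):
            if load[worker] >= capacity:
                continue  # keep the workload balanced
            local = sum(1 for f in features if emb_owner.get(f) == worker)
            key = (local, -load[worker])
            if best_key is None or key > best_key:
                best_worker, best_key = worker, key
        sample_owner.append(best_worker)
        load[best_worker] += 1
        for f in features:
            emb_owner.setdefault(f, best_worker)
    return sample_owner, emb_owner


def remote_accesses(samples, sample_owner, emb_owner):
    """Count sample-to-embedding edges that cross worker boundaries."""
    return sum(
        1
        for features, worker in zip(samples, sample_owner)
        for f in features
        if emb_owner[f] != worker
    )


# Hypothetical toy data: two groups of samples with disjoint embeddings.
samples = [["u1", "i1"], ["u1", "i2"], ["u2", "i3"], ["u2", "i3"]]
sample_owner, emb_owner = greedy_partition(samples, num_workers=2)
cut = remote_accesses(samples, sample_owner, emb_owner)
```

The number of cut edges is a rough proxy for the embedding and gradient traffic the text aims to reduce; a real partitioner would also have to account for embedding sizes and access frequencies.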



Abstract

The invention discloses a graph-based large-scale embedding model training method and system for click-through rate prediction. The system comprises a dense parameter module and a client module and adopts a hybrid communication architecture. The click-through rate prediction input data set is distributed to different worker nodes; each worker node maintains a client, and the client module is used to provide the input data set. All local model parameters are stored directly in GPU memory, and each worker node holds a copy of the model parameters, which is synchronized during training. In the method, embedding model parameters represent the importance of the categorical feature values in the click-through rate prediction input data; the input data and the embedding vectors are represented as a bipartite graph model, and model-parallel training is executed by exploiting the locality and skewed degree distribution of the graph. Graph-based partitioning and bounded synchronization are designed so that the scalability and parallel computing efficiency of training a large embedding model are improved.
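The abstract names bounded synchronization without stating the rule in this listing. The sketch below shows one common staleness-bounded formulation, offered as an assumption rather than the patented protocol: a worker may keep using its cached copy of an embedding as long as the copy lags the primary by at most a fixed number of versions. `CachedEmbedding`, `read_embedding`, and `staleness_bound` are hypothetical names.

```python
class CachedEmbedding:
    """A worker-local copy of an embedding plus the version it was read at."""

    def __init__(self, vector, version):
        self.vector = list(vector)
        self.version = version


def read_embedding(cache, primary, key, staleness_bound):
    """Return (vector, fetched) for `key`.

    `primary` maps key -> (vector, version) and stands in for the owning
    worker. In a real system only the version check would need a cheap
    message; here it is a dictionary lookup for illustration.
    """
    primary_vector, primary_version = primary[key]
    entry = cache.get(key)
    if entry is not None and primary_version - entry.version <= staleness_bound:
        return entry.vector, False  # cached copy is fresh enough
    cache[key] = CachedEmbedding(primary_vector, primary_version)
    return cache[key].vector, True  # copy was refreshed from the primary


# Hypothetical state: the primary is at version 5, the cache at version 3.
primary = {"item:7": ([0.1, 0.2], 5)}
cache = {"item:7": CachedEmbedding([0.0, 0.0], 3)}
vec_loose, fetched_loose = read_embedding(cache, primary, "item:7", staleness_bound=2)
vec_tight, fetched_tight = read_embedding(cache, primary, "item:7", staleness_bound=1)
```

A larger bound trades freshness for fewer transfers; a bound of zero forces a refresh whenever the primary has advanced.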

Description

Technical field

[0001] The invention belongs to the technical field of distributed machine learning and relates to a large-scale embedding model training method and system, in particular to a graph-based large-scale embedding model training method and system for click-through rate prediction.

Background

[0002] Embeddings are often used to tackle representation learning problems on high-dimensional data, such as words in text corpora, or users and items in recommender systems. Deep embedding technology uses continuous vectors to represent discrete variables and has many practical applications, such as click-through rate (CTR) prediction systems, graph processing, and information extraction. However, as deep embedding models grow in scale and the amount of input data increases, building a large embedding model training system that is both effective and efficient becomes more challenging. For example, Facebook's p...
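To make the idea of representing discrete variables with continuous vectors concrete, here is a minimal sketch of an embedding table and lookup. The vocabulary, the dimension, the initialization range, and the names `make_embedding_table` and `lookup` are illustrative assumptions; a real system would hold the table in a tensor and update it by gradient descent.

```python
import random


def make_embedding_table(vocabulary, dim, seed=0):
    """Map each discrete value to a small random vector of length `dim`."""
    rng = random.Random(seed)
    return {
        token: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for token in vocabulary
    }


def lookup(table, tokens):
    """Replace each discrete value with its continuous vector."""
    return [table[token] for token in tokens]


# Hypothetical categorical values from a click record.
table = make_embedding_table(["user:1", "user:2", "item:7"], dim=4)
vectors = lookup(table, ["user:1", "item:7"])
```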


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06Q30/02, G06Q10/04, G06N3/04, G06N3/08
Inventors: 崔斌, 苗旭鹏, 梁宇轩, 石屹宁, 张海林
Owner: PEKING UNIV