Protein degradation targeting chimera linker generation method based on deep reinforcement learning

A technology combining reinforcement learning and protein degradation, applied in the fields of artificial intelligence and protein degradation targeting chimera (PROTAC) design. It addresses the problems that existing linker generators generalize poorly, cannot generate molecules with desired properties, and cannot be used for PROTAC design, and it improves robustness.

Pending Publication Date: 2022-03-11
SUN YAT SEN UNIV

AI Technical Summary

Problems solved by technology

[0006] Although these two deep generative models achieve good results in generating linkers after training on the ChEMBL and ZINC data sets, both generalize poorly to the generation of PROTAC linkers. The SMILES strings generated by SyntaLinker on the PROTAC data set have very low validity, and it fails to produce correct linkers at all. DeLinker, which is trained in a graph-based manner, guarantees validity, but the structures of the linkers it generates differ considerably from real linkers, so it cannot be applied to real PROTAC design. Moreover, both models learn the molecular distribution of their training set, so the properties of the generated linkers can only approach those of the training set; neither can generate molecules with arbitrary desired properties, and therefore neither can address problems such as the oral absorption and bioavailability of PROTACs.



Examples


Embodiment 1

[0053] As shown in Figure 1, a method for generating protein degradation targeting chimera linkers based on deep reinforcement learning includes the following steps:

[0054] Construct the ZINC data set as follows: download the SMILES of molecules with a molecular weight greater than 500 from the ZINC database, and cut the downloaded molecules with the matched molecular pair (MMP) cutting algorithm to obtain quadruples, each consisting of the first fragment SMILES with its cut site, the linker SMILES, the second fragment SMILES with its cut site, and the whole-molecule SMILES with the cut sites marked. If cutting the same molecule yields multiple quadruples, only the quadruple with the longest linker is kept.
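
A minimal sketch of the deduplication rule in [0054], assuming the MMP cutting itself has already been done (for example with a cheminformatics toolkit such as RDKit) and that cut sites are written as dummy atoms like `[*:1]`. "Longest linker" is measured here as an approximate heavy-atom count from a simple regex tokenizer; that measure, the `Quadruple` type, and both function names are illustrative assumptions, not definitions from the patent.

```python
import re
from collections import namedtuple
from typing import Dict, List

# One MMP cut: first fragment, linker, second fragment, whole molecule.
# Fragment and linker SMILES mark cut sites with dummy atoms such as [*:1].
Quadruple = namedtuple("Quadruple", "frag1 linker frag2 whole")

# Hypothetical tokenizer: bracket atoms, two-letter halogens, then
# single-letter organic-subset atoms (uppercase aliphatic, lowercase aromatic).
_ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnops]")


def heavy_atom_count(smiles: str) -> int:
    """Approximate heavy-atom count: atom tokens minus dummy atoms like [*:1]."""
    return sum(1 for tok in _ATOM.findall(smiles) if not tok.startswith("[*"))


def keep_longest_linker(cuts_by_molecule: Dict[str, List[Quadruple]]) -> Dict[str, Quadruple]:
    """For each source molecule, keep only the quadruple with the longest linker."""
    return {
        mol: max(quads, key=lambda q: heavy_atom_count(q.linker))
        for mol, quads in cuts_by_molecule.items()
        if quads  # molecules that produced no cut are dropped
    }


# Usage: two alternative cuts of one molecule (whole field left uncut for brevity).
mol = "c1ccccc1CCOCCN1CCCC1"
cuts = {
    mol: [
        Quadruple("c1ccccc1[*:1]", "[*:1]CCOCC[*:2]", "[*:2]N1CCCC1", mol),
        Quadruple("c1ccccc1CC[*:1]", "[*:1]O[*:2]", "[*:2]CCN1CCCC1", mol),
    ]
}
print(keep_longest_linker(cuts)[mol].linker)  # [*:1]CCOCC[*:2]
```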

[0055] Next, PROTAC-like molecules are screened from the ZINC database according to the properties of PROTAC warheads, linkers, E3 ligase ligands, and complete PROTACs, yielding a ZINC data set of about 220,000 PROTAC-like molecules.
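
The screening in [0055] can be pictured as a filter in which every property range must pass. The sketch below assumes the properties have already been computed for the warhead-like fragment, the linker, the E3-ligand-like fragment, and the whole molecule. The patent's actual property list and thresholds are not reproduced in this excerpt, so every property name and range in the usage example is hypothetical.

```python
from typing import Dict, List, Mapping, Tuple

Range = Tuple[float, float]                  # inclusive (low, high)
Props = Mapping[str, Mapping[str, float]]    # component -> property -> value
Limits = Mapping[str, Mapping[str, Range]]   # component -> property -> range


def is_protac_like(props: Props, limits: Limits) -> bool:
    """True only if every constrained property of every component is in range.

    A missing component or property counts as a failure.
    """
    for component, ranges in limits.items():
        values = props.get(component, {})
        for name, (low, high) in ranges.items():
            if name not in values or not (low <= values[name] <= high):
                return False
    return True


def screen(molecules: Dict[str, Props], limits: Limits) -> List[str]:
    """Return the keys (e.g. SMILES) of molecules that pass every range check."""
    return [key for key, props in molecules.items() if is_protac_like(props, limits)]


# Hypothetical ranges and precomputed properties, for illustration only.
limits = {
    "linker": {"heavy_atoms": (2, 20)},
    "whole": {"mol_weight": (500.0, 1200.0)},
}
molecules = {
    "mol_a": {"linker": {"heavy_atoms": 7}, "whole": {"mol_weight": 812.4}},
    "mol_b": {"linker": {"heavy_atoms": 1}, "whole": {"mol_weight": 640.0}},
    "mol_c": {"linker": {"heavy_atoms": 9}},  # whole-molecule properties missing
}
print(screen(molecules, limits))  # ['mol_a']
```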

[0056] The properties of the...

Embodiment 2

[0084] A computer system includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the following method steps:

[0085] Expand the constructed ZINC data set and PROTAC data set with a data augmentation method, and use the expanded first ZINC data set and first PROTAC data set as the supervised training set;
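
[0085] does not say which augmentation is used. A common choice for SMILES models is randomized (non-canonical) SMILES enumeration, which needs a cheminformatics toolkit. The dependency-free sketch below instead shows, as an assumption, another simple label-preserving augmentation: presenting each fragment pair in both orders while keeping the same target linker. The dot-joined source format and the function name are hypothetical.

```python
from typing import Iterable, List, Tuple

Example = Tuple[str, str]  # (source: dot-joined fragment pair, target: linker)


def augment_fragment_order(triples: Iterable[Tuple[str, str, str]]) -> List[Example]:
    """Emit each (frag1, linker, frag2) example once per fragment order.

    The cut-site labels ([*:1], [*:2]) travel with their fragments, so the
    target linker is unchanged when the two fragments swap places.
    """
    examples: List[Example] = []
    seen = set()
    for frag1, linker, frag2 in triples:
        for source in (f"{frag1}.{frag2}", f"{frag2}.{frag1}"):
            if (source, linker) not in seen:  # skip exact duplicates (e.g. identical fragments)
                seen.add((source, linker))
                examples.append((source, linker))
    return examples


data = [("c1ccccc1[*:1]", "[*:1]CCOCC[*:2]", "[*:2]N1CCCC1")]
for src, tgt in augment_fragment_order(data):
    print(src, "->", tgt)
# c1ccccc1[*:1].[*:2]N1CCCC1 -> [*:1]CCOCC[*:2]
# [*:2]N1CCCC1.c1ccccc1[*:1] -> [*:1]CCOCC[*:2]
```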

[0086] Build a Transformer model and set the number of training steps, number of network layers, number of attention layers and optimizer parameters;
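
[0086] lists the hyperparameters to set but not their values. The sketch below groups them in a config object and pairs them with the warm-up and inverse-square-root learning-rate schedule from the original Transformer paper (Vaswani et al., 2017), a common optimizer setting for SMILES-to-SMILES Transformers. All default values, the reading of "number of attention layers" as attention heads per layer, and the use of this schedule are assumptions, not parameters taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical hyperparameters; the patent does not give concrete values."""
    train_steps: int = 200_000
    num_layers: int = 6          # encoder and decoder layers
    num_heads: int = 8           # attention heads per layer (assumed reading)
    d_model: int = 256
    warmup_steps: int = 8_000
    adam_betas: tuple = (0.9, 0.998)
    adam_eps: float = 1e-9
    lr_factor: float = 2.0

    def __post_init__(self):
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.num_heads:
            raise ValueError("d_model must be divisible by num_heads")


def noam_lr(step: int, cfg: TransformerConfig) -> float:
    """Linear warm-up for cfg.warmup_steps, then decay proportional to step ** -0.5."""
    step = max(step, 1)  # the formula is undefined at step 0
    return cfg.lr_factor * cfg.d_model ** -0.5 * min(
        step ** -0.5, step * cfg.warmup_steps ** -1.5
    )


cfg = TransformerConfig()
for s in (1_000, cfg.warmup_steps, 4 * cfg.warmup_steps):
    print(s, f"{noam_lr(s, cfg):.2e}")
```

The learning rate peaks at `warmup_steps` and then decays, which is why the warm-up length is usually tuned together with the total number of training steps.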

[0087] Use the first ZINC data set as input to train the Transformer model, continuously updating the parameter values of the Transformer model with the objective function and the backpropagation algorithm;
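
The objective function in [0087] is not spelled out in this excerpt. For sequence-to-sequence Transformers it is typically the token-level negative log-likelihood of the target (linker) SMILES under teacher forcing; the sketch below computes that loss from per-position logits, assuming that objective. In practice a deep learning framework's automatic differentiation performs the backpropagation; the gradient of this loss with respect to one position's logits is `softmax(logits) - one_hot(target)`.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_nll(
    step_logits: Sequence[Sequence[float]],
    target_ids: Sequence[int],
    pad_id: int = 0,
) -> float:
    """Teacher-forced negative log-likelihood of one target token sequence.

    step_logits[t] holds the decoder's vocabulary logits at position t, computed
    from the source fragment pair and the gold target tokens before t.
    Padding positions (target == pad_id) contribute nothing.
    """
    nll = 0.0
    for logits, tok in zip(step_logits, target_ids):
        if tok != pad_id:
            nll -= log_softmax(logits)[tok]
    return nll


# Uniform logits over a 4-token vocabulary: each real token costs log(4).
uniform = [[0.0, 0.0, 0.0, 0.0]] * 3
print(round(sequence_nll(uniform, [1, 2, 0]), 4))  # 2.7726
```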

[0088] Input the first PROTAC data set into the updated Transformer model, continue to use the above-mentioned objective function and the above-mentioned backpropagation algorithm to tra...

Embodiment 3

[0092] A computer-readable storage medium stores a computer program that, when executed by a processor, implements the following method steps:

[0093] Expand the constructed ZINC data set and PROTAC data set with a data augmentation method, and use the expanded first ZINC data set and first PROTAC data set as the supervised training set;

[0094] Build a Transformer model and set the number of training steps, number of network layers, number of attention layers and optimizer parameters;

[0095] Input the first ZINC data set into the Transformer model and train it with the objective function and the backpropagation algorithm, continuously updating the parameter values of the Transformer model;

[0096] Input the first PROTAC data set into the updated Transformer model, continue to use the above-mentioned objective function and the above-mentioned backpropagation algorithm to train, further update the network weight, transfer the Tr...



Abstract

The invention discloses a method for generating protein degradation targeting chimera (PROTAC) linkers based on deep reinforcement learning, comprising the following steps: expanding a constructed ZINC data set and PROTAC data set with a data augmentation method, and taking the resulting first ZINC data set and first PROTAC data set as a supervised training set; constructing a Transformer model and setting the number of training steps, the number of network layers, the number of attention layers, and the optimizer parameters; training the Transformer model on the first ZINC data set, updating it with an objective function and the backpropagation algorithm; further training the updated Transformer model on the first PROTAC data set, thereby transferring it to the PROTAC target domain and obtaining a Prior model; inputting fragment-pair SMILES into the Prior model for batch generation, scoring the generated PROTACs with a scoring function, and updating an Agent model with a reinforcement learning policy gradient algorithm; repeating until the PROTAC score no longer increases or the set number of training steps is reached; and using the updated Agent model to generate, at scale and for given fragment pairs, protein degradation targeting chimera linkers that conform to the expected properties.
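
The reinforcement learning loop described in the abstract can be illustrated with REINFORCE, the basic policy gradient estimator, using the batch-mean score as a baseline. The abstract does not state which policy gradient variant is used, so this choice is an assumption. To stay self-contained, the sketch replaces the Transformer with a toy policy (a softmax over a fixed list of candidate linkers for one fragment pair), and the scoring function `oxygen_fraction` is purely hypothetical. It keeps the described structure: the Agent starts as a copy of the Prior, samples a batch, scores it, takes a policy gradient step, and stops when the mean score has not improved for `patience` steps or `max_steps` is reached.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(
    candidates: Sequence[str],
    prior_logits: Sequence[float],
    score: Callable[[str], float],
    batch_size: int = 64,
    lr: float = 0.5,
    max_steps: int = 500,
    patience: int = 20,
    seed: int = 0,
) -> Tuple[List[float], int]:
    """Fine-tune a toy Agent policy with REINFORCE; return (logits, steps run)."""
    rng = random.Random(seed)
    agent = list(prior_logits)  # the Agent is initialized from the Prior
    best, stale, steps = float("-inf"), 0, 0
    while steps < max_steps:
        steps += 1
        probs = softmax(agent)
        batch = rng.choices(range(len(candidates)), weights=probs, k=batch_size)
        rewards = [score(candidates[i]) for i in batch]
        baseline = sum(rewards) / batch_size  # mean batch score
        grad = [0.0] * len(agent)
        for i, r in zip(batch, rewards):
            advantage = r - baseline
            # d log p(i) / d logit_j = 1[i == j] - p_j for a softmax policy
            for j, p in enumerate(probs):
                grad[j] += advantage * ((1.0 if i == j else 0.0) - p)
        # Gradient ascent on the expected score
        agent = [a + lr * g / batch_size for a, g in zip(agent, grad)]
        # Stop once the mean score has not increased for `patience` steps
        if baseline > best + 1e-9:
            best, stale = baseline, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return agent, steps


def oxygen_fraction(linker: str) -> float:
    """Hypothetical score: share of 'O' characters in the linker string."""
    return linker.count("O") / len(linker)


linkers = ["CCOCC", "CCCCCCCCCC", "OCCOCCOCCO"]
agent, steps = reinforce(linkers, [0.0, 0.0, 0.0], oxygen_fraction)
for smi, p in zip(linkers, softmax(agent)):
    print(f"{smi:>12s}  p={p:.3f}")
print("steps run:", steps)
```

After training, sampling from `softmax(agent)` stands in for batch generation of linkers for the given fragment pair. Practical implementations often also penalize the Agent for drifting too far from the Prior; this sketch omits that term.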

Description

Technical field

[0001] The present invention relates to the technical fields of artificial intelligence and protein degradation targeting chimera design, and more specifically to a method for generating protein degradation targeting chimera linkers based on deep reinforcement learning.

Background art

[0002] Protein degradation targeting chimeras (PROTACs), bifunctional small molecules capable of inducing the degradation of proteins of interest (POIs), have attracted intense interest in drug discovery in recent years. A PROTAC consists of three main parts: a ligand that targets the POI (the warhead), a ligand that recruits an E3 ubiquitin ligase, and a linker connecting the two. Owing to this bifunctional structure, a PROTAC can bind the POI and the E3 ligase simultaneously to form a ternary complex, causing the POI to be tagged with ubiquitin and then delivered to the ubiquitin-proteasome system for degradation by the 26S proteasome. This process is different from the occupancy-driven approach of trad...


Application Information

IPC(8): G16B40/20; G16B50/30; G16B15/30; G06N3/08; G06N3/04
CPC: G16B40/20; G16B50/30; G16B15/30; G06N3/084; G06N3/047; G06N3/045
Inventors: Yang Yuedong (杨跃东), Tan Youhai (谭游海), Zheng Shuangjia (郑双佳), Dai Lingxue (戴凌雪)
Owner: SUN YAT SEN UNIV