Method for predicting mutagenicity of chemicals through machine learning algorithm

A machine learning method for predicting the mutagenicity of chemicals, applied in the field of ecological risk assessment testing strategies. It addresses biased mutagenicity predictions, training data that do not cover assays for all types of mutagenesis, and one-sided prediction results. It achieves low-cost, efficient prediction with a clearly defined application domain.

Active Publication Date: 2021-03-09
DALIAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

The deficiencies of previous approaches are mainly reflected in two aspects. First, when previous studies used computational methods to predict the mutagenicity of chemicals, the training data used for modeling mostly came from a single assay or a combination of two assays, and did not cover assays that detect all types of mutagenesis. As a result, a prediction indicated only whether a chemical is mutagenic to DNA or only whether it is mutagenic to chromosomes, so the prediction results were one-sided.
Second, most previously established prediction models did not characterize their application domain, which led to large deviations when the models were used to predict the mutagenicity of some chemicals.



Examples

Embodiment 1

[0038] Given the compound dinitrosocaffeine (CAS number: 145438-97-7), its mutagenicity is predicted as follows. First, its molecular fingerprint is calculated from its SMILES code using the RDKit software package. Then its similarity to each molecule in the training set is calculated. Five molecules in the training set have a similarity greater than 0.25, so the compound is within the application domain. Based on its molecular fingerprint, a prediction is made with the GBDT model. The result is 1, indicating that the compound is mutagenic. The prediction agrees with the experimental result.
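
The application-domain check in this embodiment can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the text does not name the similarity metric, so Tanimoto similarity is assumed; fingerprints are represented as plain sets of on-bit indices rather than RDKit objects; and `min_neighbors` (how many similar training molecules are required) is a hypothetical parameter, since the text only reports how many molecules exceeded the 0.25 threshold.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def count_neighbors(query_fp, training_fps, threshold=0.25):
    """Count training molecules whose similarity to the query exceeds the threshold."""
    return sum(1 for fp in training_fps if tanimoto(query_fp, fp) > threshold)


def in_application_domain(query_fp, training_fps, threshold=0.25, min_neighbors=1):
    """Assumed rule: in-domain if at least `min_neighbors` training molecules are similar."""
    return count_neighbors(query_fp, training_fps, threshold) >= min_neighbors


# Toy fingerprints for illustration only; they do not encode real molecules.
training_fps = [{1, 2, 3, 4}, {2, 3, 9}, {10, 11, 12}]
query_fp = {1, 2, 3, 8}

print(count_neighbors(query_fp, training_fps))
print(in_application_domain(query_fp, training_fps))
```

In practice the fingerprints would come from a cheminformatics toolkit such as RDKit, and the similarity metric and neighbor requirement would be whatever was used when the model was built.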

Embodiment 2

[0040] Given the compound p-anisidine (CAS number: 104-94-9), its mutagenicity is predicted as follows. First, its molecular fingerprint is calculated from its SMILES code using the RDKit software package. Then its similarity to each molecule in the training set is calculated. There are 267 molecules in the training set with a similarity greater than 0.25, so the compound is within the application domain. Based on its molecular fingerprint, a prediction is made with the GBDT model. The result is 1, indicating that the compound is mutagenic. The prediction agrees with the experimental result.

Embodiment 3

[0042] Given the compound 10,10-dimethylundecane-1-amine (CAS number: 68955-53-3), its mutagenicity is predicted as follows. First, its molecular fingerprint is calculated from its SMILES code using the RDKit software package. Then its similarity to each molecule in the training set is calculated. There are 91 molecules in the training set with a similarity greater than 0.25, so the compound is within the application domain. Based on its molecular fingerprint, a prediction is made with the GBDT model. The result is 0, indicating that the compound is not mutagenic. The prediction agrees with the experimental result.
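
All three embodiments follow the same two-stage workflow: check the application domain, then run the classifier. The control flow can be sketched as below. The names `predict_mutagenicity`, `similarities`, and `model` are hypothetical; `model` stands in for the trained GBDT classifier and is replaced here by a toy rule, and requiring at least one similar training molecule is an assumption, not something the text specifies.

```python
def predict_mutagenicity(fingerprint, similarities, model,
                         threshold=0.25, min_neighbors=1):
    """Return 1 (mutagenic), 0 (not mutagenic), or None if out of domain.

    `similarities` holds the query's precomputed similarity to every
    training molecule. `model` is any callable mapping a fingerprint to
    0 or 1 (a stand-in for the trained GBDT model).
    """
    neighbors = sum(1 for s in similarities if s > threshold)
    if neighbors < min_neighbors:
        return None  # outside the application domain: make no prediction
    return model(fingerprint)


def toy_model(fingerprint):
    # Placeholder rule for illustration only; not a real mutagenicity model.
    return 1 if 7 in fingerprint else 0


in_domain = predict_mutagenicity({3, 7}, [0.10, 0.31, 0.42], toy_model)
out_of_domain = predict_mutagenicity({3, 7}, [0.05, 0.20], toy_model)
print(in_domain, out_of_domain)
```

Returning `None` for out-of-domain queries keeps an unreliable prediction from being mistaken for a negative result.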


Abstract

The invention belongs to the field of ecological risk assessment testing strategies and discloses a method for predicting the mutagenicity of chemicals with a machine learning algorithm. Given the known molecular structure of a compound, its mutagenicity can be predicted quickly and efficiently by calculating its molecular fingerprint and applying the constructed method. The method is simple, fast, and low in cost, and saves the manpower, materials, and funds required for experimental testing. The method is constructed in the following steps: collecting chemical mutagenicity data; preprocessing the data; calculating molecular fingerprints; selecting a machine learning algorithm and training a model; evaluating the model with indexes such as accuracy; and characterizing the application domain. Once constructed, the method takes a molecule to be tested as input and outputs its mutagenicity. The prediction model established by the method has good fitting capability, robustness, and predictive ability, can effectively predict the mutagenicity of chemicals, and provides basic data needed for the risk assessment and management of chemicals.
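
The model evaluation step can be illustrated with a small worked example. The abstract names accuracy as one index; sensitivity and specificity are added here as commonly used companions and are an assumption, as are the function name `evaluate` and the toy labels.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels (1 = mutagenic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else None,
        # Fraction of truly mutagenic compounds that were predicted mutagenic.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        # Fraction of truly non-mutagenic compounds that were predicted non-mutagenic.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Toy labels for illustration only; these are not results from the patent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy shows whether errors fall mainly on mutagenic or non-mutagenic compounds, which accuracy alone can hide when the classes are unbalanced.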

Description

Technical field

[0001] The invention relates to a method for predicting the mutagenicity of chemicals by establishing a QSAR model, and belongs to the field of ecological risk assessment testing strategies.

Background art

[0002] Mutagenicity refers to the ability to induce genetic damage. Genetic material can be changed in several ways: gene mutation, chromosomal aberration, and change in chromosome number. Gene mutation refers to a sudden, heritable change in genomic DNA molecules, including substitutions of base pairs as well as additions or deletions of base pairs. Chromosomal aberrations are changes in the structure of chromosomes. Mutagens can therefore be divided into two categories: those that act directly on DNA, and those that act on chromosome replication or division.

[0003] There are different mutagenicity detection methods for mutagens with different modes of action. For mutagens acting on ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G16B20/50, G16B5/00, G06N20/00, G06K9/62
CPC: G16B20/50, G16B5/00, G06N20/00, G06F18/22
Inventors: 陈景文, 吴思甜
Owner: DALIAN UNIV OF TECH