
QSPR method and system for constructing interpretable XGBoost regression model to predict PCE based on SHAP value

A regression model technology applied to the prediction of power conversion efficiency. It addresses the difficulty of understanding the internal principles of machine learning models, and it shortens R&D time, delivers high performance, and reduces R&D costs.

Pending Publication Date: 2021-12-17
SHANGHAI UNIV
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

The internal principles of these machine learning models are difficult to understand.



Examples


Embodiment 1

[0041] In this embodiment (see Figure 1), a QSPR method for constructing an interpretable XGBoost regression model to predict PCE based on SHAP values includes the following steps:

[0042] 1) Use a computer system to search the literature and extract the structures of N-P dye molecules, the electrolyte conditions, and the corresponding PCE values;

[0043] 2) Divide each collected dye molecule into three fragments. The first two fragments, Ds (Donor Space) and Dc (Donor Core), are donors bearing electron-pushing groups; the last fragment, A (Acceptor), is an acceptor bearing an electron-withdrawing group. This division prepares for the subsequent interpretation of fragment effects and for high-throughput screening;
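To make the three-fragment division concrete, the following minimal Python sketch represents one fragmented dye as a record with Ds, Dc, and A slots and maps each slot to its donor or acceptor role as stated in step 2). The class name `FragmentedDye`, the helper `role_of`, and placeholder identifiers such as "Ds-1" are hypothetical illustrations, not part of the disclosed method.

```python
from dataclasses import dataclass

# Roles as stated in step 2): Ds and Dc carry electron-pushing groups (donors),
# A carries an electron-withdrawing group (acceptor).
FRAGMENT_ROLES = {"Ds": "donor", "Dc": "donor", "A": "acceptor"}


def role_of(position: str) -> str:
    """Return 'donor' or 'acceptor' for a fragment position (Ds, Dc or A)."""
    return FRAGMENT_ROLES[position]


@dataclass(frozen=True)
class FragmentedDye:
    """One N-P dye split into Ds, Dc and A fragments (hypothetical record layout)."""

    ds: str  # identifier of the Ds fragment, e.g. a label or a structure string
    dc: str  # identifier of the Dc fragment
    a: str  # identifier of the A fragment

    def fragments(self) -> dict:
        """Map each fragment position to its identifier, in Ds-Dc-A order."""
        return {"Ds": self.ds, "Dc": self.dc, "A": self.a}


dye = FragmentedDye(ds="Ds-1", dc="Dc-1", a="A-1")
for position, fragment in dye.fragments().items():
    print(position, fragment, role_of(position))
```

A frozen dataclass is used so that fragmented dyes are hashable and can be placed in sets, which is convenient when checking for duplicates or enumerating combinations.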

[0044] 3) Use ChemDraw to draw the fragment structures of the dye molecule, optimize the molecule with MM2 Minimize Energy in Chem3D, and then use the Dragon software to generate descriptors; optimize the molecule thr...

Embodiment 2

[0055] This embodiment is basically the same as Embodiment 1, with the following special features:

[0056] In this embodiment, in step 1), the data samples obtained from the literature search are preprocessed: the molecular structure, electrolyte conditions, and PCE of each sample are collated, and the number of samples is determined.
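The preprocessing in step 1) could look roughly like the sketch below, assuming each literature record is a Python dict with `structure`, `electrolyte`, and `pce` keys (hypothetical field names). Dropping incomplete records and treating the same structure under the same electrolyte as a duplicate are assumptions added for illustration; the text only states that the samples are collated and counted. The example records are placeholders, not data from the patent.

```python
REQUIRED_FIELDS = ("structure", "electrolyte", "pce")


def preprocess(records):
    """Collate literature records; return the cleaned samples and their count."""
    cleaned, seen = [], set()
    for rec in records:
        # Assumption: skip records missing any field the model needs.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Assumption: the same structure under the same electrolyte is a duplicate.
        key = (rec["structure"], rec["electrolyte"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({
            "structure": rec["structure"],
            "electrolyte": rec["electrolyte"],
            "pce": float(rec["pce"]),  # PCE values may arrive as strings
        })
    return cleaned, len(cleaned)


# Placeholder records for illustration only.
raw = [
    {"structure": "dye-1", "electrolyte": "electrolyte-A", "pce": 7.1},
    {"structure": "dye-1", "electrolyte": "electrolyte-A", "pce": 7.1},
    {"structure": "dye-2", "electrolyte": "electrolyte-B", "pce": None},
    {"structure": "dye-3", "electrolyte": "electrolyte-A", "pce": "8.4"},
]
samples, n_samples = preprocess(raw)
print(n_samples)
```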

[0057] In this embodiment, in steps 6) and 7), TreeSHAP is used to compute, for each feature variable, the SHAP value of that variable in every sample, and the average is taken as the importance of the variable, which gives a global explanation. Variables are screened with the SHAP-nested XGBoost method: starting from an initial naive model, a new model is fitted to the errors of the observed values in the sample set and added to the existing model; repeating this process iteratively forms an ensemble model.
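The iterative procedure described above (start from a naive model, fit a new model to the current errors, add it to the ensemble, repeat) is the core of gradient boosting. The pure-Python sketch below illustrates it for squared error with depth-1 regression trees (stumps). It is a simplified stand-in, not XGBoost itself: XGBoost additionally uses second-order gradient information, deeper regularized trees, and many other refinements. The function names `fit_stump` and `boost`, the default `n_rounds` and `learning_rate` values, and the toy data are hypothetical.

```python
def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (one threshold on one feature) to residuals."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        # Every feature is constant: the best single value is the mean residual.
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, t, lv, rv = best
    return lambda row: lv if row[j] <= t else rv


def boost(X, y, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting: naive mean model plus stumps fitted to residuals."""
    base = sum(y) / len(y)  # initial naive model: always predict the mean target
    learners = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        # Errors of the current ensemble on the training observations.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(X, residuals)  # new model fitted to those errors
        learners.append(h)
        # Add the new model (scaled by the learning rate) to the ensemble.
        pred = [pi + learning_rate * h(row) for pi, row in zip(pred, X)]

    def predict(row):
        return base + learning_rate * sum(learner(row) for learner in learners)

    return predict


# Toy data: one descriptor, two target levels (illustrative numbers only).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 1.0, 3.0, 3.0]
predict = boost(X, y)
print(round(predict([0.0]), 3), round(predict([3.0]), 3))
```

With a learning rate of 0.1, each round removes 10% of the remaining residual on this toy data, so the predictions approach the targets gradually; a small learning rate trades more rounds for a smoother fit.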

[0058] In this example, the opti...

Embodiment 3

[0065] This embodiment is basically the same as the embodiments above, with the following special features:

[0066] In this embodiment, step 2) divides the collected dye molecules into three fragments according to empirical rules on the electron-withdrawing and electron-pushing abilities of groups: the Ds and Dc fragments are electron-pushing donors, and the A fragment is an electron-withdrawing acceptor, in preparation for the subsequent interpretation of fragments and high-throughput screening. The junctions between fragments are replaced by free radicals. See Figure 2 for an example of the division; in what follows, fragment instances are denoted by letters.
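Labeling fragment instances with letters and recombining them is what makes high-throughput screening possible: every Ds, Dc, and A fragment seen in the data can be combined into new virtual dyes. A minimal sketch using only the Python standard library is below. The fragment identifiers are hypothetical, the letter scheme (a, b, c, ... per position, in first-seen order) is an assumption, and it supports at most 26 instances per position.

```python
from itertools import product
from string import ascii_lowercase


def letter_labels(fragments):
    """Give each distinct fragment a letter (a, b, c, ...) in first-seen order."""
    labels = {}
    for frag in fragments:
        if frag not in labels:
            labels[frag] = ascii_lowercase[len(labels)]  # IndexError past 26
    return labels


def virtual_dyes(observed):
    """All Ds-Dc-A combinations of observed fragments that are not in the data."""
    ds_pool = sorted({d[0] for d in observed})
    dc_pool = sorted({d[1] for d in observed})
    a_pool = sorted({d[2] for d in observed})
    known = set(observed)
    return [combo for combo in product(ds_pool, dc_pool, a_pool) if combo not in known]


# Hypothetical observed dyes as (Ds, Dc, A) tuples.
observed = [("Ds1", "Dc1", "A1"), ("Ds2", "Dc1", "A2"), ("Ds1", "Dc2", "A2")]
ds_letters = letter_labels(d[0] for d in observed)
virtual = virtual_dyes(observed)
print(ds_letters)
print(len(virtual), "virtual dyes")
```

Each virtual tuple would then be converted to a structure, given descriptors, and scored with the trained model; with m, n, and p distinct fragments per position, the candidate space grows as m × n × p.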

[0067] In this embodiment, in step 3), ChemDraw is used to draw the 2D structures of the collected N-P dye molecules; the optimized structure of each molecule is then computed with Chem3D > Calculations > MM2 > Minimize Energy; finally, the Dragon software generates the corresponding descriptors.

[0068] In this embodi...


Abstract

The invention discloses a QSPR (Quantitative Structure-Property Relationship) method and system that constructs an interpretable XGBoost regression model based on SHAP values to accelerate the discovery of high-PCE N-P organic sensitizers. The QSPR method comprises the following steps: establishing the data set samples; segmenting the molecules into fragments; optimizing the molecules in Chem3D; generating descriptors; randomly dividing the data into a training set and a test set; screening variables with SHAP nested in XGBoost and selecting the optimal variable subset for XGBoost modeling; establishing a rapid prediction model for N-P organic sensitizers with XGBoost regression; rapidly predicting the PCE of the dye molecules in the test set with the established model; constructing a QSPR model from the influence on the target feature reported by SHAP, with reference to the interpreted descriptors and the structures of the corresponding molecular fragments; and generating a large number of virtual samples in Python and predicting them with the established XGBoost model. Because it is built on reliable experimental values from the literature and on this modeling method, the established XGBoost prediction model for N-P organic sensitizers is convenient, fast, and free of chemical pollution.
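The SHAP-nested variable screening summarized in the abstract can be illustrated with the sketch below. It computes exact Shapley values by brute-force enumeration of feature subsets, filling "absent" features from a background set and averaging the model output; this is exponential in the number of features and only practical for toy examples. The method itself uses TreeSHAP, which computes SHAP values for tree ensembles efficiently (for example via `shap.TreeExplainer` in the shap library). Global importance is taken here as the mean absolute SHAP value, a common convention (the text says "average value"), and the top-k screening loop with its `score_subset` callback is a hypothetical search strategy, not the procedure disclosed in the patent.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, background):
    """Exact Shapley values of model f at point x, by enumerating subsets."""
    d = len(x)

    def value(subset):
        # Features in `subset` take x's values; the rest are filled from each
        # background row, and the model outputs are averaged.
        total = 0.0
        for b in background:
            z = [x[k] if k in subset else b[k] for k in range(d)]
            total += f(z)
        return total / len(background)

    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for s in combinations(others, size):
                s = set(s)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi


def global_importance(f, X, background):
    """Mean absolute Shapley value of each feature over the samples in X."""
    all_phi = [shapley_values(f, x, background) for x in X]
    d = len(X[0])
    return [sum(abs(p[j]) for p in all_phi) / len(all_phi) for j in range(d)]


def screen_features(importance, score_subset):
    """Evaluate top-k subsets by importance; keep the one with the lowest error."""
    order = sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)
    best_subset, best_err = None, float("inf")
    for k in range(1, len(order) + 1):
        subset = order[:k]
        err = score_subset(subset)
        if err < best_err:
            best_subset, best_err = subset, err
    return best_subset, best_err


def model(z):
    # Toy linear "model" in which feature 1 has no effect.
    return 3.0 * z[0] + 0.0 * z[1] + 1.0 * z[2]


X = [[0.0, 5.0, 0.0], [1.0, 3.0, 2.0], [2.0, 1.0, 4.0]]
importance = global_importance(model, X, background=X)
print([round(v, 3) for v in importance])

informative = {0, 2}


def toy_score(subset):
    # Hypothetical error: penalize missing informative features and subset size.
    return len(informative - set(subset)) + 0.01 * len(subset)


subset, err = screen_features(importance, toy_score)
print(subset, round(err, 2))
```

In practice `score_subset` would refit the XGBoost model on the given feature columns and return a validation error such as RMSE; here it is a toy function in which features 0 and 2 are the informative ones.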

Description

Technical field

[0001] The present invention relates to a method and system for predicting the power conversion efficiency (PCE) of nitrogen perylene (N-P) organic sensitizers in dye-sensitized solar cells (DSSCs), and in particular to a Quantitative Structure-Property Relationship (QSPR) method that constructs an interpretable XGBoost regression model based on SHAP values to predict PCE and discover highly efficient N-P-based organic dyes.

Background art

[0002] With the rapid development of science and society, alleviating energy and pollution problems has become urgent. Since O'Regan and Grätzel published their breakthrough results in 1991, dye-sensitized solar cells have attracted widespread attention owing to their low cost, good flexibility, good stability, and high indoor efficiency. As the main component of DSSCs, sensitizers play a leading role in light harvesting, charge transfer, and charge recombination, which greatly affects the key parameter of DS...


Application Information

Patent type & authority: Application (China)
IPC (8): G16C20/30; G16C20/70
CPC: G16C20/30; G16C20/70
Inventors: Zhang Yu (张瑜), Li Minjie (李敏杰), Chen Huimin (陈慧敏), Lu Wencong (陆文聪), Yang Chen (杨晨)
Owner: SHANGHAI UNIV