Double-integration partial least squares modeling method based on Monte Carlo and LASSO

A partial least squares modeling technique applied in the field of analytical chemistry. It addresses the low accuracy of modeling and prediction and improves prediction ability and prediction accuracy.

Active Publication Date: 2017-03-22
TIANJIN POLYTECHNIC UNIV +1

AI Technical Summary

Problems solved by technology

This method retains the advantages of both methods and overcomes the low modeling and prediction accuracy of a single method.

Examples

Embodiment 1

[0041] This embodiment is applied to the analysis of near-infrared spectroscopy data to determine the oil content in corn samples. The specific steps are as follows:

[0042] 1) Collect 80 corn samples and measure their near-infrared spectra on three different near-infrared spectrometers (M5, MP5, MP6), taking the oil content as the target value. The near-infrared wavelength range is 2498-1100 nm (4003-9091 cm⁻¹), the sampling interval is 2 nm, and there are 700 wavelength data points in total. The data were downloaded from http://software.eigenvector.com/Data/Corn/index.html. Using the KS grouping method, 53 samples are used as the training set and the remaining 27 samples as the prediction set. The near-infrared spectra of the training set are shown in Figure 2.
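The spectral axis described in this step can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the patent: it rebuilds the wavelength grid from the stated limits and step, and converts the limits to wavenumbers using 1 cm = 1e7 nm. The variable and function names are illustrative only.

```python
# Wavelength axis as described in the text: 1100-2498 nm at a 2 nm step.
start_nm, stop_nm, step_nm = 1100, 2498, 2

wavelengths_nm = list(range(start_nm, stop_nm + step_nm, step_nm))
n_points = len(wavelengths_nm)


def nm_to_wavenumber(nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / nm


low_cm = nm_to_wavenumber(stop_nm)    # longest wavelength -> lowest wavenumber
high_cm = nm_to_wavenumber(start_nm)  # shortest wavelength -> highest wavenumber

print(n_points)                       # 700
print(round(low_cm), round(high_cm))  # 4003 9091
```

The counts agree with the 700 data points and the 4003-9091 cm⁻¹ range quoted in the step.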

[0043] 2) Determine the factor number LV of the PLS model

[0044] Calculate the cross-validation root mean square error (RMSECV) under different number...
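The RMSECV named in this step can be computed along the following lines. This is a hedged sketch under several assumptions that do not come from the source, which is truncated here: the PLS model is a single-response PLS1 fitted by NIPALS, cross-validation is 5-fold with random folds, and the data are synthetic. The names `pls1_fit` and `rmsecv` are hypothetical helpers, and the rule the patent uses to pick the factor number LV from the curve is not shown in the text.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Fit a PLS1 model by NIPALS; return (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centered working copies
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # no covariance left to model
            break
        w = w / norm
        t = E @ w                          # scores
        tt = t @ t
        p = E.T @ t / tt                   # X loadings
        q_a = (f @ t) / tt                 # y loading
        E = E - np.outer(t, p)             # deflate X
        f = f - q_a * t                    # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P = np.array(W).T, np.array(P).T
    coef = W @ np.linalg.solve(P.T @ W, np.array(q))
    return coef, x_mean, y_mean


def rmsecv(X, y, n_lv, n_folds=5, seed=0):
    """Root mean square error of k-fold cross-validation (fold scheme is an assumption)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    press = 0.0
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        coef, x_mean, y_mean = pls1_fit(X[train], y[train], n_lv)
        y_hat = (X[held_out] - x_mean) @ coef + y_mean
        press += np.sum((y[held_out] - y_hat) ** 2)
    return float(np.sqrt(press / len(y)))


# Synthetic stand-in for a 53-sample training set driven by 3 latent factors.
rng = np.random.default_rng(1)
scores = rng.normal(size=(53, 3))
X = scores @ rng.normal(size=(3, 20)) + 0.01 * rng.normal(size=(53, 20))
y = scores @ np.array([1.0, -0.5, 0.25]) + 0.01 * rng.normal(size=53)

curve = {lv: rmsecv(X, y, lv) for lv in range(1, 7)}
for lv, err in curve.items():
    print(lv, round(err, 4))
```

Plotting `curve` against the number of factors gives the kind of RMSECV profile this step refers to.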

Embodiment 2

[0055] This embodiment is applied to the analysis of ultraviolet spectrum data to determine the content of single-ring aromatics in gasoline samples. The specific steps are as follows:

[0056] 1) Collect 115 light gasoline and diesel fuel samples. The ultraviolet spectra cover a wavelength range of 200-400 nm at a sampling interval of about 0.35 nm, giving 572 wavelength data points in total. The content of single-ring aromatics was determined with an HP model G1205A supercritical fluid chromatograph (Hewlett-Packard, Palo Alto, Calif.). The data were downloaded from http://myweb.dal.ca/pdwentze/downloads.html. The training and prediction sets are divided according to the instructions given on the website: the first 70 samples are used as the training set and the last 44 samples as the prediction set. The ultraviolet spectra of the training set are shown in Figure 6.

[0057] 2) Determine the factor number LV of the PLS model

[0058] Calculate the cross-validation r...

Embodiment 3

[0069] This embodiment is applied to near-infrared spectrum data analysis to measure the content of sesame oil in the quaternary blend oil sample. The specific steps are as follows:

[0070] 1) Collect 51 samples of a quaternary blend oil containing sesame oil, corn oil, soybean oil and rice oil. Near-infrared spectra are measured with a Vertex70 multi-band infrared/near-infrared spectrometer (Bruker, Germany) over a wavenumber range of 4000-12000 cm⁻¹ at a sampling interval of 1.93 cm⁻¹, giving 4148 data points in total. The sesame oil content is taken as the target value. Using the KS grouping method, 34 samples are used as the training set and the remaining 17 samples as the prediction set. The near-infrared spectra of the training set are shown in Figure 10.
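The KS split used in this step can be sketched as follows, assuming that "KS" refers to the Kennard-Stone algorithm as it is commonly described: start from the two most distant samples, then repeatedly add the sample farthest from its nearest already-selected neighbour. The function name `kennard_stone`, the Euclidean distance, and the random toy data are illustrative assumptions, not details from the patent.

```python
import math
import random


def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone rule."""
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Start with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Distance from each remaining sample to its nearest selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])
    return selected, remaining


# Toy stand-in for 51 spectra with 5 variables each, split 34/17 as in the text.
random.seed(0)
X = [[random.random() for _ in range(5)] for _ in range(51)]
train_idx, test_idx = kennard_stone(X, 34)
print(len(train_idx), len(test_idx))  # 34 17
```

The full pairwise distance table is affordable for a few dozen samples, as here; a larger dataset would call for a vectorized distance computation.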

[0071] 2) Determine the factor number LV of the PLS model

[0072] Calculate the cross-validation root mean square error (RMSECV) under different numbers of factors, and the n...

Abstract

The invention belongs to the technical field of analytical chemistry and relates in particular to a double-integration partial least squares modeling method based on Monte Carlo and LASSO. The method comprises the following steps: first, a certain number of samples are selected as a sample subset using the Monte Carlo technique; then a subset of variables is selected from that sample subset using the LASSO technique; these two steps are repeated many times to build multiple sub-models; finally, the prediction results of the sub-models are directly averaged to obtain the final prediction result. The method effectively improves the prediction capability and prediction precision of the model and shows clear advantages in prediction precision and stability. It is suitable for the quantitative analysis of complex samples such as petroleum, tobacco, foods and traditional Chinese medicines.
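The steps in the abstract can be sketched as follows. This is an illustrative reading, not the patented implementation: the LASSO is solved by a plain coordinate-descent loop, each sub-model is a PLS1 fitted by NIPALS, and every setting (`n_models`, `sample_frac`, `alpha`, `n_lv`) and function name is a hypothetical choice that the visible text does not specify. The data are synthetic.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=100):
    """LASSO by coordinate descent: min (1/2n)||y - Xb||^2 + alpha*||b||_1."""
    n, p = X.shape
    Xc, r = X - X.mean(axis=0), y - y.mean()   # r holds the current residual
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = Xc[:, j] @ r / n + col_sq[j] * b[j]
            b_new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r = r + Xc[:, j] * (b[j] - b_new)
            b[j] = b_new
    return b


def pls1_fit(X, y, n_lv):
    """Fit a PLS1 model by NIPALS; return (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:                       # no covariance left to model
            break
        w = w / norm
        t = E @ w
        tt = t @ t
        p = E.T @ t / tt
        q_a = (f @ t) / tt
        E, f = E - np.outer(t, p), f - q_a * t  # deflate X and y
        W.append(w)
        P.append(p)
        q.append(q_a)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P = np.array(W).T, np.array(P).T
    return W @ np.linalg.solve(P.T @ W, np.array(q)), x_mean, y_mean


def mc_lasso_pls(X, y, X_new, n_models=30, sample_frac=0.7, alpha=0.05,
                 n_lv=3, seed=0):
    """Average PLS sub-models built on random sample subsets and LASSO-selected variables."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_sub = max(2, int(round(sample_frac * n)))
    preds = []
    for _ in range(n_models):
        rows = rng.choice(n, size=n_sub, replace=False)            # Monte Carlo sample subset
        cols = np.flatnonzero(lasso_cd(X[rows], y[rows], alpha))   # LASSO variable subset
        if cols.size == 0:
            continue                                               # penalty removed every variable
        lv = min(n_lv, cols.size, n_sub - 1)
        coef, x_mean, y_mean = pls1_fit(X[np.ix_(rows, cols)], y[rows], lv)
        preds.append((X_new[:, cols] - x_mean) @ coef + y_mean)
    if not preds:
        raise ValueError("no sub-model kept any variable; lower alpha")
    return np.mean(preds, axis=0)                                  # direct average of sub-models


# Synthetic data: 80 samples, 30 variables, only 3 of them informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 30))
y = 2.0 * X[:, 0] - 1.0 * X[:, 5] + 0.5 * X[:, 12] + 0.05 * rng.normal(size=80)
X_train, y_train, X_test, y_test = X[:53], y[:53], X[53:], y[53:]

y_hat = mc_lasso_pls(X_train, y_train, X_test)
rmsep = float(np.sqrt(np.mean((y_test - y_hat) ** 2)))
print("RMSEP on synthetic data:", round(rmsep, 4))
```

Real spectra are far more collinear than this toy matrix, so the penalty `alpha` and the factor number would need tuning on the actual data.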

Description

Technical field

[0001] The invention belongs to the technical field of analytical chemistry, and in particular relates to a double-integrated partial least squares modeling method based on Monte Carlo and LASSO.

Background technique

[0002] Spectral analysis technology has been widely used in agriculture, food, medicine, environment and other fields because it is simple, fast, environmentally friendly and non-destructive. However, because of severe overlap of spectral absorption peaks, weak signal absorption and strong background interference, chemometric methods are required for the qualitative and quantitative analysis of complex samples. Establishing a model with good stability and high prediction accuracy has always been the key to the quantitative analysis of complex samples.

[0003] The traditional modeling method uses a single model to establish a quantitative analysis model between the spectrum and the target value to be measured, and the prediction effec...

Application Information

IPC(8): G06F17/50, G01N21/31
CPC: G01N21/31, G06F30/20
Inventor: 卞希慧, 张彩霞, 徐杨, 谭小耀, 陈宗蓬, 王晨
Owner: TIANJIN POLYTECHNIC UNIV