Small sample-based genotype and phenotype association analysis method in multi-omics data

Omics data and small-sample technology, applied to genotype-phenotype association research on small-sample multi-omics data. The method addresses the difficulty of obtaining clinical data, the failure of available data to meet the requirements of multi-omics data fusion methods, and the very large number of SNP features, and it improves prediction accuracy.

Active Publication Date: 2021-07-30
NORTHWESTERN POLYTECHNICAL UNIV +1
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] Furthermore, using multi-omics methods to explore the relationship between genotype and phenotype requires every type of omics data to come from the same set of samples. Because the number of SNP features is huge, both types of multi-omics analysis methods need a large sample size to build their models, yet clinical data are difficult to obtain because of patient-privacy protection and the data-use requirements of each institution.
Therefore, publicly available clinical data cannot meet the requirements of multi-omics data fusion methods, either in sample size or in the number of omics types.



Examples


Specific example

[0057] 1. Data source and preprocessing

[0058] To verify the effectiveness of the method, the present invention uses two datasets from the GEO database (Gene Expression Omnibus, https://www.ncbi.nlm.nih.gov/geo/). GSE33356 is a lung adenocarcinoma study containing tumor tissue and adjacent normal tissue collected from the same patients: lung tumors and normal specimens from 84 non-smoking female patients with adenocarcinoma were profiled on Affymetrix SNP 6.0 and Affymetrix U133plus2.0 chips. GSE114269 compares metaplastic breast cancer (MBC) with non-metaplastic basal-like breast cancer (non-MBC BLC) and has a sample size of 48. These two datasets were chosen mainly to show that the method of the present invention applies broadly to genotype-phenotype classification problems with small samples and many attributes.

[0059] The protein network data comes from ...


Abstract

The invention discloses a small-sample-based method for analyzing genotype-phenotype associations in multi-omics data. The method comprises the following steps: a weighted undirected gene association graph is generated from a protein network and gene expression values, and the graph is clustered with the SPICi clustering method to produce gene clusters; the gene clusters are screened with the group Lasso method; the SNP cluster corresponding to each retained gene cluster is obtained from eQTL data; each SNP cluster, its corresponding gene cluster and the phenotype are assembled into a three-layer network block, in which the SNP-gene associations are regressed with sparse partial least squares and the gene-phenotype association is modeled with logistic regression; finally, the prediction results of all blocks are averaged to give the final prediction. The method solves the problem that, with small samples, effective regression in a three-layer network is impossible because of the huge number of features; it improves prediction accuracy, gives results with clearer biological meaning, and takes tissue specificity into account.
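The block-level steps above (sparse PLS from SNPs to genes, logistic regression from genes to phenotype, averaging across blocks) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses NumPy, fits a single sPLS component by soft-thresholded alternating updates, fits logistic regression by plain gradient descent, and assumes SPICi clustering, group Lasso screening and eQTL mapping have already produced matched SNP and gene blocks. The function names, the relative threshold `lam`, the optimizer settings and the synthetic data are all hypothetical.

```python
import numpy as np

def soft_threshold(v, thr):
    """Elementwise soft-thresholding, the proximal step behind an L1 sparsity penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def fit_spls1(X, G, lam=0.5, n_iter=50):
    """One-component sparse PLS of a gene block G on a SNP block X (both column-centered).

    lam in [0, 1) is a hypothetical relative threshold: SNP weights smaller than
    lam * (largest weight) are set to zero, so at least one SNP always survives.
    Returns unit-norm sparse SNP weights u and gene loadings c, with G_hat = (X @ u) c^T.
    """
    M = X.T @ G                                   # SNP-by-gene cross-product matrix
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    v = Vt[0]                                     # start from the dense PLS direction
    for _ in range(n_iter):
        a = M @ v
        u = soft_threshold(a, lam * np.abs(a).max())
        u /= np.linalg.norm(u)
        v = M.T @ u
        v /= np.linalg.norm(v)
    t = X @ u                                     # latent SNP score for each sample
    c = G.T @ t / (t @ t)                         # least-squares regression of each gene on t
    return u, c

def fit_logistic(Z, y, l2=1e-2, lr=0.1, n_iter=500):
    """L2-penalized logistic regression fitted by plain gradient descent."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])    # prepend an intercept column
    w = np.zeros(Z1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z1 @ w))
        grad = Z1.T @ (p - y) / len(y)
        grad[1:] += l2 * w[1:]                    # leave the intercept unpenalized
        w -= lr * grad
    return w

def fit_block(X, G, y, lam=0.5):
    """One three-layer block: SNP cluster -> gene cluster (sPLS) -> phenotype (logistic)."""
    x_mu = X.mean(axis=0)
    u, c = fit_spls1(X - x_mu, G - G.mean(axis=0), lam)
    G_hat = np.outer((X - x_mu) @ u, c)           # centered gene expression predicted from SNPs
    return {"x_mu": x_mu, "u": u, "c": c, "w": fit_logistic(G_hat, y)}

def predict_block(m, X):
    """Phenotype probability for samples, using their SNP genotypes only."""
    G_hat = np.outer((X - m["x_mu"]) @ m["u"], m["c"])
    Z1 = np.column_stack([np.ones(len(X)), G_hat])
    return 1.0 / (1.0 + np.exp(-Z1 @ m["w"]))

def predict_ensemble(models, X_blocks):
    """Final prediction: the mean of the per-block probabilities."""
    return np.mean([predict_block(m, Xb) for m, Xb in zip(models, X_blocks)], axis=0)

# Toy usage on synthetic data; every size and effect here is made up.
rng = np.random.default_rng(0)
n, n_blocks = 40, 3
X_blocks = [rng.integers(0, 3, size=(n, 20)).astype(float) for _ in range(n_blocks)]
# Hypothetical eQTL effect: the first two SNPs of each cluster drive its 5 genes.
G_blocks = [Xb[:, :2].sum(axis=1, keepdims=True) + 0.3 * rng.standard_normal((n, 5))
            for Xb in X_blocks]
score = sum(Gb[:, 0] for Gb in G_blocks)
y = (score > np.median(score)).astype(float)
models = [fit_block(Xb, Gb, y) for Xb, Gb in zip(X_blocks, G_blocks)]
prob = predict_ensemble(models, X_blocks)
print("training accuracy:", np.mean((prob > 0.5) == y))
```

In this sketch the classifier sees gene expression predicted from SNPs rather than observed expression, which keeps the SNP-gene-phenotype chain intact and means a new sample needs only its genotypes; whether the patent does the same is not stated in this excerpt. Averaging probabilities gives every block an equal vote.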

Description

Technical field
[0001] The invention relates to the field of bioinformatics, in particular to a small-sample-based method for studying the association between genotype and phenotype in multi-omics data.
Background art
[0002] An important goal of current genetics is to establish a complete functional link between genotype and phenotype, the so-called genotype-phenotype map. Studying the relationship between genotype and phenotype makes the mechanism of genetic variation clearer. Genome-wide association studies (GWAS) of common genotypes and phenotypes are an effective way to reveal the link between an individual's genetic background and a specific disease or trait. The principle is to identify variant sites across the whole genome and to analyze the association between each variant site and the phenotype. Over the past decade or so, numerous genome-wide association studies have identified many genetic variants associated with complex diseases or other tra...
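As one common way to test each variant site against a binary phenotype, the sketch below runs a Pearson chi-square test on a 2 x 3 table of case/control status by genotype (0, 1 or 2 copies of the minor allele). It is an illustration only, assuming genotypes are already coded 0/1/2; real GWAS pipelines typically use allelic or trend tests with covariates and correct for multiple testing, and the function names here are hypothetical. Only the standard library is needed because the chi-square survival function has a closed form for 1 and 2 degrees of freedom.

```python
import math

def genotype_chi2(genotypes, is_case):
    """Pearson chi-square test of one SNP (0/1/2 genotype codes) against a binary phenotype.
    Returns (statistic, p_value)."""
    counts = {g: [0, 0] for g in (0, 1, 2)}              # genotype -> [controls, cases]
    for g, c in zip(genotypes, is_case):
        counts[g][int(c)] += 1
    cols = [counts[g] for g in (0, 1, 2) if sum(counts[g]) > 0]   # drop unseen genotypes
    n = sum(sum(col) for col in cols)
    row_tot = [sum(col[r] for col in cols) for r in (0, 1)]
    if len(cols) < 2 or 0 in row_tot:
        return 0.0, 1.0                                  # nothing to test
    stat = 0.0
    for col in cols:
        for r in (0, 1):
            expected = row_tot[r] * sum(col) / n
            stat += (col[r] - expected) ** 2 / expected
    df = len(cols) - 1
    # Closed-form chi-square survival function for 1 and 2 degrees of freedom.
    p = math.erfc(math.sqrt(stat / 2)) if df == 1 else math.exp(-stat / 2)
    return stat, p

def scan(snp_matrix, is_case):
    """Test every SNP (one genotype list per SNP) and return the p-values in order."""
    return [genotype_chi2(snp, is_case)[1] for snp in snp_matrix]

cases = [1, 1, 1, 1, 0, 0, 0, 0]
snps = [
    [2, 2, 1, 2, 0, 0, 1, 0],   # genotype tracks the phenotype
    [0, 1, 2, 1, 0, 1, 2, 1],   # identical distribution in cases and controls
]
print(scan(snps, cases))
```

With 4 cases and 4 controls, the first SNP gives a statistic of 6.0 on 2 degrees of freedom (p = e^-3, about 0.0498), while the second, distributed identically in both groups, gives p = 1.0. Tables this small violate the usual expected-count rule of thumb, so these p-values are illustrative only.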


Application Information

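IPC (8): G16B20/20; G16B40/30; G16B40/20; G16B50/30
CPC: G16B20/20; G16B40/30; G16B40/20; G16B50/30
Inventor: Guo Xinpeng (郭新鹏), Song Yafei (宋亚飞), Liu Shuaichen (刘帅忱), Liu Shuhui (刘树慧), Wang Yifei (王艺菲), Shang Xuequn (尚学群)
Owner: NORTHWESTERN POLYTECHNICAL UNIV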