Method for exploring disease-related SNP combination based on evolutionary algorithm in genome-wide association analysis data

A genome-wide association analysis technology in the field of evolutionary algorithms. It addresses problems in GWAS data analysis, such as the difficulty of reducing the search space and high memory requirements, with the effect of improving accuracy and interpretability and reducing time, space, and memory demands.

Active Publication Date: 2019-02-26
JILIN UNIV
Cites: 7; Cited by: 2

AI Technical Summary

Problems solved by technology

Exhaustive search, such as the MDR algorithm, must compute the association between every SNP combination and the disease. As the order of epistasis under consideration increases, the computational cost becomes prohibitive, so this kind of algorithm often cannot complete the analysis of GWAS data within a reasonable time.
Random search, such as BEAM: this type of algorithm measures the association between each SNP and the disease through a random sampling process. Although it is relatively efficient, its precision is often low.
Filtering search, such as BOOST and FastChi: these algorithms usually work in two stages. In the first stage, a simple and fast score is used to screen each SNP or SNP combination, and only those that pass the screen enter the second stage. The aim is to use fast indicators to shrink the search space, so that the second stage only needs to evaluate SNP combinations within a small search space. However, because the first-stage score often lacks accuracy, many important SNPs never reach the second stage for detailed analysis, so the accuracy of this type of algorithm remains unsatisfactory.
Model-based search, such as ts-RF and AdaBoost: this type of algorithm builds an accurate classification model on GWAS data with a mainstream classifier and then uses the scores assigned to variables during model construction to measure the association between each SNP and the disease. Because model construction tends to favour variables with large marginal effects, such algorithms perform poorly at finding SNPs that are associated with disease but have no marginal effect.
Evolutionary-algorithm-based search, such as FHSA-SED, MACOED and CSE: this type of algorithm takes an index of the association between SNP combinations and disease as the objective function and uses an evolutionary algorithm, such as a genetic algorithm or ant colony optimization, to search for the SNP combinations that optimize it. The design of the evolutionary algorithm and the choice of objective function, however, remain the main research difficulties.
In summary, although many algorithms exist for finding disease-related SNPs or SNP combinations in GWAS data, they share the following shortcomings:
1. Accuracy: the association between an SNP combination and the disease is often measured with a single metric, which performs poorly on SNP combinations whose disease model does not fit that metric's assumptions.
2. Memory: many algorithms need so much memory when analyzing GWAS data that most computing platforms cannot run them.
3. Speed: most algorithms have high computational complexity, and many cannot complete the analysis of GWAS data within a reasonable time.
4. Interpretability: the results cannot be readily explained biologically, which prevents subsequent researchers from making good use of them.
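To make the cost of exhaustive search concrete, this short Python sketch counts the SNP combinations an exhaustive method such as MDR would have to score. The panel size of 500,000 SNPs is an illustrative assumption matching the "hundreds of thousands of SNPs" scale described in the background; it is not taken from any specific dataset.

```python
from math import comb


def num_combinations(n_snps: int, order: int) -> int:
    """Number of distinct SNP combinations of the given order, i.e. C(n_snps, order)."""
    return comb(n_snps, order)


N_SNPS = 500_000  # illustrative panel size ("hundreds of thousands" of SNPs)
for k in (2, 3):
    print(f"order {k}: {num_combinations(N_SNPS, k):,} combinations")
# order 2: 124,999,750,000 combinations
# order 3: 20,833,208,333,500,000 combinations
```

At a hypothetical rate of one million evaluations per second, the third-order count alone would take roughly 660 years, which is why the random, filtering, model-based and evolutionary strategies try to sample or shrink the search space instead.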


Examples


Embodiment

[0144] As shown in Figure 4, the SEE algorithm of the present invention is applied to the CD (Crohn's disease) dataset, one of the seven WTCCC1 disease datasets, to explore second-order SNP combinations associated with CD. The SNPs in the result are then converted into gene labels according to NCBI, giving a series of gene pairs. Figure 4 is a network drawn from these gene pairs, in which each edge indicates that at least 4 CD-associated SNP combinations involve the two connected genes. Figure 4 clearly shows that the SEE algorithm identifies genes such as LDB2, LOC107986262, RRP15 and SMG1P5 as potentially playing key roles in CD, which merits further study.
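The network construction in [0144] can be sketched as a counting step: map each reported SNP pair to a gene pair and keep gene pairs supported by at least 4 SNP pairs. This is a minimal Python sketch under stated assumptions: the SNP-to-gene mapping is given as a plain dictionary (the patent uses NCBI annotation, which is not reproduced here), pairs whose SNPs map to the same gene or to no gene are dropped, and the SNP IDs and gene names are hypothetical placeholders. The `degree` helper is also a hypothetical addition for spotting hub genes; the source does not say how key genes were read off the figure.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def gene_network(snp_pairs: Iterable[Tuple[str, str]],
                 snp_to_gene: Dict[str, str],
                 min_support: int = 4) -> Dict[Tuple[str, str], int]:
    """Map SNP pairs to gene pairs; keep gene pairs backed by >= min_support SNP pairs."""
    counts: Counter = Counter()
    for a, b in snp_pairs:
        ga, gb = snp_to_gene.get(a), snp_to_gene.get(b)
        if ga is None or gb is None or ga == gb:
            continue  # assumption: unmapped SNPs and within-gene pairs are dropped
        counts[tuple(sorted((ga, gb)))] += 1  # undirected edge, canonical order
    return {edge: n for edge, n in counts.items() if n >= min_support}


def degree(edges: Iterable[Tuple[str, str]]) -> Counter:
    """Number of network neighbours per gene (hypothetical hub criterion)."""
    deg: Counter = Counter()
    for g1, g2 in edges:
        deg[g1] += 1
        deg[g2] += 1
    return deg


# Hypothetical SNP IDs and gene names, for illustration only.
snp_to_gene = {"s1": "GENE_A", "s3": "GENE_A", "s2": "GENE_B", "s4": "GENE_B",
               "s5": "GENE_C", "s6": "GENE_D"}
pairs = [("s1", "s2"), ("s3", "s4"), ("s1", "s4"), ("s3", "s2"), ("s5", "s6")]
edges = gene_network(pairs, snp_to_gene)
print(edges)  # {('GENE_A', 'GENE_B'): 4}
print(degree(edges))
```

Here the four SNP pairs linking GENE_A and GENE_B reach the threshold, while the single GENE_C/GENE_D pair does not produce an edge.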

[0145] The technical solution of the present invention addresses the insufficient precision of current algorithms by using a ranking method to fuse 8 different indicators that evaluate the association between SNP combinations and disease, wherein...
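The source does not say exactly how the ranking-based fusion of the 8 indicators works. One plausible reading, sketched here in Python, is a Borda-style rank sum: rank all individuals on each indicator separately, then add up each individual's ranks. This assumes every indicator has first been oriented so that larger values mean a stronger association and that ties are broken by position; both are assumptions, not details from the patent, and the example scores are made up.

```python
from typing import List, Sequence

# Indicator names from the abstract; their definitions are not reproduced here.
METRICS = ("ce", "gini", "k2", "g", "cec", "ginic", "k2c", "gc")


def rank_desc(values: Sequence[float]) -> List[int]:
    """Rank position of each value (0 = best), assuming larger is better.
    Ties keep their original order because sorted() is stable."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for pos, i in enumerate(order):
        ranks[i] = pos
    return ranks


def fused_rank_scores(scores: Sequence[Sequence[float]]) -> List[int]:
    """scores[i][m] is indicator m for individual i; returns summed ranks (lower = better)."""
    if any(len(row) != len(METRICS) for row in scores):
        raise ValueError("each individual needs one value per indicator")
    totals = [0] * len(scores)
    for m in range(len(METRICS)):
        for i, r in enumerate(rank_desc([row[m] for row in scores])):
            totals[i] += r
    return totals


def fused_order(scores: Sequence[Sequence[float]]) -> List[int]:
    """Indices of individuals from best to worst fused rank."""
    totals = fused_rank_scores(scores)
    return sorted(range(len(scores)), key=lambda i: totals[i])


# Three hypothetical individuals scored on all eight indicators.
scores = [[9] * 8, [5] * 8, [1] * 7 + [10]]
print(fused_rank_scores(scores))  # [1, 9, 14]
print(fused_order(scores))        # [0, 1, 2]
```

Summing ranks rather than raw values makes indicators with very different scales comparable, so no single indicator dominates the fused score merely because its raw values are larger; the third individual wins one indicator yet still ranks last overall.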


Abstract

The invention discloses a method for exploring disease-related SNP combinations in genome-wide association analysis data based on an evolutionary algorithm. The method comprises the following steps: step 1, initializing the population and the individual record table, and calculating the evaluation indexes of the individuals in the population, the evaluation indexes including ce, gini, k2, g, cec, ginic, k2c and gc; step 2, sorting and merging the evaluation indexes; step 3, determining whether the evolution of the population meets a termination condition and, if so, outputting the evolution result; step 4, generating a random number between 0 and 1, determining whether it is greater than the exploration probability, and, according to the result, using either an exploration or an exploitation method to generate a new individual; step 5, adjusting the new individual, calculating the evaluation indexes of the adjusted individual, adding them to the individual record table, and determining whether each of the eight evaluation indexes of the new individual is greater than the maximum value of the corresponding index maintained in the current population.
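The five steps of the abstract can be arranged into a loop. This Python sketch is one hypothetical realization, not the SEE algorithm itself. `evaluate` stands in for the eight indexes (ce through gc), whose definitions are not given here. The rank-sum fusion, the iteration-budget termination condition, reading "random number > exploration probability" as the exploitation branch, the mutation-based exploitation step, the deduplicate-and-refill adjustment, and the replace-the-worst update after the step 5 comparison are all illustrative assumptions, since the abstract is truncated before it says what happens next.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Individual = Tuple[int, ...]  # sorted indices of the SNPs in one combination


def rank_sums(scores: List[Sequence[float]]) -> List[int]:
    """Step 2 (assumed form): per-metric ranks (larger = better, 0 = best), summed."""
    totals = [0] * len(scores)
    for m in range(len(scores[0])):
        order = sorted(range(len(scores)), key=lambda i: -scores[i][m])
        for pos, i in enumerate(order):
            totals[i] += pos
    return totals


def evolve(n_snps: int, order: int,
           evaluate: Callable[[Individual], Sequence[float]],
           pop_size: int = 20, p_explore: float = 0.3,
           max_iters: int = 1000, seed: int = 0) -> List[Individual]:
    rng = random.Random(seed)
    records: Dict[Individual, Tuple[float, ...]] = {}  # individual record table

    def adjust(snps: Sequence[int]) -> Individual:
        # Hypothetical adjustment: drop duplicate SNPs, refill at random, sort.
        uniq = set(snps)
        while len(uniq) < order:
            uniq.add(rng.randrange(n_snps))
        return tuple(sorted(uniq))

    def score(ind: Individual) -> Tuple[float, ...]:
        if ind not in records:  # evaluate each combination only once
            records[ind] = tuple(evaluate(ind))
        return records[ind]

    def ranked(pop: List[Individual]) -> List[Individual]:
        totals = rank_sums([records[p] for p in pop])
        return [pop[i] for i in sorted(range(len(pop)), key=lambda i: totals[i])]

    # Step 1: initialise the population and the record table.
    population = [adjust(rng.sample(range(n_snps), order)) for _ in range(pop_size)]
    for ind in population:
        score(ind)

    for _ in range(max_iters):  # Step 3: assumed termination = iteration budget
        # Step 4 (assumed mapping): r > p_explore -> exploit by mutating one SNP
        # of a top-3 individual; otherwise explore with a fresh random combination.
        if rng.random() > p_explore:
            child = list(rng.choice(ranked(population)[:3]))
            child[rng.randrange(order)] = rng.randrange(n_snps)
        else:
            child = rng.sample(range(n_snps), order)
        new = adjust(child)

        # Step 5: evaluate, record, and compare with per-metric population maxima.
        new_scores = score(new)
        maxima = [max(records[p][m] for p in population) for m in range(len(new_scores))]
        if new not in population and any(s > mx for s, mx in zip(new_scores, maxima)):
            # Hypothetical update rule: replace the worst-ranked individual.
            population[population.index(ranked(population)[-1])] = new

    return ranked(population)  # output: best-ranked combinations first


if __name__ == "__main__":
    planted = {3, 7}  # toy "causal" pair, purely illustrative
    toy = lambda ind: [len(planted & set(ind))] * 8  # stand-in for ce, gini, ..., gc
    print(evolve(n_snps=20, order=2, evaluate=toy)[0])
```

The toy evaluator simply counts how many SNPs of a planted pair a combination contains, so the search should converge on that pair; a real run would plug in the eight association indexes instead.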

Description

Technical field
[0001] The invention relates to the technical field of evolutionary algorithms, in particular to a method for exploring disease-related SNP combinations in genome-wide association analysis data based on evolutionary algorithms.
Background art
[0002] With the rapid development of high-throughput genotyping technology, more and more genome-wide case-control datasets of single-nucleotide polymorphisms (SNPs) are emerging; they often contain thousands of samples and hundreds of thousands of SNPs. Researchers aim to analyze these data with statistical, computational and biological methods to find disease-associated SNPs and thereby explore the potential mechanisms of disease. This research direction is called the genome-wide association study (GWAS). Because of epistasis, certain SNPs show an association with disease only when combined with other SNPs. In order to conduct ...
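The epistasis point can be illustrated with a toy pure-epistasis model. In this Python sketch, each of two hypothetical SNPs is coded as binary carrier status (real genotypes have three values) and disease status is the XOR of the two, so neither SNP alone changes the case rate while the pair determines it completely. The dataset is synthetic and only illustrates why single-SNP scans, and models that favour marginal effects, can miss such combinations.

```python
from collections import defaultdict
from itertools import product

# Synthetic data: each (a, b) carrier-status pair appears 25 times and the
# disease label is a XOR b (a hypothetical pure-epistasis model).
samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2) for _ in range(25)]


def case_rate(key):
    """Fraction of cases in each genotype group defined by key(a, b)."""
    groups = defaultdict(lambda: [0, 0])  # group -> [cases, total]
    for a, b, y in samples:
        g = groups[key(a, b)]
        g[0] += y
        g[1] += 1
    return {k: c / n for k, (c, n) in sorted(groups.items())}


print(case_rate(lambda a, b: a))       # SNP A alone: {0: 0.5, 1: 0.5}
print(case_rate(lambda a, b: b))       # SNP B alone: {0: 0.5, 1: 0.5}
print(case_rate(lambda a, b: (a, b)))  # pair: {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
```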

Application Information

IPC(8): G16B5/00; G16B20/20; G16B45/00; G16B20/00
Inventors: 孙立岩 (Sun Liyan), 刘桂霞 (Liu Guixia)
Owner: JILIN UNIV