Whole genome association analysis method, system and electronic equipment

A genome-wide association analysis technique, applied in the field of genetic data processing, that addresses problems such as parts of the data being handled unsatisfactorily, improves efficiency and overall scalability, and enables distributed processing.

Pending Publication Date: 2019-09-06
SHENZHEN INST OF ADVANCED TECH

AI Technical Summary

Problems solved by technology

At present, a typical computer has about 2 GB of memory, which is far f...


Embodiment Construction

[0037] In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit the present application.

[0038] Figure 1 is a flow chart of the genome-wide association analysis method of an embodiment of the present application. The method comprises the following steps:

[0039] Step 100: performing gene sequencing on the sample to obtain the original sequencing data of the sample;

[0040] Step 200: Using GATK (the Genome Analysis Toolkit, software developed by the Broad Institute for analyzing next-generation resequencing data) to determine the SNP positions in the original sequencing data of the sample, ...


Abstract

The invention belongs to the field of gene data processing technology and relates in particular to a whole genome association analysis method, system and electronic equipment. The method comprises: (a) determining the SNP sites in the original sequencing data of a sample to obtain the SNP site information of the sample; (b) establishing a coordinate axis based on reference genome SNP information and extracting features from the SNP site information of the sample along that axis to obtain a feature vector of the sample; and (c) clustering the feature vectors of the samples to obtain representative feature vectors, and combining the representative feature vectors to obtain non-redundant samples. By clustering the original data, the method expresses each sample by its features and identifies the important features, which reduces the amount of computation. Samples with high similarity are merged according to the similarity between samples and the remaining samples are removed, which greatly reduces the memory requirement and improves efficiency.
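Steps (b) and (c) of the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the binary presence/absence encoding, the agreement-based similarity measure, the greedy "leader" clustering, the 0.75 threshold and all function names are assumptions chosen only to make the idea concrete.

```python
from typing import Dict, Iterable, List, Tuple

SnpKey = Tuple[str, int]  # (chromosome, position); a hypothetical SNP identifier


def build_axis(reference_snps: Iterable[SnpKey]) -> Dict[SnpKey, int]:
    """Assign each reference SNP a fixed index: the shared coordinate axis."""
    return {key: i for i, key in enumerate(sorted(set(reference_snps)))}


def to_feature_vector(sample_snps: Iterable[SnpKey],
                      axis: Dict[SnpKey, int]) -> List[int]:
    """Encode a sample as 1/0 per axis position (assumed encoding).

    SNPs that are not on the reference axis are ignored.
    """
    vec = [0] * len(axis)
    for key in sample_snps:
        idx = axis.get(key)
        if idx is not None:
            vec[idx] = 1
    return vec


def similarity(a: List[int], b: List[int]) -> float:
    """Fraction of axis positions at which two vectors agree."""
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def merge_similar(vectors: List[List[int]],
                  threshold: float = 0.75) -> List[List[int]]:
    """Greedy leader clustering (a stand-in for the unspecified clustering).

    The first vector of each cluster is kept as its representative; any later
    vector at least `threshold` similar to a representative is dropped.
    """
    representatives: List[List[int]] = []
    for vec in vectors:
        if all(similarity(vec, rep) < threshold for rep in representatives):
            representatives.append(vec)
    return representatives


# Toy data, invented for illustration only.
reference = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)]
samples = [
    [("chr1", 100), ("chr2", 50)],
    [("chr1", 100), ("chr2", 50), ("chr3", 1)],  # chr3 SNP is off the axis
    [("chr1", 200), ("chr2", 75)],
]

axis = build_axis(reference)
vectors = [to_feature_vector(s, axis) for s in samples]
non_redundant = merge_similar(vectors)
print(non_redundant)  # [[1, 0, 1, 0], [0, 1, 0, 1]]
```

In this toy run the first two samples map to the same vector, so only one of them survives. This is the memory saving the abstract describes, though a real pipeline would likely use a more robust clustering method and a genotype-aware encoding.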

Description

Technical Field

[0001] The present application belongs to the technical field of gene data processing, and in particular relates to a whole genome association analysis method, system and electronic equipment.

Background

[0002] Genome-wide association studies (GWAS), first proposed in 2005, are based on SNP (single nucleotide polymorphism) sequencing technology. Over the past ten years, with the rapid development of SNP sequencing technology, genome-wide association analysis has played an increasingly important role in the study of economically important traits of species, in plant breeding and genetic improvement, and especially in the analysis of complex human diseases. The purpose of genome-wide association analysis is to find susceptibility locus variants associated with phenotypes across the genome. In recent years, a large number of algorithms for detecting genes and their interactions have emerged in the field of genome-wide association analysis. Although these algorithms hav...


Application Information

IPC(8): G16B20/20, G16B30/00, G16B40/30
CPC: G16B40/00, G16B30/00
Inventor: 郭宁, 魏彦杰, 张慧玲, 郑志春, 葛健秋, 冯圣中
Owner: SHENZHEN INST OF ADVANCED TECH