Annotation database index structure, and method and system for quickly annotating genetic variation

An index structure and database technology, applied in the field of bioinformatics, addresses the problems that existing tools cannot meet the needs of large-scale whole-genome applications, have low computing efficiency, and do not support multi-threading. Its effects are to reduce disk read operations and to save scanning time.

Active Publication Date: 2019-05-03
深圳市泰尔迪恩生物信息科技有限公司

AI Technical Summary

Problems solved by technology

However, when facing large-scale queries or large annotation database files, these existing algorithms cannot meet the requirements of whole-genome-scale applications because of their low computational efficiency and poor support for database growth.
For example, when faced with the 9 billion annotation records (about 300 GB after compression) in the genetic locus annotation database Combined Annotation Dependent...

Examples

Embodiment Construction

[0041] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0042] It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "comprises" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0043] It should also be understood that the terminology used ...

Abstract

The invention provides an index structure for a genome function annotation database, together with a method and a system for quickly annotating genetic variation. The index structure comprises a first-level index file built on the annotation database and a second-level index file built on the first-level index file. The first-level index file comprises a plurality of file blocks, each composed of a head and a body. The body consists of multiple lines of compressed data, and each file block corresponds to one compressed block of the annotation database. The second-level index file consists of multiple lines of data. Each line stores the position interval of the body data of one file block in the first-level index file, and a 64-bit virtual file address that directly addresses that file block. In the annotation method, the two-level index is scanned to find the file address of the data line containing the result, and the annotation database is then accessed at that address to extract the annotation information and annotate the genetic variation. Compared with scanning the annotation database directly, the index structure greatly reduces disk read operations and improves search speed.
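The second-level lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `SecondLevelEntry`, `pack_virtual_address`, `unpack_virtual_address` and `find_block` are hypothetical, the "position interval" is interpreted here as an inclusive range of genomic coordinates, and the split of the 64-bit virtual address into a 48-bit block offset and a 16-bit within-block offset is an assumption that the abstract does not specify.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class SecondLevelEntry(NamedTuple):
    """One line of a (hypothetical) second-level index."""
    start: int            # first position covered by the file block (inclusive)
    end: int              # last position covered by the file block (inclusive)
    virtual_address: int  # 64-bit address of the file block


def pack_virtual_address(block_offset: int, within_block: int) -> int:
    """Pack two offsets into one 64-bit integer.

    Assumed layout: upper 48 bits = byte offset of the compressed block,
    lower 16 bits = offset inside the decompressed block.
    """
    if not 0 <= block_offset < (1 << 48):
        raise ValueError("block_offset must fit in 48 bits")
    if not 0 <= within_block < (1 << 16):
        raise ValueError("within_block must fit in 16 bits")
    return (block_offset << 16) | within_block


def unpack_virtual_address(address: int) -> Tuple[int, int]:
    """Split a packed address back into (block_offset, within_block)."""
    return address >> 16, address & 0xFFFF


def find_block(index: Sequence[SecondLevelEntry],
               position: int) -> Optional[SecondLevelEntry]:
    """Scan the second-level index for the block whose interval holds position."""
    for entry in index:
        if entry.start <= position <= entry.end:
            return entry
    return None  # no file block covers this position


# Toy index with three file blocks (all values are made up for illustration).
index = [
    SecondLevelEntry(1, 1000, pack_virtual_address(0, 0)),
    SecondLevelEntry(1001, 2500, pack_virtual_address(18432, 0)),
    SecondLevelEntry(2501, 4000, pack_virtual_address(40960, 0)),
]

hit = find_block(index, 1500)
if hit is not None:
    block_offset, within_block = unpack_virtual_address(hit.virtual_address)
    # A real reader would now seek to block_offset in the first-level index
    # file and decompress only that one file block.
```

Only the small second-level index is scanned in full. The larger files are touched at a single block, which is the source of the reduction in disk reads that the abstract claims.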

Description

Technical field

[0001] The invention relates to the technical field of bioinformatics, in particular to an index structure of a genome function annotation database and a method and system for quickly annotating genetic variation.

Background

[0002] Genome functional annotation is the use of bioinformatics methods and tools to annotate the biological functions of all genes or non-coding regulatory elements in the genome, and it is a hot spot in current functional genomics research. With the popularization of high-throughput sequencing technology, massive genome annotation databases have been generated and accumulated, and database size is growing almost exponentially, doubling on average in less than 9 months. In addition, the development of personalized medicine has promoted the application of functional annotation of genomic genetic loci in precision medicine. Data queries have gradually approached the scale of the whole genome. For example, t...

Application Information

IPC(8): G16B50/30, G06F16/13, G06F16/188
CPC: Y02D10/00
Inventor: 李俊, 黄丹丹, 王思发
Owner: 深圳市泰尔迪恩生物信息科技有限公司