
LF entropy-based DNA sequence similarity detection method

A DNA sequence similarity detection technology, applied in the field of biological information processing, that addresses the time-consuming calculation and the incomplete description of DNA sequence information in existing methods, so as to avoid information loss and improve calculation speed and accuracy.

Active Publication Date: 2017-04-05
FUJIAN NORMAL UNIV

AI Technical Summary

Problems solved by technology

Existing methods are computationally expensive, and matching against large databases is very time-consuming.
The K-tuple algorithm is a very commonly used method for sequence similarity search: a sliding window of length K passes over the DNA sequence, and the subsequence in each window is one tuple. However, many studies have shown that the simple K-tuple algorithm cannot completely describe the information contained in a DNA sequence.
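
The sliding-window idea described above can be sketched in a few lines of Python. This is only an illustration of plain K-tuple extraction, not code from the patent; the function names `k_tuples` and `k_tuple_counts` and the step size of one base are assumptions.

```python
from collections import Counter


def k_tuples(sequence: str, k: int) -> list[str]:
    """Return every length-k window of the sequence, sliding one base at a time."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def k_tuple_counts(sequence: str, k: int) -> Counter:
    """Count how often each K-tuple occurs (hypothetical helper)."""
    return Counter(k_tuples(sequence, k))


print(k_tuples("ACGTAC", 2))              # ['AC', 'CG', 'GT', 'TA', 'AC']
print(k_tuple_counts("ACGTAC", 2)["AC"])  # 2
```

A count vector built this way records how often each tuple occurs but not where it occurs in the sequence.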



Detailed Description of the Embodiments

[0037] As shown in Figure 1, the LF entropy-based DNA sequence similarity detection method of the present invention comprises the following steps:

[0038] (1) Obtain the original DNA sequence from the DNA fragment;

[0039] (2) Map the original DNA sequence according to the L-Gram model to obtain the corresponding digital sequence, with the word length preset to L, which yields $|\Sigma|^L$ words to be processed;
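
As a rough illustration of step (2), the sketch below enumerates the candidate words and turns a DNA string into a sequence of integers. The excerpt does not spell out the mapping, so the alphabet order `ACGT`, the overlapping window with a step of one base, and the base-4 encoding are all assumptions, as are the names `all_words` and `to_digital_sequence`.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet Σ; the excerpt does not fix an order


def all_words(L: int, alphabet: str = ALPHABET) -> list[str]:
    """Enumerate every possible word of length L over the alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=L)]


def to_digital_sequence(dna: str, L: int, alphabet: str = ALPHABET) -> list[int]:
    """Map each overlapping length-L word to an integer (assumed base-|Σ| code)."""
    index = {base: i for i, base in enumerate(alphabet)}
    size = len(alphabet)
    digital = []
    for start in range(len(dna) - L + 1):
        code = 0
        for base in dna[start:start + L]:
            code = code * size + index[base]  # KeyError on bases outside Σ
        digital.append(code)
    return digital


print(len(all_words(2)))                 # 16
print(to_digital_sequence("ACGTAC", 2))  # [1, 6, 11, 12, 1]
```

With four bases and L = 2 there are 16 candidate words, which matches the count of $|\Sigma|^L$ given in the step.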

[0040] (3) Calculate the LF value of each word to be processed in the digital sequence to obtain the LF value sequence $X = \{X_1, X_2, \ldots, X_n\}$ of each word, where $n$ is the length of the LF value sequence; $X_1$ is the reciprocal of the difference between the second and the first positions at which the word $W$ appears in the current sequence, and $X_n$ is the reciprocal of the difference between the $(n+1)$-th and the $n$-th positions at which $W$ appears in the current sequence; the LF value is the reciprocal of the distance between the two corresponding...
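
The definition in step (3) can be sketched as follows, under the assumption that a word's positions are the start indices of its (possibly overlapping) occurrences. The helper names `word_positions` and `lf_values` are hypothetical, and the truncated remainder of the step is not covered.

```python
def word_positions(sequence: str, word: str) -> list[int]:
    """Start indices of every (possibly overlapping) occurrence of word."""
    L = len(word)
    return [i for i in range(len(sequence) - L + 1) if sequence[i:i + L] == word]


def lf_values(positions: list[int]) -> list[float]:
    """X_i = 1 / (p_{i+1} - p_i) for consecutive occurrence positions p."""
    return [1.0 / (later - earlier) for earlier, later in zip(positions, positions[1:])]


positions = word_positions("ACGTACAC", "AC")
print(positions)             # [0, 4, 6]
print(lf_values(positions))  # [0.25, 0.5]
```

A word that occurs m times yields m - 1 LF values, so a word that appears once or not at all gives an empty sequence; how such words are handled is not stated in the excerpt.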


Abstract

The invention discloses an LF entropy-based DNA sequence similarity detection method. An original DNA sequence is mapped according to an L-Gram model to obtain a new numerical sequence. A matrix formed from the LF entropy values of N sequences is calculated, from which a standard entropy is obtained, and the standard entropy is projected into a Hamming space for sequence similarity comparison. The method takes full account of the requirement that the converted feature space retain enough of the original DNA information, so that loss of DNA information is avoided; at the same time, each DNA sequence is converted into a new space, which can improve computation speed and accuracy.
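
To make the final comparison step more concrete, here is a loose sketch of comparing sequences in a Hamming space. The excerpt does not define the LF entropy, the standard entropy, or the projection, so everything here is an assumption: the input is a toy N x M matrix of made-up per-word scores, the binarization rule (threshold at the column mean) is purely illustrative, and `binarize_by_column_mean` and `hamming_distance` are hypothetical names.

```python
def binarize_by_column_mean(matrix: list[list[float]]) -> list[list[int]]:
    """Turn each row into a bit vector: 1 where the entry exceeds its column mean.

    Assumed rule for illustration only; the patent's projection may differ.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    return [[1 if row[j] > means[j] else 0 for j in range(n_cols)] for row in matrix]


def hamming_distance(a: list[int], b: list[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Toy values, not data from the patent: one row per sequence, one column per word.
scores = [
    [0.9, 0.1, 0.7],
    [0.8, 0.2, 0.3],
    [0.1, 0.9, 0.8],
]
codes = binarize_by_column_mean(scores)
print(codes)                                  # [[1, 0, 1], [1, 0, 0], [0, 1, 1]]
print(hamming_distance(codes[0], codes[1]))   # 1
print(hamming_distance(codes[1], codes[2]))   # 3
```

Once sequences are represented as fixed-length bit vectors, each pairwise comparison is a single pass over the bits, which is one plausible reason a Hamming-space representation would speed up matching against a large database.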

Description

Technical Field

[0001] The invention relates to the field of biological information processing, in particular to a DNA sequence similarity detection method based on LF entropy.

Background

[0002] DNA sequence similarity, as a basic measure in bioinformatics, is applied in many situations, including predicting the role and function of an unknown sequence, constructing a phylogenetic tree of organisms or species, and analyzing the homology of species.

[0003] With the rapid development of biological science and technology, data resources in the field have expanded rapidly, so finding an efficient and fast method to process huge volumes of biological data is becoming an increasingly serious problem in bioinformatics. The body of collected bioinformatics data is already enormous, and classifying and analyzing a large amount of biological sequence data is a very challenging task.

[0004] There are many defects in the existing DNA sequ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/22, G06F19/24
CPC: G16B30/00, G16B40/00
Inventor: 林劼, 魏静, 徐彭娜, 江育娥
Owner: FUJIAN NORMAL UNIV