Similar base sequence query method based on edit distance in a cloud environment

The invention applies base sequence and edit distance techniques in the computer field. It addresses the problems of existing approaches, namely heavy time consumption, leakage of string information, and occupation of hardware resources, and aims to reduce the number of calculations, avoid occupying hardware resources, and compute quickly.

Status: Inactive
Publication Date: 2016-09-21
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

Most existing similarity query algorithms for character sequences compute only on the characteristics of the sequence itself. As a result, they are slow and inefficient, require a large amount of hardware, and cannot guarantee the security of the sequence data.
One existing method has the following shortcomings: because it uses edit distance to compute string similarity, its efficiency is limited by the size of the data, and its extensibility and scalability are weak; and because the sequence data is not protected while the edit distance is computed, the strings may be maliciously attacked or stolen by hackers, leaking string information.
Another existing method stores all genomic data in a local database, which occupies hardware resources. As the amount of data grows, it exceeds the capacity of the database and the computation can no longer be carried out.
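To make the cost concern concrete, the following is a minimal sketch of the standard Levenshtein edit distance computed by dynamic programming. It is a textbook formulation, not the patented method; the function name `edit_distance` is chosen here for illustration. The running time grows with the product of the two sequence lengths, which is why direct pairwise comparison scales poorly for genome-sized data.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b.

    Uses two rolling rows, so memory is O(min(len(a), len(b)))
    while time stays O(len(a) * len(b)).
    """
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("ACGT", "AGT")` counts the single deletion of `C`.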



Examples


Detailed Description of the Embodiments

[0041] The present invention is described in further detail below with reference to the accompanying drawings.

[0042] Referring to Figure 1, the specific implementation steps of the present invention are as follows:

[0043] Step 1. Calculate the single-character operation set of each user's base sequence.

[0044] From the client in the cloud environment, input all the deoxyribonucleic acid (DNA) information in each user's genome and save it in the local database. The DNA is a base sequence composed of the bases adenine (A), thymine (T), cytosine (C), and guanine (G).

[0045] From the client in the cloud environment, input a public reference sequence Ref corresponding to the base sequence, and save it in the local database.

[0046] Using the base sequence compression algorithm, the public reference sequence Ref is converted into the base sequence stored in the database, and the minimum edited sequence of the public reference sequence Ref and the base seque...
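The source text is truncated here, so the patent's own compression algorithm is not available. As a hedged illustration of what a "single-character operation set" relative to a reference sequence can look like, the sketch below derives a minimal list of substitutions, insertions, and deletions that turn a reference into a user sequence, using a standard Levenshtein table with backtracking. The names `edit_script` and `Op`, and the `(kind, position, base)` encoding, are assumptions made for this example only.

```python
from typing import List, Tuple

# Hypothetical encoding: (kind, position in ref, base), kind in {"sub", "ins", "del"}
Op = Tuple[str, int, str]

def edit_script(ref: str, seq: str) -> List[Op]:
    """Return one minimal list of single-character edits turning ref into seq."""
    m, n = len(ref), len(seq)
    # dp[i][j] = edit distance between ref[:i] and seq[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == seq[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete ref[i-1]
                           dp[i][j - 1] + 1,         # insert seq[j-1]
                           dp[i - 1][j - 1] + cost)  # substitute or match
    # Walk back from the bottom-right corner along one optimal path.
    ops: List[Op] = []
    i, j = m, n
    while i > 0 or j > 0:
        diag_cost = 0 if (i > 0 and j > 0 and ref[i - 1] == seq[j - 1]) else 1
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + diag_cost:
            if diag_cost:  # bases differ: record a substitution
                ops.append(("sub", i - 1, seq[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", i - 1, ref[i - 1]))
            i -= 1
        else:
            ops.append(("ins", i, seq[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

With `ref = "ACGT"` and `seq = "AGT"`, the script contains the single deletion of `C` at reference position 1. Storing only such differences is typically far smaller than storing the full sequence when users differ little from the reference.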



Abstract

The invention discloses a similar base sequence query method based on edit distance in a cloud environment. The method mainly addresses the problem of finding similar base sequences among massive electronic records in the cloud when the cloud is not fully trusted. The method comprises the following steps: (1) calculating the single-character operation set of the user base sequences; (2) clustering; (3) hash mapping the user base sequences and the cluster centers; (4) calculating the single-character operation set of the base sequence to be queried; (5) hash mapping the base sequence to be queried; (6) selecting candidate clients; and (7) searching for similar base sequences. The method applies edit-distance-based similar base sequence query to the cloud environment. Compared with traditional similar base sequence query methods, it offers better extensibility and scalability, ensures the security of the base sequences in the cloud environment, and reduces the consumption of hardware resources.
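The abstract names hash mapping and candidate selection without giving details, so the following is only a rough sketch of one way such steps could fit together, not the patent's procedure. It assumes each sequence is already represented as a set of single-character operations against a shared reference, maps each operation through a keyed hash (HMAC-SHA256 from the Python standard library) so that an untrusted server sees only opaque tokens, and uses the size of the symmetric difference of two token sets as a rough proxy for edit distance. The clustering step is omitted, and all function names and the operation encoding are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical encoding of one edit against the reference: (kind, position, base)
Op = Tuple[str, int, str]

def hash_ops(ops: Iterable[Op], key: bytes) -> FrozenSet[str]:
    """Map each operation to a keyed hash token; the key stays with the clients."""
    return frozenset(
        hmac.new(key, f"{kind}:{pos}:{base}".encode(), hashlib.sha256).hexdigest()
        for kind, pos, base in ops
    )

def approx_distance(a: FrozenSet[str], b: FrozenSet[str]) -> int:
    """Tokens present in exactly one set: a rough stand-in for edit distance."""
    return len(a ^ b)

def rank_candidates(query: FrozenSet[str],
                    records: Dict[str, FrozenSet[str]]) -> List[str]:
    """Order record names from most to least similar to the query."""
    return sorted(records, key=lambda name: approx_distance(query, records[name]))
```

A server holding only the hashed sets can call `rank_candidates` without learning the underlying bases, provided the key is never shared with it. Identical operations hash to identical tokens under the same key, which is what makes the set comparison meaningful.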

Description

Technical field

[0001] The invention belongs to the field of computer technology, and further relates to a method, within cloud computing technology, for querying similar base sequences based on edit distance in a cloud environment. The invention can be used to find similar base sequences among a large number of electronic records in the cloud when the cloud is not fully trusted.

Background

[0002] Sequence data is an important and special type of data. It is widespread in applications such as text, Web access sequences, and the base sequences and protein sequences in biological databases. With the development of society and the advancement of technology, efficient querying of sequence data also poses severe research challenges. Character sequences are a common kind of sequence data. Since character sequence has the characteristics of difficult feature extraction and effective expression, and the calculation of similarit...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F19/22, G06F17/30, G06K9/62
CPC: G06F16/90344, G16B30/00, G06F18/23213
Inventors: 张世哲, 李辉, 马建峰, 马鑫迪
Owner: XIDIAN UNIV