Suffix array based fuzzy tandem repeat recognition method
A technology involving tandem repeat sequences and suffix arrays, applied in electrical digital data processing, special data processing applications, instruments, etc. It can solve problems such as large memory space occupancy and limits on the size of the repeat period.
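The excerpt does not describe how the suffix array is used, so the following is only a generic sketch of why suffix arrays help with tandem repeats. If two suffixes starting at i and j = i + p share a common prefix of length L >= p, then the substring starting at i has period p and holds (p + L) // p exact copies. The function names are hypothetical. The construction is the naive sort, not a memory-efficient one, and only neighbouring suffix-array entries are checked, so the scan is neither exhaustive nor fuzzy.

```python
def suffix_array(s: str) -> list[int]:
    """Naive construction: sort suffix start positions by suffix text (O(n^2 log n))."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: list[int]) -> list[int]:
    """Kasai's algorithm: lcp[k] is the longest common prefix of suffixes sa[k-1] and sa[k]."""
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters with its predecessor
        else:
            h = 0
    return lcp


def tandem_candidates(s: str, min_copies: int = 2) -> list[tuple[int, int, int]]:
    """Report (start, period, copies) for exact tandem runs visible between SA neighbours.

    If suffixes i and j = i + p share a prefix of length L >= p, then s[i:i+p+L]
    has period p and contains (p + L) // p full copies of s[i:i+p].
    """
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    found = set()
    for k in range(1, len(s)):
        i, j = sorted((sa[k - 1], sa[k]))
        p = j - i
        if lcp[k] >= p:
            copies = (p + lcp[k]) // p
            if copies >= min_copies:
                found.add((i, p, copies))
    return sorted(found)


print(tandem_candidates("TGAAGAAGAAC"))
# [(1, 3, 3), (2, 3, 2), (3, 3, 2), (4, 3, 2)]
```

Overlapping reports such as (2, 3, 2) are rotations of the same GAA run. A real tool would merge them into one maximal repeat.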
Examples
Embodiment 1
[0113] In the experiment, the default copy number is 2. For sequence alignment, the match, mismatch, and gap scores are set to +2, -2, and -2, respectively. Only sequences whose matching degree exceeds 50% are listed in the results.
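The scoring scheme in [0113] can be illustrated with a small alignment routine. This is a sketch under stated assumptions. The text does not say whether the alignment is global or local, so global Needleman-Wunsch with a linear gap penalty is assumed. "Matching degree" is assumed to mean the share of alignment columns with identical bases. The function names are hypothetical.

```python
MATCH, MISMATCH, GAP = 2, -2, -2  # scores given in paragraph [0113]


def score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def global_align(a: str, b: str) -> tuple[int, str, str]:
    """Needleman-Wunsch with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + GAP,
                           dp[i][j - 1] + GAP)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def matching_degree(aligned_a: str, aligned_b: str) -> float:
    """Share of alignment columns with identical bases (gap columns count as non-matches)."""
    if not aligned_a:
        return 0.0
    return sum(x == y for x, y in zip(aligned_a, aligned_b)) / len(aligned_a)


s, x, y = global_align("GAA", "GA")
print(s, x, y, matching_degree(x, y) > 0.5)
# 2 GAA G-A True
```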
[0114] The sequence used in the experiment:
[0115] The promoter and exon region of the human frataxin gene (Friedreich's ataxia, U43748). This region of the gene is 2465 bp long.
[0116] Friedreich's ataxia is caused by an abnormal copy number of the GAA trinucleotide repeat in the human frataxin gene. The experimental parameters are min_p=2, min_ex=2*min_p, min_score=20, and L_Align=100, where min_p is the minimum copy number, min_ex is the minimum number of bases for the left and right extensions, and min_score is the minimum sequence alignment score. Only repeats longer than 30 bp were selected for the results.
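The parameters in [0116] and the 30 bp length cut-off can be shown with a simplified example. This sketch counts exact back-to-back copies of a known unit such as GAA. It is not the patent's fuzzy, suffix-array-based search. The class and function names are hypothetical. Because the excerpt does not explain how min_score and L_Align are applied, they are only stored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    min_p: int = 2        # minimum copy number
    min_score: int = 20   # minimum alignment score (not used by the exact scan below)
    l_align: int = 100    # L_Align; its role is not described in this excerpt

    @property
    def min_ex(self) -> int:
        # Minimum number of bases for left/right extension, defined as 2 * min_p.
        return 2 * self.min_p


def exact_unit_runs(seq: str, unit: str, params: SearchParams,
                    min_length: int = 30) -> list[tuple[int, int]]:
    """Return (start, copies) for runs of back-to-back copies of `unit`
    with copies >= params.min_p and total length > min_length bases."""
    if not unit:
        raise ValueError("unit must be non-empty")
    p, i, runs = len(unit), 0, []
    while i <= len(seq) - p:
        if seq[i:i + p] != unit:
            i += 1
            continue
        j = i
        while seq[j:j + p] == unit:  # extend the run one copy at a time
            j += p
        copies = (j - i) // p
        if copies >= params.min_p and j - i > min_length:
            runs.append((i, copies))
        i = j
    return runs


seq = "C" + "GAA" * 12 + "T"
print(exact_unit_runs(seq, "GAA", SearchParams()))  # [(1, 12)]
```

A run of exactly ten GAA copies (30 bp) is rejected because the cut-off in the text is strictly greater than 30 bp.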
[0117] Comparing the experimental results with the results of Benson's algorithm [Benso...
Embodiment 2
[0119] The sequence used in the experiment:
[0120] human T cell beta receptor sequence
[0121] Parameter settings: min_p=2, min_ex=2*min_p, min_score=50, L_Align=200. Only fuzzy tandem repeats longer than 100 bp are reported.
[0122] Table 2 Tandem repeats of human T cell β receptor sequences
[0124] All tandem repeats listed in Table 2 are longer than 100 bp (bp = base pair).
[0125] Table 3 Fuzzy tandem repeat search of human T cell β receptor sequences (manual alignment)
[0128] R_match is a rough match rate, calculated by hand.
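The text gives R_match only as a rough match computed by hand. The sketch below shows one way such a figure could be computed automatically, under a loudly stated assumption: R_match is taken to be the ungapped, position-by-position identity between each copy of the repeat and the next copy. This definition and the function name are hypothetical and are not confirmed by the source.

```python
def rough_match(seq: str, start: int, period: int, copies: int) -> float:
    """Hypothetical R_match: ungapped identity between each copy and the next,
    pooled over all adjacent copy pairs of the repeat."""
    same = total = 0
    for c in range(copies - 1):
        a = seq[start + c * period: start + (c + 1) * period]
        b = seq[start + (c + 1) * period: start + (c + 2) * period]
        same += sum(x == y for x, y in zip(a, b))
        total += min(len(a), len(b))
    return same / total if total else 0.0


print(rough_match("GAAGATGAA", 0, 3, 3))  # 0.6666666666666666
```

Because this ignores insertions and deletions, it can understate similarity for fuzzy repeats whose copies differ by indels. A gapped alignment between copies would be the more faithful measure.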
Embodiment 3
[0130] The sequence used in the experiment:
[0131] Sequence of yeast chromosome I
[0132] Parameter settings: min_p=2, min_ex=2*min_p, min_score=50, L_Align=200. Only fuzzy tandem repeats longer than 100 bp are reported.
[0133] Table 4 Fuzzy tandem repeat search of the yeast chromosome I sequence
[0135] All tandem repeats listed in Table 4 are longer than 100 bp.
[0136] Table 5 Fuzzy tandem repeat search of a segment of yeast chromosome I
[0139] R_match is a rough match rate, calculated by hand.
[0140] Embodiments 1, 2, and 3 all report tandem repeat sequences obtained with the method of the present invention. In Embodiment 1 the results are compared with those of the original method, and the newly found fuzzy tandem repeats show this method's advantage in finding fuzzy tandem repeats of higher complexity; the actual data used in Example 2 is the human T cell beta receptor sequence, and the a...