Third generation sequencing alignment algorithm

A sequence and iterative technology, applied in the field of third-generation sequencing alignment algorithms, which can solve problems such as high error rate and confusion

Inactive Publication Date: 2018-10-23

THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIV

View PDF6 Cites 2 Cited by

AI Technical Summary

Problems solved by technology

Compared to first- and second-generation sequencing technologies, third-generation sequencing (TGS) tools produce longer reads, but the reads suffer from higher error rates, mainly in the form of insertions and deletions (indels).

Examples

[0124] The following are examples of specific embodiments for carrying out the invention. Examples are provided for illustrative purposes only and are not intended to limit the scope of the invention in any way.

[0125] Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.), but some experimental error and deviation should be allowed for.

Example 1

[0127] Demonstration of the effect of the cosine similarity measure using the E. coli genome.

[0128] Cosine similarity measures the similarity between two vectors as the cosine of the angle between them. To demonstrate the effect of this metric, 1000 sequences of 5000 bases each were selected at random positions in the E. coli genome. For each sequence, the cosine distance (1 - cosine similarity) was calculated between non-overlapping windows of lengths w = 50, 100, 500, 1000 and 5000 bases and, within each window, between the sequence and 10 random mutation patterns with average substitution rates of 15% and 35%. Figures 7-8 and 9-10 show the cosine distance distributions for k=3 and k=4, respectively. They illustrate how the distribution of cosine distances between short k-mer count vectors at random positions is distinguishable from that between a sequence and its mutation patterns. Furthermore, as expected, the d...
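The cosine distance between k-mer count vectors described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names (`kmer_count_vector`, `cosine_distance`, `substitute`) are hypothetical, and the sequences are uniformly random stand-ins rather than E. coli windows, so the printed values will not reproduce the distributions in the figures.

```python
import math
import random
from collections import Counter
from itertools import product


def kmer_count_vector(seq, k):
    """Count every possible DNA k-mer in seq, returned in a fixed (lexicographic) order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def cosine_distance(u, v):
    """Return 1 - cosine similarity; defined here as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def substitute(seq, rate, rng):
    """Replace each base by a different base with probability `rate` (substitutions only)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


# Synthetic stand-ins for genome windows (assumption: not real E. coli data).
rng = random.Random(0)
window = "".join(rng.choice("ACGT") for _ in range(500))
other = "".join(rng.choice("ACGT") for _ in range(500))
mutated = substitute(window, 0.15, rng)

k = 3
d_mutated = cosine_distance(kmer_count_vector(window, k), kmer_count_vector(mutated, k))
d_other = cosine_distance(kmer_count_vector(window, k), kmer_count_vector(other, k))
print(f"window vs. mutated copy: {d_mutated:.4f}")
print(f"window vs. unrelated window: {d_other:.4f}")
```

To approximate the experiment in [0128], one would replace the synthetic strings with windows drawn from the E. coli genome and repeat the comparison over many positions and window lengths.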

Example 2

[0134] Accuracy and performance analysis using the E. coli genome

[0135] The accuracy and performance of this method were evaluated using a dataset of 20x simulated reads from the E. coli genome with average lengths of 5 kbp and 10 kbp and sequence accuracies of 85%, 75%, 65% and 55%. The reads were simulated using PBSIM (Ono et al., 2013) with the options (--data-type CLR --depth 20 --model_qc model_qc_clr --accuracy-min 0.5 --length-mean [5000|10000] --length-sd 2000 --accuracy-mean [.85|.75|.65|.55] --accuracy-sd 0.02).
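The simulation grid in [0135] (two mean read lengths times four mean accuracies) can be enumerated with a short script. The sketch below only builds and prints the command lines; it does not run PBSIM. It assumes PBSIM's option spelling `--length-mean` and `--accuracy-mean` for the mean length and mean accuracy, and the reference path `ecoli.fasta` and the helper `pbsim_command` are hypothetical. Check the option names against the usage text of the installed `pbsim` before running anything.

```python
from itertools import product

# Hypothetical path to the E. coli reference; not given in the source.
REFERENCE_FASTA = "ecoli.fasta"


def pbsim_command(length_mean, accuracy_mean, reference=REFERENCE_FASTA):
    """Build one PBSIM argument list for a given mean length and mean accuracy."""
    return [
        "pbsim",
        "--data-type", "CLR",
        "--depth", "20",
        "--model_qc", "model_qc_clr",
        "--accuracy-min", "0.5",
        "--length-mean", str(length_mean),      # assumed option spelling
        "--length-sd", "2000",
        "--accuracy-mean", str(accuracy_mean),  # assumed option spelling
        "--accuracy-sd", "0.02",
        reference,
    ]


# All eight combinations: 2 mean lengths x 4 mean accuracies.
commands = [
    pbsim_command(length, accuracy)
    for length, accuracy in product([5000, 10000], [0.85, 0.75, 0.65, 0.55])
]

for cmd in commands:
    print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run` once the option names and file paths have been verified locally.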

[0136] Tables 1 and 2 report the performance for k=3 and k=4 under the default settings (w = 500, L_t = 7500, f = 2, g = 1, max-num-top-peak = 10, max-fft-block-size = 32768) for datasets with average sequence lengths of 5 kbp and 10 kbp, respectively. Even with a ~45% error rate, k=4 has almost perfect accuracy. As expected, Table 2 shows that longer reads resulted in a higher overall alignment rate, particularly when mapping reads covering long repeat ...

Abstract

Methods, software, and systems for aligning a read sequence to a reference sequence are disclosed. In certain embodiments, the methods, software, and systems involve determining similarity of distribution of k-mers between a region of the read sequence and a region of the reference sequence in order to determine whether the region of the read sequence maps to the region of the reference sequence.
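As a rough illustration of comparing k-mer distributions between a read region and regions of a reference, the sketch below scans fixed-size reference windows and reports the one whose k-mer count vector is most similar to the read region under cosine similarity. It is a brute-force toy under stated assumptions, not the disclosed method: the names `kmer_profile`, `cosine_similarity` and `best_matching_window`, the choice of cosine similarity as the score, and the toy sequences are all introduced here for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Count every possible DNA k-mer in seq, in a fixed (lexicographic) order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else dot / norm


def best_matching_window(read_region, reference, k=4, step=None):
    """Return (start, similarity) of the reference window most similar to read_region.

    Windows have the same length as read_region and start every `step` bases
    (non-overlapping by default). Returns (None, -1.0) if no window fits.
    """
    w = len(read_region)
    step = step or w
    target = kmer_profile(read_region, k)
    best_start, best_sim = None, -1.0
    for start in range(0, len(reference) - w + 1, step):
        sim = cosine_similarity(target, kmer_profile(reference[start:start + w], k))
        if sim > best_sim:
            best_start, best_sim = start, sim
    return best_start, best_sim


# Toy reference: a distinctive 20-base segment flanked by homopolymer runs.
segment = "ACGTACGGTCAGTCCATGAC"
reference = "A" * 20 + segment + "T" * 20
start, similarity = best_matching_window(segment, reference, k=4)
print(start, round(similarity, 3))
```

Because only k-mer counts are compared, the score tolerates scattered base errors better than exact matching does, but it says nothing about base-level alignment within the chosen window.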

Description

[0001] Cross reference

[0002] This application claims the benefit of US Provisional Patent Application No. 62/294,205, filed February 11, 2016, which is hereby incorporated by reference in its entirety.

[0003] Statement Regarding Federally Sponsored Research or Development

[0004] This invention was made with government support under Contract R01HG007834 awarded by the National Institutes of Health. The government has certain rights in this invention.

Background

[0005] Whole-genome sequencing has revolutionized biology and medicine, driving comprehensive characterization of DNA sequence changes, resequencing of multiple species, sequencing of microbial communities, detection of methylated regions of the genome, quantification of transcript abundance, characterization of the different isoforms of genes present in a given sample, determination of the extent to which recognized mRNA transcripts are efficiently translated, etc. Indeed, the field of pharmac...

Application Information

Patent Type & Authority: Applications (China)

IPC (8): C12Q1/6874, G06F19/28, G06F19/22, G16B30/10, G16B40/00, G16B50/00

CPC: C12Q1/6874, G16B50/00, G16B30/00, G16B30/10, G16B40/00, C12Q2535/122, G16B45/00, C12Q1/6869

Inventors: W. H. Wong, P. T. Afshar

Owner: THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIV
