Compact next-generation sequencing datasets and efficient sequencing processing using them
A compact gene sequencing technology, applied in the field of gene analysis, that addresses the problems of increased cost and high computing cost while preserving compatibility.
Embodiment Construction
[0025] Disclosed herein is a method for formatting raw read data, including base quality scores, in a manner that substantially reduces file size while preserving most of the useful information. As discussed earlier, in the standard FASTQ format each read occupies slightly more than 2L_seq (ASCII) characters, where L_seq is the number of bases in the read. Other existing text-based formats that store base sequences together with their base quality scores also occupy a considerable amount of storage. For example, the Qseq format stores both base sequences and quality scores, arranged on a single line of text. The FASTA format cuts this storage roughly in half, but only by discarding all base quality score information. Alternatively, one can convert a text-formatted read entry to a non-text format (e.g., a binary format in which two bits encode each base and each Phred score is represented by a binary integer value). However, most downstream processing components...
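The storage arithmetic above can be checked with a minimal Python sketch. It formats one read as a FASTQ record and as a FASTA record, and also packs it into a simple binary layout of the kind the paragraph mentions (two bits per base, one integer byte per Phred score). The helper names, the Phred+33 quality offset, the A/C/G/T-only alphabet, and the 4-byte length header are illustrative assumptions, not the format disclosed in this patent.

```python
import struct

# Assumed 2-bit codes for the four bases; reads containing N are not handled.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def fastq_record(name: str, seq: str, quals: list[int]) -> str:
    """Hypothetical helper: four-line FASTQ record with Phred+33 quality characters."""
    qual_str = "".join(chr(q + 33) for q in quals)
    return f"@{name}\n{seq}\n+\n{qual_str}\n"


def fasta_record(name: str, seq: str) -> str:
    """Hypothetical helper: FASTA record, which drops all quality scores."""
    return f">{name}\n{seq}\n"


def pack_binary(seq: str, quals: list[int]) -> bytes:
    """Hypothetical binary layout: 4-byte length, 2 bits per base, 1 byte per Phred score.

    Assumes every base is A/C/G/T and every Phred score fits in 0..255.
    """
    header = struct.pack("<I", len(seq))
    packed = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        # Four bases share one byte; base j of the chunk occupies bits 2j..2j+1.
        for j, base in enumerate(seq[i:i + 4]):
            byte |= BASE_CODE[base] << (2 * j)
        packed.append(byte)
    return header + bytes(packed) + bytes(quals)


def unpack_binary(blob: bytes) -> tuple[str, list[int]]:
    """Inverse of pack_binary: recover the base sequence and Phred scores."""
    (n,) = struct.unpack_from("<I", blob, 0)
    nbytes = (n + 3) // 4
    packed = blob[4:4 + nbytes]
    seq = "".join(
        CODE_BASE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)
    )
    quals = list(blob[4 + nbytes:4 + nbytes + n])
    return seq, quals


if __name__ == "__main__":
    seq = "ACGTACGTTGCA" * 8  # a 96-base read
    quals = [30 + (i % 10) for i in range(len(seq))]
    fq = fastq_record("read1", seq, quals)
    fa = fasta_record("read1", seq)
    blob = pack_binary(seq, quals)
    print(len(fq), len(fa), len(blob))  # prints: 203 104 124
    assert unpack_binary(blob) == (seq, quals)
```

For this 96-base read the FASTQ record takes 203 characters (slightly more than 2 × 96), the FASTA record 104 (roughly half, with no quality information), and the packed binary form 124 bytes while still keeping every Phred score. The trade-off is that the binary form is no longer plain text.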