Method and apparatus for mRNA assembly

An mRNA assembly technology, applied in the fields of nucleotide libraries, instruments, and library creation. It addresses the problems that proteins are very delicate materials that are difficult to analyze and that changes, especially deletions, occur during transcription. It eliminates errors and facilitates the creation of longer and/or complete mRNA sequences.

Inactive Publication Date: 2005-04-14
COMPUGEN

AI Technical Summary

Benefits of technology

[0017] It is an object of some embodiments of the present invention to provide a method of mRNA assembly which reduces existing raw EST databases, removes errors therefrom and facilitates the creation of longer and/or complete mRNA sequences. The desired end result is a reduced database in which each mRNA sequence and/or EST encodes a different protein. At least, the ratio between the number of ESTs and the number of proteins should be reduced as much as possible. Two types of errors should preferably be avoided and/or corrected: incorrect mRNA sequences and errors of omission, where a real difference between two mRNA sequences is lost due to the method of reducing the raw database.
[0028] One aspect of some embodiments of the present invention relates to using a method that directly compares a database with a database, rather than a method that compares an individual EST with a database. As a result, a more efficient analysis algorithm can be developed. In accordance with a preferred embodiment of the invention, an algorithm is provided whose complexity is near O(k(N)×N), rather than O(N²), where N is the number of ESTs and k is a slowly increasing function of N. In huge EST databases, this difference is extremely important and may pave the way to using mRNA analysis of cells from biopsies to diagnose individuals in a short time.
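The gain from comparing database against database can be illustrated with a shared index of short segments. The following is a minimal sketch and not the algorithm of the invention: the function names `build_index` and `candidate_pairs`, the segment length `n`, and the toy sequences are all assumptions made for illustration. Each sequence is indexed once, and only sequences that share a segment are ever paired, so no all-against-all pass over the N ESTs is run.

```python
from collections import defaultdict
from itertools import combinations


def build_index(seqs, n):
    """Map every length-n segment to the set of sequence ids containing it."""
    index = defaultdict(set)
    for seq_id, seq in seqs.items():
        for i in range(len(seq) - n + 1):
            index[seq[i:i + n]].add(seq_id)
    return index


def candidate_pairs(index):
    """Collect pairs of sequences that share at least one segment."""
    pairs = set()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Toy data (hypothetical): est1 and est2 overlap, est3 is unrelated.
ests = {
    "est1": "ACGTACGGTCA",
    "est2": "CGGTCATTGAC",
    "est3": "TTTTTTTTTTT",
}
pairs = candidate_pairs(build_index(ests, n=5))
print(pairs)
```

In this sketch a segment that occurs in very many sequences would still generate many pairs, so a practical implementation would presumably need to treat such segments separately.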
[0035] Another aspect of the present invention relates to DNA chip design. Correct selection of DNA sequences to place on a DNA chip is limited by the uncertainty of the relative importance and association of different ESTs. Once the ESTs are assembled into mRNA sequences, it is possible to select one or more sets of DNA segments which will be most useful for the DNA matching task. The high degree of automation and the quality of mRNA sequence determination that are possible in accordance with preferred embodiments of the present invention make such an analysis for DNA chip design a reality. Such a set can also take into account alternative splicing and/or the types and distributions of different errors in the EST database. Thus, a DNA chip can be made more robust for a particular application. In one preferred embodiment of the invention, the indexing method is used to generate an index of all the short segments of nucleotides in the mRNA sequences of interest. The length of the short segments is determined based on the design constraints of the DNA chip. The number of short segments necessary to correctly identify a single mRNA sequence (or DNA sequence, in genomic applications) can be determined by the number of re-indexing steps required to isolate that sequence in a database. The utilization of a DNA chip can be maximized by selecting only mRNA sequences which can be identified using a minimal number of short DNA sequences.
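The idea of indexing short segments to pick identifying probes can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the invention: it replaces the re-indexing procedure with a single check for segments that occur in exactly one sequence, and the function name `unique_segments`, the segment length, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def unique_segments(seqs, length):
    """Return, per sequence id, the segments of `length` found in no other sequence."""
    owners = defaultdict(set)
    for seq_id, seq in seqs.items():
        for i in range(len(seq) - length + 1):
            owners[seq[i:i + length]].add(seq_id)
    unique = {seq_id: [] for seq_id in seqs}
    # Sorted iteration keeps the output order deterministic.
    for segment, ids in sorted(owners.items()):
        if len(ids) == 1:
            unique[next(iter(ids))].append(segment)
    return unique


# Toy mRNA sequences (hypothetical); m1 and m2 share the prefix ACGTAC.
mrnas = {"m1": "ACGTACGA", "m2": "ACGTACTT"}
probes = unique_segments(mrnas, length=4)
print(probes)
```

A sequence with at least one unique segment can be identified by a single probe in this toy model; a sequence with none would need a combination of segments, which the sketch does not attempt.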

Problems solved by technology

However, proteins are very delicate materials, which are difficult to analyze. mRNA, which controls the creation of the proteins, is easier to separate and analyze.
Second, in the process of transcribing DNA, changes, especially deletions, are made to the nucleotide sequence.
Unfortunately, the art of reading mRNA sequences is not yet completely developed.
The error rate of the reading increases with increasing length of the mRNA sequence.
The common errors are insertion or deletion of bases, and errors in the identification of individual bases.
At a certain sequence length, the error rate increases to a point where further reading is not possible.
In addition, EST databases contain many other types of errors, which may accumulate during the complicated process of EST generation, as well as features inherent in the mRNA that make assembly difficult.
These causes of difficulty include:
However, since the cell is disrupted in the middle of its normal activity, the transcription process may be incomplete or otherwise disrupted, for example by introns being incorporated in the mRNA sequences.
During the process of extraction and replication the mRNA sequences may be broken and, in some cases, may be reconnected, not necessarily correctly.
In addition, whole sections of mRNA sequences may be inadvertently removed.
This is not an error in the ESTs but it is an important cause of mismatch between ESTs.
As a result, there is a high redundancy of ESTs in the raw database.
However, due to the errors in reading out the ESTs, the ESTs will not exactly match.
This lack of consistency makes the task of assembly more difficult.
However, due to the above-described problems, it is very difficult to correctly match up the ESTs.
In general, the limiting factor in this field is information analysis, rather than information volume.
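The three common read errors named above (inserted bases, deleted bases, and misidentified bases) are exactly the operations counted by the standard Levenshtein edit distance. The sketch below is offered only as an illustration of why two ESTs from the same mRNA may fail to match exactly; the source does not say that this distance is used, and the function name and example reads are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical reads of the same stretch of sequence.
reference = "ACGTACGT"
misread = "ACGTTCGT"     # one base misidentified
deleted = "ACGACGT"      # one base dropped
inserted = "ACGTAACGT"   # one base added
for read in (misread, deleted, inserted):
    print(read, edit_distance(reference, read))
```

Each of the three reads differs from the reference by a single edit, yet none is an exact string match, which is the inconsistency the text describes.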

Examples

[0234] Attached herewith as an appendix “B” are transcript listings of mRNA sequences and clusters of ESTs, which were generated from a public domain database of a mouse, in accordance with preferred embodiments of the present invention. There are three cluster descriptions, each having the following format:

[0235] (a) a short description of the cluster;

[0236] (b) a list of the mRNA sequences and the associated ESTs used to generate the sequences;

[0237] (c) for each EST alternative spliced variant, a cross-reference listing between the sequence and a consensus of all the ESTs;

[0238] (d) a sequence listing of the consensus of all the ESTs, which need not match any particular variant; and

[0239] (e) transcriptions of the alternative spliced variants detected for the mRNA sequence.

[0240] For example, sequence number 10827 contains, on page B-8, two transcripts, one corresponding to each of the two alternative spliced variants.

[0241] The cross-reference listing shown between page B-...

Abstract

A method of comparing nucleic acid sequences being ESTs included in a first database of sequences and nucleic acid sequences included in a second database of sequences to form groups of sequences from the two databases that all relate to the same gene. For each one or more n-groups of sequences of one of the two databases, associating therewith lists of nucleic acid sequences, each from one of said two databases, each sequence on the list containing the n-groups, and matching sequences on the lists to generate said group.
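The grouping step described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the claimed method: sharing a single n-group is taken here as a sufficient match, whereas a real matching step would presumably verify candidates more carefully, and the function name `group_sequences`, the value of `n`, and the toy databases are hypothetical.

```python
from collections import defaultdict


def group_sequences(db1, db2, n):
    """Group sequences from two databases that share a length-n segment."""
    # Tag ids with their database so names cannot collide.
    seqs = {("db1", k): v for k, v in db1.items()}
    seqs.update({("db2", k): v for k, v in db2.items()})

    # For every n-group, list the sequences that contain it.
    lists = defaultdict(list)
    for key, seq in seqs.items():
        for segment in {seq[i:i + n] for i in range(len(seq) - n + 1)}:
            lists[segment].append(key)

    # Union-find: sequences on the same list end up in the same group.
    parent = {key: key for key in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for members in lists.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    groups = defaultdict(set)
    for key in seqs:
        groups[find(key)].add(key)
    return sorted(sorted(g) for g in groups.values())


# Toy databases (hypothetical): e1 overlaps r1, e2 overlaps r2.
est_db = {"e1": "ACGTACGGTCA", "e2": "GGGGGCCCCC"}
ref_db = {"r1": "CGGTCATTGAC", "r2": "CCCCCAAAAA"}
groups = group_sequences(est_db, ref_db, n=5)
print(groups)
```

Merging through a union-find structure means a group can also join sequences that share no segment directly but are linked through a third sequence.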

Description

FIELD OF THE INVENTION

[0001] The present invention relates to automatic assembly of mRNA sequences from databases containing large numbers of partial cDNA sequences.

BACKGROUND OF THE INVENTION

[0002] In human cells, genetic material is stored as DNA in a nucleus of the cell. When a certain protein is needed by the cell, a portion of the DNA is transcribed as mRNA, which is transported to the cytoplasm of the cell. In the cytoplasm, ribosomes create proteins, using the mRNA as a template. Generally, the mRNA comprises a long sequence of bases, each triplet (codon) of which encodes a specific amino acid. Thus, a sequence of triplets encodes a sequence of amino acids, which form a protein.

[0003] Cell function can, theoretically, be analyzed by determining the type of and ratio between the proteins in the cell. However, proteins are very delicate materials, which are difficult to analyze. mRNA, which controls the creation of the proteins, is easier to separate and analyze. Although seve...

Application Information

IPC(8): G16B30/20, C12N15/10
CPC: C12N15/1034, G06F19/22, C12N15/1089, G16B30/00, G16B30/20
Inventors: AMITAI, MOR; GILL-MORE, RAVEH AVRAHAM; HALPERIN, ERAN; MAGEN, AVNER; POLLOCK, SARAH RACHEL
Owner: COMPUGEN