This patent application claims processes and compositions of matter that enable the discovery of single nucleotide polymorphisms (SNPs) that distinguish the genomes of two individual organisms of the same species, that distinguish the paternal and maternal genetic inheritance of a single individual, or that distinguish the genomes of cells in special tissues (e.g., cancer tissues) within an individual from the genomes of the standard cells in the same individual. It also claims the SNPs that are discovered using these processes and compositions. Two steps are essential to the invention disclosed in this application. The first step provides four sets of primers, which are designated “T-extendable”, “A-extendable”, “C-extendable”, and “G-extendable”. These primers, when targeted against a
reference genome as a template, add (respectively) T, A, C, and G to their 3′-ends in a template-directed primer extension reaction. The second step presents these four primer sets, separately, to a sample of the target genome DNA under conditions where they bind to their complementary segments within the target DNA. Once bound, members of each primer set serve as primers for a template-directed primer extension reaction using the target genome as the template. If the template from the target genome presents, for the first nucleotide added in the extension reaction, the same templating nucleotide as the reference genome, then the T-extendable, A-extendable, C-extendable, and G-extendable primers will be extended (respectively) by T, A, C, and G. If, however, the template from the target genome presents a templating nucleotide different from that of the
reference genome, then the T-extendable, A-extendable, C-extendable, and G-extendable primers will be extended (respectively) by not-T, not-A, not-C, and not-G (referred to here as “3N” or “3”, to indicate the other three nucleotides; which of the other three is meant is understood from context). In these cases, the primers have discovered a SNP, a difference between the target and reference genomes. Then, the T-extendable, A-extendable, C-extendable, and G-extendable primers that added (respectively) not-T, not-A, not-C, and not-G are separated or made otherwise physically distinct (through, for example, the use of irreversible terminators, such as 2′,3′-dideoxynucleosides) from those that added T, A, C, and G (respectively). Those that added T, A, C, and G (respectively) did not discover a SNP, and are discarded. The primers that added “not-T”, “not-A”, “not-C”, and “not-G” discovered a SNP and, presented in a mixture enriched relative to those primers that did not discover a SNP, are a useful deliverable. Following these steps, the SNP discoveries are realized by sequencing the extracted species. The information obtained from this sequencing allows the identification of the locus of the SNP in the in silico genome.
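
For illustration only, the logic of the two steps can be mimicked in silico. The Python sketch below is not the claimed process and does not model any chemistry. It assumes that both genomes are given as equal-length strings over A, C, G, and T with no insertions or deletions, that each string is written as the strand the primer copies (so the nucleotide added to a primer is simply the next character), and that a primer binds only on an exact match. The names `Primer`, `Snp`, `build_primer_sets`, `extend_on_target`, and `discover_snps`, and the primer length `k`, are hypothetical and chosen only for this example.

```python
from typing import Dict, List, NamedTuple, Optional

NUCLEOTIDES = "TACG"  # order of the four primer sets named in the text


class Primer(NamedTuple):
    sequence: str  # primer written 5'->3', identical to a stretch of the reference
    locus: int     # reference position of the first nucleotide to be added
    expected: str  # nucleotide added when the reference genome is the template


class Snp(NamedTuple):
    locus: int
    expected: str  # nucleotide the reference would have directed
    observed: str  # one of the other three ("3N") directed by the target


def build_primer_sets(reference: str, k: int) -> Dict[str, List[Primer]]:
    """Step 1 (assumed model): group each length-k primer by the nucleotide it adds."""
    sets: Dict[str, List[Primer]] = {n: [] for n in NUCLEOTIDES}
    for start in range(len(reference) - k):
        locus = start + k
        primer = Primer(reference[start:locus], locus, reference[locus])
        sets[primer.expected].append(primer)
    return sets


def extend_on_target(primer: Primer, target: str) -> Optional[str]:
    """Step 2 (assumed model): nucleotide added on the target, or None if unbound."""
    start = primer.locus - len(primer.sequence)
    if primer.locus >= len(target) or target[start:primer.locus] != primer.sequence:
        return None  # exact-match binding is a simplifying assumption
    return target[primer.locus]


def discover_snps(reference: str, target: str, k: int) -> List[Snp]:
    """Keep only primers extended by a nucleotide other than the expected one."""
    found: List[Snp] = []
    for expected, primers in build_primer_sets(reference, k).items():
        for primer in primers:
            added = extend_on_target(primer, target)
            if added is not None and added != expected:
                found.append(Snp(primer.locus, expected, added))
    return sorted(found)
```

With the made-up toy sequences `"ACGTTGCAAGCT"` (reference) and `"ACGTTGCATGCT"` (target) and `k = 4`, `discover_snps` reports a single difference at position 8, found by an A-extendable primer that was extended by T instead. Primers that overlap the changed position fail the exact-match test and are simply ignored in this sketch.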