The present invention relates to methods, apparatus and computer systems for assigning a numerical value to a genotype at a single- or multi-base segment in an individual's genome to denote the presence of a match or a mismatch of a nucleic acid base sequence of one or more chromosomal copies of the segment, as compared to the nucleic acid base sequence at a reference genome segment that corresponds to the segment of the individual's genome. The methods involve assigning a single-digit numerical value to the match or the mismatch of each chromosomal copy of the segment in the genome, so that the numerical value assigned to a mismatch is greater than the numerical value assigned to a match. A null symbol is assigned to a no-call determination. The assigned numerical values are summed to obtain a total numerical value that is a single digit or a fixed number of digits. The steps are repeated for each genome in a set of genomes to create a vector of total numerical values for the segment, thereby obtaining a segment-specific pattern of genotype match/mismatch between the set of genomes and the nucleic acid base sequence at the reference genome segment. The segment-specific pattern, also referred to as a “diff pattern”, can be used to filter or uncover specific trends or sub-patterns across a set of genomes, and to more quickly identify genotypic/phenotypic relationships by identifying sites where the distribution of genotypes in the set of genomes relates in a distinctive, causal way to the distribution of a given phenotype among the individuals whose genomes are under study.
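
By way of illustration only, the scoring and vector-building steps described above may be sketched in Python as follows. The sketch is not the claimed implementation. It assumes a value of 0 for a match and 1 for a mismatch (the description requires only that the mismatch value be greater), uses "N" as the null symbol, and assumes that a no-call on any chromosomal copy makes the total for that genome null. The names score_segment and diff_pattern are hypothetical.

```python
from typing import Optional, Sequence

MATCH = 0      # assumed value for a chromosomal copy that matches the reference
MISMATCH = 1   # assumed value; the text only requires MISMATCH > MATCH
NULL = "N"     # assumed null symbol for a no-call determination


def score_segment(copies: Sequence[Optional[str]], reference: str) -> str:
    """Return the total value for one genome at one segment as one character.

    `copies` holds the base sequence of each chromosomal copy of the segment;
    None marks a copy with no call.
    """
    total = 0
    for copy in copies:
        if copy is None:
            # Assumption: a no-call on any copy makes the whole total null.
            return NULL
        total += MATCH if copy == reference else MISMATCH
    # With values 0 and 1 the total stays a single digit for up to nine copies.
    return str(total)


def diff_pattern(genomes: Sequence[Sequence[Optional[str]]], reference: str) -> str:
    """Build the segment-specific vector across a set of genomes."""
    return "".join(score_segment(copies, reference) for copies in genomes)


# Four hypothetical diploid genomes at a single-base segment with reference "A".
genomes = [
    ("A", "A"),   # both copies match the reference
    ("A", "G"),   # one copy mismatches
    ("G", "G"),   # both copies mismatch
    ("A", None),  # one copy has no call
]
pattern = diff_pattern(genomes, "A")
print(pattern)  # 012N
```

Because each genome contributes a fixed-width symbol, the resulting string can be compared position by position with the distribution of a phenotype across the same individuals, for example by selecting segments whose pattern shows a nonzero value for every affected individual.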