A method has been developed to create databases of peptides having a desirable property, such as
antimicrobial activity, by analyzing a
database of known peptides for patterns statistically associated with that activity. One can determine a set of patterns that may be representative of
a peptide having a desired characteristic or property, and evaluate a set of sequences against the set of patterns (grammars) to determine whether each
peptide sequence being evaluated has patterns similar to those of
a peptide having the desired characteristic or property. The set of sequences being evaluated may include
peptide sequences of a desired length comprising all or substantially all combinations of amino acids that conform to at least one of the set of patterns. Once a
database of known peptides is identified, that
database may be processed in a
pattern recognition procedure that identifies a set of patterns that could be understood as representative of
a peptide having the characteristic of interest. A set of newly generated peptide sequences may then be
scored against the identified patterns to determine a
degree of association or similarity between each of the new sequences and the set of identified patterns. The method is used to provide a database of sequences that are expected to have one or more desired activities, specific sequences within the database proven to have the desired activity, and the patterns or grammars used to create the database of sequences. Although described with reference to
antimicrobial peptides, the method may also be used to identify a database of peptides having antiviral properties, wound response properties, or some other property of interest.
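
The two stages described above, identifying patterns in known peptides and then scoring new sequences against them, can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not taken from the text: a "pattern" is reduced to an exact k-residue substring shared by at least min_support known sequences, and the "degree of association" is the fraction of a candidate's k-residue windows that match any pattern. The function names find_patterns and score, their parameters, and the toy sequences are hypothetical; the actual pattern recognition procedure and scoring rule may differ.

```python
from collections import Counter


def find_patterns(known, k=3, min_support=2):
    """Return k-mers that occur in at least `min_support` known sequences.

    Stand-in for the pattern recognition procedure (an assumption).
    """
    support = Counter()
    for seq in known:
        # Count each k-mer once per sequence so long peptides do not dominate.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {kmer for kmer, n in support.items() if n >= min_support}


def score(candidate, patterns, k=3):
    """Fraction of the candidate's k-length windows that match a pattern."""
    windows = [candidate[i:i + k] for i in range(len(candidate) - k + 1)]
    if not windows:
        return 0.0  # candidate shorter than k: nothing to compare
    return sum(w in patterns for w in windows) / len(windows)


# Toy sequences invented for illustration only; not real peptide data.
known = ["KKLLKK", "KLLKKA", "GKKLLA"]
patterns = find_patterns(known)

# Rank hypothetical candidates from most to least pattern-like.
candidates = ["GGGGGG", "AKKLLK"]
ranked = sorted(candidates, key=lambda s: score(s, patterns), reverse=True)
```

Candidates with higher scores share more local patterns with the known set. Under these assumptions, a score threshold would decide which candidates enter the generated database.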