The invention is directed to a natural language generation (NLG) software system that generates rich, content-sensitive human language descriptions based on unparsed raw domain-specific data. In one embodiment, the NLG software system may include a data parser/normalizer, a comparator, a language engine, and a document generator. The data parser/normalizer may be configured to retrieve specification information for items to be described by the NLG software system, to extract pertinent information from the raw specification information, and to convert and normalize the extracted information so that the items may be compared specification by specification. The comparator may be configured to use the normalized data from the data parser/normalizer to compare the specifications of the items using comparison functions and interpretation rules to determine outcomes of the comparisons. The language engine may be configured to cycle through all or a subset of the normalized specification information, to retrieve all sentence templates associated with each of the item specifications, to call the comparator to compute or retrieve the results of the comparisons between the item specifications, and to recursively generate every possible syntactically legal sentence associated with the specifications based on the retrieved sentence templates. The document generator may be configured to select one or more discourse models having instructions regarding the selection, organization and modification of the generated sentences, and to apply the instructions of the discourse model to the generated sentences to generate a natural language description of the selected items.
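
By way of illustration only, the four components described above might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the item names, the raw specification strings, the unit table, the template format, the discourse model fields, and every function name (`parse_and_normalize`, `compare`, `expand`, `generate_sentences`, `generate_document`) are hypothetical and chosen only to make the data flow concrete.

```python
import re

# Hypothetical raw specification strings for two items (illustrative data only).
RAW_SPECS = {
    "Camera A": "Weight: 1.2 lb; Zoom: 10x",
    "Camera B": "Weight: 400 g; Zoom: 5x",
}

# Assumed unit table used to normalize weights to grams.
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "lb": 453.59237}

# Assumed interpretation rule: for these specifications a lower value is better.
LOWER_IS_BETTER = {"weight"}

# Hypothetical sentence templates keyed by (specification, outcome).
# A template is a list of parts; a part that is itself a list holds alternatives.
TEMPLATES = {
    ("weight", "better"): [["{a} is ", ["lighter", "less heavy"], " than {b}."]],
    ("weight", "worse"): [["{a} is ", ["heavier", "bulkier"], " than {b}."]],
    ("zoom", "better"): [["{a} ", ["offers", "has"], " more zoom than {b}."]],
    ("zoom", "worse"): [["{a} ", ["offers", "has"], " less zoom than {b}."]],
}

# Hypothetical discourse model: which specifications to mention, in what order,
# which candidate sentence to select, and how to modify later sentences.
DISCOURSE_MODEL = {"order": ["zoom", "weight"], "pick": 0, "connective": "Also, "}


def parse_and_normalize(raw):
    """Data parser/normalizer: extract 'name: value unit' pairs as numbers."""
    specs = {}
    for field in raw.split(";"):
        name, _, value = field.partition(":")
        match = re.match(r"\s*([\d.]+)\s*(\w+)", value)
        if not match:
            continue  # skip fields that do not look like 'number unit'
        number, unit = float(match.group(1)), match.group(2).lower()
        number *= GRAMS_PER_UNIT.get(unit, 1.0)  # weights become grams
        specs[name.strip().lower()] = number
    return specs


def compare(spec, a, b):
    """Comparator: a comparison function plus a simple interpretation rule."""
    if a == b:
        return "equal"
    a_wins = (a < b) if spec in LOWER_IS_BETTER else (a > b)
    return "better" if a_wins else "worse"


def expand(parts):
    """Recursively yield every string the template parts can produce."""
    if not parts:
        yield ""
        return
    head, rest = parts[0], parts[1:]
    choices = head if isinstance(head, list) else [head]
    for choice in choices:
        for tail in expand(rest):
            yield choice + tail


def generate_sentences(name_a, name_b, specs_a, specs_b):
    """Language engine: cycle through shared specs and expand their templates."""
    sentences = {}
    for spec in sorted(specs_a.keys() & specs_b.keys()):
        outcome = compare(spec, specs_a[spec], specs_b[spec])
        candidates = []
        for template in TEMPLATES.get((spec, outcome), []):
            candidates += [s.format(a=name_a, b=name_b) for s in expand(template)]
        sentences[spec] = candidates
    return sentences


def generate_document(sentences, model):
    """Document generator: select, order, and lightly modify the sentences."""
    chosen = [sentences[s][model["pick"]] for s in model["order"] if sentences.get(s)]
    modified = chosen[:1] + [model["connective"] + s for s in chosen[1:]]
    return " ".join(modified)
```

Under these assumptions, the pipeline could be exercised as shown below. The example keeps selection trivial (the first candidate per specification); a fuller discourse model would presumably choose among the candidates to vary the wording.

```python
specs = {name: parse_and_normalize(raw) for name, raw in RAW_SPECS.items()}
sentences = generate_sentences("Camera A", "Camera B",
                               specs["Camera A"], specs["Camera B"])
print(generate_document(sentences, DISCOURSE_MODEL))
# Camera A offers more zoom than Camera B. Also, Camera A is heavier than Camera B.
```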