A system for electronically distilling information from a business document uses a network scanner to electronically scan a platen area, having a business document thereon, to create a bitmap. A network server carries out a segmentation process to segment the scan-generated bitmap into a bitmap object, the bitmap object corresponding to the scanned business document; a bitmap-to-text conversion process to convert the bitmap object into a block of text; a semantic recognition process to generate a structured representation of semantic entities corresponding to the scanned business document; and a document generation process to convert the structured representation into a structured text file. The semantic recognition process includes the processes of generating, for each line of text having a keyword therein, a terminal symbol corresponding to the keyword; generating, for each line of text not having a keyword therein and containing no numeric characters, an alphabetic terminal symbol; generating, for each line of text not having a keyword therein and having a numeric character therein, an alphanumeric terminal symbol; generating a string of terminal symbols from the generated terminal symbols; determining a probable parsing of the generated string of terminal symbols; labeling each text line, according to a determined function, with non-terminal symbols; and parsing the business document information text into fields of business document information text based upon the non-terminal symbol of each text line and the determined probable parsing of the generated string of terminal symbols.
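
The terminal-symbol generation described above can be illustrated with a minimal Python sketch. It covers only the first steps of the semantic recognition process (one terminal symbol per text line, then the string of terminal symbols); the probable-parsing and labeling steps are not shown. The keyword table, the symbol names (PHONE_KW, FAX_KW, EMAIL_KW, ALPHA, ALNUM), and the function names are hypothetical, chosen for illustration and not taken from the source.

```python
import re
from typing import List

# Hypothetical keyword table: the source does not list the actual keywords
# or the names of the terminal symbols they map to.
KEYWORD_TERMINALS = {
    "tel": "PHONE_KW",
    "phone": "PHONE_KW",
    "fax": "FAX_KW",
    "email": "EMAIL_KW",
}

ALPHABETIC = "ALPHA"    # line with no keyword and no numeric characters
ALPHANUMERIC = "ALNUM"  # line with no keyword and at least one numeric character


def terminal_for_line(line: str) -> str:
    """Return the terminal symbol for a single line of recognized text."""
    # Split the line into lowercase alphabetic words and look for a keyword.
    for word in re.findall(r"[a-z]+", line.lower()):
        if word in KEYWORD_TERMINALS:
            return KEYWORD_TERMINALS[word]
    # No keyword: classify by whether the line contains any digit.
    if any(ch.isdigit() for ch in line):
        return ALPHANUMERIC
    return ALPHABETIC


def terminal_string(lines: List[str]) -> List[str]:
    """Build the string (sequence) of terminal symbols, skipping blank lines."""
    return [terminal_for_line(line) for line in lines if line.strip()]
```

Under these assumptions, the lines "Jane Doe", "123 Main Street", and "Tel: 555-0100" would map to ALPHA, ALNUM, and PHONE_KW respectively; a parser would then take this sequence as its input.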