A data capture system receives a sequence of output document objects and, for each output document object, writes output data values to an output data structure. The system includes a data storage, a first tier data extraction system and at least a second tier data extraction system. The first tier data extraction system is adapted to receive each output document object. Each output document object may be in a print language format comprising a plurality of print elements. For each required
invoice data element, the first tier data extraction system obtains identification of a positional element value from a positional data set that includes, as its invoice data element, identification of the required invoice data element; and, if the output document object includes a qualifying text string, writes an output data value to the output data structure in association with identification of the required invoice data element. If the output document object does not include a qualifying
text string, the first tier data extraction system identifies the output document object for tier two processing. The second tier data extraction system receives each output document object identified for tier two processing, performs character recognition on a graphical representation thereof and, for each required invoice data element, writes an output data value to the output data structure in association with identification of the required invoice data element.
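
The two-tier flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. All names (PrintElement, OutputDocument, tier_one, tier_two, capture) are hypothetical, the positional element value is assumed to be an (x, y) pair, a "qualifying text string" is assumed to mean any non-empty text found at that position, and the character recognition step is represented by a caller-supplied function rather than a real recognition engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Position = Tuple[int, int]  # assumed form of a positional element value
OutputData = Dict[str, Dict[str, str]]  # document id -> element name -> value


@dataclass
class PrintElement:
    position: Position
    text: str


@dataclass
class OutputDocument:
    doc_id: str
    print_elements: List[PrintElement] = field(default_factory=list)


def find_text(doc: OutputDocument, position: Position) -> Optional[str]:
    """Return text at the position; non-empty text is assumed to qualify."""
    for element in doc.print_elements:
        if element.position == position and element.text.strip():
            return element.text.strip()
    return None


def tier_one(doc: OutputDocument,
             positional_data_set: Dict[str, Position],
             output: OutputData) -> bool:
    """Write each value found by position; return False if any is missing."""
    complete = True
    for name, position in positional_data_set.items():
        text = find_text(doc, position)
        if text is None:
            complete = False  # document is identified for tier two
        else:
            output.setdefault(doc.doc_id, {})[name] = text
    return complete


def tier_two(doc: OutputDocument,
             required: Iterable[str],
             recognize: Callable[[OutputDocument, str], str],
             output: OutputData) -> None:
    """Write a recognized value for every required element.

    `recognize` stands in for character recognition on a graphical
    representation of the document; it is supplied by the caller.
    """
    for name in required:
        output.setdefault(doc.doc_id, {})[name] = recognize(doc, name)


def capture(docs: Iterable[OutputDocument],
            positional_data_set: Dict[str, Position],
            recognize: Callable[[OutputDocument, str], str]) -> OutputData:
    """Run tier one on every document and tier two only on flagged ones."""
    output: OutputData = {}
    for doc in docs:
        if not tier_one(doc, positional_data_set, output):
            tier_two(doc, positional_data_set.keys(), recognize, output)
    return output
```

In this sketch, tier two overwrites any values tier one wrote for a flagged document, which is one possible reading of "for each required invoice data element" in the second tier; keeping the tier one values and filling only the gaps would be an equally plausible design.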