Coding free-text documents, especially in
medicine, has become an urgent priority as electronic medical records (EMRs) mature and the need to exchange data between EMRs becomes more acute. However, only a few automated coding systems exist, and they can code only a small portion of the free text against a limited number of codes. The precision of these systems is low, and code quality is not measured. The present invention discloses a process and
system which implements semantic coding against standard
lexicon(s) with high precision. The standard
lexicon can come from a number of different sources, but is usually developed by a standards body. The
system is semi-automated to enable medical coders or others to process free-text documents at a
rapid rate and with high precision. The
system performs the steps of segmenting a document,
flagging the need for corrections, validating the document against a
data type definition, and looking up both the
semantics and standard codes which correspond to the document's sentences. The coder has the option to intervene at any step in the process to fix mistakes made by the system. A
knowledge base, consisting of propositions, represents the semantic knowledge in the domain. When sentences with unknown
semantics are discovered, they can be easily added to the
knowledge base. The propositions in the
knowledge base are associated with codes in the standard
lexicon. The quality of each match is rated by a professional who understands the knowledge domain. The system uses this information to perform high-precision coding and to measure the quality of the match.
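
The flow described above (segment a document, look up each sentence's proposition in the knowledge base, return the associated standard code with its rated match quality, and add unknown sentences when they are found) could be sketched roughly as follows. This is a minimal illustration only, not the disclosed implementation: the names `Proposition`, `normalize`, `segment`, `code_document`, and `add_proposition`, the exact-match lookup, the 1 to 5 rating scale, and the codes `HYP-001` and `HYP-002` are all assumptions made up for the example and do not come from any real lexicon.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    """One unit of semantic knowledge, linked to a standard-lexicon code."""
    meaning: str        # the proposition the sentence expresses
    code: str           # hypothetical code, not taken from a real lexicon
    match_quality: int  # assumed scale: 1 (poor) to 5 (exact), rated by a professional


def normalize(sentence: str) -> str:
    """Lowercase and drop punctuation so trivially different sentences match."""
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))


def segment(document: str) -> list[str]:
    """Naive segmentation: split after sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def code_document(document: str, kb: dict) -> list[tuple[str, Optional[Proposition]]]:
    """Pair every sentence with its proposition, or None if semantics are unknown."""
    return [(s, kb.get(normalize(s))) for s in segment(document)]


def add_proposition(kb: dict, sentence: str, proposition: Proposition) -> None:
    """Let the human coder add a sentence whose semantics were unknown."""
    kb[normalize(sentence)] = proposition


# Toy knowledge base keyed by normalized sentence text (assumption: exact match).
KNOWLEDGE_BASE = {
    normalize("The heart is enlarged."): Proposition("heart is enlarged", "HYP-001", 5),
    normalize("No pleural effusion."): Proposition("pleural effusion absent", "HYP-002", 4),
}

if __name__ == "__main__":
    report = "The heart is enlarged. No pleural effusion. Lungs are clear."
    for sentence, prop in code_document(report, KNOWLEDGE_BASE):
        if prop is None:
            print(f"UNKNOWN  {sentence}  (flag for coder review)")
        else:
            print(f"{prop.code}  quality={prop.match_quality}  {sentence}")
```

In this sketch, a sentence that returns `None` is where the coder would intervene; after `add_proposition` is called for it, the same sentence is coded automatically the next time it appears. A real system would presumably need a more tolerant semantic match than exact normalized text.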