A method for probabilistic record linkage includes providing a record pair comprising a plurality of fields, providing a plurality of scenarios, each scenario relating to a distribution of patterns among a plurality of attribute statuses, and comparing the record pair to determine a record difference. The method further includes determining a probability of a status for each of a plurality of attributes based on a distance metric of the plurality of fields, wherein each field corresponds to a respective attribute, the field being observable and the attribute being hidden; determining a probability of each scenario based on the probability of the status for each attribute and a Bayesian net representing a probabilistic model of the relationship between scenarios and attributes; and outputting a probability of duplication or non-duplication of the record pair, determined from the probabilities of the plurality of scenarios.
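The abstract does not specify the distance metric, how a field distance becomes a status probability, or the structure of the Bayesian net. The following is a minimal sketch of the three steps (field distance, hidden attribute status, scenario posterior, then duplication probability) and rests on several assumptions. The attribute names, scenarios, priors, and conditional probability tables are hypothetical. `difflib.SequenceMatcher` stands in for the distance metric. Each attribute status is binary ("same" or "different"). The net is modeled as a single scenario node with the attributes as conditionally independent children, and the per-attribute status probabilities enter it as virtual (soft) evidence.

```python
from difflib import SequenceMatcher

# Hypothetical attributes; each observable field maps to one hidden attribute.
ATTRIBUTES = ("name", "address", "birth_date")

# Hypothetical scenarios: (prior, implies_duplicate, P(status == "same" | scenario)
# per attribute). The per-attribute tables play the role of the Bayesian-net CPTs.
SCENARIOS = {
    "exact_duplicate": (0.10, True,  {"name": 0.95, "address": 0.95, "birth_date": 0.95}),
    "moved_person":    (0.05, True,  {"name": 0.90, "address": 0.05, "birth_date": 0.95}),
    "family_member":   (0.15, False, {"name": 0.20, "address": 0.90, "birth_date": 0.05}),
    "unrelated":       (0.70, False, {"name": 0.02, "address": 0.02, "birth_date": 0.02}),
}


def field_distance(a: str, b: str) -> float:
    """Normalized string distance in [0, 1], 0 meaning identical (stand-in metric)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def status_probability(distance: float) -> float:
    """Hypothetical calibration: P(attribute status == 'same' | field distance)."""
    return min(1.0, max(0.0, 1.0 - distance))


def scenario_posteriors(rec1: dict, rec2: dict) -> dict:
    # Observable fields -> distance -> soft evidence on each hidden attribute status.
    p_same = {a: status_probability(field_distance(rec1[a], rec2[a])) for a in ATTRIBUTES}
    unnormalized = {}
    for name, (prior, _, cpt) in SCENARIOS.items():
        likelihood = 1.0
        for a in ATTRIBUTES:
            # Sum over the hidden status: P(status | scenario) * evidence weight.
            likelihood *= cpt[a] * p_same[a] + (1.0 - cpt[a]) * (1.0 - p_same[a])
        unnormalized[name] = prior * likelihood
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


def duplicate_probability(rec1: dict, rec2: dict) -> float:
    # P(duplicate) is the posterior mass of the scenarios that imply a duplicate.
    posteriors = scenario_posteriors(rec1, rec2)
    return sum(p for name, p in posteriors.items() if SCENARIOS[name][1])


if __name__ == "__main__":
    a = {"name": "John Smith", "address": "12 Oak St", "birth_date": "1980-04-02"}
    b = {"name": "Jon Smith", "address": "98 Elm Ave", "birth_date": "1980-04-02"}
    print(scenario_posteriors(a, b))
    print(duplicate_probability(a, b))
```

In this design, a duplication decision comes from summing over scenarios rather than from a single per-field weight. This lets a pattern such as "same name and birth date, different address" count toward duplication through a scenario like `moved_person`. The probability of non-duplication is one minus the returned value.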