Disclosed are methods and apparatus for automatically organizing and/or analyzing a plurality of defect images without first providing a predefined set of classified images (herein referred to as a training set). In other words, sorting is not based on a training set or on predefined classification codes for such defect images. In one embodiment, the defect images each include associated identifying data, such as a fabrication identifier, lot number, wafer number, and layer identifier. Initially, the defect images are sorted according to at least a portion of the associated identifying data into a plurality of “identifying data groups” or image families. The defect data in each identifying data group is then automatically sorted according to defect appearance. That is, similar defect images are associated with a single bin, and similar bins are associated with one another. For example, similar bins are arranged next to each other within a graphical user interface (GUI). A representative feature vector (herein referred to as a “centroid”) is then associated with each bin. The centroid generally represents the images within the particular bin. A search for images that “look like” a specified target image may then be efficiently performed on a particular identifying data group using the centroid of each bin. The target image's feature vector is compared with the centroid of each bin that is within the same identifying data group or image family as the target image. The techniques of the present invention may also be applied to wafer maps as well as to defect images.
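
By way of illustration only, the flow described above (grouping by identifying data, binning by appearance, associating a centroid with each bin, and searching by centroid) may be sketched as follows. The sketch is not the disclosed implementation. The names `DefectImage`, `build_index`, and `find_similar` are hypothetical, the feature vectors are assumed to be precomputed, and the binning step uses a simple greedy threshold clustering with Euclidean distance, chosen only as a stand-in because the text does not name a particular sorting technique.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class DefectImage:
    """Hypothetical record: identifying data plus a precomputed feature vector."""
    image_id: str
    fab: str
    lot: str
    wafer: str
    layer: str
    features: tuple


def family_key(img, keys):
    """Select the portion of the identifying data used to form image families."""
    return tuple(getattr(img, k) for k in keys)


def group_by_identifying_data(images, keys):
    """Sort images into identifying data groups (image families)."""
    families = defaultdict(list)
    for img in images:
        families[family_key(img, keys)].append(img)
    return dict(families)


def centroid(members):
    """Component-wise mean of the members' feature vectors."""
    n = len(members)
    return tuple(sum(col) / n for col in zip(*(m.features for m in members)))


def bin_by_appearance(family, threshold):
    """Assumed stand-in: greedy clustering. An image joins the nearest bin
    if that bin's centroid lies within `threshold`; otherwise it opens a new bin."""
    bins = []
    for img in family:
        best = min(bins, key=lambda b: dist(centroid(b), img.features), default=None)
        if best is not None and dist(centroid(best), img.features) <= threshold:
            best.append(img)
        else:
            bins.append([img])
    return bins


def build_index(images, keys=("fab", "layer"), threshold=1.0):
    """Map each family key to a list of (centroid, bin) pairs."""
    index = {}
    for key, family in group_by_identifying_data(images, keys).items():
        index[key] = [(centroid(b), b) for b in bin_by_appearance(family, threshold)]
    return index


def find_similar(target, index, keys=("fab", "layer")):
    """Compare the target only against centroids in its own family and
    return the images of the nearest bin (empty list if the family is unknown)."""
    candidates = index.get(family_key(target, keys), [])
    if not candidates:
        return []
    _, nearest_bin = min(candidates, key=lambda cb: dist(cb[0], target.features))
    return nearest_bin
```

Because the target is compared against one centroid per bin rather than against every image, the number of comparisons in a search scales with the number of bins in the target's family, which is the efficiency the centroid-based search is meant to provide.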