The invention discloses a method for detecting phishing web pages based on a spatial mixed index mechanism, which jointly uses the spatial layout, text features and image features of web pages.
The method relies on the visual layout features of pages and a spatial database, and addresses the problem of rapidly detecting phishing web pages according to the visual similarity of web pages.
Using the rendering engine of a browser, the method carries out
visual layout feature extraction on specified suspicious web pages and combines a spatial database index with the text and image features of the web pages to form a spatial tree, i.e., a DIIR tree, wherein the DIIR tree is an inverted index that integrates documents and images in the spatial mixed index mechanism. The DIIR tree improves on an
R-tree over spatial regions in the spatial index mechanism by adding inverted index files of the text and image features of network objects to each node of the R-tree.
When a new network object is queried, not only its spatial layout features but also its text and image features are taken into account.
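
The node structure described above can be illustrated with a minimal sketch. It is not the implementation of the invention: the names `PageObject`, `Node`, `build` and `query` are hypothetical, the tree is bulk-loaded by simple sorting rather than by R-tree insertion and splitting, and the rule that an object matches when it shares at least one text term or image token with the query is an assumption made only for illustration.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class PageObject:
    """Hypothetical page element: layout box plus text and image feature tokens."""
    obj_id: str
    box: Rect
    terms: frozenset
    image_tokens: frozenset


@dataclass
class Node:
    """R-tree-style node that also carries inverted files over its entries."""
    mbr: Rect                 # minimum bounding rectangle of all entries
    entries: list             # child Nodes, or PageObjects in a leaf
    is_leaf: bool
    text_index: dict = field(default_factory=dict)   # term -> entry positions
    image_index: dict = field(default_factory=dict)  # image token -> entry positions


def entry_features(entry):
    """Return (box, terms, image tokens) for a PageObject or a child Node."""
    if isinstance(entry, PageObject):
        return entry.box, set(entry.terms), set(entry.image_tokens)
    return entry.mbr, set(entry.text_index), set(entry.image_index)


def make_node(entries, is_leaf):
    mbr, text_index, image_index = None, {}, {}
    for pos, entry in enumerate(entries):
        box, terms, tokens = entry_features(entry)
        mbr = box if mbr is None else union(mbr, box)
        for term in terms:
            text_index.setdefault(term, set()).add(pos)
        for token in tokens:
            image_index.setdefault(token, set()).add(pos)
    return Node(mbr, list(entries), is_leaf, text_index, image_index)


def build(objects, fanout=2):
    """Assumed simplification: pack objects sorted by x_min, level by level."""
    if not objects:
        raise ValueError("at least one object is required")
    level, is_leaf = sorted(objects, key=lambda o: o.box[0]), True
    while True:
        nodes = [make_node(level[i:i + fanout], is_leaf)
                 for i in range(0, len(level), fanout)]
        if len(nodes) == 1:
            return nodes[0]
        level, is_leaf = nodes, False


def query(node, region, terms, image_tokens):
    """Ids of objects overlapping `region` that share a term or an image token."""
    if not intersects(node.mbr, region):
        return []
    wanted = set()
    for term in terms:
        wanted |= node.text_index.get(term, set())
    for token in image_tokens:
        wanted |= node.image_index.get(token, set())
    hits = []
    for pos in sorted(wanted):  # only entries named by the inverted files
        entry = node.entries[pos]
        if node.is_leaf:
            if intersects(entry.box, region):
                hits.append(entry.obj_id)
        else:
            hits.extend(query(entry, region, terms, image_tokens))
    return hits


objects = [
    PageObject("logo", (0, 0, 100, 40), frozenset(), frozenset({"img:a1"})),
    PageObject("login_form", (0, 50, 100, 120), frozenset({"password", "login"}), frozenset()),
    PageObject("footer", (0, 300, 100, 320), frozenset({"copyright"}), frozenset()),
]
root = build(objects)
result = query(root, (0, 0, 100, 130), {"password"}, {"img:a1"})
print(result)  # ['logo', 'login_form']
```

In this sketch a subtree is skipped either because its bounding rectangle misses the query region or because its inverted files contain none of the query's text terms or image tokens, which is one way the spatial and content features could prune the search together.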