The invention discloses a method for positioning an object step by step based on vision fusion. The method comprises: collecting a first image containing target feature points to realize coarse positioning of the target feature points; according to the coordinate information contained in the first image, collecting a second image containing the target feature point information and carrying out precise positioning to obtain the coordinate information contained in the second image; and, according to the conversion relationship between the different coordinate systems, obtaining the coordinate transformation amount of the target feature point under the same coordinate system, thereby locating the target feature point. The invention also discloses an application, a device and a system for positioning an object step by step based on vision fusion. The scheme decouples positioning accuracy from the requirement of a large field of view: two different types of cameras are used to realize step-by-step positioning, which solves the problem of insufficient positioning accuracy when the workpiece moves over a large range. When there are many screw holes, there is no need to take photographs repeatedly, which reduces the number of photographs and improves production efficiency. The scheme therefore has the advantages of high precision and high positioning efficiency.
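The coarse-then-fine flow described above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a planar workpiece, a simple scale-plus-rotation calibration for each camera, and a fine camera whose image center is moved to the coarse estimate. All names (`pixel_to_world`, `locate_step_by_step`, `simulated_fine_offset`) and all numeric calibration values are hypothetical.

```python
import math

def pixel_to_world(pixel, scale, theta, origin):
    """Map a pixel coordinate into the shared world frame (mm).

    scale  : mm per pixel (hypothetical calibration value)
    theta  : camera rotation about the optical axis, in radians
    origin : world position (mm) that corresponds to pixel (0, 0)
    """
    u, v = pixel
    c, s = math.cos(theta), math.sin(theta)
    x = origin[0] + scale * (c * u - s * v)
    y = origin[1] + scale * (s * u + c * v)
    return (x, y)

def locate_step_by_step(coarse_pixel, coarse_calib, measure_fine_offset, fine_calib):
    """Coarse positioning with the wide-field camera, then refinement."""
    # Step 1: coarse positioning from the first (wide-field) image.
    coarse_xy = pixel_to_world(
        coarse_pixel, coarse_calib["scale"], coarse_calib["theta"], coarse_calib["origin"]
    )
    # Step 2: the fine camera is centered on the coarse estimate and reports
    # the feature's pixel offset from its image center in the second image.
    offset_px = measure_fine_offset(coarse_xy)
    # Step 3: express the fine measurement in the same world frame.
    fine_xy = pixel_to_world(offset_px, fine_calib["scale"], fine_calib["theta"], coarse_xy)
    return coarse_xy, fine_xy

# Simulated scenario (all values are made up for illustration).
TRUE_XY = (100.37, 50.12)                                        # true feature position, mm
COARSE = {"scale": 0.5, "theta": 0.0, "origin": (0.0, 0.0)}      # wide field, low resolution
FINE = {"scale": 0.01, "theta": 0.0}                             # narrow field, high resolution

def simulated_fine_offset(center_xy):
    # Stand-in for real feature detection in the second image.
    # Valid only because FINE["theta"] is 0 in this simulation.
    return (
        round((TRUE_XY[0] - center_xy[0]) / FINE["scale"]),
        round((TRUE_XY[1] - center_xy[1]) / FINE["scale"]),
    )

coarse_xy, fine_xy = locate_step_by_step((201, 100), COARSE, simulated_fine_offset, FINE)
print("coarse estimate (mm):", coarse_xy)
print("fine estimate (mm):", fine_xy)
```

In this sketch the wide-field camera only needs to be accurate enough to bring the feature into the fine camera's field of view. The final accuracy is set by the fine camera's resolution, which mirrors the separation of accuracy from field of view described in the abstract.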