The invention discloses a target tracking method based on an encoder-decoder structure. In the method, an encoder-decoder is combined with a discriminator to form a structure similar to a generative adversarial network, so that the features extracted by the encoder generalize better and capture the essential features of the tracked object. As a result, objects in the target box that are partially occluded or affected by illumination changes and motion blur have less influence on the network, and robustness is improved. The method uses Focal Loss in place of the traditional cross-entropy loss function, which reduces the loss contribution of easily classified samples, makes the model pay more attention to hard and misclassified samples, and at the same time compensates for the imbalance between positive and negative samples. Distance-IoU (DIoU) loss is used as the regression loss; it takes into account not only the overlapping region but also the non-overlapping regions, is scale invariant, provides a moving direction for the bounding box even when the boxes do not overlap, and converges quickly.
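The two loss terms named above can be illustrated with a minimal sketch. It is not the implementation of the invention: the regression loss is taken here to be Distance-IoU (DIoU), which is an assumption about the loss named above, and the function names `focal_loss` and `diou_loss`, the `(x1, y1, x2, y2)` box format, and the default values `alpha=0.25` and `gamma=2.0` are likewise assumptions chosen for illustration, since the abstract gives no parameters.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one predicted probability p and a label y in {0, 1}.

    alpha and gamma are illustrative defaults, not values taken from the abstract.
    """
    p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) samples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def diou_loss(box_a, box_b):
    """Distance-IoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union of the two boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centres
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2
    penalty = d2 / c2 if c2 > 0 else 0.0

    return 1.0 - iou + penalty


if __name__ == "__main__":
    # An easy positive sample contributes far less than a hard one
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
    # Non-overlapping boxes: the loss still decreases as the boxes move closer
    print(diou_loss((0, 0, 1, 1), (4, 0, 5, 1)), diou_loss((0, 0, 1, 1), (2, 0, 3, 1)))
```

In this sketch the centre-distance penalty is what gives a non-overlapping predicted box a direction to move in, because the loss keeps decreasing as the centres approach each other even while the IoU term stays at zero. Dividing by the squared diagonal of the enclosing box keeps the loss unchanged when both boxes are scaled by the same factor.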