The invention discloses a text detection method, system, device, and medium based on multi-receptive-field deep features. The method comprises the steps of: obtaining a text detection database and using it as a network training database; building a multi-receptive-field deep network model; inputting natural scene text pictures and the corresponding ground-truth textbox coordinates from the network training database into the multi-receptive-field deep network model for training; computing an image mask for segmentation with the trained model to obtain a segmentation result, and converting the segmented regions into regressed textbox coordinates; and collecting statistics on the textbox sizes in the network training database, designing a textbox filtering condition, and screening out target textboxes according to that condition. The method makes full use of the feature learning capability and classification performance of the deep network model, combines them with the characteristics of image segmentation, offers high detection accuracy, a high recall rate, and strong robustness, and achieves a good text detection effect in natural scenes.
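
The post-processing steps described above (converting segmented regions into textbox coordinates, then screening them with a size-based filtering condition) can be illustrated with a minimal sketch. The abstract does not specify the conversion or the filter, so everything here is an assumption: the mask is a binary grid, regions are 4-connected components, boxes are axis-aligned `(x_min, y_min, x_max, y_max)` tuples in inclusive pixel coordinates, and the filtering condition is simply the smallest width and height observed among the training textboxes. The function names `mask_to_boxes`, `size_limits`, and `filter_boxes` are hypothetical.

```python
from collections import deque


def mask_to_boxes(mask):
    """Return one axis-aligned box per 4-connected foreground region.

    Assumption: `mask` is a list of equal-length rows of 0/1 values,
    and a box is (x_min, y_min, x_max, y_max) with inclusive pixel indices.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one region, tracking its extent.
            seen[r][c] = True
            queue = deque([(r, c)])
            x_min = x_max = c
            y_min = y_max = r
            while queue:
                y, x = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


def size_limits(train_boxes):
    """Assumed filtering rule: smallest width and height seen in training."""
    min_w = min(x2 - x1 + 1 for x1, _, x2, _ in train_boxes)
    min_h = min(y2 - y1 + 1 for _, y1, _, y2 in train_boxes)
    return min_w, min_h


def filter_boxes(boxes, min_w, min_h):
    """Keep only boxes at least as wide and as tall as the given limits."""
    return [
        (x1, y1, x2, y2)
        for x1, y1, x2, y2 in boxes
        if (x2 - x1 + 1) >= min_w and (y2 - y1 + 1) >= min_h
    ]


if __name__ == "__main__":
    # Toy mask: one 3x2 region and one isolated pixel (likely noise).
    mask = [
        [0, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0, 1],
        [0, 0, 0, 0, 0, 0],
    ]
    train_boxes = [(0, 0, 2, 1), (0, 0, 4, 2)]  # made-up training boxes
    min_w, min_h = size_limits(train_boxes)
    print(filter_boxes(mask_to_boxes(mask), min_w, min_h))
```

A real system would more likely work on rotated or quadrilateral boxes and use percentile-based or aspect-ratio limits rather than a plain minimum; the axis-aligned version is used here only to keep the sketch dependency-free.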