The invention discloses a layout analysis method for pages that mix
figures, tables and text, the method combining a threshold value with a
projection method. The method comprises the following steps: S1, converting a corrected
grayscale image R' into a binary (black-and-white) image according to a threshold value Tg; S2, dividing each foreground area in the
binary image into a character area and a non-character area; S3, parsing each table sub-image into a table; S4, segmenting each table into rows/columns; S5, detecting whether the page
layout of the whole image is correct; S6, sorting the columns/rows to determine the
processing order of the subsequent steps, wherein the columns are sorted before the rows; S7, for each column/row of the table, proceeding column-first and row-second,
processing only the pure columns/rows and sequentially segmenting them into characters; and S8,
processing the compound columns/rows according to their characters and sequentially segmenting them into characters. According to the method, the
layout of the page can be analyzed to determine whether the page is reversed front-to-back, top-to-bottom or left-to-right, characters that adhere to the table can be processed, and the character recognition rate can be greatly improved.
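
As an illustration only, the thresholding of step S1 and the projection-based splitting used in the later segmentation steps might be sketched as follows. This is a minimal sketch and not the claimed implementation: the function names binarize and projection_runs are hypothetical, the sketch assumes dark content on a light background (pixels below Tg are treated as foreground), and it assumes that a projection band is any run of consecutive rows or columns containing at least one foreground pixel.

```python
import numpy as np


def binarize(gray, tg):
    """S1 sketch: mark pixels darker than the threshold tg as foreground (1).

    Assumption: dark ink on a light page; the source does not state polarity.
    """
    return (np.asarray(gray) < tg).astype(np.uint8)


def projection_runs(binary, axis):
    """Split a binary image into bands using its projection profile.

    axis=1 sums along each row, giving row bands;
    axis=0 sums along each column, giving column bands.
    Returns (start, end) index pairs with end exclusive.
    """
    profile = binary.sum(axis=axis)
    occupied = profile > 0
    # Pad with False on both sides so every band has a rising and a falling edge.
    padded = np.concatenate(([False], occupied, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


# Tiny synthetic page: two dark blobs on the first band, one wide blob below.
gray = np.full((7, 9), 255, dtype=np.uint8)
gray[1:3, 1:4] = 0
gray[1:3, 5:8] = 0
gray[4:6, 1:8] = 0

binary = binarize(gray, tg=128)
row_bands = projection_runs(binary, axis=1)
first_top, first_bottom = row_bands[0]
col_bands = projection_runs(binary[first_top:first_bottom], axis=0)
print(row_bands, col_bands)
```

Applying the row projection first and then the column projection inside each band is one common ordering; the method described above sorts and processes columns before rows, so a faithful implementation would choose the axis order accordingly.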