Tibetan historical document text line segmentation method based on baseline estimation

A baseline-estimation technique for historical documents, applied in the field of image processing. It addresses inaccurate positioning and segmentation, the inability to handle curved text lines, and the limitation of estimating only an approximate line position, with the aim of high segmentation accuracy.

Active Publication Date: 2018-02-23

BEIJING UNIV OF TECH

Cites: 7. Cited by: 6.


Problems solved by technology

Traditional projection-based segmentation has two disadvantages when applied to Tibetan historical documents:

(1) It can only estimate the approximate position of each text line, so it cannot handle the curved text lines that are common in Tibetan historical documents.

(2) It cannot accurately locate and segment the touching (adhesion) parts between text lines in Tibetan historical documents.
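The projection-based approach that the text contrasts itself with can be illustrated with a minimal sketch. This is not code from the patent: the function names, the binary-image representation (lists of 0/1 rows, 1 meaning ink) and the `min_ink` threshold are all assumptions made for illustration. It shows why a row-wise projection profile separates straight lines but merges lines that drift vertically.

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def split_rows_by_projection(image, min_ink=1):
    """Return (start, end) row ranges whose projection is >= min_ink.

    `end` is exclusive. Rows below the threshold are treated as gaps.
    `min_ink` is a hypothetical parameter, not a value from the source.
    """
    profile = horizontal_projection(image)
    bands, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text band begins
        elif ink < min_ink and start is not None:
            bands.append((start, y))       # a gap closes the band
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


# Two straight lines separated by a blank row are split cleanly.
straight = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
# Two lines that drift downward leave no empty row, so the profile
# has no gap and both lines merge into a single band.
curved = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
straight_bands = split_rows_by_projection(straight)
curved_bands = split_rows_by_projection(curved)
```

In the toy `curved` image every row contains some ink, which is the situation described in disadvantage (1): the profile gives at best an approximate vertical position and no usable cut.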


Embodiment Construction

[0047] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0048] The flowchart of the method is shown in Figure 1. The method includes the following steps:

[0049] Step 1: extract the left part of the input image.

[0050] Extract the left 1/4 of the input Tibetan historical document image and name it image A. Image A is used to analyze and extract the baseline positions and the number of text lines.
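Step 1 amounts to a column crop. The following is a minimal sketch, assuming the image is stored as a list of equal-length rows; the function name `left_quarter` and the choice to round down when the width is not divisible by 4 are assumptions, since the source does not say how the boundary is rounded.

```python
def left_quarter(image):
    """Return the left quarter (by columns) of a row-major image."""
    if not image:
        return []
    width = len(image[0])
    cut = width // 4  # assumption: round down when width % 4 != 0
    return [row[:cut] for row in image]


# Toy 3 x 8 "page" whose pixel values are just the column indices.
page = [list(range(8)) for _ in range(3)]
image_a = left_quarter(page)  # plays the role of "image A" in Step 1
```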

[0051] Step 2: remove Tibetan vowel nodes and some prominent strokes.

[0052] Divide the input image into image blocks with a sliding window of size N*M, where the width N is the width of the Tibetan character D in the image and the length M is twice the width N. As shown in Figure 2, select 80 image blocks with baselines at the top as templates, and use principal component analysis (PCA) to obtain their 13-dimensional feature...
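The block-extraction part of Step 2 can be sketched as follows. Only the window shape (width N, length M = 2N) comes from the text; treating "length" as the vertical extent, the step sizes `step_x` and `step_y`, and the function name `sliding_blocks` are assumptions, because the source does not state the stride. The template selection and the PCA feature computation are not shown, since the paragraph is truncated before describing them.

```python
def sliding_blocks(image, n, step_x=None, step_y=None):
    """Yield (top, left, block) for each window of width n and height 2*n.

    step_x / step_y are hypothetical stride parameters; by default the
    windows do not overlap. Windows that would run off the image are skipped.
    """
    m = 2 * n  # from the text: length M is twice the width N
    step_x = step_x or n
    step_y = step_y or m
    height = len(image)
    width = len(image[0]) if image else 0
    for top in range(0, height - m + 1, step_y):
        for left in range(0, width - n + 1, step_x):
            block = [row[left:left + n] for row in image[top:top + m]]
            yield top, left, block


# Toy 8 x 6 binary image; with n = 2 each block is 4 rows by 2 columns.
image = [[(y * 6 + x) % 2 for x in range(6)] for y in range(8)]
blocks = list(sliding_blocks(image, n=2))
```

Passing a smaller `step_x` or `step_y` makes the windows overlap, which may be closer to what a sliding window means in the source, but that detail is not confirmed there.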


Abstract

The invention relates to a text line segmentation method for Tibetan historical documents. The method comprises the following steps: the image of the left part of a Tibetan historical document is extracted; Tibetan vowel nodes and certain prominent strokes are removed; the starting positions of the baselines of the Tibetan text lines and the number of text lines are acquired; starting from these positions, the baselines are established from left to right, and while a baseline is being established it is dynamically adjusted according to the pixel values of the surrounding points; using the estimated baselines, connected-component analysis determines the position of each touching (adhesion) region between two baselines so that it can be segmented; and finally the text lines are separated. The method is better suited to the text lines of Tibetan historical documents than traditional projection-based segmentation, and the text lines it produces are more accurate.
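The connected-component step in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: it assumes a binary image given as lists of 0/1 rows, 8-connectivity, and baselines simplified to fixed row indices (the source traces baselines that can bend). The function names are hypothetical. A component is flagged as a candidate touching region when it contains pixels on both baseline rows.

```python
from collections import deque


def connected_components(image):
    """Group 8-connected ink pixels; return one pixel list per component."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != 1 or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue, pixels = deque([(y0, x0)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def touching_components(image, upper_baseline, lower_baseline):
    """Return components with pixels on both (horizontal) baseline rows."""
    hits = []
    for pixels in connected_components(image):
        rows = {y for y, _ in pixels}
        if upper_baseline in rows and lower_baseline in rows:
            hits.append(pixels)
    return hits


page = [
    [1, 1, 0, 1, 1],  # row 0: baseline of the upper text line
    [0, 1, 0, 0, 0],  # a stroke reaching down toward the next line
    [0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],  # row 3: baseline of the lower text line
]
touching = touching_components(page, upper_baseline=0, lower_baseline=3)
```

In this toy page the left-hand strokes form one component that spans both baselines, while the two right-hand strokes stay separate. How the flagged region is then cut is not specified in the abstract and is left out here.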

Description

Technical field

[0001] The invention relates to an image processing method, in particular to a text line segmentation method for Tibetan text images.

Background

[0002] Writing is an important carrier of human development, one of the main media for transmitting information, and one of the main ways in which people record history. Tibetan is the first of China's ethnic scripts to have an international standard, and it is also one of the oldest scripts in the world. Tibetan historical documents preserve the essence of Tibetan cultural thought and are part of the precious wealth of human culture. To protect this ancient heritage and to let people look documents up by their content, converting images of ancient Tibetan books into text is an important way of protecting Tibetan historical documents.

[0003] Generally speaking, the transformation of ancient book images into computer-readable text needs to go th...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06T7/11, G06T7/194, G06T3/00

CPC: G06T7/11, G06T7/194, G06T2207/30176, G06T3/04

Inventor: 段立娟, 李颜兴

Owner: BEIJING UNIV OF TECH
