65 results about "Document segmentation" patented technology

Character and vector graphics watermark for structured electronic documents security

The present invention is a method and apparatus for watermarking text or vector graphics documents. It is based on character-wise or vector-graphics-element-wise grayscale or color modulation. At high resolution, halftone or dither modulation can also be used in addition to, or in place of, grayscale / color modulation for the printed representation of an electronic document. For detection, the document is acquired through an acquisition device, document segmentation is performed, characters / elements are segmented, the watermark signal is estimated, and the information is decoded. Although the proposed scheme mostly addresses the watermarking of hard-copy documents, it can easily be integrated into electronic document editing and acquisition tools, so that the watermark is attached to the electronic version of the document. The invention can be used with either expensive high-resolution printing and acquisition devices or common, cheap low-resolution devices, depending on the application's needs. The proposed scheme is suitable, for example, for protecting security documents, contracts, and technical and commercial documentation; it can use any physical support, such as paper, cellulose, or plastic; it can be used for copy protection, authentication, or tamper proofing; and it can also be applied to non-security applications such as document tracking, document-embedded annotation, and watermark-assisted automatic processing.
Owner:UNIVERSITY OF GENEVA +1

Photo-document segmentation method and system

The present application provides an improved segmentation method and system for processing digital images that include an imaged document and surrounding image. A plurality of edge detection techniques are used to determine the edges of the imaged document and then segment the imaged document from the surrounding image.
Owner:COMPULINK MANAGEMENT CENT

Method of texture-based color document segmentation

A method for segmenting a color document into regions of text and halftone discriminates between text and halftone by examining the texture of the document. A color document is digitized, and a color space transform is preferably applied to the digitized document. The texture of the document is identified and a noise reduction step is preferably applied. Bounding boxes (blocks) within the document are identified and then the areas within the bounding boxes are classified as either text or halftone. Two alternative methods are described for examining the document texture. In the first method, a windowing operation is applied to either an intensity image or a color difference image. In the second method, a wavelet transform step combined with Fuzzy K-Mean clustering is applied. Bounding boxes are classified as either text or halftone based upon the relative periodicity of a horizontal or vertical (or both) histogram of each bounding box.
Owner:PANASONIC CORP

System and method for policy-driven file segmentation and inter-cloud file storage and retrieval

A file storage system includes one or more document input devices and a processor communicating with both a memory and the one or more document input devices. The processor executes a software application stored on the memory to separate a sensitive portion of a document from an insensitive portion of a document. A first type of cloud storage includes one or more storage devices in operable communication with the one or more document input devices. The first type of cloud storage is configured to store one or both of the separated portions with a level of encryption agreed upon by a user. A second type of cloud storage includes one or more storage devices in operable communication with the one or more document input devices. The second type of cloud storage is configured to store the insensitive portion of a document based on a consent of the user.
Owner:XEROX CORP

Method and apparatus for authenticating printed documents using multi-level image comparison based on document characteristics

A document authentication method compares a target document image (scanned image) with an original document image at multiple levels, such as block (e.g. paragraph, graphics, image), line, word and character levels. The paragraph level comparison determines whether the target and original images have the same number of paragraphs and whether the paragraphs have the same sizes and locations; the line level comparison determines if the target and original images have the same number of lines and whether the lines have the same sizes and locations; etc. Document segmentation is performed on the target and original images to segment them into paragraph units, line units, etc. for purposes of the comparisons. The original document may be segmented beforehand and the segmentation information stored for later use. The authentication process may be designed to stop when alterations are detected at a higher level, so lower level comparisons are not carried out.
Owner:KONICA MINOLTA LAB U S A INC
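
As a rough illustration of the coarse-to-fine comparison with early stopping described in this abstract, the sketch below compares pre-segmented bounding boxes level by level. The `Box` layout, the `LEVELS` order, the pixel tolerance `tol` and both function names are assumptions made for the example; the patent abstract does not specify them.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout
LEVELS = ["block", "line", "word", "character"]  # coarse to fine


def boxes_match(a: List[Box], b: List[Box], tol: int = 2) -> bool:
    """True if both lists have the same number of boxes and each pair
    agrees in position and size within `tol` pixels."""
    if len(a) != len(b):
        return False
    return all(abs(p - q) <= tol for ba, bb in zip(a, b) for p, q in zip(ba, bb))


def first_altered_level(
    original: Dict[str, List[Box]],
    target: Dict[str, List[Box]],
    tol: int = 2,
) -> Optional[str]:
    """Compare segmentation results level by level and stop at the first
    level that differs, so finer levels are never examined."""
    for level in LEVELS:
        if not boxes_match(original.get(level, []), target.get(level, []), tol):
            return level
    return None  # no alteration detected at any level


original = {
    "block": [(0, 0, 100, 50)],
    "line": [(0, 0, 100, 10), (0, 12, 100, 10)],
}
altered = {
    "block": [(0, 0, 100, 50)],
    "line": [(0, 0, 100, 10), (0, 12, 90, 10)],  # second line is shorter
}
print(first_altered_level(original, altered))
```

Storing the original document's segmentation in a dictionary like `original` mirrors the abstract's remark that the original may be segmented beforehand and reused.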

Font characteristic driven halftoning

Inactive | US7224489B2 | Reduction and elimination of halftone generated jaggedness | Improved rendering | Visual presentation using printers | Pictorial communication | Imaging processing | Computer graphics (images)
Characteristics of text or text components or features are considered when selecting halftoning screens. For example, an italic slant angle of text is recognized and used to select or generate a compatible halftone screen oriented at the same angle. A screen frequency may be selected based on a thickness of a text component. Descriptive tags associated with text or text components facilitate screen selection. Tags are assigned based on font descriptions included in a document during authoring. Alternatively, tags are assigned based on the results of document segmentation and character recognition techniques. An image processing system operative to consider characteristics of text or text components when selecting halftone screens includes a text component characteristic recognizer, a halftone screen selector and a halftoner. Optionally a print engine is also included. In a xerographic environment the print engine includes a xerographic printer.
Owner:XEROX CORP

Vision-based document segmentation

Vision-based document segmentation identifies one or more portions of semantic content of a document. The one or more portions are identified by identifying a plurality of visual blocks in the document, and detecting one or more separators between the visual blocks of the plurality of visual blocks. A content structure for the document is constructed based at least in part on the plurality of visual blocks and the one or more separators, and the content structure identifies the one or more portions of semantic content of the document. The content structure obtained using the vision-based document segmentation can optionally be used during document retrieval.
Owner:MICROSOFT CORP

Method and system for document segmentation

A method of document segmentation. Specifically, one embodiment of the present invention discloses a method of document segmentation that performs a plurality of projection profiles of pixel intensities on a document containing a plurality of text lines over a range of angles. A plurality of slope values for a plurality of discrete distances perpendicular to said range of angles is calculated for the plurality of projection profiles. A set of maximum absolute slope values is sorted out from the plurality of slope values. Text lines of first and second type are identified by setting a threshold slope value. Absolute slope values greater than the threshold slope value indicate the plurality of text lines of said first type. Absolute slope values less than the threshold slope value indicate the plurality of text lines of a second type.
Owner:HEWLETT PACKARD DEV CO LP
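
The sketch below illustrates the projection-profile idea from this abstract in a deliberately simplified form: it uses a single angle (horizontal rows) rather than a range of angles, and takes the slope to be the difference between adjacent profile values. The function names, the binary image format and the threshold value are assumptions for the example, not details from the patent.

```python
def projection_profile(image):
    """Sum pixel intensities along each row (projection at angle 0)."""
    return [sum(row) for row in image]


def slopes(profile):
    """Slope between consecutive profile values (unit row spacing)."""
    return [b - a for a, b in zip(profile, profile[1:])]


def classify_edges(image, threshold):
    """Split row transitions into two groups by absolute slope:
    above the threshold (first type) and below it (second type).
    Flat transitions with zero slope are ignored."""
    s = slopes(projection_profile(image))
    first_type = [i for i, v in enumerate(s) if abs(v) > threshold]
    second_type = [i for i, v in enumerate(s) if 0 < abs(v) < threshold]
    return first_type, second_type


# 1 = ink, 0 = background; rows 1-2 hold a dense line, row 4 a faint one.
image = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(classify_edges(image, threshold=2))
```

Each returned index `i` refers to the transition between row `i` and row `i + 1`.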

Text extraction, in particular table extraction from electronic documents

A method for extracting data contained in a fixed-format electronic document is disclosed. The method is particularly applicable to extracting data from tables in electronic documents and includes reading, by a computer system, the electronic document as a computer image file; segmenting, by the computer system, the computer image file into document sections representative of distinct portions of data; applying a label to each distinct document section; and executing, by the computer system, an optical character recognition algorithm to convert the image file into computer-readable text, wherein segments of the converted text are associated with a respective label indicative of each distinct document section.
Owner:KIRA INC +1

Morphology and integral projection-based printed Uygur document segmentation method

Active | CN106372639A | Overcoming disadvantages of flexibility limitations | Improve Segmentation Accuracy | Character and pattern recognition | Documentation | Document segmentation
The invention discloses a morphology and integral projection-based printed Uygur document segmentation method. It mainly addresses two problems of existing segmentation methods: limited flexibility when acquiring a row document image, and missed segmentation of a character defined in the specification when acquiring a single-character image. The method comprises the steps of (1) inputting a binary image; (2) acquiring the row document image; (3) acquiring a sub-word image; (4) acquiring a connected segment image; (5) acquiring a connected segment image with only the main body stroke part; (6) determining a baseline domain of the connected segment image with only the main body stroke part; and (7) acquiring the single-character image. Compared with existing printed Uygur document segmentation methods, this method sets no threshold when acquiring the row document image, so it is more flexible; it avoids the missed segmentation of the character defined in the specification; and it can improve the accuracy of printed Uygur document segmentation.
Owner:XIDIAN UNIV

Picture file encryption and decryption method and system based on content-associated secret key

Inactive | CN105743906A | Strong ability to resist brute force cracking | Increase system cost | Transmission | Gray level | Mechanism based
The invention relates to a picture file encryption and decryption method and system based on a content-associated secret key. During encryption, the original picture file is segmented into sensitive data and main body data according to the importance of the data to the file's application; the sensitive data forms a secret key, and the main body data undergoes supplemental treatment to form a ciphertext. The secret key and the ciphertext are transmitted over different channels and are combined at the destination during decryption to restore the original picture file. The segmentation of the original picture file is performed at the file layer and operates on pixels, based on a pixel gray level value or on the contour line of a monochromatic chromatic value. With the new encryption mechanism based on the content-associated secret key, a secret key with a large data volume can be used, and the encryption and decryption processes are performed largely on the client, so the system load is not increased; the mechanism can improve the data security of a cloud storage system and, in particular, enhance the privacy of user data.
Owner:WUHAN YOUXIN ZHONGWANG TECH CO LTD

Method and device for document segmentation based on text lines

The invention relates to the field of text processing. In view of the problems in the prior art, the invention provides a method and device for document segmentation based on text lines. Whether or not text line units are merged into one paragraph is judged according to a merged score of the text line units; when the score of the text line units does not satisfy a merging need, merging of a current paragraph is ended, and processing of a new paragraph begins. According to the method and device for the document segmentation based on the text lines, the problems existing in the prior art can be solved simply and effectively; pages and document data structures can be extracted by using the method, and text line information is extracted from the document data structure corresponding to each text line; each document data structure including text lines is traversed in a full text, and context information of the full text and context information of each page can be calculated in a statistical mode according to a text line information list formed by the text line information of the document data structures separately; based on n text line unit structure lists in each page, by combining other context information, segmentation is conducted on the text line units in each page according to a segmentation algorithm.
Owner:科来网络技术股份有限公司
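
A minimal sketch of the score-based merging loop described in this abstract: consecutive text lines are merged into the current paragraph while their merge score is high enough, and a new paragraph starts otherwise. The `TextLine` fields, the scoring heuristics (vertical gap and first-line indent) and the `min_score` value are all assumptions for illustration; the abstract does not say how the score is computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    text: str
    top: float     # y coordinate of the line's top edge
    height: float  # line height
    indent: float  # x coordinate of the line's left edge


def merge_score(prev: TextLine, cur: TextLine) -> float:
    """Hypothetical score: start at 1.0 and subtract penalties for a
    large vertical gap or an indented first line."""
    score = 1.0
    gap = cur.top - (prev.top + prev.height)
    if gap > 0.5 * prev.height:
        score -= 0.6
    if cur.indent - prev.indent > 1.0:
        score -= 0.6
    return score


def group_paragraphs(lines: List[TextLine], min_score: float = 0.5):
    """Merge lines into paragraphs; end the current paragraph as soon as
    the score no longer satisfies the merging requirement."""
    paragraphs: List[List[TextLine]] = []
    for line in lines:
        if paragraphs and merge_score(paragraphs[-1][-1], line) >= min_score:
            paragraphs[-1].append(line)
        else:
            paragraphs.append([line])
    return paragraphs


lines = [
    TextLine("First paragraph, line one", top=0, height=10, indent=0),
    TextLine("first paragraph, line two", top=12, height=10, indent=0),
    TextLine("Second paragraph, line one", top=40, height=10, indent=0),
    TextLine("second paragraph, line two", top=52, height=10, indent=0),
]
print([len(p) for p in group_paragraphs(lines)])
```

In the abstract, page-level and full-text context statistics also feed into the decision; this sketch only uses the two adjacent lines.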

Document segmentation for mixed raster content representation

In an array of pixels, a method for segmenting a selected pixel of the array between at least two layers including identifying an N-by-N window centered upon the selected pixel, evaluating at least one pixel in the N-by-N window to determine whether the selected pixel is a potential text element, identifying an M-by-M window centered upon the selected pixel when the evaluation determines that the selected pixel is a potential text element, wherein the M-by-M window is smaller than the N-by-N window, and determining whether the potential text element includes text by comparing at least two pixels within the M-by-M window.
Owner:LEXMARK INT INC +1
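
The two-stage window test in this abstract can be sketched roughly as follows. The abstract does not state which pixel properties are evaluated, so the contrast criterion (maximum minus minimum intensity), the default window sizes, the clipping of windows at the image border and the function names are all assumptions made for the example.

```python
def window(img, r, c, size):
    """Pixels of the size-by-size window centered on (r, c),
    clipped at the image borders."""
    half = size // 2
    rows = range(max(0, r - half), min(len(img), r + half + 1))
    cols = range(max(0, c - half), min(len(img[0]), c + half + 1))
    return [img[i][j] for i in rows for j in cols]


def is_text_pixel(img, r, c, n=5, m=3, contrast=100):
    """Stage 1: the N-by-N window decides whether the pixel is a
    potential text element. Stage 2: only then is the smaller M-by-M
    window examined by comparing its brightest and darkest pixels."""
    if m >= n:
        raise ValueError("the M-by-M window must be smaller than N-by-N")
    big = window(img, r, c, n)
    if max(big) - min(big) < contrast:
        return False  # flat neighborhood: not a potential text element
    small = window(img, r, c, m)
    return max(small) - min(small) >= contrast


# White 7x7 page (255) with a dark vertical stroke (0) in column 3.
img = [[0 if c == 3 else 255 for c in range(7)] for _ in range(7)]
print(is_text_pixel(img, 3, 3), is_text_pixel(img, 3, 1), is_text_pixel(img, 3, 0))
```

Pixel (3, 1) passes the coarse test because the stroke falls inside its 5-by-5 window, but fails the fine test because its 3-by-3 window is uniformly white.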

System and method of performing patch-based document segmentation for information extraction

A user device associated with a user may receive a document associated with the user. The user device may encrypt the received document. The user device may perform patch-based document segmentation on the received document to form a plurality of patches on the received document. The user device may extract text from each patch of the plurality of patches. The user device may analyze the extracted text from each patch to detect a field title and a field value. The user device may encrypt the extracted text and its associated field value for each patch of the plurality of patches. The user device may send the encrypted extracted text and its associated field value to the user device and instructions to display the extracted text and its associated field value on a user interface.
Owner:INTUIT INC

Document updating method and server

The invention discloses a document updating method and a server. The method comprises the following steps: generating a homomorphic calculation function and a homomorphic parameter according to to-be-updated content information, where the to-be-updated content information is obtained by comparing an original document with a to-be-updated document; determining an identifier of a to-be-updated sub-file according to the to-be-updated content information, where the to-be-updated sub-file is a file to be updated among the segmentation sub-files and the segmentation sub-files are obtained by segmenting the original document; generating update parameters according to the homomorphic calculation function, the parameter encryption result, and the identifiers and numbers corresponding to the to-be-updated sub-files, where the parameter encryption result is obtained by encrypting the homomorphic parameters with a homomorphic encryption public key; and sending the update parameters to the blockchain network, so that the storage node server updates the segmentation sub-files it stores according to the update parameters. Working efficiency is thereby improved and resource waste is avoided.
Owner:CHINA UNITED NETWORK COMM GRP CO LTD

Cloud printing large document rapid printing method

The invention provides a rapid printing method for large documents in cloud printing. The method involves a client, a document server, a mobile terminal and a printing terminal, where each terminal is connected to the server over a network. The client transmits a to-be-printed document to a conversion assembly and, on detecting a large-document message, performs document conversion and segmentation. The document server receives and stores the converted document uploaded by the client. The mobile terminal is associated with the client account information, obtains the to-be-printed document, and associates itself with the printing terminal by scanning the printing terminal's QR code in order to submit the to-be-printed document. The printing terminal receives the to-be-printed document and prints it. The conversion assembly integrated into the client comprises a control end and a processing end and is used for converting and processing the large document. When a printer receives a large-document printing command, the document can be segmented automatically and the resulting documents printed sequentially, which addresses problems such as low printing speed and long user waiting time and improves printing efficiency.
Owner:南京信安宝信息科技有限公司

File segmenting method and device for FAT file system

By modifying the cluster chain table of the original file, the method creates a new sub-file list item structure. Finally, it deletes the cluster chain table of the original file, which completes the file segmentation. The method does not need to read or write file data; it only needs to modify the cluster chain to segment a file. The invention thus saves file operation time and system storage space.
Owner:VIMICRO CORP
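
To make the cluster-chain idea concrete, here is a toy model, not the on-disk FAT format: the allocation table is a dictionary mapping each cluster number to the next one, and the directory maps file names to starting clusters. The names `EOC`, `chain` and `split_file`, and the `.part1` / `.part2` naming, are assumptions for the example. Splitting only rewrites one table entry and the directory; no cluster data is read or written.

```python
EOC = -1  # end-of-chain marker in this toy model


def chain(fat, start):
    """Follow the cluster chain from `start` and return the cluster list."""
    clusters = []
    while start != EOC:
        clusters.append(start)
        start = fat[start]
    return clusters


def split_file(fat, directory, name, clusters_in_first):
    """Split `name` after its first `clusters_in_first` clusters by
    cutting the chain, then replace its directory entry with two
    sub-file entries."""
    clusters = chain(fat, directory[name])
    if not 0 < clusters_in_first < len(clusters):
        raise ValueError("split point must fall inside the file")
    last_of_first = clusters[clusters_in_first - 1]
    start_of_second = fat[last_of_first]
    fat[last_of_first] = EOC  # terminate the first sub-file's chain
    del directory[name]       # the original entry no longer exists
    directory[name + ".part1"] = clusters[0]
    directory[name + ".part2"] = start_of_second


fat = {2: 5, 5: 3, 3: 9, 9: EOC}
directory = {"video": 2}
split_file(fat, directory, "video", clusters_in_first=2)
print(chain(fat, directory["video.part1"]), chain(fat, directory["video.part2"]))
```

Because the split point is a cluster boundary, this sketch can only cut files at multiples of the cluster size.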

Mobile scan setup and context capture prior to scanning

Example embodiments described herein are directed to utilizing image matching technology to allow people to use their mobile device to setup scan workflows (or “Scan Flows”) in advance of a future scanning operation. Using an application on a mobile device, the user takes a photo of the first page of a document that he / she will scan at some later time and associates the image of the document with a simple workflow (e.g., where to store the document once the document has been scanned). Additional workflow actions may include automatic document sharing and notification, automatic document segmentation, and automatic document cropping.
Owner:FUJIFILM BUSINESS INNOVATION CORP

Bidding document review analysis method based on regional processing

The invention discloses a bidding document review analysis method based on regional processing, which belongs to the field of data processing and comprises the following steps: S1, importing a bidding document text and processing it into a character string stream that a computer can process; S2, finding the corresponding regional identification data in the system and segmenting the bidding document into a plurality of pieces of text information according to the corresponding identifications; S3, analyzing and judging the segmented regional texts in the bidding document according to the corresponding rules in the system; and S4, sorting and summarizing the analysis results to form a bidding document analysis and review result for output. With this method, the bidding document can be segmented and decomposed into regions and analyzed region by region, a more targeted analysis and review method can be used according to the characteristics and requirements of each region, various analysis methods and tools can be conveniently applied to the review of the bidding document, and the compatibility of the analysis and review system is improved.
Owner:HONGTA TOBACCO GRP
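
Steps S2 to S4 of this abstract can be sketched as splitting the text at known region markers and applying one rule per region. The marker strings, the rule functions and the names `split_regions` and `review` are hypothetical; the abstract does not describe the actual identification data or rules.

```python
import re


def split_regions(text, markers):
    """S2: cut the document text at each known region marker and
    return a {marker: region text} mapping."""
    if not markers:
        return {}
    pattern = "|".join(re.escape(m) for m in markers)
    # A capturing group makes re.split keep the markers in the result.
    parts = re.split(f"({pattern})", text)
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def review(text, rules):
    """S3 and S4: apply each region's rule to its text and collect the
    results into one summary. A missing region is checked as empty text."""
    regions = split_regions(text, list(rules))
    return {name: rule(regions.get(name, "")) for name, rule in rules.items()}


# Hypothetical rules: each returns True when the region passes review.
rules = {
    "Qualifications:": lambda body: "licensed" in body,
    "Price:": lambda body: bool(re.search(r"\d", body)),
}
text = "Qualifications: licensed contractor. Price: to be confirmed."
print(review(text, rules))
```

Keeping the rules in a dictionary keyed by region marker is one way to get the per-region, pluggable analysis the abstract emphasizes.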

Plugin tool for collecting user generated document segmentation feedback

A method, system and a computer program product are provided for collecting document segmentation data by activating a document segmentation collection browser plugin with a designated toolbar button to generate one or more initial document segments from a webpage document and to receive user feedback for modifying a first initial document segment through a document segment control tool to generate a modified set of one or more initial document segments which are stored as document and document preprocessing data for the webpage document.
Owner:IBM CORP

Digital file recognition and deposit system

Systems and methods for recognizing and depositing digital files. Receive an unidentified file. Identify a target client and at least one account associated with the unidentified file. Segment the unidentified file into one or more document images. For each document image: scan the image and extract content, label the image based on its content, select an account of the target client, and deposit the labeled image in the selected account.
Owner:BANK OF AMERICA CORP

Document segmentation method

A document segmentation method that detects segmentation points, where the topic of an input document is discontinuous before and after the point, in order to divide the document into plural blocks. The method includes: detecting terms that occur in the input document; segmenting the input document into document segments, each an appropriately sized chunk; generating document segment vectors whose elements are values related to the frequencies of the terms occurring in the document segments; calculating eigenvalues and eigenvectors of a square sum matrix of the document segment vectors; selecting, from the eigenvectors, the basis vectors constituting a subspace used to calculate the topic continuity of the document segments; calculating vectors whose elements are the values corresponding to the projection of each document segment vector onto the basis vectors; and determining segmentation points of the document based on the continuity of the projected vectors.
Owner:HEWLETT PACKARD DEV CO LP
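
The pipeline in this abstract (segment vectors, square sum matrix, eigenvectors, projection, continuity) can be sketched in plain Python as below. Several choices are assumptions, not details from the patent: the square sum matrix is taken as the sum of outer products of the segment vectors, the leading eigenvectors are found by power iteration with deflation, the number of basis vectors `k` is fixed at 2, and continuity is measured as the cosine similarity between adjacent projected vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def top_eigenvectors(matrix, k, iterations=200):
    """Leading k eigenvectors of a symmetric matrix, found by power
    iteration followed by deflation of each eigenvector found."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    basis = []
    for _ in range(k):
        v = normalize([1.0 / (i + 1) for i in range(n)])
        for _ in range(iterations):
            v = normalize([dot(row, v) for row in work])
        eigenvalue = dot(v, [dot(row, v) for row in work])
        basis.append(v)
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return basis


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0 or nb == 0 else dot(a, b) / (na * nb)


def topic_boundaries(segment_vectors, k=2, threshold=0.5):
    """Return the indices of segments that start a new topic."""
    n = len(segment_vectors[0])
    square_sum = [
        [sum(d[i] * d[j] for d in segment_vectors) for j in range(n)]
        for i in range(n)
    ]
    basis = top_eigenvectors(square_sum, k)
    projected = [[dot(d, b) for b in basis] for d in segment_vectors]
    return [
        i + 1
        for i in range(len(projected) - 1)
        if cosine(projected[i], projected[i + 1]) < threshold
    ]


# Term-frequency vectors: segments 0-1 use terms 0-1, segments 2-3 use terms 2-3.
segments = [[3, 1, 0, 0], [1, 3, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
print(topic_boundaries(segments))
```

Power iteration is used here only to keep the example dependency-free; it assumes the leading eigenvalues are distinct, which holds for the sample data.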
