65 results about "Document segmentation" patented technology

Character and vector graphics watermark for structured electronic documents security

Active | US20060075241A1 | Reasonable data-hiding rate | Increase chance | User identity/authority verification | Digital data protection | Documentation procedure | Paper document

The present invention is a method and apparatus for watermarking text or vector graphics documents. It is based on character-wise or vector-graphics-element-wise grayscale or color modulation. At high resolution, halftone or dither modulation can also be used in addition to or in place of grayscale/color modulation for the printed representation of an electronic document. For detection, the document is acquired through an acquisition device, document segmentation is performed, characters/elements are segmented, the watermark signal is estimated, and the information is decoded. Although the proposed scheme mostly addresses the watermarking of hard-copy documents, it can easily be integrated into electronic document editing and acquisition tools, and the watermark is attached to the electronic version of the document. The invention is applicable with either expensive high-resolution printing and acquisition devices or common, inexpensive low-resolution devices, depending on the application's needs. The proposed scheme is suitable, for example, for the protection of security documents, contracts, and technical and commercial documentation; it can use any physical support such as paper, cellulose, or plastic; it can be used for copy protection, authentication, or tamper proofing; finally, it can also be applied to non-security-related applications such as document tracking, document-embedded annotation, and watermark-assisted automatic processing.

Owner:UNIVERSITY OF GENEVA +1

Photo-document segmentation method and system

Active | US20090175537A1 | Simple method | Image enhancement | Image analysis | Computer vision | Digital image

The present application provides an improved segmentation method and system for processing digital images that include an imaged document and surrounding image. A plurality of edge detection techniques are used to determine the edges of the imaged document and then segment the imaged document from the surrounding image.

Owner:COMPULINK MANAGEMENT CENT

Method of texture-based color document segmentation

Inactive | US6993185B2 | Noise generated | Character and pattern recognition | Noise reduction | Alternative methods

A method for segmenting a color document into regions of text and halftone discriminates between text and halftone by examining the texture of the document. A color document is digitized, and a color space transform is preferably applied to the digitized document. The texture of the document is identified and a noise reduction step is preferably applied. Bounding boxes (blocks) within the document are identified and then the areas within the bounding boxes are classified as either text or halftone. Two alternative methods are described for examining the document texture. In the first method, a windowing operation is applied to either an intensity image or a color difference image. In the second method, a wavelet transform step combined with Fuzzy K-Mean clustering is applied. Bounding boxes are classified as either text or halftone based upon the relative periodicity of a horizontal or vertical (or both) histogram of each bounding box.

Owner:PANASONIC CORP

System and method for policy-driven file segmentation and inter-cloud file storage and retrieval

Inactive | US20100325422A1 | Digital data information retrieval | Digital data processing details | Documentation | Cloud storage

A file storage system includes one or more document input devices and a processor communicating with both a memory and the one or more document input devices. The processor executes a software application stored on the memory to separate a sensitive portion of a document from an insensitive portion of a document. A first type of cloud storage includes one or more storage devices in operable communication with the one or more document input devices. The first type of cloud storage is configured to store one or both of the separated portions with a level of encryption agreed upon by a user. A second type of cloud storage includes one or more storage devices in operable communication with the one or more document input devices. The second type of cloud storage is configured to store the insensitive portion of a document based on a consent of the user.

Owner:XEROX CORP

Method and apparatus for authenticating printed documents using multi-level image comparison based on document characteristics

Active | US8000528B2 | Improve performance | Character and pattern recognition | Pattern recognition | Graphics

A document authentication method compares a target document image (scanned image) with an original document image at multiple levels, such as block (e.g. paragraph, graphics, image), line, word and character levels. The paragraph level comparison determines whether the target and original images have the same number of paragraphs and whether the paragraphs have the same sizes and locations; the line level comparison determines if the target and original images have the same number of lines and whether the lines have the same sizes and locations; etc. Document segmentation is performed on the target and original images to segment them into paragraph units, line units, etc. for purposes of the comparisons. The original document may be segmented beforehand and the segmentation information stored for later use. The authentication process may be designed to stop when alterations are detected at a higher level, so lower level comparisons are not carried out.

Owner:KONICA MINOLTA LAB U S A INC
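
The coarse-to-fine comparison with early stopping described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes segmentation has already produced bounding boxes for each level, and the `Box` tuple, the `LEVELS` order, the pixel tolerance `tol`, and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout unit: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]

# Levels ordered from coarse to fine, following the abstract.
LEVELS = ["block", "line", "word", "character"]


def units_match(a: List[Box], b: List[Box], tol: int = 2) -> bool:
    """True if both lists hold the same number of units and each pair
    has the same size and location within `tol` pixels (assumed tolerance)."""
    if len(a) != len(b):
        return False
    return all(
        all(abs(p - q) <= tol for p, q in zip(box_a, box_b))
        for box_a, box_b in zip(a, b)
    )


def first_altered_level(
    original: Dict[str, List[Box]], target: Dict[str, List[Box]]
) -> Optional[str]:
    """Return the coarsest level at which the documents differ, or None."""
    for level in LEVELS:
        if not units_match(original[level], target[level]):
            return level  # stop here; finer levels are not compared
    return None
```

Because the loop returns at the first mismatch, an alteration found at the block level means the line, word and character comparisons are never run.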

Character and vector graphics watermark for structured electronic documents security

Active | US7644281B2 | Reasonable data-hiding rate | Increase chance | User identity/authority verification | Digital data protection | Pattern recognition | Engineering

The present invention is a method and apparatus for watermarking text or vector graphics documents. It is based on character-wise or vector-graphics-element-wise grayscale or color modulation. At high resolution, halftone or dither modulation can also be used in addition to or in place of grayscale/color modulation for the printed representation of an electronic document. For detection, the document is acquired through an acquisition device, document segmentation is performed, characters/elements are segmented, the watermark signal is estimated, and the information is decoded. Although the proposed scheme mostly addresses the watermarking of hard-copy documents, it can easily be integrated into electronic document editing and acquisition tools, and the watermark is attached to the electronic version of the document. The invention is applicable with either expensive high-resolution printing and acquisition devices or common, inexpensive low-resolution devices, depending on the application's needs. The proposed scheme is suitable, for example, for the protection of security documents, contracts, and technical and commercial documentation; it can use any physical support such as paper, cellulose, or plastic; it can be used for copy protection, authentication, or tamper proofing; finally, it can also be applied to non-security-related applications such as document tracking, document-embedded annotation, and watermark-assisted automatic processing.

Owner:UNIVERSITY OF GENEVA +1

Document segmentation based on visual gaps

Inactive | US20060149775A1 | Digital data information retrieval | Natural language data processing | Vision based | Paper document

A document may be segmented based on a visual model of the document. The visual model is determined according to an amount of visual white space or gaps that are in the document. In one implementation, the visual model is used to identify a hierarchical structure of the document, which may then be used to segment the document.

Owner:GOOGLE LLC
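
As a rough illustration of segmenting on visual white space, the sketch below splits a page wherever a run of blank rows is at least `min_gap` rows tall. It assumes the page is available as a binarized raster (1 = ink, 0 = white), which the abstract does not specify; the function name and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple


def split_on_gaps(page: List[List[int]], min_gap: int) -> List[Tuple[int, int]]:
    """Split a binarized page into row ranges separated by at least
    `min_gap` consecutive blank rows. Returns half-open (start, end) ranges."""
    segments: List[Tuple[int, int]] = []
    start = None      # first inked row of the current segment
    last_ink = None   # most recent inked row seen
    for i, row in enumerate(page):
        if not any(row):
            continue  # blank row: part of a gap
        if start is None:
            start = i
        elif i - last_ink - 1 >= min_gap:
            # The gap since the last inked row is wide enough: close the segment.
            segments.append((start, last_ink + 1))
            start = i
        last_ink = i
    if start is not None:
        segments.append((start, last_ink + 1))
    return segments
```

Running the function with a large `min_gap` and then again with a smaller one inside each resulting segment gives a simple two-level hierarchy: wide gaps separate sections, narrow gaps separate the blocks within them.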

Font characteristic driven halftoning

Inactive | US7224489B2 | Reduction and elimination of halftone generated jaggedness | Improved rendering | Visual presentation using printers | Pictorial communication | Imaging processing | Computer graphics (images)

Characteristics of text or text components or features are considered when selecting halftoning screens. For example, an italic slant angle of text is recognized and used to select or generate a compatible halftone screen oriented at the same angle. A screen frequency may be selected based on a thickness of a text component. Descriptive tags associated with text or text components facilitate screen selection. Tags are assigned based on font descriptions included in a document during authoring. Alternatively, tags are assigned based on the results of document segmentation and character recognition techniques. An image processing system operative to consider characteristics of text or text components when selecting halftone screens includes a text component characteristic recognizer, a halftone screen selector and a halftoner. Optionally a print engine is also included. In a xerographic environment the print engine includes a xerographic printer.

Owner:XEROX CORP

Vision-based document segmentation

Inactive | CN1577328A | Digital computer details | Character and pattern recognition | Vision based | Documentation

Vision-based document segmentation identifies one or more portions of semantic content of a document. The one or more portions are identified by identifying a plurality of visual blocks in the document, and detecting one or more separators between the visual blocks of the plurality of visual blocks. A content structure for the document is constructed based at least in part on the plurality of visual blocks and the one or more separators, and the content structure identifies the one or more portions of semantic content of the document. The content structure obtained using the vision-based document segmentation can optionally be used during document retrieval.

Owner:MICROSOFT CORP

Method and system for document segmentation

Active | US6904170B2 | Character recognition | Algorithm | Document segmentation

A method of document segmentation. Specifically, one embodiment of the present invention discloses a method of document segmentation that performs a plurality of projection profiles of pixel intensities on a document containing a plurality of text lines over a range of angles. A plurality of slope values for a plurality of discrete distances perpendicular to said range of angles is calculated for the plurality of projection profiles. A set of maximum absolute slope values is sorted out from the plurality of slope values. Text lines of first and second type are identified by setting a threshold slope value. Absolute slope values greater than the threshold slope value indicate the plurality of text lines of said first type. Absolute slope values less than the threshold slope value indicate the plurality of text lines of a second type.

Owner:HEWLETT PACKARD DEV CO LP
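
The projection-profile and slope computation in the abstract above can be illustrated with a small sketch. It is not the patented method: it assumes a binarized image (1 = ink), bins each inked pixel by its rounded perpendicular distance at a given angle, and takes the slope as the difference between adjacent bins. The function names and the binning rule are assumptions, and the thresholding step that separates the two text-line types is left out.

```python
import math
from typing import Dict, List


def projection_profile(image: List[List[int]], angle_deg: float) -> List[int]:
    """Sum pixel values along lines inclined at `angle_deg`.
    Each pixel falls into the bin for its rounded perpendicular distance."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    bins: Dict[int, int] = {}
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value:
                d = round(y * cos_t - x * sin_t)
                bins[d] = bins.get(d, 0) + value
    if not bins:
        return []
    lo, hi = min(bins), max(bins)
    return [bins.get(d, 0) for d in range(lo, hi + 1)]


def max_abs_slope(profile: List[int]) -> int:
    """Largest absolute difference between adjacent profile bins."""
    return max((abs(b - a) for a, b in zip(profile, profile[1:])), default=0)
```

When the projection angle is aligned with the text lines, ink piles up in a few bins and the profile has steep edges; at other angles it is smeared out, so the maximum absolute slope drops.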

Photo-document segmentation method and system

Active | US8265393B2 | Image enhancement | Image analysis | Computer vision | Digital image

The present application provides an improved segmentation method and system for processing digital images that include an imaged document and surrounding image. A plurality of edge detection techniques are used to determine the edges of the imaged document and then segment the imaged document from the surrounding image.

Owner:COMPULINK MANAGEMENT CENT

Text extraction, in particular table extraction from electronic documents

Active | US20210056300A1 | Reduce losses | Image enhancement | Image analysis | Data pack | Electronic document

A method for extracting data contained in a fixed-format electronic document is disclosed. The method is particularly applicable to extracting data from tables in electronic documents and includes reading, by a computer system, the electronic document as a computer image file; segmenting, by the computer system, the computer image file into document sections representative of distinct portions of data; applying a label to each distinct document section; and executing, by the computer system, an optical character recognition algorithm to convert the image file into computer-readable text, wherein segments of the converted text are associated with a respective label indicative of each distinct document section.

Owner:KIRA INC +1

Morphology and integral projection-based printed Uygur document segmentation method

Active | CN106372639A | Overcoming disadvantages of flexibility limitations | Improve Segmentation Accuracy | Character and pattern recognition | Documentation | Document segmentation

The invention discloses a morphology and integral projection-based printed Uygur document segmentation method, and mainly solves the problem of flexibility limitation during acquisition of a row document image and the problem of missing segmentation of a character define in the specification during acquisition of a single-character image in an existing segmentation method. The method comprises the steps of (1) inputting a binary image; (2) acquiring the row document image; (3) acquiring a sub-word image; (4) acquiring a connected segment image; (5) acquiring a connected segment image only with a main body stroke part; (6) determining a baseline domain of the connected segment image only with the main body stroke part; and (7) acquiring the single-character image. Compared with an existing printed Uygur document segmentation method, the morphology and integral projection-based printed Uygur document segmentation method has the advantages that a threshold is not set during the acquisition of the row document image, so that the flexibility is better, the problem of the missing segmentation of the character define in the specification is avoided, and the accuracy of printed Uygur document segmentation can be improved.

Owner:XIDIAN UNIV

Picture file encryption and decryption method and system based on content-associated secret key

Inactive | CN105743906A | Strong ability to resist brute force cracking | Increase system cost | Transmission | Gray level | Mechanism based

The invention relates to a picture file encryption and decryption method and system based on a content-associated secret key. During encryption, the original picture file is segmented into sensitive data and main body data according to the importance of the data to the file's application; the sensitive data forms a secret key, and the main body data is subjected to supplemental treatment to form a ciphertext. The secret key and the ciphertext are transmitted separately via different channels and are combined at the destination during decryption to restore the original picture file. The segmentation of the original picture file is performed at the file layer, and is performed on pixels based on a pixel gray level value or the contour line of a monochromatic chromatic value. With the new encryption mechanism based on the content-associated secret key provided by the invention, a secret key with a large data volume can be used, and the encryption and decryption processes are basically performed on the client, so the system load is not increased; the mechanism can improve the data security of a cloud storage system, in particular enhancing the privacy of user data.

Owner:WUHAN YOUXIN ZHONGWANG TECH CO LTD

Method and device for document segmentation based on text lines

Active | CN107391457A | Improve performance | Meet actual needs | Text processing | Special data processing applications | Documentation procedure | Unit structure

The invention relates to the field of text processing. In view of the problems in the prior art, the invention provides a method and device for document segmentation based on text lines. Whether or not text line units are merged into one paragraph is judged according to a merged score of the text line units; when the score of the text line units does not satisfy a merging need, merging of a current paragraph is ended, and processing of a new paragraph begins. According to the method and device for the document segmentation based on the text lines, the problems existing in the prior art can be solved simply and effectively; pages and document data structures can be extracted by using the method, and text line information is extracted from the document data structure corresponding to each text line; each document data structure including text lines is traversed in a full text, and context information of the full text and context information of each page can be calculated in a statistical mode according to a text line information list formed by the text line information of the document data structures separately; based on n text line unit structure lists in each page, by combining other context information, segmentation is conducted on the text line units in each page according to a segmentation algorithm.

Owner:科来网络技术股份有限公司
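
The merge-or-start-new-paragraph decision described in the abstract above can be sketched as follows. The abstract does not say how the merge score is computed, so the `merge_score` heuristic here (vertical gap plus indentation), the `TextLine` fields, and the `threshold` value are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    text: str
    top: float     # y coordinate of the top of the line
    height: float  # line height
    left: float    # x coordinate of the first character


def merge_score(prev: TextLine, cur: TextLine) -> float:
    """Hypothetical score in [0, 1]; high when `cur` continues `prev`'s paragraph."""
    gap = cur.top - (prev.top + prev.height)
    gap_ok = gap <= 0.5 * prev.height            # small vertical gap
    indent_ok = cur.left - prev.left < prev.height  # not freshly indented
    return 0.5 * float(gap_ok) + 0.5 * float(indent_ok)


def group_paragraphs(lines: List[TextLine], threshold: float = 0.75) -> List[List[TextLine]]:
    """Merge consecutive lines while the score meets the threshold;
    otherwise end the current paragraph and start a new one."""
    paragraphs: List[List[TextLine]] = []
    for line in lines:
        if paragraphs and merge_score(paragraphs[-1][-1], line) >= threshold:
            paragraphs[-1].append(line)
        else:
            paragraphs.append([line])
    return paragraphs
```

The abstract also mentions page-level and full-text context statistics; in a fuller version those would feed into the score instead of the fixed constants used here.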

Document segmentation for mixed raster content representation

Inactive | US20070150803A1 | Digital computer details | Character and pattern recognition | Algorithm | Mixed raster content

In an array of pixels, a method for segmenting a selected pixel of the array between at least two layers including identifying an N-by-N window centered upon the selected pixel, evaluating at least one pixel in the N-by-N window to determine whether the selected pixel is a potential text element, identifying an M-by-M window centered upon the selected pixel when the evaluation determines that the selected pixel is a potential text element, wherein the M-by-M window is smaller than the N-by-N window, and determining whether the potential text element includes text by comparing at least two pixels within the M-by-M window.

Owner:LEXMARK INT INC +1
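
The two-window test in the abstract above can be sketched roughly as below. The abstract does not state which pixel tests are applied, so the rules here are assumptions: a pixel is a potential text element if the large N-by-N window contains any dark pixel, and it is confirmed as text if the smaller M-by-M window shows enough contrast. The thresholds `dark` and `contrast`, the layer names, and the function names are hypothetical.

```python
from typing import List


def window(img: List[List[int]], r: int, c: int, size: int) -> List[int]:
    """Pixel values in the size-by-size window centered on (r, c), clipped at the edges."""
    half = size // 2
    rows = range(max(0, r - half), min(len(img), r + half + 1))
    cols = range(max(0, c - half), min(len(img[0]), c + half + 1))
    return [img[i][j] for i in rows for j in cols]


def classify_pixel(img: List[List[int]], r: int, c: int,
                   n: int = 5, m: int = 3,
                   dark: int = 64, contrast: int = 128) -> str:
    """Assign the pixel at (r, c) to the 'foreground' (text) or 'background' layer."""
    # Stage 1 (N-by-N): is there any dark pixel nearby? If not, stop early.
    if min(window(img, r, c, n)) > dark:
        return "background"
    # Stage 2 (M-by-M, M < N): compare pixels in the smaller window.
    small = window(img, r, c, m)
    return "foreground" if max(small) - min(small) >= contrast else "background"
```

The point of the two stages is cost: the cheap large-window check rejects most pixels, and the finer comparison runs only on candidates.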

System and method of performing patch-based document segmentation for information extraction

Active | US20210064865A1 | Image enhancement | Image analysis | User device | Human–computer interaction

A user device associated with a user may receive a document associated with the user. The user device may encrypt the received document. The user device may perform patch-based document segmentation on the received document to form a plurality of patches on the received document. The user device may extract text from each patch of the plurality of patches. The user device may analyze the extracted text from each patch to detect a field title and a field value. The user device may encrypt the extracted text and its associated field value for each patch of the plurality of patches. The user device may send the encrypted extracted text and its associated field value to the user device and instructions to display the extracted text and its associated field value on a user interface.

Owner:INTUIT INC

Hierarchical information extraction using document segmentation and optical character recognition correction

Active | US9715625B2 | Natural language data processing | Character recognition | Documentation | Document segmentation

Systems, methods, and media for extracting and processing entity data included in an electronic document are provided herein. Methods may include executing one or more extractors to extract entity data within an electronic document based upon an extraction model for the document, selecting extracted entity data via one or more experts, each of the experts applying at least one business rule to organize at least a portion of the selected entity data into a desired format, and providing the organized entity data for use by an end user.

Owner:OPEN TEXT HLDG INC

Document updating method and server

Active | CN111221569A | Avoid wasting | Avoid uploading | Program documentation | Digital data protection | Engineering | Data mining

The invention discloses a document updating method and a server. The method comprises: generating a homomorphic calculation function and a homomorphic parameter according to to-be-updated content information, the to-be-updated content information being obtained by comparing an original document with a to-be-updated document; determining an identifier of a to-be-updated sub-file according to the to-be-updated content information, the to-be-updated sub-file being a file to be updated among the segmentation sub-files, and the segmentation sub-files being files obtained by segmenting the original document; generating update parameters according to the homomorphic calculation function, the parameter encryption result, and the identifiers and numbers corresponding to the to-be-updated sub-files, the parameter encryption result being obtained by homomorphically encrypting the homomorphic parameters with a homomorphic encryption public key; and sending the update parameters to the blockchain network, so that the storage node server updates the segmentation sub-files it stores according to the update parameters. Working efficiency is improved and resource waste is avoided.

Owner:CHINA UNITED NETWORK COMM GRP CO LTD

Cloud printing large document rapid printing method

Inactive | CN110134343A | Reduce waiting time | Solve the speed problem | Digital output to print units | Network connection | Client-side

The invention provides a rapid printing method for large documents in cloud printing. The system comprises a client, a document server, a mobile terminal and a printing terminal, wherein each terminal is connected to the server over a network; the client transmits a to-be-printed document to a conversion assembly, and performs document conversion and segmentation processing after detecting a large-document message; the document server receives and stores the converted document uploaded by the client; the mobile terminal is associated with the client account information, obtains the to-be-printed document, and is associated with the printing terminal by scanning a QR code of the printing terminal so as to submit the to-be-printed document; and the printing terminal receives the to-be-printed document and prints it. The conversion assembly integrated in the client comprises a control end and a processing end and is used for converting and processing the large document. When a printer receives a large-document printing command, the document can be segmented automatically and the segments printed sequentially, thus solving problems such as low printing speed and long user waiting time, and improving printing efficiency.

Owner:南京信安宝信息科技有限公司

Method and apparatus for authenticating printed documents using multi-level image comparison based on document characteristics

Active | CN102117414A | Character and pattern recognition | Pattern recognition | Graphics

The present invention relates to a method and apparatus for authenticating printed documents using multi-level image comparison based on document characteristics. A document authentication method compares a target document image (scanned image) with an original document image at multiple levels, such as block (e.g. paragraph, graphics, image), line, word and character levels. The paragraph level comparison determines whether the target and original images have the same number of paragraphs and whether the paragraphs have the same sizes and locations; the line level comparison determines if the target and original images have the same number of lines and whether the lines have the same sizes and locations; etc. Document segmentation is performed on the target and original images to segment them into paragraph units, line units, etc. for purposes of the comparisons. The original document may be segmented beforehand and the segmentation information stored for later use. The authentication process may be designed to stop when alterations are detected at a higher level, so lower level comparisons are not carried out.

Owner:KONICA MINOLTA LAB U S A INC

File segmenting method and device for FAT file system

Inactive | CN1776687A | Save operating time | Save storage space | Special data processing applications | File system | Linked list

By modifying the cluster chain table of the original file, the method creates a new sub-file list item structure. Finally, the cluster chain table of the original file is deleted, completing the file segmentation operation. The method does not need to read or write file data; it only modifies the cluster chain to segment the file. The invention thus saves file operation time and system storage space.

Owner:VIMICRO CORP
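
The idea of splitting a file by editing only its cluster chain can be sketched with a toy model of a file allocation table. This is an illustration under simplifying assumptions, not the patented device: the table is a plain dictionary mapping each cluster to the next one, `EOC` is a stand-in end-of-chain marker (real FAT variants use reserved values), and `chain` and `split_file` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

EOC = -1  # stand-in end-of-chain marker; an assumption for this toy model


def chain(fat: Dict[int, int], first: int) -> List[int]:
    """Follow the cluster chain starting at `first` and list its clusters."""
    clusters = []
    cluster = first
    while cluster != EOC:
        clusters.append(cluster)
        cluster = fat[cluster]
    return clusters


def split_file(fat: Dict[int, int], first: int, keep: int) -> Tuple[int, int]:
    """Cut the chain after `keep` clusters without touching file data.
    Returns the first cluster of each of the two sub-files."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    last = first
    for _ in range(keep - 1):
        last = fat[last]
        if last == EOC:
            raise ValueError("file has fewer clusters than keep")
    second = fat[last]
    if second == EOC:
        raise ValueError("nothing left for the second sub-file")
    fat[last] = EOC  # terminate the first sub-file here
    return first, second
```

Only one table entry changes, so the cost is independent of file size. A real implementation would also have to write a directory entry (start cluster and size) for each sub-file, which this sketch omits.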

Mobile scan setup and context capture prior to scanning

Active | US8941847B2 | Digital computer details | Digital output to print units | Documentation procedure | Application software

Example embodiments described herein are directed to utilizing image matching technology to allow people to use their mobile device to set up scan workflows (or “Scan Flows”) in advance of a future scanning operation. Using an application on a mobile device, the user takes a photo of the first page of a document that he or she will scan at some later time and associates the image of the document with a simple workflow (e.g., where to store the document once it has been scanned). Additional workflow actions may include automatic document sharing and notification, automatic document segmentation, and automatic document cropping.

Owner:FUJIFILM BUSINESS INNOVATION CORP

Bidding document review analysis method based on regional processing

Pending | CN112613285A | Improve compatibility | Natural language data processing | Document analysis | Data mining

The invention discloses a bidding document review analysis method based on regional processing, which belongs to the field of data processing and comprises the following steps: S1, importing a bidding document text and processing it into a character string stream that can be processed by a computer; S2, finding the corresponding regional identification data in the system and segmenting the bidding document into a plurality of pieces of text information according to the corresponding identifications; S3, analyzing and judging the segmented regional texts in the bidding document according to corresponding rules in the system; and S4, sorting and summarizing the analysis results to form a bidding document analysis and review result for output. With this method, the bidding document can be segmented and decomposed into regions and analyzed region by region; a more targeted analysis and review method can be used according to the characteristics and requirements of each region; and various analysis methods and tools can conveniently be applied to bidding document review, improving the compatibility of the analysis and review system.

Owner:HONGTA TOBACCO GRP

Document segmentation for mixed raster content representation

Inactive | US7729546B2 | Character and pattern recognition | Special data processing applications | Pattern recognition | Computer graphics (images)

In an array of pixels, a method for segmenting a selected pixel of the array between at least two layers including identifying an N-by-N window centered upon the selected pixel, evaluating at least one pixel in the N-by-N window to determine whether the selected pixel is a potential text element, identifying an M-by-M window centered upon the selected pixel when the evaluation determines that the selected pixel is a potential text element, wherein the M-by-M window is smaller than the N-by-N window, and determining whether the potential text element includes text by comparing at least two pixels within the M-by-M window.
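
The two-window idea can be sketched as follows. The abstract does not state which pixel tests are used in either window, so the contrast checks below, the default sizes `n=5` and `m=3`, the `contrast` threshold, and the names `window` and `is_text_pixel` are assumptions made for illustration, not the patented tests.

```python
from typing import List

Image = List[List[int]]  # grayscale values, 0 (black) to 255 (white)


def window(img: Image, r: int, c: int, size: int) -> List[int]:
    """Pixels of the size-by-size window centered on (r, c), clipped at borders."""
    half = size // 2
    rows = range(max(0, r - half), min(len(img), r + half + 1))
    cols = range(max(0, c - half), min(len(img[0]), c + half + 1))
    return [img[i][j] for i in rows for j in cols]


def is_text_pixel(img: Image, r: int, c: int,
                  n: int = 5, m: int = 3, contrast: int = 100) -> bool:
    # Stage 1 (assumed test): the large N-by-N window must show strong
    # contrast for the pixel to count as a potential text element.
    big = window(img, r, c, n)
    if max(big) - min(big) < contrast:
        return False
    # Stage 2 (assumed test): within the smaller M-by-M window, compare the
    # darkest and brightest pixels and keep the selected pixel only if it
    # sits closer to the dark side.
    small = window(img, r, c, m)
    lo, hi = min(small), max(small)
    if hi - lo < contrast:
        return False
    return img[r][c] - lo < hi - img[r][c]


# A white 5x5 page with a dark vertical stroke in column 2.
stroke = [[0 if col == 2 else 255 for col in range(5)] for _ in range(5)]
flat = [[200] * 5 for _ in range(5)]
```

Running the cheap large-window test first means the finer comparison is only performed for pixels that already look like candidates, which is the ordering the abstract describes.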

Owner:LEXMARK INT INC +1

Document segmentation based on visual gaps

Inactive | US7421651B2 | Digital data information retrieval | Natural language data processing | Vision based | Document preparation

A document may be segmented based on a visual model of the document. The visual model is determined according to an amount of visual white space or gaps that are in the document. In one implementation, the visual model is used to identify a hierarchical structure of the document, which may then be used to segment the document.
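
A rough sketch of gap-based hierarchical segmentation, under a strong simplifying assumption: the document is plain text and "visual white space" is measured as the number of consecutive blank lines. The abstract does not describe the visual model in this detail, and the names `split_on_gaps` and `segment` as well as the gap thresholds are hypothetical.

```python
from typing import List


def split_on_gaps(lines: List[str], min_gap: int) -> List[List[str]]:
    """Split lines wherever at least min_gap consecutive blank lines occur."""
    blocks: List[List[str]] = []
    current: List[str] = []
    blanks = 0
    for line in lines:
        if not line.strip():
            blanks += 1
            continue
        if current and blanks >= min_gap:
            blocks.append(current)
            current = []
        elif current and blanks:
            # Keep smaller gaps inside the block for the next, finer pass.
            current.extend([""] * blanks)
        current.append(line)
        blanks = 0
    if current:
        blocks.append(current)
    return blocks


def segment(lines: List[str]) -> List[List[List[str]]]:
    """Two-level hierarchy: sections (gap >= 2), then paragraphs (gap >= 1)."""
    return [split_on_gaps(section, 1) for section in split_on_gaps(lines, 2)]


text = "Title\n\n\nPara one line a\nPara one line b\n\nPara two\n\n\nFooter"
tree = segment(text.splitlines())
```

Larger gaps are treated as boundaries at a higher level of the hierarchy, so the nesting of the returned lists reflects the relative size of the white space between blocks.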

Owner:GOOGLE LLC

Plugin tool for collecting user generated document segmentation feedback

Inactive | US9940320B2 | Improve accuracy | Natural language data processing | Knowledge representation | Documentation | Web page

A method, system and a computer program product are provided for collecting document segmentation data by activating a document segmentation collection browser plugin with a designated toolbar button to generate one or more initial document segments from a webpage document and to receive user feedback for modifying a first initial document segment through a document segment control tool to generate a modified set of one or more initial document segments which are stored as document and document preprocessing data for the webpage document.

Owner:IBM CORP

Digital file recognition and deposit system

Active | US20200183880A1 | File system administration | Data switching networks | Document Identifier | Computer graphics (images)

Systems and methods for recognizing and depositing digital files. Receive an unidentified file. Identify a target client and at least one account associated with the unidentified file. Segment the unidentified file into one or more document images. For each document image: scan the image and extract content, label the image based on its content, select an account of the target client, and deposit the labeled image in the selected account.

Owner:BANK OF AMERICA CORP

Document segmentation method

Inactive | US7308138B2 | Character and pattern recognition | Natural language data processing | Feature vector | Eigenvalues and eigenvectors

A document segmentation method that detects segmentation points, where the topic of an input document is discontinuous before and after the point, in order to divide the document into plural blocks. The method includes: detecting terms that occur in the input document; segmenting the input document into document segments, each segment being an appropriately sized chunk; generating document segment vectors whose elements are values related to the frequencies of the terms occurring in the document segments; calculating eigenvalues and eigenvectors of a square sum matrix of the document segment vectors; selecting, from the eigenvectors, the basis vectors that constitute a subspace used to calculate the topic continuity of the document segments; calculating vectors whose elements are the values corresponding to the projections of each document segment vector onto the basis vectors; and determining segmentation points of the document based on the continuity of the projected vectors.
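
The pipeline above can be sketched in plain Python. Several details are assumptions, not taken from the abstract: the square sum matrix is read as the sum of outer products of the segment vectors, the eigenvectors are found by power iteration with deflation, continuity is measured as cosine similarity between adjacent projected vectors, and the values `k=2` and `threshold=0.5` are arbitrary. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

Vector = List[float]


def term_vectors(segments: List[str]) -> List[Vector]:
    """Term-frequency vector for each segment over a shared vocabulary."""
    vocab = sorted({w for s in segments for w in s.lower().split()})
    vectors = []
    for s in segments:
        counts = Counter(s.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def square_sum_matrix(vectors: List[Vector]) -> List[Vector]:
    """Sum of outer products d * d^T over all segment vectors (assumed reading)."""
    n = len(vectors[0])
    return [[sum(d[i] * d[j] for d in vectors) for j in range(n)] for i in range(n)]


def mat_vec(a: List[Vector], v: Vector) -> Vector:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def top_eigenvectors(matrix: List[Vector], k: int, iters: int = 200) -> List[Vector]:
    """Leading k eigenvectors of a symmetric matrix: power iteration plus deflation."""
    a = [row[:] for row in matrix]
    n = len(a)
    basis = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = mat_vec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))  # Rayleigh quotient
        basis.append(v)
        # Deflate: remove the component just found before the next round.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return basis


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def segmentation_points(segments: List[str], k: int = 2,
                        threshold: float = 0.5) -> List[int]:
    """Indices i where the topic breaks between segment i-1 and segment i."""
    vectors = term_vectors(segments)
    basis = top_eigenvectors(square_sum_matrix(vectors), k)
    projected = [[sum(x * y for x, y in zip(d, b)) for b in basis] for d in vectors]
    return [i for i in range(1, len(projected))
            if cosine(projected[i - 1], projected[i]) < threshold]


segments = [
    "cat dog cat pet",
    "dog pet cat",
    "pet cat dog dog",
    "stock market price",
    "market price stock bank",
]
boundaries = segmentation_points(segments)
```

In this toy input the two topics share no vocabulary, so each topic dominates one basis vector and the projected vectors of adjacent segments become nearly orthogonal exactly at the topic change.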

Owner:HEWLETT PACKARD DEV CO LP

Plugin Tool for Collecting User Generated Document Segmentation Feedback

Inactive | US20170154031A1 | Improve accuracy | Natural language data processing | Knowledge representation | Documentation | Web page

A method, system and a computer program product are provided for collecting document segmentation data by activating a document segmentation collection browser plugin with a designated toolbar button to generate one or more initial document segments from a webpage document and to receive user feedback for modifying a first initial document segment through a document segment control tool to generate a modified set of one or more initial document segments which are stored as document and document preprocessing data for the webpage document.

Owner:IBM CORP
