Token stream differencing with moved-block detection

Moved-block detection for token streams, applied in the field of token stream differencing, addresses confusing results and cluttered results reports, and achieves simple change tracking, effective detection of moved blocks of text, and good moved-block detection performance.

Inactive Publication Date: 2009-01-08
ADOBE SYST INC

AI Technical Summary

Benefits of technology

The invention is about a system and method for identifying and matching tokens in a text document to represent changes in document flow. The system obtains a first and second token stream from the text document, compares them to identify common sub-sequences, and presents the matched information as a representation of the changes in document flow. The system can also identify additional matched blocks of tokens and present them for further analysis. The technical effects of the invention include improved accuracy in identifying changes in document flow and improved efficiency in analyzing text documents.

Problems solved by technology

Moreover, when such techniques actually do identify moved blocks, the displayed results can be very confusing because small additions and / or deletions within a moved block of text can create a checker-boarding effect in the generated results, where moved and unmoved words interleave each other, thus cluttering the results report.

Embodiment Construction

[0026] FIG. 1 is a flowchart showing a process of token stream differencing. A token stream is an ordered sequence of tokens. The ordered sequence can be in an electronic document. As used herein, the terms “electronic document” and “document” mean a set of electronic data, including both electronic data stored in a file and electronic data received over a network. An electronic document does not necessarily correspond to a file. A document may be stored in a portion of a file that holds other documents, in a single file dedicated to the document in question, or in a set of coordinated files.

[0027] Tokens in a token stream can represent nearly anything. For text files, tokens can be characters, words, or lines of text, and can include white space tokens or other text elements. For example, the tokens in a text file can be the words in the file arranged within the token stream in reading order. Tokens can be any discrete data elements that can be arranged in a sequence. For example, in...
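As a rough illustration of the word-level case described above, the following Python sketch turns a text string into a token stream in reading order. The function name `tokenize`, the `keep_whitespace` flag, and the regular expressions are assumptions made for this example; the patent text does not prescribe any particular tokenizer.

```python
import re

def tokenize(text, keep_whitespace=False):
    """Split text into a token stream in reading order.

    Hypothetical helper: words and punctuation marks become tokens;
    runs of white space become tokens only when keep_whitespace is True.
    """
    if keep_whitespace:
        pattern = r"\s+|\w+|[^\w\s]"
    else:
        pattern = r"\w+|[^\w\s]"
    return re.findall(pattern, text)

# Word tokens only
words = tokenize("The quick brown fox.")

# Word tokens plus white-space tokens
with_spaces = tokenize("a b", keep_whitespace=True)
```

Whether white space is kept as its own token changes what a later comparison treats as a difference, so the choice is left to the caller here.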

Abstract

Methods and apparatus implementing systems and techniques for differencing token streams and detecting moved blocks of tokens. In general, in one implementation, the technique includes: obtaining a first token stream and a second token stream, comparing the first and second token streams to identify a group of tokens that are substantially similar in the first and second token streams, the similar-tokens group including common sub-sequences, which are identical in the first and second token streams, and at least one unmatched token, and presenting matched token information corresponding to the similar-tokens group to represent changes in document flow.
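The idea in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for the comparison step, and the function names `find_moved_blocks` and `merge_nearby`, along with the `min_len` and `max_gap` parameters, are hypothetical. The first function looks for runs that were dropped at one position and reappear at another; the second merges neighboring runs separated by a small number of unmatched tokens into one group, so a lightly edited moved block is reported once instead of as interleaved fragments.

```python
from difflib import SequenceMatcher

def find_moved_blocks(original, modified, min_len=2):
    """Return (orig_start, mod_start, size) for runs that changed position.

    Assumption: an in-order comparison runs first, and whatever it leaves
    unmatched on both sides is compared again to find relocated runs.
    """
    matcher = SequenceMatcher(a=original, b=modified, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append((i1, i2))
        if tag in ("insert", "replace"):
            inserted.append((j1, j2))
    moved = []
    for i1, i2 in deleted:
        for j1, j2 in inserted:
            sub = SequenceMatcher(a=original[i1:i2], b=modified[j1:j2],
                                  autojunk=False)
            for m in sub.get_matching_blocks():
                if m.size >= min_len:  # also skips the zero-size sentinel
                    moved.append((i1 + m.a, j1 + m.b, m.size))
    return moved

def merge_nearby(blocks, max_gap=1):
    """Merge blocks whose gaps in both streams are at most max_gap tokens.

    Returns (orig_start, mod_start, orig_end, mod_end) groups; a group may
    contain a few unmatched tokens between its identical sub-sequences.
    """
    groups = []
    for a, b, size in sorted(blocks):
        if groups:
            ga, gb, a_end, b_end = groups[-1]
            if 0 <= a - a_end <= max_gap and 0 <= b - b_end <= max_gap:
                groups[-1] = (ga, gb, a + size, b + size)
                continue
        groups.append((a, b, a + size, b + size))
    return groups

original = "one two three four five A B C D E F G".split()
modified = "A B C D E F G one two NEW three four five".split()
blocks = find_moved_blocks(original, modified)
groups = merge_nearby(blocks)
```

In this example the moved text has one inserted word ("NEW"), so exact matching yields two fragments, and merging with `max_gap=1` reports them as a single group spanning the inserted token.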

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation application of and claims priority to U.S. application Ser. No. 10/272,858 filed on Oct. 16, 2002. The disclosure of the prior application is considered part of (and is incorporated by reference in) the disclosure of this application.

BACKGROUND OF THE INVENTION

[0002] The present application describes systems and techniques relating to token stream differencing, for example, comparison of text documents to identify document changes.

[0003] Various techniques exist for comparing token streams. Such comparison is commonly referred to as differencing or as a diff operation. Differencing two token streams typically involves comparing two versions of a token stream, commonly referred to as the original stream and the modified stream, and looking for differences between them. In the context of text comparison, many differencing processes use individual text characters or words as the tokens. Such diff processes ar...
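A conventional diff operation of the kind described in the background can be sketched in Python with the standard library. This is only an illustration: `difflib.SequenceMatcher` uses its own matching heuristic, which is not necessarily the technique the patent refers to, and the function name `diff_tokens` is an assumption for this example.

```python
from difflib import SequenceMatcher

def diff_tokens(original, modified):
    """Compare an original and a modified token stream.

    Returns a list of (tag, original_tokens, modified_tokens), where tag is
    one of 'equal', 'replace', 'delete', or 'insert'.
    """
    matcher = SequenceMatcher(a=original, b=modified, autojunk=False)
    return [
        (tag, original[i1:i2], modified[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]

# Word-level tokens for two versions of a sentence
changes = diff_tokens("the cat sat".split(), "the dog sat".split())
for tag, old, new in changes:
    print(tag, old, new)
```

A plain in-order diff like this reports a relocated passage as one deletion plus one unrelated insertion, which is the limitation that moved-block detection is meant to address.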

Application Information

Patent Type & Authority: Applications (United States)
IPC (8): G06F17/27, G06F17/22
CPC: G06F17/277, G06F17/2211, G06F40/194, G06F40/284
Inventor: IE, WILLIAM; ALTMAN, ADAM E.; ROWE, EDWARD R. W.
Owner: ADOBE SYST INC