
Method and system for index word segmentation

A word segmentation and indexing technology, applied in the field of information indexing, that enhances the user experience

Active Publication Date: 2007-11-14
SHENZHEN TENCENT COMP SYST CO LTD
0 Cites, 45 Cited by

AI Technical Summary

Problems solved by technology

[0007] The purpose of the present invention is to provide an index word segmentation method that simultaneously addresses accurate word segmentation, a certain amount of redundant words, and single-character segmentation, thereby enhancing the user experience.
[0008] A further object of the present invention is to provide an index word segmentation system that addresses the same problems and likewise enhances the user experience.




Embodiment approach

[0089] A preferred embodiment of the method of the present invention comprises the following steps:

[0090] S10. Read the character stream.

[0091] S20. Classify each character in the stream as a Chinese character, an English character or digit, or an unrecognizable character.

[0092] S21. Store the recognized character stream in an internal character buffer.

[0093] Before the character stream is stored in the internal character buffer, it can also be normalized so that equivalent characters share a unified form.
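Steps S10 to S21 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, "unified characters" is assumed here to mean Unicode NFKC normalization plus lowercasing, and "Chinese character" is assumed to mean the basic CJK Unified Ideographs block only.

```python
import unicodedata


def classify(ch):
    """Return 'chinese', 'alnum', or 'other' for one character (S20)."""
    # Assumption: only the basic CJK Unified Ideographs block counts as Chinese.
    if "\u4e00" <= ch <= "\u9fff":
        return "chinese"
    if ch.isascii() and ch.isalnum():
        return "alnum"
    return "other"


def read_and_buffer(text, unify=True):
    """Read a character stream (S10), classify it (S20), and buffer it (S21)."""
    if unify:
        # Assumption: unification = NFKC (folds full-width forms) + lowercase.
        text = unicodedata.normalize("NFKC", text).lower()
    # The "internal character buffer" is modeled as a list of (char, class) pairs.
    return [(ch, classify(ch)) for ch in text]


if __name__ == "__main__":
    for ch, kind in read_and_buffer("ＱＱ2007腾讯!"):
        print(ch, kind)
```

Normalizing the whole text before classification, rather than character by character, avoids surprises when one input character expands into several after normalization.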


Abstract

The invention discloses an index word segmentation method comprising the following steps: reading a character stream; identifying the character stream to distinguish Chinese characters, English characters or digits, and unrecognizable characters; comparing the identified Chinese characters and English characters or digits against a pre-built 1.1 tree to split out the set of matched words; applying generic ASCII-code fuzzy matching to the English characters or digits to determine English strings or digit strings; and ordering the matched words, the English strings or digit strings, and the unrecognized characters according to their order in the character stream. The invention also discloses an index word segmentation system. The method and system can simultaneously address accurate word segmentation, a certain amount of redundant words, and single-character segmentation, enhancing the user experience.
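The flow summarized in the abstract might look something like the sketch below. Several points are assumptions rather than details from the patent: the pre-built tree is modeled as a simple trie, "generic ASCII-code fuzzy matching" is interpreted as grouping contiguous runs of ASCII letters or digits, and all function names and the sample dictionary are hypothetical.

```python
import re


def build_trie(words):
    """Build a nested-dict trie; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def dictionary_matches(text, trie):
    """Yield (start, word) for every dictionary word found at any offset."""
    for start in range(len(text)):
        node = trie
        for end in range(start, len(text)):
            node = node.get(text[end])
            if node is None:
                break
            if None in node:
                yield start, text[start:end + 1]


def is_chinese(ch):
    # Assumption: basic CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def segment(text, words):
    """Return index terms ordered by their position in the character stream."""
    tokens = set(dictionary_matches(text, build_trie(words)))
    # Assumption: English strings and digit strings are contiguous ASCII runs.
    for m in re.finditer(r"[A-Za-z]+|[0-9]+", text):
        tokens.add((m.start(), m.group()))
    # Unrecognized characters are emitted one by one.
    for i, ch in enumerate(text):
        if not is_chinese(ch) and not (ch.isascii() and ch.isalnum()):
            tokens.add((i, ch))
    # Order by stream position; shorter terms first at the same position.
    return [tok for _, tok in sorted(tokens, key=lambda t: (t[0], len(t[1])))]


if __name__ == "__main__":
    print(segment("腾讯QQ2007!", ["腾", "腾讯"]))
    # ['腾', '腾讯', 'QQ', '2007', '!']
```

Emitting both the single-character entry and the longer word at the same offset is one way to keep some redundancy for recall while still producing dictionary-accurate terms.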

Description

Technical field

[0001] The invention relates to the field of information indexing, in particular to a method and system for index word segmentation.

Background technique

[0002] Information retrieval systems have become increasingly popular, ranging from web search engines to application-specific retrieval systems. When Chinese text must be processed, an information retrieval system faces the problem of how to segment it into words.

[0003] Many word segmentation algorithms exist. Among them, n-gram word segmentation is a mechanical method that requires no dictionary and is easy to implement. However, it produces a high degree of redundancy and cannot handle single-character words.

[0004] The binary (bigram) word segmentation method splits out every pair of adjacent characters that appears in a sentence and builds an inverted index over them. For example: the ...
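To make the bigram approach from the background concrete, here is a small sketch. The sample documents and function names are hypothetical and are not taken from the patent's own (truncated) example.

```python
from collections import defaultdict


def bigrams(text):
    """Return every pair of adjacent characters in the text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def build_inverted_index(docs):
    """Map each bigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in bigrams(text):
            index[gram].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "信息检索", 2: "检索系统"}
    index = build_inverted_index(docs)
    print(bigrams("信息检索"))   # ['信息', '息检', '检索']
    print(sorted(index["检索"]))  # [1, 2]
    print("信" in index)          # False
```

The output shows both drawbacks noted above: "息检" is indexed even though it is not a word (redundancy), and a single-character query such as "信" has no entry in the index.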


Application Information

IPC(8): G06F17/27, G06F17/30
Inventor 王启明
Owner SHENZHEN TENCENT COMP SYST CO LTD