Method for obtaining restricted-word information, method for optimizing output, and input method system

A technology relating to restricted-word information and feature information, applied in the field of computer character input data processing. It addresses troublesome user input, an increased number of candidates the user must choose from, and reduced input efficiency, and it optimizes the character output process and improves the intelligence of the input method.

Active Publication Date: 2007-10-17
BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
Cites: 0 | Cited by: 23

AI Technical Summary

Problems solved by technology

Including a word such as "quantity general" in the input method lexicon can increase the intelligence of the input method (achieving better intelligent word formation). However, because "quantity general" rarely appears as a standalone word, offering it on its own can make input more troublesome for the user, increase the number of candidates the user must choose from, and reduce input efficiency.

Examples

Embodiment 1

[0054] Referring to FIG. 1, embodiment 1 of a method for obtaining restricted-word information is shown. It may specifically include the following steps:

[0055] Step 101, obtaining a target word;

[0056] The target words can be obtained from the Internet, that is, derived directly from an Internet corpus (for example, a collection of web pages or of search keywords) through statistics and screening; they can also be taken from an existing thesaurus. The source is not limited, as long as a set of target words can be obtained, and those skilled in the art can set the scope of that set according to actual needs.

[0057] Preferably, an optimization step may also be applied to the obtained set of target words: some attributes of the target words are used to remove part of the vocabulary, further narrowing the set. For example, words whose Internet word frequency or word frequency in the thesaurus is less than or equal to a preset threshold...
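Step 101 and the optional narrowing step can be sketched as below. This is a minimal illustration, not the patent's implementation: it assumes the corpus has already been segmented into words, uses raw counts as the frequency attribute, and, because the example in [0057] is cut off, assumes that words at or below the threshold are the ones removed. The function names `obtain_target_words` and `narrow_by_frequency` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def obtain_target_words(corpus_words: Iterable[str], thesaurus: Iterable[str]) -> Counter:
    """Step 101 (sketch): collect target words together with their corpus counts.

    corpus_words: an already-segmented word stream, e.g. from web pages or
    search keywords (assumed input format).
    thesaurus: words from an existing lexicon; they are included even if the
    corpus never mentions them (with a count of 0).
    """
    freq = Counter(corpus_words)
    for word in thesaurus:
        freq.setdefault(word, 0)  # add the word without changing existing counts
    return freq


def narrow_by_frequency(freq: Counter, threshold: int) -> Set[str]:
    """Optional narrowing step (sketch).

    Assumption: words whose frequency is <= threshold are the ones removed;
    the example in the source text is truncated.
    """
    return {word for word, count in freq.items() if count > threshold}


freq = obtain_target_words(["输入法", "输入法", "搜狗", "拼音"], ["拼音", "词库"])
print(sorted(narrow_by_frequency(freq, 1)))  # ['输入法']
```

Any other attribute (word length, source, part of speech) could replace the count in `narrow_by_frequency`; the source leaves the choice of attribute open.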

Example 1

[0064] The feature information is: the feature value (for example, the frequency) with which the first character of the target word appears at the beginning of words in the preset corpus, and the feature value with which the last character of the target word appears at the end of words in the preset corpus;

[0065] The preset condition for judging is: whether at least one of the above feature values falls within a preset range.

[0066] For example, the character "quantity" in "quantity will" seldom appears at the beginning of a word. If its word-initial frequency is less than or equal to the preset threshold, "quantity will" can be determined to be a restricted word.

[0067] Of course, if the target word consists of three or more characters, a feature value may also be determined for the character at any given position of the target word, based on how often that character occupies the same position within words in the preset corpus.
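The positional features of Example 1 can be sketched as follows. The sketch assumes a pre-segmented corpus given as a list of words and takes as the feature value the share of a character's occurrences that fall at a given position (a relative frequency; the source only says "frequency"). A target word is flagged when at least one feature value is at or below the threshold, i.e. the preset range is assumed to be [0, threshold]. All names and the toy corpus are hypothetical, and "量会" is only an illustrative stand-in for the translated "quantity will".

```python
from typing import Sequence


def position_ratio(corpus_words: Sequence[str], char: str, pos: int) -> float:
    """Share of `char`'s occurrences in the corpus that sit at index `pos` of a word.

    pos=0 is word-initial and pos=-1 word-final; other indices follow Python
    indexing, which covers the "same position" case for longer target words.
    Returns 0.0 if the character never occurs. Using a relative frequency is
    an assumption; a raw count would also fit the description.
    """
    total = 0
    at_pos = 0
    for word in corpus_words:
        total += word.count(char)
        if -len(word) <= pos < len(word) and word[pos] == char:
            at_pos += 1
    return at_pos / total if total else 0.0


def is_restricted_by_position(target: str, corpus_words: Sequence[str], threshold: float) -> bool:
    """Example 1 (sketch): flag the target if its first character rarely starts a
    word or its last character rarely ends one in the corpus."""
    features = [
        position_ratio(corpus_words, target[0], 0),
        position_ratio(corpus_words, target[-1], -1),
    ]
    # Preset condition: at least one feature value lies in the range [0, threshold].
    return any(value <= threshold for value in features)


# Toy segmented corpus (hypothetical).
corpus = ["数量", "质量", "重量", "会议", "开会", "学会"]
print(is_restricted_by_position("量会", corpus, 0.1))  # True: "量" never starts a word here
print(is_restricted_by_position("会议", corpus, 0.1))  # False
```

Because `position_ratio` accepts any index, the same function covers the three-or-more-character case in [0067]: pass the index of the character within the target word.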

Example 2

[0069] The feature information is: the feature values, in the preset corpus, of the linguistic collocation relationships of the single characters and/or multi-character words contained in the target word;

[0070] The preset condition for judging is: whether at least one of the above feature values falls within a preset range.

[0071] The linguistic collocation relationship may include several kinds of collocation, such as collocation parameters between words, between words and parts of speech, and between parts of speech. Those skilled in the art may select or combine these relationships according to actual needs.

[0072] For example, in the word "is to play", "is" ("yes") is followed by a verb. Such a collocation is rare in linguistics, so its collocation feature value is found to be less than or equal to the preset threshold, and it can then be determined that "yes" Pl...
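Example 2 can be illustrated with a word-to-part-of-speech collocation, one of the relationships listed in [0071]. In this sketch the feature value for an adjacent pair inside the target word is the conditional frequency P(next POS | left word), estimated from a POS-tagged corpus. The tag names, the toy corpus, the way the target word is split into tagged parts, and all function names are assumptions for illustration; the toy data only mimics the "is" + verb case.

```python
from collections import Counter
from typing import Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); tag names are hypothetical


def build_word_pos_table(sentences: Sequence[Sequence[Token]]) -> Tuple[Counter, Counter]:
    """Count (word, tag of the following token) pairs and how often each word has a successor."""
    pair_counts: Counter = Counter()
    left_counts: Counter = Counter()
    for sentence in sentences:
        for (word, _), (_, next_tag) in zip(sentence, sentence[1:]):
            pair_counts[(word, next_tag)] += 1
            left_counts[word] += 1
    return pair_counts, left_counts


def collocation_value(pair_counts: Counter, left_counts: Counter, word: str, next_tag: str) -> float:
    """Word-POS collocation feature: estimated P(next tag | word); 0.0 for unseen words."""
    total = left_counts[word]
    return pair_counts[(word, next_tag)] / total if total else 0.0


def is_restricted_by_collocation(
    parts: Sequence[Token], pair_counts: Counter, left_counts: Counter, threshold: float
) -> bool:
    """Example 2 (sketch): flag the target word if any adjacent pair of its parts
    has a collocation value in the range [0, threshold]."""
    values = [
        collocation_value(pair_counts, left_counts, left_word, right_tag)
        for (left_word, _), (_, right_tag) in zip(parts, parts[1:])
    ]
    return any(value <= threshold for value in values)


# Toy tagged corpus: "是" (is) is followed by nouns or pronouns, never by a verb.
sentences = [
    [("是", "v"), ("学生", "n")],
    [("是", "v"), ("老师", "n")],
    [("是", "v"), ("我", "r")],
    [("去", "v"), ("玩", "v")],
]
pairs, lefts = build_word_pos_table(sentences)
print(is_restricted_by_collocation([("是", "v"), ("玩", "v")], pairs, lefts, 0.05))  # True
print(is_restricted_by_collocation([("去", "v"), ("玩", "v")], pairs, lefts, 0.05))  # False
```

Word-to-word or POS-to-POS collocations from [0071] would follow the same pattern with a different pair key.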

Abstract

The invention discloses a method for obtaining restricted-word information, comprising the steps of: obtaining a target word; obtaining the feature information corresponding to the target word; judging whether the feature information, or a numerical result derived from it, meets a preset condition; and, if it does, determining the target word to be a restricted word and recording the related restriction information, which is used to restrict the word's ordering when it is output on its own. In the embodiments of the invention, an input method lexicon containing restricted-word information is preset. When the user types, the system judges whether an output candidate meets the preset condition for applying the restriction information and, based on the result, decides whether a candidate carrying restricted-word information is displayed and output. As a result, the user obtains more useful output without any additional operation, the character output process of the input method system is greatly optimized, and the intelligence of the input method system is improved.
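The output-side behaviour described in the abstract can be sketched as a filter over the candidate list. This is a hedged illustration only: the `Candidate` structure, the `restricted` map, and especially the condition for applying the restriction information (the candidate is built from exactly one lexicon word that is flagged as restricted, i.e. the word would be output alone) are assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Candidate:
    text: str
    score: float
    segments: List[str]  # lexicon words the candidate was assembled from


def optimize_output(candidates: Sequence[Candidate], restricted: Dict[str, bool]) -> List[str]:
    """Return candidate texts to display, best score first.

    Assumed condition for applying restriction information: the candidate is a
    single lexicon word (it would be output alone) and that word is flagged as
    restricted. Restricted words remain usable inside longer candidates.
    """
    shown = [
        c for c in candidates
        if not (len(c.segments) == 1 and restricted.get(c.segments[0], False))
    ]
    return [c.text for c in sorted(shown, key=lambda c: c.score, reverse=True)]


# Hypothetical candidates for the pinyin input "lianghui".
restricted = {"量会": True}
candidates = [
    Candidate("量会", 0.9, ["量会"]),  # restricted word offered alone: hidden
    Candidate("两会", 0.7, ["两会"]),
]
print(optimize_output(candidates, restricted))  # ['两会']
# A longer phrase assembled with the restricted word is still shown.
print(optimize_output([Candidate("数量会", 0.8, ["数", "量会"])], restricted))  # ['数量会']
```

Keeping restricted words available inside longer candidates matches the motivation given in the summary: such words help intelligent word formation but add noise when offered alone. Demoting rather than hiding them would be an equally plausible reading of "restricting the ordering".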

Description

Technical field

[0001] The invention relates to the field of computer character input data processing, and in particular to a method and device for obtaining restricted-word information, a method for updating an input method lexicon, a method for optimizing output, and an input method system.

Background art

[0002] With the spread and development of computer and Internet technology, users from different professional fields and with different interests and usage habits place ever higher demands on the intelligence of input method systems.

[0003] In the prior art, techniques already exist that build an input method lexicon through statistics and screening over the huge and complex Internet corpus. The resulting Internet thesaurus can contain many new words that cannot be obtained from earlier closed corpora (such as modern Chinese dictionaries, news and newspapers), greatly improving people's input efficiency. Howeve...

Application Information

Patent type & authority: Application (China)
IPC (8): G06F17/30; G06F3/023
CPC: G06F17/276; G06F3/018; G06F40/274
Inventor: 吕杰勇 (Lü Jieyong)
Owner: BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD