Word segmentation technology based on inverted reference sentence patterns

A word segmentation technology, applied in the field of word segmentation, that addresses problems such as segmentation accuracy falling short of practical needs.

Inactive Publication Date: 2007-05-16
徐文新

AI Technical Summary

Problems solved by technology

[0003] The above-mentioned word segmentation methods are word segmentation methods based on the vocabulary, the



Detailed Description of Embodiments

[0019] The following describes how the inverted file method is implemented for speech input, machine translation, search engines, and similar applications. The three applications differ slightly, but the principle is the same. If Hanyu Pinyin is understood as the phonetic notation of Chinese, the implementation can also serve as a reference for other languages.

[0020] 1. Build a database S of reference sentence patterns for a given language (here and below, "reference sentence patterns" includes word collocations, phrases, and words). Assign each sentence pattern a number n, and count the number of characters k contained in each reference sentence pattern or word collocation.
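Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `ReferencePattern` and `build_database`, the numbering from 1, and the sample entries are all assumptions made for the example. Only the symbols S, n, and k come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferencePattern:
    n: int     # sentence pattern number
    text: str  # reference sentence pattern, collocation, phrase, or word
    k: int     # number of characters contained in text


def build_database(entries):
    """Number the entries from 1 (an assumption) and count each entry's characters."""
    return [
        ReferencePattern(n=i, text=t, k=len(t))
        for i, t in enumerate(entries, start=1)
    ]


# Hypothetical sample entries; the patent does not list any.
S = build_database(["研究生", "研究", "生命", "起源"])
for pattern in S:
    print(pattern.n, pattern.text, pattern.k)
```

Storing k next to n means the character count is computed once, when the database is built, rather than at lookup time.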

[0021] In speech input, speed must be weighed against accuracy, so the set of reference sentence patterns should be chosen appropriately. The layout of the speech input reference sentence pattern database is as follows:

[0022] serial number | pinyin string | k | j | Chinese character string | predicate v...


Abstract

The invention relates to a word segmentation technology based on reference sentence patterns and word collocations, comprising the following steps: first, build a database of reference sentence patterns for a given language, assign each sentence pattern a number n, and count the number of characters k it contains; then build an inverted file keyed by character, in which each character lists the numbers n of all reference sentence patterns that contain it, so that the reference sentence patterns are inverted.
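The inverted file described in the abstract can be sketched as a mapping from each character to the numbers n of the patterns that contain it. The function names, the sample patterns, and in particular the lookup step `fully_covered` are assumptions made for illustration; the abstract does not say how the inverted file is queried. The lookup shown is only a coarse candidate filter: it ignores character order and contiguity, and it compares against the number of distinct characters in a pattern, which equals k only when no character repeats.

```python
from collections import Counter, defaultdict


def build_inverted_file(patterns):
    """Map each character to the sorted numbers n of the patterns containing it."""
    index = defaultdict(set)
    for n, text in patterns.items():
        for ch in text:
            index[ch].add(n)
    return {ch: sorted(ns) for ch, ns in index.items()}


def fully_covered(patterns, inverted, sentence):
    """Hypothetical lookup: numbers n whose distinct characters all occur in sentence."""
    hits = Counter()
    for ch in set(sentence):
        for n in inverted.get(ch, []):
            hits[n] += 1
    return sorted(n for n, count in hits.items() if count == len(set(patterns[n])))


# Hypothetical reference patterns keyed by their number n.
patterns = {1: "研究生", 2: "研究", 3: "生命", 4: "起源"}
inverted = build_inverted_file(patterns)
candidates = fully_covered(patterns, inverted, "研究生命")
print(inverted["生"], candidates)
```

With this layout, only patterns that share at least one character with the input are ever visited, so the cost of a lookup depends on the input length and posting-list sizes rather than on the size of the whole database.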

Description

Technical field

[0001] The invention is a word segmentation technology based on reference sentence patterns and word collocations, which can be used in speech input, machine translation, search engines, and the like.

Background

[0002] Speech input, machine translation, and search engines all require word segmentation. Of particular interest, improving the segmentation accuracy of pinyin strings has become the key to raising the quality of Chinese speech input. The main algorithms for automatic segmentation of Chinese pinyin strings are the maximum matching method (MM), the least word frequency selection method (FWF), and the word-by-word traversal method. Depending on the scanning direction, the maximum matching method is divided into the forward maximum matching method (FMM) and the reverse maximum matching method (BMM). The least word segmentation algorithm is based on the principle of the least number of words obtained after segmentation, which is ...
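For readers unfamiliar with the vocabulary-based methods named in the background, forward and reverse maximum matching can be sketched as follows. This illustrates the prior-art methods, not the invention. The function names, the single-character fallback for unmatched text, and the sample vocabulary are assumptions; Chinese characters are used in place of pinyin strings purely for readability.

```python
def forward_max_match(text, vocab, max_len=None):
    """Scan left to right, taking the longest vocabulary word at each position."""
    if max_len is None:
        max_len = max(len(w) for w in vocab)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing matches.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, vocab, max_len=None):
    """Scan right to left, taking the longest vocabulary word ending at each position."""
    if max_len is None:
        max_len = max(len(w) for w in vocab)
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


vocab = {"研究", "研究生", "生命", "的", "起源"}
sentence = "研究生命的起源"
print(forward_max_match(sentence, vocab))
print(backward_max_match(sentence, vocab))
```

On this sentence the two scanning directions disagree: the forward scan greedily takes 研究生 and leaves 命 stranded, while the reverse scan recovers 研究 / 生命. Direction-dependent errors of this kind are typical of purely vocabulary-based segmentation.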


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F17/2775, G06F40/289
Inventor 徐文新
Owner 徐文新