Wrongly written character monitoring method and system

A typo-lexicon and word segmentation technology, applied to typo monitoring methods and systems. It addresses the rough word segmentation process and the resulting inaccurate segmentation results of existing approaches, and aims to improve the accuracy of typo monitoring.

Inactive Publication Date: 2018-02-09
湖南网数科技有限公司

AI Technical Summary

Problems solved by technology

[0005] 1. Existing word segmentation systems use only one word segmentation method (either the forward maximum matching method or the reverse maximum matching method). The segmentation process is therefore relatively coarse, which leads to inaccurate segmentation results;
[0006] 2. Existing word segmentation methods usually cover only a single field, and still cannot effectively identify strings from multiple fields.
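
The two single-direction methods named in [0005] can be sketched as follows. This is a minimal illustration and not code from the patent: the function names, the `max_len` parameter, and the tiny vocabulary are all assumptions chosen to show how the two directions can disagree on the same string.

```python
def forward_max_match(text, vocab, max_len=4):
    """Scan left to right, taking the longest vocabulary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, vocab, max_len=4):
    """Scan right to left, taking the longest vocabulary word ending at each position."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the string
    return tokens


# Hypothetical vocabulary, for illustration only.
VOCAB = {"研究", "研究生", "生命", "的", "起源"}
SENTENCE = "研究生命的起源"

print(forward_max_match(SENTENCE, VOCAB))
print(reverse_max_match(SENTENCE, VOCAB))
```

With this vocabulary the forward pass greedily takes "研究生" and leaves "命" stranded, while the reverse pass recovers "研究 / 生命 / 的 / 起源". A system that relies on only one direction has no way to notice such a disagreement, which is the coarseness that [0005] describes.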



Examples


Embodiment Construction

[0045] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0046] Figure 1 shows the flow chart of Embodiment 1 of the typo monitoring method disclosed by the present invention. The method may comprise the following steps:

[0047] S101. Construct a typo lexicon;

[0048] When typos on a website need to be monitored, the basic data is prepared first and a typo lexicon is constructed. Constructing the typo lexicon mainly includes collecting basic inf...


Abstract

The invention provides a typo (wrongly written character) monitoring method comprising the following steps: constructing a typo lexicon; collecting data from a target website to obtain website data; preprocessing the website data, parsing the web pages, and denoising to obtain the text content; performing word segmentation on the text content to obtain independent words; constructing an AC (Aho-Corasick) automaton dictionary tree from the typo lexicon and generating a cache; constructing a context analysis model; and identifying typos according to the cached AC dictionary tree and the context analysis model, then outputting the identification result. The method can effectively improve the accuracy of typo monitoring.
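
The AC (Aho-Corasick) dictionary tree step in the abstract can be illustrated with a textbook automaton. The sketch below is not the patent's implementation: the class name, the method names, and the two lexicon entries are hypothetical, and the cache and the context analysis model mentioned in the abstract are left out.

```python
from collections import deque


class AhoCorasick:
    """Textbook Aho-Corasick automaton: trie plus failure links."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[node][char] -> child node
        self.fail = [0]    # failure link of each node
        self.out = [[]]    # patterns that end at each node
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so a node's failure target is finished before the node.
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit matches that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find_all(self, text):
        """Return (start_index, pattern) for every lexicon entry found in text."""
        hits, node = [], 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for pattern in self.out[node]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


# Hypothetical typo lexicon: wrongly written form -> suggested correction.
TYPO_LEXICON = {"按装": "安装", "既使": "即使"}

automaton = AhoCorasick(TYPO_LEXICON)
for start, typo in automaton.find_all("既使按装失败也要重试"):
    print(start, typo, "->", TYPO_LEXICON[typo])
```

The automaton scans the text once and reports every lexicon entry it contains, regardless of how many entries the lexicon holds. A plain string match like this cannot tell whether a hit is really an error in its context, which is presumably why the abstract pairs the dictionary tree with a context analysis model.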

Description

Technical field

[0001] The present invention relates to the technical field of typo identification, and in particular to a typo monitoring method and system.

Background

[0002] At present, some government websites have several serious typos on a single page, or serious typos across multiple pages. This has attracted the attention of the public media and seriously damaged the image of the government. In response, the first national census of government websites included serious typos among the inspection points of the "serious mistakes" indicator.

[0003] A typo monitoring method usually consists of a typo lexicon, a word segmentation technology, and a typo recognition model.

[0004] Word segmentation is the prerequisite for, and the key to, identifying typos. The prior art contains many word segmentation methods, among which string-based word segmentation is the most widely used because it is relatively simple. String-based word segme...


Application Information

IPC(8): G06F17/27
CPC: G06F40/205, G06F40/284
Inventor: 周金娟, 王治平
Owner: 湖南网数科技有限公司