
Device and method for detecting bad corpus data content

A technology for detecting bad content in corpus data, applied in semantic analysis, natural language data processing, instruments, etc. It addresses the low accuracy of existing methods, their inability to detect bad content accurately and comprehensively, and the resulting missed judgments, with the effect of preventing missed judgments.

Inactive Publication Date: 2017-05-24
SHENZHEN GOWILD ROBOTICS CO LTD
Cites: 4; Cited by: 1

AI Technical Summary

Problems solved by technology

[0003] In the prior art, bad corpus content is usually detected with a statistical method, which mainly judges whether content is bad according to a lexicon of bad information. Such a method cannot detect all bad content in the content under inspection, which easily causes missed judgments.



Examples


Embodiment Construction

[0010] The technical solutions of the present invention are described in more detail below in conjunction with specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0011] Corpus construction is an important foundation of statistical learning methods. In recent years, the value of corpus resources for natural language research has been increasingly recognized. In particular, the bilingual corpus has become an indispensable resource for research on machine translation, machine-assisted translation and translation knowledge acquisition. On the one hand, the emergence of bilingual corpora has directly promoted the development of n...


Abstract

The invention discloses a device and method for detecting bad content in corpus data. The device comprises: a semantic frame determining module, which performs word segmentation on the corpus data to be detected and determines its semantic frame; a detection standard setting module, connected to a corpus and to the semantic frame determining module, which transmits the corpus data in the corpus to the semantic frame determining module so that the semantic frames of that corpus data are determined, and which extracts the bad content words obtained while segmenting the corpus; and a detecting module, which compares the word segmentation result of the corpus data to be detected with the bad content words, compares the semantic frame to be detected with all of the known semantic frames, and determines whether the corpus data to be detected is bad content. By comparing the semantic frame to be detected with known semantic frames, the invention can judge accurately whether the corpus data to be detected is bad content and can prevent missed judgments.
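The three modules in the abstract can be illustrated with a minimal Python sketch. The abstract does not define how a semantic frame is represented or how frames are compared, so everything here is an assumption made for illustration: whitespace splitting stands in for word segmentation, a frame is taken to be the token sequence with known bad words replaced by a hypothetical `<X>` slot, the corpus is assumed to be labelled with its bad words, and the names `segment`, `semantic_frame`, `frames_match`, `build_standard` and `is_bad` are invented.

```python
SLOT = "<X>"  # hypothetical placeholder for a bad-word position in a frame


def segment(text):
    """Stand-in for word segmentation (assumption: lowercase whitespace split)."""
    return text.lower().split()


def semantic_frame(tokens, bad_words):
    """Assumed frame: the token sequence with known bad words turned into slots."""
    return tuple(SLOT if tok in bad_words else tok for tok in tokens)


def frames_match(a, b):
    """Two frames match if they have equal length and differ only at slots."""
    return len(a) == len(b) and all(x == y or SLOT in (x, y) for x, y in zip(a, b))


def build_standard(corpus):
    """Detection standard setting: collect bad words and frames from the corpus.

    `corpus` is assumed to be a list of (text, labelled_bad_words) pairs.
    """
    bad_words = set()
    for _, labelled in corpus:
        bad_words.update(word.lower() for word in labelled)
    frames = {semantic_frame(segment(text), bad_words) for text, _ in corpus}
    return bad_words, frames


def is_bad(text, bad_words, frames):
    """Detecting module: lexicon comparison first, then frame comparison."""
    tokens = segment(text)
    if any(tok in bad_words for tok in tokens):
        return True
    frame = semantic_frame(tokens, bad_words)
    return any(frames_match(frame, known) for known in frames)


bad_words, frames = build_standard([("where to buy badstuff cheaply", ["badstuff"])])
print(is_bad("I like badstuff", bad_words, frames))                   # True (lexicon hit)
print(is_bad("where to buy otherstuff cheaply", bad_words, frames))   # True (frame hit)
print(is_bad("nice weather today", bad_words, frames))                # False
```

The second call shows the point made in the abstract: the text contains no word from the lexicon, so a lexicon-only check would miss it, but its frame matches a known frame. A slot-based frame like this one is deliberately crude and would also flag harmless text with the same shape; a real system would need a richer frame representation.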

Description

Technical field

[0001] The invention relates to the field of text processing, in particular to a device and method for detecting bad corpus content.

Background technique

[0002] With the development of the Internet, the demand for web search keeps growing, so more keywords and corpus data need to be reserved; these are stored in a corpus in the cloud for netizens to use when searching online. To keep the network environment clean, it is often necessary to check the vocabulary or corpus data entered by network users for bad content and to block it.

[0003] In the prior art, bad corpus content is usually detected with a statistical method, which mainly judges whether content is bad according to a lexicon of bad information. Such a method cannot detect all bad content in the content under inspection, which easily causes missed judgments.

Contents of the invention

[0004] The technical problem mainly solved by the present inven...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F17/30
CPC: G06F16/353, G06F16/9535, G06F40/289, G06F40/30, G06F16/00
Inventor: 杨新宇, 王昊奋, 邱楠
Owner: SHENZHEN GOWILD ROBOTICS CO LTD