Security information duplicate checking method and system based on semantic analysis

Semantic analysis and information technology, applied to plagiarism checking of securities information, addresses the problems that matching sentences are difficult to locate and that checking consumes large resources, and improves the accuracy and efficiency of the plagiarism check.

Pending Publication Date: 2019-10-11
GF SECURITIES CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, in studying the prior art, the inventors found that existing plagiarism checking algorithms have the following disadvantages: on the one hand, a text library must be established in advance, similar to the paper library of HowNet or the webpage l

Examples

Example 1

[0055] See Figures 1-2.

[0056] As shown in Figure 1, an embodiment of the present invention provides a semantic analysis-based securities information plagiarism checking method that is suitable for execution on a computing device and includes at least the following steps:

[0057] S101. Collect the latest information data to be detected in real time from the business system in which the information is written;

[0058] Specifically, in step S101 the latest information is collected in real time from multiple business systems into the plagiarism checking system. In this embodiment, the latest information written by investment consultants is collected from the business systems through Kafka and stored in a database such as Oracle or MySQL.
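The collection step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the message layout (`author`, `body`), the table name `pending_info`, and the function `store_messages` are hypothetical, a plain Python list stands in for the Kafka consumer, and the standard-library `sqlite3` module stands in for Oracle or MySQL.

```python
import json
import sqlite3
from typing import Iterable


def store_messages(messages: Iterable[bytes], conn: sqlite3.Connection) -> int:
    """Persist each incoming message and return the number of rows written."""
    # Hypothetical table layout; the source does not describe the schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_info ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, author TEXT, body TEXT)"
    )
    count = 0
    for raw in messages:
        record = json.loads(raw.decode("utf-8"))
        conn.execute(
            "INSERT INTO pending_info (author, body) VALUES (?, ?)",
            (record["author"], record["body"]),
        )
        count += 1
    conn.commit()
    return count


# Stand-in for a Kafka consumer: any iterable of message payloads works here.
sample_stream = [
    json.dumps({"author": "advisor_a", "body": "Market commentary one."}).encode("utf-8"),
    json.dumps({"author": "advisor_b", "body": "Market commentary two."}).encode("utf-8"),
]
conn = sqlite3.connect(":memory:")
written = store_messages(sample_stream, conn)
rows = conn.execute("SELECT author, body FROM pending_info ORDER BY id").fetchall()
```

Because `store_messages` only needs an iterable of byte payloads, a real consumer object could be passed in place of the list, provided each item is first reduced to its message value.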

[0059] S102. Use a text segmentation algorithm to split the information data to be detected into several information blocks;

[0060] Specifically, for step S102, the received information is divided i...
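Step S102 can be illustrated with a small sketch. The excerpt does not say which text segmentation algorithm is used, so the punctuation-based rule below, the function name `split_into_blocks`, and the `min_chars` threshold are all assumptions chosen for illustration.

```python
import re
from typing import List

# A run of non-terminator characters followed by its sentence-ending marks.
# Assumption: these marks delimit sentences; an ASCII "." inside a decimal
# number such as "3.5" would also split, which a real system must handle.
_SENTENCE = re.compile(r"[^。！？!?.]+[。！？!?.]*")


def split_into_blocks(text: str, min_chars: int = 10) -> List[str]:
    """Group consecutive sentences into blocks of at least min_chars characters."""
    sentences = [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
    blocks: List[str] = []
    buffer = ""
    for sentence in sentences:
        buffer = f"{buffer} {sentence}" if buffer else sentence
        if len(buffer) >= min_chars:
            blocks.append(buffer)
            buffer = ""
    if buffer:
        # A short trailing remainder is merged into the previous block.
        if blocks:
            blocks[-1] = f"{blocks[-1]} {buffer}"
        else:
            blocks.append(buffer)
    return blocks


blocks = split_into_blocks("Short. Tiny. This one is long enough.")
```

Merging very short sentences into a neighbouring block keeps each block long enough to be a meaningful search query in the later crawling step.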

Example 2

[0098] See Figures 3-7.

[0099] As shown in Figure 3, another embodiment of the present invention provides a semantic analysis-based securities information plagiarism checking system, including:

[0100] The information collection module 100 is used to collect the latest information data to be detected in real time from the business system in which the information is written;

[0101] Specifically, the information collection module 100 collects the latest information to be checked in real time from multiple business systems into the plagiarism checking system. The collected data includes author information, investment certificate number, information review object, information review basis, information review text, and other information. In this embodiment, the latest information written by investment consultants is collected from the business system through Kafka and stored in relational databases such as Oracle and MySQL.
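The fields listed for the information collection module could be modelled as a typed record, as in the sketch below. The class name `InfoRecord`, the field names, and the JSON payload format are hypothetical; the source names the kinds of data collected but not their representation.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InfoRecord:
    """One piece of information to be checked (field names are illustrative)."""
    author: str
    certificate_no: str
    review_object: str
    review_basis: str
    review_text: str


def parse_record(payload: bytes) -> InfoRecord:
    """Decode a JSON payload and reject it if any expected field is missing."""
    data = json.loads(payload.decode("utf-8"))
    names = [f.name for f in fields(InfoRecord)]
    missing = [name for name in names if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return InfoRecord(**{name: str(data[name]) for name in names})


sample = json.dumps({
    "author": "advisor_a",
    "certificate_no": "CERT-0001",
    "review_object": "example stock",
    "review_basis": "quarterly report",
    "review_text": "The outlook remains stable.",
}).encode("utf-8")
record = parse_record(sample)
```

Validating at the point of collection means later modules can rely on every record carrying the review text that is actually segmented and compared.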

[0102] The central control module 200 is used to block...

Abstract

The invention discloses a securities information duplicate checking method and system based on semantic analysis. The method comprises: collecting the latest information data to be detected in real time from the business system in which the information is written; splitting the information data to be detected into a plurality of information blocks using a text segmentation algorithm; crawling for each information block and, once the corresponding associated texts are obtained, cleaning and splicing them through a webpage blocking method and a structured webpage information extraction method to obtain the final crawler data; and sequentially performing neighborhood retrieval and semantic similarity calculation on the information data to be detected and the final crawler data, using a simhash neighborhood algorithm and a maximum text fragment algorithm, to obtain the semantic similarity calculation result. The method combines the simhash algorithm and the maximum text fragment algorithm with a crawler system, solves the problem of originality detection for securities industry information without requiring a text library to be established in advance, and improves the accuracy and efficiency of duplicate checking.
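The retrieval and comparison stage named in the abstract can be sketched as below. This is an illustration under stated assumptions, not the patented algorithm: the excerpt does not define the "simhash neighborhood algorithm" or the "maximum text fragment algorithm", so the sketch uses a standard SimHash over character n-grams with a Hamming-distance cutoff, and treats the maximum text fragment as the longest contiguous substring shared by two texts. The function names, the n-gram size, and the `max_distance` threshold are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher
from typing import List


def simhash(text: str, ngram: int = 3, bits: int = 64) -> int:
    """SimHash fingerprint over character n-grams, each with weight 1."""
    counts = [0] * bits
    grams = [text[i:i + ngram] for i in range(max(1, len(text) - ngram + 1))]
    for gram in grams:
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for bit in range(bits):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set when more n-grams voted 1 than 0 at that position.
    return sum(1 << bit for bit in range(bits) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def neighbors(query: str, candidates: List[str], max_distance: int = 16) -> List[str]:
    """Keep candidates whose fingerprint lies within max_distance bits of the query's."""
    q = simhash(query)
    return [c for c in candidates if hamming(q, simhash(c)) <= max_distance]


def longest_shared_fragment(a: str, b: str) -> str:
    """Longest contiguous substring common to a and b (assumed fragment definition)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


fragment = longest_shared_fragment(
    "the index rose sharply today",
    "analysts say the index rose sharply",
)
```

The two stages play different roles: the fingerprint comparison cheaply narrows the crawled texts to near neighbours, and the fragment search then locates the specific overlapping passage in each remaining candidate.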

Description

Technical field

[0001] The invention relates to the technical field of big data processing, and in particular to a securities information plagiarism checking method and system based on semantic analysis.

Background

[0002] With the continuous development of Internet technology, the volume of information on the network keeps growing. Content sources such as Weibo, official accounts, and news media publish large amounts of new information around the clock. At the same time, many writers produce a large number of articles every day. In securities companies, a large number of investment consultants and industry researchers provide investors with services such as stock selection advice, buy and sell timing, and hot spot analysis. When providing investment advice, investment advisors must abide by laws and regulations and give investors appropriate advice. However, if there is plagiarism in the information written b...

Application Information

IPC(8): G06F16/9032, G06F16/951, G06F17/27, G06K9/62, G06Q40/06
CPC: G06F16/90332, G06F16/951, G06Q40/06, G06F40/30, G06F18/22
Inventor: 张凤娟, 谭则涛, 王永强, 温丽香, 杨嵩, 钟志斌
Owner: GF SECURITIES CO LTD