
Data information updating system and method based on multiple verifications

A data information updating system and method, applied in the field of data verification, that addresses problems such as a growing number of duplicate pages, declining page quality, and low algorithm efficiency. Its stated effects are avoiding the acquisition of duplicate data information, reducing the amount of data, and improving efficiency.

Status: Inactive. Publication date: 2017-11-07
合肥智权信息科技有限公司
Cites: 8; cited by: 2

AI Technical Summary

Problems solved by technology

Therefore, to improve crawling speed, web crawlers usually adopt parallel crawling, which introduces a new problem: repetition. When multiple crawlers or crawling threads run in parallel, duplicate pages and quality problems increase; moreover, each crawler or crawling thread can fetch only part of the pages, which degrades page quality.
[0003] Because existing databases already hold huge amounts of data, the workload grows as more data information is captured, and the algorithm becomes less efficient.
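To make the cost concern in [0003] concrete, the following sketch assumes the simplest possible duplicate check, in which every newly captured record is compared with every record already in the database. The function names are hypothetical, and the model only counts comparisons; it ignores how expensive each individual comparison is.

```python
def naive_comparison_count(new_records: int, base_records: int) -> int:
    """Pairwise comparisons needed to check each new record against every
    record already stored (hypothetical all-pairs cost model)."""
    return new_records * base_records


def growth_table(new_per_batch: int, base_sizes: list[int]) -> list[tuple[int, int]]:
    # For a fixed batch size, pair each base size with its comparison count
    # to show that the cost grows linearly with the size of the database.
    return [(m, naive_comparison_count(new_per_batch, m)) for m in base_sizes]


if __name__ == "__main__":
    for base, cost in growth_table(1_000, [10_000, 100_000, 1_000_000]):
        print(f"base={base:>9,}  comparisons={cost:>13,}")
```

With 1,000 new records per batch, a base of 1,000,000 records already implies 1,000,000,000 comparisons under this model, which is why shrinking each batch before it reaches the database matters.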



Detailed Description of the Embodiments

[0039] Figure 1 is a block diagram of the data information updating system based on multiple verifications proposed by the present invention.

[0040] Referring to Figure 1, the data information updating system based on multiple verifications proposed by the present invention comprises:

[0041] The data acquisition module is used to acquire data information by using a web crawler.

[0042] In a specific solution, the data acquisition module can be configured to include multiple data acquisition sub-modules, and each data acquisition sub-module can use multiple web crawlers to acquire data. The web crawlers collect various types of information according to the intelligence collection and analysis goals.
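A minimal sketch of the acquisition layout described in [0042], assuming each sub-module runs its crawlers as a thread pool over a list of seed URLs. The names `Record`, `AcquisitionSubModule`, `fake_fetch`, and `acquire` are hypothetical, and `fake_fetch` is an offline stand-in for a real HTTP request so the example runs without network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    url: str
    text: str
    fetched_at: float  # Unix timestamp at which the crawler obtained the record


def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP fetch; returns canned text.
    return f"content of {url}"


@dataclass
class AcquisitionSubModule:
    name: str
    seed_urls: list[str]
    crawler_count: int = 2  # crawler threads per sub-module (assumed value)

    def crawl(self) -> list[Record]:
        # Run this sub-module's crawlers in parallel; map() keeps input order.
        with ThreadPoolExecutor(max_workers=self.crawler_count) as pool:
            texts = list(pool.map(fake_fetch, self.seed_urls))
        now = time.time()
        return [Record(u, t, now) for u, t in zip(self.seed_urls, texts)]


def acquire(sub_modules: list[AcquisitionSubModule]) -> list[Record]:
    # The acquisition module concatenates whatever each sub-module collected.
    records: list[Record] = []
    for sub in sub_modules:
        records.extend(sub.crawl())
    return records
```

If two sub-modules are given overlapping URLs, the combined output contains duplicate records; nothing at the acquisition stage of this sketch prevents that, which is the situation the verification modules are meant to handle.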

[0043] The first verification module performs preliminary verification on the data information obtained by the web crawlers within a preset time to obtain a preliminary data information set;

[0044] In a specific solution, the first verification module is used to: ...
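The concrete checks in [0044] are not given here, so the following is only an illustration under assumed rules: a record passes preliminary verification if it was fetched inside the preset time window, its text is non-empty after normalization, and it is not an exact duplicate of a record already kept in the same batch. `preliminary_verify`, `normalize`, and the SHA-256 fingerprint are hypothetical choices, not the patent's method.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    url: str
    text: str
    fetched_at: float  # Unix timestamp


def normalize(text: str) -> str:
    # Collapse whitespace and case so trivially reformatted copies hash alike (assumed rule).
    return " ".join(text.lower().split())


def preliminary_verify(records: list[Record], window_start: float, window_end: float) -> list[Record]:
    """Hypothetical first-stage check: keep non-empty records fetched inside
    [window_start, window_end) and drop exact in-batch duplicates."""
    seen: set[str] = set()
    kept: list[Record] = []
    for rec in records:
        if not (window_start <= rec.fetched_at < window_end):
            continue  # outside the preset time window
        body = normalize(rec.text)
        if not body:
            continue  # empty page, nothing to verify
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a record already kept in this batch
        seen.add(digest)
        kept.append(rec)
    return kept
```

Exact hashing only removes copies that are identical after normalization; near-duplicates would be left for the later multiplicity check against the data information base.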


Abstract

The invention discloses a data information updating system and method based on multiple verifications. The system comprises a data acquisition module, a first verification module, a second verification module, an elimination module, and an updating module. The data acquisition module obtains data information using web crawlers; the first verification module preliminarily verifies the data information obtained by the web crawlers within a preset time and produces a preliminary data information set; the second verification module performs multiplicity (duplication) verification between the data information in the preliminary data information set and a preset data information base; the elimination module deletes from the preliminary data information set the data information that fails the multiplicity verification; and the updating module adds the data information that passes the multiplicity verification to the preset data information base, thereby updating it. In this way, similarity verification is performed after the web crawlers have obtained a large amount of data information, highly similar data information obtained in real time is eliminated, the web crawlers are kept from obtaining repetitive data information, the amount of data information is reduced, and the efficiency of multiplicity verification is improved.
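The abstract does not say how multiplicity is measured. The sketch below swaps in one common technique, Jaccard similarity over word 3-shingles with an assumed threshold of 0.8, to show how the second verification, elimination, and updating modules could fit together. All function names and the threshold are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    # Overlapping k-word windows; texts shorter than k become a single shingle.
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def multiplicity_check(text: str, base: list[str], threshold: float = 0.8) -> bool:
    """Second verification: passes (True) when text is not too similar to any base entry."""
    s = shingles(text)
    return all(jaccard(s, shingles(b)) < threshold for b in base)


def update(preliminary: list[str], base: list[str], threshold: float = 0.8) -> tuple[list[str], list[str]]:
    """Split the preliminary set into accepted and rejected items; mutates base."""
    accepted: list[str] = []
    rejected: list[str] = []
    for text in preliminary:
        if multiplicity_check(text, base, threshold):
            base.append(text)      # updating module: add to the data information base
            accepted.append(text)
        else:
            rejected.append(text)  # elimination module: drop from the preliminary set
    return accepted, rejected
```

Appending accepted items to `base` inside the loop means later items in the same batch are also checked against earlier accepted ones. Each check still compares against the whole base, so its cost stays linear in the base size, matching the concern in [0003].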

Description

Technical field

[0001] The invention relates to the technical field of data verification, and in particular to a system and method for updating data information based on multiple verifications.

Background

[0002] With the explosive growth of Internet information, the amount of data information on the Internet increases geometrically every day. When users look for the data information they need, they are often swamped by large amounts of useless, repetitive information. Data information has already become a convenient resource for most users. As one of the basic components of search engines, web crawlers obtain data information from the Internet to support users' information needs. However, because of rampant reprinting and multi-site publishing, the richness of the data information acquired by a web crawler, and the degree of similarity and overlap within it, are closely tied to the crawler's efficiency. Therefore, in order to improve the crawling speed,...


Application Information

IPC(8): G06F17/30
CPC: G06F16/23; G06F16/951
Inventor: 周钰徐
Owner: 合肥智权信息科技有限公司