Mass data processing method and device

A massive data processing technique, applied in the computer field, that addresses problems such as errors in determining duplicate data.

Pending Publication Date: 2020-09-29
BEIJING WODONG TIANJUN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] In view of this, the present disclosure proposes a duplicate-data determination scheme that avoids duplicate-data determination errors while preserving determination efficiency, and that can be applied to massive data processing.



Embodiment Construction

[0054] The following will clearly and completely describe the technical solutions in the embodiments of the present disclosure with reference to the drawings in the embodiments of the present disclosure.

[0055] In the present disclosure, descriptions such as "first", "second", "third", and "fourth" are only used to distinguish different objects, and are not used to express meanings such as size or timing.

[0056] For ease of description, the following notation is used:

[0057] An element (denoted e) is hashed with a hash function (denoted h) to obtain a hash value (denoted h(e)).

[0058] A first element (denoted e1) is hashed with a first hash function (denoted h1) to obtain a first hash value (denoted h1(e1)).

[0059] The first element is hashed with at least one second hash function (denoted h2) to obtain at least one second hash value (denoted h2(e1)).

[0060] The second element (set as e2) performs hash calc...
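
The notation h1(e1) and h2(e1) can be made concrete with a small sketch. The disclosure does not name specific hash functions, so everything here is an assumption for illustration: salted BLAKE2b from Python's standard library stands in for h1 and for two hypothetical second hash functions, and the names `h1`, `h2_all`, and `SECOND_SALTS` are invented for this example.

```python
import hashlib
from typing import List


def _salted_hash(e: str, salt: bytes) -> int:
    # Assumption: salted BLAKE2b (64-bit digest) stands in for the
    # unspecified hash functions; a different salt gives a different function.
    digest = hashlib.blake2b(e.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def h1(e: str) -> int:
    """First hash function: returns the first hash value h1(e)."""
    return _salted_hash(e, b"h1")


# Hypothetical salts, one per second hash function ("at least one").
SECOND_SALTS = [b"h2-a", b"h2-b"]


def h2_all(e: str) -> List[int]:
    """Second hash functions: returns every second hash value h2(e)."""
    return [_salted_hash(e, salt) for salt in SECOND_SALTS]


e1 = "http://example.com/page"
first_hash_value = h1(e1)        # h1(e1)
second_hash_values = h2_all(e1)  # one h2(e1) per second hash function
```

Each function is deterministic, so the same element always yields the same first and second hash values.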


Abstract

The invention provides a mass data processing method and device, and relates to the field of computers. The method comprises: locating the storage position of a first element in a hash table through the hash value of the first element; and searching for the first element among the elements stored in the storage block at that storage position. Because the element itself, rather than the hash value of the element, is searched for, whether the first element is a duplicate element can be judged accurately, which solves the problem of duplicate-element misjudgment caused by hash collisions. Because positioning narrows the search to a small number of elements, judgment efficiency is preserved at the same time. The mass data processing method and device are suitable for duplicate-data judgment scenarios in mass data processing.
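
The two steps in the abstract (locate a storage block by hash value, then search for the element itself inside that block) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `BucketedSeenSet`, the method `check_and_add`, the use of Python lists as storage blocks, and the deliberately weak `weak_hash` function are all hypothetical.

```python
from typing import Callable, List


class BucketedSeenSet:
    """Hash table whose storage blocks hold the elements themselves."""

    def __init__(self, num_blocks: int, hash_fn: Callable[[str], int]):
        self._blocks: List[List[str]] = [[] for _ in range(num_blocks)]
        self._hash_fn = hash_fn

    def check_and_add(self, element: str) -> bool:
        """Return True if element is a duplicate; otherwise store it and return False."""
        # Step 1: locate the storage position from the element's hash value.
        block = self._blocks[self._hash_fn(element) % len(self._blocks)]
        # Step 2: compare against the stored elements, not their hash values,
        # so a hash collision alone is never reported as a duplicate.
        if element in block:
            return True
        block.append(element)
        return False


def weak_hash(e: str) -> int:
    # Hypothetical, intentionally poor hash used only to force collisions.
    return len(e)


table = BucketedSeenSet(num_blocks=4, hash_fn=weak_hash)
first = table.check_and_add("ab")   # new element
second = table.check_and_add("cd")  # collides with "ab" but is still new
third = table.check_and_add("ab")   # genuine duplicate
```

Only the elements in one block are compared, so the exact comparison stays cheap as long as the hash function spreads elements evenly across blocks.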

Description

Technical field

[0001] The present disclosure relates to the field of computers, in particular to a massive data processing method and device.

Background technique

[0002] A web crawler crawls web page content based on the web page address, also called a uniform resource locator (URL). Before crawling, it is necessary to determine whether the web page to be crawled has already been crawled, so as to avoid repeatedly crawling the same web page.

[0003] Because the number of web pages is huge, some related technologies improve judgment efficiency by calculating the hash values of the web page addresses already crawled by the web crawler and the hash value of the web page address to be crawled. If the hash value of the web page address to be crawled is the same as the hash value of any web page address that has been crawled, it will be determined that the webpage address to be crawled is a we...
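
The hash-only check used by the related technologies in [0003] can be sketched as below, together with the misjudgment that a hash collision causes. This is an illustration only: `toy_hash` is a hypothetical, deliberately weak hash chosen so that two different addresses collide, and `hash_only_is_duplicate` is an assumed name.

```python
def toy_hash(url: str) -> int:
    # Hypothetical weak hash: byte sum modulo 1024. Reordering characters
    # leaves the sum unchanged, so collisions are easy to construct.
    return sum(url.encode("utf-8")) % 1024


def hash_only_is_duplicate(seen_hashes: set, url: str) -> bool:
    """Judge duplicates by hash value alone, as in the related technology."""
    hv = toy_hash(url)
    if hv in seen_hashes:
        return True
    seen_hashes.add(hv)
    return False


seen = set()
crawled = hash_only_is_duplicate(seen, "http://x/ab")       # recorded as new
misjudged = hash_only_is_duplicate(seen, "http://x/ba")     # different URL, same hash
```

Here `misjudged` is `True` even though `http://x/ba` was never crawled, which is the kind of duplicate determination error the disclosure sets out to avoid.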


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/953, G06F16/951, G06F16/958
Inventor: 余伟伟闫创任莉强邢淇翔
Owner: BEIJING WODONG TIANJUN INFORMATION TECH CO LTD