Massive Network Data Search Method Based on Data Flow Structure

A network data search technique based on a data flow structure, applied in text database indexing, electronic digital data processing, and digital data information retrieval. It addresses the large storage space occupied by information digests and the high query error rate of very small index structures, and it achieves fast index-matching search, high query efficiency, and fast processing speed.

Active Publication Date: 2020-06-16
SOUTHEAST UNIV
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

At present, queries over massive data are generally built on large databases and rely on distributed computing, table partitioning, and query decomposition. This approach mainly targets structured data. For unstructured data, the usual approach is index-based: the data is hashed to build an index, and searches run against the index, which reduces the search workload and speeds up lookup. Current mainstream hash-based schemes include the MD5 algorithm, the Bit-Map algorithm, and the Bloom Filter algorithm. MD5 computes a 128-bit information digest of the data, which compresses the original data and reduces the storage space of the index structure, but the digests still occupy considerable storage. The Bit-Map algorithm maps the data onto a BitSet with one bit per data item; the index structure is very small, but hash collisions are likely and the query error rate is high. The Bloom Filter algorithm uses an m-bit BitSet and hashes each data item with k hash functions, each with a value range of 0 to m-1. Each hash function maps the item to one bit of the BitSet, so that k bits of the BitSet correspond to one data item. This keeps the index structure small while also reducing the collision rate, and query efficiency is high.
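The Bloom Filter scheme described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `BloomFilter`, the method names, and the use of salted SHA-256 to derive the k hash functions are all assumptions made for the example.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """m-bit Bloom filter with k hash functions, each mapping into 0..m-1."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # every bit starts at 0

    def _positions(self, item: bytes) -> Iterator[int]:
        # Assumption: the k hash functions are SHA-256 salted with an index.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        # Set the k bits that correspond to this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A lookup that returns False is always correct, while a lookup that returns True may be a false positive, which is why a Bloom filter index is normally followed by a check against the stored data.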


Examples

Embodiment 1

[0027] A massive network data search method based on a data flow structure, comprising a real-time data storage method and a real-time query method.

[0028] The real-time data storage method is as follows:

[0029] Step 101: Configure parameters. Set a file threshold F and two bit vectors, Bsip and Bdip, each 2N in size, where N is a positive integer greater than 1. All 2N bits in Bsip and Bdip are initialized to 0.

[0030] Step 102: Create a new, empty network data storage file together with source IP and destination IP index files. The threshold of the network data storage file is the file threshold F set in Step 101.

[0031] Step 103: Obtain a network message and intercept the byte stream of its first K bytes, which contains the source IP, destination IP, and network data. Extract the source IP and destination IP from the intercepted byte stream. K is the number of bytes of the interce...
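Steps 101 and 103 can be sketched roughly as below. Several details are assumptions made for illustration only: the values of `K` and `VECTOR_BITS`, the helper names, the choice of salted SHA-256 as the hash functions, and above all the premise that the captured byte stream begins at an IPv4 header, where the source address occupies bytes 12 to 15 and the destination address bytes 16 to 19.

```python
import hashlib
import ipaddress
from typing import Tuple

K = 64                 # hypothetical truncation length in bytes
VECTOR_BITS = 1 << 12  # illustrative size; the source expresses it in terms of N


def new_vector() -> bytearray:
    """Step 101: a bit vector with every bit initialized to 0."""
    return bytearray(VECTOR_BITS // 8)


def extract_ips(packet: bytes) -> Tuple[str, str]:
    """Step 103: keep the first K bytes and read the two addresses.

    Assumption: the byte stream starts at the IPv4 header.
    """
    head = packet[:K]
    if len(head) < 20:
        raise ValueError("truncated before the end of the IPv4 header")
    src = ipaddress.IPv4Address(head[12:16])
    dst = ipaddress.IPv4Address(head[16:20])
    return str(src), str(dst)


def set_bits(vector: bytearray, key: str) -> None:
    """Mark a key with three salted SHA-256 hashes (an assumed choice)."""
    for salt in range(3):
        digest = hashlib.sha256(bytes([salt]) + key.encode()).digest()
        pos = int.from_bytes(digest[:8], "big") % VECTOR_BITS
        vector[pos // 8] |= 1 << (pos % 8)


# Usage: index one synthetic packet into the Bsip and Bdip vectors.
bsip, bdip = new_vector(), new_vector()
packet = bytes(12) + bytes([10, 0, 0, 1]) + bytes([192, 168, 1, 2]) + b"payload"
src_ip, dst_ip = extract_ips(packet)
set_bits(bsip, src_ip)
set_bits(bdip, dst_ip)
```

A real capture would usually carry a link-layer header in front of the IP header, so the offsets would need adjusting for that case.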

Embodiment 2

[0040] A method for searching massive network data based on a data flow structure, comprising a real-time data storage method and a real-time query method, characterized in that:

[0041] The real-time data storage method is as follows:

[0042] Step 101: Configure parameters. Set a file threshold F and two bit vectors, Bsip and Bdip, each 2N in size, where N is a positive integer greater than 1. All 2N bits in Bsip and Bdip are initialized to 0.

[0043] Step 102: Create a new, empty network data storage file together with source IP and destination IP index files. The threshold of the network data storage file is the file threshold F set in Step 101.

[0044] Step 103: Obtain a network message and intercept the byte stream of its first K bytes, which contains the source IP, destination IP, and network data. Extract the source IP and destination IP from the ...
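The role of the file threshold F in Step 102 can be illustrated with a small in-memory sketch. The visible text does not say what happens when a file reaches F, so the rotation rule here (start a new file when the next record would push the current one past the threshold) is an assumption, as are the class name `RotatingStore` and its method.

```python
from typing import List


class RotatingStore:
    """In-memory stand-in for data storage files capped at a threshold."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.files: List[bytearray] = [bytearray()]  # one entry per data file

    def append(self, record: bytes) -> int:
        """Store a record and return the index of the file that holds it."""
        current = self.files[-1]
        # Assumed rule: rotate when the record would exceed the threshold.
        if current and len(current) + len(record) > self.threshold:
            self.files.append(bytearray())
            current = self.files[-1]
        current += record  # bytearray += extends in place
        return len(self.files) - 1
```

In a fuller version each new file would also get fresh source IP and destination IP index files, matching the pairing described in Step 102.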

Abstract

The invention discloses a massive network data search method based on a data flow structure. The method comprises a real-time data storage method and a real-time query method. Because data in a network arrives as a data stream, the network data must be reduced: a specific length of each network message is captured, and the valid information is stored in data files of a specific size. The method uses the Bloom Filter algorithm to build index files for the source IP and destination IP of each network message. During a real-time query, the index file is searched first; after a match, the data storage file is searched to obtain the detailed information. The method processes massive network data quickly, occupies little storage space, and preserves a large amount of detailed network information. Indexing the network message data with the Bloom Filter hash algorithm enables hierarchical retrieval of file content with a simple, compact index structure. Using three different hash functions keeps the hash collision rate low. During retrieval, index matching is fast and accurate, giving high time and space efficiency.
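The two-level query described in the abstract (index first, data file only after a match) might look like the sketch below. The names `IndexedFile`, `positions`, and `query_by_source`, the index size `M`, and the salted SHA-256 hashes are hypothetical; only the use of three hash functions and the index-then-data order come from the text.

```python
import hashlib
from typing import List, Tuple

M = 4096  # illustrative number of bits per index
K = 3     # the abstract mentions three hash functions

Record = Tuple[str, str, bytes]  # (source IP, destination IP, payload)


def positions(key: str) -> List[int]:
    # Assumption: salted SHA-256 stands in for the three hash functions.
    return [
        int.from_bytes(
            hashlib.sha256(bytes([i]) + key.encode()).digest()[:8], "big"
        ) % M
        for i in range(K)
    ]


class IndexedFile:
    """One data storage file plus its source and destination IP indexes."""

    def __init__(self) -> None:
        self.bsip = bytearray(M // 8)
        self.bdip = bytearray(M // 8)
        self.records: List[Record] = []

    def store(self, src: str, dst: str, payload: bytes) -> None:
        for vector, key in ((self.bsip, src), (self.bdip, dst)):
            for pos in positions(key):
                vector[pos // 8] |= 1 << (pos % 8)
        self.records.append((src, dst, payload))

    def may_have_source(self, src: str) -> bool:
        return all(self.bsip[p // 8] & (1 << (p % 8)) for p in positions(src))


def query_by_source(files: List[IndexedFile], src: str) -> List[Record]:
    hits: List[Record] = []
    for f in files:
        if not f.may_have_source(src):
            continue  # index miss: the data file is never scanned
        # Index hit: scan the file and drop Bloom filter false positives.
        hits.extend(r for r in f.records if r[0] == src)
    return hits
```

The final scan is what makes the result exact: the index can only rule files out, so any file it flags still has to be checked record by record.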

Description

Technical field

[0001] The invention relates to the field of massive data processing, in particular to a search method for massive network data.

Background technique

[0002] Data search refers to a technique for extracting required data from a computer file or database according to search requirements. At present, common search methods for file data include linear search, binary search, and skip search. Linear search compares a given key value with the records in the file one by one until a matching record is found. This method is simple and easy to implement, but it is inefficient when querying massive data and struggles to meet demand. Binary search arranges the records in the file by key value, uses the divide and conquer method to split the file in two, and compares the given key value with the record at the midpoint. If they match, the search is successful; otherwise, judge whether the record you want to f...
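For comparison, the linear and binary searches mentioned in the background can be written as the short sketch below. The function names are illustrative, and the binary search leans on Python's standard `bisect` module rather than any procedure given in the patent.

```python
from bisect import bisect_left
from typing import List


def linear_search(records: List[int], key: int) -> int:
    """Compare the key with each record in turn; O(n) comparisons."""
    for i, record in enumerate(records):
        if record == key:
            return i
    return -1


def binary_search(sorted_records: List[int], key: int) -> int:
    """Halve the search range each step; requires records sorted by key."""
    i = bisect_left(sorted_records, key)
    if i < len(sorted_records) and sorted_records[i] == key:
        return i
    return -1
```

Binary search needs only about log2(n) comparisons, but it depends on the records being kept in sorted order, which is costly to maintain for data that arrives as a continuous stream.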

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/31, G06F16/332
CPC: G06F16/325, G06F16/3347
Inventor: 程光, 郭春生, 周余阳
Owner: SOUTHEAST UNIV