Massive Network Data Search Method Based on Data Flow Structure

A network data and data flow technology, applied in text database indexing, electronic digital data processing, digital data information retrieval, etc. It addresses problems such as the large storage space occupied by information summaries, the small storage space of the index structure, and the high probability of query errors, and achieves fast index matching search, high query efficiency, and fast processing speed.

Active Publication Date: 2020-06-16

SOUTHEAST UNIV

7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

At present, massive data is generally queried on top of a large database, using distributed computing, table partitioning, and query decomposition. This approach mainly targets structured data. For unstructured data, the usual approach is index-based: the data is hashed to build an index, and searches run against the index, which reduces the search workload and speeds up the search.

The current mainstream hash algorithms include the MD5 algorithm, the Bit-Map algorithm, and the Bloom Filter algorithm. The MD5 algorithm computes a 128-bit information summary (digest) of the data; this compresses the original data and reduces the storage space of the index structure, but the summaries still occupy a large amount of storage. The Bit-Map algorithm maps the data onto a BitSet, with each piece of data corresponding to one bit; the index structure is very small, but hash collisions are likely and the probability of query errors is high. The Bloom Filter algorithm uses an m-bit BitSet and hashes the data with k hash functions, each with a value range of 0 to m-1. Each hash function maps the data to one bit of the BitSet, so k bits of the BitSet correspond to one piece of data. This keeps the storage space of the index structure small while also reducing the collision rate, and query efficiency is high.
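The Bloom Filter scheme described above (an m-bit BitSet with k hash functions, each with range 0 to m-1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `BloomFilter` and the choice of salted SHA-256 to obtain k hash functions are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """An m-bit Bloom filter using k hash functions with range 0..m-1."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # all m bits start at 0

    def _positions(self, item: bytes):
        # Assumption: the k hash functions are SHA-256 salted with an index byte.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        # Set the k bits that correspond to this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(m=1024, k=3)
bf.add(b"10.0.0.1")
print(bf.might_contain(b"10.0.0.1"))  # True: an added item is always reported
```

A lookup can return a false positive when other items happen to set the same k bits, but it never returns a false negative, which is why a match in the index still has to be confirmed against the stored data.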

Examples

Embodiment 1

[0027] A massive network data search method based on a data flow structure, comprising a real-time data storage method and a real-time query method.

[0028] The storage method of the real-time data is:

[0029] Step 101: Configure parameters. Set a file threshold F, and set two bit vectors Bsip and Bdip, each 2N in size, where N is a positive integer greater than 1. The initial values of all 2N bits in the bit vectors Bsip and Bdip are set to 0.

[0030] Step 102: Create a new empty network data storage file and a source IP and destination IP index file. The threshold of the network data storage file is the file threshold F set in step 101.

[0031] Step 103: Obtain a network message and intercept the byte stream of its first K bytes, which includes the source IP, the destination IP, and the network data. Extract the source IP and destination IP from the intercepted byte stream. K is the byte number of the interce...
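Steps 101 and 103 can be sketched as below. This is a hedged illustration only: the names `StoreConfig` and `extract_ips` are hypothetical, the vector size follows the text as written (2N bits), and the byte layout assumes that the intercepted stream begins at an IPv4 header, so the source IP sits at bytes 12 to 15 and the destination IP at bytes 16 to 19. The source text does not state the layout.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class StoreConfig:
    file_threshold: int  # F from step 101
    n: int               # N from step 101, a positive integer greater than 1
    k_bytes: int         # K from step 103, bytes kept per message
    bsip: bytearray = field(init=False)  # source IP bit vector
    bdip: bytearray = field(init=False)  # destination IP bit vector

    def __post_init__(self) -> None:
        if self.n <= 1:
            raise ValueError("N must be a positive integer greater than 1")
        size_bits = 2 * self.n  # size as written in the text
        # bytearray() is zero-filled, so every bit starts at 0.
        self.bsip = bytearray((size_bits + 7) // 8)
        self.bdip = bytearray((size_bits + 7) // 8)


def extract_ips(message: bytes, k_bytes: int):
    """Keep the first K bytes and read the two IPv4 addresses from them."""
    head = message[:k_bytes]
    if len(head) < 20:
        raise ValueError("need at least 20 bytes to cover an IPv4 header")
    # Assumption: head starts at the IPv4 header (no link-layer header).
    src = ipaddress.IPv4Address(head[12:16])
    dst = ipaddress.IPv4Address(head[16:20])
    return head, src, dst


cfg = StoreConfig(file_threshold=1 << 20, n=8, k_bytes=24)
packet = bytes(12) + bytes([192, 168, 1, 10]) + bytes([10, 0, 0, 5]) + b"payload"
head, src, dst = extract_ips(packet, cfg.k_bytes)
print(src, dst)  # 192.168.1.10 10.0.0.5
```

Truncating each message to K bytes before storage is what keeps the data files small; only the header fields and the leading part of the payload survive.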

Embodiment 2

[0040] A method for searching massive network data based on a data flow structure, comprising a storage method for real-time data and a real-time query method, characterized in that:

[0041] The storage method of the real-time data is:

[0042] Step 101: Configure parameters. Set a file threshold F, and set two bit vectors Bsip and Bdip, each 2N in size, where N is a positive integer greater than 1. The initial values of all 2N bits in the bit vectors Bsip and Bdip are set to 0.

[0043] Step 102: Create a new empty network data storage file and a source IP and destination IP index file. The threshold of the network data storage file is the file threshold F set in step 101.

[0044] Step 103: Obtain a network message and intercept the byte stream of its first K bytes, which includes the source IP, the destination IP, and the network data. Extract the source IP and destination IP from the ...

Abstract

The invention discloses a massive network data search method based on a data stream structure. The method comprises a real-time data storage method and a real-time query method. Because network data arrives as a data stream, the data must be reduced: a specific length of each network message is captured, and the valid information is stored in data files of a specific size. The method uses the Bloom Filter algorithm to build an index file for the source IP and destination IP in each network message. During a real-time query, the index file is searched first; only after a match succeeds is the data storage file searched to obtain the detailed information. The method processes massive network data quickly, occupies little storage space, and preserves a large amount of detailed network information. Indexing the network message data with the Bloom Filter hash algorithm enables hierarchical retrieval of file content with a simple, compact index structure. Using three different hash functions keeps the hash collision rate low. During retrieval, index matching is fast and accurate, giving high time and space efficiency.
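The two-level lookup described in the abstract (search the index first, read the data storage file only after a match) can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: `IndexedFile`, `three_positions`, and `M_BITS` are hypothetical names, the three hash functions are derived from salted SHA-256, and a Python list stands in for the data storage file.

```python
import hashlib

M_BITS = 256  # assumption: illustrative index size, not taken from the patent


def three_positions(ip: str) -> list:
    """Map an IP string to three bit positions using three hash functions."""
    # Assumption: three hash functions obtained by salting SHA-256.
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + ip.encode()).digest()[:4], "big")
        % M_BITS
        for i in range(3)
    ]


class IndexedFile:
    """One data storage file plus its source/destination IP index."""

    def __init__(self) -> None:
        self.sip_bits = 0  # Python int used as a bit vector
        self.dip_bits = 0
        self.records = []  # stands in for the data storage file
        self.file_scans = 0  # counts how often the data file is read

    def store(self, src: str, dst: str, data: bytes) -> None:
        for p in three_positions(src):
            self.sip_bits |= 1 << p
        for p in three_positions(dst):
            self.dip_bits |= 1 << p
        self.records.append((src, dst, data))

    def query_by_source(self, src: str) -> list:
        # Level 1: the index. Any unset bit means the IP was never stored.
        if any(not (self.sip_bits >> p) & 1 for p in three_positions(src)):
            return []
        # Level 2: the data file, scanned only after the index matches.
        self.file_scans += 1
        return [r for r in self.records if r[0] == src]


f = IndexedFile()
f.store("10.0.0.1", "10.0.0.2", b"abc")
print(f.query_by_source("10.0.0.1"))
```

Because the index can report a false positive, the second level still filters records by exact comparison; the index only decides whether the file needs to be opened at all.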

Description

Technical field

[0001] The invention relates to the field of massive data processing, in particular to a search method for massive network data.

Background

[0002] Data search is a technique for extracting required data from a computer file or database according to search requirements. At present, common search methods for file data include linear search, binary search, and skip search. Linear search compares a given keyword value with the records in the file one by one until a matching record is found. This method is simple and easy to implement, but it is inefficient when querying massive data and can hardly meet demand. Binary search arranges the records in the file by key value, uses divide and conquer to split the file in two, and compares the given key value with the record at the midpoint. If they match, the search is successful; otherwise, judge whether the record you want to f...
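For comparison, the linear and binary searches mentioned in the background can be sketched as below. This is a generic textbook illustration, not code from the patent; the function names and the sample keys are made up for the example.

```python
def linear_search(records, key):
    """Compare the key with each record in turn; return its index or -1."""
    for i, record in enumerate(records):
        if record == key:
            return i
    return -1


def binary_search(sorted_records, key):
    """Halve the search range of a sorted list until the key is found."""
    lo, hi = 0, len(sorted_records) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_records[mid] == key:
            return mid
        if sorted_records[mid] < key:
            lo = mid + 1  # key can only be in the upper half
        else:
            hi = mid - 1  # key can only be in the lower half
    return -1


keys = [3, 8, 15, 23, 42]  # binary search requires sorted keys
print(linear_search(keys, 23), binary_search(keys, 23))  # 3 3
```

Linear search touches up to every record, while binary search needs the records sorted by key first, which is costly to maintain for data that keeps arriving as a stream.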

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/31, G06F16/332

CPC: G06F16/325, G06F16/3347

Inventors: 程光, 郭春生, 周余阳

Owner: SOUTHEAST UNIV
