A method and device for extracting full-text data
A full-text data extraction technique, applied in electrical digital data processing, digital data information retrieval, and special data processing applications. It addresses the labor consumption and low efficiency of existing approaches, with the effect of shortening data extraction time, improving extraction efficiency, and reducing search-matching time.
Examples
Embodiment 1
[0022] Figure 1 is a schematic flowchart of a method for extracting full-text data provided in Embodiment 1 of the present invention. The method can be executed by a device for extracting full-text data, and the device can be implemented in hardware and/or software. The method includes the following steps:
[0023] S110. Parse the network packet data into session data.
[0024] The method provided in this embodiment is applicable to data extraction for various communication protocols; the following uses HyperText Transfer Protocol (HTTP) data as an example to describe it in detail. First, the network packet data obtained from the data source is parsed into session data in text format. For HTTP protocol data, the HTTP protocol stack is used to parse it into HTTP POST session data. The parsed session data includes the HTTP header and the HTTP entity part. To parse and restore HTTP POST session data according to the HTTP protocol stack, it is ...
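As a minimal sketch of the header/entity split described above, the following Python assumes the packets have already been reassembled into one complete HTTP POST request held in memory. The function name `split_http_session` and the sample request are hypothetical and do not come from the patent text.

```python
def split_http_session(raw: bytes):
    """Split one reassembled HTTP request into (request_line, headers, entity)."""
    # The header section ends at the first blank line (CRLF CRLF).
    head, sep, entity = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete HTTP message: no blank line after headers")
    lines = head.decode("iso-8859-1").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, entity


# Hypothetical sample session, for illustration only.
raw = (b"POST /login HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"Content-Type: application/x-www-form-urlencoded\r\n"
       b"Content-Length: 27\r\n"
       b"\r\n"
       b"username=alice&password=pw1")

request_line, headers, entity = split_http_session(raw)
print(request_line)
print(headers["content-type"])
print(entity)
```

The sketch ignores chunked transfer encoding and compressed entities, which a real protocol stack would have to handle before the entity part could be inspected.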
Embodiment 2
[0034] Figure 2 is a schematic flowchart of a method for extracting full-text data provided in Embodiment 2 of the present invention. As shown in Figure 2, the method includes:
[0035] S210. Parse the network packet data into session data.
[0036] S220. Determine whether the entity part of the session data conforms to a preset data format.
[0037] If yes, perform operations S230 and S250 in sequence; otherwise, perform operations S240 and S220 in sequence.
[0038] S230. Label the session data with its data format.
[0039] S240. Parse subsequent network packet data into session data.
[0040] S250. Perform multi-pattern matching on the session data conforming to the preset data format, and determine whether a preset feature string is matched.
[0041] When a preset feature string is matched, operations S260, S270, S280, and S290 are executed in sequence; otherwise, operations S240 and S220 are executed in sequence.
[0042] S260. Obtain the hit position...
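The control flow of S220 through S260 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the source does not name the preset data formats, the feature strings, or the matching algorithm, so the sketch assumes JSON and URL-encoded form bodies as the preset formats, uses hypothetical feature strings, and uses an Aho-Corasick automaton as one common way to do multi-mode (multi-pattern) matching in a single pass.

```python
import json
from collections import deque
from urllib.parse import parse_qsl

# Hypothetical feature strings; the source does not list the real ones.
PRESET_FEATURES = ["username=", "password="]


def detect_format(entity):
    """S220/S230: return a format label, or None if no assumed format fits."""
    text = entity.strip()
    if text[:1] in ("{", "["):
        try:
            json.loads(text)
            return "json"
        except ValueError:
            return None
    try:
        pairs = parse_qsl(text, strict_parsing=True)
    except ValueError:
        return None
    return "form" if pairs else None


def build_automaton(patterns):
    """Build Aho-Corasick goto, failure, and output tables."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def find_hits(text, automaton):
    """S250/S260: return (start_position, pattern) for every match."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


def scan_sessions(entities, patterns=PRESET_FEATURES):
    """Yield (entity, format_label, hits) for sessions passing both checks."""
    automaton = build_automaton(patterns)
    for entity in entities:  # S240: continue with the next session
        label = detect_format(entity)
        if label is None:
            continue
        hits = find_hits(entity, automaton)
        if hits:
            yield entity, label, hits


for result in scan_sessions(["not a preset format",
                             "username=alice&password=pw1"]):
    print(result)
```

Building the automaton once and reusing it for every session means each entity is scanned a single time regardless of how many feature strings are configured, which fits the stated goal of reducing search-matching time.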
Embodiment 3
[0066] Figure 3 shows a device for extracting full-text data provided in Embodiment 3 of the present invention. As shown in Figure 3, the device includes:
[0067] A parsing module 31, used to parse network packet data into session data;
[0068] An annotation module 32, used to determine whether the entity part of the session data conforms to a preset data format and, if so, to label the session data with its data format;
[0069] A multi-pattern matching module 33, used to perform multi-pattern matching on the session data conforming to the preset data format, determine whether a preset feature string is hit, and obtain the hit position of the preset feature string when it is hit;
[0070] A data extraction module 34, configured to determine the corresponding extraction function for the session data according to the data format label of the session data and the hit position of the preset feature string, and perform the extraction function on th...
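One plausible reading of the data extraction module is a dispatch table keyed by the data format label, where each extraction function starts reading at the hit position. The Python below is a sketch under that assumption; the format labels `"form"` and `"json"`, the function names, and the sample inputs are all hypothetical, since the source text is truncated before it describes the extraction functions.

```python
import json
from urllib.parse import unquote_plus


def extract_form_value(entity, hit_pos, feature):
    """Read a URL-encoded value from the end of the hit up to the next '&'."""
    start = hit_pos + len(feature)
    end = entity.find("&", start)
    if end == -1:
        end = len(entity)
    return unquote_plus(entity[start:end])


def extract_json_value(entity, hit_pos, feature):
    """Decode the JSON value that follows a matched key such as '"username"'."""
    start = hit_pos + len(feature)
    # Skip the colon and any whitespace between the key and its value.
    while start < len(entity) and entity[start] in " \t\r\n:":
        start += 1
    value, _ = json.JSONDecoder().raw_decode(entity, start)
    return value


# Hypothetical mapping from format label to extraction function.
EXTRACTORS = {"form": extract_form_value, "json": extract_json_value}


def extract(entity, format_label, hit_pos, feature):
    """Pick the extraction function by label and apply it at the hit position."""
    return EXTRACTORS[format_label](entity, hit_pos, feature)


print(extract("username=alice&password=pw1", "form", 15, "password="))
print(extract('{"username": "alice", "age": 30}', "json", 1, '"username"'))
```

Because the matcher has already located the feature string, each extraction function only reads forward from the hit position instead of parsing the whole entity again.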