File data feature extraction method and device and computing equipment

A file data feature extraction technology, applied in the field of data processing. It addresses slow file reading, insufficient system resources, and reduced feature value calculation efficiency, and it improves both calculation efficiency and file reading speed.

Pending Publication Date: 2020-12-01
BEIJING QIHOO TECH CO LTD

AI Technical Summary

Problems solved by technology

In the prior-art implementation, all of the large file data must be loaded into memory before calculation. This full-loading approach reads files slowly. In addition, when server resources are limited, it causes a serious shortage of system resources, so the calculation cannot proceed at normal speed, which greatly reduces the efficiency of feature value calculation.



Examples


Embodiment approach

[0073] Figure 3 shows a schematic diagram of a sub-flow executed by any thread in the file data feature extraction method according to another embodiment of the present invention. Figure 3 illustrates an optional implementation of thread execution, which specifically includes the following steps:

[0074] In step S301, the thread reads the first line of data of its assigned file data fragment, starting from the allocated starting position of that fragment.

[0075] The thread starts reading from the starting position of the file data fragment; when it reads an end-of-line mark, one line of data has been read. Because the starting position of a file data fragment is not necessarily the starting position of a line of data, the first line read by a thread may not be a complete line.
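The partial-first-line situation in [0075] can be illustrated with a minimal sketch. The function name `read_first_line`, the comma-separated demo data, and the byte offsets are assumptions made for illustration; they do not come from the patent text.

```python
import os
import tempfile


def read_first_line(path, start):
    """Read one line beginning at byte offset `start` (hypothetical helper).

    Returns the raw bytes read and the file position after the read.
    """
    with open(path, "rb") as f:
        f.seek(start)
        line = f.readline()  # stops after b"\n" or at end of file
        return line, f.tell()


# Demo file with three 6-byte lines (assumed data).
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".csv") as tmp:
    tmp.write(b"1,2,3\n4,5,6\n7,8,9\n")
    demo_path = tmp.name

whole_line, pos_a = read_first_line(demo_path, 0)    # fragment starts on a line boundary
partial_line, pos_b = read_first_line(demo_path, 8)  # fragment starts inside the second line
os.remove(demo_path)
```

Here the read from offset 0 returns a whole line, while the read from offset 8 returns only the tail of the second line, which is the case the thread has to detect.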

[0076] In step S302, it is judged whether the number of columns in the first line of data is equal to the total number of columns of the file data (specifically, ...
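The comparison named in step S302 might look like the sketch below. The comma delimiter and the helper names `column_count` and `has_full_column_count` are assumptions; what the method does with the outcome is not visible in this excerpt, so the sketch stops at the check itself.

```python
def column_count(line, delimiter=","):
    """Count the delimiter-separated fields in one line of text (assumed format)."""
    stripped = line.rstrip("\r\n")
    return len(stripped.split(delimiter)) if stripped else 0


def has_full_column_count(line, total_columns, delimiter=","):
    """Return True when the line has as many columns as the file as a whole."""
    return column_count(line, delimiter) == total_columns


full = has_full_column_count("4,5,6\n", total_columns=3)
truncated = has_full_column_count("5,6\n", total_columns=3)
```

One limitation of a pure column-count test: a fragment that begins inside the first field of a line still yields the full number of columns, so this check alone cannot identify every partial line.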


Abstract

The invention discloses a file data feature extraction method and apparatus, and a computing device. The method comprises: reading data volume information of the file data, fragmenting the file data according to a preset fragment data volume, and determining the starting position and the cut-off position of each file data fragment; starting a number of threads corresponding to the number of fragments and allocating a to-be-read file data fragment to each thread, so that each thread reads line by line from the starting position of its allocated fragment and performs feature value calculation until the reading position exceeds the cut-off position of the fragment; and summarizing the calculation results of all threads to obtain a feature output result of the file data. With this method, large file data does not need to be loaded into memory in full, and reading in fragments greatly increases the file reading speed; moreover, the reading strategy provided by the invention does not occupy excessive server resources, so calculation efficiency is greatly improved.
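The flow in the abstract (fragment by a preset size, one thread per fragment, line-by-line reading up to the cut-off position, then a summary step) can be sketched roughly as follows. This is not the patented implementation: the function names, the comma-separated numeric format, the choice of column means as the feature, and the boundary rule (a line belongs to the fragment in which it starts, so a thread discards a leading partial line) are all assumptions made for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def fragment_bounds(file_size, fragment_size):
    """Split [0, file_size) into (start, end) byte ranges of at most fragment_size."""
    return [(s, min(s + fragment_size, file_size))
            for s in range(0, file_size, fragment_size)]


def process_fragment(path, start, end, n_cols):
    """Sum each column over the lines that START inside [start, end) (assumed rule)."""
    sums = [0.0] * n_cols
    rows = 0
    with open(path, "rb") as f:
        if start > 0:
            f.seek(start - 1)
            if f.read(1) != b"\n":
                f.readline()  # discard the partial line; the previous fragment reads it
        while f.tell() < end:  # stop once the read position passes the cut-off
            line = f.readline()
            if not line:
                break
            fields = line.decode("utf-8").strip().split(",")
            if len(fields) != n_cols:
                continue  # skip malformed lines
            for i, value in enumerate(fields):
                sums[i] += float(value)
            rows += 1
    return sums, rows


def column_means(path, fragment_size, n_cols):
    """Start one thread per fragment and merge the partial sums into column means."""
    bounds = fragment_bounds(os.path.getsize(path), fragment_size)
    with ThreadPoolExecutor(max_workers=max(1, len(bounds))) as pool:
        results = list(pool.map(
            lambda b: process_fragment(path, b[0], b[1], n_cols), bounds))
    total_rows = sum(rows for _, rows in results)
    totals = [sum(sums[i] for sums, _ in results) for i in range(n_cols)]
    return [t / total_rows for t in totals] if total_rows else []


# Demo with assumed data: four 5-byte lines split into 7-byte fragments.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".csv") as tmp:
    tmp.write(b"1,10\n2,20\n3,30\n4,40\n")
    demo_path = tmp.name
means = column_means(demo_path, fragment_size=7, n_cols=2)  # [2.5, 25.0]
os.remove(demo_path)
```

Each thread returns partial sums and a row count rather than a mean, because sums and counts can be merged exactly across fragments. In CPython the global interpreter lock limits CPU parallelism between threads, so any speedup from this sketch would come mainly from overlapping file reads.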

Description

Technical field

[0001] The present invention relates to the technical field of data processing, in particular to a file data feature extraction method and device, computing equipment, and a computer storage medium.

Background

[0002] In actual business, feature extraction must be performed on file data with a large data volume (hereinafter referred to as large file data), for example, computing feature values for each column of data, such as the average value and the frequency distribution. When computing these statistics, each row of data must also be filtered so that only the rows that conform to the rules are selected and used for the feature value statistics of the column data.

[0003] The prior art provides a large file data feature extraction method using the Python pandas data processing library. In the specific implementation process, it is necessary to load all the large file data into th...
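The per-column feature values and row filtering mentioned in [0002] can be computed in a single streaming pass without holding the whole file in memory. The sketch below is only an illustration: `column_stats`, the `keep_row` rule, and the sample rows are hypothetical names and data, not taken from the patent.

```python
from collections import Counter


def column_stats(rows, column, keep_row=lambda row: True):
    """Compute the mean and frequency distribution of one column in a single pass.

    `rows` is any iterable of tuples; `keep_row` is the row-filtering rule.
    """
    total = 0.0
    count = 0
    freq = Counter()
    for row in rows:
        if not keep_row(row):
            continue  # the row does not conform to the rule, so it is excluded
        value = row[column]
        total += value
        count += 1
        freq[value] += 1
    mean = total / count if count else None
    return mean, freq


# Assumed sample data: keep only rows whose first column is at least 2.
sample_rows = [(1, 5), (2, 5), (3, 7), (4, 9)]
mean, freq = column_stats(sample_rows, column=1, keep_row=lambda r: r[0] >= 2)
```

Because `rows` can be a generator that yields parsed lines one at a time, memory use stays bounded by the size of the frequency table rather than by the size of the file.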


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/16
CPC: G06F16/164
Inventor: 张正武
Owner: BEIJING QIHOO TECH CO LTD