The invention discloses a
big data file rapid parallel extraction method based on
memory mapping, which comprises the following steps of generating a task domain, and forming the task domain by task blocks, wherein the task blocks are elements in the task domain; generating a task
pool, carrying out sub-task domain merging on elements in the task domain according to the principle of low communication cost, taking a set of elements in the task domain as a task
pool for task scheduling, and extracting tasks for execution by a processor according to scheduling selection; scheduling task, decidingthe scheduling
granularity of the tasks according to the residual quantity of the tasks, extracting the tasks meeting the requirements out of the task
pool, and preparing for mapping; and mapping theprocessor, mapping the extracted task to the current idle processor for execution. According to the method, the multi-core
advantage can be exerted, the file mapping efficiency of the memory is improved, the method can be applied to the large file reading of a single file with the capacity below 4GB, the reading speed of the files can be effectively increased, and the I / O
throughput rate of the magnetic disk files is increased.