
High-performance computer NUMA-aware thread and memory resource optimization method and system

The invention relates to memory resource and computer technology, applied in the computer field. It addresses the high cost of page migration, the inability to perceive the memory access locality requirements of applications, and the inability to accurately predict program memory access behavior.

Active Publication Date: 2016-03-30
INST OF APPLIED PHYSICS & COMPUTATIONAL MATHEMATICS

AI Technical Summary

Problems solved by technology

[0004] 1. The operating system cannot perceive the memory access locality requirements of the application.
[0005] When the operating system schedules threads, it does not take the application's memory locality requirements into account. It may schedule an executing thread onto a processor core that is not adjacent to the thread's memory, which leads to remote memory accesses and reduces application performance.
[0006] 2. The memory management of the application programming model cannot be NUMA-aware.
[0009] (1) Page migration has a large overhead and takes effect with a lag.
[0010] (2) The First-Touch memory allocation strategy and the Auto-Migration automatic page migration technology cannot accurately predict the memory access behavior of the program, which may lead to inappropriate memory migration and remote memory accesses by the application.
A numerical simulation program requests memory in variable-size blocks (a block is a piece of contiguous memory requested by the application). These blocks range from several bytes to several megabytes, whereas the minimum management unit of the LibNUMA library is a memory page. A single memory page is often large enough to hold many blocks, so an application that frequently requests fine-grained memory wastes a large amount of memory.
[0013] (2) Frequent calls lead to context switching between user mode and kernel mode, resulting in high overhead.
The main functions of LibNUMA are implemented in kernel mode. If the application calls the LibNUMA interface directly and frequently, the resulting context switches become excessive and greatly reduce the application's execution performance.
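
The two costs described above (page-granular waste and per-call overhead) can be made concrete with a little arithmetic. The following is a minimal sketch, not the patented method: it assumes a 4096-byte page, a hypothetical workload of 10,000 blocks of 64 bytes, and a simple user-space pool that requests large chunks and carves blocks out of them sequentially. The function names and the chunk size are illustrative assumptions.

```python
import math

PAGE_SIZE = 4096  # bytes; assumed page size, not taken from the source


def page_granular_cost(block_sizes, page_size=PAGE_SIZE):
    """Each block is rounded up to whole pages and costs one allocator call."""
    reserved = sum(math.ceil(size / page_size) * page_size for size in block_sizes)
    calls = len(block_sizes)
    return reserved, calls


def pooled_cost(block_sizes, chunk_pages=256, page_size=PAGE_SIZE):
    """Blocks are carved sequentially out of large chunks requested in bulk.

    A block larger than a whole chunk gets its own page-rounded request.
    Leftover space in a chunk is abandoned when the next block does not fit
    (a deliberately simple bump-allocation policy).
    """
    chunk_bytes = chunk_pages * page_size
    reserved = 0
    calls = 0
    free_in_chunk = 0
    for size in block_sizes:
        if size > chunk_bytes:
            reserved += math.ceil(size / page_size) * page_size
            calls += 1
        elif size > free_in_chunk:
            reserved += chunk_bytes
            calls += 1
            free_in_chunk = chunk_bytes - size
        else:
            free_in_chunk -= size
    return reserved, calls


# Hypothetical workload: 10,000 fine-grained blocks of 64 bytes each.
blocks = [64] * 10_000
naive_reserved, naive_calls = page_granular_cost(blocks)
pool_reserved, pool_calls = pooled_cost(blocks)
print(f"requested bytes:       {sum(blocks)}")
print(f"page-granular reserve: {naive_reserved} bytes in {naive_calls} calls")
print(f"pooled reserve:        {pool_reserved} bytes in {pool_calls} calls")
```

Under these assumptions the page-granular path reserves 40,960,000 bytes in 10,000 calls for 640,000 requested bytes, while the pooled path reserves a single 1,048,576-byte chunk in one call.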



Embodiment Construction

[0099] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The exemplary embodiments and their descriptions are used to explain the present invention, not to limit it.

[0100] To solve the problems existing in the prior art, an embodiment of the present invention proposes a NUMA-aware thread and memory resource optimization technology for high-performance computer nodes. It includes a high-performance computer NUMA-aware thread and memory resource optimization system and method, as well as a NUMA-aware multi-thread memory manager and a multi-thread memory management method built on that optimization method. On the one hand, the hardware architecture features of pa...


Abstract

The invention discloses a NUMA-aware thread and memory resource optimization method and system for high-performance computers. The system comprises: a runtime environment detection module, which detects the hardware resources of a compute node and the number of parallel processes; a computing resource allocation and management module, which allocates computing resources to the parallel processes and builds the mapping from parallel processes and threads to processor cores and physical memory; and a parallel programming interface and thread binding module, which provides the parallel programming interface, obtains the binding position mask of each thread from the mapping, and binds the executing thread to the corresponding CPU core. The invention further discloses a NUMA-aware multi-thread memory manager and its multi-thread memory management method. The manager comprises a DSM memory management module and SMP module memory pools, which manage, respectively, the SMP modules to which the MPI processes belong and the allocation and release of memory within a single SMP module. This reduces the frequency of system calls for memory operations, improves memory management performance, reduces remote memory accesses by applications, and improves application performance.
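
The mapping and binding-mask step in the abstract can be illustrated with a short sketch. It is an assumption-laden illustration, not the patent's algorithm: the function name `build_binding_masks`, the block distribution of processes over NUMA nodes, and the one-thread-per-core policy are all hypothetical, and the core layout is passed in by hand rather than detected at runtime.

```python
def build_binding_masks(cores_per_node, num_processes, threads_per_process):
    """Map each (process rank, thread id) to (NUMA node, core id, affinity mask).

    cores_per_node: list of lists of core ids, grouped by NUMA node. In a real
    system a runtime detection step would supply this (hypothetical input here).
    Assumed policy: processes are distributed over nodes in equal blocks, and
    each thread gets one dedicated core on its process's node.
    """
    num_nodes = len(cores_per_node)
    if num_processes % num_nodes != 0:
        raise ValueError("sketch assumes processes divide evenly over NUMA nodes")
    procs_per_node = num_processes // num_nodes

    mapping = {}
    for rank in range(num_processes):
        node = rank // procs_per_node
        local_index = rank % procs_per_node
        cores = cores_per_node[node]
        if procs_per_node * threads_per_process > len(cores):
            raise ValueError(f"NUMA node {node} has too few cores")
        for thread_id in range(threads_per_process):
            core = cores[local_index * threads_per_process + thread_id]
            mask = 1 << core  # bit i set means "may run on core i"
            mapping[(rank, thread_id)] = (node, core, mask)
    return mapping


# Hypothetical node: 2 NUMA nodes with 4 cores each, 2 processes x 4 threads.
layout = [[0, 1, 2, 3], [4, 5, 6, 7]]
masks = build_binding_masks(layout, num_processes=2, threads_per_process=4)
for (rank, thread_id), (node, core, mask) in sorted(masks.items()):
    print(f"rank {rank} thread {thread_id} -> node {node} core {core} mask {mask:#04x}")
```

With this layout every thread of rank 0 lands on node 0 and every thread of rank 1 on node 1, so memory allocated on a process's own node stays local to all of its threads. On Linux, a call such as `os.sched_setaffinity(0, {core})` is one way to apply a binding of this kind from Python; the sketch only computes the masks and does not bind anything.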

Description

Technical field

[0001] The present invention relates to the field of computer technology, in particular to a NUMA (Non-Uniform Memory Access)-aware thread and memory resource optimization method and system for high-performance computers.

Background

[0002] At present, most mainstream high-performance computer nodes adopt a DSM (Distributed Shared Memory) design. As shown in Figure 1, in a computing node based on the DSM architecture, each CPU (Central Processing Unit) can access its own memory module (local memory) through the memory controller in the CPU, or access the memory modules of other CPUs (remote memory) through the high-speed interconnect. Accessing remote memory costs more than accessing local memory, on some systems several times more. This memory access characteristic is called NUMA (Non-Uniform Memory Access)...
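
The effect of the local/remote cost gap on an application can be estimated with a weighted average. The sketch below is only a back-of-the-envelope model: the latencies of 100 ns (local) and 300 ns (remote) are hypothetical values chosen to match the "several times higher" case, and the function name is an assumption.

```python
def average_access_latency(local_ns, remote_ns, local_fraction):
    """Mean memory access latency when a given fraction of accesses is local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies: remote access three times as expensive as local.
LOCAL_NS = 100.0
REMOTE_NS = 300.0
for fraction in (1.0, 0.5, 0.0):
    latency = average_access_latency(LOCAL_NS, REMOTE_NS, fraction)
    print(f"{fraction:.0%} local accesses -> {latency:.0f} ns on average")
```

In this model, a thread whose accesses are half remote sees twice the average latency of a thread whose accesses are all local, which is why keeping threads next to their memory matters.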


Application Information

Patent Type & Authority Patents(China)
IPC IPC(8): G06F9/50
Inventor 莫则尧, 张爱清, 杨章, 田鸿运
Owner INST OF APPLIED PHYSICS & COMPUTATIONAL MATHEMATICS