Accelerated running method based on Sunway many-core processor

A many-core processor and accelerated-operation technology, applied to accelerated operation on Sunway many-core processors. It addresses the underutilization of the core group's computing resources and the limited amount of data available to a single master-slave computation. Its effects are to save master-core waiting time, reduce the data transfer time between the master core and the slave cores, and reduce the number of DMA transfers.

Pending Publication Date: 2019-09-10
SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN

AI Technical Summary

Problems solved by technology

The memory wall problem has become increasingly prominent in applications running on Sunway many-core processors. Current usage shows three problems. First, the computing resources of the core group are underutilized: the amount of data allocated to the process on each node is limited, and so is the data that a single function body to be optimized needs for one master-slave computation, which leads to low utilization of the core group's computing resources.
Second, transferring data between main memory and the core group takes longer than the core group's accesses to its local memory.
Third, if the core group is started multiple times, core group threads must be spawned and joined frequently, which lowers the overall operating efficiency of the program.

Examples

Embodiment 1

[0054] An accelerated operation method based on the Sunway many-core processor runs on a computer that executes a program consisting of several program segments. The technical solution of the present invention takes three program segments as an example, from which the many cases that arise in parallel programming for the Sunway many-core processor can be generalized and handled individually. Let any three consecutive program segments be program segment A, program segment B, and program segment C, where program segments A and C can be optimized in parallel (they can be executed on the slave cores) and program segment B cannot be optimized in parallel (it can only be executed on the main core). The method includes the following steps:

[0055] Ⅰ. Determine the program context dependencies among program segment A, program segment B, and program segment C. If program segment A, program segment B, and program segment C all have program context dependencie...
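The paragraph above is truncated in the source, but the abstract states the rule: segments with program context dependencies run in order, and otherwise the execution order may be adjusted. A minimal Python sketch of that decision follows. It rests on assumptions that are not in the patent text: dependencies are detected by comparing declared read and write sets, a one-worker thread pool stands in for the slave cores, and the names `Segment`, `depends`, and `run_pair` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Segment:
    """A program segment with the variables it reads and writes (assumed known)."""
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]
    run: Callable[[Dict[str, object]], None]


def depends(earlier: Segment, later: Segment) -> bool:
    """Assumed dependency test: any overlap that forces `later` to wait."""
    return bool(
        earlier.writes & later.reads       # read after write
        or earlier.reads & later.writes    # write after read
        or earlier.writes & later.writes   # write after write
    )


def run_pair(a: Segment, b: Segment, data: Dict[str, object]) -> str:
    """Run `a` then `b` in order, or overlap them when they are independent."""
    if depends(a, b):
        a.run(data)
        b.run(data)
        return "sequential"
    # Stand-in for the core group: `a` runs on a worker thread
    # while the calling thread (the "main core") runs `b`.
    with ThreadPoolExecutor(max_workers=1) as slave_cores:
        pending = slave_cores.submit(a.run, data)
        b.run(data)
        pending.result()  # wait for the worker and surface any exception
    return "overlapped"
```

Calling `run_pair` on two segments whose read and write sets are disjoint overlaps them; any shared variable that one of them writes forces the sequential path.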

Embodiment 2

[0064] The accelerated operation method based on the Sunway many-core processor described in Embodiment 1, with the following difference: before executing program segment A, program segment B, and program segment C, perform the following operations:

[0065] a. Determine whether program segment A, program segment B, or program segment C includes two or more program subsections. If not, execute the program segment directly; otherwise, go to step b.

[0066] b. Determine whether the two or more program subsections share loop upper and lower bounds, constants, or common input data that do not change during the program cycle. If not, execute the two or more program subsections in sequence; otherwise, go to step c.

[0067] c. Extract the loop upper and lower bounds, constants, and common input data that do not change during the program cycle, and transfer the extracted input data from the main core to the slave cores in a single transfer. Each slave core executes two...
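Steps a to c can be illustrated with a small Python sketch that counts simulated transfers. It is a sketch under stated assumptions, not the patent's implementation: `TransferCounter` only simulates a copy from main memory into slave-core local memory, and `common_inputs`, `run_separately`, and `run_fused` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Set

Shared = Dict[str, object]
Subsection = Callable[[Shared], object]


class TransferCounter:
    """Counts simulated copies from main memory into local memory."""

    def __init__(self) -> None:
        self.count = 0

    def copy_in(self, values: Shared) -> Shared:
        self.count += 1
        return dict(values)  # the local copy that a slave core would work on


def common_inputs(needs: Sequence[Set[str]]) -> Set[str]:
    """Names (loop bounds, constants, invariant inputs) used by every subsection."""
    return set.intersection(*needs) if needs else set()


def run_separately(subs: Sequence[Subsection], shared: Shared,
                   dma: TransferCounter) -> List[object]:
    """Baseline: each subsection fetches the shared data again."""
    return [sub(dma.copy_in(shared)) for sub in subs]


def run_fused(subs: Sequence[Subsection], shared: Shared,
              dma: TransferCounter) -> List[object]:
    """Step c: fetch the shared data once, then run every subsection on it."""
    local = dma.copy_in(shared)
    return [sub(local) for sub in subs]
```

With two subsections that share their loop bounds and a constant, `run_separately` performs two simulated transfers and `run_fused` performs one, while both return the same results.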

Embodiment 3

[0070] The accelerated operation method based on the Sunway many-core processor described in Embodiment 1, with the following difference:

[0071] Executing program segment A includes the following steps:

[0072] (5) Determine whether program segment A comprises several program subsections. If it does, go to step (6); otherwise, go to step (7).

[0073] (6) Let any two consecutive program subsections be program subsection A1 and program subsection A2, and perform the following for all program subsections until program segment A has been executed: determine the program context dependency between program subsection A1 and program subsection A2. If there is a program context dependency between them, execute program subsection A1 and program subsection A2 sequentially; otherwise, according to the data volumes of program subsection A1 and program subsection A2, allocate the co...
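The source text of step (6) is cut off before it says how the data volumes drive the allocation. Purely as an illustration, the following Python sketch assumes one possible rule: when two subsections are independent, the 64 slave cores of a core group are divided between them in proportion to their data volumes, with at least one core each. The function `split_slave_cores` and the proportional rule are assumptions, not the patent's stated method.

```python
from typing import Tuple

SLAVE_CORES_PER_GROUP = 64  # per core group, as stated in the Description


def split_slave_cores(volume_a1: int, volume_a2: int,
                      total_cores: int = SLAVE_CORES_PER_GROUP) -> Tuple[int, int]:
    """Assumed rule: share cores in proportion to data volume, minimum one each."""
    if volume_a1 <= 0 or volume_a2 <= 0:
        raise ValueError("data volumes must be positive")
    if total_cores < 2:
        raise ValueError("need at least two cores to run two subsections at once")
    share = round(total_cores * volume_a1 / (volume_a1 + volume_a2))
    cores_a1 = min(max(share, 1), total_cores - 1)  # keep one core for each side
    return cores_a1, total_cores - cores_a1
```

Under this assumed rule, a subsection with three times the data of its neighbor receives three quarters of the slave cores.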

Abstract

The invention relates to an accelerated running method based on a Sunway many-core processor. The method comprises the following steps: A, judging the program context dependency relationships among a program segment A, a program segment B, and a program segment C; if program context dependency relationships exist among the three program segments, executing them in sequence; otherwise, adjusting the execution order of program segment A, program segment B, and program segment C and then executing them; and B, repeating step A for every three consecutive program segments until the whole program has been executed. By judging the program context dependency relationships between program segments and between program sub-segments, the method handles each case flexibly. A communication lock synchronization mechanism is introduced, which saves main-core waiting time, lets the main core and the core group work in parallel, reduces the number of times core group threads must be spawned and joined during program execution, and improves program execution efficiency.
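The abstract does not describe how the communication lock works. As a rough sketch of the general idea, the Python code below keeps one worker thread alive for the whole program and uses a per-task `threading.Event` as a stand-in for the lock, so the "core group" is spawned and joined once instead of once per segment. The class `PersistentCoreGroup` and its methods are hypothetical, and Python threads only imitate the master/slave relationship.

```python
import queue
import threading
from typing import Callable, Optional, Tuple

Task = Tuple[Callable[[], None], threading.Event]


class PersistentCoreGroup:
    """One long-lived worker standing in for a core group that is spawned once."""

    def __init__(self) -> None:
        self.spawn_count = 0
        self._tasks: "queue.Queue[Optional[Task]]" = queue.Queue()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self.spawn_count += 1
        self._thread = threading.Thread(target=self._loop)
        self._thread.start()

    def _loop(self) -> None:
        while True:
            item = self._tasks.get()
            if item is None:       # shutdown signal from join()
                return
            func, done = item
            try:
                func()
            finally:
                done.set()         # release the waiter, like unlocking

    def submit(self, func: Callable[[], None]) -> threading.Event:
        """Hand a segment to the worker; the caller keeps running meanwhile."""
        done = threading.Event()
        self._tasks.put((func, done))
        return done

    def join(self) -> None:
        self._tasks.put(None)
        if self._thread is not None:
            self._thread.join()
```

A caller would `start()` the group once, `submit()` segment A, run segment B itself, `wait()` on the returned event, submit segment C, and `join()` only at the end of the program.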

Description

Technical field

[0001] The invention relates to the technical fields of high-performance computing, parallel computing, and system architecture, and in particular to an accelerated operation method based on a Sunway many-core processor.

Background

[0002] The Sunway many-core processor is a representative domestic high-performance processor, a high-performance computing chip independently developed in China. At present, the "Sunway TaihuLight" supercomputer, which ranks among the world's leading systems in computing power, uses more than 40,000 Sunway many-core processors.

[0003] Each Sunway many-core processor chip (Sunway 26010) includes 4 core groups, which are connected through an on-chip network. Each core group is mainly composed of a memory controller, a management unit, 1 master core, and 64 slave cores. The 64 slave cores are connected in an 8×8 grid topology. Each slave core of each core group has 64KB local memory, s...
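The figures in paragraph [0003] translate into some simple arithmetic, sketched below in Python. The core counts and the 64KB local memory size come from the text; the helper names and the simplification that the whole local memory is available for data buffers are assumptions made for illustration.

```python
import math

CORE_GROUPS_PER_CHIP = 4
MASTER_CORES_PER_GROUP = 1
SLAVE_CORES_PER_GROUP = 64          # arranged as an 8 x 8 grid
LOCAL_MEMORY_BYTES = 64 * 1024      # per slave core


def cores_per_chip() -> int:
    """Total cores on one chip: every core group has 1 master and 64 slaves."""
    return CORE_GROUPS_PER_CHIP * (MASTER_CORES_PER_GROUP + SLAVE_CORES_PER_GROUP)


def elements_per_tile(element_bytes: int, buffers: int = 1) -> int:
    """Elements that fit in local memory, assuming all of it holds data buffers."""
    if element_bytes <= 0 or buffers <= 0:
        raise ValueError("element_bytes and buffers must be positive")
    return LOCAL_MEMORY_BYTES // (element_bytes * buffers)


def tiles_needed(total_elements: int, element_bytes: int, buffers: int = 1) -> int:
    """Number of transfers needed to stream `total_elements` through one slave core."""
    return math.ceil(total_elements / elements_per_tile(element_bytes, buffers))
```

Because each slave core can hold only a small tile at a time, the number of transfers grows with the data size, which is why the method aims to reduce repeated transfers.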

Application Information

IPC(8): G06F15/78, G06F9/52, G06F9/48
CPC: G06F15/7807, G06F9/52, G06F9/4881, Y02D10/00
Inventor: 潘景山, 刘弢, 王利, 郭强, 庄园, 曾云辉
Owner SHANDONG COMP SCI CENTNAT SUPERCOMP CENT IN JINAN