A novel massively parallel supercomputer operating at a scale of hundreds of teraOPS includes node architectures based upon System-On-a-Chip technology, i.e., each processing node comprises a single Application Specific Integrated Circuit (ASIC). Within each ASIC node is a plurality of
processing elements, each of which consists of a central processing unit (CPU) and a plurality of floating point processors, to enable an optimal balance of computational performance, packaging density, low cost, and power and cooling requirements. The plurality of processors within a single node may be used individually or simultaneously to work on any combination of computation or communication as required by the particular algorithm being solved or executed at any point in time. The
system-on-a-chip ASIC nodes are interconnected by multiple independent networks that maximize packet communications throughput and minimize latency. In the preferred embodiment, the multiple networks include three high-speed networks for
parallel algorithm message passing, including a Torus, a Global Tree, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm to optimize algorithm processing performance. For particular classes of parallel algorithms, or parts of parallel calculations, this architecture exhibits exceptional computational performance, and may be enabled to perform calculations for new classes of parallel algorithms. Additional networks are provided for external
connectivity and are used for Input/Output, System Management and Configuration, and Debug and Monitoring functions. Special node packaging techniques implementing midplane and other hardware devices facilitate partitioning of the supercomputer into multiple networks for optimizing supercomputing resources.
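
As an illustration of the Torus network named above, the following minimal sketch shows how nearest-neighbor addressing with wraparound links could work. The abstract does not state the dimensionality or size of the torus, so the three-dimensional 4 x 4 x 4 layout used in the example, and the function names `torus_neighbors` and `torus_hops`, are assumptions made purely for illustration and are not taken from the disclosure.

```python
def torus_neighbors(coord, dims):
    """Return the nearest neighbors of `coord` on a torus with extents `dims`.

    Each axis contributes two neighbors (one step down, one step up), and
    the modulo operation models the wraparound links that close each axis
    into a ring. The dimensionality is an assumption, not from the source.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around at the edges
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims):
    """Return the minimal hop count between nodes `a` and `b` on the torus.

    On each axis a packet may travel in either direction around the ring,
    so the per-axis distance is the shorter of the two ways around.
    """
    total = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, size - d)
    return total


if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical partition size
    print(torus_neighbors((0, 0, 3), dims))
    print(torus_hops((0, 0, 0), (3, 3, 3), dims))
```

In this sketch the wraparound links shorten the distance between opposite corners of the hypothetical 4 x 4 x 4 partition to three hops, whereas a mesh without wraparound would need nine, which is one way a torus topology can help keep message-passing latency low.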