

# Introduction to Fujitsu's next-generation supercomputer

MATSUMOTO Takayuki

July 16, 2014

#### **HPC Platform Solutions**






Copyright 2014 FUJITSU LIMITED

### K computer and Fujitsu PRIMEHPC series


#### Single CPU/node architecture for multicore

- Good byte/flop balance and scalability
- Key technologies for massively parallel supercomputers
  - Original CPU and interconnect
  - Support for tens of millions of cores (VISIMPACT\*, Collective comm. HW)



# Architecture continuity for compatibility

- Upward-compatible CPU:
  - Binary-compatible with the K computer & PRIMEHPC FX10
  - Good byte/flop balance
- New features:
  - New instructions (stride load/store, indirect load/store, permutation, concatenation)
  - Improved microarchitecture (out-of-order execution, branch prediction, etc.)
- For distributed parallel executions:
  - Compatible interconnect architecture
  - Improved interconnect bandwidth







#### The K computer and the evolution of PRIMEHPC



|              | K computer                  | PRIMEHPC FX10 | Post-FX10                      |
|--------------|-----------------------------|---------------|--------------------------------|
| CPU          | SPARC64 VIIIfx              | SPARC64 IXfx  | SPARC64 XIfx                   |
| Peak perf.   | 128 GFLOPS                  | 236.5 GFLOPS  | 1 TFLOPS or more               |
| # of cores   | 8                           | 16            | 32 + 2                         |
| Memory       | DDR3 SDRAM                  | ←             | HMC                            |
| Interconnect | Tofu Interconnect           | ←             | Tofu Interconnect 2            |
| System size  | 11 PFLOPS                   | Max. 23 PFLOPS | Max. 100 PFLOPS               |
| Link BW      | 5 GB/s × 2 (bidirectional)  | ←             | 12.5 GB/s × 2 (bidirectional)  |
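As a rough reading aid for the table, the sketch below divides each system-size figure by the per-CPU peak to estimate how many nodes each generation implies. It assumes one CPU per node, as the deck states, and treats the Post-FX10 peak as exactly its 1 TFLOPS lower bound. The function name `implied_node_count` is hypothetical, and the results are estimates from rounded slide values, not published node counts.

```python
import math

# (per-node peak, system peak) in FLOPS, copied from the table.
# Assumption: the Post-FX10 per-node peak is taken as exactly 1 TFLOPS
# (the table gives it only as a lower bound), so its node count is an
# upper bound rather than a specification.
SYSTEMS = {
    "K computer": (128e9, 11e15),
    "PRIMEHPC FX10": (236.5e9, 23e15),
    "Post-FX10": (1e12, 100e15),
}


def implied_node_count(node_peak, system_peak):
    """Smallest number of nodes whose combined peak reaches system_peak."""
    return math.ceil(system_peak / node_peak)


if __name__ == "__main__":
    for name, (node_peak, system_peak) in SYSTEMS.items():
        count = implied_node_count(node_peak, system_peak)
        print(f"{name}: about {count:,} nodes")
```

On these figures, all three generations land in the range of roughly 86,000 to 100,000 nodes, which suggests that the growth in system peak comes mainly from the per-node peak rather than from the node count.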







#### Feature and Configuration of Post-FX10





- Interconnect links
  - 12.5 GB/s × 2 (in/out) per link
  - 10 links/node
  - **Optical technology**
- Cabinet
  - 200~ nodes/cabinet (high density)
  - 100% water cooled with EXCU (option)
- 3 × 8 Micron HMCs
- 8 Finisar optical modules (BOA) for inter-chassis connections
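The configuration figures above combine into two simple derived quantities: the raw link capacity attached to each node, and the number of cabinets needed for a given node count. The sketch below works through that arithmetic. The function names are hypothetical, the bandwidth is a sum of nominal link rates rather than a measured throughput, and 200 nodes per cabinet is taken as exact although the slide gives it as a lower bound.

```python
import math

LINK_BW_GB_S = 12.5      # GB/s per direction per link (from the slide)
LINKS_PER_NODE = 10      # from the slide
NODES_PER_CABINET = 200  # slide says "200~"; treated here as exactly 200


def node_link_capacity(link_bw=LINK_BW_GB_S, links=LINKS_PER_NODE):
    """Return (one-direction, both-directions) raw link capacity in GB/s."""
    one_way = link_bw * links
    return one_way, 2 * one_way


def cabinets_needed(nodes, nodes_per_cabinet=NODES_PER_CABINET):
    """Cabinets required to house `nodes` nodes, rounding up."""
    return math.ceil(nodes / nodes_per_cabinet)


if __name__ == "__main__":
    one_way, both_ways = node_link_capacity()
    print(f"Raw link capacity per node: {one_way} GB/s in, {one_way} GB/s out")
    print(f"Both directions combined:   {both_ways} GB/s")
    # 100,000 nodes is 100 PFLOPS divided by 1 TFLOPS per node.
    print(f"Cabinets for 100,000 nodes: {cabinets_needed(100_000)}")
```

Because a cabinet may hold more than 200 nodes, the cabinet count computed this way is an upper bound.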

### Flexible SIMD operations

- New 256-bit-wide SIMD functions enable versatile operations
  - Four double-precision calculations
  - Stride load/store, indirect (list) load/store, permutation, concatenation
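To make the four operation types concrete, here is a minimal semantic model in plain Python, using lists of four values to stand in for a 256-bit register of doubles. It is only a sketch of what each class of operation does to data. The function names are hypothetical and do not correspond to actual instruction mnemonics, and modelling concatenation as a four-element window taken from two registers placed end to end is an assumption.

```python
VLEN = 4  # 256 bits / 64-bit double precision = 4 lanes


def stride_load(mem, base, stride):
    """Load VLEN elements spaced `stride` apart, starting at `base`."""
    return [mem[base + lane * stride] for lane in range(VLEN)]


def stride_store(mem, base, stride, vec):
    """Store each lane of `vec` to memory, `stride` elements apart."""
    for lane, value in enumerate(vec):
        mem[base + lane * stride] = value


def indirect_load(mem, base, index_list):
    """Gather: lane i reads mem[base + index_list[i]]."""
    return [mem[base + offset] for offset in index_list]


def indirect_store(mem, base, index_list, vec):
    """Scatter: lane i writes vec[i] to mem[base + index_list[i]]."""
    for offset, value in zip(index_list, vec):
        mem[base + offset] = value


def permute(vec, selectors):
    """Lane i of the result is taken from lane selectors[i] of `vec`."""
    return [vec[s] for s in selectors]


def concatenate(low, high, shift):
    """Assumed model: a VLEN-wide window at `shift` over low followed by high."""
    return (low + high)[shift:shift + VLEN]


if __name__ == "__main__":
    memory = [float(i) for i in range(16)]
    print(stride_load(memory, 1, 3))
    print(indirect_load(memory, 0, [5, 0, 9, 2]))
    print(permute([10.0, 20.0, 30.0, 40.0], [3, 2, 1, 0]))
    print(concatenate([0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], 1))
```

Operations of this kind are useful when data is not laid out contiguously, for example when reading a column of a row-major matrix (stride) or following an index list in a sparse computation (indirect).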



# **Tofu Interconnect 2**

- Successor to Tofu Interconnect
  - Highly scalable, 6-dimensional mesh/torus topology
  - Link bandwidth increased 2.5 times to 12.5 GB/s
- Interconnect integrated into CPU
  - System-on-chip (SoC) removes off-chip I/O
  - Improved packaging density and energy efficiency
- Optical cable connection between chassis

Scalable three-dimensional torus
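The slides give a 6-dimensional mesh/torus and 10 links per node without listing the axis sizes. The sketch below shows one way those two numbers fit together. It assumes three torus axes of arbitrary size plus three small axes of sizes 2, 3 and 2, of which the middle one wraps around. That layout follows public descriptions of the original Tofu interconnect and is an assumption here, not something stated in this deck. The names `neighbors`, `SIZES` and `WRAPS` are hypothetical.

```python
from itertools import product

# Assumed axis layout: (X, Y, Z, a, b, c).
# X, Y, Z: torus axes (size 4 chosen only for the demo).
# a, c: mesh axes of size 2; b: torus axis of size 3.
SIZES = (4, 4, 4, 2, 3, 2)
WRAPS = (True, True, True, False, True, False)


def neighbors(coord, sizes=SIZES, wraps=WRAPS):
    """Return the set of coordinates one hop away from `coord`."""
    found = set()
    for axis, (pos, size, wrap) in enumerate(zip(coord, sizes, wraps)):
        for step in (-1, 1):
            new_pos = pos + step
            if wrap:
                new_pos %= size          # torus axis: wrap around
            elif not 0 <= new_pos < size:
                continue                 # mesh axis: no link past the edge
            if new_pos != pos:
                hop = list(coord)
                hop[axis] = new_pos
                found.add(tuple(hop))
    return found


if __name__ == "__main__":
    counts = {
        len(neighbors(node))
        for node in product(*(range(size) for size in SIZES))
    }
    print(f"Distinct neighbor counts over all nodes: {sorted(counts)}")
```

Under these assumptions, each node has 6 neighbors along the three large torus axes, 1 each along the two size-2 mesh axes, and 2 along the size-3 torus axis, which gives the 10 links per node quoted in the deck.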



#### Entire software stack is enhanced for Post-FX10





#### **PRIMEHPC FX series**








#### Open a bright future with Technical Computing
