In 2012 Cray fired a salvo in the supercomputer wars with the launch of its XC30, a brute capable of 100 petaflops that can scale up to one million cores. Developed in conjunction with DARPA, the system, codenamed Cascade, uses a new interconnect architecture called Aries together with Intel Xeon E5-2600 processors to leapfrog its recent sibling Titan, the previous speed champion. That puts Cray well ahead of rivals such as China's Tianhe-2, and the company expects to keep that edge by supercharging future versions with Intel Xeon Phi coprocessors and NVIDIA Tesla GPUs. High-end research centers have already placed around $1 billion worth of orders.
The Cray XC30 combines the new Aries interconnect, Intel® Xeon® processors, Cray's powerful and fully integrated software environment, and innovative power and cooling technologies to create a production supercomputer designed to scale high-performance computing (HPC) workloads to more than 100 petaflops.
The first in a family of products that will span from technical enterprise computing to the largest systems on the planet, the Cray XC30 supercomputer has been engineered to meet the real-world performance challenges of HPC users. Cray's new high-end system features the HPC-optimized Aries system interconnect; a new Dragonfly topology that frees applications from locality constraints; an innovative cooling system that uses transverse airflow to lower customers' total cost of ownership; the next generation of the scalable, high-performance Cray Linux Environment, which also supports a broad range of ISV applications; Cray's HPC-optimized programming environment; and the ability to accommodate a wide variety of processor types, including Intel® Xeon® processors, a first for Cray's high-end systems.
2 Computer Architecture Selection
The Cray XC30 uses the Intel® Xeon® processor E5-2600 product family, and with these processors Cray XC30 systems can scale in excess of one million cores. In addition, future versions of the Cray XC family of supercomputers will be available with the new Intel® Xeon Phi™ coprocessors and NVIDIA® Tesla® GPUs based on the next-generation NVIDIA Kepler™ GPU computing architecture. With these accelerator and coprocessor options, Cray customers can tailor a Cray XC supercomputer with the processor technologies that best meet the HPC needs of their scientific applications.
One example is Piz Daint, a 115,984-core XC30 system at the Swiss National Supercomputing Centre in southern Switzerland, with the following specifications:
| Specification | Value |
|---|---|
| Site | Swiss National Supercomputing Centre (CSCS) |
| Linpack performance (Rmax) | 6,271 TFlop/s |
| Theoretical peak (Rpeak) | 7,788.85 TFlop/s |
| Processor | Xeon E5-2670 8C 2.6GHz |
| Operating system | Cray Linux Environment |
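As a quick sanity check on the figures above, the ratio of sustained Linpack performance (Rmax) to theoretical peak (Rpeak) gives the system's Linpack efficiency; a minimal sketch in Python:

```python
# Linpack efficiency of Piz Daint, from the Rmax/Rpeak figures quoted above.
rmax_tflops = 6271.0      # sustained Linpack performance (Rmax)
rpeak_tflops = 7788.85    # theoretical peak (Rpeak)

efficiency = rmax_tflops / rpeak_tflops
print(f"Linpack efficiency: {efficiency:.1%}")  # → Linpack efficiency: 80.5%
```

An efficiency around 80 percent is in the range typically reported for CPU-based Linpack runs.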
Cray's Cascade XC30 ended a decade-long relationship with AMD and its Opteron processors, with the firm choosing Intel Sandy Bridge Xeon processors and supporting the Xeon Phi accelerator. While the change isn't surprising given Cray's close ties with Intel, the firm also touted its Aries interconnect, which makes its debut with the XC30 and gives the firm its unique selling point over competitors.
The Cascade XC30 packs eight Intel Xeon chips into each compute blade, with each chip having access to four DIMM slots, each on a separate channel. While Cray hasn't done much with Intel's processors other than drop them into its motherboards, the firm was keen to show off its Aries interconnect, which uses PCI Express 3.0 to link compute nodes and processors.
According to Cray, the structure of the Aries interconnect means physical distance between nodes no longer degrades performance, resulting in higher average bandwidth across all parts of the machine. The firm said this should mean that jobs are not sent only to nodes in the same cabinet, underutilizing system capacity, but to all nodes, maximizing responsiveness, capacity utilization and throughput.
Cray's Cascade XC30 uses the company's Cray Linux Environment, a SUSE-based Linux distribution that the firm said all its HPC customers must run. The firm also touted compiler support from GCC, Intel and PGI, and application software support from firms such as MathWorks, Accelrys and Simula.
3 Overview of the Architecture
The Cray XC30 is a Massively Parallel Processor (MPP) supercomputer design. It is therefore built from many thousands of individual nodes. There are two basic types of node in any Cray XC30:
- Compute nodes: these run user computation only and are always referred to as compute nodes
- Service nodes: these provide all the additional services required for the system to function, and are given additional names depending on their individual task:
- Login nodes: allow users to log in and perform interactive tasks
- PBS MOM nodes: run and manage PBS batch scripts
- Service Database node (SDB): holds system configuration information
- LNET routers: connect to the external filesystem.
There are usually many more compute than service nodes.
Clearly, to operate as a single supercomputer, the individual nodes must have a way to communicate with one another. All nodes are interconnected by the high-speed, low-latency Cray Aries network. Neither compute nor service nodes have storage of their own; storage must be attached via the service nodes' native Lustre client or projected using the Cray Data Virtualization Service (DVS).
3.2 Interacting with the system
Users do not log directly into the system. Instead, they run commands via a Cray Development Login server. This server relays commands and information via a service node referred to as a "gateway node".
3.3 Cray XC30 Intel® Xeon® Compute Node Architecture
The XC30 Compute node features:
- 2 × Intel® Xeon® sockets/dies
- 12-core Ivy Bridge
- QPI interconnect
- Forms 2 NUMA nodes
- 8 × 1833 MHz DDR3
- 8 GB per channel
- 64 GB total
- 1 × Aries NIC
- Connects to shared Aries router and wider network
- PCI-e 3.0
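The memory figures above are internally consistent: eight DDR3 channels at 8 GB each give the 64 GB node total. The peak bandwidth line below is an assumption derived from the DDR3-1833 transfer rate, not a figure stated here:

```python
# Memory capacity and (assumed) peak bandwidth of the XC30 compute node above.
channels = 8        # 4 DDR3 channels per socket x 2 sockets
gb_per_channel = 8  # DIMM capacity per channel

total_gb = channels * gb_per_channel
print(f"Node memory: {total_gb} GB")            # → Node memory: 64 GB

# Assumption: each DDR3-1833 channel moves 8 bytes per transfer.
transfers_per_sec = 1833e6
peak_gbs = channels * transfers_per_sec * 8 / 1e9
print(f"Peak bandwidth: ~{peak_gbs:.0f} GB/s")  # → Peak bandwidth: ~117 GB/s
```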
3.4 Intel® Xeon® Ivy Bridge 12-core socket/die
4 Analysis of the Architecture
4.1 Cray XC30 System Building Blocks
Intel® Xeon® Processor E5-2600 Family
The Cray XC30 supercomputer leverages the performance benefits of the Intel® Xeon® processor E5-2600 product family. It follows a Xeon® processor roadmap that begins with eight-core devices, enabling up to 66 teraflops of compute performance per cabinet, and is upgradeable on Intel's schedule to advance clock frequency and core counts. This family delivers up to 80 percent better performance and 30 percent lower latency than previous generations.
4.3 Cray XC30 Quad Processor Daughter Card
The Cray XC30 system mates processor engine technology to the base compute blade via two configurable daughter cards. The flexible PCI Express 3.0 standard accommodates scalar processors, coprocessors and accelerators, creating hybrid systems that can evolve over time. For example, PDCs can be swapped out or reconfigured while keeping the original compute base blades in place, quickly taking advantage of the best available performance technologies.
4.4.1 Intel Xeon Ivy Bridge Core Structure
Manufactured on a 22 nm process
256-bit AVX instructions (4 double-precision floating-point lanes)
2 hardware threads (Hyper-Threading)
Peak DP FP per core: 8 FLOPS/clock
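The 8 FLOPS/clock figure is a per-core peak that follows from the AVX width: a 256-bit register holds four 64-bit doubles, and an Ivy Bridge core can issue one AVX multiply and one AVX add per cycle. A minimal sketch of the arithmetic:

```python
# Peak double-precision FLOPS per clock for one Ivy Bridge core.
avx_width_bits = 256
dp_bits = 64
lanes = avx_width_bits // dp_bits  # 4 doubles per AVX register
pipes = 2                          # one multiply + one add issued per cycle

flops_per_clock = lanes * pipes
print(f"Peak DP FLOPS/clock per core: {flops_per_clock}")  # → 8
```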
4.5 Main Memory
The latest Intel Xeon Ivy Bridge processors used in the XC30 provide the next generation of computational muscle, with best-in-class floating-point performance, memory bandwidth and energy efficiency. Memory configuration can vary with an organization's needs; for example, at one site using XC30 supercomputers, each node contains two 12-core 2.7 GHz Ivy Bridge processors and at least 64 GB of DDR3-1833 MHz main memory, and every compute node is interconnected via an Aries Network Interface Card. That system has 3,008 such nodes, i.e., 72,192 cores, in just 16 cabinets, providing a total of 1.56 petaflops of theoretical peak performance. Scratch storage is provided by 20 Cray Sonexion Scalable Storage Units, delivering 4.4 PB of usable space with sustained read/write bandwidth of more than 100 GB per second.
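The headline figures in this paragraph can be cross-checked: two 12-core 2.7 GHz sockets at a peak of 8 double-precision FLOPS per clock per core give about 518 gigaflops per node, and 3,008 such nodes land on the quoted 1.56 petaflops. A sketch of that arithmetic:

```python
# Cross-check of the example XC30 system described above.
sockets = 2
cores_per_socket = 12
clock_ghz = 2.7
flops_per_clock = 8   # peak DP FLOPS/clock per Ivy Bridge core
nodes = 3008

node_gflops = sockets * cores_per_socket * clock_ghz * flops_per_clock
system_pflops = node_gflops * nodes / 1e6
print(f"Cores: {sockets * cores_per_socket * nodes}")  # → Cores: 72192
print(f"Node peak: {node_gflops:.1f} GF")              # → Node peak: 518.4 GF
print(f"System peak: {system_pflops:.2f} PF")          # → System peak: 1.56 PF
```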
4.6 Cray XC30 Fully Populated Compute Blade
5 Special or Interesting Features
To deliver this breakthrough performance and scalability, Cray XC series supercomputers incorporate the HPC-optimized Aries interconnect. This innovative interconnect technology, implemented with a high-bandwidth, low-diameter network topology called Dragonfly, provides substantial improvements on all the network performance metrics that matter for HPC: bandwidth, latency, message rate and more.
Delivering unprecedented global bandwidth scalability at reasonable cost across a distributed-memory system, this network gives programmers global access to all of the memory of parallel applications and supports the most demanding global communication patterns. The Dragonfly network topology is constructed from a configurable mix of backplane, copper and optical links, providing scalable global bandwidth and avoiding expensive external switches.
Adaptive supercomputing means a modular design that gives customers a flexible way to manage entry costs and enables simple upgrades for growing bandwidth requirements later on. The Aries ASIC provides the network interconnect for the compute nodes on the Cray XC40 system base blades and implements a standard PCI Express Gen3 host interface, enabling connectivity to a wide range of HPC processing compute engines.
This universal nature of the Cray XC series open architecture allows the system to be configured with the best available devices today, and then augmented or upgraded later with the customer's choice of processors or coprocessors using processor daughter cards (PDCs), each with its own independent capabilities and upgrade path.
5.1 XC30 Compute Blade
The Cray XC30 series architecture implements two processor engines per compute node, and has four compute nodes per blade. Compute blades stack 16 to a chassis, and each cabinet can be populated with up to three chassis, culminating in 384 sockets per cabinet. Cray XC30 supercomputers can be configured into hundreds of cabinets and scaled up to exceed 100 petaflops per system.
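The packaging arithmetic here multiplies out to the quoted socket count: two sockets per node, four nodes per blade, 16 blades per chassis and three chassis per cabinet. A quick sketch:

```python
# Sockets per cabinet for the Cray XC30 packaging described above.
sockets_per_node = 2
nodes_per_blade = 4
blades_per_chassis = 16
chassis_per_cabinet = 3

sockets_per_cabinet = (sockets_per_node * nodes_per_blade
                       * blades_per_chassis * chassis_per_cabinet)
print(f"Sockets per cabinet: {sockets_per_cabinet}")  # → Sockets per cabinet: 384
```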
5.2 Custom or ISV Jobs on the Same System — Extreme Scale and Cluster Compatibility
Rather than being restricted by a limited system architecture, the Cray XC series provides complete workload flexibility. Drawing on generations of experience with both environments, Cray has used a single machine to run both highly scalable custom workloads and industry-standard ISV jobs through the capable Cray Linux® Environment (CLE). CLE provides a Cluster Compatibility Mode (CCM) to run out-of-the-box Linux/x86 versions of ISV software with no requirement for porting, recompiling or relinking. Alternatively, Cray's Extreme Scalability Mode (ESM) can be set to run in a performance-optimized environment for custom codes. These flexible operating modes are dynamic and available to the user on a per-job basis. CLE has been enhanced to take advantage of the advances in the Aries interconnect and the Dragonfly topology without requiring user tuning. Adaptive supercomputing means supporting different modes of code execution on the fly.
5.3 ROI, Upgradability and Investment Protection
Beyond the customizable configuration of the exact machine a customer requires, the Cray XC40 supercomputer architecture is designed for simple, flexible upgrades and expansion, a benefit that extends its productive lifetime and the customer's investment. As new technology advances become available, customers can take advantage of these generational shifts deep into the life cycle before ever considering replacing an HPC system. Adaptive supercomputing means longevity.
5.4 Cray XC30 System Resiliency Features
The Aries interconnect is designed to scale to massive HPC systems in which failures are to be expected, yet where it is critical that applications run to successful completion in the presence of errors. Aries uses error-correcting code (ECC) to protect major memories and data paths within the device. The ECC, combined with Aries' adaptive routing hardware, provides improved system and application resiliency. In the event of a lane failure, the adaptive routing hardware automatically masks it out. The HSS can even reconfigure automatically to route around bad links in the event of losing all connectivity between two interconnects.
5.5 Innovative Cooling and Green Systems
Cray continues to advance its HPC cooling efficiency advantages, incorporating a combination of vertical liquid coil units per compute cabinet and transverse airflow recirculated through the system. Fans in blower cabinets can be hot-swapped, and the system exhausts "room-neutral" air. Improve your TCO by reducing the number of chillers and air handlers. Eliminate the need for hot/cold aisles in your datacenter.
With the earlier XE5m and XE6m midrange supers, Cray stepped back from a 3D torus to a 2D torus topology using the Gemini interconnect, and there was a definite performance hit from doing so. With the Dragonfly approach, however, all processors are connected to all other processors (not directly, of course, but with no more than five hops between any two processors), and there is no performance hit moving down to an AC model compared with an LC model. The beauty is that you use the same Cray Linux Environment on both machines, and the same compilers and math libraries, too.
The XC30-AC machines are available now and cost from $500,000 to $3m, which works out to around $22,000 per teraflops for a single-racker, down to around $17,000 per teraflops for an eight-racker.
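Those price-per-teraflops figures imply machine sizes of roughly 23 and 176 teraflops at the two ends of the range. A rough consistency check (the implied teraflops are derived here, not stated in the source):

```python
# Implied system sizes from the quoted XC30-AC prices and $/TF figures.
single_rack_price, single_rack_usd_per_tf = 500_000, 22_000
eight_rack_price, eight_rack_usd_per_tf = 3_000_000, 17_000

single_rack_tf = single_rack_price / single_rack_usd_per_tf
eight_rack_tf = eight_rack_price / eight_rack_usd_per_tf
print(f"Single rack: ~{single_rack_tf:.0f} TF")  # → Single rack: ~23 TF
print(f"Eight racks: ~{eight_rack_tf:.0f} TF (~{eight_rack_tf / 8:.0f} TF/rack)")
```

The similar per-rack figures at both ends suggest the quoted prices scale roughly linearly with capacity.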
One interesting note about processor choices: for large supercomputing centers, having Opterons for the past decade was fine, apart from the occasional delay or bug. But when trying to push down into smaller organizations, Bolding says, many have a "buy Intel" philosophy.
With this in mind, one major change with the XC30-AC over the XE5m and XE6m midrange supers is that the switch from AMD to Intel processors should expand the total addressable market for the machines by a factor of four or so. The XE5m and XE6m machines were a "moderate success," according to Bolding, and if Cray does several times the sales of those boxes with the XC30-AC machines, that will constitute a great success.