Table of contents

Nodes and Networks hardware

Overview

Computer families/1

Computer families/2

Micro architecture features for HPC

Superscalar

Pipelining/1

Pipelining/2

Pipelining/3

P3/P4 superpipelining

Opteron pipeline

Itanium2 pipeline

Out Of Order (OOO) execution/1

OOO/2

OOO/3

OOO/4

Branch prediction / Speculative execution

Hyperthreading/1

Hyperthreading/2

Hyperthreading/3

Hyperthreading/4

Hyperthreading/5

x86 uarchitectures

Intel x86 family

AMD Athlon family

Pentium 4 0.13u uarchitecture

Netburst 90nm uarch

Athlon uarchitecture

x86 Architecture extensions/1

x86 Architecture extensions/2

x86 architecture extensions/3

x86 architecture extensions/4

SIMD technology

Typical SIMD operation

MMX

SSE

SSE2

Intel SSE3/1

SSE3/2

Cache memory

Cache memory/1

Cache memory/2

Cache memory/3

Cache memory/4

Cache memory/5

Cache memory/6

Cache memory/7

Cache memory/8

Cache memory/9

Cache memory/10

Cache memory/11

Cache memory/12

Cache memory/13

Cache memories/14

Real Caches/1

Real Caches/2

Real Caches/3

Memory performance

MTRR/1

MTRR/2

MTRR/3

MTRR/4

MTRR/5

MTRR/6

Explicit cache control/1

Explicit cache control/2

Performance and timestamp counters/1

Performance and timestamp counters/2

Performance and timestamp counters/3

Performance and timestamp counters/4

Performance and timestamp counters/5

Performance and timestamp counters/6

Performance and timestamp counters/7

Intel Itanium/1

Intel Itanium/2

Intel Itanium/3

Intel Itanium/4

Intel Itanium/4bis

Intel Itanium/5

Intel Itanium/6

Intel Itanium/7

Intel Itanium/8

Intel Itanium/9

Intel Itanium/10

Intel Itanium/11

Intel Itanium/12

Intel Itanium/13

Slide 88

Intel Itanium/15

Intel Itanium/16

Intel Itanium/17

Intel Itanium/18

Intel Itanium/18

Opteron uarch

64 bit architectures

AMD-64/1

AMD Hammer/Opteron

Intel EM64T/1

x86-64 or AMD64

Processor bus/1

Processor bus/2

Processor bus/3

Intel IA32 node

Intel PIII/P4 processor bus

Alpha node

Alpha/Athlon EV6 interconnect

Opteron integrated memory controller

2P Hammer

4P Hammer

4P Opteron detailed view

8P Hammer

Hyper Transport hops

HT local/xfire bandwidth

Double/Quad core

SMPs

Intel MP Processor bus arbitration

Cache coherency

Cache consistency

Snooping

Intel MP snooping

MESI protocol

MESI states

L2/L1 coherence

Atomic Read Modify Write

Intel MP interrupts

Broadcast cache coherence

PCI Bus

PCI-X 1.0

PCI-X 2.0

PCI efficiency

PCI 2.2/X timing diagram

common chipsets PCI performance

chipsets PCI-X performance

Memory buses

Interconnects /1

LogP metrics (Culler)

LogP diagram

Interconnects /2

Interconnects /3

Interconnects

Interconnects /4

Slide 143

Slide 144

Interconnect/4bis

Interconnects/5

Bisection /1

Bisection /2

Bisection /3

Slide 150

Interconnects /5

Interconnects /6

Interconnects /7

Gilder’s law

Optical networks

NIC Interconnection point (from D.Culler)

LVDS/1

LVDS/2

LVDS/3

LVDS/4

VCSEL/1

VCSEL/2-EEL (Edge Emitting)

VCSEL/3 - Surface Emission

VCSEL/4

VCSEL/5

Ethernet history

Ethernet

Ethernet Frames

Hubs

Ethernet flow control

Auto Negotiation

GigE

10 GbE /1

10GbE /2

VLAN/1

VLAN/2

Infiniband/1

Infiniband/2

Infiniband/3

Infiniband/4

Infiniband/5

Myrinet/1

Myrinet/2

Myrinet/3

Myrinet/4

Clos networks

Clos networks/2

SCI/1

SCI/2

Slide 190

SCI/3

SCI/4 3D Torus

Interconnect/1GbE

Interconnect/SCI

Interconnect/Myrinet

Interconnect/Infiniband

Interconnect/10GbE

Software

Software overhead

Standard data flow /1

Standard data flow /2

Zero-copy data flow

Zero Copy Research

OS bypass – User level networking

Active Messages (AM)

FastMessages (FM)

VIA/1

VIA/2

VIA/3

Fbufs (Druschel 1993)

Network technologies

Trapeze driver for Myrinet2000 on FreeBSD

Bibliography

Author: Roberto Innocente

E-mail: inno@sissa.it