Parallel hardware

MIMD systems

  • two types (based on memory arrangement):
    • Shared-memory systems: processors connected to a common memory through an interconnect; they communicate implicitly through shared data structures (see the sketch below)
    • Distributed-memory systems: each processor has its own private memory; they communicate explicitly by message passing / special access functions
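
A minimal sketch of the shared-memory style, assuming POSIX threads (the producer/consumer names are illustrative, not from the notes): two threads communicate simply by writing and reading a shared variable. In a distributed-memory system the same exchange would instead use explicit messages (e.g., MPI_Send/MPI_Recv).

    /* Shared-memory communication: the "message" is just a shared variable.
     * Compile with: cc -pthread shared.c */
    #include <pthread.h>
    #include <stdio.h>

    static int shared_value;   /* visible to all threads: the shared data structure */
    static int ready;          /* flag: producer has written the value */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    static void *producer(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);
        shared_value = 42;              /* communication = writing shared data */
        ready = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        pthread_mutex_lock(&lock);
        while (!ready)                  /* wait until the producer has written */
            pthread_cond_wait(&cond, &lock);
        printf("received %d via shared memory\n", shared_value);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }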

Shared-memory systems

  • multiprocessor architecture (general-purpose MIMD)
  • widely used in servers and workstations
  • usually homogeneous: n identical processors
  • each node = CPU + local memory/caches + I/O
  • all nodes share the main memory (sharing implemented at the firmware level)
  • any processor can access any memory location
  • some cache levels may be shared
  • Multicore chips: the most common case (private L1 caches; higher-level caches sometimes shared)

Processing nodes

  • Interface unit (W): adapts CPU memory requests to the memory modules + interconnect, packaging requests into messages (source/destination, routing info, etc.)
  • I/O unit: handles direct node-to-node communication, uses DMA, shares the same interconnect

Physical addressing

  • Multiprocessor memory: scales to TBs (40+ bit physical addresses)
  • CPU: must be able to generate such large physical addresses
  • requires indivisible access sequences + cache coherence (see the atomics sketch below)
  • logical and physical address sizes are independent of each other
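
A sketch of what an indivisible access sequence buys, assuming C11 atomics: atomic_fetch_add is a read-modify-write that the hardware executes as one indivisible operation; with a plain counter++ the interleaved loads and stores of two processors could lose updates.

    /* Indivisible read-modify-write via C11 atomics.
     * Compile with: cc -pthread atomic.c */
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static atomic_int counter = 0;
    #define N_INCS 100000

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < N_INCS; i++)
            atomic_fetch_add(&counter, 1);   /* indivisible access sequence */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Always 200000 here; with a plain int and counter++, two processors'
         * load/add/store sequences may interleave and updates get lost. */
        printf("counter = %d\n", atomic_load(&counter));
        return 0;
    }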

Classes of multiprocessor architectures

  • Process-to-processor mapping:
    • Anonymous: any process can run on any processor; dynamic scheduling with a global ready list
    • Dedicated: static assignment at load time; each node keeps its own ready list; occasional reallocation for fault tolerance / load balancing (see the affinity sketch below)
  • Modular memory organization:
    • UMA (SMP): uniform memory access time
    • NUMA: non-uniform memory access time
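
A sketch of dedicated mapping at the OS level, assuming Linux and its sched_setaffinity call (and that core 0 exists): the calling thread is pinned to one core. The anonymous case is simply the scheduler's default, where the thread may migrate to any core.

    /* Pin the calling thread to core 0 (Linux-specific, illustrative). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);                        /* allow core 0 only */
        if (sched_setaffinity(0, sizeof set, &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pinned to core %d\n", sched_getcpu());
        return 0;
    }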

Shared-memory types

Type 1 – UMA

  • Definition: all processors directly connect to the same memory, with equal latency regardless of which processor i accesses which memory module j
  • tightly coupled, resource-sharing
  • easier to program
  • Local access: L1/L2 caches
  • Remote access: L3/main memory
  • Symmetric multiprocessors: all processors identical, equal access to devices (Intel Xeon SMP)
  • Asymmetric multiprocessors: master processor runs OS, others specialized (ARM big.LITTLE)

Type 2 – NUMA

  • Definition: each processor has its own local memory, but all local memories together form a single global address space
  • local access is faster than remote; several distinct access times (cache, local, remote)
  • COMA (Cache-Only Memory Architecture): experimental variant that treats local memory as a cache
  • examples: AMD EPYC, Intel Xeon Scalable
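
A sketch of NUMA-aware placement, assuming Linux with libnuma installed (link with -lnuma): numa_alloc_local places the buffer's pages on the calling CPU's node, so subsequent accesses are local rather than remote.

    /* Allocate memory on the local NUMA node (illustrative).
     * Compile with: cc numa.c -lnuma */
    #define _GNU_SOURCE
    #include <numa.h>
    #include <sched.h>
    #include <stdio.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support on this system\n");
            return 1;
        }
        size_t size = 1 << 20;                 /* 1 MiB */
        void *buf = numa_alloc_local(size);    /* pages on the caller's node */
        if (!buf) return 1;
        int cpu = sched_getcpu();
        printf("running on CPU %d (node %d), buffer allocated locally\n",
               cpu, numa_node_of_cpu(cpu));
        numa_free(buf, size);
        return 0;
    }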

Architecture combinations

  • anonymous + UMA
  • anonymous + NUMA
  • dedicated + UMA
  • dedicated + NUMA
  • most natural pairings: anonymous + UMA and dedicated + NUMA
  • Memory hierarchies: smooth out the differences between the combinations

Issues in shared-memory

  • Access latency
  • Memory conflicts
  • optimization target = minimize both

Minimizing access latency

  • Interconnection network latency (worked numbers after this list):
    • bus: grows linearly with n
    • butterflies / high-dimensional cubes / trees: grows logarithmically with n
    • low-dimensional cubes (meshes): grows as √n
  • shared-memory accesses are expensive
    • UMA: all accesses are remote
    • NUMA: a mix of local + remote accesses
  • UMA goal: dynamic allocation of shared data in caches
  • NUMA goal: static allocation in local memory + dynamic caching
    • dedicated mapping: private info (code, data) mapped to local memory; remote accesses only for shared info
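
Illustrative numbers (not from the notes) for n = 1024 nodes, in units of a single link traversal, showing why topology matters:

    T_{\text{bus}} \propto n = 1024, \qquad
    T_{\text{butterfly/tree}} \propto \log_2 n = 10, \qquad
    T_{\text{2D mesh}} \propto \sqrt{n} = 32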

Minimizing memory conflicts

  • Queueing model: nodes = clients, memory modules = servers
  • latency = server response time
  • depends on server utilization, i.e., on conflicts (see the formula below)
  • longer interarrival times = less congestion
  • Local accesses: key for performance (NUMA local memories, SMP caches)
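
One way to make this quantitative, assuming the standard M/M/1 server approximation (the notes do not fix a queueing discipline): with mean service time S and mean interarrival time T_A,

    \rho = \frac{S}{T_A}, \qquad R = \frac{S}{1 - \rho}

so raising T_A (fewer requests per module, i.e., more local accesses) lowers the utilization ρ and hence the response time R.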

Cache coherence

  • Problem: copies of writable shared data held in multiple caches can become inconsistent
  • Solution: hardware/firmware mechanisms are needed to keep the cached copies consistent (see the sketch below)
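
A minimal sketch of the sharing pattern a coherence mechanism must support, assuming C11 atomics on coherent hardware: one thread publishes data behind a flag, another spins on the flag and then reads the data. The coherence protocol (e.g., invalidation-based schemes such as MESI) is what prevents the reader from serving a stale cached copy of data.

    /* Writer publishes, reader consumes: coherence keeps the caches in sync.
     * Compile with: cc -pthread coherence.c */
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static int data;             /* written by one thread, read by another */
    static atomic_int flag = 0;  /* publication flag */

    static void *writer(void *arg) {
        (void)arg;
        data = 123;              /* this store lands in the writer's cache */
        /* The release store, together with hardware coherence, guarantees the
         * reader observes the new value of data, not a stale cached copy. */
        atomic_store_explicit(&flag, 1, memory_order_release);
        return NULL;
    }

    static void *reader(void *arg) {
        (void)arg;
        while (!atomic_load_explicit(&flag, memory_order_acquire))
            ;                    /* spin until the write is published */
        printf("read data = %d\n", data);  /* prints 123 on coherent hardware */
        return NULL;
    }

    int main(void) {
        pthread_t w, r;
        pthread_create(&w, NULL, writer, NULL);
        pthread_create(&r, NULL, reader, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return 0;
    }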