Note: Missed last week, starting from l1b.pdf slide 25.
Terminology
- Communication: how different parallel tasks exchange data (commonly over a shared memory bus or network)
- Synchronization: coordination of parallel tasks in real time
- often implemented by establishing a synchronization point within an application (task cannot proceed until another task reaches the same point)
- Granularity: a qualitative measure of the ratio of computation to communication
- Coarse: large amounts of computational work are done between communication events
- Fine: small amounts of computational work are done between communication events
- Observed speedup: how much faster the parallelized code runs; measured as the wall-clock time of serial execution divided by the wall-clock time of parallel execution
- Parallel overhead: the extra control work required for parallel execution
- the amount of time required to coordinate parallel tasks, as opposed to doing useful work
- examples:
- task start-up time
- synchronizations
- data communications
- software overhead imposed by parallel languages, libraries, OS, etc.
- task termination time
- Scalability: system’s ability to demonstrate a proportionate increase in parallel speedup with the addition of more resources
- Massively parallel: close to linear growth in observed speedup with system size (# of cores), i.e. good scalability
- refers to the problem / the code
- Embarrassingly parallel: little to no need for coordination between tasks, extremely good scalability
- rare in practice
- example: image processing (split the image into multiple parts; each part is processed completely independently), as sketched below
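A minimal sketch in C with OpenMP of the image-processing example above (my own illustrative code, not from the slides): every pixel is brightened independently, so there is no communication or synchronization inside the loop, and the observed speedup can be estimated by timing the loop with different thread counts.

```c
/* Embarrassingly parallel sketch: brighten every pixel independently.
 * Hypothetical example; assumes OpenMP (compile with: gcc -fopenmp brighten.c). */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 24)                     /* number of pixels (illustrative size) */

int main(void) {
    unsigned char *img = malloc(N);
    for (int i = 0; i < N; i++) img[i] = (unsigned char)(i % 256);

    double t0 = omp_get_wtime();

    /* Each iteration touches only its own pixel: no communication,
     * no synchronization between tasks. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        int v = img[i] + 40;            /* brighten */
        img[i] = (unsigned char)(v > 255 ? 255 : v);
    }

    double t1 = omp_get_wtime();
    printf("parallel loop: %.4f s on up to %d threads\n",
           t1 - t0, omp_get_max_threads());

    free(img);
    return 0;
}
```

Running it with OMP_NUM_THREADS=1 and then with more threads gives the serial and parallel wall-clock times used to compute the observed speedup.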
Note: End of l1b.pdf, moving on to l2a.pdf
Flynn’s Classic Taxonomy
- there are different ways to classify parallel computers
- Flynn’s Taxonomy (1966): one of the more widely used classifications
- classifies multi-processor computer architectures along two independent dimensions: Instruction Stream and Data Stream
SISD
Single Instruction stream, Single Data stream
SIMD
Single Instruction stream, Multiple Data streams
- all processing units execute the same instruction
- each processing unit can operate on a different data unit
- most modern computers use SIMD instructions, especially for GPUs / graphics (see the CPU SIMD sketch below)
- not exactly: GPUs use SIMT (single instruction, multiple threads)
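As a concrete illustration of SIMD (my own sketch, not from the slides), the following C snippet uses x86 AVX intrinsics, assuming a CPU with AVX support: a single vector instruction performs eight float additions at once.

```c
/* SIMD sketch: one instruction operates on multiple data elements.
 * Assumes x86 AVX; compile with: gcc -mavx simd_add.c */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
    float c[8];

    __m256 va = _mm256_loadu_ps(a);     /* load 8 floats */
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_add_ps(va, vb);  /* single instruction, 8 additions */
    _mm256_storeu_ps(c, vc);

    for (int i = 0; i < 8; i++) printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```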
MISD
Multiple Instruction streams, Single Data stream
- hypothetical example: multiple cryptography algorithms attempting to crack a single encrypted message
MIMD
Multiple Instruction streams, Multiple Data streams
- every processor may be executing a different instruction stream
- every processor may be working with a different data stream (see the thread sketch after this list)
- most common one
- MIMD is usually asynchronous
- no global clock (no relation between system times on two different processors)
- examples:
- most current supercomputers
- cloud computing systems / networked parallel computer clusters and “grids”
- CPUs with simultaneous multithreading
- Intel Xeon, AMD EPYC
- execution can be synchronous or asynchronous
- …? (in slides)
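A minimal MIMD-style sketch in C with POSIX threads (my own example, not from the slides): two threads execute different instruction streams on different data streams, run asynchronously, and only meet at an explicit synchronization point (the joins).

```c
/* MIMD sketch: different instruction streams on different data streams.
 * Compile with: gcc mimd_demo.c -lpthread */
#include <pthread.h>
#include <stdio.h>

static double sum_result;
static long   even_count;

/* Thread 1: sums an array of doubles. */
static void *sum_task(void *arg) {
    double *xs = arg, s = 0.0;
    for (int i = 0; i < 1000; i++) s += xs[i];
    sum_result = s;
    return NULL;
}

/* Thread 2: counts even values in an array of longs
 * (different instructions, different data). */
static void *count_task(void *arg) {
    long *ys = arg, n = 0;
    for (int i = 0; i < 1000; i++) if (ys[i] % 2 == 0) n++;
    even_count = n;
    return NULL;
}

int main(void) {
    static double xs[1000];
    static long   ys[1000];
    for (int i = 0; i < 1000; i++) { xs[i] = i * 0.5; ys[i] = i; }

    pthread_t t1, t2;
    pthread_create(&t1, NULL, sum_task, xs);    /* asynchronous: no global clock */
    pthread_create(&t2, NULL, count_task, ys);
    pthread_join(t1, NULL);                     /* synchronization point */
    pthread_join(t2, NULL);

    printf("sum = %.1f, evens = %ld\n", sum_result, even_count);
    return 0;
}
```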
Note: End of l2a.pdf, moving on to l2b.pdf
Two types of MIMD systems, classified based on memory arrangement
Shared-memory systems
- autonomous processors connected to a memory system via an interconnection network
- each processor can access each memory location
- processor communication ⇒ accessing shared data structures (see the sketch after this list)
- typically homogeneous: n identical processors
- CPU (multiple cores), local memory/caches, local I/O
- processing nodes share the main memory physical space
- any processor can address any location of main memory (firmware level)
- some cache levels (primary, secondary, tertiary) can be shared
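A minimal shared-memory sketch in C with POSIX threads (my own example, not from the slides): all threads address the same memory location, so communication happens by accessing a shared data structure, and a mutex provides the synchronization.

```c
/* Shared-memory sketch: threads communicate through a shared counter.
 * Compile with: gcc shared_counter.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS        4
#define INCS_PER_THREAD 100000

static long shared_counter = 0;                 /* lives in shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCS_PER_THREAD; i++) {
        pthread_mutex_lock(&lock);              /* synchronization */
        shared_counter++;                       /* communication via shared data */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);

    /* Every thread wrote to the same location: expect 4 * 100000. */
    printf("counter = %ld\n", shared_counter);
    return 0;
}
```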
Distributed-memory systems
- each processor is paired with its own private memory
- processor-memory pairs communicate over an interconnection network
- processor communication ⇒ message passing, also used for synchronization (see the MPI-style sketch below)
- special functions grant access to other memory regions
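A minimal distributed-memory sketch in C using MPI (my own example, assuming an MPI implementation such as Open MPI is available; not from the slides): each process has its own private memory, so data can only move between processes as explicit messages over the interconnection network, and a blocking receive doubles as a synchronization point.

```c
/* Distributed-memory sketch: private memories, explicit message passing.
 * Compile and run with: mpicc ping.c -o ping && mpirun -np 2 ./ping */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value;                                  /* private to each process */
    if (rank == 0) {
        value = 42;
        /* Rank 0 cannot write into rank 1's memory directly;
         * it must send a message over the network. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive also acts as a synchronization point. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```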