Performance

Note

Starting at l4a.pdf, slide 10

Performance and Scalability

  • Sequential runtime ($T_1$): a function of problem size and architecture
  • Parallel runtime ($T_N$): a function of problem size, parallel architecture, & number of processors used in the execution
    • Parallel performance affected by algorithm & architecture
  • Scalability: how the program behaves in terms of complexity and system size (number of processors)
    • Strongly scalable: same efficiency when increasing system size while the problem size stays fixed
    • Weakly scalable: same efficiency when increasing system size at the same rate as the problem size

Performance metrics and formulas

  • $T_1$: the execution time on a single processor
  • $T_N$: the execution time on an N-processor system
  • $S = T_1 / T_N$ (speedup)
  • $E = S / N = T_1 / (N \cdot T_N)$ (efficiency; see the sketch below)
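
A minimal sketch of these formulas in Python (the timing numbers are made up for illustration):

```python
def speedup(t1: float, tn: float) -> float:
    """S = T_1 / T_N."""
    return t1 / tn

def efficiency(t1: float, tn: float, n: int) -> float:
    """E = S / N = T_1 / (N * T_N)."""
    return speedup(t1, tn) / n

# Hypothetical timings: 120 s sequentially, 16.6 s on 8 processors.
t1, t8 = 120.0, 16.6
print(f"S = {speedup(t1, t8):.2f}, E = {efficiency(t1, t8, 8):.2f}")
```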

Speedups and efficiency

Parallel program

| p | 1   | 2    | 4    | 8    | 16   |
|---|-----|------|------|------|------|
| S | 1.0 | 1.9  | 3.6  | 6.5  | 10.8 |
| E | 1.0 | 0.95 | 0.90 | 0.81 | 0.68 |

Parallel program on different problem sizes

| Problem size | Metric | p=1 | p=2  | p=4  | p=8  | p=16 |
|--------------|--------|-----|------|------|------|------|
| Half         | S      | 1.0 | 1.9  | 3.1  | 4.8  | 6.2  |
| Half         | E      | 1.0 | 0.95 | 0.78 | 0.60 | 0.39 |
| Original     | S      | 1.0 | 1.9  | 3.6  | 6.5  | 10.8 |
| Original     | E      | 1.0 | 0.95 | 0.90 | 0.81 | 0.68 |
| Double       | S      | 1.0 | 1.9  | 3.9  | 7.5  | 14.2 |
| Double       | E      | 1.0 | 0.95 | 0.98 | 0.94 | 0.89 |
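
The efficiency rows in both tables follow from $E = S/p$; a quick Python check against the speedups above (agreement is up to rounding):

```python
speedups = {
    "Half":     [1.0, 1.9, 3.1, 4.8, 6.2],
    "Original": [1.0, 1.9, 3.6, 6.5, 10.8],
    "Double":   [1.0, 1.9, 3.9, 7.5, 14.2],
}
procs = [1, 2, 4, 8, 16]

for size, s_vals in speedups.items():
    effs = [s / p for s, p in zip(s_vals, procs)]
    print(size, [round(e, 2) for e in effs])
```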

Limits & costs of parallel programming

Amdahl’s law

  • potential program speedup is defined by the fraction of code ($P$) that can be parallelized
  • if none of the code can be parallelized, $P = 0$ and the speedup $S = 1$ (no speedup)
  • if all of the code is parallelized, $P = 1$ and the speedup is infinite (in theory)
  • if 50% of the code can be parallelized, the maximum speedup is 2 (code will run twice as fast)
| N      | P = 0.50 | P = 0.90 | P = 0.95 | P = 0.99 |
|--------|----------|----------|----------|----------|
| 10     | 1.82     | 5.26     | 6.89     | 9.17     |
| 100    | 1.98     | 9.17     | 16.80    | 50.25    |
| 1000   | 1.99     | 9.91     | 19.62    | 90.99    |
| 10000  | 1.99     | 9.91     | 19.96    | 99.02    |
| 100000 | 1.99     | 9.99     | 19.99    | 99.90    |
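
The table values come from the usual Amdahl formula with $N$ processors, $S = \dfrac{1}{\frac{P}{N} + (1 - P)}$, where $P$ is the parallel fraction. A small Python sketch that reproduces the entries (to within rounding):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup with parallel fraction p on n processors: 1 / ((p/n) + (1 - p))."""
    return 1.0 / ((p / n) + (1.0 - p))

for n in (10, 100, 1000, 10000, 100000):
    row = [amdahl_speedup(p, n) for p in (0.50, 0.90, 0.95, 0.99)]
    print(n, [round(s, 2) for s in row])
```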

Amdahl’s law - Fixed size speedup

  • Let $f$ be the fraction of a program that is sequential; $1 - f$ is the fraction that can be parallelized
  • Let $T_1$ be the execution time on 1 processor
  • Let $T_N$ be the execution time on N processors: $T_N = f\,T_1 + \frac{(1 - f)\,T_1}{N}$
  • $S_N = \dfrac{T_1}{T_N} = \dfrac{1}{f + \frac{1 - f}{N}}$ is the speedup
  • As $N \to \infty$, $S_N \to \dfrac{1}{f}$ (see the sketch after this list)
  • When does Amdahl’s Law apply?
    • when the problem size is fixed
    • strong scaling
    • speedup bound is determined by the degree of sequential execution time in the computation, not the # of processors
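
A quick numeric illustration of that limit, assuming a made-up sequential fraction $f = 0.05$:

```python
def fixed_size_speedup(f: float, n: int) -> float:
    """Amdahl (fixed-size) speedup: S_N = 1 / (f + (1 - f) / N)."""
    return 1.0 / (f + (1.0 - f) / n)

f = 0.05  # assumed sequential fraction, for illustration only
for n in (1, 10, 100, 1000, 10**6):
    print(n, round(fixed_size_speedup(f, n), 2))
print("limit 1/f =", 1 / f)  # speedup saturates at 20 no matter how many processors
```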

Gustafson-Barsis’ Law (scaled speedup)

  • assume the parallel time is kept constant
    • $T_N$ is constant as $N$ grows
    • $f$ is the fraction of $T_N$ spent in sequential execution
    • $1 - f$ is the fraction of $T_N$ spent in parallel execution
  • What is the execution time on one processor? $T_1 = f\,T_N + (1 - f)\,N\,T_N$
  • What is the speedup in this case? $S_N = \dfrac{T_1}{T_N} = f + (1 - f)N = N - f(N - 1)$ (see the sketch after this list)
  • When does Gustafson’s Law apply?
    • when problem size can increase as the number of processors increases
    • weak scaling
    • speedup function includes the number of processors
    • can maintain or increase parallel efficiency as the problem scales
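
A minimal sketch of the scaled-speedup formula, again assuming a made-up sequential fraction $f = 0.05$; unlike Amdahl’s bound, the speedup keeps growing with $N$:

```python
def scaled_speedup(f: float, n: int) -> float:
    """Gustafson-Barsis scaled speedup: S_N = N - f * (N - 1)."""
    return n - f * (n - 1)

f = 0.05  # assumed sequential fraction of the parallel runtime, for illustration
for n in (1, 10, 100, 1000):
    print(n, scaled_speedup(f, n))
```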

Amdahl vs Gustafson-Barsis

DAG Model of Computation

  • think of a program as a directed acyclic graph (DAG) of tasks
    • task cannot execute until all the inputs to the task are available
    • these come from outputs of earlier executing tasks
    • DAG shows explicitly the task dependencies
  • think of the hardware as consisting of workers (processors)
  • consider a greedy scheduler of the DAG tasks to workers (no worker is idle while there are tasks still to execute)
```mermaid
graph LR
    A --> B
    A --> D
    A --> E
    B --> C
    C --> F
    D --> F
    E --> F
    F --> G
```

Work-span model

  • $T_P$ = time to run with $P$ workers
  • $T_1$ = work (execution of all tasks by 1 worker)
    • sum of the work of all tasks
  • $T_\infty$ = span (time along the critical path)
  • Critical path: the sequence of task execution (path) through the DAG that takes the longest time to execute, assuming an infinite number of workers is available (see the sketch below)
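
As a sketch, the work and span of the example DAG above can be computed by summing task costs and taking the longest path; the unit task costs here are an assumption for illustration:

```python
# Tasks and dependencies copied from the example DAG above.
edges = {
    "A": ["B", "D", "E"],
    "B": ["C"],
    "C": ["F"],
    "D": ["F"],
    "E": ["F"],
    "F": ["G"],
    "G": [],
}
cost = {task: 1 for task in edges}  # assumed unit cost per task

work = sum(cost.values())  # T_1: total work if one worker runs every task

def span(task: str) -> int:
    """Length of the longest (critical) path starting at `task`."""
    return cost[task] + max((span(s) for s in edges[task]), default=0)

print("T_1   =", work)       # 7
print("T_inf =", span("A"))  # 5  (A -> B -> C -> F -> G)
```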

Lower/upper bound on greedy scheduling

  • suppose we only have $P$ workers
  • we can write a work-span formula to derive a lower bound on $T_P$: $T_P \geq \max\!\left(\frac{T_1}{P},\, T_\infty\right)$
  • $\max\!\left(\frac{T_1}{P},\, T_\infty\right)$ is the best possible execution time
  • Brent’s Lemma derives an upper bound: $T_P \leq \frac{T_1 - T_\infty}{P} + T_\infty$ (see the sketch after this list)
    • $\frac{T_1 - T_\infty}{P}$ captures the additional cost of executing the other tasks not on the critical path
    • assume we can do so without overhead
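
Plugging the work/span of the example DAG (computed above under the assumed unit task costs) into both bounds:

```python
def greedy_bounds(t1: float, t_inf: float, p: int) -> tuple[float, float]:
    """Lower and upper bounds on T_P for a greedy scheduler with p workers."""
    lower = max(t1 / p, t_inf)          # T_P >= max(T_1 / P, T_inf)
    upper = (t1 - t_inf) / p + t_inf    # Brent's Lemma: T_P <= (T_1 - T_inf)/P + T_inf
    return lower, upper

# Example DAG from above: T_1 = 7, T_inf = 5 (assumed unit task costs).
for p in (1, 2, 4):
    print(p, greedy_bounds(7, 5, p))
```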

Amdahl was an optimist

Note

kind of ended slide 31? he speedran through the rest

Midterm

  • no multiple choice? potentially
  • 4 or 5 questions
  • format will come via email!!!!!!!!