Program point: any value of program counter (any line of code)
Invariant: property that holds at a program point across all inputs/executions
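A minimal sketch in Python (hypothetical sum_list function): the assert marks one program point, and the asserted property is an invariant because it holds there on every execution, for every input

    def sum_list(xs):
        total, i = 0, 0
        while i < len(xs):
            total += xs[i]
            i += 1
            assert 0 <= i <= len(xs)  # invariant at this program point, for all inputs
        return total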
Types of program analysis:
Static analysis: done at compile time, inspects code without execution
sound but often incomplete
Dynamic analysis: done at run-time, monitors executions
complete for observed runs but not sound
Hybrid: combination of static & dynamic analysis
Soundness: never reports something as correct when it is actually wrong
Completeness: only reports real issues (never flags something as wrong when it is actually correct)
Control flow graph (CFG): a diagram that shows all possible paths a program can take when it runs
Node: basic block of code (a straight sequence of statements with no jumps)
Edge: possible flow of code (like if, while, goto statements)
Normalization: break complex expressions into single steps & use fresh variables
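Example (hypothetical statement): normalization splits a compound expression into single-step assignments with a fresh temporary

    # before normalization
    x = a * b + c
    # after normalization (t1 is a fresh variable)
    t1 = a * b
    x = t1 + c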
False positive: test falsely claims a problem
False negative: test falsely misses a problem
Precision: how many of the reported issues are truly real (precision = tp / (tp + fp)) → driven down by false positives
Recall: how many of the real issues were successfully reported (recall = tp / (tp + fn)) → driven down by false negatives
F measure: one number that represents both precision & recall
higher value = better balance of precision and recall
F measure = (2 ∗ precision ∗ recall) / (precision + recall)
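Worked example (hypothetical counts): a tool reports 10 issues of which 8 are real (tp = 8, fp = 2) and misses 4 real issues (fn = 4) → precision = 8 / (8 + 2) = 0.80, recall = 8 / (8 + 4) ≈ 0.67, F measure = (2 ∗ 0.80 ∗ 0.67) / (0.80 + 0.67) ≈ 0.73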
Static analysis workflow: build an intermediate representation (CFG/AST) → choose an abstract domain to approximate program states → apply transfer functions per statement → iterate to a fixed point to converge on facts
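A minimal sketch of that workflow in Python, assuming a toy three-address program, a tiny sign abstract domain, and illustrative names (not any real tool's API)

    TOP = "TOP"  # unknown sign

    def join(a, b):
        # least upper bound in the sign lattice; None = "not reached yet"
        if a is None:
            return b
        if b is None:
            return a
        return a if a == b else TOP

    def transfer(stmt, state):
        # transfer function: abstract effect of one statement on the state
        var, op = stmt
        new = dict(state)
        if op == "const_zero":
            new[var] = "ZERO"
        elif op == "inc":  # var = var + 1
            new[var] = "POS" if state.get(var) in ("ZERO", "POS") else TOP
        return new

    # toy CFG: node -> (statements in its basic block, successor nodes)
    cfg = {
        "entry": ([("i", "const_zero")], ["loop"]),
        "loop":  ([("i", "inc")],        ["loop", "exit"]),
        "exit":  ([],                    []),
    }

    # iterate the transfer functions until the per-node facts stop changing (fixed point)
    states = {node: None for node in cfg}
    states["entry"] = {}
    changed = True
    while changed:
        changed = False
        for node, (stmts, succs) in cfg.items():
            if states[node] is None:
                continue  # node not reached yet
            out = states[node]
            for stmt in stmts:
                out = transfer(stmt, out)
            for succ in succs:
                merged = out if states[succ] is None else {
                    v: join(out.get(v), states[succ].get(v))
                    for v in set(out) | set(states[succ])
                }
                if merged != states[succ]:
                    states[succ] = merged
                    changed = True

    print(states["exit"])  # {'i': 'TOP'}

The analysis converges to i = TOP at exit: a sound over-approximation (it covers every execution) that is imprecise, because the merge at the loop head discards the exact sign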
Undecidability limits: general program properties (e.g., termination) cannot be decided automatically, so no practical analyzer can be both sound and complete while always terminating
Trade-offs: tools usually pick sound but incomplete (no false negatives, some false positives) or complete but unsound (finds everything it reports, may miss bugs) depending on risk tolerance
Consumers of analysis: compilers (optimization and warnings), QA/security tooling, and IDEs surface these results during development
Software testing
Software testing: the process used to identify the completeness, correctness, and quality of a piece of software → checks the following:
Specification: does the system have all the features advertised/specified?
Functionality: do those features actually work?
Performance: how well does it run under different conditions?
“Good” test:
high probability of finding an error
not redundant
should be “best of breed”
should be neither too simple nor too complex
Black-box testing: based on requirements/functionality, no knowledge of code implementation
Types of black-box testing:
Requirements-based: pick at least one test from each requirement category
Equivalence partitioning:
identify classes of inputs with same behaviour
test at least one member
assume the rest of the members will behave the same (see the sketch after this list)
Boundary value analysis: testing conditions near the bounds of inputs → likely source of programmer error (< vs <=, etc.)
Positive/negative: checks both valid inputs (expected to succeed) and invalid inputs (expected to be rejected gracefully)
Decision tables
State-based: based on object state diagrams
Compatibility testing: run the software in different environments
e.g. open a website on different browsers/devices (firefox, mobile, …)
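A minimal sketch of equivalence partitioning and boundary value analysis, assuming a hypothetical validate_age function whose spec accepts ages 18 to 65 inclusive

    def validate_age(age):
        return 18 <= age <= 65

    # equivalence partitioning: one representative per class of inputs with the same expected behaviour
    assert validate_age(10) is False   # class: below the valid range
    assert validate_age(40) is True    # class: inside the valid range
    assert validate_age(80) is False   # class: above the valid range

    # boundary value analysis: values at and just outside each bound (catches < vs <= mistakes)
    assert validate_age(17) is False
    assert validate_age(18) is True
    assert validate_age(65) is True
    assert validate_age(66) is False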
White-box testing: use knowledge of the code to write the tests
Types of white-box testing:
Statement testing: make sure each statement is executed by at least one test
Branch testing: make sure each possible outcome of a condition is tested at least once (see the sketch below)
Pros: simple to start, format-agnostic, can uncover deep/rare failures
Cons: uneven coverage without guidance, hard-to-reproduce crashes, and potentially large time/resource costs
Depth of a concurrency bug: the number of ordering constraints a thread schedule has to satisfy to expose the bug
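A minimal sketch of statement vs. branch testing, using a hypothetical shipping_cost function

    def shipping_cost(total):
        cost = 5
        if total >= 100:   # the branch under test
            cost = 0       # free shipping above the threshold
        return cost

    # one test where the condition is true already executes every statement (statement coverage)
    assert shipping_cost(120) == 0
    # branch testing also requires the false outcome of the condition
    assert shipping_cost(20) == 5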
Delta debugging
Delta debugging: a method that automatically minimizes a test case that produces a bug
Why simplify failing inputs: smaller cases are easier to explain, reproduce faster, reduce state space, and often subsume duplicate reports
Global Minimum: the smallest test case that makes a test fail
Local Minimum: the smallest test case with respect to itself that causes a test to fail
removing one or more items from it no longer reproduces the failure (a smaller failing case may still exist elsewhere)
n-minimal: if making between 1 and n changes from c causes the test to no longer fail
1-minimal: if making any single change from c causes the test to no longer fail
2-minimal: if making any 2 changes from c causes the test to no longer fail
k-minimal: if making any k changes from c causes the test to no longer fail
different from n-minimal, since n-minimal allows any 1 to n changes, while this means exactly k changes
Algorithm 1 (linear reduction): removes 1 element at a time and tests
slow but simple; gradually trims elements one at a time until no single removal keeps the test failing
Algorithm 2 (binary search reduction): splits into 2 halves, tests each half, and recursively continues on the failing part
faster because it cuts the search space in half each time (see the sketch after this list)
Algorithm 3 (hierarchical/balanced reduction): if both halves pass, it means the failure needs elements from both sides
keep one half while reducing the other, then swap
combines both minimal failing parts for the smallest possible failing input
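A minimal sketch of Algorithm 2 in Python, assuming the failing input is a list and test(...) returns "pass" or "fail" (names are illustrative)

    def reduce_binary(failing, test):
        # recursively keep whichever half still fails on its own
        if len(failing) <= 1:
            return failing
        mid = len(failing) // 2
        left, right = failing[:mid], failing[mid:]
        if test(left) == "fail":
            return reduce_binary(left, test)
        if test(right) == "fail":
            return reduce_binary(right, test)
        # both halves pass: the failure needs elements from both sides,
        # which is the case Algorithm 3 / ddmin handles by testing complements
        return failing

    # e.g. a failure that needs the character 'x' somewhere in the input
    test = lambda s: "fail" if "x" in s else "pass"
    print(reduce_binary(list("abxcd"), test))   # ['x']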
Test function: treat the failing configuration cF as a set of changes and define test(c) ∈ {pass, fail, ?} for each subset; delta debugging repeatedly queries this oracle
ddmin loop: partition cF into n disjoint subsets (Δ1…Δn), test each subset and its complement; if a smaller failing chunk is found, recurse on it, otherwise increase n (finer granularity) until 1-minimal
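A minimal sketch of this loop in Python, assuming the failing configuration is a list of changes and test(...) returns "pass" or "fail" ("?" can be treated like "pass" here); the granularity update is simplified compared to the original ddmin

    def split(elements, n):
        # partition into n roughly equal, contiguous, non-empty chunks (assumes n <= len(elements))
        chunks, start = [], 0
        for i in range(n):
            size = len(elements) // n + (1 if i < len(elements) % n else 0)
            chunks.append(elements[start:start + size])
            start += size
        return chunks

    def ddmin(c_fail, test):
        # reduce a failing configuration (list of changes) to a 1-minimal failing subset
        assert test(c_fail) == "fail"
        current, n = list(c_fail), 2
        while len(current) >= 2:
            chunks = split(current, n)
            # candidates: every subset, then every complement (current minus one subset)
            complements = [
                [x for j, chunk in enumerate(chunks) if j != i for x in chunk]
                for i in range(len(chunks))
            ]
            failing = next((c for c in chunks + complements if test(c) == "fail"), None)
            if failing is not None:
                current, n = failing, 2        # smaller failing chunk found: restart coarse
            elif n < len(current):
                n = min(2 * n, len(current))   # nothing failed: refine the granularity
            else:
                break                          # singletons tested: current is 1-minimal
        return current

    # e.g. a failure that needs both change 1 and change 7 to be present
    test = lambda c: "fail" if 1 in c and 7 in c else "pass"
    print(ddmin([1, 2, 3, 4, 5, 6, 7, 8], test))   # [1, 7]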