Note
Starting at
1-Introduction to Analysis
, slide 41. Missed last week.
False Positive & False Negative
- False positive: when a tool reports a vulnerability when none exists
- False negative: when the tool does not report a vulnerability that does exist
Precision & Recall
-
TP: true positive
-
TN: true negative
-
FP: false positive
-
FN: false negative
-
Precision: how many of the reported issues are truly real (real bugs)?
- soundness
- false positive rate
-
Recall: how many of the real issues were successfully reported?
- completeness
- false negative rate
-
F measure: one number to represent both precision and recall - higher number = better analysis (more sound)
Quiz: Comparing program analyses
- Calculate the precision, recall, and f-measure of each of the following:
precision | recall | f-measure | |
---|---|---|---|
sound and complete analysis | 1.0 | 1.0 | 1.0 |
complete analysis with 40% FP rate | 0.6 | 1.0 | 0.75 |
sound analysis with 40% FN rate | 1.0 | 0.6 | 0.75 |
analysis with 30% FP rate and 70% FN rate | 0.7 | 0.3 | 0.42 |
Undecidability of Program Properties
A property of programs is undecidable if no algorithm can always determine (for every possible program) whether that property holds or not, in finite time
For example:
- “does this program ever crash?”
- “does this program halt on all inputs?”
- “does this array access ever go out of bounds?”
These are properties about program behaviour. Some can be answered for specific programs, but no general algorithm exists that always answers correctly for all possible programs.
Decidable Program Properties
Some properties can be checked algorithmically, because they are purely syntactic or simple enough.
For example:
- “does this source code contain more than 100 lines?” (syntactic, trivial)
- “does this program reference a variable x?” (lexical check)
- “is the syntax of this program valid in C?” (compiler can determine)
- “does this finite-state machine reach a final state?” (finite state model checking is decidable)
- “does this loop run at most 10 times in this program fragment?” (if bounds are syntactically visible).
Sound and Complete Analysis
Can program analysis be sound and complete?
- not if we want it to terminate
- because undecidability of certain program properties limits the applicability and the possibilities of sound and complete analysis tools
- this means that not all properties can be automatically checked, so some may require human intervention
Who needs program analysis?
Three primary consumers of program analysis:
- compilers
- software quality tools
- primary focus of this course
- IDE’s
Sidenote
test/experiment with ai coding tools
Quiz 1
Sound? | Complete? | Note | |
---|---|---|---|
A1 | ❌ | ❌ | neither |
A2 | ✅ | ❌ | correctly runs all tests, but doesn’t test everything |
A3 | ❌ | ✅ | detects some false bugs |
Quiz 2
NASA engineer Bob has been asked to apply program analysis to check divide-by-zero errors in software that will control NASA’s next billion-dollar space mission.
Sound? | Complete? | |
---|---|---|
A1 | ❌ | ❌ |
A2 | ✅ | ❌ |
A3 | ❌ | ✅ |
What should Bob choose?
A2 should be chosen.
Quiz 3
Microsoft developer Ann has agreed to apply program analysis to check divide-by-zero errors in her programs on the condition that it will not produce false alarms.
Sound? | Complete? | |
---|---|---|
A1 | ❌ | ❌ |
A2 | ✅ | ❌ |
A3 | ❌ | ✅ |
What should Ann choose?
B2 should be chosen (they want less false alarms).
Quiz 4
Early-stage software development or prototyping.
Sound? | Complete? | |
---|---|---|
A1 | ❌ | ❌ |
A2 | ✅ | ❌ |
A3 | ❌ | ✅ |
What should they choose?
A1 should be chosen (detailed testing less important during prototyping).
Quiz 5
Security Auditing for Known Vulnerabilities
Sound? | Complete? | |
---|---|---|
A1 | ❌ | ❌ |
A2 | ✅ | ❌ |
A3 | ❌ | ✅ |
What should they choose?
A3 should be chosen (all possible errors should be identified, false positives can be manually checked) (A2 has valid argument as well).
Reference
First 2 chapters of this book (online; free): https://cs.au.dk/~amoeller/spa/