Note
Starting at
3-Introduction to Testing.pdf, slide 41
Random testing
AFL vs LibFuzzer
- AFL runs the target as a separate process and mutates inputs externally, using coverage feedback from instrumentation to decide which inputs to keep
- LibFuzzer is in-process and uses compiler instrumentation to guide input mutations more directly
| AFL | LibFuzzer |
|---|---|
| standalone tool | library linked with the code under test; can be integrated into a larger testing framework |
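Both tools are built around the same mutate-run-observe loop. The sketch below shows only that loop, in Python for consistency with the rest of these notes. The target `parse_header`, the single-byte mutation, and the `fuzz` driver are hypothetical names made up for illustration, and the sketch does none of the instrumentation-based guidance that the real tools provide.

```python
import random

def parse_header(data: bytes) -> None:
    """Hypothetical target: 'crashes' on inputs that start with b'FUZ!'."""
    if data[:4] == b"FUZ!":
        raise RuntimeError("simulated crash")

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(target, seed_input: bytes, iterations: int, seed: int = 0) -> list:
    """Run the target on mutated copies of seed_input; collect inputs that raise."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)
        except Exception:
            crashes.append(candidate)
    return crashes
```

Starting from a seed input that is one byte away from the crashing input (e.g. `b"FUZ?"`) gives the loop a realistic chance of finding it; starting from `b"AAAA"` does not, which is the gap that coverage guidance is meant to close.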
OSS-Fuzz
a Google service that uses ClusterFuzz (+ sanitizers and fuzzers like AFL/LibFuzzer) to continuously fuzz open-source projects
- discovered over 17,400 bugs between 2016 and 2019 in many large projects (e.g. OpenSSL, LLVM, PostgreSQL, Git, Firefox)
ClusterFuzz
Google’s scalable fuzzing infrastructure
- used in OSS-Fuzz & to fuzz the Chrome browser
- as of January 2019, it has found ~16,000 bugs in Chrome and ~11,000 bugs in 160+ open-source projects
- highly scalable (1000+ machines)
- accurate deduplication of crashes
- fully automatic bug filing for issue trackers
- analytics & web interface
Grammar-based fuzzing
there are different ways to formally describe a language:
- Regular expressions: simplest class of languages
  - example: `[a-z]*` denotes a (possibly empty) sequence of lowercase letters
- Context-free grammars: can express a wide range of properties of an input language (e.g. syntactical structure of an input format)
  - e.g.
    - expression → term | expression + term
    - term → factor | term * factor
    - factor → integer | (expression)
    - integer → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
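A grammar like the expression grammar above can be turned into an input generator by repeatedly expanding nonterminals with randomly chosen alternatives. The following is a minimal sketch of that idea; the `GRAMMAR` dictionary encoding, the `generate` function, and the `max_depth` cut-off are illustrative choices, not part of the slides.

```python
import random

# The expression grammar from the notes: nonterminal -> list of alternatives.
GRAMMAR = {
    "expression": [["term"], ["expression", "+", "term"]],
    "term": [["factor"], ["term", "*", "factor"]],
    "factor": [["integer"], ["(", "expression", ")"]],
    "integer": [[d] for d in "0123456789"],
}

def generate(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 8) -> str:
    """Expand `symbol` into a string by picking random alternatives."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth budget, only use the first alternative,
        # which is non-recursive for every rule in this grammar.
        alternatives = alternatives[:1]
    chosen = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in chosen)

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(generate("expression", rng))
```

Every generated string is syntactically valid by construction, so the test effort goes into the logic behind the parser rather than into its error handling. The depth cut-off is needed because the left-recursive rules could otherwise expand indefinitely.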
Testing concurrent programs
uncovering bugs in concurrent programs requires not only discovering specific program inputs, but also specific thread schedules
- Solution: add random delays using `sleep(x)` (adding these delays has the effect of attempting different thread schedules → hopefully one causes a bug)
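As a rough illustration of the random-delay idea, the sketch below runs two threads that each perform a non-atomic read-then-write on a shared counter, with random sleeps before and between the two operations. The function name `run_with_random_delays`, the delay range, and the seed parameter are assumptions made for this example.

```python
import random
import threading
import time

def run_with_random_delays(seed: int, max_delay: float = 0.002) -> int:
    """Run two racy increments with random delays; return the final counter."""
    rng = random.Random(seed)
    delays = [rng.uniform(0, max_delay) for _ in range(4)]
    state = {"counter": 0}

    def worker(before: float, between: float) -> None:
        time.sleep(before)            # perturbs when this thread starts
        value = state["counter"]      # read
        time.sleep(between)           # widens the window between read and write
        state["counter"] = value + 1  # write (may overwrite the other update)

    threads = [
        threading.Thread(target=worker, args=(delays[0], delays[1])),
        threading.Thread(target=worker, args=(delays[2], delays[3])),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]

if __name__ == "__main__":
    # A result of 1 means one increment was lost, i.e. the race was hit.
    print([run_with_random_delays(seed) for seed in range(10)])
```

The correct result is 2. Whether a given run returns 1 depends on the delays and on the OS scheduler, so this approach offers no guarantee of hitting the bug.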
Depth of a concurrency bug
Bug depth: the minimum number of ordering constraints a schedule has to satisfy to trigger the bug
- intuitively, how hard the bug is to hit by chance: the more constraints a schedule must satisfy, the fewer schedules expose the bug
- Ordering constraints: requirements on the relative order in which operations of different threads execute
Case Studies
Google Monkey (android testing)
- generates `TOUCH(x,y)`, where `x` and `y` are randomly generated
- generates `MOVE(x2,y2)`, where `x2` and `y2` are randomly generated
- generates `MOTION(..)`, which consists of a `DOWN()` event somewhere on the screen, a sequence of `MOVE(..)` events, and an `UP()` event
Grammar of Monkey events
test_case -> event *
event -> action ( x, y ) | ...
action -> DOWN | MOVE | UP
x -> 0 | 1 | ... | x_limit
y -> 0 | 1 | ... | y_limit
- Input: `DOWN(0,0); MOVE(1,1); UP(2,2);`
- Expected output: a `DOWN` event at coordinate (0,0), a `MOVE` event at coordinate (1,1), and an `UP` event at coordinate (2,2)
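The event grammar above maps directly onto a small random generator. This is a sketch of the grammar only, not of the real Monkey tool: the function names, the `max_events` cap, and the 1080x1920 screen limits in the usage line are assumptions, and the elided `...` alternative of `event` is left out.

```python
import random
import re

ACTIONS = ("DOWN", "MOVE", "UP")
# Matches one generated event, e.g. "MOVE(12,340)"; used to validate output.
EVENT_PATTERN = re.compile(r"(DOWN|MOVE|UP)\((\d+),(\d+)\)")

def random_event(rng: random.Random, x_limit: int, y_limit: int) -> str:
    """event -> action ( x, y ), with x in [0, x_limit] and y in [0, y_limit]."""
    action = rng.choice(ACTIONS)
    return f"{action}({rng.randint(0, x_limit)},{rng.randint(0, y_limit)})"

def random_test_case(rng: random.Random, x_limit: int, y_limit: int,
                     max_events: int = 10) -> list:
    """test_case -> event *, capped at max_events so generation terminates."""
    count = rng.randint(0, max_events)
    return [random_event(rng, x_limit, y_limit) for _ in range(count)]

if __name__ == "__main__":
    print("; ".join(random_test_case(random.Random(0), 1080, 1920)))
```

Because each event is drawn independently, the generator can emit sequences such as an `UP` with no preceding `DOWN`. The grammar as written permits this.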
Microsoft Cuzz (concurrent testing)
- works by generating random inputs and concurrent schedules of threads to test the application’s behaviour
- detects and reports any race conditions or other concurrency-related bugs that are found during testing
- Main idea: automate the approach of inserting `sleep()` calls systematically
- gives a worst-case probabilistic guarantee on finding bugs
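The scheduling idea published for Cuzz (probabilistic concurrency testing) is priority based: each thread gets a random priority, the highest-priority runnable thread always runs, and at a few randomly chosen steps the running thread's priority is lowered. The sketch below is a heavily simplified model of that idea, assuming threads are plain lists of step labels that never block; `pct_schedule` and its parameters are hypothetical names, and the real tool controls a live program instead.

```python
import random

def pct_schedule(threads, depth, seed=0):
    """Return the order in which (thread index, step) pairs run.

    threads: list of step lists, one per thread (steps never block here).
    depth:   assumed bug depth d; d - 1 priority change points are used.
    """
    rng = random.Random(seed)
    n = len(threads)
    k = sum(len(steps) for steps in threads)
    # Distinct random initial priorities, all >= depth, so that lowered
    # priorities (values below depth) always rank beneath them.
    priorities = rng.sample(range(depth, depth + n), n)
    # d - 1 random step counts at which the running thread is demoted.
    change_points = [rng.randint(1, k) for _ in range(depth - 1)] if k else []

    next_step = [0] * n
    schedule = []
    for count in range(1, k + 1):
        runnable = [t for t in range(n) if next_step[t] < len(threads[t])]
        t = max(runnable, key=lambda i: priorities[i])
        schedule.append((t, threads[t][next_step[t]]))
        next_step[t] += 1
        for i, point in enumerate(change_points, start=1):
            if point == count:
                priorities[t] = depth - i  # demote the thread that just ran
    return schedule

if __name__ == "__main__":
    print(pct_schedule([["a1", "a2", "a3"], ["b1", "b2"]], depth=2, seed=1))
```

With `depth=1` there are no change points, so one thread simply runs to completion before the other. Each extra change point allows one more forced context switch at a random position.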
Probabilistic guarantee
Given a program with:
- n threads (~tens)
- k steps (~millions)
- a bug of depth d (1 or 2)
Cuzz will find the bug in a single run with a probability of at least $\frac{1}{n \cdot k^{d-1}}$
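To get a feel for the numbers, the snippet below evaluates the lower bound $1/(n \cdot k^{d-1})$ that is published for the scheduling algorithm behind Cuzz, using the rough sizes listed above. The function name `pct_lower_bound` and the concrete values n = 10 and k = 1,000,000 are assumptions chosen for illustration.

```python
from fractions import Fraction

def pct_lower_bound(n: int, k: int, d: int) -> Fraction:
    """Worst-case lower bound 1 / (n * k**(d - 1)) for one run."""
    return Fraction(1, n * k ** (d - 1))

if __name__ == "__main__":
    n, k = 10, 1_000_000  # assumed: ~tens of threads, ~millions of steps
    for d in (1, 2):
        p = pct_lower_bound(n, k, d)
        print(f"d={d}: probability >= {p} per run")
```

For a depth-1 bug the bound is 1/10 regardless of k, while for depth 2 it drops to 1/10,000,000. This is a worst-case bound, so the probability observed in practice can be much higher.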
Example
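import threading

counter = 0

# function that each thread will run
# (not thread safe: `counter += 1` is a separate read, add, and write)
def increment_counter():
    global counter
    for _ in range(1000000):
        counter += 1

# fixed version (thread safe)
counter_lock = threading.Lock()

def increment_counter():
    global counter
    for _ in range(1000000):
        # only one thread at a time may update the counter
        with counter_lock:
            counter += 1

Quiz: Concurrency bug depth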
// thread 1:
lock(a);
lock(b);
g = g + 1;
unlock(b);
unlock(a);
// thread 2:
lock(b);
lock(a);
g = 0;
unlock(a);
unlock(b);
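Specify the depth of the concurrency bug, and specify all ordering constraints needed to trigger the bug. Statements are numbered 1 to 10 from top to bottom (thread 1 is statements 1 to 5, thread 2 is statements 6 to 10). Use the notation (x,y) to mean statement x comes before statement y, and separate multiple constraints by a space.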
Solution
Depth of the concurrency bug: 2
Ordering constraints: (1,7) (6,2)
Result: a deadlock. Thread 1 holds a and waits for b, while thread 2 holds b and waits for a.
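The solution can be checked by brute force. The sketch below enumerates every schedule of the two quiz threads and reports the ones that get stuck, assuming the statements are numbered 1 to 10 from top to bottom (thread 1 is 1 to 5, thread 2 is 6 to 10) and that locks are not re-entrant. The `THREADS` encoding and the `explore` function are illustrative.

```python
# Each statement is (number, kind, lock name); "op" stands for the update of g.
THREADS = [
    [(1, "lock", "a"), (2, "lock", "b"), (3, "op", None),
     (4, "unlock", "b"), (5, "unlock", "a")],
    [(6, "lock", "b"), (7, "lock", "a"), (8, "op", None),
     (9, "unlock", "a"), (10, "unlock", "b")],
]

def explore(pcs=(0, 0), held=frozenset(), trace=()):
    """Yield (trace, deadlocked) for every maximal schedule of THREADS."""
    enabled = []
    for t, pc in enumerate(pcs):
        if pc < len(THREADS[t]):
            _, kind, lock = THREADS[t][pc]
            # A thread is blocked if its next statement locks a held lock.
            if not (kind == "lock" and lock in held):
                enabled.append(t)
    if not enabled:
        finished = all(pc == len(THREADS[t]) for t, pc in enumerate(pcs))
        yield trace, not finished
        return
    for t in enabled:
        number, kind, lock = THREADS[t][pcs[t]]
        if kind == "lock":
            new_held = held | {lock}
        elif kind == "unlock":
            new_held = held - {lock}
        else:
            new_held = held
        new_pcs = tuple(pc + 1 if i == t else pc for i, pc in enumerate(pcs))
        yield from explore(new_pcs, new_held, trace + (number,))

results = list(explore())
deadlocks = [trace for trace, deadlocked in results if deadlocked]

if __name__ == "__main__":
    print("deadlocking schedules:", deadlocks)
```

The only schedules that get stuck are those in which statements 1 and 6 both run before either 2 or 7 can acquire its lock. That is exactly the pair of constraints (1,7) and (6,2), which is why the depth is 2.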
Note
Slides up to slide 76/77