Note: Starting at 3-Introduction to Testing.pdf, slide 41
Random testing
AFL vs LibFuzzer
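- AFL runs the target as a separate process and mutates inputs externally (it still uses lightweight instrumentation for coverage feedback, so it is gray-box rather than purely black-box)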
- LibFuzzer is in-process and uses compiler instrumentation to guide input mutations more directly
| AFL | LibFuzzer |
|---|---|
| standalone tool | library that can be integrated into a larger testing framework |
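To make the in-process idea concrete, here is a minimal Python sketch. The fuzzer calls the target function directly in its own process for every mutated input, roughly the way libFuzzer repeatedly calls a user-supplied `LLVMFuzzerTestOneInput` entry point, whereas AFL executes the target as a separate process. `toy_target`, `mutate` and `fuzz_in_process` are hypothetical names, a Python exception stands in for a real crash, and unlike both real tools this loop has no coverage feedback, so it is plain random (mutation-based) testing.

```python
import random
from typing import Callable, Optional


def toy_target(data: bytes) -> None:
    """Hypothetical function under test: 'crashes' when the input starts with '!'."""
    if data[:1] == b"!":
        raise ValueError("simulated crash")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte with a random value (returns one byte if data is empty)."""
    if not data:
        return bytes([rng.randrange(256)])
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz_in_process(target: Callable[[bytes], None], seed_input: bytes,
                    iterations: int = 100_000, seed: int = 0) -> Optional[bytes]:
    """Call target directly, in this process, on mutated inputs; return the first crashing input."""
    rng = random.Random(seed)
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            target(candidate)  # no new process per input: this is what "in-process" means
        except Exception:
            return candidate
    return None


crash = fuzz_in_process(toy_target, b"ab")
print(crash)
```

Running the target in-process avoids the cost of starting a process per input, at the price that a real crash (or a leaked resource) in the target also affects the fuzzer itself.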
OSS-Fuzz
a Google service that uses ClusterFuzz (+ sanitizers and fuzzers like AFL/LibFuzzer) to continuously fuzz open-source projects
- has discovered over 17400 bugs from 2016 to 2019 in many large projects (e.g. OpenSSL, LLVM, PostgreSQL, Git, Firefox)
ClusterFuzz
Google’s scalable fuzzing infrastructure
- used in OSS-Fuzz & to fuzz the Chrome browser
- as of January 2019, it has found ~16000 bugs in Chrome and ~11000 bugs in 160+ open-source projects
- highly scalable (1000+ machines)
- accurate deduplication of crashes
- fully automatic bug filing for issue trackers
- analytics & web interface
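ClusterFuzz's real deduplication is more elaborate than this. As a rough sketch of the idea under our own assumptions, crash reports can be grouped by a signature built from the crash type and the top few stack frames, so that many crashing inputs hitting the same bug become one report. The report format (dicts with `type` and `frames`), the choice of three frames, and all names are hypothetical.

```python
from collections import defaultdict


def crash_signature(crash: dict, top_n: int = 3) -> tuple:
    """Hypothetical dedup key: crash type plus the top `top_n` stack frames."""
    return (crash["type"], tuple(crash["frames"][:top_n]))


def deduplicate(crashes: list, top_n: int = 3) -> dict:
    """Group raw crash reports so that each group (probably) corresponds to one bug."""
    groups = defaultdict(list)
    for crash in crashes:
        groups[crash_signature(crash, top_n)].append(crash)
    return dict(groups)


crashes = [
    {"type": "heap-buffer-overflow", "frames": ["parse_header", "read_chunk", "main"]},
    {"type": "heap-buffer-overflow", "frames": ["parse_header", "read_chunk", "main", "start"]},
    {"type": "null-deref", "frames": ["free_node", "destroy_tree", "main"]},
]
groups = deduplicate(crashes)
print(len(groups))  # prints 2: the first two reports differ only below the top 3 frames
```

Using only the top frames is a trade-off: too few frames merges distinct bugs, too many splits one bug reached through different call paths.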
Grammar-based fuzzing
there are different ways to formally describe a language:
- Regular expressions: describe the simplest class of languages (regular languages)
  - example: [a-z]* denotes a (possibly empty) sequence of lowercase letters
- Context-free grammars: can express a wide range of properties of an input language (e.g. the syntactic structure of an input format)
  - e.g. a grammar for simple arithmetic expressions:
    - expression → term | expression + term
    - term → factor | term * factor
    - factor → integer | (expression)
    - integer → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
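Grammar-based fuzzing generates inputs by randomly expanding such a language description instead of mutating bytes. The Python sketch below does this for both examples: `random_lowercase_word` produces members of [a-z]*, and `expand` randomly derives arithmetic expressions from the grammar. The dictionary encoding of the grammar, the depth limit and the function names are our own illustrative choices, not a particular tool's API.

```python
import random
import re

# The context-free grammar from the notes; nonterminals are written in <angle brackets>.
# In every recursive rule the FIRST alternative is non-recursive; the depth limit relies on that.
GRAMMAR = {
    "<expression>": [["<term>"], ["<expression>", "+", "<term>"]],
    "<term>": [["<factor>"], ["<term>", "*", "<factor>"]],
    "<factor>": [["<integer>"], ["(", "<expression>", ")"]],
    "<integer>": [[digit] for digit in "0123456789"],
}


def expand(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 6) -> str:
    """Randomly derive a string from `symbol` using GRAMMAR."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    alternatives = GRAMMAR[symbol]
    has_nonterminals = any(s in GRAMMAR for alt in alternatives for s in alt)
    if depth >= max_depth and has_nonterminals:
        alternatives = alternatives[:1]  # past the budget: take the non-recursive alternative
    chosen = rng.choice(alternatives)
    return "".join(expand(s, rng, depth + 1, max_depth) for s in chosen)


def random_lowercase_word(rng: random.Random, max_len: int = 8) -> str:
    """Random member of the regular language [a-z]* (length 0..max_len)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(rng.choice(letters) for _ in range(rng.randint(0, max_len)))


rng = random.Random(1)
word = random_lowercase_word(rng)
assert re.fullmatch(r"[a-z]*", word)  # every generated word is in the language
expr = expand("<expression>", rng)
# eval is safe here: the grammar only produces digits, +, * and parentheses
print(repr(word), expr, eval(expr))
```

Every generated expression is syntactically valid by construction, which lets a fuzzer get past the parser and exercise deeper program logic.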
Testing concurrent programs
uncovering bugs in concurrent programs requires not only discovering specific program inputs, but also specific thread schedules
- Solution: add random delays using sleep(x) (adding these delays has the effect of attempting different thread schedules → hopefully one causes a bug)
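A minimal Python sketch of this idea, assuming a toy shared counter: `time.sleep` plays the role of sleep(x), and each run draws fresh random delays, so repeated runs try different interleavings. The names are hypothetical, and how many runs actually lose an update depends on the machine's scheduler, so no particular count is guaranteed.

```python
import random
import threading
import time


def racy_increment(state: dict, delay: float) -> None:
    """Unsynchronized read-modify-write with an injected delay between the read and the write."""
    value = state["counter"]      # read the shared value
    time.sleep(delay)             # injected random delay: perturbs the thread schedule
    state["counter"] = value + 1  # write back, possibly overwriting the other thread's update


def run_once(max_delay: float, rng: random.Random) -> int:
    """Run two incrementing threads, each with its own random delay; return the final counter."""
    state = {"counter": 0}
    # delays are drawn here, in the main thread, so the Random object is never shared
    threads = [threading.Thread(target=racy_increment, args=(state, rng.uniform(0, max_delay)))
               for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


def count_failures(trials: int = 20, max_delay: float = 0.01, seed: int = 0) -> int:
    """Repeat the test with new random delays; count runs that lost an update (counter != 2)."""
    rng = random.Random(seed)
    return sum(1 for _ in range(trials) if run_once(max_delay, rng) != 2)


print(count_failures(), "of 20 runs exposed the lost update")
```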
Depth of a concurrency bug
Bug depth: the number of ordering constraints a schedule has to satisfy to trigger a bug
- intuitively, how deeply the bug is hidden among the possible schedules: the more constraints, the fewer schedules expose it and the harder it is to find
- Ordering constraint: a required order between operations executed by different threads (e.g. a read in thread 2 must happen before a write in thread 1)
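To see why depth matters, the sketch below enumerates every interleaving of two abstract threads (steps A1 A2 A3 and B1 B2 B3, hypothetical) and counts those satisfying a set of ordering constraints. A depth-1 bug needing B1 before A1 appears in 10 of the 20 interleavings; adding a second constraint, A3 before B2, leaves only 1. Counting interleavings is only a proxy, since a real scheduler does not pick interleavings uniformly.

```python
from itertools import combinations


def interleavings(t1: list, t2: list):
    """Yield every merge of t1 and t2 that preserves each thread's program order."""
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):
        chosen = set(slots)  # positions taken by thread 1
        it1, it2 = iter(t1), iter(t2)
        yield [next(it1) if i in chosen else next(it2) for i in range(n)]


def count_triggering(t1: list, t2: list, constraints: list) -> int:
    """Count interleavings where every (x, y) constraint holds, i.e. step x runs before step y."""
    return sum(
        all(s.index(x) < s.index(y) for x, y in constraints)
        for s in interleavings(t1, t2)
    )


A = ["A1", "A2", "A3"]
B = ["B1", "B2", "B3"]
print(len(list(interleavings(A, B))))                        # 20
print(count_triggering(A, B, [("B1", "A1")]))                # 10 (depth-1 bug)
print(count_triggering(A, B, [("B1", "A1"), ("A3", "B2")]))  # 1 (depth-2 bug)
```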
Case Studies
Google Monkey (Android testing)
- generates TOUCH(x,y), where x and y are randomly generated
- generates MOVE(x2,y2), where x2 and y2 are randomly generated
- generates MOTION(..), which consists of a DOWN() event somewhere on the screen, a sequence of MOVE(..) events, and an UP() event
Grammar of Monkey events
test_case -> event *
event -> action ( x, y ) | ...
action -> DOWN | MOVE | UP
x -> 0 | 1 | ... | x_limit
y -> 0 | 1 | ... | y_limit
- Input: DOWN(0,0); MOVE(1,1); UP(2,2);
- Expected output: a DOWN event happens at coordinate (0,0), a MOVE event at (1,1), and an UP event at (2,2)
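Because the grammar describes every valid event sequence, it can drive a generator. A Python sketch under our own assumptions: events use the same text form as the input above, the screen is assumed to be 1080 x 1920 pixels (`X_LIMIT` and `Y_LIMIT` are hypothetical stand-ins for x_limit and y_limit), and the function names are hypothetical, not the real Monkey tool's interface.

```python
import random
import re

X_LIMIT, Y_LIMIT = 1079, 1919  # hypothetical screen size (1080 x 1920 pixels)
EVENT_RE = re.compile(r"(DOWN|MOVE|UP)\((\d+),(\d+)\);")


def random_event(rng: random.Random, action: str = "") -> str:
    """event -> action(x, y); picks a random action unless one is given."""
    action = action or rng.choice(["DOWN", "MOVE", "UP"])
    return f"{action}({rng.randint(0, X_LIMIT)},{rng.randint(0, Y_LIMIT)})"


def random_test_case(rng: random.Random, max_events: int = 10) -> str:
    """test_case -> event*: a random number of random events, each terminated by ';'."""
    return " ".join(random_event(rng) + ";" for _ in range(rng.randint(0, max_events)))


def random_motion(rng: random.Random, moves: int = 3) -> str:
    """MOTION: a DOWN somewhere on the screen, a sequence of MOVEs, then an UP."""
    events = [random_event(rng, "DOWN")]
    events += [random_event(rng, "MOVE") for _ in range(moves)]
    events.append(random_event(rng, "UP"))
    return " ".join(e + ";" for e in events)


def is_valid_test_case(test_case: str) -> bool:
    """Check a string against the grammar, including the coordinate limits."""
    if not test_case:
        return True  # event* allows the empty test case
    for token in test_case.split(" "):
        m = EVENT_RE.fullmatch(token)
        if not m or int(m.group(2)) > X_LIMIT or int(m.group(3)) > Y_LIMIT:
            return False
    return True


rng = random.Random(0)
print(random_test_case(rng))
print(random_motion(rng))
print(is_valid_test_case("DOWN(0,0); MOVE(1,1); UP(2,2);"))  # True
```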
Microsoft Cuzz (concurrency testing)
- works by running the application (e.g. an existing test) under randomized thread schedules to test its behaviour
- exposes race conditions and other concurrency bugs (e.g. atomicity violations, deadlocks) that manifest during these runs
- Main idea: automate the approach of inserting sleep() calls, doing it systematically
- gives a worst-case probabilistic guarantee on finding bugs
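Cuzz is based on the PCT (Probabilistic Concurrency Testing) randomized scheduler. Below is a simplified Python simulation of that idea under our own assumptions: threads are abstract step counts that never block, each thread gets a random distinct priority, the highest-priority runnable thread always runs, and d-1 random change points lower the running thread's priority below all initial ones. This models the algorithm's structure and is not Cuzz's implementation; the names are hypothetical.

```python
import random


def pct_schedule(step_counts: list, depth: int, rng: random.Random) -> list:
    """Return one randomized PCT-style schedule: the thread id that runs at each step.

    Simplified model: thread t has step_counts[t] steps and never blocks.
    """
    n, k = len(step_counts), sum(step_counts)
    # Random distinct initial priorities depth .. depth+n-1 (higher value runs first).
    initial = list(range(depth, depth + n))
    rng.shuffle(initial)
    priority = dict(enumerate(initial))
    # depth-1 random priority change points among steps 1..k.
    change_points = rng.sample(range(1, k + 1), depth - 1)
    remaining = list(step_counts)
    schedule = []
    for step in range(1, k + 1):
        runnable = [t for t in range(n) if remaining[t] > 0]
        current = max(runnable, key=lambda t: priority[t])  # highest priority runs
        schedule.append(current)
        remaining[current] -= 1
        if step in change_points:
            # i-th change point: drop the running thread below every initial priority
            i = change_points.index(step) + 1
            priority[current] = depth - i
    return schedule


rng = random.Random(0)
print(pct_schedule([3, 3], depth=1, rng=rng))  # no change points: threads run one after another
print(pct_schedule([3, 3], depth=2, rng=rng))  # one change point can force one preemption
```

Instead of sprinkling sleeps everywhere, the scheduler injects exactly d-1 preemptions at random points, which is what makes a per-run bound for depth-d bugs possible.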
Probabilistic guarantee
Given a program with:
- n threads (~tens)
- k steps (~millions)
- a bug of depth d (typically 1 or 2)
Cuzz will find the bug in a single run with a probability of at least $\frac{1}{n k^{d-1}}$
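Plugging typical sizes into the PCT bound of 1/(n · k^(d-1)) per run shows why the guarantee is most useful for shallow bugs. The sketch computes the bound and, assuming runs are independent, how many runs give a chosen confidence of at least one hit. n = 10 and k = 10^6 are illustrative values, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def pct_bound(n_threads: int, k_steps: int, depth: int) -> Fraction:
    """Worst-case lower bound on the per-run probability of hitting a depth-d bug: 1 / (n * k^(d-1))."""
    return Fraction(1, n_threads * k_steps ** (depth - 1))


def runs_for_confidence(p: Fraction, confidence: float) -> int:
    """Smallest r with 1 - (1 - p)^r >= confidence, assuming independent runs."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - float(p)))


p1 = pct_bound(10, 10**6, 1)  # depth 1
p2 = pct_bound(10, 10**6, 2)  # depth 2
print(p1, p2, runs_for_confidence(p1, 0.99))  # 1/10 1/10000000 44
```

For depth 1 the bound does not depend on k at all, while each extra level of depth costs another factor of k, which is why depth 1 and 2 are the practical cases.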
Example
import threading

counter = 0

# function that each thread will run (NOT thread safe: counter += 1 is a read-modify-write)
def increment_counter():
    global counter
    for _ in range(1000000):
        counter += 1

# fixed version (thread safe)
counter_lock = threading.Lock()

def increment_counter():
    global counter
    for _ in range(1000000):
        with counter_lock:
            counter += 1
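The example defines only the thread function, so a small driver is needed to watch the race. The sketch below (hypothetical names; a dict replaces the global counter so both variants can share one driver) runs four threads and compares the unsynchronized and locked increments. In CPython the lost updates of the unsafe version may be rare or even absent on a given run, which is exactly why such bugs are hard to find by plain repetition.

```python
import threading

N_THREADS, N_ITERATIONS = 4, 100_000


def run(increment) -> int:
    """Start N_THREADS threads that each call `increment` N_ITERATIONS times; return the total."""
    state = {"counter": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(N_ITERATIONS):
            increment(state, lock)

    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]


def unsafe_increment(state, lock):
    state["counter"] += 1  # read, add, write: another thread can run in between


def safe_increment(state, lock):
    with lock:  # only one thread at a time performs the read-modify-write
        state["counter"] += 1


print("expected:", N_THREADS * N_ITERATIONS)
print("without lock:", run(unsafe_increment))  # may be lower; depends on interpreter and timing
print("with lock:", run(safe_increment))       # always N_THREADS * N_ITERATIONS
```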
Quiz: Concurrency bug depth
// thread 1:
lock(a);
lock(b);
g = g + 1;
unlock(b);
unlock(a);
// thread 2:
lock(b);
lock(a);
g = 0;
unlock(a);
unlock(b);
Specify the depth of the concurrency bug, and specify all ordering constraints needed to trigger the bug (use the notation (x,y) to mean statement x comes before statement y, and separate multiple constraints by a space)
Solution
Depth of the concurrency bug: 2
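Ordering constraints: (1,7) (6,2), numbering thread 1's statements 1 to 5 and thread 2's statements 6 to 10; i.e. thread 1's lock(a) comes before thread 2's lock(a), and thread 2's lock(b) comes before thread 1's lock(b)
→ each thread then holds one lock while waiting for the other, so a deadlock happens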
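The answer can be checked mechanically. The sketch below models the two threads as lists of statements, numbering thread 1's statements 1 to 5 and thread 2's 6 to 10 (the numbering that (1,7) (6,2) uses), explores every schedule with blocking lock semantics, and records the statements executed before each deadlock. Only two schedules deadlock, [1, 6] and [6, 1]: in both, statement 1 runs before thread 2 can reach statement 7 and statement 6 runs before thread 1 can reach statement 2. The modelling choices (non-reentrant locks, no ownership tracking) and names are ours.

```python
# Quiz threads as (statement number, kind, argument).
THREAD1 = [(1, "lock", "a"), (2, "lock", "b"), (3, "op", "g = g + 1"), (4, "unlock", "b"), (5, "unlock", "a")]
THREAD2 = [(6, "lock", "b"), (7, "lock", "a"), (8, "op", "g = 0"), (9, "unlock", "a"), (10, "unlock", "b")]


def find_deadlocks(threads: list) -> list:
    """Explore all schedules depth-first; return the statement trace of each one that deadlocks."""
    deadlocks = []

    def explore(pcs: list, held: frozenset, trace: list) -> None:
        # A thread can step if it has statements left and is not waiting for a held lock.
        runnable = [
            tid for tid, pc in enumerate(pcs)
            if pc < len(threads[tid])
            and not (threads[tid][pc][1] == "lock" and threads[tid][pc][2] in held)
        ]
        if not runnable:
            if any(pc < len(t) for pc, t in zip(pcs, threads)):
                deadlocks.append(trace)  # unfinished threads, none can move: deadlock
            return
        for tid in runnable:
            stmt, kind, arg = threads[tid][pcs[tid]]
            if kind == "lock":
                new_held = held | {arg}
            elif kind == "unlock":
                new_held = held - {arg}
            else:
                new_held = held
            new_pcs = pcs.copy()
            new_pcs[tid] += 1
            explore(new_pcs, new_held, trace + [stmt])

    explore([0] * len(threads), frozenset(), [])
    return deadlocks


print(find_deadlocks([THREAD1, THREAD2]))  # [[1, 6], [6, 1]]
```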
Note: Slides up to slide 76/77