Random testing
- software testing method where a program is bombarded with large amounts of random, unexpected, or malformed input data
- covers crashes, memory leaks, or security vulnerabilities
- finds edge cases
- usually a black-box testing method (source code is not required to generate random inputs)
Origin
- Professor Barton Paul Miller (University of Wisconsin-Madison)
- a thunderstorm (random electrical noise) corrupted inputs sent over dial-up connections, causing his programs to crash
Fuzzing
- Idea: test cases are selected at random from a larger pool of test cases
- special case of mutation analysis (input mutation)
The first fuzzing study
- 1990: command line fuzzer, tested reliability of existing UNIX programs
- caused 25-33% of UNIX utility programs to crash or hang
- 1995: extended the study to GUI-based programs, network protocols, and system library APIs
- “Even worse is that many of the same bugs that we reported in 1990 are still present in the code releases of 1995.”
First generation
```mermaid
flowchart LR
    f[Fuzzer]
    gen[[Randomly generate input]]
    i1[Input #1]
    i2[Input #2]
    i3[Input #3]
    i4["H@5^23#t"]
    run[[Run on inputs]]
    p[Program]
    coverage(("Poor\nCoverage!"))
    cmd["./Program < /dev/random"]
    f --> gen --> i1 & i2 & i3 & i4 --> run --> p
    p -.-> coverage
    f --- cmd
```
Second generation
```mermaid
flowchart LR
    subgraph corpus[ ]
        direction TB
        in1[Input]
        in2[Input]
        dots["..."]
        in3[Input]
    end
    pick[[Pick an Input]]
    mutate[[Mutate the Input]]
    run[[Run on Inputs]]
    f[Fuzzer]
    gen1[Mutated Input #1]
    gen2[Mutated Input #2]
    gen3["<!BTTLIST>"]
    p[Program]
    corpus --> pick --> f
    f --> mutate --> gen1 & gen2 & gen3 --> run --> p
```
- mutates every seed in the corpus → generates multiple mutated samples from each seed
Example:
- user input seed 1
<user>
<name>Alice</name>
<age>25</age>
</user>
- user input seed 2
<book>
<title>Fuzzing 101</title>
<author>Barton Miller</author>
</book>
e.g. mutated samples generated from seed 1:
<user>
<name>Alice</name>
<age>25</age>
</user>
<user>
<name>Ali@@@@ce</name>
<age>25</age>
</user>
<user>
<name>Alice</name>
<age>25</age>
<age>999</age>
</user>
<user>
<name>
<age>25</age>
</name>
</user>
Third generation
```mermaid
flowchart LR
    subgraph corpus[ ]
        direction TB
        in1[Input]
        in2[Input]
        dots["..."]
        in3[Input]
    end
    pick[[Pick an Input]]
    f[Fuzzer]
    mutate[[Mutate the Input]]
    run[[Run on Inputs]]
    p[Program]
    feedback[Feedback]
    decision{Interesting?}
    yes[[Yes: add Input]]
    no[[No: discard Input]]
    gen1[Mutated Input #1]
    gen2[Mutated Input #2]
    gen3["<!BTTLIST>"]
    note["new paths discovered? longer execution traces? or compliance with target specifications?"]
    corpus --> pick --> f
    f --> mutate --> gen1 & gen2 & gen3 --> run --> p
    p --> feedback --> decision
    decision -->|Yes| yes --> corpus
    decision -->|No| no
    decision -.-> note
```
- feedback loop
- using feedback from the system to guide the testing process, testers can be more thorough and efficient in uncovering defects and bugs
Example: testing a search function on a website
- random input search queries, including special characters or accented characters
- if the search function returns unexpected results, tester can use feedback to guide the selection of subsequent inputs and test cases
The infinite monkey theorem
A monkey hitting keys at random on a typewriter keyboard will produce any given text, such as the complete works of Shakespeare, with probability approaching 1 as time increases.
Examples
Testing a video game
- a tester randomly selects different levels, characters, and weapons to use
- finds issues that might not be found through more traditional methods
Testing a financial application
- a tester randomly deposits, withdraws, inputs random numbers, checks balance, etc
Implementation
- the random_test function takes a list of test cases as input
- each test case is a tuple with input data and expected output
- random_test runs 10 iterations; each iteration selects and runs a random test case
- the output is checked against the expected output
import random

def random_test(test_cases):
    for i in range(10):
        test_case = random.choice(test_cases)
        input_data = test_case[0]
        expected_output = test_case[1]
        # run the function being tested with the input_data
        output = function_being_tested(*input_data)
        # check if the output matches the expected output
        if output != expected_output:
            print("Test case failed: input was", input_data,
                  "expected output was", expected_output, "but got", output)
        else:
            print("Test case passed: input was", input_data)
Example
#include <stdio.h>
#include <string.h>

void copy_string(char* destination, char* source) {
    strcpy(destination, source);
}

int main(int argc, char** argv) {
    char destination[10];
    copy_string(destination, argv[1]);
    printf("Copied string: %s\n", destination);
    return 0;
}
Are there any bugs? How can random testing help?
Solution
destination has a fixed size of 10 bytes (9 characters + null terminator), so any input longer than 9 characters causes a buffer overflow.
random testing can find an issue like this since you would be generating random inputs of all kinds (e.g. a string with length > 9)
Random testing code
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

// copy_string as defined above; generate_random_string is defined below
void copy_string(char* destination, char* source);
char* generate_random_string(size_t length);

int main() {
    srand(time(0)); // Initialize random seed
    char destination[10];
    size_t num_tests = 20; // Number of random tests
    size_t max_random_string_size = 50; // Maximum size of the generated random string
    for (size_t i = 0; i < num_tests; ++i) {
        // Generate random string of random length
        size_t random_length = rand() % max_random_string_size;
        char *random_string = generate_random_string(random_length);
        printf("Test %zu: Source string length = %zu, Source string = %s\n",
               i + 1, random_length, random_string);
        // Perform the copy operation and observe the potential overflow
        copy_string(destination, random_string);
        // Display the destination string
        printf("Destination string: %s\n\n", destination);
        free(random_string);
    }
    return 0;
}

// Function to generate a random string
char* generate_random_string(size_t length) {
    const char charset[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";
    char *random_string = malloc(length + 1);
    if (random_string) {
        for (size_t n = 0; n < length; n++) {
            int random_index = rand() % (sizeof(charset) - 1);
            random_string[n] = charset[random_index];
        }
        random_string[length] = '\0';
    }
    return random_string;
}
Fixed code
#include <stdio.h>
#include <string.h>

void copy_string(char* destination, char* source, size_t size) {
    strncpy(destination, source, size - 1);
    destination[size - 1] = '\0';
}

int main(int argc, char** argv) {
    if (argc < 2) {
        printf("No input string provided.\n");
        return 1;
    }
    char destination[10];
    copy_string(destination, argv[1], sizeof(destination));
    printf("Copied string: %s\n", destination);
    return 0;
}
What kinds of bugs can random testing / fuzzing find?
- Input validation errors: invalid input such as input that is out of range or in the wrong format
- Race condition errors: related to concurrent access to shared resources, such as data race conditions or synchronization issues
- Boundary value errors: related to the handling of edge or corner cases, such as overflow or underflow conditions
- Error handling: related to how the system handles errors and exceptions, such as unexpected crashes or incorrect error messages
- Resource leaks: related to the management of resources, such as memory leaks or file descriptor leaks
- Compatibility issues: related to the compatibility of the software with different operating systems, browsers, or hardware configurations
- Security vulnerabilities: related to security, such as SQL injection or cross-site scripting (XSS) vulnerabilities
- Performance issues: related to the performance of the software, such as high CPU usage or slow response times
Pros and cons
Pros:
- easy to implement
- can achieve good coverage given enough tests
- can work with programs of any format
- appealing for finding many problems
Cons:
- inefficient test suite
- time consuming and resource intensive to generate large numbers of inputs
- might find bugs that are unimportant
- difficult to reproduce bugs
- poor & uneven code coverage
Examples
Uneven code coverage: example 1
void test_me(int x) {
    int y = x + 3;  // could be run 5 million times
    if (y == 13) {  // and y could just never be 13
        ERROR;      // meaning the error never gets hit
    }
}
- need up to 2^32 tests to be 100% sure of finding the "bug" (only one 32-bit value of x makes y == 13)
- assuming each test is unique
AFL: American Fuzzy Lop
- brute force fuzzer coupled with an exceedingly simple but rock-solid instrumentation guided genetic algorithm
- arguably the best known coverage guided fuzzing tool
- Steps:
1. load user-supplied initial test cases into the queue
2. take the next input file from the queue
3. attempt to trim the test case to the smallest size that doesn’t alter the measured behaviour of the program
4. repeatedly mutate the file using a balanced and well-researched variety of traditional fuzzing strategies
5. if any of the generated mutations resulted in a new state transition recorded by the instrumentation, add the mutated output as a new entry in the queue
6. go to 2
LibFuzzer
- Motivation: enable fuzzing of libraries or smaller units (i.e., program components) instead of whole programs
- user provides fuzzing entry points called fuzz targets
- Intuition: if program has X lines of code and Y fuzz targets, then fuzzer only has to cover X / Y lines of code on average per target
- Fuzz target: a function that takes an array of bytes as input and performs something interesting with the bytes using the API under test
- fuzz target is executed by the fuzzer multiple times with different data
Note
Ended at slide 40