Random testing

  • software testing method where a program is bombarded with large amounts of random, unexpected, or malformed input data
    • can uncover crashes, memory leaks, or security vulnerabilities
    • finds edge cases
  • usually a black-box testing method (source code is not needed to generate random inputs)

Origin

  • Professor Barton Paul Miller (University of Wisconsin-Madison)
  • a thunderstorm (random electrical noise) corrupted inputs sent over a dial-up connection, causing his programs to crash

Fuzzing

  • Idea: test cases are selected at random from a larger pool of test cases
  • special case of mutation analysis (input mutation)

The first fuzzing study

  • 1990: command line fuzzer, tested reliability of existing UNIX programs
    • caused 25-33% of UNIX utility programs to crash or hang
  • 1995: extended fuzzing to GUI-based programs, network protocols, and system library APIs
    • “Even worse is that many of the same bugs that we reported in 1990 are still present in the code releases of 1995.”

First generation

flowchart LR
    f[Fuzzer]
    gen[[Randomly generate input]]
    i1[Input #1]
    i2[Input #2]
    i3[Input #3]
    i4["H@5^23#t"]
    run[[Run on inputs]]
    p[Program]
    coverage(("Poor\nCoverage!"))
    cmd["./Program < /dev/random"]

    f --> gen --> i1 & i2 & i3 & i4 --> run --> p
    p -.-> coverage
    f --- cmd
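
A first-generation fuzzer can be as simple as a generator of purely random bytes whose output is piped into the program under test. A minimal sketch (the program name in the comment is only a placeholder):

/* gen1_fuzz.c -- minimal sketch of a first-generation fuzzer.
 * Assumption: the program under test reads its input from stdin,
 * e.g.  ./gen1_fuzz | ./target_program   (target_program is a placeholder)
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srand((unsigned)time(NULL));        // seed the pseudo-random generator

    size_t num_bytes = 1024;            // size of one random input
    for (size_t i = 0; i < num_bytes; i++) {
        putchar(rand() % 256);          // emit a purely random byte
    }
    return 0;
}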

Second generation

flowchart LR
    subgraph corpus[ ]
        direction TB
        in1[Input]
        in2[Input]
        dots["..."]
        in3[Input]
    end

    pick[[Pick an Input]]
    mutate[[Mutate the Input]]
    run[[Run on Inputs]]

    f[Fuzzer]
    gen1[Mutated Input #1]
    gen2[Mutated Input #2]
    gen3["<!BTTLIST>"]

    p[Program]

    corpus --> pick --> f
    f --> mutate --> gen1 & gen2 & gen3 --> run --> p
  • tests all seeds and generates mutated samples for each seed (a minimal mutation sketch in C follows the example below)

Example:

  • user-provided input seed #1
<user>
    <name>Alice</name>
    <age>25</age>
</user>
  • user-provided input seed #2
<book>
    <title>Fuzzing 101</title>
    <author>Barton Miller</author>
</book>

e.g., seed #1 followed by mutated variants of it:

<user>
    <name>Alice</name>
    <age>25</age>
</user>
 
<user>
    <name>Ali@@@@ce</name>
    <age>25</age>
</user>
 
<user>
    <name>Alice</name>
    <age>25</age>
    <age>999</age>
</user>
 
<user>
    <name>
        <age>25</age>
    </name>
</user>
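
A minimal sketch of how such mutated samples could be produced from a seed (assumptions: a single hard-coded seed string and a byte-overwrite mutation; real mutation-based fuzzers combine many more strategies such as insertions, deletions, and bit flips):

/* gen2_mutate.c -- minimal sketch of second-generation (mutation-based) fuzzing.
 * A seed input is copied and a few random positions are overwritten,
 * producing malformed variants similar to the XML examples above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

// Overwrite num_flips random positions of buf with random printable characters.
static void mutate(char *buf, size_t len, int num_flips) {
    for (int i = 0; i < num_flips && len > 0; i++) {
        size_t pos = rand() % len;
        buf[pos] = (char)(' ' + rand() % 95);   // random printable ASCII character
    }
}

int main(void) {
    srand((unsigned)time(NULL));

    const char *seed = "<user><name>Alice</name><age>25</age></user>";

    for (int i = 0; i < 3; i++) {               // produce three mutated samples
        char sample[256];
        strcpy(sample, seed);                   // start from the seed
        mutate(sample, strlen(sample), 2);      // mutate two random positions
        printf("Mutated input #%d: %s\n", i + 1, sample);
    }
    return 0;
}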

Third generation

flowchart LR
    subgraph corpus[ ]
        direction TB
        in1[Input]
        in2[Input]
        dots["..."]
        in3[Input]
    end

    pick[[Pick an Input]]
    f[Fuzzer]
    mutate[[Mutate the Input]]
    run[[Run on Inputs]]
    p[Program]
    feedback[Feedback]
    decision{Interesting?}
    yes[[Yes: add Input]]
    no[[No: discard Input]]

    gen1[Mutated Input #1]
    gen2[Mutated Input #2]
    gen3["<!BTTLIST>"]

    note["new paths discovered? longer execution traces? or compliance with target specifications?"]

    corpus --> pick --> f
    f --> mutate --> gen1 & gen2 & gen3 --> run --> p
    p --> feedback --> decision
    decision -->|Yes| yes --> corpus
    decision -->|No| no
    decision -.-> note
  • feedback loop
    • using feedback from the system to guide the testing process, testers can be more thorough and efficient in uncovering defects and bugs (a minimal sketch of the loop follows below)
Example: testing a search function on a website
  • random input search queries, including special characters or accented characters
  • if the search function returns unexpected results, the tester can use this feedback to guide the selection of subsequent inputs and test cases
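
A minimal, self-contained sketch of the feedback loop (the target function and its prefix-match score are only stand-ins for a real instrumented program and its coverage signal):

/* gen3_feedback.c -- minimal sketch of coverage-guided (third-generation) fuzzing.
 * The "coverage" signal is faked: target() reports how many leading characters
 * of the input match "BUG!". Inputs that reach a new maximum are considered
 * interesting and are added to the corpus; everything else is discarded.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_CORPUS 64
#define INPUT_LEN  4

// Toy program under test: returns a fake "coverage" score (matched prefix length).
static int target(const char *input) {
    const char *magic = "BUG!";
    int matched = 0;
    while (matched < INPUT_LEN && input[matched] == magic[matched]) {
        matched++;
    }
    return matched;
}

int main(void) {
    srand((unsigned)time(NULL));

    char corpus[MAX_CORPUS][INPUT_LEN + 1] = { "AAAA" };  // initial seed input
    int corpus_size = 1;
    int best_coverage = 0;

    for (int iter = 0; iter < 100000 && best_coverage < INPUT_LEN; iter++) {
        // Pick an input from the corpus and mutate one position.
        char candidate[INPUT_LEN + 1];
        strcpy(candidate, corpus[rand() % corpus_size]);
        candidate[rand() % INPUT_LEN] = (char)(' ' + rand() % 95);

        // Run the target and read the (fake) coverage feedback.
        int coverage = target(candidate);

        // Interesting? Yes: add to the corpus. No: discard.
        if (coverage > best_coverage && corpus_size < MAX_CORPUS) {
            best_coverage = coverage;
            strcpy(corpus[corpus_size++], candidate);
            printf("New coverage %d with input %s\n", coverage, candidate);
        }
    }
    return 0;
}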

The infinite monkey theorem

A monkey hitting keys at random on a typewriter keyboard will produce any given text, such as the complete works of Shakespeare, with probability approaching 1 as time increases.
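
A back-of-the-envelope justification (assuming each keystroke is independent and uniformly distributed over K keys, and the target text has length n):

P(\text{one block of } n \text{ keystrokes matches the text}) = K^{-n}

P(\text{at least one match in } m \text{ blocks}) = 1 - \bigl(1 - K^{-n}\bigr)^{m} \xrightarrow{m \to \infty} 1

However small K^{-n} is, the failure probability decays geometrically with the number of attempts, which is why purely random testing will eventually hit any reachable behaviour, just possibly very slowly.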

Examples

Testing a video game

  • a tester randomly selects different levels, characters, and weapons to use
  • finds issues that might not be found through more traditional methods

Testing a financial application

  • a tester randomly deposits, withdraws, inputs random amounts, checks the balance, etc.

Implementation

  • the random_test function takes a list of test cases as input
    • each test case is a tuple with input data and expected output
  • random_test runs 10 iterations where it selects and runs a random test case
  • check output and make sure it is the same as expected output
import random
 
def random_test(test_cases):
    for i in range(10):
        # pick a random test case: (input_data, expected_output)
        test_case = random.choice(test_cases)
        input_data = test_case[0]
        expected_output = test_case[1]

        # run the function being tested with the input_data
        # (function_being_tested is the function under test, assumed defined elsewhere)
        output = function_being_tested(*input_data)
        
        # check if the output matches the expected output
        if output != expected_output:
            print("Test case failed: input was", input_data,
                  "expected output was", expected_output, "but got", output)
        else:
            print("Test case passed: input was", input_data)

Example

#include <stdio.h>
#include <string.h>
 
void copy_string(char* destination, char* source) {
    strcpy(destination, source);
}
 
int main(int argc, char** argv) {
    char destination[10];
    copy_string(destination, argv[1]);
    printf("Copied string: %s\n", destination);
    return 0;
}

Are there any bugs? How can random testing help?

Random testing code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
 
// copy_string is the (buggy) function from the example above
void copy_string(char* destination, char* source);
// generate_random_string is defined below main
char* generate_random_string(size_t length);
 
int main() {
    srand(time(0));  // Initialize random seed
 
    char destination[10];
    size_t num_tests = 20;  // Number of random tests
    size_t max_random_string_size = 50;  // Maximum size of the generated random string
 
    for (size_t i = 0; i < num_tests; ++i) {
        // Generate random string of random length
        size_t random_length = rand() % max_random_string_size;
        char *random_string = generate_random_string(random_length);
 
        printf("Test %lu: Source string length = %lu, Source string = %s\n",
               i + 1, random_length, random_string);
 
        // Perform the copy operation and observe the potential overflow
        copy_string(destination, random_string);
 
        // Display the destination string
        printf("Destination string: %s\n\n", destination);
 
        free(random_string);
    }
 
    return 0;
}
// Function to generate a random string
char* generate_random_string(size_t length) {
    const char charset[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";
    char *random_string = malloc(length + 1);
 
    if (random_string) {
        for (size_t n = 0; n < length; n++) {
            int random_index = rand() % (sizeof(charset) - 1);
            random_string[n] = charset[random_index];
        }
        random_string[length] = '\0';
    }
 
    return random_string;
}

Fixed code

#include <stdio.h>
#include <string.h>
 
void copy_string(char* destination, char* source, size_t size) {
    strncpy(destination, source, size - 1);
    destination[size - 1] = '\0';
}
 
int main(int argc, char** argv) {
    if (argc < 2) {
        printf("No input string provided.\n");
        return 1;
    }
 
    char destination[10];
    copy_string(destination, argv[1], sizeof(destination));
    printf("Copied string: %s\n", destination);
    return 0;
}

What kinds of bugs can random testing / fuzzing find?

  1. Input validation errors: invalid input such as input that is out of range or in the wrong format
  2. Race condition errors: related to concurrent access to shared resources, such as data race conditions or synchronization issues
  3. Boundary value errors: related to the handling of edge or corner cases, such as overflow or underflow conditions
  4. Error handling: related to how the system handles errors and exceptions, such as unexpected crashes or incorrect error messages
  5. Resource leaks: related to the management of resources, such as memory leaks or file descriptor leaks
  6. Compatibility issues: related to the compatibility of the software with different operating systems, browsers, or hardware configurations
  7. Security vulnerabilities: related to security, such as SQL injection or cross-site scripting (XSS) vulnerabilities
  8. Performance issues: related to the performance of the software, such as high CPU usage or slow response times

Pros and cons

Pros:

  • easy to implement
  • likely to achieve good coverage, given enough tests
  • works with programs of any input format
  • appealing for finding many kinds of problems

Cons:

  • inefficient test suite
    • time consuming and resource intensive to generate large numbers of inputs
  • might find bugs that are unimportant
  • difficult to reproduce bugs
  • poor & uneven code coverage

Examples

Uneven code coverage: example 1

void test_me(int x) {
    int y = x + 3; // could be run 5 million times
    if (y == 13) { // and y could just never be 13
        ERROR;     // meaning the error never gets hit
    }
}
  • need to test this function up to 2^32 times to find the “bug” with 100% certainty (for a 32-bit int, only x == 10 reaches the error)
  • assuming each test is unique

AFL: American Fuzzy Lop

  • a brute-force fuzzer coupled with an exceedingly simple but rock-solid instrumentation-guided genetic algorithm
    • arguably the best known coverage guided fuzzing tool
  • Steps:
    1. load user-supplied initial test cases into the queue
    2. take next input file from the queue
    3. attempt to trim the test case to the smallest size that doesn’t alter the measured behaviour of the program,
    4. repeatedly mutate the file using a balanced and well-researched variety of traditional fuzzing strategies,
    5. if any of the generated mutations resulted in a new state transition recorded by the instrumentation, add mutated output as a new entry in the queue
    6. go to 2

LibFuzzer

  • Motivation: enable fuzzing of libraries or smaller units (i.e., program components) instead of whole programs
  • user provides fuzzing entry points called fuzz targets
  • Intuition: if a program has X lines of code and Y fuzz targets, then the fuzzer only has to cover X / Y lines of code on average per target
  • Fuzz target: a function that takes an array of bytes as input and performs something interesting with the bytes using the API under test (see the sketch below)
    • fuzz target is executed by the fuzzer multiple times with different data

Note

Ended at slide 40