Random testing

  • software testing method where a program is bombarded with large amounts of random, unexpected, or malformed input data
    • can uncover crashes, memory leaks, or security vulnerabilities
    • finds edge cases
  • usually a black-box testing method (source code is not needed to generate random inputs)

Origin

  • Professor Barton Paul Miller (University of Wisconsin-Madison)
  • a thunderstorm (random electrical noise) corrupted inputs sent over a dial-up connection, causing his programs to crash

Fuzzing

  • Idea: test cases are selected at random from a larger pool of test cases
  • special case of mutation analysis (input mutation)

The first fuzzing study

  • 1990: command line fuzzer, tested reliability of existing UNIX programs
    • caused 25-33% of UNIX utility programs to crash or hang
  • 1995: extended fuzzing to GUI-based programs, network protocols, and system library APIs
    • “Even worse is that many of the same bugs that we reported in 1990 are still present in the code releases of 1995.”

First generation

flowchart LR
    f[Fuzzer]
    gen[[Randomly generate input]]
    i1[Input #1]
    i2[Input #2]
    i3[Input #3]
    i4["H@5^23#t"]
    run[[Run on inputs]]
    p[Program]
    coverage(("Poor\nCoverage!"))
    cmd["./Program < /dev/random"]

    f --> gen --> i1 & i2 & i3 & i4 --> run --> p
    p -.-> coverage
    f --- cmd
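
A first-generation fuzzer can be as simple as a generator of purely random bytes whose output is piped into the program under test. A minimal sketch (the program name in the comment is only a placeholder):

/* gen1_fuzz.c -- minimal sketch of a first-generation fuzzer.
 * Assumption: the program under test reads its input from stdin,
 * e.g.  ./gen1_fuzz | ./target_program   (target_program is a placeholder)
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    srand((unsigned)time(NULL));        // seed the pseudo-random generator

    size_t num_bytes = 1024;            // size of one random input
    for (size_t i = 0; i < num_bytes; i++) {
        putchar(rand() % 256);          // emit a purely random byte
    }
    return 0;
}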

Second generation

flowchart LR
    subgraph corpus[ ]
        direction TB
        in1[Input]
        in2[Input]
        dots["..."]
        in3[Input]
    end

    pick[[Pick an Input]]
    mutate[[Mutate the Input]]
    run[[Run on Inputs]]

    f[Fuzzer]
    gen1[Mutated Input #1]
    gen2[Mutated Input #2]
    gen3["<!BTTLIST>"]

    p[Program]

    corpus --> pick --> f
    f --> mutate --> gen1 & gen2 & gen3 --> run --> p
  • tests all seeds and generates mutated samples for each seed (a minimal mutation sketch in C follows the example below)

Example:

  • user-provided input seed #1
<user>
    <name>Alice</name>
    <age>25</age>
</user>
  • user-provided input seed #2
<book>
    <title>Fuzzing 101</title>
    <author>Barton Miller</author>
</book>

e.g., seed #1 followed by mutated variants of it:

<user>
    <name>Alice</name>
    <age>25</age>
</user>
 
<user>
    <name>Ali@@@@ce</name>
    <age>25</age>
</user>
 
<user>
    <name>Alice</name>
    <age>25</age>
    <age>999</age>
</user>
 
<user>
    <name>
        <age>25</age>
    </name>
</user>
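
A minimal sketch of how such mutated samples could be produced from a seed (assumptions: a single hard-coded seed string and a byte-overwrite mutation; real mutation-based fuzzers combine many more strategies such as insertions, deletions, and bit flips):

/* gen2_mutate.c -- minimal sketch of second-generation (mutation-based) fuzzing.
 * A seed input is copied and a few random positions are overwritten,
 * producing malformed variants similar to the XML examples above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

// Overwrite num_flips random positions of buf with random printable characters.
static void mutate(char *buf, size_t len, int num_flips) {
    for (int i = 0; i < num_flips && len > 0; i++) {
        size_t pos = rand() % len;
        buf[pos] = (char)(' ' + rand() % 95);   // random printable ASCII character
    }
}

int main(void) {
    srand((unsigned)time(NULL));

    const char *seed = "<user><name>Alice</name><age>25</age></user>";

    for (int i = 0; i < 3; i++) {               // produce three mutated samples
        char sample[256];
        strcpy(sample, seed);                   // start from the seed
        mutate(sample, strlen(sample), 2);      // mutate two random positions
        printf("Mutated input #%d: %s\n", i + 1, sample);
    }
    return 0;
}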

Third generation

flowchart LR
    subgraph corpus[ ]
        direction TB
        in1[Input]
        in2[Input]
        dots["..."]
        in3[Input]
    end

    pick[[Pick an Input]]
    f[Fuzzer]
    mutate[[Mutate the Input]]
    run[[Run on Inputs]]
    p[Program]
    feedback[Feedback]
    decision{Interesting?}
    yes[[Yes: add Input]]
    no[[No: discard Input]]

    gen1[Mutated Input #1]
    gen2[Mutated Input #2]
    gen3["<!BTTLIST>"]

    note["new paths discovered? longer execution traces? or compliance with target specifications?"]

    corpus --> pick --> f
    f --> mutate --> gen1 & gen2 & gen3 --> run --> p
    p --> feedback --> decision
    decision -->|Yes| yes --> corpus
    decision -->|No| no
    decision -.-> note
  • feedback loop
    • using feedback from the system to guide the testing process, testers can be more thorough and efficient in uncovering defects and bugs (a minimal sketch of the loop follows below)
Example: testing a search function on a website
  • random input search queries, including special characters or accented characters
  • if the search function returns unexpected results, the tester can use this feedback to guide the selection of subsequent inputs and test cases
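
A minimal, self-contained sketch of the feedback loop (the target function and its prefix-match score are only stand-ins for a real instrumented program and its coverage signal):

/* gen3_feedback.c -- minimal sketch of coverage-guided (third-generation) fuzzing.
 * The "coverage" signal is faked: target() reports how many leading characters
 * of the input match "BUG!". Inputs that reach a new maximum are considered
 * interesting and are added to the corpus; everything else is discarded.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_CORPUS 64
#define INPUT_LEN  4

// Toy program under test: returns a fake "coverage" score (matched prefix length).
static int target(const char *input) {
    const char *magic = "BUG!";
    int matched = 0;
    while (matched < INPUT_LEN && input[matched] == magic[matched]) {
        matched++;
    }
    return matched;
}

int main(void) {
    srand((unsigned)time(NULL));

    char corpus[MAX_CORPUS][INPUT_LEN + 1] = { "AAAA" };  // initial seed input
    int corpus_size = 1;
    int best_coverage = 0;

    for (int iter = 0; iter < 100000 && best_coverage < INPUT_LEN; iter++) {
        // Pick an input from the corpus and mutate one position.
        char candidate[INPUT_LEN + 1];
        strcpy(candidate, corpus[rand() % corpus_size]);
        candidate[rand() % INPUT_LEN] = (char)(' ' + rand() % 95);

        // Run the target and read the (fake) coverage feedback.
        int coverage = target(candidate);

        // Interesting? Yes: add to the corpus. No: discard.
        if (coverage > best_coverage && corpus_size < MAX_CORPUS) {
            best_coverage = coverage;
            strcpy(corpus[corpus_size++], candidate);
            printf("New coverage %d with input %s\n", coverage, candidate);
        }
    }
    return 0;
}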

The infinite monkey theorem

A monkey hitting keys at random on a typewriter keyboard will produce any given text, such as the complete works of Shakespeare, with probability approaching 1 as time increases.
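
A back-of-the-envelope justification (assuming each keystroke is independent and uniformly distributed over K keys, and the target text has length n):

P(\text{one block of } n \text{ keystrokes matches the text}) = K^{-n}

P(\text{at least one match in } m \text{ blocks}) = 1 - \bigl(1 - K^{-n}\bigr)^{m} \xrightarrow{m \to \infty} 1

However small K^{-n} is, the failure probability decays geometrically with the number of attempts, which is why purely random testing will eventually hit any reachable behaviour, just possibly very slowly.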

Examples

Testing a video game

  • a tester randomly selects different levels, characters, and weapons to use
  • finds issues that might not be found through more traditional methods

Testing a financial application

  • a tester randomly deposits, withdraws, inputs random amounts, checks the balance, etc.

Implementation

  • the random_test function takes a list of test cases as input
    • each test case is a tuple with input data and expected output
  • random_test runs 10 iterations where it selects and runs a random test case
  • check output and make sure it is the same as expected output
import random
 
def random_test(test_cases):
    for i in range(10):
        # pick a random test case: (input_data, expected_output)
        test_case = random.choice(test_cases)
        input_data = test_case[0]
        expected_output = test_case[1]

        # run the function being tested with the input_data
        # (function_being_tested is the function under test, assumed defined elsewhere)
        output = function_being_tested(*input_data)
        
        # check if the output matches the expected output
        if output != expected_output:
            print("Test case failed: input was", input_data,
                  "expected output was", expected_output, "but got", output)
        else:
            print("Test case passed: input was", input_data)

Example

#include <stdio.h>
#include <string.h>
 
void copy_string(char* destination, char* source) {
    strcpy(destination, source);
}
 
int main(int argc, char** argv) {
    char destination[10];
    copy_string(destination, argv[1]);
    printf("Copied string: %s\n", destination);
    return 0;
}

Are there any bugs? How can random testing help?

Random testing code

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
 
// copy_string is the (buggy) function from the example above
void copy_string(char* destination, char* source);
// generate_random_string is defined below main
char* generate_random_string(size_t length);
 
int main() {
    srand(time(0));  // Initialize random seed
 
    char destination[10];
    size_t num_tests = 20;  // Number of random tests
    size_t max_random_string_size = 50;  // Maximum size of the generated random string
 
    for (size_t i = 0; i < num_tests; ++i) {
        // Generate random string of random length
        size_t random_length = rand() % max_random_string_size;
        char *random_string = generate_random_string(random_length);
 
        printf("Test %lu: Source string length = %lu, Source string = %s\n",
               i + 1, random_length, random_string);
 
        // Perform the copy operation and observe the potential overflow
        copy_string(destination, random_string);
 
        // Display the destination string
        printf("Destination string: %s\n\n", destination);
 
        free(random_string);
    }
 
    return 0;
}
// Function to generate a random string
char* generate_random_string(size_t length) {
    const char charset[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789";
    char *random_string = malloc(length + 1);
 
    if (random_string) {
        for (size_t n = 0; n < length; n++) {
            int random_index = rand() % (sizeof(charset) - 1);
            random_string[n] = charset[random_index];
        }
        random_string[length] = '\0';
    }
 
    return random_string;
}

Fixed code

#include <stdio.h>
#include <string.h>
 
void copy_string(char* destination, char* source, size_t size) {
    strncpy(destination, source, size - 1);
    destination[size - 1] = '\0';
}
 
int main(int argc, char** argv) {
    if (argc < 2) {
        printf("No input string provided.\n");
        return 1;
    }
 
    char destination[10];
    copy_string(destination, argv[1], sizeof(destination));
    printf("Copied string: %s\n", destination);
    return 0;
}

What kinds of bugs can random testing / fuzzing find?

  1. Input validation errors: invalid input such as input that is out of range or in the wrong format
  2. Race condition errors: related to concurrent access to shared resources, such as data race conditions or synchronization issues
  3. Boundary value errors: related to the handling of edge or corner cases, such as overflow or underflow conditions
  4. Error handling: related to how the system handles errors and exceptions, such as unexpected crashes or incorrect error messages
  5. Resource leaks: related to the management of resources, such as memory leaks or file descriptor leaks
  6. Compatibility issues: related to the compatibility of the software with different operating systems, browsers, or hardware configurations
  7. Security vulnerabilities: related to security, such as SQL injection or cross-site scripting (XSS) vulnerabilities
  8. Performance issues: related to the performance of the software, such as high CPU usage or slow response times

Pros and cons

Pros:

  • easy to implement
  • likely to achieve good coverage, given enough tests
  • works with programs of any input format
  • appealing for finding many kinds of problems

Cons:

  • inefficient test suite
    • time consuming and resource intensive to generate large numbers of inputs
  • might find bugs that are unimportant
  • difficult to reproduce bugs
  • poor & uneven code coverage

Examples

Uneven code coverage: example 1

void test_me(int x) {
    int y = x + 3; // could be run 5 million times
    if (y == 13) { // and y could just never be 13
        ERROR;     // meaning the error never gets hit
    }
}
  • need to test this function up to 2^32 times to find the “bug” with 100% certainty (for a 32-bit int, only x == 10 reaches the error)
  • assuming each test is unique

AFL: American Fuzzy Lop

  • a brute-force fuzzer coupled with an exceedingly simple but rock-solid instrumentation-guided genetic algorithm
    • arguably the best known coverage guided fuzzing tool
  • Steps:
    1. load user-supplied initial test cases into the queue
    2. take next input file from the queue
    3. attempt to trim the test case to the smallest size that doesn’t alter the measured behaviour of the program,
    4. repeatedly mutate the file using a balanced and well-researched variety of traditional fuzzing strategies,
    5. if any of the generated mutations resulted in a new state transition recorded by the instrumentation, add mutated output as a new entry in the queue
    6. go to 2

LibFuzzer

  • Motivation: enable fuzzing of libraries or smaller units (i.e., program components) instead of whole programs
  • user provides fuzzing entry points called fuzz targets
  • Intuition: if a program has X lines of code and Y fuzz targets, then the fuzzer only has to cover X / Y lines of code on average per target
  • Fuzz target: a function that takes an array of bytes as input and performs something interesting with the bytes using the API under test (see the sketch below)
    • fuzz target is executed by the fuzzer multiple times with different data

Note

Ended at slide 40