
Under-Constrained Symbolic Execution IR&D

Langston Barrett edited this page Feb 3, 2021 · 3 revisions

Table of Contents

  1. Under-constrained Symbolic Execution IR&D
    1. Prior Work: "Under-Constrained Symbolic Execution: Correctness Checking for Real Code"
      1. Abstract
      2. Limitations of Traditional Symbolic Execution
      3. One Function at a Time
      4. No Specification Required
      5. UC-KLEE Works On LLVM Bitcode
      6. Why "Underconstrained"? False Positives!
      7. Silencing False Positives
      8. Input Generation with Lazy Initialization
        1. K-bounding
        2. No Aliasing Nor Cycles
      9. Patch Checking
        1. Path Pruning
        2. Error Uniquing
        3. Overriding Macros for Determinism
        4. Must-Fail Bugs
        5. Verification
        6. False Positives
      10. Generalized Checking
        1. Missing Functions, Inline Assembly, and Symbolic Function Pointers
        2. Uninitialized Data Checker
        3. User Input Checker
        4. Evaluation
        5. Interesting Point About Byte Order
      11. Implementation
        1. Object Sizes
        2. Error Reporting
        3. KLEE Optimizations
        4. Lazy Constraints
        5. Function Pointers
    2. Design Notes
      1. High-Level Requirements
      2. Initial Design
      3. Design Questions
        1. Precondition and Heuristic Complexity

Under-constrained Symbolic Execution IR&D

The purpose of this IR&D project is to extend Crucible with the capability to perform under-constrained symbolic execution. Namely, to generate inputs for, and symbolically execute, arbitrary functions inside a program without starting from the program entrypoint, and without requiring significant specification effort from a user.

This page consists of Langston Barrett's notes on this project.

Prior Work: "Under-Constrained Symbolic Execution: Correctness Checking for Real Code"

An extension of KLEE (called UC-KLEE) that analyzes LLVM bitcode.

Abstract

Software bugs are a well-known source of security vulnerabilities. One technique for finding bugs, symbolic execution, considers all possible inputs to a program but suffers from scalability limitations. This paper uses a variant, under-constrained symbolic execution, that improves scalability by directly checking individual functions, rather than whole programs. We present UC-KLEE, a novel, scalable framework for checking C/C++ systems code, along with two use cases. First, we use UC-KLEE to check whether patches introduce crashes. We check over 800 patches from BIND and OpenSSL and find 12 bugs, including two OpenSSL denial-of-service vulnerabilities. We also verify (with caveats) that 115 patches do not introduce crashes. Second, we use UC-KLEE as a generalized checking framework and implement checkers to find memory leaks, uninitialized data, and unsafe user input. We evaluate the checkers on over 20,000 functions from BIND, OpenSSL, and the Linux kernel, find 67 bugs, and verify that hundreds of functions are leak free and that thousands of functions do not access uninitialized data.

Limitations of Traditional Symbolic Execution

Symbolic execution suffers from path explosion, and so is often unable to reach code deep in the control-flow graph.

One Function at a Time

The key difference from classical symbolic execution seems to be that under-constrained symbolic execution operates on one function at a time, rather than on whole programs.

No Specification Required

without requiring a manual specification or even a single testcase

UC-KLEE Works On LLVM Bitcode

UC-KLEE works on LLVM bitcode.

Why "Underconstrained"? False Positives!

Individual functions have preconditions that the symbolic execution tool can't know, which means the inputs that it generates are under-constrained. This means there may be false positives, where the tool generates an input that doesn't conform to the function's preconditions.

Silencing False Positives

Users may manually annotate code (annotations are written in C).

Input Generation with Lazy Initialization

All pointers begin in an unbound state. When an unbound pointer is compared to null, UC-KLEE will fork. When an unbound pointer is loaded from or stored to, UC-KLEE lazily allocates a block of memory.
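A minimal C sketch of the two cases (the struct and function are illustrative, not from the paper):

```c
#include <stddef.h>

struct pair {
  int first;
  int second;
};

/* When UC-KLEE executes read_field on an unbound symbolic pointer p: */
int read_field(struct pair *p) {
  if (p == NULL)      /* comparison to null: fork into p == NULL and p != NULL */
    return -1;
  return p->first;    /* load through p: on the p != NULL path, lazily
                         allocate a fresh struct pair for p to point to */
}
```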

K-bounding

To prevent infinite execution, UC-KLEE takes a command-line flag with a static depth bound on all generated datastructures.
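For instance, given a recursive type like the hypothetical node below, a bound of k = 2 would restrict generated inputs to lists of length at most 2:

```c
#include <stddef.h>

struct node {
  int val;
  struct node *next;
};

int sum(struct node *l) {
  int s = 0;
  for (; l != NULL; l = l->next)
    s += l->val;
  return s;
}
/* Under a depth bound of k = 2, UC-KLEE would only explore sum on lists
   of length 0, 1, or 2, so lazy initialization is guaranteed to stop. */
```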

No Aliasing Nor Cycles

UC-KLEE will never construct inputs with aliasing or cyclical datastructures.

Patch Checking

UC-KLEE can check that patches don't introduce bugs. It works via crash equivalence, i.e. it will only report errors that introduce new crashes relative to the old code.

Path Pruning

UC-KLEE uses a static analysis to find paths in the new code that cannot execute differing sequences of instructions from the old code, and soundly prunes them from consideration for performance.

Error Uniquing

UC-KLEE deduplicates errors by program counter (PC) and error type.

Overriding Macros for Determinism

In the build system, UC-KLEE overrides the following C macros:

  • __LINE__
  • VERSION
  • SRCID
  • DATE
  • OPENSSL_VERSION_NUMBER

Must-Fail Bugs

If there is a control-flow path such that it's infeasible to avoid an error after you've followed that path, that's a must-fail bug.
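A hypothetical illustration: once execution takes the p == NULL branch below, no continuation avoids the null dereference, so the error is must-fail regardless of any missing precondition:

```c
#include <stddef.h>

int error_count = 0;

int describe(int *p) {
  if (p == NULL) {
    error_count++;   /* some bookkeeping on the error path */
    return *p;       /* every path reaching here dereferences NULL: must-fail */
  }
  return *p + 1;
}
```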

TODO I need a deeper understanding of this.

Verification

With some caveats, UC-KLEE can verify that a patch doesn't introduce any new bugs (up to the specified depth) by exhaustively checking every path.

False Positives

The authors claim that there are three main classes of missing preconditions:

  1. Data structure invariants
  2. State machine invariants (???)
  3. API invariants

They provide an example of a buffer and its length bundled in a struct, where UC-KLEE has no awareness that the integer represents the length of the buffer.
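A sketch of the kind of struct they describe (names are illustrative):

```c
#include <stddef.h>

struct buf {
  char *data;
  size_t len;   /* invariant UC-KLEE can't see: len is the allocated size of data */
};

/* Safe only when the invariant holds; UC-KLEE may generate a struct buf
   whose len exceeds the allocation behind data, yielding a false positive. */
char last_byte(struct buf *b) {
  return b->data[b->len - 1];
}
```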

  1. Annotations

    These false positives may be partially alleviated with annotations written in C.

    For one of the programs, the authors wrote 400 lines of annotations. Seems like a lot, especially considering the number/rate of false positives.

    TODO catalog these

  2. Heuristics

    • Must-fail: See above
    • Belief-fail: A bug is more likely to be "true" if it involves a function "contradicting itself", i.e. dereferencing a pointer where a locally-added path constraint says the pointer is null. A form of Belief Analysis.
    • Concrete-fail: assertion failure or memory error occurred with a concrete condition or pointer, respectively.

    only 8.6% of the belief-fail errors for BIND and 20% of those for OpenSSL were true bugs.

    So belief-fail doesn't sound super effective…

Generalized Checking

UC-KLEE provides a C++ interface for generic checkers.

Missing Functions, Inline Assembly, and Symbolic Function Pointers

When UC-KLEE encounters a call to a function that doesn't have a definition in the bitcode module, it skips the call. This under-approximates the postcondition of the function (which may write to memory), so may miss bugs. LB: I think it also may cause false positives? Because you might have a concrete error later that would have been prevented if that function had written to its arguments?
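A hypothetical example of that false-positive scenario: suppose init's definition lives in a module whose bitcode UC-KLEE doesn't see (it is defined below only so the example runs):

```c
/* Imagine this definition is *not* in the bitcode module UC-KLEE analyzes,
   so the call below would be skipped during symbolic execution. */
void init(int *p) {
  *p = 42;
}

int use(void) {
  int x;
  init(&x);   /* skipped: x would appear to stay uninitialized */
  return x;   /* flagged as an uninitialized read, though real runs are fine */
}
```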

The authors report that over-approximation (havocing every global and piece of memory reachable in the points-to graph from the function's arguments) caused significant performance problems.

UC-KLEE also just skips inline assembly and symbolic function pointers.

Uninitialized Data Checker

The authors say

in practice, loads of uninitialized data are often intentional; they frequently arise within calls to memcpy or when code manipulates bit fields within a C struct.

User Input Checker

While data generated by UC-KLEE must be considered under-constrained, data received from the network, or read from files or environment variables, can be considered fully-constrained: since such data really could be anything, no preconditions on it are missing.

UC-KLEE tracks fully-constrained data via shadow memory, reporting its use in bugs.

UC-KLEE conservatively assumes that the appearance of comparisons between fully-constrained and under-constrained data in path conditions represents sanitization of the fully-constrained data.
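A hedged sketch of what such a comparison might look like (assuming i arrived from the network and n bounds a lookup table):

```c
#include <stddef.h>

/* i: fully-constrained (e.g. read from the network, could be anything)
   table, n: under-constrained inputs generated by the tool */
int lookup(const int *table, size_t n, size_t i) {
  if (i >= n)       /* comparison involving the network data: UC-KLEE
                       conservatively treats i as sanitized from here on */
    return -1;
  return table[i];  /* not reported, since i was "sanitized" above */
}
```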

Evaluation

The leak checker had a false positive rate of 90%. Seems not much better than a compiler warning or linter at that point…

The authors claim that the combination of bugs found with the ability to verify significant numbers of functions represents a significant step forward.

Interesting Point About Byte Order

Chou observed that data swapped from network byte order to host byte order is generally untrusted.

Implementation

Object Sizes

When generating input data,

If the size is too small, later accesses to this object may trigger out-of-bounds memory errors. On the other hand, a size that is too large can hide legitimate errors.

  1. Strategy 1: Backtracking

    One strategy was to pick an initial size based on type information where available, and backtrack whenever an out-of-bounds access happened.

  2. Strategy 2: Symbolic Sizes

    Another strategy is to use symbolic sizes for each object.

Error Reporting

Unlike traditional symbolic execution, under-constrained symbolic execution can't report whole-program inputs (basically, byte sequences); instead it generates pointer-rich datastructures. The authors have a clever way to report errors.

KLEE Optimizations

The authors use what Crucible calls "overrides", i.e. specialized models, for functions like strlen.

Lazy Constraints

Instead of eagerly querying SMT solvers at branches to see if one of them is infeasible, UC-KLEE sometimes replaces the branch condition with a fresh symbolic variable and a lazy constraint that it is equal to the branch condition. In the case that no errors are reported on one of the branches, no query is necessary. If an error is reported, the lazy constraint is added before a query is emitted.

Function Pointers

Emulation of object-oriented code using structs and function pointers proved problematic. There were some manual annotations that alleviated some of this.

Design Notes

High-Level Requirements

A minimal initial prototype would consist of a command-line program that takes as input:

  • an LLVM bitcode module,
  • the (mangled) name of a function to execute, and
  • an input depth and/or size bound,

and produces as output either an "all clear" message, or some representation of inputs to that function that trigger some kind of undesirable behavior.

A great final product would also include (in roughly decreasing priority order):

  • The ability to generate source-language-level programs that trigger the bug
  • Advanced heuristics for detecting false positives
  • Cross-language (C, C++, Java, Rust) support
  • Performance optimizations?

Initial Design

Rather than pursue lazy initialization (a la UC-KLEE), the prototype will instead use an iterative, type- and error-driven approach. Instead of allocating memory dynamically as it's used by the program, it will instead generate program inputs from the program's type signature.

Specifically, the first input will be the smallest possible input of the correct type. The tool will then execute the function on this input, and iteratively expand it:

  • If the symbolic execution succeeded with no safety condition violations, it will increase the input depth (up to the user-specified bound) and execute it again.

  • If some safety conditions were (potentially) violated, the tool will examine the error, and either:

    • report it to the user (and optionally continue executing to find other problems), or
    • use heuristics to decide that the error was likely a false positive (i.e., due to an unsatisfied precondition, such as an uninitialized global variable). The tool may then use further heuristics to generate new inputs or a program state/memory layout that satisfies the precondition. These may include:
      • Allocating memory to back previously unmapped pointers
      • Expanding the sizes of allocations
      • Initializing global variables

Design Questions

Precondition and Heuristic Complexity

How complex of preconditions would we like to be able to deduce? Generally, as we invest in deducing preconditions we may end up with fewer false positives, at the cost of higher complexity.

Generally, any of the "relational" preconditions below would be considerably more difficult to deduce in the general case, and somewhat more difficult to generate arguments for.

Note that pointers in crucible-llvm are represented by a (natural, bitvector) pair, where the natural is the allocation ID and the bitvector is the offset within that allocation.

  1. Nullness

    This is possibly the simplest precondition, stating that a pointer should be non-null. For example, we should deduce that the following function has a precondition that x is not null.

    int deref(int *x) {
      return *x;
    }
    

    This precondition also expresses depth constraints. If you have a linked list

    typedef struct node {
        int val;
        struct node *next;
    } node_t;
    

you could express that it has length greater than three by saying that l->next, l->next->next, and l->next->next->next are all non-null.

    You could deduce that a function (probably) has this sort of precondition if

    • execution failed with a null pointer dereference, and
    • the generated arguments contained a null pointer.
  2. Minimal Allocation Size

    If a function indexes a constant amount into an array, it likely has a precondition that said array has at least that much space.

    1. Constant Minimal Allocation Size

      In this example, we could likely deduce that the allocation pointed to by x needs to have a size larger than 2 ints (i.e., room for the access x[2]).

      int get_const(int *x) {
        return x[2];
      }
      

      You could deduce that a function (probably) has this sort of precondition if

      • execution failed with an out-of-bounds read or write,
      • the offset of the read or write is constant, and
      • the allocation being pointed to was allocated during argument generation.
    2. Relational Minimal Allocation Size

      In some similar cases, we may be able to deduce that the allocation pointed to by x needs to have a size larger than i:

      int get(int *x, size_t i) {
        return x[i];
      }
      

      If the index is involved in some kind of calculation, it would be more difficult, if not impossible. This may limit the utility of this kind of heuristic.

      int get(int *x, size_t i) {
        return x[i*2];
      }
      

      You could deduce that a function (probably) has this sort of precondition if

      • execution failed with an out-of-bounds read or write,
      • the allocation being pointed to was allocated during argument generation, and
      • the offset of the read or write was a bitvector that appears inside one of the function's arguments.
  3. Bound by Allocation Size

    If a function indexes an array with an argument, it likely expects that the argument is within bounds of that array.

    1. Bound by Constant Allocation Size

      In this case, we could probably deduce fairly easily that i needs to be non-negative and less than 5.

      int glob[5];
      int get(int i) {
        return glob[i];
      }
      

      You could deduce that a function (probably) has this sort of precondition if:

      • execution failed with an out-of-bounds read or write,
      • the offset of the read or write was a bitvector that appears inside one of the function's arguments,
      • the size of the allocation was concrete, and
      • the allocation was not allocated as part of argument generation.
    2. Relational Bound by Allocation Size

      This is the converse case to Relational Minimal Allocation Size: rather than expanding the allocation pointed to by x to fit the index, we constrain the index argument to be less than the allocation's existing size.

  4. Required Write

    1. To Arguments

      This function requires not only that its argument has the right size (as in Constant Minimal Allocation Size), but also that there was a previous write to x[2].

      int get_const(int *x) {
        return x[2];
      }
      

      For contrast, the following function does not require such a write:

      void write_const(int *x) {
        x[2] = 0;
      }
      

      You could deduce that a function (probably) has this sort of precondition if:

      • execution failed with an uninitialized read, and
      • the allocation being pointed to was allocated during argument generation.
    2. To Globals

      This case is analogous to the above, but for global variables. In this case, the required write may just be initialization (Crucible doesn't always automatically allocate and initialize all global variables).
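For instance (an illustrative sketch), reading a global without a prior write would be flagged even though a real executable zero-initializes it:

```c
/* In a real executable, config is zero-initialized, but if the tool does
   not automatically allocate and initialize globals, reading it without a
   prior write would be reported as an uninitialized read. */
int config;

int get_config(void) {
  return config;
}
```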