Research Idea Implementation Progress - roboguy13/suslik GitHub Wiki

Symbol Meaning
:book: Ongoing research
:heavy_check_mark: Implemented
:x: Do not implement
Section Status
Develop approach for fusion in Suslik repair :book: Ongoing research

Setting the stage for fusion & deforestation in Suslik program repair

Fusion is a technique for eliminating intermediate data structures, particularly those which are consumed immediately after they are generated.

Description of the problem

Consider a Suslik function

void f(loc x)
  { A(x) }
  { B(x) }
{
  ...
}

which we want to update to use new data structures A' and B':

void f'(loc x)
  { A'(x) }
  { B'(x) }
{
  ?
}

where we have the four functions:

upgrade_A : A -> A'      to_A : A' -> A
upgrade_B : B -> B'      to_B : B' -> B

such that the following two triangles both commute (this is an example of the concept of a section and retraction):

       upgrade_A                upgrade_B
        A ---> A'                B ---> B'
         \     | to_A             \     | to_B
       id \    v                id \    v
           `-> A                    `-> B

that is, these two equations hold:

     to_A . upgrade_A = id       to_B . upgrade_B = id

Note that the new data structures will often have new fields, which do not have a equivalent in the old data structure. This is why these function pairs are not isomorphisms and instead satisfy the weaker condition given above.

Given this, we can provide a naive implementation of f':

void f'(loc x)
  { A'(x) }
  { B'(x) }
{
  to_A_M(x);
  f(x);
  upgrade_B_M(x);
}

(Where the *_M versions of the functions mutate the value at the memory location that their argument points to. These impure procedures are for demonstration purposes, here.)

This may not be the most efficient way, however. Among the possible reasons:

  1. Perhaps the to_A_M call is never needed and f also works with the original A structure.
  2. Maybe only some of the transformation performed by the to_A_M call is needed by f.
  3. Maybe part of the updates performed by upgrade_B_M are already performed by f. Also, maybe this is the case in one branch of an if, but not the other branch.
  4. Maybe all of the appropriate updates are performed by f and upgrade_B_M is not needed at all.

The approach often taken for fusion is to use a system of rewrite rules. This cannot be easily applied to Suslik in its original form, since such rewrite rules are usually expression-based and assume purity. However, rewrites could be applied to monadic expressions and there is a potential straightforward transformation from Suslik code into a monadic representation.

Then, rewrite rules can be expressed in terms of this monadic representation. This is similar to the approach taken in the paper by Mark Grebe, Andy Gill and me (David) on rewriting a shallow DSL (in particular, see the last rewrite rule at the end of page 5).

A simple monadic representation of Suslik code

Consider the example

void swap(loc x, loc y)
{
  let a2 = *x;
  let b2 = *y;
  *y = a2;
  *x = b2;
}

This can be translated to

swapM x y = do
  a2 <- read x
  b2 <- read y
  write y a2
  write x b2

    |
    | Desugaring to (>>=)
    |
    v

swapM x y =
  read x >>= \a2 ->
  read y >>= \b2 ->
  write y a2 >>= \() ->
  write x b2

It is worth noting that this does not, in its current form, track the separation logic aspect (pre-/post-conditions, etc). It is possible that it could be extended to keep track of this. This could be useful so that we can avoid duplicating work (if the new data structure has some constraint which we know is already solved due to some pre-/post-condition in some relevant part of the code, we don't need to try to maintain the constraint again).

Some rewrite rules

To prepare to fuse, we need to get applications of a pair of conversion functions to be "close together." Also, note that without any fusion specific to the data structures involved, we can eliminate any to_A(upgrade_A(x)) and simply replace it with x (see the previously mentioned commuting triangles).

Generally, we will want to move to_A outwards and move upgrade_A inwards, such as with these rewrite rules:

  if c then to_A(x) else to_A(y)
    ==>
  to_A (if c then x else y)

  upgrade_A (if c then x else y)
    ==>
  if c then upgrade_A(x) else upgrade_A(y)

The interaction between (>>=) and these two conversion functions is also significantly important. We can take the approach described in the previously mentioned paper by Mark, Andy and me (David) (see the end of page 5), and use this rewrite rule:

  fmap upgrade_A x >>= k
    ==>
  x >>= (k . upgrade_A)

A corresponding rule for to_A will need to be formulated as well.

In addition to the previously mentioned paper, also see this paper by me (David), Mark and Andy and the corresponding talk for an example of a (non-monadic, in this case) transformation of an EDSL where a function similar to to_A gets moved in one way and a function similar to upgrade_A gets moved in a different way.

To give a toy, but concrete, example of these transformations being applied consider the pseudo-Suslik code fragment (note that upgrade_A and to_A are pure functions, and therefore cannot be directly implemented in Suslik in the exact form given here)

let y = upgrade_A(*x);
*z = to_A(y);
...

This translates to the following monadic expression (using the above monadic representation), where k represents the translation of the code in the ...:

fmap upgrade_A (read x) >>= (\y ->
write z (to_A y) >>= \_ ->
k)

The rewrite rules can be applied to this as follows:

fmap upgrade_A (read x) >>= (\y -> write z (to_A y) >> k)

  ==>   { Bind rule for upgrade_A }
read x >>= (\y -> write z (to_A (upgrade_A y)) >> k)

  ==>   { "to_A . upgrade_A = id" }
read x >>= (\y -> write z y >> k)

The SuSLik code fragment corresponding to the result of these rewrites is

let y = *x;
*z = y;
...

Observe that the calls to the data structure conversion functions have been eliminated.