The Gory Details - mprast/immutable-staging GitHub Wiki
Our task is to write a proxy that simulates an object while intercepting and squirreling away all writes to that object. The mention of intercepting immediately suggests that we should use an object proxy, and luckily for us those are part of ES6:
```js
const writeCache = {};
return new Proxy(object, {
  set(target, property, value, receiver) {
    writeCache[property] = value;
    return true;
  }
});
```
Since we don't actually set the property on the object in the call to `set()`, the object is never mutated. Of course, when we set a property on a real object, we get the new value back when we read the property later, so we need to add a `get()` that pulls from the cache if possible:
```js
return new Proxy(object, {
  get(target, property, receiver) {
    // check with `in` so that falsy cached values (0, '', false) still come back
    return property in writeCache ? writeCache[property] : target[property];
  },
  set ...
});
```
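Putting the two traps together, here's a minimal self-contained sketch of the idea so far (the `stageWrites` wrapper function is my own name, invented just to make the snippet runnable):

```js
function stageWrites(object) {
  const writeCache = {};
  const proxy = new Proxy(object, {
    get(target, property, receiver) {
      // check with `in` so falsy staged values (0, '', false) still come back
      return property in writeCache ? writeCache[property] : target[property];
    },
    set(target, property, value, receiver) {
      writeCache[property] = value; // divert the write; never touch target
      return true;
    }
  });
  return {proxy, writeCache};
}

const original = {a: 1};
const {proxy, writeCache} = stageWrites(original);
proxy.a = 2;
proxy.b = 0;

console.log(proxy.a);    // 2 - reads see the staged write
console.log(original.a); // 1 - the original is untouched
console.log(proxy.b);    // 0 - a falsy staged value works too
```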
`ownKeys()`, `has()`, and `getOwnPropertyDescriptor()` are also necessary to get parity with regular objects, and all can be designed using a similar approach. So far, so good - at least for scalar properties. Nested objects give us some trouble, as using the current approach we can mutate a nested property on our proxy simply by doing

```js
myProxy.nestedProp.thing = 'something new';
```

In fact, because we haven't actually assigned to `myProxy.nestedProp`, our writeCache will be empty after the above operation, which violates our invariant. Clearly our nested object needs a proxy of its own. To accomplish this we can create a factory method and call it recursively, like so:
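For completeness, here's one way those extra traps might look, following the same pull-from-the-cache-first approach - a sketch, not the library's exact code:

```js
const writeCache = {};
const object = {a: 1};
const proxy = new Proxy(object, {
  get(target, property) {
    return property in writeCache ? writeCache[property] : target[property];
  },
  set(target, property, value) {
    writeCache[property] = value;
    return true;
  },
  has(target, property) {
    return property in writeCache || property in target;
  },
  ownKeys(target) {
    // union of the target's keys and the staged keys, without duplicates
    return [...new Set([...Reflect.ownKeys(target), ...Reflect.ownKeys(writeCache)])];
  },
  getOwnPropertyDescriptor(target, property) {
    if (property in writeCache) {
      return {value: writeCache[property], writable: true, enumerable: true, configurable: true};
    }
    return Reflect.getOwnPropertyDescriptor(target, property);
  }
});

proxy.b = 2;
console.log('b' in proxy);       // true
console.log(Object.keys(proxy)); // ['a', 'b']
console.log('b' in object);      // false - still only staged
```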
```js
function buildProxy(writeCache, object) {
  return new Proxy(object, {
    get(target, property, receiver) {
      if (property in writeCache) {
        return writeCache[property];
      // guard against null, whose typeof is also 'object'
      } else if (target[property] !== null && typeof target[property] === 'object') {
        return buildProxy(writeCache, target[property]);
      } else {
        return target[property];
      }
    },
    set ...
  });
}
```
Two things to call out here. First, we pass the writeCache down via the recursive call, so writes to every part of the object tree will be stored in the same data structure. Second, if the property is in `writeCache`, we return its value as-is, even if it's a nested object. To see why this makes sense, imagine we did something like `myProxy.nestedProp = {somethingNew: "I'm new!"}`. Now `nestedProp` contains a new object - and since we just created that object, there's no way it can be part of the original object tree. We can therefore modify it with impunity - we don't need to store any of those updates in the writeCache, and hence it needs no proxy.
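To see this behavior concretely, here's the recursive factory (with its `set` trap filled in) and a quick check - a sketch of the technique, not the library's exact code:

```js
function buildProxy(writeCache, object) {
  return new Proxy(object, {
    get(target, property, receiver) {
      if (property in writeCache) {
        return writeCache[property]; // freshly assigned objects come back unproxied
      } else if (target[property] !== null && typeof target[property] === 'object') {
        return buildProxy(writeCache, target[property]);
      }
      return target[property];
    },
    set(target, property, value, receiver) {
      writeCache[property] = value;
      return true;
    }
  });
}

const writeCache = {};
const proxy = buildProxy(writeCache, {nestedProp: {thing: 'old'}});

proxy.nestedProp = {somethingNew: "I'm new!"};
proxy.nestedProp.alsoNew = 'sure'; // mutates the staged object directly, no trap involved

console.log(Object.keys(writeCache));       // ['nestedProp'] - no extra entries
console.log(writeCache.nestedProp.alsoNew); // 'sure'
```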
Quick side note - there's an edge case we have to consider here, and that's assigning parts of the object tree to other parts of the object tree, as in `myProxy.nestedProp2 = myProxy.nestedProp`. In this case `nestedProp2` does refer to something in the object graph, but things work out because `nestedProp` is stored in the writeCache still wrapped in its proxy. The only adjustment we need to make is to remove the proxy from the object when applying the writes later on.
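One way to make that later unwrapping possible is to have every proxy answer a private marker property with its wrapped target. The `UNWRAP` symbol and `unwrap` helper below are illustrative inventions, not the library's actual mechanism:

```js
const UNWRAP = Symbol('unwrap'); // hypothetical marker, not from the article

function buildProxy(writeCache, object) {
  return new Proxy(object, {
    get(target, property, receiver) {
      if (property === UNWRAP) return target; // expose the wrapped object
      if (property in writeCache) return writeCache[property];
      if (target[property] !== null && typeof target[property] === 'object') {
        return buildProxy(writeCache, target[property]);
      }
      return target[property];
    },
    set(target, property, value, receiver) {
      writeCache[property] = value;
      return true;
    }
  });
}

function unwrap(value) {
  // a proxy answers UNWRAP with its target; plain objects yield undefined
  return (value !== null && typeof value === 'object' && value[UNWRAP]) || value;
}

const writeCache = {};
const tree = {nestedProp: {x: 1}, nestedProp2: null};
const proxy = buildProxy(writeCache, tree);

proxy.nestedProp2 = proxy.nestedProp; // the cached value is still wrapped in a proxy
console.log(unwrap(writeCache.nestedProp2) === tree.nestedProp); // true
```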
You may have guessed what the next problem is - we key our cache based on the properties that our proxies give us during `get` and `set` calls, but the proxies don't have any knowledge of the rest of the object graph. This means that `myProxy.prop1` and `myProxy.nestedProp.prop1` will both look for `prop1` in the write cache. If you noticed the problem, you probably noticed the solution as well - pass a prefix down through the factory method and use it to qualify the property names:
```js
function buildProxy(writeCache, object, prefix) {
  return new Proxy(object, {
    get(target, property, receiver) {
      const qualifiedProp = prefix + '.' + property;
      if (qualifiedProp in writeCache) {
        return writeCache[qualifiedProp];
      } else if (target[property] !== null && typeof target[property] === 'object') {
        return buildProxy(writeCache, target[property], qualifiedProp);
      } else {
        return target[property];
      }
    },
    set ...
  });
}
```
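With the prefix threaded through (and the `set` trap filled in - the `'root'` starting prefix is an arbitrary choice of mine), the two writes now land under distinct keys:

```js
function buildProxy(writeCache, object, prefix) {
  return new Proxy(object, {
    get(target, property, receiver) {
      const qualifiedProp = prefix + '.' + property;
      if (qualifiedProp in writeCache) {
        return writeCache[qualifiedProp];
      } else if (target[property] !== null && typeof target[property] === 'object') {
        return buildProxy(writeCache, target[property], qualifiedProp);
      }
      return target[property];
    },
    set(target, property, value, receiver) {
      writeCache[prefix + '.' + property] = value;
      return true;
    }
  });
}

const writeCache = {};
const proxy = buildProxy(writeCache, {prop1: 1, nestedProp: {prop1: 2}}, 'root');

proxy.prop1 = 10;
proxy.nestedProp.prop1 = 20;

console.log(writeCache['root.prop1']);            // 10
console.log(writeCache['root.nestedProp.prop1']); // 20 - no more collision
console.log(proxy.nestedProp.prop1);              // 20
```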
Cool, now the keys are unique. We've got enough sketched out to start running test cases, and that's where we encounter our next challenge - arrays.
It would be great for our purposes if arrays in JavaScript were modeled entirely in terms of objects - and they almost are. Indexed properties are simply keyed by their number, so if `myArray` is in a proxy, `myArray[1]` would be intercepted by `get()`. That's great; it means a lot of our work so far carries over. However, the runtime does tricky things to keep the `length` property of an array up to date when the array is modified. `length` is an ordinary property, not something computed dynamically, and since we're intercepting writes to the array the runtime will never actually update the length property for us; we have to do it ourselves, whenever we set a property on the array:
```js
function buildProxy(writeCache, object, prefix) {
  return new Proxy(object, {
    get ... ,
    set(target, property, value, receiver) {
      writeCache[prefix + '.' + property] = value;
      if (Array.isArray(target)) {
        // guarantees that length is updated correctly when we set an
        // integral property on the array, and that the correct elements
        // are removed when we set the length of the array.
        maintainArrayInvariant(writeCache, target, property, value);
      }
      return true;
    }
  });
}
```
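The article doesn't show `maintainArrayInvariant`, so here is one hypothetical sketch of what such a helper could do. The real source may differ, and I've added a `prefix` parameter the snippet above doesn't pass, since the helper needs it to build cache keys:

```js
// hypothetical sketch - signature and behavior are assumptions, not the library's code
function maintainArrayInvariant(writeCache, target, property, value, prefix) {
  const lengthKey = prefix + '.length';
  const currentLength =
    lengthKey in writeCache ? writeCache[lengthKey] : target.length;
  const index = Number(property);
  if (Number.isInteger(index) && index >= currentLength) {
    // writing past the end grows the array, just as the runtime would
    writeCache[lengthKey] = index + 1;
  } else if (property === 'length') {
    // shrinking the array drops staged elements at indices >= the new length
    // (a full implementation would also have to mask elements still on the target)
    for (const key of Object.keys(writeCache)) {
      if (key.startsWith(prefix + '.')) {
        const i = Number(key.slice(prefix.length + 1));
        if (Number.isInteger(i) && i >= value) {
          delete writeCache[key];
        }
      }
    }
  }
}

const writeCache = {'arr.3': 9};
maintainArrayInvariant(writeCache, [1, 2, 3], '3', 9, 'arr');
console.log(writeCache['arr.length']); // 4 - writing index 3 grew the array

writeCache['arr.length'] = 2;
maintainArrayInvariant(writeCache, [1, 2, 3], 'length', 2, 'arr');
console.log('arr.3' in writeCache);    // false - truncating dropped the staged element
```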
This is enough to simulate the correct behavior for arrays when we use simple bracket syntax to read from and write to our proxies. But there's another way to interact with arrays, and that's to use the utility functions on `Array.prototype`. What happens when we want to use `push()` or `filter()`? This problem is not unique to arrays, of course, so it's worth solving: what we want to do is capture writes to `this` from within the body of any function. `bind()` comes to the rescue:
```js
function buildProxy(writeCache, object, prefix) {
  return new Proxy(object, {
    get(target, property, receiver) {
      const qualifiedProp = prefix + '.' + property;
      if (qualifiedProp in writeCache) {
        return writeCache[qualifiedProp];
      } else if (target[property] !== null && typeof target[property] === 'object') {
        return buildProxy(writeCache, target[property], qualifiedProp);
      } else if (typeof target[property] === 'function') {
        // `receiver` is set to the proxy that received the message - *not*
        // the wrapped object.
        return target[property].bind(receiver);
      } else {
        return target[property];
      }
    },
    set ...
  });
}
```
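As a sanity check, here's a condensed, runnable sketch of the pieces above: binding lets `Array.prototype.push` run against the proxy, with its internal reads and writes routed through the traps (array-invariant maintenance is omitted since `push` sets `length` itself):

```js
function buildProxy(writeCache, object, prefix) {
  return new Proxy(object, {
    get(target, property, receiver) {
      const qualifiedProp = prefix + '.' + property;
      if (qualifiedProp in writeCache) {
        return writeCache[qualifiedProp];
      } else if (target[property] !== null && typeof target[property] === 'object') {
        return buildProxy(writeCache, target[property], qualifiedProp);
      } else if (typeof target[property] === 'function') {
        return target[property].bind(receiver); // `this` inside push is the proxy
      }
      return target[property];
    },
    set(target, property, value, receiver) {
      writeCache[prefix + '.' + property] = value;
      return true;
    }
  });
}

const writeCache = {};
const original = [1, 2, 3];
const proxy = buildProxy(writeCache, original, 'arr');

proxy.push(4); // reads length through get(), writes element and length through set()

console.log(writeCache['arr.3']);      // 4
console.log(writeCache['arr.length']); // 4
console.log(original.length);          // 3 - untouched
console.log(proxy.length);             // 4 - served from the cache
```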
Now we have a full-featured solution that will capture writes while preserving normal object behavior...at least for trees of objects. This won't work for object DAGs though. Consider the case in which we have something like
```js
const nested = {message: "I am the tip of the diamond"};
const dag = {
  referenceOne: nested,
  referenceTwo: nested
};
```
Let's say we then intercept a write of the form `dag.referenceOne.message = "I'm new!"`. Then we'll store that write in the cache under the key `dag.referenceOne.message`. This means that when the time comes to apply the writes and create a new object graph, we'll know that we need to make a new object `{message: "I'm new!"}` and make sure that `dag.referenceOne` points to it in the final graph. However, `dag.referenceTwo` also needs to point to this new object, but we have no way of knowing that from the write cache. If we wanted to keep using qualified prop names to key the cache, the key for a particular write would have to be an array of all the qualified names that refer to the written property. We'd end up needing to do a lot of object reference comparisons to figure out whether to put two property names in the same key. We can accomplish the same thing a bit more cleanly by abandoning string keys and keying on object references themselves, using a WeakMap.
The idea behind this new setup is that every object has a diff stored in the WeakMap under its object reference (objects with empty diffs aren't included in the map at all). So going back to our `dag` object above - if our writes are

```js
dag.referenceOne.message = "I'm new!";
dag.newKey = 123;
```
And if `r1 === dag.referenceOne` and `r2 === dag`, we can expect `writeCache.get(r1)` to be `{message: "I'm new!"}` and `writeCache.get(r2)` to be `{newKey: 123}` (since `writeCache` is now a WeakMap, we read it with `get()` rather than bracket syntax). Notice that `writeCache.get(r2)` does not contain `referenceOne`, even though that property will point to a new object in the new graph - we already know to change it, since `dag.referenceOne` is a key in `writeCache`. As before, we only track writes to objects in the original graph, so if we did
```js
dag.referenceOne = {};
dag.referenceOne.newMessage = "I'm new!";
dag.newKey = 123;
```
`writeCache.get(r2)` would be `{newKey: 123, referenceOne: {newMessage: "I'm new!"}}`, and there would be no entry for `r1` at all.
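Sketching the reworked factory with a WeakMap-keyed cache - again a simplification of the real source, with function binding, arrays, and the other traps omitted:

```js
function buildProxy(writeCache, object) {
  return new Proxy(object, {
    get(target, property, receiver) {
      const diff = writeCache.get(target);
      if (diff && property in diff) {
        return diff[property];
      } else if (target[property] !== null && typeof target[property] === 'object') {
        return buildProxy(writeCache, target[property]);
      }
      return target[property];
    },
    set(target, property, value, receiver) {
      if (!writeCache.has(target)) {
        writeCache.set(target, {}); // an object's diff is created lazily
      }
      writeCache.get(target)[property] = value;
      return true;
    }
  });
}

const nested = {message: 'I am the tip of the diamond'};
const dag = {referenceOne: nested, referenceTwo: nested};
const writeCache = new WeakMap();
const proxy = buildProxy(writeCache, dag);

proxy.referenceOne.message = "I'm new!";
proxy.newKey = 123;

console.log(writeCache.get(nested).message); // "I'm new!"
console.log(writeCache.get(dag).newKey);     // 123
console.log(proxy.referenceTwo.message);     // "I'm new!" - both paths see the write
console.log(nested.message);                 // 'I am the tip of the diamond' - untouched
```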
The essential difficulty in supporting DAGs is that nodes may have multiple parents, and keying on the object reference makes sense since it's the only identifier guaranteed to be consistent across all parents. There's a piece we haven't filled in yet (the last one; we're almost there!), and that's using the captured writes to construct a new object graph from the old. The objects we want to change are the keys of our writeCache; the challenge is finding them in the graph. The simplest solution is just to walk the graph (although there's probably a more efficient way), applying the patches as we go. It's important to note here that the object graph may change as a result of applying the patches (e.g. if we added an edge from one node to another), so we want to make sure to apply each patch before we recurse. Now we have a good start:
```js
function applyWrites(object, writeCache) {
  // writeCache is the WeakMap of per-object diffs built above
  const patched = Object.assign({}, object, writeCache.get(object));
  Object.getOwnPropertyNames(patched).forEach((property) => {
    if (patched[property] !== null && typeof patched[property] === "object") {
      patched[property] = applyWrites(patched[property], writeCache);
    }
  });
  return patched;
}
```
Two final modifications: 1) `applyWrites` is a pure function, so we should memoize it - besides saving work, this guarantees that a node with multiple parents is copied exactly once, so every parent ends up pointing at the same new object - and 2) to achieve correct behavior we need to duplicate an object only if it actually changes. So now we have:
```js
function applyWrites(object, writeCache, applied) {
  // `applied` is a Map from original objects to their patched counterparts,
  // which doubles as the memoization cache
  const objectPatch = writeCache.get(object) || {};
  const patched = Object.assign({}, object, objectPatch);
  Object.getOwnPropertyNames(patched).forEach((property) => {
    if (patched[property] !== null && typeof patched[property] === "object") {
      if (!applied.has(patched[property])) {
        applyWrites(patched[property], writeCache, applied);
      }
      if (patched[property] !== applied.get(patched[property])) {
        objectPatch[property] = applied.get(patched[property]);
      }
    }
  });
  let retObj;
  if (Object.keys(objectPatch).length === 0) {
    retObj = object; // nothing changed - share the original
  } else {
    retObj = Object.assign({}, object, objectPatch);
  }
  applied.set(object, retObj);
  return retObj;
}
```
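To close the loop, here's the final `applyWrites` exercised on the diamond from earlier. For the sketch I've written the cache access with `WeakMap`/`Map` methods, and the cache entries are built by hand rather than captured through proxies:

```js
function applyWrites(object, writeCache, applied) {
  const objectPatch = writeCache.get(object) || {};
  const patched = Object.assign({}, object, objectPatch);
  Object.getOwnPropertyNames(patched).forEach((property) => {
    if (patched[property] !== null && typeof patched[property] === 'object') {
      if (!applied.has(patched[property])) {
        applyWrites(patched[property], writeCache, applied);
      }
      if (patched[property] !== applied.get(patched[property])) {
        objectPatch[property] = applied.get(patched[property]);
      }
    }
  });
  const retObj = Object.keys(objectPatch).length === 0
    ? object
    : Object.assign({}, object, objectPatch);
  applied.set(object, retObj);
  return retObj;
}

const nested = {message: 'old'};
const dag = {referenceOne: nested, referenceTwo: nested, untouched: {keep: true}};
const writeCache = new WeakMap([[nested, {message: "I'm new!"}]]);

const result = applyWrites(dag, writeCache, new Map());

console.log(result.referenceOne.message);                 // "I'm new!"
console.log(result.referenceOne === result.referenceTwo); // true - the diamond survives
console.log(result.untouched === dag.untouched);          // true - unchanged nodes are shared
console.log(nested.message);                              // 'old' - original untouched
```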
Apart from some errata around things like array handling, this is the `immutable-staging` source as it stands.