C API: jv - troyp/jq GitHub Wiki

jv is jq's internal JSON library. All jv objects are immutable, which is a requirement if you want to implement jq's backtracking while remaining approximately sane.

This means that functions operating on jv values tend to be referentially transparent: you can't pass an empty array to a function and expect it to be filled in when the function returns. If you want a function to return some information, it has to actually return a new object since it can't go modifying its arguments.

As a result, some API usage looks a little odd. For instance, the functions jv_array_get and jv_array_set get and set elements of an array. The usage of jv_array_get is fairly standard:

jv elem = jv_array_get(array, 42);

But to use jv_array_set, you have to know that it returns the new array. You can't ignore the return value.

array = jv_array_set(array, 42, elem);

Kinds

The "kind" of a jv value is one of the following, defined in the enum jv_kind:

  • JV_KIND_INVALID
  • JV_KIND_NULL
  • JV_KIND_FALSE
  • JV_KIND_TRUE
  • JV_KIND_NUMBER
  • JV_KIND_STRING
  • JV_KIND_ARRAY
  • JV_KIND_OBJECT

All but the first represent normal JSON values. The next section explains invalid objects. You can check the kind of an object by calling jv_get_kind.

Errors

As well as the normal kinds of JSON values (array, bool, string, etc.), jv supports objects of kind JV_KIND_INVALID. Such objects are used to signal errors. Some of them carry an error message, which may be an arbitrary JSON value (you can check for one with jv_invalid_has_msg and retrieve it with jv_invalid_get_msg).

Generally, the kind-specific functions like jv_array_get require that their argument be of the correct kind, and trigger an assertion failure aborting the program if not. That is, the program will crash if you pass a string to jv_array_get: you must check that the argument is an array before using this function.

The functions are forgiving as long as the kinds are right. If you call jv_array_get with an out-of-bounds index, then you will get an object of kind JV_KIND_INVALID back. This definitely indicates an invalid index; it is impossible to store an object of kind JV_KIND_INVALID in an array (or in anything else, for that matter).

You may find it more convenient to use the higher-level functions from jv_aux.h, which do more runtime error-checking and are implemented in terms of the primitives in jv.h. For instance, jv_get from jv_aux.h takes a value and an index. If the value is an array and the index is in-bounds, it returns the corresponding element. If the value is an object and the index is a valid string key, it returns the corresponding entry. Otherwise, it returns a JV_KIND_INVALID with a suitable error message.

Memory management

jv refcounts all heap-allocated objects. The usual objection to refcounting is that it fails when objects contain cycles. This is true. Luckily, due to the immutability of jv objects, it's impossible to create a cycle.

This is a pleasant property; as well as getting rid of pointer aliasing (a fertile source of bugs), it also limits us to acyclic heap structures. Since JSON does not support cyclic structures, this means that any jv object can be rendered as JSON.

Most jv functions are said to "consume" their arguments. That is, once you have passed the arguments to the function you may no longer use them and their memory may be reused. For instance, in the jv_array_get example above, it is invalid to use the variable array after that line has executed. If you need to reuse a jv value, you can call jv_copy to get a second copy of it. jv_copy does not consume its argument.

It may seem like jv_copy does a deep copy of the object. It certainly behaves that way, and if you keep that model in mind when writing jv code you'll get the right answer. However, jv_copy is in fact very cheap; the Implementation section below explains how it works.

You must consume every jv value, otherwise there may be memory leaks (the tests won't pass if so, as they're run under valgrind). If you have nothing else to do with a value, pass it to jv_free, which consumes its argument and does nothing with it.

Implementation

The jv API can be used as though every operation copied the entire object and jv_copy did a deep-copy. That's a useful mental model to program with, but it would be horrendously slow. Instead, jv uses a copy-on-write scheme for all objects.

In the worst case, the jv functions will need to copy their input object. However, most of the time there's no reason to keep the old version around as it will never be used again. In this case, the refcount of the input object will be 1 (only one reference) and the function would have to free it. So, all of the functions that return a new version of an object (e.g. jv_array_set) first check whether the refcount is 1. If so, they know they can safely modify the object in place without allocating any new memory. Thus, most of the time, jv_array_set won't copy anything.

jv_copy is then implemented by increasing the reference count by 1. This means that the object won't be modified by future calls to jv_array_set and the like. Instead, jv_array_set will copy the object and modify that.
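The scheme can be modelled in a few lines of standalone C. This is a toy sketch only: the toy_* names and the struct layout are invented for illustration and do not reflect jv's actual data structures. It shows a fixed-length array of doubles whose copy operation bumps a refcount and whose set operation copies only when the buffer is shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: NOT jv's real layout. */
typedef struct {
  int refcount;
  int len;
  double items[];   /* flexible array member holding the elements */
} toy_buf;

typedef struct { toy_buf *buf; } toy_array;

/* Allocate a zero-filled array of `len` elements with one reference. */
toy_array toy_new(int len) {
  toy_buf *b = calloc(1, sizeof *b + sizeof(double) * (size_t)len);
  assert(b != NULL);
  b->refcount = 1;
  b->len = len;
  return (toy_array){ b };
}

/* Like jv_copy: just add a reference. Does not consume `a`. */
toy_array toy_copy(toy_array a) {
  a.buf->refcount++;
  return a;
}

/* Like jv_free: drop a reference, releasing the buffer at zero. */
void toy_free(toy_array a) {
  if (--a.buf->refcount == 0)
    free(a.buf);
}

/* Like jv_array_set: consumes `a` and returns the updated array. */
toy_array toy_set(toy_array a, int idx, double v) {
  assert(idx >= 0 && idx < a.buf->len);
  if (a.buf->refcount > 1) {
    /* Shared buffer: someone else may still read it, so copy first. */
    toy_array fresh = toy_new(a.buf->len);
    memcpy(fresh.buf->items, a.buf->items,
           sizeof(double) * (size_t)a.buf->len);
    toy_free(a);    /* give up our reference to the shared buffer */
    a = fresh;
  }
  /* Here the refcount is 1, so modifying in place is invisible to others. */
  a.buf->items[idx] = v;
  return a;
}

/* Like jv_array_get: consumes `a` and returns the element. */
double toy_get(toy_array a, int idx) {
  assert(idx >= 0 && idx < a.buf->len);
  double v = a.buf->items[idx];
  toy_free(a);
  return v;
}
```

With a single reference, toy_set returns the same buffer it was given. After toy_copy, the next toy_set on either handle allocates a fresh buffer, so the other handle keeps seeing the old contents.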