Filtering resources - adewg/ICAR GitHub Wiki

Context

Many use cases only need a specific subset of a resource. For example, when querying the animals endpoint in the registration API, you may be interested only in live animals, in the animals that were on the farm in a specific period, or in the ones that require attention. Some filters are quite common, while others are very use-case specific; selecting on a date range is very common. Not all filters are equally easy to implement. For example, a 'requires attention' filter may be easy to implement for servers that work closely with a farmer and already have task lists and similar features, but not for basic registration systems.

Providing proper filters on an endpoint may reduce the load on the server (since it has to deliver less data), but only if that filter is easy to add or readily available for that server. A client can always filter out the animals it needs for a specific use case itself, provided that it has the information needed to do so. Therefore, within ICAR ADE, filters are not compulsory. Instead, the standard focuses on defining common names for possible filters. However, a client cannot depend on the data source implementing a given filter, so it should expect that more data may be delivered than the filter parameters would suggest. Common filters, specifically those that filter on required data in the message, should be easy to implement and are therefore recommended for every data source.
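Because a server may ignore a filter it does not implement, a client can simply re-apply the filter to whatever comes back. The sketch below assumes simplified, hypothetical animal records that carry only an `id`, a `birthDate` and a `gender`; it is an illustration, not part of the standard.

```python
from datetime import date

# Hypothetical records as a server might return them. The server may have
# ignored a filter it does not implement, so extra records can appear.
animals = [
    {"id": 1, "birthDate": "2020-01-15", "gender": "Female"},
    {"id": 2, "birthDate": "2019-06-01", "gender": "Female"},
    {"id": 3, "birthDate": "2020-01-20", "gender": "Male"},
]


def refilter(records, predicate):
    """Re-apply a filter on the client side, since servers MAY skip it."""
    return [record for record in records if predicate(record)]


def born_in_january_2020(record):
    born = date.fromisoformat(record["birthDate"])
    return date(2020, 1, 1) <= born < date(2020, 2, 1)


# Even if the request carried birthDate-from/birthDate-to, checking again
# gives the same result whether or not the server applied the filter.
selected = refilter(animals, born_in_january_2020)
print([a["id"] for a in selected])  # → [1, 3]
```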

(As a side note: discovery of which filters are available for a data source is a topic we are investigating.)

ICAR ADE standard

The ADE standard thus states:

  • a server MAY implement any filter it deems relevant for an endpoint
  • a server SHOULD implement RECOMMENDED filters
  • whenever a server implements a filter, it MUST use the naming conventions as provided by the standard

When no naming convention exists for a filter, a server can choose its own name, keeping the back- and forward compatibility rules in mind:

Consider whether your filter is potentially standardisable. If other regions or vendors may require something similar, define it in such a way that it could become part of the standard at some point. That way, no technical changes should be needed when a new version that includes this filter comes out. If your filter is not potentially standardisable, make sure that the name you choose cannot conflict with future fields, e.g. by using a prefix for your region or company name. (See also Backward and forward compatibility.)

Naming conventions

Field based filters

In many cases, a filter parameter relates to a specific field in the message. In those cases, simply give the filter the same name as the field. For example, using the animals endpoint again, expected filters could be gender=Female or specie=Buffalo, which select animals based on those constraints. Different filters act as an AND operation: if both the gender and the specie filter are specified, only female buffaloes are returned. If the same field is specified more than once, e.g. specie=Cow and specie=Buffalo, this is interpreted as an OR for that field. The field name is assumed to be located directly beneath the member field (the wrapper object found in most messages). If the field is nested deeper, the names of the parent fields should be included in the filter, concatenated with a "-". E.g., for the treatment-programs endpoint, you could specify a filter diagnoses-name.
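The AND/OR rules and the "-" nesting convention can be sketched as a small matcher. This is a minimal illustration under a few assumptions: records are plain dicts representing one member, every query parameter is treated as a plain field filter (range suffixes are not handled), values are compared as strings, and the helper names `resolve` and `matches` are hypothetical.

```python
from urllib.parse import parse_qs


def resolve(obj, name):
    """Collect all values reached by a '-'-separated filter name.

    Lists along the path are traversed, so 'diagnoses-name' yields the
    name of every diagnosis.
    """
    current = [obj]
    for part in name.split("-"):
        found = []
        for node in current:
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and part in item:
                    found.append(item[part])
        current = found
    values = []
    for value in current:
        values.extend(value if isinstance(value, list) else [value])
    return values


def matches(member, query_string):
    """AND across different filter names, OR across repeats of one name."""
    filters = parse_qs(query_string)  # e.g. {"specie": ["Cow", "Buffalo"]}
    for name, accepted in filters.items():
        values = [str(v) for v in resolve(member, name)]
        if not any(v in accepted for v in values):
            return False
    return True


# Hypothetical animal members.
animals = [
    {"id": "a1", "gender": "Female", "specie": "Buffalo"},
    {"id": "a2", "gender": "Male", "specie": "Buffalo"},
    {"id": "a3", "gender": "Female", "specie": "Cow"},
]
print([a["id"] for a in animals if matches(a, "gender=Female&specie=Buffalo")])  # → ['a1']
print([a["id"] for a in animals if matches(a, "specie=Cow&specie=Buffalo")])  # → ['a1', 'a2', 'a3']
```

Note that this matcher is strict: a filter name that resolves to no field excludes every record. A server would more likely reject or ignore such a parameter.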

In other cases, you may need to filter on a range, for example all animals born in a specific period. Here, we again use the field name (birthDate) and add a suffix indicating the beginning and/or end of the period: birthDate-from=2020-01-01 and birthDate-to=2020-02-01 should give you all animals born in January 2020. Note that from is inclusive and to is exclusive (another way to look at this is to expand the birthDate to a timestamp, filling in the time part with 00:00:00).
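The inclusive-from, exclusive-to rule, including the expansion of dates to midnight timestamps, might look like this in a sketch. The function names are illustrative only and not defined by the standard.

```python
from datetime import date, datetime


def as_timestamp(value):
    """Expand a date to a timestamp at 00:00:00; timestamps pass through."""
    if isinstance(value, datetime):  # check first: datetime subclasses date
        return value
    return datetime(value.year, value.month, value.day)


def in_range(value, start=None, end=None):
    """The '-from' bound is inclusive, the '-to' bound exclusive; both optional."""
    ts = as_timestamp(value)
    if start is not None and ts < as_timestamp(start):
        return False
    if end is not None and ts >= as_timestamp(end):
        return False
    return True


# birthDate-from=2020-01-01&birthDate-to=2020-02-01
start = date.fromisoformat("2020-01-01")
end = date.fromisoformat("2020-02-01")
print(in_range(date(2020, 1, 1), start, end))   # → True  (from is inclusive)
print(in_range(date(2020, 1, 31), start, end))  # → True
print(in_range(date(2020, 2, 1), start, end))   # → False (to is exclusive)
```

Because the upper bound is exclusive, consecutive periods (January, February, ...) can be requested back to back without any record falling into two of them.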

Composite fields

Fields like the animal id are composite fields: they consist of an id and a scheme, which allows us to support different regions and countries: "animal": { "id": "NL 877034232", "scheme": "nl-v1" }. To create a filter for this, we concatenate the field name with each subfield name, giving two parameters: animal-id="NL 877034232" and animal-scheme="nl-v1". As both parameters act as an AND operation (see above), this selects precisely that animal within the Dutch numbering scheme. It is not recommended to allow one of these parameters to be used without the other: while selecting all animals with any Dutch id is fine (although a very unlikely use case), selecting animals only on their id is ambiguous.
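A server could enforce the id/scheme pairing when parsing the query. The sketch below follows the nuance above: a scheme alone is accepted, while an id without a scheme is rejected as ambiguous. The error class and function names are hypothetical, and the same code would serve location-id/location-scheme by passing `field="location"`.

```python
from urllib.parse import parse_qs


class AmbiguousFilterError(ValueError):
    """Raised when an id filter is given without its scheme."""


def composite_id_filter(query_string, field="animal"):
    """Read <field>-id and <field>-scheme parameters from a query string."""
    params = parse_qs(query_string)
    ids = params.get(f"{field}-id", [])
    schemes = params.get(f"{field}-scheme", [])
    if ids and not schemes:
        # An id alone is ambiguous: the same id may exist in several schemes.
        raise AmbiguousFilterError(f"{field}-id requires {field}-scheme")
    return {"field": field, "ids": ids, "schemes": schemes}


def matches_composite(member, criteria):
    """AND between id and scheme; repeated values of one parameter act as OR."""
    identifier = member[criteria["field"]]
    id_ok = not criteria["ids"] or identifier["id"] in criteria["ids"]
    scheme_ok = not criteria["schemes"] or identifier["scheme"] in criteria["schemes"]
    return id_ok and scheme_ok


criteria = composite_id_filter("animal-id=NL%20877034232&animal-scheme=nl-v1")
record = {"animal": {"id": "NL 877034232", "scheme": "nl-v1"}}
print(matches_composite(record, criteria))  # → True
```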

The same applies to fields with units. For example, a milking visit duration is specified as: "milkingVisitDuration": { "value": 349, "unitCode": "SEC" }. A range filter selecting all visits that last a minute or longer would be: milkingVisitDuration-value-from=60 & milkingVisitDuration-unitCode-from=SEC. The -from/-to suffix is appended to both the value and the unitCode field to keep things generic.
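Since the unit travels with the bound, a server has to normalise both sides before comparing. This sketch assumes only three UN/CEFACT duration codes (SEC, MIN, HUR) and a hypothetical helper `duration_in_range`. It applies the same inclusive-from, exclusive-to rule as for dates.

```python
from urllib.parse import parse_qs

# Conversion factors to seconds; this sketch only knows these three codes.
TO_SECONDS = {"SEC": 1, "MIN": 60, "HUR": 3600}


def to_seconds(value, unit_code):
    return value * TO_SECONDS[unit_code]


def duration_in_range(measured, value_from=None, unit_from=None,
                      value_to=None, unit_to=None):
    """Compare a {value, unitCode} quantity against optional bounds."""
    secs = to_seconds(measured["value"], measured["unitCode"])
    if value_from is not None and secs < to_seconds(value_from, unit_from):
        return False  # below the inclusive lower bound
    if value_to is not None and secs >= to_seconds(value_to, unit_to):
        return False  # at or above the exclusive upper bound
    return True


params = parse_qs("milkingVisitDuration-value-from=60"
                  "&milkingVisitDuration-unitCode-from=SEC")
visit = {"value": 349, "unitCode": "SEC"}
print(duration_in_range(
    visit,
    value_from=float(params["milkingVisitDuration-value-from"][0]),
    unit_from=params["milkingVisitDuration-unitCode-from"][0],
))  # → True
```

A bound of 1 MIN selects the same visits as 60 SEC, which is the point of carrying the unit in the filter.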

Specialized filters

In some cases, there is no direct field that can be used to implement a filter. For example, which animals were present at a location at a specific point in time is derived data: you need to process all arrivals and departures. If the server stores this information as events, even the server may not find it trivial to answer. If a server does want to support a filter for such a use case, the ADE standard will provide recommended names for it.
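To see why this is derived data, consider replaying events. The sketch below uses hypothetical, heavily simplified arrival and departure records (real messages carry more fields) and computes the set of animals present at a given moment. No filter name for this is defined yet, so none is used here.

```python
from datetime import datetime

# Hypothetical, simplified events for a single location.
events = [
    {"animal": "animal-1", "type": "arrival", "dateTime": datetime(2020, 1, 5)},
    {"animal": "animal-2", "type": "arrival", "dateTime": datetime(2020, 1, 10)},
    {"animal": "animal-1", "type": "departure", "dateTime": datetime(2020, 2, 1)},
    {"animal": "animal-3", "type": "arrival", "dateTime": datetime(2020, 3, 1)},
]


def animals_present_at(events, moment):
    """Replay arrivals and departures up to and including `moment`."""
    present = set()
    for event in sorted(events, key=lambda e: e["dateTime"]):
        if event["dateTime"] > moment:
            break  # later events do not affect the state at `moment`
        if event["type"] == "arrival":
            present.add(event["animal"])
        elif event["type"] == "departure":
            present.discard(event["animal"])
    return present


print(sorted(animals_present_at(events, datetime(2020, 1, 15))))  # → ['animal-1', 'animal-2']
print(sorted(animals_present_at(events, datetime(2020, 2, 15))))  # → ['animal-2']
```

Every query has to scan the event history, which is why such a filter may be costly even for the server.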

Synchronization filters

A client may need to synchronise its state with a server. For that, it may need to track changes from a specific source since a specific point in time. Therefore, filters like meta-modified-from and meta-source are RECOMMENDED.
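A synchronising client could remember the newest modification timestamp it has seen and pass it as meta-modified-from on the next request. The sketch below is an assumption-laden illustration: the base URL and source value are placeholders, and timestamps are assumed to be UTC strings in one ISO 8601 format so that string comparison orders them correctly. Because from is inclusive, boundary records come back again, so the client upserts by id instead of appending.

```python
from urllib.parse import urlencode


def build_sync_url(base_url, source, last_modified=None):
    """Request records from `source` changed at or after `last_modified`."""
    params = {"meta-source": source}
    if last_modified is not None:
        params["meta-modified-from"] = last_modified
    return f"{base_url}?{urlencode(params)}"


def apply_changes(local_store, records):
    """Upsert records by id and return the newest meta.modified seen."""
    newest = None
    for record in records:
        local_store[record["id"]] = record  # upsert: duplicates overwrite
        modified = record["meta"]["modified"]
        if newest is None or modified > newest:  # assumes uniform UTC strings
            newest = modified
    return newest


url = build_sync_url("https://example.org/animals", "example-source",
                     "2021-03-01T00:00:00Z")
print(url)
```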

Fuzzy matching filters

There may be use cases for fuzzy matching on, e.g., descriptions or names (e.g., "all animal ids that start with an 'N'"). We could consider adding a suffix for these use cases (e.g. animal.name.match). At this point, we do not have enough concrete use cases to make a recommendation. Following our design principles, we leave fuzzy matching out of the specification until there is a need for it.

Recommended filter name list

We expect the filter list to grow based on use cases. To be able to track changes and have a proper review process, the list should be kept as a markdown file under version control in git. For now, I've collected some of the proposed filters here:

Field based filters

| Resource | Filter | Description |
|---|---|---|
| meta | meta-source | select a specific source |
| | meta-modified-from & meta-modified-to | select a range for the modified timestamp |
| | meta-created-from & meta-created-to | select a range for the created timestamp |
| | meta-creator | select the creator of the record |
| | meta-validFrom & meta-validTo | select a range in which the record is valid |
| deprecated | start-date-time | prefer to use meta-modified-from |
| | end-date-time | prefer to use meta-modified-to |
| common | animal-id + animal-scheme | select a specific animal |
| | location-id + location-scheme | select a specific location |
| milking-visits | milkingStartingDateTime-from | range filter based on the milking start time |
| | milkingStartingDateTime-to | range filter based on the milking start time |
| | milkingVisitDuration-value-from & milkingVisitDuration-unitCode-from | range filter based on the milking visit duration |
| | milkingVisitDuration-value-to & milkingVisitDuration-unitCode-to | range filter based on the milking visit duration |
| | milkingDuration-value-from & milkingDuration-unitCode-from | range filter based on the milking duration |
| | milkingDuration-value-to & milkingDuration-unitCode-to | range filter based on the milking duration |
| | milkingType | filters on milking type |
| | ... | |
| | quarterMilkings-icarQuarterId | filters on milking visits that have a recording for a specific quarter |
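As a usage illustration, a client could combine several of the names above into one milking-visits request. The base URL is a placeholder, and the quarter codes LF and RF are assumed values. Passing a list of pairs to `urlencode` lets a filter name repeat, which expresses OR for that field.

```python
from urllib.parse import urlencode

# Placeholder base URL; the real endpoint path depends on the server.
BASE_URL = "https://example.org/milking-visits"

params = [
    ("animal-id", "NL 877034232"),
    ("animal-scheme", "nl-v1"),
    ("milkingStartingDateTime-from", "2020-01-01T00:00:00Z"),
    ("milkingStartingDateTime-to", "2020-02-01T00:00:00Z"),
    # Repeating a filter name means OR for that field (assumed quarter codes).
    ("quarterMilkings-icarQuarterId", "LF"),
    ("quarterMilkings-icarQuarterId", "RF"),
]
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
```

Since none of these filters is compulsory, the client should still be prepared to receive visits outside these constraints.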

Related topics

Pagination

In our collections definition, we allow for pagination based on JSON-LD. Typically, pagination is driven by the server: previous and next pages are identified by URIs provided in the view object of the message. Optionally, a client could steer the pagination by providing query parameters. These query parameters are typically named after the fields in the view object (e.g. page=2). We do not anticipate a naming clash for these query parameters, but we will have to be careful if we introduce fields in the actual payload with names similar to those in the view object.
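Server-driven pagination means the client simply follows the next URI until there is none. This sketch assumes that each page holds its items in `member` and that the view object exposes a `next` URI. It replaces HTTP with a hypothetical in-memory `fetch` so it runs on its own.

```python
# Hypothetical pages keyed by URI, standing in for HTTP responses.
PAGES = {
    "https://example.org/animals?page=1": {
        "member": [{"id": "a1"}, {"id": "a2"}],
        "view": {"next": "https://example.org/animals?page=2"},
    },
    "https://example.org/animals?page=2": {
        "member": [{"id": "a3"}],
        "view": {},  # no next link: this is the last page
    },
}


def fetch(uri):
    """Stand-in for an HTTP GET that returns the parsed JSON body."""
    return PAGES[uri]


def iterate_members(first_uri, fetch=fetch):
    """Yield members page by page, following view.next links."""
    uri = first_uri
    seen = set()
    while uri and uri not in seen:  # guard against a link cycle
        seen.add(uri)
        page = fetch(uri)
        yield from page.get("member", [])
        uri = page.get("view", {}).get("next")


ids = [m["id"] for m in iterate_members("https://example.org/animals?page=1")]
print(ids)  # → ['a1', 'a2', 'a3']
```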

Other considered options

We also discussed the OData standard. It was rejected because it seemed a heavy burden for server-side implementations. Also, since it allows for complex queries, it may put a heavy load on the server that is hard to control through syntax alone. Using OData syntax in only a limited way felt misleading, since we would not support the full standard.

An alternative to these filters would be to allow querying by example. This offers a powerful and relatively easy way to search. We may consider defining this functionality at a later stage on a separate endpoint (e.g. ../search). That way, interested parties can publish this endpoint while others stick with the current proposal.
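To make the idea concrete, query by example matches records against a partial object: every field present in the example must be present with an equal value, and nested objects are compared recursively. This is only a sketch of the general technique. No request format for a search endpoint has been defined, and lists inside examples are not handled here.

```python
def matches_example(record, example):
    """True if every field in `example` appears in `record` with an equal value."""
    for key, expected in example.items():
        if key not in record:
            return False
        actual = record[key]
        if isinstance(expected, dict):
            # Nested example: only the listed subfields need to match.
            if not isinstance(actual, dict) or not matches_example(actual, expected):
                return False
        elif actual != expected:
            return False
    return True


# Hypothetical records and example.
animals = [
    {"animal": {"id": "NL 877034232", "scheme": "nl-v1"}, "gender": "Female"},
    {"animal": {"id": "NL 100000001", "scheme": "nl-v1"}, "gender": "Male"},
]
example = {"gender": "Female", "animal": {"scheme": "nl-v1"}}
print([a["animal"]["id"] for a in animals if matches_example(a, example)])  # → ['NL 877034232']
```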

Note that parties are still free to add OData or search support, as long as it fits the guidelines set out in Backward and forward compatibility.

For the in-depth discussion, see the original ticket, issue 130.