On ordering - mlibrary/hydra-prototype GitHub Wiki

Ordering in RDF

As a pure graph-oriented formalism, RDF offers no predefined possibility (besides the disputed Bag mechanism) to express order ¹. The cited article then walks through eight approaches to representing an ordered list of relationships. Part of what makes this interesting is that you can write an ordered sequence in RDF/XML, but the triples it produces look nothing like the original markup. For example:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Seq rdf:about="http://example.org/favourite-fruit">
    <rdf:li rdf:resource="http://example.org/banana"/>
    <rdf:li rdf:resource="http://example.org/apple"/>
    <rdf:li rdf:resource="http://example.org/pear"/>
  </rdf:Seq>
</rdf:RDF>

becomes (as N-Triples)

<http://example.org/favourite-fruit> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq> .
<http://example.org/favourite-fruit> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <http://example.org/banana> .
<http://example.org/favourite-fruit> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_2> <http://example.org/apple> .
<http://example.org/favourite-fruit> <http://www.w3.org/1999/02/22-rdf-syntax-ns#_3> <http://example.org/pear> .
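Getting the order back out of those triples is left to whoever reads them. A minimal sketch in plain Ruby, assuming the triples arrive as [subject, predicate, object] string arrays (no RDF library involved; seq_members is a hypothetical helper): collect the rdf:_N properties for one container and sort on the integer N. Sorting on the predicate string instead would put rdf:_10 before rdf:_2.

```ruby
# Minimal sketch: recover rdf:Seq order from plain triples.
# Assumption: triples are [subject, predicate, object] arrays of strings.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBER = /\A#{Regexp.escape(RDF_NS)}_(\d+)\z/ # matches rdf:_1, rdf:_2, ...

# Returns the rdf:_N objects of +container+ ordered by the integer N
# (numeric sort, so _10 comes after _9 rather than after _1).
def seq_members(triples, container)
  triples
    .select { |s, _p, _o| s == container }
    .filter_map { |_s, p, o| (m = MEMBER.match(p)) && [m[1].to_i, o] }
    .sort_by(&:first)
    .map(&:last)
end

fruit = "http://example.org/favourite-fruit"
triples = [
  [fruit, "#{RDF_NS}type", "#{RDF_NS}Seq"],
  [fruit, "#{RDF_NS}_3", "http://example.org/pear"],
  [fruit, "#{RDF_NS}_1", "http://example.org/banana"],
  [fruit, "#{RDF_NS}_2", "http://example.org/apple"]
]
p seq_members(triples, fruit)
# => ["http://example.org/banana", "http://example.org/apple", "http://example.org/pear"]
```

The rdf:type triple is skipped because its predicate does not match the rdf:_N pattern.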

Historically, an additional challenge is that it's complicated to construct SPARQL queries that work when a property may or may not hold a sequence of items. This is unlike, say, XPath, where queries can handily work across multiple axes. Does this issue matter for our use of Fedora?
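To make the "may or may not be a sequence" problem concrete, here is a client-side sketch in plain Ruby standing in for a query. It assumes the same [subject, predicate, object] string layout; the helper names and the /objects/a, /objects/b paths are hypothetical. It returns a property's values whether they are stored as plain repeated triples or behind an rdf:Seq node, and any consumer that accepts both shapes needs some branch like this.

```ruby
# Sketch (client-side stand-in for a SPARQL query): return a property's values
# whether they are plain repeated triples or wrapped in an rdf:Seq node.
# Assumption: triples are [subject, predicate, object] arrays of strings.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = "#{RDF_NS}type"
RDF_SEQ = "#{RDF_NS}Seq"
MEMBER = /\A#{Regexp.escape(RDF_NS)}_(\d+)\z/

def objects(triples, subject, predicate)
  triples.select { |s, p, _o| s == subject && p == predicate }.map(&:last)
end

# rdf:_N values of a Seq node, ordered by the integer N.
def seq_values(triples, node)
  triples
    .select { |s, _p, _o| s == node }
    .filter_map { |_s, p, o| (m = MEMBER.match(p)) && [m[1].to_i, o] }
    .sort_by(&:first)
    .map(&:last)
end

# Branch per value: expand rdf:Seq nodes, pass anything else through.
# Plain repeated values come back in storage order, which the graph does not define.
def ordered_values(triples, subject, predicate)
  objects(triples, subject, predicate).flat_map do |o|
    objects(triples, o, RDF_TYPE).include?(RDF_SEQ) ? seq_values(triples, o) : [o]
  end
end

creator = "http://purl.org/dc/terms/creator"
seq_work = [
  ["/objects/a", creator, "_:seq"],
  ["_:seq", RDF_TYPE, RDF_SEQ],
  ["_:seq", "#{RDF_NS}_2", "Bradbury, Ray"],
  ["_:seq", "#{RDF_NS}_1", "Poe, Edgar Allan"]
]
plain_work = [
  ["/objects/b", creator, "Poe, Edgar Allan"],
  ["/objects/b", creator, "Bradbury, Ray"]
]
p ordered_values(seq_work, "/objects/a", creator) # => ["Poe, Edgar Allan", "Bradbury, Ray"]
p ordered_values(plain_work, "/objects/b", creator)
```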

Anyway, in Hydra, the practice is to flatten the relationship between objects (e.g. an object can have multiple member objects) and then use link relations to set up an alternative path: which member is first? From that one, which is next? And so on. Hydra employs proxy objects here to avoid tightly coupling an object and its members. This works great for book/pages, or item/works, etc., but might be overly complicated for an item with multiple authors.
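A toy model of that arrangement, assuming nothing about the actual Hydra or ActiveFedora classes (OrderedParent, Proxy and append are hypothetical names): membership is a flat, unordered set, while order lives in a separate chain of proxies, each pointing at its target and at the next and previous proxy. The member objects themselves are never touched.

```ruby
# Toy model, not the Hydra/ActiveFedora API: membership is an unordered set,
# and order lives in a separate chain of proxies (hypothetical names throughout).
require "set"

Proxy = Struct.new(:id, :target, :next_id, :prev_id)

class OrderedParent
  attr_reader :members, :first_id, :last_id

  def initialize
    @members = Set.new # flat membership, no order
    @proxies = {}      # proxy id => Proxy
    @first_id = nil
    @last_id = nil
  end

  # Record membership, then link a new proxy onto the end of the chain.
  def append(obj)
    @members << obj
    id = "proxy-#{@proxies.size + 1}"
    @proxies[id] = Proxy.new(id, obj, nil, @last_id)
    if @last_id
      @proxies[@last_id].next_id = id
    else
      @first_id = id
    end
    @last_id = id
    self
  end

  # Start at the first proxy and follow next_id, collecting each target.
  def ordered
    result = []
    id = @first_id
    while id
      proxy = @proxies.fetch(id)
      result << proxy.target
      id = proxy.next_id
    end
    result
  end
end

book = OrderedParent.new
%w[page-1 page-2 page-3].each { |page| book.append(page) }
p book.ordered # => ["page-1", "page-2", "page-3"]
```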

Ordering in Fedora...

Let's say you load the following set of triples into Fedora as the resource /objects/mbook1, specifying multiple authors for dc:creator using RDF Collections:

<> a <http://pcdm.org/models#Object>,
     <http://projecthydra.org/works/models#GenericWork>;
   <http://purl.org/dc/terms/title> "The Murders in the Rue Morgue";
   <http://purl.org/dc/terms/creator> ( "Poe, Edgar Allan" "Bradbury, Ray" );
   <info:fedora/fedora-system:def/model#hasModel> "BibliographicWork" .

Fedora parses that into the object graph:

</objects/mbook1>
- dc:title: "The Murders in the Rue Morgue"
- dc:creator: <.well-known/genid/6b8f40d1-5838-4015-9313-2a5baed4f939>
<.well-known/genid/6b8f40d1-5838-4015-9313-2a5baed4f939>
- rdf:first: "Poe, Edgar Allan"
- rdf:rest: </.well-known/genid/4ff4b71d-d175-44a6-8132-c88ff69e88bd>
</.well-known/genid/4ff4b71d-d175-44a6-8132-c88ff69e88bd>
- rdf:first: "Bradbury, Ray"
- rdf:rest: rdf:nil
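Getting the authors back in order means walking that chain from the head node until rdf:nil. A minimal sketch in plain Ruby, assuming the graph has already been parsed into a hash of node => { predicate => object } (a stand-in for an RDF library; collection_to_a is a hypothetical helper) and reusing the node IDs shown above:

```ruby
# Minimal sketch: turn an RDF collection (rdf:first / rdf:rest chain) into an Array.
# Assumption: the graph is a Hash of node => { predicate => object }.
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

def collection_to_a(graph, head)
  items = []
  seen = {}
  node = head
  until node == RDF_NIL
    raise ArgumentError, "cycle at #{node}" if seen[node]

    seen[node] = true
    cell = graph.fetch(node) { raise ArgumentError, "dangling rdf:rest: #{node}" }
    items << cell.fetch(RDF_FIRST)
    node = cell.fetch(RDF_REST)
  end
  items
end

head = ".well-known/genid/6b8f40d1-5838-4015-9313-2a5baed4f939"
tail = "/.well-known/genid/4ff4b71d-d175-44a6-8132-c88ff69e88bd"
graph = {
  "/objects/mbook1" => { "dc:creator" => head },
  head => { RDF_FIRST => "Poe, Edgar Allan", RDF_REST => tail },
  tail => { RDF_FIRST => "Bradbury, Ray", RDF_REST => RDF_NIL }
}
p collection_to_a(graph, graph["/objects/mbook1"]["dc:creator"])
# => ["Poe, Edgar Allan", "Bradbury, Ray"]
```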

When </objects/mbook1> is fetched (via REST), the response includes the triples for the anonymous (skolemized) nodes that make up the "collection" of authors.

If you use rdf:Seq instead, you load the following (as /objects/mbook2)

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<> a <http://pcdm.org/models#Object>,
     <http://projecthydra.org/works/models#GenericWork>;
   <http://purl.org/dc/terms/title> "The Pit and the Pendulum";
   <http://purl.org/dc/terms/creator> [
    a rdf:Seq;
    rdf:_1 "Poe, Edgar Allan";
    rdf:_2 "Bradbury, Ray"
   ];
   <info:fedora/fedora-system:def/model#hasModel> "BibliographicWork" .

which creates:

</objects/mbook2>
- dc:title: "The Pit and the Pendulum"
- dc:creator: <.well-known/genid/0e8da192-9c00-4b02-9886-db02a0a96a29>
<.well-known/genid/0e8da192-9c00-4b02-9886-db02a0a96a29>
- rdf:_1: "Poe, Edgar Allan"
- rdf:_2: "Bradbury, Ray"

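Note that you have to use the rdf:_N properties. Only the RDF/XML parser expands rdf:li into rdf:_1, rdf:_2, and so on; in Turtle, rdf:li is an ordinary predicate, so loading data marked up with rdf:li results in an unordered set of values.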
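The rdf:li numbering that RDF/XML performs can be sketched in a few lines of plain Ruby. This illustrates the numbering rule only and is not a parser; expand_li and its [[predicate, object], ...] input are hypothetical. Each rdf:li under one container becomes the next rdf:_N in document order, which reproduces the favourite-fruit triples from the start of this page (minus the rdf:type line).

```ruby
# Sketch of RDF/XML's rdf:li rule: each rdf:li inside one container element
# becomes rdf:_1, rdf:_2, ... in document order. Other predicates pass through
# unchanged and do not advance the counter.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# +children+ is [[predicate, object], ...] in document order for one container.
def expand_li(container, children)
  counter = 0
  children.map do |pred, obj|
    if pred == "#{RDF_NS}li"
      counter += 1
      [container, "#{RDF_NS}_#{counter}", obj]
    else
      [container, pred, obj]
    end
  end
end

fruit = "http://example.org/favourite-fruit"
li = "#{RDF_NS}li"
children = %w[banana apple pear].map { |f| [li, "http://example.org/#{f}"] }
expand_li(fruit, children).each { |s, p, o| puts "<#{s}> <#{p}> <#{o}> ." }
```

A Turtle parser applies no such rule, which is why rdf:li loaded through Turtle stays a repeated, unordered predicate.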

Ordering in ActiveTriples/ActiveFedora

If you decide to do this in your model:

class BibliographicWork < ActiveFedora::Base
  include Hydra::Works::GenericWorkBehavior
  property :author, predicate: ::RDF::DC.creator, multiple: true
end

and then in your code:

work.author = [ 'Poe, Edgar Allan', 'Bradbury, Ray' ]

What actually gets persisted is a set of dc:creator values with no guaranteed order (in my test, the authors came back sorted as Bradbury, Poe).
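The underlying reason is that an RDF graph is a set of triples, so the data model has no slot for the order of repeated values. A small plain-Ruby illustration, using Set as a stand-in for a graph and a hypothetical resource path; it says nothing about how ActiveFedora itself stores or returns values:

```ruby
# Illustration: an RDF graph is a set of triples, so repeated values of one
# predicate carry no order. A Ruby Set stands in for the graph here.
require "set"

DC_CREATOR = "http://purl.org/dc/terms/creator"

# Build a "graph" (a Set of triples) from triples given in some order.
def graph(*triples)
  Set.new(triples)
end

work = "/objects/example" # hypothetical resource
as_written = graph([work, DC_CREATOR, "Poe, Edgar Allan"], [work, DC_CREATOR, "Bradbury, Ray"])
reversed = graph([work, DC_CREATOR, "Bradbury, Ray"], [work, DC_CREATOR, "Poe, Edgar Allan"])

# Same triples in a different insertion order: the two graphs are equal.
puts as_written == reversed # prints true

# Any order a reader sees is imposed afterwards, e.g. by sorting the values.
p as_written.map(&:last).sort # => ["Bradbury, Ray", "Poe, Edgar Allan"]
```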

Ordering in indirect containers

Indirect container properties (see: aggregates :members) also provide an ordered counterpart (e.g. ordered_members), which creates ordering proxies linked with predicates from the http://www.iana.org/assignments/link-relations/ namespace.

The ordered list of members is not automatically populated by using obj.members << other_obj. The ordered property has to be explicitly used: obj.ordered_members << other_obj.

The basic container (e.g. Collection) then gets an iana:first and iana:last pointing to the proxy objects in the indirect container (e.g. members). Each proxy object gets iana:next and iana:prev as appropriate.
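A consumer reading such a structure can walk forward from iana:first along iana:next, walk backward from iana:last along iana:prev, and cross-check the two. A sketch in plain Ruby, assuming the container and proxies have already been loaded into hashes keyed by predicate name; walk and ordered_targets are hypothetical helpers, and the "proxyFor" key for the proxied object is an assumption, not taken from the ActiveFedora code:

```ruby
# Sketch: read proxies linked by iana:first/last (on the container) and
# iana:next/prev (on each proxy). Hash layout and "proxyFor" key are assumptions.

# Follow +link+ from +start+, returning proxy ids in the order visited.
def walk(start, proxies, link)
  ids = []
  id = start
  while id
    raise ArgumentError, "cycle at #{id}" if ids.include?(id)

    ids << id
    id = proxies.fetch(id)[link]
  end
  ids
end

# Targets in order, after checking that the next chain and prev chain agree.
def ordered_targets(container, proxies)
  forward = walk(container["iana:first"], proxies, "iana:next")
  backward = walk(container["iana:last"], proxies, "iana:prev")
  raise ArgumentError, "iana:next and iana:prev chains disagree" unless forward == backward.reverse

  forward.map { |id| proxies.fetch(id)["proxyFor"] }
end

collection = { "iana:first" => "p1", "iana:last" => "p3" }
proxies = {
  "p1" => { "proxyFor" => "work-a", "iana:next" => "p2" },
  "p2" => { "proxyFor" => "work-b", "iana:next" => "p3", "iana:prev" => "p1" },
  "p3" => { "proxyFor" => "work-c", "iana:prev" => "p2" }
}
p ordered_targets(collection, proxies) # => ["work-a", "work-b", "work-c"]
```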

Of note is that this doesn't seem to discriminate between which container is used: in my Bag model, I have both the members indirect container supplied by Hydra::Works::CollectionBehavior and a works container that uses an app-defined relationship (http://lib.umich.edu/models#hasWork). In one bag (bag), I set ordered_works. Now, while bag.members returns a different list from bag.works, bag.ordered_members and bag.ordered_works return the same list (the items in bag.works).

This implies that a pcdm:Object can only support one ordered relationship. (RDF has always been lousy with ordered sequences, so I don't know how much this matters.)

¹ Representing Order in RDF