Tough modelling choices - UICrail/SemanticRSM GitHub Wiki

Here we collect some common modelling choices for which there is no definitive answer, only hints: pre-guidelines, so to speak.

Properties vs. Subclasses

There are railway vehicles, all of class VEHICLE. Vehicles:

  • may carry passengers, freight, other payloads, or no payload at all: 2^3 = 8 possible combinations;
  • may be powered or not: 2 options that are independent of the above.

Bottom line, you are faced with 2^4 = 16 combinations of properties.
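The count can be checked by enumeration. The following is a minimal sketch in Python; the property names (`passengers`, `freight`, `other_payload`, `powered`) are illustrative labels chosen here, not identifiers from the model.

```python
from itertools import product

# Four independent yes/no properties, named after the list above.
# The identifiers are illustrative assumptions, not model vocabulary.
PROPERTIES = ("passengers", "freight", "other_payload", "powered")


def all_combinations():
    """Return every assignment of True/False to the four properties."""
    return [
        dict(zip(PROPERTIES, flags))
        for flags in product((False, True), repeat=len(PROPERTIES))
    ]


combos = all_combinations()
unpowered = [c for c in combos if not c["powered"]]

print(len(combos))     # total number of property combinations
print(len(unpowered))  # payload combinations for unpowered vehicles
```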

Traditionally, some combinations are associated with a name, for instance:

  • no payload + powered = locomotive (EN, FR) or Lokomotive (DE);
  • passengers + powered = railcar (EN), autorail (FR) and sometimes Schienenbus (DE) if it runs on two axles, all of them compound terms that take the name of a road vehicle, imply a single carbody, and transfer it to running on rails;
  • passengers + no power = coach, or passenger carriage, or sometimes trailer (EN), voiture or sometimes remorque (FR), Personenwagen (DE);
  • freight + no power = wagon (EN, FR), but Güterwagen (DE) and carro merci (IT), the latter two actually being compound names (vehicle + type of load).

When such combinations have a name, this suggests that people (professionals, users, trainspotters) also think in these categories. Defining classes that reflect such known combinations makes the model easier to read and use. This is the mechanism dubbed "attributes make new subclasses" as found in Andreas Thalhammer's publication, 2021. The term folksonomy has been coined to designate taxonomies that emerge from an informal tagging process.

A mechanism is not a magic recipe. Some polishing of the vocabulary (such as the preferred label for each class) will always be needed, as the example shows.

Given the number of possible combinations, creating and naming each resulting class may add many pages to the dictionary and confuse users. Please consider that we did not even mention vehicle combinations: "multiple units", what a strange term by the way. Nor did we mention the power supply variants. For instance, SNCF got creative by calling multiple units that are both electric and diesel "amphibious", a tribute to endangered salamanders.

So yes, an option would be to define subclasses of VEHICLE when there are common terms to name them, because that's how professionals talk and, ultimately, think. This means that vehicles sporting other combinations of properties might be represented by plain instances of VEHICLE, plus the properties of interest. Another possibility is using an instance of a subclass and adding the property, such as "a coach that has a driving desk", which may or may not be important enough to be identified as a subclass. For SNCF, coaches with driving desks were apparently important enough to get their own designation, "remorque de réversibilité".
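The two options can be sketched with ordinary Python classes. This is only an analogy for the modelling choice, not the ontology itself: the class names, the `extra` dictionary and the `driving_desk` key are hypothetical, introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vehicle:
    """Stand-in for the VEHICLE class; every property defaults to 'no'."""
    passengers: bool = False
    freight: bool = False
    powered: bool = False
    # Free-form extra properties (hypothetical holder, for illustration only).
    extra: dict = field(default_factory=dict)


@dataclass
class Coach(Vehicle):
    """Named subclass: a coach carries passengers and is unpowered."""
    passengers: bool = True


# Option 1: an unnamed combination stays a plain Vehicle with its properties.
mixed = Vehicle(passengers=True, freight=True, powered=True)

# Option 2: an instance of a named subclass, plus one extra property,
# without creating a dedicated "coach with driving desk" subclass.
driving_trailer = Coach(extra={"driving_desk": True})

print(type(mixed).__name__, type(driving_trailer).__name__)
```

Whether `driving_desk` deserves its own subclass is then a vocabulary decision, not a structural one: the instance data is the same either way.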

From a modelling perspective, there is a lack of homogeneity. So what? Models have no "rules". The model's maintainability is not affected (you can still add as many properties as you want at VEHICLE or at subclass level without risking an oversight). Machines will be able to exploit the model's instances: it is easy to assert, once and for all, that "X is a coach entails that X can carry passengers", and "X is a vehicle with passengers and no power entails that X is a coach"; SHACL shapes may enforce the latter rule at data provision level, so data homogeneity is maintained.
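The two entailments can be illustrated in plain Python. This sketch stands in for what an ontology reasoner and a SHACL shape would do; it is not SHACL. All identifiers are hypothetical, and it assumes that payload kinds not mentioned in a named combination are absent.

```python
from dataclasses import dataclass

FIELDS = ("passengers", "freight", "other_payload", "powered")

# Named combinations from the list above, keyed in FIELDS order.
# Assumption: payload kinds the text does not mention are absent.
NAMED_CLASSES = {
    (False, False, False, True): "locomotive",
    (True, False, False, True): "railcar",
    (True, False, False, False): "coach",
    (False, True, False, False): "wagon",
}


@dataclass(frozen=True)
class VehicleRecord:
    """One data record: a declared class name plus its properties."""
    declared_class: str
    passengers: bool = False
    freight: bool = False
    other_payload: bool = False
    powered: bool = False


def entailed_class(record):
    """Properties -> class: the named class if one matches, else 'vehicle'."""
    key = tuple(getattr(record, name) for name in FIELDS)
    return NAMED_CLASSES.get(key, "vehicle")


def entailed_properties(class_name):
    """Class -> properties: the combination a named class implies, or None."""
    for key, name in NAMED_CLASSES.items():
        if name == class_name:
            return dict(zip(FIELDS, key))
    return None


def is_consistent(record):
    """Validation step: the declared class must be the entailed one."""
    return record.declared_class == entailed_class(record)


unlabelled = VehicleRecord("vehicle", passengers=True)
print(entailed_class(unlabelled), is_consistent(unlabelled))
print(entailed_properties("coach"))
```

A record declared as a plain vehicle but carrying the "coach" combination is rejected by `is_consistent`, which is the kind of check meant by enforcing the rule when data is provided.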

Whether such a lack of homogeneity would affect execution time or memory footprint, positively or negatively, is a premature question: we are at the platform-independent level here. And in this particular case, the answer will depend heavily on use cases, so why bother?