CI Properties and Dynamic Fields - bschmalhofer/otobo-ideas GitHub Wiki
Introduction
In OTOBO, entities like tickets, articles, customers, and customer users can hold a ordered flat list of properties. A property has a name, a type, a value, and maybe additional info like a display name. An ordered list is either a special case of an ordered tree or of a JSON-like data structure. In this context a single property is referred to as a dynamic field. Let's call the list of properties a ticketing property tree.
Configuration items can have nested data structure that contains properties. The differences to the afore mentioned property lists are:
- different type declaration
- the ordered tree can have nesting depth greater one
- different representation in the database
Let's call that data structure a CI property tree.
It would be nice to merge the technical base for ticketing property trees and CI property trees.
Implementation as of OTOBO 10.1.x
xml_storage
.
Storage in the table The table xml_storage
and the Perl module Kernel::System::XML
are used for storing the stats configuration
and the CI property trees. The stored data structure is a XMLHash
.
A XMLHash
is a list of undefs and hashrefs. The values of the hashrefs are either strings or other XMLHashes. undefs are used
primarily as the first element of a XMLHash in order to allow for indexes starting at 1. Thus XMLHashes are like ordered trees.
They would be real ordered trees if the keys holding an arrayref were really sub elements.
Note: The data stored in xml_storage
is more JSON-like than XML-like.
The table attribute xml_storage.xml_content_key
holds a path to the value in a nested Perl data structure. Searching is done
with LIKE
searches. There is no indexing of specific content_keys.
CI Property tree in XML storage
A subset of XMLHashes where only the key Content has a string value. For storing CI annotation we need a reference to the root and for each property:
- name
- type
- reference to the parent or materialized path or an interval
- maybe payload as key value pairs
- sequence id per name an parent
TODO: Does it make sense to switch to DAGs? Might be useful for versioning config items, where there are usually only small changes.
Schema of CI property tree
Currently the description for CI types is homegrown. It contains everything needed for consistency checking, creation, editing, search. Notably, is declared the order in which properties are displayed.
Storage of dynamic fields
The tables dynamic_field
and dynamic_field_value
are used for that.
Proposal for changing the underlying data storage
This chapter is about changing the underlying data storage that supports CI property trees and ticketing property trees. These are the goals:
- be 100 % feature compliant (no new features yet)
- provide a migration path
- use standards, specifically JSON-related standards
- unify CI property trees and Ticketing property trees
- allow to be performant
- allow easy addition of new features
The general idea is to keep the basic aproach of xml_storage
. Property sets are declared with JSON Schema.
Storage of specific data types will be supported.
json_storage
New Database Table There will be a new table called json_storage
. A replacement for xml_storage
and dynamic_field_value
.
The table attributes are:
object_type
: likexml_storage.xml_type
with new possible values for the ticketing properties treeobject_key
: likexml_storage.xml_key
foreign key to the table indicated by object_typepointer
: likexml_storage.xml_content_key
but a JSON Pointer instead of Dumper-like pathvalue_text, value_data, value_int
: likexml_storage.xml_content_value
but more specific, just like indynamic_field_value
.version
: replacement for the first level in the old xml_storage ???is_latest
: mark what the latest version is ???
For CI property trees all content will be stored in value_text
so there will be no migration hindrances.
Ticketing property trees will use the specific types.
The TagKey entries from xml_storage
are not needed and can be discarded. Thus there no need to extra Content and we do away
with the dummy undef
element as the first element of a list.
Searching in json_storage
is initially done with the same strategy as was used for xml_storage
.
json_schema
New Database Table Contains JSON Schema. Replacement for configitem_definition
and dynamic_field
.
Actually this is not plain JSON Schema definitions, but schemas with additional vocabulary. This includes the required information from the old tables and the old CI field declaration. Among that are the indicators whether properties are searchable and whether changes lead to new versions. It would be nice to base the vocabulary on an existing approach.
eliminate the dynamic field tables
This is done by switching to the new tables
enhance the test suite
create benchmarks
Proposal for Enhancements
allow nesting in ticketing property trees
support the dynamic field types in CI property trees
Using libraries for generating the UI
Something like:
- JSON Forms https://jsonforms.io
- https://ui-schema.bemit.codes/
- one of https://json-schema.org/implementations.html#web-ui-generation.
JSON-LD
- add mapping to fully qualified names to the JSON schema
- emit machine readable JSON-LD in the emitted content
- activate features like providing a map for addresses
avoid LIKE in searches
This will likely involve the addition of generated fields based on json_storage.pointer
xml_storage
altogether, by storing the stats configuration as a simple JSON field
Remove the table Optimize the searching only after doing benchmarks.
Support for includes in JSON schemas
Seems to be supported: https://json-schema.org/understanding-json-schema/structuring.html
Excursion: How to store an ordered tree in the database
Note: This discusses the storage of ordered trees, not JSON-like data structures.
Storage in database tables
The three standard approaches are:
- adjacency list
- materialized path
- nested set
These are nicely explained in https://railsware.com/blog/storing-tree-structures-in-the-rdbms/.
Other options are:
- nested intervals, see https://arxiv.org/pdf/cs/0402051.pdf
- matrix encoding, see https://vadimtropashko.files.wordpress.com/2011/07/ch5.pdf
There are also:
- closure tables
- PostgreSQL has ltree for supporting materialized paths
The best approach for general trees might recursive common table expressions. See https://mariadb.com/kb/en/recursive-common-table-expressions-overview/. Is there a CPAN tree module that uses recursive CTEs ? https://metacpan.org/pod/SQL::Abstract::More does provide with_recursive()
.
External Storage in NoSQL-databases
Yet another option: Using MariaDB CONNECT for connecting to MongoDB.
Storage as XML
Oracle has XMLType, MySQL and PostgreSQL might have analogues.
Storage as JSON or JSON-B
The data structure for ordered trees can be dumped as JSON and stored in the database. MySQL and PostgreSQL have good support for JSON, but currently no good way of indexing.
Generated fields
Values from the stored JSON can be extracted into a generated fields which can be indexed.
JSON-LD
Linked data, https://json-ld.org/, is the preferred approach.
Reading List
- https://arxiv.org/pdf/cs/0402051.pdf
- https://stackoverflow.com/questions/192220/what-is-the-most-efficient-elegant-way-to-parse-a-flat-table-into-a-tree
- http://ceur-ws.org/Vol-1145/paper8.pdf
- https://www.essentialsql.com/introduction-common-table-expressions-ctes/
- https://learnsql.com/blog/do-it-in-sql-recursive-tree-traversal/
- https://www.sitepoint.com/hierarchical-data-database/
- https://www.sitepoint.com/hierarchical-data-database-2/
- https://db.in.tum.de/~teubnerj/publications/diss.pdf
- https://en.wikipedia.org/wiki/FLWOR
- https://medium.com/@spybugg/storing-and-retrieving-tree-structures-in-relational-databases-using-python-django-7480f40c24b
- https://www.quora.com/Why-not-use-relational-databases-to-store-JSON-data-like-a-primary-key-field-and-a-BLOB-field-for-JSON-data-in-a-MySQL-table-instead-of-using-NoSQL-databases-MongoDB-etc
- https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.681.3347&rep=rep1&type=pdf&utm_source=pocket_mylist
- https://railsware.com/blog/storing-tree-structures-in-the-rdbms/?utm_source=pocket_mylist
- https://softeng.oicr.on.ca/junjun_zhang/2017/09/28/dag-db-with-version-control-in-elasticsearch/
- https://dl.gi.de/handle/20.500.12116/13890
- https://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=9024876&fileOId=9024877
- https://mariadb.com/kb/en/oqgraph-overview/
- https://docs.google.com/presentation/d/1bXyuDK6VWH4_ZntTK-ZZ_Svsb37GgKVMKchwVTpN9RI/pub?start=false&loop=false#slide=id.i99
Relevant CPAN modules
- https://metacpan.org/pod/JSON::Schema::Modern according to https://www.reddit.com/r/perl/comments/vc0ig8/small_survey_of_json_schema_modules/
- https://metacpan.org/pod/DBIx::Tree
- https://metacpan.org/dist/DBIx-Class-Tree-Mobius/view/lib/DBIx/Class/Tree/Mobius.pm
- https://metacpan.org/pod/Tree::DAG_Node::Persist
- https://metacpan.org/pod/Parse::Taxonomy
- https://metacpan.org/pod/DBIx::Tree::MaterializedPath