Tutorial - abstractfactory/openmetadata GitHub Wiki

Let's get started with Open Metadata!

Open Metadata is a library for associating metadata with directories on your file-system. The association is made by storing hidden files, also known as sidecar files, inside the folder you associate the metadata with.
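To make the sidecar idea concrete, here is a minimal sketch that uses only the Python standard library. It does not use Open Metadata and does not reproduce its on-disk format; the folder name `.sidecar`, the JSON encoding and the helpers `write_sidecar()` and `read_sidecar()` are all assumptions made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical name for the hidden container folder; not Open Metadata's real layout.
SIDECAR_DIR = ".sidecar"


def write_sidecar(directory, key, value):
    """Store `value` under `key` in a hidden folder inside `directory`."""
    container = os.path.join(directory, SIDECAR_DIR)
    os.makedirs(container, exist_ok=True)
    with open(os.path.join(container, key + ".json"), "w") as f:
        json.dump(value, f)


def read_sidecar(directory, key):
    """Read back a value previously stored with write_sidecar()."""
    with open(os.path.join(directory, SIDECAR_DIR, key + ".json")) as f:
        return json.load(f)


# Usage: associate a height with a temporary stand-in for the "marcus" folder.
with tempfile.TemporaryDirectory() as root:
    marcus = os.path.join(root, "marcus")
    os.mkdir(marcus)
    write_sidecar(marcus, "height", 1.87)
    height = read_sidecar(marcus, "height")
    hidden = os.listdir(marcus)
```

The folder itself stays untouched apart from one hidden child, which is what lets the metadata travel with the directory it describes.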

A command-line tour

Once installed, we can do this:

$ cd marcus
$ openmetadata write height --value=1.87

Here, marcus has been associated with height. We can also store nested entries.

$ openmetadata write address/street --value="Blackwall Way"

Then we can read it back in:

$ openmetadata read height
1.87
$ openmetadata read address
['street']
$ openmetadata read address/street
Blackwall Way
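The way a path such as address/street behaves in the tour above (reading a group lists its children, reading a leaf returns its value) can be modelled with a plain nested dictionary. This is only a sketch of the behaviour, not of how Open Metadata implements it; the in-memory `store` and the two helper functions are hypothetical, and it assumes that values themselves are never dictionaries.

```python
def write(store, path, value):
    """Create any intermediate groups named in `path`, then set the leaf value."""
    *groups, leaf = path.split("/")
    node = store
    for name in groups:
        node = node.setdefault(name, {})
    node[leaf] = value


def read(store, path):
    """Return a leaf value, or the sorted child names if `path` names a group."""
    node = store
    for name in path.split("/"):
        node = node[name]
    return sorted(node) if isinstance(node, dict) else node


store = {}
write(store, "height", 1.87)
write(store, "address/street", "Blackwall Way")

height = read(store, "height")
children = read(store, "address")
street = read(store, "address/street")
```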

Alternatively, you can use the shorthand om:

$ om read height
1.87

Python can do better

Cool, we've established some information about this particular user using the command-line interface, or CLI. The CLI is running a Python library in the background and is really good for learning and debugging; but if we want to make full use of Open Metadata we should probably get a glimpse into the library ourselves.

Here is what the equivalent Python code to the above might look like:

>>> import os
>>> import openmetadata
>>> cwd = os.getcwd()
>>> openmetadata.write(cwd, 'height', value=1.87)
>>> openmetadata.write(cwd, 'address/street', value="Blackwall Way")

write() then has the equivalent read() with which to read metadata from disk.

>>> openmetadata.read(cwd, 'height')
1.87

Going lower-level

Great! Let's keep digging. The function write() you see here is a convenience method wrapping around some lower-level objects called Location and Entry. Location represents any plain absolute path to a folder on disk. We use it rather than a typical string because it can also store parent/child relationships, and all metadata must have a parent.

Entry, on the other hand, represents the actual metadata and, as you might have guessed, every Entry must have a location. Here is what the equivalent code would look like with these objects as opposed to read() and write():

>>> import os
>>> import openmetadata
>>> cwd = os.getcwd()
>>> location = openmetadata.Location(cwd)
>>> height = openmetadata.Entry('height', value=1.87, parent=location)
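If the parent/child relationship feels abstract, the following stand-in classes may help. They are not the library's classes: `ToyLocation`, `ToyEntry` and the `location()` helper are hypothetical names invented here, and the sketch only models the rule that every entry has a parent and can be traced back to a folder.

```python
import os


class ToyLocation:
    """Stand-in for a Location: an absolute folder path that owns entries."""

    def __init__(self, path):
        self.path = os.path.abspath(path)
        self.children = []


class ToyEntry:
    """Stand-in for an Entry: a named value that must have a parent."""

    def __init__(self, name, value=None, parent=None):
        if parent is None:
            raise ValueError("every entry needs a parent")
        self.name = name
        self.value = value
        self.parent = parent
        self.children = []
        parent.children.append(self)  # register with the parent

    def location(self):
        """Walk up the parent chain until the owning location is reached."""
        node = self.parent
        while isinstance(node, ToyEntry):
            node = node.parent
        return node


location = ToyLocation(os.getcwd())
height = ToyEntry("height", value=1.87, parent=location)
address = ToyEntry("address", parent=location)
street = ToyEntry("street", value="Blackwall Way", parent=address)
```

Here `street.location()` returns the same object as `location`, even though street is parented to another entry rather than to the folder directly.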

A delayed flush

At this point, we've selected a directory on disk - the current working directory - and we've associated some metadata with it. However, no data has yet been written. This is so that we can continue adding entries to this location and delay physically writing anything for as long as possible.

>>> width = openmetadata.Entry('width', value=10, parent=location)
>>> depth = openmetadata.Entry('depth', value=15, parent=location)
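The deferral can be pictured with a small stand-in that queues entries in memory and touches the disk only when asked to. Everything in it is an assumption for illustration: `PendingStore`, its `add()` and `flush()` methods and the one-file-per-entry layout are hypothetical and say nothing about what Open Metadata actually writes.

```python
import os
import tempfile


class PendingStore:
    """Collects entries in memory and writes them only when flush() is called."""

    def __init__(self, directory):
        self.directory = directory
        self.pending = {}

    def add(self, name, value):
        self.pending[name] = value  # memory only, no disk access

    def flush(self):
        """Write every pending entry as one small text file, then clear the queue."""
        for name, value in self.pending.items():
            with open(os.path.join(self.directory, name), "w") as f:
                f.write(repr(value))
        written = len(self.pending)
        self.pending.clear()
        return written


with tempfile.TemporaryDirectory() as root:
    store = PendingStore(root)
    store.add("height", 1.87)
    store.add("width", 10)
    store.add("depth", 15)
    before = sorted(os.listdir(root))  # nothing has been written yet
    written = store.flush()
    after = sorted(os.listdir(root))
```

Batching like this means three associations cost one pass over the disk rather than three separate ones.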

Once we're happy with our associations, we "flush":

>>> openmetadata.flush(location)

To flush means to physically commit our metadata to disk. In this case, we flush our Location object which has just been associated with the three entries - width, height and depth. flush will write the immediate children of location as well as any grand-children.

>>> address = openmetadata.Entry('address', parent=location)
>>> street = openmetadata.Entry('street', value="Blackwall Way", parent=address)

Note that instead of parenting this last entry to our location, we parent it to our address object, which is another Entry. This is how we can create nested entries and form a hierarchy of metadata.

>>> print location.ls()
marcus
    width
    height
    depth
    address
        street

Once we're done associating, we mustn't forget to flush.

>>> openmetadata.flush(address)

Since we've already flushed our location, flushing it a second time would re-commit metadata that is already on disk, which might not be optimal. Instead, we flush our address object directly, and in so doing physically write only the newly created associations.
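To see why flushing the smaller object is cheaper, here is a toy model in which a list plays the role of the disk and records every write. The `Node` class and this `flush()` function are hypothetical stand-ins, not the library's implementation; in particular, the toy records only the descendants of the node being flushed, which is a simplifying assumption.

```python
class Node:
    """Hypothetical tree node standing in for a location or an entry."""

    def __init__(self, name, value=None, parent=None):
        self.name = name
        self.value = value
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Join the names from the root down to this node."""
        if self.parent is None:
            return self.name
        return self.parent.path() + "/" + self.name


def flush(node, log):
    """Record a write for every descendant of `node`, children first, then theirs."""
    for child in node.children:
        log.append(child.path())
        flush(child, log)


location = Node("marcus")
for name, value in [("width", 10), ("height", 1.87), ("depth", 15)]:
    Node(name, value, parent=location)

first = []
flush(location, first)    # the initial flush: three writes

address = Node("address", parent=location)
Node("street", "Blackwall Way", parent=address)

partial = []
flush(address, partial)   # only what sits below address
full = []
flush(location, full)     # everything again, old and new
```

In this model the partial flush records a single write, while flushing the location again records five, three of which repeat work already done.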