Tutorial - abstractfactory/openmetadata GitHub Wiki
Let's get started with Open Metadata!
Open Metadata is a library for associating metadata with directories on your file-system. The association is made by storing hidden files, also known as sidecar files, inside the folder you associate with.
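To make the sidecar idea concrete, here is a minimal sketch that does not use Open Metadata at all. It assumes a hidden folder named .sidecar and one JSON file per key; both are hypothetical choices for illustration, and the layout Open Metadata actually uses on disk may well differ.

```python
import json
import tempfile
from pathlib import Path

SIDECAR_DIR = ".sidecar"  # hypothetical name, not necessarily what Open Metadata uses


def write_sidecar(folder, key, value):
    """Store `value` under `key` in a hidden directory inside `folder`."""
    container = Path(folder) / SIDECAR_DIR
    container.mkdir(exist_ok=True)
    (container / (key + ".json")).write_text(json.dumps(value))


def read_sidecar(folder, key):
    """Read back the value previously stored under `key`."""
    return json.loads((Path(folder) / SIDECAR_DIR / (key + ".json")).read_text())


def sidecar_demo():
    """Round-trip one value through a temporary folder."""
    with tempfile.TemporaryDirectory() as folder:
        write_sidecar(folder, "height", 1.87)
        # Collect the entries of the folder that are not dot-prefixed.
        visible = [p.name for p in Path(folder).iterdir() if not p.name.startswith(".")]
        return read_sidecar(folder, "height"), visible
```

Calling sidecar_demo() returns the stored value together with the list of non-hidden entries in the folder, which stays empty because the metadata lives entirely inside the dot-prefixed directory.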
A command-line tour
Once installed, we can do this:
$ cd marcus
$ openmetadata write height --value=1.87
Here, marcus has been associated with height. We can also store nested entries.
$ openmetadata write address/street --value="Blackwall Way"
Then we can read it back in:
$ openmetadata read height
1.87
$ openmetadata read address
['street']
$ openmetadata read address/street
Blackwall Way
Alternatively, you can use the shorthand om:
$ om read height
1.87
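The read behaviour shown above, where a path such as address/street returns a value while address alone returns the names of its children, can be mimicked with a nested dictionary. This is only a toy model of the lookup rule; the write and read functions below are hypothetical stand-ins written for this example, not the library's own.

```python
def write(store, path, value):
    """Store `value` at a slash-separated `path`, creating groups as needed."""
    *groups, name = path.split("/")
    node = store
    for group in groups:
        node = node.setdefault(group, {})
    node[name] = value


def read(store, path):
    """Return the value at `path`, or the child names if `path` is a group."""
    node = store
    for part in path.split("/"):
        node = node[part]
    return sorted(node) if isinstance(node, dict) else node


store = {}
write(store, "height", 1.87)
write(store, "address/street", "Blackwall Way")
```

With this store, reading height gives the number back, reading address gives a list containing street, and reading address/street gives the string.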
Python can do better
Cool, we've established some information about this particular user using the command-line interface, or CLI. The CLI runs a Python library in the background and is really good for learning and debugging, but if we want to make full use of Open Metadata we should probably get a glimpse into the library ourselves.
Here is what the equivalent Python code to the above might look like:
>>> import os
>>> import openmetadata
>>> cwd = os.getcwd()
>>> openmetadata.write(cwd, 'height', value=1.87)
>>> openmetadata.write(cwd, 'address/street', value="Blackwall Way")
write() has an equivalent read() with which to read metadata from disk.
>>> openmetadata.read(cwd, 'height')
1.87
Going lower-level
Great! Let's keep digging. The function write() you see here is a convenience method wrapping around some lower-level objects called Location and Entry. Location represents any plain absolute path to a folder on disk. We use it rather than a typical string because it can also store parent/child relationships, and all metadata must have a parent.
Entry, on the other hand, represents the actual metadata and, as you might have guessed, every Entry must have a location. Here is what the equivalent code would look like with these objects as opposed to read() and write():
>>> import os
>>> import openmetadata
>>> cwd = os.getcwd()
>>> location = openmetadata.Location(cwd)
>>> height = openmetadata.Entry('height', value=1.87, parent=location)
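To illustrate why a dedicated object is more useful than a plain string here, the following sketch models the parent/child bookkeeping with two toy classes. ToyLocation and ToyEntry are hypothetical names made up for this example; they mirror the constructor arguments used above but say nothing about how Open Metadata implements Location and Entry internally.

```python
import os


class ToyLocation:
    """Stand-in for a folder on disk that can own metadata entries."""

    def __init__(self, path):
        self.path = os.path.abspath(path)
        self.children = []


class ToyEntry:
    """Stand-in for one piece of metadata; registers itself with its parent."""

    def __init__(self, name, value=None, parent=None):
        if parent is None:
            raise ValueError("every entry needs a parent")
        self.name = name
        self.value = value
        self.parent = parent
        self.children = []
        parent.children.append(self)

    @property
    def location(self):
        """Walk up the parent chain until the owning location is reached."""
        node = self.parent
        while isinstance(node, ToyEntry):
            node = node.parent
        return node


location = ToyLocation(".")
height = ToyEntry("height", value=1.87, parent=location)
```

Because each entry records its parent, the owning folder can always be recovered from any entry, however deeply it is nested, and the location knows which entries belong to it.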
A delayed flush
At this point, we've selected a directory on disk - the current working directory - and we've associated some metadata with it. However, no data has yet been written. This is so that we can continue adding entries to this location and delay physically writing anything for as long as possible.
>>> width = openmetadata.Entry('width', value=10, parent=location)
>>> depth = openmetadata.Entry('depth', value=15, parent=location)
Once we're happy with our associations, we "flush":
>>> openmetadata.flush(location)
To flush means to physically commit our metadata to disk. In this case, we flush our Location object which has just been associated with the three entries - width, height and depth. flush() will write the immediate children of location as well as any grand-children.
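The deferred-write pattern described here can be sketched independently of Open Metadata. In the example below, Node and flush_node are hypothetical names, and writing one JSON file per leaf is an assumption made for illustration only; the point is that building the tree touches nothing on disk, and a single recursive call commits children and grand-children alike.

```python
import json
import tempfile
from pathlib import Path


class Node:
    """Hypothetical in-memory node; nothing touches disk until flush_node()."""

    def __init__(self, name, value=None, parent=None):
        self.name, self.value, self.children = name, value, []
        if parent is not None:
            parent.children.append(self)


def flush_node(node, directory):
    """Write the children of `node` under `directory`, recursing into groups."""
    written = []
    for child in node.children:
        if child.children:  # a group becomes a sub-directory
            sub = Path(directory) / child.name
            sub.mkdir(exist_ok=True)
            written += flush_node(child, sub)
        else:  # a leaf becomes one small JSON file
            target = Path(directory) / (child.name + ".json")
            target.write_text(json.dumps(child.value))
            written.append(target.name)
    return written


def flush_demo():
    """List the folder contents before and after flushing three entries."""
    root = Node("marcus")
    for name, value in [("height", 1.87), ("width", 10), ("depth", 15)]:
        Node(name, value, parent=root)
    with tempfile.TemporaryDirectory() as folder:
        before = sorted(p.name for p in Path(folder).iterdir())  # nothing written yet
        flush_node(root, folder)
        after = sorted(p.name for p in Path(folder).iterdir())
    return before, after
```

flush_demo() returns the directory listing before and after the flush, showing that the three entries only reach disk once flush_node is called.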
>>> address = openmetadata.Entry('address', parent=location)
>>> street = openmetadata.Entry('street', value="Blackwall Way", parent=address)
Note here that instead of parenting this last entry to our location, we parent it to our address object, which is another Entry. This is how we can create nested entries and form a hierarchy of metadata.
>>> print(location.ls())
marcus
width
height
depth
address
street
Once we're done associating, we mustn't forget to flush.
>>> openmetadata.flush(address)
Since we've already flushed our location, flushing it again would re-commit all of its metadata, which might not be optimal. Instead, we flush our address object directly, and in so doing only physically write the newly created associations.
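The difference between flushing the whole location again and flushing only the new sub-tree can be seen by counting writes in a toy model. Item and flush_item below are hypothetical names for this sketch, and the "disk" is just a list that records which paths would be written; Open Metadata itself may track changes differently.

```python
class Item:
    """Hypothetical entry or location; `children` holds nested items."""

    def __init__(self, name, value=None, parent=None):
        self.name, self.value, self.children = name, value, []
        if parent is not None:
            parent.children.append(self)


def flush_item(item, log, prefix=""):
    """Record a 'write' for every leaf below `item` (children and grand-children)."""
    for child in item.children:
        path = prefix + child.name
        if child.children:
            flush_item(child, log, path + "/")
        else:
            log.append(path)


location = Item("marcus")
for name, value in [("width", 10), ("height", 1.87), ("depth", 15)]:
    Item(name, value, parent=location)

first = []
flush_item(location, first)  # initial flush of the three entries

address = Item("address", parent=location)
Item("street", "Blackwall Way", parent=address)

partial, full = [], []
flush_item(address, partial, "address/")  # only the new sub-tree
flush_item(location, full)  # re-commits everything, old and new
```

Here partial contains a single path while full contains four, which is the saving the paragraph above describes.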