Design - nokia/minifold GitHub Wiki
The design of minifold is based on three main entities:
- Queries characterize a set of objects of interest.
- Entries gather the set of objects matching a given query. In minifold, entries are represented by a list of dictionaries.
- Nodes are in charge of performing the processing related to queries and entries. A node may have children in charge of performing some sub-tasks. This induces a directed acyclic graph structure (usually a tree).
The resulting graph corresponds to the query plan. There are two kinds of nodes:
- Connectors correspond to its leaves. They wrap the data sources and translate minifold queries according to the remote platform's paradigm (e.g. an LDAP query, a HAL query, etc.).
- Operators correspond to the other nodes. They usually implement simple operations (like SQL operators) on the data.
Once the query plan is built, it can be executed:
- The arcs of the query plan graph are traversed forward by queries and backward by entries.
- The user sends a query via the root node. The root returns to the user the requested entries.
- The query is recursively forwarded by each node to its children. The query may be modified when it is forwarded to a child node.
- When a connector handles a query, it transposes it into the data source formalism, collects the matching results, and returns the entries to its parent node.
- A parent node processes the entries returned by its children, according to the primitive it implements.
- The task assigned to a given node ends once all the entries returned by its children have been processed.
- The query plan is fully executed once the root has finished its job.
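The execution model above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not minifold's implementation: the `Query` dataclass, `ListConnector` and `LimitNode` below are hypothetical stand-ins. The query travels from the root down to the leaf (and is modified on the way), while the entries travel back up.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional

Entry = Dict[str, object]

@dataclass
class Query:
    # Hypothetical stand-in for minifold's Query: only the fields used here.
    object: str = ""
    attributes: List[str] = field(default_factory=list)
    limit: Optional[int] = None

class ListConnector:
    """Hypothetical leaf node wrapping an in-memory data source."""
    def __init__(self, entries: List[Entry]):
        self.entries = entries

    def query(self, q: Query) -> List[Entry]:
        # Keep the requested attributes (all of them if none are listed).
        return [
            {k: v for k, v in e.items() if not q.attributes or k in q.attributes}
            for e in self.entries
        ]

class LimitNode:
    """Hypothetical operator node keeping at most q.limit entries."""
    def __init__(self, child):
        self.child = child

    def query(self, q: Query) -> List[Entry]:
        # The limit is handled here, so the query forwarded to the child drops it.
        entries = self.child.query(replace(q, limit=None))
        return self.answer(q, entries)

    def answer(self, q: Query, entries: List[Entry]) -> List[Entry]:
        # Process the entries returned by the child before returning them upward.
        return entries if q.limit is None else entries[:q.limit]

# Query plan: LimitNode (root) -> ListConnector (leaf).
root = LimitNode(ListConnector([
    {"title": "A", "year": 2017},
    {"title": "B", "year": 2018},
    {"title": "C", "year": 2019},
]))
print(root.query(Query(object="publication", attributes=["title"], limit=2)))
# [{'title': 'A'}, {'title': 'B'}]
```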
A `Query` is an object which characterizes the data of interest in a unified format. This object is inspired by SQL and typically embeds:
- an action: `ACTION_CREATE`, `ACTION_GET`, `ACTION_UPDATE`, `ACTION_DELETE`;
- the queried object/table: for instance `"researcher"`, `"publication"` or `"conference"`;
- the requested fields: for instance `"year"`, `"title"`, `"authors"`;
- optionally some filters (see `where`);
- and some other options (`offset`, `limit`, ...).
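To make the SQL analogy concrete, the sketch below renders a read query as an SQL `SELECT`. Everything in it is hypothetical: `QuerySketch` only mirrors the fields listed above, filters are modeled as a plain `(attribute, operator, value)` tuple shaped like `BinaryPredicate("year", "==", 2018)`, and `to_sql` is an illustration (no escaping), not a minifold function.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical filter representation: (attribute, operator, value).
Filter = Tuple[str, str, object]

@dataclass
class QuerySketch:
    # Hypothetical mirror of the Query fields listed above.
    action: str = "GET"
    object: str = ""
    attributes: List[str] = field(default_factory=list)
    filters: Optional[Filter] = None
    offset: Optional[int] = None
    limit: Optional[int] = None

def to_sql(q: QuerySketch) -> str:
    """Render a GET query as an SQL SELECT (illustration only, no escaping)."""
    if q.action != "GET":
        raise ValueError("only GET is sketched here")
    columns = ", ".join(q.attributes) if q.attributes else "*"  # object -> table
    sql = f"SELECT {columns} FROM {q.object}"
    if q.filters is not None:  # filters -> WHERE clause
        left, op, right = q.filters
        sql += f" WHERE {left} {'=' if op == '==' else op} {right!r}"
    if q.limit is not None:
        sql += f" LIMIT {q.limit}"
    if q.offset is not None:
        sql += f" OFFSET {q.offset}"
    return sql

print(to_sql(QuerySketch(object="publication", attributes=["title", "year"],
                         filters=("year", "==", 2018), limit=10)))
# SELECT title, year FROM publication WHERE year = 2018 LIMIT 10
```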
Example: Institution query

```python
from minifold.binary_predicate import BinaryPredicate  # module path assumed
from minifold.query import Query, ACTION_GET

q_institution = Query(
    action=ACTION_GET,
    object="institutions",
    attributes=[],
    filters=BinaryPredicate("institution_id", "==", 3)
)
```
Example: LDAP query

```python
from minifold.binary_predicate import BinaryPredicate  # module path assumed
from minifold.query import Query, ACTION_GET

q_ldap = Query(
    action=ACTION_GET,
    object="ou=users,dc=lincs,dc=fr",
    attributes=["uid", "sn", "givenName", "departmentNumber"],
    filters=BinaryPredicate("sn", "==", "Mathieu")
)
```
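When such a query reaches an LDAP connector, its filter has to be expressed in the LDAP search-filter syntax (RFC 4515). The helper below is a hypothetical sketch of that translation step, not `LdapConnector`'s actual code; it assumes the filter is a simple `(attribute, operator, value)` triple and escapes the characters RFC 4515 reserves.

```python
from typing import Tuple

Filter = Tuple[str, str, str]  # hypothetical (attribute, operator, value)

# RFC 4515 requires these characters to be escaped in assertion values.
_ESCAPES = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}

def escape_ldap(value: str) -> str:
    return "".join(_ESCAPES.get(c, c) for c in value)

def to_ldap_filter(f: Filter) -> str:
    """Translate a single predicate into an LDAP filter string (sketch)."""
    attribute, op, value = f
    if op == "!=":  # LDAP has no "not equal": negate an equality match
        return f"(!({attribute}={escape_ldap(value)}))"
    ldap_ops = {"==": "=", ">=": ">=", "<=": "<="}
    if op not in ldap_ops:
        raise ValueError(f"unsupported operator: {op}")
    return f"({attribute}{ldap_ops[op]}{escape_ldap(value)})"

print(to_ldap_filter(("sn", "==", "Mathieu")))  # (sn=Mathieu)
```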
A `Connector` is in charge of wrapping the data sources involved in the query plan.
The following examples show how to build a `Connector` that can be queried afterwards, as shown in the "Query" section.
Example: from a list of Python dictionaries.

```python
from minifold.binary_predicate import BinaryPredicate  # module path assumed
from minifold.entries import EntriesConnector
from minifold.query import Query, ACTION_GET

q_institution = Query(
    action=ACTION_GET,
    object="institutions",  # Not needed by EntriesConnector
    attributes=[],
    filters=BinaryPredicate("institution_id", "==", 3)
)

institution_connector = EntriesConnector([
    {"institution_id": 1, "institution": "TPT"},
    {"institution_id": 2, "institution": "UPMC"},
    {"institution_id": 3, "institution": "INRIA"},
    {"institution_id": 4, "institution": "SystemX"},
    {"institution_id": 5, "institution": "Nokia"},
])
entries = institution_connector.query(q_institution)
```
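Conceptually, a connector over a list of dictionaries only has to filter the entries and keep the requested attributes. The sketch below shows that logic with a hypothetical `query_entries` helper (not minifold's `EntriesConnector` code). It assumes, as the example above suggests, that an empty `attributes` list means "every field", and models the filter as an `(attribute, operator, value)` triple.

```python
import operator
from typing import Dict, List, Optional, Tuple

Entry = Dict[str, object]
Filter = Tuple[str, str, object]  # hypothetical (attribute, operator, value)

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def query_entries(entries: List[Entry], attributes: List[str],
                  flt: Optional[Filter] = None) -> List[Entry]:
    """Filter, then project, a list of dictionaries (hypothetical helper)."""
    selected = entries
    if flt is not None:
        key, op, value = flt
        selected = [e for e in entries if key in e and _OPS[op](e[key], value)]
    if not attributes:  # assumption: no attribute listed means all of them
        return [dict(e) for e in selected]
    return [{k: e[k] for k in attributes if k in e} for e in selected]

institutions = [
    {"institution_id": 1, "institution": "TPT"},
    {"institution_id": 2, "institution": "UPMC"},
    {"institution_id": 3, "institution": "INRIA"},
    {"institution_id": 4, "institution": "SystemX"},
    {"institution_id": 5, "institution": "Nokia"},
]
print(query_entries(institutions, [], ("institution_id", "==", 3)))
# [{'institution_id': 3, 'institution': 'INRIA'}]
```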
Example: from an LDAP server:

```python
from lincs_config import LDAP_HOST, LDAP_USERNAME, LDAP_PASSWORD  # local settings module
from minifold.binary_predicate import BinaryPredicate  # module path assumed
from minifold.ldap import LdapConnector
from minifold.query import Query, ACTION_GET

q_ldap = Query(
    action=ACTION_GET,
    object="ou=users,dc=lincs,dc=fr",
    attributes=["uid", "sn", "givenName", "departmentNumber"],
    filters=BinaryPredicate("sn", "==", "Mathieu")
)

with LdapConnector(LDAP_HOST, LDAP_USERNAME, LDAP_PASSWORD) as ldap_connector:
    # The connection is open inside this block: we can query the LDAP server.
    entries = ldap_connector.query(q_ldap)
```
An `Operator` is in charge of processing the `Query` issued by its parent(s) and the entries returned by its children. Operators are usually based on an underlying function. The developer is free to interconnect nodes or to rely directly on these functions to build their workflow.
The following examples show the difference between these two approaches.
Example: querying HAL with the original ontology and renaming the results afterwards:

```python
from pprint import pprint
from minifold.binary_predicate import BinaryPredicate  # module path assumed
from minifold.hal import HAL_ALIASES, HalConnector
from minifold.query import Query, ACTION_READ
from minifold.rename import rename

hal_connector = HalConnector()
entries = hal_connector.query(Query(
    action=ACTION_READ,
    object="publication",
    attributes=[
        "title_s", "producedDateY_i",
        "authFullName_s", "conferenceTitle_s"
    ],
    filters=BinaryPredicate("authFullName_s", "==", "Fabien Mathieu")
))
# Rename the HAL-specific keys (e.g. "title_s") after the query.
publications = rename(HAL_ALIASES, entries)
pprint(publications)
```
Example: querying HAL with a renamed ontology:

```python
from pprint import pprint
from minifold.binary_predicate import BinaryPredicate  # module path assumed
from minifold.hal import HAL_ALIASES, HalConnector
from minifold.query import Query, ACTION_READ
from minifold.rename import RenameConnector

# The RenameConnector translates the attribute names in both directions.
hal_connector = RenameConnector(HAL_ALIASES, HalConnector())
publications = hal_connector.query(Query(
    action=ACTION_READ,
    object="publication",
    attributes=["title", "year", "authors", "conference"],
    filters=BinaryPredicate("authors", "==", "Fabien Mathieu")
))
pprint(publications)
```
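The same duality between a function and a node applies to any operator. The sketch below uses a hypothetical sort operator (not minifold code): `sort_entries` can be called directly on fetched entries, or wrapped by a `SortNode` placed in the query plan. `ListSource` and `SortNode` are hypothetical, and queries are plain dictionaries here for brevity.

```python
from typing import Dict, List

Entry = Dict[str, object]

def sort_entries(entries: List[Entry], key: str, desc: bool = False) -> List[Entry]:
    """Underlying function, usable on its own (hypothetical)."""
    return sorted(entries, key=lambda e: e[key], reverse=desc)

class ListSource:
    """Hypothetical leaf node returning a fixed list of entries."""
    def __init__(self, entries: List[Entry]):
        self.entries = entries

    def query(self, q: dict) -> List[Entry]:
        return list(self.entries)

class SortNode:
    """Hypothetical operator node reusing sort_entries on its child's output."""
    def __init__(self, child, key: str, desc: bool = False):
        self.child, self.key, self.desc = child, key, desc

    def query(self, q: dict) -> List[Entry]:
        return sort_entries(self.child.query(q), self.key, self.desc)

pubs = [{"title": "B", "year": 2019}, {"title": "A", "year": 2017},
        {"title": "C", "year": 2018}]

# Approach 1: fetch the entries, then call the function directly.
direct = sort_entries(ListSource(pubs).query({}), "year")
# Approach 2: insert the operator into the query plan.
via_node = SortNode(ListSource(pubs), "year").query({})
print([p["year"] for p in via_node])  # [2017, 2018, 2019]
```

Both approaches return the same entries; the node form simply makes the operation part of the plan, so it runs whenever the root is queried.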
In terms of implementation, Nodes rely on two primitives:
- `query()` handles an incoming `Query` and forwards it to its child(ren). The `Query` may be altered during this step, depending on the nature of the `Node`. For example, the `RenameConnector` changes the attribute names mentioned in the query.
- `answer()` processes the entries (resulting from a past query) returned by its child(ren) and returns them to its own parent (if any).
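The split between the two primitives can be illustrated with a hypothetical rename node: `query()` rewrites the attribute names on the way down and `answer()` renames the keys of the entries on the way up. `Query`, `FakeHal` and `RenameNode` are stand-ins written for this sketch, not minifold's `RenameConnector`; the direction of the alias mapping (source name to user-facing name) is an assumption.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List

Entry = Dict[str, object]

@dataclass
class Query:
    # Hypothetical stand-in: only the field this sketch needs.
    attributes: List[str] = field(default_factory=list)

class FakeHal:
    """Hypothetical leaf returning entries in the source's own vocabulary."""
    def query(self, q: Query) -> List[Entry]:
        row = {"title_s": "Some paper", "producedDateY_i": 2018}
        return [{k: row[k] for k in q.attributes if k in row}]

class RenameNode:
    """Hypothetical operator mapping user-facing names to source names."""
    def __init__(self, aliases: Dict[str, str], child):
        self.aliases = aliases  # assumed direction: source name -> user name
        self.reverse = {v: k for k, v in aliases.items()}
        self.child = child

    def query(self, q: Query) -> List[Entry]:
        # Forward path: translate the attribute names before reaching the child.
        child_q = replace(q, attributes=[self.reverse.get(a, a) for a in q.attributes])
        return self.answer(self.child.query(child_q))

    def answer(self, entries: List[Entry]) -> List[Entry]:
        # Backward path: rename the keys of the entries returned by the child.
        return [{self.aliases.get(k, k): v for k, v in e.items()} for e in entries]

node = RenameNode({"title_s": "title", "producedDateY_i": "year"}, FakeHal())
print(node.query(Query(attributes=["title", "year"])))
# [{'title': 'Some paper', 'year': 2018}]
```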
If a `Node` requires a connection state, we rely on the Python `with` statement (e.g. `LdapConnector`).
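A connector holding a connection state can implement the context-manager protocol (`__enter__` / `__exit__`), so the connection is opened when entering the `with` block and always closed when leaving it. The `SqliteConnector` below is a hypothetical sketch using Python's built-in `sqlite3` module as the data source; for brevity its `query()` takes raw SQL rather than a minifold `Query`.

```python
import sqlite3
from typing import Dict, List

class SqliteConnector:
    """Hypothetical connector keeping a connection open inside a 'with' block."""
    def __init__(self, path: str = ":memory:"):
        self.path = path
        self.conn = None

    def __enter__(self) -> "SqliteConnector":
        self.conn = sqlite3.connect(self.path)  # open the connection once
        self.conn.row_factory = sqlite3.Row     # rows can be turned into dicts
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.conn.close()  # released even if the block raised an exception
        self.conn = None
        return False       # do not swallow exceptions

    def query(self, sql: str, params=()) -> List[Dict[str, object]]:
        if self.conn is None:
            raise RuntimeError("use this connector inside a 'with' block")
        return [dict(row) for row in self.conn.execute(sql, params)]

with SqliteConnector() as db:
    db.conn.execute("CREATE TABLE institutions (institution_id INTEGER, institution TEXT)")
    db.conn.executemany("INSERT INTO institutions VALUES (?, ?)",
                        [(3, "INRIA"), (5, "Nokia")])
    rows = db.query("SELECT * FROM institutions WHERE institution_id = ?", (3,))
print(rows)  # [{'institution_id': 3, 'institution': 'INRIA'}]
```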