Tutorial - nokia/minifold GitHub Wiki
Tutorial
This tutorial is split into two parts.
- The first part explains how to use minifold primitives on a list of dictionaries. This part is especially useful if you are not familiar with SQL.
- The second part illustrates how to build minifold pipelines using connectors. Such pipelines separate the user's needs (the Query) from the processing required to obtain the dictionaries. For instance, through a single pipeline, you can query several endpoints and aggregate the results matching a given query.
Your first commands
This section starts with a simple example to present minifold primitives. You can run ipython3 and copy/paste the following lines of code to try them yourself.
Minifold primitives process a list of dictionaries that are expected to share the same set of keys, and return a list of dictionaries.
users = [
    {
        "firstname" : "John",
        "lastname" : "Doe"
    }, {
        "firstname" : "John",
        "lastname" : "Connor"
    }, {
        "firstname" : "Peter",
        "lastname" : "Parker"
    }
]
Now, let's see some minifold primitives.
select: fetch a subset of keys
Suppose you want to fetch last names. Run:
from pprint import pprint
from minifold.select import select
pprint(select(users, ["lastname"]))
Result:
[{'lastname': 'Doe'}, {'lastname': 'Connor'}, {'lastname': 'Parker'}]
Similarly, you could get firstnames as follows:
pprint(select(users, ["firstname"]))
Result:
[{'firstname': 'John'}, {'firstname': 'John'}, {'firstname': 'Peter'}]
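For readers who think in plain Python, the effect of select on a list of dictionaries can be sketched with a list comprehension. The helper name select_sketch is hypothetical and only mirrors the behavior shown above; it is not minifold's implementation, and it assumes every dictionary contains the requested keys.

```python
# Plain-Python sketch of what select does (hypothetical helper, not minifold code).
users = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]


def select_sketch(entries: list, keys: list) -> list:
    """Return new dictionaries restricted to the requested keys."""
    return [{key: entry[key] for key in keys} for entry in entries]


lastnames = select_sketch(users, ["lastname"])
print(lastnames)
```

The input list is left untouched: each result is a new dictionary holding only the selected keys.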
unique: get dictionaries distinct according to a subset of keys
Suppose you want to get only distinct firstnames:
from pprint import pprint
from minifold.select import select
from minifold.unique import unique
pprint(
    unique(
        ["firstname"],
        select(users, ["firstname"])
    )
)
Result:
[{'firstname': 'John'}, {'firstname': 'Peter'}]
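The deduplication above can be pictured as "keep the first dictionary seen for each combination of key values". The sketch below follows that reading. The name unique_sketch is hypothetical, the values are assumed to be hashable, and keeping the first occurrence is an assumption drawn from the results shown in this tutorial rather than from minifold's source.

```python
# Plain-Python sketch of deduplication on a subset of keys
# (hypothetical helper, not minifold code).
users = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]


def unique_sketch(keys: list, entries: list) -> list:
    """Keep the first entry seen for each distinct tuple of values of keys."""
    seen = set()
    result = []
    for entry in entries:
        signature = tuple(entry[key] for key in keys)
        if signature not in seen:
            seen.add(signature)
            result.append(entry)
    return result


distinct = unique_sketch(["firstname"], users)
print(distinct)
```

Applied to the full dictionaries, only one "John" entry survives, because the comparison looks at firstname alone.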
where: filter entries
Suppose you only want to keep users having the firstname "John":
- Using a dedicated function:
from minifold.where import where
def my_filter(user: dict) -> bool:
    return user["firstname"] == "John"
pprint(where(users, my_filter))
- Using a lambda function:
from minifold.where import where
pprint(where(users, lambda user: user["firstname"] == "John"))
Result:
[{'firstname': 'John', 'lastname': 'Doe'},
{'firstname': 'John', 'lastname': 'Connor'}]
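Conceptually, where keeps the dictionaries for which the predicate returns True, much like a filtered list comprehension. The sketch below illustrates this with a hypothetical where_sketch helper; it is an illustration of the idea, not minifold's code.

```python
# Plain-Python sketch of predicate-based filtering
# (hypothetical helper, not minifold code).
from typing import Callable

users = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]


def where_sketch(entries: list, predicate: Callable[[dict], bool]) -> list:
    """Keep the entries for which predicate(entry) is true."""
    return [entry for entry in entries if predicate(entry)]


johns = where_sketch(users, lambda user: user["firstname"] == "John")
print(johns)
```

A named function and a lambda are interchangeable here, since both are plain callables taking one dictionary.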
lambdas: enrich entries
Suppose you want to add a key is_spiderman to each dictionary, whose value is True if and only if the record describes Peter Parker.
from minifold.lambdas import lambdas
pprint(
    lambdas(
        {
            "is_spiderman" : lambda user: user["firstname"] == "Peter" \
                and user["lastname"] == "Parker"
        },
        users
    )
)
Result:
[{'firstname': 'John', 'is_spiderman': False, 'lastname': 'Doe'},
{'firstname': 'John', 'is_spiderman': False, 'lastname': 'Connor'},
{'firstname': 'Peter', 'is_spiderman': True, 'lastname': 'Parker'}]
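The enrichment step can be read as "for each dictionary, compute every new key by calling its function on that dictionary". The sketch below uses a hypothetical lambdas_sketch helper. It deliberately returns copies; whether minifold's lambdas modifies the input dictionaries in place is not stated in this tutorial, so treat that choice as an assumption of the sketch.

```python
# Plain-Python sketch of enriching entries with computed keys
# (hypothetical helper, not minifold code).
users = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]


def lambdas_sketch(key_to_function: dict, entries: list) -> list:
    """Return copies of entries, each extended with the computed keys."""
    enriched = []
    for entry in entries:
        new_entry = dict(entry)  # copy, so the input list is not modified
        for key, function in key_to_function.items():
            new_entry[key] = function(entry)
        enriched.append(new_entry)
    return enriched


def is_spiderman(user: dict) -> bool:
    return user["firstname"] == "Peter" and user["lastname"] == "Parker"


enriched_users = lambdas_sketch({"is_spiderman": is_spiderman}, users)
print(enriched_users)
```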
To go further
To discover other primitives, visit the Framework page.
Your first connector and your first queries
We will start with the simplest connector: EntriesConnector. This is just a wrapper around a collection of dictionaries. Let's start from the previous example:
from minifold.entries_connector import EntriesConnector
users = [
    {
        "firstname" : "John",
        "lastname" : "Doe"
    }, {
        "firstname" : "John",
        "lastname" : "Connor"
    }, {
        "firstname" : "Peter",
        "lastname" : "Parker"
    }
]
connector = EntriesConnector(users)
Simple query
You can now query this connector using the query method. As usual, it returns a list of dictionaries. By default, a Query fetches everything.
from pprint import pprint
from minifold.query import Query
q = Query()
entries = connector.query(q)
pprint(entries)
Result:
[{'firstname': 'John', 'lastname': 'Doe'},
{'firstname': 'John', 'lastname': 'Connor'},
{'firstname': 'Peter', 'lastname': 'Parker'}]
Refined queries
A Query object can carry "instructions" indicating which dictionaries you're interested in. Among other things, you can fetch a subset of keys (using the attributes parameter) or only the dictionaries matching some constraints (using the filters parameter).
from pprint import pprint
from minifold.query import Query
q = Query(
    attributes = ["lastname"],
    filters = lambda user: user["firstname"] == "John"
)
entries = connector.query(q)
pprint(entries)
Result:
[{'lastname': 'Doe'}, {'lastname': 'Connor'}]
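Note that the filter above tests firstname while only lastname is returned, so the filtering has to happen before the projection. The sketch below mimics this with two hypothetical stand-ins, QuerySketch and EntriesConnectorSketch. They imitate the behavior shown in this section and are not minifold's classes.

```python
# Plain-Python sketch of how a connector could apply a query
# (hypothetical classes, not minifold code).
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QuerySketch:
    attributes: Optional[list] = None                 # None means "all keys"
    filters: Optional[Callable[[dict], bool]] = None  # None means "keep everything"


class EntriesConnectorSketch:
    def __init__(self, entries: list):
        self.entries = entries

    def query(self, q: QuerySketch) -> list:
        # 1) Filter on the full entries, so the predicate can use any key.
        kept = [e for e in self.entries if q.filters is None or q.filters(e)]
        # 2) Project onto the requested keys.
        if q.attributes is None:
            return [dict(e) for e in kept]
        return [{key: e[key] for key in q.attributes} for e in kept]


users = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]
connector = EntriesConnectorSketch(users)
everything = connector.query(QuerySketch())
john_lastnames = connector.query(
    QuerySketch(
        attributes=["lastname"],
        filters=lambda user: user["firstname"] == "John",
    )
)
print(john_lastnames)
```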
Your first minifold pipelines
First example: aggregating streams of dictionaries
Suppose we now want to build a pipeline that returns the set of distinct firstnames appearing in two collections. To this end:
- We need to wrap those two collections, using EntriesConnector.
- We need to merge them, using UnionConnector.
- We can keep only firstnames, using SelectConnector, depending on whether we want to keep lastnames or not.
- We can remove duplicates, using UniqueConnector.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from pprint import pprint
from minifold.entries_connector import EntriesConnector
from minifold.query import Query
from minifold.union import UnionConnector
from minifold.unique import UniqueConnector
boys = [
    {
        "firstname" : "John",
        "lastname" : "Doe"
    }, {
        "firstname" : "John",
        "lastname" : "Connor"
    }, {
        "firstname" : "Peter",
        "lastname" : "Parker"
    }
]
girls = [
    {
        "firstname" : "Sarah",
        "lastname" : "Connor"
    }, {
        "firstname" : "Jane",
        "lastname" : "Doe"
    }
]
pipeline = UniqueConnector(
    ["firstname"],
    UnionConnector([
        EntriesConnector(boys),
        EntriesConnector(girls)
    ])
)
Let's run a simple query:
q = Query()
entries = pipeline.query(q)
pprint(entries)
Results:
[{'firstname': 'John', 'lastname': 'Doe'},
{'firstname': 'Peter', 'lastname': 'Parker'},
{'firstname': 'Sarah', 'lastname': 'Connor'},
{'firstname': 'Jane', 'lastname': 'Doe'}]
Let's run a more refined query:
q = Query(attributes = ["firstname"])
entries = pipeline.query(q)
pprint(entries)
Results:
[{'firstname': 'John'},
{'firstname': 'Peter'},
{'firstname': 'Sarah'},
{'firstname': 'Jane'}]
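To see why nesting connectors works, it can help to picture each connector as an object with a query method that calls the query method of its children. The sketch below reproduces the two results above with hypothetical classes (EntriesSketch, UnionSketch, UniqueSketch). It passes a plain list of attributes instead of a Query object, and it illustrates the composition idea only, not how minifold is implemented.

```python
# Plain-Python sketch of a nested pipeline (hypothetical classes, not minifold code).
from typing import Optional


def project(entry: dict, attributes: Optional[list]) -> dict:
    """Copy entry, restricted to attributes (all keys if attributes is None)."""
    keys = entry.keys() if attributes is None else attributes
    return {key: entry[key] for key in keys}


class EntriesSketch:
    """Leaf of the pipeline: serves an in-memory list of dictionaries."""
    def __init__(self, entries: list):
        self.entries = entries

    def query(self, attributes: Optional[list] = None) -> list:
        return [project(entry, attributes) for entry in self.entries]


class UnionSketch:
    """Concatenates the results of its children, in order."""
    def __init__(self, children: list):
        self.children = children

    def query(self, attributes: Optional[list] = None) -> list:
        return [e for child in self.children for e in child.query(attributes)]


class UniqueSketch:
    """Keeps the first entry per distinct values of keys, then projects."""
    def __init__(self, keys: list, child):
        self.keys = keys
        self.child = child

    def query(self, attributes: Optional[list] = None) -> list:
        seen, result = set(), []
        # Fetch full entries so deduplication works even if the caller
        # does not request the deduplication keys.
        for entry in self.child.query(None):
            signature = tuple(entry[key] for key in self.keys)
            if signature not in seen:
                seen.add(signature)
                result.append(project(entry, attributes))
        return result


boys = [
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "John", "lastname": "Connor"},
    {"firstname": "Peter", "lastname": "Parker"},
]
girls = [
    {"firstname": "Sarah", "lastname": "Connor"},
    {"firstname": "Jane", "lastname": "Doe"},
]
pipeline = UniqueSketch(
    ["firstname"],
    UnionSketch([EntriesSketch(boys), EntriesSketch(girls)]),
)
all_entries = pipeline.query()
firstnames = pipeline.query(["firstname"])
print(firstnames)
```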
Second example: enriching the stream of dictionaries
Now, suppose you want to build another pipeline that adds a gender key on top of those two collections. This can be done using LambdasConnector. Here, we assume that boys and girls are well separated:
from minifold.lambdas import LambdasConnector
pipeline = UnionConnector([
    LambdasConnector(
        {"gender" : lambda boy: "male"},
        EntriesConnector(boys)
    ),
    LambdasConnector(
        {"gender" : lambda girl: "female"},
        EntriesConnector(girls)
    )
])
q = Query()
entries = pipeline.query(q)
pprint(entries)
Results:
[{'firstname': 'John', 'gender': 'male', 'lastname': 'Doe'},
{'firstname': 'John', 'gender': 'male', 'lastname': 'Connor'},
{'firstname': 'Peter', 'gender': 'male', 'lastname': 'Parker'},
{'firstname': 'Sarah', 'gender': 'female', 'lastname': 'Connor'},
{'firstname': 'Jane', 'gender': 'female', 'lastname': 'Doe'}]
Of course, if your collections mix men and women, you need a more elaborate function.
from minifold.lambdas import LambdasConnector
def gender(user: dict) -> str:
    return "male" if user["firstname"] in {"John", "Peter"} else \
           "female" if user["firstname"] in {"Jane", "Sarah"} else \
           "?"
pipeline = UnionConnector([
    LambdasConnector(
        {"gender" : gender},
        EntriesConnector(boys)
    ),
    LambdasConnector(
        {"gender" : gender},
        EntriesConnector(girls)
    )
])
q = Query()
entries = pipeline.query(q)
pprint(entries)
Results:
[{'firstname': 'John', 'gender': 'male', 'lastname': 'Doe'},
{'firstname': 'John', 'gender': 'male', 'lastname': 'Connor'},
{'firstname': 'Peter', 'gender': 'male', 'lastname': 'Parker'},
{'firstname': 'Sarah', 'gender': 'female', 'lastname': 'Connor'},
{'firstname': 'Jane', 'gender': 'female', 'lastname': 'Doe'}]
Playing with heterogeneous data sources
The principle remains the same. Instead of using EntriesConnector, you just rely on other connectors, depending on the nature of the data source.
- If the data source is remote, it is a good idea to use CacheConnector. This way, you avoid sending too many queries to an API that could blacklist you, and you improve the performance of your application. Browse this page to discover the full list of connectors.
- If the data source requires credentials, it is a good idea to configure a template using Config. This way, your credentials are not hard-coded in your script. Browse this page to discover how to configure templates.
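The benefit of caching can be illustrated with a small in-memory sketch: a wrapper remembers the answer to each query, so the underlying (supposedly remote) source is contacted only once per distinct query. The classes CountingSourceSketch and CacheSketch are hypothetical. How CacheConnector actually stores or expires results is not covered in this tutorial, so nothing below should be read as its implementation.

```python
# Plain-Python sketch of query caching (hypothetical classes, not minifold code).
from typing import Optional


class CountingSourceSketch:
    """Stand-in for a remote data source; counts how often it is queried."""
    def __init__(self, entries: list):
        self.entries = entries
        self.calls = 0

    def query(self, attributes: Optional[list] = None) -> list:
        self.calls += 1
        return [
            {key: entry[key] for key in (attributes or entry.keys())}
            for entry in self.entries
        ]


class CacheSketch:
    """Answers repeated queries from memory instead of asking the child again."""
    def __init__(self, child):
        self.child = child
        self.cache = {}

    def query(self, attributes: Optional[list] = None) -> list:
        key = None if attributes is None else tuple(attributes)
        if key not in self.cache:
            self.cache[key] = self.child.query(attributes)
        # Return copies so callers cannot corrupt the cached entries.
        return [dict(entry) for entry in self.cache[key]]


source = CountingSourceSketch([
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "Peter", "lastname": "Parker"},
])
cached = CacheSketch(source)
first = cached.query(["lastname"])
second = cached.query(["lastname"])  # served from the cache
print(source.calls)
```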
Creating your own connectors
- I advise you to start with a simple connector, e.g. EntriesConnector, to see a minimal example.
- Then, as an exercise, copy this file and try to reimplement JsonConnector using the json package.
- Once you're satisfied, compare your implementation with the minifold one. If everything is clear, feel free to look at how more complex connectors have been implemented.
Good luck!