Understanding silo systems

F-P wording

On machine M1 I fire up a server S1@M1 and upload some data sets {D1, D2, D3}@S1@M1. Next, I fire up a second server on M1 such that {D4, D5}@S2@M1. Eventually, I end up with something like

 {D1, D2, D3}@S1@M1
     {D4, D5}@S2@M1
     {D6, D7}@S3@M2
{D8, D9, D10}@S4@M3

So I have 3 physical machines, 4 servers, and 10 data sets.

Now I try to map this to F-P wording.

The data sets are silos (S), the servers are silo systems with an underlying server (SS+), and the machines are commonly called nodes (N). With this perspective, silo systems[^1] can be understood as entry points to collections of silos.

[^1]: Note that even without an underlying server, i.e., when a silo system runs in client mode, it can still be understood as an entry point to a collection of silos. Those silos are just not hosted by the silo system itself but by some other silo system.

 {S1, S2, S3}@SS+1@N1
     {S4, S5}@SS+2@N1
     {S6, S7}@SS+3@N2
{S8, S9, S10}@SS+4@N3

Note A silo is a stationary, typed data container. Therefore each data set D# maps to an individual silo S#.
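
To make this mapping concrete, here is a small toy model in plain Scala. None of these types (`Node`, `SiloSystemId`, `Silo`) are part of the f-p API; they only illustrate that a silo is a typed, stationary container bound to one silo system on one node.

```scala
// Toy model for the mapping above (not the f-p API): a silo is a stationary,
// typed data container that lives inside exactly one silo system on one node.
final case class Node(name: String)                                // machine, e.g. N1
final case class SiloSystemId(name: String, node: Node)            // server, e.g. SS+1
final case class Silo[T](id: String, data: T, host: SiloSystemId)  // data set, e.g. S1 <- D1

object Mapping extends App {
  val n1   = Node("N1")
  val ssP1 = SiloSystemId("SS+1", n1)

  // Each data set D# becomes an individual, typed silo S#.
  val s1 = Silo("S1", List(1, 2, 3), ssP1)            // D1
  val s2 = Silo("S2", Map("a" -> 1, "b" -> 2), ssP1)  // D2

  println(s"${s1.id} is hosted by ${s1.host.name}@${s1.host.node.name}")
}
```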

Server vs. Client mode

Let's assume we are running a silo system in client mode (w/o an underlying server) on a machine (SS1@N4), and that we have started the previously listed silo systems in server mode but have not yet populated them with any data sets:

{}@SS+1@N1
{}@SS+2@N1
{}@SS+3@N2
{}@SS+4@N3
    SS1@N4

I consider SS1@N4 as the driver, i.e., this silo system uploads initial data, defines some workflow to be executed on the distributed data, and finally collects some results.
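
As a rough sketch of that driver role, consider the following. All names (`Host`, `SiloRef`, `SiloSystem`, `populate`, `collect`) are made up for illustration and need not match the actual f-p API; the in-process stand-ins merely make the sketch runnable.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical driver-side handles; the names are illustrative only.
final case class Host(address: String, port: Int)

trait SiloRef[T] {
  def apply[U](f: T => U): SiloRef[U]   // add a workflow step (stays on the remote side)
  def collect(): Future[T]              // pull the result back to the driver
}

trait SiloSystem {
  def populate[T](at: Host)(data: => T): Future[SiloRef[T]]  // upload initial data
  def shutdown(): Future[Unit]
}

// Trivial in-process stand-ins so the sketch runs without any network.
final case class LocalRef[T](value: T) extends SiloRef[T] {
  def apply[U](f: T => U): SiloRef[U] = LocalRef(f(value))
  def collect(): Future[T]            = Future.successful(value)
}
object LocalSystem extends SiloSystem {
  def populate[T](at: Host)(data: => T): Future[SiloRef[T]] = Future.successful(LocalRef(data))
  def shutdown(): Future[Unit]                               = Future.successful(())
}

// The driver role of SS1@N4: upload data, define a workflow, collect the result.
object Driver extends App {
  val worker = Host("10.0.0.1", 8090)                          // stands for SS+1@N1
  val done = for {
    d1      <- LocalSystem.populate(worker)(List(1, 2, 3, 4))  // D1 becomes silo S1
    squared  = d1((xs: List[Int]) => xs.map(x => x * x))       // executed where S1 lives
    result  <- squared.collect()                               // comes back to the driver
    _       <- LocalSystem.shutdown()
  } yield result
  println(Await.result(done, 30.seconds))                      // List(1, 4, 9, 16)
}
```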

Note In terms of master/slave or driver/worker, a silo system acting as a server may be understood as a slave or worker serving silos. From the client perspective, a silo system may be understood as a master or driver defining the workflow to be processed.


Question How are various drivers handled?

Is there a notion of silo ownership?


So, let's upload a first data set: SS1@N4 -- {D1} --> {}@SS+1@N1, yielding

{S1}@SS+1@N1
      SS1@N4

Question How does SS1@N4 know where to put the data?

In order to upload some data, say, in order to create the first silo, how does SS1@N4 know where to send the data? Where does it get this information from? That is, how does it know which SS+s exist?
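
My working assumption is that, at least in the simplest setup, the driver has to be handed the server endpoints explicitly (command line, configuration). A hypothetical sketch, not the actual f-p mechanism:

```scala
object KnownSystems {
  final case class Host(address: String, port: Int)

  // Hypothetical: SS1@N4 is handed the server-mode silo systems explicitly
  // (e.g. via a config file or command-line flags); nothing is discovered automatically.
  val servers: Map[String, Host] = Map(
    "SS+1@N1" -> Host("10.0.0.1", 8090),
    "SS+2@N1" -> Host("10.0.0.1", 8091),
    "SS+3@N2" -> Host("10.0.0.2", 8090),
    "SS+4@N3" -> Host("10.0.0.3", 8090)
  )

  // In this sketch the driver itself decides where D1 goes; there is no
  // "system of systems" that would place or track silos on the user's behalf.
  val targetForD1: Host = servers("SS+1@N1")
}
```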

With this question I also want to scrutinise the term "system". In this scenario, regarding server mode, we have two systems at N1, one at N2, and one at N3, so we have a network of systems. From a user perspective, shouldn't there in turn be something like a "system of systems", such that a user just posts the data sets to this overall system, which in turn takes care of distributing and tracking the respective silos? In this respect, is "silo system" the right name for a system w/o an underlying server, i.e., the right name for a driver?



Question Does it make sense to have a silo system running in dual mode?

A silo system can be fired up either in client mode or in dual mode. The latter is a silo system running in server mode and in client mode at the same time. Currently, there is no dedicated option to run a silo system in server mode[^2] only.

Thus, a silo system running in dual mode may host silos (on the underlying server) and also define and execute workflows on those silos as well as on remote ones. So, in case there are no other silo systems running in server mode, the client would just interact with itself. Does that make sense?

Clearly, I do not refer to a setup like

{}@SS+1@N1
    SS1@N1

This is a setup with two silo systems on the same machine. I refer to one silo system running in dual mode.
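
To contrast the setups, here is a hypothetical sketch of the three conceivable start-up modes of a single process; the names are illustrative, not actual f-p options.

```scala
// Hypothetical start-up modes; "Dual" is one silo system that both serves and drives.
sealed trait Mode
case object Client                 extends Mode  // no underlying server: defines workflows only
final case class Server(port: Int) extends Mode  // hosts silos, reachable via the network only
final case class Dual(port: Int)   extends Mode  // hosts silos *and* defines/executes workflows

object Startup extends App {
  def describe(mode: Mode): String = mode match {
    case Client    => "client: upload to and drive remote silo systems"
    case Server(p) => s"server on port $p: serve silos, block until a shutdown message arrives"
    case Dual(p)   => s"dual on port $p: serve silos and drive workflows in the same process"
  }
  List(Client, Server(8090), Dual(8090)).map(describe).foreach(println)
}
```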

[^2]: By "server mode" I refer to not being able to interact with such a silo system via method calls (e.g. system.aMethod()) but only via the network layer. A silo system in server mode blocks until it is shut down via a dedicated network message.

