SO 5.8 InDepth Subscription Storage - Stiffstream/sobjectizer GitHub Wiki
Introduction
Subscription storage is a data structure for storing and manipulating information about an agent's subscriptions. Every agent has its own private subscription storage. When an agent creates a subscription like:
void some_agent::so_define_agent()
{
    so_default_state().event( &some_agent::evt_do_some_work );
}
this subscription is stored in the agent's subscription storage. When the agent receives a message, the handler for that message is looked up in this storage.
The problem is choosing an appropriate data structure for that storage.
When an agent has only a few subscriptions (one or two, for example), a very simple vector-based implementation is the most efficient. When an agent has several dozen subscriptions, the trivial vector-based implementation becomes inefficient, and map-based or flat-set-based storage is more appropriate. When an agent has several hundred or even thousands of subscriptions, a hash-table-based implementation is more efficient.
Since v.5.5.3 a user can specify which subscription storage should be used for an agent.
To do so, add an appropriate factory to the agent's context:
class my_agent : public so_5::agent_t
{
public :
    my_agent( context_t ctx )
        : so_5::agent_t( ctx + so_5::hash_table_based_subscription_storage_factory() )
    {}
    ...
};
Note. The type of subscription storage can be specified only once during agent creation. After creation the subscription storage cannot be changed.
Standard Subscription Storage Factories
There are several implementations of subscription storage:
- vector-based implementation. Uses std::vector and simple linear search. The content of this vector is not ordered. Very efficient for small numbers of subscriptions. The function so_5::vector_based_subscription_storage_factory() creates a factory for this type of storage;
- flat-set-based implementation. Uses std::vector with ordered content and binary search. It's a bit more efficient than the map-based implementation described below. The function so_5::flat_set_based_subscription_storage_factory() creates a factory for this type of storage. This implementation is available since v.5.8.2;
- map-based implementation. Uses std::map and is efficient when the count of subscriptions is greater than 10-20 and less than 100-200. The function so_5::map_based_subscription_storage_factory() creates a factory for this type of storage;
- hash-table-based implementation. Uses std::unordered_map and is most efficient when the count of subscriptions exceeds several hundred. The function so_5::hash_table_based_subscription_storage_factory() creates a factory for this type of storage;
- adaptive storage. Uses two storage objects. The first one is used when the count of subscriptions is small. The second is used when the count of subscriptions exceeds some threshold. This storage dynamically changes implementations: it switches from the small storage to the big one when new subscriptions are created and switches back when subscriptions are erased. By default adaptive storage uses vector-based storage as the small one and map-based storage as the big one, but this can be changed at the moment of storage creation. The adaptive storage is created by the so_5::adaptive_subscription_storage_factory() functions.
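The difference between the vector-based and flat-set-based ideas from the list above can be illustrated with a minimal standalone sketch. It is not SObjectizer's actual implementation: the names sub_key, handler_id, sub_item, vector_storage and flat_set_storage are hypothetical, and the subscription key is simplified to a tuple of integers. Both classes keep their items in a std::vector; only the ordering and the search strategy differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical subscription key: (mailbox id, message type id, state id).
// Plain integers keep the sketch small; this is an assumption, not the real key type.
using sub_key = std::tuple<std::uint64_t, std::uint32_t, std::uint32_t>;
// Stand-in for a real event handler.
using handler_id = int;
using sub_item = std::pair<sub_key, handler_id>;

// Vector-based idea: unordered content, linear search.
class vector_storage {
    std::vector<sub_item> items_;
public:
    void add(const sub_key& key, handler_id handler) {
        items_.emplace_back(key, handler); // cheap append, no ordering kept
    }
    std::optional<handler_id> find(const sub_key& key) const {
        for (const auto& item : items_)
            if (item.first == key) return item.second;
        return std::nullopt;
    }
};

// Flat-set-based idea: the same std::vector, but kept sorted by key,
// so a lookup can use binary search.
class flat_set_storage {
    std::vector<sub_item> items_;
    static bool key_less(const sub_item& item, const sub_key& key) {
        return item.first < key;
    }
public:
    void add(const sub_key& key, handler_id handler) {
        const auto pos = std::lower_bound(items_.begin(), items_.end(), key, key_less);
        // This sketch assumes every key is added at most once.
        assert(pos == items_.end() || pos->first != key);
        items_.emplace(pos, key, handler); // insertion shifts the tail to keep order
    }
    std::optional<handler_id> find(const sub_key& key) const {
        const auto pos = std::lower_bound(items_.begin(), items_.end(), key, key_less);
        if (pos != items_.end() && pos->first == key) return pos->second;
        return std::nullopt;
    }
};
```

In this sketch the vector-based variant pays nothing on insertion but scans every item on lookup, while the flat-set variant pays for ordered insertion and gets logarithmic lookup in return.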
By default all agents use adaptive subscription storage. This means that an agent with very few subscriptions uses the very small and very fast vector-based storage. If the count of subscriptions grows, the agent switches to map-based storage, which is more expensive but better suited to a large number of subscriptions.
But if the user knows how many subscriptions an agent will have, an appropriate storage can be created once and will never switch from one implementation to another.
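The switching behaviour of an adaptive storage can be sketched as follows. This is only an illustration under stated assumptions, not SObjectizer's code: the class adaptive_storage, the aliases simple_key and handler_id, and the exact threshold rule (the small storage holds at most threshold items) are all hypothetical. The small storage is an unordered std::vector and the big one is a std::map, mirroring the default pairing described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using simple_key = int;  // hypothetical simplified subscription key
using handler_id = int;  // stand-in for a real event handler

class adaptive_storage {
    std::size_t threshold_;
    bool using_large_ = false;
    std::vector<std::pair<simple_key, handler_id>> small_; // linear search
    std::map<simple_key, handler_id> large_;               // ordered tree

    void switch_to_large() {
        large_.insert(small_.begin(), small_.end());
        small_.clear();
        using_large_ = true;
    }
    void switch_to_small() {
        small_.assign(large_.begin(), large_.end());
        large_.clear();
        using_large_ = false;
    }
public:
    explicit adaptive_storage(std::size_t threshold) : threshold_(threshold) {}

    // Keys are assumed to be unique; duplicates are not checked in this sketch.
    void add(simple_key key, handler_id handler) {
        // Move everything to the big storage before growing past the threshold.
        if (!using_large_ && small_.size() >= threshold_) switch_to_large();
        if (using_large_) large_.emplace(key, handler);
        else small_.emplace_back(key, handler);
    }
    void remove(simple_key key) {
        if (using_large_) {
            large_.erase(key);
            // Switch back once the content fits into the small storage again.
            if (large_.size() <= threshold_) switch_to_small();
        } else {
            small_.erase(std::remove_if(small_.begin(), small_.end(),
                             [key](const auto& item) { return item.first == key; }),
                         small_.end());
        }
    }
    std::optional<handler_id> find(simple_key key) const {
        if (using_large_) {
            const auto it = large_.find(key);
            if (it != large_.end()) return it->second;
            return std::nullopt;
        }
        for (const auto& item : small_)
            if (item.first == key) return item.second;
        return std::nullopt;
    }
    bool uses_large_storage() const { return using_large_; }
    std::size_t size() const { return using_large_ ? large_.size() : small_.size(); }
};
```

Every switch copies all subscriptions from one container to the other, which is the cost that a fixed, explicitly chosen storage avoids when the number of subscriptions is known in advance.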
Setting Default Subscription Storage Factory for SObjectizer Environment
Since v.5.8.2 it's possible to set a global default factory for subscription storage. If an agent has no explicit subscription storage factory this factory will be used automatically.
This global default factory has to be specified via so_5::environment_params_t:
so_5::launch([](so_5::environment_t & env) { ... },
    [](so_5::environment_params_t & params) {
        // This factory will be used as the default one.
        params.default_subscription_storage_factory(
            so_5::flat_set_based_subscription_storage_factory(16u));
    });
NOTE. The default factory can be set only before the SObjectizer Environment starts. Once the SObjectizer Environment has started, the factory can't be changed.
Note About Selection of a Subscription Storage
Unfortunately, there is just one obvious recommendation: if your agents have few subscriptions (1-2, maybe 4-5, but no more than a dozen), then the simplest vector-based storage with unordered content will be the most efficient in almost all circumstances.
For all other types of subscription storage it is necessary to run benchmarks and measurements on your hardware and with your type of workload. The results depend heavily on several factors:
- cache size;
- number of agents;
- number of messages;
- number of subscriptions.
It seems that if the data fits into the CPU cache(s), then flat-set-, map- and hash-table-based implementations may show very similar results. Sometimes flat-set-based storage is the fastest, sometimes hash-table-based. But hash-table-based storage usually shows the most stable results as the number of subscriptions grows.
However, if an agent's data is evicted from the CPU cache(s) too often (this may happen when you have thousands of agents and a lot of messages for them), then the results may be quite different. In our internal benchmarks we saw that flat-set-based storage works better than map-based and hash-table-based storage, and that hash-table-based storage performs the worst. But results on your hardware and under your workload may differ even more.
So, there is no ready-to-use recipe. If you have to deal with large numbers of subscriptions for an agent, please run some experiments and benchmarks.
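As a starting point for such experiments, the lookup cost of the underlying standard containers can be compared with a small standalone harness like the one below. It is a rough sketch under several assumptions: the names bench_result, bench_report, run_lookups and compare_lookups are hypothetical, the keys are plain integers, and a single set of containers is measured, so it does not reproduce the cache-eviction effects that appear with thousands of agents. A benchmark of real agents under a realistic message load is still needed.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <numeric>
#include <unordered_map>
#include <vector>

struct bench_result {
    std::size_t hits;                 // number of successful lookups
    std::chrono::nanoseconds elapsed; // wall-clock time spent in the loop
};

// Looks up every key in [0, key_count) `rounds` times and measures the time.
template <typename Lookup>
bench_result run_lookups(std::uint64_t key_count, std::size_t rounds, Lookup lookup) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (std::size_t r = 0; r < rounds; ++r)
        for (std::uint64_t k = 0; k < key_count; ++k)
            if (lookup(k)) ++hits;
    const auto stop = std::chrono::steady_clock::now();
    return {hits, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}

struct bench_report {
    bench_result linear; // std::vector + linear search
    bench_result binary; // sorted std::vector + binary search
    bench_result tree;   // std::map
    bench_result hash;   // std::unordered_map
};

bench_report compare_lookups(std::uint64_t key_count, std::size_t rounds) {
    // Keys 0..key_count-1; the vector is sorted, so it serves both vector cases.
    std::vector<std::uint64_t> keys(key_count);
    std::iota(keys.begin(), keys.end(), std::uint64_t{0});

    std::map<std::uint64_t, int> tree;
    std::unordered_map<std::uint64_t, int> hash;
    for (const auto k : keys) {
        tree.emplace(k, 0);
        hash.emplace(k, 0);
    }

    return bench_report{
        run_lookups(key_count, rounds, [&](std::uint64_t k) {
            return std::find(keys.begin(), keys.end(), k) != keys.end();
        }),
        run_lookups(key_count, rounds, [&](std::uint64_t k) {
            return std::binary_search(keys.begin(), keys.end(), k);
        }),
        run_lookups(key_count, rounds, [&](std::uint64_t k) {
            return tree.find(k) != tree.end();
        }),
        run_lookups(key_count, rounds, [&](std::uint64_t k) {
            return hash.find(k) != hash.end();
        }),
    };
}
```

Calling compare_lookups with several subscription counts (for example 4, 64 and 2048) and comparing the elapsed fields, in an optimized build, gives a first impression of where the crossover points lie on a particular machine.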