RTBkit Tracing - datacratic/rtbkit GitHub Wiki
The following pages describes RTBkit Tracing, a lightweight framework which can be used in order to trace various objects throughout RTBkit.
As described here, a running RTBkit stack produces every a large amount of metrics, which can be filtered and graphed, so as to produce a good overview of a running system.
- Non intrusive (dosen't require traced objects to inherit), and
- Lightweight: using exclusively thread local storage, releasing from the need for synchronisation mechanisms.
- Makes no assumption on the storage category of traced objects: both heap and stack allocated objects can be traced.
-
template RTBKIT::Trace<X>
requiresX
to enforce ado_probe
concept -- i.e., implement a function returning a tuple(trace_kind, trace_id, sampling_frequency)
- Neither the collection of traces, nor their analysis is covered by the facility. Custom trace sinks can be used, as long as they are lightweight. Yet a default sink is provided, which will ultimately call into syslog, with facility
LOG_LOCAL7
and levelLOG_INFO
. Of coursesyslog
can be configured to do many interesting things, such as storing traces into a database, and so forth. - A trace can be created on any kind of object (including transient) as long as it is possible to implement a
do_probe
concept on them. In other words, the RTBkit trace is a syntactical tracing facility, leaving semantical questions upon its user: for an objectX
, implementing a randomdo_probe(X const&)
will produce meaningless traces.
This is a convenient way to capture the kind of object being currently traced. It consists in a const char*
which will be collected along with the traces.
It consists in a std::string
: as the production of traces can be scattered across various components of the system, trace_id
provide a convenient way to gather semantically related traces. Again, please note that two distinct traced C++ objects can have their do_probe
function returning a same trace_id
. (e.g., BidRequest/Auction/BidResponse/...). This id needs to be unique, preferably throughout the system (e.g., auction id returned by the exchange).
This consists in an integer. Given a tuple (kind, id, N)
, only traces where citihash(id)%N==0
will be collected. Returning N==1
will trigger the production of a trace for every call.
template <typename T>
class Trace
{
public:
Trace (const std::shared_ptr<T>& t, const std::string& tag)
: Trace (*t.get(), tag)
{
}
Trace (const T& t, const std::string& tag)
{
// ...
}
~Trace ()
{
// ...
}
Trace objects are tiny objects using the RAII idiom in order to produce useful information. They can very much be thought as "execution stacks", in the sense that they can be stacked on top of each other. An example is given below:
struct X {/*...*/};
tuple<char*,string,int> do_probe(X const& x) {/*...*/}
void lovely_class::fascinating_member (X const& x) {
Trace<X> t(x, "lovely_class::fascinating_member"); // timestamp1 collected, stack pushed
// Trace::~Trace() called: timestamp2 collected, stack popped. span collected
}
## `struct Span {/*...*/};`
The structure as a Span is as below:
```c++
struct Span
{
std::string tag_ ;
uint32_t id_, pid_;
std::chrono::steady_clock::time_point start_, end_;
// ...
};
typedef std::function<void(const ProbeCtx&, const std::vector<Span>&)> SinkCb;
For a given trace_id
, the SinkCb
function exposed above will be called as soon as the last span is popped off the execution stack. Then, whatever thread specific storage was used for this trace_id
will be clobbered.
The default Sink is shown below:
void
default_probe_sink(const RTBKIT::ProbeCtx& ctx,
const std::vector<RTBKIT::Span>& vs)
{
using namespace std::chrono;
using std::get;
static auto hostname = boost::asio::ip::host_name();
static auto pid = ::getpid();
auto format = [&] (RTBKIT::Span const& s) {
std::ostringstream oss;
oss << "{"
<< "'tid':'" << std::this_thread::get_id() << "'"
<< ",'host':'" << hostname << "'"
<< ",'kpid':" << pid
<< ",'kind':'" << get<0>(ctx) << "'"
<< ",'uniq':'" << get<1>(ctx) << "'"
<< ",'freq':" << get<2>(ctx)
<< ",'pid':" << s.pid_
<< ",'id':" << s.id_
<< ",'tag':'" << s.tag_ << "'"
<< ",'t1':" << duration_cast<nanoseconds>(s.start_.time_since_epoch()).count()
<< ",'t2':" << duration_cast<nanoseconds>(s.end_.time_since_epoch()).count()
<< "}";
return oss.str();
};
for (const auto& s: vs)
{
const auto str = format(s);
syslog (LOG_INFO, "%s", str.c_str());
}
}