RTBkit Tracing - datacratic/rtbkit GitHub Wiki

The following pages describes RTBkit Tracing, a lightweight framework which can be used in order to trace various objects throughout RTBkit.

Problem description.

As described here, a running RTBkit stack produces every a large amount of metrics, which can be filtered and graphed, so as to produce a good overview of a running system.

Description of the facility

  • Non intrusive (dosen't require traced objects to inherit), and
  • Lightweight: using exclusively thread local storage, releasing from the need for synchronisation mechanisms.
  • Makes no assumption on the storage category of traced objects: both heap and stack allocated objects can be traced.
  • template RTBKIT::Trace<X> requires X to enforce a do_probe concept -- i.e., implement a function returning a tuple (trace_kind, trace_id, sampling_frequency)
  • Neither the collection of traces, nor their analysis is covered by the facility. Custom trace sinks can be used, as long as they are lightweight. Yet a default sink is provided, which will ultimately call into syslog, with facility LOG_LOCAL7 and level LOG_INFO. Of course syslog can be configured to do many interesting things, such as storing traces into a database, and so forth.
  • A trace can be created on any kind of object (including transient) as long as it is possible to implement a do_probe concept on them. In other words, the RTBkit trace is a syntactical tracing facility, leaving semantical questions upon its user: for an object X, implementing a random do_probe(X const&) will produce meaningless traces.

trace_kind

This is a convenient way to capture the kind of object being currently traced. It consists in a const char* which will be collected along with the traces.

trace_id

It consists in a std::string: as the production of traces can be scattered across various components of the system, trace_id provide a convenient way to gather semantically related traces. Again, please note that two distinct traced C++ objects can have their do_probe function returning a same trace_id. (e.g., BidRequest/Auction/BidResponse/...). This id needs to be unique, preferably throughout the system (e.g., auction id returned by the exchange).

sampling_frequency

This consists in an integer. Given a tuple (kind, id, N), only traces where citihash(id)%N==0 will be collected. Returning N==1 will trigger the production of a trace for every call.

Structure of a Trace.

template <typename T> struct Trace {/*...*/};

template <typename T>
class Trace
{
public:
    Trace (const std::shared_ptr<T>& t, const std::string& tag)
        : Trace (*t.get(), tag)
    {
    }
    Trace (const T& t, const std::string& tag)
    {
       // ...
    }
    ~Trace ()
    {
       // ...
    }

Trace objects are tiny objects using the RAII idiom in order to produce useful information. They can very much be thought as "execution stacks", in the sense that they can be stacked on top of each other. An example is given below:

struct X {/*...*/};
tuple<char*,string,int> do_probe(X const& x) {/*...*/}
void lovely_class::fascinating_member (X const& x) {
    Trace<X> t(x, "lovely_class::fascinating_member"); // timestamp1 collected, stack pushed
  
    //  Trace::~Trace() called: timestamp2 collected, stack popped. span collected
} 

## `struct Span {/*...*/};`
The structure as a Span is as below: 
```c++
struct Span
{
    std::string              tag_ ;
    uint32_t                 id_, pid_;
    std::chrono::steady_clock::time_point start_, end_;
    // ...
};
typedef std::function<void(const ProbeCtx&, const std::vector<Span>&)> SinkCb;

Traces collection.

For a given trace_id, the SinkCb function exposed above will be called as soon as the last span is popped off the execution stack. Then, whatever thread specific storage was used for this trace_id will be clobbered. The default Sink is shown below:

void 
default_probe_sink(const RTBKIT::ProbeCtx& ctx, 
                   const std::vector<RTBKIT::Span>& vs)
{
    using namespace std::chrono;
    using std::get;
    static auto hostname = boost::asio::ip::host_name();
    static auto pid = ::getpid();
    auto format = [&] (RTBKIT::Span const& s) {
        std::ostringstream oss;
        oss << "{"
        << "'tid':'" << std::this_thread::get_id() << "'"
        << ",'host':'" << hostname << "'"
        << ",'kpid':" << pid
        << ",'kind':'" << get<0>(ctx) << "'"
        << ",'uniq':'" << get<1>(ctx) << "'"
        << ",'freq':" << get<2>(ctx)
        << ",'pid':" << s.pid_
        << ",'id':" << s.id_
        << ",'tag':'" << s.tag_ << "'"
        << ",'t1':" << duration_cast<nanoseconds>(s.start_.time_since_epoch()).count()
        << ",'t2':" << duration_cast<nanoseconds>(s.end_.time_since_epoch()).count()
        << "}";
        return oss.str();
    };
    for (const auto& s: vs)
    {
        const auto str = format(s);
        syslog (LOG_INFO, "%s", str.c_str());
    }
}
⚠️ **GitHub.com Fallback** ⚠️