CATS, CATS pdf for Continuous Actions - VowpalWabbit/vowpal_wabbit GitHub Wiki
CATS is a contextual bandit algorithm for continuous action spaces. You can find the related paper here. It uses epsilon-greedy exploration with tree policy classes and smoothing.
Using the input features, CATS first uses a tree policy to choose a center (one of a set of discrete actions) from the continuous action range, and then uses a bandwidth to determine a radius of randomization around the chosen center. The depth of the tree and the bandwidth must be specified beforehand.
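As a minimal sketch of the first step, the C++ snippet below maps a discrete action (a tree leaf) to its continuous center. It assumes, consistent with the bandwidth example further down this page (second unit range [4, 8], center 6), that [min_value, max_value] is split into num_actions equal unit ranges and that each center is the midpoint of its unit range. The function name action_center is hypothetical and not part of VW's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not VW's API): continuous center of discrete action
// `action_index`, assuming the range is split into num_actions equal unit
// ranges and each center sits at the midpoint of its unit range.
float action_center(uint32_t action_index, uint32_t num_actions, float min_value, float max_value)
{
  assert(num_actions > 0 && action_index < num_actions);
  assert(min_value < max_value);
  const float unit_range = (max_value - min_value) / static_cast<float>(num_actions);
  return min_value + unit_range * (static_cast<float>(action_index) + 0.5f);
}
```

With min_value = 0, max_value = 32 and num_actions = 8, action index 1 (the second unit range, [4, 8]) maps to a center of 6.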
The cats reduction calls into sample_pdf, which samples from cats_pdf. cats_pdf in turn calls into cb_explore_pdf, which eventually calls into cats_tree.
The pdf returned by cats_pdf consists of the chosen action range (the chosen action center widened by the bandwidth into a range), whose density is given by the exploit probability, and the remaining range(s), whose density is the explore probability distributed uniformly across them. sample_pdf then samples from this pdf and returns a single chosen action, along with the probability density at the sampled location.
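The pdf construction and sampling steps described above can be sketched in C++ as follows. This follows the prose description rather than VW's source, and VW's actual code may split the explore mass differently. PdfSegment mirrors the pdf_segment struct shown later on this page (with std::vector in place of v_array); the names build_pdf, sample_from_pdf, PdfSample, and the epsilon parameter (the explore probability) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Stand-in for VW's pdf_segment (std::vector replaces v_array in this sketch).
struct PdfSegment
{
  float left;       // starting point
  float right;      // ending point
  float pdf_value;  // height (density)
};

struct PdfSample
{
  float action;     // sampled continuous action
  float pdf_value;  // density at the sampled action
};

// Hypothetical: mass (1 - epsilon) on the exploit window [center - bandwidth,
// center + bandwidth] clipped to the range; mass epsilon spread uniformly over
// the remaining range(s). If the window covers the whole range, the pdf is uniform.
std::vector<PdfSegment> build_pdf(float center, float bandwidth, float epsilon, float min_value, float max_value)
{
  assert(min_value < max_value && bandwidth > 0.0f && epsilon >= 0.0f && epsilon <= 1.0f);
  assert(center >= min_value && center <= max_value);
  const float left = std::max(min_value, center - bandwidth);
  const float right = std::min(max_value, center + bandwidth);
  const float rest = (max_value - min_value) - (right - left);
  if (rest <= 0.0f)
  {
    return std::vector<PdfSegment>{PdfSegment{min_value, max_value, 1.0f / (max_value - min_value)}};
  }
  const float explore_density = epsilon / rest;
  std::vector<PdfSegment> pdf;
  if (left > min_value) { pdf.push_back({min_value, left, explore_density}); }
  pdf.push_back({left, right, (1.0f - epsilon) / (right - left)});
  if (right < max_value) { pdf.push_back({right, max_value, explore_density}); }
  return pdf;
}

// Picks a segment with probability proportional to its mass (width * height),
// then draws uniformly inside it; returns the action and the density there.
PdfSample sample_from_pdf(const std::vector<PdfSegment>& pdf, std::mt19937& rng)
{
  std::vector<double> masses;
  for (const PdfSegment& seg : pdf) { masses.push_back(static_cast<double>(seg.right - seg.left) * seg.pdf_value); }
  std::discrete_distribution<std::size_t> pick(masses.begin(), masses.end());
  const PdfSegment& chosen = pdf[pick(rng)];
  std::uniform_real_distribution<float> within(chosen.left, chosen.right);
  return {within(rng), chosen.pdf_value};
}
```

Returning the density alongside the action matters because that value is what later goes into the pdf_value field of the training label.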
NOTE: CATS was released in VW 8.9 with the bandwidth smoothing parameter defined in terms of num_actions. Since VW 8.10, bandwidth is defined in terms of the continuous range (max_value - min_value).
Example: min_value = 0, max_value = 32, num_actions = 8, bandwidth = 1. This gives a continuous range of 32 and a unit range of 32 / 8 = 4.
Let's say that VW predicted a continuous action inside the second unit range [4, 8], with a center of 6.
In VW 8.9: with bandwidth defined in terms of the number of actions, smoothing happens across 3 unit ranges, i.e. the first unit range, the second unit range (the one predicted), and the third unit range. This results in a pdf with higher density inside the ranges [0, 4], [4, 8], and [8, 12], i.e. [0, 12].
In master (VW 8.10 and later): with bandwidth defined in terms of the continuous range, smoothing happens across the predicted center 6 plus/minus bandwidth, resulting in a pdf with higher density inside the range [5, 7].
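The two interpretations in this example can be checked with a small C++ sketch that only reproduces the arithmetic above. The function names are hypothetical. Generalizing the VW 8.9 case (bandwidth = 1 giving one extra unit range on each side) to "bandwidth unit ranges on each side", and clipping both results to [min_value, max_value], are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

using Range = std::pair<float, float>;

// VW 8.9 reading (assumed generalization of the bandwidth = 1 example):
// widen the predicted unit range by `bandwidth` unit ranges on each side.
Range smoothing_range_v89(uint32_t action_index, uint32_t num_actions, float bandwidth,
                          float min_value, float max_value)
{
  assert(num_actions > 0 && action_index < num_actions);
  const float unit_range = (max_value - min_value) / static_cast<float>(num_actions);
  const float left = min_value + unit_range * static_cast<float>(action_index);
  const float right = left + unit_range;
  return {std::max(min_value, left - bandwidth * unit_range),
          std::min(max_value, right + bandwidth * unit_range)};
}

// VW 8.10+ reading: bandwidth is in action units, so widen the center directly.
Range smoothing_range_v810(float center, float bandwidth, float min_value, float max_value)
{
  return {std::max(min_value, center - bandwidth), std::min(max_value, center + bandwidth)};
}
```

For the example values (action index 1, num_actions = 8, bandwidth = 1, range [0, 32]) the first function gives [0, 12], and the second, with center 6, gives [5, 7].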
cats (i.e. cats_pdf after sampling)
vw --cats <num_actions> --bandwidth <bw> --min_value <min value> --max_value <max value> -d <data file>
cats_pdf
vw --cats_pdf <num_actions> --bandwidth <bw> --min_value <min value> --max_value <max value> -d <data file>
The first argument (num_actions) specifies the number of discrete actions (centers) and therefore the depth of the tree, which is log2(num_actions). The bandwidth determines the randomization radius around the chosen center/action. min_value and max_value define the overall range of the action space. All the usual parameters, such as the data file and prediction output, are also valid. The data file should be in the input format described below.
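A minimal C++ sketch of how these parameters relate. tree_depth returns the smallest depth d with 2^d >= num_actions, which equals log2(num_actions) when num_actions is a power of two. Rounding up for other values, and the checks in params_look_valid (positive num_actions and bandwidth, min_value < max_value), are assumptions of this sketch rather than VW's documented validation. CatsParams and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the command-line values described above.
struct CatsParams
{
  uint32_t num_actions;  // number of discrete centers (tree leaves)
  float bandwidth;       // randomization radius around the chosen center
  float min_value;       // lower end of the continuous action range
  float max_value;       // upper end of the continuous action range
};

// Smallest depth such that a complete binary tree has at least num_actions
// leaves; equals log2(num_actions) exactly when num_actions is a power of two.
uint32_t tree_depth(uint32_t num_actions)
{
  assert(num_actions > 0);
  uint32_t depth = 0;
  while ((uint64_t{1} << depth) < num_actions) { ++depth; }
  return depth;
}

bool is_power_of_two(uint32_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Basic sanity checks assumed by this sketch (not VW's actual validation).
bool params_look_valid(const CatsParams& p)
{
  return p.num_actions > 0 && p.bandwidth > 0.0f && p.min_value < p.max_value;
}
```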
The label type for continuous actions is VW::cb_continuous::continuous_label. It must be supplied when learning. The cost is the cost associated with the chosen action (i.e. the negative reward), and pdf_value is the density of the pdf at the chosen value (the sampled location).
struct continuous_label_elm
{
float action; // the continuous action
float cost; // the cost of this class
float pdf_value; // the pdf density of the chosen location, specifies the probability the data collection policy chose this action
};
struct continuous_label
{
v_array<continuous_label_elm> costs;
};
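For experimenting outside VW, the label can be mirrored with standard containers. The sketch below replaces v_array with std::vector so it compiles without VW headers. label_is_usable is a hypothetical check, not part of VW: it assumes a single logged action per example (as in the input format below) and requires a strictly positive pdf_value, since a density of zero would mean the data collection policy could not have chosen that action.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for continuous_label_elm / continuous_label (std::vector instead of v_array).
struct ContinuousLabelElm
{
  float action;     // the continuous action that was logged
  float cost;       // cost of that action (negative reward)
  float pdf_value;  // density of the data collection policy at `action`
};

struct ContinuousLabel
{
  std::vector<ContinuousLabelElm> costs;
};

// Hypothetical check: exactly one logged action, inside [min_value, max_value],
// with a strictly positive density.
bool label_is_usable(const ContinuousLabel& label, float min_value, float max_value)
{
  if (label.costs.size() != 1) { return false; }
  const ContinuousLabelElm& e = label.costs.front();
  return e.action >= min_value && e.action <= max_value && e.pdf_value > 0.0f;
}
```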
The prediction type for continuous actions is VW::continuous_actions::probability_density_function_value for cats and VW::continuous_actions::probability_density_function for cats_pdf. They are defined as follows:
struct probability_density_function_value
{
float action; // continuous action
float pdf_value; // pdf value
};
struct pdf_segment
{
float left; // starting point
float right; // ending point
float pdf_value; // height
};
using probability_density_function = v_array<pdf_segment>;
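To make the cats_pdf prediction concrete, the following C++ sketch looks up the density at a point and sums the total probability mass of a probability_density_function-like value. PdfSegment and ProbabilityDensityFunction are stand-ins for pdf_segment and probability_density_function using std::vector. Treating segments as half-open [left, right), with the last one also including its right endpoint, is an assumption, as are the helper names density_at and total_mass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-ins for pdf_segment / probability_density_function (std::vector instead of v_array).
struct PdfSegment
{
  float left;       // starting point
  float right;      // ending point
  float pdf_value;  // height
};
using ProbabilityDensityFunction = std::vector<PdfSegment>;

// Density at x; segments are treated as [left, right), except that the last
// segment also includes its right endpoint. Returns 0 outside every segment.
float density_at(const ProbabilityDensityFunction& pdf, float x)
{
  for (std::size_t i = 0; i < pdf.size(); ++i)
  {
    const bool is_last = (i + 1 == pdf.size());
    if (x >= pdf[i].left && (x < pdf[i].right || (is_last && x == pdf[i].right))) { return pdf[i].pdf_value; }
  }
  return 0.0f;
}

// Total probability mass (sum of width * height); close to 1 for a valid pdf.
float total_mass(const ProbabilityDensityFunction& pdf)
{
  float mass = 0.0f;
  for (const PdfSegment& seg : pdf)
  {
    assert(seg.left <= seg.right);
    mass += (seg.right - seg.left) * seg.pdf_value;
  }
  return mass;
}
```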
The continuous actions format (cats, cats_pdf) is a single-line format.
Labels are required when learning. Examples passed during testing omit the entire action:cost:pdf_value section.
ca action:cost:pdf_value |[namespace] <features>
action, cost, and pdf_value are floats, and action must fall within the [min_value, max_value] range.
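As an illustration of the line format above, the following C++ sketch extracts action, cost, and pdf_value from a ca line and checks the range constraint. It is a hypothetical helper for preprocessing data files, not VW's own label parser, and it treats a line without the action:cost:pdf_value section (or with a malformed one) as unlabelled.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

struct ParsedContinuousLabel
{
  float action;
  float cost;
  float pdf_value;
};

// Hypothetical helper: parses the label part (before '|') of a line such as
// "ca 185.121:0.657567:6.20426e-05 | <features>". Returns std::nullopt when the
// line carries no action:cost:pdf_value section or the section is malformed.
std::optional<ParsedContinuousLabel> parse_ca_label(const std::string& line)
{
  std::istringstream head(line.substr(0, line.find('|')));
  std::string tag;
  std::string triple;
  if (!(head >> tag >> triple) || tag != "ca") { return std::nullopt; }

  std::istringstream fields(triple);
  ParsedContinuousLabel label{};
  char sep1 = 0;
  char sep2 = 0;
  if (!(fields >> label.action >> sep1 >> label.cost >> sep2 >> label.pdf_value) || sep1 != ':' || sep2 != ':')
  {
    return std::nullopt;
  }
  return label;
}

// The action must lie within [min_value, max_value].
bool action_in_range(const ParsedContinuousLabel& label, float min_value, float max_value)
{
  return label.action >= min_value && label.action <= max_value;
}
```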
Labelled examples
ca 185.121:0.657567:6.20426e-05 | <features>
ca 772.592:0.458316:6.20426e-05 | <features>
ca 15140.6:0.31791:6.20426e-05 | <features>
For an unlabelled example, the action:cost:pdf_value section can be excluded.
Unlabelled examples
| <features>
| <features>