Potential models of goal directed search - mdsteiner/GoalDirectedDecisionMaking GitHub Wiki

Notes

  • Look at the BEAST (Best Estimate And Sampling Tools) model by Erev for justification of sampling processes in impression formation.

Models

Definitions

| Short Name | Long Name | Description | Notes |
| --- | --- | --- | --- |
| SampExHeur | Sample Extrapolation heuristic | Consider N past observations from each option and assume this is the future distribution of the option. Calculate the likelihood that selecting this option with this fixed distribution for the rest of the game would allow one to reach the goal. Select options proportionally to this likelihood. | Could be very similar to the SWIM model by Ashby and Rakow (but read the highly critical commentary by Wulff and Pachur). |
| StatExInt | Statistic extrapolation integration | Calculate the observed sample mean and standard deviation of each option. Assume that each option is Normally distributed with these sample statistics. Calculate the likelihood that selecting each option for the rest of the game would allow one to reach the goal, using the definition of the sum of multiple independent Normal distributions. Select options proportionally to this likelihood. | Used in Steiner Thesis. |
| Ludwig's Model (colleague of Sutton) | Not sure | Given N trials left in the game, do a mental simulation (for each option) where you draw N samples from all past samples from the option. Calculate the percent of simulations that would lead to reaching the goal. Then choose the option that has the highest success rate. | Dirk suggested this, but the paper may or may not be published. |

Notation

| Parameter | Definition |
| --- | --- |
| X, Y, ... | Choice options |
| T | Total number of trials in the task |
| G | Goal number of points to be achieved by trial T |
| Yt | Number of points earned by trial t |

Sample Extrapolation Model (SampEx)

  • Description In the SampEx model, people are assumed to evaluate each option by drawing a sample of previously observed outcomes from that option, and to use that sample to estimate the likelihood that selecting the option for the rest of the game would lead them to reach the goal.

  • Parameters

    • N, The number of samples used to evaluate options
    • Theta (or epsilon), a choice sensitivity parameter

Create Recent Distribution (RD)

  1. Consider the last N observations from each option. This creates the Recent Distribution (RD) of each option.
    • X RD: {X(t-1), X(t-2), ... X(t-N)}
    • Y RD: {Y(t-1), Y(t-2), ... Y(t-N)}
  • Variation: N samples could also be selected at random from all observed outcomes, or alternatively, could be selected proportionally to their recency of occurrence.
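This step can be sketched in Python as follows (function names are illustrative, not from the source; the sketch assumes observations are stored oldest to newest):

```python
import random

def recent_distribution(observations, n):
    """Recent Distribution (RD): the last n outcomes observed from an
    option, assuming observations are ordered oldest to newest."""
    return observations[-n:]

def random_distribution(observations, n, rng=random):
    """Variation: n outcomes drawn at random from all observed outcomes."""
    return rng.sample(observations, min(n, len(observations)))
```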

Create Recent Extrapolated Distribution (ReD)

  1. Let R be T - t, the number of trials remaining in the task. Multiply each observation by R (i.e. assume that outcome would repeat on every remaining trial). This creates the Recent Extrapolated Distribution (ReD)
    • X ReD: {X(t-1) * R, X(t-2) * R, ... X(t-N) * R},
    • Y ReD: {Y(t-1) * R, Y(t-2) * R, ... Y(t-N) * R}
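A minimal sketch of the extrapolation step (names illustrative):

```python
def extrapolated_distribution(rd, r):
    """Recent Extrapolated Distribution (ReD): each recent outcome,
    assumed to repeat on all r remaining trials."""
    return [outcome * r for outcome in rd]
```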

Calculate likelihood of reaching goal from ReD

  1. For each extrapolated outcome in the ReD, recode it as 1 if it is greater than or equal to G - Yt (i.e., it will reach the goal) and as 0 if it is less than G - Yt (i.e., it will not reach the goal). Call this ReD_bin (ReD binary)

    • X ReD_bin: {X(t-1) * R >= (G - Yt), X(t-2) * R >= (G - Yt), ... X(t-N) * R >= (G - Yt)},
    • Y ReD_bin: {Y(t-1) * R >= (G - Yt), Y(t-2) * R >= (G - Yt), ... Y(t-N) * R >= (G - Yt)}
  2. Calculate the percentage of extrapolated outcomes that reach the goal. Call this pi

    • X pi: mean(X ReD_bin).
    • Y pi: mean(Y ReD_bin)
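The recoding and averaging steps amount to a single function (a sketch; names illustrative):

```python
def goal_likelihood(red, points_needed):
    """Mean of ReD_bin: the fraction of extrapolated outcomes that meet
    or exceed the points still required, i.e. G - Yt."""
    red_bin = [1 if outcome >= points_needed else 0 for outcome in red]
    return sum(red_bin) / len(red_bin)
```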

Select option

  1. Select options proportionally to their pi values (e.g., softmax or epsilon-greedy)
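Both choice rules can be sketched in Python (a minimal illustration; names are not from the source):

```python
import math
import random

def softmax_probs(pis, theta):
    """p(select i) proportional to exp(theta * pi_i); theta is the
    choice sensitivity parameter."""
    weights = [math.exp(theta * pi) for pi in pis]
    total = sum(weights)
    return [w / total for w in weights]

def epsilon_greedy(pis, epsilon, rng=random):
    """With probability epsilon pick an option at random; otherwise
    pick the option with the highest pi."""
    if rng.random() < epsilon:
        return rng.randrange(len(pis))
    return max(range(len(pis)), key=lambda i: pis[i])
```

Higher theta makes choice more deterministic; theta = 0 yields random choice.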

Example

Consider a game with a total number of trials T = 25 and a goal G = 25.

Assume that on trial 14 a person has observed the following samples (7 from each option)

  • X: {5, 0, -5, -10, 20, 5, -10} (sum = 5, mean = 0.71, sd = 11.51)
  • Y: {1, 1, 0, 1, -1, 1, 2} (sum = 6, mean = 0.86, sd = 0.95)

Given these observations, the player has a current point total Y14 = 11.

Assume that N, the number of observations used to evaluate options, is N = 5:

  1. Consider the last N = 5 observations from each option. This creates the Recent Distribution (RD) of each option.

    • X RD: {5, 0, -5, -10, 20}
    • Y RD: {1, 1, 0, 3, -1}
  2. Multiply each outcome in the RD by R = (T - t) = 25 - 14 = 11 to get the ReD

    • X ReD: {55, 0, -55, -110, 220}
    • Y ReD: {11, 11, 0, 33, -11}
  3. The number of points necessary to reach the goal is G - Y14 = 25 - 11 = 14. Recode each extrapolated outcome as 1 if it reaches this number and 0 if it does not.

    • X ReD_bin: {1, 0, 0, 0, 1}
    • Y ReD_bin: {0, 0, 0, 1, 0}
  4. Calculate pi for each option

    • X pi = 2 / 5 = 0.40
    • Y pi = 1 / 5 = 0.20
  5. Select an option proportionally to X pi and Y pi

    • p(select X) = softmax(X pi, Y pi, Theta), or epsilon-greedy(X pi, Y pi, epsilon)
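The example can be traced end to end in Python, starting from the Recent Distributions as stated above (helper names are illustrative):

```python
T, G, t, y_t = 25, 25, 14, 11      # trials, goal, current trial, points so far
r = T - t                          # 11 trials remaining
needed = G - y_t                   # 14 points still required

x_rd = [5, 0, -5, -10, 20]         # Recent Distributions as given above
y_rd = [1, 1, 0, 3, -1]

def pi(rd):
    red = [outcome * r for outcome in rd]                  # extrapolate
    return sum(1 for o in red if o >= needed) / len(red)   # mean of ReD_bin

x_pi, y_pi = pi(x_rd), pi(y_rd)    # 0.40 and 0.20
```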

Statistic extrapolation integration (StatExInt)

  • Description Calculate the observed sample mean and standard deviation of each option. Assume that each option is Normally distributed with these sample statistics. Calculate the likelihood that selecting each option for the rest of the game would allow one to reach the goal, using the definition of the sum of multiple independent Normal distributions. Select options proportionally to this likelihood.

  • Parameters

    • Theta (or epsilon), a choice sensitivity parameter, if choice rule is used

Compute mean and standard deviation of the distributions

  1. Let R be T - t, the number of trials remaining in the task. Consider all sampled observations (Os) from each option. From these, calculate the mean and standard deviation of the sum of R independent draws from the distribution defined by Os (the mean scales with R; the variance scales with R):

    • X mean: mean(X Os * R)

    • Y mean: mean(Y Os * R)

    • X sd: sqrt(sd(X Os)^2 * R)

    • Y sd: sqrt(sd(Y Os)^2 * R)

  • When no sample has been drawn from an option, assume mean = 0 and sd = 1.
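These statistics can be sketched in Python (names illustrative; the sketch assumes at least two observations per sampled option, since a sample standard deviation needs two values):

```python
import math
import statistics

def extrapolated_normal(os, r):
    """Mean and sd of the sum of r independent draws from the option's
    observed outcome distribution: the mean scales with r and the
    variance scales with r. With no observations yet, fall back to
    mean = 0, sd = 1 as described above."""
    if not os:
        return 0.0, 1.0
    mean = statistics.mean(os) * r
    sd = math.sqrt(statistics.stdev(os) ** 2 * r)
    return mean, sd
```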

Compute the probability of reaching the goal

  1. Compute the probability of obtaining at least G - Yt points (i.e. reaching the goal) from one draw from a Normal distribution with the computed mean and standard deviation.

    • X pi: 1 - pnorm(G - Yt, X mean, X sd)
    • Y pi: 1 - pnorm(G - Yt, Y mean, Y sd)
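In a Python sketch, R's `pnorm` can be replaced by a Normal CDF built from the standard library's `math.erf` (function names are illustrative):

```python
import math

def normal_cdf(x, mean, sd):
    """Equivalent of R's pnorm(x, mean, sd)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_reach_goal(points_needed, mean, sd):
    """pi = 1 - pnorm(G - Yt, mean, sd): the chance that one draw from
    the extrapolated Normal meets or exceeds the points still needed."""
    return 1.0 - normal_cdf(points_needed, mean, sd)
```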

Select option

  1. Select options proportionally to their pi values (e.g., softmax or epsilon-greedy)

Example

Consider a game with a total number of trials T = 25 and a goal G = 25.

Assume that on trial 14 a person has observed the following samples (7 from each option)

  • X: {5, 0, -5, -10, 20, 5, -10} (sum = 5, mean = 0.71, sd = 11.51)
  • Y: {1, 1, 0, 1, -1, 1, 2} (sum = 6, mean = 0.86, sd = 0.95)

Given these observations, the player has a current point total Y14 = 11; there are R = 11 trials left, and G - Y14 = 14 points are still needed to reach the goal.

  1. Compute the means and standard deviations

    • X mean = 0.71 * 11 = 7.81

    • Y mean = 0.86 * 11 = 9.46

    • X sd = sqrt(11.51^2 * 11) = 38.17

    • Y sd = sqrt(0.95^2 * 11) = 3.15

  2. Compute the probability of reaching the goal

    • X pi = 1 - pnorm(14, 7.81, 38.17) = 0.44
    • Y pi = 1 - pnorm(14, 9.46, 3.15) = 0.07
  3. Select an option proportionally to X pi and Y pi

    • p(select X) = softmax(X pi, Y pi, Theta), or epsilon-greedy(X pi, Y pi, epsilon)
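The example numbers can be checked with the same Normal-CDF approach (a sketch; inputs are the rounded summary statistics from the text):

```python
import math

def prob_reach_goal(points_needed, mean, sd):
    """1 - pnorm(G - Yt, mean, sd) via the standard Normal CDF."""
    z = (points_needed - mean) / sd
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

r, needed = 11, 14                                 # trials left, points to goal
x_mean, x_sd = 0.71 * r, math.sqrt(11.51**2 * r)   # 7.81, 38.17
y_mean, y_sd = 0.86 * r, math.sqrt(0.95**2 * r)    # 9.46, 3.15

x_pi = prob_reach_goal(needed, x_mean, x_sd)       # ~0.44
y_pi = prob_reach_goal(needed, y_mean, y_sd)       # ~0.07
```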