Contextual Bandits
Usage with Contextual Multi-Armed Bandits
Eppo supports contextual multi-armed bandits. You can read more about them in the high-level documentation. Bandit flag configuration, including the flag key, status quo variation, bandit variation, and targeting rules, is handled within the Eppo application. The available actions, however, are supplied to the SDK in code when querying the bandit.
To leverage bandits using the Java SDK, there are two additional steps over regular feature flags:
- Add a bandit action logger to the SDK client instance
- Query the bandit for an action
Define a bandit logger
In order for the bandit to learn an optimized policy, we need to capture and log the bandit's actions. This requires defining a bandit logger in addition to an assignment logger.
EppoClient.builder(sdkKey)
    .assignmentLogger(assignmentLogData -> {
      System.out.println("TODO: send assignment event data to data warehouse: " + assignmentLogData);
    })
    .banditLogger(banditLogData -> {
      System.out.println("TODO: also send bandit event data to data warehouse, ensuring the column names are as expected: " + banditLogData);
    })
    .buildAndInit();
The bandit logger receives an event object with the following fields:
- timestamp (Date; default: undefined): The time when the action is taken
- featureFlag (String; default: undefined): The key of the feature flag corresponding to the bandit
- bandit (String; default: undefined): The key of the bandit
- subject (String; default: undefined): The identifier of the subject
- subjectNumericAttributes (Attributes; default: {}): Metadata about numeric attributes of the subject
- subjectCategoricalAttributes (Attributes; default: {}): Metadata about non-numeric attributes of the subject
- action (String; default: undefined): The action assigned by the bandit
- actionNumericAttributes (Attributes; default: {}): Metadata about numeric attributes of the assigned action
- actionCategoricalAttributes (Attributes; default: {}): Metadata about non-numeric attributes of the assigned action
- actionProbability (Double; default: undefined): The weight, between 0 and 1, that the bandit gave the assigned action
- optimalityGap (Double; default: undefined): The difference between the score of the selected action and the highest-scored action
- modelVersion (String; default: undefined): The key for the version of the bandit parameters used
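As a minimal sketch of what the bandit logger might do with these fields, the example below flattens an event into an ordered row ready to send to a warehouse. The BanditEvent record is a hypothetical stand-in that mirrors the fields listed above; it is not the SDK's own class. The snake_case column names are also assumptions and must be replaced with whatever names your bandit logging table expects.

```java
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

final class BanditEventRowSketch {
  // Hypothetical stand-in for the event passed to the bandit logger; NOT the SDK's own class.
  record BanditEvent(
      Date timestamp,
      String featureFlag,
      String bandit,
      String subject,
      Map<String, Double> subjectNumericAttributes,
      Map<String, String> subjectCategoricalAttributes,
      String action,
      Map<String, Double> actionNumericAttributes,
      Map<String, String> actionCategoricalAttributes,
      double actionProbability,
      double optimalityGap,
      String modelVersion) {}

  // Copies every field into an insertion-ordered map keyed by an ASSUMED column name.
  static Map<String, Object> toRow(BanditEvent event) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("timestamp", event.timestamp().toInstant().toString()); // ISO-8601 in UTC
    row.put("flag_key", event.featureFlag());
    row.put("bandit_key", event.bandit());
    row.put("subject", event.subject());
    row.put("subject_numeric_attributes", event.subjectNumericAttributes());
    row.put("subject_categorical_attributes", event.subjectCategoricalAttributes());
    row.put("action", event.action());
    row.put("action_numeric_attributes", event.actionNumericAttributes());
    row.put("action_categorical_attributes", event.actionCategoricalAttributes());
    row.put("action_probability", event.actionProbability());
    row.put("optimality_gap", event.optimalityGap());
    row.put("model_version", event.modelVersion());
    return row;
  }
}
```

Keeping the mapping in one place makes it easier to check that no field is dropped before the row reaches the warehouse.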
Query the bandit for an action
To query the bandit for an action, use the getBanditAction() method:
String flagKey = "shoe-bandit";
String subjectKey = "user123";

// `DiscriminableAttributes` is a set of attributes which can present the attributes sorted into numeric and categorical
// The `Attributes` class sorts these automatically based on data type.
DiscriminableAttributes subjectAttributes = new Attributes(
    Map.of(
        "age", EppoValue.valueOf(25),
        "country", EppoValue.valueOf("BG")
    )
);

// `BanditActions` is a map of action key to a set of attributes.
Actions actions = new BanditActions(
    Map.of(
        "nike",
        new Attributes(
            Map.of(
                "brandAffinity", EppoValue.valueOf(2.3),
                "previouslyPurchased", EppoValue.valueOf(true)
            )
        ),
        "adidas",
        new Attributes(
            Map.of(
                "brandAffinity", EppoValue.valueOf(0.2),
                "previouslyPurchased", EppoValue.valueOf(false)
            )
        )
    )
);

String defaultValue = "control";

BanditResult banditResult = EppoClient.getInstance().getBanditAction(
    flagKey,
    subjectKey,
    subjectAttributes,
    actions,
    defaultValue
);

if (banditResult.getAction() != null) {
  renderShoeAd(banditResult.getAction());
} else {
  renderDefaultShoeAd();
}
Subject Context
The subject context contains contextual information about the subject that is independent of bandit actions. For example, the subject's age or country.
The subject context can be provided as Attributes, which treats every number as a numeric attribute and everything else as a categorical attribute.
You can also explicitly bucket the attribute types by providing the context as ContextAttributes. For example, you may have an attribute named priority, with possible values 0, 1, and 2, that you want treated as categorical rather than numeric.
Attributes subjectNumericAttributes = new Attributes(
    Map.of(
        "age", EppoValue.valueOf(30)
    )
);
Attributes subjectCategoricalAttributes = new Attributes(
    Map.of(
        "priority", EppoValue.valueOf(1),
        "country", EppoValue.valueOf("GB")
    )
);
ContextAttributes subjectAttributes = new ContextAttributes(
    subjectNumericAttributes,
    subjectCategoricalAttributes
);
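To see why explicit bucketing can matter, the following sketch reproduces the type-based rule described above using plain Java maps. It illustrates the rule only, not the SDK's internal implementation, and the class and method names are hypothetical. Under the automatic rule, a priority value of 1 lands with the numeric attributes, which is the case ContextAttributes lets you override.

```java
import java.util.Map;
import java.util.TreeMap;

final class AttributeSortingSketch {
  // Attributes whose values are numbers are treated as numeric.
  static Map<String, Object> numeric(Map<String, Object> attributes) {
    Map<String, Object> result = new TreeMap<>();
    attributes.forEach((name, value) -> {
      if (value instanceof Number) {
        result.put(name, value);
      }
    });
    return result;
  }

  // Everything else (strings, booleans, and so on) is treated as categorical.
  static Map<String, Object> categorical(Map<String, Object> attributes) {
    Map<String, Object> result = new TreeMap<>();
    attributes.forEach((name, value) -> {
      if (!(value instanceof Number)) {
        result.put(name, value);
      }
    });
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> attributes = Map.of("age", 30, "priority", 1, "country", "GB");
    System.out.println("numeric: " + numeric(attributes).keySet());
    System.out.println("categorical: " + categorical(attributes).keySet());
  }
}
```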
Action Contexts
The action context contains contextual information about each action. Action contexts can be provided as a mapping of action names to their contexts.
Similar to subject context, action contexts can be provided as Attributes or as ContextAttributes. If there is no action context, you can use a Set<String> of all the action names when constructing BanditActions.
If the subject is assigned to the variation associated with the bandit, the bandit selects one of the supplied actions. All actions supplied are considered to be valid. If an action should not be available to a subject, do not include it for that call.
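One way to keep unavailable actions out of a call is to filter the full catalog just before building the actions for that subject. The sketch below uses a generic map in place of the SDK types. The helper name onlyAvailable and the idea of an "unavailable" set (for example, out-of-stock brands) are assumptions for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class AvailableActionsSketch {
  // Returns a copy of the catalog without the actions this subject must not receive.
  // V stands in for whatever per-action context you pass along (assumed non-null).
  static <V> Map<String, V> onlyAvailable(Map<String, V> allActions, Set<String> unavailable) {
    return allActions.entrySet().stream()
        .filter(entry -> !unavailable.contains(entry.getKey()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  public static void main(String[] args) {
    Map<String, Double> brandAffinity = Map.of("nike", 2.3, "adidas", 0.2);
    // "adidas" is excluded for this call only; the catalog itself is unchanged.
    System.out.println(onlyAvailable(brandAffinity, Set.of("adidas")).keySet());
  }
}
```

The filtered map would then be used to construct the actions passed to the bandit for that single call.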
Result
getBanditAction() returns a BanditResult, which has two fields:
- variation (String): The variation that was assigned to the subject
- action (String | null): The action that was assigned to the subject by the bandit, or null if the bandit was not assigned
When action is not null, the bandit has selected an action for the subject. Otherwise, you should use your status quo algorithm to select an action.
Status Quo Algorithm
In order to accurately measure the performance of the bandit, we need to compare it to the status quo algorithm using an experiment. This status quo algorithm could be a complicated algorithm that selects an action according to a different model, or a simple baseline such as selecting a fixed or random action. When you create an analysis allocation for the bandit and the returned action is null, implement the desired status quo algorithm based on the variation value.
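The fallback logic can be sketched as a small helper that prefers the bandit's action and otherwise defers to a status quo function keyed on the variation. The helper takes the two BanditResult fields as plain strings so the example stays self-contained. The names chooseAction and statusQuo, the variation values, and the fixed-action baseline are all hypothetical.

```java
import java.util.function.Function;

final class StatusQuoFallbackSketch {
  // Uses the bandit's action when present; otherwise asks the status quo algorithm,
  // which decides based on the assigned variation.
  static String chooseAction(
      String variation, String banditAction, Function<String, String> statusQuo) {
    return banditAction != null ? banditAction : statusQuo.apply(variation);
  }

  public static void main(String[] args) {
    // Hypothetical baseline: a fixed action for "control", another for any other variation.
    Function<String, String> statusQuo =
        variation -> "control".equals(variation) ? "nike" : "adidas";

    System.out.println(chooseAction("control", null, statusQuo)); // bandit not assigned
    System.out.println(chooseAction("bandit", "adidas", statusQuo)); // bandit chose an action
  }
}
```

Passing the status quo algorithm in as a function keeps the baseline swappable, whether it is a fixed choice, a random pick, or a separate model.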