Contextual Bandits
Eppo's Ruby SDK supports contextual multi-armed bandits, which dynamically optimize assignments based on user context. A bandit balances exploration of new actions with exploitation of known successful actions to maximize a specified metric.
Bandit Setup
To leverage Eppo's contextual bandits using the Ruby SDK, there are two additional steps over regular feature flags:
- Add a bandit action logger to the assignment logger
- Query the bandit for an action
Logging Bandit Actions
In order for the bandit to learn an optimized policy, we need to capture and log the bandit's actions.
This requires implementing the log_bandit_action method in your AssignmentLogger class:
class MyLogger < EppoClient::AssignmentLogger
  def log_assignment(assignment)
    # Your assignment logging implementation
  end

  def log_bandit_action(bandit_action)
    # Your bandit action logging implementation
  end
end
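As a minimal sketch of what such a logging implementation might look like, the class below serializes each event as one JSON line to an IO object. The class name JsonLinesLogger, the io parameter, and the added "type" field are illustrative assumptions, not part of the SDK. In a real application the class would inherit from EppoClient::AssignmentLogger as shown above, and would more likely forward events to your data warehouse.

```ruby
require "json"
require "stringio"

# Hypothetical logger: writes each event as one JSON line to an IO object.
# In a real application this class would inherit from
# EppoClient::AssignmentLogger; the superclass is omitted here so the
# sketch runs without the SDK installed.
class JsonLinesLogger
  def initialize(io = $stdout)
    @io = io
  end

  def log_assignment(assignment)
    write("assignment", assignment)
  end

  def log_bandit_action(bandit_action)
    write("bandit_action", bandit_action)
  end

  private

  # The "type" field is an illustrative addition for telling events apart.
  def write(type, event)
    @io.puts(JSON.generate({ "type" => type }.merge(event.to_h)))
  end
end

# Usage with an in-memory buffer standing in for a file or socket.
buffer = StringIO.new
logger = JsonLinesLogger.new(buffer)
logger.log_bandit_action({ "flagKey" => "shoe-bandit", "action" => "nike" })
```

Writing one JSON object per line keeps assignment and bandit action events in a single stream that can be split later on the "type" field.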
Bandit Action Schema
The SDK will invoke the log_bandit_action method with a bandit_action hash containing the following fields:
- timestamp (String, default: undefined): The time when the action is taken, in UTC. Example: "2024-03-22T14:26:55.000Z"
- flagKey (String, default: undefined): The key of the feature flag corresponding to the bandit. Example: "bandit-test-allocation-4"
- banditKey (String, default: undefined): The key (unique identifier) of the bandit. Example: "ad-bandit-1"
- subject (String, default: undefined): An identifier of the subject or user assigned to the experiment variation. Example: "ed6f85019080"
- subjectNumericAttributes (Hash{String => Float}, default: {}): Metadata about numeric attributes of the subject. Example: {"age" => 30}
- subjectCategoricalAttributes (Hash{String => String}, default: {}): Metadata about non-numeric attributes of the subject. Example: {"loyalty_tier" => "gold"}
- action (String, default: undefined): The action assigned by the bandit. Example: "promo-20%-off"
- actionNumericAttributes (Hash{String => Float}, default: {}): Metadata about numeric attributes of the assigned action. Example: {"discount" => 0.2}
- actionCategoricalAttributes (Hash{String => String}, default: {}): Metadata about non-numeric attributes of the assigned action. Example: {"promoTextColor" => "white"}
- actionProbability (Float, default: undefined): The weight, between 0 and 1, that the bandit gave to the assigned action. Example: 0.25
- modelVersion (String, default: undefined): Unique identifier for the version of the bandit parameters used. Example: "v123"
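Before forwarding a bandit action to a warehouse, it can help to check that the fields without a default are present. The sketch below is one hedged way to do that. It assumes the hash uses the string keys listed in the schema (the SDK may deliver them differently), and the constant REQUIRED_BANDIT_FIELDS and the helper missing_bandit_fields are hypothetical names, not SDK functions.

```ruby
# Fields listed in the schema with "default: undefined".
# String keys are an assumption; adjust if your hash uses symbols.
REQUIRED_BANDIT_FIELDS = %w[
  timestamp flagKey banditKey subject action actionProbability modelVersion
].freeze

# Hypothetical helper: returns the required fields that are absent or nil.
def missing_bandit_fields(bandit_action)
  REQUIRED_BANDIT_FIELDS.select { |field| bandit_action[field].nil? }
end

# Sample hash assembled from the example values in the schema.
sample = {
  "timestamp" => "2024-03-22T14:26:55.000Z",
  "flagKey" => "bandit-test-allocation-4",
  "banditKey" => "ad-bandit-1",
  "subject" => "ed6f85019080",
  "subjectNumericAttributes" => { "age" => 30 },
  "subjectCategoricalAttributes" => { "loyalty_tier" => "gold" },
  "action" => "promo-20%-off",
  "actionNumericAttributes" => { "discount" => 0.2 },
  "actionCategoricalAttributes" => { "promoTextColor" => "white" },
  "actionProbability" => 0.25,
  "modelVersion" => "v123"
}

complete = missing_bandit_fields(sample)
incomplete = missing_bandit_fields(sample.reject { |key, _| key == "action" })
```

Here complete is an empty array, while incomplete names the single field that was removed.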
Querying the Bandit
To query the bandit for an action, use the get_bandit_action method:
client = EppoClient::Client.instance
bandit_result = client.get_bandit_action(
  "shoe-bandit", # flag_key
  user.id, # subject_key
  EppoClient::Attributes.new(
    numeric_attributes: { "age" => 25 },
    categorical_attributes: { "country" => "GB" }
  ),
  { # actions with their attributes
    "nike" => EppoClient::Attributes.new(
      numeric_attributes: { "brand_affinity" => 2.3 },
      categorical_attributes: { "previously_purchased" => true }
    ),
    "adidas" => EppoClient::Attributes.new(
      numeric_attributes: { "brand_affinity" => 0.2 },
      categorical_attributes: { "previously_purchased" => false }
    )
  },
  "control" # default_value
)

if bandit_result.action
  show_shoe_ad(bandit_result.action)
else
  show_default_ad
end
Subject Context
The subject context contains contextual information about the subject that is independent of bandit actions. For example, the subject's age or country.
The subject context has type Attributes, which has two fields:
- numeric_attributes (Hash): A hash of numeric attributes (such as "age")
- categorical_attributes (Hash): A hash of categorical attributes (such as "country")
The categorical_attributes are also used for targeting rules for the feature flag, similar to how subject_attributes are used with regular feature flags.
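If your application keeps subject attributes in a single flat hash, they need to be split into the two fields described above. The following is a minimal sketch of that split. The helper name split_attributes is hypothetical, and treating every non-numeric value as categorical is a simplifying assumption.

```ruby
# Hypothetical helper: splits a flat attribute hash into numeric and
# categorical parts, using the same keyword names as the Attributes
# constructor shown earlier. nil values are dropped.
def split_attributes(flat)
  numeric, categorical = flat.compact.partition do |_key, value|
    value.is_a?(Numeric)
  end
  {
    numeric_attributes: numeric.to_h,
    categorical_attributes: categorical.to_h
  }
end

parts = split_attributes({ "age" => 25, "country" => "GB", "plan" => nil })
# With the SDK loaded, the result could be passed along as:
#   EppoClient::Attributes.new(**parts)
```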
Action Contexts
Next, supply a hash with actions and their attributes: actions: Hash{String => Attributes}.
If the user is assigned to the bandit, the bandit selects one of the actions supplied here.
All actions supplied are considered to be valid; if an action should not be shown to a user, do not include it in this hash.
The action attributes are similar to the subject_attributes but hold action-specific information.
You can use Attributes.empty to create an empty attribute context.
Note that action contexts can contain two kinds of information:
- Action-specific context: e.g., the image aspect ratio of the image corresponding to this action
- User-action interaction context: e.g., there could be a "brand-affinity" model that computes brand affinities of users to brands, and scores of this model can be added to the action context to provide additional context for the bandit.
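Because every supplied action is treated as valid, filtering has to happen before the bandit is queried. The sketch below builds the actions hash from a catalog and leaves out ineligible entries. The CATALOG data, the in_stock flag, and the eligible_actions helper are all hypothetical. The Attributes struct is a stand-in for EppoClient::Attributes so that the example runs without the SDK.

```ruby
# Stand-in for EppoClient::Attributes so the sketch runs without the SDK.
Attributes = Struct.new(:numeric_attributes, :categorical_attributes,
                        keyword_init: true)

# Hypothetical catalog; "in_stock" is an illustrative eligibility flag.
CATALOG = [
  { name: "nike", in_stock: true, brand_affinity: 2.3, previously_purchased: true },
  { name: "adidas", in_stock: false, brand_affinity: 0.2, previously_purchased: false },
  { name: "puma", in_stock: true, brand_affinity: 0.7, previously_purchased: false }
].freeze

# Builds the actions hash, omitting anything that should not be shown.
def eligible_actions(catalog)
  catalog.select { |item| item[:in_stock] }.to_h do |item|
    attributes = Attributes.new(
      # User-action interaction context, e.g. a brand affinity score.
      numeric_attributes: { "brand_affinity" => item[:brand_affinity] },
      categorical_attributes: { "previously_purchased" => item[:previously_purchased] }
    )
    [item[:name], attributes]
  end
end

actions = eligible_actions(CATALOG)
```

In this sketch "adidas" is out of stock, so it never reaches the bandit and cannot be selected.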
Result
The bandit_result is an instance of BanditResult, which has two fields:
- variation (String): The variation that was assigned to the subject
- action (String or nil): The action that was assigned to the subject
The variation field holds the feature flag variation. This can be the bandit itself, or the "status quo" variation if the user is not assigned to the bandit.
If we are unable to generate a variation, for example when the flag is turned off, then the default variation is returned.
In both of those cases, the action is nil, and you should use the status-quo algorithm to select an action.
When action is not nil, the bandit has selected that action to be shown to the user.
Status Quo Algorithm
In order to accurately measure the performance of the bandit, we need to compare it to the status quo algorithm using an experiment. This status quo algorithm could be a complicated algorithm that selects an action according to a different model, or a simple baseline such as selecting a fixed or random action.
When you create an analysis allocation for the bandit and the action in BanditResult is nil, implement the desired status quo algorithm based on the variation value.
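One way to structure this is to key the status quo policies by variation and fall back to them only when the bandit returned no action. The following is a sketch under stated assumptions: the variation names "control" and "random", the STATUS_QUO table, and the select_action helper are hypothetical, and the BanditResult struct is a stand-in for the SDK's class so that the example runs without the SDK.

```ruby
# Stand-in for the SDK's BanditResult so the sketch runs without the SDK.
BanditResult = Struct.new(:variation, :action)

# Hypothetical status quo policies keyed by variation name.
STATUS_QUO = {
  "control" => ->(actions) { actions.first },  # fixed baseline
  "random" => ->(actions) { actions.sample }   # uniform random baseline
}.freeze

# Uses the bandit's action when present; otherwise applies the status quo
# policy for the returned variation, defaulting to the "control" policy.
def select_action(result, actions)
  return result.action unless result.action.nil?

  policy = STATUS_QUO.fetch(result.variation, STATUS_QUO["control"])
  policy.call(actions)
end

available = %w[nike adidas]
from_bandit = select_action(BanditResult.new("shoe-bandit", "adidas"), available)
from_control = select_action(BanditResult.new("control", nil), available)
```

Here from_bandit is the bandit's own choice, while from_control comes from the fixed baseline because the action was nil.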
Debugging
You may encounter a situation where a bandit assignment produces a value that you did not expect. The SDK provides detailed evaluation information through the get_bandit_action_details method:
evaluation = client.get_bandit_action_details(
  "shoe-bandit",
  "test-subject",
  subject_attributes,
  actions,
  "control"
)

puts "Assignment: #{evaluation.variation}"
puts "Action: #{evaluation.action}"
puts "Details: #{evaluation.evaluation_details}"
For more information on debugging assignments, see Debugging Flag Assignment.