Skip to main content

Contextual Bandits

Eppo's Go SDK supports contextual multi-armed bandits, which dynamically optimize assignments based on user context. A bandit balances exploration of new actions with exploitation of known successful actions to maximize a specified metric.

Bandit Setup

To leverage Eppo's contextual bandits using the Go SDK, there are two additional steps over regular feature flags:

  1. Add a bandit action logger to the assignment logger
  2. Query the bandit for an action

Logging Bandit Actions

In order for the bandit to learn an optimized policy, we need to capture and log the bandit's actions. This requires implementing the LogBanditAction method in your logger:

type MyLogger struct{}

func (l *MyLogger) LogAssignment(assignment eppoclient.AssignmentEvent) error {
// Your assignment logging implementation
return nil
}

func (l *MyLogger) LogBanditAction(banditAction eppoclient.BanditAction) error {
// Your bandit action logging implementation
return nil
}

Bandit Action Schema

The SDK will invoke the LogBanditAction method with a BanditAction struct containing the following fields:

TimestampstringDefault: ""

The time when the action is taken in UTC. Example: "2024-03-22T14:26:55.000Z"

FlagKeystringDefault: ""

The key of the feature flag corresponding to the bandit. Example: "bandit-test-allocation-4"

BanditKeystringDefault: ""

The key (unique identifier) of the bandit. Example: "ad-bandit-1"

SubjectstringDefault: ""

An identifier of the subject or user assigned to the experiment variation. Example: "ed6f85019080"

SubjectNumericAttributesmap[string]float64Default: nil

Metadata about numeric attributes of the subject. Example: {"age": 30}

SubjectCategoricalAttributesmap[string]stringDefault: nil

Metadata about non-numeric attributes of the subject. Example: {"loyalty_tier": "gold"}

ActionstringDefault: ""

The action assigned by the bandit. Example: "promo-20%-off"

ActionNumericAttributesmap[string]float64Default: nil

Metadata about numeric attributes of the assigned action. Example: {"discount": 0.2}

ActionCategoricalAttributesmap[string]stringDefault: nil

Metadata about categorical attributes of the assigned action. Example: {"promoTextColor": "white"}

ActionProbabilityfloat64Default: 0

The weight between 0 and 1 the bandit valued the assigned action. Example: 0.25

ModelVersionstringDefault: ""

Unique identifier for the version of the bandit parameters used. Example: "v123"

OptimalityGapfloat64Default: 0

The optimality gap represents the difference between the probability of the chosen action and the probability of the best possible action according to the bandit model. A lower value indicates the chosen action is closer to optimal. This field is optional and will be omitted if zero.

MetaDatamap[string]stringDefault: {}

A key-value map that allows attaching additional contextual information to bandit events and assignments. This field is required and can be used to store custom metadata about the event such as environment information, client details, or any other relevant tracking data.

Querying the Bandit

To query the bandit for an action, use the GetBanditAction method:

attributes := eppoclient.ContextAttributes{
NumericAttributes: map[string]float64{
"age": float64(user.Age),
},
CategoricalAttributes: map[string]string{
"country": user.Country,
},
}

actions := map[string]eppoclient.ContextAttributes{
"nike": {
NumericAttributes: map[string]float64{
"brand_affinity": 2.3,
},
CategoricalAttributes: map[string]string{
"previously_purchased": "true",
},
},
"adidas": {
NumericAttributes: map[string]float64{
"brand_affinity": 0.2,
},
CategoricalAttributes: map[string]string{
"previously_purchased": "false",
},
},
}

result, err := client.GetBanditAction(
"shoe-bandit",
user.ID,
attributes,
actions,
"control",
)
if err != nil {
log.Printf("Error getting bandit action: %v", err)
return
}

if result.Action != nil {
showShoeAd(result.Action)
} else {
showDefaultAd()
}

Subject Context

The subject context contains contextual information about the subject that is independent of bandit actions. For example, the subject's age or country.

The subject context has type ContextAttributes which has two fields:

  • Numeric (map[string]float64): A map of numeric attributes (such as "age")
  • Categorical (map[string]string): A map of categorical attributes (such as "country")
note

The context attributes are also used for targeting rules for the feature flag similar to how the SDK uses ContextAttributes are used with feature flags.

Action Contexts

Next, supply a map with actions and their attributes: actions: map[string]ContextAttributes. If the user is assigned to the bandit, the bandit selects one of the actions supplied here. All actions supplied are considered to be valid; if an action should not be shown to a user, do not include it in this map.

The action attributes are similar to the subject_attributes but hold action-specific information. You can use ContextAttributes{} to create an empty attribute context.

Note that action contexts can contain two kinds of information:

  • Action-specific context: e.g., the image aspect ratio of the image corresponding to this action
  • User-action interaction context: e.g., there could be a "brand-affinity" model that computes brand affinities of users to brands, and scores of this model can be added to the action context to provide additional context for the bandit.

Result

The result is an instance of BanditResult, which has two fields:

  • Variation (string): The variation that was assigned to the subject
  • Action (*string): The action that was assigned to the subject (nil if no action was assigned)

The variation returns the feature flag variation, this can be the bandit itself, or the "status quo" variation if the user is not assigned to the bandit. If we are unable to generate a variation, for example when the flag is turned off, then the default variation is returned. In both of those cases, the Action is nil, and you should use the status-quo algorithm to select an action.

When Action is not empty, the bandit has selected that action to be shown to the user.

Status Quo Algorithm

In order to accurately measure the performance of the bandit, we need to compare it to the status quo algorithm using an experiment. This status quo algorithm could be a complicated algorithm that selects an action according to a different model, or a simple baseline such as selecting a fixed or random action.

When you create an analysis allocation for the bandit and the Action in BanditResult is empty, implement the desired status quo algorithm based on the Variation value.