Skip to main content

Analyzing anonymous user experiments

In the field of A/B testing, it is common to assign unauthenticated users to experiments using anonymous identifiers such as Cookie IDs. However, it is necessary to link these assignments to user-level events in order to calculate metrics for these unauthenticated users. In some cases, certain events may have an associated User ID, while others may not, due to users transitioning between authenticated and unauthenticated states.

For instance, consider a typical E-commerce company that maintains a table of events tracking user clicks and their progress through the purchase funnel (e.g., add-to-cart, checkout, and purchase events). Prior to making their first purchase and authenticating, any events generated by a user will not be associated with a User ID. Once the user authenticates, their events can be linked to a User ID as long as they remain logged in. However, if the user logs out, subsequent events will not be associated with a User ID until they authenticate again. This situation can result in time periods where important events are not linked to a User ID.

To address this issue, attribution models can be developed within the transformation layer of the data warehouse. Although these models may vary, a common approach is to assume that the association between a cookie and a specific user remains valid until proven otherwise. Specifically, once an association between an Anonymous ID and a User ID is established using data collected post-authentication, all events prior to that authentication moment can be assumed to have been performed by the same user. This relationship remains valid until the Anonymous ID is associated with a new User ID in the data. At that point, the observed relationship between the Anonymous ID and User ID can be assumed until another relationship is identified.

By performing a small amount of data transformation within the data warehouse and a few minutes of setup within Eppo, you can analyze these types of anonymized user experiments effectively in Eppo.

Warehouse Setup

To build an anonymous visitor-to-user attribution model, begin with a table similar to the one described above. From this table, build a model that identifies the minimum and maximum time in which an Anonymous ID was associated with a specific User ID. A template query to do this can be viewed below.

WITH
users_lag AS (
SELECT
user_id
, anonymous_id
, LAG(user_id) OVER (PARTITION BY anonymous_id ORDER BY ts) as last_user_id
, LAG(ts) OVER (PARTITION BY anonymous_id ORDER BY ts) as last_ts
, LEAD(ts) OVER (PARTITION BY anonymous_id ORDER BY ts) as next_ts
FROM event_table
)

, user_switch AS (
SELECT
*
, SUM(IF(last_user_id != user_id, 1, 0)) OVER (
PARTITION BY anonymous_id
ORDER BY ts
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_switch
FROM users_lag
)

, user_login_windows_collapsed as (
SELECT
anonymous_id
, user_id
, cumulative_switch
, LOGICAL_OR(last_ts IS NULL) as is_first
, LOGICAL_OR(next_ts IS NULL) as is_last
, MIN(ts) as ts_min
, MAX(next_ts) as ts_max
FROM user_switch
GROUP BY 1,2,3
)

SELECT
anonymous_id
, user_id
, IF(is_first, TIMESTAMP("0001-01-01 00:00:00"), ts_min) as ts_start_window
, IF(is_last, TIMESTAMP("9999-12-31 23:59:59"), ts_max) as ts_end_window
FROM user_login_windows_collapsed
ORDER BY anonymous_id, ts_min;

For the first identified relationship between an Anonymous ID and a User ID, a timestamp infinitely far into the past is used for the ts_start_window in order to provide an inferred User ID for events prior to a user’s first moment of authentication. Similarly, for the last identified relationship between an Anonymous ID and a User ID, a timestamp far into the future is used for the ts_end_window column to ensure that any events created in an unauthenticated state will have an inferred User ID. This association will be used for all unauthenticated events until a new relationship for any given Anonymous ID and User ID is identified. At this point, a new ts_start_window is defined for the given Anonymous ID.

Once this model is built, it can be joined to any fact table within the data warehouse. It should be joined onto these fact tables by User ID wherever a fact event’s timestamp is between a given user’s ts_start_window and ts_end_window. By doing this, all fact tables at the user level can now have an inferred Anonymous ID. This inferred Anonymous ID can then be used by Eppo to link Assignment SQL definitions at the Anonymous ID level to these fact tables.

SELECT
facts.ts
, mapping.anonymous_id
, facts.user_id
FROM fact_table as facts
LEFT JOIN anon_visitor_to_user_mapping AS mapping
ON facts.user_id = mapping.user_id
AND facts.ts BETWEEN
mapping.ts_start_window AND
mapping.ts_end_window

Eppo Setup

Within Eppo, first create an ‘anonymous’ entity. Feel free to give it whatever name makes the most sense for the organization. This entity will be used to organize both the Assignment SQL definition and the Fact SQL together.

Creating Anonymous Experiments

Once this entity is created, create an Assignment SQL definition and link it to the anonymous entity. Be sure to use the anonymous identifier (such as cookie id) for the Experiment Subjects column. This anonymous identifier will be used by Eppo to link your Assignment SQL definition to your Fact SQL definition.

Creating Anonymous SQL Assignments

Finally, create a Fact SQL definition that utilizes the anonymous visitor-to-user model described previously to infer an anonymous idea for the user-level fact table. Be sure to specify the anonymous entity as the Entity and use the anonymous identifier from the anonymous visitor-to-user model for the Entity ID column.

Creating Fact SQL Definition

With this setup, all metrics derived from this fact will successfully link back to assignments with Anonymous IDs. Follow the same pattern described above for other facts associated with metrics that need to be added to the metric repository.