Skip to main content

Analyzing anonymous user experiments

In the field of A/B testing, it is common to assign unauthenticated users to experiments using anonymous identifiers such as Cookie IDs. However, it is necessary to link these assignments to user-level events in order to calculate metrics for these unauthenticated users. In some cases, certain events may have an associated User ID, while others may not, due to users transitioning between authenticated and unauthenticated states.

For instance, consider a typical E-commerce company that maintains a table of events tracking user clicks and their progress through the purchase funnel (e.g., add-to-cart, checkout, and purchase events). Prior to making their first purchase and authenticating, any events generated by a user will not be associated with a User ID. Once the user authenticates, their events can be linked to a User ID as long as they remain logged in. However, if the user logs out, subsequent events will not be associated with a User ID until they authenticate again. This situation can result in time periods where important events are not linked to a User ID.

To address this issue, attribution models can be developed within the transformation layer of the data warehouse. Although these models may vary, a common approach is to assume that the association between a cookie and a specific user remains valid until proven otherwise. Specifically, once an association between an Anonymous ID and a User ID is established using data collected post-authentication, all events prior to that authentication moment can be assumed to have been performed by the same user. This relationship remains valid until the Anonymous ID is associated with a new User ID in the data. At that point, the observed relationship between the Anonymous ID and User ID can be assumed until another relationship is identified.

By performing a small amount of data transformation within the data warehouse and a few minutes of setup within Eppo, you can analyze these types of anonymized user experiments effectively in Eppo.

Warehouse Setup

To build an anonymous visitor-to-user attribution model, begin with a table similar to the one described above. From this table, build a model that identifies the minimum and maximum time in which a User ID was associated with a specific Anonymous ID. A template query to do this can be viewed below.

anonymous_id_lag AS (
, anonymous_id
, LAG(anonymous_id) OVER (PARTITION BY user_id ORDER BY ts) as last_anonymous_id
, LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) as last_ts
, LEAD(ts) OVER (PARTITION BY user_id ORDER BY ts) as next_ts
FROM event_table

, anonymous_id_switch AS (
, SUM(IF(last_anonymous_id != anonymous_id, 1, 0)) OVER (
) AS cumulative_switch
FROM anonymous_id_lag

, user_login_windows_collapsed as (
, user_id
, cumulative_switch
, LOGICAL_OR(last_ts IS NULL) as is_first
, LOGICAL_OR(next_ts IS NULL) as is_last
, MIN(ts) as ts_min
, MAX(next_ts) as ts_max
FROM anonymous_id_switch
GROUP BY 1,2,3

, user_id
, IF(is_first, TIMESTAMP("0001-01-01 00:00:00"), ts_min) as ts_start_window
, IF(is_last, TIMESTAMP("9999-12-31 23:59:59"), ts_max) as ts_end_window
FROM user_login_windows_collapsed
ORDER BY user_id, ts_min;

For the first identified relationship between an Anonymous ID and a User ID, a timestamp infinitely far into the past is used for the ts_start_window in order to provide an inferred User ID for events prior to a user’s first moment of authentication. Similarly, for the last identified relationship between an Anonymous ID and a User ID, a timestamp far into the future is used for the ts_end_window column to ensure that any events created in an unauthenticated state will have an inferred User ID. This association will be used for all unauthenticated events until a new relationship for any given Anonymous ID and User ID is identified. At this point, a new ts_start_window is defined for the given Anonymous ID.

Also, because most facts are tied to User IDs rather than Anonymous IDs (the events underlying most metrics are tied to logged-in states), it is recommended to verify that the mapping table has no overlapping intervals. An overlapping interval means that a particular User ID is tied to multiple Anonymous IDs at the same time. In that case, an event completed by a user at that time would be attributed to multiple Anonymous IDs, leading to double-counting. To verify that there are no overlapping intervals, the following query can be used. It should return no rows.

FROM anon_visitor_to_user_mapping t1
LEFT JOIN anon_visitor_to_user_mapping t2
ON t1.user_id = t2.user_id
-- Checking to see if there's an overlapping attribution window for the SAME user_id with a different anonymous id
-- If an overlap exists, there is the possibility of double counting fact events
AND t1.anonymous_id <> t2.anonymous_id
AND t1.ts_end_window > t2.ts_start_window
AND t2.ts_end_window > t1.ts_start_window
WHERE t2.user_id IS NOT NULL

Once this model is built, it can be joined to any fact table within the data warehouse. It should be joined onto these fact tables by User ID wherever a fact event’s timestamp is between a given user’s ts_start_window and ts_end_window. By doing this, all fact tables at the user level can now have an inferred Anonymous ID. This inferred Anonymous ID can then be used by Eppo to link Assignment SQL definitions at the Anonymous ID level to these fact tables.

, mapping.anonymous_id
, facts.user_id
FROM fact_table as facts
LEFT JOIN anon_visitor_to_user_mapping AS mapping
ON facts.user_id = mapping.user_id
AND facts.ts BETWEEN
mapping.ts_start_window AND

Eppo Setup

Within Eppo, first create an ‘anonymous’ entity. Feel free to give it whatever name makes the most sense for the organization. This entity will be used to organize both the Assignment SQL definition and the Fact SQL together.

Creating Anonymous Experiments

Once this entity is created, create an Assignment SQL definition and link it to the anonymous entity. Be sure to use the anonymous identifier (such as cookie id) for the Experiment Subjects column. This anonymous identifier will be used by Eppo to link your Assignment SQL definition to your Fact SQL definition.

Creating Anonymous SQL Assignments

Finally, create a Fact SQL definition that utilizes the anonymous visitor-to-user model described previously to infer an anonymous idea for the user-level fact table. Be sure to specify the anonymous entity as the Entity and use the anonymous identifier from the anonymous visitor-to-user model for the Entity ID column.

Creating Fact SQL Definition

With this setup, all metrics derived from this fact will successfully link back to assignments with Anonymous IDs. Follow the same pattern described above for other facts associated with metrics that need to be added to the metric repository.