Measuring Cross-Product Adoption Using dbt_set_similarity

Enhancing cross-product insights within dbt workflowsIntroductionFor multi-product companies, one critical metric is often what is called “cross-product adoption”. (i.e. understanding how users engage with multiple offerings in a given product portfolio)One measure suggested to calculate cross-product or cross-feature usage in the popular book Hacking Growth [1] is the Jaccard Index. Traditionally used to measure the similarity between two sets, the Jaccard Index can also serve as a powerful tool for assessing product adoption patterns. It does this by quantifying the overlap in users between products, from which one can identify cross-product synergies and growth opportunities.The dbt package dbt_set_similarity is designed to simplify the calculation of set similarity metrics directly within an analytics workflow. This package provides a method to calculate the Jaccard Indices within SQL transformational layers.To import this package into your dbt project, add the following to the packages.yml file. We will also need dbt_utils for the purposes of this articles example. Run a dbt deps command within your project to install the package.packages: - package: Matts52/dbt_set_similarity version: 0.1.1 - package: dbt-labs/dbt_utils version: 1.3.0The Jaccard IndexThe Jaccard Index, also known as the Jaccard Similarity Coefficient, is a metric used to measure the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of their union.Mathematically, it can be expressed as:The Jaccard Index represents the “Intersection” over the “Union” of two sets (image by author)Where:A and B are two sets (ex. users of product A and product B)The numerator represents the number of elements in both setsThe denominator represents the total number of distinct elements across both sets(image by author)The Jaccard Index is particularly useful in the context of cross-product adoption because:It focuses on the overlap between two sets, making it ideal for understanding shared user basesIt accounts for differences in the total size of the sets, ensuring that results are proportional and not skewed by outliersFor example:If 100 users adopt Product A and 50 adopt Product B, with 25 users adopting both, the Jaccard Index is 25 / (100 + 50 — 25) = 0.2, indicating a 20% overlap between the two user bases by the Jaccard Index.Example DataThe example dataset we will be using is a fictional SaaS company which offers storage space as a product for consumers. This company provides two distinct storage products: document storage (doc_storage) and photo storage (photo_storage). These are either true, indicating the product has been adopted, or false, indicating the product has not been adopted.Additionally, the demographics (user_category) that this company serves are either tech enthusiasts or homeowners.https://medium.com/media/c3590a4a271c52eb25d51ae884025421/hrefFor the sake of this example, we will read this csv file in as a “seed” model named seed_example within the dbt project.Simple Cross-Product AdoptionNow, let’s say we want to calculate the jaccard index (cross-adoption) between our document storage and photo storage products. First, we need to create an array (list) of the users who have the document storage product, alongside an array of the users who have the photo storage product. In the second cte, we apply the jaccard_coef function from the dbt_set_similarity package to help us easily compute the jaccard coefficient between the two arrays of user id’s.with product_users as ( select array_agg(user_id) filter (where doc_storage = true) as doc_storage_users, array_agg(user_id) filter (where photo_storage = true) as photo_storage_users from {{ ref('seed_example') }})select doc_storage_users, photo_storage_users, {{ dbt_set_similarity.jaccard_coef( 'doc_storage_users', 'photo_storage_users' ) }} as cross_product_jaccard_coeffrom product_usersOutput from the above dbt model (image by author)As we can interpret, it seems that just over half (60%) of users who have adopted either of products, have adopted both. We can graphically verify our result by placing the user id sets into a Venn diagram, where we see three users have adopted both products, amongst five total users: 3/5 = 0.6.What the collection of user id’s and product adoption would look like, verifying our result (image by author)Segmented Cross-Product AdoptionUsing the dbt_set_similarity package, creating segmented jaccard indices for our different user categories should be fairly natural. We will follow the same pattern as before, however, we will simply group our aggregations on the user category that a user belongs to.with product_users as ( select user_category, array_agg(user_id) filter (where doc_storage = true) as doc_storage_users, array_agg(user_id) filter (where photo_storage = true)

Measuring Cross-Product Adoption Using dbt_set_similarity

Enhancing cross-product insights within dbt workflows

Introduction

For multi-product companies, one critical metric is often what is called “cross-product adoption”. (i.e. understanding how users engage with multiple offerings in a given product portfolio)

One measure suggested to calculate cross-product or cross-feature usage in the popular book Hacking Growth [1] is the Jaccard Index. Traditionally used to measure the similarity between two sets, the Jaccard Index can also serve as a powerful tool for assessing product adoption patterns. It does this by quantifying the overlap in users between products, from which one can identify cross-product synergies and growth opportunities.

The dbt package dbt_set_similarity is designed to simplify the calculation of set similarity metrics directly within an analytics workflow. This package provides a method to calculate the Jaccard Indices within SQL transformational layers.

To import this package into your dbt project, add the following to the packages.yml file. We will also need dbt_utils for the purposes of this articles example. Run a dbt deps command within your project to install the package.

packages:
- package: Matts52/dbt_set_similarity
version: 0.1.1
- package: dbt-labs/dbt_utils
version: 1.3.0

The Jaccard Index

The Jaccard Index, also known as the Jaccard Similarity Coefficient, is a metric used to measure the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of their union.

Mathematically, it can be expressed as:

The Jaccard Index represents the “Intersection” over the “Union” of two sets (image by author)

Where:

  • A and B are two sets (ex. users of product A and product B)
  • The numerator represents the number of elements in both sets
  • The denominator represents the total number of distinct elements across both sets
(image by author)

The Jaccard Index is particularly useful in the context of cross-product adoption because:

  • It focuses on the overlap between two sets, making it ideal for understanding shared user bases
  • It accounts for differences in the total size of the sets, ensuring that results are proportional and not skewed by outliers

For example:

  • If 100 users adopt Product A and 50 adopt Product B, with 25 users adopting both, the Jaccard Index is 25 / (100 + 50 — 25) = 0.2, indicating a 20% overlap between the two user bases by the Jaccard Index.

Example Data

The example dataset we will be using is a fictional SaaS company which offers storage space as a product for consumers. This company provides two distinct storage products: document storage (doc_storage) and photo storage (photo_storage). These are either true, indicating the product has been adopted, or false, indicating the product has not been adopted.

Additionally, the demographics (user_category) that this company serves are either tech enthusiasts or homeowners.https://medium.com/media/c3590a4a271c52eb25d51ae884025421/href

For the sake of this example, we will read this csv file in as a “seed” model named seed_example within the dbt project.

Simple Cross-Product Adoption

Now, let’s say we want to calculate the jaccard index (cross-adoption) between our document storage and photo storage products. First, we need to create an array (list) of the users who have the document storage product, alongside an array of the users who have the photo storage product. In the second cte, we apply the jaccard_coef function from the dbt_set_similarity package to help us easily compute the jaccard coefficient between the two arrays of user id’s.

with product_users as (
select
array_agg(user_id) filter (where doc_storage = true)
as doc_storage_users,
array_agg(user_id) filter (where photo_storage = true)
as photo_storage_users
from {{ ref('seed_example') }}
)

select
doc_storage_users,
photo_storage_users,
{{
dbt_set_similarity.jaccard_coef(
'doc_storage_users',
'photo_storage_users'
)
}} as cross_product_jaccard_coef
from product_users
Output from the above dbt model (image by author)

As we can interpret, it seems that just over half (60%) of users who have adopted either of products, have adopted both. We can graphically verify our result by placing the user id sets into a Venn diagram, where we see three users have adopted both products, amongst five total users: 3/5 = 0.6.

What the collection of user id’s and product adoption would look like, verifying our result (image by author)

Segmented Cross-Product Adoption

Using the dbt_set_similarity package, creating segmented jaccard indices for our different user categories should be fairly natural. We will follow the same pattern as before, however, we will simply group our aggregations on the user category that a user belongs to.

with product_users as (
select
user_category,
array_agg(user_id) filter (where doc_storage = true)
as doc_storage_users,
array_agg(user_id) filter (where photo_storage = true)
as photo_storage_users
from {{ ref('seed_example') }}
group by user_category
)

select
user_category,
doc_storage_users,
photo_storage_users,
{{
dbt_set_similarity.jaccard_coef(
'doc_storage_users',
'photo_storage_users'
)
}} as cross_product_jaccard_coef
from product_users
Output from the above dbt model (image by author)

We can see from the data that amongst homeowners, cross-product adoption is higher, when considering jaccard indices. As shown in the output, all homeowners who have adopted one of the product, have adopted both. Meanwhile, only one-third of the tech enthusiasts who have adopted one product have adopted both of the products. Thus, in our very small dataset, cross-product adoption is higher amongst homeowners as opposed to tech enthusiasts.

We can graphically verify the output by again creating Venn diagram:

Venn diagrams split by the two segments (image by author)

Conclusion

dbt_set_similarity provides a straightforward and efficient way to calculate cross-product adoption metrics such as the Jaccard Index directly within a dbt workflow. By applying this method, multi-product companies can gain valuable insights into user behavior and adoption patterns across their product portfolio. In our example, we demonstrated the calculation of overall cross-product adoption as well as segmented adoption for distinct user categories.

Using the package for cross-product adoption is simply one straightforward application. In reality, there exists countless other potential applications of this technique, for example some areas are:

  • Feature usage analysis
  • Marketing campaign impact analysis
  • Support analysis

Additionally, this style of analysis is certainly not limited to just SaaS, but can apply to virtually any industry. Happy Jaccard-ing!

References

[1] Sean Ellis and Morgan Brown, Hacking Growth (2017)

Resources

dbt package hub


Measuring Cross-Product Adoption Using dbt_set_similarity was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.