Cohort Analysis

Cohort Analysis#

We will analyze different metrics by monthly cohorts.

We will choose a period of 30 days for lifetime, meaning each 30 days, the cohort’s lifetime will increase by 1.

Number of Sales

fig = df_sales.analysis.cohort(
    mode='sales'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
)
pb.to_slide(fig)
fig.show()
../../_images/0bbb2f187602b21cbd31abd7811a563efe32a6c178daf641dc7d05db76823ee5.jpg

Key Observations:

  • Most purchases occur in the cohort’s first month

  • Stabilization after first month without sharp decline

Revenue

Since Olist is a marketplace, it is reasonable to calculate revenue by the total amount of sold products. This is because the revenue for marketplaces typically comes from commissions on sellers.

fig = df_sales.analysis.cohort(
    mode='revenue'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
    , revenue_col='total_products_price'
    , text_auto='.2s'
)
pb.to_slide(fig)
fig.show()
../../_images/289a13d1902763c339654a1223fa279e509ded05cf9202f5b96451b2109325ef.jpg

Key Observations:

  • Revenue follows similar pattern to sales volume

  • Majority of cohort revenue generated in first month

  • No sharp lifetime decline observed

Average Order Value

For calculating the average order value, it is reasonable to take the total payment amount per order. This is because the average order value should reflect the total cost of the order.

fig = df_sales.analysis.cohort(
    mode='aov'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
    , revenue_col='total_payment'
)
pb.to_slide(fig)
fig.show()
../../_images/3a68f860c94be4895dec934ab2a5c68485f9dcd5295548710b679c96bcf53e6f.jpg

Key Observations:

  • No post-first-month decline in average order value (relative metric)

  • No clear growth/decline trend across cohort lifetimes

  • Some peak values in isolated periods

  • Median AOV shows slight decline after 11 months

  • Anomalously high AOV:

    • March 2017 cohort (month 5)

    • April 2017 cohort (month 15)

Number of Buyers

fig = df_sales.analysis.cohort(
    mode='buyers'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
)
pb.to_slide(fig)
fig.show()
../../_images/774876028eb502b69d75ca774f2ab413ccce5306954166679b689e6f32b1e3b7.jpg

Key Observations:

  • Customer count pattern mirrors revenue/sales

  • Nearly all customers don’t return after first month

Retention

fig = df_sales.analysis.cohort(
    mode='retention'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
    , include_period0=False
)
pb.to_slide(fig) 
fig.show()
../../_images/cb13ebb49d2c5b5a3faa59e52a2bf185b3963dc010abe34f117233b41d85a09e.jpg

Key Observations:

  • Extremely low 1st+ month retention

  • Olist shows very poor customer retention

Let’s look at the median retention of cohorts by periods.

fig = df_sales.analysis.cohort(
    mode='retention'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
    , display_mode='summary'
)
pb.to_slide(fig) 
fig.show()
../../_images/3a9ff7d288a3ee5e30ab9ce4dd002186dc6bbc4d3a62aaa896281273664f8ae4.jpg

Key Observations:

  • Clear pattern: minimal customers return for repeat purchases

Average Payment Count

fig = df_sales.analysis.cohort(
    mode='apc'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
) 
pb.to_slide(fig) 
fig.show()
../../_images/58275b853b65cd3155c54c4742c1199ca9817cf45a9a643d375613b5d0db26fd.jpg

Key Observations:

  • Customers typically make 1-1.3 purchases per period

ARPPU

For ARPPU, it is reasonable to take the total cost of products in the order, as we need the revenue per customer.

fig = df_sales.analysis.cohort(
    mode='arppu'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
    , revenue_col='total_products_price'
) 
pb.to_slide(fig) 
fig.show()
../../_images/4ec9b64dd25d1fa2eb1f6141f4e8b41fea972e7522f03bae80cd6a8302d63170.jpg

Key Observations:

  • Pattern similar to average order value

  • ARPPU resembles AOV since most customers make single purchases

LTV (Revenue-Based)

Since we do not have data on expenses or the margin coefficient, we cannot calculate the actual LTV.

We will calculate LTV (Revenue-Based), assuming a margin coefficient of 1. This will serve as a proxy for LTV.

fig = df_sales.analysis.cohort(
    mode='ltv'
    , user_id_col='customer_unique_id'
    , date_col='order_purchase_dt'
    , order_id_col='order_id'
    , margin=1
) 
pb.to_slide(fig) 
fig.show()
../../_images/4f4629ed28b1397872786d6b61a289869bf323183f22abff19e91b9704165ab2.jpg

Key Observations:

  • Revenue-Based LTV remains stable across lifetime periods

  • Expected pattern since most purchases occur in first month

  • LTV reflects cumulative lifetime performance