In [None]:
%run ../../_pre_run.ipynb

# Analysis of Customer Segments

In [None]:
pb.configure(
    df = df_customers
    , metric = 'customer_unique_id'
    , metric_label = 'Share of Customers'
    , agg_func = 'nunique'
    , norm_by='all'
    , axis_sort_order='descending'    
    , text_auto='.1%'
)
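
The `pb.configure` call sets `agg_func='nunique'` and `norm_by='all'`. The library's internals are not shown in this notebook, so the plain-pandas sketch below only assumes what those settings suggest: each segment's count of unique customers divided by the number of unique customers overall. `customer_share` is a hypothetical helper and the toy data is made up.

```python
import pandas as pd


def customer_share(df: pd.DataFrame, dimension: str,
                   id_col: str = "customer_unique_id") -> pd.Series:
    """Share of unique customers per segment, relative to all unique customers."""
    total = df[id_col].nunique()                        # denominator: every customer
    counts = df.groupby(dimension)[id_col].nunique()    # unique customers per segment
    # Descending order, mirroring axis_sort_order='descending'
    return (counts / total).sort_values(ascending=False)


# Made-up toy data
toy = pd.DataFrame({
    "customer_unique_id": ["a", "b", "c", "d"],
    "value_segment": ["Medium", "Medium", "High", "Low"],
})
shares = customer_share(toy, "value_segment")
print(shares.map("{:.1%}".format))  # formatted like text_auto='.1%'
```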

Save the customer segment dimensions in a separate list.

In [None]:
customers_dim = [
    "activity_segment"
    , "value_segment"
    , "purchase_freq_segment"
    , "repeat_segment"
    , "loyalty_segment"
    , "risk_segment"
    , "weekday_segment"
    , "installment_segment"
    , "products_cnt_segment"
    , "weight_segment"
    , "customer_top_purchase_weekdays"
    , "customer_payment_types"
    , "customer_top_product_categories"
    , "customer_top_general_product_categories"
    , "customer_city"
    , "customer_state"
]

## Distribution of Customers by Segments

Examine how customers are distributed across each segment and compare key metrics between segments.

Select the following key customer metrics:

- total_customer_payment
- avg_total_order_payment
- buys_cnt
- from_first_to_last_days
- customer_avg_reviews_score
- canceled_share
- purchase_weekend_share
- avg_products_cnt
- avg_delivery_delay_days
- avg_order_total_weight_kg

In [None]:
selected_metrics = [
    'total_customer_payment',
    'avg_total_order_payment', 
    'buys_cnt',
    'from_first_to_last_days',
    'customer_avg_reviews_score',
    'canceled_share',
    'purchase_weekend_share',
    'avg_products_cnt',
    'avg_delivery_delay_days',
    'avg_order_total_weight_kg'
]

Assign more readable metric names for the charts.

In [None]:
metric_labels = {
    'total_customer_payment': 'Total Spending',
    'avg_total_order_payment': 'Average Order Value',
    'buys_cnt': 'Number of Purchases',
    'from_first_to_last_days': 'Customer Lifetime',
    'customer_avg_reviews_score': 'Average Rating',
    'canceled_share': 'Order Cancelation Rate',
    'purchase_weekend_share': 'Weekend Purchase Ratio',
    'avg_products_cnt': 'Average Items per Order',
    'avg_delivery_delay_days': 'Avg Delivery Delay',
    'avg_order_total_weight_kg': 'Avg Order Weight'
}

In [None]:
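# base_labels is assumed to be defined in the pre-run notebook; metric_labels entries win on key clashes
labels_for_polar = {**base_labels, **metric_labels}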

**By Activity Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='activity_segment'
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='activity_segment'
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- 3% of all customers made no successful purchases  
- 94% of successful customers made only one purchase  
- 1% in Potential Core segment  
- 1% in Short-Lived Repeat segment  
- Core audience segment is less than 1%  
- Highest metric values in Core segment, followed by Potential Core  
- Median review score is higher for one-time purchasers  
- Core segment has best delivery time performance, One Time has worst  

From here on, we exclude customers with no successful purchases (the 'Never Converted' segment): their metric values would be identical in every breakdown and add nothing new.
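
`segment_table` is not defined in this notebook, so the following is only a rough plain-pandas sketch of such a summary, assuming it reports each segment's customer share plus per-segment medians (the observations compare medians). The denominator is fixed before excluded segments are dropped, so shares stay relative to all customers; the value-segment shares reported below (49% + 24% + 24% = 97%) look consistent with that, but this is an assumption. `segment_summary` and the toy data are hypothetical.

```python
import pandas as pd


def segment_summary(df, dimension, metrics,
                    id_col="customer_unique_id", exclude=()):
    """Customer share plus per-segment medians (hypothetical stand-in for segment_table)."""
    total = df[id_col].nunique()                   # fixed before excluding segments
    data = df[~df[dimension].isin(list(exclude))]  # drop e.g. 'Never Converted'
    grouped = data.groupby(dimension)
    summary = grouped[metrics].median()
    summary.insert(0, "customer_share", grouped[id_col].nunique() / total)
    return summary.sort_values("customer_share", ascending=False)


toy = pd.DataFrame({
    "customer_unique_id": ["a", "b", "c", "d", "e"],
    "activity_segment": ["One Time", "One Time", "Core", "One Time", "Never Converted"],
    "total_customer_payment": [50.0, 70.0, 400.0, 90.0, None],
    "buys_cnt": [1, 1, 5, 1, None],
})
summary = segment_summary(
    toy, "activity_segment", ["total_customer_payment", "buys_cnt"],
    exclude=["Never Converted"],
)
print(summary)
```

Because the excluded customers still count in the denominator, the remaining shares sum to less than 100%.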

**By Purchase Amount Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='value_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='value_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- 49% of customers are in medium payment tier  
- 24% in high payment tier, 24% in low  
- High payment tier spends most (expected)  
- No difference in median review scores across tiers  
- High payment tier has higher median order weight  

**By Purchase Frequency Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='purchase_freq_segment'
    , exclude_segments=['Never Converted', 'Non-Repeating']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='purchase_freq_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Among repeat buyers:  
  - Weekly purchasers: 1% (most common frequency)  
  - Quarterly/Semi-annual buyers show better metrics than other frequencies  

**By Time to Next Purchase Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='repeat_segment'
    , exclude_segments=['Never Converted', 'Non-Repeating']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='repeat_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Among repeat buyers, medium repurchase time segment is smallest (<1%)  
- Fast repurchase segment shows worse metrics than medium/slow segments  

**By Loyalty Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='loyalty_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='loyalty_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Loyalty segments:  
  - Promoters: 58%  
  - Critics: 13% (lowest)  
- Critics have:  
  - Higher total payment and AOV than promoters/neutrals  
  - Shortest time between first/last purchase (rarely return)  
  - Heavier average orders  
  - Worst median delivery time performance  
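
How `loyalty_segment` is built is not shown in this notebook. The profiling section describes Promoters as customers with 4 to 5 star ratings; a minimal sketch assuming that rule, plus a hypothetical cut-off of 2 stars or below for Critics, applied to an average score such as `customer_avg_reviews_score`:

```python
import numpy as np
import pandas as pd


def loyalty_from_score(scores: pd.Series) -> pd.Series:
    """Map average review scores to loyalty labels (hypothetical thresholds)."""
    labels = np.select(
        [scores >= 4, scores <= 2],      # assumed: Promoter >= 4, Critic <= 2
        ["Promoter", "Critic"],
        default="Neutral",               # everything in between
    )
    out = pd.Series(labels, index=scores.index, dtype="object")
    # NaN compares False above, so relabel customers without a rating explicitly
    return out.where(scores.notna(), "No Rating")


scores = pd.Series([5, 4.5, 3, 2, 1, None])   # made-up scores
print(loyalty_from_score(scores).tolist())
```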

**By Risk Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='risk_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='risk_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- 99.5% of customers are "Reliable" (no order cancellations)  
- Cancellation segment has:  
  - Much shorter time between first and last purchase  
  - Higher median total spend and AOV  
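
"Reliable" is described above as having no order cancellations. Assuming the segment is derived from the `canceled_share` metric used in this notebook, a sketch of the labeling and of the resulting share could look like this; the "Has Cancellations" label and the helper name are hypothetical.

```python
import numpy as np
import pandas as pd


def risk_from_cancellations(canceled_share: pd.Series) -> pd.Series:
    """'Reliable' if a customer has no canceled orders (hypothetical rule and labels)."""
    labels = np.where(canceled_share > 0, "Has Cancellations", "Reliable")
    # NaN > 0 is False, so keep customers without orders as missing instead of 'Reliable'
    return pd.Series(labels, index=canceled_share.index).where(canceled_share.notna())


canceled = pd.Series([0.0, 0.0, 0.0, 0.5, None])   # made-up values
risk = risk_from_cancellations(canceled)
reliable_share = (risk.dropna() == "Reliable").mean()
print(risk.tolist(), reliable_share)
```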

**By Day of the Week Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='weekday_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='weekday_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- 75% of customers only purchased on weekdays  
- Weekend purchasers have significantly longer time between first/last purchase  
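
The `purchase_weekend_share` metric and `weekday_segment` are computed upstream and not shown here. A minimal sketch of one way to derive them from order timestamps; the orders table, its `order_purchase_timestamp` column and the three segment labels are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical order-level table; column names are assumptions
orders = pd.DataFrame({
    "customer_unique_id": ["a", "a", "b", "c"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2018-01-01", "2018-01-06", "2018-01-03", "2018-01-07"]
    ),
})

# pandas dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend
orders["is_weekend"] = orders["order_purchase_timestamp"].dt.dayofweek >= 5

# Fraction of each customer's orders placed on a weekend
weekend_share = orders.groupby("customer_unique_id")["is_weekend"].mean()

# Hypothetical segment labels
weekday_segment = pd.Series(
    np.select(
        [weekend_share == 0, weekend_share == 1],
        ["Weekdays Only", "Weekends Only"],
        default="Mixed",
    ),
    index=weekend_share.index,
)
print(weekday_segment)
```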

**By Installment Payment Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='installment_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='installment_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- 50% used installments at least once  
- 47% always paid in full  
- Installment users have significantly higher:  
  - Median total spend  
  - AOV  
  - Order weight  
  - Time between first/last purchase  
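
Several observations describe differences as "significantly higher", but the notebook does not show a statistical test. One way to check a gap in medians between two segments is a permutation test; this is a swapped-in technique, not the notebook's method, sketched with numpy only on made-up values.

```python
import numpy as np


def median_diff_permutation_test(x, y, n_permutations=10_000, seed=0):
    """One-sided permutation test for median(x) - median(y) > 0."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.median(x) - np.median(y)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)   # random relabeling of the two groups
        diff = np.median(shuffled[:n_x]) - np.median(shuffled[n_x:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up spending values for two hypothetical segments
with_installments = [120.0, 180.0, 95.0, 240.0, 160.0]
paid_in_full = [60.0, 85.0, 70.0, 110.0, 55.0]
diff, p = median_diff_permutation_test(with_installments, paid_in_full)
print(f"median difference = {diff:.1f}, one-sided p = {p:.4f}")
```

To apply it here, pass the non-missing `total_customer_payment` values of the two installment segments as `x` and `y`; the exact segment labels are not shown in this notebook.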

**By Average Number of Products per Order Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='products_cnt_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='products_cnt_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- 88% of customers had ≤1 product per order  
- 8% averaged 1-2 products  
- Only 2% averaged >2 products  
- Customers with 2+ products per order have significantly higher:  
  - Median order weight  
  - Total spend  
  - AOV  
  - Time between first/last purchase  
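
The reported ranges (≤1, 1-2, >2 products per order) suggest right-closed bins. A sketch with `pd.cut`, assuming bin edges at 0, 1, 2 and infinity; the notebook's actual edges and labels are not shown:

```python
import numpy as np
import pandas as pd

# Made-up average products per order for five customers
avg_products = pd.Series([1.0, 1.0, 1.5, 2.0, 3.0])

# Assumed right-closed bins: (0, 1], (1, 2] and everything above 2
products_segment = pd.cut(
    avg_products,
    bins=[0, 1, 2, np.inf],
    labels=["<=1", "1-2", ">2"],
)
print(products_segment.value_counts(normalize=True, sort=False))
```

Right-closed bins put a customer averaging exactly 2 products in "1-2", which matches the ">2" wording for the top bin.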

**By Average Weight of Order Segment**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='weight_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='weight_segment'
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Order weight segments:  
  - Light: 39%  
  - Medium: 37%  
  - Heavy: 21%  
- Heavy segment has significantly higher total spend and AOV  
- Light segment has shorter time between first/last purchase  

**By Top Days of the Week**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='customer_top_purchase_weekdays'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='customer_top_purchase_weekdays'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Most customers only purchased on one weekday (expected due to low repeat purchases)  
- Top 3 purchase days: Monday, Tuesday, Wednesday  
- Monday-only buyers have longer time between first/last purchase than other top segments (possibly coincidental)  
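
For multi-valued dimensions such as `customer_top_purchase_weekdays`, the calls pass `max_segments=5`. Whether the library drops the remaining segments or pools them is not shown; the sketch below assumes pooling into a hypothetical "Other" bucket while keeping the most frequent values.

```python
import pandas as pd


def top_n_with_other(values: pd.Series, n: int = 5, other: str = "Other") -> pd.Series:
    """Keep the n most frequent values and pool the rest into `other` (hypothetical behavior)."""
    top = values.value_counts().nlargest(n).index
    return values.where(values.isin(top), other)


weekdays = pd.Series(["Monday", "Monday", "Tuesday", "Tuesday", "Wednesday", "Monday, Friday"])
print(top_n_with_other(weekdays, n=2).value_counts())
```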

**By Top Payment Types**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='customer_payment_types'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id'
    , labels=labels_for_polar
    , text_auto=True
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='customer_payment_types'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Payment methods:  
  - Credit card only: 73%  
  - Boleto only: 19%  
- Voucher-only segment has lower total spend and AOV than other top payment segments  


**By Top Product Categories**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='customer_top_product_categories'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='customer_top_product_categories'
    , max_segments=5
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- The largest segments are customers who bought only from:  
  - Bed Bath Table  
  - Health Beauty  
- Sports goods buyers have longer time between first/last purchase than other category segments  


**By Top General Product Categories**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='customer_top_general_product_categories'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='customer_top_general_product_categories'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Top generalized category segments:  
  - Electronics only: 26%  
  - Furniture only: 17%  
  - Home & Garden only: 14%  

**By Customer State**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='customer_state'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='customer_state'
    , max_segments=5
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Customer distribution by state:  
  - São Paulo: 42%  
  - Rio de Janeiro: 13%  
  - Minas Gerais: 12%  

**By Customer City**

In [None]:
fig = df_customers.analysis.segment_polar(
    metrics=selected_metrics
    , dimension='customer_city'
    , exclude_segments=['Never Converted']
    , max_segments=5
    , count_column='customer_unique_id'
    , labels=labels_for_polar
)
pb.to_slide(fig)

In [None]:
df_customers.analysis.segment_table(
    metrics=selected_metrics
    , dimension='customer_city'
    , max_segments=5
    , exclude_segments=['Never Converted']
    , count_column='customer_unique_id' 
)
fig.show()

**Key Observations:**  

- Customer distribution by city:  
  - São Paulo: 16%  
  - Rio de Janeiro: 7%  

## Customer Profiling

**By Purchase Frequency & Loyalty**

- **One-Time Buyers (94%):**
  - Single purchase only
  - Low engagement (no repeat purchase)
  
- **Potential Core (1%):**
  - Potentially loyal but not yet core
  - Strong metrics (second only to Core)

- **Core (<1%):**
  - Loyalty core: highest spending, best metrics

- **Short-Lived Repeat (1%):**
  - Short-term loyalty 

**Recommendations:**

1. Convert One-Time to Potential Core:
   - Launch loyalty programs
   - Personalized offers based on first purchase
2. Retain Core customers:
   - Premium service tier
   - Exclusive early access to sales

---

**By Payment Amount**

- **High-Spend (24%):**
  - Large orders
  - Heavy items 
  - Higher share of Critics

- **Medium-Spend (49%):**
  - Stable base 
  - Balanced metrics

- **Low-Spend (24%):**
  - Small orders
  - Likely trial purchases

**Recommendations:**
1. For High-Spend:
   - Improve delivery (current avg. 18 days)
   - Dedicated account managers
2. For Low-Spend:
   - Cross-sell bundles (+15% discount)
   - "Complete your set" prompts

---

**By Repurchase Timing**

- **Fast Repeat (<1%):**
  - Quick repurchase 
  - Low satisfaction 

- **Seasonal (1%):**
  - Quarterly/semi-annual purchases
  - High value 

**Recommendations:**
1. For Fast Repeat:
   - Post-purchase follow-ups
   - Satisfaction surveys
2. For Seasonal:
   - Pre-season reminders
   - "Back in stock" alerts

---

**By Loyalty**

- **Promoters (58%):**
  - High ratings (4-5 stars)
  - Low retention (94% one-time)

- **Critics (13%):**
  - High spenders 
  - Fast churn 

**Recommendations:**
1. For Promoters:
   - "Refer a friend" bonuses
   - Repeat purchase incentives
2. For Critics:
   - Logistics improvements
   - VIP complaint resolution

---

**Behavioral Patterns**

- Customers who made at least one weekend purchase (25%):
  - More loyal (longer time between first and last purchase).
- Customers who use installment payments (50%):
  - Higher order amounts and longer customer lifetime: "serious" customers.
- Customers with 2+ products in an order (2%):
  - Key for revenue (high metrics).
- Customers who use only a voucher:
  - Lower total spending and average order value.


**Recommendations:**

- Installment campaigns:
  - "0% interest for 3 months"
- Multi-item incentives:
  - "Free shipping on 3+ items"
- Voucher users:
  - Upsell to credit card payments

---

**By Geography**

- **São Paulo (42%):**
  - Electronics/Furniture focus
  - 18% faster delivery than average

- **Rio de Janeiro (13%):**
  - High Fashion/Beauty demand
  - 22% installment adoption

**Recommendations:**

1. Localized campaigns:
   - "SP Furniture Week" discounts
   - "Rio Beauty Box" bundles
2. Warehouse optimization:
   - Strategic stock placement
   - Regional delivery hubs

## Pairwise Segment Combinations

Examine how customers are distributed across combinations of two segment dimensions.

We exclude non-converted customers (rows where buys_cnt is missing) from this analysis.

In [None]:
pb.configure(
    df = df_customers[df_customers.buys_cnt.notna()]
    , metric = 'customer_unique_id'
    , metric_label = 'Share of Customers'
    , agg_func = 'nunique'
    , norm_by='all'
    , axis_sort_order='descending'    
    , text_auto='.1%'
)

**loyalty_segment and value_segment**

In [None]:
pb.cat_compare(
    cat1='loyalty_segment'
    , cat2 = 'value_segment'
    , visible_graphs = [2, 3]
)

**Key Observations:**

- The medium payment tier dominates across all loyalty segments.
- Promoters are the majority in all payment tiers.
- Critics stand out noticeably in the high payment tier segment.
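
The first two observations read the same table in different directions: "the medium tier dominates across all loyalty segments" is a within-row share, while "Promoters are the majority in all payment tiers" is a within-column share. What `visible_graphs=[2, 3]` shows exactly is not documented here; a plain `pd.crosstab` sketch of both normalizations on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({
    "loyalty_segment": ["Promoter", "Promoter", "Promoter", "Critic", "Neutral", "Promoter"],
    "value_segment":   ["Medium", "Medium", "High", "High", "Medium", "Low"],
})

# Row shares: value-tier mix within each loyalty segment (each row sums to 1)
within_loyalty = pd.crosstab(toy["loyalty_segment"], toy["value_segment"], normalize="index")

# Column shares: loyalty mix within each value tier (each column sums to 1)
within_value = pd.crosstab(toy["loyalty_segment"], toy["value_segment"], normalize="columns")

print(within_loyalty.round(2))
print(within_value.round(2))
```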

**purchase_freq_segment and value_segment**

In [None]:
pb.cat_compare(
    cat1='purchase_freq_segment'
    , cat2 = 'value_segment'
    , visible_graphs = [2, 3]
)

**Key Observations:**

- The low Value segment has a much higher proportion of non-repeating purchasers (expected, since customers with a single small purchase end up with low total spending).
- In the high Value segment, the weekly purchase frequency segment is under-represented: fewer high-value customers buy weekly.

**activity_segment and repeat_segment**

In [None]:
pb.cat_compare(
    cat1='activity_segment'
    , cat2 = 'repeat_segment'
    , visible_graphs = [2, 3]
)

**Key Observations:**  

- The slow repeat segment stands out clearly in the potential core segment, meaning these customers take a long time between repeat purchases. The core segment shows the same pattern, but less pronounced.


**loyalty_segment and risk_segment**

In [None]:
pb.cat_compare(
    cat1='loyalty_segment'
    , cat2 = 'risk_segment'
    , visible_graphs = [2, 3]
)


**customer_top_general_product_categories and value_segment**

In [None]:

pb.cat_compare(
    cat1='customer_top_general_product_categories'
    , cat2 = 'value_segment'
    , trim_top_n_cat1=5
    , visible_graphs = [2, 3]
)

**Key Observations:**

- Electronics dominate purchases in the low Value segment, while the medium Value segment shows noticeably fewer electronics purchases.

**weight_segment and customer_state**

In [None]:
pb.cat_compare(
    cat1='weight_segment'
    , cat2 = 'customer_state' 
    , trim_top_n_cat2=5
    , visible_graphs = [2, 3]
)

**Key Observations:**

- São Paulo has more light-weight orders while Rio de Janeiro has more heavy-weight orders.


**weekday_segment and activity_segment**

In [None]:
pb.cat_compare(
    cat1='weekday_segment'
    , cat2 = 'activity_segment' 
    , visible_graphs = [2, 3]
)

**Key Observations:**

- Core and potential core segments contain a larger share of customers who also shop on weekends, while one-time purchasers predominantly shop on weekdays.


**products_cnt_segment and loyalty_segment**

In [None]:
pb.cat_compare(
    cat1='products_cnt_segment'
    , cat2 = 'loyalty_segment' 
    , visible_graphs = [2, 3]
)

**Key Observations:**

- Single-product orders dominate among promoters, while critics tend to have more 2+ product orders.

**installment_segment and repeat_segment**

In [None]:
pb.cat_compare(
    cat1='installment_segment'
    , cat2 = 'repeat_segment' 
    , visible_graphs = [2, 3]
)

**Key Observations:**

- The installment segment contains more customers with longer periods between repeat purchases.
- The non-installment segment shows:
  - Lower proportion of long repeat purchase cycles
  - Dominance of one-time purchasers

In [None]:
%run ../../_post_run.ipynb