In [None]:
%run ../../_pre_run.ipynb

# Seller Analysis

Let’s create a helper function that displays the top 10 sellers by a given metric.

In [None]:
def seller_top(metric: str, show_cnt: bool = True, ascending: bool = False):
    """Show Top Sellers by Metric"""
    cols = ['seller_id', metric]
    if show_cnt:
        if metric == 'products_cnt':
            cols += ['orders_cnt']
        else:
            cols += ['products_cnt', 'orders_cnt']
    display(
        df_sellers[cols]
        .sort_values(metric, ascending=ascending)
        .set_index('seller_id')
        .head(10)
    )
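
The helper relies on `df_sellers` and `display`, which come from the notebook environment. For readers without that setup, here is a minimal sketch of the same sort-and-head pattern on a tiny made-up table. The function name `top_by_metric`, the parameter `n`, and all values are hypothetical and only illustrate the ranking logic.

```python
import pandas as pd

# Hypothetical toy data; the real df_sellers is prepared outside this section.
toy_sellers = pd.DataFrame({
    'seller_id': ['a', 'b', 'c', 'd'],
    'products_cnt': [12, 40, 7, 25],
    'orders_cnt': [10, 31, 7, 20],
})

def top_by_metric(df: pd.DataFrame, metric: str, n: int = 10,
                  ascending: bool = False) -> pd.DataFrame:
    """Return the first n rows ranked by metric, indexed by seller_id."""
    cols = ['seller_id', metric]
    # Show the order count next to the metric unless it is the metric itself.
    if metric != 'orders_cnt':
        cols.append('orders_cnt')
    return (
        df[cols]
        .sort_values(metric, ascending=ascending)
        .set_index('seller_id')
        .head(n)
    )

top2 = top_by_metric(toy_sellers, 'products_cnt', n=2)
print(top2)
```

Returning the frame instead of calling `display` makes the result easy to inspect or reuse outside a notebook.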

## Number of Products

Let’s identify the top sellers.

In [None]:
seller_top('products_cnt')

**Key Observations:**  

- Seller "6560211a19b47992c3666cc44a7e94c0" sold the most products  

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['products_cnt'].explore.info(
    labels=dict(products_cnt='Number of Products')
    , title='Distribution of Number of Products Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- 75% of sellers sold ≤26 products total  
- Top 5% sold >150 products  
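
Statements of the form "75% of sellers sold ≤X" and "Top 5% sold >Y" correspond to the 75th and 95th percentiles of the metric. As a rough illustration of how such thresholds can be read off with plain pandas, here is a sketch on made-up values. The numbers are hypothetical and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical per-seller product counts with one heavy outlier.
products_cnt = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 200])

# 75th percentile: three quarters of sellers are at or below this value.
q75 = products_cnt.quantile(0.75)
# 95th percentile: the top 5% of sellers are at or above this value.
q95 = products_cnt.quantile(0.95)

print(f"75% of sellers: <= {q75}")
print(f"Top 5% of sellers: >= {q95}")
```

Because of the outlier, the mean of this toy series is far above its 75th percentile, which is why quantiles are the more informative summary for skewed metrics like these.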

## Number of Unique Products

Let’s identify the top sellers.

In [None]:
seller_top('unique_products_cnt')

**Key Observations:**  

- Seller "4a3ca9315b744ce9f8e9374361493884" sold the most unique products  

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['unique_products_cnt'].explore.info(
    labels=dict(unique_products_cnt='Number of Unique Products')
    , title='Distribution of Number of Unique Products Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- 75% of sellers sold ≤10 unique products  
- Top 5% sold >45 unique products  

## Number of Orders

Let’s identify the top sellers.

In [None]:
seller_top('orders_cnt', show_cnt=False)

**Key Observations:**  

- Seller "6560211a19b47992c3666cc44a7e94c0" participated in the most orders  

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['orders_cnt'].explore.info(
    labels=dict(orders_cnt='Number of Orders')
    , title='Distribution of Number of Orders Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- 75% of sellers participated in ≤22 orders  
- Top 5% participated in ≥130 orders  

## Total Sales Revenue

Let’s identify the top sellers.

In [None]:
seller_top('revenue')

**Key Observations:**  

- Seller "4869f7a5dfa277a7dca6462dcf3b52b2" generated the most revenue  

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['revenue'].explore.info(
    labels=dict(revenue='Seller Revenue')
    , title='Distribution of Seller Revenue'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- 75% of sellers made ≤3,500 R$  
- Top 5% made ≥17,000 R$  

## Number of Products per Order

Let’s identify the top sellers.

In [None]:
seller_top('avg_prouducts_cnt')

**Key Observations:**  

- Seller "0b36063d5818f81ccb94b54adfaebbf5" has the highest average number of products per order (based on a single order)  
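
Since the top of this ranking is held by a seller with a single order, a per-order average can be dominated by one unusual order. One possible way to check this is to rank only sellers that have some minimum number of orders. The sketch below uses made-up data. The cutoff `min_orders` is a hypothetical choice and is not part of the original analysis.

```python
import pandas as pd

# Hypothetical toy data; column names mirror those used in this notebook.
toy = pd.DataFrame({
    'seller_id': ['a', 'b', 'c', 'd'],
    'avg_prouducts_cnt': [15.0, 1.2, 2.5, 1.0],
    'orders_cnt': [1, 120, 35, 4],
})

min_orders = 5  # hypothetical cutoff, chosen only for illustration

# Keep sellers with enough orders, then rank by the per-order average.
ranked = (
    toy[toy['orders_cnt'] >= min_orders]
    .sort_values('avg_prouducts_cnt', ascending=False)
    .set_index('seller_id')
)
print(ranked)
```

In this toy table, seller "a" would lead the unfiltered ranking on the strength of one order, but drops out once the cutoff is applied.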

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['avg_prouducts_cnt'].explore.info(
    labels=dict(avg_prouducts_cnt='Average Number of Products in Order')
    , title='Distribution of Average Number of Products in Order per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- 75% of sellers average ≤1.14 products per order  
- Top 1% average ≥3 products  

## Total Product Value per Order

Let’s identify the top sellers.

In [None]:
seller_top('avg_order_total_price')

**Key Observations:**  

- Sellers "e3b4998c7a498169dc7bce44e6bb6277" and "80ceebb4ee9b31afb6c6a916a574a1e2" had the highest average order values (based on a single order each)  

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['avg_order_total_price'].explore.info(
    labels=dict(avg_order_total_price='Average Total Product Value per Order, R$')
    , title='Distribution of Average Total Product Value per Order Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- 75% of sellers average ≤189 R$ per order  
- Top 5% average ≥641 R$  

## Product Price per Order

Let’s identify the top sellers.

In [None]:
seller_top('avg_product_price')

**Key Observations:**  

- Seller "e3b4998c7a498169dc7bce44e6bb6277" has the highest average product price (based on a single order)  

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['avg_product_price'].explore.info(
    labels=dict(avg_product_price='Average Product Price in Order, R$')
    , title='Distribution of Average Product Price in Order Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- 75% of sellers average ≤174 R$ per product  
- Top 5% average ≥595 R$  

## Product Weight

Let’s identify the top sellers.

In [None]:
seller_top('avg_product_weight_kg')

**Key Observations:**  

- Maximum average product weight: 30kg  

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['avg_product_weight_kg'].explore.info(
    labels=dict(avg_product_weight_kg='Average Weight of Products, kg')
    , title='Distribution of Average Weight of Products Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- 75% of sellers average ≤2.7kg  
- Top 5% average ≥11kg  

## Carrier Handoff Delay

Let’s identify the top sellers.

In [None]:
seller_top('avg_carrier_delivery_delay_days')

**Key Observations:**  

- Seller "586a871d4f1221763fddb6ceefdeb95e" had the maximum average carrier handoff delay: 45 days  

Let’s look at the statistics and distribution of the metric.

In [None]:
df_sellers['avg_carrier_delivery_delay_days'].explore.info(
    labels=dict(avg_carrier_delivery_delay_days='Average Carrier Delivery Delay Time, days')
    , title='Distribution of Average Carrier Delivery Delay Time Per Seller'
    , lower_quantile=0.05
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

**Key Observations:**  

- Top 5% of sellers handed orders to the carrier ≥6.5 days early on average  
- 75% handed orders over ≥2 days early  
- Bottom 5% were late by ≥1 day  

In [None]:
%run ../../_post_run.ipynb