Seller Analysis#

Let’s create a helper function.

def seller_top(metric: str, show_cnt: bool=True, ascending=False):
    """Show Top Customers by Metric"""
    cols = ['seller_id', metric]
    if show_cnt:
        if metric == 'products_cnt':
            cols += ['orders_cnt']
        else:
            cols += ['products_cnt', 'orders_cnt']
    display(
        df_sellers[cols]
        .sort_values(metric, ascending=ascending)
        .set_index('seller_id')
        .head(10)
    )

Number of Products#

Let’s identify the top sellers.

seller_top('products_cnt')
products_cnt orders_cnt
seller_id
6560211a19b47992c3666cc44a7e94c0 2,003.00 1,826.00
4a3ca9315b744ce9f8e9374361493884 1,952.00 1,775.00
1f50f920176fa81dab994f9023523100 1,926.00 1,399.00
cc419e0650a3c5ba77189a1882b7556a 1,719.00 1,651.00
da8622b14eb17ae2831f4ac5b9dab84a 1,548.00 1,311.00
955fee9216a65b617aa5c0531780ce60 1,475.00 1,264.00
1025f0e2d44d7041d6cf58b6550e0bfa 1,420.00 910.00
7c67e1448b00f6e969d365cea6b010ab 1,356.00 974.00
ea8482cd71df3c1969d7b9473ff13abc 1,188.00 1,132.00
7a67c85e85bb2ce8582c35f2203ad736 1,156.00 1,146.00

Key Observations:

  • Seller “6560211a19b47992c3666cc44a7e94c0” sold the most products

Let’s see at statistics and distribution of the metric.

df_sellers['products_cnt'].explore.info(
    labels=dict(unique_products_cnt='Number of Products')
    , title='Distribution of Number of Products Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "products_cnt" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 2.00k Mean 37.33 1 471 (15%)
Missing 148 (5%) 99% 437.26 Trimmed Mean (10%) 15.06 2 278 (9%)
Distinct 262 (8%) 95% 151 Mode 1 3 193 (6%)
Non-Duplicate 114 (4%) 75% 26 Range 2.00k 4 145 (5%)
Duplicates 2.83k (92%) 50% 8 IQR 24 5 143 (5%)
Dup. Values 148 (5%) 25% 2 Std 120.08 6 106 (3%)
Zeros --- 5% 1 MAD 10.38 7 97 (3%)
Negative --- 1% 1 Kurt 119.73 8 95 (3%)
Memory Usage <1 Mb Min 1 Skew 9.61 9 79 (3%)
../../_images/c68e0c6e372d0ee7fc76b2991d232c8ffe427513b18c9f49c13501c3ea407cb7.jpg

Key Observations:

  • 75% of sellers sold ≤26 products total

  • Top 5% sold >150 products

Number of Unique Products#

Let’s identify the top sellers.

seller_top('unique_products_cnt')
unique_products_cnt products_cnt orders_cnt
seller_id
4a3ca9315b744ce9f8e9374361493884 395.00 1,952.00 1,775.00
d91fb3b7d041e83b64a00a3edfb37e4f 309.00 527.00 526.00
cca3071e3e9bb7d12640c9fbe2301306 307.00 804.00 691.00
fa1c13f2614d7b5c4749cbc52fecda94 285.00 581.00 580.00
7142540dd4c91e2237acb7e911c4eba2 265.00 328.00 316.00
6560211a19b47992c3666cc44a7e94c0 254.00 2,003.00 1,826.00
da8622b14eb17ae2831f4ac5b9dab84a 221.00 1,548.00 1,311.00
ea8482cd71df3c1969d7b9473ff13abc 219.00 1,188.00 1,132.00
3d871de0142ce09b7081e2b9d1733cb1 202.00 1,131.00 1,064.00
7c67e1448b00f6e969d365cea6b010ab 197.00 1,356.00 974.00

Key Observations:

  • Seller “4a3ca9315b744ce9f8e9374361493884” sold the most unique products

Let’s see at statistics and distribution of the metric.

df_sellers['unique_products_cnt'].explore.info(
    labels=dict(unique_products_cnt='Number of Unique Products')
    , title='Distribution of Number of Unique Products Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "unique_products_cnt" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 395 Mean 11.39 1 669 (22%)
Missing 148 (5%) 99% 126 Trimmed Mean (10%) 6.14 2 414 (13%)
Distinct 127 (4%) 95% 46 Mode 1 3 285 (9%)
Non-Duplicate 49 (2%) 75% 10 Range 394 4 207 (7%)
Duplicates 2.97k (96%) 50% 4 IQR 8 5 166 (5%)
Dup. Values 78 (3%) 25% 2 Std 24.55 6 123 (4%)
Zeros --- 5% 1 MAD 4.45 8 102 (3%)
Negative --- 1% 1 Kurt 60.57 7 101 (3%)
Memory Usage <1 Mb Min 1 Skew 6.49 9 88 (3%)
../../_images/b992d1168bc95ed70a04ae0cd676bcaed14c73321bf4f55eb3574ca40c349bef.jpg

Key Observations:

  • 75% of sellers sold ≤10 unique products

  • Top 5% sold >45 unique products

Number of Orders#

Let’s identify the top sellers.

seller_top('orders_cnt', show_cnt=False)
orders_cnt
seller_id
6560211a19b47992c3666cc44a7e94c0 1,826.00
4a3ca9315b744ce9f8e9374361493884 1,775.00
cc419e0650a3c5ba77189a1882b7556a 1,651.00
1f50f920176fa81dab994f9023523100 1,399.00
da8622b14eb17ae2831f4ac5b9dab84a 1,311.00
955fee9216a65b617aa5c0531780ce60 1,264.00
7a67c85e85bb2ce8582c35f2203ad736 1,146.00
ea8482cd71df3c1969d7b9473ff13abc 1,132.00
4869f7a5dfa277a7dca6462dcf3b52b2 1,124.00
3d871de0142ce09b7081e2b9d1733cb1 1,064.00

Key Observations:

  • Seller “6560211a19b47992c3666cc44a7e94c0” participated in the most orders

Let’s see at statistics and distribution of the metric.

df_sellers['orders_cnt'].explore.info(
    labels=dict(orders_cnt='Number of Orders')
    , title='Distribution of Number of Orders Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "orders_cnt" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 1.83k Mean 33.15 1 517 (17%)
Missing 148 (5%) 99% 402.78 Trimmed Mean (10%) 13.27 2 307 (10%)
Distinct 247 (8%) 95% 133 Mode 1 3 209 (7%)
Non-Duplicate 112 (4%) 75% 22 Range 1.82k 4 153 (5%)
Duplicates 2.85k (92%) 50% 7 IQR 20 5 141 (5%)
Dup. Values 135 (4%) 25% 2 Std 105.79 6 110 (4%)
Zeros --- 5% 1 MAD 8.90 7 96 (3%)
Negative --- 1% 1 Kurt 117.00 8 94 (3%)
Memory Usage <1 Mb Min 1 Skew 9.43 9 84 (3%)
../../_images/264fd2878bc796802b4dabd9ab6b222cb9431cb7ebf2966c3018b7f44c2026b9.jpg

Key Observations:

  • 75% of sellers participated in ≤22 orders

  • Top 5% participated in ≥130 orders

Total Sales Revenue#

Let’s identify the top sellers.

seller_top('revenue')
revenue products_cnt orders_cnt
seller_id
4869f7a5dfa277a7dca6462dcf3b52b2 226,987.93 1,148.00 1,124.00
53243585a1d6dc2643021fd1853d8905 218,620.44 404.00 352.00
4a3ca9315b744ce9f8e9374361493884 197,210.82 1,952.00 1,775.00
fa1c13f2614d7b5c4749cbc52fecda94 191,785.04 581.00 580.00
7c67e1448b00f6e969d365cea6b010ab 186,740.04 1,356.00 974.00
7e93a43ef30c4f03f38b393420bc753a 164,582.49 321.00 318.00
da8622b14eb17ae2831f4ac5b9dab84a 159,816.87 1,548.00 1,311.00
7a67c85e85bb2ce8582c35f2203ad736 139,908.68 1,156.00 1,146.00
1025f0e2d44d7041d6cf58b6550e0bfa 138,208.56 1,420.00 910.00
955fee9216a65b617aa5c0531780ce60 132,138.71 1,475.00 1,264.00

Key Observations:

  • Seller “4869f7a5dfa277a7dca6462dcf3b52b2” generated the most revenue

Let’s see at statistics and distribution of the metric.

df_sellers['revenue'].explore.info(
    labels=dict(revenue='Seller Revenue')
    , title='Distribution of Seller Revenue'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "revenue" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 226.99k Mean 4.48k 59.90 7 (<1%)
Missing 148 (5%) 99% 54.44k Trimmed Mean (10%) 1.84k 69.90 7 (<1%)
Distinct 2.68k (86%) 95% 16.97k Mode Multiple 120 7 (<1%)
Non-Duplicate 2.52k (81%) 75% 3.51k Range 226.98k 119.90 6 (<1%)
Duplicates 418 (14%) 50% 851.10 IQR 3.28k 180 6 (<1%)
Dup. Values 157 (5%) 25% 228.70 Std 13.96k 49.90 6 (<1%)
Zeros --- 5% 49 MAD 1.11k 29.90 6 (<1%)
Negative --- 1% 17.77 Kurt 102.35 89.90 6 (<1%)
Memory Usage <1 Mb Min 6.50 Skew 8.87 79.90 6 (<1%)
../../_images/cb536f0da862deb4d85c8c2486f9ca070cf622145e8df1b85dbad8c08c6abd02.jpg

Key Observations:

  • 75% of sellers made ≤3,500 R$

  • Top 5% made ≥17,000 R$

Number of Products per Order#

Let’s identify the top sellers.

seller_top('avg_prouducts_cnt')
avg_prouducts_cnt products_cnt orders_cnt
seller_id
0b36063d5818f81ccb94b54adfaebbf5 15.00 15.00 1.00
99a25c39b28a74d1151c35c18d178292 6.00 6.00 1.00
a3b42d266fa8afc874b909422ce88582 6.00 6.00 1.00
c394e193cda3b4225ff2094d32184849 5.50 11.00 2.00
010da0602d7774602cd1b3f5fb7b709e 5.00 5.00 1.00
8ff38bc3969e67c36c48343a07090f66 4.00 8.00 2.00
fc6295add6f51a0936407ead70c1001d 4.00 4.00 1.00
ec8463980a4e0ea9f8517aea1ed0c419 4.00 8.00 2.00
b37c4c02bda3161a7546a4e6d222d5b2 3.75 15.00 4.00
ab27bbbad5239bc31a34709275a70db4 3.67 11.00 3.00

Key Observations:

  • Seller ‘0b36063d5818f81ccb94b54adfaebbf5’ has highest average products per order (single order)

Let’s see at statistics and distribution of the metric.

df_sellers['avg_prouducts_cnt'].explore.info(
    labels=dict(avg_prouducts_cnt='Average Number of Products in Order')
    , title='Distribution of Average Number of Products in Order per Sellers'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "avg_prouducts_cnt" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 15 Mean 1.16 1 1.56k (51%)
Missing 148 (5%) 99% 3 Trimmed Mean (10%) 1.07 1.50 69 (2%)
Distinct 511 (17%) 95% 1.82 Mode 1 2 62 (2%)
Non-Duplicate 359 (12%) 75% 1.14 Range 14 1.33 57 (2%)
Duplicates 2.58k (83%) 50% 1 IQR 0.14 1.17 42 (1%)
Dup. Values 152 (5%) 25% 1 Std 0.44 1.25 41 (1%)
Zeros --- 5% 1 MAD 0 1.14 38 (1%)
Negative --- 1% 1 Kurt 346.25 1.11 28 (<1%)
Memory Usage <1 Mb Min 1 Skew 13.39 1.20 28 (<1%)
../../_images/329f9c84915f075c3b7e670a4ddf2529e53f196277436f66339a92b520f77bd7.jpg

Key Observations:

  • 75% of sellers average 1.14 products per order

  • Top 1% average ≥3 products

Total Product Value per Order#

Let’s identify the top sellers.

seller_top('avg_order_total_price')
avg_order_total_price products_cnt orders_cnt
seller_id
e3b4998c7a498169dc7bce44e6bb6277 6,735.00 1.00 1.00
80ceebb4ee9b31afb6c6a916a574a1e2 6,729.00 1.00 1.00
ee27a8f15b1dded4d213a468ba4eb391 6,499.00 1.00 1.00
b37c4c02bda3161a7546a4e6d222d5b2 6,018.75 15.00 4.00
585175ec331ea177fa47199e39a6170a 3,549.00 1.00 1.00
abe021b01ba992245271b9aa422032df 3,360.00 2.00 2.00
a00824eb9093d40e589b940ec45c4eb0 3,133.32 3.00 3.00
e2a1ac9bf33e5549a2a4f834e70df2f8 2,999.89 5.00 5.00
e908c0f3646e8b60375734a350d95d71 2,951.00 1.00 1.00
d63c73efd41eb002280e7ec831424edb 2,799.00 2.00 2.00

Key Observations:

  • Sellers “e3b4998c7a498169dc7bce44e6bb6277” and “80ceebb4ee9b31afb6c6a916a574a1e2” had highest order values (single order each)

Let’s see at statistics and distribution of the metric.

df_sellers['avg_order_total_price'].explore.info(
    labels=dict(avg_order_total_price='Average Amount of Products in Order, R$')
    , title='Distribution of Average Amount of Products in Order Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "avg_order_total_price" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 6.74k Mean 196.49 79.90 11 (<1%)
Missing 148 (5%) 99% 1.61k Trimmed Mean (10%) 127.87 59.90 11 (<1%)
Distinct 2.63k (85%) 95% 641.15 Mode Multiple 69.90 11 (<1%)
Non-Duplicate 2.47k (80%) 75% 189 Range 6.73k 29.90 9 (<1%)
Duplicates 466 (15%) 50% 105.53 IQR 128.72 39.90 9 (<1%)
Dup. Values 157 (5%) 25% 60.28 Std 369.92 49.90 8 (<1%)
Zeros --- 5% 26.17 MAD 80.72 119.90 7 (<1%)
Negative --- 1% 14.23 Kurt 127.40 109.90 6 (<1%)
Memory Usage <1 Mb Min 6 Skew 9.16 90 6 (<1%)
../../_images/c9b2ff43f74b1238bc88322866f09b2979e5ef28b74103890448b58a09747657.jpg

Key Observations:

  • 75% of sellers average ≤189 R$ per order

  • Top 5% average ≥641 R$

Product Price per Order#

Let’s identify the top sellers.

seller_top('avg_product_price')
avg_product_price products_cnt orders_cnt
seller_id
e3b4998c7a498169dc7bce44e6bb6277 6,735.00 1.00 1.00
80ceebb4ee9b31afb6c6a916a574a1e2 6,729.00 1.00 1.00
ee27a8f15b1dded4d213a468ba4eb391 6,499.00 1.00 1.00
585175ec331ea177fa47199e39a6170a 3,549.00 1.00 1.00
abe021b01ba992245271b9aa422032df 3,360.00 2.00 2.00
a00824eb9093d40e589b940ec45c4eb0 3,133.32 3.00 3.00
e2a1ac9bf33e5549a2a4f834e70df2f8 2,999.89 5.00 5.00
e908c0f3646e8b60375734a350d95d71 2,951.00 1.00 1.00
d63c73efd41eb002280e7ec831424edb 2,799.00 2.00 2.00
1444c08e64d55fb3c25f0f09c07ffcf2 2,749.00 1.00 1.00

Key Observations:

  • Seller “e3b4998c7a498169dc7bce44e6bb6277” has highest average product price (single order)

Let’s see at statistics and distribution of the metric.

df_sellers['avg_product_price'].explore.info(
    labels=dict(avg_product_price='Average Product Price in Order, R$')
    , title='Distribution of Average Product Price in Order Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "avg_product_price" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 6.74k Mean 180.46 59.90 14 (<1%)
Missing 148 (5%) 99% 1.58k Trimmed Mean (10%) 116.39 69.90 12 (<1%)
Distinct 2.58k (83%) 95% 595.09 Mode 59.90 79.90 11 (<1%)
Non-Duplicate 2.42k (78%) 75% 174 Range 6.73k 29.90 11 (<1%)
Duplicates 516 (17%) 50% 97.24 IQR 119.77 99 10 (<1%)
Dup. Values 163 (5%) 25% 54.23 Std 347.44 39.90 10 (<1%)
Zeros --- 5% 23.90 MAD 76.36 119.90 8 (<1%)
Negative --- 1% 12.90 Kurt 138.90 120 8 (<1%)
Memory Usage <1 Mb Min 6 Skew 9.45 49.90 8 (<1%)
../../_images/7e76e089690dc41ba63f7490eb2e99e5545222ff23d4cc4f69ee46ec2be5e3ce.jpg

Key Observations:

  • 75% of sellers average ≤174 R$ per product

  • Top 5% average ≥595 R$

Product Weight#

Let’s identify the top sellers.

seller_top('avg_product_weight_kg')
avg_product_weight_kg products_cnt orders_cnt
seller_id
5d043cd5512d4bd2f88e5ccdd5736c38 30.00 1.00 1.00
d7827b2af99326a03b0ed9c7a24db0d3 30.00 1.00 1.00
731ef20c231d9a7103a425e83fd91271 30.00 3.00 2.00
19a7c9f66ffa6452ff4b4c0cca71da27 30.00 1.00 1.00
a99504ac23ed125aaf5302af6cc30af9 30.00 1.00 1.00
a9415cf907dfa12f9ed18693244f5329 30.00 2.00 1.00
08cdbae123ff67ca4e36d9d641ce0119 30.00 6.00 4.00
ccb83a794700270fde70898fe9ff368b 30.00 12.00 12.00
6fa9202c10491e472dffd59a3e82b2a3 30.00 3.00 3.00
a70e9066dbfd10b6b97a3f54a1356762 30.00 4.00 2.00

Key Observations:

  • Maximum average product weight: 30kg

Let’s see at statistics and distribution of the metric.

df_sellers['avg_product_weight_kg'].explore.info(
    labels=dict(avg_product_weight_kg='Average Weight of Products, kg')
    , title='Distribution of Average Weight of Products Per Seller'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "avg_product_weight_kg" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 30 Mean 2.68 0.20 69 (2%)
Missing 148 (5%) 99% 24.05 Trimmed Mean (10%) 1.65 0.30 57 (2%)
Distinct 767 (25%) 95% 11.14 Mode 0.20 0.50 46 (1%)
Non-Duplicate 383 (12%) 75% 2.73 Range 29.95 0.40 46 (1%)
Duplicates 2.33k (75%) 50% 1 IQR 2.27 0.15 46 (1%)
Dup. Values 384 (12%) 25% 0.46 Std 4.35 0.60 35 (1%)
Zeros --- 5% 0.19 MAD 1.04 0.35 34 (1%)
Negative --- 1% 0.10 Kurt 14.02 0.55 33 (1%)
Memory Usage <1 Mb Min 0.05 Skew 3.40 0.25 31 (1%)
../../_images/208dced732651d68fa120798409e461b0e290b226a8108c7fa04d2cc03a20d38.jpg

Key Observations:

  • 75% of sellers average ≤2.7kg

  • Top 5% average ≥11kg

Carrier Handoff Delay#

Let’s identify the top sellers.

seller_top('avg_carrier_delivery_delay_days')
avg_carrier_delivery_delay_days products_cnt orders_cnt
seller_id
586a871d4f1221763fddb6ceefdeb95e 45.43 2.00 1.00
8e670472e453ba34a379331513d6aab1 40.15 1.00 1.00
20f0aeea30bc3b8c4420be8ced4226c0 32.60 1.00 1.00
e09887ca8c7bf8a4621ce481820414ef 32.02 3.00 2.00
6d04126aba80df143fd038e711b8fd96 26.87 1.00 1.00
244b04680fdbded0acf5aebd9c92b44a 23.73 2.00 2.00
8d92f3ea807b89465643c219455e7369 21.52 8.00 3.00
8629a7efec1aab257e58cda559f03ba7 19.31 1.00 1.00
bd4ff04377f974574f7f0bf9d6ce7dde 19.10 1.00 1.00
c004e5ea15737026cecaee0447e00b75 18.52 1.00 1.00

Key Observations:

  • Seller “586a871d4f1221763fddb6ceefdeb95e” had maximum carrier handoff delay: 45 days

Let’s see at statistics and distribution of the metric.

df_sellers['avg_carrier_delivery_delay_days'].explore.info(
    labels=dict(avg_carrier_delivery_delay_days='Average Carrier Delivery Delay Time, days')
    , title='Distribution of Average Carrier Delivery Delay Time Per Seller'
    , lower_quantile=0.05
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "avg_carrier_delivery_delay_days" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 2.95k (95%) Max 45.43 Mean -3.04 -0.39 1 (<1%)
Missing 149 (5%) 99% 9.34 Trimmed Mean (10%) -3.24 -5.22 1 (<1%)
Distinct 2.95k (95%) 95% 1.17 Mode Multiple -3.00 1 (<1%)
Non-Duplicate 2.95k (95%) 75% -2.20 Range 79.55 -3.52 1 (<1%)
Duplicates 148 (5%) 50% -3.28 IQR 2.06 -4.50 1 (<1%)
Dup. Values 0 (<1%) 25% -4.26 Std 3.55 -9.99 1 (<1%)
Zeros --- 5% -6.51 MAD 1.53 -2.93 1 (<1%)
Negative 2.72k (88%) 1% -11.68 Kurt 37.56 -3.35 1 (<1%)
Memory Usage <1 Mb Min -34.12 Skew 3.03 -4.03 1 (<1%)
../../_images/26029fd675f35ee6a8a1f37a1bf4bca93cbb34bea194d14ee2c80ef26d6a439c.jpg

Key Observations:

  • Top 5% of sellers delivered to carrier ≥6.5 days early

  • 75% delivered ≥2 days early

  • Bottom 5% delayed ≥1 day