Delivery Analysis#

Delivery Cost#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , metric = 'total_freight_value'
    , metric_label = 'Average Freight Value per Order, R$'
    , metric_label_for_distribution = 'Freight Value per Order, R$'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
)
print(f'Average Freight Value per Order: {df_sales.total_freight_value.mean():.2f} R$')
Average Freight Value per Order: 22.78 R$

Top Orders.

pb.metric_top()
total_freight_value
order_id
cf4659487be50c0c317cff3564c4a840 1,794.96
2455cbeb73fd04b170ca2504662f95ce 1,002.29
cfed507ac357129f750f05a0d7d71b15 711.33
71dab1155600756af6de79de92e712e3 626.64
17784b9fbb37fb0bdc230d8ed6f6b355 502.98
725cf8e9c24e679a8a5a32cb92c9ce1e 497.42
5bd06bab48e0423fc35d1c236d48a6bb 497.08
be382a9e1ed25128148b97d6bfdb21af 479.28
c52c7fbe316b5b9d549e8a25206b8a1f 458.73
62073ec6b54b8e6322037fc0f3591ad3 456.47

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    upper_quantile=0.95
    , hist_mode='dual_hist_trim'    
)
Summary Statistics for "total_freight_value" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 1.79k Mean 22.78 15.10 2.90k (3%)
Missing --- 99% 104.24 Trimmed Mean (10%) 19.03 7.78 1.80k (2%)
Distinct 7.87k (8%) 95% 54.74 Mode 15.10 14.10 1.49k (2%)
Non-Duplicate 2.80k (3%) 75% 24.01 Range 1.79k 11.85 1.42k (1%)
Duplicates 88.48k (92%) 50% 17.17 IQR 10.16 18.23 1.20k (1%)
Dup. Values 5.07k (5%) 25% 13.85 Std 21.57 7.39 1.13k (1%)
Zeros 336 (<1%) 5% 7.87 MAD 6.49 15.23 811 (<1%)
Negative --- 1% 7.39 Kurt 586.22 16.11 780 (<1%)
Memory Usage 1 Min 0 Skew 12.28 8.72 738 (<1%)
../../_images/3a829f77dc66b7f7627bb27055f41bdcbfad12ca939505ee06cc7d19a5844bc6.jpg

Key Observations:

  • 75% of orders have shipping costs ≤24 R$

  • Top 5% have shipping costs ≥54.7 R$

  • Several extreme outliers exist with very high shipping costs

pb.metric_top(freq='D')
total_freight_value
order_purchase_dt
2017-06-20 33.28
2018-07-02 32.34
2017-06-21 31.19
2017-04-18 30.24
2018-07-15 29.37
2018-07-13 29.25
2018-07-01 29.10
2017-06-25 28.90
2018-07-11 28.78
2017-11-04 28.69

Let’s look by different dimensions.

By Whether the Order is Delayed or Not

pb.histogram(
    color='is_delayed'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
    , nbins=30
).show()
pb.bar_groupby(
    y='is_delayed'
    , show_count=True
).show()
../../_images/f62dcf763862a926a0c64807821c67bae6713a4accbb93bc8b7e2cd36dfb61a5.jpg ../../_images/68502a7a1a1db2e85a100c786b3ed4a8b7f798c3de0508f0a9feaad2362096dd.jpg

Key Observations:

  • Delayed orders have higher shipping costs than non-delayed

By Order Weight Category

pb.histogram(
    color='order_total_weight_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=True
    , show_kde=False
    , nbins=30
).show()
pb.bar_groupby(
    y='order_total_weight_cat'
).show()
../../_images/52d01c71667634c60daf6d37b8bbf03942fdb6c7fc6c6a6e7542bda863beeae0.jpg ../../_images/7c8b095aae4ae44c5d506f0b330f6f12690400179a731a1ee1847f090ea53c7e.jpg

Key Observations:

  • Heavier orders have higher shipping costs (expected pattern)

By Presence of Installment Payments

pb.histogram(
    color='order_has_installment'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=True
    , show_kde=False
    , nbins=30
).show()
pb.bar_groupby(
    y='order_has_installment'
    , show_count=True
    , to_slide=True
).show()
../../_images/f0e3155d77d273827f386b957a09308d5d56d36e7fb9ef370895589151a4c4cc.jpg ../../_images/284177d3a3a41b753095497e6fa09fbc04f9e0518669555cea0d85d5b2688de3.jpg

Key Observations:

  • Installment orders have higher shipping costs

By Top Customer States

pb.box(
    y='customer_state'
    , upper_quantile=0.95
    , show_dual=True
).show()
fig = pb.bar_groupby(
    y='customer_state'
    , show_count=True
).update_layout(xaxis2_title_text='Number of Sales')
pb.to_slide(fig)
fig.show()
../../_images/ac6b63a3076d3b3af7df4557a6381ebd3c497e0b2f1e3f79f792ea847d789dae.jpg ../../_images/33d3a8707e9a813107f1fb675a90e26d4eeeacf312c89a579f1ffdb21f6d6e7d.jpg

Key Observations:

  • Among top states by sales volume:

    • São Paulo has lowest average shipping costs

    • Maranhão has highest

By Top Customer Cities

pb.box(
    y='customer_city'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_city'
    , show_count=True
).update_layout(xaxis2_title_text='Number of Sales')
../../_images/2018716f72f10a2d569f2e07513671f39b0d62341be70be6893687a3823dd77e.jpg ../../_images/d2810154be4f5b1b633fde436be52dd2848a14cde4880158e62fb128357ae077.jpg

Key Observations:

  • Among top cities by sales volume, highest average shipping costs in:

    1. Salvador

    2. Porto Alegre

    3. Brasília

By Review Score

pb.histogram(
    color='order_avg_reviews_score'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
    , nbins=30
).show()
fig = pb.bar_groupby(
    y='order_avg_reviews_score'
    , show_count=True
).update_layout(xaxis2_title_text='Number of Sales')
pb.to_slide(fig)
fig.show()
../../_images/64167bc2e6325b21a1d246e4f1abb115ed8eb339bc95f0976ef095c68042a6bf.jpg ../../_images/71d1b1a9d77f36de223bb86764ae0db8eed35a4a3f02bb18249a34c808362be5.jpg

Key Observations:

  • Higher shipping costs correlate with lower order ratings

Distance Between Customer and Seller#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , metric = 'avg_distance_km'
    , metric_label = 'Average Distance, km'
    , metric_label_for_distribution = 'Average Distance, km'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.1f'
)
print(f'Average Distance: {df_sales.avg_distance_km.mean():.2f} km')
Average Distance: 600.55 km

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    upper_quantile=0.95
    , hist_mode='dual_hist_trim'    
)
Summary Statistics for "avg_distance_km" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 95.87k (99%) Max 8.68k Mean 600.55 0 23 (<1%)
Missing 476 (<1%) 99% 2.48k Trimmed Mean (10%) 491.12 10.64 14 (<1%)
Distinct 90.27k (94%) 95% 2.09k Mode 0 680.55 13 (<1%)
Non-Duplicate 85.91k (89%) 75% 797.49 Range 8.68k 574.96 13 (<1%)
Duplicates 6.07k (6%) 50% 433.74 IQR 610.46 246.96 12 (<1%)
Dup. Values 4.36k (5%) 25% 187.03 Std 593.24 699.86 12 (<1%)
Zeros 23 (<1%) 5% 16.53 MAD 439.42 353.80 10 (<1%)
Negative --- 1% 5.79 Kurt 2.94 576.16 9 (<1%)
Memory Usage 1 Min 0 Skew 1.69 242.40 9 (<1%)
../../_images/f594ba2c4b1abec907246a7353a47d427422195359f0045c9be4d5e49704ba1b.jpg

Key Observations:

  • 75% of orders have seller-buyer distance ≤800km

  • 5% ≤16.5km

  • 5% ≥2,000km

  • Several extreme outliers (>4,000km)

Let’s look by different dimensions.

By Whether the Order is Delayed or Not

pb.histogram(
    color='is_delayed'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
    , nbins=30
).show()
pb.bar_groupby(
    y='is_delayed'
    , to_slide=True
).show()
../../_images/d4e6556f75fad06dc5ca1d9c5749e6a49ba478631d3807ef42830eff098670d0.jpg ../../_images/4676be8eb26dd45ea49913b8126de5035d9ab061d11dfaade2349ba2b15e346f.jpg

Key Observations:

  • Delayed orders have greater average seller-buyer distance

By Presence of Installment Payments

pb.histogram(
    color='order_has_installment'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=True
    , show_kde=False
    , nbins=30
).show()
pb.bar_groupby(
    y='order_has_installment'
    , show_count=True
    , to_slide=True
)
../../_images/2c5a7a35f4363bd0aa693df3bc99bf040718b09e1045760a3725cad9a1bfb0d5.jpg ../../_images/08ca00de8b8be2a2e9bfa7495eb7e018f024daf40212c3212948eae4fad4e5a9.jpg

Key Observations:

  • Installment orders have greater average seller-buyer distance

Delivery Time#

Proportion of Each Stage in Delivery Time#

Let’s look at what percentage of the total delivery time each stage occupies.

We will not consider any anomalous dates, as there are only a few and they will not significantly affect the result.

tmp_df_sales = (
    df_sales[[
        'order_purchase_dt',
        'order_approved_dt',
        'order_delivered_carrier_dt',
        'order_delivered_customer_dt',
    ]]
    [lambda x: (x.order_delivered_customer_dt >= x.order_purchase_dt) & (x.order_approved_dt >= x.order_purchase_dt)
        & (x.order_delivered_carrier_dt >= x.order_approved_dt) & (x.order_delivered_customer_dt >= x.order_delivered_carrier_dt)
     ]
    .dropna()
)
tmp_df_sales['from_purchase_to_customer'] = (tmp_df_sales['order_delivered_customer_dt'] - tmp_df_sales['order_purchase_dt']).dt.total_seconds()
tmp_df_sales['From Purchase to Approved'] = (
    (tmp_df_sales['order_approved_dt'] - tmp_df_sales['order_purchase_dt']).dt.total_seconds() * 100 / tmp_df_sales['from_purchase_to_customer']
).round(2)
tmp_df_sales['From Approved to Carrier'] = (
    (tmp_df_sales['order_delivered_carrier_dt'] - tmp_df_sales['order_approved_dt']).dt.total_seconds() * 100 / tmp_df_sales['from_purchase_to_customer']
).round(2)
tmp_df_sales['From Carrier to Customer'] = (
    (tmp_df_sales['order_delivered_customer_dt'] - tmp_df_sales['order_delivered_carrier_dt']).dt.total_seconds() * 100 / tmp_df_sales['from_purchase_to_customer']
).round(2) 
tmp_df_sales = (
    tmp_df_sales[['order_purchase_dt', 'From Purchase to Approved', 'From Approved to Carrier', 'From Carrier to Customer']]
    .melt(id_vars = 'order_purchase_dt', var_name='Stage', value_name='Percent of All Delivery Time')
    .rename(columns={'order_purchase_dt': 'Date'})
)

Let’s look at what percentage of the total delivery time each stage occupies on average.

sorted_means = tmp_df_sales.groupby('Stage')['Percent of All Delivery Time'].mean().sort_values(ascending=False)
annotations_data = [
    (0.6, -0.1, 'Carrier > Customer'),
    (-0.05, 0.8, 'Approved > Carrier'),
    (0.45, 1.08, 'Purchase > Approved')
]
fig = px.pie(
    values=sorted_means.values,
    names=sorted_means.index,
    title='Average Delivery Time Distribution by Stage',
    labels={'names': 'Delivery Stage', 'values': 'Percentage of Total Time'},
    category_orders={'names': ['From Carrier to Customer', 'From Approved to Carrier']},
    hole=0.4 
)
fig.update_traces(
    textinfo='percent',  
    textposition='inside', 
    texttemplate='%{percent:.1%}', 
    hovertemplate='%{label}: %{percent:.1%}', 
)
fig.update_layout(
    showlegend=False, 
    width=500,
    height=400,
    margin=dict(t=60),
    title_y=0.97
)
for x, y, text in annotations_data:
    fig.add_annotation(
        x=x,
        y=y,
        text=text,
        showarrow=False,
        font=dict(size=12)
    )
pb.to_slide(fig)
fig.show()
../../_images/9057d6b2d6afc213b4c660efbd6155ef53350855d6fe720e3b077ae8f8af744a.jpg

Key Observations:

  • Delivery time distribution:

    • Payment approval: 4%

    • Carrier handoff: 25.5%

    • Carrier delivery: 70.5%

Look at distribution.

tmp_df_sales.viz.box(
    x='Percent of All Delivery Time'
    , y='Stage'
    , title='Percent of All Delivery Time by Stage'
)
../../_images/856c6153bb8e5ce10fdb6ed17b96a5c0d6ce68912bcc90261700fe2d742a25d5.jpg

Key Observations:

  • Carrier delivery consumes most of total delivery time

  • Significant differences between stages (non-overlapping IQRs)

Total Delivery Time#

pb.configure(
    df = df_sales
    , metric = 'delivery_time_days'
    , metric_label = 'Average Order Delivery Time, days'
    , metric_label_for_distribution = 'Order Delivery Time, days'
    , agg_func = 'mean'
    , title_base = 'Average Order Delivery Time and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}
)

Top Orders

pb.metric_top()
delivery_time_days
order_id
ca07593549f1816d26a572e06dc1eab6 209.63
1b3190b2dfa9d789e1f14c05b647a14a 208.35
440d0d17af552815d15a9e41abe49359 195.63
2fb597c2f772eca01b1f5c561bf6cc7b 194.85
285ab9426d6982034523a855f55a885e 194.63
0f4519c5f1c541ddec9f21b3bddd533a 194.05
47b40429ed8cce3aee9199792275433f 191.46
2fe324febf907e3ea3f2aa9650869fa5 189.86
2d7561026d542c8dbd8f0daeadf67a43 188.13
c27815f7e3dd0b926b58552628481575 187.74

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(delivery_time_days='Order Delivery Time, days')
    , title='Distribution of Order Delivery Time'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "delivery_time_days" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.21k (99%) Max 209.63 Mean 12.54 11.12 3 (<1%)
Missing 135 (<1%) 99% 46.01 Trimmed Mean (10%) 11.14 10.89 3 (<1%)
Distinct 93.55k (97%) 95% 29.21 Mode Multiple 7.89 3 (<1%)
Non-Duplicate 90.94k (94%) 75% 15.68 Range 209.10 2.35 3 (<1%)
Duplicates 2.80k (3%) 50% 10.21 IQR 8.92 14.04 3 (<1%)
Dup. Values 2.60k (3%) 25% 6.76 Std 9.53 6.07 3 (<1%)
Zeros --- 5% 3.01 MAD 6.14 2.07 3 (<1%)
Negative --- 1% 1.82 Kurt 39.68 10.07 3 (<1%)
Memory Usage 1 Min 0.53 Skew 3.85 11.76 3 (<1%)
../../_images/3cbc483504907a213f1f6c465d3d9364a0d846d17ce71098456e359cff601de2.jpg

Key Observations:

  • Median delivery time: ≥10 days

  • 75% deliver in ≥16 days

  • Top 5% take ≥30 days

Let’s look by different dimensions.

By Day of Week

pb.histogram(
    color='purchase_weekday'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='purchase_weekday'
    , show_count=True
).show()
../../_images/c33785a27e4ab78bd729bc766600ced6acd39646e384d12d780ccf05cb196fab.jpg ../../_images/514c614fea634917708fee73471d91172f4ba7594e33d877ed62347335f14930.jpg

Key Observations:

  • Friday/Saturday orders have slightly longer delivery times

By Payment Category

pb.histogram(
    color='order_total_payment_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_total_payment_cat'
    , show_count=True
    , to_slide=True
).show()
../../_images/f343b596cdf9928a7813be7ca262453162568b86d5ac116dbb23582346af74f8.jpg ../../_images/af65648b18fa8fd7708219b92506b278f109e9c10a0a0724cc52ec5a2c935be8.jpg

Key Observations:

  • More expensive orders take longer to deliver

By Order Weight Category

pb.histogram(
    color='order_total_weight_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_total_weight_cat'
    , show_count=True
    , to_slide=True
).show()
../../_images/d112b69a6a45a1a439624953453b4fe27fb30f32f0038a209b55214f12666a09.jpg ../../_images/45159316124ccca0af3ff2ff4cdc26e5ad07cd123d1c9cb4f2b394b3a3bd90bf.jpg

Key Observations:

  • Heavy orders take longer to deliver than light/medium

By Review Score

pb.histogram(
    color='order_avg_reviews_score'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_avg_reviews_score'
    , show_count=True
    , to_slide=True
).show()
../../_images/2f49a3dd90ea5a89d349790b5178e43350ce65ddc454c7224361ff7ad6db2532.jpg ../../_images/64761bad44bd055750e63fbc3d1efeabaf03769ffa0baaaf7ca0965b6b378966.jpg

Key Observations:

  • 1-star rated orders have noticeably longer delivery times

By Top Customer States

pb.box(
    y='customer_state'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_state'
    , show_count=True
    , to_slide=True
).show()
../../_images/62fe4e11a5a3110439dac247cd308a28f0e0645425743198b04a45f0e3982d33.jpg ../../_images/d00e87955a7e5cce5cd6d9e5704ba0a6d8733243cca3c522da936eaf5a3f1849.jpg

Key Observations:

  • Among top states by sales volume, top 3 states with longest delivery times:

    1. Pará

    2. Maranhão

    3. Ceará

By Top Customer Cities

pb.box(
    y='customer_city'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_city'
    , show_count=True
    , to_slide=True
).show()
../../_images/09523fac71c87b314739cd18977492623503ac350c63cbfbca1a8c65a1398792.jpg ../../_images/349686f600a65b665deb06afe38faf765951439e534bcad02e7ab05536f25dc8.jpg

Key Observations:

  • Among top cities by sales volume, top 3 cities with longest delivery times:

    1. Salvador

    2. Porto Alegre

    3. Rio de Janeiro

Delivery Delay#

pb.configure(
    df = df_sales
    , metric = 'delivery_delay_days'
    , metric_label = 'Average Delivery Delay, days'
    , metric_label_for_distribution = 'Delivery Delay, days'
    , agg_func = 'mean'
    , title_base = 'Average Delivery Delay and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}
)

Top Orders

pb.metric_top()
delivery_delay_days
order_id
1b3190b2dfa9d789e1f14c05b647a14a 188.98
ca07593549f1816d26a572e06dc1eab6 181.61
47b40429ed8cce3aee9199792275433f 175.87
2fe324febf907e3ea3f2aa9650869fa5 167.71
285ab9426d6982034523a855f55a885e 166.58
440d0d17af552815d15a9e41abe49359 165.63
c27815f7e3dd0b926b58552628481575 162.72
d24e8541128cea179a11a65176e0a96f 161.78
0f4519c5f1c541ddec9f21b3bddd533a 161.61
2d7561026d542c8dbd8f0daeadf67a43 159.61

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(delivery_time_days='Delivery Delay, days')
    , title='Distribution of Delivery Delay'
    , lower_quantile=0.01
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "delivery_delay_days" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.21k (99%) Max 188.98 Mean -11.11 -12.39 5 (<1%)
Missing 135 (<1%) 99% 18.96 Trimmed Mean (10%) -11.46 -13.28 4 (<1%)
Distinct 91.65k (95%) 95% 3.83 Mode -12.39 -9.09 4 (<1%)
Non-Duplicate 87.39k (91%) 75% -6.38 Range 334.99 -14.44 4 (<1%)
Duplicates 4.69k (5%) 50% -11.82 IQR 9.84 -13.14 4 (<1%)
Dup. Values 4.26k (4%) 25% -16.22 Std 10.09 -13.26 4 (<1%)
Zeros --- 5% -25.37 MAD 7.02 -9.21 4 (<1%)
Negative 88.39k (92%) 1% -34.31 Kurt 28.99 -15.08 4 (<1%)
Memory Usage 1 Min -146.02 Skew 2.12 -7.22 4 (<1%)
../../_images/ab8e6da785969d50950c448a308e87023c563be9a4154ab2ce8a416e5e6bf8dc.jpg

Key Observations:

  • 75% of orders deliver ≥6 days early

  • ~5% are ≥4 days late

By Review Score

pb.histogram(
    color='order_avg_reviews_score'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_avg_reviews_score'
    , show_count=True
    , to_slide=True
).show()
../../_images/ff42298b9920aec6eee8626f10fd17a3526fe52028f3b6aa9311aa55f3f98b5b.jpg ../../_images/96633f09a3c2bf5ce5f9fad8e7c6a7a5b0d1b4edd2015067b79a9b2a89600c53.jpg

Key Observations:

  • Higher rated orders deliver earlier than estimated

pb.box(
    y='customer_state'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_state'
    , show_count=True
    , to_slide=True
).show()
../../_images/c40505f83e27be33c2390ca3ff7f9bec3743c36b5b0183a041406f928e8ccc73.jpg ../../_images/4b5c3cbcaa53d346d25236ed061cfbd4b8ca2a279166dc8b212193106990a95a.jpg

Key Observations:

  • Among top states by sales volume, top 3 states for early delivery:

    1. Mato Grosso

    2. Pará

    3. Rio Grande do Sul

From Purchase to Approved Time#

pb.configure(
    df = df_sales
    , metric = 'from_purchase_to_approved_hours'
    , metric_label = 'Average Order Processing Time, hour'
    , metric_label_for_distribution = 'Order Processing Time, hour'
    , agg_func = 'mean'
    , title_base = 'Average Order Processing Time and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}
)

Top Orders

pb.metric_top()
from_purchase_to_approved_hours
order_id
0a93b40850d3f4becf2f276666e01340 741.44
f7923db0430587601c2aef15ec4b8af4 738.45
de0076b42a023f53b398ce9ab0d9009c 554.78
daed0f3aefd193de33c31e21b16a3b3a 404.23
9c038e10f14d12a96939a0176c4ecc99 319.53
14ef2221cc3570aa6ce512fc353529b3 313.80
0c1426109d8295a688ee4182016bba59 300.43
483b53ea654d3566427a092cdef047fd 299.52
f5194ba2a4560ffa0e87746852c61fc1 298.94
70f357cca87c1162357bf3c0a993cbe5 291.41

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(from_purchase_to_approved_hours='Order Processing Time, hour')
    , title='Distribution of Order Processing Time'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "from_purchase_to_approved_hours" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 741.44 Mean 10.25 0 1.24k (1%)
Missing --- 99% 89.69 Trimmed Mean (10%) 5.49 0.20 115 (<1%)
Distinct 32.51k (34%) 95% 48.20 Mode 0 0.19 113 (<1%)
Non-Duplicate 26.16k (27%) 75% 14.42 Range 741.44 0.19 112 (<1%)
Duplicates 63.83k (66%) 50% 0.34 IQR 14.21 0.20 109 (<1%)
Dup. Values 6.35k (7%) 25% 0.21 Std 20.51 0.21 109 (<1%)
Zeros 1.24k (1%) 5% 0.14 MAD 0.27 0.17 108 (<1%)
Negative --- 1% 0 Kurt 58.70 0.18 105 (<1%)
Memory Usage 1 Min 0 Skew 4.44 0.20 104 (<1%)
../../_images/061a6c3db1042ddf9508fb260ee3716e941b57ea0d33967236576a4c4ea6a3bd.jpg

Key Observations:

  • 75% of orders take ≥14 hours to process

  • Top 5% take ≥48 hours

Let’s look by different dimensions.

By Day of Week

pb.histogram(
    color='purchase_weekday'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='purchase_weekday'
    , show_count=True
    , to_slide=True
).show()
../../_images/b914b4f99b4a75a63df53af177f4d4b8816ac3f485d5bd2a3ed438bc7c55fcb5.jpg ../../_images/a7439851603f592ccf8030998a15e825873f1a527d06e6644262b210e0357c89.jpg

Key Observations:

  • Friday/Saturday orders process slowest

  • Wednesday orders process fastest

By Time of Day

pb.histogram(
    color='purchase_time_of_day'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='purchase_time_of_day'
    , show_count=True
    , to_slide=True
).show()
../../_images/4f23ac507e61e30295a6a0f14851e37f70a6ebfd4a90a371943295f991c7b081.jpg ../../_images/3e9d2372ac69929a2447642c96f630666eb8f76cf48eb35d91df5f99bad8677d.jpg

Key Observations:

  • Nighttime orders take longer to process

By Whether the Order is Delayed

pb.histogram(
    color='is_delayed'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='is_delayed'
    , show_count=True
).show()
../../_images/7bb824284651139a1c9fcb4535acc63098f92770701f51f3f46e36ba876f6a5e.jpg ../../_images/421d3e299d07592a030093b33a4f44a31733e67bda3ee1d1cbf63ac743473fc7.jpg

Key Observations:

  • Non-delayed orders process faster (expected pattern)

By Weekday vs Weekend

pb.histogram(
    color='purchase_day_type'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='purchase_day_type'
    , show_count=True
    , to_slide=True
).show()
../../_images/8089130e2860006f659915980d427c3b75302a50cc289752c5abb2bd15e3d2f4.jpg ../../_images/cd15482eba1aadd6aefef926ab7db19a3e2956ef957dc58bc6d56626ebd24ad5.jpg

Key Observations:

  • Weekday orders process significantly faster than weekends

By Payment Category

pb.histogram(
    color='order_total_payment_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_total_payment_cat'
    , show_count=True
).show()
../../_images/64405be0f55a93d99c3a99435771c69ee6a1742d929e2b96b4d4b259e643267d.jpg ../../_images/0f7c34ca16084d0e2fbfe6851c3e62af3431c7496fe22b3b3886b2a57607bac6.jpg

Key Observations:

  • Cheap/expensive orders process faster than mid-priced

By Order Weight Category

pb.histogram(
    color='order_total_weight_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_total_weight_cat'
    , show_count=True
).show()
../../_images/19ad18b4aa06c1bf40645a571659a4327215bd3a9dc9baa5a10ec3b299b22fad.jpg ../../_images/cebd0427640a0f27dba4166a30a547a66ebb4377dfca9989a911d122afa392b8.jpg

Key Observations:

  • Heavy orders take longer to process

By Presence of Installment Payments

pb.histogram(
    color='order_has_installment'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_has_installment'
    , show_count=True
    , to_slide=True
).show()
../../_images/d0e439d186ff032bbfec466a6df4b4af91f603605ad7080d168481eb20a60f1c.jpg ../../_images/b75c420abcc45bc3ae1d8b058a8ac5254326421fa0f6df0ced5c55aeb3a1c549.jpg

Key Observations:

  • Installment orders process much faster

By Review Score

pb.histogram(
    color='order_avg_reviews_score'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_avg_reviews_score'
    , show_count=True
).show()
../../_images/4a76511d405e3400711f2a42a6c64874c80928683e48025e770574446982ef7a.jpg ../../_images/b88213474663965f5873b1e1365e7b04fc7dce21e59b322414f537b6023f63db.jpg

Key Observations:

  • 1/2-star rated orders took longer to process

From Approval to Carrier Time#

pb.configure(
    df = df_sales
    , metric = 'from_approved_to_carrier_days'
    , metric_label = 'Average Order Approval to Carrier Time, days'
    , metric_label_for_distribution = 'Order Approval to Carrier Time, days'
    , agg_func = 'mean'
    , title_base = 'Average Order Approval to Carrier Time and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}
)

Top Orders

pb.metric_top()
from_approved_to_carrier_days
order_id
da81fbc27b55e0f3d2813cf2078dc780 125.76
97f48024fcc76f1898e397ad6966e3a0 107.05
8b7fd198ad184563c231653673e75a7f 101.36
866314550f6d7a55c82917d9b4463e1f 64.57
a4a57f1ffa25b90dea9f150fee89db84 61.15
2805499c211b52dfc1e64a1349ef45e2 55.94
7d86c4aa9e59504b23f16c7ca68954a7 54.91
840d7cc10efae8ba460cc8ea84f1b6db 49.59
7141a7eee8944cb711fa1fd4f76300bc 49.34
bc9c2fb123725ad0d99fb76888543245 49.16

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(from_approved_to_carrier_days='Order Approval to Carrier Time, days')
    , title='Distribution of Order Approval to Carrier Time'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "from_approved_to_carrier_days" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.31k (99%) Max 125.76 Mean 2.81 1.85 1.35k (1%)
Missing 36 (<1%) 99% 16.56 Trimmed Mean (10%) 2.20 0.74 5 (<1%)
Distinct 85.11k (88%) 95% 7.99 Mode 1.85 0.94 5 (<1%)
Non-Duplicate 76.26k (79%) 75% 3.55 Range 125.76 0.82 5 (<1%)
Duplicates 11.23k (12%) 50% 1.85 IQR 2.64 0.91 5 (<1%)
Dup. Values 8.85k (9%) 25% 0.91 Std 3.38 0.88 4 (<1%)
Zeros --- 5% 0.32 MAD 1.64 0.79 4 (<1%)
Negative --- 1% 0.08 Kurt 65.14 2.21 4 (<1%)
Memory Usage 1 Min 0.00 Skew 5.14 0.93 4 (<1%)
../../_images/37020d394f5381174ce11d05b0b4ffa07ed35a29b016e94efbf56662301fd889.jpg

Key Observations:

  • 75% of orders transfer to carrier within ≤3.5 days

  • Top 5% take ≥8 days

Let’s look by different dimensions.

By Day of Week

pb.histogram(
    color='purchase_weekday'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='purchase_weekday'
    , show_count=True
    , to_slide=True
).show()
../../_images/efe0162b49122058df5f12b81e8192dc2d781d6f2f1f96434d2b03cf08e08fc7.jpg ../../_images/dbe2932dca3e8746227347c88ecb19fb16e356990647e238c794304c0d22e870.jpg

Key Observations:

  • Friday/Saturday orders take longest to transfer to carrier

By Time of Day

pb.histogram(
    color='purchase_time_of_day'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='purchase_time_of_day'
    , show_count=True
).show()
../../_images/3783ab3bd8b9c68edcb7c41a0c4a6a8718582937f9e356a2f6e7406e1a832976.jpg ../../_images/f8e684baf2a2cc92a69d88adcf0cc3d31bd8ab89c239013c56465a3f1427040b.jpg

Key Observations:

  • Morning orders transfer fastest to carrier

By Payment Category

pb.histogram(
    color='order_total_payment_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_total_payment_cat'
    , show_count=True
    , to_slide=True
).show()
../../_images/ad165e1b0d1b96424feef616a8c145eac39f277464011d0aecb65c324ecba104.jpg ../../_images/c54136cfb8f4b3999a92d94c1a532e6df061a5d6915ce153db29cacd2afac8ba.jpg

Key Observations:

  • Expensive orders take longer to transfer to carrier

By Order Weight Category

pb.histogram(
    color='order_total_weight_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_total_weight_cat'
    , show_count=True
    , to_slide=True
).show()
../../_images/c6d4d379baf659943574a2bbe3d3e8583de78a4774b451224033401d9cd40d0a.jpg ../../_images/d1eb34c4d1b31f132fa9b2c5cef0a9f1d266f78c9a14455123a94b1135fedabf.jpg

Key Observations:

  • Heavy orders take longer to transfer to carrier

By Presence of Installment Payments

pb.histogram(
    color='order_has_installment'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_has_installment'
    , show_count=True
).show()
../../_images/2baaff8d1fc7ba4daa88ea9ed79e873b22c25368a08c6b67209d57f89d0d67a2.jpg ../../_images/8197439ef30903bc60dcdf142d5e3b11272139096d107d73898c8d88172821e2.jpg

Key Observations:

  • Installment orders take slightly longer to transfer to carrier

By Review Score

pb.histogram(
    color='order_avg_reviews_score'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_avg_reviews_score'
    , show_count=True
).show()
../../_images/06ba35849d0bbb67bc06002db60be581403656478c61fb590e01d125017c7373.jpg ../../_images/38331d3cc722c20659bb46a0e0c8879f451ca5c0905f74a324fdd7d83046c1ba.jpg

Key Observations:

  • Faster carrier transfer correlates with higher ratings

Carrier Delivery Time#

pb.configure(
    df = df_sales
    , metric = 'from_carrier_to_customer_days'
    , metric_label = 'Average Delivery Time from Carrier, days'
    , metric_label_for_distribution = 'Delivery Time from Carrier, days'
    , agg_func = 'mean'
    , title_base = 'Average Delivery Time from Carrier and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}
)

Top Orders

pb.metric_top()
from_carrier_to_customer_days
order_id
1b3190b2dfa9d789e1f14c05b647a14a 205.19
ca07593549f1816d26a572e06dc1eab6 195.03
285ab9426d6982034523a855f55a885e 194.21
2fb597c2f772eca01b1f5c561bf6cc7b 190.24
440d0d17af552815d15a9e41abe49359 188.09
2d7561026d542c8dbd8f0daeadf67a43 187.21
2fe324febf907e3ea3f2aa9650869fa5 186.40
437222e3fd1b07396f1d9ba8c15fba59 182.34
c27815f7e3dd0b926b58552628481575 182.10
dfe5f68118c2576143240b8d78e5940a 181.99

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(from_carrier_to_customer_days='Delivery Time from Carrier, days')
    , title='Distribution of Delivery Time from Carrier'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "from_carrier_to_customer_days" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.21k (99%) Max 205.19 Mean 9.34 7.10 20 (<1%)
Missing 135 (<1%) 99% 41.00 Trimmed Mean (10%) 7.97 0 8 (<1%)
Distinct 92.06k (96%) 95% 24.23 Mode 7.10 7.10 8 (<1%)
Non-Duplicate 88.10k (91%) 75% 12.04 Range 205.19 7.07 5 (<1%)
Duplicates 4.29k (4%) 50% 7.10 IQR 7.92 1.23 4 (<1%)
Dup. Values 3.96k (4%) 25% 4.11 Std 8.76 7.36 4 (<1%)
Zeros 8 (<1%) 5% 1.13 MAD 5.59 8.12 4 (<1%)
Negative --- 1% 0.82 Kurt 53.10 4.98 4 (<1%)
Memory Usage 1 Min 0 Skew 4.55 1.02 4 (<1%)
../../_images/b2d6d9638302933d6832a7ee40cac593a9854524603e589ded9a6072aab3c0a5.jpg

Key Observations:

  • Median carrier delivery time: ≥7 days

  • 25% take ≥12 days

  • 5% take ≥24 days

Let’s look by different dimensions.

By Payment Category

pb.histogram(
    color='order_total_payment_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_total_payment_cat'
    , show_count=True
    , to_slide=True
).show()
../../_images/e931ff8350d977e7bba3eded11aad4066c12e7ed4108f2077625e9b4743e8f0d.jpg ../../_images/54a80f3606b2312854cb53adedab85bc4f502bd8970378069c99afa4786592a8.jpg

Key Observations:

  • Cheap items deliver fastest via carrier

By Order Weight Category

pb.histogram(
    color='order_total_weight_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_total_weight_cat'
    , show_count=True
).show()
../../_images/7bf7b53552e3e0db1a0c76a793acd3037bad04d543a33ebc1dd0e931262803cd.jpg ../../_images/cf19f697725fef7627f74a408e59c9d0236744178d4254d77cb4221599bf15db.jpg

Key Observations:

  • Light items deliver slightly faster via carrier

By Presence of Installment Payments

pb.histogram(
    color='order_has_installment'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_has_installment'
    , show_count=True
).show()
../../_images/7c463d8202ab3c1ea584d1af0b7c3e41d1b8f5ce73e2033ec7665aa796e86400.jpg ../../_images/a741c9950bb31b14abbf20d5736c2e6d3eaf73817afc0ad29e105004bedacbbb.jpg

Key Observations:

  • Installment orders take slightly longer via carrier

By Review Score

pb.histogram(
    color='order_avg_reviews_score'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_avg_reviews_score'
    , show_count=True
).show()
../../_images/051f3ef4faa2bb9e40574b67d7aea74e5d6fc26ed5f4d3b1e46b42fa2b1603ee.jpg ../../_images/38e7359fc017be2c4019aa1fcba8e56620d1e2ba9c8d3d063df0b21e42d9135e.jpg

Key Observations:

  • Longer carrier delivery times correlate with lower ratings

By Top Customer States

pb.box(
    y='customer_state'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_state'
    , show_count=True
    , to_slide=True
).show()
../../_images/f8d6fac7bcf5ca3e50b775b1a99c50f69683fff84f3edaa569505dc02a7173e5.jpg ../../_images/93971624de03f1f8c8f0e439be62518ae8d0194cbe6e57a8e730cc8e96bf73b6.jpg
  • Among top states by sales volume, top 3 states with longest carrier delivery:

    1. Pará

    2. Maranhão

    3. Ceará

By Top Customer Cities

pb.box(
    y='customer_city'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_city'
    , show_count=True
    , to_slide=True
).show()
../../_images/6dae55a4d81806535c5b84bbb78d3e034241af8c494a4b40ab7f901d1c36c3ca.jpg ../../_images/61b5a53b994965ffb7cdb65d57ce94d9b8b7cac9d30d65b22e4d10d28ee02ccf.jpg

Key Observations:

  • Among top cities by sales volume, top 3 cities with longest carrier delivery:

    1. Salvador

    2. Porto Alegre

    3. Rio de Janeiro

Carrier Handoff Delay#

pb.configure(
    df = df_sales
    , metric = 'avg_carrier_delivery_delay_days'
    , metric_label = 'Average Carrier Delivery Delay, days'
    , metric_label_for_distribution = 'Carrier Delivery Delay, days'
    , agg_func = 'mean'
    , title_base = 'Average Carrier Delivery Delay and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}
)

Top Orders

pb.metric_top()
avg_carrier_delivery_delay_days
order_id
da81fbc27b55e0f3d2813cf2078dc780 116.76
8b7fd198ad184563c231653673e75a7f 95.36
97f48024fcc76f1898e397ad6966e3a0 91.05
866314550f6d7a55c82917d9b4463e1f 59.57
2805499c211b52dfc1e64a1349ef45e2 51.94
7d86c4aa9e59504b23f16c7ca68954a7 48.92
5d6e9993ecc20a59e637ce711858d081 45.91
abbbf52551bc34cd52a7851c06dfca90 45.43
d8734ba226623cf1c86b3cce8cbffa78 43.03
00e054d0da011d5016f31011af488f4f 42.90

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    lower_quantile=0.01
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'    
)
Summary Statistics for "avg_carrier_delivery_delay_days" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.31k (99%) Max 116.76 Mean -3.36 -1.14 4 (<1%)
Missing 36 (<1%) 99% 7.02 Trimmed Mean (10%) -3.38 -5.70 4 (<1%)
Distinct 90.02k (93%) 95% 0.79 Mode Multiple -3.05 4 (<1%)
Non-Duplicate 84.06k (87%) 75% -1.60 Range 1.16k -5.03 4 (<1%)
Duplicates 6.33k (7%) 50% -3.24 IQR 3.62 -2.18 4 (<1%)
Dup. Values 5.96k (6%) 25% -5.22 Std 4.99 -2.96 4 (<1%)
Zeros --- 5% -7.62 MAD 2.77 -1.77 4 (<1%)
Negative 87.72k (91%) 1% -12.79 Kurt 19.89k -5.45 4 (<1%)
Memory Usage 1 Min -1.05k Skew -94.98 -5.04 4 (<1%)
../../_images/4c1038bbdbc346105a069ecb6c8f0e117a40108328e1e262ac1b8372ee41adc5.jpg

Key Observations:

  • 75% of orders transfer to carrier ≥1.6 days early

  • Extreme early transfers due to data anomalies

  • 5% are ≥0.79 days late

  • 1% are ≥7 days late

Let’s look by different dimensions.

By Review Score

pb.bar_groupby(
    y='order_avg_reviews_score'
    , show_count=True
).show()
../../_images/2811fc43c1512f8f923ba30e2aba3c20d2d51aa85118901fa0556e710af2f1f4.jpg

Key Observations:

  • Earlier carrier transfer correlates with higher ratings