Sales Analysis#

Number of Sales#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date' 
    , metric = 'order_id'
    , metric_label = 'Share of Sales'
    , metric_label_for_distribution = 'Number of Sales'
    , agg_func = 'nunique'
    , norm_by='all'
    , axis_sort_order='descending'
    , text_auto='.1%'
    , update_fig={'xaxis': {'tickformat': '.0%'}}
)
print(f'Total number of sales: {df_sales.order_id.nunique()}')
Total number of sales: 96346

Let’s see at statistics and distribution of the metric.

pb.metric_info(freq='D')
Summary Statistics for "nunique_order_id_per_day" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 602 (100%) Max 1.15k Mean 160.04 140 10 (2%)
Missing --- 99% 362.98 Trimmed Mean (10%) 155.10 149 7 (1%)
Distinct 255 (42%) 95% 293 Mode 140 72 7 (1%)
Non-Duplicate 90 (15%) 75% 214.75 Range 1143 103 7 (1%)
Duplicates 347 (58%) 50% 148.50 IQR 114.50 66 6 (<1%)
Dup. Values 165 (27%) 25% 100.25 Std 88.83 138 6 (<1%)
Zeros --- 5% 45 MAD 82.28 252 6 (<1%)
Negative --- 1% 10.01 Kurt 24.43 188 6 (<1%)
Memory Usage <1 Mb Min 4 Skew 2.57 145 6 (<1%)
../../_images/eba622860d1bc0626feed49fe3f0ac9de357f817336e8aaf8143ba6bbe3dade2.jpg

Key Observations:

  • 75% of days had ≤215 orders

  • 5% had ≤45 orders

  • 5% had ≥293 orders

  • Several days exceeded 400 orders

Let’s look by different dimensions.

By Season

Since 2018 has incomplete monthly data, it’s better to also analyze by year…

pb.bar_groupby(
    x='purchase_season'
    , color='purchase_year'
)
../../_images/c6f032c2f9a76a4f9f6d09afa6571f6b10f45d46a213e113b65b28717d82c719.jpg

Key Observations:

  • Lowest sales in summer (both years)

  • Highest sales in autumn (2018)

By Time of Day

pb.bar_groupby(y='purchase_time_of_day')
../../_images/5c1832f4217fd225541314ae8b75815f906b8f290c8d60691d11662b9f4942de.jpg

Key Observations:

  • Sales by time of day:

    • Evening: 36% (peak)

    • Night: 9% (lowest)

    • Morning: 23%

    • Afternoon: 32%

By Day of Week

pb.bar_groupby(y='purchase_weekday')
../../_images/d5c0bb09996ff0fef5fad59722593efb0d3a9ac2c4147f59480d447eb080dc40.jpg

Key Observations:

  • Saturday: 11% (lowest)

  • Monday: 16% (highest)

By Weekday vs Weekend

pb.bar_groupby(y='purchase_day_type')
../../_images/091bcfd91e777682da1ab42cd729cc3cd7d49734dcf782c339f82b060aa1a448.jpg

Key Observations:

  • 77% of orders were placed on weekdays

By Day of the Week and Hour of the Day

fig = pb.heatmap(
    x='purchase_hour'
    , y='purchase_weekday'
    , labels={'color': 'Number of Sales'}
    , title='Number of Sales by Day of the Week and Hour of the Day'
).update_layout(xaxis_dtick=1, xaxis_tickformat=None)
pb.to_slide(fig)
fig.show()
../../_images/f5ee3b35f43c8ee71bd483d21ffc64fa67490d1fdf12a59e39dd515a56182a5d.jpg

Key Observations:

  • 1AM-8AM had lowest sales across all weekdays

By Review Score

pb.bar_groupby(y='order_avg_reviews_score')
../../_images/3af28a2bdb82b463e2091a557196b7164d2ba520e0d4591dbeb8b17423dce314.jpg

Key Observations:

  • Review score distribution:

    • 5 stars: 59%

    • 2 stars: 3% (lowest)

    • More 1-star than 2/3-star orders

By Whether the Order is Delayed or Not

pb.bar_groupby(y='is_delayed')
../../_images/d2f7c3b3d54238db92f147999fc9160cb62c5bf7aa8b4113c1cab5fabd1eb51c.jpg

Key Observations:

  • 92% of orders had no delivery delay

By Payment Category

pb.bar_groupby(y='order_total_payment_cat')
../../_images/f4882dce183393c3f6318d8cb3e0ebc9f51a1b060cdb7653c65877d30e75c5cf.jpg

Key Observations:

  • 63% of orders were medium-priced

By Order Weight Category

pb.bar_groupby(y='order_total_weight_cat')
../../_images/0c07543846a880c8e40f6ad850f330a77ee737f663d3c991b89de3f8219e0ea8.jpg

Key Observations:

  • Order weight distribution:

    • Medium: 46%

    • Light: 40%

By Delivery Time Category

pb.bar_groupby(y='delivery_time_days_cat')
../../_images/39b930d6816dc3f5e40a205ee4ac21a86e45e3ae52bc1cbe1cf2ad9deb4434f7.jpg

Key Observations:

  • 59% of orders had medium delivery time

By Presence of Installment Payments

pb.bar_groupby(y='order_has_installment')
../../_images/c7596ec2fa9baa3374eee5e55a2b297e4ec7cb3a47c6b6203f8b67b921c0fb76.jpg

Key Observations:

  • 51% of orders used installments

By Payment Type

pb.bar_groupby(y='order_payment_types')
../../_images/bdb43b4142ecf44a8f19d8b3dc715aa62fc0e9f0d1257efab64f348ac1689d45.jpg

Key Observations:

  • Payment methods:

    • Credit card: 75%

    • Boleto: 20%

By Product Category

pb.bar_groupby(
    y='order_product_categories'
    , text_auto=False
)
../../_images/110dc571b47113c190d43b7c8c2c5c289ef96c37edbdcd4853b7b04a9b8a1c3a.jpg

Key Observations:

  • Top 3 product categories:

    1. Bed Bath Table: 9%

    2. Health Beauty: 9%

    3. Sports Leisure: 8%

By Generalized Product Category

pb.bar_groupby(
    y='order_general_product_categories'
    , text_auto=False
)
../../_images/f99288a50c7a8fbc1afdc180ee47cb2bd5049e9415e799515773bebbf73ebda7.jpg

Key Observations:

  • Top 3 generalized categories:

    1. Electronics: 27%

    2. Furniture: 18%

    3. Home & Garden: 14%

By Top Customer States

pb.bar_groupby(y='customer_state', text_auto=False)
../../_images/d0fdf39992365503317f30165a3251928f1ae76b1f62c43dc7786a70fd1c1f03.jpg

Key Observations:

  • Sales by state:

    • São Paulo: 42%

    • Rio de Janeiro: 13%

    • Minas Gerais: 12%

    • Others: ≤6%

By Top Customer Cities

pb.bar_groupby(y='customer_city', text_auto=False)
../../_images/3144b800982af51da930eb4649ca7f448f8cfacb64ba5c77599de902d88c8113.jpg

Key Observations:

  • Sales by city:

    • São Paulo: 16%

    • Rio de Janeiro: 7%

    • Others: ≤3%

Sum of Sales#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , metric = 'total_payment'
    , metric_label = 'Sum of Sales, R$'
    , agg_func = 'sum'
    , title_base = 'Sum and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}
)
print(f'Total Sales Amount: {df_sales.total_payment.sum():,.2f}, R$')
Total Sales Amount: 15,399,404.87, R$

Let’s see at statistics and distribution of the metric.

pb.metric_info(freq='D')
Summary Statistics for "sum_total_payment_per_day" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 602 (100%) Max 175.25k Mean 25.58k 707.27 1 (<1%)
Missing --- 99% 57.35k Trimmed Mean (10%) 24.72k 27687.48 1 (<1%)
Distinct 602 (100%) 95% 48.90k Mode Multiple 39162.30 1 (<1%)
Non-Duplicate 602 (100%) 75% 33.76k Range 174.74k 35845.39 1 (<1%)
Duplicates --- 50% 23.46k IQR 18.59k 31531.31 1 (<1%)
Dup. Values --- 25% 15.17k Std 14.28k 34643.87 1 (<1%)
Zeros --- 5% 6.73k MAD 13.47k 24342.08 1 (<1%)
Negative --- 1% 1.57k Kurt 19.15 25377.01 1 (<1%)
Memory Usage <1 Mb Min 507.85 Skew 2.22 27569.29 1 (<1%)
../../_images/aae72c5f61e3a03045c9780fe8baf54afe4651babeb00eebf1fee8c4bb7e8842.jpg

Key Observations:

  • 75% of days had sales ≤33K R$

  • 5% of days had ≤6.7K R$

  • 5% of days had ≥49K R$

  • Several days exceeded 70K R$

Let’s look by different dimensions.

By Season

Since 2018 has incomplete monthly data, it’s better to also analyze by year…

pb.bar_groupby(
    x='purchase_season'
    , color='purchase_year'
    , title='Sum of Sales by Season and Year'
)
../../_images/f9e28dd735d9d7c21e6f5de8045a4474a2c216b7f1fefa84c5ac6a8464a43074.jpg

Key Observations:

  • Lowest sales revenue in summer (both years)

  • Highest revenue in autumn (2018)

By Time of Day

pb.bar_groupby(y='purchase_time_of_day', show_count=True, to_slide=True)
../../_images/833f23a00659eaae1771b3de8f00411ddaf4064ed4bb2ac5a16da5d54fe13b4a.jpg

Key Observations:

  • Highest sales volume and revenue in evenings

  • Lowest at night

By Day of Week

pb.bar_groupby(y='purchase_weekday', show_count=True, to_slide=True)
../../_images/a749439e3585a952e29d776f77718cefba2afe709edf54b917eeb0218572a45c.jpg

Key Observations:

  • Monday has highest sales volume/revenue

  • Saturday has lowest

By Weekday vs Weekend

pb.bar_groupby(y='purchase_day_type', show_count=True, to_slide=True)
../../_images/e621727ba569052cd47e118b44db671bb1501eddbf08ced44dfd873fa1ddb6a2.jpg

Key Observations:

  • Weekday sales/revenue significantly higher than weekends

By Day of the Week and Hour of the Day

fig = pb.heatmap(
    x='purchase_hour'
    , y='purchase_weekday'
    , text_auto='.3s'
    , labels={'color': 'Amount, R$'}
    , title='Sales Amount by Day of the Week and Hour of the Day'
).update_layout(xaxis_dtick=1)
pb.to_slide(fig)
fig.show()
../../_images/276424cf9411c93c86ebb4e884da4c7fd4fccf1d425fba41e060e16b58d4a406.jpg

Key Observations:

  • 1AM-8AM has lowest revenue across all weekdays

By Whether the Order is Delayed or Not

pb.bar_groupby(y='is_delayed', show_count=True)
../../_images/f777de0c263c0f1c043f4b32437608412c31ce9bb4f5a1236d6128565bb6c6eb.jpg

Key Observations:

  • Non-delayed orders have significantly higher sales/revenue

By Order Weight Category

pb.bar_groupby(y='order_total_weight_cat', show_count=True, to_slide=True)
../../_images/ee2c87c23b41b47477f0fa84ae558c4592b6b0a1d1b2db74f1251eacafb18976.jpg

Key Observations:

  • Medium-weight orders generate more revenue than heavy/light

  • Light orders have higher quantity share but lower revenue share

  • Heavy orders are more expensive

By Presence of Installment Payments

pb.bar_groupby(y='order_has_installment', show_count=True, to_slide=True)
../../_images/7627fedf0a8b79f745160ae4a41c7e4f3b69d1f9eae5db9340a84ed04602f7a8.jpg

Key Observations:

  • Installment orders generate significantly more revenue despite similar order counts

  • Installment enables more expensive purchases

By Top Customer States

pb.bar_groupby(y='customer_state', show_count=True, to_slide=True)
../../_images/a5d30550b34f036d94b32077c4318b515ea03e2bf574dfdd90aef121263e2798.jpg

Key Observations:

  • São Paulo state dominates sales volume/revenue

  • Rio de Janeiro and Minas Gerais rank 2nd/3rd

By Top Customer Cities

pb.bar_groupby(y='customer_city', show_count=True, to_slide=True)
../../_images/44834c1a7b3566097c3786c1f4a727237bfaf0102f8cd08784a0fc70012ca4e9.jpg

Key Observations:

  • São Paulo city leads in sales volume/revenue

  • Rio de Janeiro ranks second

By Review Score

pb.bar_groupby(y='order_avg_reviews_score', show_count=True, to_slide=True)
../../_images/0e3a15e01eebfa055c0a0917c3014e26cdd05ba8cb998addb144686ddad30038.jpg

Key Observations:

  • 5-star reviews have highest sales/revenue

  • 2-star reviews have lowest

  • 1-star reviews exceed 2/3-star in volume/revenue

By Payment Type

Since a single order can have multiple payments, we will measure transaction volume based on payment count.

pb.bar_groupby(
    y='order_payment_types'
    , show_count=True
    , to_slide=True
)
../../_images/9f59c453bda3833394b18ce5c678812b24df53cb3f1e89b5bbac2ea84f21e9e8.jpg

Key Observations:

  • Credit card leads payment methods (volume/revenue)

  • Boleto ranks second

By Product Category

For the category product split, we cannot take the payment amount. We will calculate the sum based on the product price and freight value.

The count will be determined by the number of items.

pb.bar_groupby(
    y='order_general_product_categories'
    , show_count=True
    , to_slide=True
)
../../_images/811c32973779f49e26cb41fd932bb110167d795aba492da9badec721100a3a70.jpg

Key Observations:

  • Electronics leads categories (volume/revenue)

  • Furniture ranks second

  • Furniture has smaller price gap in quantity vs revenue

Average Order Value#

pb.configure(
    df = df_sales
    , metric = 'total_payment'
    , metric_label = 'Average Order Value, R$'
    , metric_label_for_distribution = 'Order Value, R$'
    , agg_func = 'mean'
    , title_base = 'Average Order Value and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}    
)

Top Orders

pb.metric_top()
total_payment
order_id
03caa2c082116e1d31e67e9ae3700499 13,664.08
736e1922ae60d0d6a89247b851902527 7,274.88
0812eb902a67711a1cb742b3cdaa65ae 6,929.31
fefacc66af859508bf1a7934eab1e97f 6,922.21
f5136e38d1a14a4dbd87dff67da82701 6,726.66
2cc9089445046817a7539d90805e6e5a 6,081.54
a96610ab360d42a2e5335a3998b4718a 4,950.34
199af31afc78c699f0dbf71fb178d4d4 4,764.34
8dbc85d1447242f3b127dda390d56e19 4,681.78
426a9742b533fc6fed17d1fd6d143d7e 4,513.32

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(total_payment='Order Value, R$')
    , title='Distribution of Order Value'
    , upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "total_payment" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 13.66k Mean 159.83 77.57 250 (<1%)
Missing --- 99% 1.05k Trimmed Mean (10%) 119.85 35 164 (<1%)
Distinct 27.40k (28%) 95% 445.84 Mode 77.57 73.34 161 (<1%)
Non-Duplicate 12.82k (13%) 75% 176.26 Range 13.65k 116.94 131 (<1%)
Duplicates 68.94k (72%) 50% 105.28 IQR 114.36 56.78 118 (<1%)
Dup. Values 14.59k (15%) 25% 61.89 Std 218.93 107.78 118 (<1%)
Zeros --- 5% 32.38 MAD 76.37 65 112 (<1%)
Negative --- 1% 22.38 Kurt 249.10 86.15 106 (<1%)
Memory Usage 1 Min 9.59 Skew 9.37 99.90 105 (<1%)
../../_images/54ec789e70e0e1b3968f125a29b7616ad67c61cb6536c04043e9f6ae8d3a6580.jpg

Key Observations:

  • 75% of orders ≤177 R$

  • 5% ≤33 R$

  • 5% ≥445 R$

  • Many outliers >1000 R$

Let’s look by different dimensions.

By Season

Since 2018 has incomplete monthly data, it’s better to also analyze by year…

pb.bar_groupby(
    x='purchase_season'
    , color='purchase_year'
    , title='Average Order Value by Season and Year'
)
../../_images/47d5b3f8956a9921b6a7428149d9943c371d0022bc05f8c66ae96a2b009b42e8.jpg

Key Observations:

  • Summer 2017 had higher order values

  • Other seasons slightly higher in 2018

By Day of the Week and Hour of the Day

fig = pb.heatmap(
    x='purchase_hour'
    , y='purchase_weekday'
    , text_auto='.1f'
    , labels={'color': 'AOV, R$'}
    , title='Average Order Value by Day of the Week and Hour of the Day'
).update_layout(xaxis_dtick=1)
pb.to_slide(fig)
fig.show()
../../_images/a748c78d941053c762bacb142fd8d12b200089b1555807b4600c3c395a23ada6.jpg

Key Observations:

  • Nighttime doesn’t always have lowest average order value

  • Some weekday nights show value peaks

By Whether the Order is Delayed or Not

pb.histogram(
    color='is_delayed'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=True
    , show_kde=False
    , nbins=30
).show()
pb.bar_groupby(
    y='is_delayed'
    , show_count=True
).show()
../../_images/114775a8e379a04aa11917a380109ce4d33d61d22e140d3d0543d6f1f2c5f47c.jpg ../../_images/fdc1051faeb5058241530ba16f50a48976a0ba8e7df1800662e9cbe6dd1952c4.jpg

Key Observations:

  • Non-delayed orders have lower average values

By Order Weight Category

pb.histogram(
    color='order_total_weight_cat'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=True
    , show_kde=False
    , nbins=30
).show()
pb.bar_groupby(
    y='order_total_weight_cat'
    , show_count=True
).show()
../../_images/a1838c69925c4bd1931cb5adc23dad101ab45cb8d17b80844310d8e13c35cd82.jpg ../../_images/285493c4dd67c51c3b7bfd45423aefe4ace20caeaa29f22e0879a5a222b4f308.jpg

Key Observations:

  • Heavier orders have higher average values

By Presence of Installment Payments

pb.histogram(
    color='order_has_installment'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=True
    , show_kde=False
    , nbins=30
).show()
pb.bar_groupby(
    y='order_has_installment'
    , show_count=True
    , to_slide=True
).show()
../../_images/52a82e54afb5eaf109eaff09027033e9e095c63aaf5c9cfa1ab3ea4a2684f367.jpg ../../_images/e5f1c5c7441dcc3feafa75776b97ba838c8ff05e4c40bc00428f4e7def215821.jpg

Key Observations:

  • Installment orders have much higher average values

By Top Customer States

pb.box(
    y='customer_state'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_state'
    , show_count=True
    , to_slide=True
).show()
../../_images/d9d0c96182873a4af4e224644d162fb6791e200bf8334fb2964c5cb86eeb3bb4.jpg ../../_images/e3cc893dd07a910e9aa78228714ff8cdc5fed968fe4f2de9f817c0073329e939.jpg

Key Observations:

  • São Paulo has most orders but lowest average value among top states

  • Para has highest average order value

By Top Customer Cities

pb.box(
    y='customer_city'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_city'
    , show_count=True
    , to_slide=True
).show()
../../_images/70150b3f643346b6dcae66d158ef686c63777f249617ade73ba59714263ddadf.jpg ../../_images/2a04e96188926c6b4e929abddcbcff9181e18ecda108252a142354927c38ad60.jpg

Key Observations:

  • Rio de Janeiro combines high volume with high average value

  • Salvador has highest average order value among top cities

By Review Score

pb.histogram(
    color='order_avg_reviews_score'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
    , nbins=30
).show()
pb.bar_groupby(
    y='order_avg_reviews_score'
    , show_count=True
    , to_slide=True
).show()
../../_images/30492b46a53a4d9d2e6fb1a1b101111356e06cce18ecc81c880cd0fba64c7528.jpg ../../_images/4688d5380b0cb5d629938e7fdd5c9da834548c5b2ea75cce62369e1b8f9b727a.jpg

Key Observations:

  • 1-star reviews have highest order values

  • 2-star reviews rank second

  • Expensive orders receive more low ratings

Reviews Score#

pb.configure(
    df = df_sales
    , metric = 'order_avg_reviews_score'
    , metric_label = 'Average Order Reviews Score'
    , metric_label_for_distribution = 'Order Reviews Score'
    , agg_func = 'mean'
    , title_base = 'Average Order Reviews Score and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}        
)

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(order_avg_reviews_score='Order Reviews Score')
    , title='Distribution of Order Reviews Score'
    , xaxis_type='category'
)
Summary Statistics for "order_avg_reviews_score" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 5 Mean 4.14 5 56.65k (59%)
Missing --- 99% 5 Trimmed Mean (10%) 4.42 4 18.93k (20%)
Distinct 5 (<1%) 95% 5 Mode 5 1 9.71k (10%)
Non-Duplicate 0 (<1%) 75% 5 Range 4 3 8.03k (8%)
Duplicates 96.34k (99%) 50% 5 IQR 1 2 3.03k (3%)
Dup. Values 5 (<1%) 25% 4 Std 1.30
Zeros --- 5% 1 MAD 0
Negative --- 1% 1 Kurt 0.84
Memory Usage 1 Min 1 Skew -1.45
../../_images/261a212a2ad537a3bf6c3daa097d636287c3b87be84f46f80b83b89feee6fe56.jpg

Key Observations:

  • 59% of orders have 5-star reviews

Let’s look by different dimensions.

By Season

pb.bar_groupby(
    x='purchase_season'
    , color='purchase_year'
    , title='Average Order Reviews Score by Season and Year'
)
../../_images/4964ab6380a780d8289fa8144e0007b851d0179d3cd73498c9ed790518c826ea.jpg

Key Observations:

  • Winter 2018 had slightly higher ratings

  • Other seasons slightly higher in 2017

By Day of the Week and Hour of the Day

pb.heatmap(
    x='purchase_hour'
    , y='purchase_weekday'
    , text_auto='.1f'
    , title='Average Order Reviews Score by Day of the Week and Hour of the Day'
    , labels=dict(color = 'Score')
).update_layout(xaxis_dtick=1)
../../_images/a7634fc6714462f33afa4e33cac279c51ad22de1fe44c8493b56bceb56a8665c.jpg

Key Observations:

  • Nighttime shows rating extremes (especially Thursdays)

By Delivery Delay Status

pb.cat_compare(
    cat2='is_delayed'
    , visible_graphs=[2]
)
pb.bar_groupby(
    y='is_delayed'
    , show_count=True
    , to_slide=True
).show()
../../_images/405065e3f064927a26c6ef71fac33d2117fed4c0ca68b6caef4302b52627b1b4.jpg ../../_images/d17ff18bf2c346adc1348880e29b18b3195c1ffac8c0a31226629a7e66b69a28.jpg

Key Observations:

  • Non-delayed orders have significantly higher ratings

  • Higher 5-star share for on-time deliveries

  • “Unknown” delivery status orders mostly get 1-star

By Delivery Time Category

pb.cat_compare(
    cat2='delivery_time_days_cat'
    , visible_graphs=[2]
)
pb.bar_groupby(
    y='delivery_time_days_cat'
    , show_count=True
).show()
../../_images/434cff2067acb840519e1419acbbbe8033e076fd74d66fcf4a2b7fe8ca620d21.jpg ../../_images/c12727d72a9903b9f4d4409dcaa565f9bb04dfedd4e1be0a11b4d2ed58b41504.jpg

Key Observations:

  • Faster deliveries get better ratings

By Customer State

pb.cat_compare(
    cat2='customer_state'
    , visible_graphs=[2]
    , trim_top_n_cat2=7
)
fig = pb.bar_groupby(
    y='customer_state'
    , show_count=True
).update_layout(xaxis_domain=[0, 0.4], xaxis2_domain=[0.6, 1])
pb.to_slide(fig)
fig.show()
../../_images/7b22fcabb5b1e6d5e19e79a263003feca3e156d503c4bbaeb41ae0a80f12a6e3.jpg ../../_images/912ec4e091b12b095c10975c64688258833f6da44e973ea9df925bcf88a31841.jpg

Key Observations:

  • Maranhão has lowest average rating among top states

  • Rio de Janeiro and Bahia have highest 1-star share

By Customer City

pb.cat_compare(
    cat2='customer_city'
    , visible_graphs=[2]
    , trim_top_n_cat2=7
)
pb.bar_groupby(
    y='customer_city'
    , show_count=True
).update_layout(xaxis_domain=[0, 0.4], xaxis2_domain=[0.6, 1]).show()
../../_images/cdb7b402cf821cfcd5e5fa1d33ffa1bf3b6100560f357699188c3973ab210a04.jpg ../../_images/a96d27519de4124178eac75e56ba0b7ab35bccbdc67c53b9a6db5e25f98f907e.jpg

Key Observations:

  • Rio de Janeiro and Porto Alegre have notable 1-star concentrations

Order Weight#

pb.configure(
    df = df_sales
    , metric = 'total_weight_kg'
    , metric_label = 'Average Weight of Order, kg'
    , metric_label_for_distribution = 'Weight of Order, kg'
    , title_base = 'Average Weight of Order and Number of Sales'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}        
)

Top Orders

pb.metric_top()
total_weight_kg
order_id
9aec4e1ae90b23c7bf2d2b3bfafbd943 184.40
2455cbeb73fd04b170ca2504662f95ce 154.20
8f6f263a5e96515d5534199b74cb7748 144.30
be382a9e1ed25128148b97d6bfdb21af 129.34
cf4659487be50c0c317cff3564c4a840 112.20
53cbc02ffe278ca84b6f4920d9d3ecd5 108.50
60b570df9018c6b6441017314a2dd081 98.40
f60ce04ff8060152c83c7c97e246d6a8 97.00
1446ae966d68c3abad1ca3a3ce58033e 96.80
35990049382e07dba1a9ef3550cad655 93.90

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(total_weight_kg='Weight of Order, kg')
    , title='Distribution of Weight of Order'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "total_weight_kg" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 184.40 Mean 2.39 0.20 5.36k (6%)
Missing --- 99% 22.35 Trimmed Mean (10%) 1.30 0.15 4.16k (4%)
Distinct 1.48k (2%) 95% 10.50 Mode 0.20 0.30 3.69k (4%)
Non-Duplicate 384 (<1%) 75% 2.05 Range 184.40 0.25 3.69k (4%)
Duplicates 94.87k (98%) 50% 0.75 IQR 1.75 0.40 3.30k (3%)
Dup. Values 1.09k (1%) 25% 0.30 Std 4.77 0.10 2.87k (3%)
Zeros 5 (<1%) 5% 0.15 MAD 0.82 0.35 2.75k (3%)
Negative --- 1% 0.10 Kurt 96.51 0.50 2.36k (2%)
Memory Usage 1 Min 0 Skew 6.58 0.60 2.27k (2%)
../../_images/8d8edcd743a65f85a9c276854a97dcda76dff14811c1307d59b6664064dc94ef.jpg

Key Observations:

  • 75% of orders ≤2kg

  • 5% ≤150g

  • 5% ≥10kg

Let’s look by different dimensions.

By Season

Since 2018 has incomplete monthly data, it’s better to also analyze by year…

pb.bar_groupby(
    x='purchase_season'
    , color='purchase_year'
    , title='Average Weight of Order by Season and Year'
)
../../_images/cd1db3478ce77c15f7f76413ebe2a1ad623ad8e8a6bba23607b775ba8c8b6da3.jpg

Key Observations:

  • 2017 had heavier orders across all seasons

By Time of Day

pb.histogram(
    color='purchase_time_of_day'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='purchase_time_of_day'
    , show_count=True
).show()
../../_images/2bfba56670add7b9e33d093f3a6618b4db58419e194eb119788368c769f47d6b.jpg ../../_images/99204f195f988b1470c0afef86763aa958ec9986f9ad942bbcf5846dff5c0183.jpg

Key Observations:

  • Afternoons have heaviest orders

  • Nights have lightest

By Whether the Order is Delayed or Not

pb.histogram(
    color='is_delayed'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='is_delayed'
    , show_count=True
    , to_slide=True
).show()
../../_images/e0304697897015c994d0376df7191504a5b40b881f7b715a7d8ed09ce6dc9d3d.jpg ../../_images/85fddbcd6e74f34919d606488b087c65afa417fe03bfac7e1a17738464a03b33.jpg

Key Observations:

  • Delayed orders are heavier

By Presence of Installment Payments

pb.histogram(
    color='order_has_installment'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_has_installment'
    , show_count=True
    , to_slide=True
)
../../_images/08dc28d92a6bfe99394fdfe94f4dff68601c44875ba725b077bccf51effcbbd4.jpg ../../_images/dd909b893f9309e0a638dfe613bef60213f746642cf45f97557a84e3fa77dc90.jpg

Key Observations:

  • Installment orders are heavier

By Top Customer States

pb.box(
    y='customer_state'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_state'
    , show_count=True
    , to_slide=True
).show()
../../_images/3ca67e0650f2102ed0bc3609651cc0f12b195b3c845285bf70f6db69fb0b75e8.jpg ../../_images/bc9d10043962f405d00fd8015a021e1fa0ac3bb3730c42087486d35f1b98630e.jpg

Key Observations:

  • Mato Grosso has heaviest average orders among top states

By Top Customer Cities

pb.box(
    y='customer_city'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_city'
    , show_count=True
    , to_slide=True
)
../../_images/82b2e0692bda64ed9c6e374ab1f726d15b339fd037df97b2cea00eda1292bd7e.jpg ../../_images/0f1b3bef6b945338272d6607ae2707a0e805db4ad2e6d44b02f6335144468960.jpg

Key Observations:

  • Santos and Rio de Janeiro have heaviest average orders

By Review Score

pb.bar_groupby(y='order_avg_reviews_score', show_count=True, to_slide=True)
../../_images/14575f14a0bcbe932b8898302da889b501ea60881f06b8c6a85599fe328ab439.jpg

Key Observations:

  • 1-star reviews have significantly heavier orders

  • 2-star reviews rank second

  • Heavy orders receive lower ratings

Number of Products per Order#

pb.configure(
    df = df_sales
    , metric = 'products_cnt'
    , metric_label = 'Average Number of Products in Order'
    , metric_label_for_distribution = 'Number of Products in Order'
    , title_base = 'Number of Products in Order and Number of Sales'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}          
)

Top Orders

pb.metric_top()
products_cnt
order_id
8272b63d03f5f79c56e9e4120aec44ef 21.00
1b15974a0141d54e36626dca3fdc731a 20.00
ab14fdcfbe524636d65ee38360e22ce8 20.00
428a2f660dc84138d969ccd69a0ab6d5 15.00
9ef13efd6949e4573a18964dd1bbe7f5 15.00
9bdc4d4c71aa1de4606060929dee888c 14.00
73c8ab38f07dc94389065f7eba4f297a 14.00
37ee401157a3a0b28c9c6d0ed8c3b24b 13.00
2c2a19b5703863c908512d135aa6accc 12.00
af822dacd6f5cff7376413c03a388bb7 12.00

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(products_cnt='Number of Products in Order')
    , title='Distribution of Number of Products in Order'
)
Summary Statistics for "products_cnt" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 21 Mean 1.14 1 86.74k (90%)
Missing --- 99% 3 Trimmed Mean (10%) 1 2 7.38k (8%)
Distinct 17 (<1%) 95% 2 Mode 1 3 1.30k (1%)
Non-Duplicate 2 (<1%) 75% 1 Range 20 4 493 (<1%)
Duplicates 96.33k (99%) 50% 1 IQR 0 5 192 (<1%)
Dup. Values 15 (<1%) 25% 1 Std 0.54 6 189 (<1%)
Zeros --- 5% 1 MAD 0 7 22 (<1%)
Negative --- 1% 1 Kurt 116.86 8 8 (<1%)
Memory Usage 1 Min 1 Skew 7.57 10 8 (<1%)
../../_images/9ec76491beafc30ab05dcf50807ccf3624f8b6e568c864e1a4b9221e3655d156.jpg

Key Observations:

  • 90% of orders contain single product

  • Two anomalies had 20-21 products

Let’s look by different dimensions.

By Whether the Order is Delayed or Not

pb.bar_groupby(
    y='is_delayed'
    , show_count=True
).show()
../../_images/b7737ff986b554be8fbd5dcf19dc803b74f584662711657edf36c04b7e8d14ce.jpg

Key Observations:

  • Non-delayed orders have slightly more products

By Review Score

pb.bar_groupby(y='order_avg_reviews_score', show_count=True, to_slide=True)
../../_images/e0e9e2dbcd6242f1ce92ac616da73a55da777fc58a3275c3a8f1e752b4e88cf9.jpg

Key Observations:

  • 1/2-star reviews have more products per order

Product Price per Order#

pb.configure(
    df = df_sales
    , metric = 'avg_products_price'
    , metric_label = 'Average Product Price in Order, R$'
    , metric_label_for_distribution = 'Product Price in Order, R$'
    , agg_func = 'mean'
    , title_base = 'Average Product Price in Order and Number of Sales'
    , axis_sort_order='descending'
    , text_auto='.3s'
    , update_fig={'xaxis2': {'title_text': 'Number of Sales'}}         
)

Top Orders

pb.metric_top()
avg_products_price
order_id
0812eb902a67711a1cb742b3cdaa65ae 6,735.00
fefacc66af859508bf1a7934eab1e97f 6,729.00
f5136e38d1a14a4dbd87dff67da82701 6,499.00
a96610ab360d42a2e5335a3998b4718a 4,799.00
199af31afc78c699f0dbf71fb178d4d4 4,690.00
8dbc85d1447242f3b127dda390d56e19 4,590.00
426a9742b533fc6fed17d1fd6d143d7e 4,399.87
68101694e5c5dc7330c91e1bbc36214f 4,099.99
b239ca7cd485940b31882363b52e6674 4,059.00
86c4eab1571921a6a6e248ed312f5a5a 3,999.90

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(avg_products_price='Average Product Price in Order, R$')
    , title='Distribution of Average Product Price in Order'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)
Summary Statistics for "avg_products_price" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 6.74k Mean 125.24 59.90 1.96k (2%)
Missing --- 99% 899.90 Trimmed Mean (10%) 90.41 69.90 1.70k (2%)
Distinct 7.01k (7%) 95% 362.86 Mode 59.90 49.90 1.58k (2%)
Non-Duplicate 3.51k (4%) 75% 139.90 Range 6.73k 89.90 1.30k (1%)
Duplicates 89.34k (93%) 50% 79 IQR 97.95 99.90 1.23k (1%)
Dup. Values 3.50k (4%) 25% 41.95 Std 189.82 39.90 1.07k (1%)
Zeros --- 5% 18.49 MAD 63.90 29.90 1.04k (1%)
Negative --- 1% 10.99 Kurt 119.27 79.90 1.04k (1%)
Memory Usage 1 Min 0.85 Skew 7.87 19.90 990 (1%)
../../_images/d0c9ec4971a46a8f65082ad18ef838ed859939ae7ecba52954ecbddd1b145585.jpg

Key Observations:

  • 75% of orders have average product price ≤140 R$

  • 5% have ≥363 R$

Let’s look by different dimensions.

By Season

Since 2018 has incomplete monthly data, it’s better to also analyze by year…

pb.bar_groupby(
    x='purchase_season'
    , color='purchase_year'
    , title='Average Product Price in Order by Season and Year'
)
../../_images/c353d9d3911c76969f416f31fea2053bfc17c90d07d9490d59d1764ab58b5050.jpg

Key Observations:

  • Summer/fall 2017 had higher product prices

  • Winter 2018 was higher

By Time of Day

pb.histogram(
    color='purchase_time_of_day'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='purchase_time_of_day'
    , show_count=True
).show()
../../_images/2817db7f2bf77cac94c9753ab2b0c2788133a3a4bd7f1bfa5a2716e17b83afc4.jpg ../../_images/85e66e5f5adfcc5f8b9e59fc34f8f8b7524d5aa8f7d52874304d319a8c8bd37a.jpg

Key Observations:

  • Nighttime has lower product prices

By Whether the Order is Delayed or Not

pb.histogram(
    color='is_delayed'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='is_delayed'
    , show_count=True
).show()
../../_images/510abf68da353327e8e63002d419162207e476886270847878781ce1f2dd9d6b.jpg ../../_images/ceb9cfee0e024226637535f160987ed4e0291c81efdb956c476fa183b7a1d2b0.jpg

Key Observations:

  • Delayed orders have higher product prices

By Presence of Installment Payments

pb.histogram(
    color='order_has_installment'
    , upper_quantile=0.95
    , mode='dual_box_trim'
    , show_box=True
    , show_hist=False
    , show_kde=True
).show()
pb.bar_groupby(
    y='order_has_installment'
    , show_count=True
    , to_slide=True
).show()
../../_images/d9950670df11c5a053ebb490bb07b49b0298f16052aae3583fb55f5acb5e98c7.jpg ../../_images/1dedbee0538ab0b000082db54b011d5e5ad8581661485ffbe6ab783f03afcaa4.jpg

Key Observations:

  • Installment orders have significantly higher product prices

By Top Customer States

pb.box(
    y='customer_state'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_state'
    , show_count=True
).show()
../../_images/17a40ad1fd542b7741baf7d64d4aeec76b8383e45b516c69b1a77f9448877224.jpg ../../_images/f1eba4dd35228c656a4f9921190ab893643d9b68c61c5f5e53124f125755a96c.jpg

Key Observations:

  • Para has highest average product price among top states

  • São Paulo has lowest

By Top Customer Cities

pb.box(
    y='customer_city'
    , upper_quantile=0.95
    , show_dual=True
).show()
pb.bar_groupby(
    y='customer_city'
    , show_count=True
).show()
../../_images/67cdd6bb326de6838a76a74cc2352743bdcf4251abc32d63a1b52e4c93943fd5.jpg ../../_images/3220932d607163607b98a6017fe029eab9e99874dfaf8bc287bd2519fd9b7580.jpg

Key Observations:

  • Brasília, Rio de Janeiro and Salvador have highest product prices among top cities.

Number of Sellers per Order#

pb.configure(
    df = df_sales
    , metric = 'sellers_cnt'
    , metric_label = 'Average Number of Sellers in Order'
    , agg_func = 'mean'
    , axis_sort_order='descending'
)

Top Orders

pb.metric_top()
sellers_cnt
order_id
1c11d0f4353b31ac3417fbfa5f0f2a8a 5.00
cf5c8d9f52807cb2d2f0a0ff54c478da 5.00
91be51c856a90d7efe86cf9d082d6ae3 4.00
8c2b13adf3f377c8f2b06b04321b0925 4.00
1d23106803c48c391366ff224513fb7f 4.00
338ffe54b65a3b8124c81ebe6d1cc4b0 3.00
a8c403407745aac434946d5058faa6a6 3.00
ffb8f7de8940249a3221252818937ecb 3.00
30bdf3d824d824610a49887486debcaf 3.00
3040863957c9336e7389512584639bb5 3.00

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(sellers_cnt='Number of Sellers in Order')
    , title='Distribution of Number of Sellers in Order'
    , xaxis_type='category'
)
Summary Statistics for "sellers_cnt" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 5 Mean 1.01 1 95.07k (99%)
Missing --- 99% 2 Trimmed Mean (10%) 1 2 1.21k (1%)
Distinct 5 (<1%) 95% 1 Mode 1 3 54 (<1%)
Non-Duplicate 0 (<1%) 75% 1 Range 4 4 3 (<1%)
Duplicates 96.34k (99%) 50% 1 IQR 0 5 2 (<1%)
Dup. Values 5 (<1%) 25% 1 Std 0.12
Zeros --- 5% 1 MAD 0
Negative --- 1% 1 Kurt 118.85
Memory Usage 1 Min 1 Skew 9.88
../../_images/a9ab3efb53d7f922d0c908afb43b42b168a859ae40c1a5561d75ae0013543153.jpg

Key Observations:

  • 99% of orders have single seller

Number of Categories per Order#

pb.configure(
    df = df_sales
    , metric = 'product_categories_cnt'
    , metric_label = 'Average Number of Categories in Order'
    , agg_func = 'mean'
    , axis_sort_order='descending'
)

Top Orders

pb.metric_top()
product_categories_cnt
order_id
8c2b13adf3f377c8f2b06b04321b0925 3.00
e8c92cfd87f5f0c6d2fc5bc1df5f02b4 3.00
1fcbc88015c88c1a14d4b8ec35ea8ed7 3.00
91be51c856a90d7efe86cf9d082d6ae3 3.00
4ca4a1922b582950b25cce6e7ef34315 3.00
2f8f31eb2f7b6572836d662a6625c8e4 3.00
ceb35b18f6b84c1c75f02859ce3160d9 3.00
cbb7694680a105281d391bf7002c0477 3.00
6616fa4c89b8bf2a7e17271cdc542fca 3.00
d4bec1a24c97bd17be18d77297a0f6a0 3.00

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(product_categories_cnt='Number of Categories in Order')
    , title='Distribution of Number of Categories in Order'
    , xaxis_type='category'
)
Summary Statistics for "product_categories_cnt" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 96.35k (100%) Max 3 Mean 1.01 1 95.57k (99%)
Missing --- 99% 1 Trimmed Mean (10%) 1 2 760 (<1%)
Distinct 3 (<1%) 95% 1 Mode 1 3 18 (<1%)
Non-Duplicate 0 (<1%) 75% 1 Range 2
Duplicates 96.34k (99%) 50% 1 IQR 0
Dup. Values 3 (<1%) 25% 1 Std 0.09
Zeros --- 5% 1 MAD 0
Negative --- 1% 1 Kurt 141.03
Memory Usage 1 Min 1 Skew 11.56
../../_images/da08b731eeada8d71b3b6697024f0297366774f6bf69c1c55cb0bc07bd5e5ad1.jpg

Key Observations:

  • 99% of orders have single category