Time Series Analysis#

Order Volume vs. Successful Purchases#

Order Status Distribution

First, we examine order counts by status. Note that for purchase analysis, we’ll focus solely on delivered orders since other statuses don’t represent completed purchases.

labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , order_id = 'Number of Orders'
    , order_status = 'Order Status'
))
df_orders.viz.line(
    x=labels.index[0]
    , y=labels.index[1]
    , color='order_status'
    , agg_func='nunique'
    , freq='ME'
    , labels=labels
    , width=900
    , title='Number of Orders by Order Status and Month'
)
../../_images/bd4bc6a63b6896bf74a4fe89fab982c9ad6500c545a4ae0ca068d28954f68dbc.jpg

Key Observations:

  • Delivered orders consistently dominated all other statuses throughout the period.

Let’s look without delivered.

labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , order_id = 'Number of Orders'
    , order_status = 'Order Status'
))
df_orders[lambda x: x.order_status != 'Delivered'].viz.line(
    x=labels.index[0]
    , y=labels.index[1]
    , color='order_status'
    , agg_func='nunique'
    , freq='ME'
    , labels=labels
    , width=900
    , markers=True
    , title='Number of Orders by Order Status and Month (Without Delivered)'
)
../../_images/9a36c5abab69a43ecb9dcbf012b881a79919c73c61abc30ff43fb27ab9adb8b5.jpg

Key Observations:

  • March-April 2018 saw a sharp spike in orders stuck in “shipped” status

  • February and August 2018 peaks in “canceled” status

  • November 2017 “unavailable” peak coincided with Black Friday promotions

  • As identified during anomaly detection, most of these orders ultimately weren’t delivered for various reasons

Let’s compare the number of orders and sales.

tmp_df_orders_resampled = (
    df_orders.resample('ME', on='order_purchase_dt')['order_id']
    .nunique()
    .reset_index(name='Orders')
)
tmp_tmp_df_resampled = (
    df_sales.resample('ME', on='order_purchase_dt')['order_id']
    .nunique()
    .reset_index(name='Sales')
    .merge(tmp_df_orders_resampled, on='order_purchase_dt', how='inner')
)
tmp_tmp_df_resampled['sale_share'] = tmp_tmp_df_resampled['Sales'] / tmp_tmp_df_resampled['Orders']
tmp_tmp_df_resampled_melted = tmp_tmp_df_resampled.melt(id_vars='order_purchase_dt', value_vars=['Orders', 'Sales'], var_name='order_or_sale', value_name='count')

By month

labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , count = 'Number of Orders'
    , order_or_sale = 'Orders or Sales'
))
tmp_tmp_df_resampled_melted.viz.line(
    x=labels.index[0]
    , y=labels.index[1]
    , color='order_or_sale'
    , labels=labels
    , category_orders=dict(order_or_sale=['Orders', 'Sales'])
    , title='Number of Orders and Sales by Month'
)
../../_images/1a836ebce50a1c3398d37a11922c994cad9abe18396a4dd427dbcbbdafac13b1.jpg

Key Observations:

  • Stable purchase-to-order ratio maintained month-over-month

Let’s look at conversion to success sale

labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , sale_share = 'Conversion to Sale'
))
fig = tmp_tmp_df_resampled.viz.line(
    x=labels.index[0]
    , y=labels.index[1]
    , labels=labels
    , title='Conversion to Success Sale by Month'
)
pb.to_slide(fig)
fig.show()
../../_images/7ba9901f6f0d479d1d79aa348e356c5f2fe1f03c82f0e6671763c31417705e98.jpg

Key Observations:

  • Consistent month-over-month improvement in successful purchase conversion

Share of Canceled Orders#

pb.configure(
    time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'target_share'
    , metric_label = 'Share of Canceled Orders'
    , freq='ME'
)

By month

tmp_tmp_df_res = df_orders['order_status'].preproc.calc_target_category_share(
    target_category='Canceled'
    , group_columns=['order_purchase_dt']
    , resample_freq = 'ME'
)
pb.line(
    data_frame=tmp_tmp_df_res
    , to_slide=True
)
../../_images/525c7605348b8859cd14bf6b8995ce366f209fd5472ede7cf9ce2a45666ac19c.jpg

Key Observations:

  • Fluctuated between 0.2% and 1.2% across months

By Review Score

tmp_tmp_df_res = df_orders['order_status'].preproc.calc_target_category_share(
     target_category='Canceled'
    , group_columns=['order_purchase_dt', 'order_avg_reviews_score']
    , resample_freq = 'ME'
)
pb.line(
    data_frame=tmp_tmp_df_res
    , color='order_avg_reviews_score'
    , to_slide=True
)
../../_images/edc1eb61336c55cc09918e778932149b82e69bbfcc82a181ee93dc13b05c940a.jpg

Key Observations:

  • Orders with score 1 showed significantly higher cancelation rates

  • Scores 4-5 maintained 0% cancelation rates for most months

del tmp_tmp_df_res

Number of Sales#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'order_id'
    , metric_label = 'Number of Sales'
    , agg_func = 'nunique'
    , freq = 'ME'
)

By Day and Month

for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/676477e0a9d05af735aea6618cd61170ee7a3126917debf019b0bd6b2109cafe.jpg ../../_images/9d4d8c6f3006f3e7ba117c1de785e0dd4e5d11b85e3c9f8a0d2b7c122daf01bb.jpg

Key Observations:

  • There was an anomalous peak in sales volume on November 24, 2017 - Black Friday

  • The number of sales steadily increased until November 2017, reaching a peak of approximately 7.3k orders per month. After that, sales stabilized and fluctuated in the range of 6-7k orders per month.

Monthly Growth

pb.period_change(
    period='mom'
    , to_slide=True
)
../../_images/3e149b5b243a641c6a3e6b98958af5b72a5467d3a719d4c2184ec8a21c8192fd.jpg

Key Observations:

  • The most significant decline in sales (more than 5% month-over-month) was observed in April, June, and December 2017, as well as in February and June 2018.

  • Conversely, the most pronounced spikes in growth (exceeding 50%) were recorded in February, March, and May 2017, and in November 2018 (Black Friday).

  • This indicates strong demand volatility in specific months, possibly linked to seasonality or marketing activities.

By Time of Day

pb.line_resample(
    color='purchase_time_of_day'
    , to_slide=True
)
../../_images/caf15f7f21264208ccd7b50d4855a05cd0c386768cee8796393a4dee8137f65c.jpg

Key Observations:

  • Nighttime sales are on average lower than other times of day

  • The highest sales volume occurs in the evening

By Day of Week

Analyzing trends by month would be incorrect, as one month might have 4 Mondays while another has 5.

Therefore, we’ll examine weekly patterns.

pb.area_resample(color='purchase_weekday', freq='W', title='Number of Sales by Day of Week and Week')
../../_images/410dfcd093306e3e92444df19ce4a2dc59b1ea56d4548fe446a47da3651a2ef4.jpg
pb.heatmap(x=pd.Grouper(key=pb.time_column, freq='W')
        , y='purchase_weekday'
        , text_auto=False
        , title='Number of Sales by Day of Week and Week'
)
../../_images/6a4ca66841fcd7d3358c005727c46f8dabf7c37dba38aed3b0b0bbdcea0fddf0.jpg

Key Observations:

  • There’s no significant difference in sales by day of week, though weekends show slightly lower volumes

  • The peak value occurred on Black Friday

By Weekday vs Weekend

pb.line_resample(color='purchase_day_type')
../../_images/b97e44aae09b232c4d75b36f1b56233aa5551fc62fcd36e7659755f5618c6335.jpg

Key Observations:

  • Weekday sales grew faster than weekend sales

By Review Score

pb.line_resample(
    color='order_avg_reviews_score'
    , to_slide=True
)
../../_images/e3e882c2caaab61bd3dbc4260c6b332846308a971df09d75ee6ac17130d0d4b7.jpg

Key Observations:

  • Orders with 5-star reviews significantly outnumber others each month

  • Orders with 2-star reviews are consistently the least common

  • 5-star orders continued growing in 2018 despite overall sales plateauing, due to declining 1-star orders

By Delivery Delay Status

pb.line_resample(
    color='is_delayed'
    , to_slide=True
)
../../_images/7bbceb5c39cde7adb4d081b202db2e662a9d6edab6c4bbc59a045ec647ea39cc.jpg

Key Observations:

  • Delayed orders aren’t increasing proportionally with total orders - a positive trend

  • Peak months for delays: November 2017 (Black Friday) and March 2018

By Payment Category

pb.line_resample(color='order_total_payment_cat')
../../_images/97ee385a11e01e3bf275317b1351a3802dbbcc8b7cbaa2343674a7ea5f73c98f.jpg

Key Observations:

  • The overall sales trend remains consistent across cheap, medium, and expensive orders

By Order Weight Category

pb.line_resample(color='order_total_weight_cat')
../../_images/b9b00bb4cde58bffb702eb6df3d763258a7fb739cf30819351edfdd99d54845f.jpg

Key Observations:

  • The overall sales trend remains consistent across light, medium, and heavy orders

By Delivery Time Category

pb.line_resample(
    color='delivery_time_days_cat'
    , to_slide=True
)
../../_images/d1f808f53db25073ae75fdcca5788f3b22deb8939c8f5b49ff312b6551df1dd8.jpg

Key Observations:

  • November 2017 saw a sharp spike in long-delivery orders (likely Black Friday effect)

  • High volumes of long deliveries persisted until March 2018

  • By August 2018, long deliveries became less common than medium and fast ones

By Presence of Installment Payments

pb.line_resample(color='order_has_installment')
../../_images/25461af3278db8dbc24e5d3a82c2487012ef892569289f5f72f10a7d6d3b1f48.jpg

Key Observations:

  • Before 2018, non-installment orders lagged behind installment ones, but volumes equalized in 2018

By Top Customer States

pb.line_resample(
    color='customer_state'
    , to_slide=True
)
../../_images/adb7d6def83bc6f086afc2d132370f019007c42edcbf33232c8b7b26f2bb13ae.jpg

Key Observations:

  • São Paulo state consistently led in sales volume throughout the period

  • Unlike other states, São Paulo maintained stable monthly sales in 2018

  • Rio de Janeiro and Minas Gerais ranked second and third respectively

By Top Customer Cities

pb.line_resample(
    color='customer_city'
    , to_slide=True
)
../../_images/e7a81eab408c48098e841ea649775de63ac5ee3268cd0d4c7b5864f52c44f943.jpg

Key Observations:

  • São Paulo city consistently had the highest sales volume

  • Rio de Janeiro ranked second

  • Unlike other cities, São Paulo showed monthly sales growth in 2018

Sales Amount#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'total_payment'
    , metric_label = 'Sales Amount, R$'
    , metric_label_for_distribution = 'Order Value, R$'
    , title_base = 'Sales Amount'
    , agg_func = 'sum'
    , freq='ME'
)

By Day and Month

pb.box(mode='time_series', freq='M').show()
pb.box(mode='time_series', freq='M', upper_quantile=0.95).show()
for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/0bca4a3e8c4bf41eadf177478967d2078593f2282a7306400a995b6a65aade90.jpg ../../_images/2d72ac349f38c379c0501b4a336082b3788ba0527a36efa32a505eda60a9bb1a.jpg ../../_images/fdba9ab55d99b8cbaaa7c8c1a0d66955490b37ddffd04136993259fe1ab090ac.jpg ../../_images/26408813b8b00d8bae35789c39a30cc87554ce20facf07029478c13fb6c63f7c.jpg

Key Observations:

  • There was an anomalous peak in sales revenue on November 24, 2017 - Black Friday

  • Monthly sales revenue grew until 2018, then stabilized at 1-1.2 million R$ per month

Monthly Growth

pb.period_change(
    period='mom'
    , to_slide=True
)
../../_images/62d45e66b9c49abb63f436290e543ce95d3a3604fecc4e7ad29e2788df5f827f.jpg

Key Observations:

  • Sales revenue dropped by more than 5% month-over-month in April, June, and December 2017, and February and June 2018

By Time of Day

pb.line_resample(color='purchase_time_of_day', to_slide=True)
../../_images/13fca4eaf8dec230118a675bc5e55d1c71fa858ce5034e0f87f0826859b0c190.jpg

Key Observations:

  • Nighttime sales revenue is on average lower than other times

  • Highest revenue occurs in evenings and afternoons

By Day of Week

pb.area_resample(freq='W', color='purchase_weekday', title='Sales Amount by Day of the Week and Week')
../../_images/48c97ba3747a6e0cf296a2393829982fbf71044c0c1605d86fe94ea9144c09e1.jpg
pb.heatmap(x=pd.Grouper(key=pb.time_column, freq='W')
        , y='purchase_weekday'
        , text_auto=False
        , title='Sales Amount by Day of the Week and Week'
)
../../_images/4f29baa200cc955c3393906c0e8503a6b592f1667d4826a46920ae54bd696166.jpg

Key Observations:

  • No significant difference in revenue by day of week, though weekends are slightly lower

By Weekday vs Weekend

pb.line_resample(color='purchase_day_type')
../../_images/18afbe5ce9fe8b917bde095e2a4c6baa9feac66d3a8b1ef905805df07b6774bc.jpg

Key Observations:

  • Weekday revenue grew faster than weekend revenue

By Review Score

pb.line_resample(color='order_avg_reviews_score', to_slide=True)
../../_images/b60daa0e4d0b3603479590be3bdd467ca1fe03f5bd298f70cddd9e29e7a27bab.jpg

Key Observations:

  • Orders with 5-star reviews generate significantly more revenue each month

  • Orders with 2-star reviews consistently generate the least revenue

  • Black Friday saw the strongest revenue spikes for both 5-star and 1-star orders

  • 5-star order revenue continued growing in 2018 despite overall stagnation, partly due to declining 1-star order revenue

By Whether the Order is Delayed or Not

pb.line_resample(color='is_delayed', to_slide=True)
../../_images/f46d4d2cdb75cc8fbd8e8db13e7a9fdb9ca79204569a7b74c6c561e89c6aec89.jpg

Key Observations:

  • Peak months for delayed order revenue: November 2017 (Black Friday) and March 2018

By Payment Category

pb.line_resample(color='order_total_payment_cat')
../../_images/cd3cf64ef411d4660e662c886fb1b3842cfc178344b3a42a2082fc075e0e5b50.jpg

Key Observations:

  • The overall revenue trend remains consistent across cheap, medium, and expensive orders

By Order Weight Category

pb.line_resample(color='order_total_weight_cat')
../../_images/61fca8b67841fbf560b45f1802a8ee2927612832e0bd19bc93f45478eae22f6a.jpg

Key Observations:

  • The overall revenue trend remains consistent across light, medium, and heavy orders

By Delivery Time Category

pb.line_resample(color='delivery_time_days_cat', to_slide=True)
../../_images/3a3ccc90bd6a20c2d3e03489b7bfa44e67bd9995e56b614e6be6abf0dd14dcee.jpg

Key Observations:

  • November 2017 saw a sharp spike in revenue from long-delivery orders (likely Black Friday effect)

  • High volumes of long deliveries persisted until April 2018 (unexplained by Black Friday)

  • The subsequent sharp decline in long deliveries wasn’t matched by growth in other categories

  • This may explain why overall revenue stopped growing in 2018

By Presence of Installment Payments

pb.line_resample(color='order_has_installment', to_slide=True)
../../_images/d742ace525aff833ae40647ac8527f85e5f9657cbaeb11d0542fde9977493b0e.jpg

Key Observations:

  • Installment orders consistently generated higher revenue than non-installment orders

By Top Customer States

pb.line_resample(color='customer_state', to_slide=True)
../../_images/ff8d3de91286d0badd7c96e6c09b9c7fe655a2f17ee2526a686bcaa33f3b6245.jpg

Key Observations:

  • São Paulo state consistently led in sales revenue

  • Rio de Janeiro and Minas Gerais ranked second and third respectively

By Top Customer Cities

pb.line_resample(color='customer_city', to_slide=True)
../../_images/16d1e4dd958137c693964e60b12ad1a5acb419adf6fb7f1a3ffc81b387d5411e.jpg

Key Observations:

  • São Paulo city consistently generated the highest sales revenue

  • Rio de Janeiro ranked second

By Payment Type

pb.line_resample(color='order_payment_types', to_slide=True)
../../_images/581f746401f0e16630d7f62ca741add7d84fb85f8c8eeb97ac0568a2ed8cccf3.jpg

Key Observations:

  • Credit card payments consistently generated the highest revenue, with boleto second

  • Debit card payment revenue grew from June 2018

By Product Category

pb.line_resample(color='order_general_product_categories', to_slide=True)
../../_images/56002f59bf91000bb4dfd6b9163a9235454fbb13156d552b693562ce739d45cb.jpg

Key Observations:

  • Electronics consistently generated the highest revenue, followed by furniture

  • ‘Beauty/Health’ and ‘Home/Garden’ categories continued growing in 2018 while others slowed or declined

Average Order Value#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'total_payment'
    , metric_label = 'Average Order Value, R$'
    , metric_label_for_distribution = 'Order Value, R$'
    , title_base = 'Average Order Value'
    , agg_func = 'mean'
    , freq='ME'
)

By Day and Month

for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/ed1a1e2450ec41bee097200f5cc4b55f26ab531adf91575b1130d8be6aeaa05f.jpg ../../_images/1d56efad0013e5d0e9e4d16a826238563543a8f2f994f98355a13fec08a43146.jpg

Key Observations:

  • The average order value fluctuates between 100-250 R$ daily

  • Black Friday didn’t show a spike in average order value

  • Monthly average order value remains stable at 150-170 R$ without growth

Monthly Growth

pb.period_change(period='mom', to_slide=True)
../../_images/5578624a6a4319673f22a929abd74bb53b21cfff0c4636e439130040bdce4a2e.jpg

Key Observations:

  • Average order value dropped >5% month-over-month in May, July, November 2017 and August 2018

By Time of Day

pb.line_resample(color='purchase_time_of_day')
../../_images/53e47b81b6d55d398cedbc2bc636d139960e889f2439a24148015ebe2c44ed13.jpg

Key Observations:

  • No significant difference in average order value by time of day

  • Nighttime averages are slightly lower

By Whether the Order is Delayed or Not

pb.line_resample(color='is_delayed')
../../_images/f9b2ef777c1b855e7b4a164c887545621f6d993f911046d538cb2f78446bb6d3.jpg

Key Observations:

  • June 2017 saw a major peak in average value for delayed orders

  • Smaller peaks occurred in April/December 2017 and January/June 2018

By Order Weight Category

pb.line_resample(color='order_total_weight_cat')
../../_images/3424be83a16857a4d3559f5950c1cd4021089f5d36639183beca8b5615937555.jpg

Key Observations:

  • Heavy orders show more variability in average value over time

  • Consistently: heavy > medium > light order values

  • April-July 2017 saw significant decline in heavy order values, followed by fluctuating growth

By Delivery Time Category

pb.line_resample(color='delivery_time_days_cat', to_slide=True)
../../_images/71e15d295b65836649366e76bad8273f6c9e3890a8f50d0ecde6ea02a9a4d26f.jpg

Key Observations:

  • For most months, expensive items had longer delivery times

By Presence of Installment Payments

pb.line_resample(color='order_has_installment', to_slide=True)
../../_images/b4cc9225db3b9387659eeea8451f13fe22f906e3b22d89a62e2b86d837037a87.jpg

Key Observations:

  • Installment orders consistently show higher average values (logical as customers can afford more)

  • Installment order values fluctuate more over time

  • Pre-July 2017: steady decline in installment order values

  • Post-July 2017: fluctuating growth

By Review Score

pb.line_resample(color='order_avg_reviews_score', to_slide=True)
../../_images/26a6354c14c0d825294a8a9bfc3bce70e8d383a7251e9b59d6ea88c730c30197.jpg

Key Observations:

  • Orders with 1-star reviews typically had higher average values

  • Orders with 2-star reviews often had higher values than 3/4/5-star orders

  • Conclusion: lower ratings were more common for higher-value orders

By Top Customer Cities

pb.line_resample(color='customer_city')
../../_images/2ba00c99ff322812e9cf0def321c871c60a7b8761d0e47163dbd23bc05b0aa34.jpg

Key Observations:

  • February 2017 saw a major spike in Brasília’s average order value

  • Otherwise, top 5 cities show similar average values

Number of Purchases per Customer per Week#

tmp_df_res = (df_sales.groupby([pd.Grouper(key = 'order_purchase_dt', freq='ME'), 'customer_unique_id'])['order_id']
          .nunique()
          .reset_index(name='order_cnt')
)
tmp_df_res['weeks_in_month_cnt'] = tmp_df_res.order_purchase_dt.dt.days_in_month / 7
tmp_df_res['avg_orders_per_week'] = tmp_df_res['order_cnt'] / tmp_df_res['weeks_in_month_cnt']
tmp_df_res.drop(['order_cnt', 'weeks_in_month_cnt'], axis=1, inplace=True)
tmp_df_res.sort_values('avg_orders_per_week', ascending=False).head(10)
order_purchase_dt customer_unique_id avg_orders_per_week
62 2017-01-31 12f5d6e1cbf93dafd9dcc19095df0b3d 1.35
1722 2017-02-28 a239b8e2fbce33780f1f1912e2ee5275 1.00
51293 2018-02-28 3e43e6105506432c953e165fb2acf44c 1.00
54267 2018-02-28 b4e4f24de1e8725b74e4a1f4975116ed 1.00
24422 2017-09-30 b08fab27d47a1eb6deda07bfd965ad43 0.93
5199 2017-04-30 25a560b9a6006157838aab1bdbd68624 0.93
375 2017-01-31 83e7958a94bd7f74a9414d8782f87628 0.90
93685 2018-08-31 c8460e4251689ba205045f3ea17884a1 0.90
55050 2018-02-28 d3882d7abd0c66064d740d7ed04dd1ef 0.75
54211 2018-02-28 b2bd387fdc3cf05931f0f897d607dc88 0.75

Key Observations:

  • User ‘12f5d6e1cbf93dafd9dcc19095df0b3d’ had the highest weekly purchase frequency (January 2017)

labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , avg_orders_per_week = 'Average number of sales per week'
))
tmp_df_res.viz.line(
          x=labels.index[0]
          , y=labels.index[1]
          , agg_func='mean'
          , freq='ME'
          , labels=labels
          , title='Average number of sales per week by month'
)
../../_images/049238391b86497ac576e12f2d144257dae9ec515254680d33ff72e063ed41e4.jpg

Key Observations:

  • Average weekly purchases per customer remains stable at 0.23-0.25

ARPPU#

pb.configure(
    time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'arppu'
    , metric_label = 'ARPPU, R$'
    , title_base = 'ARPPU'
    , freq = 'ME'
)

By Day and Month

tmp_df_res = (
    df_sales.resample('D', on='order_purchase_dt')
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']
fig_days = pb.line(data_frame=tmp_df_res, title='ARPPU by day')
tmp_df_res = (
    df_sales.resample('ME', on='order_purchase_dt')
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']
fig_days.show()
pb.line(data_frame=tmp_df_res, to_slide=True)
../../_images/ffd91e8b2584a1b8b678b99ddf1b18370e6f1731bd87dcf91484eec15435e212.jpg ../../_images/96b67f65ca73bdf1e740a5aa1693048989d1a8c4ab953bb7697cc84c0662d400.jpg

Key Observations:

  • ARPPU fluctuates daily between 100-250 R$

  • No Black Friday spike in ARPPU

  • Monthly ARPPU remains stable at 130-150 R$

By Time of Day

tmp_df_res = (
    df_sales.groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'purchase_time_of_day'], observed=False)
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']
pb.line(data_frame=tmp_df_res, color='purchase_time_of_day')
../../_images/2296f49732621c0247b7d74d6b65b778ae0cf5b9afd330974f5afa3a69c76c64.jpg

Key Observations:

  • No significant ARPPU differences by time of day

  • Nighttime ARPPU slightly lower

By Review Score

tmp_df_res = (
    df_sales.groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'order_avg_reviews_score'], observed=False)
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']

Since the review score does not have a categorical type, we might lose groups with no data after grouping.

Let’s restore the dates.

tmp_df_res = tmp_df_res.preproc.restore_full_index(
    date_cols='order_purchase_dt'
    , group_cols='order_avg_reviews_score'
    , freq='ME'
)
pb.line(data_frame=tmp_df_res, color='order_avg_reviews_score', to_slide=True)
../../_images/445aa3419a3fa137384a3348374fd1fee6a70ba91d7fe72e2da5c33301c7788d.jpg

Key Observations:

  • 1-star review orders typically had higher ARPPU

  • 2-star orders often had higher ARPPU than 3/4/5-star orders

By Whether the Order is Delayed or Not

tmp_df_res = (
    df_sales.groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'is_delayed'], observed=False)
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']
pb.line(data_frame=tmp_df_res, color='is_delayed')
../../_images/339a7513647c2ea8f5c71ad8b0f6f5da3dd8cb34ca4bb4ae0384baea8496825d.jpg

Key Observations:

  • June 2017 saw major ARPPU peak for delayed orders

  • Smaller peaks in April/December 2017 and January/June 2018

By Order Weight Category

tmp_df_res = (
    df_sales.groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'order_total_weight_cat'], observed=False)
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']
pb.line(data_frame=tmp_df_res, color='order_total_weight_cat')
../../_images/f9fc16b9b7c353cab71b4f4d262a28c667594186d47d3331f9e41d2c61b33d02.jpg

Key Observations:

  • Heavy orders show more ARPPU variability

  • Consistently: heavy > medium > light order ARPPU

  • April-July 2017: significant heavy order ARPPU decline, then fluctuating growth

By Delivery Time Category

tmp_df_res = (
    df_sales.groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'delivery_time_days_cat'], observed=False)
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']
pb.line(data_frame=tmp_df_res, color='delivery_time_days_cat', to_slide=True)
../../_images/8956a9afb6657c52ff09e321d26ac8b89664a53b5968f43457a8ad7bb51adf70.jpg

Key Observations:

  • Higher ARPPU typically correlates with longer delivery times

By Presence of Installment Payments

tmp_df_res = (
    df_sales.groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'order_has_installment'], observed=False)
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']
pb.line(data_frame=tmp_df_res, color='order_has_installment', to_slide=True)
../../_images/0d5af834f6764e51ec5f06911667c96967de4c2f6ea2a9f5626eb76006f879ef.jpg

Key Observations:

  • Installment orders consistently show higher ARPPU

  • More ARPPU variability for installment orders

  • Pre-July 2017: steady ARPPU decline for installments

  • Post-July 2017: fluctuating growth

By Top Customer Cities

Since there are cities with very few sales, we will select the top 5 cities by sales volume.

top_cities = (
    df_sales.groupby('customer_city', observed=False)['order_id']
    .nunique()
    .nlargest(5)
    .index.tolist()
)
tmp_df_res = (
    df_sales[lambda x: x.customer_city.isin(top_cities)]
    .groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'customer_city'], observed=False)
    .agg({'customer_unique_id': 'nunique', 'total_products_price': 'sum'})
    .reset_index()
)
tmp_df_res['arppu'] = tmp_df_res['total_products_price'] / tmp_df_res['customer_unique_id']
pb.line(data_frame=tmp_df_res, color='customer_city')
../../_images/56b5c6110ae014cfe55d1faf0a845bbdb0e442cad41f5e849a57c81c48953d01.jpg

Key Observations:

  • February 2017: major ARPPU spike in Brasília

  • Otherwise minimal differences among top 5 cities

Number of Customers#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'customer_unique_id'
    , metric_label = 'Number of Customers'
    , agg_func = 'nunique'
    , freq='ME'
)

By Day and Month

for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/d5391932a4d14a57edd68057ab96d2147ddd432db052faf63762ccc35096eb70.jpg ../../_images/68101c50490fac06d7233d3ad0258a96d71bc6c324679611c88d10ccbcce6c42.jpg

Key Observations:

  • Black Friday (Nov 24, 2017) saw anomalous customer count peak

  • Customer growth continued until 2018, then stabilized at 6-7k monthly

Monthly Growth

pb.period_change(period='mom', to_slide=True)
../../_images/1094d42e43af89332c33f2eaef44fc0ba0a9f982fdd00042c8bb973bcd22ef64.jpg

Key Observations:

  • Customer counts dropped >5% month-over-month in April/June/December 2017 and February/June 2018

By Time of Day

pb.line_resample(color='purchase_time_of_day', to_slide=True)
../../_images/addf3834143df35f5399f6472378b8dd3677d17b33bc2e63e380ef44b56306ac.jpg

Key Observations:

  • Fewer customers at night

  • Evening sees most customer activity

By Day of Week

pb.area_resample(color='purchase_weekday', freq='W', title='Number of Sales by Day of Week and Week')
../../_images/c304a376458f08e27509dbe5108346c43176da7440532fa99c1b4bde6f9ff761.jpg
pb.heatmap(x=pd.Grouper(key=pb.time_column, freq='W')
        , y='purchase_weekday'
        , text_auto=False
        , title='Number of Sales by Day of Week and Week'
)
../../_images/a552150d94578695d75c5b7fdb60987acdc72536431ec3f5d60f3fbc21923a01.jpg

Key Observations:

  • Minimal weekday/weekend customer count differences

  • Weekends slightly lower

By Weekday vs Weekend

pb.line_resample(color='purchase_day_type')
../../_images/43ee3dd7a40f15ead5af61a9cca832f1e9031b01aec7bc96c63be133a1661f5a.jpg

Key Observations:

  • Weekday customer growth outpaced weekends

By Top Customer States

pb.line_resample(color='customer_state', to_slide=True)
../../_images/80c5bb7f0a0e5de0c0c970dec2a94fc905e9751b533e626ada1504bc9a8a662b.jpg

Key Observations:

  • São Paulo consistently led in customer counts

  • Unlike other states, maintained 2018 customer levels

  • Rio de Janeiro and Minas Gerais ranked 2nd/3rd

By Top Customer Cities

pb.line_resample(color='customer_city', to_slide=True)
../../_images/9d7a9edad24f00f7b54657ea977013094f07acf0e8faadf6e39146f807f4dc2e.jpg

Key Observations:

  • São Paulo city consistently had most customers

  • Rio de Janeiro ranked second

  • Only São Paulo showed 2018 monthly growth

Share of New Customers#

pb.configure(
    time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'new_customers_share'
    , metric_label = 'Share of New Customers'
    , freq='ME'
    , update_fig={'yaxis': {'tickformat': '.1%'}}
)

By Day and Month

Since one customer could make multiple orders at the same time, we need to remove such duplicates.

tmp_df_res = (df_sales[['order_purchase_dt', 'customer_unique_id', 'sale_is_customer_first_purchase']].drop_duplicates()
              .resample('D', on='order_purchase_dt')
              .agg(
                  new_customers_cnt=('sale_is_customer_first_purchase', 'sum'),
                  all_customers_cnt=('customer_unique_id', 'nunique')
              )
              .reset_index()
)
tmp_df_res['new_customers_share'] = tmp_df_res['new_customers_cnt'] / tmp_df_res['all_customers_cnt']
fig_days = pb.line(data_frame=tmp_df_res, title='Share of New Customers by Day')
tmp_df_res = (df_sales[['order_purchase_dt', 'customer_unique_id', 'sale_is_customer_first_purchase']].drop_duplicates()
              .resample('ME', on='order_purchase_dt')
              .agg(
                  new_customers_cnt=('sale_is_customer_first_purchase', 'sum'),
                  all_customers_cnt=('customer_unique_id', 'nunique')
              )
              .reset_index()
)
tmp_df_res['new_customers_share'] = tmp_df_res['new_customers_cnt'] / tmp_df_res['all_customers_cnt']
fig_days.show()
pb.line(data_frame=tmp_df_res, to_slide=True)
../../_images/0221c4c76f2d32737b69edba39008fbfcca57c4a8e143da17581d83590922a66.jpg ../../_images/eb3c3c76bda76ff0547e9f22c0f6a3bc912839d2db849af8bbca2485d7d77e5f.jpg

Key Observations:

  • Daily new customer share never fell below 92%

  • Monthly new customer share gradually declined (still >97%)

  • Nearly all active customers are new

By Weekday vs Weekend

tmp_df_res = (df_sales[['order_purchase_dt', 'customer_unique_id', 'sale_is_customer_first_purchase', 'purchase_day_type']].drop_duplicates()
              .groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'purchase_day_type'], observed=True)
               .agg(
                   new_customers_cnt=('sale_is_customer_first_purchase', 'sum'),
                   all_customers_cnt=('customer_unique_id', 'nunique')
               )
               .reset_index()
)
tmp_df_res['new_customers_share'] = tmp_df_res['new_customers_cnt'] / tmp_df_res['all_customers_cnt']
pb.line(data_frame=tmp_df_res, color='purchase_day_type')
../../_images/7b7aebdf6ac2170e68c49d310682970c933b6a2cd896b519cbec74e442afb71d.jpg

Key Observations:

  • Weekend new customer share fluctuates more than weekdays

  • Weekends typically have lower new customer share

By Top Customer States

top_states = (df_sales
              .groupby('customer_state', observed=False)['order_id']
              .nunique()
              .nlargest(5)
              .index.tolist()
)
tmp_df_res = (df_sales[lambda x: x.customer_state.isin(top_states)]
              [['order_purchase_dt', 'customer_unique_id', 'sale_is_customer_first_purchase', 'customer_state']].drop_duplicates()
              .groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'customer_state'], observed=False)
               .agg(
                   new_customers_cnt=('sale_is_customer_first_purchase', 'sum'),
                   all_customers_cnt=('customer_unique_id', 'nunique')
               )
               .reset_index()
)
tmp_df_res['new_customers_share'] = tmp_df_res['new_customers_cnt'] / tmp_df_res['all_customers_cnt']
pb.line(data_frame=tmp_df_res, color='customer_state')
../../_images/b65bd6d102a26bc4a51f2dc2dcdada7b3bbca21e7a8587bb2f50f916d5d6f07c.jpg

Key Observations:

  • Minimal state-level differences in new customer share

  • Rio Grande do Sul showed more variability

By Top Customer Cities

top_cities = (df_sales
              .groupby('customer_city', observed=False)['order_id']
              .nunique()
              .nlargest(5)
              .index.tolist()
)
tmp_df_res = (df_sales[lambda x: x.customer_city.isin(top_cities)]
              [['order_purchase_dt', 'customer_unique_id', 'sale_is_customer_first_purchase', 'customer_city']].drop_duplicates()
              .groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'customer_city'], observed=False)
               .agg(
                   new_customers_cnt=('sale_is_customer_first_purchase', 'sum'),
                   all_customers_cnt=('customer_unique_id', 'nunique')
               )
               .reset_index()
)
tmp_df_res['new_customers_share'] = tmp_df_res['new_customers_cnt'] / tmp_df_res['all_customers_cnt']
pb.line(data_frame=tmp_df_res, color='customer_city')
../../_images/deb6b2d2ba9662bf6f71b9548d023bdf7cea90f0e8ab146a8de798e1a3f9bc86.jpg

Key Observations:

  • São Paulo and Rio de Janeiro show less monthly fluctuation in new customer share

Number of Sellers#

tmp_df_sales_sellers = (
    df_sales.merge(df_items[['order_id', 'seller_id']], on='order_id', how='left')
    .merge(df_sellers[['seller_id', 'seller_state', 'seller_city']], on='seller_id', how='left')
)
pb.configure(
    df = tmp_df_sales_sellers
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'seller_id'
    , metric_label = 'Number of Sellers'
    , agg_func = 'nunique'
    , freq = 'ME'
)

By Day and Month

for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/b663511956d60d7161385cf0d914fb9984a2e6c3fb107e5db79c9bf148815fed.jpg ../../_images/acb839403fec558dc6e5f6dc36a05dca0736aae19e63b21ee9f3a3b980a38b7e.jpg

Key Observations:

  • Black Friday (Nov 24, 2017) saw anomalous seller count peak

  • Monthly seller counts show steady growth

By Time of Day

pb.line_resample(color='purchase_time_of_day', to_slide=True)
../../_images/0f71cb0a1b6318f2c19939bc3dd4f593cdd1cd33b0fef8131033591111148cae.jpg

Key Observations:

  • Fewer active sellers at night

  • Evening/afternoon see most seller activity

By Weekday vs Weekend

pb.line_resample(color='purchase_day_type')
../../_images/8fde50dc7f009916da81e1719b3ba814e845b426c50aacb32ce6aee30f85b2d4.jpg

Key Observations:

  • Weekday seller growth outpaced weekends

By Top Seller States

pb.line_resample(color='seller_state', to_slide=True)
../../_images/1b9e589c84fcbeb78d922816b87d10afd5c15bb1e644d13470e6935857d22a26.jpg

Key Observations:

  • São Paulo consistently led in seller counts with strongest growth

  • Paraná and Minas Gerais ranked 2nd/3rd

  • All top 5 states showed steady seller growth

By Top Seller Cities

pb.line_resample(color='seller_city', to_slide=True)
../../_images/18d14f88dbe19fff17c58c01eb2a3579ccd65af438ee8eca245dafbb57f68c69.jpg

Key Observations:

  • São Paulo city consistently had most sellers with strongest growth

  • Curitiba ranked second

Share of New Sellers#

pb.configure(
    time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'new_sellers_share'
    , metric_label = 'Share of New Sellers'
    , freq='ME'
    , update_fig={'yaxis': {'tickformat': '.1%'}}
)
tmp_df_sales_sellers['seller_first_order_dt'] = tmp_df_sales_sellers.groupby('seller_id')['order_purchase_dt'].transform('min')
tmp_df_sales_sellers['is_seller_first_order'] = tmp_df_sales_sellers['order_purchase_dt'] == tmp_df_sales_sellers['seller_first_order_dt']

By Day and Month

Since one seller could appear in multiple orders at the same time in the first order, we need to remove such duplicates.

tmp_df_res = (tmp_df_sales_sellers[['order_purchase_dt', 'seller_id', 'is_seller_first_order']].drop_duplicates()
              .resample('D', on='order_purchase_dt')
              .agg(
                  new_sellers_cnt=('is_seller_first_order', 'sum'),
                  all_sellers_cnt=('seller_id', 'nunique')
              )
              .reset_index()
)
tmp_df_res['new_sellers_share'] = tmp_df_res['new_sellers_cnt'] / tmp_df_res['all_sellers_cnt']
fig_days = pb.line(data_frame=tmp_df_res
               , title='Share of New Sellers by Day'
)
tmp_df_res = (tmp_df_sales_sellers[['order_purchase_dt', 'seller_id', 'is_seller_first_order']].drop_duplicates()
              .resample('ME', on='order_purchase_dt')
              .agg(
                  new_sellers_cnt=('is_seller_first_order', 'sum'),
                  all_sellers_cnt=('seller_id', 'nunique')
              )
              .reset_index()
)
tmp_df_res['new_sellers_share'] = tmp_df_res['new_sellers_cnt'] / tmp_df_res['all_sellers_cnt']
fig_days.show()
pb.line(data_frame=tmp_df_res, to_slide=True)
../../_images/d0e28ea1a6b4d9b80d64b7e2ca764560f31a73f465b6fe33994f9ff153765f36.jpg ../../_images/1e8bcb9009135e06ccfc6505add4fd633348560cb6352ecacec3bff2a33a263a.jpg

Key Observations:

  • New seller share declined until June 2017

  • Stabilized at 10-20% thereafter

By Weekday vs Weekend

tmp_df_res = (tmp_df_sales_sellers[['order_purchase_dt', 'seller_id', 'is_seller_first_order', 'purchase_day_type']].drop_duplicates()
              .groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'purchase_day_type'], observed=True)
              .agg(
                  new_sellers_cnt=('is_seller_first_order', 'sum'),
                  all_sellers_cnt=('seller_id', 'nunique')
              )
              .reset_index()
)
tmp_df_res['new_sellers_share'] = tmp_df_res['new_sellers_cnt'] / tmp_df_res['all_sellers_cnt']
pb.line(data_frame=tmp_df_res, color='purchase_day_type')
../../_images/7f880c11572474871a27865c10c3f3505480637d61a40874d5f5f65c9b5bed1d.jpg

Key Observations:

  • Weekends consistently had lower new seller share than weekdays

By Top Seller States

top_states = (tmp_df_sales_sellers
              .groupby('seller_state', observed=False)['order_id']
              .nunique()
              .nlargest(5)
              .index.tolist()
)
tmp_df_res = (tmp_df_sales_sellers[lambda x: x.seller_state.isin(top_states)]
              [['order_purchase_dt', 'seller_id', 'is_seller_first_order', 'seller_state']].drop_duplicates()
              .groupby([pd.Grouper(key='order_purchase_dt', freq='ME'), 'seller_state'], observed=False)
               .agg(
                  new_sellers_cnt=('is_seller_first_order', 'sum'),
                  all_sellers_cnt=('seller_id', 'nunique')
               )
               .reset_index()
)
tmp_df_res['new_sellers_share'] = tmp_df_res['new_sellers_cnt'] / tmp_df_res['all_sellers_cnt']
pb.line(data_frame=tmp_df_res, color='seller_state')
../../_images/00dfafa9f3091ad0a0099e90fd5f33f71467cce5455f44f3eed21681b79cfd68.jpg

Key Observations:

  • Minimal state-level differences in new seller share

  • Minas Gerais often slightly lower

del tmp_df_res

Ratio of Number of Sellers and Customers#

Active Sellers and Customers

customers_cnt_all = (tmp_df_sales_sellers.resample('ME', on='order_purchase_dt')['customer_unique_id']
                 .nunique()
                 .to_frame()
)
sellers_cnt_all = (tmp_df_sales_sellers.resample('ME', on='order_purchase_dt')['seller_id']
                 .nunique()
                 .to_frame()
)
tmp_df_res = customers_cnt_all.merge(sellers_cnt_all, left_index=True, right_index=True).reset_index()
tmp_df_res['seller_customer_ratio'] = (tmp_df_res['seller_id'] / tmp_df_res['customer_unique_id']).round(2)
labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , seller_customer_ratio = 'Ratio of Number of Sellers and Customers'
))
fig = tmp_df_res.viz.line(
    x=labels.index[0]
    , y=labels.index[1]
    , labels=labels
    , title='Ratio of Number of Sellers and Customers by Month'
)
pb.to_slide(fig)
fig.show()
../../_images/00352188d3a24a41a5566b79812c4e951da6d8d54c475c81bc075e8dacef60f3.jpg

Key Observations:

  • Pre-November 2017: customer growth outpaced sellers

  • Post-November 2017: reversed trend

New Sellers and Customers

tmp_df_sales_sellers['first_order_customer_dt'] = tmp_df_sales_sellers.groupby('customer_unique_id')['order_purchase_dt'].transform('min')
tmp_df_sales_sellers['is_first_month_for_customer'] = (
    (tmp_df_sales_sellers['order_purchase_dt'].dt.month == tmp_df_sales_sellers['first_order_customer_dt'].dt.month) &
    (tmp_df_sales_sellers['order_purchase_dt'].dt.year == tmp_df_sales_sellers['first_order_customer_dt'].dt.year)
)
tmp_df_sales_sellers['first_order_seller_dt'] = tmp_df_sales_sellers.groupby('seller_id')['order_purchase_dt'].transform('min')
tmp_df_sales_sellers['is_first_month_for_seller'] = (
    (tmp_df_sales_sellers['order_purchase_dt'].dt.month == tmp_df_sales_sellers['first_order_seller_dt'].dt.month) &
    (tmp_df_sales_sellers['order_purchase_dt'].dt.year == tmp_df_sales_sellers['first_order_seller_dt'].dt.year)
)
customers_cnt_new = (tmp_df_sales_sellers[tmp_df_sales_sellers.is_first_month_for_customer].resample('ME', on='order_purchase_dt')['customer_unique_id']
                 .nunique()
                 .to_frame()
)
sellers_cnt_new = (tmp_df_sales_sellers[tmp_df_sales_sellers.is_first_month_for_seller].resample('ME', on='order_purchase_dt')['seller_id']
                 .nunique()
                 .to_frame()
)
tmp_df_res = customers_cnt_new.merge(sellers_cnt_new, left_index=True, right_index=True).reset_index()
tmp_df_res['seller_customer_ratio'] = (tmp_df_res['seller_id'] / tmp_df_res['customer_unique_id']).round(2)
labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , seller_customer_ratio = 'Ratio of New Sellers and New Customers'
))
fig = tmp_df_res.viz.line(
    x=labels.index[0]
    , y=labels.index[1]
    , labels=labels
    , title='Ratio of Number of New Sellers and Number of New Customers by Month'
)
pb.to_slide(fig)
fig.show()
../../_images/beb5399e52d85e245dfe06ad40bc34e3a68c12ebf434d0a0b39ac5c43a789962.jpg

Key Observations:

  • Pre-July 2017: stronger new customer growth

  • Post-July 2017: similar growth rates for new customers/sellers

Number of Orders per Customer#

By month

tmp_df_res = (
    df_sales.groupby([pd.Grouper(key = 'order_purchase_dt', freq='ME'), 'customer_unique_id'])['order_id']
    .nunique()
    .reset_index()
)
labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , order_id = 'Average Number of Orders per Customer'
))
fig = tmp_df_res.viz.line(
    x=labels.index[0]
    , y=labels.index[1]
    , agg_func='mean'
    , freq='ME'          
    , labels=labels
    , title='Average Number of Orders per Customer by Month'
)
pb.to_slide(fig)
fig.show()
../../_images/6641ecf4517290e2b1b7d8ec986d707d0c79c6f4414d9b4df590dbb6af18c007.jpg

Key Observations:

  • The average number of orders per user fluctuates around 1 throughout the entire period.

Number of Reviews#

pb.configure(
    df = df_reviews
    , time_column = 'review_creation_dt'
    , time_column_label = 'Date'
    , metric = 'review_id'
    , metric_label = 'Number of Reviews'
    , agg_func = 'nunique'
    , freq = 'ME'
)

By Day and Month

for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/205cf08218ca39043adb8f166e7d78fdfe9daf32d228e991a65755bf4d4dff82.jpg ../../_images/f6540f5872c38436f783e1d3d9983fb21c947a656768c4d078730f84248baed0.jpg

Key Observations:

  • Daily review counts fluctuate significantly with extreme drops

  • Reviews may not be recorded daily but in batches

  • Review counts grew steadily until 2018, then stabilized at 6-8k monthly

By Day of Week

pb.area_resample(color='review_creation_weekday', freq='W', title='Number of Reviews by Day of Week and Week')
../../_images/623c1ce459670205951f645c84ab95198cd937dd13e65341757d6ed5c1688d32.jpg
pb.heatmap(x=pd.Grouper(key=pb.time_column, freq='W')
        , y='review_creation_weekday'
        , text_auto=False
        , title='Number of Reviews by Day of Week and Week'
        , to_slide=True
)
../../_images/8b6906dbcda14518c78712bf5bea7091eec9df9decbc9775804a198817289a41.jpg

Key Observations:

  • Significantly fewer reviews created on Mondays/Sundays

  • Minimal differences between other weekdays

By Weekday vs Weekend

pb.line_resample(color='review_day_type')
../../_images/8e8a2772e026c3a066aa9b965bf3f2809aff75e7d55b4f1e826eadf08b382072.jpg

Key Observations:

  • Workdays consistently generate more reviews than weekends

By Review Score

pb.line_resample(color='review_score')
../../_images/d04c46278c492980572423ed3494f2bb9aec14697a0d28c1b824f1546f477183.jpg

Key Observations:

  • 5-star reviews significantly outnumber others monthly

  • 2-star reviews are consistently the least common

  • More 1-star reviews than 2/3-star reviews

Review Score#

pb.configure(
    df = df_reviews
    , time_column = 'review_creation_dt'
    , time_column_label = 'Date'
    , metric = 'review_score'
    , metric_label = 'Average Review Score'      
    , agg_func = 'mean'
    , freq = 'ME'
)

By Day and Month

for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/b3d65e81d6252e49fcff6dfbd1797e544d2811f770debbd35a21c25f5db1ca2b.jpg ../../_images/c5f0e18f41afd9f0a93c238bae97191c5dcc661f56b6c43d655f8583fa3c9e21.jpg

Key Observations:

  • Daily average review scores fluctuate wildly (3-5)

  • Monthly averages declined from August 2017 to March 2018, then spiked

By Weekday vs Weekend

pb.line_resample(color='review_day_type', to_slide=True)
../../_images/b6a9f7e70c3e93a077bc22bc0f83210822bf3f057eb3d29b90e4befae50525a4.jpg

Key Observations:

  • Workdays consistently have higher average review scores than weekends

NPS#

For calculating NPS, we will divide customers into the following groups:

  • Promoters: customers who gave a rating of 5

  • Passive: customers who gave a rating of 4

  • Detractors: customers who gave a rating of 1-3

Let’s look at how NPS changed by month.

tmp_df_res = (
    df_reviews.pivot_table(index=pd.Grouper(key='review_creation_dt', freq='ME'), columns='review_score', values='review_id', aggfunc='nunique')
)
tmp_df_res['total_responses'] = tmp_df_res.sum(axis=1)
tmp_df_res['promoters'] = tmp_df_res[5]
tmp_df_res['detractors'] = tmp_df_res[1] + tmp_df_res[2] + tmp_df_res[3]
tmp_df_res['nps'] = (tmp_df_res['promoters'] - tmp_df_res['detractors']) * 100 / tmp_df_res['total_responses']
tmp_df_res.reset_index(inplace=True)
labels = pd.Series(dict(
    review_creation_dt = 'Date'
    , nps = 'NPS'
))
fig = tmp_df_res.viz.line(
    x=labels.index[0]
    , y=labels.index[1]
    , labels=labels
    , title='NPS by month'
)
pb.to_slide(fig)
fig.show()
../../_images/e36bc726065a40c938b308578e048d9e57a70cbb975ea2fd1bea7bfa87640769.jpg

Key Observations:

  • NPS remains satisfactory (0-49) throughout

  • Many neutral customers, few critical issues

  • Significant NPS drop in March 2018

Freight Cost Ratio#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'freight_ratio'
    , metric_label = 'Average Freight Cost Ratio'
    , agg_func = 'mean'
    , freq = 'ME'
    , update_fig={'yaxis': {'tickformat': '.1%'}}
)

By Day and Month

for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/5e9c0206305c59d0981203b45dc6efe8d5628cba39719b01af35744b22af21ef.jpg ../../_images/286a15b9eb0f2cdf28edbe0c5764769f60de370c561fd70c30daf26e8c8d3867.jpg

Key Observations:

  • Shipping cost share of order value remains stable (0.2-0.22)

By Order Weight Category

pb.line_resample(color='order_total_weight_cat')
../../_images/df825e6777d6b70c30763e57befc8befcc3cfd3a54e1967be60b2a7d6a0cadb6.jpg

Key Observations:

  • Light items have higher shipping cost share than medium/heavy

  • Minimal difference between medium/heavy items

By Presence of Installment Payments

pb.line_resample(color='order_has_installment')
../../_images/e3174dd1e9783fa8cfbeb2a9145c4b458757aeefeb9605a4cf59f56b219c6433.jpg

Key Observations:

  • Installment orders consistently have lower shipping cost share

By Top Customer States

pb.line_resample(color='customer_state')
../../_images/48668f3d3e10e9ba013c7ff4d7cffbc6aba621697c7ab28dd1f97db228442047.jpg

Key Observations:

  • São Paulo consistently has lowest shipping cost share among top states

By Top Customer Cities

pb.line_resample(color='customer_city')
../../_images/4e015fe38dbe3b64e715892aaf4affff4713aee94dfe399b7f4ee71f76f3cad1.jpg

Key Observations:

  • São Paulo city usually has lowest shipping cost share among top cities

Delivery Time#

pb.configure(
    df = df_sales.dropna(subset='delivery_time_days')
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'delivery_time_days'
    , metric_label = 'Average Delivery Time, days'
    , metric_label_for_distribution = 'Delivery Time, days'     
    , agg_func = 'mean'
    , freq = 'ME'
)

By Day and Month

pb.box(mode='time_series', freq='M').show()
pb.box(mode='time_series', freq='M', upper_quantile=0.95).show()
for freq in ['D', 'ME']:
    pb.line_resample(
        freq=freq
        , to_slide=True if freq == 'ME' else False
    ).show()
../../_images/848fb0a55b08703b797d6060fc69e5afcdb533c072167d86d71c199fba96cf46.jpg ../../_images/2efde2141afc25cc116b03f9b216bf02424a293442c3c2c537c05d7eb5f0a6e8.jpg ../../_images/fc730a03788614b2eac2785fe5ebb0b2bbc0036c85822afcf78de346ff1dcb08.jpg ../../_images/b308dd090dfcc264779dfb38c3dc2101bced9b232e0a29ed0fbf26f5e63e4b2b.jpg

Key Observations:

  • Average delivery time fluctuates daily (5-20 days)

  • Grew from August 2017-February 2018, then dropped sharply to ~8 days

By Review Score

pb.line_resample(color='order_avg_reviews_score', to_slide=True)
../../_images/92bc40a7ddf71fe4861b064b3b6a6371ef38e54ccecbce5e4d7e76427cebe7cd.jpg

Key Observations:

  • Lower ratings typically correlate with longer delivery times

By Top Customer States

pb.line_resample(color='customer_state', to_slide=True)
../../_images/ca21d80d947bf7bc780fd380695cd1c8599fe782bce97508cd6bdbc781cc30c9.jpg

Key Observations:

  • São Paulo consistently has fastest delivery among top states

  • Rio de Janeiro and Rio Grande do Sul show slowest deliveries

By Top Customer Cities

pb.line_resample(color='customer_city', to_slide=True)
../../_images/11c2b1b9b49898c734a4dad580f38757855d0407034b2e779cf5a4e1074e0ecd.jpg

Key Observations:

  • Rio de Janeiro had steeper delivery time increases (Oct 2017-Feb 2018)

Delivery Delay Time#

By Day and Month

labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , delivery_delay_days = 'Average Delivery Delay Time, days'
))
for freq, period in [('D', 'Day'), ('ME', 'Month')]:
    fig = df_sales.viz.line(
            x=labels.index[0]
            , y=labels.index[1]
            , agg_func='mean'
            , freq=freq
            , labels=labels
            , title=f'Average Delivery Delay Time by {period}'
    )
    if freq == 'ME':
        pb.to_slide(fig)
    fig.show()
../../_images/5b5bf254b4a165d1809a2608195680a1be5116cc07baa0bd9b47eb88a0baf810.jpg ../../_images/1c59a7487d69ec3a29eb1d51b887975ccd0c2a729712d2bc702bc2978c37c805.jpg

Key Observations:

  • All months show faster-than-expected deliveries

  • Early 2017 to March 2017: improving lead times (~12 days ahead)

  • June 2018: deliveries ~20 days ahead of estimates

Carrier Handoff Delay#

By Day and Month

labels = pd.Series(dict(
    order_purchase_dt = 'Date'
    , avg_carrier_delivery_delay_days = 'Average Carrier Handoff Delay, days'
))
for freq, period in [('D', 'Day'), ('ME', 'Month')]:
    fig = df_sales.viz.line(
            x=labels.index[0]
            , y=labels.index[1]
            , agg_func='mean'
            , freq=freq
            , labels=labels
            , title=f'Average Carrier Handoff Delay by {period}'
    )
    if freq == 'ME':
        pb.to_slide(fig)
    fig.show()
../../_images/7724834b829eeade5bdf28ca35f98d6cd8c2c9df4b3c9d3bc49b0d579e9fa63f.jpg ../../_images/fc98bfd0a4cde8c6b2c758b701e34565cd3bd11daba1337a8200bd85d3e5c64d.jpg

Key Observations:

  • Carrier handoff consistently faster than limits

  • Pre-May 2017: improving lead times (peaked at 4.5 days ahead)

  • Post-May 2017: declining to ~2.5 days ahead by August

Proportion of Each Stage in Delivery Time#

Let’s look at what percentage of the total delivery time each stage occupies.

We will not consider any anomalous dates, as there are only a few and they will not significantly affect the result.

tmp_df_sales = (
    df_sales[[
        'order_purchase_dt',
        'order_approved_dt',
        'order_delivered_carrier_dt',
        'order_delivered_customer_dt',
    ]]
    [lambda x: (x.order_delivered_customer_dt >= x.order_purchase_dt) & (x.order_approved_dt >= x.order_purchase_dt)
        & (x.order_delivered_carrier_dt >= x.order_approved_dt) & (x.order_delivered_customer_dt >= x.order_delivered_carrier_dt)
     ]
    .dropna()
)
tmp_df_sales['from_purchase_to_customer'] = (tmp_df_sales['order_delivered_customer_dt'] - tmp_df_sales['order_purchase_dt']).dt.total_seconds()
tmp_df_sales['From Purchase to Approved'] = (
    (tmp_df_sales['order_approved_dt'] - tmp_df_sales['order_purchase_dt']).dt.total_seconds() * 100 / tmp_df_sales['from_purchase_to_customer']
).round(2)
tmp_df_sales['From Approved to Carrier'] = (
    (tmp_df_sales['order_delivered_carrier_dt'] - tmp_df_sales['order_approved_dt']).dt.total_seconds() * 100 / tmp_df_sales['from_purchase_to_customer']
).round(2)
tmp_df_sales['From Carrier to Customer'] = (
    (tmp_df_sales['order_delivered_customer_dt'] - tmp_df_sales['order_delivered_carrier_dt']).dt.total_seconds() * 100 / tmp_df_sales['from_purchase_to_customer']
).round(2) 
tmp_df_sales = (
    tmp_df_sales[['order_purchase_dt', 'From Purchase to Approved', 'From Approved to Carrier', 'From Carrier to Customer']]
    .melt(id_vars = 'order_purchase_dt', var_name='Stage', value_name='Percent of All Delivery Time')
    .rename(columns={'order_purchase_dt': 'Date'})
)
category_orders = {
    'Stage': ['From Purchase to Approved', 'From Approved to Carrier', 'From Carrier to Customer']
}
fig = tmp_df_sales.viz.area(
    x='Date'
    , y='Percent of All Delivery Time'
    , color='Stage'
    , agg_func='mean'
    , freq='ME'
    , title='Average Percent of All Delivery Time by Stage and Month'
    , category_orders=category_orders
)
pb.to_slide(fig)
fig.show()
../../_images/e7b8b10ed0ca360543f43057ed089e67f9d03b1ff47c887f1a55669503475b6b.jpg

Key Observations:

  • Carrier delivery consumes most of total delivery time

Order Weight#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'total_weight_kg'
    , metric_label = 'Average Weight of Orders, kg'
    , metric_label_for_distribution = 'Weight of Orders, kg'   
    , agg_func = 'mean'
    , freq = 'ME'
)

By Day and Month

pb.box(mode='time_series', freq='M').show()
pb.box(mode='time_series', freq='M', upper_quantile=0.95).show()
for freq in ['D', 'ME']:
    pb.line_resample(freq=freq).show()
../../_images/114031fea30ac5ea675ca1bc9d688e618b84280e4d807e74c5da183d8893db96.jpg ../../_images/98ac7093b62c489694530b99be2066a0d978eafcb4604f97ea0fd37933d75145.jpg ../../_images/87902f6a62b085b6fd254b79e631eece0371d5b694b11c7b487e23d9a1a11897.jpg ../../_images/5337eee649cf1bb3d4b6bc0114d9d47f7521e95eabd85f28c7519e83dcae07e7.jpg

Key Observations:

  • Average order weight declining monthly (2.8kg → 1.9kg)

By Review Score

pb.line_resample(color='order_avg_reviews_score')
../../_images/c405a5a8f2d45e818db716dcf86e2cd73d54abd9e90a85215c77d680c630f9f5.jpg

Key Observations:

  • 1-star orders typically heaviest

  • 2-star orders show most monthly weight variability

By Whether the Order is Delayed or Not

pb.line_resample(color='is_delayed')
../../_images/474e182a5909742d52bac0e18ee999eae5c8df282b85addc77c26f117710f024.jpg

Key Observations:

  • Delayed orders usually heavier

  • Non-delayed weights more stable monthly

By Presence of Installment Payments

pb.line_resample(color='order_has_installment')
../../_images/40b3db5fc24301b0151ac5d0bafd646bf502ac1f3ea0d4a661b356826bce75be.jpg

Key Observations:

  • Installment orders consistently heavier

By Delivery Time Category

pb.line_resample(color='delivery_time_days_cat')
../../_images/cade97cab2de0dc67f876d2575294c1d013b7d62e3b3c54b8c4510525505fbdc.jpg

Key Observations:

  • Lighter orders typically deliver faster

By Top Customer Cities

pb.line_resample(color='customer_city')
../../_images/b3272e65274add5a0f5aefacc44a6a392ffb4a7cd835877d7edb3159f6eb4e0c.jpg

Key Observations:

  • Curitiba shows most monthly weight variability among top cities

Number of Products in Order#

Prepare dataframe for analysis.

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'products_cnt'
    , metric_label = 'Average Number of Products in Order'
    , metric_label_for_distribution = 'Number of Products in Order'   
    , agg_func = 'mean'
    , freq = 'ME'
)

By Day and Month

pb.box(mode='time_series', freq='M').show()
for freq in ['D', 'ME']:
    pb.line_resample(freq=freq).show()
../../_images/0feecd742f90e2cb4411482935fc5391cd73a7b972cfe715554ec9b973006fda.jpg ../../_images/cccf82de481b2ed095dbb29b14f4f4af0897c5a3f33e9c12f92a56177d38bb16.jpg ../../_images/e4a2bd186f00305144ac323173043a4455afabf64ed0f8d105acb4025767d57a.jpg

Key Observations:

  • Average products per order remains stable (1.12-1.16)

By Review Score

pb.line_resample(color='order_avg_reviews_score')
../../_images/560effc7bc62edf196c9f0e555ab3fe86994059c8cd133d59f43fa025a90a1a3.jpg

Key Observations:

  • 1/2-star orders typically have more products

  • 4/5-star orders usually have fewest products

By Whether the Order is Delayed or Not

pb.box(mode='time_series', color='is_delayed', freq='M').show()
pb.line_resample(color='is_delayed')
../../_images/ca68ae3342e2c841eba7e986c80fbbf1dfe6fe90de07c900da895827843d161f.jpg ../../_images/2a615afac8e9005a14e640f759210a075f27f5b83e52002210a949a85021e68c.jpg

Key Observations:

  • Delayed orders show more product count variability

By Presence of Installment Payments

pb.box(mode='time_series', color='order_has_installment', freq='M').show()
pb.line_resample(color='order_has_installment')
../../_images/719047c9aa209bd5834b51949e2e9255cba6a22042df7036572e2e8bb95627ee.jpg ../../_images/59858ee7e0711ff2c71f4aae56c4812909f6e1f36baaad960172c2978d7ab3c1.jpg

Key Observations:

  • Installment orders usually contain more products

By Top Customer Cities

pb.line_resample(color='customer_city')
../../_images/a676bace410e282e09757b218c87d2f9aba83697514bd46c14b05c4e8e016acd.jpg

Key Observations:

  • Curitiba shows most monthly product count variability

Number of Unique Products in Order#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'unique_products_cnt'
    , metric_label = 'Average Number of Unique Products in Order'
    , metric_label_for_distribution = 'Number of Unique Products in Order'    
    , agg_func = 'mean'
    , freq = 'ME'
)

By Day and Month

pb.box(mode='time_series', freq='M').show()
for freq in ['D', 'ME']:
    pb.line_resample(freq=freq).show()
../../_images/c8231641aee44034a6140c781c357920907e552b6072a3bcec9e4b34374c1e3c.jpg ../../_images/3fb04262b6fb7926e27a23606d1c7931026408b07d6ecfe8be8a5b3e3766072c.jpg ../../_images/a2c376b6f1d8fc1028c454a8678036350d573bafba9ee67bdaa32331a2863561.jpg

Key Observations:

  • Average unique products per order remains stable (1.03-1.045)

By Review Score

pb.line_resample(color='order_avg_reviews_score')
../../_images/b778736f23972728fc9a3c164432a584b809d860fee886f797e22a44637715c2.jpg

Key Observations:

  • 1/2-star orders have more unique products

  • 4/5-star orders have fewest unique products

By Whether the Order is Delayed or Not

pb.box(mode='time_series', color='is_delayed', freq='M').show()
pb.line_resample(color='is_delayed')
../../_images/9f7b705fe78e37e7cb9e8bd0375065c00f046265fe55767dbb64c72259988ddf.jpg ../../_images/e7ded77320fd0090dcbb3471694a2dddd1305bd7cf8ddbfafdc5dc312d89c254.jpg

Key Observations:

  • Non-delayed orders slightly higher in unique products

  • Delayed orders show more variability

By Presence of Installment Payments

pb.box(mode='time_series', color='order_has_installment', freq='M').show()
pb.line_resample(color='order_has_installment')
../../_images/d03b3694130910f1cb774649209d72676ae15ba9b6996e0576119f78246df68d.jpg ../../_images/6877764cde450ea23250a48187063f779ea901a9d7ee1b045bc029715afa93e8.jpg

Key Observations:

  • Installment orders consistently have more unique products

Product Price in Order#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'avg_products_price'
    , metric_label = 'Average Product Price in Order, R$'
    , metric_label_for_distribution = 'Product Price in Order, R$'  
    , agg_func = 'mean'
    , freq = 'ME'
)

By Day and Month

pb.box(mode='time_series', freq='M').show()
pb.box(mode='time_series', freq='M', upper_quantile=0.95).show()
for freq in ['D', 'ME']:
    pb.line_resample(freq=freq).show()
../../_images/e86cf0b29faa14a516d2102ba1f7d438592412088350c6870289bcff8d104add.jpg ../../_images/7a98eddfc6ff06f4df92cbb44eed8190d061d549fd501a0fa16c63fbf708f9d0.jpg ../../_images/b327658ebf9c199dddf2aabad07512eeb7cbb7f9ef47f8c44fd29c3804c1f9ff.jpg ../../_images/8241003027a84bf1dce32c1e7a7b0f37d933524a9b13dbabb64efe0777a2808a.jpg

Key Observations:

  • No Black Friday spike in average product price

  • Monthly average product price fluctuates (115-135 R$)

  • Clear seasonality:

    • Pre-July 2017: decline

    • July-Oct 2017: growth

    • Oct 2017-Mar 2018: decline

    • Mar-Apr 2018: growth

    • Post-Apr 2018: decline

By Review Score

pb.line_resample(color='order_avg_reviews_score')
../../_images/4a27403e7ee7c403b7ff8bb71346cca5919b45ec71879f779098cf604a4b15f5.jpg

Key Observations:

  • 1-star orders typically have highest product prices

  • 3-star orders usually lowest

By Whether the Order is Delayed or Not

pb.line_resample(color='is_delayed')
../../_images/78f0cd89699c6291faceb7239f9862da2ca81a11038d3b96462ae866bed57530.jpg

Key Observations:

  • Delayed orders usually have higher product prices

  • June 2017 saw sharp price spike in delayed orders

By Presence of Installment Payments

pb.line_resample(color='order_has_installment')
../../_images/bad070b755767d77d5a7f1154aa914ff4efa302516db606b97eaa857675c9790.jpg

Key Observations:

  • Installment orders have significantly higher product prices

Number of Sellers in Order#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'sellers_cnt'
    , metric_label = 'Average Number of Sellers in Order'
    , metric_label_for_distribution = 'Number of Sellers in Order'    
    , agg_func = 'mean'
    , freq = 'ME'
)

By Day and Month

pb.box(mode='time_series', freq='M').show()
for freq in ['D', 'ME']:
    pb.line_resample(freq=freq).show()
../../_images/74cd32fa586ba3eb756a0f0f1e64a448a7bacf7cc2e6a1584185b27d4971892e.jpg ../../_images/d0237a1b35915ca3f607622af3deea2ebfb2de56fe9f5111fb6573f0aa044e21.jpg ../../_images/75906d44bad8d5a2815a8f70bfb6b00ec5cf4cd2241e04ebcfe0cd491f412522.jpg

Key Observations:

  • Average sellers per order grows minimally (1→1.02)

By Review Score

pb.line_resample(color='order_avg_reviews_score')
../../_images/edcde6e7a7322abba176f2b84c1e8383c5a1f1809b7f4368b3b12e0d85c2e139.jpg

Key Observations:

  • 1/2-star orders typically involve more sellers

  • 4/5-star orders involve fewest sellers

By Whether the Order is Delayed or Not

pb.box(mode='time_series', color='is_delayed', freq='M').show()
pb.line_resample(color='is_delayed')
../../_images/1ff5a189b7459e1fb1c50f319787186834fb65263d28461e6171c6cbb6e6dd67.jpg ../../_images/018f0765cd468f37732a05415b82c5cb257a420c837fab27ae449240af6ffaa3.jpg

Key Observations:

  • Non-delayed orders slightly higher in seller count

  • April 2018 saw sharp seller count spike in undelivered orders

By Presence of Installment Payments

pb.box(mode='time_series', color='order_has_installment', freq='M').show()
pb.line_resample(color='order_has_installment')
../../_images/37dcedb5e1d9ed946a99863523643fb976d76487896c160a08958c2f49fcb2a8.jpg ../../_images/3d2d952ed12b33b59002f8961ed9d3c562eb2d0d662a0e3f639a8736f744566c.jpg

Key Observations:

  • Installment orders consistently involve more sellers

Number of Categories in Order#

pb.configure(
    df = df_sales
    , time_column = 'order_purchase_dt'
    , time_column_label = 'Date'
    , metric = 'product_categories_cnt'
    , metric_label = 'Average Number of Categories in Order'
    , metric_label_for_distribution = 'Number of Categories in Order'
    , agg_func = 'mean'
    , freq = 'ME'
)

By Day and Month

pb.box(mode='time_series', freq='M').show()
for freq in ['D', 'ME']:
    pb.line_resample(freq=freq).show()
../../_images/da519e9bd2993a52b0e6b310e226f3ff01bf14a0d52d5aa2a4495e855f60bee3.jpg ../../_images/5687fc1eb5b7037d51ff9e1a2c5f437e1c63ddc590870532280c15d4a50ae0a4.jpg ../../_images/219005e67111f4cb8eda277e6fd7c1d7a4c266696225ec2b1495343afd35ca17.jpg

Key Observations:

  • Average categories per order remains stable

By Review Score

pb.line_resample(color='order_avg_reviews_score')
../../_images/ec1258212133d20acf74c87013393550231a69fc14a3a30d6282f3a15dc7b7d6.jpg

Key Observations:

  • 1/2-star orders involve more categories

By Presence of Installment Payments

pb.box(mode='time_series', color='order_has_installment', freq='M').show()
pb.line_resample(color='order_has_installment')
../../_images/5146416aeb128e43800c133e9d0dc81678e1638c7c3e487df020a8ddc767e45c.jpg ../../_images/b44d55ed8f990828b3f3ab2eacc1efa942148c0e50b3ee63e515221df7f16420.jpg

Key Observations:

  • Installment orders have slightly more categories