Payment Analysis#

Number of Payments#

pb.configure(
    df = df_payments
    , metric = 'payment_sequential'
    , metric_label = 'Share of Payments'
    , metric_label_for_distribution = 'Number of Payments' 
    , agg_func = 'count'
    , norm_by = 'all'
    , axis_sort_order='descending'
    , text_auto='.1%'
    , update_fig={'xaxis': {'tickformat': '.0%'}}
)
print(f'Total number of payments in order: {df_payments.payment_sequential.count():,}')
Total number of payments in order: 103,520

Let’s look at the statistics and distribution of the number of payments per day.

tmp_df_res = (
    df_orders.merge(df_payments, on='order_id', how='left')
    .groupby(pd.Grouper(key='order_purchase_dt', freq='D'), observed=False)['payment_sequential']
    .count()
    .to_frame('payments_cnt_per_day')
)
tmp_df_res['payments_cnt_per_day'].explore.info(
    labels=dict(payments_cnt_per_day='Number of Payments per Day')
    , title='Distribution of Number of Payments per Day'
)
Summary Statistics for "payments_cnt_per_day" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 604 (100%) Max 1.21k Mean 171.39 202 10 (2%)
Missing --- 99% 379 Trimmed Mean (10%) 166.52 160 9 (1%)
Distinct 268 (44%) 95% 312.55 Mode 202 80 7 (1%)
Non-Duplicate 102 (17%) 75% 229 Range 1213 147 7 (1%)
Duplicates 336 (56%) 50% 160 IQR 121 138 6 (<1%)
Dup. Values 166 (27%) 25% 108 Std 93.56 204 6 (<1%)
Zeros --- 5% 47.15 MAD 84.51 108 6 (<1%)
Negative --- 1% 8.12 Kurt 24.75 122 5 (<1%)
Memory Usage <1 Mb Min 1 Skew 2.58 128 5 (<1%)
../../_images/7bf247aa40695251ace7df85f752ad247b86ea87dbd95609336d4bc2234406f9.jpg

Key Observations:

  • 75% of days have ≤230 payments

  • Top 5% have ≥312 payments

  • Several days exceeded 500 payments

Let’s look at top days.

tmp_df_res.sort_values('payments_cnt_per_day', ascending=False).head()
payments_cnt_per_day
order_purchase_dt
2017-11-24 1214
2017-11-25 538
2017-11-27 420
2017-11-26 411
2017-11-28 393

Key Observations:

  • Black Friday had anomalous payment volumes

By Payment Type

pb.bar_groupby(y='payment_type', to_slide=True)
../../_images/83bf92bf0017920a4df54035108bfb8125ba0969c3b3593e05e8d768e5d0fff0.jpg

Key Observations:

  • Payment method distribution:

    • Credit card: 74%

    • Boleto: 19%

    • Voucher: 5.5%

    • Debit card: 1.5%

Payment Value#

pb.configure(
    df = df_payments
    , metric = 'payment_value'
    , metric_label = 'Average Payment Value, R$'
    , metric_label_for_distribution = 'Payment Value, R$'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
)

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    upper_quantile=0.95
    , hist_mode='dual_hist_trim'        
)
Summary Statistics for "payment_value" (Type: Float)
Summary Percentiles Detailed Stats Value Counts
Total 103.52k (100%) Max 13.66k Mean 154.02 50 323 (<1%)
Missing --- 99% 1.04k Trimmed Mean (10%) 114.80 20 272 (<1%)
Distinct 28.99k (28%) 95% 437.19 Mode 50 100 254 (<1%)
Non-Duplicate 13.06k (13%) 75% 171.77 Range 13.66k 77.57 250 (<1%)
Duplicates 74.53k (72%) 50% 100 IQR 114.99 35 165 (<1%)
Dup. Values 15.93k (15%) 25% 56.78 Std 217.46 73.34 160 (<1%)
Zeros 8 (<1%) 5% 26.13 MAD 76.40 30 133 (<1%)
Negative --- 1% 6.71 Kurt 242.72 116.94 131 (<1%)
Memory Usage 1 Min 0 Skew 9.27 56.78 122 (<1%)
../../_images/69842b221ed5ccc90ff5b9b9c9189fef0be2b367fceee286641f0b6eefd228d6.jpg

Key Observations:

  • 75% of payments are ≤172 R$

  • Top 5% are ≥440 R$

  • Several exceed 6,000 R$

By Payment Type

pb.bar_groupby(y='payment_type', to_slide=True)
../../_images/5fc3ff06f077da6143f410501034662b9eaa0bce7252dfb975c15516efad2dca.jpg

Key Observations:

  • Credit card payments have highest average value

  • Vouchers have lowest

Number of Payment Installments#

pb.configure(
    df = df_payments
    , metric = 'payment_installments'
    , metric_label = 'Average Payment Installments'
    , metric_label_for_distribution = 'Payment Installments'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.2s'
)

Let’s see at statistics and distribution of the metric.

pb.metric_info()
Summary Statistics for "payment_installments" (Type: Integer)
Summary Percentiles Detailed Stats Value Counts
Total 103.52k (100%) Max 24 Mean 2.85 1 52.38k (51%)
Missing --- 99% 10 Trimmed Mean (10%) 2.28 2 12.38k (12%)
Distinct 23 (<1%) 95% 10 Mode 1 3 10.42k (10%)
Non-Duplicate 2 (<1%) 75% 4 Range 23 4 7.07k (7%)
Duplicates 103.50k (99%) 50% 1 IQR 3 10 5.29k (5%)
Dup. Values 21 (<1%) 25% 1 Std 2.69 5 5.22k (5%)
Zeros --- 5% 1 MAD 0 8 4.26k (4%)
Negative --- 1% 1 Kurt 2.56 6 3.90k (4%)
Memory Usage 1 Min 1 Skew 1.66 7 1.61k (2%)
../../_images/45dfbff7ce8f0a7087486fea3336d141d9a9910c1cddafe6ea56a6060195b544.jpg

Key Observations:

  • 51% of payments are single-installment

  • Some have ≥9 installments

By Payment Type

pb.bar_groupby(y='payment_type', to_slide=True)
../../_images/1bc5c2e04b9992f588e1660f8490c6949ec8dc1d59c8f717bd718c1ed82a4a83.jpg

Key Observations:

  • Credit cards average 3.5 installments

  • Other methods have no installments