Product Analysis

Product Analysis#

Number of Products#

pb.configure(
    df = df_products
    , metric = 'product_sales_cnt'
    , metric_label = 'Share of Sold Products'
    , agg_func = 'sum'
    , norm_by='all'
    , axis_sort_order='descending'
    , text_auto='.1%'
    , update_fig={'xaxis': {'tickformat': '.0%'}}
)

print(f'Total sold products count: {df_products.product_sales_cnt.sum():,.0f}')

Total sold products count: 100,051

Let’s look at the top values of the metrics

pb.metric_top(id_column='product_id')

	product_sales_cnt
product_id
99a4788cb24856965c36a24e339b6058	456.00
aca2eb7d00ea1a7b8ebd4e68314663af	425.00
422879e10f46682990de24d770e7f83d	352.00
d1c427060a0f73f6b889a5c7c61f2ac4	313.00
389d119b48cf3043d311335e499d9c6b	309.00
53b36df67ebb7c41585e8d54d6772e08	304.00
368c6c730842d78016ad823897a372db	291.00
53759a2ecddad2bb87a079a1f1519f73	287.00
154e7e31ebfa092203795c972e5804a6	262.00
2b4609f8948be18874494203496bc318	254.00

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(product_sales_cnt='Number of Units Sold per Product')
    , title='Distribution of Number of Units Sold per Product'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)

Summary Statistics for "product_sales_cnt" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.12k (97%)	Max	456	Mean	3.11	1	19.05k (58%)
Missing	829 (3%)	99%	30	Trimmed Mean (10%)	1.68	2	5.30k (16%)
Distinct	124 (<1%)	95%	10	Mode	1	3	2.34k (7%)
Non-Duplicate	38 (<1%)	75%	2	Range	455	4	1.31k (4%)
Duplicates	32.83k (99%)	50%	1	IQR	1	5	882 (3%)
Dup. Values	86 (<1%)	25%	1	Std	9.43	6	576 (2%)
Zeros	---	5%	1	MAD	0	7	450 (1%)
Negative	---	1%	1	Kurt	622.35	8	311 (<1%)
Memory Usage	<1 Mb	Min	1	Skew	19.93	9	252 (<1%)

../../_images/1272ea95bfce67e4c1c07afebd28547027b470e754e81acc2874f74c8d2722b3.jpg

Key Observations:

75% of products sold 1-2 units total
Top 5% sold ≥10 units

Let’s look at the statistics and distribution of the number of sold products per day.

tmp_df_res = (
    df_sales.merge(df_items, on='order_id', how='left')
    .groupby(pd.Grouper(key='order_purchase_dt', freq='D'), observed=False)['product_id']
    .nunique()
    .to_frame('products_cnt_per_day')
)

tmp_df_res['products_cnt_per_day'].explore.info(
    labels=dict(orders_cnt_per_day='Number of Sold Products per Day')
    , title='Distribution of Number of Sold Products per Day'
)

Summary Statistics for "products_cnt_per_day" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	602 (100%)	Max	901	Mean	153.55	150	7 (1%)
Missing	---	99%	333.93	Trimmed Mean (10%)	150.00	239	7 (1%)
Distinct	258 (43%)	95%	276.95	Mode	Multiple	98	6 (<1%)
Non-Duplicate	101 (17%)	75%	206.75	Range	897	67	6 (<1%)
Duplicates	344 (57%)	50%	143.50	IQR	108.75	141	6 (<1%)
Dup. Values	157 (26%)	25%	98	Std	79.36	66	6 (<1%)
Zeros	---	5%	45.10	MAD	76.35	192	6 (<1%)
Negative	---	1%	8.03	Kurt	12.14	103	6 (<1%)
Memory Usage	<1 Mb	Min	4	Skew	1.62	102	6 (<1%)

../../_images/55d91a8f3d22c6f506cb4d5dd259d12a25eb85953ca8b34ac35481969d201283.jpg

Key Observations:

75% of days sold ≤207 products
Top 5% sold ≥277 products
Several days exceeded 400 products

Let’s look by different dimensions.

By Product Category

fig = pb.bar_groupby(
    y='product_category'
    , trim_top_n_y=20
    , width=1100
    , height=500   
    , show_top_and_bottom_n = 15
    , show_count=False
).update_layout(xaxis_domain=[0, 0.4], xaxis2_domain=[0.6, 1], xaxis2_tickformat='.2%')
pb.to_slide(fig)
fig.show()

../../_images/38d41cefd2a23208b796ce4d96e3f9508379d1ec2154ac81bf65f924a317b6ae.jpg

Key Observations:

Best-selling categories: Bed Bath Table, Health Beauty
Lowest-selling: Security and Services

By Generalized Product Category

pb.bar_groupby(y='general_product_category', to_slide=True)

../../_images/882903b037cc3d1285fdd38a69f7bd393d6d34fb307e9e94495c7e6ebff69c9d.jpg

Key Observations:

Top 3 generalized categories by units sold:
1. Electronics (27%)
2. Furniture (19%)
3. Home & Garden (15%)
Lowest: Food & Drinks (1%)

Product Price#

pb.configure(
    df = df_products
    , metric = 'avg_price'
    , metric_label = 'Average Product Price, R$'
    , metric_label_for_distribution = 'Product Price, R$'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
)

Top products.

pb.metric_top(id_column='product_id')

	avg_price
product_id
489ae2aa008f021502940f251d4cce7f	6,735.00
69c590f7ffc7bf8db97190b6cb6ed62e	6,729.00
1bdf5e6731585cf01aa8169c7028d6ad	6,499.00
a6492cc69376c469ab6f61d8f44de961	4,799.00
c3ed642d592594bb648ff4a04cee2747	4,690.00
259037a6a41845e455183f89c5035f18	4,590.00
a1beef8f3992dbd4cd8726796aa69c53	4,399.87
6cdf8fc1d741c76586d8b6b15e9eef30	4,099.99
6902c1962dd19d540807d0ab8fade5c6	3,999.90
4ca7b91a31637bd24fb8e559d5e015e4	3,999.00

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(product_sales_cnt='Average Product Price, R$')
    , title='Distribution of Average Product Price'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)

Summary Statistics for "avg_price" (Type: Float)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.12k (97%)	Max	6.74k	Mean	144.59	59.90	508 (2%)
Missing	829 (3%)	99%	1.20k	Trimmed Mean (10%)	96.88	39.90	387 (1%)
Distinct	8.64k (26%)	95%	469.90	Mode	59.90	69.90	378 (1%)
Non-Duplicate	6.35k (19%)	75%	153.33	Range	6.73k	49.90	363 (1%)
Duplicates	24.31k (74%)	50%	79	IQR	113.43	19.90	320 (<1%)
Dup. Values	2.29k (7%)	25%	39.90	Std	247.00	29.90	319 (<1%)
Zeros	---	5%	16.90	MAD	71.16	89.90	274 (<1%)
Negative	---	1%	9.82	Kurt	103.20	79.90	256 (<1%)
Memory Usage	<1 Mb	Min	0.85	Skew	7.62	99.90	252 (<1%)

../../_images/d2a48b7c4c33df3809e6a86ae22337fdf39d0825df27fe83c2ef9ee88ffbda05.jpg

Key Observations:

75% of products had average price ≤153 R$
Bottom 5% ≤17 R$
Top 5% ≥470 R$

Let’s see at statistics and distribution of the metric per day.

tmp_df_res = (
    df_sales.merge(df_items, on='order_id', how='left')
    .groupby(pd.Grouper(key='order_purchase_dt', freq='D'), observed=False)['price']
    .mean()
    .to_frame('avg_price_per_day')
)

tmp_df_res['avg_price_per_day'].explore.info(
    labels=dict(avg_price_per_day='Average Product Price per Day, R$')
    , title='Distribution of Average Product Price per Day, R$'
)

Summary Statistics for "avg_price_per_day" (Type: Float)
Summary		Percentiles		Detailed Stats		Value Counts
Total	602 (100%)	Max	270.38	Mean	121.70	12.40	1 (<1%)
Missing	---	99%	229.02	Trimmed Mean (10%)	118.95	108.33	1 (<1%)
Distinct	602 (100%)	95%	162.41	Mode	Multiple	113.69	1 (<1%)
Non-Duplicate	602 (100%)	75%	129.87	Range	257.98	110.72	1 (<1%)
Duplicates	---	50%	117.27	IQR	21.90	108.30	1 (<1%)
Dup. Values	---	25%	107.97	Std	24.95	135.27	1 (<1%)
Zeros	---	5%	93.56	MAD	16.13	106.63	1 (<1%)
Negative	---	1%	81.65	Kurt	8.41	120.96	1 (<1%)
Memory Usage	<1 Mb	Min	12.40	Skew	1.96	93.40	1 (<1%)

../../_images/9aa1a8cdae1acb42f9ab543800780236f83ad2042586d0d6e2e63d23bda63061.jpg

Key Observations:

Daily average product prices:
- Bottom 5% ≤94 R$
- Middle 50% 108-130 R$
- Top 5% ≥162 R$

Let’s look by different dimensions.

By Product Category

print('Top Best')
pb.box(y='product_category').show()
print('Top Worst')
pb.box(
    y='product_category'
    , trim_top_n_direction='bottom'
).show()
pb.bar_groupby(
    y='product_category'
    , show_top_and_bottom_n=15
    , to_slide=True
).show()

Top Best

../../_images/c24160c962883be3da99bda123587e66bf525630ef9243c5276e238ca7c11eb1.jpg

Top Worst

../../_images/76479fbb5ba957ab4e0292e782b0aeae413ae2b86e4a2e09d40bb7d448a8ccf3.jpg

../../_images/7f48c16c677fda1904fd8e011d36f75770f7465917e302edf22a3d900292d2d8.jpg

Key Observations:

Highest priced category: Watches Gifts
Lowest priced: Flowers

By Generalized Product Category

pb.box(y='general_product_category').show()
fig = pb.bar_groupby(
    y='general_product_category'
    , show_count=True
).update_layout(xaxis2_title_text='Number of Sold Products')
pb.to_slide(fig)
fig.show()

../../_images/5f23238bd033e11551b39da8096fb7c9bd3378a302deaaf3f0e21df9d2b99b02.jpg

../../_images/1b54bdb81e4506779d25ca5790943b78347dc26efa137470f7e7c5ac811b02fe.jpg

Key Observations:

Top 3 categories by average price:
1. Industry & Construction
2. Electronics
3. Fashion
Lowest: Food & Drinks

Sales Amount of Products#

pb.configure(
    df = df_products
    , metric = 'total_sales_amount'
    , metric_label = 'Total Sales Amount of Products, R$'
    , metric_label_for_distribution = 'Total Sales Amount per Product, R$'
    , agg_func = 'sum'
    , axis_sort_order='descending'
    , text_auto='.3s'
)

Top products.

pb.metric_top(id_column='product_id')

	total_sales_amount
product_id
bb50f2e236e5eea0100680137654686c	63,560.00
6cdd53843498f92890544667809f1595	53,652.30
d6160fb7873f184099d9bc95e30376af	45,949.35
d1c427060a0f73f6b889a5c7c61f2ac4	45,620.56
99a4788cb24856965c36a24e339b6058	42,049.66
3dd2a17168ec895c781a9191c1e95ad7	40,782.80
25c38557cf793876c5abdd5931f922db	38,907.32
5f504b3a1c75b73d6151be81eb05bdc9	37,733.90
53b36df67ebb7c41585e8d54d6772e08	37,454.63
aca2eb7d00ea1a7b8ebd4e68314663af	37,104.30

Let’s see at statistics and distribution of the metric per day.

tmp_df_res = (
    df_sales.merge(df_items, on='order_id', how='left')
    .groupby(pd.Grouper(key='order_purchase_dt', freq='D'), observed=False)['price']
    .sum()
    .to_frame('total_price_per_day')
)

tmp_df_res['total_price_per_day'].explore.info(
    labels=dict(avg_price_per_day='Total Product Price per Day, R$')
    , title='Distribution of Total Product Price per Day, R$'
)

Summary Statistics for "total_price_per_day" (Type: Float)
Summary		Percentiles		Detailed Stats		Value Counts
Total	602 (100%)	Max	149.92k	Mean	21.93k	396.90	1 (<1%)
Missing	---	99%	49.69k	Trimmed Mean (10%)	21.20k	23507.49	1 (<1%)
Distinct	602 (100%)	95%	41.51k	Mode	Multiple	33539.34	1 (<1%)
Non-Duplicate	602 (100%)	75%	28.81k	Range	149.52k	30891.45	1 (<1%)
Duplicates	---	50%	20.34k	IQR	15.82k	27074.86	1 (<1%)
Dup. Values	---	25%	12.99k	Std	12.23k	30571.07	1 (<1%)
Zeros	---	5%	5.83k	MAD	11.60k	20579.17	1 (<1%)
Negative	---	1%	1.44k	Kurt	18.99	22014.31	1 (<1%)
Memory Usage	<1 Mb	Min	396.90	Skew	2.20	23255.73	1 (<1%)

../../_images/3aa97f82264422721088932c0727ba7340075f57b9b7fa2fa744c8935b5c92c7.jpg

Key Observations:

75% of days had product revenue ≤29K R$
Top 5% ≥42K R$

Let’s look by different dimensions.

By Product Category

print('Top Best')
pb.box(y='product_category').show()
print('Top Worst')
pb.box(
    y='product_category'
    , trim_top_n_direction='bottom'
).show()
pb.bar_groupby(
    y='product_category'
    , show_top_and_bottom_n=15
    , horizontal_spacing=0.25
    , to_slide=True
).show()

Top Best

../../_images/2dafc40add4ce4e0c001e252a5179f29ff7f76079c295f1443c5a6710d2cfac2.jpg

Top Worst

../../_images/160aa80cb27c3119093532191b955a7dc23c8f14862c1a5f550e0e80ab455c03.jpg

../../_images/ee06bd06f5e513277053790905e030ab89f89cba70b4b9850fc68e5426c7d7e4.jpg

Key Observations:

Highest revenue categories: Health beauty, Watches gifts
Lowest: Security and services

By Generalized Product Category

pb.box(y='general_product_category').show()
fig = (
    pb.bar_groupby(y='general_product_category', show_count=True)
    .update_layout(
        xaxis2_title_text='Number of Sold Products'
    )
)
pb.to_slide(fig)
fig.show()

../../_images/51486ebad3de2c1204662130b1e95b37b0a7822080f76b2e6edf97263367c52c.jpg

../../_images/8b7a42b8291f917990509dc49d1610e7c1358283ab5c9679812f392907d301e8.jpg

Key Observations:

Top 3 categories by revenue:
1. Electronics
2. Furniture
3. Home & Garden
Lowest: Food & Drinks

Sales Amount per Product#

pb.configure(
    df = df_products
    , metric = 'total_sales_amount'
    , metric_label = 'Average Sales Amount per Products, R$'
    , metric_label_for_distribution = 'Total Sales Amount per Product, R$'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'    
)

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    labels=dict(total_sales_amount='Total Sales Amount per Product, R$')
    , title='Distribution of Total Sales Amount per Product'
    , upper_quantile=0.99
    , hist_mode='dual_hist_trim'
)

Summary Statistics for "total_sales_amount" (Type: Float)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.12k (97%)	Max	63.56k	Mean	410.98	59.90	363 (1%)
Missing	829 (3%)	99%	4.69k	Trimmed Mean (10%)	198.54	69.90	251 (<1%)
Distinct	10.14k (31%)	95%	1.46k	Mode	59.90	39.90	245 (<1%)
Non-Duplicate	7.12k (22%)	75%	325.88	Range	63.56k	49.90	219 (<1%)
Duplicates	22.81k (69%)	50%	135.99	IQR	265.98	29.90	218 (<1%)
Dup. Values	3.02k (9%)	25%	59.90	Std	1.36k	89.90	200 (<1%)
Zeros	---	5%	20.03	MAD	138.61	19.90	195 (<1%)
Negative	---	1%	11.57	Kurt	499.53	79.90	173 (<1%)
Memory Usage	<1 Mb	Min	2.20	Skew	17.70	99.90	171 (<1%)

../../_images/3df7938713225f2009f8c80ed3062526089acf56caa8f91f107fcda0a38e2d1a.jpg

Key Observations:

75% of products generated ≤325 R$ lifetime revenue

Let’s look by different dimensions.

By Product Category

print('Top Best')
pb.box(y='product_category').show()
print('Top Worst')
pb.box(
    y='product_category'
    , trim_top_n_direction='bottom'
).show()
pb.bar_groupby(
    y='product_category'
    , show_top_and_bottom_n=15
    , horizontal_spacing=0.25
    , to_slide=True
)

Top Best

../../_images/e38087ba7fec508b85ce3ac847f6fe13bf715e32387b1060c725e46213879d5e.jpg

Top Worst

../../_images/c21d0ff8ed4c390ef189aa9e1c8c709db5ca698fa162b4067439c59250f247b1.jpg

../../_images/c16e7d9c24df5e6abf3f6e986b4170bd8e1eac0011da73590cc406b94a20746e.jpg

Key Observations:

Highest average revenue per product: Watches gifts
Lowest: Flowers

By Generalized Product Category

pb.box(y='general_product_category').show()
pb.bar_groupby(y='general_product_category', to_slide=True).show()

../../_images/3fba60e8723d14c56688a545b59aa51a170a9cd0fcae3001ae537ea038cf761e.jpg

../../_images/e7bd3ea88be60b706da9f5ecdd56ed28395608a106f657fda3a21d2f8b16ad48.jpg

Key Observations:

Top 3 categories by average revenue:
1. Electronics
2. Beauty & Health
3. Industry & Construction
Lowest: Books & Stationery

Price Range#

pb.configure(
    df = df_products
    , metric = 'price_range'
    , metric_label = 'Average Price Range per Product, R$'
    , metric_label_for_distribution = 'Price Range per Product, R$'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.2f'
)

Top products.

pb.metric_top(id_column='product_id')

	price_range
product_id
8b502ca34e28d30605bc667b965b6abf	999.10
5237739bb5fee495dbd337755a138660	740.00
ba16581014183c8415da15145f3d4c24	660.99
a7c87b1bbdd51e0d68b0307cffd03d47	650.01
18209df52bc87a69b84db4df602397c1	519.01
68f3adaef1620e7b0c4c7cd9f78d7ed0	497.35
4cce3fa9fee9eb2361e0b9bd32516958	461.00
d6160fb7873f184099d9bc95e30376af	449.99
f819f0c84a64f02d3a5606ca95edd272	400.09
c1afa44a5a60e2e7cf7280e57eba0597	400.00

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

Summary Statistics for "price_range" (Type: Float)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.12k (97%)	Max	999.10	Mean	3.94	0	26.35k (80%)
Missing	829 (3%)	99%	70.01	Trimmed Mean (10%)	0.44	10	518 (2%)
Distinct	1.83k (6%)	95%	20	Mode	0	20	235 (<1%)
Non-Duplicate	1.29k (4%)	75%	0	Range	999.10	5	230 (<1%)
Duplicates	31.12k (94%)	50%	0	IQR	0	1	153 (<1%)
Dup. Values	536 (2%)	25%	0	Std	19.74	4	150 (<1%)
Zeros	26.35k (80%)	5%	0	MAD	0	2	141 (<1%)
Negative	---	1%	0	Kurt	465.73	3	128 (<1%)
Memory Usage	<1 Mb	Min	0	Skew	16.31	6	120 (<1%)

../../_images/6611b1cb7b9503dfc6834a6e93623e9bd6ebdb9ee4b10bf896885211d49bd9d7.jpg

Key Observations:

80% of products maintained stable prices
5% had price changes ≥20 R$

Let’s look by different dimensions.

By Product Category

print('Top Best')
pb.box(y='product_category').show()
print('Top Worst')
pb.box(
    y='product_category'
    , trim_top_n_direction='bottom'
).show()
pb.bar_groupby(
    y='product_category'
    , show_top_and_bottom_n=15
    , horizontal_spacing=0.25
    , to_slide=True
).show()

Top Best

../../_images/04a8fe1d45950b867921006655518bddab0cd8a8439189d256e242ec40325d82.jpg

Top Worst

../../_images/e6e4b2784d89ebcad02dcd2699cae342284dfcf997052061d1ad4aa310298873.jpg

../../_images/c5f6533ce36b0d36bd75cba7dba2184ff7182e0c8b5eaeb61cb01ff04476fcfd.jpg

Key Observations:

Most price volatility: Watches gifts

By Generalized Product Category

pb.box(y='general_product_category').show()
pb.bar_groupby(y='general_product_category', to_slide=True)

../../_images/a41068cd1630d7e0b87502de5fdc91ad27913cd1411b8c237a745f03a048628f.jpg

../../_images/aa7de336565db3acfe7580d2bfb403160fd8e8032e45c0f968637c48cfd70598.jpg

Key Observations:

Top 3 categories by price changes:
1. Electronics
2. Industry & Construction
3. Beauty & Health
Lowest volatility: Food & Drinks

Quantity of Product per Order#

pb.configure(
    df = df_products
    , metric = 'avg_product_qty_per_order'
    , metric_label = 'Average Quantity of Product per Order'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
)

Top products.

pb.metric_top(id_column='product_id')

	avg_product_qty_per_order
product_id
9571759451b1d780ee7c15012ea109d4	20.00
37eb69aca8718e843d897aa7b82f462d	15.00
05b515fdc76e888aada3c6d66c201dff	10.00
270516a3f41dc035aa87d220228f844c	10.00
89b190a046022486c635022524a974a8	10.00
5769ef0a239114ac3a854af00df129e4	8.00
810cfa5dd36b001cfc186499381f72ab	7.00
4cce2fad3d2dec6f82510d2521aebdd3	7.00
ce6184189a523c1eb5fe5113061780b9	6.00
ac1ad58efc1ebf66bfadc09f29bdedc0	6.00

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    upper_quantile=0.95
    , hist_mode='dual_hist_trim'
)

Summary Statistics for "avg_product_qty_per_order" (Type: Float)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.12k (97%)	Max	20	Mean	1.11	1	27.88k (85%)
Missing	829 (3%)	99%	3	Trimmed Mean (10%)	1.00	2	1.22k (4%)
Distinct	305 (<1%)	95%	2	Mode	1	1.50	534 (2%)
Non-Duplicate	175 (<1%)	75%	1	Range	19	1.33	254 (<1%)
Duplicates	32.65k (99%)	50%	1	IQR	0	3	203 (<1%)
Dup. Values	130 (<1%)	25%	1	Std	0.45	1.25	182 (<1%)
Zeros	---	5%	1	MAD	0	1.20	156 (<1%)
Negative	---	1%	1	Kurt	188.05	1.17	104 (<1%)
Memory Usage	<1 Mb	Min	1	Skew	9.45	4	101 (<1%)

../../_images/b4705ad0a47547f6b33b4a238f3fd5f8665b73834ebd307e158b75721c12fe66.jpg

Key Observations:

85% of products appeared as single units in orders

Let’s look by different dimensions.

By Generalized Product Category

pb.box(y='general_product_category').show()
pb.bar_groupby(y='general_product_category').show()

../../_images/144d209a48252712731d6ca6120121d68c18d325bad73f7da611953af5d59d84.jpg

../../_images/ebce946024620c26082d57e602b565f06b56a7f4cf974d4d85aaa505a410079b.jpg

Key Observations:

Highest average quantity per order: Food & Drinks

Length of Product Name#

pb.configure(
    df = df_products
    , metric = 'product_name_lenght'
    , metric_label = 'Length of Product Name'
)

Top products.

pb.metric_top(id_column='product_id')

	product_name_lenght
product_id
aac3cc525702d53c8a2f4733ed214098	76.00
52b3af7304d611855714d9b3d1724ea7	72.00
2e2b9bfa91068239d5fb2b39764b92a5	69.00
2b19dbb6e225fc04cf9a80d83f949b88	68.00
df6e62772d439c7afc2d284339cf9425	67.00
023a60ac6b3484afe23d788ce2444df0	66.00
3ed31fdf8af68a6c0b059a256420a5c5	64.00
33041fc111e526d9dd16e06678ff5eeb	64.00
81a5ad77c14cd83d0c853e4c317cb837	64.00
05a6b99c0460f22f5f8101493c0e3c0e	64.00

Let’s see at statistics and distribution of the metric.

pb.metric_info()

Summary Statistics for "product_name_lenght" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.34k (98%)	Max	76	Mean	48.48	60	2.18k (7%)
Missing	610 (2%)	99%	63	Trimmed Mean (10%)	49.63	59	2.02k (6%)
Distinct	66 (<1%)	95%	60	Mode	60	58	1.89k (6%)
Non-Duplicate	7 (<1%)	75%	57	Range	71	57	1.72k (5%)
Duplicates	32.88k (99%)	50%	51	IQR	15	55	1.68k (5%)
Dup. Values	59 (<1%)	25%	42	Std	10.25	56	1.68k (5%)
Zeros	---	5%	29	MAD	10.38	54	1.44k (4%)
Negative	---	1%	20	Kurt	0.19	53	1.33k (4%)
Memory Usage	<1 Mb	Min	5	Skew	-0.90	52	1.26k (4%)

../../_images/5197c5a4c70f2982f049879d7b399a5694884e3814c9766960f8f27032ac381f.jpg

Key Observations:

75% of products have names ≤57 characters

Length of Product Description#

pb.configure(
    df = df_products
    , metric = 'product_description_lenght'
    , metric_label = 'Length of Product Description'
)

Top products.

pb.metric_top(id_column='product_id')

	product_description_lenght
product_id
47d52bb24ef8a3aa09724f00604be3ba	3,992.00
e6f1f7e12ef3f7c254164e35be6420db	3,988.00
84fad62439091ff986a3885bfd6d299d	3,985.00
ddebc97ddf43a9787d1ee7012e394ccc	3,976.00
7a40001d3da620600ab80109510f3496	3,963.00
c6d339a1fa8873b1ffe76fb1b7cc10f1	3,956.00
1bdf5e6731585cf01aa8169c7028d6ad	3,954.00
cd8c7501d1e3a66f282dfed8dbd5ab9f	3,954.00
ed43976cb61c922803515e4963d9e5cc	3,950.00
8a90417eb713be09fd87a5d077ae06a2	3,949.00

Let’s see at statistics and distribution of the metric.

pb.metric_info()

Summary Statistics for "product_description_lenght" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.34k (98%)	Max	3.99k	Mean	771.50	404	94 (<1%)
Missing	610 (2%)	99%	3.29k	Trimmed Mean (10%)	662.91	729	86 (<1%)
Distinct	2.96k (9%)	95%	2.06k	Mode	404	651	66 (<1%)
Non-Duplicate	669 (2%)	75%	972	Range	3.99k	703	66 (<1%)
Duplicates	29.99k (91%)	50%	595	IQR	633	236	65 (<1%)
Dup. Values	2.29k (7%)	25%	339	Std	635.12	184	65 (<1%)
Zeros	---	5%	150	MAD	434.40	303	63 (<1%)
Negative	---	1%	84	Kurt	4.83	352	62 (<1%)
Memory Usage	<1 Mb	Min	4	Skew	1.96	375	60 (<1%)

../../_images/4b2e217ee4a47450983e90f3a24c6bc77f65dbd518466820ee7791fa8429beb2.jpg

Key Observations:

75% of products have descriptions ≤1000 characters

Number of Product Photos#

pb.configure(
    df = df_products
    , metric = 'product_photos_qty'
    , metric_label = 'Number of Product Photos'
)

Top products.

pb.metric_top(id_column='product_id')

	product_photos_qty
product_id
f95d5d21561ea085ba1e1a4e53840844	20
234495ab7809d4517bc1330c439da1bb	19
e9880042522806f124fdd4f8c8514d0d	18
b659034bc6cfc3d9baeda101c0c281fe	18
f9aa001a859b11fd798bb386f3d07eb0	17
801f0a5ea1ac28df44d65195cf4e2620	17
7f38cf4e517ec6bb1d31c4e6b6df18ef	17
5948868c402a614a2dd3b90ebb06a253	17
b085d8c8840e8dd3d6ccdf3d86c6145e	17
28763a4fd1b597a9c4f31a9579e7d1b4	17

Let’s see at statistics and distribution of the metric.

pb.metric_info()

Summary Statistics for "product_photos_qty" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.95k (100%)	Max	20	Mean	2.17	1	17.10k (52%)
Missing	---	99%	8	Trimmed Mean (10%)	1.81	2	6.26k (19%)
Distinct	19 (<1%)	95%	6	Mode	1	3	3.86k (12%)
Non-Duplicate	2 (<1%)	75%	3	Range	19	4	2.43k (7%)
Duplicates	32.93k (99%)	50%	1	IQR	2	5	1.48k (5%)
Dup. Values	17 (<1%)	25%	1	Std	1.73	6	968 (3%)
Zeros	---	5%	1	MAD	0	7	343 (1%)
Negative	---	1%	1	Kurt	7.39	8	192 (<1%)
Memory Usage	<1 Mb	Min	1	Skew	2.22	9	105 (<1%)

../../_images/16042c2c5d29a5f0d8e9aeff1c41e7df3b20256613ff0e589684f249b4d4eb1b.jpg

Key Observations:

52% of products have 1 photo
Top 5% have ≥6 photos

Product Weight#

pb.configure(
    df = df_products
    , metric = 'product_weight_g'
    , metric_label = 'Product Weight, g'
)

Top products.

pb.metric_top(id_column='product_id')

	product_weight_g
product_id
26644690fde745fc4654719c3904e1db	40425
07f7c5fe95aa4a3b8ea56a5119546939	30000
0e9dfb804bafa3d68ef3ee7a621abfb2	30000
bad9cd5ad615c0b5ba87448e03ec954c	30000
9dfef86fb34051388a7263a31642386c	30000
46e24ce614899e36617e37ea1e4aa6ff	30000
f97ad9066c718a6cef93dfcf253d3e0d	30000
b575098a6da9b81384a7df56314b7f70	30000
cb26d15d1b6eabaac7c0803774245884	30000
0a859d8dc68f6a746b4709217110c439	30000

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    upper_quantile=0.95
    , hist_mode='dual_hist_trim'    
)

Summary Statistics for "product_weight_g" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.95k (100%)	Max	40.42k	Mean	2.28k	200	2.08k (6%)
Missing	---	99%	22.54k	Trimmed Mean (10%)	1.20k	300	1.56k (5%)
Distinct	2.20k (7%)	95%	10.85k	Mode	200	150	1.26k (4%)
Non-Duplicate	1.03k (3%)	75%	1.90k	Range	40423	400	1.21k (4%)
Duplicates	30.75k (93%)	50%	700	IQR	1.60k	100	1.19k (4%)
Dup. Values	1.17k (4%)	25%	300	Std	4.28k	500	1.11k (3%)
Zeros	---	5%	107	MAD	741.30	250	1.00k (3%)
Negative	---	1%	62	Kurt	15.14	600	957 (3%)
Memory Usage	<1 Mb	Min	2	Skew	3.61	350	832 (3%)

../../_images/63cc2c547aef2782a02b829449162e02563b312b252c15cbd5bb10a73ce4be32.jpg

Key Observations:

75% of products weigh ≤1.9kg
Top 5% weigh ≥11kg

Product Length#

pb.configure(
    df = df_products
    , metric = 'product_length_cm'
    , metric_label = 'Product Length, cm'
)

Top products.

pb.metric_top(id_column='product_id')

	product_length_cm
product_id
ca8abbdcac2d082a56ff54df35aec76a	105
e10c5041c0752194622a7a7016d8c9b5	105
199f076c011ea5e8b546bff05d4f2477	105
e1717ed4c8d10ca8117e64019d6cb0d0	105
34742604f6cd1e891726b849b6890f81	105
2a9e3d335bbb23ffee8c9da6cecb7bb8	105
032e8352e7d34cc2558d4ae22132866c	105
60dfe129ad287ca8cd2656a8151138ad	105
5ea241885816f6957dfaa9a8592c6b37	105
b33e45187a97dab72b3c819e76efc972	105

Let’s see at statistics and distribution of the metric.

pb.metric_info()

Summary Statistics for "product_length_cm" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.95k (100%)	Max	105	Mean	30.81	16	5.52k (17%)
Missing	---	99%	100	Trimmed Mean (10%)	27.79	20	2.82k (9%)
Distinct	99 (<1%)	95%	65	Mode	16	30	2.03k (6%)
Non-Duplicate	1 (<1%)	75%	38	Range	98	18	1.50k (5%)
Duplicates	32.85k (99%)	50%	25	IQR	20	25	1.39k (4%)
Dup. Values	98 (<1%)	25%	18	Std	16.91	17	1.31k (4%)
Zeros	---	5%	16	MAD	11.86	19	1.27k (4%)
Negative	---	1%	16	Kurt	3.51	40	1.22k (4%)
Memory Usage	<1 Mb	Min	7	Skew	1.75	22	972 (3%)

../../_images/888e369241d160a1f147ef1740ac23e0f97d99c4d8392d270192d7143e914395.jpg

Key Observations:

75% of products are ≤38cm long
Top 5% ≥65cm

Product Width#

pb.configure(
    df = df_products
    , metric = 'product_width_cm'
    , metric_label = 'Product Width, cm'
)

Top products.

pb.metric_top(id_column='product_id')

	product_width_cm
product_id
b17808303e15dd50538c011b44295427	118
6d30e5e702df2b8719d9c6be1bdf425b	105
3b17f6528c9e2a01b2f75f844a60ddae	105
a2f4e28e50f60566eeb99f842ffc0fd9	105
cb428376d66b5e216d2ef9f3b27fc172	105
3758055ab2434bd36ac78e00b15b5cf6	105
83d68ad4e5707409089afff26a40b2df	104
e7248872169a7ab67e20c182aaf17976	103
e54c0428a8cf1b79f63823b92e20aacc	102
68e6e8fd8c5f5b252b105d00daa9b57b	102

Let’s see at statistics and distribution of the metric.

pb.metric_info()

Summary Statistics for "product_width_cm" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.95k (100%)	Max	118	Mean	23.20	11	3.72k (11%)
Missing	---	99%	63	Trimmed Mean (10%)	21.39	20	3.05k (9%)
Distinct	95 (<1%)	95%	47	Mode	11	16	2.81k (9%)
Non-Duplicate	9 (<1%)	75%	30	Range	112	15	2.39k (7%)
Duplicates	32.86k (99%)	50%	20	IQR	15	30	1.79k (5%)
Dup. Values	86 (<1%)	25%	15	Std	12.08	12	1.54k (5%)
Zeros	---	5%	11	MAD	8.90	25	1.33k (4%)
Negative	---	1%	11	Kurt	4.07	14	1.26k (4%)
Memory Usage	<1 Mb	Min	6	Skew	1.67	13	1.13k (3%)

../../_images/097e9ab0ff30092bd95b085333869d6b0285a29aa04623d043b80840afe53af1.jpg

Key Observations:

75% of products are ≤30cm wide
Top 5% ≥47cm

Product Height#

pb.configure(
    df = df_products
    , metric = 'product_height_cm'
    , metric_label = 'Product Height, cm'
)

Top products.

pb.metric_top(id_column='product_id')

	product_height_cm
product_id
bc3c6d2a621414f2e1df7a8a32a2828e	105
d14495a85be157b5cacef4eaaf825791	105
1ca99da10c4b800de39096631ed2e773	105
995a6110a2705a9401669fb4cf939241	105
011967a30ceeaa86acb72e79664544ad	105
83f6e0a993efdfa2bf9550a204422cb7	105
e42ad1ff7ad0843110435858ec10a2c6	105
a7f4ab2b8fc3ee762e05b6eda08acb93	105
65b183dcbb9689b176730d709a0003dd	105
e940125d0a3c309f58f41cd21e39af06	105

Let’s see at statistics and distribution of the metric.

pb.metric_info()

Summary Statistics for "product_height_cm" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.95k (100%)	Max	105	Mean	16.94	10	2.55k (8%)
Missing	---	99%	69	Trimmed Mean (10%)	14.74	15	2.02k (6%)
Distinct	102 (<1%)	95%	44	Mode	10	20	1.99k (6%)
Non-Duplicate	3 (<1%)	75%	21	Range	103	16	1.60k (5%)
Duplicates	32.85k (99%)	50%	13	IQR	13	11	1.55k (5%)
Dup. Values	99 (<1%)	25%	8	Std	13.64	5	1.53k (5%)
Zeros	---	5%	3	MAD	8.90	12	1.52k (5%)
Negative	---	1%	2	Kurt	6.68	8	1.47k (4%)
Memory Usage	<1 Mb	Min	2	Skew	2.14	2	1.36k (4%)

../../_images/684928d983084fd25a26ff54cc3f2418144c02e3010648f716e199100e0ecea9.jpg

Key Observations:

75% of products are ≤21cm tall
Top 5% ≥44cm

Product Volume#

pb.configure(
    df = df_products
    , metric = 'product_volume_cm3'
    , metric_label = 'Product Volume, cm3'
)

Top products.

pb.metric_top(id_column='product_id')

	product_volume_cm3
product_id
256a9c364b75753b97bee410c9491ad8	296,208.00
3eb14e65e4208c6d94b7a32e41add538	294,000.00
0b48eade13cfad433122f23739a66898	294,000.00
c1e0531cb1864fd3a0cae57dca55ca80	294,000.00
f227e2d44f10f7dad30fb4dfa839e7a2	294,000.00
90c1b4e040d1d1c45897ec2dad4a809d	293,706.00
8d6f2c3454002d3f5aa7479a7fad7794	288,000.00
99ff40856c47a638df807c0a144470cc	288,000.00
c6fdec160d0f8f488d9041316c85051d	288,000.00
0e9dfb804bafa3d68ef3ee7a621abfb2	287,980.00

Let’s see at statistics and distribution of the metric.

pb.metric_info()

Summary Statistics for "product_volume_cm3" (Type: Integer)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.95k (100%)	Max	296.21k	Mean	16.56k	8000	604 (2%)
Missing	---	99%	135.91k	Trimmed Mean (10%)	10.58k	352	588 (2%)
Distinct	4.53k (14%)	95%	63.37k	Mode	8.00k	4096	419 (1%)
Non-Duplicate	1.98k (6%)	75%	18.48k	Range	296.04k	2560	327 (<1%)
Duplicates	28.43k (86%)	50%	6.84k	IQR	15.60k	27000	327 (<1%)
Dup. Values	2.54k (8%)	25%	2.88k	Std	27.06k	23625	259 (<1%)
Zeros	---	5%	832	MAD	7.37k	12000	256 (<1%)
Negative	---	1%	352	Kurt	24.62	1936	239 (<1%)
Memory Usage	<1 Mb	Min	168	Skew	4.18	6000	231 (<1%)

../../_images/53bb980d0db397eab03237e12012333def6317d6b09fad87049dd928319e3a42.jpg

Key Observations:

75% of products have volume ≤19K cm3
Top 5% ≥64K cm3

Weight to Volume Ratio#

pb.configure(
    df = df_products
    , metric = 'weight_to_volume_ratio'
    , metric_label = 'Product Weight to Volume Ratio'
)

Top products.

pb.metric_top(id_column='product_id')

	weight_to_volume_ratio
product_id
2b752ed328ea866e4721ca4e236a416c	85.23
672ffb2231575afa70f7fee73fe400a1	65.91
fec8c282124bc6f504cbc6e9ec38450a	62.78
194dad3cf9aa121860ba80cca80331bb	48.15
30f902329cc4fd7dba514f7a5b629b2d	45.88
00ffe57f0110d73fd84d162252b2c784	45.45
29abbd57526dab494a667e62d361f6cf	40.06
b0c7a71c8620bd389e240f63a507dc50	37.78
c224f464aeeb2c6af33f0682a181efa7	35.80
0c22c51625fc11357a8356efa31fe89f	32.39

Let’s see at statistics and distribution of the metric.

pb.metric_info(
    upper_quantile=0.99
    , hist_mode='dual_hist_trim'    
)

Summary Statistics for "weight_to_volume_ratio" (Type: Float)
Summary		Percentiles		Detailed Stats		Value Counts
Total	32.95k (100%)	Max	85.23	Mean	0.20	0.08	1.85k (6%)
Missing	---	99%	1.22	Trimmed Mean (10%)	0.13	0.07	1.84k (6%)
Distinct	298 (<1%)	95%	0.52	Mode	0.08	0.05	1.74k (5%)
Non-Duplicate	113 (<1%)	75%	0.20	Range	85.23	0.06	1.70k (5%)
Duplicates	32.65k (99%)	50%	0.12	IQR	0.13	0.09	1.69k (5%)
Dup. Values	185 (<1%)	25%	0.07	Std	1.01	0.04	1.67k (5%)
Zeros	58 (<1%)	5%	0.03	MAD	0.09	0.10	1.52k (5%)
Negative	---	1%	0.01	Kurt	3.17k	0.11	1.38k (4%)
Memory Usage	<1 Mb	Min	0	Skew	50.31	0.17	1.38k (4%)

../../_images/6bf51e8183f8b939f83534bcf07a272ede3da71be17e82643412820fd0ddc065.jpg

Key Observations:

75% of products have weight/volume ratio ≤0.2
Top 5% ≥0.5

Other metrics#

What fraction of products were not sold at all?

We have missing values in the number of sold units for products that were never sold.

products_no_sales_share = (df_products.total_units_sold.isna()).mean()

print(f'Share of Products with No Sales: {products_no_sales_share:.1%}')

Share of Products with No Sales: 2.5%

Product Analysis

On this page

Product Analysis#

Number of Products#

Product Price#

Sales Amount of Products#

Sales Amount per Product#

Price Range#

Quantity of Product per Order#

Length of Product Name#

Length of Product Description#

Number of Product Photos#

Product Weight#

Product Length#

Product Width#

Product Height#

Product Volume#

Weight to Volume Ratio#

Other metrics#