Review Analysis

Number of Reviews

pb.configure(
    df = df_reviews
    , time_column = 'review_creation_dt'
    , metric = 'review_id'
    , metric_label = 'Share of Reviews'
    , metric_label_for_distribution = 'Number of Reviews'
    , agg_func = 'nunique'
    , norm_by='all'
    , axis_sort_order='descending'
    , text_auto='.1%'
    , update_fig={'xaxis': {'tickformat': '.0%'}}    
)
print(f'Total number of reviews: {df_reviews.review_id.nunique():,}')
Total number of reviews: 98,838

Let’s look at the statistics and distribution of the metric.

pb.metric_info(freq='D')
Summary Statistics for "nunique_review_id_per_day" (Type: Integer)
Summary                  | Percentiles  | Detailed Stats              | Value Counts
Total         597 (100%) | Max   464    | Mean               165.56   | 1: 16 (3%)
Missing       ---        | 99%   419.20 | Trimmed Mean (10%) 159.12   | 0: 14 (2%)
Distinct      303 (51%)  | 95%   374.20 | Mode               1        | 5: 8 (1%)
Non-Duplicate 143 (24%)  | 75%   269    | Range              464      | 9: 7 (1%)
Duplicates    294 (49%)  | 50%   154    | IQR                220      | 17: 7 (1%)
Dup. Values   160 (27%)  | 25%   49     | Std                124.62   | 192: 6 (1%)
Zeros         14 (2%)    | 5%    1.80   | MAD                161.60   | 8: 6 (1%)
Negative      ---        | 1%    0      | Kurt               -1.07    | 2: 6 (1%)
Memory Usage  <1 Mb      | Min   0      | Skew               0.33     | 4: 6 (1%)

Key Observations:

  • Typical (median) day: 154 reviews; the most frequent single value, 1 review, occurred on only 16 days (3%)

  • 75% of days had ≤269 reviews

  • The busiest 5% of days had ≥375 reviews
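The daily metric above counts distinct review IDs per calendar day, with days that have no reviews counted as zero. As a plain-Python sketch of that aggregation on hypothetical toy data (it does not use the `pb` helpers, whose internals are not shown here), the example below also shows why the median, not the mode, describes a typical day: a small value can be the most frequent one without being representative.

```python
from collections import defaultdict
from datetime import date, timedelta
import statistics

# Hypothetical toy data: how many reviews were created on each of six consecutive days.
start = date(2018, 1, 1)
per_day = [1, 1, 0, 5, 6, 7]
reviews = [
    (f"d{i}_r{j}", start + timedelta(days=i))
    for i, n in enumerate(per_day)
    for j in range(n)
]
reviews.append(reviews[-1])  # duplicated row: must not be counted twice

# Distinct review IDs per calendar day (the idea behind agg_func='nunique' with freq='D').
ids_per_day = defaultdict(set)
for review_id, day in reviews:
    ids_per_day[day].add(review_id)

# Walk the full date range so that days without reviews show up as zeros.
first, last = min(ids_per_day), max(ids_per_day)
n_days = (last - first).days + 1
daily_counts = [len(ids_per_day.get(first + timedelta(days=k), ())) for k in range(n_days)]

median = statistics.median(daily_counts)  # the "typical" day: 3.0
mode = statistics.mode(daily_counts)      # the most frequent value: 1
```

Here the mode (1) sits far below the median (3.0), which mirrors the real data, where the mode is 1 review but the median day has 154.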

Let’s break the metric down by different dimensions.

By Day of Week

pb.bar_groupby(y='review_creation_weekday', to_slide=True)

Key Observations:

  • Mondays have the fewest reviews

  • Sundays have slightly more reviews than Mondays but are still low

  • This may reflect a pattern in how reviews are registered rather than in customer activity

By Day Type

pb.bar_groupby(y='review_day_type')

Key Observations:

  • 76% of reviews were created on weekdays

  • This is broadly in line with weekdays making up 5 of 7 days (about 71%), with a slight extra tilt toward weekdays
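To judge whether a 76% weekday share is unusual, it helps to compare it with the share expected if reviews were spread evenly over the week, 5/7 ≈ 71%. A minimal sketch on hypothetical dates, assuming weekdays are Monday to Friday (the exact definition behind `review_day_type` is not shown in this section):

```python
from datetime import date, timedelta

# Hypothetical review dates over two full weeks starting 2018-01-01:
# three reviews on each weekday, two on each weekend day.
start = date(2018, 1, 1)
dates = [
    start + timedelta(days=k)
    for k in range(14)
    for _ in range(3 if (start + timedelta(days=k)).weekday() < 5 else 2)
]

# Assumption: weekday = Monday-Friday (weekday() 0-4), weekend = Saturday-Sunday.
weekday_reviews = sum(1 for d in dates if d.weekday() < 5)
weekday_share = weekday_reviews / len(dates)

# Share expected if reviews were spread evenly across the seven days of the week.
even_baseline = 5 / 7
print(f"weekday share: {weekday_share:.1%}, even-spread baseline: {even_baseline:.1%}")
```

A share above the baseline means weekdays get more reviews per day than weekends, not just more reviews in total.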

By Review Score

pb.bar_groupby(y='review_score', to_slide=True)

Key Observations:

  • Review score distribution:

    • 5 stars: 58%

    • 4 stars: 19%

    • 1 star: 12%

    • 3 stars: 8%

    • 2 stars: 3%

Review Score

pb.configure(
    df = df_reviews
    , time_column = 'review_creation_dt'
    , metric = 'review_score'
    , metric_label = 'Average Review score'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
)
print(f'Average Review score: {df_reviews.review_score.mean():.2f}')
Average Review score: 4.07

Let’s look at the statistics and distribution of the metric.

pb.metric_info(
    labels=dict(review_score='Review score')
    , title='Distribution of Review score'
    , xaxis_type='category'
)
Summary Statistics for "review_score" (Type: Integer)
Summary                     | Percentiles  | Detailed Stats              | Value Counts
Total         99.65k (100%) | Max   5      | Mean               4.07     | 5: 57.26k (57%)
Missing       ---           | 99%   5      | Trimmed Mean (10%) 4.34     | 4: 19.15k (19%)
Distinct      5 (<1%)       | 95%   5      | Mode               5        | 1: 11.75k (12%)
Non-Duplicate 0 (<1%)       | 75%   5      | Range              4        | 3: 8.26k (8%)
Duplicates    99.64k (99%)  | 50%   5      | IQR                1        | 2: 3.22k (3%)
Dup. Values   5 (<1%)       | 25%   4      | Std                1.36     |
Zeros         ---           | 5%    1      | MAD                0        |
Negative      ---           | 1%    1      | Kurt               0.43     |
Memory Usage  1             | Min   1      | Skew               -1.34    |

Key Observations:

  • About 57% of reviews (57.26k of 99.65k) have a score of 5, which is also the median score
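As a quick consistency check, the average score can be recovered from the value counts in the summary table above. The counts are rounded to thousands, so the result is approximate:

```python
# Value counts of review_score from the summary table above, in thousands.
# They are rounded, so totals and results are approximate.
score_counts_k = {5: 57.26, 4: 19.15, 1: 11.75, 3: 8.26, 2: 3.22}

total_k = sum(score_counts_k.values())  # about 99.64k (the table reports 99.65k)
mean_score = sum(score * n for score, n in score_counts_k.items()) / total_k
share_5 = score_counts_k[5] / total_k

print(f"mean score = {mean_score:.2f}, share of 5-star reviews = {share_5:.1%}")
```

The weighted mean rounds to 4.07, matching the printed average.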

Let’s look at the statistics and distribution of the metric per day.

pb.metric_info(freq='D')
Summary Statistics for "mean_review_score_per_day" (Type: Float)
Summary                 | Percentiles  | Detailed Stats              | Value Counts
Total         583 (98%) | Max   5      | Mean               4.05     | 5: 17 (3%)
Missing       14 (2%)   | 99%   5      | Trimmed Mean (10%) 4.09     | 4: 15 (3%)
Distinct      511 (86%) | 95%   4.60   | Mode               5        | 4.25: 5 (<1%)
Non-Duplicate 479 (80%) | 75%   4.26   | Range              4        | 4.60: 3 (<1%)
Duplicates    85 (14%)  | 50%   4.14   | IQR                0.35     | 3.50: 3 (<1%)
Dup. Values   32 (5%)   | 25%   3.92   | Std                0.43     | 3: 3 (<1%)
Zeros         ---       | 5%    3.26   | MAD                0.23     | 3.75: 3 (<1%)
Negative      ---       | 1%    2.70   | Kurt               7.18     | 4.33: 3 (<1%)
Memory Usage  <1 Mb     | Min   1      | Skew               -1.61    | 4.67: 3 (<1%)

Key Observations:

  • Daily average ratings:

    • Bottom 5% of days: below 3.26

    • Middle 50%: 3.92 to 4.26

    • Top 5%: 4.60 or higher
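Days without any reviews have no average score, which is why this daily series reports missing values (14 days) instead of zeros, matching the 14 zero-review days in the review-count statistics. A plain-Python sketch on hypothetical data; note that a day with a single review has a mean equal to that one score, so low-volume days can produce the extreme values of 1 and 5:

```python
from collections import defaultdict
from datetime import date, timedelta
import statistics

# Hypothetical (creation date, score) pairs; nobody left a review on 2018-01-02.
reviews = [
    (date(2018, 1, 1), 5), (date(2018, 1, 1), 4),
    (date(2018, 1, 3), 1),  # a single review: the day's mean is simply 1
    (date(2018, 1, 4), 5), (date(2018, 1, 4), 5), (date(2018, 1, 4), 3),
]

scores_per_day = defaultdict(list)
for day, score in reviews:
    scores_per_day[day].append(score)

first, last = min(scores_per_day), max(scores_per_day)
daily_mean = {}
for k in range((last - first).days + 1):
    day = first + timedelta(days=k)
    scores = scores_per_day.get(day)
    # A day without reviews has no average: it is missing (None), not zero.
    daily_mean[day] = statistics.mean(scores) if scores else None

missing_days = sum(v is None for v in daily_mean.values())
```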

Let’s break the metric down by different dimensions.

By Day Type

pb.cat_compare(cat2='review_day_type'
            , visible_graphs=[2]
)
pb.bar_groupby(y='review_day_type').show()

Key Observations:

  • Weekdays have a slightly higher average rating

  • Weekdays have a higher share of 5-star reviews

  • Weekends have a higher share of 1-star reviews
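Because weekdays contain far more reviews than weekends, statements like “a higher share of 1-star reviews on weekends” compare shares normalized within each day type, not raw counts. A sketch of that within-group normalization on hypothetical data, assuming the comparison chart normalizes this way:

```python
from collections import Counter

# Hypothetical (day type, review score) pairs; weekdays have more reviews overall.
reviews = (
    [("weekday", 5)] * 6 + [("weekday", 4)] * 2 + [("weekday", 1)] * 2
    + [("weekend", 5), ("weekend", 4), ("weekend", 1)]
)

group_sizes = Counter(day_type for day_type, _ in reviews)
pair_counts = Counter(reviews)

# Share of each score within its own day type, so groups of different size are comparable.
share = {
    (day_type, score): n / group_sizes[day_type]
    for (day_type, score), n in pair_counts.items()
}
```

In this toy example weekdays have more 1-star reviews in absolute terms (2 vs 1), yet weekends have the higher 1-star share (33% vs 20%).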

By Day of Week

pb.cat_compare(cat2='review_creation_weekday'
            , visible_graphs=[2]
)
pb.bar_groupby(y='review_creation_weekday').show()

Key Observations:

  • Sundays have the lowest average rating

  • Highest 1-star share on Sundays

  • Lowest 5-star share on Sundays

Review Answer Time

pb.configure(
    df = df_reviews
    , time_column = 'review_creation_dt'
    , metric = 'review_answer_time_days'
    , metric_label = 'Average Review Answer Time, days'
    , metric_label_for_distribution = 'Review Answer Time, days'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
)
print(f'Average Review Answer Time: {df_reviews.review_answer_time_days.mean():.2f} days')
Average Review Answer Time: 3.13 days
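This section does not show how `review_answer_time_days` is derived. Assuming it is the time between a review’s creation timestamp and its answer timestamp, expressed in fractional days, it could be computed as follows (the timestamps are hypothetical):

```python
from datetime import datetime

# Hypothetical timestamps for one review.
created = datetime(2018, 3, 1, 0, 0, 0)
answered = datetime(2018, 3, 4, 3, 0, 0)

# Assumption: answer time = answer timestamp - creation timestamp, in fractional days.
SECONDS_PER_DAY = 24 * 60 * 60
answer_time_days = (answered - created).total_seconds() / SECONDS_PER_DAY
print(f"{answer_time_days:.3f} days")  # 3 days and 3 hours
```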

Let’s look at the statistics and distribution of the metric.

pb.metric_info(
    upper_quantile=0.95
    , hist_mode='dual_hist_trim'    
)
Summary Statistics for "review_answer_time_days" (Type: Float)
Summary                     | Percentiles  | Detailed Stats              | Value Counts
Total         99.65k (100%) | Max   518.70 | Mean               3.13     | 1.03: 7 (<1%)
Missing       ---           | 99%   21.95  | Trimmed Mean (10%) 2.08     | 1.05: 7 (<1%)
Distinct      82.39k (83%)  | 95%   6.96   | Mode               Multiple | 0.96: 6 (<1%)
Non-Duplicate 68.47k (69%)  | 75%   3.10   | Range              518.61   | 0.97: 6 (<1%)
Duplicates    17.26k (17%)  | 50%   1.67   | IQR                2.10     | 1.05: 6 (<1%)
Dup. Values   13.92k (14%)  | 25%   1.00   | Std                9.75     | 0.96: 6 (<1%)
Zeros         ---           | 5%    0.62   | MAD                1.29     | 0.97: 6 (<1%)
Negative      ---           | 1%    0.38   | Kurt               792.26   | 0.98: 6 (<1%)
Memory Usage  1             | Min   0.09   | Skew               23.69    | 3.47: 6 (<1%)

Key Observations:

  • Answer time appears bimodal, with peaks around 1 day and 3.5 days

  • 75% of reviews were answered within 3.1 days

  • The slowest 5% took about 7 days or more (95th percentile: 6.96 days)
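The gap between the mean (3.13 days) and the median (1.67 days) comes from the long right tail (skew 23.69, max 518.70 days). A trimmed mean, which drops a fixed share of values from each end before averaging, is far less sensitive to such outliers. A minimal sketch, assuming the table’s “Trimmed Mean (10%)” removes 10% of values from each end (the tool’s exact convention is not documented here):

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the sorted values from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each end
    return statistics.mean(data[k:len(data) - k])


# Hypothetical answer times in days: mostly short, plus one extreme outlier.
answer_times = [0.5, 0.9, 1.0, 1.1, 1.6, 1.7, 2.0, 3.1, 3.5, 120.0]

mean_days = statistics.mean(answer_times)       # pulled up by the outlier
median_days = statistics.median(answer_times)   # middle value, robust to the outlier
trimmed_days = trimmed_mean(answer_times, 0.1)  # drops 0.5 and 120.0 before averaging
```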

Let’s look at the statistics and distribution of the metric per day.

pb.metric_info(freq='D')
Summary Statistics for "mean_review_answer_time_days_per_day" (Type: Float)
Summary                 | Percentiles  | Detailed Stats              | Value Counts
Total         583 (98%) | Max   22.62  | Mean               3.18     | 1.85: 1 (<1%)
Missing       14 (2%)   | 99%   8.11   | Trimmed Mean (10%) 2.96     | 2.52: 1 (<1%)
Distinct      583 (98%) | 95%   5.85   | Mode               Multiple | 2.66: 1 (<1%)
Non-Duplicate 583 (98%) | 75%   3.48   | Range              22.24    | 3.51: 1 (<1%)
Duplicates    13 (2%)   | 50%   2.86   | IQR                0.99     | 3.30: 1 (<1%)
Dup. Values   0 (<1%)   | 25%   2.49   | Std                1.68     | 2.05: 1 (<1%)
Zeros         ---       | 5%    1.67   | MAD                0.66     | 1.91: 1 (<1%)
Negative      ---       | 1%    0.90   | Kurt               50.02    | 3.64: 1 (<1%)
Memory Usage  <1 Mb     | Min   0.38   | Skew               5.50     | 2.18: 1 (<1%)

Key Observations:

  • On 5% of days, the average answer time was 5.85 days or more

Let’s break the metric down by different dimensions.

By Day of Week

pb.histogram(color='review_creation_weekday').show()
pb.bar_groupby(y='review_creation_weekday').show()

Key Observations:

  • Reviews created on Fridays get the slowest answers

  • Reviews created on Mondays get the fastest answers

Comment Message Length

pb.configure(
    df=df_reviews
    , time_column='review_creation_dt'
    , metric='review_comment_message_len'
    , metric_label='Median Review Comment Message Length'
    , metric_label_for_distribution='Review Comment Message Length'
    , agg_func='median'
    , text_auto='.3s'
)
print(f'Median Review comment message length: {df_reviews.review_comment_message_len.median():.2f}')
Median Review comment message length: 54.00

Let’s look at the statistics and distribution of the metric.

pb.metric_info()
Summary Statistics for "review_comment_message_len" (Type: Integer)
Summary                    | Percentiles  | Detailed Stats              | Value Counts
Total         41.46k (42%) | Max   347    | Mean               71.16    | 9: 1.36k (1%)
Missing       58.18k (58%) | 99%   219    | Trimmed Mean (10%) 63.62    | 13: 573 (<1%)
Distinct      255 (<1%)    | 95%   194    | Mode               9        | 17: 557 (<1%)
Non-Duplicate 6 (<1%)      | 75%   100    | Range              347      | 3: 520 (<1%)
Duplicates    99.39k (99%) | 50%   54     | IQR                72       | 29: 498 (<1%)
Dup. Values   249 (<1%)    | 25%   28     | Std                56.60    | 31: 495 (<1%)
Zeros         114 (<1%)    | 5%    9      | MAD                47.44    | 28: 488 (<1%)
Negative      ---          | 1%    3      | Kurt               0.16     | 11: 485 (<1%)
Memory Usage  1            | Min   0      | Skew               1.02     | 33: 478 (<1%)

Key Observations:

  • 58% of reviews have no comment message

  • Among reviews with a message, 75% are 100 characters or shorter
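Because 58% of reviews have no message, the percentiles above describe only reviews that have one. A sketch of that distinction in plain Python, assuming missing comments are stored as `None` and excluded, while empty strings count as length 0 (which would be consistent with the 114 zeros in the table):

```python
import statistics

# Hypothetical comment messages; None marks a review without a comment.
messages = ["ok", None, "good product", None, "", "late delivery"]

present = [m for m in messages if m is not None]  # drop missing comments
lengths = [len(m) for m in present]               # an empty comment has length 0

missing_share = (len(messages) - len(present)) / len(messages)
median_len = statistics.median(lengths)
```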

Let’s break the metric down by different dimensions.

By Review Score

pb.bar_groupby(y='review_score', to_slide=True)

Key Observations:

  • Lower ratings correlate with longer messages

  • Negative reviews tend to be more detailed

NPS

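To calculate NPS, we split reviews into three groups by score. The standard NPS survey uses a 0-10 scale (promoters 9-10, passives 7-8, detractors 0-6); here the same idea is applied to the 1-5 review score:

  • Promoters: reviews with a score of 5

  • Passives: reviews with a score of 4

  • Detractors: reviews with a score of 1-3

NPS is then the percentage of promoters minus the percentage of detractors, so it ranges from -100 to 100.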
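The grouping above and the NPS formula applied in the next cell, 100 × (promoters - detractors) / total responses, can be sketched as two small Python functions (hypothetical names, not part of the notebook):

```python
def nps_group(score: int) -> str:
    """Map a 1-5 review score to an NPS group, using the grouping above."""
    if score == 5:
        return "promoter"
    if score == 4:
        return "passive"
    return "detractor"  # scores 1-3


def nps(scores) -> float:
    """NPS = 100 * (promoters - detractors) / all responses, in [-100, 100]."""
    groups = [nps_group(s) for s in scores]
    promoters = groups.count("promoter")
    detractors = groups.count("detractor")
    return 100 * (promoters - detractors) / len(groups)


print(nps([5, 5, 5, 4, 1]))  # 3 promoters, 1 detractor, 5 responses -> 40.0
```

Passives do not enter the numerator but still count toward the total, which pulls the score toward zero.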

Let’s look at how NPS changes by day.

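import pandas as pd

# Distinct reviews per day (rows) and review score (columns)
tmp_df_res = df_reviews.pivot_table(
    index=pd.Grouper(key='review_creation_dt', freq='D')
    , columns='review_score'
    , values='review_id'
    , aggfunc='nunique'
)
tmp_df_res['total_responses'] = tmp_df_res.sum(axis=1)
# Promoters: score 5; detractors: scores 1-3 (score 4 = passives, counted only in the total)
tmp_df_res['promoters'] = tmp_df_res[5]
tmp_df_res['detractors'] = tmp_df_res[1] + tmp_df_res[2] + tmp_df_res[3]
# NPS = share of promoters minus share of detractors, in percent
tmp_df_res['nps'] = (tmp_df_res['promoters'] - tmp_df_res['detractors']) * 100 / tmp_df_res['total_responses']
tmp_df_res.reset_index(inplace=True)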
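One caveat about the pivot above: if a day has no reviews with a given score, `pivot_table` leaves NaN in that column, and the NaN propagates through `tmp_df_res[1] + tmp_df_res[2] + tmp_df_res[3]` (or `tmp_df_res[5]`), so that day’s NPS becomes NaN. This is the likely source of the 107 missing values in the statistics below; passing `fill_value=0` to `pivot_table` would count absent scores as zero instead. A plain-Python sketch of the intended per-day calculation on hypothetical data, where `Counter` returns 0 for absent scores:

```python
from collections import Counter

# Hypothetical review scores per day; on "day 2" nobody gave 2 or 3 stars.
scores_by_day = {
    "day 1": [5, 5, 4, 3, 2, 1],
    "day 2": [5, 5, 1],
}

daily_nps = {}
for day, scores in scores_by_day.items():
    counts = Counter(scores)  # an absent score counts as 0 instead of NaN
    promoters = counts[5]
    detractors = counts[1] + counts[2] + counts[3]
    daily_nps[day] = 100 * (promoters - detractors) / len(scores)

print(daily_nps)
```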

Let’s look at the statistics and distribution of the metric per day.

tmp_df_res['nps'].explore.info(
    labels=dict(nps='NPS per Day')
    , title='Distribution of NPS per Day'
)
Summary Statistics for "nps" (Type: Float)
Summary                 | Percentiles  | Detailed Stats              | Value Counts
Total         476 (82%) | Max   70.97  | Mean               31.54    | 0: 7 (1%)
Missing       107 (18%) | 99%   57.75  | Trimmed Mean (10%) 33.80    | 25: 3 (<1%)
Distinct      449 (77%) | 95%   51.30  | Mode               0        | 30: 3 (<1%)
Non-Duplicate 429 (74%) | 75%   42.87  | Range              141.97   | 36.43: 2 (<1%)
Duplicates    133 (23%) | 50%   36.16  | IQR                18.59    | 34.57: 2 (<1%)
Dup. Values   20 (3%)   | 25%   24.27  | Std                17.46    | 40: 2 (<1%)
Zeros         7 (1%)    | 5%    -2.59  | MAD                12.81    | 45.07: 2 (<1%)
Negative      28 (5%)   | 1%    -27.43 | Kurt               3.84     | 37.08: 2 (<1%)
Memory Usage  <1 Mb     | Min   -71    | Skew               -1.56    | 32.46: 2 (<1%)

Key Observations:

  • Only about 5% of days had an NPS above 50 (95th percentile: 51.30)

  • About 5% of days had a negative NPS

  • These negative days point to occasional spikes in customer dissatisfaction

Comment Title

Let’s look at the word cloud from review titles.

df_reviews.viz.wordcloud('review_comment_title')

Key Observations:

  • Most review titles use positive language

Let’s look at the top words by frequency.

fig = df_reviews.analysis.word_frequency(
    'review_comment_title'
    , text_auto=True
    , title='Top 10 Most Frequent Words in Review Title'
)
pb.to_slide(fig)
fig.show()

Key Observations:

  • Most common title words: “recommend” and “excellent”
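A word-frequency chart like the one above is typically built by lowercasing the text, splitting it into words, dropping stop words, and counting. A minimal sketch of that pipeline on hypothetical titles; the stop-word list and tokenization are assumptions, and the library’s own preprocessing may differ:

```python
import re
from collections import Counter

# Hypothetical review titles; None marks a missing title.
titles = ["Excellent product", "I recommend", "Recommend!", "excellent", None, "Bad product"]

# Assumption: a tiny illustrative stop-word list, not the library's own.
STOP_WORDS = {"i", "the", "a", "and"}

words = []
for title in titles:
    if not title:  # skip missing or empty titles
        continue
    tokens = re.findall(r"[a-z]+", title.lower())  # lowercase, keep letters only
    words.extend(t for t in tokens if t not in STOP_WORDS)

word_counts = Counter(words)
top_words = word_counts.most_common(3)
```

Lowercasing and stripping punctuation make “Recommend!” and “recommend” count as the same word.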

Let’s analyze the sentiment of the text.

df_reviews.analysis.sentiment('review_comment_title')

Key Observations:

  • About 10% of titles have negative sentiment

  • The interquartile range of sentiment scores lies above 0, indicating a neutral-to-positive bias
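Summaries such as “about 10% negative” and “IQR above 0” can be read off per-text sentiment scores. A sketch on hypothetical polarity scores in [-1, 1], assuming a text counts as negative when its score is below 0 (the sentiment model and threshold the library uses are not shown here):

```python
import statistics

# Hypothetical sentiment (polarity) scores in [-1, 1], one per text.
polarity = [-0.6, -0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]

# Assumption: a text is "negative" when its score is below 0.
negative_share = sum(p < 0 for p in polarity) / len(polarity)

# Quartiles (statistics.quantiles needs Python 3.8+); the IQR spans q1 to q3.
q1, q2, q3 = statistics.quantiles(polarity, n=4)
iqr_above_zero = q1 > 0

print(f"negative: {negative_share:.0%}, IQR: {q1:.2f} to {q3:.2f}")
```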

Comment Message

Let’s look at the word cloud and the top words by frequency from the review messages.

df_reviews.viz.wordcloud('review_comment_message')
fig = df_reviews.analysis.word_frequency(
    'review_comment_message'
    , text_auto=True
    , title='Top 10 Most Frequent Words in Review Message'
)
pb.to_slide(fig)
fig.show()

Key Observations:

  • Many words relate to delivery

  • Most common review word: “product”

Let’s analyze the sentiment of the text.

df_reviews.analysis.sentiment('review_comment_message')

Key Observations:

  • ~15% of messages are negative

  • Overall sentiment leans positive

Impact of Rating on Review Text

Score 1

Let’s look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 1.

df_reviews[lambda x: x.review_score==1].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==1].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==1].analysis.sentiment('review_comment_message')

Key Observations:

  • 1-star reviews:

    • Contain negative words

    • Clearly negative sentiment (IQR <0)


Score 2

Let’s look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 2.

df_reviews[lambda x: x.review_score==2].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==2].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==2].analysis.sentiment('review_comment_message')

Key Observations:

  • 2-star reviews:

    • Contain negative words

    • Mostly negative sentiment


Score 3

Let’s look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 3.

df_reviews[lambda x: x.review_score==3].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==3].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==3].analysis.sentiment('review_comment_message')

Key Observations:

  • 3-star reviews:

    • Fewer negative words

    • Leans positive overall


Score 4

Let’s look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 4.

df_reviews[lambda x: x.review_score==4].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==4].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==4].analysis.sentiment('review_comment_message')

Key Observations:

  • 4-star reviews:

    • Many positive words

    • Clearly positive sentiment


Score 5

Let’s look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 5.

df_reviews[lambda x: x.review_score==5].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==5].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==5].analysis.sentiment('review_comment_message')

Key Observations:

  • 5-star reviews:

    • Dominated by positive words

    • Strongly positive sentiment