In [None]:
%run ../../_pre_run.ipynb

# Review Analysis

## Number of Reviews

In [None]:
pb.configure(
    df = df_reviews
    , time_column = 'review_creation_dt'
    , metric = 'review_id'
    , metric_label = 'Share of Reviews'
    , metric_label_for_distribution = 'Number of Reviews'
    , agg_func = 'nunique'
    , norm_by='all'
    , axis_sort_order='descending'
    , text_auto='.1%'
    , update_fig={'xaxis': {'tickformat': '.0%'}}    
)

In [None]:
print(f'Total number of reviews: {df_reviews.review_id.nunique():,}')

Let’s look at the statistics and distribution of the metric.

In [None]:
pb.metric_info(freq='D')

**Key Observations:**  

- Typical day: 1 review created  
- 75% of days had ≤270 reviews  
- Top 5% ≥375 reviews  
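As a rough illustration of what the per-day summary above describes, the sketch below counts reviews per calendar day and computes the median and quartiles of those daily counts. It uses only the standard library and a small hypothetical list of dates, not the real dataset, and it does not reproduce `pb.metric_info`.

```python
from collections import Counter
from datetime import date
from statistics import median, quantiles

# Hypothetical toy data: one creation date per review (not the real dataset).
review_dates = (
    [date(2018, 1, 1)] * 1
    + [date(2018, 1, 2)] * 4
    + [date(2018, 1, 3)] * 2
    + [date(2018, 1, 4)] * 9
    + [date(2018, 1, 5)] * 4
)

# Count reviews per calendar day, then sort the daily counts.
reviews_per_day = Counter(review_dates)
daily_counts = sorted(reviews_per_day.values())

# Median and quartiles of the daily counts.
daily_median = median(daily_counts)
q1, q2, q3 = quantiles(daily_counts, n=4, method="inclusive")

print(f"Median reviews per day: {daily_median}")
print(f"75% of days had <= {q3} reviews")
```

A statement such as "75% of days had ≤270 reviews" corresponds to the third quartile (`q3`) of the daily counts.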

Let’s look by different dimensions.

**By Day of Week**

In [None]:
pb.bar_groupby(y='review_creation_weekday', to_slide=True)

**Key Observations:**  

- Fewest reviews on Mondays  
- Sundays slightly more than Mondays but still low  
- Possible review registration pattern  

**By Day Type**

In [None]:
pb.bar_groupby(y='review_day_type')

**Key Observations:**  

- 76% of reviews created on weekdays  
- Broadly in line with weekdays making up five of the seven days of the week  
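To put the weekday share in context, it can be compared with the share expected if reviews were spread evenly across the week. The snippet below is plain arithmetic on the figure reported above; the even-spread baseline is an assumption made for illustration.

```python
# Baseline: if reviews were spread evenly over the week,
# weekdays (Mon-Fri) would account for 5 of every 7 reviews.
expected_weekday_share = 5 / 7

# Observed weekday share reported in the observations above.
observed_weekday_share = 0.76

# Difference between observed and expected, in percentage points.
excess_pp = (observed_weekday_share - expected_weekday_share) * 100

print(
    f"Expected: {expected_weekday_share:.1%}, "
    f"observed: {observed_weekday_share:.0%}, "
    f"excess: {excess_pp:.1f} pp"
)
```

Under this baseline, weekdays are over-represented by roughly 4.6 percentage points.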

**By Review Score**

In [None]:
pb.bar_groupby(y='review_score', to_slide=True)

**Key Observations:**  

- Review score distribution:  
  - 5 stars: 58%  
  - 4 stars: 19%  
  - 1 star: 12%  
  - 3 stars: 8%  
  - 2 stars: 3% 

## Review Score

In [None]:
pb.configure(
    df = df_reviews
    , time_column = 'review_creation_dt'
    , metric = 'review_score'
    , metric_label = 'Average Review score'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
)

In [None]:
print(f'Average Review score: {df_reviews.review_score.mean():.2f}')

Let’s look at the statistics and distribution of the metric.

In [None]:
pb.metric_info(
    labels=dict(review_score='Review score')
    , title='Distribution of Review score'
    , xaxis_type='category'
)

**Key Observations:**  

- 58% of reviews had score 5.

Let’s look at the statistics and distribution of the metric per day.

In [None]:
pb.metric_info(freq='D')

**Key Observations:**  

- Daily average ratings:  
  - Bottom 5% <3.26  
  - Middle 50% 3.9-4.3  
  - Top 5% >4.6  

Let’s look by different dimensions.

**By Day Type**

In [None]:
pb.cat_compare(cat2='review_day_type'
            , visible_graphs=[2]
)
pb.bar_groupby(y='review_day_type').show()

**Key Observations:**  

- Weekdays have slightly higher average ratings  
- Share of 5-star reviews is higher on weekdays  
- Share of 1-star reviews is higher on weekends  

**By Day of Week**

In [None]:
pb.cat_compare(cat2='review_creation_weekday'
            , visible_graphs=[2]
)
pb.bar_groupby(y='review_creation_weekday').show()

**Key Observations:**  

- Sundays have lowest ratings  
- Highest 1-star share on Sundays  
- Lowest 5-star share on Sundays  

## Review Answer Time

In [None]:
pb.configure(
    df = df_reviews
    , time_column = 'review_creation_dt'
    , metric = 'review_answer_time_days'
    , metric_label = 'Average Review Answer Time, days'
    , metric_label_for_distribution = 'Review Answer Time, days'
    , agg_func = 'mean'
    , axis_sort_order='descending'
    , text_auto='.3s'
)

In [None]:
print(f'Average Review Answer Time: {df_reviews.review_answer_time_days.mean():.2f} days')

Let’s look at the statistics and distribution of the metric.

In [None]:
pb.metric_info(
    upper_quantile=0.95
    , hist_mode='dual_hist_trim'    
)

**Key Observations:**  

- Review response time is bimodal, with peaks at ~1 day and ~3.5 days  
- 75% responded within 3.1 days  
- Top 5% took ≥7 days  

Let’s look at the statistics and distribution of the metric per day.

In [None]:
pb.metric_info(freq='D')

**Key Observations:**  

- 5% of review days had average response time ≥5.85 days  


Let’s look by different dimensions.

**By Day of Week**

In [None]:
pb.histogram(color='review_creation_weekday').show()
pb.bar_groupby(y='review_creation_weekday').show()

**Key Observations:**  

- Slowest responses to Friday reviews  
- Fastest responses to Monday reviews  
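The comparison above amounts to averaging the answer time by the weekday on which the review was created. The sketch below shows that grouping with the standard library. It assumes that `review_answer_time_days` is the difference between an answer timestamp and the creation timestamp, which the notebook does not state, and the timestamp pairs are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Hypothetical (created, answered) timestamp pairs, not the real dataset.
pairs = [
    (datetime(2018, 1, 1, 0, 0), datetime(2018, 1, 2, 12, 0)),  # Monday, 1.5 days
    (datetime(2018, 1, 1, 0, 0), datetime(2018, 1, 1, 12, 0)),  # Monday, 0.5 days
    (datetime(2018, 1, 5, 0, 0), datetime(2018, 1, 9, 0, 0)),   # Friday, 4 days
    (datetime(2018, 1, 5, 0, 0), datetime(2018, 1, 8, 0, 0)),   # Friday, 3 days
]

# Collect answer times (in days) by the weekday of review creation.
answer_days_by_weekday = defaultdict(list)
for created, answered in pairs:
    answer_days = (answered - created).total_seconds() / 86400
    answer_days_by_weekday[WEEKDAY_NAMES[created.weekday()]].append(answer_days)

# Mean answer time per creation weekday.
mean_by_weekday = {day: mean(vals) for day, vals in answer_days_by_weekday.items()}
print(mean_by_weekday)
```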

## Comment Message Length

In [None]:
pb.configure(
    df=df_reviews
    , time_column='review_creation_dt'
    , metric='review_comment_message_len'
    , metric_label='Median Review Comment Message Length'
    , metric_label_for_distribution='Review Comment Message Length'
    , agg_func='median'
    , text_auto='.3s'
)

In [None]:
print(f'Median Review comment message length: {df_reviews.review_comment_message_len.median():.2f}')

Let’s look at the statistics and distribution of the metric.

In [None]:
pb.metric_info()

**Key Observations:**  

- 75% of reviews have messages ≤100 characters  

Let’s look by different dimensions.

**By Review Score**

In [None]:
pb.bar_groupby(y='review_score', to_slide=True)

**Key Observations:**  

- Lower ratings correlate with longer messages  
- Negative reviews tend to be more detailed  
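The relationship above is a comparison of median message length across review scores. A minimal standard-library sketch of that comparison follows; the `(score, message)` pairs are hypothetical and only illustrate the grouping, not the actual results.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (review_score, comment message) pairs, not the real dataset.
reviews = [
    (1, "Product never arrived and nobody answered my emails."),
    (1, "Wrong item, poor support."),
    (3, "Okay, but late."),
    (5, "Great!"),
    (5, "Loved it"),
    (5, "Fast delivery"),
]

# Group message lengths (in characters) by review score.
lengths_by_score = defaultdict(list)
for score, message in reviews:
    lengths_by_score[score].append(len(message))

# Median message length per score, ordered by score.
median_len_by_score = {
    score: median(lengths) for score, lengths in sorted(lengths_by_score.items())
}
print(median_len_by_score)
```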

## NPS

For calculating NPS, we will divide customers into the following groups:

- Promoters: customers who gave a rating of 5
- Passive: customers who gave a rating of 4
- Detractors: customers who gave a rating of 1-3
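With this grouping, NPS is the share of promoters minus the share of detractors, expressed in percent. The helper below is a minimal sketch of that formula for a flat list of scores; the function name `nps_from_scores` and the example scores are hypothetical.

```python
from collections import Counter


def nps_from_scores(scores):
    """Return NPS in [-100, 100] using the grouping above:
    promoters = 5, passives = 4, detractors = 1-3."""
    counts = Counter(scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one score")
    promoters = counts[5]
    detractors = counts[1] + counts[2] + counts[3]
    # Passives (score 4) count toward the total but not the numerator.
    return (promoters - detractors) * 100 / total


example_scores = [5, 5, 5, 4, 3, 1, 5, 2, 5, 4]  # hypothetical scores
example_nps = nps_from_scores(example_scores)
print(f"NPS: {example_nps:.1f}")
```

Here five promoters and three detractors out of ten responses give an NPS of 20.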

Let's calculate NPS for each day.

In [None]:
# Count unique reviews per day and score; fill_value=0 keeps days that lack
# some score from turning the sums below into NaN.
tmp_df_res = df_reviews.pivot_table(
    index=pd.Grouper(key='review_creation_dt', freq='D')
    , columns='review_score'
    , values='review_id'
    , aggfunc='nunique'
    , fill_value=0
)
tmp_df_res['total_responses'] = tmp_df_res.sum(axis=1)
tmp_df_res['promoters'] = tmp_df_res[5]
tmp_df_res['detractors'] = tmp_df_res[1] + tmp_df_res[2] + tmp_df_res[3]
# NPS = (promoters - detractors) / total responses, in percent
tmp_df_res['nps'] = (tmp_df_res['promoters'] - tmp_df_res['detractors']) * 100 / tmp_df_res['total_responses']
tmp_df_res.reset_index(inplace=True)

Let’s look at the statistics and distribution of the metric per day.

In [None]:
tmp_df_res['nps'].explore.info(
    labels=dict(nps='NPS per Day')
    , title='Distribution of NPS per Day'
)

**Key Observations:**  

- Only ~5% of days had good NPS (>50)  
- 5% had negative NPS  
- Indicates customer dissatisfaction spikes  

## Comment Title

Let's look at the word cloud from review titles.

In [None]:
df_reviews.viz.wordcloud('review_comment_title')

**Key Observations:**  

- Most review titles use positive language  

Let's look at the top words by frequency.

In [None]:
fig = df_reviews.analysis.word_frequency(
    'review_comment_title'
    , text_auto=True
    , title='Top 10 Most Frequent Words in Review Title'
)
pb.to_slide(fig)
fig.show()

**Key Observations:**  

- Most common title words: "recomend", "excellent"  

Let’s analyze the sentiment of the text.

In [None]:
df_reviews.analysis.sentiment('review_comment_title')

**Key Observations:**  

- ~10% of titles are negative  
- Sentiment IQR above 0 (neutral/positive bias)  
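"Sentiment IQR above 0" means that the first quartile of the sentiment scores is already positive, so the middle 50% of titles score as neutral-to-positive. The sketch below illustrates that check on hypothetical scores in the range -1 to 1; the scale and the values are assumptions, since the output of `analysis.sentiment` is not shown here.

```python
from statistics import quantiles

# Hypothetical sentiment scores in [-1, 1], not the real model output.
sentiment_scores = [-0.6, 0.0, 0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]

# First and third quartiles bound the interquartile range (IQR).
q1, _, q3 = quantiles(sentiment_scores, n=4, method="inclusive")

# The IQR lies above zero when even its lower bound is positive.
iqr_above_zero = q1 > 0

# Share of texts with a negative score.
negative_share = sum(s < 0 for s in sentiment_scores) / len(sentiment_scores)

print(f"IQR: [{q1:.3f}, {q3:.3f}], above zero: {iqr_above_zero}")
print(f"Negative share: {negative_share:.0%}")
```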

## Comment Message

Let's look at the word cloud and the top words by frequency from the review messages.

In [None]:
df_reviews.viz.wordcloud('review_comment_message')
fig = df_reviews.analysis.word_frequency(
    'review_comment_message'
    , text_auto=True
    , title='Top 10 Most Frequent Words in Review Message'
)
pb.to_slide(fig)
fig.show()

**Key Observations:**  

- Many words relate to delivery  
- Most common review word: "product"  

Let’s analyze the sentiment of the text.

In [None]:
df_reviews.analysis.sentiment('review_comment_message')

**Key Observations:**  

- ~15% of messages are negative  
- Overall sentiment leans positive  

## Impact of Rating on Review Text

**Score 1**

Let's look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 1.

In [None]:
df_reviews[lambda x: x.review_score==1].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==1].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==1].analysis.sentiment('review_comment_message')

**Key Observations:**  

- 1-star reviews:  
  - Contain negative words  
  - Clearly negative sentiment (IQR <0)  

---

**Score 2**

Let's look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 2.

In [None]:
df_reviews[lambda x: x.review_score==2].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==2].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==2].analysis.sentiment('review_comment_message')

**Key Observations:**  

- 2-star reviews:  
  - Contain negative words  
  - Mostly negative sentiment  

---

**Score 3**

Let's look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 3.

In [None]:
df_reviews[lambda x: x.review_score==3].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==3].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==3].analysis.sentiment('review_comment_message')

**Key Observations:**  

- 3-star reviews:  
  - Fewer negative words  
  - Leans positive overall  

---

**Score 4**

Let's look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 4.

In [None]:
df_reviews[lambda x: x.review_score==4].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==4].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==4].analysis.sentiment('review_comment_message')

**Key Observations:**  

- 4-star reviews:  
  - Many positive words  
  - Clearly positive sentiment  

---

**Score 5**

Let's look at the word cloud, top 20 words by frequency, and the emotional tone of the text for a rating of 5.

In [None]:
df_reviews[lambda x: x.review_score==5].viz.wordcloud('review_comment_message')
df_reviews[lambda x: x.review_score==5].analysis.word_frequency('review_comment_message').show()
df_reviews[lambda x: x.review_score==5].analysis.sentiment('review_comment_message')

**Key Observations:**  

- 5-star reviews:  
  - Dominated by positive words  
  - Strongly positive sentiment  

In [None]:
%run ../../_post_run.ipynb