Importing Libraries

🌊 Deep Sales Analysis of Olist Marketplace

Author:

Pavel Grigoryev

Project Description:

Olist is a Brazilian e-commerce platform that connects sellers and buyers, offering a wide range of products and convenient conditions for online sales. Olist also acts as an intermediary, allowing sellers to connect to multiple marketplaces simultaneously, thereby increasing their reach.

This project conducts an in-depth analysis of sales data on the Olist platform. The analysis focuses on identifying key trends, patterns in customer behavior, and operational insights. The findings will help formulate recommendations to improve business processes and enhance sales efficiency.

Goal:

The goal of the analysis is to identify key trends, patterns, and insights in sales data, customer behavior, and the effectiveness of marketing strategies on the Olist platform. The results will help develop recommendations to improve business strategies and enhance customer experience.

Project Resources:

Data Sources:

Sales data from the Olist marketplace.

Analysis Timeframe:

Data from September 2016 to October 2018 will be used for the analysis.

Main Conclusions:

  • Sales Trends:

    • Growth & Stabilization: Sales volume and revenue grew until 2018, then stabilized at 6–7K orders and 1–1.2M R$ per month.

    • Black Friday (11/24/2017): Record spikes in orders, revenue, and buyers.

    • Geography: Sao Paulo dominates (42% of sales), with steady growth in 2018, unlike other regions.

  • Customer Behavior:

    • Low Retention: 97% of buyers made only one purchase; repeat buyers are rare.

    • High-Value Buyers: Clients using installment plans (50%) spend 2x more (higher average order value/weight).

    • Loyalty: Promoters (58% of buyers) leave positive reviews but rarely return. Critics (13%) spend more but churn faster due to delivery delays.

  • Operational Insights:

    • Delayed orders correlate with lower ratings (avg. rating: 1–2 vs. 4–5 for on-time).

    • Heavy/expensive orders take longer to deliver and are more likely to be delayed.

    • Orders with installments process faster, have higher AOV, and show better retention.

  • Payment & Risk:

    • Credit cards dominate: 74% of transactions, with 35% higher AOV vs. other methods.

    • Installments boost value: Orders with installments have 2x higher AOV (premium/heavy items).

    • Voucher Payments: Orders paid with vouchers have 3x higher cancellation rates (16% vs. 5% for credit cards).

  • Product & Logistics:

    • Top Categories: Electronics (27% of sales) and furniture (18%) drive revenue.

    • Northern states take 2x longer delivery.

    • Heavy orders (+40% delivery time) and premium items face delays.

    • Delivery Bottlenecks: 70% of total delivery time is spent with carriers, notably slower in Rio de Janeiro and Salvador.

  • Critical Challenges:

    • Declining Ratings: Average review scores dropped from 4.5 (2017) to 3.9 (2018), linked to delivery delays.

    • Peak Season Failures: Black Friday 2017 caused a surge in delayed deliveries, with complaints tied to carrier handoff bottlenecks.

    • Abandoned Carts: Canceled orders spike in February/August 2018, often for high-value items paid via vouchers.

  • Customer Feedback & Ratings:

    • Majority of reviews are positive: 58% of reviews received a rating of 5. Only 12% of reviews received a rating of 1, while a mere 3% received a rating of 2.

    • Negative Reviews: 15% of review text mentions “slow delivery” or “missing items,” heavily impacting NPS.

  • Data Highlights:

    • Negative Feedback Drivers: Low ratings correlate with longer delivery times, higher order value, and heavier items.

    • Success Factors: Fast carrier handoff (≤3 days) and installment options boost ratings and repeat purchases.

Key Recommendations:

  • Boost Customer Retention & Repeat Purchases:

    • Launch a loyalty program targeting one-time buyers (97% of customers), offering discounts on second purchases or bonus points.

    • Personalized win-back campaigns for high-value clients (top 1% driving 15% of revenue) with exclusive offers.

    • Reduce time between purchases (currently 29+ days for 50% of repeat buyers) via time-bound promotions (e.g., “7-day discount”).

  • Improve Product & Pricing Strategy:

    • Expand “Beauty & Health” and “Home & Garden” (18% YoY growth categories) with curated bundles or subscriptions.

    • Reprice problem categories (e.g., “Watches & Gifts”) to offset delivery costs or offer guaranteed faster shipping.

  • Enhance High-Value Segments:

    • Premium installment plans for big spenders (avg. 3+ orders) with perks like free shipping or priority support.

    • Target voucher users (3x higher cancellation risk) with limited-time combo deals to convert abandoned carts.

  • Fix Delivery Pain Points:

    • Prioritize carrier performance in critical regions (e.g., Rio de Janeiro, Salvador), where delays are 30% longer than average.

    • Expedite high-value/heavy orders (>500 R$ or >10kg), which face 2x more 1-star ratings due to delays.

    • Optimize Black Friday logistics to prevent repeat of 2017’s 4x surge in delays (pre-stock inventory, add temporary carriers).

  • Regional Growth Tactics:

    • Hyper-local campaigns in SĂŁo Paulo (42% of sales): Leverage its 20% faster delivery and 30% higher retention to test scalable models.

    • Fix underperformers (e.g., MaranhĂŁo, Ceará) with subsidized shipping or partner pickup points.

  • Mitigate Negative Reviews:

  • Automate compensation for delayed orders (e.g., 10% off next purchase if delivery exceeds 15 days).

  • Sunday support surge: Add staff to cut response times, reducing low weekend ratings.

Importing Libraries#

from typing import Dict, Optional, Union
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from functools import partial
from IPython.display import HTML, display
import geopandas as gpd
import frameon as fron
from frameon import FrameOn as fo
import json
from plotly.subplots import make_subplots
from scipy.cluster.hierarchy import dendrogram, linkage
import plotly.figure_factory as ff
import statsmodels.api as sm
from urllib.request import urlopen
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import silhouette_score
pd.set_option('display.max_columns', None)
pd.options.display.float_format = '{:,.2f}'.format
pio.renderers.default = "jpg" 
pio.defaults.default_width = None 
pio.defaults.default_height = None