Data Description
The dataset is an extensive collection of e-commerce data from Brazil. It was gathered by Olist, a company that provides an online sales platform. The dataset covers the period from 2016 to 2018 and includes information about purchases made on the Olist platform.
Olist operates as a marketplace, enabling small and medium-sized businesses to sell their products through various channels, including major platforms like Amazon and Mercado Livre. Integration with other marketplaces allows sellers to manage orders and inventory centrally, significantly expanding their reach and simplifying the sales process.
Products listed on Olist can automatically be offered for sale on other platforms, increasing visibility and potential sales. However, the dataset only includes sales made directly through the Olist platform. Because the same products are also offered on other marketplaces, some buyers may purchase there rather than on the Olist website, and those purchases do not appear in the data.
Key notes about the data:
The data is a random sample of all purchases that received customer reviews.
Each product can be shipped by different sellers.
The dataset follows this schema:

Orders:
Field | Description |
---|---|
order_id | Order ID. |
customer_id | Customer ID. |
order_status | Order status. |
order_purchase_timestamp | Date and time of purchase. |
order_approved_at | Date and time when the payment was approved. The payment was successful, and the order was approved for processing. |
order_delivered_carrier_date | Date and time when the order was handed over to the carrier. |
order_delivered_customer_date | Date when the order was delivered to the customer. |
order_estimated_delivery_date | Estimated delivery date (set before actual delivery begins). |

For analyzing purchases and user behavior, use order_purchase_timestamp as the reference time.
This is the moment the customer completes the purchase, so it records precisely when the purchase decision was made.
This matters most when classifying purchase days (weekdays or weekends) and analyzing temporal dynamics. Another timestamp (e.g., order_approved_at) may distort the results, because it does not reflect the moment when the user took action.
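As a minimal sketch of this point, the snippet below classifies purchases as weekday or weekend from order_purchase_timestamp. The timestamp layout (`YYYY-MM-DD HH:MM:SS`), the helper name `purchase_day_type`, and the sample rows are assumptions made for illustration; they are not taken from the dataset.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "2018-01-05 23:50:00"; adjust if your export differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def purchase_day_type(timestamp: str) -> str:
    """Return 'weekend' for Saturday/Sunday timestamps, otherwise 'weekday'."""
    moment = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    # datetime.weekday(): Monday is 0, Sunday is 6.
    return "weekend" if moment.weekday() >= 5 else "weekday"


# Illustrative rows with made-up values.
orders = [
    # Bought on a Friday evening, approved shortly after midnight on Saturday.
    {"order_id": "a1",
     "order_purchase_timestamp": "2018-01-05 23:50:00",
     "order_approved_at": "2018-01-06 00:10:00"},
    # Bought on a Sunday, approved on Monday.
    {"order_id": "b2",
     "order_purchase_timestamp": "2018-01-07 09:15:00",
     "order_approved_at": "2018-01-08 08:00:00"},
]

by_purchase = {o["order_id"]: purchase_day_type(o["order_purchase_timestamp"]) for o in orders}
by_approval = {o["order_id"]: purchase_day_type(o["order_approved_at"]) for o in orders}
```

In these sample rows, classifying by approval time flips both orders into the wrong group, which is the distortion described above.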
The difference between order_purchase_timestamp and order_approved_at:
order_purchase_timestamp
Indicates the moment when the customer completes the purchase process: the customer clicks the “Buy” button and the order is placed. At this point, the order status is set to “created”, and the payment has not yet been verified.
order_approved_at
Indicates the moment when the order was approved after successful payment verification. This means the funds were confirmed, and the order is ready for further processing. At this point, the order status changes to “approved”.
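The gap between the two timestamps can be measured per order. The sketch below assumes the `YYYY-MM-DD HH:MM:SS` layout and assumes that an order without approval carries an empty order_approved_at value; the helper name `approval_delay` and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout


def approval_delay(order: dict) -> Optional[timedelta]:
    """Return order_approved_at minus order_purchase_timestamp, or None if not approved."""
    approved_raw = order.get("order_approved_at")
    if not approved_raw:
        # Assumption: a missing or empty value means the payment was never approved.
        return None
    purchased = datetime.strptime(order["order_purchase_timestamp"], TIMESTAMP_FORMAT)
    approved = datetime.strptime(approved_raw, TIMESTAMP_FORMAT)
    return approved - purchased


# Illustrative rows with made-up values.
approved_order = {"order_purchase_timestamp": "2018-03-01 10:00:00",
                  "order_approved_at": "2018-03-01 10:15:00"}
pending_order = {"order_purchase_timestamp": "2018-03-01 11:00:00",
                 "order_approved_at": ""}

delay = approval_delay(approved_order)
no_delay = approval_delay(pending_order)
```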
Order status can be one of the following:
created
The customer visits the Olist platform and selects a product they want to buy. After adding the product to the cart and completing the checkout process, the order status is set to “created”. This means the order was successfully created but has not yet been processed.
approved
After the order is created, the system checks the payment information. If the payment is successful, the order status changes to “approved”. This means the order is approved for further processing.
invoiced
At this stage, an invoice may be issued. The status changes to “invoiced”, meaning the order cost information has been recorded, and the customer has been provided with an invoice. This status may not always appear, as not all orders require it.
processing
After the order is approved, the seller begins processing it. The status changes to “processing”, indicating the order is being prepared for shipment.
shipped
Once the order is packed, it is handed over to the courier service for delivery. The status changes to “shipped”, indicating the order has left the seller’s warehouse and is on its way to the buyer.
delivered
When the courier delivers the order to the customer, the status changes to “delivered”. This means the customer has received their product, and the order fulfillment process is complete.
unavailable
If the product becomes unavailable after the order is created (e.g., sold out), the status may change to “unavailable”. This can happen during processing.
canceled
If the customer decides to cancel the order at any stage, the status may change to “canceled”.
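For analysis it is often convenient to collapse these statuses into a few groups. The sketch below is one possible grouping, not part of the dataset: the group names (`completed`, `in_progress`, `terminated`, `unknown`) and the helper `classify_status` are hypothetical, and the main path follows the order in which the statuses are listed above.

```python
from collections import Counter

# Main path in the order the statuses are listed above; "invoiced" may be skipped.
MAIN_PATH = ["created", "approved", "invoiced", "processing", "shipped", "delivered"]
# Statuses that end an order without delivery.
TERMINATED = {"unavailable", "canceled"}


def classify_status(order_status: str) -> str:
    """Map an order_status value to a coarse, hypothetical lifecycle group."""
    if order_status == "delivered":
        return "completed"
    if order_status in TERMINATED:
        return "terminated"
    if order_status in MAIN_PATH:
        return "in_progress"
    return "unknown"


# Illustrative status values (made up for the example).
statuses = ["delivered", "shipped", "canceled", "delivered", "created"]
summary = Counter(classify_status(status) for status in statuses)
```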
Customers:
Field | Description |
---|---|
customer_id | ID assigned to each order in the dataset (each order has a unique customer_id). |
customer_unique_id | Customer ID used to identify a specific customer in the system. |
customer_zip_code_prefix | First five digits of the customer’s postal code. |
customer_city | Customer’s city. |
customer_state | State where the customer is located. |

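Because customer_id is unique per order, counting people or repeat purchases requires customer_unique_id. A minimal sketch with made-up rows (the ID values are hypothetical):

```python
from collections import Counter

# Illustrative rows: three orders, two of them placed by the same person.
customers = [
    {"customer_id": "c-001", "customer_unique_id": "u-A"},
    {"customer_id": "c-002", "customer_unique_id": "u-B"},
    {"customer_id": "c-003", "customer_unique_id": "u-A"},
]

# Each customer_id stands for one order, so rows per customer_unique_id
# give the number of orders placed by each person.
orders_per_customer = Counter(row["customer_unique_id"] for row in customers)
repeat_customers = sorted(uid for uid, n in orders_per_customer.items() if n > 1)

distinct_customer_ids = len({row["customer_id"] for row in customers})
distinct_people = len(orders_per_customer)
```

Counting by customer_id here would report three customers, while only two distinct people placed orders.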
Geolocation:
Field | Description |
---|---|
geolocation_zip_code_prefix | First five digits of the postal code. |
geolocation_lat | Latitude. |
geolocation_lng | Longitude. |
geolocation_city | City. |
geolocation_state | State. |

Order_items:
Field | Description |
---|---|
order_id | Order ID. |
order_item_id | Order item ID. |
product_id | Product ID. |
seller_id | Seller ID. |
shipping_limit_date | Date by which the seller must hand over the product to the logistics company. |
price | Product price. |
freight_value | Shipping cost (if the order includes multiple products, the shipping cost is divided among them). |

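Since price and freight_value are stored per item, an order total has to be summed over the order's rows. The sketch below uses made-up rows and values; treating the total as price plus freight over all items is an assumption about how the amounts combine.

```python
from collections import defaultdict

# Illustrative rows: order "o-1" has two items, each carrying its share of the freight.
order_items = [
    {"order_id": "o-1", "order_item_id": 1, "price": 50.0, "freight_value": 10.0},
    {"order_id": "o-1", "order_item_id": 2, "price": 30.0, "freight_value": 10.0},
    {"order_id": "o-2", "order_item_id": 1, "price": 20.0, "freight_value": 5.5},
]

order_totals = defaultdict(float)
freight_totals = defaultdict(float)
for item in order_items:
    # Sum item price and the item's share of shipping into the order total.
    order_totals[item["order_id"]] += item["price"] + item["freight_value"]
    freight_totals[item["order_id"]] += item["freight_value"]
```

With real data, rounding the sums to two decimals (or using `decimal.Decimal`) avoids floating-point noise in monetary totals.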
Order_payments:
Field | Description |
---|---|
order_id | Order ID. |
payment_sequential | Sequential number of the payment in the order. |
payment_type | Payment type: credit and debit cards, voucher (coupon or certificate), boleto (electronic check). |
payment_installments | Number of installments (if the payment is split into multiple parts). |
payment_value | Payment amount. |

Boleto is a document representing a payment invoice. It contains information about the amount to be paid and the recipient’s details.
The customer selects this payment method, after which the store issues a boleto to the customer. The customer must pay it by the specified deadline.
The customer can pay it via online banking, ATMs, or bank tellers. After payment, the bank processes the transaction and credits the amount to the issuer’s account. The payment confirmation process typically takes 1 to 3 business days.
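An order can have several payment rows, distinguished by payment_sequential, so the amount paid has to be summed per order. The sketch below uses made-up rows; the payment_type strings (`voucher`, `credit_card`, `boleto`) are assumed spellings and should be checked against the actual values in the data.

```python
from collections import defaultdict

# Illustrative rows: order "o-1" is paid partly by voucher and partly by card.
order_payments = [
    {"order_id": "o-1", "payment_sequential": 2, "payment_type": "credit_card",
     "payment_installments": 3, "payment_value": 80.0},
    {"order_id": "o-1", "payment_sequential": 1, "payment_type": "voucher",
     "payment_installments": 1, "payment_value": 20.0},
    {"order_id": "o-2", "payment_sequential": 1, "payment_type": "boleto",
     "payment_installments": 1, "payment_value": 25.5},
]

paid_per_order = defaultdict(float)
types_per_order = defaultdict(list)
# Sort so that payment types are collected in payment_sequential order.
ordered_rows = sorted(order_payments,
                      key=lambda row: (row["order_id"], row["payment_sequential"]))
for row in ordered_rows:
    paid_per_order[row["order_id"]] += row["payment_value"]
    types_per_order[row["order_id"]].append(row["payment_type"])
```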
Order_reviews:
Field | Description |
---|---|
review_id | Review ID. |
order_id | Order ID. |
review_score | Review score (1 to 5). |
review_comment_title | Review title. |
review_comment_message | Review text. |
review_creation_date | Date the review was created. |
review_answer_timestamp | Date and time the review was answered. |

Products:
Field | Description |
---|---|
product_id | Product ID. |
product_category_name | Product category. |
product_name_lenght | Length of the product name. |
product_description_lenght | Length of the product description. |
product_photos_qty | Number of product photos. |
product_weight_g | Product weight in grams. |
product_length_cm | Product length in centimeters. |
product_height_cm | Product height in centimeters. |
product_width_cm | Product width in centimeters. |

Sellers:
Field | Description |
---|---|
seller_id | Seller ID. |
seller_zip_code_prefix | First five digits of the seller’s postal code. |
seller_city | Seller’s city. |
seller_state | Seller’s state. |

Product_category_name:
Field | Description |
---|---|
product_category_name | Product category in Portuguese. |
product_category_name_english | Product category in English. |

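This table works as a lookup from the Portuguese category name in Products to its English name. The sketch below uses illustrative rows; the category pairs and product IDs are sample values for the example, and falling back to the Portuguese name when no translation exists is a design choice, not something the dataset prescribes.

```python
# Illustrative translation rows, keyed by product_category_name.
category_translation = {
    "beleza_saude": "health_beauty",
    "informatica_acessorios": "computers_accessories",
}

# Illustrative product rows; "p-3" has a category with no translation entry.
products = [
    {"product_id": "p-1", "product_category_name": "beleza_saude"},
    {"product_id": "p-2", "product_category_name": "informatica_acessorios"},
    {"product_id": "p-3", "product_category_name": "categoria_desconhecida"},
]

for product in products:
    name_pt = product["product_category_name"]
    # Keep the Portuguese name when the lookup has no English equivalent.
    product["product_category_name_english"] = category_translation.get(name_pt, name_pt)

english_names = [p["product_category_name_english"] for p in products]
```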