Data Description

The dataset is an extensive collection of e-commerce data from Brazil. It was gathered by Olist, a company that provides an online sales platform. The dataset covers the period from 2016 to 2018 and includes information about purchases made on the Olist platform.

Olist operates as a marketplace, enabling small and medium-sized businesses to sell their products through various channels, including major platforms like Amazon and Mercado Livre. Integration with other marketplaces allows sellers to manage orders and inventory centrally, significantly expanding their reach and simplifying the sales process.

Products listed on Olist can automatically be offered for sale on other platforms, increasing visibility and potential sales. However, the dataset only includes sales made directly through the Olist platform. Because the same products are also available on those marketplaces, some buyers may prefer to purchase there rather than on the Olist website, and such purchases are not part of the data.

Key notes about the data:

  • The data is a random sample of all purchases that received customer reviews.

  • An order can contain several items, and each item can be shipped by a different seller.

The dataset follows this schema:

Orders:

| Field | Description |
| --- | --- |
| order_id | Order ID. |
| customer_id | Customer ID. |
| order_status | Order status. |
| order_purchase_timestamp | Date and time of purchase. |
| order_approved_at | Date and time when the payment was approved. The payment was successful, and the order was approved for processing. |
| order_delivered_carrier_date | Date and time when the order was handed over to the carrier. |
| order_delivered_customer_date | Date when the order was delivered to the customer. |
| order_estimated_delivery_date | Estimated delivery date (set before actual delivery begins). |

  • For analyzing purchases and user behavior, it is advisable to use order_purchase_timestamp.

  • This is the time when the customer completes the purchase process, allowing precise tracking of when the purchase decision was made.

  • Using this time is particularly important for correctly identifying purchase days (weekdays or weekends) and analyzing temporal dynamics. Choosing another time (e.g., payment approval time) may distort the data, as it does not reflect the moment when the user took action.
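As a minimal sketch of the weekday/weekend classification mentioned above, the snippet below parses order_purchase_timestamp values with Python's standard library. The sample values, the "YYYY-MM-DD HH:MM:SS" layout, and the helper name is_weekend are assumptions for illustration.

```python
from datetime import datetime

# Hypothetical sample values; the timestamp layout is an assumption.
purchase_timestamps = [
    "2017-10-02 10:56:33",  # a Monday
    "2018-07-28 20:41:37",  # a Saturday
    "2018-08-05 08:38:49",  # a Sunday
]


def is_weekend(timestamp: str) -> bool:
    """Return True when the purchase happened on a Saturday or Sunday."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return parsed.weekday() >= 5  # Monday is 0, Sunday is 6


weekend_flags = [is_weekend(ts) for ts in purchase_timestamps]
```

Running the same function on order_approved_at instead could move an order placed late on Friday into the weekend group, which is the distortion the notes above warn about.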

The difference between order_purchase_timestamp and order_approved_at:

  • order_purchase_timestamp

    • Indicates the moment when the customer clicks the “Buy” button and completes the purchase process. The order has been placed, but the payment has not yet been verified. At this point, the order status is set to “created”.

  • order_approved_at

    • Indicates the moment when the order was approved after successful payment verification. This means the funds were confirmed, and the order is ready for further processing. At this point, the order status changes to “approved”.
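The gap between the two timestamps can be measured directly. The following is a small sketch, assuming both fields use the "YYYY-MM-DD HH:MM:SS" layout; the function name approval_delay_minutes and the sample values are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def approval_delay_minutes(purchased_at: str, approved_at: str) -> float:
    """Minutes between order_purchase_timestamp and order_approved_at."""
    delta = datetime.strptime(approved_at, FMT) - datetime.strptime(purchased_at, FMT)
    return delta.total_seconds() / 60


# Hypothetical order: purchased at 10:56:33, payment approved at 11:07:15.
delay = approval_delay_minutes("2017-10-02 10:56:33", "2017-10-02 11:07:15")
```

For slower payment methods the delay may span days rather than minutes, so it is worth inspecting before treating the two fields as interchangeable.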

Order status can be one of the following:

  • created

    • The customer visits the Olist platform and selects a product they want to buy. After adding the product to the cart and completing the checkout process, the order status is set to “created”. This means the order was successfully created but has not yet been processed.

  • approved

    • After the order is created, the system checks the payment information. If the payment is successful, the order status changes to “approved”. This means the order is approved for further processing.

  • invoiced

    • At this stage, an invoice may be issued. The status changes to “invoiced”, meaning the order cost information has been recorded, and the customer has been provided with an invoice. This status may not always appear, as not all orders require it.

  • processing

    • After the order is approved, the seller begins processing it. The status changes to “processing”, indicating the order is being prepared for shipment.

  • shipped

    • Once the order is packed, it is handed over to the courier service for delivery. The status changes to “shipped”, indicating the order has left the seller’s warehouse and is on its way to the buyer.

  • delivered

    • When the courier delivers the order to the customer, the status changes to “delivered”. This means the customer has received their product, and the order fulfillment process is complete.

  • unavailable

    • If the product becomes unavailable after the order is created (e.g., sold out), the status may change to “unavailable”. This can happen during processing.

  • canceled

    • If the customer decides to cancel the order at any stage, the status may change to “canceled”.
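Since only “delivered” orders have completed fulfillment, delivery analysis would typically start by filtering on order_status. The sketch below compares the actual and estimated delivery dates for such orders; the records, the empty-string convention for missing dates, and the helper days_late are illustrative assumptions.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout

# Hypothetical rows using the Orders field names.
orders = [
    {"order_id": "a1", "order_status": "delivered",
     "order_delivered_customer_date": "2017-10-10 21:25:13",
     "order_estimated_delivery_date": "2017-10-18 00:00:00"},
    {"order_id": "b2", "order_status": "delivered",
     "order_delivered_customer_date": "2018-03-05 09:00:00",
     "order_estimated_delivery_date": "2018-03-01 00:00:00"},
    {"order_id": "c3", "order_status": "canceled",
     "order_delivered_customer_date": "",
     "order_estimated_delivery_date": "2018-02-20 00:00:00"},
]


def days_late(order: dict) -> int:
    """Calendar days between estimate and delivery (negative means early)."""
    delivered = datetime.strptime(order["order_delivered_customer_date"], FMT)
    estimated = datetime.strptime(order["order_estimated_delivery_date"], FMT)
    return (delivered.date() - estimated.date()).days


# Skip orders that never reached the "delivered" status.
lateness = {
    order["order_id"]: days_late(order)
    for order in orders
    if order["order_status"] == "delivered"
}
```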

Customers:

| Field | Description |
| --- | --- |
| customer_id | ID assigned to each order in the dataset (each order has a unique customer_id). |
| customer_unique_id | Customer ID used to identify a specific customer in the system. |
| customer_zip_code_prefix | First five digits of the customer’s postal code. |
| customer_city | Customer’s city. |
| customer_state | State where the customer is located. |
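Because customer_id is unique per order while customer_unique_id identifies the person, repeat buyers have to be counted on customer_unique_id. A minimal sketch with hypothetical IDs:

```python
from collections import Counter

# Hypothetical rows: customer "u-A" placed two orders under two customer_id values.
customers = [
    {"customer_id": "c-001", "customer_unique_id": "u-A"},
    {"customer_id": "c-002", "customer_unique_id": "u-B"},
    {"customer_id": "c-003", "customer_unique_id": "u-A"},
]

# Each row corresponds to one order, so counting rows per unique ID counts orders.
orders_per_customer = Counter(row["customer_unique_id"] for row in customers)
repeat_customers = sorted(uid for uid, n in orders_per_customer.items() if n > 1)
```

Counting on customer_id instead would make every customer look like a one-time buyer.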

Geolocation:

| Field | Description |
| --- | --- |
| geolocation_zip_code_prefix | First five digits of the postal code. |
| geolocation_lat | Latitude. |
| geolocation_lng | Longitude. |
| geolocation_city | City. |
| geolocation_state | State. |
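The zip code prefix is the field this table shares with the customer and seller tables. The sketch below assumes that one prefix can appear in several geolocation rows (not stated in the description above) and reduces each prefix to an average coordinate before any lookup. All values are hypothetical, and prefixes are kept as strings so that leading zeros survive.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; the duplicate prefix is an assumption for illustration.
geolocation = [
    {"geolocation_zip_code_prefix": "01037", "geolocation_lat": -23.5456, "geolocation_lng": -46.6393},
    {"geolocation_zip_code_prefix": "01037", "geolocation_lat": -23.5460, "geolocation_lng": -46.6389},
    {"geolocation_zip_code_prefix": "20040", "geolocation_lat": -22.9068, "geolocation_lng": -43.1729},
]

# Group coordinates by prefix.
points = defaultdict(list)
for row in geolocation:
    points[row["geolocation_zip_code_prefix"]].append(
        (row["geolocation_lat"], row["geolocation_lng"])
    )

# One (lat, lng) pair per prefix: the mean of all its points.
centroids = {
    prefix: (mean(lat for lat, _ in pts), mean(lng for _, lng in pts))
    for prefix, pts in points.items()
}
```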

Order_items:

| Field | Description |
| --- | --- |
| order_id | Order ID. |
| order_item_id | Order item ID. |
| product_id | Product ID. |
| seller_id | Seller ID. |
| shipping_limit_date | Date by which the seller must hand over the product to the logistics company. |
| price | Product price. |
| freight_value | Shipping cost (if the order includes multiple products, the shipping cost is divided among them). |
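Since freight_value is split across the items of an order, an order total can be rebuilt by summing price and freight_value over its rows. This is a sketch under that reading, with hypothetical values:

```python
from collections import defaultdict

# Hypothetical rows using the Order_items field names.
order_items = [
    {"order_id": "a1", "order_item_id": 1, "price": 58.90, "freight_value": 13.29},
    {"order_id": "a1", "order_item_id": 2, "price": 58.90, "freight_value": 13.29},
    {"order_id": "b2", "order_item_id": 1, "price": 199.00, "freight_value": 17.87},
]

# Accumulate item price plus that item's share of the shipping cost.
running = defaultdict(float)
for item in order_items:
    running[item["order_id"]] += item["price"] + item["freight_value"]

# Round to cents to hide floating-point noise.
order_totals = {order_id: round(total, 2) for order_id, total in running.items()}
```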

Order_payments:

| Field | Description |
| --- | --- |
| order_id | Order ID. |
| payment_sequential | Sequential number of the payment in the order. |
| payment_type | Payment type: credit and debit cards, voucher (coupon or certificate), boleto (electronic check). |
| payment_installments | Number of installments (if the payment is split into multiple parts). |
| payment_value | Payment amount. |

Boleto is a document representing a payment invoice. It contains information about the amount to be paid and the recipient’s details.

The customer selects this payment method, after which the store issues a boleto to the customer. The customer must pay it by the specified deadline.

The customer can pay it via online banking, ATMs, or bank tellers. After payment, the bank processes the transaction and credits the amount to the issuer’s account. The payment confirmation process typically takes 1 to 3 business days.
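An order can have several payment rows, numbered by payment_sequential, so the amount paid per order is the sum of payment_value over those rows. The sketch below illustrates this with hypothetical data; the payment_type spellings ("credit_card", "voucher", "boleto") are assumed.

```python
from collections import defaultdict

# Hypothetical rows using the Order_payments field names.
order_payments = [
    {"order_id": "a1", "payment_sequential": 1, "payment_type": "credit_card",
     "payment_installments": 3, "payment_value": 100.00},
    {"order_id": "a1", "payment_sequential": 2, "payment_type": "voucher",
     "payment_installments": 1, "payment_value": 44.38},
    {"order_id": "b2", "payment_sequential": 1, "payment_type": "boleto",
     "payment_installments": 1, "payment_value": 216.87},
]

paid = defaultdict(float)    # total paid per order
methods = defaultdict(list)  # payment types per order, in payment order

for payment in sorted(order_payments, key=lambda p: p["payment_sequential"]):
    paid[payment["order_id"]] += payment["payment_value"]
    methods[payment["order_id"]].append(payment["payment_type"])
```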

Order_reviews:

| Field | Description |
| --- | --- |
| review_id | Review ID. |
| order_id | Order ID. |
| review_score | Review score (1 to 5). |
| review_comment_title | Review title. |
| review_comment_message | Review text. |
| review_creation_date | Date the review was created. |
| review_answer_timestamp | Date the review was answered. |

Products:

| Field | Description |
| --- | --- |
| product_id | Product ID. |
| product_category_name | Product category. |
| product_name_lenght | Length of the product name. |
| product_description_lenght | Length of the product description. |
| product_photos_qty | Number of product photos. |
| product_weight_g | Product weight in grams. |
| product_length_cm | Product length in centimeters. |
| product_height_cm | Product height in centimeters. |
| product_width_cm | Product width in centimeters. |
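The three dimension fields can be combined into a package volume, which may be a useful derived feature next to product_weight_g. A minimal sketch follows; the helper volume_cm3, the sample values, and the choice to return None for missing dimensions are assumptions.

```python
from typing import Optional

DIMENSION_FIELDS = ("product_length_cm", "product_height_cm", "product_width_cm")


def volume_cm3(product: dict) -> Optional[float]:
    """Volume in cubic centimeters, or None if any dimension is missing."""
    dims = [product.get(field) for field in DIMENSION_FIELDS]
    if any(d is None for d in dims):
        return None
    length, height, width = dims
    return length * height * width


# Hypothetical product rows.
boxed = {"product_length_cm": 19, "product_height_cm": 8, "product_width_cm": 13}
incomplete = {"product_length_cm": 19, "product_height_cm": None, "product_width_cm": 13}

boxed_volume = volume_cm3(boxed)
incomplete_volume = volume_cm3(incomplete)
```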

Sellers:

| Field | Description |
| --- | --- |
| seller_id | Seller ID. |
| seller_zip_code_prefix | First five digits of the seller’s postal code. |
| seller_city | Seller’s city. |
| seller_state | Seller’s state. |

Product_category_name:

| Field | Description |
| --- | --- |
| product_category_name | Product category in Portuguese. |
| product_category_name_english | Product category in English. |
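This table works as a lookup from the Portuguese category name to its English counterpart. The sketch below builds such a lookup and falls back to the original name when no translation exists; the sample pairs and the fallback behavior are assumptions for illustration.

```python
# Hypothetical rows using the Product_category_name field names.
translation_rows = [
    {"product_category_name": "beleza_saude",
     "product_category_name_english": "health_beauty"},
    {"product_category_name": "informatica_acessorios",
     "product_category_name_english": "computers_accessories"},
]

to_english = {
    row["product_category_name"]: row["product_category_name_english"]
    for row in translation_rows
}


def english_category(name: str) -> str:
    """Translate a category name, keeping the original if it is not listed."""
    return to_english.get(name, name)


translated = [english_category(n) for n in ["beleza_saude", "categoria_desconhecida"]]
```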