17 Best Free Retail Datasets for Machine Learning

Alexandra Quinn | November 15, 2023

The retail industry has been fundamentally transformed by disruptive technologies in the past decade. From AI-assisted customer service experiences to advanced robotics in operations, retailers are pursuing new technologies to address margin strains and rising customer expectations. AI use cases like personalized product recommendations, demand forecasting for optimized inventory and supply chain management, optimized pricing strategies based on market dynamics, and sales forecasting are generating value for companies that have adopted AI. By leveraging AI, retailers can maintain or increase their competitiveness in a saturated market.

How can retailers use, grow and optimize their use of data and machine learning? For data scientists tasked with building and training machine learning models for retailers, open and free retail datasets are an important starting point. But these datasets can be hard to come by: they often include personal customer information and competitive business information, so few retailers share them. This blog post is here to help. Here are 17 excellent open datasets and data sources for retail machine learning.

Customer Behavior and Items

E-commerce data from a real website that includes customer behavior data, item properties and a category tree. The behavior data includes events like clicks, add to cart and transactions, and was collected over a period of four and a half months.

Get the dataset here.
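
As an illustration of what behavior events like these make possible, here is a minimal sketch that counts how many visitor and item pairs appear at each stage and derives stage-to-stage conversion rates. The event labels ("view", "addtocart", "transaction"), the tuple layout and the helper names are assumptions for illustration only; check the dataset's own documentation for its actual column names and event values.

```python
from collections import defaultdict

# Assumed event labels, ordered from the top of the funnel to the bottom.
FUNNEL = ("view", "addtocart", "transaction")


def funnel_counts(events):
    """Count distinct (visitor, item) pairs seen for each event type.

    `events` is an iterable of (visitor_id, event_type, item_id) tuples.
    Each stage is counted independently; event order is not checked.
    """
    reached = defaultdict(set)
    for visitor_id, event_type, item_id in events:
        reached[event_type].add((visitor_id, item_id))
    return {stage: len(reached[stage]) for stage in FUNNEL}


def conversion_rates(counts):
    """Return the ratio between each pair of consecutive funnel stages."""
    rates = {}
    for prev, nxt in zip(FUNNEL, FUNNEL[1:]):
        rates[f"{prev}->{nxt}"] = counts[nxt] / counts[prev] if counts[prev] else 0.0
    return rates


# Tiny made-up sample, not taken from the dataset.
SAMPLE_EVENTS = [
    (1, "view", "A"), (1, "addtocart", "A"), (1, "transaction", "A"),
    (2, "view", "A"), (2, "view", "B"), (2, "addtocart", "B"),
    (3, "view", "C"),
]

if __name__ == "__main__":
    counts = funnel_counts(SAMPLE_EVENTS)
    print(counts)
    print(conversion_rates(counts))
```

Counting distinct visitor and item pairs, rather than raw events, keeps repeated clicks on the same item from inflating the top of the funnel.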

E-Commerce Sales Data

A comprehensive dataset with sales data across channels and financial information. Data includes SKUs, design numbers, stock levels, product categories, product sizes, product colors, the amount paid, rate per piece, date of sale, gross amounts and much more.

According to the contributors, this data can be used in a number of ways: analyzing sales trends, comparing and analyzing profitability, comparing prices, looking at customer specific data, using stock details, and much more.

Get the retail sales dataset here.

Electronic Product Pricing

10 fields of pricing information for 7,000 electronic products.

Get the dataset here.

Apparel Product SKUs

Real SKUs for 500 apparel items.

Get the retail inventory dataset here.

Amazon Items

A dataset containing information about 22,000 items on Amazon. The information can be used for rating, reviews and pricing analyses.

Get the dataset here.

Men’s Shoes Pricing

10,000 items of men's shoes. Information includes the shoe name, the brand type and the price.

Get the dataset here.

Women’s Shoes Pricing

The women's counterpart to the dataset above: 10,000 items of women's shoes. Information includes the shoe name, the brand type and the price.

Get the dataset here.

Online Reviews

There are more than 71,000 online reviews in this dataset, spanning 1,000 different products. Information includes the review text and title, reviewer metadata, the product name and manufacturer, and more.

Get the dataset here.

Online Reviews for Electronic Products

A dataset with information from 7,000 online reviews for 50 electronic products. The data was taken from online websites like Best Buy and Amazon. Information includes the date of review, its source, the rating, reviewer metadata, title, and more.

Get the dataset here.

Online Reviews for Women’s Clothing

A dataset containing real and anonymized reviews of women’s clothing from an e-commerce website. The dataset has more than 23,000 rows and includes 10 feature variables: clothing ID, age, title, review text, rating, recommended IND, positive feedback count, division name, department name and class name.

Get the dataset here.

Grocery Market Basket Analysis

A dataset containing nearly 39,000 rows of grocery purchase orders. The contributors recommend using an algorithm such as Apriori for market basket analysis. An example is provided on the dataset’s landing page.

Get the retail dataset for analytics here.
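
To make the Apriori recommendation concrete, here is a minimal pure-Python sketch of Apriori-style frequent itemset mining on a handful of made-up baskets. The sample baskets, the `frequent_itemsets` and `confidence` helper names and the 0.5 support threshold are assumptions for illustration; this is not the example from the dataset's landing page, and a dataset of this size would normally call for an optimized library implementation.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Return {frozenset(items): support} for itemsets meeting min_support.

    Support is the fraction of baskets that contain every item in the set.
    """
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    # Level 1: frequent single items.
    counts = Counter(frozenset([item]) for b in baskets for item in b)
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Build size-k candidates by joining frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Apriori property: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        counts = Counter(c for b in baskets for c in candidates if c <= b)
        current = {s: c / n for s, c in counts.items() if c / n >= min_support}
        result.update(current)
        k += 1
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Both the antecedent and the combined itemset must be frequent,
    otherwise a KeyError is raised.
    """
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]


# Tiny made-up sample, not taken from the dataset.
SAMPLE_BASKETS = [
    ["milk", "bread", "butter"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "butter"],
    ["milk", "bread", "butter"],
]

if __name__ == "__main__":
    found = frequent_itemsets(SAMPLE_BASKETS, min_support=0.5)
    for items, support in sorted(found.items(), key=lambda kv: -kv[1]):
        print(sorted(items), round(support, 2))
    print(round(confidence(found, {"milk"}, {"bread"}), 2))
```

The pruning step is what distinguishes Apriori from brute-force counting: a candidate is only counted if all of its smaller subsets were already frequent, which keeps the candidate list manageable as itemsets grow.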

Historical Sales Data

This dataset contains anonymized historical sales data from 45 stores. The information provided includes the type of store, its size, department, regional activity, dates, temperature, fuel cost in the region, CPI, unemployment rate, whether the week was a special holiday, and more. Although this data is not fresh (it covers 2010-2012), we added it to the list because its holiday sales data could still be relevant.

Get the dataset here.
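
One simple way to explore the holiday signal is to compare average weekly sales in holiday weeks against regular weeks. The sketch below assumes each record is a dict with hypothetical `weekly_sales` and `is_holiday` keys; the dataset's real field names may differ, and the sample values are made up.

```python
from statistics import mean


def holiday_uplift(rows):
    """Return mean holiday-week sales divided by mean regular-week sales.

    `rows` is an iterable of dicts with the hypothetical keys
    'weekly_sales' (number) and 'is_holiday' (bool).
    """
    rows = list(rows)  # allow generators, since we scan the data twice
    holiday = [r["weekly_sales"] for r in rows if r["is_holiday"]]
    regular = [r["weekly_sales"] for r in rows if not r["is_holiday"]]
    if not holiday or not regular:
        raise ValueError("need at least one holiday week and one regular week")
    return mean(holiday) / mean(regular)


# Tiny made-up sample, not taken from the dataset.
SAMPLE_WEEKS = [
    {"weekly_sales": 100, "is_holiday": False},
    {"weekly_sales": 100, "is_holiday": False},
    {"weekly_sales": 100, "is_holiday": False},
    {"weekly_sales": 150, "is_holiday": True},
    {"weekly_sales": 130, "is_holiday": True},
]

if __name__ == "__main__":
    print(round(holiday_uplift(SAMPLE_WEEKS), 2))
```

A ratio above 1 suggests holiday weeks outsell regular weeks on average. In practice you would likely compute it per store or per department, since a single global ratio can hide large differences between them.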

Shopping Locations in Leeds

A dataset containing information about potential shopping spots in Leeds.

Get this online retail dataset here.

Warehouse and Retail Sales

A list of sales and movement data per item and department for each month. The dataset has 308,000 rows and contains information about the year, month, supplier name, item code, item description, item type and number of items sold.

Get the dataset here.

UK Sales Datasets

11 datasets containing detailed information about UK sales.

Get the datasets here.

Average Residential Retail Kerosene Prices

The average residential retail kerosene pricing in New York State and by region starting from September 2000. Pricing was obtained through surveys.

Get the dataset here.

Quarterly Retail Sales Tax Data by County and City

A dataset containing tax data per county and city.

Get the dataset here.

The Future of Retailer Data for Machine Learning

Retailers that harness the power of ML can enhance customer experiences, streamline operations, increase their sales, and gain a competitive advantage. Whether you’re a retail giant, a small mom-and-pop shop or anywhere in between, ML can help you stand out competitively while overcoming the challenges of economic uncertainty.

One credit card company, for example, uses MLOps to deploy a real-time location-based recommendation engine. To learn more about ML and retailers, click here.