Best 10 Free Datasets for Manufacturing [UPDATED]

Alexandra Quinn | March 1, 2024

The manufacturing industry can use AI, data and machine learning to improve quality and productivity, minimize waste and reduce costs. With ML, manufacturers can modernize their businesses through use cases like forecasting demand, optimizing scheduling, preventing malfunctions and managing quality. These all contribute significantly to bottom-line improvement. In times of global recession, supply chain disruptions and difficulties meeting consumer demand for materials and products, manufacturing optimization becomes even more important for companies that wish to remain competitive and relevant without impairing their revenue streams.

How can manufacturers develop, grow and optimize their use of data and ML? Open and free datasets for machine learning are an important starting point for data scientists and engineers who are developing and training ML models for manufacturing. But these datasets can be hard to come by, since manufacturing often takes a legacy approach and data is not always available. Here are 10 excellent open datasets and data sources for machine learning in manufacturing.

1. Eurostat Industrial Production Index

The output and activity of the European industry sector, measured on a monthly basis. The dataset uses 2015 as its base year and reports monthly growth rates.

Get the dataset here.
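
When working with an index series like this one, month-over-month growth rates can be derived from the index levels with simple arithmetic. The following is a minimal Python sketch under stated assumptions: the function name and the sample values are hypothetical, and it does not reproduce Eurostat's own methodology, which may involve seasonal or calendar adjustment.

```python
def monthly_growth_rates(index_values):
    """Return month-over-month percentage changes for a list of index levels."""
    rates = []
    # Pair each month with the one that follows it.
    for previous, current in zip(index_values, index_values[1:]):
        rates.append((current - previous) / previous * 100.0)
    return rates


# Hypothetical index levels (base year average = 100); not real Eurostat figures.
sample_index = [100.0, 102.0, 99.96]
print([round(rate, 2) for rate in monthly_growth_rates(sample_index)])
```

A series of N index levels yields N - 1 growth rates, since the first month has no predecessor to compare against.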

2. US Manufacturing Trends

Manufacturing trends in the US related to wage rates, profits, employment, production, capacity utilization, productivity, exports and shipments. The dataset provides figures for the current period and year-to-date.

Get the dataset here.

3. Energy Consumption

A dataset providing information about energy consumption at manufacturing sites, homes, commercial buildings and transportation. The data in this dataset is updated monthly or annually.

Get the dataset here.

4. Personal Protective Equipment Computer Vision Dataset and Model

A dataset and model for identifying the use of protective equipment (like helmets, shoes, gloves and goggles) in warehouses and manufacturing plants through object detection. The business use of this dataset is to minimize workplace injuries that result from a lack of safety equipment, by automating safety inspections. The dataset provides 19,000 training set images, 3,600 validation set images and 1,900 testing set images.

Get the dataset here.

5. Degradation Measurement of Robot Arm Position Accuracy

A dataset with information to support robot health management and the development of robot health solutions. The dataset contains the examined robot’s high-level tool center position (TCP) health data and controller-level component information: joint positions, velocities, currents and temperatures.

Get the dataset here.

6. On-Site Construction Equipment Computer Vision Project

An object detection data science project for identifying on-site work equipment: excavators, dump trucks and wheel loaders. The business use cases the data supports are inventory management, accident prevention and tracking construction progress. The dataset contains 6,700 training images, 267 validation images and 144 testing images, ready for training.

Get the dataset here.

7. Global Value Chain and Manufacturing Analysis on Geothermal Power Plant Turbines

An analysis of the global supply chain and the cost of manufacturing components of Organic Rankine Cycle (ORC) turboexpanders and steam turbines used in geothermal power plants. The business use case is to help identify manufacturing costs and requirements for equipment, materials, labor and facilities.

Get the dataset here.

8. Materials Discovery: Inorganic Crystals

Crystal structure data to help solve research and application challenges in materials science. Common use cases include materials design, property prediction and compound identification. This dataset includes 210,000 crystal structure entries for inorganic compounds: inorganics, ceramics, minerals, pure elements, metals, intermetallic systems and more. The dataset is user-friendly, making it easy to search the data and analyze results.

Get the dataset here.

9. Radio Frequency Measurements

A dataset based on a PN code sounding methodology for understanding how radio waves at 2.4 GHz and 5 GHz propagate in industrial environments. The measurements in the dataset include complex impulse responses and spectrum analysis traces.

Get the dataset here.

10. NIST Investment Tool

Investment analysis data documented by NIST. The data includes net present value, internal rate of return and payback period. In addition, it provides sensitivity analysis with Monte Carlo techniques. The business use case is to identify investments with the highest ROI.

Get the dataset here.
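
To make the metrics named above concrete, here is a minimal Python sketch of net present value, internal rate of return, payback period and a simple Monte Carlo sensitivity check. It is an illustration only and not the NIST tool's implementation: all function names, the bisection search for IRR, the Gaussian noise model and the cash flows are assumptions made for this example.

```python
import random


def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs at time 0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tol=1e-9):
    """Internal rate of return by bisection.

    Assumes NPV changes sign exactly once on [low, high].
    """
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while high - low > tol:
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid  # root lies in the lower half
        else:
            low, f_low = mid, f_mid  # root lies in the upper half
    return (low + high) / 2.0


def payback_period(cash_flows):
    """First period where cumulative undiscounted cash flow is >= 0, else None."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None


def monte_carlo_npv(rate, cash_flows, rel_std=0.1, trials=5000, seed=0):
    """Mean NPV and share of trials with NPV > 0 under noisy future cash flows.

    Assumption: each future cash flow is drawn from a normal distribution
    centered on its forecast with standard deviation rel_std * |forecast|.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        noisy = [cash_flows[0]] + [
            rng.gauss(cf, abs(cf) * rel_std) for cf in cash_flows[1:]
        ]
        results.append(npv(rate, noisy))
    mean = sum(results) / trials
    share_positive = sum(r > 0 for r in results) / trials
    return mean, share_positive


# Hypothetical investment: 1,000 up front, then 400 per year for four years.
flows = [-1000.0, 400.0, 400.0, 400.0, 400.0]
print("NPV at 10%:", round(npv(0.10, flows), 2))
print("IRR:", round(irr(flows), 4))
print("Payback period (years):", payback_period(flows))
print("Monte Carlo (mean NPV, share positive):", monte_carlo_npv(0.10, flows))
```

Comparing several candidate investments on these metrics, and checking how often NPV stays positive under noise, is one way to rank them by expected return.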

[Updated 27 March 2023] More Free Manufacturing Datasets

46 datasets of rich and detailed governmental information related to multiple aspects of US manufacturing.

Get the datasets here.

UK Manufacturers' Sales

Two datasets for annual sales estimates for UK manufacturers, released in July 2023. The first dataset includes information per product; the second includes indicators on standard errors, response rates, revisions and any product code changes.

Get the datasets here and here.

Measurements of Mechanical Material Properties

This dataset contains material properties of additively manufactured polymers under compressive and bending load. The specimens were made of polylactic acid (PLA), polycarbonate (PC), polyamide (PA) and polyethylene terephthalate glycol-modified (PETG).

Get the dataset here.

Liquid Battery Electrolyte Formulations

A dataset obtained from a workflow that runs from high-throughput electrolyte formulation to high-throughput conductivity measurement.

Get the dataset here.

Pill Quality Control

A visual dataset containing three classes of images: pills that are free of defects (149 images), pills with dirt contamination (138 images), and pills with a chip defect (43 images).

Get the dataset here.
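
The three classes above are imbalanced, with far fewer chip-defect images than defect-free ones. One common way to compensate during training is inverse-frequency class weighting. The following Python sketch is an illustration under stated assumptions, not part of the dataset: the function name and the class labels are hypothetical, while the image counts are the ones listed above.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by total / (number_of_classes * class_count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: total / (n_classes * count)
        for label, count in class_counts.items()
    }


# Image counts from the description above; the label names are hypothetical.
pill_counts = {"defect_free": 149, "dirt_contamination": 138, "chip_defect": 43}

for label, weight in inverse_frequency_weights(pill_counts).items():
    print(f"{label}: {weight:.3f}")
```

With this scheme the rarest class receives the largest weight, so each class contributes equally to a weighted loss in aggregate.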

Wafer Map Production Defects

A dataset for defect pattern recognition of wafer maps and production defects. The dataset includes more than 38,000 wafer maps.

Get the dataset here.

Automobile Recalls

A detailed dataset from the National Highway Traffic Safety Administration, containing information on automobile recalls per manufacturer.

Get the dataset here.

The Future of Manufacturing Data for Machine Learning

Manufacturers who identify the opportunity in digital transformation will be able to leverage data with ML to optimize manufacturing, increase productivity, reduce waste and improve quality. ML can help plants, factories, suppliers, government organizations and others transform their strategy and bottom line and differentiate themselves for customers. They will be able to ride out the storm of global recession while maintaining and improving their market share. Innovation at these times is key to remaining relevant and cash-flow positive.

Iguazio customer Seagate leveraged MLOps to transition their chip manufacturing from a manual inspection process via microscope to fully automated deep learning & computer vision inspection. To learn more about ML and manufacturing, click here.