Unsupervised machine learning algorithms can discover underlying features of a data set for further downstream processing and prediction tasks.
In data science, unsupervised machine learning enables machines to reveal patterns that humans might easily miss, whether because of the sheer volume of data or because of bias in human thinking. It explores raw data whose structure is unknown, discovering patterns and structures that data scientists would otherwise not know to look for.
Another advantage of unsupervised machine learning is that it does not require labeled data, which is expensive to produce because human experts must identify, categorize, and annotate the data. As a result, most available data is unlabeled. Unsupervised machine learning algorithms can nevertheless create value from unlabeled data by recognizing previously unknown patterns and discovering features that are helpful for developing AI products.
Unsupervised machine learning is closely related to self-supervised machine learning, and the two terms are sometimes used interchangeably. The term "self-supervised" emphasizes that the algorithms use part of the input data as supervisory signals. Turing Award winners Yann LeCun and Yoshua Bengio refer to self-supervised learning as the “key to human-level intelligence.” LeCun believes that as self-supervised learning begins to see more use, the prevalence of supervised machine learning will decrease.
Supervised machine learning algorithms learn from labeled training data sets to perform tasks such as classification and regression. Among the many benefits of supervised machine learning is the ability to measure performance (for example, accuracy) during training to determine how well the model has learned from the data.
In classification problems, a model will categorize the data into predefined groups. One example of a classification model is an email spam filter.
In regression problems, a model will use the data it’s been given to predict continuous numerical values. A sales projection estimator based on related historical data is an example of a regression model.
Unsupervised machine learning algorithms discover the underlying patterns and structures of a data set. Yet unlike supervised machine learning, you do not need to prepare annotated data sets for training. This enables you to tap into an abundance of unlabeled data.
For example, an unsupervised machine learning model can identify the purchasing patterns of online shopping users. Another example is the detection of suspicious activity in credit card transactions or statements.
Typical algorithms include clustering, dimensionality reduction, and anomaly detection. Additional types of unsupervised machine learning algorithms are discussed in the example section of this glossary.
Advantages of unsupervised machine learning include:
Disadvantages of unsupervised machine learning include:
Typical unsupervised machine learning algorithms include:
Clustering automatically categorizes data into groups according to similarity criteria.
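As one illustration of grouping by similarity, the following minimal Python sketch implements k-means, a common clustering method, with Euclidean distance as the similarity criterion. The function names (initial_centroids, kmeans), the deterministic farthest-point initialization, and the sample points are assumptions made for this example and are not taken from any particular library. Production code would typically rely on an established implementation.

```python
from math import dist


def initial_centroids(points, k):
    """Pick k starting centroids deterministically (farthest-point heuristic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Choose the point whose nearest existing centroid is farthest away.
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iterations=100):
    """Group points into k clusters; returns (centroids, labels)."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the clusters are stable.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied to the function: the grouping emerges only from the distances between points, which is what distinguishes this from a supervised classifier.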
Dimensionality reduction condenses the number of dimensions in a data set, extracting critical information in the process.
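To make the idea concrete, here is a small Python sketch that reduces two-dimensional data to a single dimension by projecting it onto its first principal component. It uses power iteration rather than a full eigendecomposition, which is a simplification chosen to keep the example dependency-free. The function name first_principal_component and the sample data are hypothetical, and the sketch assumes the data has nonzero variance and that the starting vector is not orthogonal to the dominant direction.

```python
from math import sqrt


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the dominant direction of variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(column) / n for column in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]

    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize; the vector converges to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Each row is condensed to one number: its coordinate along v.
    scores = [sum(c * a for c, a in zip(row, v)) for row in centered]
    return v, scores


# Hypothetical data lying close to the line y = 2x.
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, scores = first_principal_component(samples)
print(direction)  # unit vector pointing roughly along (1, 2)
print(scores)     # one value per sample instead of two
```

Because the sample points vary almost entirely along one line, a single score per point retains most of the information in the original two columns.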
Anomaly detection analyzes outliers in data to discover rare events or unusual data points, such as fraudulent transactions, hardware problems, software errors, or changes in buyer behavior.
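A simple way to sketch this in Python is a modified z-score based on the median and the median absolute deviation (MAD), which stay stable even when the data contains extreme values. This is only one of many possible techniques. The function name find_anomalies, the sample transaction amounts, and the 3.5 cutoff (a commonly cited rule of thumb) are assumptions made for this example.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # No spread to measure against; this sketch reports nothing.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data is roughly normally distributed.
    return [
        index
        for index, deviation in enumerate(deviations)
        if 0.6745 * deviation / mad > threshold
    ]


# Hypothetical card transaction amounts with one unusually large payment.
amounts = [12.5, 9.9, 11.2, 10.8, 13.1, 950.0, 10.4, 12.0]
print(find_anomalies(amounts))  # [5]
```

The median-based score is used here instead of the mean and standard deviation because a single extreme value can inflate the standard deviation enough to hide itself.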
Unsupervised Neural Networks
Like other AI models, unsupervised machine learning algorithms can tap into big data to perform complex feature engineering for downstream processing. However, AI models often require massive amounts of data to produce reliable results, which calls for tools capable of handling these data sets and artifacts efficiently.
Unsupervised machine learning models benefit from automated data pipelines and efficient deployment. Offering accelerated deployment, end-to-end automation of ML pipelines, and out-of-the-box model monitoring, the Iguazio MLOps Platform enables you to industrialize your unsupervised models.