What is self-supervised learning in machine learning and how is it different from supervised learning?

Self-supervised learning is an evolving technique that helps ML models learn from far more data by removing the bottleneck of human-labelled datasets. In this technique, the model predicts a hidden part of the input from the unhidden parts. This is similar to how you predict the missing letters in Wheel of Fortune: your brain fills in the hidden letters based on the clues in the letters you’re shown.
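
To make the idea concrete, here is a minimal, hypothetical sketch (not any particular model's implementation) of how a self-supervised objective turns unlabelled text into training pairs: one token is hidden, and the "label" the model must predict is taken from the data itself, with no human annotation involved.

```python
# Illustrative sketch only: deriving self-supervised training pairs
# from unlabelled text by masking one token at a time.
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabelled sentence into (input, target) training pairs."""
    tokens = sentence.split()
    examples = []
    for i in range(len(tokens)):
        masked = tokens.copy()
        target = masked[i]        # the hidden part the model must predict
        masked[i] = mask_token    # the partially hidden input the model sees
        examples.append((" ".join(masked), target))
    return examples

if __name__ == "__main__":
    for inp, target in make_masked_examples("the quick brown fox jumps"):
        print(f"input: {inp!r:40} target: {target!r}")
```

Every sentence in a large unlabelled corpus yields many such pairs, which is why self-supervised objectives scale so easily.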

By contrast, supervised learning is a technique where the ML model learns from labelled datasets, typically for tasks like classification. One benefit of supervised learning is that its performance can be measured during training to assess how well the model has learned.
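
For comparison, here is a minimal sketch of the supervised setup, assuming scikit-learn is available: the model is trained on human-labelled examples, and accuracy on held-out data measures how well it has learned.

```python
# Supervised classification on a human-labelled dataset (Iris),
# with accuracy measured on held-out data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)              # features plus human-provided labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                    # learn from labelled examples

predictions = model.predict(X_test)
print("held-out accuracy:", accuracy_score(y_test, predictions))
```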

So why would you choose one over the other?

Supervised machine learning models need to learn from massive amounts of good-quality labelled data. Finding or creating that labelled data is a huge bottleneck, because it takes so long and costs so much: to teach a model to recognize a picture of a dog, for instance, a human has to label images as “dog” or “not a dog”. Practically speaking, it’s impossible for humans to label everything in the world (especially unstructured data), and for some tasks there is simply not enough labelled data, so any potential AI system would be limited by a small training set. Self-supervised learning addresses these limitations, allowing ML teams to scale the research and development of ML models at a low cost.
