Self-supervised learning is an evolving technique that helps ML models learn from more data without the bottleneck of human-labelled datasets. In this technique, the model predicts a hidden part of the input from the unhidden parts of the input. This is similar to how you’re able to predict the missing letters in Wheel of Fortune: your brain fills in the hidden letters based on the clues in the letters you’re shown.
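To make the “fill in the blank” idea concrete, here is a minimal sketch of how training pairs can be created from raw, unlabelled text by hiding one word at a time. The function name, mask token, and corpus are illustrative, not from any particular library:

```python
# Toy corpus of unlabelled sentences -- no human annotation involved.
corpus = [
    "the dog chased the ball",
    "the cat chased the mouse",
    "the dog fetched the ball",
]

def make_examples(sentence, mask="[MASK]"):
    """Turn one unlabelled sentence into (masked_input, target) pairs.

    Each word is hidden in turn; the hidden word itself becomes the
    training label, so supervision comes from the data for free.
    """
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        masked = words[:i] + [mask] + words[i + 1:]
        examples.append((" ".join(masked), word))
    return examples

# Every sentence yields multiple supervised-style examples automatically.
pairs = [ex for s in corpus for ex in make_examples(s)]
print(len(pairs))   # 15 pairs from three five-word sentences
print(pairs[0])     # ('[MASK] dog chased the ball', 'the')
```

A model trained on such pairs learns to predict the hidden word from its context, which is the core mechanism behind masked-prediction approaches to self-supervision.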
By contrast, supervised learning is a technique where the ML model learns from labelled datasets, typically for tasks like classification. One benefit of supervised learning is that performance can be measured during training against the labels, to assess how well the model is learning.
So why would you choose one over the other?
Supervised machine learning models need to learn from massive amounts of good-quality labelled data. Finding or creating labelled data is a huge bottleneck, because it takes so long and costs so much: to teach a model to recognize a picture of a dog, for instance, a human has to label images as “dog” or “not a dog”. Practically speaking, it’s impossible for humans to label everything in the world (especially unstructured data), and there are some tasks for which there is simply not enough labelled data, meaning any potential AI system would be limited by a small training set. Self-supervised learning addresses these limitations, allowing ML teams to scale the research and development of ML models at a low cost.