Overlapping areas of responsibility, as when multiple roles are all expected to handle data preparation or model building, are a common problem on ML teams. Defining a “contract” between the different components of MLOps, and automating the handoffs between them, eliminates many of those challenges. Teams also become more efficient when everyone works on the same platform, using the same APIs and metadata databases. For example:
- If a data scientist has written data preparation code in a Jupyter notebook, it can be automatically turned into a pipeline that runs on Spark. The pipeline then only needs to be pointed at data from a data warehouse, and it applies the same business logic, just on scalable infrastructure. Automation removes much of the friction between roles.
- When data scientists build a feature, they don’t need to care about validation policies and biases. Their contract is to define the features and then move on to training. The data engineer can take that definition and continue working on it, for example by validating or cleaning the data in a different manner.
- One team member can build a serving pipeline just to check that it works, and then another member in charge of production can update it to run on GPUs or spot instances.
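
The feature "contract" described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the API of any particular feature store: `FeatureDefinition`, `with_validators`, `compute`, and the `spend_per_visit` feature are all hypothetical names invented for the example. The data scientist supplies the feature's name and transform; the data engineer later attaches validation rules without touching that definition.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical contract object shared by both roles."""
    name: str
    # Owned by the data scientist: how to derive the feature from a raw row.
    transform: Callable[[Row], float]
    # Owned by the data engineer: checks a computed value must pass.
    validators: Tuple[Callable[[float], bool], ...] = ()


def with_validators(
    feature: FeatureDefinition, *validators: Callable[[float], bool]
) -> FeatureDefinition:
    """Return a copy of the definition with extra validators attached."""
    return replace(feature, validators=feature.validators + validators)


def compute(feature: FeatureDefinition, rows: List[Row]) -> List[float]:
    """Apply the transform and keep only values that pass every validator."""
    values = []
    for row in rows:
        value = feature.transform(row)
        if all(check(value) for check in feature.validators):
            values.append(value)
    return values


# Data scientist: defines the feature and moves on to training.
spend_per_visit = FeatureDefinition(
    name="spend_per_visit",
    transform=lambda row: row["spend"] / row["visits"],
)

# Data engineer: adds a validation policy on top of the same definition.
validated = with_validators(spend_per_visit, lambda v: 0 <= v <= 1000)

rows = [
    {"spend": 120.0, "visits": 4.0},
    {"spend": -50.0, "visits": 5.0},  # invalid negative spend
    {"spend": 90.0, "visits": 3.0},
]

print(compute(spend_per_visit, rows))  # unvalidated values
print(compute(validated, rows))        # values that pass validation
```

Because the definition is immutable and validators are layered on as a copy, neither role has to edit the other's work; the shared object is the only point of contact between them.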