Imagine a system where one can easily develop a machine learning model, click a magic button, and run the code in production without any heavy lifting from data engineers… Why does this matter? Because the market is currently struggling with the entire data-science-to-production pipeline. We’ve seen many cases where data scientists work on their laptops with just a subset of the data, using their own tools, which are typically very different from production tools. The result is a long delay from the moment a model is ready to the point it actually runs.
Shifting the Paradigm

We at Iguazio believe that the current paradigm is broken – there must be an easier way, one based on the following principles:
- Data scientists work on datasets that are similar to the ones in production, with minimal deviation, so the model behaves the same in training as it does in production.
- Data is not moved around or duplicated just for the sake of building or training a model.
- The transition from training to inference is smooth: once a prediction pipeline (model and ETL) is created, it requires no further development effort to work in production.
- Models are updated automatically, without human intervention.
- Models are validated automatically as an ongoing process and new models are automatically transferred to production.
- The environment supports the languages and frameworks popular with data scientists, while also enabling popular analytics frameworks for data exploration.
- Data scientists can collaborate and share notebooks in a secure environment, ensuring each user views data according to his or her individual permissions.
- GPU resources are easily shared and used by data science teams without DevOps overhead.
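To make the training-to-inference principle above concrete, here is a minimal sketch of packaging the ETL and the model as a single artifact, so serving applies exactly the same transformations as training. It uses scikit-learn and pickle purely for illustration (these library choices are our assumption, not a prescribed stack); any framework with a composable pipeline abstraction follows the same pattern.

```python
# Sketch: the preprocessing ("ETL") step and the model travel together
# as one pipeline object, so no extra development is needed to serve it.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import pickle

# Training: scaling and model fitting composed into a single pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),      # the "ETL" step
    ("model", LogisticRegression()),  # the model step
])
X_train = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 3.0]]
y_train = [0, 0, 1, 1]
pipeline.fit(X_train, y_train)

# "Deployment": serialize the whole pipeline, not just the model weights.
blob = pickle.dumps(pipeline)

# Inference: the restored artifact applies identical preprocessing,
# so training-time and serving-time behavior cannot drift apart.
served = pickle.loads(blob)
print(served.predict([[2.5, 2.5]]))
```

Because the scaler's fitted statistics are stored inside the artifact, the serving side never re-implements the feature logic, which is exactly the failure mode the principle guards against.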