MLOps Challenges, Solutions and Future Trends

Yaron Haviv | February 19, 2020

A summary of my MLOps NYC talk: the major AI/ML and data challenges, and how emerging open-source technologies will solve them

Challenges

According to various surveys, data-science teams don’t do data science. They spend most of their time on data wrangling, data preparation, managing software packages and frameworks, configuring infrastructure, and integrating various components. These can be generalized as feature management tasks and MLOps tasks (i.e. DevOps for ML).

 

The MLOps Challenge

Data-science originated in research organizations and was later used to produce reports and detect anomalies within mountains of data. The emerging trend to incorporate data-science in every business application, in order to intelligently react to events and data as they occur, is creating fundamental changes in machine learning practices.

 

The Data Challenge

Data scientists start with sample data, working in Jupyter notebooks or using AutoML tools to identify patterns and train models. At a certain point, they need to train the models on larger data sets. This is when things start to become difficult. They might find that most of the tools that work off CSV files and load data into memory just can’t work at scale, and that they need to re-architect everything to fit distributed platforms and structured databases.
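To make the scaling problem concrete, here is a minimal sketch (not from the talk) of the kind of rewrite involved: instead of loading a whole CSV file into memory, the rows are streamed and only running totals are kept. The function name, the column names and the sample data are all hypothetical, chosen for illustration only.

```python
import csv
import io
from collections import defaultdict


def streaming_mean(rows, key_field, value_field):
    """Compute a per-key mean in one pass, holding only running totals in memory."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[key_field]
        totals[key] += float(row[value_field])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical sample data; a real job would pass csv.DictReader(open(path, newline=""))
# so that the file is read row by row rather than loaded all at once.
sample = io.StringIO("user,amount\na,10\nb,4\na,20\nb,6\n")
means = streaming_mean(csv.DictReader(sample), "user", "amount")
print(means)
```

This only removes the memory limit on a single machine. Once the data no longer fits on one node, the same logic has to be re-expressed for a distributed engine, which is the re-architecture effort described above.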

Solutions and Future Trends

In my session I outline the industry’s vision for overcoming the challenges described above. The way to solve them is through:

  1. Adoption of automation and higher-level abstractions where possible.
  2. Design for collaboration and re-use.

Serverless ML Functions

The way we eliminate ML pipeline complexity is by adopting the concept of serverless “ML Functions”. Serverless technologies allow you to write code and a specification, which are automagically translated into auto-scaling production workloads. Until recently, these were limited to stateless and event-driven workloads, but with the new open-source technologies we demonstrated (MLRun + Nuclio + Kubeflow), serverless functions can now take on the larger challenges of real-time, extreme-scale data analytics and machine learning.

  1. Minimize the amount of resources and skill level needed to complete the project
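The "code plus specification" idea can be sketched in plain Python. This is an illustration only and does not use the MLRun, Nuclio or Kubeflow APIs: `FunctionSpec`, `predict_handler` and `run_locally` are hypothetical names, and the weights stand in for a trained model.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionSpec:
    """Hypothetical deployment spec: what a platform needs besides the code."""
    name: str
    min_replicas: int = 0
    max_replicas: int = 4
    env: dict = field(default_factory=dict)


def predict_handler(event: dict) -> dict:
    """Business logic only: score one event with a stand-in linear model."""
    weights = {"clicks": 0.5, "purchases": 2.0}  # stand-in for a trained model
    score = sum(weights[name] * event.get(name, 0.0) for name in weights)
    return {"user": event["user"], "score": score}


def run_locally(spec, handler, events):
    """Stand-in for the platform: here it simply calls the handler per event.

    A real serverless runtime would use the spec to build, deploy and scale
    the same handler without changes to the handler code.
    """
    return [handler(event) for event in events]


spec = FunctionSpec(name="score-users", max_replicas=8)
results = run_locally(spec, predict_handler, [{"user": "a", "clicks": 4, "purchases": 1}])
print(results)
```

The point of the split is that the author of the function touches only the handler and the spec, while scaling and operations are left to the platform.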
 

Built-in Feature Stores

The second challenge is the complexity of building, managing, and consuming offline and online features. Digital giants like Uber, Netflix and others have all built “Feature Stores” internally to overcome this. Most organizations can’t afford to build a feature store from scratch, or lack the in-house skills to do so, and need it to be an integral part of the data platform they use.
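As a rough illustration of the offline/online split, the toy class below keeps a full history of feature values for training (offline) and the latest value per entity for serving (online). It is an in-memory sketch under simplifying assumptions, not how any production feature store is implemented; `TinyFeatureStore` and its methods are hypothetical names.

```python
from collections import defaultdict


class TinyFeatureStore:
    """Toy in-memory feature store with an offline history and an online view."""

    def __init__(self):
        self._offline = defaultdict(list)  # entity -> [(timestamp, features)]
        self._online = {}                  # entity -> (timestamp, latest features)

    def ingest(self, entity, timestamp, features):
        """Append to the history and refresh the online value if this row is newer."""
        self._offline[entity].append((timestamp, dict(features)))
        latest = self._online.get(entity)
        if latest is None or timestamp >= latest[0]:
            self._online[entity] = (timestamp, dict(features))

    def get_online(self, entity):
        """Low-latency lookup of the latest features, as used at serving time."""
        return self._online[entity][1]

    def get_offline(self, entity, as_of):
        """Point-in-time lookup: latest features at or before `as_of`, for training."""
        rows = [row for row in self._offline[entity] if row[0] <= as_of]
        return max(rows, key=lambda row: row[0])[1] if rows else None


store = TinyFeatureStore()
store.ingest("user-1", 1, {"avg_spend": 10.0})
store.ingest("user-1", 5, {"avg_spend": 12.5})
online = store.get_online("user-1")
training = store.get_offline("user-1", as_of=3)
print(online, training)
```

Both views are fed by the same `ingest` call, so training and serving see features computed the same way. Keeping that consistent across real batch and streaming systems is a large part of what makes a feature store hard to build in-house.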

 
