Top Trends for Data Science in 2020
Adi Hirschtein | December 18, 2019
With 2019 coming to an end and 2020 just around the corner, we reflect on a year that was full of innovations in machine learning, deep learning and real-time analytics. Our customers, partners, and the industry in general are doing incredible things with data science, from self-healing systems to real-time personalized recommendations for millions of customers. However, these innovations are just a glimpse of how AI will revolutionize our world in the coming year, and the question is whether data science infrastructure will be able to keep up with these AI applications. The truth is that even today, or perhaps especially today, companies still struggle to turn their advanced AI models into real business applications. Most companies lack the data science strategy and infrastructure needed to support their ambitions. 2020 will be about simplifying the path from data science to production, with an emphasis on delivering real, scalable business value.
- MLOps and Serverless will be the Foundation of Data Science Applications
As more businesses focus on bringing data science to production, it’s no surprise that MLOps and serverless have become hot buzzwords, and they will continue to grow in 2020. Today, the overwhelming number of tools, complex data and siloed development and engineering environments make machine learning in production the central challenge for data teams. MLOps (Machine Learning Operations) applies standardized methods to streamline the path of machine learning models to production and to manage end-to-end pipelines. It brings CI/CD to machine learning, with open source tools like Kubeflow Pipelines for workflow management and Nuclio for serverless automation. Serverless, one of the most important building blocks of MLOps, is also evolving: instead of handling only stateless, event-driven tasks, it is being used to automate every part of the pipeline, including data-intensive tasks like data preparation and training. Together, MLOps and serverless will let data scientists spend more time building models and less time on infrastructure overhead, so that the models they build deliver real business value.
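To make the "CI/CD for machine learning" idea more concrete, here is a minimal sketch of a pipeline that ends in an automated promotion gate, written in plain Python. It is not the Kubeflow Pipelines or Nuclio API: the `Pipeline` class, the step names and the `max_mae` threshold are hypothetical, and the toy linear fit stands in for a real training job.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


class Pipeline:
    """Hypothetical minimal pipeline runner: steps run in order, and each one
    returns new keys that are merged into a shared context."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def step(self, fn: Step) -> Step:
        # Decorator that registers a function as the next pipeline step
        self.steps.append(fn)
        return fn

    def run(self, context: Context) -> Context:
        for fn in self.steps:
            context = {**context, **fn(context)}
        return context


pipeline = Pipeline()


@pipeline.step
def prepare_data(ctx: Context) -> Context:
    # Drop missing records, then hold out every 4th row for testing
    rows = [r for r in ctx["raw"] if r is not None]
    train = [r for i, r in enumerate(rows) if i % 4 != 0]
    test = [r for i, r in enumerate(rows) if i % 4 == 0]
    return {"train": train, "test": test}


@pipeline.step
def train_model(ctx: Context) -> Context:
    # Ordinary least-squares fit of y = a * x + b (a toy stand-in for training)
    xs, ys = zip(*ctx["train"])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return {"a": a, "b": my - a * mx}


@pipeline.step
def evaluate(ctx: Context) -> Context:
    # Mean absolute error on the held-out rows
    errors = [abs(ctx["a"] * x + ctx["b"] - y) for x, y in ctx["test"]]
    return {"mae": sum(errors) / len(errors)}


@pipeline.step
def promotion_gate(ctx: Context) -> Context:
    # CI/CD-style gate: mark the model deployable only if it meets the bar
    return {"deploy": ctx["mae"] <= ctx["max_mae"]}


if __name__ == "__main__":
    result = pipeline.run({"raw": [(x, 2 * x + 1) for x in range(10)] + [None], "max_mae": 0.1})
    print(result["a"], result["b"], result["mae"], result["deploy"])
```

In a real MLOps setup, each step would typically run as its own containerized or serverless job, and the gate would decide whether the new model replaces the one currently in production.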
- Companies will Apply Data Science to Real-time Use Cases
As infrastructure and data complexities begin to unravel, companies will achieve higher performance across large data volumes. Sophisticated detection algorithms built offline in 2019 will be served online, in real time, against fresh data, with machine learning models continuously identifying suspicious patterns so that abnormal events can be stopped as they occur. The ability to correlate fresh data with historical data will also help detect and prevent model drift, keeping models accurate. Companies will transition from detection to prevention, reducing risks and saving costs.
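One common way to correlate fresh data with historical data is to compare their distributions. The sketch below uses the Population Stability Index (PSI), which bins both samples on the historical range and sums (a - e) * ln(a / e) over the bins, where e and a are the historical and fresh bin fractions. PSI is just one drift measure among several, shown here for illustration; the function names, the 10 equal-width bins and the 0.25 alert threshold are assumptions (0.25 is a widely quoted rule of thumb, not a standard).

```python
import bisect
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: List[float], eps: float) -> List[float]:
    """Fraction of values per bin; values outside the edges are clamped into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), len(counts) - 1)
        counts[idx] += 1
    # eps keeps empty bins from producing log(0) or a division by zero
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(historical: Sequence[float], fresh: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    lo, hi = min(historical), max(historical)
    if hi == lo:
        raise ValueError("historical data has no spread; cannot build bins")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    expected = bin_fractions(historical, edges, eps)
    actual = bin_fractions(fresh, edges, eps)
    # Every term is >= 0; a larger total means the fresh data has moved further
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_detected(historical: Sequence[float], fresh: Sequence[float],
                   threshold: float = 0.25) -> bool:
    # 0.25 is an assumed rule-of-thumb threshold; tune it per feature in practice
    return population_stability_index(historical, fresh) > threshold
```

In a real-time setting, `fresh` would be a sliding window of recent inputs or predictions, and a positive result could trigger retraining rather than only an alert.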
- Online and Offline Feature Stores will Gain Traction
Corporate giants like Uber and Twitter have already built feature stores to avoid wasting countless hours re-creating the same features for common models, but what about the rest of us? Features are the inputs that machine learning models consume in training and in production, but developing them is often an engineering-heavy task. Like many capabilities that begin in large enterprises and gradually reach everyone else, feature stores will become more common in 2020 because of their huge value. They allow machine learning teams to share, organize, discover and reuse features across models, increasing reliability, accuracy and efficiency. An offline store keeps historical feature values for training, while an online store serves the latest values with low latency for real-time inference.
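The online/offline split can be illustrated with a small in-memory sketch. The `FeatureStore` class and its methods are hypothetical and do not reflect any particular product's API; production feature stores typically back the offline view with a data lake or warehouse and the online view with a low-latency key-value store.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (entity id, feature name)


class FeatureStore:
    """Hypothetical in-memory feature store with an offline (history) and online (latest) view."""

    def __init__(self) -> None:
        self._registry: Dict[str, str] = {}                       # feature name -> description
        self._times: Dict[Key, List[float]] = defaultdict(list)   # offline: sorted timestamps
        self._values: Dict[Key, List[Any]] = defaultdict(list)    # offline: values aligned with _times
        self._online: Dict[Key, Tuple[float, Any]] = {}           # online: latest (timestamp, value)

    def register(self, name: str, description: str) -> None:
        # Registration makes a feature discoverable and reusable by other teams
        self._registry[name] = description

    def discover(self) -> Dict[str, str]:
        return dict(sorted(self._registry.items()))

    def ingest(self, entity: str, name: str, ts: float, value: Any) -> None:
        if name not in self._registry:
            raise KeyError(f"unregistered feature: {name}")
        key = (entity, name)
        # Offline view: keep the full history ordered by timestamp (late data is allowed)
        i = bisect.bisect_right(self._times[key], ts)
        self._times[key].insert(i, ts)
        self._values[key].insert(i, value)
        # Online view: overwrite only if this is the newest observation
        if key not in self._online or ts >= self._online[key][0]:
            self._online[key] = (ts, value)

    def get_online(self, entity: str, names: List[str]) -> Dict[str, Any]:
        # Latest values for real-time inference; None if the feature was never ingested
        return {n: (self._online[(entity, n)][1] if (entity, n) in self._online else None)
                for n in names}

    def get_offline(self, entity: str, names: List[str], as_of: float) -> Dict[str, Optional[Any]]:
        # Point-in-time lookup for training: the last value known at `as_of`
        row: Dict[str, Optional[Any]] = {}
        for n in names:
            times = self._times.get((entity, n), [])
            i = bisect.bisect_right(times, as_of)
            row[n] = self._values[(entity, n)][i - 1] if i > 0 else None
        return row
```

The point-in-time lookup in `get_offline` matters because each training row should only see feature values that were known when the event happened; otherwise the model learns from information it will not have in production.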
- Experiment Tracking will be Simplified
Running and tracking data science jobs can require a lot of DevOps and labor-intensive work if you’re trying to do everything on your own. New projects such as MLRun have recently emerged that make it much simpler to run any job either locally on your laptop or at scale on a distributed cluster. They provide generic mechanisms for data scientists and developers/engineers to describe and track the code, metadata, inputs and outputs of machine learning tasks (executions), and they present all running and historical jobs in a single report.
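The tracking idea can be sketched in a few lines: wrap each execution, record its parameters, inputs, outputs and state, and render every run in one report. This is not MLRun's API; `Tracker`, `RunRecord` and `execute` are hypothetical names, and the sketch runs jobs in-process rather than dispatching them to a cluster.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class RunRecord:
    """Metadata captured for one execution (hypothetical schema)."""
    run_id: str
    name: str
    params: Dict[str, Any]
    inputs: Dict[str, str]
    outputs: Dict[str, Any] = field(default_factory=dict)
    state: str = "running"
    started: float = field(default_factory=time.time)
    ended: Optional[float] = None


class Tracker:
    """Hypothetical experiment tracker that keeps every run in memory."""

    def __init__(self) -> None:
        self.runs: List[RunRecord] = []

    @contextmanager
    def run(self, name: str, params: Optional[Dict[str, Any]] = None,
            inputs: Optional[Dict[str, str]] = None) -> Iterator[RunRecord]:
        record = RunRecord(uuid.uuid4().hex[:8], name, dict(params or {}), dict(inputs or {}))
        self.runs.append(record)  # visible in the report while still running
        try:
            yield record
            record.state = "completed"
        except Exception:
            record.state = "error"
            raise
        finally:
            record.ended = time.time()

    def execute(self, fn: Callable[..., Dict[str, Any]], params: Optional[Dict[str, Any]] = None,
                inputs: Optional[Dict[str, str]] = None) -> RunRecord:
        # Call fn(**params) and store the dict it returns as the run's outputs
        with self.run(fn.__name__, params, inputs) as record:
            record.outputs.update(fn(**(params or {})))
        return record

    def report(self) -> str:
        # One line per run, covering both running and finished executions
        lines = []
        for r in self.runs:
            duration = f"{r.ended - r.started:.2f}s" if r.ended is not None else "-"
            lines.append(f"{r.run_id} {r.name:<12} {r.state:<9} {duration:>7} "
                         f"params={r.params} inputs={r.inputs} outputs={r.outputs}")
        return "\n".join(lines)
```

For example, `tracker.execute(train, {"lr": 0.01}, {"dataset": "train.csv"})` followed by `print(tracker.report())`, where `train` is any function that returns a dict of results and `train.csv` is a placeholder input name.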
- The Cloud Experience will be Extended to the Edge
Companies will need to process data closer to its source to overcome the cloud’s latency constraints and generate faster, more accurate insights. We will see more hybrid solutions that leverage the elasticity of the cloud and extend the cloud experience to the edge for high performance. Edge devices will not only serve as gateways; they will be intelligent and enable AI-driven actions.
- Data Science Applications will be Democratized
MLOps, automation and managed platforms will simplify the deployment of data science applications, making data science accessible to any business. Large data science and engineering teams and months of development, which only large enterprises can afford, will no longer be a requirement. Instead, a small data science team or even a single data scientist will be able to work with a managed platform, build models in a Jupyter notebook and automate their path to production.