You are basically asking about model serving: a way to manage your models and deliver them to production in a secure, governed way.
There are a few things you need to think about:
- How will my models be managed?
- How will my models be delivered (served) for inference?
- Do I need real-time or batch inference?
In its simplest form, you store (deploy) the trained model in a remote repository known as a model server. Then at runtime you retrieve the model, pass features (inputs) into it, and predict.
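As a minimal sketch of that roundtrip, here's what it could look like using MLflow as the repository (any registry with similar log/load calls would work; the scikit-learn iris model is just a stand-in):

```python
# Minimal store-then-retrieve sketch using MLflow. By default this logs to a
# local ./mlruns directory; in production you'd point the tracking URI at a
# remote MLflow server instead.
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Store: log the trained model to the repository.
with mlflow.start_run() as run:
    mlflow.sklearn.log_model(model, artifact_path="model")

# Retrieve: load the model back at runtime and predict.
loaded = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded.predict(X[:5]))
```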
There's a lot of value in this simple pattern. Firstly, your models live in a central repository, which gives you governance, shareability, versioning, and reusability. Storing a model should be as easy as a few function calls.
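For instance, with MLflow's Model Registry those "few function calls" might look like the following (the registry name "iris-classifier" is hypothetical, and this continues from the run logged above):

```python
# Sketch: registering the logged model under a name gives it an
# auto-incremented version that teammates can discover and reuse.
import mlflow

result = mlflow.register_model(
    model_uri=f"runs:/{run.info.run_id}/model",  # the model logged above
    name="iris-classifier",                      # hypothetical registry name
)
print(result.name, result.version)  # e.g. "iris-classifier", version 1
```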
Secondly, retrieving the model should be just as easy, ideally a single function call. However, you must make sure the repository supports the appropriate protocols and that access to it is secure.
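Continuing the MLflow sketch, retrieval really is one call; the HTTPS endpoint below is a placeholder for whatever secure transport and authentication your registry supports:

```python
# Placeholder HTTPS endpoint; MLflow reads credentials from environment
# variables such as MLFLOW_TRACKING_TOKEN, so secrets stay out of code.
import mlflow

mlflow.set_tracking_uri("https://mlflow.example.com")

# One call fetches a specific registered version by name.
model = mlflow.pyfunc.load_model("models:/iris-classifier/1")

feature_rows = [[5.1, 3.5, 1.4, 0.2]]  # example feature vector for inference
predictions = model.predict(feature_rows)
print(predictions)
```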