
Announced at GTC Paris: MLRun now integrates with NVIDIA NeMo microservices

Build Observable Data Flywheels for Production with Iguazio’s MLRun and NVIDIA NeMo Microservices

Guy Lecker and Yonatan Shelach | June 11, 2025

We are proud to announce a new integration between MLRun, the open-source AI orchestration framework, and NVIDIA NeMo microservices, extending the NVIDIA Data Flywheel Blueprint. This integration streamlines training, evaluation, fine-tuning and monitoring of AI models at scale, delivering high performance and low latency while lowering costs and significantly reducing manual effort through intelligent automation.

MLRun is built and maintained by Iguazio, now a part of QuantumBlack, McKinsey’s AI arm. As part of NVIDIA’s ecosystem and with a longstanding history of collaboration, we’re excited to help developers worldwide build agentic systems that bring real business value. Read the blog for more details, or go straight to the blueprint to try it out for yourself.

What is MLRun?

MLRun is an open-source AI orchestration framework for managing GenAI and ML applications across their lifecycle. It automates data preparation, model tuning, customization, validation and optimization of LLMs, ML models and live AI applications over elastic resources.

MLRun enables the rapid deployment of scalable real-time serving and application pipelines, while providing built-in observability and flexible deployment options, supporting multi-cloud, hybrid, and on-prem environments.
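
For readers new to MLRun, here is a minimal sketch of its orchestration model. It assumes `pip install mlrun` and a reachable MLRun service; the project name, file name and handler are illustrative placeholders, not part of this blueprint:

```python
# Minimal MLRun sketch (illustrative names; requires an MLRun service).
import mlrun

# Create or load a project that tracks functions, runs and artifacts.
project = mlrun.get_or_create_project("flywheel-demo", context="./")

# Register a local Python file as a serverless batch function.
# "data_prep.py" and its "prep" handler are hypothetical.
project.set_function(
    "data_prep.py",
    name="data-prep",
    kind="job",
    image="mlrun/mlrun",
    handler="prep",
)

# Execute over elastic resources; MLRun logs parameters, outputs and lineage.
run = project.run_function("data-prep", params={"sample_rate": 0.1})
print(run.outputs)
```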

What are NVIDIA NeMo Microservices?

NVIDIA NeMo is an end-to-end platform that accelerates data flywheels, helping teams build and continuously optimize agentic AI systems for peak performance using the latest business intelligence, user inputs, and AI and human feedback.

NeMo enables enterprises to optimize AI agents with the latest information and feedback. This modular microservices platform helps implement RAG, customize and evaluate models, and incorporate guardrails to keep agents delivering peak performance. NeMo integrates seamlessly with partner platforms and powers data flywheels that continuously optimize agentic AI systems.

What is an AI Data Flywheel?

Data flywheels are processes that enrich and optimize AI agent applications with inference, business and user preference data. They work by creating a loop in which AI models continuously improve by integrating institutional knowledge and user feedback, such as LLM prompt and response logs and expert labeling. This cyclical process of data collection and model refinement enhances model accuracy, improves operational efficiency, and reduces costs.
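
Conceptually, one turn of the flywheel can be sketched as the runnable toy loop below. Every function here is an illustrative stub standing in for real collection, curation, fine-tuning and evaluation services, not an MLRun or NeMo API:

```python
# Runnable toy sketch of one flywheel turn. Every function is an illustrative
# stub, not an MLRun or NeMo API.
import random

def collect_interaction_logs():
    # In production: LLM prompt/response logs plus user or expert feedback.
    return [
        {"prompt": f"q{i}", "response": f"a{i}", "thumbs_up": random.random() > 0.3}
        for i in range(100)
    ]

def curate(logs):
    # Keep positively rated interactions as training signal.
    return [log for log in logs if log["thumbs_up"]]

def fine_tune(model, dataset):
    # Placeholder: a real loop would launch a LoRA/SFT job on `dataset`.
    gain = 0.0002 * len(dataset)
    return {"name": model["name"] + "-ft", "accuracy": min(1.0, model["accuracy"] + gain)}

def evaluate(model):
    # Placeholder: a real loop would score against held-out production logs.
    return model["accuracy"]

production_model = {"name": "small-llm", "accuracy": 0.85}
dataset = curate(collect_interaction_logs())
candidate = fine_tune(production_model, dataset)
if evaluate(candidate) > evaluate(production_model):
    production_model = candidate  # "redeploy" the improved model
print(production_model)
```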

According to NVIDIA, at a high level, a Data Flywheel flow continuously collects production logs, curates and labels data, customizes and evaluates candidate models, and redeploys the best performer.

How MLRun Integrates with NeMo for Enterprise Data Flywheels

Iguazio has collaborated with NVIDIA to power enterprise data flywheels with MLRun for building and optimizing agentic AI performance. MLRun acts as the flywheel orchestrator, wrapping the flywheel and powering training, use-case-specific fine-tuning, evaluation and monitoring.

A Data Flywheel Blueprint for improving small model performance. Experiments are run with NeMo microservices while MLRun orchestrates updates and redeployments.

How the Integration Works:

Monitoring

Once a user interacts with the deployed AI app, MLRun captures user-agent interaction logs (completions) through Elasticsearch. MLRun uses this information to monitor interactions and evaluate performance, integrity, operational stability, resource use, and more, helping organizations detect and mitigate the risks associated with GenAI and AI.
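
As a rough illustration, the snippet below pulls recent completion logs from Elasticsearch and computes one simple operational signal. The cluster URL, index name and `latency_ms` field are assumptions for the sketch; MLRun's model monitoring would track such signals (plus drift, integrity and resource use) continuously:

```python
# Sketch: fetch recent completions from Elasticsearch (assumed index/fields).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed cluster address

resp = es.search(
    index="completions",                                # assumed index name
    query={"range": {"timestamp": {"gte": "now-1h"}}},  # last hour of logs
    size=1000,
)
hits = [h["_source"] for h in resp["hits"]["hits"]]

# One example signal: median response latency across recent interactions.
latencies = sorted(h["latency_ms"] for h in hits if "latency_ms" in h)
if latencies:
    print(f"{len(hits)} completions, p50 latency ~{latencies[len(latencies) // 2]} ms")
```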

Evaluating, Training and Fine-tuning

Logs are also used by MLRun to orchestrate evaluation and customization, supported by NVIDIA NeMo Customizer and NVIDIA NeMo Evaluator. NeMo Customizer is used for training and fine-tuning, leveraging techniques like LoRA, p-tuning and supervised fine-tuning. NeMo Evaluator uses techniques like zero-shot learning, RAG and LLM-as-a-Judge for LLM evaluations.
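
As an illustration of how such a customization job might be launched, the sketch below posts a LoRA fine-tuning request to a NeMo Customizer endpoint with plain `requests`. The base URL, dataset name and exact payload fields are assumptions for this sketch; consult the NeMo microservices documentation for the current API schema:

```python
# Hedged sketch: launch a LoRA fine-tuning job via NeMo Customizer's REST API.
# Base URL, dataset name and payload fields are assumptions for illustration.
import requests

CUSTOMIZER_URL = "http://nemo-customizer:8000"  # assumed in-cluster address

job = requests.post(
    f"{CUSTOMIZER_URL}/v1/customization/jobs",
    json={
        "config": "meta/llama-3.1-8b-instruct",        # base model to adapt
        "dataset": {"name": "production-logs-curated"},
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "lora",                 # technique named above
            "epochs": 2,
        },
    },
    timeout=30,
)
job.raise_for_status()
print(job.json().get("id"))  # poll the returned job ID until training completes
```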

This Data Flywheel Blueprint, for example, is designed to improve the performance of smaller models based on larger models' performance. The flywheel orchestrates automated experiments with NeMo microservices, running the agent's production logs against candidate models. The goal is to surface smaller, more efficient models that maintain the same accuracy targets. These actions, updates and redeployments are orchestrated by MLRun.
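
The selection step at the heart of this blueprint can be expressed compactly. In the hypothetical sketch below, candidate scores (as a NeMo Evaluator job might report them) are filtered against the production accuracy target, and the smallest viable model wins:

```python
# Hypothetical selection step: given per-candidate evaluation scores (as a
# NeMo Evaluator job might report), pick the smallest model that still meets
# the production accuracy target.
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    params_b: float   # model size, billions of parameters
    accuracy: float   # score on the production-log evaluation set

def pick_smallest(candidates: list[Candidate], target: float) -> Candidate | None:
    viable = [c for c in candidates if c.accuracy >= target]
    return min(viable, key=lambda c: c.params_b) if viable else None

candidates = [
    Candidate("large-baseline-70b", 70.0, 0.91),  # current production-class model
    Candidate("small-ft-lora-8b", 8.0, 0.90),     # fine-tuned small candidate
]
print(pick_smallest(candidates, target=0.90))     # -> the 8B fine-tune
```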

In addition, if human-in-the-loop decisions are needed, MLRun orchestrates that feedback back to the deployed application. This blueprint, powered by Iguazio MLRun and NVIDIA NeMo, can also serve additional agentic system self-improvement use cases.

What Developers Gain From the MLRun and NeMo Data Flywheel Integration

By using the MLRun and NeMo microservices integration in the Data Flywheel Blueprint, organizations can benefit from:

  • 60% Code Reduction - MLRun reduces the need to write code by up to 60%, making the process more straightforward and reducing the engineering resources and time needed.
  • Automation - MLRun automates the monitoring, training, evaluation and fine-tuning process, ensuring it is performed accurately and effortlessly. NeMo reduces the need for manual activities, like labeling, with features like LLM-as-a-Judge.
  • Continuous Improvement - Application output and models constantly improve through a continuous learning loop, with no need to redeploy every time. Here, the loop is used to discover and create smaller, faster, more cost-efficient models without sacrificing accuracy.
  • Streamlined LLM Tuning and Evaluation - NeMo Customizer accelerates and simplifies high-performance LLM fine-tuning and alignment for domain-specific use cases. NeMo Evaluator automates the evaluation of these models across custom and industry benchmarks, enabling continuous improvement, rapid deployment, and efficient scaling of AI agents in enterprise environments.
  • Scalability - This process can be easily replicated across multiple models, workflows and services, addressing real enterprise needs.
  • Cost Reduction - Reduced inference costs and production latency, alongside code reduction and automation, free up engineering and compute resources to help reduce costs.
  • Future-Proof - Continuous fine-tuning and optimization ensure the GenAI app is always up to date with the latest models and capabilities.

Explore the joint Iguazio MLRun and NVIDIA blueprint to try it for yourself.