
Folk-Wisdom’s Fallacy: Busting MLOps Myths

With the ever-increasing traction of ML-backed services, MLOps has become a necessity, and with good reason: many AI projects risk never being deployed and consequently never generating actual business value. Hence, various practices are being implemented at the development and operations levels to scale ML (Machine Learning) projects.

Technologies and concepts are emerging at such a pace that we often misunderstand the crux of MLOps, along with its applications. So let’s debunk some of the common myths around MLOps!

Myth 1: Development readiness and production readiness are the same.
Reality: MLOps provides a bridge between model development and model deployment.

Most ML development happens within an environment that is visible and accessible to the developers. As important as model performance is, it is equally important to develop the project to maximize portability. Unfortunately, when projects are taken to production, a massive amount of work often still remains; the models end up in a queue, and deployment is compromised.

Deploying ML projects into production is a critical task, since data scientists start by experimenting with data in their native environments. By the time the production stage is reached, many artifacts have been generated that also need to be put into production systems. Additionally, these artifacts must be kept up to date for real-time predictions and scoring; hence, a to-and-fro connection forms between the two environments. Thus, MLOps is about facilitating and streamlining both the lab and production environments.
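To illustrate the portability point, here is a minimal sketch of packaging a trained model together with its fitted preprocessing artifact and a version manifest, so the same bundle can be promoted from the lab to production. The folder layout, the `bundle_model` helper, and the manifest fields are illustrative assumptions, not a prescribed standard.

```python
import json
import time
from pathlib import Path

import joblib  # a common choice for serializing scikit-learn objects


def bundle_model(model, scaler, out_dir: str, version: str) -> Path:
    """Write the model, its preprocessing artifact, and a manifest to one folder.

    The names here (scaler, manifest.json, version scheme) are illustrative
    assumptions for this sketch.
    """
    out = Path(out_dir) / version
    out.mkdir(parents=True, exist_ok=True)

    joblib.dump(model, out / "model.joblib")    # the trained estimator
    joblib.dump(scaler, out / "scaler.joblib")  # the fitted preprocessing artifact

    manifest = {
        "version": version,
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "artifacts": ["model.joblib", "scaler.joblib"],
    }
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return out


# Hypothetical usage after training:
# path = bundle_model(trained_model, fitted_scaler, "artifacts/churn", "v1.3.0")
```

A deployment job can then promote the entire versioned folder rather than a bare model file, keeping the preprocessing artifacts in sync with the model they were fitted with.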

Myth 2: MLOps is all about models and metrics, not pipelines.
Reality: Modelling is just one item on the long list of work that has to be done.

There are multiple crucial parts beyond model development – setting up the environment, collecting data, configuration, extracting features, data auditing and verification, resource management, serving infrastructure, and governance and monitoring.
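To make that breadth concrete, the sketch below chains a few of these steps into one pipeline of stage functions. The stage names, the `warehouse://` source string, and the toy data are illustrative assumptions; each stand-in hides substantial real-world work.

```python
from typing import Callable, List


def collect_data(source: str) -> list:
    # Stand-in: in reality this would query a warehouse or ingest files.
    return [{"x": 1.0, "y": 0}, {"x": None, "y": 1}, {"x": 2.0, "y": 1}]


def validate_data(rows: list) -> list:
    # Stand-in for schema checks, null audits, and range assertions.
    return [r for r in rows if r["x"] is not None]


def extract_features(rows: list) -> list:
    # Stand-in for joins, encodings, and aggregations.
    return [[r["x"]] for r in rows]


def run_pipeline(source: str, stages: List[Callable]):
    """Thread the output of each stage into the next, in order."""
    data = source
    for stage in stages:
        data = stage(data)
    return data


features = run_pipeline("warehouse://daily_table",
                        [collect_data, validate_data, extract_features])
print(features)  # [[1.0], [2.0]]
```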

All of these processes come together to form a production environment and get the models operationalized. Similarly, during monitoring, model accuracy comes second to pipeline and service health, including data and model drift detection. 
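As a concrete example of the monitoring side, a basic data-drift check can compare a live feature’s distribution against its training baseline. This sketch uses SciPy’s two-sample Kolmogorov–Smirnov test; the significance threshold and the synthetic data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp  # two-sample Kolmogorov-Smirnov test


def feature_drifted(train_values, live_values, alpha: float = 0.05) -> bool:
    """Flag drift when the live feature distribution differs significantly
    from the training baseline. `alpha` is an illustrative threshold."""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha


rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=5_000)  # feature as seen at training time
live = rng.normal(0.5, 1.0, size=5_000)      # same feature, shifted in production

print(feature_drifted(baseline, live))  # True: the shift is detected
```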

Models and metrics play a vital role in research and development and are nice to have in monitoring. But through a deployment lens, other factors (such as service response times, SLAs, and throughput) take precedence, because they are essential for stable operation.
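For instance, a serving-side health check might track percentile latency against an SLA rather than model accuracy. This is a minimal sketch, assuming a 200 ms p95 SLA and synthetic latency data.

```python
import numpy as np

SLA_P95_MS = 200.0  # assumed SLA: 95% of requests under 200 ms


def p95_breached(latencies_ms) -> bool:
    """Return True when the 95th-percentile latency exceeds the SLA."""
    return float(np.percentile(latencies_ms, 95)) > SLA_P95_MS


# Example: 1,000 request latencies as the serving layer might record them.
rng = np.random.default_rng(1)
latencies = rng.lognormal(mean=4.0, sigma=0.5, size=1_000)  # mostly 50-150 ms
print(p95_breached(latencies))
```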

Myth 3: Stack Overflow can fix all errors. 
Reality: Fixing production model errors requires prior planning and fallbacks.

We can all agree that almost every error or bug has already been discussed on Stack Overflow. But have you ever wondered why no one asks about production-level bugs or errors? That is because production bugs are highly context-specific; you won’t find ready-made solutions anywhere. Such problems can arise for various reasons – model drift, data mismatch, infrastructure failures, etc. This is why it is important to lay down a plan during development and deployment to make models and pipelines robust. This can include:

  • Keeping a backup or baseline model ready to swap in.
  • Having rule-based default values that can override or fill in missing outputs.
  • Including audit trails, event logs, etc. as much as possible to trace issues within pipelines.
  • Ensuring that maintenance or upgrades do not interrupt downstream services.

By incorporating the above points, one can plan break-fix lifecycle management for ML projects so that, once services are up and running, they continue to serve results while the required changes are made.
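As a minimal sketch of the first two bullets above, the wrapper below tries the primary model, swaps in a baseline model on failure, and finally returns a rule-based default, logging each event for the audit trail. The model objects, the default value, and the logger name are stand-ins.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("serving")


def predict_with_fallback(primary, baseline, features, default=0.0):
    """Try the primary model, then a baseline model, then a rule-based default.

    `primary` and `baseline` stand in for any object with a .predict() method;
    `default` is an illustrative rule-based value.
    """
    for name, model in (("primary", primary), ("baseline", baseline)):
        try:
            result = model.predict(features)
            log.info("served by %s model", name)  # event log for the audit trail
            return result
        except Exception:
            log.exception("%s model failed; falling back", name)
    log.warning("all models failed; returning rule-based default")
    return default
```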

Myth 4: You don’t need MLOps if you have AI governance.  
Reality: MLOps is distinct and can help support governance objectives.

While it is fair to say that MLOps and AI governance are related, there are clear differences. The primary focus of AI governance is to regulate compliance and manage the risks associated with machine learning. MLOps is mostly concerned with the uptime of services within production systems, ensuring that models deliver the desired level of performance and high-quality results.

The following intersections help distinguish the relative objectives of AI governance vs. MLOps:

  • Access Control – Limiting access to only trained operators to regulate compliance vs. minimizing downtime.
  • Audit Trails – Using trails to demonstrate compliance vs. using them for troubleshooting and process improvement.
  • Failover Plans – Using plans to remedy actual breakages vs. using plans to keep the system operational.


Authored by: Avesh Kumar Verma, Analyst at Absolutdata