By Insight Editor / 1 Dec 2022 / Topics: IT optimization, Automation, Cloud, IT modernization
Note: This post first appeared on November 30, 2022, on the Engineering Blog for SADA, An Insight Company on Medium.
Deploying custom containers on Vertex AI in Google Cloud Platform (GCP) provides the flexibility to include all runtime-related dependencies. Some data science teams also build and distribute ML models as Python packages, and because Python is the ubiquitous choice of programming language for data science applications, some teams find this level of abstraction adequate. However, containers provide the robustness and functionality to seamlessly expand the scope of ML models from training to production readiness.
In this article, we’ll look at some best practices and the process of deploying machine learning models using custom containers for real-time inference.

Converting a machine learning model to generate real-time inferences in a production environment requires the following:
There are several libraries available for developing a wide range of machine learning models, including scikit-learn, TensorFlow, PyTorch, and others. Google Cloud Vertex AI also provides prebuilt runtimes based on the version of the underlying ML library.
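To make this concrete, the sketch below trains a trivial scikit-learn model and serializes it the way Vertex AI's prebuilt scikit-learn runtime expects (a pickled artifact such as `model.pkl` or `model.joblib` in the model directory). The data and round-trip here are illustrative, not a production training job:

```python
import pickle

from sklearn.linear_model import LinearRegression

# Toy training data: y = 2x, purely for illustration.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 2.0, 4.0, 6.0]

model = LinearRegression().fit(X, y)

# Vertex AI's prebuilt scikit-learn container loads a serialized artifact
# (model.pkl or model.joblib) from the model directory at startup; here we
# round-trip the bytes in memory to show the model survives serialization.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

print(round(float(restored.predict([[4.0]])[0]), 2))
```

If your model depends on libraries or versions outside what the prebuilt runtimes offer, that is the signal to move to a custom container.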
There are several ways to wrap the ML model using a serving application. The compatibility matrix below lists various ways to do it:
| Serving framework | ML library |
|---|---|
| KServe | scikit-learn, XGBoost, PyTorch |
| TFServe | TensorFlow |
| TorchServe | PyTorch |
If one of these frameworks sufficiently covers your runtime, it simplifies the process of containerizing the ML application. As mentioned earlier, building custom containers provides the ultimate flexibility to design the runtime in any way we want.
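When a custom container is the right choice, the serving process inside it must follow Vertex AI's custom container contract: expose a health route and a prediction route, and read the port and route paths from the `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, and `AIP_PREDICT_ROUTE` environment variables. Below is a minimal standard-library sketch of that contract; the `predict` function is a stand-in for a real model:

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(instances):
    # Stand-in "model": doubles each input value.
    return [2 * x for x in instances]


class PredictionHandler(BaseHTTPRequestHandler):
    # Vertex AI injects these route paths at deploy time.
    health_route = os.environ.get("AIP_HEALTH_ROUTE", "/health")
    predict_route = os.environ.get("AIP_PREDICT_ROUTE", "/predict")

    def do_GET(self):
        # Health checks arrive as GET requests on the health route.
        status = 200 if self.path == self.health_route else 404
        self.send_response(status)
        self.end_headers()

    def do_POST(self):
        if self.path != self.predict_route:
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        # Request body follows {"instances": [...]}; response follows
        # {"predictions": [...]}.
        payload = json.dumps({"predictions": predict(body["instances"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)


def serve():
    # Vertex AI tells the container which port to listen on.
    port = int(os.environ.get("AIP_HTTP_PORT", "8080"))
    HTTPServer(("0.0.0.0", port), PredictionHandler).serve_forever()
```

In practice you would replace `predict` with your model's inference call and use a production server (e.g., behind gunicorn) rather than the standard-library server, but the contract — routes, port, and JSON shapes — stays the same.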
Borrowing best practices from traditional web application development, the key to deploying real-time ML models is setting up a CI/CD process capable of releasing new features and products seamlessly. Creating an array of test suites is an integral part of setting up a CI/CD process. However, the testing requirements for ML models are quite different from those of traditional web apps. The test suites can be broken down into the following categories:
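As a flavor of what ML-specific tests look like in such a CI suite, the sketch below checks behavioral properties of a model rather than exact outputs. The `predict` function is a hypothetical stand-in for the packaged model; the invariants you assert (shape, determinism, monotonicity) depend on your domain:

```python
def predict(instances):
    # Hypothetical stand-in for the packaged model's inference call.
    return [2 * x for x in instances]


def test_output_shape():
    # The model must return exactly one prediction per input instance.
    assert len(predict([1.0, 2.0, 3.0])) == 3


def test_determinism():
    # The same input must yield the same output across calls (no hidden state).
    assert predict([0.5]) == predict([0.5])


def test_monotonicity():
    # A domain invariant: a larger input should not produce a smaller score.
    assert predict([2.0])[0] >= predict([1.0])[0]
```

Tests like these run on every candidate model in CI, so a retrained model that silently violates an expected invariant is caught before it reaches the endpoint.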
The Vertex AI offering from GCP includes functionality designed to take you from scratch, through model training, to production-ready deployment. In addition to deploying models for real-time inference, a few custom features and services are required to complement it:
Deploying machine learning algorithms to generate real-time inferences is a constantly evolving field. Pulling it off demands multidisciplinary effort and coordination, from data scientists to operations engineers and everyone in between. Vertex AI simplifies the operational overhead and deployment process in the data science stack. We have outlined some of the best practices we follow at Insight Engineering to build a robust and performant machine learning pipeline, and we hope these insights are useful on your own machine learning journey.
About Subash Padmanaban
Subash Padmanaban is a Senior Data Engineer at Insight. His background and experience are in building scalable data and machine learning pipelines.