Deploying Machine Learning models w/ Vertex AI on GCP

Exploring Vertex AI Workbench for ML deployment (detailed tutorial on Vertex AI on GCP).
Feb 25, 2025 · Stane Aurelius

There have been reports stating that upwards of 87% of data science projects never make it into production.

When I first read about it, I immediately started thinking about the data flywheel. Data collection, data verification and pre-processing, model creation, deployment, and monitoring have always been the holy grail of Machine Learning: the ideal flow every data scientist wants to achieve. To be honest, the report was not that surprising. There are lots of learning resources on the internet teaching people how to develop a Machine Learning model, but it is far less clear what comes after that stage. Additionally, model deployment can be done in multiple distinct ways, depending on the available tools and their costs, so it is not obvious where to start.

Unified AI platform on Google Cloud Platform (GCP)

If you look for a way to deploy a model, there is a good chance you have come across a tutorial or documentation on deploying a model to Google Cloud Platform (GCP). GCP offers tons of incredible services for development. When it comes to model deployment, the highlighted services are BigQuery, a serverless data warehouse, and Vertex AI, a unified AI platform providing every ML tool you need. Those services can be integrated with each other, and Vertex AI can be used for many tasks: feature engineering, model training, hyperparameter tuning, model deployment, etc.

Despite all the features, one might still be confused about where to start. After all, there are multiple ways to train and deploy a model. The most popular ways are:

Performing everything in Vertex AI: from creating managed datasets from a BigQuery table or view, building the pipelines, and performing model training and tuning, up to deploying the model. All of this can be done via the GUI in Vertex AI, and to request predictions from the deployed model, Cloud Functions can invoke the ML model through a REST API. There is a slight limitation to this: once a dataset has been created in BigQuery, its region cannot be changed. In addition, (at the time of writing) Vertex AI does not support as many regions as BigQuery, and the managed dataset created from BigQuery must belong to the same region as the hosted model.
Creating the model using BigQuery ML: where you can build an ML model in SQL. This is an incredible service; you basically do not need any experience developing an ML model in Python or any other programming language. You just need to make a table or view and use it as the training dataset for a model. The thing that blew my mind was that, using SQL, we can even train a TensorFlow model in BigQuery ML. Integrated with Vertex AI, it is also possible to deploy the model trained in BigQuery. However, some data wrangling tasks might be too hard to do in SQL, so this method is more suitable for datasets that do not require a rigorous wrangling process.

Vertex AI Workbench is the simpler alternative

Amidst all the available services, there is one I found to be the simplest, yet the most powerful. Most data scientists practice developing ML models in a notebook. Fortunately, GCP offers Vertex AI Workbench, a Jupyter-based infrastructure where you can host and manage your notebooks for development. The notebook can also be integrated with BigQuery and Vertex AI, giving you easy access to every service you need from Python within the notebook.

Getting Started with Vertex AI Workbench

Before using Vertex AI Workbench, first prepare your project ID/number. You will need this later when establishing a connection with BigQuery and Cloud Storage. Your project ID and number can be viewed in the project info section of the home page's dashboard.

The next thing you need to do is create a Google Cloud Storage (GCS) bucket, a container for storing and organizing your data. From the navigation menu, navigate to Storage and click on Create bucket. Enter the name of your bucket and the other settings, such as the bucket region and storage class (you can also use the default settings).

Click on the create button afterwards. This bucket will be used to:

store the output of your notebook,
store the exported model artifact before it is imported to Vertex AI.

You will also need to enable 3 APIs:

Notebooks API to manage notebook resources in Google Cloud,
BigQuery API to use BigQuery services,
Vertex AI API to use Vertex AI services.

To do this, navigate to APIs & Services and click on enable APIs and services. In the API library, search for those 3 APIs and enable them.

Creating a Managed Notebook

At this point, I assume you already have a dataset available in BigQuery and a ready-to-use notebook or Python code for training the model. If you navigate to Vertex AI Workbench, you will see 4 tabs:

Managed Notebooks: contains your managed notebooks.
User-Managed Notebooks: contains your user-managed notebooks.
Executions: stores the status of every notebook execution.
Schedule: shows every scheduled notebook execution you have set up.

To create a data science production workflow, you want a managed notebook. This is the type of notebook you can use to create scheduled executions, which will prove useful for re-training a model and generating predictions on a regular basis. On the other hand, a user-managed notebook is a highly customizable instance. It is great because users can control a lot of their environment. However, if you try creating one, you will notice that there is no submit to executor button, nor is there an option to schedule an execution. I would say that it is more suitable for collaborating and testing code before migrating it to the production phase.

Now navigate to the Managed Notebooks tab and click on the New Notebook button. When creating a notebook, always check the Advanced settings: this is where you manage your hardware configuration. Do not allocate unnecessary resources to the notebook, or the bill will be very high. GCP will need a couple of minutes to create your notebook.

Migrating Python code to the Notebook

After GCP has created your notebook instance, you will see an Open JupyterLab button. Click on that button to open JupyterLab in a new tab. Notice that in the launcher, you can use any environment GCP has created: PySpark, PyTorch, TensorFlow, etc. It even allows you to work in R! You can create your notebook from scratch, or you can import it from your local machine using the upload files button.

One small change you need to make in your code is the function used to fetch the data. When working on a local machine, you probably import the data from a .csv file. Vertex AI Workbench is integrated with BigQuery, so you can fetch your data directly from BigQuery. To do so, create a new markdown cell in your notebook and try typing #@bigquery. You will see that the markdown cell changes its appearance. Open the query editor, write your query to get the data from BigQuery, and click on Submit Query. After the table is shown, you can click on Copy code for DataFrame and GCP will automatically generate the code for fetching the data. The code still needs one modification: you need to explicitly state your project number as a parameter when initializing the connection to BigQuery.

In the code, this is the line client = Client(), and you need to add the parameter project = ''.

The reason we need to explicitly state our project is that Vertex AI might not always connect to the correct Google Cloud project by default. Vertex AI does not run the code directly in our project; instead, it runs the code in one of several projects managed by Google. Hence, if we do not explicitly state the project, we might encounter a permission error when executing the notebook, since it might connect to the wrong project. Now update the code in your notebook up to the point of model fitting.
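
As a minimal sketch of the point above, the client can be created through a small wrapper that refuses to run without an explicit project. The helper names make_bigquery_client and fetch_dataframe are hypothetical and not part of the generated code; the sketch assumes the google-cloud-bigquery package (plus pandas) is available in the notebook environment.

```python
def make_bigquery_client(project_id: str):
    """Return a BigQuery client bound to an explicitly named project.

    Hypothetical helper: it only wraps Client(project=...) with a guard.
    """
    if not project_id or not project_id.strip():
        # Fail early instead of letting the client pick a default project.
        raise ValueError("project_id must be set explicitly")
    # Imported lazily so the guard above works even without the SDK installed.
    from google.cloud import bigquery

    return bigquery.Client(project=project_id.strip())


def fetch_dataframe(project_id: str, sql: str):
    """Run a query in the given project and return the result as a DataFrame."""
    client = make_bigquery_client(project_id)
    return client.query(sql).to_dataframe()
```

Calling fetch_dataframe with your own project identifier and the query you wrote in the query editor should give the same DataFrame as the generated code, while an empty project string fails immediately instead of surfacing later as a permission error.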

Importing ML Model to Vertex AI

At the time of writing, Vertex AI only supports models that are trained using one of these 3 frameworks: TensorFlow, XGBoost, and scikit-learn. Unfortunately, once you have finished fitting a model to the training data, you cannot directly deploy it on an endpoint. Instead, here is the workflow for deploying a model on GCP:

Create your notebook and fit your model
Export your model artifacts to Google Cloud Storage (GCS)
Import the model artifact from GCS to Vertex AI
Create an endpoint for hosting your model
Deploy the model on the endpoint

We are currently in step 2, so we need to export our model artifacts to our GCS bucket.

Model artifacts are the output obtained from training a model. They consist of trained parameters, model definitions, and other metadata: basically a saved/exported model. If you are using scikit-learn or XGBoost, I recommend creating a new directory in your GCS bucket (you can do this via GCS or the notebook's left-side panel) before exporting your model.

The location where you want to save your model will be gs:///. This is called the GCS URI. You can copy it via the left-side panel in the notebook. Use the following code for each framework:

TensorFlow

You need to export your model as a TensorFlow SavedModel directory. There are several ways to do this:

If you have used Keras for training, use tf.keras.Model.save
If you use an Estimator for training, use tf.estimator.Estimator.export_saved_model

XGBoost & scikit-learn

You can use either the joblib library or Python's pickle module to export the model. Take note that the name of the model artifact must be either model.joblib or model.pkl.
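
To make the naming rule concrete, here is a small sketch that assembles the destination URI and rejects any other filename. The function name artifact_uri, the constant ALLOWED_ARTIFACT_NAMES, and the bucket and directory values in the usage note are all hypothetical; only the two allowed filenames come from the rule above.

```python
import posixpath

# The two filenames named in the text above.
ALLOWED_ARTIFACT_NAMES = {"model.joblib", "model.pkl"}


def artifact_uri(bucket: str, directory: str, filename: str) -> str:
    """Build the GCS URI for a model artifact, enforcing the filename rule."""
    if filename not in ALLOWED_ARTIFACT_NAMES:
        raise ValueError(
            f"artifact must be named one of {sorted(ALLOWED_ARTIFACT_NAMES)}, "
            f"got {filename!r}"
        )
    # posixpath.join always uses "/", so the URI is the same on every OS.
    object_path = posixpath.join(directory.strip("/"), filename)
    return f"gs://{bucket}/{object_path}"
```

For a hypothetical bucket my-bucket and directory models/churn, artifact_uri("my-bucket", "models/churn", "model.joblib") gives gs://my-bucket/models/churn/model.joblib, while a name such as churn.joblib raises a ValueError before anything is uploaded.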

Use this code if you are using joblib to export your model:

import os
import joblib
from google.cloud import storage
 
# Save model artifact to local filesystem (doesn't persist)
artifact_filename = 'model.joblib'
joblib.dump(my_trained_model, artifact_filename)
 
# Upload model artifact to Cloud Storage
# Change the model directory to your GCS bucket URI
model_directory = 'gs:///'
storage_path = os.path.join(model_directory, artifact_filename)
blob = storage.blob.Blob.from_string(storage_path,
                                    client=storage.Client(project=''))
# Upload the locally saved artifact file to the bucket
blob.upload_from_filename(artifact_filename)

And use this code if you want to use the pickle module to export your model:

import os
import pickle
from google.cloud import storage
 
# Save model artifact to local filesystem (doesn't persist)
artifact_filename = 'model.pkl'
with open(artifact_filename, 'wb') as model_file:
  pickle.dump(my_trained_model, model_file)
 
# Upload model artifact to Cloud Storage
# Change the model directory to your GCS bucket URI
model_directory = 'gs:///'
storage_path = os.path.join(model_directory, artifact_filename)
blob = storage.blob.Blob.from_string(storage_path,
                                    client=storage.Client(project=''))
# Upload the locally saved artifact file to the bucket
blob.upload_from_filename(artifact_filename)

Verify that your model has been exported to your GCS bucket. Now you need to import the model artifact into Vertex AI. Again, GCP offers multiple ways to do this: via the GUI, the gcloud console, etc. But since we want to automate this task, we will do it programmatically in Python. In a new cell in your notebook, use this code to import the model artifact to Vertex AI:

from google.cloud import aiplatform
 
# Use this line so we do not need to explicitly specify the project number and region whenever we use AI Platform (Vertex AI) services
aiplatform.init(project='', location='')
 
# Importing model artifacts
model = aiplatform.Model.upload(display_name = '',
    description = '',
    artifact_uri = '',
    serving_container_image_uri = ''
)

The parameter serving_container_image_uri specifies which pre-built container we want to use for our model. You can find the list of available pre-built containers in the Vertex AI documentation. For example, if I want to use the scikit-learn 1.0 pre-built container for the Asia region, I will pass the parameter as serving_container_image_uri = 'asia-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest'.
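
The example URI above follows a visible pattern (region prefix, framework, version with a dash instead of a dot). The sketch below reproduces that pattern. The function prebuilt_serving_image is a hypothetical helper, and it is an assumption that other frameworks and versions follow the same naming, so check the result against the official list before using it.

```python
# Regional prefixes used by the pre-built prediction container registry.
REGION_PREFIXES = {"us", "europe", "asia"}


def prebuilt_serving_image(region: str, framework: str, version: str) -> str:
    """Build a pre-built CPU serving image URI following the article's example.

    Only the scikit-learn 1.0 / Asia combination is confirmed by the text;
    other combinations are an assumption about the naming pattern.
    """
    if region not in REGION_PREFIXES:
        raise ValueError(f"region must be one of {sorted(REGION_PREFIXES)}")
    # Image tags use a dash between version parts, e.g. "1.0" -> "1-0".
    tag_version = version.replace(".", "-")
    return (
        f"{region}-docker.pkg.dev/vertex-ai/prediction/"
        f"{framework}-cpu.{tag_version}:latest"
    )
```

Calling prebuilt_serving_image("asia", "sklearn", "1.0") reproduces the URI from the example above, which can then be passed as serving_container_image_uri.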

Verify whether the model has been imported by navigating to the Models section in Vertex AI.

Deploying the Model on an Endpoint

After the model has been imported to Vertex AI, you can deploy it to an endpoint. A model must be deployed to an endpoint before it can serve online predictions. When you deploy a model, GCP associates physical resources with that model, to be used when a client requests a prediction from the endpoint where the model is deployed.

You can deploy multiple models to one endpoint, and you can also deploy one model to multiple endpoints. Deploying two models to the same endpoint enables you to gradually replace one model with the other. On the other hand, you might want to deploy your models with different resources for different application environments, such as testing and production; in that case you deploy the model to multiple endpoints.
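
The gradual replacement described above can be pictured as a series of traffic splits between the two models on one endpoint. The sketch below is only an illustration: rollout_plan and the labels current_model and new_model are hypothetical names, not part of the Vertex AI SDK, and the step sizes are arbitrary.

```python
def rollout_plan(new_model_shares):
    """Turn a list of traffic percentages for the new model into full splits.

    Each split maps a (hypothetical) label to its share of endpoint traffic,
    and the two shares in a split always add up to 100.
    """
    plan = []
    for share in new_model_shares:
        if not 0 <= share <= 100:
            raise ValueError(f"traffic share must be between 0 and 100, got {share}")
        plan.append({"current_model": 100 - share, "new_model": share})
    return plan


# Example: shift traffic to the new model in three steps.
for step in rollout_plan([10, 50, 100]):
    print(step)
```

In the Vertex AI SDK, Model.deploy accepts a traffic_percentage argument, so the new model's share from the first step could be passed there when deploying it next to the existing model.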

I did mention that before deploying the model, we need to create an endpoint. However, that step is actually optional. Even if we do not have any endpoint, GCP will automatically create a new endpoint when we deploy our model — with the default name _endpoint.

To create an endpoint, use this code in a new notebook cell:

# optional code to create an endpoint
endpoint = aiplatform.Endpoint.create(display_name = '', 
                                      project = '', 
                                      location = '')

Verify that you have successfully created an endpoint by navigating to the Endpoints section in Vertex AI.

Now we just need to deploy our model to the endpoint. Notice that when we imported the model artifact to Vertex AI, we assigned the result of aiplatform.Model.upload(...) to the model variable. The same goes for the endpoint: we assigned it to the endpoint variable. So, to deploy the model, we simply need to use this code:

# if you do not specify the endpoint parameter, a new endpoint will be created
model.deploy(endpoint = '',
             machine_type = '')

The machine_type parameter specifies the physical resources that will be associated with your deployed model; it is very similar to a hardware configuration.

If you do not know what machine type to use, you can see all the available machine types by navigating to the Models section in Vertex AI, clicking on the 3 dots for any model, and clicking on Deploy to endpoint. Go into model settings and you can see all the available machine types.

It takes a couple of minutes for the model to be deployed. The deployment status of a model can be seen in the Models section in Vertex AI.
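
Once the model is deployed, a client can request online predictions from the endpoint. The sketch below assumes an endpoint object that exposes predict(instances=...) and returns a response with a predictions attribute, as aiplatform.Endpoint does. The helper name predict_rows is hypothetical, and the shape of each row depends on the features your own model was trained on.

```python
def predict_rows(endpoint, rows):
    """Send feature rows to a deployed model and return its predictions.

    `endpoint` is any object with a predict(instances=...) method whose
    result has a `predictions` attribute.
    """
    if not rows:
        # Nothing to send, so skip the network call entirely.
        return []
    response = endpoint.predict(instances=rows)
    return list(response.predictions)
```

In the SDK, model.deploy(...) returns the endpoint the model was deployed to, so that return value could be passed to predict_rows together with a list of feature rows.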

Scheduling a Notebook Execution

To periodically retrain our model, we need to set up a scheduled notebook execution.

Scheduling a notebook execution in Vertex AI Workbench is quite straightforward. Simply click on the Execute button at the top of the notebook to submit it to the Executor. Define your execution name, hardware configuration, and environment. Change the execution type to Schedule-based recurring executions and choose your time preferences. In the advanced options, choose the GCS bucket where your notebook's output will be stored.
