What is Serverless Machine Learning?

Unbundling Development and Operations for ML with Serverless ML

February 7, 2023

Serverless machine learning solves the problem of how to build and operate supervised machine learning systems in Python without having to first learn how to install, configure, and operate complex computational and data storage infrastructure.

Improved infrastructural support for the serverless orchestration of feature pipelines, training pipelines, and inference pipelines enables Python developers to build and operate ML systems without first having to become experts in either Kubernetes or cloud infrastructure.

Introduction

Serverless machine learning (ML) is a new category of loosely coupled serverless services that provide the operational services (compute and storage) for AI-enabled products and services. Serverless compute services orchestrate and run the feature pipelines, training pipelines, and batch inference pipelines. The outputs of these ML pipelines include reusable features, training data, models, and prediction logs, and a serverless feature/model store manages state in serverless ML. Finally, serverless model serving provides support for online models that are accessible via network endpoints.

“The main reason many Data Scientists have been stuck in the model development and training stage is the lack of infrastructural support for easily connecting models to data with feature pipelines, and easily deploying models and prediction services with inference pipelines. Serverless machine learning enables Data Scientists to deploy prediction services powered by machine learning with only Python.”

**Figure 1.** Serverless Machine Learning enables Data Scientists, fluent in Python, to build complete ML Systems, from feature engineering to model training to inference and user interfaces.

Machine learning systems implement a data flow of processing and storage steps, starting from input data to features to trained models, finishing with a prediction service (that uses the model and inference data) and a monitoring service with prediction logs for observability and debugging.

In a serverless ML system, the machine learning pipeline stages are refactored into independent feature engineering, training, and inference pipelines. These pipelines communicate by storing their outputs and reading their inputs in a feature store or model registry. Even prediction logs (features and predicted labels) can be stored back in the feature store to enable the monitoring of models for correctness, performance, and observability.

**Figure 2**. Essentially all monolithic ML pipelines can be refactored into independent feature pipelines, training pipelines, and inference pipelines. These pipelines are decoupled and share state through a feature store and model registry. The 3 pipelines can be scaled independently and run on a schedule or on-demand: training is often just run on-demand, while feature/inference pipelines are operational services that either run on a schedule or run 24x7.
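The decoupling described above can be sketched in plain Python. This is a minimal illustration, not a real feature store client: `InMemoryStore`, the three pipeline functions, and the threshold "model" are all hypothetical stand-ins chosen so the example runs with only the standard library.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a feature store plus model registry."""
    features: list = field(default_factory=list)     # rows written by the feature pipeline
    models: dict = field(default_factory=dict)       # model name -> {version: model}
    predictions: list = field(default_factory=list)  # prediction logs for monitoring


def feature_pipeline(store, raw_rows):
    """Turn raw input rows into features and write them to the store."""
    for row in raw_rows:
        store.features.append(
            {"id": row["id"], "x": row["value"] / 100.0, "label": row["label"]}
        )


def training_pipeline(store, name):
    """Read features, fit a trivial threshold 'model', and register a new version."""
    positives = [r["x"] for r in store.features if r["label"] == 1]
    negatives = [r["x"] for r in store.features if r["label"] == 0]
    threshold = (mean(positives) + mean(negatives)) / 2
    versions = store.models.setdefault(name, {})
    version = len(versions) + 1
    versions[version] = {"threshold": threshold}
    return version


def inference_pipeline(store, name, version):
    """Read features and a registered model, then log predictions to the store."""
    model = store.models[name][version]
    for r in store.features:
        predicted = int(r["x"] >= model["threshold"])
        store.predictions.append(
            {"id": r["id"], "predicted": predicted, "model_version": version}
        )


# Each pipeline only talks to the store, never directly to another pipeline.
store = InMemoryStore()
raw = [
    {"id": 1, "value": 10, "label": 0},
    {"id": 2, "value": 20, "label": 0},
    {"id": 3, "value": 80, "label": 1},
    {"id": 4, "value": 90, "label": 1},
]
feature_pipeline(store, raw)
v = training_pipeline(store, "demo_model")
inference_pipeline(store, "demo_model", v)
```

Because the only shared dependency is the store, each function could run as a separate job on its own schedule, which is the property the refactoring is meant to provide.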

Who is Serverless ML not for?

Serverless ML is not for everybody, yet. Many serverless services do not have a managed cloud offering, where the service can be deployed and managed inside the customer’s cloud account. For this reason, some enterprises will eschew using SaaS services that manage sensitive data about customers.

MLOps without the Infra

You don't need to learn Kubernetes or Cloud Infrastructure to put ML in production. However, you do need to know both the principles of MLOps and how to apply them to operate an ML system in production.

The key principles of MLOps are versioning and testing of ML assets. The two most important assets are (1) models, and (2) data (features).

It is widely known that you should version your ML models, so that you can perform A/B tests of those models, helping you figure out if the new model you trained should replace the old one or not. In operational ML systems, models typically require historical data (e.g., about a user) or context data (what is trending). So, your versioned models also need versioned data (features).

In fact, there is a hierarchy of dependencies between data, models, and the ML-Applications that use the models. The data we use to train models and make predictions with models is called features. If you don't test your features, it is hard to trust the models that are built on those features. So, you should test your features with data validation and unit tests for feature logic.
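As a rough sketch of the two kinds of feature tests mentioned above, the following pairs a unit test for feature logic with a simple range-based data validation check. The feature `amount_per_item`, the helper `find_invalid_rows`, and the sample rows are hypothetical and exist only for illustration; a real system would more likely use a dedicated data validation library.

```python
def amount_per_item(total_amount, num_items):
    """Hypothetical feature logic: average spend per item, 0.0 for empty baskets."""
    if num_items <= 0:
        return 0.0
    return total_amount / num_items


def find_invalid_rows(rows, column, min_value, max_value):
    """Data validation: return rows whose value is missing or out of range."""
    invalid = []
    for row in rows:
        value = row.get(column)
        if value is None or not (min_value <= value <= max_value):
            invalid.append(row)
    return invalid


def test_amount_per_item():
    """Unit test for the feature logic, including the empty-basket edge case."""
    assert amount_per_item(30.0, 3) == 10.0
    assert amount_per_item(5.0, 0) == 0.0


rows = [
    {"order_id": 1, "amount_per_item": amount_per_item(30.0, 3)},
    {"order_id": 2, "amount_per_item": amount_per_item(5.0, 0)},
    {"order_id": 3, "amount_per_item": -4.0},  # corrupt upstream value
    {"order_id": 4, "amount_per_item": None},  # missing value
]

test_amount_per_item()
invalid = find_invalid_rows(rows, "amount_per_item", 0.0, 1000.0)
print([row["order_id"] for row in invalid])  # [3, 4]
```

The unit test guards the transformation code itself, while the validation step guards the data flowing through it; a feature pipeline would typically refuse to write rows that fail validation.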

Similarly, ML-enabled applications build on models that should be tested for bias and poor performance. So you should have model validation tests that need to pass before a model can be deployed in production in an A/B test.
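A model validation gate of this kind might look like the sketch below. It assumes a classification model, uses accuracy as the performance metric, and treats the accuracy gap between groups as a crude bias check. The function `validate_model`, the thresholds, and the sample data are illustrative assumptions, not a prescribed method.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def validate_model(y_true, y_pred, groups, min_accuracy=0.8, max_group_gap=0.2):
    """Return (passed, report). Thresholds are illustrative assumptions."""
    overall = accuracy(y_true, y_pred)
    per_group = {}
    for g in sorted(set(groups)):
        idx = [i for i, value in enumerate(groups) if value == g]
        per_group[g] = accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])
    gap = max(per_group.values()) - min(per_group.values())
    passed = overall >= min_accuracy and gap <= max_group_gap
    return passed, {"accuracy": overall, "per_group": per_group, "gap": gap}


# Hypothetical held-out labels, predictions, and a group attribute per example.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

passed, report = validate_model(y_true, y_pred, groups)
print(passed, report)
```

In this example the overall accuracy clears the threshold, but the model is still rejected because it performs noticeably worse on group "b" than on group "a". Only a model that passes both checks would be promoted to an A/B test.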

**Figure 3**. Building ML systems means scaling this pyramid of testing. You should test your features before they are used by models. Models should be tested before they are used by ML-powered applications.

Example Serverless ML System

We run a free online course on Serverless ML. In the course we build both analytical ML systems and interactive ML systems.

The analytical ML systems are typically a Dashboard that is updated with new predictions once a day or once an hour. Some example Dashboards built in the course include predicting surf height at a beach in Ireland, predicting Bitcoin sentiment based on recent tweets, air quality predictions for Poland, and predicting electricity demand/prices for the coming 24 hours. A simple example of such a system runs a feature pipeline once/day to synthetically generate a new Iris flower and write it to the feature store. Then a batch inference pipeline, which also runs once/day but just after the feature pipeline, reads the single flower added that day and downloads the Iris model, which is trained to predict the type of Iris flower from 4 input features: sepal length, sepal width, petal length, and petal width. The model’s prediction is written to an online Dashboard, and the actual flower (outcome) is read from the feature store and published to the same Dashboard, so you can see whether the model predicted correctly.
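The daily Iris flow described above can be approximated with a small standard-library sketch. Everything here is an assumption made for illustration: the value ranges in `RANGES` are invented (with deliberately non-overlapping petal lengths), plain lists stand in for the feature store and the Dashboard, and `stand_in_model` is a hand-written rule rather than a trained model downloaded from a registry.

```python
import random

# Assumed, illustrative value ranges per variety (cm); not taken from the course code.
RANGES = {
    "Setosa": {"sepal_length": (4.5, 5.8), "sepal_width": (2.3, 4.4),
               "petal_length": (1.0, 1.9), "petal_width": (0.1, 0.6)},
    "Versicolor": {"sepal_length": (5.0, 7.0), "sepal_width": (2.0, 3.4),
                   "petal_length": (3.0, 5.0), "petal_width": (1.0, 1.7)},
    "Virginica": {"sepal_length": (5.0, 7.9), "sepal_width": (2.2, 3.8),
                  "petal_length": (5.1, 6.9), "petal_width": (1.8, 2.5)},
}


def feature_pipeline(feature_group, rng):
    """Synthetically generate one flower and append it to the feature group."""
    variety = rng.choice(sorted(RANGES))
    flower = {name: round(rng.uniform(lo, hi), 2)
              for name, (lo, hi) in RANGES[variety].items()}
    flower["variety"] = variety  # the actual outcome, kept for later comparison
    feature_group.append(flower)
    return flower


def stand_in_model(flower):
    """Hand-written rule standing in for the trained Iris model."""
    if flower["petal_length"] < 2.5:
        return "Setosa"
    if flower["petal_length"] < 5.05:
        return "Versicolor"
    return "Virginica"


def batch_inference_pipeline(feature_group, dashboard):
    """Predict for the most recently added flower and publish the result."""
    flower = feature_group[-1]
    predicted = stand_in_model(flower)
    dashboard.append({"predicted": predicted,
                      "actual": flower["variety"],
                      "correct": predicted == flower["variety"]})


feature_group, dashboard = [], []
rng = random.Random(42)
for _day in range(3):  # simulate three daily runs, feature pipeline first
    feature_pipeline(feature_group, rng)
    batch_inference_pipeline(feature_group, dashboard)
```

In a deployed version, the loop would be replaced by two scheduled jobs, with the batch inference job scheduled shortly after the feature pipeline.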

The interactive ML systems are typically a Gradio or Streamlit UI (on Hugging Face Spaces or Streamlit Cloud) and work with a model either hosted or downloaded from Hopsworks. They typically take user input and join it with historical features from Hopsworks Feature Store, and produce predictions in the UI. Some examples of systems built in the course include predicting the house price for a given address in Stockholm, song recommendation for a given playlist on Spotify, vocals removal from a song/sound file, and predicting whether a post to Reddit will be liked or not.
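The core step in such an interactive system, joining user input with historical features before predicting, can be sketched as follows. The lookup table `HISTORICAL_FEATURES`, the feature names, and the pricing formula in `predict_price` are hypothetical; in a real system the lookup would be a feature store read and the formula would be a trained model.

```python
# Hypothetical precomputed features, keyed by area; stands in for a feature store.
HISTORICAL_FEATURES = {
    "area_1": {"avg_price_per_sqm": 90_000.0, "sales_last_year": 120},
    "area_2": {"avg_price_per_sqm": 60_000.0, "sales_last_year": 45},
}


def build_feature_vector(user_input, historical):
    """Join the user's input with the historical features for their area."""
    area_features = historical.get(user_input["area_id"])
    if area_features is None:
        raise KeyError(f"no historical features for {user_input['area_id']!r}")
    return {**user_input, **area_features}


def predict_price(features):
    """Stand-in for a trained model: living area times the area's average price."""
    return features["living_area_sqm"] * features["avg_price_per_sqm"]


# What a UI callback might do with the values a user typed in.
user_input = {"area_id": "area_1", "living_area_sqm": 50.0}
features = build_feature_vector(user_input, HISTORICAL_FEATURES)
print(predict_price(features))  # 4500000.0
```

A Gradio or Streamlit app would wrap `build_feature_vector` and `predict_price` in its callback, so the user only supplies the values they know and the rest comes from stored features.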

All of the above systems also include a monitoring dashboard for evaluating model performance, which helps surface errors and bugs and indicates when to retrain models.

**Figure 4**. The serverless ML services used in the earlier examples included: (1) Modal for the scheduled feature pipelines, (2) Colaboratory or Modal for training pipelines (run on-demand), (3) Hugging Face (Gradio/Streamlit) or Modal for inference pipelines, and (4) Hopsworks as the serverless Feature Store, Model Registry, and Model Serving layer.

Serverless ML Ecosystem

There are many SaaS platforms for machine learning that can be considered building blocks for Serverless ML Systems. In the figure below, we loosely categorize them into the raw data layer (data warehouses and object stores), the compute layer for orchestrated pipelines (feature and inference pipelines), the ML Development services for model training and experiment management, and the state layer for features and models as well as model serving and model monitoring.

**Figure 5**. Many companies are providing serverless services that can be plugged in together at the language API level. Here is a selection of some of the serverless ML products available today.

Summary

With Serverless Machine Learning, Data Scientists can move beyond Jupyter notebooks and just training models to building fully fledged prediction services that use ML models. With Serverless ML, all that is needed is Python skills to build interactive, self-monitoring ML systems.