What is Serverless Machine Learning?

Unbundling Development and Operations for ML with Serverless ML

Serverless machine learning solves the problem of how to build and operate supervised machine learning systems in Python without having to first learn how to install, configure, and operate complex computational and data storage infrastructure.

Improved infrastructural support for the serverless orchestration of feature pipelines, training pipelines, and inference pipelines enables Python developers to build and operate ML systems without first having to become experts in either Kubernetes or cloud infrastructure. Serverless ML may trigger the unbundling of Data Science platforms, because it massively reduces the integration costs for best-of-breed platforms: integration happens at the Python language level.
Serverless Machine Learning Tools
Figure 0. Many companies are providing serverless services that can be plugged in together at the language API level. Here is a selection of some of the serverless ML products available today. 


Serverless machine learning (ML) is a new category of loosely coupled serverless services that provide the operational services (compute and storage) for AI-enabled products and services. Serverless compute services orchestrate and run the feature pipelines, training pipelines, and batch inference pipelines. The outputs of these ML pipelines include reusable features, training data, models, and prediction logs, and a serverless feature/model store manages state in serverless ML. Finally, serverless model serving provides support for online models that are accessible via network endpoints. 

“The main reason many Data Scientists have been stuck in the model development and training stage is the lack of infrastructural support for easily connecting models to data with feature pipelines, and easily deploying models and prediction services with inference pipelines. Serverless machine learning enables Data Scientists to deploy prediction services powered by machine learning with only Python.” 

Building ML systems flow
Figure 1. Serverless Machine Learning enables Data Scientists, fluent in Python, to build complete ML Systems, from feature engineering to model training to inference and user interfaces.

Machine learning systems implement a data flow of processing and storage steps: starting from input data, through features and trained models, and finishing with a prediction service (that uses the model and inference data) and a monitoring service that uses prediction logs for observability and debugging.

classic flow of data for ML
Figure 2. Machine learning systems transform input data to intelligent decisions made by prediction services, with many processing and storage stages on that journey.
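The classic monolithic flow in Figure 2 can be sketched in a few lines of plain Python. All function names and the trivial "model" below are illustrative stand-ins, not any real library:

```python
# Minimal sketch of the classic monolithic ML data flow:
# raw data -> features -> trained model -> predictions -> logs.
# All functions here are illustrative stand-ins, not a real library.

def engineer_features(raw_rows):
    # Feature engineering: derive model inputs from raw data.
    return [{"x": row["value"] * 2.0} for row in raw_rows]

def train_model(features):
    # "Training": fit a trivial threshold on the mean feature value.
    mean_x = sum(f["x"] for f in features) / len(features)
    return {"threshold": mean_x}

def predict(model, features):
    # Inference: apply the model to (possibly new) feature rows.
    return [f["x"] > model["threshold"] for f in features]

raw = [{"value": 1.0}, {"value": 2.0}, {"value": 3.0}]
labels = [False, False, True]

features = engineer_features(raw)
model = train_model(features)
predictions = predict(model, features)
prediction_log = list(zip(predictions, labels))  # kept for monitoring
```

The point is that every stage produces an artifact (features, model, predictions, logs) that the next stage consumes, which is exactly the state that a serverless system must manage between pipeline runs.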

In a serverless ML system, the machine learning pipeline stages are refactored into independent feature engineering, training, and inference pipelines. These pipelines communicate by storing their outputs and reading their inputs in a feature store or model registry. Even prediction logs (features and predicted labels) can be stored back in the feature store to enable the monitoring of models for correctness, performance, and observability. 

On-demand Serverless Machine Learning system
Figure 3. Essentially all monolithic ML pipelines can be refactored into independent feature pipelines, training pipelines, and inference pipelines. These pipelines are decoupled, sharing state through a feature store and model registry. The three pipelines can be scaled independently and run on a schedule or on-demand: training is often just run on-demand, while feature/inference pipelines are operational services that either run on a schedule or run 24x7. 
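The refactoring can be sketched with an in-memory dictionary standing in for the feature store and model registry (a real system would use a managed service such as Hopsworks; every name below is illustrative):

```python
# Sketch of three decoupled pipelines sharing state through a
# feature store / model registry (an in-memory dict stands in for
# the real service). Each pipeline could run on its own schedule.

store = {"features": [], "models": {}, "prediction_logs": []}

def feature_pipeline(raw_rows):
    # Writes engineered features to the shared store.
    store["features"].extend({"x": r * 2.0} for r in raw_rows)

def training_pipeline():
    # Reads features, trains a model, registers it under a name.
    xs = [f["x"] for f in store["features"]]
    store["models"]["demo"] = {"threshold": sum(xs) / len(xs)}

def inference_pipeline():
    # Reads the model and the latest features, logs the prediction
    # back to the store so the model can be monitored.
    model = store["models"]["demo"]
    latest = store["features"][-1]
    store["prediction_logs"].append(latest["x"] > model["threshold"])

feature_pipeline([1.0, 2.0, 3.0])
training_pipeline()
feature_pipeline([4.0])   # new data arrives later, independently
inference_pipeline()      # runs without re-running training
```

Because the only coupling between pipelines is the shared store, each one can be scheduled, scaled, and redeployed on its own, which is the property the figure describes.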

Unbundling of End-to-End ML Platforms?

There are two competing trends in ML platforms for development and operation: (1) all-in-one platforms for both the development and operation of ML systems, such as AWS SageMaker, Databricks, and Vertex AI, and (2) the integration of best-of-breed platforms for experiment management, compute, state (features/models), and serving.

Until now, all-in-one platforms have been growing in adoption for small and medium-sized teams that need a low cost, integrated solution. This has left best-of-breed systems to sell to larger teams with special needs (performance, integration, legacy, governance) that necessitated bearing the cost of integrating separate systems. And, naturally, best-of-breed systems move into adjacent verticals and functionality in the ML stack, with the overarching trend of consolidation seemingly set in stone. Or is it?

Serverless ML massively reduces the integration costs when building best-of-breed platforms, as integration is at the Python language, or API, level. There is no need to configure IAM roles, VPC peering, or S3 access, or to write Dockerfiles or, worse still, YAML or JSON files for reproducible infrastructural services. What if you could develop in Colab or VSCode, instead of waiting minutes for your cloud-managed notebook to start in your VPC? What if you could develop a feature pipeline locally in Python and, with just an annotation, orchestrate its execution in the cloud?

What if installing a Python library was just annotating a function instead of compiling a Docker image? What if all you needed was an API key to securely store features, training data, and models, instead of having to configure your client with AWS STS and temporary IAM credentials to store your data in S3? What if monitoring a model was just an API call (Gantry) instead of a CloudFormation template? What if adding a useful UI to your prediction service was possible in Python, without needing a JavaScript developer? Serverless ML enables you to easily integrate the development tools and operational services you want for your ML system. 
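The annotation idea can be sketched with a hypothetical decorator. The names `cloud_function`, `REGISTRY`, and the schedule format are inventions for illustration; real serverless runtimes such as Modal expose their own decorators with their own signatures:

```python
# Hypothetical sketch of annotation-driven orchestration: a decorator
# registers a plain Python function for remote, scheduled execution.
# Nothing here is a real service's API.
import functools

REGISTRY = {}  # what a serverless runtime would inspect on deploy

def cloud_function(schedule=None):
    def decorator(fn):
        # Record the function and its schedule for the (imagined)
        # cloud runtime to pick up at deployment time.
        REGISTRY[fn.__name__] = {"schedule": schedule, "fn": fn}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)  # still runs locally, unchanged
        return wrapper
    return decorator

@cloud_function(schedule="daily")
def feature_pipeline():
    return "features written"

# The same function works locally and is registered for the cloud.
result = feature_pipeline()
```

The design point is that the developer's code stays ordinary Python: the decorator only attaches metadata, so the pipeline can be developed and tested locally and orchestrated remotely without any Dockerfile or YAML.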

Who is Serverless ML not for?

Serverless ML is not for everybody, yet. Many serverless services do not have a managed cloud offering, where the service can be deployed and managed inside the customer’s cloud account. For this reason, some enterprises will eschew SaaS services that manage sensitive data about their customers.

Example Serverless ML System

We run a free online course on Serverless ML, and in the first lab you learn how to build and operate a serverless ML system based on the Iris Flowers Dataset: a daily Iris Flower prediction service. The system runs a feature pipeline once per day to synthetically generate a new Iris flower and write it to the feature store. A batch inference pipeline, which also runs once per day (just after the feature pipeline), reads the single flower added that day and downloads the Iris model, trained to predict the type of Iris flower from the 4 input features: sepal length, sepal width, petal length, and petal width. The model’s prediction is written to an online Dashboard, and the actual flower (the outcome) is read from the feature store and published to the same Dashboard, so you can see whether the model predicted correctly. 

Example of Serverless Machine Learning
Figure 4. The Iris Daily Flower Prediction service is a Serverless Machine Learning Application, with serverless compute (features, models, predictions) provided by Modal, serverless state (features, models, prediction logs) provided by Hopsworks, and a serverless UI provided by Hugging Face Spaces.  
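The daily system can be sketched in plain Python, with a synthetic flower generator as the feature pipeline and a nearest-centroid classifier standing in for the trained model. The centroid values and all names below are illustrative, not taken from the course code:

```python
# Sketch of the daily Iris system: a feature pipeline that
# synthesizes one flower, and a batch inference pipeline that
# predicts its variety. The centroids are illustrative stand-ins
# for a trained model downloaded from a model registry.
import random

CENTROIDS = {  # (sepal_len, sepal_wid, petal_len, petal_wid)
    "setosa": (5.0, 3.4, 1.5, 0.2),
    "versicolor": (5.9, 2.8, 4.3, 1.3),
    "virginica": (6.6, 3.0, 5.6, 2.0),
}

def feature_pipeline(rng):
    # Synthesize one new flower near a randomly chosen variety;
    # the true variety (the outcome) is kept for monitoring.
    variety, centroid = rng.choice(sorted(CENTROIDS.items()))
    features = tuple(v + rng.uniform(-0.2, 0.2) for v in centroid)
    return {"features": features, "variety": variety}

def inference_pipeline(flower):
    # Predict the variety of the day's flower (nearest centroid).
    def sq_dist(centroid):
        return sum((a - b) ** 2
                   for a, b in zip(flower["features"], centroid))
    return min(CENTROIDS, key=lambda name: sq_dist(CENTROIDS[name]))

rng = random.Random(42)
flower = feature_pipeline(rng)        # runs once per day
prediction = inference_pipeline(flower)  # runs just afterwards
correct = prediction == flower["variety"]
```

In the real lab, the prediction and the outcome would both be written out for the Dashboard, which is how the service can show whether the model predicted correctly.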

The serverless Dashboard below is deployed on Hugging Face Spaces as a Python program (app.py). The 31 lines of code use an API key, read from an environment variable, to connect to Hopsworks and then download the images produced by the batch inference pipeline, including the flower prediction/outcome, recent historical predictions, and a confusion matrix of historical prediction performance, used for monitoring.

Iris UI as a serverless ML service.
Figure 5. The Iris Daily Flower Prediction Service, with a history of recent predictions and historical model performance monitoring (a confusion matrix).
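The dashboard's data access follows a pattern worth making explicit: read an API key from the environment, connect, and fetch the latest prediction/outcome pair. `StubClient` below stands in for a real client library (the actual app.py uses the Hopsworks SDK); its methods and return values are invented for illustration:

```python
# Sketch of a dashboard's data access: an API key from the
# environment, a "connection", and the latest prediction/outcome
# pair for display. StubClient is not a real API.
import os

class StubClient:
    def __init__(self, api_key):
        self.api_key = api_key  # a real client would authenticate here

    def latest_prediction(self):
        # A real client would download artifacts written by the
        # batch inference pipeline; this stub returns fixed values.
        return {"prediction": "setosa", "outcome": "setosa"}

# On Hugging Face Spaces the key would be set as a repository secret.
os.environ.setdefault("FS_API_KEY", "demo-key")
client = StubClient(api_key=os.environ["FS_API_KEY"])
record = client.latest_prediction()
headline = ("correct" if record["prediction"] == record["outcome"]
            else "wrong")
```

Note that the only credential the dashboard needs is that single API key, which is the contrast the article draws with STS/IAM-style client configuration.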


With Serverless Machine Learning, Data Scientists can move beyond Jupyter notebooks and just training models to building fully fledged prediction services that use ML models. With Serverless ML, Python skills are all that is needed to build interactive, self-monitoring ML systems.

Community newsletter - featurestore.org

join the featurestore.org community
