Unbundling Development and Operations for ML with Serverless ML
Serverless machine learning solves the problem of how to build and operate supervised machine learning systems in Python without first having to learn how to install, configure, and operate complex compute and data storage infrastructure.
Improved infrastructural support for the serverless orchestration of feature pipelines, training pipelines, and inference pipelines enables Python developers to build and operate ML systems without first having to become experts in either Kubernetes or cloud infrastructure. Serverless ML may trigger the unbundling of Data Science platforms because it massively reduces the integration costs for best-of-breed platforms: integration happens at the Python language level.
Serverless machine learning (ML) is a new category of loosely coupled serverless services that provide the operational services (compute and storage) for AI-enabled products and services. Serverless compute services orchestrate and run the feature pipelines, training pipelines, and batch inference pipelines. The outputs of these ML pipelines include reusable features, training data, models, and prediction logs, and a serverless feature/model store manages state in serverless ML. Finally, serverless model serving provides support for online models that are accessible via network endpoints.
Machine learning systems implement a data flow of processing and storage steps, starting from input data to features to trained models, finishing with a prediction service (that uses the model and inference data) and a monitoring service with prediction logs for observability and debugging.
In a serverless ML system, the machine learning pipeline stages are refactored into independent feature engineering, training, and inference pipelines. These pipelines communicate by storing their outputs and reading their inputs in a feature store or model registry. Even prediction logs (features and predicted labels) can be stored back in the feature store to enable the monitoring of models for correctness, performance, and observability.
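The decoupled-pipeline pattern described above can be sketched in a few lines of plain Python. The `store` dict below is an illustrative stand-in for a real feature store and model registry (the names `feature_pipeline`, `training_pipeline`, and `batch_inference_pipeline` are not any specific platform's API); the point is that the three pipelines share no state except what they read from and write to the store, including the prediction logs written back for monitoring.

```python
# A minimal sketch of decoupled ML pipelines. The in-memory dict below is an
# illustrative stand-in for a real feature store / model registry.

store = {"features": [], "models": {}, "prediction_logs": []}

def feature_pipeline(raw_rows):
    # Transform raw data into features and persist them in the store.
    features = [{"x": r["value"] * 2, "label": r["label"]} for r in raw_rows]
    store["features"].extend(features)

def training_pipeline():
    # Read features from the store, "train" a trivial threshold model,
    # and register it under a version key.
    xs = [f["x"] for f in store["features"]]
    store["models"]["v1"] = {"threshold": sum(xs) / len(xs)}

def batch_inference_pipeline():
    # Read the registered model and the features, then write predictions
    # back to the store as logs for monitoring.
    model = store["models"]["v1"]
    for f in store["features"]:
        pred = int(f["x"] > model["threshold"])
        store["prediction_logs"].append({"x": f["x"], "pred": pred})

feature_pipeline([{"value": 1, "label": 0}, {"value": 5, "label": 1}])
training_pipeline()
batch_inference_pipeline()
```

Because each pipeline only touches the shared store, each one can be scheduled, scaled, and debugged independently, which is the property that makes serverless orchestration of the stages possible.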
There are two competing trends in ML platforms for development and operation: (1) all-in-one platforms for both development and operation of ML systems, such as AWS Sagemaker, Databricks, and Vertex, and (2) the integration of best-of-breed platforms for experiment management, compute, state (features/models), and serving.
Until now, all-in-one platforms have been growing in adoption for small and medium-sized teams that need a low cost, integrated solution. This has left best-of-breed systems to sell to larger teams with special needs (performance, integration, legacy, governance) that necessitated bearing the cost of integrating separate systems. And, naturally, best-of-breed systems move into adjacent verticals and functionality in the ML stack, with the overarching trend of consolidation seemingly set in stone. Or is it?
Serverless ML massively reduces the integration costs when building best-of-breed platforms, as integration is at the Python language, or API, level. There is no need to configure IAM roles, VPC peering, S3 access, write Dockerfiles or, worse still, YAML or JSON files for reproducible infrastructural services. What if you could develop in Colab or VSCode, instead of waiting minutes for your cloud-managed notebook to start in your VPC? What if you could develop a feature pipeline locally in Python, and with just an annotation orchestrate its execution in the cloud?
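The "just an annotation" idea can be illustrated with a toy decorator. The `schedule` decorator and `registry` below are hypothetical, not the API of any particular serverless platform; real platforms would ship the decorated function to the cloud rather than record it in a local dict, but the developer experience is the same: the function stays an ordinary, locally runnable Python function.

```python
# Hypothetical sketch of annotation-driven orchestration: a decorator
# registers a Python function for scheduled cloud execution. The names
# here (schedule, registry) are illustrative, not a real platform's API.

registry = {}

def schedule(cron):
    def wrap(fn):
        # A real platform would package fn and deploy it to run on this
        # schedule in the cloud; here we just record the schedule.
        registry[fn.__name__] = cron
        return fn
    return wrap

@schedule(cron="0 6 * * *")  # run daily at 06:00
def feature_pipeline():
    return "features written"

# Locally, the function is still just a Python function:
feature_pipeline()  # returns "features written"
```

No Dockerfiles, YAML, or IAM configuration appear anywhere in the developer's code; the platform derives the infrastructure from the annotation.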
Serverless ML is not for everybody, yet. Many serverless services do not have a managed cloud offering, where the service can be deployed and managed inside the customer’s cloud account. For this reason, some enterprises will eschew using SaaS services that manage sensitive data about customers.
We run a free online course on Serverless ML, and in the first lab you learn how to build and operate a serverless ML system based on the Iris Flowers Dataset: a daily Iris Flower prediction service. The system runs a feature pipeline once per day to synthetically generate a new Iris Flower and write it to the feature store. Then a batch inference pipeline, which also runs once per day, just after the feature pipeline, reads the single flower added that day and downloads the Iris model trained to predict the type of Iris flower from the 4 input features: sepal length, sepal width, petal length, and petal width. The model's prediction is written to an online Dashboard, and the actual flower (the outcome) is read from the feature store and published to the same Dashboard, so you can see whether the model predicted correctly.
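The shape of the lab's two daily pipelines can be sketched as below. The in-memory `feature_store` list and the toy nearest-centroid model are illustrative stand-ins for Hopsworks and the trained Iris model (the centroid values only roughly match the real Iris dataset); the structure, however, mirrors the lab: one pipeline synthesizes and stores a flower, the other reads it back and predicts its variety.

```python
# Sketch of the lab's daily pipelines, with an in-memory list standing in
# for the feature store and a toy nearest-centroid classifier standing in
# for the trained Iris model. Values and names are illustrative.
import random

feature_store = []

# Per-class feature means (sepal length, sepal width, petal length,
# petal width), roughly matching the Iris dataset.
CENTROIDS = {
    "setosa":     (5.0, 3.4, 1.5, 0.2),
    "versicolor": (5.9, 2.8, 4.3, 1.3),
    "virginica":  (6.6, 3.0, 5.6, 2.0),
}

def feature_pipeline():
    # Once a day: synthesize one new flower and write it to the store.
    variety = random.choice(list(CENTROIDS))
    flower = tuple(v + random.uniform(-0.1, 0.1) for v in CENTROIDS[variety])
    feature_store.append({"features": flower, "variety": variety})

def batch_inference_pipeline():
    # Just after the feature pipeline: read today's flower and predict
    # its variety; return the prediction and the actual outcome.
    row = feature_store[-1]
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row["features"], CENTROIDS[c]))
    return min(CENTROIDS, key=dist), row["variety"]

feature_pipeline()
pred, actual = batch_inference_pipeline()
```

In the lab itself, both pipelines are scheduled serverless jobs, and the prediction and outcome are published to the Dashboard rather than returned as values.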
The serverless Dashboard below is deployed on Hugging Face Spaces as a Python program (app.py). The 31 lines of code use an API key environment variable to connect to Hopsworks, and then download the images produced by the batch inference pipeline, including the flower prediction/outcome, recent historical predictions, and a confusion matrix of historical prediction performance, used for monitoring.
With Serverless Machine Learning, Data Scientists can move beyond Jupyter notebooks and just training models to building fully fledged prediction services that use ML models. With Serverless ML, all that is needed is Python skills to build interactive, self-monitoring ML systems.
Join the featurestore.org community