
Of 50 ML projects, 48 made it to production within 2 weeks. How?

March 1, 2023
by
Jim Dowling

TL;DR

Putting machine learning (ML) models in production is often treated as an operational task to be tackled after all the hard work of training and optimizing the model is complete. In contrast, serverless ML starts with a minimal model, including the operational feature pipeline(s) and inference pipeline. The feature and inference pipelines ensure the viability of the data that is fed to the model for both training and predictions. In this article, we show that writing feature pipelines and inference pipelines should not be hard, and that if you don’t have to configure and build the MLOps infrastructure yourself, getting to a minimum viable production model within a couple of weeks should be feasible for most models - as it was for the more than 90% of successful ML systems in our scalable ML course at KTH, where projects took, on average, 2 weeks to complete.

The MVP for Machine Learning

One of the best practices for systems software development is to get to a working MVP (minimum viable product) as soon as possible. For machine learning, the MVP is a model that makes predictions on new data and publishes those predictions to either users or downstream services.

Figure 1: ML systems should be developed in the same way as systems software and much application software: start with a minimal working system, then iteratively improve it, using tests to build confidence that your software works and versioning to avoid breaking clients.

But many data scientists, who work mostly in notebooks, find it impossible to even imagine getting to a working ML system. One way to get there is to decompose this bigger problem into separate, manageable programs that together make up your working ML system.

3 Programs: Feature, Training, and Inference Pipelines

All ML systems can be decomposed into 3 pipelines:

  • a feature pipeline that takes raw data and converts it into features
  • a training pipeline that takes features and outputs a trained model
  • an inference pipeline that takes unseen data (also in the form of features) and a trained model, and outputs predictions that are consumed by some client or service.

These pipelines have a shared data layer (a feature store and model registry), which means each pipeline can run independently at its own cadence. For example, new data might arrive once per hour, so you run the feature pipeline once per hour. The inference pipeline might be a batch job run once per day, so you schedule it to run once per day. The training pipeline might be run on-demand, for example, when your model is stale or when you have more or better data to train your model with.

Note that there is no single “ML Pipeline”. There are 3 pipelines that, working together, make up your ML system. So, when people say “this ML pipeline”, ask them: is it a feature pipeline, a training pipeline, or an inference pipeline? Some people might like to couple them together in a single monolithic pipeline, but if you want to write production systems, we strongly recommend against it!

Figure 2: An MVP ML system has 3 programs: a feature pipeline, a training pipeline, and an inference pipeline. The feature pipeline needs to be scheduled, and if it is a batch ML system, the inference pipeline also needs to be scheduled. If it is an online ML system, the inference pipeline will be part of a deployed model. The prediction consumer could be an ML-enabled app, an interactive UI, or a dashboard.

When >90% of students build complete ML systems

In KTH's course on scalable machine learning (ID2223), students started by building complete ML systems. The first lab introduced them to the concepts of a feature pipeline, a training pipeline, and a batch inference pipeline using the well-known Titanic survival dataset. They implemented a synthetic passenger generator function as a feature pipeline, so that new data would keep being created, and a dashboard in Gradio on Hugging Face Spaces as the inference pipeline (to show predictions of whether each new passenger survived or not). The students also implemented a UI in Gradio to monitor the performance of their model. Model training was typically a Colab/Jupyter notebook with an XGBoost classifier model.

So, students started by building a complete ML system - they put their models to work in their first lab. In fact, more than 90% of students succeeded in putting their first model into production. For many of them, it was their first exposure to ML (they came from a software engineering background), while others were well versed in ML theory, but not in practice.

After the first 2 labs, the students were confident in building a complete ML system with a feature pipeline and a UI. They then undertook a project in which they identified their own prediction problem and dataset, and built an ML system that adds value by making predictions that are consumed by users.

Figure 3: PM10 predictions for air quality in Poland

A list of selected student projects is available here. The table below shows a few of them and how they decomposed the ML system into feature/training/inference pipelines.

| Project | Feature Pipeline | Training Pipeline | Inference Pipeline |
|---|---|---|---|
| Stockholm House Prices | Scrape house prices from a website, feature engineering in Pandas, write to Hopsworks | Notebook that can use either XGBoost or AutoGluon | Gradio user interface that runs on Hugging Face Spaces |
| New York Electricity Demand | Daily Python feature pipeline scrapes electricity info into Pandas, writes to Hopsworks | Notebook with XGBoost and sklearn transformers | Batch inference job on Modal writes to Hopsworks; interactive UI and monitoring UI on Hugging Face |
| Vocal remover from sound/music files | Feature pipeline on Modal that takes YouTube recordings once/week, writes to Hopsworks | Notebook that uses a pre-trained model that is saved to Hopsworks | Hugging Face UI that downloads the model from Hopsworks and provides an interactive UI |

Summary 

There is no such thing as a single machine learning pipeline - there are feature pipelines, training pipelines, and inference pipelines. If you structure your ML systems this way, you too will be able to quickly build an end-to-end working ML system that can be iteratively improved. The next step after getting to a working ML system is to follow best practices for testing and versioning your ML assets: your features and your models. In future posts, we will write more about the MLOps principles of testing and versioning that you should follow, without first having to learn how to install and manage MLOps infrastructure.

Join the Serverless ML Movement