Google's Rules of Machine Learning still apply in the age of large language models

I first heard about Google’s Rules of Machine Learning (ML) maybe 4-5 years ago. Much like Steve McConnell’s classic software engineering mistakes, the rules capture lessons learned from real projects, but they focus on the engineering problems that arise when shipping ML systems to production.

Despite the excitement about playing with data and models, the reality of building ML systems is that it’s mostly an engineering problem. This remains the case in the age of large language models, perhaps even more so: integrating a language model into a product can be as simple as calling an API, which should make it easier to focus on business problems, pipelines, data, and evaluation. It’s important to remember that at an abstract level, ML is just a data transformation; there is no magic involved.

As the page containing Google’s ML rules is long and detailed, I put together this TIL post for my own ease of reference. It contains the key quote from the overview, along with the rules without their explanations. Go to the source for further details.

Overview
To make great products: do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.
Most of the problems you will face are, in fact, engineering problems. Even with all the resources of a great machine learning expert, most of the gains come from great features, not great machine learning algorithms. So, the basic approach is:
Make sure your pipeline is solid end to end.
Start with a reasonable objective.
Add common-sense features in a simple way.
Make sure that your pipeline stays solid.
The Rules
Before Machine Learning
Don’t be afraid to launch a product without machine learning.
First, design and implement metrics.
Choose machine learning over a complex heuristic.
ML Phase I: Your First Pipeline
Keep the first model simple and get the infrastructure right.
Test the infrastructure independently from the machine learning.
Be careful about dropped data when copying pipelines.
Turn heuristics into features, or handle them externally.
Monitoring
Know the freshness requirements of your system.
Detect problems before exporting models.
Watch for silent failures.
Give feature columns owners and documentation.
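As a rough illustration of "Detect problems before exporting models", here is a minimal sketch of a pre-export gate. The function name, the choice of AUC as the metric, and both thresholds are hypothetical and not taken from Google's page; the only idea borrowed from the rule is to check held-out performance before a model reaches users.

```python
def safe_to_export(holdout_auc, production_auc, min_auc=0.7, max_drop=0.02):
    """Decide whether a candidate model may be exported.

    All thresholds are illustrative assumptions:
    - min_auc: absolute floor the candidate must clear on held-out data.
    - max_drop: largest tolerated regression relative to the production model.
    """
    if holdout_auc < min_auc:
        return False
    return production_auc - holdout_auc <= max_drop


# Example: a candidate slightly better than production passes the gate,
# while one that regresses by 0.05 AUC does not.
ok = safe_to_export(holdout_auc=0.81, production_auc=0.80)
blocked = safe_to_export(holdout_auc=0.75, production_auc=0.80)
```

In a real pipeline the same check would probably run as a step between training and deployment, with an alert when it fails.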
Your First Objective
Don’t overthink which objective you choose to directly optimize.
Choose a simple, observable and attributable metric for your first objective.
Starting with an interpretable model makes debugging easier.
Separate Spam Filtering and Quality Ranking in a Policy Layer.
ML Phase II: Feature Engineering
Plan to launch and iterate.
Start with directly observed and reported features as opposed to learned features.
Explore with features of content that generalize across contexts.
Use very specific features when you can.
Combine and modify existing features to create new features in human-understandable ways.
The number of feature weights you can learn in a linear model is roughly proportional to the amount of data you have.
Clean up features you are no longer using.
Human Analysis of the System
You are not a typical end user.
Measure the delta between models.
When choosing models, utilitarian performance trumps predictive power.
Look for patterns in the measured errors, and create new features.
Try to quantify observed undesirable behavior.
Be aware that identical short-term behavior does not imply identical long-term behavior.
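One simple way to read "Measure the delta between models" is to count how often two models return different results for the same queries. The sketch below assumes each model's output is a dict mapping a query to a ranked list of item IDs; the function name, the data layout, and the top-k cutoff are my own assumptions, and Google's page should be consulted for how they weight differences.

```python
def ranking_delta(old_results, new_results, k=3):
    """Fraction of shared queries whose top-k results differ between two models."""
    shared = old_results.keys() & new_results.keys()
    if not shared:
        raise ValueError("the two result sets have no queries in common")
    changed = sum(1 for q in shared if old_results[q][:k] != new_results[q][:k])
    return changed / len(shared)


# Toy data: only q2 changes (its first two items are swapped).
old = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"],
       "q3": ["g", "h", "i"], "q4": ["j", "k", "l"]}
new = {"q1": ["a", "b", "c"], "q2": ["e", "d", "f"],
       "q3": ["g", "h", "i"], "q4": ["j", "k", "l"]}

delta = ranking_delta(old, new)  # 0.25: one of four queries changed
```

A small delta suggests a live experiment is unlikely to show much change; a large one is a cue to inspect the differing queries by hand.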
Training-Serving Skew
The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.
Importance-weight sampled data, don’t arbitrarily drop it!
Beware that if you join data from a table at training and serving time, the data in the table may change.
Re-use code between your training pipeline and your serving pipeline whenever possible.
If you produce a model based on the data until January 5th, test the model on the data from January 6th and after.
In binary classification for filtering (such as spam detection or determining interesting emails), make small short-term sacrifices in performance for very clean data.
Beware of the inherent skew in ranking problems.
Avoid feedback loops with positional features.
Measure Training/Serving Skew.
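Two of the rules above lend themselves to a short sketch: importance-weighting sampled data, and testing on data from after the training cutoff. The code below is a minimal illustration under my own assumptions: examples are dicts with a "day" field, the year 2024 is arbitrary, and the helper names are hypothetical. If an example is kept with probability p, it gets a weight of 1/p so that weighted totals still match the unsampled data in expectation.

```python
import random
from datetime import date


def split_by_date(examples, cutoff):
    """Train on examples up to and including the cutoff day, test on later ones."""
    train = [ex for ex in examples if ex["day"] <= cutoff]
    test = [ex for ex in examples if ex["day"] > cutoff]
    return train, test


def sample_with_weights(examples, keep_prob, rng):
    """Keep each example with probability keep_prob and weight it by 1/keep_prob."""
    return [
        {**ex, "weight": 1.0 / keep_prob}
        for ex in examples
        if rng.random() < keep_prob
    ]


# Toy data: one example per day from January 1st to January 10th.
examples = [{"day": date(2024, 1, d), "label": d % 2} for d in range(1, 11)]

train, test = split_by_date(examples, cutoff=date(2024, 1, 5))
sampled_train = sample_with_weights(train, keep_prob=0.3, rng=random.Random(0))
```

The weights would then be passed to whatever training routine is in use, for example as per-example sample weights in the loss.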
ML Phase III: Slowed Growth, Optimization Refinement, and Complex Models
Don’t waste time on new features if unaligned objectives have become the issue.
Launch decisions are a proxy for long-term product goals.
Keep ensembles simple.
When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.
Don’t expect diversity, personalization, or relevance to be as correlated with popularity as you think they are.
Your friends tend to be the same across different products. Your interests tend not to be.
