Models will degrade over time. Catching *when* a model has degraded is a core challenge in operationalizing ML models.

## Types of Drift

Sometimes drift is referred to collectively as "data drift". However, it's useful to tease the types apart so we can apply the appropriate statistical tests and action plans.

The most common types of drift are:

- **Feature Drift**
    - aka input drift or covariate shift
    - The underlying input data has changed -> change in the distribution of input features
- **Label Drift**
    - The distribution of the label has changed due to some outside influence
- **Prediction Drift**
    - Related to label drift, but rather than a change in the true population prevalence, it's a change in the distribution of the model's prediction output caused by feature drift
- **Concept Drift**
    - A change in the broader environment outside the control of our model: the patterns the model has learned no longer match the patterns in the concept we're trying to model
    - External factors have changed the relationship between the input features we're considering and the labels

These types of drift can occur in a confounded or correlated manner (e.g. feature drift can cause prediction drift).

### Feature, Label, and Prediction Drift

Compare the distribution of a sample of the observed features (or labels / predictions) against a sample of the expected ones.

![[_Media/Feature Label and Prediction Drift.png]]

### Concept Drift

Concept drift can manifest in a number of different ways, and each way it manifests requires different methods to identify it.

![[_Media/Concept Drift.png]]

- Sudden: e.g. a black swan event like a pandemic
- Gradual and Incremental: the concept evolving over time
- Recurring: seasonality, e.g. holiday sales, weekdays vs weekends

## Drift Actions

| Drift Type | Actions |
| ---- | ---- |
| Feature Drift | - Investigate the feature generation process <br> - Retrain using new data |
| Label Drift | - Investigate the label generation process <br> - Retrain using new data |
| Prediction Drift | - Investigate the model training process <br> - Assess the business impact of the change in predictions |
| Concept Drift | - Investigate additional feature engineering <br> - Consider an alternative approach / solution <br> - Retrain / tune using new data |

## What to Monitor

| Data Type | Basic Summary Stats | Distributions |
| ---- | ---- | ---- |
| Features | X | X |
| Target | X | X |

- Model performance metrics
    - Replace the currently deployed model with a newly trained one only if the new model performs equally well or better
- Business metrics

## Monitoring Tests

### Numeric Features

**Summary stats**
- mean / median
- min
- max
- percentage of missing values

**Statistical tests** (see the first sketch below)
- Mean:
    - 2-sample KS test with Bonferroni correction (or some other multiple-testing correction to limit the growth of false positives over repeated tests). The n in the [[Bonferroni Correction]] is the number of input features we're testing.
    - Mann-Whitney (MW) test
- Variance:
    - Levene test (compares the variance of two continuous distributions; the null hypothesis is that both samples come from populations with equal variance)

### Categorical Features

**Summary stats**
- Mode
- Number of unique levels
- Percentage of missing values

**Statistical tests** (see the second sketch below)
- One-way chi-squared test (null hypothesis: the expected and observed prevalences of the categories are equal)
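A minimal sketch of the numeric-feature tests above using scipy (the talk's tool for stats tests). The feature names, sample sizes, and the 5% base alpha are illustrative assumptions, not from the source.

```python
# Minimal sketch: 2-sample KS test with Bonferroni correction plus Levene's
# test over a handful of numeric features. Feature names, sample sizes, and
# alpha = 0.05 are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Reference sample (e.g. training data) vs. current production sample
reference = {
    "age": rng.normal(40, 10, size=1_000),
    "income": rng.lognormal(10, 1.0, size=1_000),
}
current = {
    "age": rng.normal(45, 10, size=1_000),         # shifted mean
    "income": rng.lognormal(10, 1.5, size=1_000),  # inflated variance
}

alpha = 0.05
# Bonferroni correction: n is the number of input features being tested
corrected_alpha = alpha / len(reference)

for feature in reference:
    ref, cur = reference[feature], current[feature]

    # 2-sample KS test: null hypothesis is that both samples come from
    # the same distribution
    ks = stats.ks_2samp(ref, cur)

    # Levene test: null hypothesis is that both samples come from
    # populations with equal variance
    lev = stats.levene(ref, cur)

    print(
        f"{feature}: KS p={ks.pvalue:.4f} (drift: {ks.pvalue < corrected_alpha}), "
        f"Levene p={lev.pvalue:.4f} (variance change: {lev.pvalue < corrected_alpha})"
    )
```

The Mann-Whitney test could be swapped in with `stats.mannwhitneyu` inside the same loop.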
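And a similar sketch for the one-way chi-squared test on a categorical feature, where the expected counts come from the reference sample's category proportions scaled to the current sample size. The category names, proportions, and sample sizes are made up for illustration.

```python
# Minimal sketch: one-way chi-squared test for drift in a categorical feature.
# Category names, proportions, and sample sizes are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
categories = ["mobile", "desktop", "tablet"]

# Reference sample (e.g. training data) vs. current production sample
reference = rng.choice(categories, size=5_000, p=[0.5, 0.4, 0.1])
current = rng.choice(categories, size=1_000, p=[0.7, 0.2, 0.1])

# Observed counts in the current sample
observed = np.array([(current == c).sum() for c in categories])

# Expected counts: reference proportions scaled to the current sample size
# (scipy's chisquare expects f_obs and f_exp to have the same total)
expected = np.array([(reference == c).mean() for c in categories]) * len(current)

# Null hypothesis: the expected and observed category prevalences are equal
result = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.4f}")
print("Drift detected" if result.pvalue < 0.05 else "No drift detected")
```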
### Models

- Relationship between target and features
    - Numerical target -> Pearson correlation coefficient
    - Categorical target -> contingency tables
- Model performance
    - Regression models -> MSE, error distribution plots, ...
    - Classification models -> ROC, confusion matrix, F1-score, ...
- Performance on data slices (a more granular view of model performance on a subset of the data along a particular dimension)
- Time taken to train

## How to Monitor

There is no single tool; you need to piece together a solution, e.g.
- Logging and versioning -> MLflow (model) + Snowflake (data)
- Statistical tests -> scipy and statsmodels
- Visualization -> plotly

### Demo

[Drifting Away 18:45](https://youtu.be/tGckE83S-4s?t=1125)

## Resources

- [Drifting Away: Testing ML Models in Production](https://www.youtube.com/watch?v=tGckE83S-4s)

---

- Links:
- Created at: [[2022-05-21]]