The Challenge
Enterprise retail decision-making requires diverse ML approaches: no single model type handles forecasting, personalization, pricing, and customer lifecycle management. The organization needed a production ML practice that could support a wide range of modeling techniques while maintaining proper governance, experiment tracking, and deployment lifecycle management.
My Approach
I built the production ML practice around selecting the right modeling technique for each business problem, using MLflow as the backbone for experiment tracking and model lifecycle management.
Modeling Techniques
The practice supports several advanced approaches:
- GraphRAG: Combining knowledge graph relationships with retrieval-augmented generation. The graph supplies structured context (entities and the relationships between them) that enriches LLM responses in ways flat document retrieval cannot.
- Causal Modeling: Estimating cause-and-effect relationships for pricing and promotion decisions. Rather than merely correlating price changes with sales, causal inference estimates the actual impact of an intervention, separated from confounding factors such as seasonality.
- Bandit Optimization: Multi-armed bandit approaches for dynamic decision-making under uncertainty, used for real-time personalization where exploration (trying options whose payoff is still uncertain) must be balanced against exploitation (serving the best-known option).
- Survival Modeling: Time-to-event analysis for customer lifecycle and churn prediction, modeling not just whether a customer will churn, but when
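To make the correlation-versus-causation point concrete, here is a minimal sketch of stratified (backdoor) adjustment on invented data. It assumes a single observed confounder, a hypothetical peak/off-peak segment, and is not presented as the practice's actual pricing model. Promotions ran mostly in peak weeks, which sell more regardless, so the naive comparison inflates the promotion's effect.

```python
from collections import defaultdict

# Hypothetical observational records: (segment, promoted, units_sold).
# Segment is a confounder: "peak" weeks get more promotions AND sell more anyway.
records = (
    [("peak", 1, 120)] * 8 + [("peak", 0, 110)] * 2
    + [("offpeak", 1, 60)] * 2 + [("offpeak", 0, 50)] * 8
)


def mean(xs):
    return sum(xs) / len(xs)


def naive_effect(rows):
    """Difference in mean sales between promoted and non-promoted rows (correlation only)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Backdoor adjustment: within-segment effects, weighted by each segment's share of rows."""
    by_segment = defaultdict(list)
    for seg, t, y in rows:
        by_segment[seg].append((t, y))
    total = len(rows)
    effect = 0.0
    for seg_rows in by_segment.values():
        treated = [y for t, y in seg_rows if t == 1]
        control = [y for t, y in seg_rows if t == 0]
        effect += (len(seg_rows) / total) * (mean(treated) - mean(control))
    return effect
```

Here `naive_effect(records)` returns 46.0 units while `adjusted_effect(records)` returns 10.0, the within-segment lift. The adjustment is only valid if the segment captures all confounding and every segment contains both promoted and non-promoted observations; real pricing work usually needs richer causal methods.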
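The exploration/exploitation balance can be sketched with Beta-Bernoulli Thompson sampling, one common bandit algorithm; the source does not say which algorithm the practice uses. The arms, conversion rates, and the `ThompsonSampler` and `simulate` names are hypothetical.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms (e.g. content variants)."""

    def __init__(self, n_arms, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) prior: a uniform belief about each arm's success rate.
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def select(self):
        # Draw one plausible success rate per arm from its posterior and play the best draw.
        # Arms with little evidence sometimes produce high draws (exploration);
        # arms with strong evidence of a high rate win most draws (exploitation).
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, reward):
        # reward is 1 (e.g. a click or conversion) or 0.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


def simulate(true_rates, rounds=3000, seed=7):
    """Run the sampler against hypothetical true conversion rates; return pulls per arm."""
    env = random.Random(seed + 1)
    sampler = ThompsonSampler(len(true_rates), seed=seed)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = sampler.select()
        reward = 1 if env.random() < true_rates[arm] else 0
        sampler.update(arm, reward)
        pulls[arm] += 1
    return pulls
```

Running `simulate([0.02, 0.05, 0.15])` should route most of the 3,000 simulated impressions to the third arm while still occasionally sampling the others, which is the balance the bullet describes.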
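What separates survival modeling from a yes/no churn classifier is its handling of customers who have not churned yet (right-censored observations). A minimal Kaplan-Meier estimator, the standard nonparametric survival curve, shows how censored customers still count in the risk set until they leave observation. The data are hypothetical, and production models would typically add covariates (for example a Cox proportional hazards model, as offered by libraries such as lifelines).

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where at least one churn occurred.

    durations: time observed for each customer (e.g. days since first purchase)
    events: 1 if the customer churned at that time, 0 if still active (right-censored)
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        churned = 0
        leaving = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            churned += data[i][1]
            leaving += 1
            i += 1
        if churned:
            # Conditional probability of surviving past t, given survival up to t.
            survival *= 1 - churned / at_risk
            curve.append((t, survival))
        # Churned and censored customers both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical customers: days observed, and whether churn was observed.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
```

Here the curve drops to about 0.833 at t=5, 0.667 at t=8, 0.444 at t=12, and 0.0 at t=20. The customers censored at t=8 and t=15 shrink the risk set without being counted as churn, so the model answers "when" as well as "whether".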
MLOps Foundation
Every production model follows a consistent lifecycle:
- Experiment tracking: All training runs, hyperparameters, and metrics tracked in MLflow
- Model registry: Central source of truth for model versions, staging, and production promotion
- Monitoring: Drift detection and performance degradation alerts
- Governance: Model cards documenting purpose, training data, limitations, and owners
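For the drift-detection step, one widely used statistic is the Population Stability Index (PSI), which compares a feature's distribution at training time with its live distribution. The source does not name a drift metric, so this is an illustrative sketch; the 0.1 and 0.25 thresholds in `drift_status` are a common industry rule of thumb, not a documented setting of this practice.

```python
import bisect
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference (training-time) sample and a live sample.

    Bin edges come from quantiles of the reference sample; eps avoids log(0)
    when a bin is empty.
    """
    ref = sorted(expected)
    n = len(ref)
    # Interior quantile cut points taken from the reference distribution.
    edges = [ref[int(n * k / bins)] for k in range(1, bins)]

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def drift_status(psi):
    # Rule-of-thumb bands (an assumption, not a universal standard).
    if psi < 0.1:
        return "stable"
    if psi < 0.25:
        return "moderate drift"
    return "significant drift"
```

A scheduled job could compute PSI per feature and raise the drift alert when `drift_status` reports significant drift; comparing a sample with itself yields a PSI of 0.0.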
Key Decisions & Trade-offs
Right technique for the problem: Rather than defaulting to deep learning for everything, each use case gets the most appropriate approach. Causal models for pricing, bandits for real-time decisions, and survival models for customer lifecycle each match the structure of their problem, which improves both accuracy and interpretability.
GraphRAG over vanilla RAG: Adding a knowledge graph layer to retrieval-augmented generation was more complex to build, but the explicit entity relationships it supplies dramatically improve response quality for domain-specific queries that flat document retrieval handles poorly.
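The difference from flat retrieval can be sketched as multi-hop expansion over a knowledge graph: starting from entities named in the question, collect facts within a few hops and pass them to the LLM as context. Everything here (the triples, the naive substring entity matching, and the `build_graph` and `graph_context` names) is hypothetical; production GraphRAG systems add entity linking, ranking, and often graph-level summaries.

```python
from collections import deque

# Hypothetical knowledge-graph triples: (subject, relation, object).
TRIPLES = [
    ("Air Fryer X200", "made_by", "BrandA"),
    ("Air Fryer X200", "in_category", "Kitchen Appliances"),
    ("Air Fryer X200", "compatible_with", "Crisper Tray T1"),
    ("Crisper Tray T1", "made_by", "BrandA"),
    ("BrandA", "has_return_policy", "60-day returns"),
    ("Blender B5", "in_category", "Kitchen Appliances"),
]


def build_graph(triples):
    """Undirected adjacency: entity -> list of triples that mention it."""
    graph = {}
    for s, r, o in triples:
        graph.setdefault(s, []).append((s, r, o))
        graph.setdefault(o, []).append((s, r, o))
    return graph


def graph_context(query, graph, hops=2):
    """Breadth-first collection of facts within `hops` of entities named in the query."""
    seeds = [e for e in graph if e.lower() in query.lower()]  # naive entity matching
    seen_entities = set(seeds)
    seen_facts = set()
    facts = []
    frontier = deque((e, 0) for e in seeds)
    while frontier:
        entity, depth = frontier.popleft()
        if depth == hops:
            continue
        for fact in graph[entity]:
            if fact not in seen_facts:
                seen_facts.add(fact)
                facts.append(fact)
            s, _, o = fact
            for nxt in (s, o):
                if nxt not in seen_entities:
                    seen_entities.add(nxt)
                    frontier.append((nxt, depth + 1))
    # Serialize facts as plain sentences to prepend to the LLM prompt.
    return [f"{s} {r.replace('_', ' ')} {o}" for s, r, o in facts]


graph = build_graph(TRIPLES)
question = "What is the return policy for the Air Fryer X200?"
context = graph_context(question, graph)
```

With two hops, the context includes "BrandA has return policy 60-day returns", reached through the product's manufacturer. With one hop, roughly what retrieving only the product's own record would give, that fact is missing, which is the kind of gap the structured layer is meant to close.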
Impact
The production ML practice provides QVC Group with a governed, scalable approach to deploying diverse AI capabilities. Each technique is chosen for its fit to the business problem, not for its novelty.