How we used MLOps to boost grocery ecommerce convenience

Ocado Technology (editor)
5 min read · Mar 16, 2022

Dennis Doerrich, Data Scientist, Ocado Technology Barcelona

There’s been a momentous shift towards online grocery shopping in markets around the world, with customers enjoying the convenience of groceries delivered to them.

At Ocado Technology, we’re obsessed with building frictionless, user-friendly shopping experiences for our global OSP partners and their customers.

So we asked ourselves: how can we make online shopping even more convenient?

With many shoppers coming back regularly, we wanted to create a user-friendly solution that would populate their shopping basket in one click, saving them time and effort.

To do this, we developed a machine learning (ML) tool called ‘Smart Shop’.

What is “Smart Shop”?

Smart Shop is a feature that allows shoppers to create an online shopping basket of essential items in one click. The feature is unlocked after a customer’s first purchase and suggests items they may want to buy based on their shopping history.

The challenge: finding the “best basket”

One of the challenges was populating the “best” or “ideal” basket of groceries for each customer. That’s because:

  • the “best basket” or selection of items for a shopping basket is unique to each customer
  • the basket size is different for each customer
  • customer preferences vary over time.

Our approach to building the “best basket”

To find a solution for each customer’s “best basket”, we developed a Minimum Viable Product (MVP).

We used a two-fold approach to generate the baskets:

  1. We created a ranking of all previously bought products for a given customer
  2. We took the top N products from this list (where we base N on the customer’s average basket size).
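As an illustration, the two steps above can be sketched as follows; the product ranking and average basket size are invented example data, not Ocado's:

```python
# Hypothetical sketch of the two-step basket generation described above.
def build_smart_basket(ranked_products, avg_basket_size):
    """Take the top-N products from the customer's ranking,
    with N based on their average basket size."""
    n = round(avg_basket_size)
    return ranked_products[:n]

# Step 1 output: products ranked by a (hypothetical) model score, best first.
ranking = ["milk", "bread", "eggs", "bananas", "coffee", "butter"]

# Step 2: take the top N.
basket = build_smart_basket(ranking, avg_basket_size=4.2)
print(basket)  # ['milk', 'bread', 'eggs', 'bananas']
```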

To create the product ranking, we looked at the customer’s shopping history and extracted three types of machine learning (ML) features:

  1. Customer features like the number of orders or average basket size for each customer
  2. Product features such as product popularity, or whether the product was on promotion
  3. Customer-product features like a product's popularity at the level of the individual customer. These features have by far the most predictive power.

Based on these features, the target for training and evaluating offline performance was straightforward: compare our predictions with what the customers bought in their next order.
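A minimal sketch of how one such training row per customer-product pair might be assembled; the order history, feature choices, and column names here are illustrative assumptions, not Ocado's actual schema:

```python
from collections import Counter

# Invented order history for one customer.
history = [
    {"customer": "c1", "order": 1, "products": ["milk", "bread", "eggs"]},
    {"customer": "c1", "order": 2, "products": ["milk", "bananas"]},
]
# The held-out next order provides the training target.
next_order = {"customer": "c1", "order": 3, "products": ["milk", "eggs"]}

orders = [o for o in history if o["customer"] == "c1"]

# Customer features: e.g. number of orders, average basket size.
customer_feats = {
    "n_orders": len(orders),
    "avg_basket_size": sum(len(o["products"]) for o in orders) / len(orders),
}

# Customer-product features: e.g. how often this customer bought the product.
counts = Counter(p for o in orders for p in o["products"])

rows = []
for product, count in counts.items():
    rows.append({
        "product": product,
        "times_bought": count,   # customer-product feature
        **customer_feats,        # customer features
        # Product features (popularity, promotions) would join in here.
        "label": product in next_order["products"],  # did they buy it next time?
    })
```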

With our MVP in hand, we went into production quickly.

Going into production quickly

We used a data-centric rather than a model-centric approach. If you think about an ML solution as data + code + hyperparameters, then focusing on the data part is where the most potential lies with the least amount of work.

We used Google BigQuery ML (BQML). Because the modelling part was simple, we could focus our time on creating features, such as:

  • filtering duplicate products by using co-occurrences
  • using data about promotions
  • putting more emphasis on short-term-memory features (giving more weight to products bought recently)
  • capturing seasonality
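For instance, a short-term-memory feature could be sketched as an exponentially decayed purchase count; the decay rate and purchase positions below are illustrative assumptions:

```python
# Illustrative short-term-memory feature: purchases decay with recency,
# so recently bought products score higher than long-ago ones.
def recency_weighted_count(orders_ago, decay=0.7):
    """A purchase made k orders ago contributes decay**k to the score."""
    return sum(decay ** k for k in orders_ago)

# "milk" was in the last two orders; "flour" only in an order long ago.
milk_score = recency_weighted_count([0, 1])   # 1.0 + 0.7 = 1.7
flour_score = recency_weighted_count([5])     # 0.7**5, roughly 0.17
```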

Our whole ML pipeline, from data ingestion to prediction output, lives essentially in BigQuery. This approach helped us establish a baseline model using logistic regression (and later boosted trees) and deploy it quickly into production. This vision, which acknowledges that the model is just one small part of the whole pipeline, made it easier to achieve what Google calls MLOps level 2 from the first experiment.

“This automated CI/CD system lets your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters. They can implement these ideas and automatically build, test, and deploy the new pipeline components to the target environment.”

Throughout the whole process, we continuously monitored our predictions against our non-ML baseline: simply repeating the customer's last order.
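In sketch form, that baseline is trivial (the order history below is invented):

```python
# The non-ML baseline: suggest exactly the customer's last order.
def last_order_baseline(order_history):
    """order_history: list of baskets, oldest first."""
    return list(order_history[-1]) if order_history else []

history = [["milk", "bread"], ["milk", "eggs", "bananas"]]
print(last_order_baseline(history))  # ['milk', 'eggs', 'bananas']
```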

We measured results with two testing frameworks: 1) before production, using data science metrics to select the best candidate from our experiments, and 2) in production, measuring business metrics with AB-testing.

1) Measuring before production, using data science metrics

The ML metrics we focused on were precision and recall. We could measure these metrics on historical data before going into production.

Precision, or conversion rate, is the fraction of suggested products that made it into the final basket.

Recall measures what fraction of the final basket came from Smart Shop's suggestions.
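Concretely, both metrics can be computed per order like this; the baskets are invented example data:

```python
# Precision: share of suggested products the customer kept.
# Recall: share of the final basket that came from the suggestions.
def precision_recall(suggested, final_basket):
    suggested, final_basket = set(suggested), set(final_basket)
    kept = suggested & final_basket
    return len(kept) / len(suggested), len(kept) / len(final_basket)

suggested = ["milk", "bread", "eggs", "bananas"]
final = ["milk", "eggs", "coffee"]
p, r = precision_recall(suggested, final)
print(p, r)  # 2 of 4 suggestions kept (0.5); 2 of 3 basket items suggested
```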

You may have heard of the infamous precision vs recall trade-off. In our case, we set the trade-off through the basket size we propose. If we suggest a small basket, the customer will likely need to add more items and spend more time doing so, whereas if we suggest a larger basket, the customer will probably want to remove more items. The challenge was finding the right balance.

To find the right trade-off, our approach of going into production quickly was key. It meant we could run AB-tests on changes that were only measurable in production. For example:

Looking only at the offline performance (on historical data), we can anticipate precision and recall to depend on basket size like this:

[Figure: precision and recall as a function of suggested basket size]

However, suggesting extremely small or extremely big baskets would likely lead to the trade-off situation mentioned previously.

2) Measuring during production, using business metrics and AB-testing

Looking only at precision and recall is a typical example of the gap between ML metrics and what actually matters to the business.

Having 100% recall doesn’t bring much value if no one is using the feature!

Communication and collaboration were key to refining the solution. Through conversations with all parties involved, we established the relevant business metrics to measure in the AB-test. In our case, these were basket value and Smart Shop engagement (after using Smart Shop, do customers use it again?).

With good communication and clear alignment with the software development team, we agreed on precise requirements for what we, as data scientists, would need from the application to run an AB-test. This approach meant controlling the customer split and tracking which model was used for each customer.

Saving customers time and effort with our approach

Using a simple MVP and a data-centric approach allowed us to deliver quickly. With the benefits of a fast feedback loop, we could concentrate on the problem at hand — the basket-size tradeoff.

And by continuously improving and testing, we developed a frequently used feature that saves customers time and effort.

What’s next for Smart Shop?

Looking to the future, we want to add even more value for shoppers.

Rather than looking at each product individually, we’ll use product similarities and co-occurrences to remove duplicate products and prioritise products that go well together.

We will also suggest products that customers might like, even if they haven’t purchased these before.

Using this data-driven and iterative approach, we’ll be able to continuously improve features and make online shopping more convenient.

Contributions to this piece were made by: Yaroslav Marchuk, Data Scientist / Laurent Candillier, Senior Data Scientist / Raluca Simona Radu, Product Manager / Xavier Forns, Engineering Team Leader, Ocado Technology

Originally published at https://www.ocadogroup.com.
