Powered ICE - Prioritizing Experiments for Multimillion eCom Brands

JC Giusto
January 28, 2025

TLDR

The ICE Score misses a crucial factor for multi-million-dollar eCommerce brands: statistical power.

As you scale, you need to detect smaller wins.

But some experiments just don't have enough traffic or conversions to spot these changes reliably.

Solution: Add Power (1-5) to ICE based on minimum detectable effect. Lower detection threshold = higher score.

Want the spreadsheet that does this automatically? Drop your email below.

Powered ICE Access

Want access to the Powered ICE Spreadsheet?

Leave us your email and I'll send it right away!



As a prioritization framework, the ICE Score is fundamentally lacking in one critical way.

It might work well for startups and SMBs...

But if you’re doing Growth and Experimentation for multi-million-dollar brands, your prioritization approach needs to evolve.

It needs to consider more than just Impact, Confidence, and Ease.

The ability of your experiments to detect positive effects with confidence should be a crucial component of your calculation.

In other words, you need to start prioritizing by power, too.

Why Power Matters

As brands scale beyond $5-10 million, their experimentation approach matures.

With already established products and profitable channels, they begin to focus on incremental improvements.

The challenge?

Detecting those 5% improvements with confidence is hard.

This is where A/B testing comes in.

However, statistical power (the ability to detect a real change) depends on how many visitors a page gets and on its baseline conversion rate.

And ICE misses this crucial element.

A Quick Example

Consider this example:

  • A mobile landing page (3,700 weekly visitors, 2.7% conversion) needs a 10% uplift to be detectable at 90% confidence and 80% power (one-tailed)
  • The Cart Slider (7,053 weekly visitors, 26.6% conversion) only needs a 6% uplift under the same settings

All else equal, Cart Slider experiments should take priority because positive effects are easier to detect.
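
To make the relationship between traffic, conversion rate, and detectable uplift concrete, here is a minimal Python sketch of a common normal-approximation formula for the MDE of a 50/50 A/B test. This is not the calculator behind the figures above: the function name `relative_mde`, the 4-week default duration, and the even traffic split are all assumptions, so the printed percentages will not necessarily match the 10% and 6% quoted in the example. What should hold is the ordering: the journey with more traffic and a higher conversion rate gets the lower MDE.

```python
from math import sqrt
from statistics import NormalDist


def relative_mde(weekly_visitors, baseline_rate, weeks=4,
                 confidence=0.90, power=0.80, one_tailed=True):
    """Approximate relative MDE for a 50/50 A/B test on a conversion rate.

    Assumes a normal approximation and uses the baseline rate for the
    variance of both arms (a simplification, not an exact calculation).
    """
    n_per_arm = weekly_visitors * weeks / 2
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha if one_tailed else 1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Standard error of the difference between the two conversion rates.
    se = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    # Absolute detectable difference, expressed relative to the baseline.
    return (z_alpha + z_power) * se / baseline_rate


landing = relative_mde(3700, 0.027)   # mobile landing page from the example
cart = relative_mde(7053, 0.266)      # Cart Slider from the example
print(f"Mobile landing page: {landing:.1%}")
print(f"Cart Slider: {cart:.1%}")
```

Changing `weeks`, `confidence`, or `one_tailed` shifts both numbers, which is why a pre-test analysis with your own settings is still needed.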

Remember, it’s all about accumulating positive results to generate exponential growth in the medium and long term.

So, what’s the solution?

Introducing Powered ICE

This enhanced framework adds Power (1-5 rating) based on the Minimum Detectable Effect (MDE) of the experiment.

The lower the MDE, the higher the score:

  • 1: MDE is higher than 10%
  • 2: MDE between 8% and 10%
  • 3: MDE between 6% and 8%
  • 4: MDE between 4% and 6%
  • 5: MDE between 2% and 4%

If the MDE is lower than 2%, you have power to spare, so it might be wise to raise the confidence level (that is, use a stricter p-value threshold).

In this way, journeys (and experiments) with higher MDEs will have a lower pICE Score than others.
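
The scoring bands above can be sketched as a small Python function. The name `power_score` is hypothetical, and two choices here are assumptions rather than part of the framework as described: an MDE that sits exactly on a boundary (say 6%) gets the higher of the two scores, and an MDE under 2% is also scored 5.

```python
def power_score(mde):
    """Map a relative MDE (0.06 means 6%) to a 1-5 Power score.

    Assumption: boundary values fall into the higher-scoring band.
    """
    if mde > 0.10:
        return 1
    if mde > 0.08:
        return 2
    if mde > 0.06:
        return 3
    if mde > 0.04:
        return 4
    # 4% or lower. Assumption: MDEs under 2% also score 5; see the
    # caveat above about adjusting the threshold in that case.
    return 5


print(power_score(0.10))  # mobile landing page from the example -> 2
print(power_score(0.06))  # Cart Slider from the example -> 4
```

Under these assumptions, the mobile landing page from the earlier example scores 2 and the Cart Slider scores 4.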

Making Things Practical (and Automatic)

To make this process easier and fairer, I’ve created a spreadsheet that automatically assigns a Power Score to a journey based on the MDEs of other journeys.

All you need to do is:

  1. Input the relevant data for each journey, such as Test Page, Device, Segment, Weekly Visitors, Conversions, and Success Metric.
  2. Perform a quick pre-test analysis to determine the MDE.
  3. Enter the MDE.

And voilà, the spreadsheet calculates the journey’s Power Score.

On the second tab, you can reference the Journey and a Power Score will be automatically assigned.



A Quick Note on MDEs

Calculating the MDE can be more of an art than a science.

That said, I use a few rules of thumb to guide my approach:

  1. I don’t run experiments for more than 4 weeks. Cookies get deleted after 28 days, which can contaminate the data.
  2. I typically choose either 3 or 4 weeks for testing. I don’t mind waiting, as long as I can achieve the desired significance level.
  3. If the MDE is too low (less than 2-3%) at 90% confidence, I’ll consider either shortening the test or raising the confidence level.
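
As a rough illustration of the trade-off in these rules of thumb, the Python sketch below tabulates an approximate MDE for test lengths of 1 to 4 weeks at two confidence levels. It assumes a one-tailed test, 80% power, a 50/50 split, and a normal approximation; `mde_by_duration` is a hypothetical helper, and the traffic figures are the Cart Slider numbers from the earlier example.

```python
from math import sqrt
from statistics import NormalDist


def mde_by_duration(weekly_visitors, baseline_rate, confidence=0.90,
                    power=0.80, max_weeks=4):
    """Approximate relative MDE for each test length from 1 to max_weeks.

    Assumes a one-tailed test with traffic split evenly between two arms.
    """
    z = NormalDist().inv_cdf(confidence) + NormalDist().inv_cdf(power)
    result = {}
    for weeks in range(1, max_weeks + 1):
        n_per_arm = weekly_visitors * weeks / 2
        se = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
        result[weeks] = z * se / baseline_rate
    return result


for confidence in (0.90, 0.95):
    table = mde_by_duration(7053, 0.266, confidence=confidence)
    row = ", ".join(f"{w}w: {m:.1%}" for w, m in table.items())
    print(f"{confidence:.0%} confidence -> {row}")
```

Longer tests lower the MDE and a higher confidence level raises it, so a journey with a very low MDE at 4 weeks and 90% confidence has room to give on either dimension.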

If you’d like to learn more about calculating MDEs and determining confidence levels, let me know in the comments.

Conclusion

I’m not bashing the ICE Score at all.

The framework remains valuable, particularly for startups and small businesses that can’t run A/B tests due to limited resources.

However, power becomes a critical component of the prioritization equation for larger organizations, where randomized controlled experiments are the bread and butter of their growth strategy.

Ignoring it can lead to low-impact experimentation programs that waste hundreds of thousands of dollars.

