Title: Improving Revenue Experiment Accuracy for High-Variance Transactions
Optimizely currently defines revenue outliers using a fixed threshold of mean + 3 standard deviations. This approach works well for classic e-commerce use cases with stable pricing, but it becomes less reliable in scenarios where transaction values are inherently unpredictable.
In donation-based fundraising, and in other models with variable or open-ended transaction values (e.g. donations, contributions, pay-what-you-want, grants, or memberships), there is no fixed price catalogue. As a result, a small number of very high transactions can disproportionately influence revenue and average gift metrics.
After reviewing this with our analysts and experimentation team, we found that applying a mean + 2 standard deviations threshold produces more representative and interpretable results for these high-variance environments. It better reflects real user behavior while still preserving sensitivity to genuine revenue changes.
This challenge became particularly visible in an A/A experiment, where a few exceptionally high transactions happened to land in one variant, leading to “statistically significant” revenue results despite no actual product or UX change. This highlights the risk of false positives when using a fixed 3 SD threshold in high-variance contexts.
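To make the 2 SD versus 3 SD comparison concrete, here is a minimal Python sketch. It is not Optimizely's implementation: the function names are hypothetical, the gift amounts are invented purely for illustration, and it assumes the threshold is computed as the mean plus k sample standard deviations over a single pool of transactions.

```python
from statistics import mean, stdev


def sd_threshold(values, k):
    """Return mean + k * sample standard deviation (assumed outlier rule)."""
    return mean(values) + k * stdev(values)


def flag_outliers(values, k):
    """Return the values that lie above the mean + k SD threshold, sorted."""
    limit = sd_threshold(values, k)
    return sorted(v for v in values if v > limit)


# Hypothetical gift amounts, invented for illustration only:
# eighteen typical gifts plus two very large ones.
gifts = [10] * 9 + [30] * 9 + [1000, 1500]

for k in (2, 3):
    limit = sd_threshold(gifts, k)
    print(f"k={k}: threshold={limit:.1f}, flagged={flag_outliers(gifts, k)}")
```

With these made-up numbers the thresholds come out at roughly 918 (2 SD) and 1305 (3 SD), so the 1000 gift is treated as an outlier only under the 2 SD rule. Note that the large gifts also inflate the standard deviation itself, which is part of why a fixed 3 SD cut-off can let them through.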
Suggested enhancement:
Allow configuration of revenue outlier thresholds (e.g. 2 SD vs. 3 SD), or
Offer alternative, statistically robust outlier-handling options designed for high-variance transaction models.
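As one possible example of the second option, the sketch below uses a median and median-absolute-deviation (MAD) rule in Python. This is an assumption on our side, not something Optimizely offers today: the function names, the multiplier k = 3, and the gift amounts are all hypothetical. The constant 1.4826 is the usual scaling that makes the MAD comparable to a standard deviation for normally distributed data.

```python
from statistics import median

# Scaling factor that makes the MAD comparable to a standard deviation
# when the underlying data are normally distributed.
MAD_SCALE = 1.4826


def mad_threshold(values, k=3.0):
    """Return median + k * scaled MAD (one possible robust outlier rule)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    return center + k * MAD_SCALE * mad


def flag_robust_outliers(values, k=3.0):
    """Return the values that lie above the median + k * scaled MAD threshold."""
    limit = mad_threshold(values, k)
    return sorted(v for v in values if v > limit)


# Hypothetical gift amounts, invented for illustration only.
gifts = [10] * 9 + [30] * 9 + [1000, 1500]

print(f"threshold={mad_threshold(gifts):.1f}")
print(f"flagged={flag_robust_outliers(gifts)}")
```

Because the median and MAD are barely affected by a handful of extreme values, both large gifts are flagged here even though the multiplier is 3. One caveat: if more than half of the transactions share the same value, the MAD is zero and the rule would need a fallback.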
Providing this flexibility would:
Improve experiment reliability and trust in revenue results
Reduce false positives in A/B tests
Extend Optimizely’s value for fundraising, nonprofit, and other variable-price digital experiences
Thanh Hoang commented:
This idea was raised on behalf of kasprzyc@unhcr.org, a UNHCR customer, via Zendesk ticket #1825866.