
Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing

Dan McKinley at Etsy (McKinley 2013) wrote “nearly everything fails” and for features, he wrote “it’s been humbling to realize how rare it is for them to succeed on the first attempt.”
Ya Xu • Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing
It is often easier to generate a plan, execute against it, and declare success, with the key metric being: “percent of plan delivered,” ignoring whether the feature has any positive impact to key metrics.
ideally you may want to combine them into an Overall Evaluation Criterion (OEC), which is believed to causally impact long-term objectives.
making controlled experiments easy to run also accelerates innovation by decreasing the cost of trying new ideas…
Defining guardrail metrics for experiments is important for identifying what the organization is not willing to change, since a strategy also “requires you to make tradeoffs in competing – to choose what not to do”
“Profit” is not a good OEC, as short-term theatrics (e.g., raising prices) can increase short-term profit, but may hurt it in the long run. Customer lifetime value is a strategically powerful OEC (Kohavi, Longbotham et al. 2009).
For k = 5, you have a 23% probability of seeing something statistically significant. For k = 10, that probability rises to 40%.
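The two percentages above can be reproduced with a short calculation. This is a minimal sketch that assumes k counts independent tests, each run at a 0.05 significance level with no true effect; the excerpt itself does not state either assumption, and the function name is illustrative only.

```python
def prob_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent tests looks significant by chance.

    Assumes every test has no true effect and uses the same threshold alpha
    (0.05 is an assumption here, not stated in the excerpt).
    """
    # P(no test is significant) = (1 - alpha)^k, so take the complement.
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 5, 10):
    print(f"k = {k:2d}: {prob_any_false_positive(k):.0%}")
```

Under these assumptions the results for k = 5 and k = 10 round to the 23% and 40% quoted above. If the tests are correlated, the true probability will differ.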
EVI: Expected Value of Information from Douglas Hubbard (2014), which captures how additional information can help you in decision making. The ability to run controlled experiments allows you to significantly reduce uncertainty by trying a Minimum Viable Product (Ries 2011), gathering data, and iterating.
Only one third of the ideas tested at Microsoft improved the metric(s) they were designed to improve (Kohavi, Crook and Longbotham 2009). Success is even harder to find in well-optimized domains like Bing and Google, whereby some measures’ success rate is about 10–20% (Manzi 2012).