Improving Duolingo, one experiment at a time
We look to the experiment analysis to tell us whether a given experiment breaks any code or hurts metrics. If we find that an experiment is broken, we pause it until the bug is fixed. If the experiment seems stable, we gradually increase its rollout over a few days while monitoring the analysis reports. Then, based on these reports, successful experiments are rolled out to all learners.
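As a rough illustration of that gradual ramp-up, here is a minimal sketch, assuming a fixed set of rollout steps gated on the latest analysis report; the step values, field names, and `next_rollout` function are hypothetical, not Duolingo's actual experiments service:

```python
# Hypothetical sketch of a gradual rollout ramp gated on analysis reports.
# The steps, report fields, and logic are illustrative, not Duolingo's system.

ROLLOUT_STEPS = [1, 5, 10, 25, 50, 100]  # percent of learners exposed


def next_rollout(current_pct: int, report: dict) -> int:
    """Return the next rollout percentage, or 0 to pause a broken experiment."""
    if report.get("error_rate_regression") or report.get("metric_regression"):
        return 0  # pause the experiment until the bug is fixed
    for step in ROLLOUT_STEPS:
        if step > current_pct:
            return step  # ramp up one step while monitoring continues
    return 100  # already fully rolled out


print(next_rollout(10, {"error_rate_regression": False, "metric_regression": False}))  # -> 25
print(next_rollout(10, {"error_rate_regression": True}))                               # -> 0
```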
Before running an experiment, we always come up with a hypothesis that predicts what the experiment will accomplish and establishes a baseline for success or failure. The experiments service then automatically analyzes the actual results, with the report template dictating which metrics are relevant to the experiment.
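For a sense of what such an automated analysis might compute, here is a minimal sketch assuming a simple conversion-style metric compared with a two-proportion z-test; the numbers, metric, and `two_proportion_z_test` function are illustrative assumptions, not the experiments service's actual report logic:

```python
# Illustrative two-proportion z-test comparing a metric between A and B groups.
# The data and function are hypothetical, not Duolingo's experiments service.
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Return the two-sided p-value for a difference in conversion rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# e.g. a retention-style metric: 4,100/10,000 learners retained in A vs 4,300/10,000 in B
p_value = two_proportion_z_test(4100, 10_000, 4300, 10_000)
print(f"p-value: {p_value:.4f}")  # a small p-value suggests a real difference between groups
```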
Specifically, our experiments follow the A/B testing method: a portion of learners is placed into an A group (the control group) while the rest are placed into a B group (the experiment group). The A group sees the current version of the product, while the B group sees the new or updated feature. Then, based on several metrics, we determine whether the new version performs better than the current one.
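As a minimal sketch of how learners might be split deterministically into those groups, assuming a hash-based bucketing scheme; the experiment name, user IDs, and `assign_group` function are made up and are not Duolingo's actual assignment code:

```python
# Hypothetical hash-based bucketing: each learner lands in the same group every time.
import hashlib


def assign_group(user_id: str, experiment_name: str, treatment_fraction: float = 0.5) -> str:
    """Deterministically assign a learner to the control (A) or experiment (B) group."""
    key = f"{experiment_name}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10_000
    return "B" if bucket < treatment_fraction * 10_000 else "A"


print(assign_group("learner-123", "new_streak_screen"))  # same learner, same group on every call
print(assign_group("learner-456", "new_streak_screen"))
```

Hashing on the experiment name as well as the user ID keeps assignments stable within an experiment while letting the same learner fall into different groups across different experiments.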