Product Analytics - A/B Testing
How to answer A/B Testing questions?
#1 Clearly define the null and alternative hypotheses
- Null Hypothesis: In A/B testing, the null hypothesis states that there is no difference between the control and variant groups.
- Alternative Hypothesis: The alternative hypothesis states that there is a measurable difference between the control and variant groups.
The goal of the test is to determine whether to reject the null in favor of the alternative with statistical significance.
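To make the framing concrete, here is a minimal sketch of a standard two-proportion z-test in Python. The visitor counts, conversion numbers, and the 0.05 threshold are illustrative assumptions, not data from a real test:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical results: conversions out of visitors for each group.
control_conv, control_n = 480, 10_000   # control: 4.8% conversion
variant_conv, variant_n = 540, 10_000   # variant: 5.4% conversion

# H0: p_variant == p_control; H1: p_variant != p_control (two-sided).
p_control = control_conv / control_n
p_variant = variant_conv / variant_n
p_pooled = (control_conv + variant_conv) / (control_n + variant_n)

# Two-proportion z-test with a pooled variance estimate.
se = np.sqrt(p_pooled * (1 - p_pooled) * (1 / control_n + 1 / variant_n))
z = (p_variant - p_control) / se
p_value = 2 * norm.sf(abs(z))  # two-sided p-value

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the difference is statistically significant.")
else:
    print("Fail to reject H0: no significant difference detected.")
```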
#2 State the methodology: At a scaled product, dozens of A/B tests may be running at any given time, so it's critical to keep a clear record of each one. You can use the PICOT framework, commonly used in healthcare research (a sketch of such a record follows this list):
- Population: The target audience for the test (e.g., website visitors)
- Intervention: The change or variant being introduced and tested relative to the control. This should be a measurable product change, such as altering the color of a call-to-action button or modifying the checkout process.
- Comparison: Existing product without any change (the control)
- Outcome: The key metrics that define the impact of the intervention. These become the main results to evaluate and are often the most critical part of the interview.
- Time: The duration over which the experiment will run before analyzing the data and reaching a conclusion
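As referenced above, here is a minimal sketch of a PICOT-style test record as a Python dataclass. The class name ABTestRecord and the checkout experiment it describes are hypothetical, purely to show the shape such a record might take:

```python
from dataclasses import dataclass

@dataclass
class ABTestRecord:
    """One PICOT-style record per experiment, so tests stay auditable."""
    population: str    # who is eligible for the test
    intervention: str  # the product change being tested
    comparison: str    # the control experience
    outcome: str       # primary metric(s) used to judge impact
    time_days: int     # planned duration before analysis

checkout_test = ABTestRecord(
    population="Mobile web visitors who reach the cart page",
    intervention="One-page checkout flow",
    comparison="Existing three-step checkout",
    outcome="Checkout completion rate",
    time_days=14,
)
print(checkout_test)
```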
#3 Watch for biases and statistical significance: Keep the following in mind when analyzing A/B tests:
- Novelty Effect: Users engage more with new features out of curiosity, but this spike may not last once the novelty wears off. Look for lasting changes over time, not just short spikes (see the sketch after this list).
- Primacy Effect: Users prefer and stick to the original version, resisting change at first because they are used to the old way. Long-time users in particular may react negatively for a while.
- Interference: Make sure users in the test group (trying the new experience) can't influence users in the control group (using the old one), since spillover between groups biases the measured effect.
- Statistical Significance: Check whether results are truly significant, not just different. The p-value is the probability of observing results at least as extreme as those measured, assuming the null hypothesis is true. By convention, a p-value under 0.05 indicates a statistically significant difference.
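To illustrate the novelty-effect check from the list above, here is a small pandas sketch that tracks weekly conversion lift between groups. All figures are made up for illustration:

```python
import pandas as pd

# Hypothetical weekly rollup: one row per (week, group) with users and conversions.
events = pd.DataFrame({
    "week":      [1, 1, 2, 2, 3, 3, 4, 4],
    "group":     ["control", "variant"] * 4,
    "users":     [2500] * 8,
    "converted": [120, 155, 118, 140, 121, 132, 119, 126],
})

# Conversion rate per week for each group, then the week-by-week lift.
conversions = events.pivot(index="week", columns="group", values="converted")
users = events.pivot(index="week", columns="group", values="users")
rates = conversions / users
rates["lift"] = rates["variant"] - rates["control"]
print(rates)

# In this made-up data the lift shrinks from ~1.4pp in week 1 to ~0.3pp in
# week 4: a fading spike like this is the signature of a novelty effect.
```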
#4 Get hands-on practice: Products such as Mixpanel, Amplitude, and Optimizely make it easy to conduct A/B testing. If you haven't done A/B testing before, try one of these tools in a personal project or watch a demo. You can also speak to friends in the PM and data science world who regularly run A/B tests as part of their day-to-day job. Practical experience will make you more confident in answering in-depth questions, and you'll learn key considerations that come up when running real tests.