Business Problem:
Too often, a new product feature rollout is justified by something as small as a 0.01 lift in a metric. Assessing whether a new feature is truly a success requires rigorous statistical evaluation. This prevents a feature from falling short of expectations when scaled out to the whole user base, and therefore saves resources and time.
Metric to Lift:
A business metric tied to the experiment hypothesis; a sample experiment is used to infer the true population's response to the new product feature.
Hypothesis:
The conclusion drawn from the sample should predict behavior when the product is scaled out; an experimentation framework is needed to add acceptance criteria to new product feature rollouts.
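Before accepting a small lift, it helps to check how many users are needed to detect it reliably. The sketch below is a minimal sample-size calculation for a two-proportion test using the normal approximation; the baseline rate, lift, alpha, and power values are illustrative assumptions, not numbers from this write-up.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_baseline, p_variant, alpha=0.05, power=0.8):
    """Normal-approximation sample size per variant for a two-sided
    two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Detecting a 1-point lift (10% -> 11%, illustrative) takes far more
# users per group than an eyeballed "0.01 lift looks good" suggests.
n = sample_size_per_group(0.10, 0.11)
```

A quick sanity check on the formula: doubling the detectable effect (10% to 12%) cuts the required sample size by roughly a factor of four.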
Analysis Approach:
- Design Experiment
- Collect and Prepare Data
- Visualize Results
- Test Hypothesis
- Conclusion/Recommendation
Code:
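The analysis steps above can be sketched end to end on simulated data. This is a hedged sketch, not the project's actual code: the conversion rates, sample size, seed, and alpha below are illustrative assumptions.

```python
import random
from statistics import NormalDist

random.seed(42)

# Collect and prepare: simulate control vs. treatment conversions
# (true rates 10% and 11% are assumed for illustration).
n = 15000
control = [1 if random.random() < 0.10 else 0 for _ in range(n)]
treatment = [1 if random.random() < 0.11 else 0 for _ in range(n)]

# Test hypothesis: two-proportion z-test, H0: the rates are equal.
p1, p2 = sum(control) / n, sum(treatment) / n
p_pool = (sum(control) + sum(treatment)) / (2 * n)
se = (p_pool * (1 - p_pool) * (2 / n)) ** 0.5  # pooled standard error
z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

# Conclusion/recommendation: roll out only if the lift is significant.
alpha = 0.05
decision = "ship" if p_value < alpha else "hold"
```

In practice a library routine such as a two-proportion z-test from a statistics package would replace the hand-rolled arithmetic, but the pure-Python version makes each step of the approach explicit.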
Recommendation:
Keep using hypothesis testing to vet initial results from A/B tests before rolling features out.
Conclusion:
Hypothesis testing is a great way to add rigor to the analysis of experiment results. At the same time, make sure there are no upstream data errors, such as those that can be introduced when new ML infrastructure is implemented. Finally, consider user surveys to gain insight into why experiment results are not turning out as expected.