Business Problem:
Too often, a new product feature rollout is justified by something as small as a 0.01 lift in a metric. Assessing whether a new feature is truly a success requires rigorous statistical evaluation. This prevents a feature from falling short of expectations when scaled out to the whole user base, and therefore saves resources and time.
Metric to Lift:
A business metric tied to the experiment hypothesis; a sample experiment is used to infer the true population's response to the new product feature.
Hypothesis:
The conclusion drawn from the sample should predict behavior when the product is scaled out; an experimentation framework is needed to add acceptance criteria to new product feature rollouts.
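Before accepting a small lift, it helps to check how many users are needed to detect it reliably. The sketch below is a minimal sample-size calculation for a two-proportion test using the normal approximation; the baseline rate, lift, alpha, and power values are illustrative assumptions, not numbers from this write-up.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_baseline, p_variant, alpha=0.05, power=0.8):
    """Normal-approximation sample size per variant for a two-sided
    two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Detecting a 1-point lift (10% -> 11%, illustrative) takes far more
# users per group than an eyeballed "0.01 lift looks good" suggests.
n = sample_size_per_group(0.10, 0.11)
```

A quick sanity check on the formula: doubling the detectable effect (10% to 12%) cuts the required sample size by roughly a factor of four.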
Analysis Approach:
- Design Experiment
- Collect and Prepare Data
- Visualize Results
- Test Hypothesis
- Conclusion/Recommendation
Code:
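The analysis steps above can be sketched end to end on simulated data. This is a hedged sketch, not the project's actual code: the conversion rates, sample size, seed, and alpha below are illustrative assumptions.

```python
import random
from statistics import NormalDist

random.seed(42)

# Collect and prepare: simulate control vs. treatment conversions
# (true rates 10% and 11% are assumed for illustration).
n = 15000
control = [1 if random.random() < 0.10 else 0 for _ in range(n)]
treatment = [1 if random.random() < 0.11 else 0 for _ in range(n)]

# Test hypothesis: two-proportion z-test, H0: the rates are equal.
p1, p2 = sum(control) / n, sum(treatment) / n
p_pool = (sum(control) + sum(treatment)) / (2 * n)
se = (p_pool * (1 - p_pool) * (2 / n)) ** 0.5  # pooled standard error
z = (p2 - p1) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

# Conclusion/recommendation: roll out only if the lift is significant.
alpha = 0.05
decision = "ship" if p_value < alpha else "hold"
```

In practice a library routine such as a two-proportion z-test from a statistics package would replace the hand-rolled arithmetic, but the pure-Python version makes each step of the approach explicit.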
Recommendation:
Keep using hypothesis testing to vet initial results from A/B tests before rolling features out.
Conclusion:
Hypothesis testing is a great way to add rigor to the analysis of experiment results. At the same time, make sure there are no upstream data errors, such as those that can be introduced when new ML infrastructure is implemented. Finally, consider user surveys to gain insight into why experiment results are not turning out as expected.