Analyze A/B Test Results
Interpret A/B test results correctly — statistical significance, practical significance, and decision recommendation.
The Prompt
Analyze the following A/B test results. Provide: 1. Statistical significance: p-value calculation and interpretation 2. Confidence interval for the true effect size 3. Practical significance: is the effect size large enough to matter? 4. Sample size adequacy: was the test run long enough? 5. Segment analysis: did the effect differ by user segment or device? 6. Decision recommendation: ship, iterate, or discard — with reasoning 7. What to test next Test results: - Control conversion rate: [X%] - Variant conversion rate: [Y%] - Control sample size: [N] - Variant sample size: [N] - Test duration: [DAYS] - Primary metric: [METRIC NAME] - Any segment splits available: [LIST] - Confidence level target: [90% / 95% / 99%]
Example Output
With control at 3.2% and variant at 3.8% (n=12,000 each, 14 days), the lift is +18.75% (relative). p-value = 0.021 (statistically significant at 95%). However, confidence interval for true effect is +2% to +38% — wide range. Recommendation: ship with monitoring; re-run with larger sample before committing major engineering resources.
FAQ
Which AI model is best for Analyze A/B Test Results?
Claude Sonnet 4 or GPT-4o — both handle statistical interpretation well. Specify your confidence level.
How do I use the Analyze A/B Test Results prompt?
Copy the prompt, replace the [BRACKETED] placeholders with your specific information, and paste into your preferred AI assistant (ChatGPT, Claude, Gemini, etc.). With control at 3.2% and variant at 3.8% (n=12,000 each, 14 days), the lift is +18.75% (relative). p-value = 0.021 (statistically significant at 95%). However, confidence interval for true effect is +2% to +38% — wide range. Recommendation: ship with monitoring; re-run with larger sample before committing major engineering resources.
Model Recommendation
Claude Sonnet 4 or GPT-4o — both handle statistical interpretation well. Specify your confidence level.