A/B Testing Engine: Optimising Intervention Effectiveness

Whistl never stops improving. The A/B Testing Engine continuously tests variations of interventions—different wording, timing, and formats—to discover what works best. Winning variations are promoted across the user base while maintaining personalisation. This is evidence-based behaviour change at scale.

Why A/B Testing Matters

Small changes can have big impacts on intervention effectiveness:

Research on Message Framing

  • Gain vs. loss framing: Different users respond to different frames (Rothman et al., 2006)
  • Message length: Shorter isn't always better (Keller & Lehmann, 2008)
  • Tone matching: Coaching style must match user preference (Miller & Rollnick, 2012)
  • Timing effects: When you intervene matters as much as what you say (Heron & Smyth, 2010)

The Limits of Expert Design

  • Experts can't predict: What researchers think works ≠ what actually works
  • Context matters: Effectiveness varies by user, time, situation
  • Continuous improvement: What works today may not work tomorrow
  • Scale reveals patterns: Large user base enables statistical confidence

What Whistl A/B Tests

The testing engine evaluates variations across multiple dimensions:

Message Wording

Step        | Variant A                          | Variant B                                 | Variant C
Acknowledge | "I hear you. What's driving this?" | "I understand this is hard. Talk to me."  | "You want through. I get it. Why?"
Reflect     | "Last time you felt this way..."   | "Remember what happened last time..."     | "Think about the last time..."
Breathe     | "Let's breathe together."          | "Time to breathe. With me."               | "Breathe. Just breathe."
Visualize   | "Picture your goal..."             | "Remember what you're saving for..."      | "This is what you're working toward..."

Timing Variations

  • Immediate vs. delayed: Intervene right away or wait 30 seconds?
  • Breathing duration: 2 minutes vs. 3 minutes vs. 90 seconds
  • Step spacing: Show steps one at a time or all together?
  • Follow-up timing: Check in after 1 hour or 2 hours?

Visual Format

  • Text-only vs. image: Does showing goal images help?
  • Progress bar style: Linear vs. circular vs. numeric
  • Color schemes: Calming blues vs. urgent reds vs. neutral
  • Animation: Animated breathing pacer vs. static

How A/B Testing Works

Whistl's testing engine follows rigorous methodology:

Test Assignment

# User assignment to test variants
import hashlib

def assign_test_variant(user_id: str, test_id: str) -> str:
    # Python's built-in hash() is randomised per process, so it would
    # give a user a different variant each session. Use a stable hash
    # instead for consistent assignment.
    digest = hashlib.md5(f"{user_id}:{test_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100

    # Assign to variant based on bucket (roughly equal three-way split)
    if bucket < 33:
        return "A"  # Control group
    elif bucket < 66:
        return "B"  # Variant B
    else:
        return "C"  # Variant C

# The same user always sees the same variant for a given test;
# different users are spread across variants.

Sample Size Requirements

  • Minimum per variant: 100 interventions
  • Statistical power: 80% (standard for behavioural research)
  • Confidence level: 95% (p < 0.05)
  • Minimum effect size: 5% improvement to adopt
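As a sanity check on these parameters, the per-variant sample size follows from the standard normal-approximation formula for a two-proportion test. A minimal sketch, assuming the 5% minimum effect means 5 percentage points against an illustrative 70% baseline acceptance rate (the baseline value is an assumption, not from the spec):

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_size(p1: float, p2: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-variant sample size to detect a shift from p1 to p2 with a
    two-sided two-proportion z-test (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for 95% confidence
    z_b = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Detecting a 5-point lift from a 70% baseline needs ~1,251 users per
# variant -- in line with the sample sizes in the running tests below,
# and well above the 100-intervention floor.
print(required_sample_size(0.70, 0.75))
```

Note that the 100-intervention minimum is a floor, not a guarantee of power: for small effects, tests need to keep running until samples in the low thousands accrue.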

Success Metrics

Metric                  | Definition                             | Target
Intervention Acceptance | User engaged with intervention         | >70%
Urge Pass Rate          | Urge didn't return within 2 hours      | >60%
Step Completion         | User completed the full step           | >80%
Helpfulness Rating      | Mean user rating (5-star scale)        | >4.0/5.0
No Bypass               | User didn't bypass after intervention  | >75%
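The targets above amount to a gate a variant must clear before promotion. A minimal sketch, with metric keys and thresholds that are illustrative rather than Whistl's actual schema:

```python
# Adoption targets from the success-metrics table (illustrative keys)
TARGETS = {
    "acceptance": 0.70,       # intervention acceptance rate
    "urge_pass": 0.60,        # urge didn't return within 2 hours
    "step_completion": 0.80,  # full step completed
    "helpfulness": 4.0,       # mean star rating out of 5
    "no_bypass": 0.75,        # no bypass after intervention
}

def meets_targets(observed: dict) -> bool:
    """True only if a variant clears every target; a missing metric
    counts as a failure."""
    return all(observed.get(metric, 0) >= floor
               for metric, floor in TARGETS.items())

print(meets_targets({"acceptance": 0.91, "urge_pass": 0.62,
                     "step_completion": 0.83, "helpfulness": 4.4,
                     "no_bypass": 0.78}))  # True: clears all five
```

Treating a missing metric as a failure keeps a variant from being promoted on incomplete data.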

Current Active Tests

Examples of tests running across the Whistl user base:

Test 1: Acknowledge Message Tone

Test ID: ACK_TONE_001
Status: Running (67% complete)

Variant A (Control): "I hear you. What's driving this?"
  - Sample: 1,234 interventions
  - Acceptance rate: 89%
  - Helpfulness: 4.2/5

Variant B: "I understand this is frustrating. Talk to me."
  - Sample: 1,198 interventions
  - Acceptance rate: 91%
  - Helpfulness: 4.4/5

Variant C: "This is hard. I'm here. What's happening?"
  - Sample: 1,211 interventions
  - Acceptance rate: 87%
  - Helpfulness: 4.3/5

Current leader: Variant B (+2% acceptance, +0.2 helpfulness)

Test 2: Breathing Duration

Test ID: BREATHE_DURATION_002
Status: Running (45% complete)

Variant A (Control): 2 minutes
  - Sample: 892 interventions
  - Completion rate: 78%
  - Urge pass rate: 54%

Variant B: 90 seconds
  - Sample: 867 interventions
  - Completion rate: 84%
  - Urge pass rate: 49%

Variant C: 3 minutes
  - Sample: 901 interventions
  - Completion rate: 71%
  - Urge pass rate: 58%

Current leader: Variant A (best balance of completion and effectiveness)

Test 3: Visualization Format

Test ID: VISUAL_FORMAT_003
Status: Complete - Variant B Winner

Variant A (Control): Text-only goal reminder
  - Sample: 2,100 interventions
  - Motivation increase: 45%

Variant B: Goal image + progress bar
  - Sample: 2,087 interventions
  - Motivation increase: 61% ✓ WINNER

Variant C: Goal image + time travel projection
  - Sample: 2,134 interventions
  - Motivation increase: 58%

Result: Variant B promoted to all users

From Test to Production

When a test completes, winning variants are rolled out:

Rollout Process

  1. Statistical validation: Confirm significance and effect size
  2. Segment analysis: Check if winner varies by user type
  3. Gradual rollout: 10% → 50% → 100% over 1 week
  4. Monitoring: Watch for unexpected effects
  5. Documentation: Update intervention library
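Step 1, statistical validation, typically comes down to a two-proportion z-test on the winning metric. A minimal sketch using Test 1's acceptance numbers (the counts are reconstructed from the reported rates, so they are approximate):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for H0: the two underlying rates are equal,
    using a pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

# Variant A: ~89% of 1,234; Variant B: ~91% of 1,198
p = two_proportion_p_value(1098, 1234, 1090, 1198)
print(p)  # roughly 0.10: above 0.05, so the test rightly keeps running
```

This illustrates why Test 1 is still marked "Running": a 2-point lead at these sample sizes is suggestive but not yet significant at the 95% level.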

Personalisation Override

Even winning variants respect personal preferences:

  • Coaching style: Tough Love users still get Tough Love variants
  • Step order: Personal step ordering takes precedence
  • Opt-out: Users can disable experimental features
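These override rules can be expressed as a filter applied before test assignment. A hypothetical sketch; field names such as coaching_style and opted_out are illustrative, not Whistl's actual data model:

```python
def eligible_variants(user: dict, test: dict) -> list:
    """Filter a test's variants by the user's personal settings.

    Hypothetical rules: opted-out users only ever see the control,
    and styled variants must match the user's coaching style.
    """
    if user.get("opted_out"):
        return [test["control"]]
    style = user.get("coaching_style")
    pool = [v for v in test["variants"]
            if v.get("style") is None or v["style"] == style]
    return pool or [test["control"]]

test = {
    "control": {"id": "A", "style": None},
    "variants": [{"id": "A", "style": None},
                 {"id": "B", "style": "gentle"},
                 {"id": "C", "style": "tough_love"}],
}
ids = [v["id"] for v in eligible_variants({"coaching_style": "tough_love"}, test)]
print(ids)  # ['A', 'C']: a Tough Love user never sees the gentle variant
```

Falling back to the control when no variant matches ensures the override can never leave a user without an intervention.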

Ethical Considerations

Whistl's A/B testing follows ethical guidelines:

Ethical Principles

  • No harmful variants: All variants must be supportive, not punitive
  • Crisis exclusion: Users in crisis don't receive test variants
  • Transparency: Users can view active tests in settings
  • Opt-out available: Users can choose control variants only

Data Privacy

  • Anonymous aggregation: Test results are aggregated, not individual
  • No external sharing: Test data stays within Whistl
  • Minimal collection: Only data needed for testing is collected

Effectiveness Improvements

A/B testing has driven measurable improvements:

Cumulative Impact (12 Months)

Metric                          | Baseline | Current | Improvement
Overall Intervention Acceptance | 64%      | 73%     | +9 pts
Urge Pass Rate                  | 52%      | 61%     | +9 pts
User Satisfaction               | 4.1/5    | 4.6/5   | +12%
Step Completion Rate            | 67%      | 78%     | +11 pts

User Testimonials

"I noticed the messages changed over time. They got... better? More helpful. Didn't realise they were testing stuff." — Marcus, 28

"The breathing timer changed from 2 minutes to something else and back. Asked support—they said they were testing what works. Cool that they care about getting it right." — Sarah, 34

"Love that Whistl is always improving. It's not static software—it's getting smarter." — Jake, 31

Conclusion

Whistl's A/B Testing Engine ensures that every intervention is backed by evidence, not just intuition. By continuously testing and learning, Whistl gets more effective over time—for every user.

This is behaviour change science in action: hypothesis, test, learn, improve. Repeat forever.

Experience Evidence-Based Protection

Whistl's interventions are continuously tested and improved. Download free and benefit from ongoing optimisation.

Download Whistl Free

Related: 8-Step Negotiation Engine | Step Effectiveness Tracking | Intervention Type Predictor