
Explainable AI for Financial Decisions: Why Transparency Matters in Behavioural Prediction

Black box AI might work for recommending movies, but financial decisions demand transparency. Discover how Whistl uses explainable AI techniques to show you exactly why you're receiving an intervention, building trust and enabling meaningful behaviour change.

The Problem with Black Box AI

Modern machine learning models can achieve remarkable accuracy—but often at the cost of interpretability. Deep neural networks with millions of parameters make predictions through complex, non-linear transformations that even their creators struggle to explain.

For entertainment recommendations, this opacity is acceptable. If Netflix can't explain why it suggested a particular show, the stakes are low. But for financial interventions—where an app is essentially telling you "don't spend money right now"—explanations are essential.

Without understanding why a prediction was made, users can't:

  • Judge whether to trust the intervention or override it
  • Take concrete action to reduce their risk
  • Learn anything durable about their own behaviour patterns

Explainability as a Design Principle

At Whistl, explainability isn't an afterthought—it's built into our AI architecture from the ground up. We use a combination of inherently interpretable models and post-hoc explanation techniques to ensure every prediction comes with a clear, actionable explanation.

Explainable AI Techniques in Whistl

Whistl employs multiple XAI techniques, each serving different explanatory purposes:

1. SHAP Values for Feature Importance

SHAP (SHapley Additive exPlanations) values provide a game-theoretic approach to explaining model predictions. Based on cooperative game theory, SHAP values fairly distribute the "credit" for a prediction among all input features.

import shap
import numpy as np

FEATURE_NAMES = [
    'stress_level', 'time_since_payday', 'location_risk',
    'category_momentum', 'sleep_quality', 'social_context',
    'spending_velocity', 'accountability_active'
]

class SHAPExplainer:
    def __init__(self, model, background_data):
        """
        Initialize SHAP explainer with background data.
        Background data represents typical user behaviour.
        """
        self.feature_names = FEATURE_NAMES
        self.explainer = shap.KernelExplainer(model.predict_proba, background_data)
    
    def explain_prediction(self, user_features):
        """
        Generate SHAP explanation for a single prediction.
        Returns feature contributions that sum to the prediction.
        """
        # KernelExplainer returns one array of SHAP values per output class;
        # index 1 selects the positive ("impulse") class.
        shap_values = self.explainer.shap_values(user_features)[1]
        base_rate = self.explainer.expected_value[1]
        
        # Format explanation
        explanation = {
            'base_probability': base_rate,
            'feature_contributions': [],
            'total_risk': shap_values.sum() + base_rate
        }
        
        for i, feature_name in enumerate(self.feature_names):
            contribution = shap_values[i]
            explanation['feature_contributions'].append({
                'feature': feature_name,
                'value': user_features[i],
                'contribution': contribution,
                'direction': 'increases' if contribution > 0 else 'decreases'
            })
        
        # Sort so the strongest drivers (by absolute contribution) come first
        explanation['feature_contributions'].sort(
            key=lambda x: abs(x['contribution']), 
            reverse=True
        )
        
        return explanation

# Example output:
# {
#     'base_probability': 0.15,
#     'feature_contributions': [
#         {'feature': 'stress_level', 'value': 0.82, 'contribution': 0.23, 'direction': 'increases'},
#         {'feature': 'time_since_payday', 'value': 0.1, 'contribution': 0.18, 'direction': 'increases'},
#         {'feature': 'location_risk', 'value': 0.9, 'contribution': 0.12, 'direction': 'increases'},
#         {'feature': 'accountability_active', 'value': 0.0, 'contribution': -0.08, 'direction': 'decreases'},
#     ],
#     'total_risk': 0.60  # 60% impulse risk
# }

This explanation tells the user: "Your risk is 60% (vs. 15% baseline) primarily because your stress level is elevated, it's right after payday, and you're in a high-risk location. Having an accountability partner would reduce your risk."

2. Counterfactual Explanations

Counterfactual explanations answer the question: "What would need to change for the prediction to be different?" This is particularly actionable for users who want to reduce their risk.

import numpy as np

class CounterfactualExplainer:
    def __init__(self, model, feature_ranges, feature_names):
        self.model = model
        self.feature_ranges = feature_ranges  # (lows, highs) arrays, one entry per feature
        self.feature_names = feature_names
    
    def find_counterfactual(self, user_features, target_risk=0.3):
        """
        Find minimal changes needed to achieve the target risk level.
        Uses random-restart search; a gradient-based optimiser can be
        substituted for efficiency.
        """
        current_risk = self.model.predict_proba(user_features)[0, 1]
        
        if current_risk <= target_risk:
            return None  # Already below target
        
        # Search for the closest counterfactual across multiple restarts
        best_counterfactual = None
        best_distance = float('inf')
        
        for _ in range(100):  # Multiple restarts
            counterfactual = self._propose_counterfactual(user_features, target_risk)
            if counterfactual is None:
                continue
            distance = np.linalg.norm(counterfactual - user_features)
            
            if distance < best_distance:
                best_distance = distance
                best_counterfactual = counterfactual
        
        if best_counterfactual is None:
            return None  # Nothing found within the search budget
        
        # Generate human-readable explanation
        return self._format_counterfactual(
            user_features, best_counterfactual, current_risk, target_risk
        )
    
    def _propose_counterfactual(self, features, target_risk):
        """Sample a candidate within feature ranges; keep it only if it hits the target."""
        lows, highs = self.feature_ranges
        candidate = np.random.uniform(lows, highs)
        if self.model.predict_proba(candidate)[0, 1] <= target_risk:
            return candidate
        return None
    
    def _assess_feasibility(self, original, counterfactual):
        """Rough feasibility score in [0, 1]: smaller required changes score higher."""
        mean_change = np.abs(np.asarray(counterfactual) - np.asarray(original)).mean()
        return float(max(0.0, 1.0 - mean_change))
    
    def _format_counterfactual(self, original, counterfactual, current_risk, target_risk):
        """Convert numerical changes to natural language."""
        changes = []
        
        for i, (orig, cf) in enumerate(zip(original, counterfactual)):
            if abs(orig - cf) > 0.1:  # Meaningful change
                direction = 'decrease' if cf < orig else 'increase'
                changes.append(f"{direction} {self.feature_names[i].replace('_', ' ')}")
        
        return {
            'current_risk': current_risk,
            'target_risk': target_risk,
            'required_changes': changes,
            'feasibility': self._assess_feasibility(original, counterfactual)
        }

Example counterfactual: "To reduce your impulse risk from 60% to 30%, you could: wait 24 hours (reduces time pressure), move to a different location (avoid trigger environment), or activate your accountability partner (add social support)."

3. Attention Visualisation

For our Transformer-based models, attention weights provide natural explanations by showing which historical events the model considers most relevant.

def visualize_attention(attention_weights, transactions, top_k=5):
    """
    Visualize which past transactions the model is attending to.
    
    Args:
        attention_weights: Attention matrix from Transformer
        transactions: List of past transactions with metadata
        top_k: Number of top attended transactions to show
    
    Returns:
        Human-readable explanation of attention pattern
    """
    # Get attention to current prediction (last row)
    current_attention = attention_weights[-1, :]
    
    # Get top-k attended time steps
    top_indices = np.argsort(current_attention)[-top_k:][::-1]
    
    explanation = []
    for idx in top_indices:
        tx = transactions[idx]
        attention_score = current_attention[idx]
        
        explanation.append({
            'transaction': f"{tx.category} at {tx.merchant}",
            'amount': tx.amount,
            'time_ago': tx.time_ago,
            'attention_score': attention_score,
            'relevance': _explain_relevance(tx, attention_score)
        })
    
    return explanation

def _explain_relevance(transaction, attention_score):
    """Generate natural language explanation of why this transaction matters."""
    if transaction.category in ['entertainment', 'dining']:
        return f"Similar leisure spending {transaction.time_ago} predicts current impulse"
    elif transaction.amount > transaction.average_amount * 1.5:
        return f"Large purchase {transaction.time_ago} indicates elevated spending mood"
    elif transaction.time_of_day == 'late_night':
        return f"Late-night transaction {transaction.time_ago} correlates with reduced impulse control"
    else:
        return f"Pattern from {transaction.time_ago} informs current risk assessment"

4. Rule Extraction from Neural Networks

While neural networks are inherently complex, we can extract approximate rules that capture their decision logic:

from sklearn.tree import DecisionTreeClassifier

class RuleExtractor:
    def __init__(self, neural_model, feature_names):
        self.neural_model = neural_model
        self.feature_names = feature_names
        self.surrogate_model = DecisionTreeClassifier(
            max_depth=4,  # Keep tree shallow for interpretability
            min_samples_leaf=50
        )
    
    def extract_rules(self, data):
        """
        Train a decision tree to approximate neural network behavior.
        The tree's rules approximate the neural net's decision logic.
        """
        # Get neural network predictions
        predictions = self.neural_model.predict(data)
        
        # Train decision tree to mimic neural net
        self.surrogate_model.fit(data, predictions)
        
        # Extract rules from tree
        rules = self._extract_tree_rules(self.surrogate_model)
        
        return rules
    
    def _extract_tree_rules(self, tree):
        """Convert decision tree to human-readable rules."""
        rules = []
        tree_ = tree.tree_
        
        def traverse(node, conditions):
            if tree_.feature[node] != -2:  # -2 marks a leaf in sklearn's Tree
                feature_name = self.feature_names[tree_.feature[node]]
                threshold = tree_.threshold[node]
                
                # Left child (feature <= threshold)
                traverse(
                    tree_.children_left[node],
                    conditions + [f"{feature_name} <= {threshold:.2f}"]
                )
                
                # Right child (feature > threshold)
                traverse(
                    tree_.children_right[node],
                    conditions + [f"{feature_name} > {threshold:.2f}"]
                )
            else:
                # Leaf node: normalise class counts into a probability
                counts = tree_.value[node][0]
                prob_positive = counts[1] / counts.sum()
                if prob_positive > 0.5:
                    rules.append({
                        'conditions': conditions,
                        'prediction': 'HIGH RISK',
                        'confidence': prob_positive
                    })
                else:
                    rules.append({
                        'conditions': conditions,
                        'prediction': 'LOW RISK',
                        'confidence': 1 - prob_positive
                    })
        
        traverse(0, [])
        return rules

# Example extracted rules:
# [
#   {'conditions': ['stress_level > 0.65', 'time_since_payday <= 0.15'], 
#    'prediction': 'HIGH RISK', 'confidence': 0.87},
#   {'conditions': ['location_risk <= 0.30', 'accountability_active > 0.50'], 
#    'prediction': 'LOW RISK', 'confidence': 0.91}
# ]

Presenting Explanations to Users

Technical explanations must be translated into user-friendly language. Whistl uses several presentation strategies:

Natural Language Summaries

Instead of showing raw SHAP values, we generate natural language:

"Your impulse risk is currently high (68%). This is primarily because:

  • Your stress level is elevated (you reported feeling overwhelmed 2 hours ago)
  • It's only 3 days since payday (historically a high-risk period for you)
  • You're currently near a shopping centre you frequently overspend at

What would help: Waiting 24 hours would reduce your risk to 35%. Activating your accountability partner would reduce it further to 22%."
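A summary like the one above can be generated mechanically from the SHAP output shown earlier. The sketch below (the `summarise_explanation` name, template strings, and 0.05 significance threshold are illustrative assumptions, not Whistl's production copy) renders the explanation dict as plain English:

```python
def summarise_explanation(explanation, min_contribution=0.05):
    """Render a SHAP-style explanation dict as plain-English bullet points."""
    risk_pct = round(explanation['total_risk'] * 100)
    base_pct = round(explanation['base_probability'] * 100)
    lines = [f"Your impulse risk is currently {risk_pct}% (vs. {base_pct}% baseline) because:"]
    for item in explanation['feature_contributions']:
        if abs(item['contribution']) < min_contribution:
            continue  # Skip negligible factors to keep the summary short
        verb = 'raises' if item['direction'] == 'increases' else 'lowers'
        points = round(abs(item['contribution']) * 100)
        lines.append(
            f"  • {item['feature'].replace('_', ' ')} {verb} your risk by {points} points"
        )
    return '\n'.join(lines)

summary = summarise_explanation({
    'base_probability': 0.15,
    'total_risk': 0.60,
    'feature_contributions': [
        {'feature': 'stress_level', 'contribution': 0.23, 'direction': 'increases'},
        {'feature': 'accountability_active', 'contribution': -0.08, 'direction': 'decreases'},
    ],
})
print(summary)
```

Keeping only factors above a significance threshold is what prevents the summary from drowning users in marginal features.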

Visual Risk Decomposition

Whistl shows a visual breakdown of risk factors with colour-coded contributions.
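As a rough text-only illustration of the idea (the layout and symbols are assumptions, not Whistl's actual UI), each factor's contribution can be drawn as a signed bar scaled to its magnitude:

```python
def risk_bars(contributions, width=20):
    """Render each factor's contribution as a signed text bar.

    contributions: list of (feature, contribution) pairs; positive
    contributions push risk up ('+'), negative pull it down ('-').
    """
    max_abs = max(abs(c) for _, c in contributions) or 1.0
    rows = []
    for feature, c in contributions:
        length = round(abs(c) / max_abs * width)
        bar = ('+' if c > 0 else '-') * length
        rows.append(f"{feature:<22} {bar} {c:+.2f}")
    return '\n'.join(rows)

chart = risk_bars([
    ('stress_level', 0.23),
    ('time_since_payday', 0.18),
    ('accountability_active', -0.08),
])
print(chart)
```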

Interactive "What If" Explorer

Users can adjust sliders to see how different changes would affect their risk.
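Under the hood, such an explorer only needs to re-score the model with the adjusted features. A minimal sketch, assuming a `predict_proba`-style model interface and a fixed feature order (the `ToyModel` weights here are invented purely to make the example runnable):

```python
import numpy as np

FEATURES = ['stress_level', 'time_since_payday', 'location_risk',
            'accountability_active']

def what_if(model, current_features, overrides):
    """Re-score risk after applying slider overrides ({feature_name: new_value})."""
    adjusted = np.array(current_features, dtype=float)
    for feature, value in overrides.items():
        adjusted[FEATURES.index(feature)] = value
    return model.predict_proba(adjusted.reshape(1, -1))[0, 1]

class ToyModel:
    """Stand-in scorer: linear weights, with accountability pulling risk down."""
    WEIGHTS = np.array([0.4, 0.2, 0.3, -0.3])

    def predict_proba(self, X):
        p = np.clip(X @ self.WEIGHTS + 0.15, 0.0, 1.0)
        return np.column_stack([1 - p, p])

model = ToyModel()
baseline = what_if(model, [0.8, 0.2, 0.9, 0.0], {})
with_partner = what_if(model, [0.8, 0.2, 0.9, 0.0], {'accountability_active': 1.0})
# Activating the accountability partner lowers the toy model's risk score
```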

Building Trust Through Transparency

Explainability serves multiple purposes beyond mere information:

Calibrating Trust

Users learn when to trust the model and when to question it. If the explanation doesn't resonate ("I'm not stressed, the model is wrong"), users can override the intervention. This feedback loop actually improves the model over time.

Enabling Learning

Understanding patterns helps users internalise insights: "I always overspend three days after payday" becomes actionable self-knowledge that persists even without the app.

Supporting Autonomy

Explanations preserve user agency. Instead of "don't spend," the message becomes "here's what's happening, here's your risk, here are your options." Users make informed decisions rather than following blind commands.

"The explanations made all the difference. When Whistl told me I was high-risk because I was stressed and near a trigger location, I actually believed it. I could feel the stress, I knew that location was problematic. The app wasn't controlling me—it was helping me see what I already knew but was ignoring."
— Emma K., Whistl user since 2025

Evaluating Explanation Quality

Not all explanations are equally useful. Whistl evaluates explanations on multiple dimensions:

Dimension         | Description                                            | Measurement
Fidelity          | How accurately does the explanation reflect the model? | Correlation between explanation and prediction
Comprehensibility | How easily do users understand the explanation?        | User comprehension tests
Actionability     | Does the explanation suggest concrete actions?         | User action completion rate
Trust             | Does the explanation build appropriate trust?          | User trust surveys
Satisfaction      | Are users satisfied with the explanation?              | User satisfaction ratings
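Fidelity, for instance, can be measured directly for the surrogate-tree explanations of section 4: score how often the shallow tree agrees with the model it is meant to explain. A minimal sketch with a toy stand-in model (the threshold function and 500-sample setup are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def surrogate_fidelity(model_predict, surrogate, X):
    """Fraction of samples on which the surrogate agrees with the explained model."""
    return float(np.mean(surrogate.predict(X) == model_predict(X)))

# Toy demonstration: a threshold "model" standing in for the neural net,
# and a shallow tree trained to mimic its predictions
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
model_predict = lambda data: (data[:, 0] > 0.65).astype(int)
surrogate = DecisionTreeClassifier(max_depth=2).fit(X, model_predict(X))

fidelity = surrogate_fidelity(model_predict, surrogate, X)
# High fidelity means the extracted rules faithfully describe the model
```

If fidelity drops too low, the extracted rules are describing the tree rather than the model, and the explanation should not be shown to users.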

The Future of Explainable AI in Finance

Explainable AI is rapidly evolving, and Whistl continues to research new explanation techniques as the field matures.

Getting Started with Whistl

Experience AI that explains itself. Whistl's explainable predictions help you understand your financial behaviour while building the self-awareness needed for lasting change.

Transparent AI for Your Financial Wellbeing

Join thousands of Australians using Whistl's explainable AI to understand and improve their spending patterns with complete transparency.

Crisis Support Resources

If you're experiencing severe financial distress or gambling-related harm, professional support is available.
