Feature Importance for Risk Signal Interpretation: What Drives Impulse Predictions
Whistl tracks 27 Risk Signals—but not all signals matter equally for every prediction. Feature importance techniques reveal which signals drive each prediction, enabling transparent explanations and helping users understand their unique risk patterns.
Why Feature Importance Matters
When Whistl predicts elevated impulse risk, users deserve to know why. Feature importance answers critical questions:
- Which of the 27 Risk Signals contributed most to this prediction?
- Is stress level more important than location for this user?
- How does time since payday interact with spending velocity?
- Which signals should the user focus on to reduce risk?
Beyond transparency, feature importance guides model improvement, identifies redundant signals, and reveals insights about behavioural patterns across the user base.
Methods for Calculating Feature Importance
Whistl uses multiple complementary techniques to assess feature importance:
1. Permutation Importance
Permutation importance measures how much model performance drops when a feature's values are randomly shuffled. If shuffling a feature causes a large performance drop, that feature is important.
```python
import numpy as np
from sklearn.metrics import roc_auc_score

def calculate_permutation_importance(model, X, y, n_repeats=10):
    """
    Calculate permutation importance for all features.

    Args:
        model: Trained model
        X: Feature matrix (NumPy array)
        y: True labels
        n_repeats: Number of shuffles per feature

    Returns:
        List of (feature_name, importance) tuples, sorted descending
    """
    # Baseline score on unmodified data
    baseline_score = roc_auc_score(y, model.predict_proba(X)[:, 1])
    importance_scores = {}
    for feature_idx in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            # Shuffle one feature column in a copy of the data
            X_shuffled = X.copy()
            np.random.shuffle(X_shuffled[:, feature_idx])
            # Score with the shuffled feature
            shuffled_score = roc_auc_score(
                y,
                model.predict_proba(X_shuffled)[:, 1]
            )
            # Importance = drop in performance caused by shuffling
            scores.append(baseline_score - shuffled_score)
        importance_scores[f'feature_{feature_idx}'] = np.mean(scores)
    # Rank features by mean performance drop
    return sorted(
        importance_scores.items(),
        key=lambda x: x[1],
        reverse=True
    )
```
Advantages: Model-agnostic, captures non-linear relationships, easy to interpret.
Limitations: Computationally expensive for large datasets, can be biased by correlated features.
2. SHAP Values (SHapley Additive exPlanations)
SHAP values provide a game-theoretic approach to feature importance. Based on Shapley values from cooperative game theory, SHAP fairly distributes the difference between a prediction and the model's average prediction among all features.
```python
import shap
import numpy as np

class SHAPFeatureImportance:
    def __init__(self, model, background_data, feature_names):
        """
        Initialize SHAP explainer.

        Args:
            model: Trained model
            background_data: Representative sample for the baseline
            feature_names: List of feature names
        """
        self.model = model
        self.feature_names = feature_names
        # KernelExplainer is model-agnostic: it only needs predict_proba
        self.explainer = shap.KernelExplainer(
            model.predict_proba,
            background_data
        )

    def get_global_importance(self, X_sample):
        """
        Calculate global feature importance across a dataset.
        """
        # For a binary classifier, KernelExplainer returns one array per
        # class; take the positive-class (index 1) values
        shap_values = self.explainer.shap_values(X_sample)[1]
        # Mean absolute SHAP value per feature
        importance = np.mean(np.abs(shap_values), axis=0)
        # Create ranked list
        feature_importance = list(zip(self.feature_names, importance))
        feature_importance.sort(key=lambda x: x[1], reverse=True)
        return feature_importance

    def get_local_importance(self, single_instance):
        """
        Get feature importance for a single prediction.
        """
        # Positive-class SHAP values for this one instance
        shap_values = self.explainer.shap_values(single_instance)[1]
        # Format as explanation
        explanation = {
            'base_value': self.explainer.expected_value[1],
            'prediction': self.model.predict_proba(single_instance)[0, 1],
            'feature_contributions': []
        }
        for i, name in enumerate(self.feature_names):
            contribution = shap_values[0, i]
            explanation['feature_contributions'].append({
                'feature': name,
                'value': single_instance[0, i],
                'contribution': contribution,
                'direction': 'increases' if contribution > 0 else 'decreases'
            })
        # Sort by absolute contribution
        explanation['feature_contributions'].sort(
            key=lambda x: abs(x['contribution']),
            reverse=True
        )
        return explanation

# Example usage
shap_importance = SHAPFeatureImportance(
    model=whistl_model,
    background_data=X_train[:100],
    feature_names=RISK_SIGNAL_NAMES
)
global_importance = shap_importance.get_global_importance(X_test)
```
Advantages: Theoretically grounded, consistent (contributions sum to the gap between the prediction and the baseline), works for any model.
Limitations: Computationally intensive, requires background data selection.
3. Tree-Based Feature Importance
Tree-based models (Random Forests, XGBoost) provide built-in feature importance based on how much each feature reduces impurity across all splits.
```python
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

def get_tree_feature_importance(model, feature_names):
    """
    Extract impurity-based feature importance from a tree-based model.
    """
    importances = model.feature_importances_
    # Create DataFrame for easy viewing
    importance_df = pd.DataFrame({
        'feature': feature_names,
        'importance': importances
    })
    # Sort by importance
    return importance_df.sort_values('importance', ascending=False)

# Example output from Whistl's Random Forest:
#                   feature  importance
# 0            stress_level      0.1823
# 1       time_since_payday      0.1547
# 2           location_risk      0.1234
# 3       spending_velocity      0.1089
# 4       category_momentum      0.0923
# 5           sleep_quality      0.0812
# 6          social_context      0.0701
# 7   accountability_active      0.0634
# 8        time_of_day_risk      0.0567
# 9     merchant_risk_score      0.0489
# ...
```
Advantages: Free to compute once the model is trained, no extra inference passes.
Limitations: Impurity-based importance can favour high-cardinality features and is measured on training data, so it may overstate features the model overfits to.
The 27 Risk Signals Ranked by Importance
Across Whistl's entire user base, here's how the 27 Risk Signals rank by average importance:
| Rank | Risk Signal | Category | Importance |
|---|---|---|---|
| 1 | Stress Level | Emotional | 18.2% |
| 2 | Time Since Payday | Temporal | 15.5% |
| 3 | Location Risk Score | Contextual | 12.3% |
| 4 | Spending Velocity | Behavioural | 10.9% |
| 5 | Category Momentum | Behavioural | 9.2% |
| 6 | Sleep Quality | Physiological | 8.1% |
| 7 | Social Context | Contextual | 7.0% |
| 8 | Accountability Active | Protective | 6.3% |
| 9 | Time of Day Risk | Temporal | 5.7% |
| 10 | Merchant Risk Score | Contextual | 4.9% |
Note: Importance percentages are approximate and vary by user. Remaining 17 signals account for ~12% combined.
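As a quick sanity check on the table, the per-signal importances can be rolled up to the category level. The sketch below is illustrative, not part of the Whistl codebase; the values are the approximate global averages quoted in the table above:

```python
# Map each ranked signal to its category and approximate importance,
# following the table above
signal_categories = {
    'stress_level': ('Emotional', 0.182),
    'time_since_payday': ('Temporal', 0.155),
    'location_risk': ('Contextual', 0.123),
    'spending_velocity': ('Behavioural', 0.109),
    'category_momentum': ('Behavioural', 0.092),
    'sleep_quality': ('Physiological', 0.081),
    'social_context': ('Contextual', 0.070),
    'accountability_active': ('Protective', 0.063),
    'time_of_day_risk': ('Temporal', 0.057),
    'merchant_risk_score': ('Contextual', 0.049),
}

def importance_by_category(signals):
    """Sum per-signal importance into category totals, ranked descending."""
    totals = {}
    for category, importance in signals.values():
        totals[category] = totals.get(category, 0.0) + importance
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

category_ranking = importance_by_category(signal_categories)
```

Interestingly, while Stress Level tops the individual ranking, the three Contextual signals combined (location, social context, merchant risk) carry more weight than any single category.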
Personalised Feature Importance
Global importance rankings don't tell the whole story. Different users have different risk patterns:
User-Specific Importance
```python
def get_personalised_importance(model, user_data, feature_names):
    """
    Calculate feature importance specific to one user's patterns.
    """
    # Use SHAP for user-specific explanations, with a slice of the
    # user's own history as the background sample
    explainer = shap.KernelExplainer(model.predict_proba, user_data[:50])
    # Positive-class SHAP values for a binary classifier
    shap_values = explainer.shap_values(user_data)[1]
    # Mean absolute SHAP for this user
    user_importance = np.mean(np.abs(shap_values), axis=0)
    # Rank features for this user
    user_ranking = list(zip(feature_names, user_importance))
    user_ranking.sort(key=lambda x: x[1], reverse=True)
    return user_ranking

# Example: Sarah's personal risk signal ranking
sarah_importance = [
    ('stress_level', 0.24),       # Much higher than average
    ('social_context', 0.18),     # Social spending is her trigger
    ('time_since_payday', 0.12),
    ('location_risk', 0.10),
    ('category_momentum', 0.09),
    # ...
]

# Example: Marcus's personal risk signal ranking
marcus_importance = [
    ('time_of_day_risk', 0.22),   # Late-night spending is his issue
    ('sleep_quality', 0.19),      # Poor sleep → poor decisions
    ('stress_level', 0.15),
    ('location_risk', 0.13),
    ('spending_velocity', 0.11),
    # ...
]
```
Context-Dependent Importance
Feature importance can also vary by context. For the same user:
- Weekday predictions: Stress level and time pressure dominate
- Weekend predictions: Social context and location matter more
- Post-payday: Time since payday is highly predictive
- End of month: Financial stress signals increase in importance
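One hedged way to quantify context dependence: compute mean absolute per-prediction contributions (e.g., SHAP values) separately inside and outside a context slice. The sketch below assumes you already have a matrix of per-instance contributions and a boolean mask for the context; the data and names are illustrative, not Whistl's actual values:

```python
import numpy as np

def importance_by_context(contributions, context_mask, feature_names):
    """Rank features separately inside and outside a context slice.

    contributions: (n_samples, n_features) per-prediction attribution
                   matrix (e.g., SHAP values).
    context_mask:  boolean array, True for samples in the context
                   (e.g., weekend predictions).
    """
    rankings = {}
    for label, mask in [('in_context', context_mask),
                        ('out_of_context', ~context_mask)]:
        mean_abs = np.mean(np.abs(contributions[mask]), axis=0)
        ranked = sorted(zip(feature_names, mean_abs),
                        key=lambda kv: kv[1], reverse=True)
        rankings[label] = ranked
    return rankings

# Illustrative data: social_context dominates on weekends,
# stress_level dominates on weekdays
rng = np.random.default_rng(0)
weekend = np.array([True] * 50 + [False] * 50)
contribs = rng.normal(0, 0.01, size=(100, 2))
contribs[weekend, 1] += 0.3   # strong social_context on weekends
contribs[~weekend, 0] += 0.3  # strong stress_level on weekdays

ranks = importance_by_context(contribs, weekend,
                              ['stress_level', 'social_context'])
```

The same helper works for any slicing variable: payday proximity, end of month, or time of day.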
Feature Interactions
Features don't act in isolation. Feature interaction analysis reveals how signals combine:
SHAP Interaction Values
```python
# SHAP interaction values require a tree-based model and TreeExplainer
# (KernelExplainer does not support them)
tree_explainer = shap.TreeExplainer(model)
shap_interaction_values = tree_explainer.shap_interaction_values(X_sample)

# Interaction between stress (feature 0) and time_since_payday (feature 1):
# shows how the stress effect changes based on payday proximity
interaction_effect = shap_interaction_values[:, 0, 1]

# Visualise the interaction
shap.dependence_plot(
    (0, 1),  # Interaction between feature 0 and feature 1
    shap_interaction_values,
    X_sample,
    feature_names=feature_names
)
```
Key interactions discovered in Whistl data:
- Stress × Time Since Payday: Stress is more predictive right after payday
- Location × Time of Day: Certain locations are only risky at specific times
- Sleep × Stress: Poor sleep amplifies the effect of stress
- Accountability × All Signals: Active accountability reduces impact of all risk signals
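The last interaction can be checked without full interaction values, using a simple stratified comparison: split predictions by whether accountability was active and compare the average contribution magnitude of every other signal in each group. A minimal numpy sketch on illustrative data (not Whistl's):

```python
import numpy as np

def dampening_ratio(contributions, accountability_on):
    """Per-feature ratio of mean |contribution| when accountability is
    active vs inactive. Ratios below 1.0 suggest the signal's influence
    is dampened while accountability is on."""
    on = np.mean(np.abs(contributions[accountability_on]), axis=0)
    off = np.mean(np.abs(contributions[~accountability_on]), axis=0)
    return on / off

# Illustrative data: every signal's contribution is halved
# when accountability is active
rng = np.random.default_rng(1)
base = rng.normal(0, 0.2, size=(200, 4))
active = np.array([True] * 100 + [False] * 100)
base[active] *= 0.5

ratios = dampening_ratio(base, active)
```

This stratified view is coarser than SHAP interaction values but is cheap to compute and easy to explain to users.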
Using Feature Importance for User Insights
Feature importance isn't just for model debugging—it provides actionable insights for users:
Personal Risk Profile
```python
def generate_risk_profile(user_importance, global_importance):
    """
    Generate personalised risk profile comparing user to average.
    """
    profile = {
        'primary_triggers': [],
        'secondary_triggers': [],
        'protective_factors': [],
        'unique_patterns': []
    }
    for feature, user_imp in user_importance[:5]:
        global_imp = dict(global_importance).get(feature, 0)
        ratio = user_imp / global_imp if global_imp > 0 else float('inf')
        if ratio > 1.5:
            profile['primary_triggers'].append({
                'feature': feature,
                'user_importance': user_imp,
                'vs_average': f"{ratio:.1f}x higher than average"
            })
        elif ratio > 1.0:
            profile['secondary_triggers'].append(feature)
        elif ratio < 0.5:
            profile['unique_patterns'].append({
                'feature': feature,
                'note': 'Less influential for you than typical users'
            })
    # Identify protective factors (features that reduce risk)
    protective = ['accountability_active', 'goal_clarity', 'mindfulness_practice']
    profile['protective_factors'] = [
        f for f in protective
        if dict(user_importance).get(f, 0) > 0.05
    ]
    return profile

# Example output for user:
# {
#     'primary_triggers': [
#         {'feature': 'social_context', 'user_importance': 0.18,
#          'vs_average': '2.6x higher than average'}
#     ],
#     'secondary_triggers': ['stress_level', 'location_risk'],
#     'protective_factors': ['accountability_active'],
#     'unique_patterns': [
#         {'feature': 'time_since_payday',
#          'note': 'Less influential for you than typical users'}
#     ]
# }
```
"Seeing my personal risk profile was eye-opening. I thought stress was my main trigger, but Whistl showed me that social situations were actually 3x more predictive for me. That insight changed how I approach spending—I now plan ahead for social events instead of just trying to 'stress less'."
Feature Selection and Model Efficiency
Feature importance guides model optimisation by identifying redundant or low-value signals:
Recursive Feature Elimination
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Start with all 27 signals, iteratively remove the least important,
# and measure performance at each step
model = RandomForestClassifier(n_estimators=100)
rfe = RFE(estimator=model, n_features_to_select=15, step=1)
rfe.fit(X_train, y_train)

# Get ranking (1 = selected, higher = eliminated earlier)
feature_ranking = list(zip(feature_names, rfe.ranking_))

# Results: Can reduce from 27 to 15 features with minimal
# performance loss -- important for mobile efficiency
```
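The "minimal performance loss" claim is worth verifying on your own data. A hedged sketch using synthetic stand-in data (27 features to mirror the Risk Signals; not Whistl's actual dataset) that compares cross-validated AUC before and after elimination:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 27 Risk Signals
X, y = make_classification(n_samples=500, n_features=27,
                           n_informative=10, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
rfe = RFE(estimator=model, n_features_to_select=15, step=1)
X_reduced = rfe.fit_transform(X, y)

# Note: selecting features on the same data used for CV is slightly
# optimistic; nest the selection inside the CV loop for a strict estimate
full_auc = cross_val_score(model, X, y, cv=5, scoring='roc_auc').mean()
reduced_auc = cross_val_score(model, X_reduced, y, cv=5,
                              scoring='roc_auc').mean()
```

If the reduced-set AUC sits within noise of the full-set AUC, the dropped signals are safe to exclude from the on-device model.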
The Future of Feature Analysis
Whistl continues to advance feature importance techniques:
- Causal feature importance: Distinguishing correlation from causation
- Time-varying importance: Tracking how feature importance changes over time
- Counterfactual importance: Measuring importance through hypothetical interventions
- Group fairness analysis: Ensuring feature importance doesn't encode bias
Getting Started with Whistl
Understand your unique risk patterns with Whistl's personalised feature importance analysis. See which signals matter most for your spending behaviour and focus your efforts where they'll have the greatest impact.
Discover Your Personal Risk Patterns
Join thousands of Australians using Whistl to understand which risk signals matter most for their unique spending behaviour.
Crisis Support Resources
If you're experiencing severe financial distress or gambling-related harm, professional support is available:
- Gambling Help: 1800 858 858 (24/7, free and confidential)
- Lifeline: 13 11 14 (24/7 crisis support)
- Beyond Blue: 1300 22 4636 (mental health support)
- Financial Counselling Australia: 1800 007 007