Gradient Boosting for Spending Classification: The Engine Behind Accurate Predictions

Gradient boosting algorithms—XGBoost, LightGBM, and CatBoost—dominate machine learning competitions and production systems alike. Discover how Whistl leverages these powerful algorithms to classify spending behaviour with exceptional accuracy.

Understanding Gradient Boosting

Gradient boosting builds an ensemble of weak learners (typically decision trees) sequentially, with each new tree correcting the errors of its predecessors. The "gradient" refers to using gradient descent to minimise a loss function.

Unlike Random Forests which train trees independently, gradient boosting trains trees sequentially:

  1. Start with a simple prediction (e.g., mean of target)
  2. Calculate residuals (errors) of current predictions
  3. Train a tree to predict these residuals
  4. Add tree's predictions to current predictions (scaled by learning rate)
  5. Repeat until convergence or maximum trees

The Mathematics of Gradient Boosting
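
The five steps above can be written compactly: at round $m$ the model is updated by a learning-rate-scaled tree fit to the negative gradient of the loss,

```latex
F_m(x) = F_{m-1}(x) + \nu\, h_m(x),
\qquad
h_m \approx -\left.\frac{\partial L\big(y, F(x)\big)}{\partial F(x)}\right|_{F = F_{m-1}}
```

For binary log loss, $L = -\big[y \log \sigma(F) + (1 - y)\log(1 - \sigma(F))\big]$, the negative gradient works out to $y - \sigma(F)$: the "residuals" are literally the gaps between labels and predicted probabilities, which is exactly what each tree is fit to in the simplified implementation below.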

import numpy as np
from sklearn.tree import DecisionTreeRegressor

class SimpleGradientBoostingClassifier:
    """
    Simplified gradient boosting classifier for educational purposes.
    """
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []
        self.initial_prediction = None
    
    def _log_loss(self, y_true, y_pred_proba):
        """Calculate log loss (cross-entropy)."""
        epsilon = 1e-15
        y_pred_proba = np.clip(y_pred_proba, epsilon, 1 - epsilon)
        return -np.mean(y_true * np.log(y_pred_proba) + 
                       (1 - y_true) * np.log(1 - y_pred_proba))
    
    def _sigmoid(self, x):
        """Sigmoid function for binary classification."""
        return 1 / (1 + np.exp(-x))
    
    def fit(self, X, y):
        """
        Train gradient boosting classifier.
        """
        n_samples = len(X)
        
        # Initial prediction (log-odds of positive class)
        self.initial_prediction = np.log(y.mean() / (1 - y.mean()))
        F = np.full(n_samples, self.initial_prediction)
        
        for i in range(self.n_estimators):
            # Convert to probabilities
            proba = self._sigmoid(F)
            
            # Calculate negative gradient (residuals for log loss)
            residuals = y - proba
            
            # Fit tree to residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)
            self.trees.append(tree)
            
            # Update predictions
            F += self.learning_rate * tree.predict(X)
        
        return self
    
    def predict_proba(self, X):
        """Predict class probabilities."""
        F = np.full(len(X), self.initial_prediction)
        
        for tree in self.trees:
            F += self.learning_rate * tree.predict(X)
        
        proba = self._sigmoid(F)
        return np.column_stack([1 - proba, proba])
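
To see the sequential scheme converge, here is a minimal end-to-end run on synthetic data (the toy dataset, tree depth, learning rate, and round count are illustrative choices, not Whistl's):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

def log_loss(y, p, eps=1e-15):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Start from the log-odds of the positive class
F = np.full(len(X), np.log(y.mean() / (1 - y.mean())))
losses = []
for _ in range(50):
    p = 1 / (1 + np.exp(-F))                 # current probabilities
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, y - p)                       # fit the negative gradient
    F += 0.1 * tree.predict(X)               # learning-rate-scaled update
    losses.append(log_loss(y, 1 / (1 + np.exp(-F))))

print(f"training log loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Each round lowers the training loss; in practice the round count is chosen by early stopping on a held-out set rather than fixed in advance.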

XGBoost: Extreme Gradient Boosting

XGBoost is the most popular gradient boosting implementation, known for its speed and performance. Whistl uses XGBoost as a core component of our spending classification pipeline.

Key XGBoost Features

  - Regularised objective: built-in L1/L2 penalties on leaf weights discourage overfitting
  - Second-order optimisation: splits are scored using both gradients and hessians of the loss
  - Sparsity-aware splits: missing values learn a default direction at each node
  - Parallel, cache-aware tree construction for fast training
  - Built-in early stopping against a validation set
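
One of XGBoost's defining features is second-order (Newton) optimisation: leaf weights and split gains are computed from per-sample gradients and hessians of the loss. A worked toy example (the numbers are made up; `lam` and `gamma` mirror XGBoost's `reg_lambda` and `gamma` parameters):

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf weight from gradient/hessian sums: w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, mask, lam=1.0, gamma=0.0):
    """Gain of splitting a node by `mask`: score(left) + score(right) - score(parent)."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    gain = 0.5 * (score(g[mask], h[mask]) + score(g[~mask], h[~mask]) - score(g, h))
    return gain - gamma

# Toy node with 6 samples: log-loss gradients (p - y) and hessians p(1 - p)
g = np.array([-0.8, -0.7, -0.6, 0.5, 0.6, 0.7])
h = np.array([0.16, 0.21, 0.24, 0.25, 0.24, 0.21])
mask = np.array([True, True, True, False, False, False])  # candidate split

print(f"leaf weight: {leaf_weight(g, h):.3f}")
print(f"split gain:  {split_gain(g, h, mask):.3f}")
```

A split is kept only when its gain exceeds zero after the `gamma` penalty, which is how `gamma` in the configuration below prunes marginal splits.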

XGBoost Configuration for Whistl

import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Base configuration for spending classification
xgb_params = {
    'objective': 'binary:logistic',      # Binary classification
    'eval_metric': 'auc',                 # Optimize for AUC
    'max_depth': 6,                       # Tree depth (controls complexity)
    'learning_rate': 0.05,                # Step size shrinkage
    'n_estimators': 200,                  # Number of trees
    'subsample': 0.8,                     # Row subsampling (reduces overfitting)
    'colsample_bytree': 0.8,              # Column subsampling
    'colsample_bylevel': 0.8,             # Subsampling per level
    'reg_alpha': 0.1,                     # L1 regularisation
    'reg_lambda': 1.0,                    # L2 regularisation
    'scale_pos_weight': 3.0,              # Handle class imbalance
    'min_child_weight': 3,                # Minimum samples per leaf
    'gamma': 0.1,                         # Minimum loss reduction for split
    'random_state': 42
}

# Create and train model (in XGBoost >= 2.0, early_stopping_rounds is a
# constructor argument rather than a fit() argument)
model = xgb.XGBClassifier(**xgb_params, early_stopping_rounds=20)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_val, y_val)],  # validation set for early stopping
    verbose=True
)

# The best iteration (by validation AUC) is selected automatically
best_model = model

Hyperparameter Tuning with XGBoost

# Grid search for optimal hyperparameters
param_grid = {
    'max_depth': [4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 200, 300],
    'subsample': [0.7, 0.8, 0.9],
    'colsample_bytree': [0.7, 0.8, 0.9],
    'scale_pos_weight': [2, 3, 4]
}

grid_search = GridSearchCV(
    estimator=xgb.XGBClassifier(objective='binary:logistic', random_state=42),
    param_grid=param_grid,
    scoring='roc_auc',
    cv=5,
    verbose=1,
    n_jobs=-1
)

grid_search.fit(X_train, y_train)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best AUC: {grid_search.best_score_:.4f}")

LightGBM: Light Gradient Boosting Machine

LightGBM, developed by Microsoft, offers faster training and lower memory usage than XGBoost while maintaining comparable accuracy. It's particularly well-suited for Whistl's mobile deployment.

LightGBM Innovations

  - Histogram-based split finding: features are bucketed into discrete bins, slashing split-search cost
  - Leaf-wise (best-first) tree growth instead of level-wise, reaching lower loss at the same leaf count
  - Gradient-based One-Side Sampling (GOSS): keeps large-gradient rows and subsamples the rest
  - Exclusive Feature Bundling (EFB): merges mutually exclusive sparse features into one
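
LightGBM's headline speed win comes from histogram-based split finding: quantile-binning a continuous feature collapses on the order of 100,000 candidate split points to at most 255 (matching LightGBM's default `max_bin`). A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
feature = rng.normal(size=100_000)

# Bucket the feature into 255 quantile bins once, up front;
# split search then only considers bin boundaries
edges = np.quantile(feature, np.linspace(0, 1, 256)[1:-1])
binned = np.searchsorted(edges, feature).astype(np.uint8)

print(f"candidate split points: {len(np.unique(feature))} -> {len(np.unique(binned))}")
```

The binned feature also fits in a single byte per value, which is where much of the memory saving comes from.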

LightGBM for Spending Classification

import lightgbm as lgb

# LightGBM configuration
lgb_params = {
    'objective': 'binary',
    'metric': 'auc',
    'boosting_type': 'gbdt',          # Gradient Boosting
    'num_leaves': 31,                  # Max leaves (alternative to max_depth)
    'learning_rate': 0.05,
    'feature_fraction': 0.8,           # Similar to colsample_bytree
    'bagging_fraction': 0.8,           # Similar to subsample
    'bagging_freq': 5,                 # Perform bagging every 5 iterations
    'min_child_samples': 20,           # Minimum samples per leaf
    'reg_alpha': 0.1,
    'reg_lambda': 1.0,
    'scale_pos_weight': 3.0,
    'verbose': -1
}

# Create datasets
train_data = lgb.Dataset(X_train, label=y_train, feature_name=feature_names)
val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)

# Train model (LightGBM >= 4.0 moved early stopping and logging into callbacks)
model = lgb.train(
    lgb_params,
    train_data,
    num_boost_round=500,
    valid_sets=[train_data, val_data],
    valid_names=['train', 'valid'],
    callbacks=[
        lgb.early_stopping(stopping_rounds=50),
        lgb.log_evaluation(period=50)
    ]
)

# Make predictions
predictions = model.predict(X_test)

CatBoost: Categorical Boosting

CatBoost, developed by Yandex, excels at handling categorical features—common in spending data (merchant categories, location types, etc.). It automatically handles categorical variables without extensive preprocessing.
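
Roughly how that works: CatBoost's "ordered" target statistics replace a category with a smoothed running mean of the target over only the rows that precede it in a random permutation, so no row ever sees its own label. A simplified NumPy sketch (the prior and smoothing weight are illustrative, not CatBoost's exact defaults):

```python
import numpy as np

def ordered_target_encode(categories, target, prior=0.5, weight=1.0):
    """Encode each row's category using target stats from preceding rows only."""
    rng = np.random.default_rng(0)
    order = rng.permutation(len(categories))
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + prior * weight) / (n + weight)  # smoothed running mean
        sums[c] = s + target[i]
        counts[c] = n + 1
    return encoded

cats = np.array(['food', 'food', 'travel', 'food', 'travel'])
y = np.array([1, 0, 1, 1, 0])
print(ordered_target_encode(cats, y))
```

The first time a category is seen it falls back to the prior, and the encoding tightens toward the category's true target rate as more rows accumulate.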

CatBoost Advantages

  - Native categorical handling via ordered target statistics, with no manual encoding
  - Ordered boosting, which counters the prediction shift (target leakage) of classical boosting
  - Symmetric (oblivious) trees, enabling very fast inference
  - Strong out-of-the-box results with default hyperparameters

CatBoost for Spending Classification

from catboost import CatBoostClassifier, Pool

# Identify categorical features
categorical_features = [
    'merchant_category',
    'location_type',
    'day_of_week',
    'payment_method',
    'accountability_partner_id'
]

# Create CatBoost pool (handles categorical features)
train_pool = Pool(
    X_train,
    y_train,
    cat_features=categorical_features,
    feature_names=feature_names
)

val_pool = Pool(
    X_val,
    y_val,
    cat_features=categorical_features,
    feature_names=feature_names
)

# CatBoost configuration
catboost_params = {
    'iterations': 500,
    'learning_rate': 0.05,
    'depth': 6,
    'loss_function': 'Logloss',
    'eval_metric': 'AUC',
    'task_type': 'GPU',  # requires a GPU; set to 'CPU' otherwise
    'early_stopping_rounds': 50,
    'use_best_model': True,
    'random_seed': 42,
    'scale_pos_weight': 3.0,
    'l2_leaf_reg': 3.0,
    'bagging_temperature': 0.8,
    'border_count': 254
}

# Train model
model = CatBoostClassifier(**catboost_params)
model.fit(
    train_pool,
    eval_set=val_pool,
    verbose=50
)

# Feature importance with categorical features
importance = model.get_feature_importance()
for name, imp in sorted(zip(feature_names, importance), key=lambda x: -x[1])[:10]:
    print(f"{name}: {imp:.4f}")

Comparing Gradient Boosting Implementations

Whistl has benchmarked all three implementations on spending classification tasks:

Metric                 XGBoost   LightGBM    CatBoost
AUC-ROC                0.892     0.889       0.894
Training Time          45s       18s         62s
Prediction Time        12ms      8ms         15ms
Memory Usage           Medium    Low         High
Categorical Handling   Manual    Manual      Automatic
Mobile Deployment      Good      Excellent   Good
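
The implementations need not compete: the testimonial later in this article mentions that Whistl ensembles XGBoost and LightGBM, and the simplest version of that is a weighted soft vote over the two models' probabilities. A toy sketch (the probability arrays are made up; the weights reuse the benchmark AUCs above):

```python
import numpy as np

# Hypothetical probability outputs from two models on five transactions
p_xgb = np.array([0.91, 0.12, 0.55, 0.08, 0.73])
p_lgb = np.array([0.87, 0.18, 0.48, 0.11, 0.69])

# Soft vote weighted by each model's validation AUC
w = np.array([0.892, 0.889])
w /= w.sum()
p_blend = w @ np.vstack([p_xgb, p_lgb])

print(np.round(p_blend, 3))
```

Because the models make partially uncorrelated errors, the blend often edges out either model alone on held-out AUC.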

Handling Class Imbalance

Impulse purchases are rare compared to routine transactions. All three implementations offer strategies for handling class imbalance:

Scale Pos Weight

# Calculate scale_pos_weight for imbalanced data
n_negative = (y_train == 0).sum()
n_positive = (y_train == 1).sum()
scale_pos_weight = n_negative / n_positive

# For Whistl data: ~3:1 ratio of non-impulse to impulse
# scale_pos_weight = 3.0

# Apply to XGBoost
xgb_params['scale_pos_weight'] = scale_pos_weight

# Apply to LightGBM
lgb_params['scale_pos_weight'] = scale_pos_weight

# Apply to CatBoost
catboost_params['scale_pos_weight'] = scale_pos_weight

Focal Loss

Focal loss down-weights easy examples and focuses on hard-to-classify cases:

# Custom focal-loss objective for XGBoost. This is a simplified version:
# the exact focal-loss derivatives include extra product-rule terms, but the
# (1 - p_t)^(gamma - 1) down-weighting factor captures the key behaviour.
def focal_loss(pred, dtrain, gamma=2.0):
    """Approximate focal-loss gradient and hessian for class imbalance."""
    p = 1 / (1 + np.exp(-pred))
    y = dtrain.get_label()
    p_t = y * p + (1 - y) * (1 - p)          # probability of the true class
    mod = gamma * (1 - p_t) ** (gamma - 1)   # focal down-weighting factor

    # Down-weighted log-loss gradient
    grad = mod * (p - y)

    # Down-weighted log-loss hessian (kept strictly positive)
    hess = np.maximum(mod * p * (1 - p), 1e-6)

    return grad, hess

# Use the custom objective (xgb.train operates on DMatrix objects)
dtrain = xgb.DMatrix(X_train, label=y_train)
model = xgb.train(
    {'eval_metric': 'auc'},
    dtrain,
    num_boost_round=500,
    obj=focal_loss
)

# Note: with a custom objective, predict() returns raw margins,
# so apply the sigmoid to recover probabilities
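
Independent of the exact gradients, the down-weighting effect is easy to verify numerically (pure NumPy; the example probabilities are toy values): with gamma = 2, a confidently correct prediction contributes orders of magnitude less loss than a badly misclassified one.

```python
import numpy as np

def focal_loss_value(y, p, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t)^gamma,
    where p_t is the probability assigned to the true class."""
    p_t = np.where(y == 1, p, 1 - p)
    return -((1 - p_t) ** gamma) * np.log(p_t)

easy = float(focal_loss_value(1, 0.95))  # confidently correct
hard = float(focal_loss_value(1, 0.30))  # badly wrong
print(f"easy example: {easy:.5f}, hard example: {hard:.5f}")
```

This is why focal loss helps on imbalanced data: the abundant, easily classified routine transactions stop dominating the gradient, leaving the rare, ambiguous impulse purchases to drive learning.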

Model Interpretability with Gradient Boosting

Gradient boosting models provide several interpretability features:

Feature Importance

# XGBoost feature importance
xgb.plot_importance(model, max_num_features=15)

# LightGBM feature importance
lgb.plot_importance(model, max_num_features=15)

# CatBoost has no plot_importance helper; pull importances directly
importance = model.get_feature_importance(prettified=True)
print(importance.head(15))

SHAP Values

import shap

# SHAP for XGBoost
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary plot
shap.summary_plot(shap_values, X_test, feature_names=feature_names)

# Dependence plot for specific feature
shap.dependence_plot('stress_level', shap_values, X_test, feature_names=feature_names)

"The accuracy of Whistl's predictions is impressive. As someone who works in data science, I asked how they achieved it. Learning they use gradient boosting—specifically an ensemble of XGBoost and LightGBM—made perfect sense. These are battle-tested algorithms that dominate Kaggle competitions for good reason."
— Alex T., Whistl user since 2025

Production Deployment Considerations

Deploying gradient boosting models in production requires attention to:

Model Serialization

# Save XGBoost model
model.save_model('whistl_spending_classifier.json')

# Load model
loaded_model = xgb.XGBClassifier()
loaded_model.load_model('whistl_spending_classifier.json')

# For mobile deployment (Core ML for iOS). The unified ct.convert() API does
# not accept XGBoost models; the legacy XGBoost converter (available in older
# coremltools releases) handles boosted-tree models directly.
import coremltools as ct

mlmodel = ct.converters.xgboost.convert(
    model.get_booster(),   # convert the underlying Booster
    mode='classifier',
    n_classes=2
)
mlmodel.save('WhistlSpendingClassifier.mlmodel')

Monitoring and Retraining

from sklearn.metrics import roc_auc_score

def monitor_model_drift(model, X_new, y_new, threshold=0.05):
    """
    Monitor for model drift in production.
    """
    # Calculate current performance
    predictions = model.predict_proba(X_new)[:, 1]
    current_auc = roc_auc_score(y_new, predictions)
    
    # Compare to baseline
    baseline_auc = 0.89  # From validation
    
    drift = baseline_auc - current_auc
    
    if drift > threshold:
        print(f"Warning: Model drift detected! AUC dropped by {drift:.4f}")
        return True  # Trigger retraining
    else:
        return False  # Model still performing well
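
Performance-based checks like the one above need labels, which for spending classification arrive only after users confirm or dismiss predictions. A label-free complement (our sketch, not necessarily Whistl's pipeline; the thresholds are the conventional rules of thumb) is the Population Stability Index over feature distributions:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a production sample of one feature.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])     # fold outliers into end bins
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    eps = 1e-6                                        # guard against empty bins
    e, a = np.clip(e, eps, None), np.clip(a, eps, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)
stable = population_stability_index(baseline, rng.normal(0, 1, 10_000))
shifted = population_stability_index(baseline, rng.normal(1, 1, 10_000))
print(f"no shift: {stable:.3f}, mean shift of 1 sd: {shifted:.3f}")
```

Running PSI on the most important features daily catches input drift well before enough labelled outcomes accumulate to move the AUC.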

Getting Started with Whistl

Experience the power of gradient boosting-powered spending classification. Whistl's AI accurately identifies impulse risk patterns, enabling timely interventions that help you stay on track with your financial goals.

Accurate AI-Powered Spending Classification

Join thousands of Australians using Whistl's gradient boosting-based prediction system for reliable impulse detection.

Crisis Support Resources

If you're experiencing severe financial distress or gambling-related harm, professional support is available:

  - Lifeline: 13 11 14 (24/7 crisis support)
  - National Debt Helpline: 1800 007 007 (free financial counselling)
  - Gambling Help Online: 1800 858 858
