On-Device ML: Core ML vs TensorFlow Lite Implementation

Whistl runs all AI processing on your device—never in the cloud. This technical deep dive compares Core ML (iOS) and TensorFlow Lite (Android), explaining model conversion, hardware acceleration, performance optimisation, and why on-device processing is essential for financial privacy.

Why On-Device Machine Learning?

Cloud-based AI requires sending sensitive data to remote servers. For a financial behaviour app, this creates unacceptable risks:

  • Privacy exposure: Transaction data, location, biometrics leave your device
  • Latency: Network round-trip adds 100-500ms delay
  • Offline failure: No connectivity = no protection
  • Cost: Cloud inference at scale is expensive

On-device ML solves all four problems while enabling real-time intervention.

Core ML (iOS Implementation)

Apple's Core ML framework provides native machine learning support for iOS, iPadOS, and macOS.

Model Format and Conversion

Whistl's neural network is trained in PyTorch, then converted to Core ML format:

# PyTorch to Core ML conversion
import torch
import coremltools as ct

# Load trained PyTorch model
torch_model = torch.load('whistl_impulse_predictor.pt')
torch_model.eval()

# Create example input (56 features)
example_input = torch.randn(1, 56)

# Trace and convert
traced_model = torch.jit.trace(torch_model, example_input)
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=example_input.shape, name='features')],
    convert_to='mlprogram'  # MIL backend for best performance
)

# Save Core ML model
mlmodel.save('WhistlImpulsePredictor.mlpackage')

Model Architecture in Core ML

Model: WhistlImpulsePredictor
Input: 56 Float32 features
Output: 1 Float32 probability

Layer Configuration:
├── Input (56)
├── Dense(56→128) + ReLU
├── Dense(128→64) + ReLU
├── Dense(64→32) + ReLU
├── Dense(32→1) + Sigmoid
└── Output (1)

Model Size: 450KB (compressed)
Quantisation: Float16 (optional Int8 for smaller size)
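
The layer stack above can be sketched as a plain-Python forward pass. This is illustrative only: the weights here are random placeholders, not the shipped model, and production inference runs through Core ML, not Python.

```python
import math
import random

random.seed(0)

def dense(x, in_dim, out_dim):
    """One fully connected layer with random placeholder weights (zero bias)."""
    w = [[random.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return [sum(w[j][i] * x[i] for i in range(in_dim)) for j in range(out_dim)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(features):
    """56 -> 128 -> 64 -> 32 -> 1, matching the layer configuration above."""
    h = relu(dense(features, 56, 128))
    h = relu(dense(h, 128, 64))
    h = relu(dense(h, 64, 32))
    out = dense(h, 32, 1)
    return sigmoid(out[0])  # probability in (0, 1)

risk = forward([0.5] * 56)
```

The sigmoid output layer is what lets the single output be read directly as an impulse probability.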

Hardware Acceleration

Core ML automatically routes inference to the optimal hardware:

Device       | Neural Engine     | GPU      | CPU
-------------|-------------------|----------|---------
iPhone 12+   | 16-core (primary) | Fallback | Fallback
iPhone 11    | 8-core (primary)  | Fallback | Fallback
iPhone X/XS  | Not available     | Primary  | Fallback
iPad Pro     | 16-core (primary) | Fallback | Fallback

Neural Engine delivers 15x faster inference than CPU with 1/10th the power consumption.

Inference Code (Swift)

import CoreML

class ImpulsePredictor {
    private let model: WhistlImpulsePredictor
    
    init() {
        let config = MLModelConfiguration()
        config.computeUnits = .all  // Use Neural Engine + GPU + CPU
        self.model = try! WhistlImpulsePredictor(configuration: config)
    }
    
    func predict(features: [Float]) -> Float {
        // Convert array to MLMultiArray (model was traced with shape [1, 56])
        let multiArray = try! MLMultiArray(shape: [1, 56], dataType: .float32)
        for (index, value) in features.enumerated() {
            multiArray[index] = NSNumber(value: value)
        }
        
        // Run inference
        let output = try! model.prediction(features: multiArray)
        return output.probability
    }
}

// Usage
let predictor = ImpulsePredictor()
let risk = predictor.predict(features: userFeatures)
if risk > 0.6 {
    activateIntervention()
}

Performance Benchmarks (iOS)

Device        | Neural Engine | Inference Time | Power Draw
--------------|---------------|----------------|-----------
iPhone 15 Pro | 16-core       | 3.2ms          | 12mW
iPhone 14     | 16-core       | 4.1ms          | 15mW
iPhone 13     | 16-core       | 5.8ms          | 18mW
iPhone 12     | 8-core        | 8.4ms          | 22mW
iPhone 11     | 8-core        | 12.1ms         | 28mW

All devices achieve real-time inference (<50ms) with negligible battery impact.

TensorFlow Lite (Android Implementation)

Google's TensorFlow Lite provides on-device ML for Android and other platforms.

Model Format and Conversion

PyTorch models are converted to TFLite format via ONNX:

# PyTorch to TFLite conversion (via ONNX)
import torch
import onnx
import tensorflow as tf

# Export PyTorch to ONNX
torch_model = torch.load('whistl_impulse_predictor.pt')
dummy_input = torch.randn(1, 56)
torch.onnx.export(
    torch_model,
    dummy_input,
    'whistl_model.onnx',
    input_names=['features'],
    output_names=['probability'],
    opset_version=13
)

# Convert ONNX to a TensorFlow SavedModel, then to TFLite
# (TFLite has no direct ONNX importer; the onnx-tf package bridges the gap)
from onnx_tf.backend import prepare
prepare(onnx.load('whistl_model.onnx')).export_graph('whistl_saved_model')

converter = tf.lite.TFLiteConverter.from_saved_model('whistl_saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

# Save TFLite model
with open('whistl_impulse_predictor.tflite', 'wb') as f:
    f.write(tflite_model)

Model Quantisation

TFLite supports aggressive quantisation for smaller model size:

Quantisation Type        | Model Size | Accuracy Loss | Speed Gain
-------------------------|------------|---------------|-----------
Float32 (full precision) | 900KB      | 0%            | 1.0x
Float16 (half precision) | 450KB      | <0.1%         | 1.5x
Int8 (full integer)      | 225KB      | 0.3-0.5%      | 2.5x

Whistl uses Float16 quantisation—50% size reduction with negligible accuracy impact.
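
The Int8 row works by affine quantisation: each float tensor is mapped onto 8-bit integers via a scale and zero point. A minimal pure-Python sketch of the scheme (illustrative only, not TFLite's actual kernels):

```python
def quantise(values, num_bits=8):
    """Affine quantisation: map floats onto [0, 2^bits - 1] with a scale and zero point."""
    lo, hi = min(values), max(values)
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantise(q, scale, zero_point):
    """Recover approximate floats; error is bounded by the quantisation step."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.8, -0.1, 0.0, 0.3, 0.95]
q, scale, zp = quantise(weights)
restored = dequantise(q, scale, zp)
```

Storing one byte per weight instead of four is where the 4x size reduction over Float32 comes from; the rounding step is the source of the small accuracy loss in the table.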

Hardware Acceleration (Android)

TFLite delegates inference to available hardware accelerators:

// TFLite Interpreter with delegates
val interpreterOptions = Interpreter.Options()

// Try GPU delegate first (fastest for most devices)
try {
    val gpuDelegate = GpuDelegate()
    interpreterOptions.addDelegate(gpuDelegate)
} catch (e: Exception) {
    // GPU not available
}

// Try NNAPI delegate (Android Neural Networks API)
try {
    val nnapiDelegate = NnApiDelegate()
    interpreterOptions.addDelegate(nnapiDelegate)
} catch (e: Exception) {
    // NNAPI not available
}

// Fallback to CPU (always available)
val interpreter = Interpreter(modelBuffer, interpreterOptions)

Inference Code (Kotlin)

import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

class ImpulsePredictor(context: Context) {
    private var interpreter: Interpreter
    
    init {
        val model = FileUtil.loadMappedFile(context, "whistl_impulse_predictor.tflite")
        val options = Interpreter.Options()
        options.setNumThreads(4)  // Use 4 CPU threads
        options.setUseXNNPACK(true)  // Enable XNNPACK delegate
        interpreter = Interpreter(model, options)
    }
    
    fun predict(features: FloatArray): Float {
        // TFLite expects a [1, 56] batch, so wrap the flat feature array
        val input = arrayOf(features)
        val output = Array(1) { FloatArray(1) }
        
        interpreter.run(input, output)
        return output[0][0]
    }
    
    fun close() {
        interpreter.close()
    }
}

Performance Benchmarks (Android)

Device      | Accelerator        | Inference Time | Power Draw
------------|--------------------|----------------|-----------
Pixel 8 Pro | Tensor G3 TPU      | 4.5ms          | 14mW
Samsung S24 | Snapdragon 8 Gen 3 | 5.2ms          | 16mW
Pixel 7     | Tensor G2 TPU      | 6.8ms          | 19mW
OnePlus 11  | Snapdragon 8 Gen 2 | 7.4ms          | 21mW
Pixel 6     | Tensor G1 TPU      | 9.1ms          | 24mW

Core ML vs TensorFlow Lite: Comparison

Feature               | Core ML (iOS)             | TensorFlow Lite (Android)
----------------------|---------------------------|--------------------------
Model Format          | .mlpackage                | .tflite
Hardware Acceleration | Neural Engine (dedicated) | GPU/NNAPI/TPU (varies)
Conversion Complexity | Moderate (direct PyTorch) | Higher (via ONNX)
Model Size (Float16)  | 450KB                     | 450KB
Avg Inference Time    | 5.8ms                     | 6.6ms
Power Efficiency      | Excellent (dedicated NPU) | Good (shared GPU/TPU)
Offline Support       | Full                      | Full
Privacy               | On-device only            | On-device only
Dynamic Updates       | App Store required        | Play Store or OTA
Debugging Tools       | Xcode Core ML debugger    | TFLite Model Explorer

Model Update Strategy

Whistl updates ML models through different mechanisms for each platform:

iOS: App Store Updates

  • Process: New model bundled with app update
  • Frequency: Monthly model improvements
  • Advantage: Guaranteed model integrity
  • Disadvantage: Requires full app download

Android: OTA Model Downloads

  • Process: Play Feature Delivery or custom CDN
  • Frequency: Weekly model improvements
  • Advantage: Smaller downloads, faster iteration
  • Disadvantage: Requires network connectivity

Model Versioning

{
  "model_version": "2026.03.01",
  "architecture": "feedforward_56_128_64_32_1",
  "quantisation": "float16",
  "training_date": "2026-02-28",
  "accuracy": 0.842,
  "min_ios_version": "15.0",
  "min_android_api": 26,
  "changelog": [
    "Improved payday proximity detection",
    "Enhanced HRV feature weighting",
    "Reduced false positives for shopping"
  ]
}
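
A client can gate model downloads on the manifest's minimum platform fields. A sketch against a trimmed version of the manifest above (the comparison logic is illustrative, not Whistl's actual update client):

```python
import json

# trimmed copy of the manifest shown above
manifest = json.loads("""
{
  "model_version": "2026.03.01",
  "quantisation": "float16",
  "min_ios_version": "15.0",
  "min_android_api": 26
}
""")

def android_compatible(device_api_level: int, manifest: dict) -> bool:
    """Accept the model only if the device meets the minimum API level."""
    return device_api_level >= manifest["min_android_api"]

def ios_compatible(device_version: str, manifest: dict) -> bool:
    """Compare dotted version strings numerically, not lexicographically."""
    parse = lambda v: [int(p) for p in v.split(".")]
    return parse(device_version) >= parse(manifest["min_ios_version"])
```

The numeric parse matters for iOS: a plain string comparison would rank "9.0" above "15.0".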

Federated Learning for Privacy

Whistl uses federated learning to improve models without collecting raw data:

Federated Learning Workflow

  1. Local training: Each device trains on personal data overnight
  2. Gradient computation: Calculate weight updates (not raw data)
  3. Differential privacy: Add calibrated noise to gradients
  4. Secure upload: Encrypted gradient transmission to server
  5. Aggregation: Server averages gradients from thousands of devices
  6. Global update: Improved model distributed to all users
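
Steps 2-5 can be sketched in miniature. This is a toy illustration of clip-noise-average under assumed parameters (max norm 1.0, Gaussian noise); a production system would use formally calibrated noise and a secure aggregation protocol so the server never sees individual updates at all.

```python
import random

random.seed(42)

def clip(gradient, max_norm=1.0):
    """Clip a gradient vector to bound any one device's contribution."""
    norm = sum(g * g for g in gradient) ** 0.5
    if norm > max_norm:
        return [g * max_norm / norm for g in gradient]
    return gradient

def privatise(gradient, noise_scale=0.1):
    """Add calibrated Gaussian noise before the update leaves the device."""
    return [g + random.gauss(0, noise_scale) for g in clip(gradient)]

def aggregate(updates):
    """Server-side: average the noisy updates; raw gradients stay on-device."""
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]

# thousands of devices in production; three here for illustration
device_updates = [privatise([0.2, -0.1, 0.05]) for _ in range(3)]
global_update = aggregate(device_updates)
```

Averaging over many devices cancels most of the per-device noise, which is why the global model still improves even though each individual update is deliberately blurred.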

Privacy Guarantees

  • Raw data never leaves device: Only gradient updates transmitted
  • Differential privacy: ε=0.1 privacy budget per update
  • Secure aggregation: Server sees only aggregated updates
  • Device-level encryption: TLS 1.3 for all communications

Battery Optimisation

On-device ML must be power-efficient. Whistl implements several optimisations:

Batch Processing

Instead of continuous inference, Whistl batches predictions:

  • Normal mode: Predict every 5 minutes
  • Elevated risk: Predict every 1 minute
  • High risk: Predict every 30 seconds
  • Sleep mode: Predict every 30 minutes (when stationary + night)

Adaptive Frequency

func calculateInferenceInterval(riskScore: Float) -> TimeInterval {
    switch riskScore {
    case 0.0..<0.4:
        return 300  // 5 minutes
    case 0.4..<0.6:
        return 60   // 1 minute
    case 0.6..<0.8:
        return 30   // 30 seconds
    default:
        return 10   // 10 seconds (critical)
    }
}

Battery Impact

Usage Pattern                       | Daily Battery Impact
------------------------------------|---------------------
Normal (low risk)                   | 2-3%
Elevated (moderate risk)            | 4-5%
High risk (frequent intervention)   | 6-8%
Continuous monitoring (debug mode)  | 15-20%

Debugging and Monitoring

Production ML requires robust debugging and monitoring:

Model Performance Tracking

  • Prediction latency: Log inference time for each prediction
  • Output distribution: Track risk score histogram
  • Accuracy validation: Compare predictions to actual outcomes
  • Drift detection: Alert if prediction distribution shifts
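
Drift detection can be as simple as comparing the current window of risk scores against a baseline window. A deliberately minimal sketch (a mean-shift check with an assumed threshold, standing in for PSI-style distribution tests):

```python
def population_shift(baseline, current, threshold=0.1):
    """Flag drift when the mean risk score moves more than `threshold`
    away from the baseline window."""
    base_mean = sum(baseline) / len(baseline)
    curr_mean = sum(current) / len(current)
    return abs(curr_mean - base_mean) > threshold

baseline_scores = [0.2, 0.3, 0.25, 0.35, 0.3]
drifted_scores = [0.6, 0.7, 0.65, 0.55, 0.6]
```

A sustained shift like the second window suggests either genuine behaviour change or a stale model, and either way warrants investigation.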

Debug Tools

  • Xcode Core ML Debugger: Visualise layer activations (iOS)
  • TFLite Model Explorer: Inspect model graph (Android)
  • Custom logging: Feature importance per prediction

Security Considerations

On-device models must be protected from tampering:

Model Integrity

  • Code signing: Models signed with Whistl private key
  • Hash verification: SHA-256 checksum before loading
  • Runtime attestation: Verify model hasn't been modified
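
The hash-verification step is straightforward to illustrate. A minimal sketch, assuming the pinned digest ships inside the signed app bundle rather than alongside the model file:

```python
import hashlib

def verify_model(model_bytes: bytes, expected_hex: str) -> bool:
    """Refuse to load a model whose SHA-256 digest doesn't match the pinned value."""
    return hashlib.sha256(model_bytes).hexdigest() == expected_hex

# stand-in bytes for the .mlpackage / .tflite payload
model_bytes = b"example model weights"
pinned_digest = hashlib.sha256(model_bytes).hexdigest()
```

Because the digest lives in the code-signed binary, an attacker who swaps the model file on disk cannot also swap the expected hash without breaking the app signature.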

Reverse Engineering Protection

  • Model encryption: Weights encrypted at rest
  • Obfuscation: Layer names and structure obfuscated
  • Jailbreak detection: Disable ML on compromised devices

Conclusion

On-device machine learning is essential for privacy-first financial apps. Core ML and TensorFlow Lite both provide excellent frameworks for running neural networks locally—with dedicated hardware acceleration delivering sub-10ms inference and minimal battery impact.

Whistl's implementation demonstrates that sophisticated AI doesn't require cloud processing. Your data stays on your device, predictions happen in real-time, and protection works even offline.

Experience Privacy-First AI

Whistl's on-device neural networks predict impulses without sending your data to the cloud. Download free and experience private AI.

Download Whistl Free

Related: Neural Networks Explained | AI Financial Coach | Local Storage Encryption