On-Device ML: Core ML vs TensorFlow Lite Implementation

Whistl runs all AI processing on your device—never in the cloud. This technical deep dive compares Core ML (iOS) and TensorFlow Lite (Android), explaining model conversion, hardware acceleration, performance optimisation, and why on-device processing is essential for financial privacy.

Why On-Device Machine Learning?

Cloud-based AI requires sending sensitive data to remote servers. For a financial behaviour app, this creates unacceptable risks:

  • Privacy exposure: Transaction data, location, biometrics leave your device
  • Latency: Network round-trip adds 100-500ms delay
  • Offline failure: No connectivity = no protection
  • Cost: Cloud inference at scale is expensive

On-device ML solves all four problems while enabling real-time intervention.

Core ML (iOS Implementation)

Apple's Core ML framework provides native machine learning support for iOS, iPadOS, and macOS.

Model Format and Conversion

Whistl's neural network is trained in PyTorch, then converted to Core ML format:

# PyTorch to Core ML conversion
import torch
import coremltools as ct

# Load trained PyTorch model
torch_model = torch.load('whistl_impulse_predictor.pt')
torch_model.eval()

# Create example input (56 features)
example_input = torch.randn(1, 56)

# Trace and convert
traced_model = torch.jit.trace(torch_model, example_input)
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=example_input.shape, name='features')],
    convert_to='mlprogram'  # MIL backend for best performance
)

# Save Core ML model
mlmodel.save('WhistlImpulsePredictor.mlpackage')

Model Architecture in Core ML

Model: WhistlImpulsePredictor
Input: 56 Float32 features
Output: 1 Float32 probability

Layer Configuration:
├── Input (56)
├── Dense(56→128) + ReLU
├── Dense(128→64) + ReLU
├── Dense(64→32) + ReLU
├── Dense(32→1) + Sigmoid
└── Output (1)

Model Size: 450KB (compressed)
Quantisation: Float16 (optional Int8 for smaller size)
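
The layer stack above can be sketched as a plain-Python forward pass. This is illustrative only: the weights here are random placeholders, not the shipped model, and production inference runs through Core ML, not Python.

```python
import math
import random

random.seed(0)

def dense(x, in_dim, out_dim):
    """One fully connected layer with random placeholder weights (zero bias)."""
    w = [[random.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return [sum(w[j][i] * x[i] for i in range(in_dim)) for j in range(out_dim)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(features):
    """56 -> 128 -> 64 -> 32 -> 1, matching the layer configuration above."""
    h = relu(dense(features, 56, 128))
    h = relu(dense(h, 128, 64))
    h = relu(dense(h, 64, 32))
    out = dense(h, 32, 1)
    return sigmoid(out[0])  # probability in (0, 1)

risk = forward([0.5] * 56)
```

The sigmoid output layer is what lets the single output be read directly as an impulse probability.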

Hardware Acceleration

Core ML automatically routes inference to the optimal hardware:

Device       | Neural Engine     | GPU      | CPU
-------------|-------------------|----------|---------
iPhone 12+   | 16-core (primary) | Fallback | Fallback
iPhone 11    | 8-core (primary)  | Fallback | Fallback
iPhone X/XS  | Not available     | Primary  | Fallback
iPad Pro     | 16-core (primary) | Fallback | Fallback

Neural Engine delivers 15x faster inference than CPU with 1/10th the power consumption.

Inference Code (Swift)

import CoreML

class ImpulsePredictor {
    private let model: WhistlImpulsePredictor
    
    init() {
        let config = MLModelConfiguration()
        config.computeUnits = .all  // Use Neural Engine + GPU + CPU
        self.model = try! WhistlImpulsePredictor(configuration: config)
    }
    
    func predict(features: [Float]) -> Float {
        // Convert array to MLMultiArray (model was traced with shape [1, 56])
        let multiArray = try! MLMultiArray(shape: [1, 56], dataType: .float32)
        for (index, value) in features.enumerated() {
            multiArray[index] = NSNumber(value: value)
        }
        
        // Run inference
        let output = try! model.prediction(features: multiArray)
        return output.probability
    }
}

// Usage
let predictor = ImpulsePredictor()
let risk = predictor.predict(features: userFeatures)
if risk > 0.6 {
    activateIntervention()
}

Performance Benchmarks (iOS)

Device        | Neural Engine | Inference Time | Power Draw
--------------|---------------|----------------|-----------
iPhone 15 Pro | 16-core       | 3.2ms          | 12mW
iPhone 14     | 16-core       | 4.1ms          | 15mW
iPhone 13     | 16-core       | 5.8ms          | 18mW
iPhone 12     | 8-core        | 8.4ms          | 22mW
iPhone 11     | 8-core        | 12.1ms         | 28mW

All devices achieve real-time inference (<50ms) with negligible battery impact.

TensorFlow Lite (Android Implementation)

Google's TensorFlow Lite provides on-device ML for Android and other platforms.

Model Format and Conversion

PyTorch models are converted to TFLite format via ONNX:

# PyTorch to TFLite conversion (via ONNX)
import torch
import onnx
import tensorflow as tf

# Export PyTorch to ONNX
torch_model = torch.load('whistl_impulse_predictor.pt')
dummy_input = torch.randn(1, 56)
torch.onnx.export(
    torch_model,
    dummy_input,
    'whistl_model.onnx',
    input_names=['features'],
    output_names=['probability'],
    opset_version=13
)

# Convert ONNX to a TensorFlow SavedModel, then to TFLite
# (TFLite has no direct ONNX importer; the onnx-tf package bridges the gap)
from onnx_tf.backend import prepare
prepare(onnx.load('whistl_model.onnx')).export_graph('whistl_saved_model')

converter = tf.lite.TFLiteConverter.from_saved_model('whistl_saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

# Save TFLite model
with open('whistl_impulse_predictor.tflite', 'wb') as f:
    f.write(tflite_model)

Model Quantisation

TFLite supports aggressive quantisation for smaller model size:

Quantisation Type        | Model Size | Accuracy Loss | Speed Gain
-------------------------|------------|---------------|-----------
Float32 (full precision) | 900KB      | 0%            | 1.0x
Float16 (half precision) | 450KB      | <0.1%         | 1.5x
Int8 (full integer)      | 225KB      | 0.3-0.5%      | 2.5x

Whistl uses Float16 quantisation—50% size reduction with negligible accuracy impact.
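
The Int8 row works by affine quantisation: each float tensor is mapped onto 8-bit integers via a scale and zero point. A minimal pure-Python sketch of the scheme (illustrative only, not TFLite's actual kernels):

```python
def quantise(values, num_bits=8):
    """Affine quantisation: map floats onto [0, 2^bits - 1] with a scale and zero point."""
    lo, hi = min(values), max(values)
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantise(q, scale, zero_point):
    """Recover approximate floats; error is bounded by the quantisation step."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.8, -0.1, 0.0, 0.3, 0.95]
q, scale, zp = quantise(weights)
restored = dequantise(q, scale, zp)
```

Storing one byte per weight instead of four is where the 4x size reduction over Float32 comes from; the rounding step is the source of the small accuracy loss in the table.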

Hardware Acceleration (Android)

TFLite delegates inference to available hardware accelerators:

// TFLite Interpreter with delegates
val interpreterOptions = Interpreter.Options()

// Try GPU delegate first (fastest for most devices)
try {
    val gpuDelegate = GpuDelegate()
    interpreterOptions.addDelegate(gpuDelegate)
} catch (e: Exception) {
    // GPU not available
}

// Try NNAPI delegate (Android Neural Networks API)
try {
    val nnapiDelegate = NnApiDelegate()
    interpreterOptions.addDelegate(nnapiDelegate)
} catch (e: Exception) {
    // NNAPI not available
}

// Fallback to CPU (always available)
val interpreter = Interpreter(modelBuffer, interpreterOptions)

Inference Code (Kotlin)

import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil

class ImpulsePredictor(context: Context) {
    private var interpreter: Interpreter
    
    init {
        val model = FileUtil.loadMappedFile(context, "whistl_impulse_predictor.tflite")
        val options = Interpreter.Options()
        options.setNumThreads(4)  // Use 4 CPU threads
        options.setUseXNNPACK(true)  // Enable XNNPACK delegate
        interpreter = Interpreter(model, options)
    }
    
    fun predict(features: FloatArray): Float {
        // TFLite expects a [1, 56] batch, so wrap the flat feature array
        val input = arrayOf(features)
        val output = Array(1) { FloatArray(1) }
        
        interpreter.run(input, output)
        return output[0][0]
    }
    
    fun close() {
        interpreter.close()
    }
}

Performance Benchmarks (Android)

Device      | Accelerator        | Inference Time | Power Draw
------------|--------------------|----------------|-----------
Pixel 8 Pro | Tensor G3 TPU      | 4.5ms          | 14mW
Samsung S24 | Snapdragon 8 Gen 3 | 5.2ms          | 16mW
Pixel 7     | Tensor G2 TPU      | 6.8ms          | 19mW
OnePlus 11  | Snapdragon 8 Gen 2 | 7.4ms          | 21mW
Pixel 6     | Tensor G1 TPU      | 9.1ms          | 24mW

Core ML vs TensorFlow Lite: Comparison

Feature               | Core ML (iOS)             | TensorFlow Lite (Android)
----------------------|---------------------------|--------------------------
Model Format          | .mlpackage                | .tflite
Hardware Acceleration | Neural Engine (dedicated) | GPU/NNAPI/TPU (varies)
Conversion Complexity | Moderate (direct PyTorch) | Higher (via ONNX)
Model Size (Float16)  | 450KB                     | 450KB
Avg Inference Time    | 5.8ms                     | 6.6ms
Power Efficiency      | Excellent (dedicated NPU) | Good (shared GPU/TPU)
Offline Support       | Full                      | Full
Privacy               | On-device only            | On-device only
Dynamic Updates       | App Store required        | Play Store or OTA
Debugging Tools       | Xcode Core ML debugger    | TFLite Model Explorer

Model Update Strategy

Whistl updates ML models through different mechanisms for each platform:

iOS: App Store Updates

  • Process: New model bundled with app update
  • Frequency: Monthly model improvements
  • Advantage: Guaranteed model integrity
  • Disadvantage: Requires full app download

Android: OTA Model Downloads

  • Process: Play Feature Delivery or custom CDN
  • Frequency: Weekly model improvements
  • Advantage: Smaller downloads, faster iteration
  • Disadvantage: Requires network connectivity

Model Versioning

{
  "model_version": "2026.03.01",
  "architecture": "feedforward_56_128_64_32_1",
  "quantisation": "float16",
  "training_date": "2026-02-28",
  "accuracy": 0.842,
  "min_ios_version": "15.0",
  "min_android_api": 26,
  "changelog": [
    "Improved payday proximity detection",
    "Enhanced HRV feature weighting",
    "Reduced false positives for shopping"
  ]
}
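
A client can gate model downloads on the manifest's minimum platform fields. A sketch against a trimmed version of the manifest above (the comparison logic is illustrative, not Whistl's actual update client):

```python
import json

# trimmed copy of the manifest shown above
manifest = json.loads("""
{
  "model_version": "2026.03.01",
  "quantisation": "float16",
  "min_ios_version": "15.0",
  "min_android_api": 26
}
""")

def android_compatible(device_api_level: int, manifest: dict) -> bool:
    """Accept the model only if the device meets the minimum API level."""
    return device_api_level >= manifest["min_android_api"]

def ios_compatible(device_version: str, manifest: dict) -> bool:
    """Compare dotted version strings numerically, not lexicographically."""
    parse = lambda v: [int(p) for p in v.split(".")]
    return parse(device_version) >= parse(manifest["min_ios_version"])
```

The numeric parse matters for iOS: a plain string comparison would rank "9.0" above "15.0".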

Federated Learning for Privacy

Whistl uses federated learning to improve models without collecting raw data:

Federated Learning Workflow

  1. Local training: Each device trains on personal data overnight
  2. Gradient computation: Calculate weight updates (not raw data)
  3. Differential privacy: Add calibrated noise to gradients
  4. Secure upload: Encrypted gradient transmission to server
  5. Aggregation: Server averages gradients from thousands of devices
  6. Global update: Improved model distributed to all users
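
Steps 2-5 can be sketched in miniature. This is a toy illustration of clip-noise-average under assumed parameters (max norm 1.0, Gaussian noise); a production system would use formally calibrated noise and a secure aggregation protocol so the server never sees individual updates at all.

```python
import random

random.seed(42)

def clip(gradient, max_norm=1.0):
    """Clip a gradient vector to bound any one device's contribution."""
    norm = sum(g * g for g in gradient) ** 0.5
    if norm > max_norm:
        return [g * max_norm / norm for g in gradient]
    return gradient

def privatise(gradient, noise_scale=0.1):
    """Add calibrated Gaussian noise before the update leaves the device."""
    return [g + random.gauss(0, noise_scale) for g in clip(gradient)]

def aggregate(updates):
    """Server-side: average the noisy updates; raw gradients stay on-device."""
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]

# thousands of devices in production; three here for illustration
device_updates = [privatise([0.2, -0.1, 0.05]) for _ in range(3)]
global_update = aggregate(device_updates)
```

Averaging over many devices cancels most of the per-device noise, which is why the global model still improves even though each individual update is deliberately blurred.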

Privacy Guarantees

  • Raw data never leaves device: Only gradient updates transmitted
  • Differential privacy: ε=0.1 privacy budget per update
  • Secure aggregation: Server sees only aggregated updates
  • Device-level encryption: TLS 1.3 for all communications

Battery Optimisation

On-device ML must be power-efficient. Whistl implements several optimisations:

Batch Processing

Instead of continuous inference, Whistl batches predictions:

  • Normal mode: Predict every 5 minutes
  • Elevated risk: Predict every 1 minute
  • High risk: Predict every 30 seconds
  • Sleep mode: Predict every 30 minutes (when stationary + night)

Adaptive Frequency

func calculateInferenceInterval(riskScore: Float) -> TimeInterval {
    switch riskScore {
    case 0.0..<0.4:
        return 300  // 5 minutes
    case 0.4..<0.6:
        return 60   // 1 minute
    case 0.6..<0.8:
        return 30   // 30 seconds
    default:
        return 10   // 10 seconds (critical)
    }
}

Battery Impact

Usage Pattern                       | Daily Battery Impact
------------------------------------|---------------------
Normal (low risk)                   | 2-3%
Elevated (moderate risk)            | 4-5%
High risk (frequent intervention)   | 6-8%
Continuous monitoring (debug mode)  | 15-20%

Debugging and Monitoring

Production ML requires robust debugging and monitoring:

Model Performance Tracking

  • Prediction latency: Log inference time for each prediction
  • Output distribution: Track risk score histogram
  • Accuracy validation: Compare predictions to actual outcomes
  • Drift detection: Alert if prediction distribution shifts
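
Drift detection can be as simple as comparing the current window of risk scores against a baseline window. A deliberately minimal sketch (a mean-shift check with an assumed threshold, standing in for PSI-style distribution tests):

```python
def population_shift(baseline, current, threshold=0.1):
    """Flag drift when the mean risk score moves more than `threshold`
    away from the baseline window."""
    base_mean = sum(baseline) / len(baseline)
    curr_mean = sum(current) / len(current)
    return abs(curr_mean - base_mean) > threshold

baseline_scores = [0.2, 0.3, 0.25, 0.35, 0.3]
drifted_scores = [0.6, 0.7, 0.65, 0.55, 0.6]
```

A sustained shift like the second window suggests either genuine behaviour change or a stale model, and either way warrants investigation.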

Debug Tools

  • Xcode Core ML Debugger: Visualise layer activations (iOS)
  • TFLite Model Explorer: Inspect model graph (Android)
  • Custom logging: Feature importance per prediction

Security Considerations

On-device models must be protected from tampering:

Model Integrity

  • Code signing: Models signed with Whistl private key
  • Hash verification: SHA-256 checksum before loading
  • Runtime attestation: Verify model hasn't been modified
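
The hash-verification step is straightforward to illustrate. A minimal sketch, assuming the pinned digest ships inside the signed app bundle rather than alongside the model file:

```python
import hashlib

def verify_model(model_bytes: bytes, expected_hex: str) -> bool:
    """Refuse to load a model whose SHA-256 digest doesn't match the pinned value."""
    return hashlib.sha256(model_bytes).hexdigest() == expected_hex

# stand-in bytes for the .mlpackage / .tflite payload
model_bytes = b"example model weights"
pinned_digest = hashlib.sha256(model_bytes).hexdigest()
```

Because the digest lives in the code-signed binary, an attacker who swaps the model file on disk cannot also swap the expected hash without breaking the app signature.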

Reverse Engineering Protection

  • Model encryption: Weights encrypted at rest
  • Obfuscation: Layer names and structure obfuscated
  • Jailbreak detection: Disable ML on compromised devices

Conclusion

On-device machine learning is essential for privacy-first financial apps. Core ML and TensorFlow Lite both provide excellent frameworks for running neural networks locally—with dedicated hardware acceleration delivering sub-10ms inference and minimal battery impact.

Whistl's implementation demonstrates that sophisticated AI doesn't require cloud processing. Your data stays on your device, predictions happen in real-time, and protection works even offline.

Experience Privacy-First AI

Whistl's on-device neural networks predict impulses without sending your data to the cloud. Download free and experience private AI.

Download Whistl Free

Related: Neural Networks Explained | AI Financial Coach | Local Storage Encryption