Models and Reinforcement Learning

Ensemble forecasting, model fusion, and the reinforcement learning loop

Ensemble forecasting

  • Base learners: LSTMs and temporal CNNs trained per horizon (short, medium)

  • Calibration: Temperature scaling and isotonic regression on validation splits (see the temperature-scaling sketch after this list)

  • Regularization: Dropout and weight decay; early stopping gated by information-leakage tests on the validation splits
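
A minimal Python sketch of the temperature-scaling step, assuming validation logits and direction labels from a leakage-checked split. The synthetic data, the fit_temperature helper, and the search bounds are illustrative, not the production calibrator:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def nll(temperature, logits, labels):
        # Negative log-likelihood of direction labels under scaled logits.
        p = sigmoid(logits / temperature)
        eps = 1e-12
        return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

    def fit_temperature(logits, labels):
        # Search for the temperature T > 0 that minimizes validation NLL.
        result = minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels),
                                 method="bounded")
        return result.x

    rng = np.random.default_rng(0)
    val_logits = rng.normal(scale=3.0, size=1000)   # deliberately overconfident logits
    val_labels = (rng.random(1000) < sigmoid(val_logits / 2.5)).astype(float)
    print(f"fitted temperature: {fit_temperature(val_logits, val_labels):.2f}")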

Model fusion

  • Meta-learner: Gradient boosting and a shallow MLP stacked over the base predictors (see the stacking sketch after this list)

  • Cross-validation: K-fold or time-series split; leakage-safe folds

  • Objective: Maximize directional utility under turnover and cost constraints
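
A leakage-safe stacking sketch under stated assumptions: base-model forecasts are already materialized as columns of a feature matrix (X_base here is a synthetic stand-in), and the gradient-boosting and shallow-MLP meta-learners are blended with a simple average. Folds come from scikit-learn's TimeSeriesSplit, so every out-of-fold prediction uses only earlier data:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X_base = rng.normal(size=(500, 3))      # stand-in for base-model forecasts
    y = X_base @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.1, size=500)

    oof = np.full(len(y), np.nan)           # out-of-fold meta predictions
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X_base):
        gbm = GradientBoostingRegressor(random_state=0).fit(X_base[train_idx], y[train_idx])
        mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                           random_state=0).fit(X_base[train_idx], y[train_idx])
        # Simple average of the two meta-learners; the blend weight could be tuned.
        oof[test_idx] = 0.5 * gbm.predict(X_base[test_idx]) + 0.5 * mlp.predict(X_base[test_idx])

    mask = ~np.isnan(oof)                   # the earliest block has no OOF predictions
    hit_rate = np.mean(np.sign(oof[mask]) == np.sign(y[mask]))
    print(f"out-of-fold directional hit rate: {hit_rate:.3f}")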

Reinforcement loop

  • State: Recent forecasts, realized slippage, risk rule activations, and PnL

  • Action: Strategy selection and parameterization

  • Reward: PnL with drawdown penalties and a risk-adjusted term (e.g., a Sharpe proxy); see the sketch after this list

  • Updates: Online policy updates; periodic batch retraining
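
A sketch of the reward shaping and online update, deliberately simplified to strategy selection with an epsilon-greedy value learner. The drawdown weight, Sharpe proxy, and StrategySelector class are illustrative assumptions, not the system's actual policy:

    import numpy as np

    def shaped_reward(pnl_window, dd_weight=0.5):
        # PnL minus a drawdown penalty, plus a rolling Sharpe proxy (assumed weights).
        pnl_window = np.asarray(pnl_window, dtype=float)
        equity = np.cumsum(pnl_window)
        drawdown = np.max(np.maximum.accumulate(equity) - equity)
        sharpe_proxy = pnl_window.mean() / (pnl_window.std() + 1e-9)
        return pnl_window.sum() - dd_weight * drawdown + sharpe_proxy

    class StrategySelector:
        # Epsilon-greedy selection over candidate strategies with online value updates.
        def __init__(self, n_strategies, epsilon=0.1, lr=0.05, seed=0):
            self.values = np.zeros(n_strategies)
            self.epsilon, self.lr = epsilon, lr
            self.rng = np.random.default_rng(seed)

        def act(self):
            if self.rng.random() < self.epsilon:
                return int(self.rng.integers(len(self.values)))
            return int(np.argmax(self.values))

        def update(self, action, reward):
            # Incremental (online) value update toward the observed shaped reward.
            self.values[action] += self.lr * (reward - self.values[action])

    selector = StrategySelector(n_strategies=3)
    a = selector.act()
    selector.update(a, shaped_reward([0.2, -0.1, 0.3, 0.05]))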

Governance and safety

  • Roll-forward evaluation before model promotion

  • Canary deployments with kill-switches and automatic rollback (see the gating sketch after this list)

  • Model registry with versioned artifacts and audit trails
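
A sketch of the canary gate, assuming the candidate model serves a small traffic slice and is compared with the incumbent on live metrics before promotion. The thresholds and the CanaryMetrics fields are hypothetical:

    from dataclasses import dataclass

    @dataclass
    class CanaryMetrics:
        sharpe_proxy: float
        max_drawdown: float
        error_rate: float

    def canary_decision(candidate: CanaryMetrics, incumbent: CanaryMetrics,
                        max_dd_limit: float = 0.05, max_error_rate: float = 0.01) -> str:
        # Kill-switch: hard limits trigger immediate rollback regardless of upside.
        if candidate.max_drawdown > max_dd_limit or candidate.error_rate > max_error_rate:
            return "rollback"
        # Promote only if the candidate beats the incumbent on risk-adjusted terms.
        if candidate.sharpe_proxy >= incumbent.sharpe_proxy:
            return "promote"
        return "hold"  # keep the canary running and gather more evidence

    print(canary_decision(CanaryMetrics(1.2, 0.02, 0.001),
                          CanaryMetrics(1.0, 0.03, 0.002)))  # -> "promote"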
