Models and Reinforcement Learning
Ensemble forecasting, model fusion, and the reinforcement learning loop
Ensemble forecasting
Base learners: LSTM/Temporal CNNs trained per horizon (short, medium)
Calibration: Temperature scaling and isotonic calibration fitted on validation splits (a minimal sketch follows this list)
Regularization: Dropout + weight decay; early stopping on leakage-checked validation splits
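As a concrete illustration of the calibration step, the sketch below fits a temperature on validation logits and then an isotonic recalibration on the tempered probabilities. The array names and the binary up/down framing are assumptions for illustration, not the pipeline's actual interfaces.

```python
# Calibration sketch: temperature scaling followed by isotonic recalibration.
# Assumes the base learner emits raw logits for a binary "up/down" forecast;
# val_logits / val_labels are illustrative placeholders.
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.isotonic import IsotonicRegression


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_temperature(val_logits, val_labels):
    """Find T > 0 minimizing negative log-likelihood of sigmoid(logits / T)."""
    def nll(t):
        p = np.clip(sigmoid(val_logits / t), 1e-7, 1 - 1e-7)
        return -np.mean(val_labels * np.log(p) + (1 - val_labels) * np.log(1 - p))
    return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x


def fit_isotonic(val_probs, val_labels):
    """Monotonic recalibration of probabilities on the validation split."""
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    iso.fit(val_probs, val_labels)
    return iso


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    val_logits = rng.normal(size=1000) * 3.0                      # overconfident raw logits
    val_labels = (rng.random(1000) < sigmoid(val_logits / 2.5)).astype(float)

    temperature = fit_temperature(val_logits, val_labels)
    calibrated = sigmoid(val_logits / temperature)
    iso = fit_isotonic(calibrated, val_labels)
    print(f"fitted temperature: {temperature:.2f}")
    print("isotonic-adjusted sample:", iso.predict(calibrated[:5]))
```

Both calibrators are fit only on the validation split, so the training data never influences the calibration map.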
Model fusion
Meta-learner: Gradient boosting + shallow MLP to stack base predictors
Cross-validation: K-fold or time-series splits, constructed so folds stay leakage-safe (see the stacking sketch below)
Objective: Maximize directional utility under turnover and cost constraints
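The fusion step can be sketched as out-of-fold stacking: each base model predicts only on folds it never trained on, and those predictions become the meta-learner's features. Ridge regressors stand in for the LSTM/Temporal CNN base learners and a gradient-boosting regressor for the boosting half of the stacker (the shallow-MLP half is omitted); all names here are illustrative.

```python
# Leakage-safe stacking sketch: out-of-fold base predictions feed the stacker.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit


def out_of_fold_meta_features(base_models, X, y, n_splits=5):
    """Build meta-features from predictions each base model makes on folds it
    never trained on, so the stacker never sees in-sample fits."""
    tscv = TimeSeriesSplit(n_splits=n_splits)
    meta_X, meta_y = [], []
    for train_idx, test_idx in tscv.split(X):
        fold_preds = []
        for model in base_models:
            model.fit(X[train_idx], y[train_idx])        # refit per fold
            fold_preds.append(model.predict(X[test_idx]))
        meta_X.append(np.column_stack(fold_preds))
        meta_y.append(y[test_idx])
    return np.vstack(meta_X), np.concatenate(meta_y)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(600, 8))                         # toy features
    y = 0.5 * X[:, 0] + rng.normal(scale=0.1, size=600)   # toy target

    base_models = [Ridge(alpha=1.0), Ridge(alpha=10.0)]   # stand-ins for the deep base learners
    meta_X, meta_y = out_of_fold_meta_features(base_models, X, y)
    stacker = GradientBoostingRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
    stacker.fit(meta_X, meta_y)
    print("stacker trained on", meta_X.shape[0], "out-of-fold rows")
```

TimeSeriesSplit keeps every test fold strictly after its training window, which is what makes the folds leakage-safe for sequential data.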
Reinforcement loop
State: Recent forecasts, realized slippage, risk rule activations, and PnL
Action: Strategy selection and parameterization
Reward: PnL with drawdown penalties and risk-adjusted performance (e.g., Sharpe proxy)
Updates: Online policy updates plus periodic batch retraining (policy-update sketch below)
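A minimal version of the loop treats strategy selection as a contextual bandit with a linear softmax policy and a REINFORCE-style online update. The state layout, drawdown weight, and toy environment are assumptions for illustration; the production agent and its reward shaping are not shown here.

```python
# Reinforcement-loop sketch: contextual bandit over strategy templates with an
# online policy-gradient update and a drawdown-penalized reward.
import numpy as np

DRAWDOWN_PENALTY = 2.0   # illustrative weight on the drawdown term


class StrategyPolicy:
    """Linear softmax policy over a discrete set of strategy templates."""

    def __init__(self, n_strategies, state_dim, lr=0.01, seed=7):
        self.weights = np.zeros((n_strategies, state_dim))
        self.lr = lr
        self.rng = np.random.default_rng(seed)

    def select(self, state):
        scores = self.weights @ state
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        action = self.rng.choice(len(probs), p=probs)
        return action, probs

    def update(self, state, action, probs, reward):
        # REINFORCE-style step: gradient of log-softmax w.r.t. each weight row.
        grad = -probs[:, None] * state[None, :]
        grad[action] += state
        self.weights += self.lr * reward * grad


def shaped_reward(pnl, drawdown):
    """PnL minus a drawdown penalty; a Sharpe-style proxy would instead divide
    by a rolling volatility estimate."""
    return pnl - DRAWDOWN_PENALTY * max(drawdown, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policy = StrategyPolicy(n_strategies=4, state_dim=6)
    for step in range(1000):                   # stand-in for the live loop
        state = rng.normal(size=6)             # would be forecasts, slippage, risk flags, PnL
        action, probs = policy.select(state)
        pnl = 0.1 * state[0] * (action - 1.5) + rng.normal(scale=0.05)  # toy environment
        policy.update(state, action, probs, shaped_reward(pnl, max(-pnl, 0.0)))
```

The per-decision update covers the online half of the loop; the periodic batch retraining mentioned above would refit the policy (and the upstream forecasters) offline before the next promotion cycle.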
Governance and safety
Roll-forward evaluation before model promotion (promotion-gate sketch after this list)
Canary deployments with kill-switches and automatic rollback
Model registry with versioned artifacts and audit trails
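Governance can be made concrete with a walk-forward promotion gate and an append-only registry record. The threshold, field names, and the JSON-lines audit store below are illustrative assumptions, not the actual registry.

```python
# Governance sketch: roll-forward promotion gate plus a versioned, hashed
# registry record appended to an audit log.
import hashlib
import json
import time
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    model_name: str
    version: str
    artifact_sha256: str
    metrics: dict
    promoted: bool
    created_at: float = field(default_factory=time.time)


def roll_forward_gate(candidate_scores, incumbent_scores, min_windows_won=0.6):
    """Promote only if the candidate beats the incumbent on enough sequential
    evaluation windows (walk-forward, oldest to newest)."""
    wins = sum(c > i for c, i in zip(candidate_scores, incumbent_scores))
    return wins / len(candidate_scores) >= min_windows_won


def register(model_name, version, artifact_bytes, metrics, promoted):
    entry = RegistryEntry(
        model_name=model_name,
        version=version,
        artifact_sha256=hashlib.sha256(artifact_bytes).hexdigest(),
        metrics=metrics,
        promoted=promoted,
    )
    # Append-only JSON-lines file as a stand-in for the audit-trail store.
    with open("model_registry.jsonl", "a") as fh:
        fh.write(json.dumps(entry.__dict__) + "\n")
    return entry


if __name__ == "__main__":
    candidate = [0.62, 0.58, 0.65, 0.61, 0.60]   # per-window utility, illustrative
    incumbent = [0.60, 0.59, 0.61, 0.57, 0.62]
    ok = roll_forward_gate(candidate, incumbent)
    register("direction-stacker", "2024.06.1", b"serialized-model-bytes",
             {"mean_window_utility": sum(candidate) / len(candidate)}, promoted=ok)
```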