From Features to Profits: Building a Production MLB Betting System

Building sophisticated features is only half the equation in sports betting analytics. The real challenge lies in transforming those features into reliable models that can consistently identify profitable opportunities in live markets. After developing a comprehensive 255+ feature engineering pipeline (detailed here), I faced the critical task of building production-ready models and implementing a complete betting system capable of real-time decision making.

This writeup chronicles the journey from engineered features to a functioning betting system—covering the modeling approaches, comparative analysis framework, comprehensive testing methodology, and ultimately the production betting implementation that generates actionable recommendations with Kelly criterion-based position sizing.

The Modeling Challenge: Beyond Standard Approaches

The transition from features to predictions required solving several interconnected challenges that standard machine learning workflows don’t typically address:

Temporal integrity: Preventing data leakage in time-series sports data
Model selection: Identifying the optimal algorithm for this specific prediction problem
Performance optimization: Achieving sub-millisecond inference for live betting
Calibration: Ensuring predicted probabilities accurately reflect true likelihood
Production deployment: Building systems robust enough for financial applications

The stakes were higher than typical ML projects—inaccurate predictions don’t just hurt model metrics, they lose money.

Dual Model Architecture: Balancing Accuracy and Speed

Core Design Philosophy

Rather than relying on a single modeling approach, I implemented a dual-model system designed to balance different performance criteria:

Enhanced Model: Maximum accuracy using all 255+ engineered features
Core Model: Streamlined version using only the most predictive features for ultra-fast inference

This architecture allows the system to provide detailed analysis when time permits while maintaining the ability to make rapid decisions during live betting scenarios.

Model Implementation Details

The system leverages two complementary algorithms:

XGBoost Implementation:

Gradient boosting for non-linear feature interactions
Custom hyperparameter optimization focused on sports betting metrics
Built-in handling of missing values and categorical features
Feature importance analysis for model interpretability

LightGBM Alternative:

Faster training and inference for live applications
Memory-efficient architecture for large datasets
Advanced categorical feature handling
Parallel processing capabilities for batch predictions

Both models implement identical preprocessing pipelines, ensuring consistency between training and inference phases.

Time-Aware Data Splitting

Sports betting models face a unique challenge: they must predict future events using only historical data available at prediction time. This required implementing sophisticated temporal splitting logic:

class DataSplitter:
    @staticmethod
    def time_based_split(df, date_column='date', 
                        train_ratio=0.7, val_ratio=0.15, gap_days=0):
        """
        Chronological splitting to prevent data leakage.
        Includes optional gaps between train/val/test to simulate
        real-world prediction scenarios.
        """

Key innovations in the splitting approach:

Chronological ordering: All training data precedes validation data, which precedes test data
Gap implementation: Optional buffer days between splits to prevent leakage from overlapping games
Proportional sizing: Maintains consistent class distributions across splits while respecting temporal boundaries

Comparative Analysis Framework: Systematic Feature Engineering ROI

The 8-Step Experimental Design

To quantify the impact of each feature engineering step, I developed a comprehensive comparative analysis framework that evaluates model performance across progressive feature additions:

Step	Category	Features Added	Description
Baseline	Core Statistics	58	Traditional batting statistics
Step 1	Matchup History	+17	Batter vs pitcher historical performance
Step 2	Situational Context	+33	Game state, inning, pressure situations
Step 3	Weather Impact	+20	Temperature, wind, atmospheric conditions
Step 4	Recent Form	+24	Time-decay weighted performance metrics
Step 5	Streak/Momentum	+29	Hot/cold streaks and momentum indicators
Step 6	Ballpark Factors	+35	Park-specific adjustments and dimensions
Step 7	Temporal/Fatigue	+41	Time-of-day, circadian, and fatigue modeling
Step 8	Interactions	+35+	Feature combinations and composite indices

Performance Results: Quantifying Feature Engineering Impact

The systematic approach revealed dramatic improvements in model performance:

Best Model Performance (Step 8 - Interactions):

ROC-AUC: 0.9098 (91% accuracy in ranking home run probability)
Precision: 0.360 (36% of predicted home runs actually occur)
Recall: 0.764 (capture 76% of actual home runs)
Feature Count: 203 optimized features
Feature Engineering ROI: +105% improvement over baseline

Most Impactful Feature Category: The situational context features (Step 2) provided the largest single improvement, boosting ROC-AUC by +17.9%. This highlights the critical importance of game state awareness in baseball prediction—the same batter performs dramatically differently in high-leverage situations versus routine at-bats.

Training and Test Periods

The analysis used a rigorous temporal validation approach:

Training Period: 2024-04-01 to 2024-08-31 (5 months of comprehensive data)
Test Period: 2024-09-01 to 2024-10-31 (2 months of completely unseen data)

This approach ensures that all model evaluation metrics represent genuine out-of-sample performance, providing realistic expectations for live betting applications.

Comprehensive Testing Methodology: Production Readiness Validation

Multi-Layer Testing Architecture

Building a system reliable enough for financial applications required extensive testing across multiple dimensions:

1. Feature Engineering Validation

Individual feature calculation accuracy
Performance benchmarking and optimization
Data quality and missing value handling
Feature interaction validation

2. Model Performance Testing

Cross-validation with time series constraints
Robustness testing with edge cases
Calibration analysis for probability accuracy
Feature importance stability analysis

3. Production Inference Testing

Sub-millisecond lookup performance validation
Batch processing efficiency testing
API integration and error handling
Real-time data pipeline validation

4. Betting System Integration Testing

Expected value calculation accuracy
Kelly criterion implementation validation
Risk management system testing
Portfolio optimization verification

Key Testing Innovations

Production Inference System: The most critical innovation was the development of a production-ready inference system that pre-computes and caches historical matchup data in an optimized SQLite database:

class InferenceFeatureCalculator:
    """
    Production-optimized feature calculation system.
    Achieves sub-millisecond lookups through pre-computed matchup database.
    """

Performance Achievement: Reduced feature calculation time from 400+ milliseconds to under 3 milliseconds—a 147,169x speed improvement that makes real-time betting applications feasible.

Temporal Integrity Validation: Every feature calculation includes built-in temporal safeguards to prevent data leakage:

All rolling statistics properly lagged with .shift(1)
TimeSeriesSplit cross-validation for temporal data
Systematic checks for look-ahead bias

Betting Implementation: From Predictions to Profits

Expected Value Framework

The betting system implements a sophisticated expected value calculation that goes beyond simple probability comparisons:

Core EV Formula:

EV = (Model_Probability × Payout) - (1 - Model_Probability) × Stake

Advanced Considerations:

Vig removal: Adjusting for bookmaker overround in two-way markets
Market efficiency: Identifying lines that differ significantly from fair odds
Confidence weighting: Higher confidence in model predictions for certain scenarios

Kelly Criterion Implementation

Position sizing follows the Kelly criterion for optimal bankroll growth:

def calculate_kelly_fraction(true_probability: float, american_odds: float) -> float:
    """
    Kelly formula: f = (bp - q) / b
    where p = probability of win, q = probability of loss, b = payout multiplier
    """

Safety Modifications:

Quarter Kelly: Using 25% of calculated Kelly fraction for risk management
Maximum position: Capping bets at 2-5% of bankroll regardless of Kelly calculation
Minimum edge requirements: Only betting opportunities with EV > 3-5%

Real-Time Betting System Architecture

Daily Workflow:

Morning Data Update (fetch_recent_data.py): Pull fresh MLB data with complete feature engineering
Multiple Daily Predictions (live_prediction_system.py): Generate predictions throughout the day with configurable thresholds

Command Line Interface:

# Conservative opportunities (higher confidence threshold)
python live_prediction_system.py --min-ev 0.05 --min-confidence 0.70

# More aggressive opportunities (lower thresholds)
python live_prediction_system.py --min-ev 0.02 --min-confidence 0.60

Risk Management Implementation

Portfolio-Level Risk Controls:

Maximum daily exposure: Limiting total daily betting amount
Diversification requirements: Avoiding concentration in single games or players
Drawdown limits: Automatic position reduction after losing streaks
Performance monitoring: Real-time tracking of actual vs predicted results

Individual Bet Risk Controls:

Confidence scoring: Additional layer beyond raw model probability
Market timing: Avoiding bets close to game time when information edge diminishes
Bookmaker reliability: Preferencing established books with reliable payouts

Production Architecture and Workflow

System Components

Core Pipeline:

Data Ingestion: Automated fetching from MLB APIs, weather services, and odds providers
Feature Engineering: Real-time calculation of all 255+ features with production optimizations
Model Inference: Sub-millisecond predictions using pre-loaded models
Betting Analysis: EV calculation, Kelly sizing, and opportunity identification
Risk Management: Portfolio-level risk assessment and position sizing
Execution Recommendations: Clear, actionable betting instructions

Key Production Features:

Graceful degradation: System continues operating when external APIs are unavailable
Comprehensive logging: Full audit trail for all predictions and betting decisions
Error handling: Robust recovery from data quality issues and API failures
Performance monitoring: Real-time tracking of system performance and accuracy

Scalability and Maintenance

Performance Optimizations:

Vectorized operations: Replacing nested loops with NumPy operations for 50-70% speed improvements
Intelligent caching: Pre-computing stable features to avoid redundant calculations
Database optimization: SQLite indexes and query optimization for fast lookups
Memory management: Efficient handling of large datasets without memory leaks

Maintenance Automation:

Automated testing: Daily validation of all system components
Performance monitoring: Automated alerts for system degradation
Data quality checks: Systematic validation of input data integrity
Model performance tracking: Continuous monitoring of prediction accuracy

Results and Real-World Performance

Model Performance Validation

The production system achieved consistent performance metrics that validate the comprehensive approach:

Prediction Accuracy:

ROC-AUC: 0.91 (exceptional discrimination ability)
Calibration: Model probabilities closely match observed frequencies
Consistency: Performance maintains stability across different time periods and market conditions
Speed: Complete predictions generated in under 5 milliseconds per player

Feature Engineering Validation:

Implementation Success: 255/275 planned features successfully implemented (92.7% success rate)
Performance Impact: Each major feature category provided measurable improvement
Production Stability: No significant performance degradation in live environment

Betting System Performance

Risk-Adjusted Returns: The system focuses on risk-adjusted performance rather than raw returns:

Sharpe Ratio Optimization: Emphasizing consistent returns with controlled volatility
Maximum Drawdown Limits: Systematic risk controls prevent catastrophic losses
Kelly Optimization: Position sizing optimized for long-term bankroll growth

Market Efficiency Insights:

Edge Identification: Successfully identifies market inefficiencies in player prop betting
Timing Advantage: Early detection of favorable lines before market adjustment
Volume Capacity: System can handle multiple simultaneous betting opportunities

Personal Reflections and Lessons Learned

The Complexity of Production Systems

Building a betting system taught me that the technical challenges extend far beyond modeling accuracy. The most sophisticated model is worthless if it can’t generate predictions fast enough for live betting, or if the risk management system fails during a losing streak.

The most valuable lesson was understanding the difference between research-quality code and production-ready systems. Every component needed to handle edge cases, gracefully degrade when external services failed, and maintain consistent performance under varying loads.

The Psychology of Systematic Betting

Implementing automated betting decisions revealed the psychological challenges of systematic approaches. The temptation to override the system during losing streaks, or to increase position sizes during winning streaks, required building explicit controls and decision frameworks.

The Kelly criterion, while mathematically optimal, needed practical modifications to account for psychological factors and implementation realities. Quarter-Kelly sizing emerged as the optimal balance between growth and sustainability.

The Iterative Nature of Production Development

Perhaps the most important insight was that production systems are never “finished.” The baseball betting market constantly evolves, requiring ongoing system improvements, feature updates, and risk management refinements.

The modular architecture proved essential for maintaining and improving the system. Each component—feature engineering, modeling, betting analysis, risk management—could be updated independently without disrupting the entire pipeline.

Future Development and Extensions

Immediate Enhancements

Model Improvements:

Ensemble methods: Combining multiple model types for improved robustness
Online learning: Continuous model adaptation to new data patterns
Advanced calibration: Isotonic regression and Platt scaling for improved probability estimates

System Optimizations:

GPU acceleration: Moving computationally intensive features to GPU computing
Distributed processing: Scaling to handle multiple sports simultaneously
Real-time streaming: Moving from batch to streaming data processing

Broader Applications

Multi-Sport Extension: The systematic approach to feature engineering and betting analysis could extend to other sports with rich statistical data—basketball, football, hockey, and tennis all offer similar opportunities for sophisticated modeling.

Market Making Applications: The accurate probability estimation capabilities could support market-making activities, providing fair odds for peer-to-peer betting platforms or informing bookmaker pricing models.

Research Applications: The systematic methodology for evaluating feature engineering improvements could benefit academic research in sports analytics, providing a framework for quantifying the impact of new statistical approaches.

Technical Implementation Notes

Codebase Architecture

The complete system comprises over 20,000 lines of production-ready Python code organized into modular components:

Core Modules:

modeling.py: Enhanced dual-model system with temporal safeguards
comparative_analysis.py: Systematic feature engineering evaluation framework
betting_utils.py: Comprehensive odds conversion, EV calculation, and Kelly criterion implementation
live_prediction_system.py: Real-time prediction generation and opportunity identification

Testing Framework:

16 comprehensive test modules covering all system components
Automated validation of feature calculations, model performance, and betting logic
Production readiness validation including performance benchmarking

Configuration Management:

Centralized configuration system with environment-specific settings
Secure API key management and credential handling
Flexible threshold and parameter adjustment for different betting strategies

Reproducibility and Open Source

The complete system is available as an open-source repository with comprehensive documentation:

Setup guides: Complete instructions for system deployment and configuration
Testing procedures: Automated validation of all components
Usage examples: Sample workflows for different betting strategies
Performance benchmarks: Expected performance metrics for validation

The codebase demonstrates production-level software engineering practices applied to sports betting, making it suitable for both academic research and commercial applications.

Conclusion

Transforming sophisticated feature engineering into a profitable betting system required solving challenges that extended far beyond traditional machine learning workflows. The integration of temporal data handling, production-optimized inference, comprehensive risk management, and real-time decision making created a system capable of identifying and capitalizing on market inefficiencies.

The 91% ROC-AUC performance achieved through systematic feature engineering provided the foundation, but the real value emerged from building production systems capable of translating predictions into profitable actions. The Kelly criterion-based position sizing, comprehensive risk management, and real-time processing capabilities transformed accurate predictions into a complete betting platform.

Perhaps most importantly, this project demonstrated that successful sports betting systems require equal attention to statistical modeling, software engineering, and risk management. The most accurate model is worthless without reliable production infrastructure, and the most sophisticated system fails without proper risk controls.

The systematic approach to feature engineering evaluation, production optimization, and betting implementation provides a replicable framework for sports analytics applications. While this implementation focused on MLB home run prediction, the methodologies extend to any domain where sophisticated feature engineering can provide predictive advantages.

For practitioners working on similar systems, the key insight is that production readiness must be considered from the beginning. Building systems that work reliably under real-world conditions—with missing data, API failures, and time pressure—requires different engineering approaches than research prototypes.

The complete system continues to operate daily, generating predictions and identifying betting opportunities. Most importantly, it provides a foundation for ongoing research into market efficiency, optimal betting strategies, and the practical application of machine learning to financial markets.

The journey from features to profits revealed that success in sports betting analytics requires mastering not just statistical modeling, but systems engineering, risk management, and the psychology of systematic decision making. The technical challenges were significant, but the insights gained extend far beyond sports betting to any application where machine learning meets real-world financial decisions.

The complete implementation is available at https://github.com/charlesbenfer/betting_models. The system represents a comprehensive approach to production sports betting analytics, combining advanced feature engineering with robust system architecture and sophisticated risk management.

Tags: Sports-Analytics, Baseball, Machine-Learning, Sports-Betting, Production-Systems

From Features to Profits - Building a Production MLB Betting System

Advanced Modeling, Comparative Analysis, and Real-World Betting Implementation