College Football Score Predictions

Machine learning-powered predictions for college football games using historical data, team metrics, and game conditions.

Project Overview

This project grew out of my passion for college football and my curiosity about building a machine learning model. I wanted to create a prediction engine that could compete with professional handicappers while providing transparent insight into how its predictions are made. The challenge was to develop a system that could process complex historical data and generate accurate predictions for future games, whether or not the two teams had played each other regularly.

Approach

The solution combines multiple machine learning models trained on historical game data, team statistics, and seasonal, yearly, strength-of-schedule, and situational factors. Using Python with scikit-learn and pandas, I developed a custom ensemble model that weights these factors according to their historical predictive power. The Gradient Boosting Regressor performed better than the Random Forest and XGBoost models. In testing, the model's predicted scores came quite close to the actual game scores. This was easily my most challenging project to date, given how difficult it is to produce reasonable predictions.
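As a rough illustration of how candidate regressors can be compared, the sketch below scores scikit-learn's `GradientBoostingRegressor` and `RandomForestRegressor` by cross-validated mean absolute error. The data is synthetic, and the feature meanings, hyperparameters, and the `compare_models` helper are assumptions made for the example, not the project's actual pipeline. XGBoost is left out to avoid an extra dependency.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a game-level feature table (NOT the project's data).
rng = np.random.default_rng(42)
n_games = 300
# Hypothetical columns: offense rating, defense rating, schedule strength, noise.
X = rng.normal(size=(n_games, 4))
# Hypothetical target: points scored, driven by the first three columns.
y = 28 + 6 * X[:, 0] - 4 * X[:, 1] + 2 * X[:, 2] + rng.normal(scale=3, size=n_games)

models = {
    "gradient_boosting": GradientBoostingRegressor(n_estimators=100, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}


def compare_models(models, X, y, cv=5):
    """Return the mean cross-validated MAE per model (lower is better)."""
    results = {}
    for name, model in models.items():
        # scikit-learn reports MAE as a negative score, so flip the sign.
        scores = cross_val_score(
            model, X, y, cv=cv, scoring="neg_mean_absolute_error"
        )
        results[name] = -scores.mean()
    return results


mae_by_model = compare_models(models, X, y)
for name, mae in sorted(mae_by_model.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {mae:.2f} points")
```

With real game data, splitting by season rather than at random may give a more honest estimate, since games from the same season are not independent.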

Directions

Navigate below and select the two conferences from the drop-down lists. Then select the two teams you want to compare; the model will predict the scores and display the spread and the model weights.
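The favorite, spread, and over/under follow directly from the two predicted scores. A minimal sketch of that arithmetic is shown here; the `BettingLine` class, the `betting_line` function, and the team names and scores are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class BettingLine:
    favorite: str      # team predicted to win
    spread: float      # predicted margin of victory, in points
    over_under: float  # predicted combined score of both teams


def betting_line(home_team, home_score, away_team, away_score):
    """Derive favorite, spread, and over/under from two predicted scores."""
    margin = home_score - away_score
    if margin > 0:
        favorite = home_team
    elif margin < 0:
        favorite = away_team
    else:
        favorite = "Pick 'em"  # the model predicts a tie
    # Round to one decimal place to hide floating-point noise.
    return BettingLine(
        favorite=favorite,
        spread=round(abs(margin), 1),
        over_under=round(home_score + away_score, 1),
    )


line = betting_line("Team A", 31.4, "Team B", 24.1)
print(f"Favorite: {line.favorite}")
print(f"Spread: {line.spread}")
print(f"Over/Under: {line.over_under}")
```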

Key Learnings

  • Data quality and cleanliness are prerequisites for any quality model
  • ML models need subject matter experts to review their results and help train them. Many times the predictions revealed results that did not make sense, such as the away team's venue ID being the most important feature
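A simple automated check can surface this kind of problem before a human reviews the model. The sketch below flags identifier-like columns that rank among the most important features. The function name, the suffix list, and the importance values are all made up for illustration; with a fitted scikit-learn tree ensemble, the importances could come from its `feature_importances_` attribute.

```python
# Column-name suffixes that usually mark identifiers rather than real signals
# (an assumed naming convention, not the project's schema).
ID_LIKE_SUFFIXES = ("_id", "_key")


def flag_suspicious_features(importances, top_n=5, suffixes=ID_LIKE_SUFFIXES):
    """Return ID-like feature names that rank among the top_n most important."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_n] if name.lower().endswith(suffixes)]


# Made-up importances that mimic the venue ID problem described above.
importances = {
    "away_venue_id": 0.31,
    "home_offense_ppg": 0.22,
    "away_defense_ppg": 0.18,
    "strength_of_schedule": 0.15,
    "week": 0.14,
}

flagged = flag_suspicious_features(importances, top_n=3)
print("Features to review:", flagged)
```

A flagged feature is not necessarily wrong, but an arbitrary ID ranking near the top often suggests the model is memorizing specific venues or teams rather than learning something that generalizes.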

Challenges Overcome

  • Initial models consistently weighted away-team features more heavily
  • Managing data quality across different seasons and rule changes made predictions more challenging
  • The lack of player data resulted in lower-quality predictions, especially for early-season matchups

Future Applications

  • Implementing automated model retraining. For example, an AI could read the output of one function and adjust the code for the next input
  • Creating similar models for other sports
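One possible starting point for automated retraining is a trigger that compares recent prediction error against a baseline. This is only a sketch of one interpretation: the `should_retrain` function, its thresholds, and the sample errors are assumptions, and it does not cover the AI-driven code adjustment mentioned above.

```python
from statistics import mean


def should_retrain(recent_errors, baseline_mae, tolerance=1.25, min_games=20):
    """Return True when recent MAE exceeds the baseline MAE by the tolerance factor.

    recent_errors: predicted minus actual scores for recently completed games.
    baseline_mae:  MAE measured when the model was last trained.
    """
    if len(recent_errors) < min_games:
        return False  # too few games to judge drift reliably
    recent_mae = mean(abs(e) for e in recent_errors)
    return recent_mae > baseline_mae * tolerance


# Hypothetical errors (in points) from the last 20 games.
drifting = [10, -12, 9, -11] * 5
stable = [6, -7, 8, -7] * 5

print(should_retrain(drifting, baseline_mae=7.0))
print(should_retrain(stable, baseline_mae=7.0))
```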

Game Prediction

[Interactive prediction widget: after two teams are selected, it displays the predicted scores, the favorite, the spread, the over/under, and the model weights.]