Horse Races: Can Machine Learning Make A Winning Proposition?
Project GitHub | LinkedIn: Niki Moritz Hao-Wei Matthew Oren
For our capstone project, we partnered with RaceQuant, a startup specializing in Hong Kong horse race betting. Our goal was to apply machine learning to the world of horse racing to more accurately predict the outcome of races held by the Hong Kong Jockey Club and to advise on an optimal betting strategy. The Hong Kong Jockey Club (HKJC) is world-renowned and a distinct part of the culture in Hong Kong. Its emerald turf attracts about HK$138.8 million (US$17.86 million) in wagers per race, more than any other track in the world.
Betting on horse racing is notoriously difficult and is considered by many speculators to be uncrackable. But difficult is not the same as impossible. Our motivation to find an edge and generate profitable models stems from the ground-breaking work of Bill Benter, who is said to have amassed a $1bn fortune over his career doing just that.
We were provided with data for more than 1,600 races from the 2016-2017 and 2017-2018 HKJC seasons. To be profitable, we first had to clear the hurdle of the 17.5% track take that the HKJC deducts from the Win Pool on every race.
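To make that hurdle concrete, here is a small worked example of how a pari-mutuel takeout shortens payouts and raises the break-even win probability a model must beat. The pool size and pool share are hypothetical numbers chosen for illustration:

```python
# Effect of the HKJC 17.5% Win Pool takeout (illustrative arithmetic).
# With takeout t, only (1 - t) of the pool is returned to winning
# bettors, so quoted odds are shorter than fair pari-mutuel odds.

takeout = 0.175
pool = 1_000_000                 # total Win Pool (HK$), hypothetical
returned = pool * (1 - takeout)  # amount paid back to winners

# Suppose bets on one horse make up 20% of the pool.
share = 0.20
payout_per_dollar = returned / (pool * share)  # decimal payout: 4.125

# A bet only has positive expectation if the model's win probability
# exceeds 1 / payout. Without the takeout that threshold would be 20%;
# the takeout pushes it up to roughly 24.2%.
breakeven_prob = 1 / payout_per_dollar
print(payout_per_dollar, round(breakeven_prob, 4))
```

This is why a model that merely matches the crowd's probabilities still loses money: its edge must exceed the takeout before any bet is worth placing.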
We began with a deep dive into the Kelly Criterion and an exploration of the data made available to us. Our original inclination was to develop linear models that could predict horse running times, build probability distributions around those predicted times, simulate races, and apply a betting algorithm to the results. After studying quantile-quantile plots of the features and standard errors, and applying various transformations including the Box-Cox transformation, we found it harder to justify a purely linear modeling path, given the departures from normality we were observing in the data.
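The normality checks described above can be sketched as follows; a synthetic lognormal series stands in for the actual race features, which are not public:

```python
# Sketch of the diagnostics described above: measure skewness and apply
# a Box-Cox transform to a right-skewed feature. Q-Q plots would be
# drawn with scipy.stats.probplot; here we just compare skewness.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for a skewed race feature (e.g. a timing measure).
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=1000)

# Box-Cox requires strictly positive input; with no lambda supplied it
# fits the lambda that best normalizes the data.
transformed, lam = stats.boxcox(skewed)

# Skewness near 0 indicates the transformed series is closer to normal.
print(stats.skew(skewed), stats.skew(transformed), lam)
```

When even the transformed features fail these checks for key variables, the usual linear-model assumptions (normal errors, homoscedasticity) become hard to defend, which is what pushed us toward classification instead.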
Instead, we opted for logistic and classification-based modeling, which relaxed some of those prerequisites and directly produced winning probabilities that we could feed into our betting model. We engineered several features and imputed missing values on a feature-by-feature basis.
We created several new features to better estimate the probability of a horse winning a race. Based on the assumption that horses weighing in close to their average winning body weight have a higher likelihood of winning, we created a binary flag for that condition. We likewise engineered a feature comparing a horse's speed rating (as computed by RaceQuant analysts) with that of a typical winner in its Class. We added features measuring a horse's change in weight, the number of days since its last race, whether this was its first race in Hong Kong, and whether the horse won its last race. Finally, we created a composite weighted winning percentage that also accounted for the number of recent wins.
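A minimal sketch of a few of these engineered features, using hypothetical column names (the real RaceQuant schema differs):

```python
# Illustrative feature engineering on a toy race table. Column names
# and the 10-unit weight tolerance are assumptions for this sketch.
import pandas as pd

df = pd.DataFrame({
    "horse_id": [1, 1, 2],
    "race_date": pd.to_datetime(["2017-01-01", "2017-02-05", "2017-02-05"]),
    "body_weight": [1050, 1062, 990],
    "avg_winning_weight": [1055, 1055, 1000],
    "speed_rating": [88, 91, 75],
    "class_winner_rating": [90, 90, 80],
    "finish_position": [3, 1, 5],
})

# Binary flag: weighed in close to the horse's average winning weight.
df["near_win_weight"] = (
    (df["body_weight"] - df["avg_winning_weight"]).abs().le(10).astype(int)
)

# Speed rating relative to a typical winner in that Class.
df["rating_vs_class_winner"] = df["speed_rating"] - df["class_winner_rating"]

# Per-horse recency features: weight change, days since last race, and
# whether the horse won its previous start.
g = df.sort_values("race_date").groupby("horse_id")
df["weight_change"] = g["body_weight"].diff()
df["days_since_last_race"] = g["race_date"].diff().dt.days
df["won_last_race"] = g["finish_position"].shift().eq(1).astype(int)
```

The groupby-then-`diff`/`shift` pattern keeps each horse's history separate, so a horse's first row gets missing values rather than borrowing another horse's record.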
We imputed missing values for a horse's previous ratings, the distances run in previous races, the courses on which it had competed, trackwork and barrier trials, and jockey statistics (current and previous win percentage, wins, and mounts). If a horse was new, it had no average body weight on record, so we imputed that feature with its previous weight.
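The imputation logic might look like the following sketch; the column names and fallback choices are illustrative assumptions, not the exact rules we used:

```python
# Feature-by-feature imputation sketch on a toy table.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "horse_id": [1, 2, 3],
    "avg_body_weight": [1050.0, np.nan, np.nan],
    "body_weight": [1048.0, 995.0, 1100.0],
    "jockey_win_pct": [0.12, np.nan, 0.08],
})

# New horses have no average body weight yet: fall back to the horse's
# current (previous recorded) weight instead of a global constant.
df["avg_body_weight"] = df["avg_body_weight"].fillna(df["body_weight"])

# Jockey win percentage: impute with the field median rather than zero,
# so debut jockeys are not treated as certain losers.
df["jockey_win_pct"] = df["jockey_win_pct"].fillna(df["jockey_win_pct"].median())
```

Imputing each column with a rule suited to its meaning, rather than one blanket fill value, keeps the imputed rows from distorting the probabilities downstream.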
Using correlation matrices, random forest classification, and coefficient analysis on normalized variables, we evaluated the relative predictive power and importance of each feature. From this work, we built models one feature at a time, drawing on the sets of features we had identified as impactful, and evaluated their performance. Our hypothesis was that our original fully featured model, with well over 100 variables, might be over-informed and not generating optimal probabilities. It proved correct: in nearly every instance, a reduced model performed better.
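The feature-selection step can be sketched on synthetic data; `make_classification` stands in for the race data, and keeping the top five features is an arbitrary illustrative cutoff:

```python
# Rank features by random-forest importance, then refit a reduced
# logistic model on only the strongest ones (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=5, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Indices of the five most important features, strongest first.
top = np.argsort(rf.feature_importances_)[::-1][:5]

full = LogisticRegression(max_iter=1000).fit(X, y)
reduced = LogisticRegression(max_iter=1000).fit(X[:, top], y)
```

In our project the comparison was made on betting outcomes rather than raw accuracy: the reduced models produced win probabilities that translated into better returns.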
By way of example, below are four sample logistic models we ran with dramatically reduced feature sets (in different combinations). Each performed better than our fully featured model. Of note, the starting bankroll for each model was $100,000. Although these models appeared to generate strong returns on the seasons they were initially trained and tested on, further simulation showed reduced performance, and we were concerned by their drawdowns (as measured by minimum bankroll).
| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| Total number of bets | 762 | 700 | 727 | 710 |
| Number of bets per race | 4.7 | 4.3 | 4.5 | 4.4 |
| ROI (on betting amount) | 18.9% | 23.5% | 22.5% | 18.3% |
| Number of winning bets | 62 | 54 | 60 | 63 |
| Final bankroll | 246,436 | 289,154 | 294,663 | 269,050 |
| Minimum bankroll | 72,924 | 68,442 | 69,498 | 67,849 |
| Maximum bankroll | 327,080 | 316,425 | 340,525 | 370,295 |
| % of winners predicted correctly | 29.8% | 32.3% | 31.7% | 31.7% |
One question we debated as a team concerned potential over-fitting and model bias. Could a feature set that worked well on one test set be outperformed by another in a different season? Could one type of model perform better in one season and another type in the next? Since we had only two seasons for modeling and testing, we addressed this by grouping and regrouping the races into different season simulations.
In order to best represent the true overall distribution of results for each model, we ran Monte Carlo simulations, which test models through repeated random sampling. In our case, this meant repeated random 80/20 splits of our data: for each split, we trained the model on a random 80% of races, then passed the fitted model to our betting algorithm, which was run on the remaining 20%.
Running many instances of simulations for each model and taking the average performance into account allowed us to achieve more accurate estimates of each model’s true performance.
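The simulation loop can be sketched as follows; `simulate_betting` is a hypothetical stand-in for our betting algorithm, and synthetic data replaces the real races (the project ran 500 such splits, trimmed here for brevity):

```python
# Monte Carlo evaluation: repeated random 80/20 splits, train on 80%,
# feed predicted win probabilities to the betting simulator on the 20%.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def simulate_betting(probs, outcomes, bankroll=100_000):
    # Placeholder for the Kelly-based betting simulator; the real one
    # also needs the public odds for each runner. Returns the final
    # bankroll (here just the stub's starting value).
    return bankroll

rng = np.random.default_rng(0)
X = rng.normal(size=(1600, 10))               # synthetic runner features
y = (rng.random(1600) < 0.08).astype(int)     # ~8% of runners win

final_bankrolls = []
for seed in range(50):                        # project used 500 splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    probs = model.predict_proba(X_te)[:, 1]
    final_bankrolls.append(simulate_betting(probs, y_te))

print(np.mean(final_bankrolls), np.median(final_bankrolls))
```

Averaging the ending bankrolls across all splits, rather than trusting a single train/test partition, is what guards against a lucky (or unlucky) season dominating the evaluation.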
The histogram above shows the ending bankroll for 500 simulated seasons consisting of all the races in the testing sets (mean: $151,637; median: $138,713; min: $71,062; max: $388,576).
We ran full and simplified models, and evaluated betting outcomes, for a variety of model types, including standard logistic regression, Random Forest, XGBoost, LightGBM, and CatBoost. For each of these models, we recorded average drawdown, average final bankroll, number of bets, return on capital deployed, and return on initial bankroll across the thousands of simulations we created.
Our attention then turned to the Kelly betting algorithm that formed the basis of our betting strategy. We experimented with the algorithm and its fractional betting parameters, ultimately zeroing in on a 5% fractional allocation to each race. On the Kelly formula itself, we consistently found that a slight modification to the traditional formula produced far superior betting outcomes, no matter which probability model we fed it. This modification allows for the inclusion of more consensus bets (i.e., lower-odds horses) than the traditional algorithm, and it proved effective both in the actual seasons and in the thousands of simulated seasons we tested on.
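For reference, the standard (unmodified) Kelly formula with a fractional multiplier looks like this; our actual modification is not reproduced here:

```python
# Standard Kelly stake for a single win bet: f* = (b*p - q) / b, where
# b is the net decimal odds, p the model's win probability, q = 1 - p.
# The 5% fractional multiplier matches the allocation described above.
def kelly_fraction(p, decimal_odds, fraction=0.05):
    b = decimal_odds - 1.0           # net odds per dollar staked
    f = (b * p - (1.0 - p)) / b      # full-Kelly fraction of bankroll
    return max(f, 0.0) * fraction    # never bet a negative edge

# Example: model gives 30% to win at decimal odds of 4.5
# (positive edge, since 4.5 * 0.30 = 1.35 > 1).
stake = kelly_fraction(0.30, 4.5) * 100_000   # stake from a $100k bankroll
```

Full Kelly maximizes long-run log-growth but produces violent bankroll swings; scaling each stake down to a fraction of the Kelly amount trades some expected growth for much smaller drawdowns, which is why the fractional parameter mattered as much as the formula itself.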
In conclusion, we present a summary of how our models performed. Across 500 simulated seasons, our best returns came from an XGBoost model, which generated median and average returns of 13.5% and 14.5%, respectively, with maximum losses relatively well contained, as seen in our minimum bankroll levels. For further work, we would pursue additional feature engineering and hyperparameter tuning, so as to incorporate more race information and improve on our returns.
| XGBoost statistics | Mean | Median |
|---|---|---|
| Total number of bets | 1,205 | 1,207 |
| Number of bets per race | 7.5 | 7.5 |
| Amount wagered | $315,247 | $300,164 |
| ROI on bankroll | 51.6% | 38.7% |
| ROI on betting amount | 14.5% | 13.5% |
| Number of winning bets | 111 | 111 |
| Biggest bet | $3,842 | $3,550 |
| Smallest bet | $10.00 | $10.00 |
| Initial bankroll | $100,000 | $100,000 |
| Final bankroll | $151,637 | $138,713 |
| Minimum bankroll | $91,273 | $93,360 |
| Maximum bankroll | $158,237 | $144,620 |
About RaceQuant: RaceQuant was established by experts in the Thoroughbred racing domain who believed that machine learning could be applied successfully to maximize the return on betting investment. RaceQuant can be contacted at info@racequant.com.