Horse Races Can Machine Learning Make A Winning Proposition?

Michael Sankari, Matthew Rautionmaa, Eric Adlard, David Levy, Marc Hasson and David Felsen

Posted on Apr 12, 2019

Project GitHub | LinkedIn: Niki Moritz Hao-Wei Matthew Oren

The skills we demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

For our capstone project, we partnered with RaceQuant, a startup specializing in Hong Kong horse race betting. Our goal was to apply machine learning to the world of horse racing to more accurately predict the outcome of races held by the Hong Kong Jockey Club and to advise on an optimal betting strategy. The Hong Kong Jockey Club (HKJC) is world-renowned and a distinct part of the culture in Hong Kong. Its emerald turf attracts about HK$138.8 million (US$17.86 million) per race, more than any other track in the world.

Betting on horse racing is notoriously difficult and is considered by many speculators to be uncrackable. But difficult is not the same as impossible. Our motivation to find an edge and generate profitable models stems from the ground-breaking work of Bill Benter, who is said to have amassed a $1bn fortune over his career doing just that.

We were provided with race data for more than 1,600 races held by the HKJC in the 2016-2017 and 2017-2018 seasons. To be profitable, we first had to clear the hurdle of the 17.5% track take that the HKJC deducts from the Win Pool on every race.
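To make the size of that hurdle concrete, here is a minimal sketch of the arithmetic, assuming a simple parimutuel Win Pool in which the take is deducted and the remainder is shared among winning tickets. The function names are our own for illustration, and the sketch ignores dividend rounding and the effect of our own bet on the pool.

```python
TRACK_TAKE = 0.175  # share of the Win Pool deducted before payouts (from the text)


def parimutuel_decimal_odds(pool_share: float, take: float = TRACK_TAKE) -> float:
    """Payout per $1 staked (stake included) on a horse holding `pool_share` of the pool."""
    return (1.0 - take) / pool_share


def expected_profit_per_dollar(true_prob: float, pool_share: float,
                               take: float = TRACK_TAKE) -> float:
    """Expected net profit per $1 bet, given our estimate of the true win probability."""
    return true_prob * parimutuel_decimal_odds(pool_share, take) - 1.0


def breakeven_probability(pool_share: float, take: float = TRACK_TAKE) -> float:
    """Smallest true win probability at which the bet stops losing money on average."""
    return pool_share / (1.0 - take)


if __name__ == "__main__":
    # A horse carrying 20% of the pool: betting at the public's own estimate loses the take.
    print(expected_profit_per_dollar(0.20, 0.20))
    # To break even, our model must believe the horse wins noticeably more often than 20%.
    print(breakeven_probability(0.20))
```

Under these assumptions, a bettor who merely agrees with the crowd loses 17.5 cents per dollar on average, so a model only adds value where its probability exceeds the pool share divided by 0.825.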

We began with a deep dive into the Kelly Criterion and an exploration of the data made available to us. Our original inclination was to develop linear models that could predict horse running times, build probability distribution functions around those predicted times, simulate races, and apply a betting algorithm to them. After studying quantile-quantile plots of the features and standard errors, and applying various transformation methods including the Box-Cox transformation, it became harder for us to justify a purely linear modeling path, given the nuances we were observing in the data.

Instead, we opted to proceed with logistic and other classification-based modeling, as this approach relaxed some of the assumptions that linear modeling requires and would directly output winning probabilities that we could feed into our betting model. We engineered several features and imputed missing values on a feature-by-feature basis.

We created several new features to better estimate the probability of a horse winning a race. Based on the assumption that horses that weigh in close to their average winning body weight have a higher likelihood of winning, we created a binary flag to signal that. We likewise engineered a feature that compared a horse’s speed rating (as computed by RaceQuant analysts) with that of a typical winner in that Class. We created features that measured a horse’s change in weight, the number of days since its last race, whether or not this was its first race in Hong Kong, and whether or not the horse won its last race. We also created a composite weighted winning percentage that considered the recent number of wins.
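A few of the features described above can be sketched as follows. This is only an illustration: the field names, the `Entry` record, and the 10-unit weight tolerance are hypothetical and are not taken from the RaceQuant data or from our actual code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Entry:
    """One horse in one race; every field name here is hypothetical."""
    body_weight: float
    prev_body_weight: Optional[float]    # None if the horse has no earlier start
    avg_winning_weight: Optional[float]  # None if the horse has never won
    race_date: date
    last_race_date: Optional[date]
    prior_hk_starts: int
    won_last_race: bool


def engineer_features(e: Entry, weight_tolerance: float = 10.0) -> dict:
    """Derive the flags and deltas described in the text for a single entry."""
    near_winning_weight = (
        e.avg_winning_weight is not None
        and abs(e.body_weight - e.avg_winning_weight) <= weight_tolerance
    )
    weight_change = (
        None if e.prev_body_weight is None else e.body_weight - e.prev_body_weight
    )
    days_since_last_race = (
        None if e.last_race_date is None else (e.race_date - e.last_race_date).days
    )
    return {
        "near_winning_weight": int(near_winning_weight),
        "weight_change": weight_change,
        "days_since_last_race": days_since_last_race,
        "first_hk_race": int(e.prior_hk_starts == 0),
        "won_last_race": int(e.won_last_race),
    }
```

Missing inputs are returned as `None` here so that they can be imputed in a separate step rather than silently defaulted.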

We imputed missing values for a horse’s previous ratings, the distance run in previous races, the course over which the horse had competed in previous races, trackwork and barrier trials, jockey and previous jockey win percentage, wins, and mounts. If a horse was new, it did not have an average body weight, so we imputed this feature with its previous weight.

Using correlation matrices, random forest classification, and coefficient analysis on normalized variables, we evaluated the relative predictive power and importance of each feature. From this work, we built models one feature at a time, based on sets of features that we identified as being impactful, and evaluated their performance. Our guess here was that our original fully-featured model with well over 100 variables may be over-informed, somewhat confusing, and not generating optimal probabilities. Our inclination proved correct; in nearly every modeling instance, we found a reduced model performed better.

By way of example, below are four sample logistic models we ran with dramatically reduced feature sets (in different combinations). Each of these performed better than our fully featured model. Of note, the starting bankroll for each model was $100,000. Although these models appeared to generate strong returns in the seasons they were initially trained and tested on, further simulation showed reduced performance. We were also concerned by their drawdown (as measured by minimum bankroll).

	Model 1	Model 2	Model 3	Model 4
Total Number of Bets	762	700	727	710
Number of Bets Per Race	4.7	4.3	4.5	4.4
ROI (on Betting Amount)	18.9%	23.5%	22.5%	18.3%
Number of Winning Bets	62	54	60	63
Final Bankroll ($)	246,436	289,154	294,663	269,050
Minimum Bankroll ($)	72,924	68,442	69,498	67,849
Maximum Bankroll ($)	327,080	316,425	340,525	370,295
% of Times Winner Predicted Correctly	29.8%	32.3%	31.7%	31.7%

One question we debated as a team related to potential model over-fitting and model bias. Could it be that a certain feature set worked well in one test set while another would perform better in a different season? Another possibility we had to consider: could it be that one type of model performed better in one season, and another performed better in a different season? Given that we only had two seasons for modeling & testing, we addressed this issue by grouping and regrouping each of the races into different season simulations.

In order to best represent the true overall distribution of results for each model, we ran Monte Carlo Simulations. Monte Carlo Simulations test models through repeated random sampling. In our case, this process consisted of repeating random 80/20 splits of our data. For each split of the data, we first trained the model on a random 80% set of races. Then we passed the fitted models obtained from training to our betting algorithm, which was run on the remaining 20%.

Running many instances of simulations for each model and taking the average performance into account allowed us to achieve more accurate estimates of each model’s true performance.
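The resampling loop described above can be sketched roughly as follows. The `fit` and `run_bets` callables are placeholders for the model training step and the betting algorithm, not our actual functions, and the seed and defaults are assumptions for illustration.

```python
import random
import statistics
from typing import Callable, List, Sequence, TypeVar

Race = TypeVar("Race")
Model = TypeVar("Model")


def monte_carlo_bankrolls(
    races: Sequence[Race],
    fit: Callable[[List[Race]], Model],                  # placeholder: trains a model
    run_bets: Callable[[Model, List[Race], float], float],  # placeholder: returns final bankroll
    n_sims: int = 500,
    train_frac: float = 0.8,
    start_bankroll: float = 100_000.0,
    seed: int = 0,
) -> List[float]:
    """Repeat random train/test splits and collect the final bankroll of each one."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_sims):
        shuffled = list(races)
        rng.shuffle(shuffled)                    # a fresh random split on every pass
        cut = int(round(len(shuffled) * train_frac))
        train, test = shuffled[:cut], shuffled[cut:]
        model = fit(train)                       # train on the random 80%
        finals.append(run_bets(model, test, start_bankroll))  # bet on the held-out 20%
    return finals


def summarize(finals: Sequence[float]) -> dict:
    """Summary statistics of the simulated ending bankrolls."""
    return {
        "mean": statistics.mean(finals),
        "median": statistics.median(finals),
        "min": min(finals),
        "max": max(finals),
    }
```

Splitting by whole races, rather than by individual horses, keeps every runner of a race on the same side of the split, which matters because the betting step needs the full field.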

The above histogram shows the ending bankroll for 500 simulated seasons, each consisting of all the races in the test set. Mean: $151,637; Median: $138,713; Min: $71,062; Max: $388,576.

We ran full and simplified models, and evaluated betting outcomes, for a variety of model types, including standard Logistic, Random Forest, XGBoost, Light Gradient Boosting Machine (LightGBM), and CatBoost. For each of these models, we took note of average drawdown, average final bankroll, number of bets, Return on Capital Deployed, and Return on Initial Bankroll across the thousands of simulations we created.

Our attention turned to the Kelly Betting Algorithm that formed the basis of our betting strategy. We experimented with the algorithm and fractional betting parameters; ultimately, we zeroed in on a 5% fractional allocation to each race. On the Kelly formula itself, we found, consistently, that a very slight modification to the traditional formula resulted in far superior betting outcomes, no matter what probability model we fed in. This modification allows for the inclusion of more consensus bets (i.e. lower odds) than the traditional algorithm, and we found this to be an effective method both in the actual seasons as well as the thousands of simulated seasons we tested on.
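For reference, the traditional Kelly formula for a single bet is f* = (b·p − q) / b, where p is the estimated win probability, q = 1 − p, and b is the net odds (decimal odds minus one). The sketch below implements only that textbook form; it does not reproduce the modification mentioned above. Treating the 5% figure as a multiplier on the Kelly fraction is one possible reading and is an assumption here, as are the function names.

```python
def kelly_fraction(win_prob: float, decimal_odds: float) -> float:
    """Classic Kelly fraction f* = (b*p - q) / b, floored at zero (no bet)."""
    b = decimal_odds - 1.0  # net odds: profit per $1 staked when the horse wins
    if b <= 0.0:
        return 0.0
    q = 1.0 - win_prob
    return max((b * win_prob - q) / b, 0.0)


def stake(bankroll: float, win_prob: float, decimal_odds: float,
          fraction: float = 0.05) -> float:
    """Fractional Kelly stake; `fraction=0.05` is our assumed reading of the 5% figure."""
    return bankroll * fraction * kelly_fraction(win_prob, decimal_odds)


if __name__ == "__main__":
    # The model says 25% while the odds pay 5.0: positive edge, so a small bet is placed.
    print(stake(100_000.0, 0.25, 5.0))
    # The model says 10% at the same odds: negative edge, so the stake is zero.
    print(stake(100_000.0, 0.10, 5.0))
```

Because the traditional formula returns zero whenever b·p is not greater than q, it tends to skip short-priced favorites unless the model's edge is large, which is consistent with the observation that admitting more consensus bets required changing the formula.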

In conclusion, we present a summary of how our models performed. Across 500 simulated seasons, our best returns were seen with an XGBoost model, which generated median and average returns on the amount wagered of 13.5% and 14.5%, respectively, with maximum losses relatively well-contained, as shown by our minimum bankroll levels. For further work, we would look to additional feature engineering and hyperparameter tuning, so as to include more race information and improve on our returns.

XGBoost Statistics	Mean	Median
Total Number of Bets:	1,205	1,207
Number of Bets Per Race:	7.5	7.5
Amount Wagered:	$315,247	$300,164
ROI on Bankroll:	51.6%	38.7%
ROI on Betting Amount:	14.5%	13.5%
Number of Winning Bets:	111	111
Biggest Bet:	$3,842	$3,550
Smallest Bet:	$10.00	$10.00
Initial Bankroll:	$100,000	$100,000
Final Bankroll:	$151,637	$138,713
Minimum Bankroll:	$91,273	$93,360
Maximum Bankroll:	$158,237	$144,620

About RaceQuant: RaceQuant was established by experts in the Thoroughbred racing domain who believed that machine learning could be applied successfully to maximize the return on betting investment. The company can be contacted at info@racequant.com.

About Authors

Michael Sankari

Michael is a Certified Data Scientist with experience in R, Python and SQL. Furthermore, he has a strong background in the finance and real estate industries and loves using analytics to make better decisions.

Matthew Rautionmaa

Matthew is an aspiring data scientist with over four years of professional success in leveraging insights from data analysis to generate business impact in the financial services industry. He is experienced in Python, R, Machine Learning, Web Scraping...

Eric Adlard

Eric is an aspiring data scientist with a track record of using data to drive business insights in financial services. He has hands-on experience in R and Python in web-scraping, data visualization, supervised and unsupervised machine learning, as...

David Levy

David Levy completed his BS from the Kelley School of Business at Indiana University. He has eight years of experience across financial services in various data-oriented, quantitative roles. David enjoys applying an analytical mindset and approach to solve...

Marc Hasson

As an investment research professional, much of my work over the last 17 years has centered around developing a deep understanding of businesses based on senior management interactions, financial modeling, forecasting, and primary due diligence. Data has also been...

David Felsen

Milind Dalvi October 23, 2019

Interesting Blog! However, it seems like the text focuses more on the design of the betting framework rather than the model itself. Yeah, you can classify for "horse placing" or regress for "finish time" but it seems to me that racing is ranking problem. Did you try XGBoost with ranking objective? I wonder you must have faced difficulties with that imbalance in classification. Also, there is no mention of ensembling models... interesting
