Betting on Horse Racing

I. INTRODUCTION

"Founded in 1884, the Hong Kong Jockey Club is one of the most treasured—and lucrative—legacies of Britain’s colonial rule over the city. Its emerald turf attracts about HK$138.8 million (US$17.86 million) per race, more than any other track in the world." - Bloomberg

Successfully predicting even a small percentage of winning horses over a large number of races can lead to an absurd ROI thanks to compounding. Most famously, William Benter earned nearly US$1 billion by creating a computer program to analyze the horse racing market.

Attracted by the possibility of measuring our data science skills against one metric, and one metric only, ROI, as well as the challenge of facing off against all HK horse racing market participants, our group (consisting of 5 data scientists) partnered with RaceQuant, a startup specializing in HK horse-betting.

RaceQuant provided us with data for all races held by the Hong Kong Jockey Club for the years 2015, 2016, 2017, and 2018 (1st and 2nd quarters). The data set consisted of 81 different features. The objective of this capstone project was to:

1) create a model to predict the probability of a given horse in a given race winning said race; and

2) use the probabilities output by our model to create a betting strategy to maximize our ROI based on a $100,000 betting bankroll when back-testing for 540 races randomly selected from the data set.

Due to a non-disclosure agreement (NDA), parts III. and IV., describing our approaches to data analysis and modeling, are significantly simplified.

II. FEATURE CLEANING AND ENGINEERING 

Our raw data contained 2384 unique races, with a total of 24863 horses having run those races. Before we created our model, we performed several data transformations. The data cleaning was performed using pandas in Python.

Some of the features in our data set had missing values. For example, jockey and trainer win percentages were missing for first-time jockeys and trainers. We assigned them a value of 10%, since most first-time jockeys and trainers at the HK tracks have participated in other international events and have performed quite well on average.

We also did some feature engineering to better capture certain types of information from our data. For example, we created features related to the horses' weight. The percent deviation of a horse's weight from its winning weight was introduced in order to examine the effect of horse weight on the chances of winning. Similarly, the percent deviation of a horse's weight from its average weight over all races was introduced to standardize horse weight.
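The imputation and weight features described above could be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`horse_id`, `weight`, `won`, `jockey_win_pct`) and the toy rows are hypothetical, and "winning weight" is assumed here to mean the mean weight across a horse's winning races.

```python
import pandas as pd

# Toy data with hypothetical column names; the real 81-feature data set is not public.
races = pd.DataFrame({
    "horse_id": ["A", "A", "A", "B", "B"],
    "weight": [1000.0, 1020.0, 980.0, 1100.0, 1122.0],
    "won": [0, 1, 0, 0, 0],
    "jockey_win_pct": [0.12, None, 0.08, None, 0.15],
})

# Missing win percentages (first-time jockeys) get the 10% default described above.
races["jockey_win_pct"] = races["jockey_win_pct"].fillna(0.10)

# Percent deviation from each horse's average weight over all of its races.
avg_weight = races.groupby("horse_id")["weight"].transform("mean")
races["pct_dev_avg_weight"] = (races["weight"] - avg_weight) / avg_weight * 100

# Percent deviation from the horse's mean winning weight (assumed definition).
# Horses that never won have no winning weight, so the feature stays NaN.
winning_weights = races["weight"].where(races["won"] == 1)
win_weight = winning_weights.groupby(races["horse_id"]).transform("mean")
races["pct_dev_win_weight"] = (races["weight"] - win_weight) / win_weight * 100

print(races)
```

Note that this sketch averages over every race in the table, as the text describes. For a feature used to predict a specific race, one would probably want to restrict the averages to races run before it.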

III. MODELING APPROACH

In the racing data we were given, we found that many features did not have a linear relationship with a horse's win probability. Therefore, we tested multiple models that could capture complex relationships between input features and output probability. We settled on a neural network, which we then fine-tuned to maximize its probability of choosing the winning horse of a given race.

IV. BETTING STRATEGY

Having calculated the probability that each horse would win its race, we set out to develop a betting strategy to maximize our ROI. To do so, we first explored the Kelly Criterion:

$$f^* = \frac{bp - q}{b}$$

Where: f* is the fraction of the current bankroll to wager, b is the net odds received for the wager (your return is "b to 1": on a winning bet of $1 you receive $b in addition to getting the $1 back), p is the probability of winning, and q is the probability of losing (1-p).
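As a small worked example, the standard Kelly formula f* = (bp - q) / b can be written as a short Python function. The function name and the choice to floor negative values at zero (meaning "do not bet") are assumptions made for this sketch.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Fraction of bankroll to wager: f* = (b*p - q) / b, with q = 1 - p.

    A negative f* means the bet has negative expected value, so this
    sketch floors it at 0 (no bet); that flooring is an assumption here.
    """
    q = 1.0 - p
    return max((b * p - q) / b, 0.0)


# A horse with a 25% win probability at net odds of 5 to 1:
# f* = (5 * 0.25 - 0.75) / 5 = 0.10, i.e. wager 10% of the bankroll.
print(kelly_fraction(0.25, 5.0))

# The same odds with only a 10% win probability give no edge, so no bet.
print(kelly_fraction(0.10, 5.0))
```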

Several issues came up while applying this methodology:

  • The f values calculated were much too high and often led to an early bankruptcy.
  • The Kelly Criterion assumes that the bets made are independent. In a race, however, there can be up to 14 horses, and the resulting 14 possible win bets are on mutually exclusive outcomes.

To remedy the first issue, we multiplied f by a constant between 0 and 1. We found that 0.2 worked best. If interested, we recommend looking up the derivation of the Kelly Criterion, which shows that such a multiplier slows the bankroll's exponential growth but reduces the variance of early return outcomes.

For the second issue, we adapted an alternative betting strategy described in Peter Smoczynski and Dave Tomkins's paper "An explicit solution to the problem of optimizing the allocations of a bettor's wealth when wagering on horse races". This strategy can be broken down into several steps:

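1) Calculate the market odds $\beta_k = \frac{1}{1 + Q_k}$ for each horse $k$, where $Q_k$ are the payoff odds.

2) Calculate the expected revenue rate $er_k = \frac{D}{\beta_k} p_k = D (1 + Q_k) p_k$, where $p_k$ is the model's win probability for horse $k$ and $D = 1 - tt$, with $tt$ being the track take or tax on one's bet.

3) Reorder the outcomes so that the expected revenue rates are in descending order; $er_1$ will then be the best bet.

4) Set $S = \emptyset$, $k = 1$, and $R(S) = 1$, so that the best bet, $er_1$, is considered first in step 5.

5) If $er_k > R(S)$, insert the $k$th outcome into the set $S$, recalculate $R(S)$ according to

$$R(S) = \frac{1 - \sum_{i \in S} p_i}{1 - \sum_{i \in S} \beta_i / D}$$

and move on to the next outcome, $k + 1$.

6) Repeat step 5 until its condition is no longer fulfilled, then set $S_0 = S$.

7) Calculate the optimal fraction of bankroll to bet on each horse $k$ in $S_0$ for a given race with:

$$f_k = p_k - R(S_0) \frac{\beta_k}{D}$$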

As with the Kelly Criterion, we extended the longevity of our bankroll by multiplying each f by a fixed ratio, which we found worked best at 0.03.
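The seven steps can be sketched in Python as below. This follows the published form of the algorithm, with market odds 1 / (1 + Q), expected revenue rate D(1 + Q)p, and the reserve rate R(S). It is not our production code. The function name, argument names, example probabilities, example odds, and the 17.5% track take are all illustrative assumptions.

```python
from typing import Dict, List


def optimal_fractions(probs: List[float], payoff_odds: List[float],
                      track_take: float, scale: float = 1.0) -> Dict[int, float]:
    """Return {horse index: fraction of bankroll} for one race."""
    D = 1.0 - track_take
    # Step 1: market odds beta_k = 1 / (1 + Q_k).
    betas = [1.0 / (1.0 + q) for q in payoff_odds]
    # Step 2: expected revenue rate er_k = (D / beta_k) * p_k.
    er = [D / beta * p for p, beta in zip(probs, betas)]
    # Step 3: consider horses from highest to lowest expected revenue rate.
    order = sorted(range(len(probs)), key=lambda k: er[k], reverse=True)
    # Step 4: start with an empty set S and R(S) = 1.
    chosen: List[int] = []
    reserve, sum_p, sum_beta = 1.0, 0.0, 0.0
    # Steps 5 and 6: add horses while er_k > R(S), updating R(S) each time.
    for k in order:
        if er[k] <= reserve:
            break
        chosen.append(k)
        sum_p += probs[k]
        sum_beta += betas[k] / D
        reserve = (1.0 - sum_p) / (1.0 - sum_beta)
    # Step 7: f_k = p_k - R(S0) * beta_k / D, scaled by the fixed ratio.
    return {k: scale * (probs[k] - reserve * betas[k] / D) for k in chosen}


# Hypothetical four-horse race: model probabilities and payoff odds.
probs = [0.40, 0.30, 0.20, 0.10]
odds = [1.5, 4.0, 3.0, 6.0]
print(optimal_fractions(probs, odds, track_take=0.175))              # full fractions
print(optimal_fractions(probs, odds, track_take=0.175, scale=0.03))  # scaled by 0.03
```

In this example only the second horse clears the threshold (its expected revenue rate is 1.2375, against 0.825 for the next best), so the unscaled result is a single bet of about 7.6% of the bankroll. If the quoted odds already reflect the track take, setting `track_take=0` avoids deducting it twice.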

V. OUTCOME

The outcome of our model is summarized in the table below, which holds the key statistics of our betting strategy.

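| Group | Metric | Result |
|---|---|---|
| Sample size (training and testing) | Total training + test races | 2384 |
| Sample size (training and testing) | Total training + test horses | 24863 |
| Sample size (training and testing) | Training races | 1843 |
| Sample size (training and testing) | % of training races | 77.31% |
| Sample size (training and testing) | Training horses | 18304 |
| Sample size (training and testing) | % of training horses | 73.62% |
| Sample size (training and testing) | Number of races in test | 541 |
| Sample size (training and testing) | % of test races | 22.69% |
| Sample size (training and testing) | Number of horses in test | 6559 |
| Sample size (training and testing) | % of test horses | 26.38% |
| Bankroll related | Initial bankroll | $100,000.00 |
| Bankroll related | Minimum bankroll | $91,912.05 |
| Bankroll related | Maximum bankroll | $1,418,354.18 |
| Bets related | Total number of bets made | 1953 |
| Bets related | Average number of bets made per race | 3.61 |
| Bets related | % of total bets possible made | 29.78% |
| Bets related | Number of winners bet | 143 |
| Bets related | % of rank 1 horses predicted | 26.38% |
| Bets related | Biggest bet | $28,774.20 |
| Bets related | Smallest bet | $1.30 |
| Bets related | Biggest winning bet | $28,774.20 |
| Bets related | Smallest winning bet | $1.30 |
| Bets related | Biggest losing bet | $27,241.50 |
| Bets related | Smallest losing bet | $1.30 |
| Odds related | Average net win odds | 5.38 |
| Odds related | Lowest net win odds | 0.1 |
| Odds related | Highest net win odds | 17 |
| ROI | Final bankroll | $1,054,003.96 |
| ROI | Total spend on bets made | $5,333,083.60 |
| ROI | ROI as betting % | 19.76% |
| ROI | Total ROI % | 1054.00% |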
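For readers who want to reproduce the ROI rows, the arithmetic below recovers both reported percentages from the bankroll figures. The formulas are inferred from the numbers and are not stated in the table: both reported figures appear to divide the final bankroll, rather than the net profit, by the base amount. The net-profit versions are shown alongside for comparison.

```python
initial_bankroll = 100_000.00
final_bankroll = 1_054_003.96
total_staked = 5_333_083.60  # "Total spend on bets made"

# Ratios that reproduce the two reported ROI rows (inferred definitions).
total_roi_pct = final_bankroll / initial_bankroll * 100   # reported as 1054.00%
betting_roi_pct = final_bankroll / total_staked * 100     # reported as 19.76%

# Net-profit versions of the same ratios, for comparison.
net_profit = final_bankroll - initial_bankroll
net_return_on_bankroll_pct = net_profit / initial_bankroll * 100
net_return_on_stakes_pct = net_profit / total_staked * 100

print(f"{total_roi_pct:.2f}% {betting_roi_pct:.2f}%")
print(f"{net_return_on_bankroll_pct:.2f}% {net_return_on_stakes_pct:.2f}%")
```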

 

VI. TEAM

This capstone project was completed by Basant Dhital, Tristan Dresbach, Jiwon Cha, SangYeon Choi, and Karim Zaatary in collaboration with RaceQuant through NYC Data Science Academy. Please contact Basant via LinkedIn and Tristan via LinkedIn for any questions.

