NYC Data Science Academy| Blog

Soccer Betting Analysis: How to Use Betting Agencies Odds to Predict Match Results?

Chen Trilnik
Posted on Apr 29, 2017

Introduction

As a soccer fan with 3 years of work experience as a live soccer match analyst, I have thousands of soccer game hours in my repertoire. I follow European soccer on a weekly basis and know most of the teams and players in the major leagues of Europe.

Even with all my knowledge and experience, I still find it hard to predict soccer match results, with or without the help of betting odds. Like in any other sport, the best team doesn't always win. There are many parameters that affect the outcome of any given game. The skills of the players, the tactical formation, and teamwork may be the most important ones. But is excelling in all of these enough to win every game? If so, we should be able to predict match results pretty easily.

The truth is, there are many more factors that affect soccer match results - team motivation and spirit, player injuries, fan support, chemistry between teammates and opponents, reputation, and win history are some of them. The complex interplay between these variables during the fast-paced activity of a game makes every match unique.  

Professional betting agencies are making a lot of money from people who want to predict match results. It is safe to say that their betting odds are calculated in a way that maximizes their profits and minimizes their risks.

In this project, I wanted to examine the degree to which betting agencies' odds correlate with actual match results, and to see if there is any way to maximize prediction accuracy. For this, I looked at the betting agencies' reported odds for the basic 3-way bet (home win, draw, or away win) and at how strongly one outcome was favored within each game (high favored, moderate favored, or low favored), and compared them to actual match outcomes to see whether the agencies' odds had any predictive value. I further categorized the match outcomes by the location of the teams (i.e. hosting vs. visiting), the stage of the season, and the numerical difference between the payouts in order to find patterns that optimize prediction accuracy.

In the end, I found three parameters that can help predict the outcomes with up to 80% precision:

  1. The agencies' high favored result
  2. The location of the team, and
  3. The stage of the season.

I used an R Shiny app and ggplot2 to visualize the data. You can find the full results in the app.

Explanation

In most soccer competitions a match can end in a draw, so there are 3 different outcomes to bet on between Team 1 and Team 2:

-  First outcome: team 1 wins

-  Second outcome: team 2 wins

-  Third outcome: team 1 and team 2 draw

The odds are translated into payouts. The result with the minimum odds is the one that is most likely to happen; it carries the least risk and therefore offers the lowest payout.

The result with the maximum odds is the one that is least likely to happen; it carries the highest risk and therefore offers the highest payout.

For example, let's take the first match in this betting odds chart of the English Premier League and look at the odds for the full-time result. In this game, Arsenal is playing against Crystal Palace. For an Arsenal win, every dollar you bet returns $1.29 (a $0.29 profit). For a draw, a dollar returns $4.98 (a $3.98 profit). And a Crystal Palace win returns $8.06 (a $7.06 profit) for a dollar bet.

Betting odds chart
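The arithmetic in the Arsenal example can be sketched in a few lines of Python. The article itself was built in R; Python is used here only for illustration. The conversion from payouts to implied probabilities (taking the inverse of each payout and rescaling so the three values sum to 1) is a common convention and is an addition here, not something the original analysis describes. The function names are hypothetical.

```python
def profit_per_dollar(decimal_odds):
    """Profit on a winning $1 bet: the payout minus the returned stake."""
    return decimal_odds - 1.0


def implied_probabilities(odds):
    """Convert payouts to probabilities that sum to 1.

    The raw inverses (1 / payout) typically add up to more than 1 because
    the agency keeps a margin, so each inverse is divided by their total.
    """
    inverse = {outcome: 1.0 / payout for outcome, payout in odds.items()}
    total = sum(inverse.values())
    return {outcome: p / total for outcome, p in inverse.items()}


# Payouts from the Arsenal vs. Crystal Palace example above.
match_odds = {"home": 1.29, "draw": 4.98, "away": 8.06}

profits = {k: round(profit_per_dollar(v), 2) for k, v in match_odds.items()}
probabilities = implied_probabilities(match_odds)

print(profits)
print({k: round(v, 3) for k, v in probabilities.items()})
```

Under this convention, the Arsenal win carries an implied probability of roughly 70%, which is why it pays so little.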

The Data Frame

The data sets were taken from Kaggle, where they are part of a soccer SQLite database.

The data sets include data on more than 25,000 matches from 9 different leagues in Europe over 8 seasons (2008/2009 - 2015/2016). The data includes: match results and dates, teams, leagues, and match betting odds from 9 different betting agencies.

The European leagues are:

  • Belgium Jupiler League
  • England Premier League
  • France Ligue 1
  • Germany 1. Bundesliga
  • Italy Serie A
  • Netherlands Eredivisie
  • Portugal Liga ZON Sagres
  • Scotland Premier League
  • Spain LIGA BBVA

The betting agencies are: Bet365, Blue Square, Bet&Win, Gamebookers, Interwetten, Ladbrokes, Pinnacle, Sporting Odds, Sportingbet, Stan James, Stanleybet, VC Bet, and William Hill.

It is important to note that there was always consensus between the agencies regarding the probability for each outcome (i.e. they all thought Arsenal had the highest chance to win); the only difference was the magnitude of payout that they offered. Therefore, I considered the average consensus as a single entity. 

For data processing, I used the RSQLite package for R to convert the different SQL tables to CSV files.
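The conversion was done with RSQLite in R. A rough Python equivalent, using only the standard library `sqlite3` and `csv` modules, might look like the sketch below. The function name `export_tables_to_csv` and the file names in the usage comment are hypothetical; the actual table names depend on the Kaggle database.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write every table in a SQLite database to its own CSV file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(str(db_path))
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        for table in tables:
            # Table names come from the database itself; quote them in case
            # they contain spaces or reserved words.
            cursor = conn.execute('SELECT * FROM "{}"'.format(table))
            header = [col[0] for col in cursor.description]
            target = out_dir / (table + ".csv")
            with open(target, "w", newline="", encoding="utf-8") as fh:
                writer = csv.writer(fh)
                writer.writerow(header)
                writer.writerows(cursor)
            written.append(target)
    finally:
        conn.close()
    return written


# Hypothetical usage:
# export_tables_to_csv("database.sqlite", "csv_tables")
```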


As part of the data cleaning and preparation, I deleted rows with missing values and ignored data from 2 betting agencies because their betting odds were uploaded to the SQL server as integers rather than exact numeric values. After this process there were 22,434 observations left.

Moreover, I added columns to the data set for the match winner, the agencies' average minimum, middle, and maximum payouts, and the agencies' favored result.

For the analysis, I defined the result with the minimum payout as the favored result by the betting agencies. The success rate shown in the charts is calculated as the number of times the favored result was the actual final result of the match, divided by the total matches played.
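The calculated columns and the success rate can be sketched as follows. This is a minimal Python illustration of the definitions above, not the R code behind the analysis; the field names (`odds`, `home_goals`, `away_goals`) and the sample matches are made up for the example.

```python
from statistics import mean

OUTCOMES = ("home", "draw", "away")


def average_odds(agency_odds):
    """Average each outcome's payout across agencies.

    agency_odds: list of dicts such as
    {"home": 1.29, "draw": 4.98, "away": 8.06}, one per agency.
    """
    return {o: mean(a[o] for a in agency_odds) for o in OUTCOMES}


def payout_summary(odds):
    """Return the minimum, middle, and maximum payout and the favored result."""
    low, mid, high = sorted(odds[o] for o in OUTCOMES)
    # On an exact tie, the first outcome in OUTCOMES order is chosen
    # (an arbitrary choice made for this sketch).
    favored = min(OUTCOMES, key=lambda o: odds[o])
    return {"min": low, "mid": mid, "max": high, "favored": favored}


def actual_result(home_goals, away_goals):
    """Label the match outcome from the final score."""
    if home_goals > away_goals:
        return "home"
    if home_goals < away_goals:
        return "away"
    return "draw"


def success_rate(matches):
    """Share of matches in which the favored result was the actual result."""
    hits = sum(
        payout_summary(m["odds"])["favored"]
        == actual_result(m["home_goals"], m["away_goals"])
        for m in matches
    )
    return hits / len(matches)


# Made-up matches for illustration only (not taken from the data set).
sample = [
    {"odds": {"home": 1.29, "draw": 4.98, "away": 8.06}, "home_goals": 2, "away_goals": 0},
    {"odds": {"home": 2.60, "draw": 3.20, "away": 2.70}, "home_goals": 1, "away_goals": 1},
    {"odds": {"home": 4.50, "draw": 3.60, "away": 1.80}, "home_goals": 0, "away_goals": 3},
    {"odds": {"home": 1.90, "draw": 3.40, "away": 4.00}, "home_goals": 0, "away_goals": 1},
]
rate = success_rate(sample)
print(rate)
```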

The favored result level column breaks the matches down into 3 groups using the difference between the maximum and minimum payouts. My assumption for this calculated column is that the larger the difference between the payouts, the higher the chance that the minimum payout is the winning outcome. Therefore, the groups are categorized in the following way:

A high favored result: max payout - min payout > 2

A moderate favored result: 2 > max payout - min payout > 1

A low favored result: max payout - min payout < 1
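These thresholds translate directly into a small function. The sketch below is in Python for illustration; the rules above do not say where a difference of exactly 1 or exactly 2 belongs, so the handling of those boundary values here (2 counts as moderate, 1 counts as low) is an assumption.

```python
def favored_level(min_payout, max_payout):
    """Classify a match by the gap between its maximum and minimum payouts."""
    diff = max_payout - min_payout
    if diff > 2:
        return "high"
    if diff > 1:
        # Assumption: a difference of exactly 2 falls in this group.
        return "moderate"
    # Assumption: a difference of exactly 1 falls in this group.
    return "low"


# The Arsenal vs. Crystal Palace payouts from the earlier example.
print(favored_level(1.29, 8.06))
```

With a gap of 6.77 between the payouts, the Arsenal match falls in the high favored group.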

Analysis

The Payout Distributions

The betting payouts are roughly bell-shaped, although the maximum and middle payouts are skewed to the right. The minimum payout ranges from a little more than 1 to around 3. The maximum payout ranges from 2.5 to 40. The middle payout distribution looks similar to that of the maximum payout and ranges from 1.9 to 10. Below are the histograms of the payouts:

[Figure: histograms of the minimum, middle, and maximum payouts]

The minimum and maximum payouts are inversely related, and the minimum and middle payouts are also inversely related. This is explained by the fact that when there is a high favored result (for example, one team has a better track record than the other), its payout will be low and accordingly the other outcomes' payouts will be high. On the other hand, when there is no high favored result (for example, the teams playing have the same skill level), the payouts will be quite similar.


This is a scatter plot of the minimum and maximum payouts of the matches observed: 

[Figure: scatter plot of the minimum and maximum payouts]

Favored Result Analysis

The first chart depicts actual match results. Here, we can see that the home team wins 46% of the time, the away team wins 29% of the time, and there is a draw 25% of the time.

The next graph shows that the agencies favored the home team 73% of the time and the away team 27% of the time, while they almost never favored a draw (13 out of 22,434 matches). We can see that the agencies favored the home teams in most cases. This emphasizes the importance of location in the competition. 

This chart raises an interesting question: why do the agencies almost never favor a draw when this outcome occurs in about 25% of the matches? I did not come across any data explaining how the agencies determine their payouts. However, I believe that agencies prefer to favor one team over the other because it's easier for them to promote the bet among gamblers. It's just more interesting to have a face-off.

[Figure: charts of actual match results and of the agencies' favored results]

Next, I wanted to check how accurate the favored result was in terms of predicting the match outcome. I found that the agencies' favored result (represented by the minimum payout) had an average success rate of 53%.  This was consistent for each of the seasons in the data frame.

[Figure: success rate of the favored result by season]

I also wanted to examine to what degree the location made a difference in the payouts given to favored teams. For instance, I would expect the payout for a favored home team to be lower than for a favored away team. After all, it is widely believed that the home team has the advantage. I used two box plots, which showed almost no difference in the minimum payouts that can be attributed to location.

[Figure: box plots of the minimum payout for favored home and favored away teams]

If the minimum payouts for a favored home team and a favored visiting team are almost identical, does this mean that their win rates are the same?

[Figure: bar chart of success rates for favored home and favored away teams]

In fact, no, the win rates are not the same. As we can see from this bar chart, the favored home team had a 55% success rate while the favored away team had a 50.5% success rate. Although these are not earth-shattering numbers, could this be an opportunity for the betting agencies to attract more gamblers? By raising the payout for the favored away team, they promise larger compensation when the probability of the favored away team winning is in fact lower.
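A grouped success rate like the one in this comparison can be computed with a small helper. The sketch below is a Python illustration under stated assumptions: the field names (`favored`, `result`, `stage`), the "early"/"late" stage labels, and the sample rows are all hypothetical, and the article does not specify how the stages of the season were binned.

```python
from collections import defaultdict


def success_rate_by(matches, key):
    """Success rate of the favored result within each group.

    matches: iterable of dicts with "favored" and "result" entries.
    key: function mapping a match to its group label.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for match in matches:
        group = key(match)
        totals[group] += 1
        if match["favored"] == match["result"]:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Made-up rows for illustration only (not taken from the data set).
sample = [
    {"favored": "home", "result": "home", "stage": "late"},
    {"favored": "home", "result": "draw", "stage": "early"},
    {"favored": "away", "result": "away", "stage": "late"},
    {"favored": "away", "result": "home", "stage": "early"},
    {"favored": "home", "result": "home", "stage": "early"},
]

# Group by which side was favored, then by side and stage together.
by_location = success_rate_by(sample, key=lambda m: m["favored"])
by_location_and_stage = success_rate_by(
    sample, key=lambda m: (m["favored"], m["stage"])
)
print(by_location)
print(by_location_and_stage)
```

Passing a tuple-valued key is one way to cross several parameters at once, such as location and stage of the season.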

I wanted to further visualize the proportion of games that can be categorized as high favored, moderate favored, and low favored. Because of the league structure, such as the number of games and the arrangement of opponents, there are more games where the difference in team skill and strength is large. Therefore, we see that almost 50% of the matches were categorized as high favored, and more than a quarter as moderate favored.

[Figure: share of matches by favored result level]

One of my more interesting findings was that the accuracy of the predicted wins increased during the later stages of the season. The late stages of the season carry higher stakes than the early stages, as this is when the champion and the relegated teams are decided. In the chart below, we can see a slight improvement in the success rate over the span of a season.

[Figure: success rate of the favored result by stage of the season]

Next, I checked the stage of the season against the location of the team as well as the favored level. I found that the success rate increases over the span of a season when the high favored team plays in its home arena. It seems that the high favored teams are capable of winning in the important stages of the season. By contrast, the low favored team's success rate decreases over time, as if the pressure of the last stages of the season and the presence of their fans have a negative effect on their performance.


[Figures: favored home team success rate by stage of the season and favored level]

The results of the away team demonstrate a different trend. While the high favored teams' success rate increases over time, the low favored teams show inconsistent success rates throughout the season. It seems that there is no clear effect of the season stage on their performance. 

[Figures: favored away team success rate by stage of the season and favored level]

Notes

In my Shiny app, you can explore the success rates broken down by the location and the favored level. When I did this, I found a clear pattern: matches with a high favored result have a success rate of around 65%, and matches with a low favored result have a success rate that ranges from 33% to 43%. When cross-examining the data, I found that three parameters carried the most weight in determining the probability of a match outcome: 1) the agencies' high favored result, 2) the location of the team (home vs. away), and 3) the stage of the season. Ultimately, choosing a high favored team in its home arena in the late stages of a season can raise the probability of winning the bet to as much as 80%.
