Fantasy Football Lineup Optimization - Part 1

Posted on Nov 22, 2021


Daily fantasy sports is an $18.6 billion industry that continues to grow year over year. Market share leaders FanDuel and DraftKings are both estimated to pay out over $2 billion per year. There are currently an estimated 40 million daily fantasy football players in the US and Canada.

The specific ‘daily’ fantasy football contests that this blog studies are offered weekly, corresponding to the weeks of the NFL season. Each week, players pay an entry fee to join a contest and draft a team of NFL players. There are restrictions on the number of players at each position that make up a team. A typical team is 1 Quarterback, 2 Running Backs, 3 Wide Receivers, 1 Tight End, 1 Defense, and 1 Kicker. Each team also has a virtual salary cap. Each NFL player is assigned a salary, with better players, who are expected to score more points, costing more. Total team salary may not exceed the pre-defined budget. Fantasy points are scored through the game statistics accrued by the NFL players selected on each team (for example, total catches, rushing yards, and completed passes). At the end of the NFL games scheduled for the week, the fantasy team with the most points is declared the winner and is paid the agreed-upon percentage of the entry fees.

I started dabbling in daily fantasy football a few years ago when a few coworkers and I formed a DraftKings league to compete each week. The league helped foster camaraderie in the office, and I took great pride in trying to select the top team each week. Because of my interest in the game, I decided it would be a fun project to use my data analysis skills to gain insights into the best strategy for team selection.

Unfortunately for me, outside of watching the occasional game on TV, my technical football knowledge is limited. I know there are plenty of websites that provide projected fantasy points for each player. However, without knowing how these values are forecasted, I didn’t want to rely on them. Instead, I web scraped historical fantasy points and salaries for each player dating back to 2017 to see whether I could make accurate predictions and assign player value based on past performance.

Data Source:

Columns in Data Set: Week, Year, Player Name, Position, FanDuel Points, FanDuel Salary, DraftKings Points, DraftKings Salary, Yahoo Sports Points, Yahoo Sports Salary

Exploratory Data Analysis

I began my analysis with general data exploration, trying to identify any trends that could be useful.

Figure 1 - Points Over Time

Figure 1 shows that total points per year appear to be increasing over time. The 2021 season was not included because it is incomplete, but it is currently on pace to exceed the 2020 points total.

Table 1 - Mean and standard deviation by position

Table 1 shows that quarterbacks have consistently scored the most points on average, with the lowest standard deviation relative to the mean. Standard deviation is very high across all positions, exceeding the mean value in most cases.
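As a rough illustration of how a table like Table 1 could be produced, here is a minimal sketch using only the Python standard library. The rows are made-up toy values, not the scraped data, and the function name summarize_by_position is hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy rows of (position, weekly points). Made-up values, not the scraped data.
rows = [
    ("QB", 22.0), ("QB", 18.0), ("QB", 0.0),
    ("RB", 14.0), ("RB", 0.0), ("RB", 4.0),
    ("WR", 25.0), ("WR", 0.0), ("WR", 2.0),
]

def summarize_by_position(rows):
    """Return {position: (mean, sample standard deviation)} of weekly points."""
    points_by_position = defaultdict(list)
    for position, points in rows:
        points_by_position[position].append(points)
    return {
        position: (mean(values), stdev(values))
        for position, values in points_by_position.items()
    }

summary = summarize_by_position(rows)
for position, (avg, sd) in summary.items():
    print(f"{position}: mean={avg:.2f}, std={sd:.2f}")
```

Even in this toy data, a single 0-point week is enough to push the standard deviation above the mean for the running back and wide receiver rows.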

Figure 2 - Histogram of weekly player point totals

To understand the high standard deviation values, the histogram in Figure 2 was plotted. It shows a large number of weekly point totals of 0, which pull the average down. Additionally, point values are widely spread, ranging from 0 to above 40.

Figure 3 - Correlation heat map

To understand the relationship between points and salary, the correlation heat map in Figure 3 was plotted. It explores the correlation between FanDuel Points, FanDuel Salary, DraftKings Points, DraftKings Salary, Yahoo Sports Points, and Yahoo Sports Salary. Points and salary are only moderately correlated, with values ranging from 0.55 to 0.64. In theory, players with higher salaries should score more points, so this moderate correlation reflects how much actual point totals vary from the forecasts implied by the platforms’ salary values. Salary values across the 3 platforms are more strongly correlated, with values ranging from 0.79 to 0.85. Since the 3 platforms have their own algorithms for forecasting player performance, it makes sense that there would be slight differences in salary values. Points across the platforms are almost perfectly positively correlated, with values from 0.99 to 1.00. This indicates that the platforms’ scoring systems are very similar.
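For readers who want to see what sits behind each cell of a heat map like this, the following is a small sketch of a pairwise Pearson correlation matrix. It assumes the heat map uses Pearson correlation, which the post does not state. The column values are made up for illustration, and only three of the six columns are shown.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values for five player-weeks, for illustration only.
columns = {
    "FanDuel Salary": [4500, 6000, 7200, 8100, 9000],
    "FanDuel Points": [3.0, 14.5, 6.0, 21.0, 12.5],
    "DraftKings Points": [3.5, 16.0, 7.0, 23.5, 14.0],
}

# Correlation of every column with every other column.
names = list(columns)
corr = {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

for a in names:
    print(a.ljust(18), "  ".join(f"{corr[(a, b)]:.2f}" for b in names))
```

Each printed row corresponds to one row of the heat map. The diagonal is always 1.00 because every column is perfectly correlated with itself.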

Figure 4 - Scatter plot of points vs salary

To further explore the relationship between points and salary the above scatter plot was graphed using the 2020 season data. Due to the high variability in points, it is very difficult to identify trends from this graph.

Figure 5 - Bar chart of grouped salary ranges vs points

The above bar chart groups the data from the scatter plot into salary ranges ($2,000-$4,000, $4,000-$6,000, $6,000-$8,000, $8,000-$10,000, $10,000-$12,000). Points are then averaged within each group. When aggregated this way, there does appear to be a relationship between points and salary.
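The binning step can be sketched as follows. The handling of bin edges is an assumption (lower edge inclusive, upper edge exclusive, with the top bin also including $12,000), since the post does not say how a salary of exactly $4,000 is assigned. The rows are toy values, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

BIN_EDGES = [2000, 4000, 6000, 8000, 10000, 12000]

def salary_bin(salary):
    """Return the label of the salary range containing `salary`, or None."""
    for low, high in zip(BIN_EDGES, BIN_EDGES[1:]):
        # Assumption: lower edge inclusive, upper edge exclusive,
        # except the top bin, which also includes $12,000.
        is_top_edge = high == BIN_EDGES[-1] and salary == high
        if low <= salary < high or is_top_edge:
            return f"${low:,}-${high:,}"
    return None

def average_points_by_bin(rows):
    """Average the points of (salary, points) rows within each salary bin."""
    grouped = defaultdict(list)
    for salary, points in rows:
        label = salary_bin(salary)
        if label is not None:
            grouped[label].append(points)
    return {label: mean(values) for label, values in grouped.items()}

# Toy (salary, points) rows. Made-up values, not the 2020 season data.
rows = [(3500, 2.0), (3900, 4.0), (5200, 7.0),
        (7500, 12.0), (7900, 10.0), (9100, 18.0)]

averages = average_points_by_bin(rows)
for label, avg in averages.items():
    print(f"{label}: {avg:.1f}")
```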

Player Valuation

Figure 6 - Individual player weekly point totals over course of season

Figure 7 - Individual player salaries over course of season

To further understand the variability of weekly point totals, it helps to look at the week-over-week performance of individual players. The figures above show the 2020 performance of Davante Adams, wide receiver for the Green Bay Packers. Davante Adams had the most points of all wide receivers in the 2020 season. As the line chart shows, there is great variability in points week over week when you look at individual players. Salary also appears to be adjusted incrementally based on the previous week’s performance.

Without relying on analysis of football games, it is not realistic to predict whether a player will have a good week or a poor week. The best prediction available is to take the average points of past performances and assume the player is most likely to perform at their average each week. Using the average points as the prediction for future performance, it is possible to assign player value. Player value is assigned by a ‘cost per point’ metric: the latest salary value divided by the predicted points. A lower cost per point means a more valuable player, so players can be ranked at each position by this metric. Below we can see the top 10 most valuable players at each position for FanDuel through week 5 of the 2021 season.

Quarterback

1. Justin Herbert
2. Lamar Jackson
3. Tom Brady
4. Patrick Mahomes
5. Jalen Hurts
6. Kyler Murray
7. Josh Allen
8. Sam Darnold
9. Matthew Stafford
10. Daniel Jones

Wide Receiver

1. Marquise Brown
2. Mike Williams
3. Deebo Samuel
4. Cooper Kupp
5. Ja’Marr Chase
6. Antonio Brown
7. Tyreek Hill
8. Mike Evans
9. Davante Adams
10. D.K. Metcalf

Running Back

1. Austin Ekeler
2. Derrick Henry
3. Kareem Hunt
4. Darrell Henderson
5. Ezekiel Elliott
6. Najee Harris
7. Jonathan Taylor
8. D’Andre Swift
9. Nick Chubb
10. Zack Moss

Tight End

1. Mark Andrews
2. Dawson Knox
3. David Njoku
4. Dalton Schultz
5. Travis Kelce
6. Mike Gesicki
7. Kyle Pitts
8. Hunter Henry
9. Maxx Williams
10. Darren Waller
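The ranking behind lists like the ones above can be sketched in a few lines. This assumes ‘cost per point’ means the latest salary divided by the average of past weekly points, with lower values ranked first. The players and numbers are hypothetical placeholders, not the scraped FanDuel data.

```python
from statistics import mean

def cost_per_point(salary, past_points):
    """Latest salary divided by the player's average past weekly points."""
    predicted = mean(past_points)
    if predicted <= 0:
        return float("inf")  # no positive scoring history: rank last
    return salary / predicted

# Hypothetical players: (name, position, latest salary, weekly points so far).
players = [
    ("Player A", "QB", 8000, [24.0, 20.0, 28.0]),
    ("Player B", "QB", 7000, [15.0, 10.0, 17.0]),
    ("Player C", "QB", 6500, [22.0, 18.0, 20.0]),
]

# Lower cost per point means more value for the salary spent.
ranked = sorted(players, key=lambda p: cost_per_point(p[2], p[3]))

for rank, (name, position, salary, points) in enumerate(ranked, start=1):
    print(f"{rank}. {name} ({position}): ${cost_per_point(salary, points):,.2f} per point")
```

In this toy data, Player C ranks first despite a lower average than Player A, because the lower salary more than offsets the difference in points.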

Future Work & Optimization Problem

Now that a value and a points projection are assigned to each player, the user can make more informed choices when manually selecting players for their team. However, this is still not the best approach for team selection because of the large number of potential lineups. To get a sense of scale, let’s estimate the number of possible lineups. There are 32 NFL teams, each with multiple players at each position. For the sake of calculation, let’s assume the fantasy lineup requires the following: 1 Quarterback, 2 Running Backs, 3 Wide Receivers, 1 Tight End, 1 Defense, and 1 Kicker.

Assume each NFL team has the following number of players at each position: 1 Quarterback, 2 Running Backs, 3 Wide Receivers, 1 Tight End, 1 Kicker, and 1 Defense.

The formula for calculating the number of combinations is: C = n!/((n-r)! r!)

C = number of combinations
n = number of items
r = items to select

QB: n = 32, r = 1

RB: n = 64, r = 2

WR: n = 96, r = 3

TE: n = 32, r = 1

K: n = 32, r = 1

DEF: n = 32, r = 1

32 * 2,016 * 142,880 * 32 * 32 * 32 ≈ 302 trillion lineups
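The arithmetic above can be checked with Python’s built-in math.comb, which computes n!/((n-r)! r!) directly. This is only a verification of the estimate under the same assumed pool sizes. It ignores the salary cap, so many of the counted lineups would not be valid entries.

```python
from math import comb, prod

# (players available league-wide, roster slots) per position,
# using the pool sizes assumed in the text.
pools = {
    "QB": (32, 1),
    "RB": (64, 2),
    "WR": (96, 3),
    "TE": (32, 1),
    "K": (32, 1),
    "DEF": (32, 1),
}

# Number of ways to fill each position, then the product across positions.
choices = {position: comb(n, r) for position, (n, r) in pools.items()}
total = prod(choices.values())

print(choices["RB"])  # 2016
print(choices["WR"])  # 142880
print(f"{total:,}")   # 302,038,206,382,080
```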

Due to the computational cost, iterating through every lineup to find the one with the most projected points is not realistic. We are left with 2 options for further optimization: Random Walk (randomly generate a designated number of lineups and select the best one) or Integer Linear Programming to try to solve for the optimal solution. These 2 alternatives will be explored in future phases of this project.
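To make the first option concrete, here is a minimal sketch of the random approach as described in the parenthetical: sample a fixed number of lineups, discard those over the salary cap, and keep the one with the most projected points. It is not the implementation from the later parts of this project. The player pool is synthetic, and the $60,000 cap, the pool size of 12 per position, and all function names are assumptions made for illustration.

```python
import random

# Roster slots per position, matching the example lineup in the text.
ROSTER = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "K": 1, "DEF": 1}
SALARY_CAP = 60000  # assumed cap for illustration

def make_toy_pool(seed=1, players_per_position=12):
    """Build a synthetic player pool with random salaries and projections."""
    rng = random.Random(seed)
    return {
        pos: [
            {
                "name": f"{pos}{i}",
                "salary": rng.randrange(4000, 9001, 100),
                "points": round(rng.uniform(2.0, 25.0), 1),
            }
            for i in range(players_per_position)
        ]
        for pos in ROSTER
    }

def random_search(pool, salary_cap, n_lineups, seed=0):
    """Sample random lineups and return (best_points, best_lineup)."""
    rng = random.Random(seed)
    best_points, best_lineup = float("-inf"), None
    for _ in range(n_lineups):
        # Draw the required number of distinct players at each position.
        lineup = [p for pos, k in ROSTER.items() for p in rng.sample(pool[pos], k)]
        if sum(p["salary"] for p in lineup) > salary_cap:
            continue  # over budget: discard this lineup
        points = sum(p["points"] for p in lineup)
        if points > best_points:
            best_points, best_lineup = points, lineup
    return best_points, best_lineup

pool = make_toy_pool()
best_points, best_lineup = random_search(pool, SALARY_CAP, n_lineups=2000)
print(f"Best projected points: {best_points:.1f}")
print([p["name"] for p in best_lineup])
```

Random sampling gives no guarantee of finding the optimal lineup. It only returns the best of the lineups it happened to draw, which is the trade-off against the Integer Linear Programming option.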

Click here for Part 2 of Blog:

Blog Part 2

About Author

Hugh Goode

Hugh is a Data Scientist with a BS in Civil Engineering from The College of New Jersey and an MS in Engineering Management from Duke University. After 5 years as an engineer, he pivoted to pursue Data...