# Data Driven Fantasy Football Lineup Optimization - Part 1


**Data Science Background**

Data shows that "daily" fantasy sports is an $18.6 billion industry that continues to grow year over year. Market share leaders FanDuel and DraftKings are both estimated to pay out over $2 billion per year. There are currently an estimated 40 million daily fantasy football players in the US and Canada.

The specific ‘daily’ fantasy football contests that this blog studies are offered weekly, corresponding to the weeks of the NFL season. Each week, players pay an entry fee to join a contest and draft a team of NFL players. There are specific restrictions on the number of players at each position that make up a team.

## Data Driven Examples

An example of a typical team is 1 Quarterback, 2 Running Backs, 3 Wide Receivers, 1 Tight End, 1 Defense, and 1 Kicker. Each team also has a virtual salary cap. Each NFL player is assigned a salary, with the better players, who are theoretically expected to score more points, carrying a higher salary. Total team salary is not permitted to exceed the predefined budget. Fantasy points are scored through the game statistics accrued by the NFL players selected on each team (for example, total catches, rushing yards, and completed passes).
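The roster and salary cap rules can be written as a small validity check. This is only a minimal sketch: `ROSTER` mirrors the example team above, while the `SALARY_CAP` value, the `is_valid_lineup` name, and the placeholder players are my own assumptions for illustration, not values from any platform.

```python
from collections import Counter

# Example roster from the text: position -> required number of players.
ROSTER = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "DEF": 1, "K": 1}

# Hypothetical salary cap; the real budget is set by each platform.
SALARY_CAP = 60_000


def is_valid_lineup(lineup, roster=ROSTER, cap=SALARY_CAP):
    """Return True if the lineup fills every roster slot and fits under the cap.

    `lineup` is a list of (name, position, salary) tuples.
    """
    names = {name for name, _, _ in lineup}
    counts = Counter(position for _, position, _ in lineup)
    total_salary = sum(salary for _, _, salary in lineup)
    return (
        len(names) == len(lineup)      # no player picked twice
        and dict(counts) == roster     # exact count at every position
        and total_salary <= cap        # budget respected
    )


# Placeholder players with made-up salaries, for illustration only.
example_lineup = [
    ("QB_1", "QB", 8000), ("RB_1", "RB", 7000), ("RB_2", "RB", 6500),
    ("WR_1", "WR", 7500), ("WR_2", "WR", 6000), ("WR_3", "WR", 5500),
    ("TE_1", "TE", 5000), ("DEF_1", "DEF", 4500), ("K_1", "K", 4500),
]
print(is_valid_lineup(example_lineup))
```

Dropping any player, or lowering the cap below the lineup's total salary, makes the same check fail.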

At the end of the NFL games scheduled for the week, the fantasy team with the most points is declared the winner and is paid the agreed-upon percentage of the entry fees.

## Initial Approach

I started dabbling in daily fantasy football a few years ago when a few coworkers and I formed a DraftKings league to compete each week. The league helped foster camaraderie in the office, and I took great pride in trying to select the top team each week. Because of my interest in the game, I decided it would be a fun project to use my data analysis skills to gain insights into the best strategy for team selection.

Unfortunately for me, outside of watching the occasional game on TV, my technical football knowledge is limited. I know there are plenty of websites that provide projected fantasy points for each player. However, without knowing how these values are forecasted, I didn’t want to rely on them. Instead, I scraped historical fantasy points and salaries for each player dating back to 2017 to see whether I could make accurate predictions and assign player value based on past performance.

Data Source: *http://rotoguru.net/*

### Columns in Data Set

*Week, Year, Player Name, Position, FanDuel Points, FanDuel Salary, DraftKings Points, DraftKings Salary, Yahoo Sports Points, Yahoo Sports Salary*

**Exploratory Data Analysis**

I began my analysis with general data exploration, trying to identify any trends that could be useful.

Figure 1 shows that total points per year appear to be increasing over time. The 2021 season was not included because it is incomplete, but it is currently on pace to exceed the 2020 points total.

Table 1 shows that quarterbacks have consistently scored the most points on average, with a comparatively low standard deviation. Standard deviation is very high across all positions, exceeding the mean value in most cases.
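A per-position summary like this one can be computed with a simple group-and-aggregate step. The sketch below uses only the Python standard library; the `summarize_by_position` name and the toy rows are assumptions for illustration and are not taken from the scraped data set.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_position(rows):
    """Group (position, weekly_points) rows and return {position: (mean, std)}."""
    by_position = defaultdict(list)
    for position, points in rows:
        by_position[position].append(points)
    # stdev() is the sample standard deviation and needs at least 2 values.
    return {pos: (mean(pts), stdev(pts)) for pos, pts in by_position.items()}


# Toy rows with made-up point totals, for illustration only.
toy_rows = [
    ("QB", 20.0), ("QB", 0.0), ("QB", 16.0),
    ("WR", 0.0), ("WR", 3.0), ("WR", 21.0),
]
summary = summarize_by_position(toy_rows)
for position, (avg, std) in summary.items():
    print(f"{position}: mean={avg:.1f}, std={std:.1f}")
```

In the toy wide receiver rows, a couple of near-zero weeks are enough to push the standard deviation above the mean, which is the same pattern described above.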

To understand the high standard deviation values, the above histogram was plotted. Figure 2 shows a large number of weekly point totals of 0, which appear to be skewing the average lower. Additionally, there is a wide spread of point values, ranging from 0 to above 40.

To understand the relationship between points and salary, the above correlation heat map was plotted. It explores the correlation between FanDuel Points, FanDuel Salary, DraftKings Points, DraftKings Salary, Yahoo Sports Points, and Yahoo Sports Salary. Theoretically, players with higher salaries should score more points, yet points and salary are only moderately correlated, with values ranging from 0.55 to 0.64. This signifies variability in point totals compared to the forecasts represented by the platforms' salary values.

Salary values across the 3 platforms are more strongly correlated, with values ranging from 0.79 to 0.85. Since the 3 platforms have their own algorithms for forecasting player performance, it makes sense that there would be slight differences in salary values. Points across the platforms are almost perfectly positively correlated, with values from 0.99 to 1.00. This indicates that the platforms' point systems are very similar.
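For readers who want to see what each cell of such a heat map contains, here is a minimal sketch of the Pearson correlation coefficient between two columns. I am assuming Pearson correlation here, since the post does not name the method; the `pearson` helper and the toy salary and points lists are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy columns with made-up values, for illustration only.
toy_salaries = [4000, 5000, 6000, 7000, 8000]
toy_points = [5.0, 12.0, 4.0, 15.0, 14.0]
print(round(pearson(toy_salaries, toy_points), 2))
```

Computing this value for every pair of the six columns fills in the full matrix; a value near 1.00, as seen between the platforms' points columns, means the two columns move almost in lockstep.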

To further explore the relationship between points and salary the above scatter plot was graphed using the 2020 season data. Due to the high variability in points, it is very difficult to identify trends from this graph.

The above bar chart groups the data from the scatter plot into bins of salary ranges ($2,000-$4,000, $4,000-$6,000, $6,000-$8,000, $8,000-$10,000, $10,000-$12,000). Average points are then taken for each group. When aggregated, there does appear to be a relationship between points and salary.
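The binning step can be sketched as follows. The $2,000 bin width comes from the ranges above; how a salary exactly on a boundary (for example $4,000) is assigned is my assumption (it goes to the higher bin), and the function name and toy rows are hypothetical.

```python
from collections import defaultdict

BIN_WIDTH = 2_000  # salary bins of $2,000, matching the ranges in the text


def average_points_by_salary_bin(rows, width=BIN_WIDTH):
    """Return {bin_lower_edge: average points} for (salary, points) rows.

    A salary s falls in the bin [lower, lower + width), so a boundary
    value such as 4000 is counted in the 4000-6000 bin (an assumption).
    """
    bins = defaultdict(list)
    for salary, points in rows:
        lower = (salary // width) * width
        bins[lower].append(points)
    return {lower: sum(pts) / len(pts) for lower, pts in sorted(bins.items())}


# Toy rows with made-up values, for illustration only.
toy_rows = [(3500, 2.0), (3900, 4.0), (4000, 6.0), (5500, 10.0), (8200, 18.0)]
binned = average_points_by_salary_bin(toy_rows)
for lower, avg in binned.items():
    print(f"${lower:,}-${lower + BIN_WIDTH:,}: {avg:.1f} avg points")
```

Averaging inside each bin smooths out the week-to-week noise of individual players, which is why the trend is visible here but not in the raw scatter plot.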

**Player Valuation**

To further understand the variability of weekly point totals, it helps to look at the week-over-week performance of individual players. The above shows the 2020 performance of Davante Adams, wide receiver for the Green Bay Packers. Davante Adams had the most points of all wide receivers in the 2020 season. As the line shows, there is great variability in points week over week when you look at individual players. Salary also appears to be incrementally adjusted based on the previous week’s performance.

### Best Prediction

Without relying on analysis of football games, it is not realistic to predict whether a player will have a good week or a poor week. The best prediction possible is to take the average points of past performances and assume the player is most likely to perform at their average each week. Using the average points as the prediction for future performance, it is possible to assign player value. Player value is assigned by a ‘cost per point’ metric: the latest salary value divided by the predicted points. Using cost per point, players can be ranked at each position, with a lower cost per point indicating better value.
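A minimal sketch of this valuation step is shown below. It reads ‘cost per point’ as latest salary divided by average past points, which is my interpretation of the metric; the function names and the placeholder players with made-up numbers are assumptions for illustration.

```python
from statistics import mean


def cost_per_point(latest_salary, past_points):
    """Latest salary divided by the average of past weekly points."""
    predicted = mean(past_points)  # the prediction is simply the historical average
    if predicted <= 0:
        return float("inf")  # a player with no expected points has no value
    return latest_salary / predicted


def rank_by_value(players):
    """Sort player names from lowest to highest cost per point.

    `players` maps name -> (latest_salary, list_of_past_weekly_points).
    """
    return sorted(players, key=lambda name: cost_per_point(*players[name]))


# Placeholder players with made-up numbers, for illustration only.
toy_players = {
    "Player A": (8000, [20.0, 16.0, 24.0]),  # 8000 / 20 = 400 per point
    "Player B": (6000, [10.0, 12.0, 8.0]),   # 6000 / 10 = 600 per point
    "Player C": (5000, [15.0, 10.0, 20.0]),  # 5000 / 15 = ~333 per point
}
print(rank_by_value(toy_players))
```

Running the same ranking separately for each position would yield per-position lists in the style of the ones that follow.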

Below we can see the top 10 most valuable players at each position for FanDuel through week 5 of the 2021 season.

### Quarterback

1. Justin Herbert

2. Lamar Jackson

3. Tom Brady

4. Patrick Mahomes

5. Jalen Hurts

6. Kyler Murray

7. Josh Allen

8. Sam Darnold

9. Matthew Stafford

10. Daniel Jones

### Wide Receiver

1. Marquise Brown

2. Mike Williams

3. Deebo Samuel

4. Cooper Kupp

5. Ja’Marr Chase

6. Antonio Brown

7. Tyreek Hill

8. Mike Evans

9. Davante Adams

10. D.K. Metcalf

### Running Back

1. Austin Ekeler

2. Derrick Henry

3. Kareem Hunt

4. Darrell Henderson

5. Ezekiel Elliott

6. Najee Harris

7. Jonathan Taylor

8. D’Andre Swift

9. Nick Chubb

10. Zack Moss

### Tight End

1. Mark Andrews

2. Dawson Knox

3. David Njoku

4. Dalton Schultz

5. Travis Kelce

6. Mike Gesicki

7. Kyle Pitts

8. Hunter Henry

9. Maxx Williams

10. Darren Waller

**Future Work**

Now that a value and points projection is assigned to each player, the user can make more informed choices when manually selecting players for their team. However, this is still not the best approach for team selection due to the large number of potential lineups. For better understanding, let’s try to estimate the number of potential lineup choices. There are 32 NFL teams, each with multiple players at each position. For the sake of calculation, let’s assume the fantasy lineup requires the following example: 1 Quarterback, 2 Running Backs, 3 Wide Receivers, 1 Tight End, 1 Defense, and 1 Kicker.

**Data Optimization Problem**

Assume each NFL team has the following number of players at each position: 1 Quarterback, 2 Running Backs, 3 Wide Receivers, 1 Tight End, 1 Kicker, and 1 Defense.

The formula for calculating the number of combinations is:

C(n, r) = n! / ((n - r)! r!)

C = number of combinations

n = number of items

r = items to select

QB: n = 32, r = 1

RB: n = 64, r = 2

WR: n = 96, r = 3

TE: n = 32, r = 1

K: n = 32, r = 1

DEF: n = 32, r = 1

With C(64, 2) = 2,016 and C(96, 3) = 142,880, the total is:

32 * 2,016 * 142,880 * 32 * 32 * 32 ≈ 302 trillion lineups
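The arithmetic can be checked with Python's built-in `math.comb`. The pool sizes below are the n and r values listed above; the variable names are my own.

```python
from math import comb, prod

# (n, r) per position: n players available league-wide, r slots in the lineup.
pool = {
    "QB": (32, 1),
    "RB": (64, 2),
    "WR": (96, 3),
    "TE": (32, 1),
    "K": (32, 1),
    "DEF": (32, 1),
}

# Number of ways to fill each position, then the product across positions.
ways = {position: comb(n, r) for position, (n, r) in pool.items()}
total_lineups = prod(ways.values())

for position, count in ways.items():
    print(f"{position}: {count:,}")
print(f"Total lineups: {total_lineups:,}")
```

This count ignores the salary cap, so the number of lineups that are actually legal is smaller, but it still shows why brute-force enumeration is impractical.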

Due to the computational cost, iterating through every lineup to find the one with the most projected points is not realistic. We are left with 2 options for further optimization: a Random Walk (randomly generate a designated number of lineups and select the best one) or Integer Linear Programming to solve for the optimal solution. These 2 alternatives will be explored in future phases of this project.
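To make the first option concrete, here is a rough sketch of the random approach as described: sample a fixed number of lineups, discard those over the salary cap, and keep the one with the most projected points. This is not the implementation from the later phases of the project; the `random_search` name, the cap of 45,000, and the generated placeholder pool are all assumptions for illustration.

```python
import random

ROSTER = {"QB": 1, "RB": 2, "WR": 3, "TE": 1, "DEF": 1, "K": 1}


def random_search(pool, cap, n_samples, roster=ROSTER, seed=0):
    """Sample random lineups and return (best_points, best_lineup).

    `pool` maps position -> list of (name, salary, projected_points).
    Returns (None, None) if no sampled lineup fits under the cap.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    best_points, best_lineup = None, None
    for _ in range(n_samples):
        # Draw the required number of distinct players at each position.
        lineup = [p for pos, k in roster.items() for p in rng.sample(pool[pos], k)]
        salary = sum(s for _, s, _ in lineup)
        points = sum(pts for _, _, pts in lineup)
        if salary <= cap and (best_points is None or points > best_points):
            best_points, best_lineup = points, lineup
    return best_points, best_lineup


# Placeholder pool with made-up salaries and projections, for illustration only.
toy_pool = {
    pos: [(f"{pos}_{i}", 4000 + 500 * i, 5.0 + 2.0 * i) for i in range(k + 2)]
    for pos, k in ROSTER.items()
}
best_points, best_lineup = random_search(toy_pool, cap=45_000, n_samples=1000)
print(best_points, [name for name, _, _ in best_lineup])
```

Random sampling gives no guarantee of finding the optimal lineup, which is the motivation for also considering Integer Linear Programming.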