Data Analysis on England Soccer Club Teams

Posted on Jul 27, 2016


Contributed by Le Wei. He is currently attending the NYC Data Science Academy 12-week full-time Data Science Bootcamp, which runs from July 5th to September 23rd. This post is based on his first class project, R visualization.

The Premier League is an English professional league for men's association football clubs. At the top of the English football league system, it is the country's primary football competition. Contested by 20 clubs, it operates on a system of promotion and relegation with the English Football League.

The original data covers the 2013-14 English Premier League season. It includes the team names, whether each game was played as the home or guest team, and the match score for each round. I adapted the original table and created a new table that includes the total goals scored and lost by every team in the league, the total points each team earned over the entire season, home and guest performance, and the final standing of each team. From the same data I also calculated which teams performed better in the second half.
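The aggregation described above can be sketched as follows. This is a minimal Python illustration, not the R code used for the project. The match rows are made up, and the names `build_table` and `standings` are hypothetical. It assumes the standard league scoring of three points for a win and one for a draw, with ties in points broken by goal difference and then goals scored.

```python
from collections import defaultdict

# Hypothetical match rows, NOT taken from the 2013-14 data:
# (home_team, guest_team, home_goals, guest_goals)
matches = [
    ("Team A", "Team B", 2, 0),
    ("Team B", "Team C", 1, 1),
    ("Team C", "Team A", 0, 3),
    ("Team B", "Team A", 2, 1),
]

def build_table(matches):
    """Aggregate per-team points, goals, and home/guest wins."""
    table = defaultdict(lambda: {
        "points": 0, "goals_for": 0, "goals_against": 0,
        "home_wins": 0, "guest_wins": 0,
    })
    for home, guest, hg, gg in matches:
        table[home]["goals_for"] += hg
        table[home]["goals_against"] += gg
        table[guest]["goals_for"] += gg
        table[guest]["goals_against"] += hg
        if hg > gg:            # home win: 3 points
            table[home]["points"] += 3
            table[home]["home_wins"] += 1
        elif gg > hg:          # guest win: 3 points
            table[guest]["points"] += 3
            table[guest]["guest_wins"] += 1
        else:                  # draw: 1 point each
            table[home]["points"] += 1
            table[guest]["points"] += 1
    return dict(table)

def standings(table):
    """Order teams by points, then goal difference, then goals scored."""
    return sorted(
        table,
        key=lambda t: (
            table[t]["points"],
            table[t]["goals_for"] - table[t]["goals_against"],
            table[t]["goals_for"],
        ),
        reverse=True,
    )

table = build_table(matches)
ranking = standings(table)
```

Each match row updates both teams at once, so a single pass over the season is enough to produce every column used in the graphs that follow.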


Home team performance:


Guest team performance:


From the charts above, it is easy to compare each team's performance as the home team and as the guest team. Strong teams such as Man City, Liverpool, Arsenal and Chelsea performed well in both home and guest games. Other good teams, such as Tottenham Hotspur and Manchester United, won almost as many guest games as the strongest teams, but as home teams they did not perform as well as Man City, Liverpool, Chelsea and Arsenal. This could be the reason Tottenham and Manchester United are not among the first-tier teams in the English Premier League.


Analysis Part Two:


In this graph, I define a good second-half game as one in which a team was tied with or behind its opponent when the first half ended and went on to win the game. Teams with many such games must attack strongly during the second half. As we can see, Fulham, one of the worst teams, has the most good second-half games of all teams.

But Fulham won only 10 games in total, which suggests that Fulham is either not a good first-half team or not good at holding on to a first-half lead. We can also see that Manchester United, one of the strong teams, did not have even one good second-half game, which means this team is not good at games in which it falls behind its opponent in the first half.
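The "good second-half game" rule can be written down directly. The sketch below is illustrative Python rather than the project's R code. It assumes each match row carries both the half-time and full-time score, and the rows and the function name `second_half_wins` are hypothetical.

```python
from collections import Counter

# Hypothetical rows, NOT real results:
# (home, guest, ht_home, ht_guest, ft_home, ft_guest)
matches = [
    ("Team A", "Team B", 0, 1, 2, 1),  # A behind at half time, then wins
    ("Team B", "Team C", 0, 0, 0, 1),  # C level at half time, then wins
    ("Team C", "Team A", 1, 0, 2, 0),  # C already ahead: does not count
    ("Team A", "Team C", 0, 0, 1, 1),  # draw: does not count
]

def second_half_wins(matches):
    """Count wins by teams that were level or behind at half time."""
    counts = Counter()
    for home, guest, ht_h, ht_g, ft_h, ft_g in matches:
        if ft_h > ft_g and ht_h <= ht_g:
            counts[home] += 1
        elif ft_g > ft_h and ht_g <= ht_h:
            counts[guest] += 1
    return counts

wins = second_half_wins(matches)
```

Comparing this count with a team's total wins separates teams that mostly win from behind from teams that mostly win by protecting a first-half lead.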


Analysis Part Three:

Teams and total goals scored

Teams and total goals lost

Part three of the analysis focuses on the total goals scored and total goals lost by each team over the entire season. The x axis is ordered by team standing, top to bottom from left to right. As we can see from the first graph, the points generally form a decreasing line, which means the weaker the team, the fewer goals it scored. Some points deviate from the trend, however. Crystal Palace, for instance, is a mid-level team with a surprisingly low goal total, so what keeps this team out of the relegation zone?

Let’s move to the second graph, which shows the total goals lost by each team. The x axis again shows the team standings, top to bottom from left to right. As we can see, the graph generally forms an increasing line, which means the weaker the team, the more goals it lost. From this graph, we can see that Crystal Palace has a very good defense. The team lost far fewer goals than the teams in the relegation zone, which explains why it ended the season with a relatively good standing.


Now let’s see why Chelsea, one of the top teams, ended the season in third place rather than first even though they lost the fewest goals in the entire season. Looking back at the total goals scored graph, Chelsea scored far fewer goals than other strong teams such as Manchester City and Liverpool. So a lack of attacking ability may be one of the reasons Chelsea finished third rather than first this season.



Based on these graphs, it is easy to see whether a team lacks attacking ability or defensive ability, which areas it should improve on for next season, and which strengths it should keep.
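One way to make this comparison concrete is to rank each team separately on attack and on defense and set both ranks beside its final standing. The following is a small Python sketch under stated assumptions, not the method used for the graphs. The season totals are made up, and `rank_positions` and `attack_defense_profile` are hypothetical helper names.

```python
# Hypothetical season totals listed in final-standing order (best first),
# NOT the real 2013-14 numbers: (team, goals_scored, goals_lost)
season = [
    ("Team A", 90, 40),
    ("Team B", 85, 45),
    ("Team C", 60, 30),
    ("Team D", 35, 42),
    ("Team E", 45, 70),
]

def rank_positions(rows, key, reverse):
    """Map each team to its 1-based rank under the given sort key."""
    ordered = sorted(rows, key=key, reverse=reverse)
    return {row[0]: i + 1 for i, row in enumerate(ordered)}

def attack_defense_profile(season):
    """Pair each team's standing with its attack and defense ranks."""
    attack = rank_positions(season, key=lambda r: r[1], reverse=True)
    defense = rank_positions(season, key=lambda r: r[2], reverse=False)
    return {
        team: {
            "standing": standing,
            "attack_rank": attack[team],    # 1 = most goals scored
            "defense_rank": defense[team],  # 1 = fewest goals lost
        }
        for standing, (team, _, _) in enumerate(season, start=1)
    }

profile = attack_defense_profile(season)
```

In this toy data, Team C has the best defense but only the third-best attack, and Team D has the weakest attack but a mid-table defense. These are the same patterns the post describes for Chelsea and Crystal Palace.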

About Author

Le Wei

Le is a data science enthusiast who brought his passion for data science to this bootcamp. He majored in marketing for his bachelor's degree. In college he learned how to make the right decision...