Visualizing FIDE chess rating list

Aravind Kolumum Raja
Posted on Feb 1, 2016

Contributed by Aravind Kolumum Raja. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, running from January 11th to April 1st, 2016. This post is based on his first class project, R visualization (due in the 2nd week).

FIDE (Fédération Internationale des Échecs, or World Chess Federation) is an international organization and one of the largest sporting bodies in the world, connecting 158 federations. FIDE ranks players according to the Elo system (devised by Arpad Elo, a physics professor and chess master). FIDE adopted the system in 1970, and it has remained the gold standard for rating players ever since.

 

The rating system is designed so that a player's performance is measured relative to the opponents played (that is, their ratings), and each rating reflects the cumulative results of all games played over time against various opponents. A player's expected score (the probability of winning plus half the probability of drawing) is calculated from a logistic function of the rating difference. Roughly, when two players' ratings differ by 400 points, the higher-rated player has an expected score of about 0.91, or about 90%. FIDE publishes its Elo rating list every month for players from all countries, and this post analyses data from January 2016, the most recent list.
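The logistic expected-score formula described above can be sketched in a few lines of Python. This is a minimal sketch assuming the standard logistic form on a 400-point scale; `expected_score` is a hypothetical helper name, and FIDE's official conversion table may give slightly different values.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the logistic Elo model.

    A win counts as 1 and a draw as 0.5, so this equals
    P(A wins) + 0.5 * P(draw).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # A 400-point gap: the stronger player expects about 0.91 points per game.
    print(round(expected_score(2400, 2000), 3))  # 0.909
    # Equal ratings: each player expects half a point.
    print(expected_score(1800, 1800))  # 0.5
```

Because the two players' expected scores always sum to 1, the same function gives the weaker player's expectation by swapping the arguments.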

 

 

 

rating distribution overall

Women make up only 11% of the active player population overall, and the ratings appear to be roughly normally distributed with a skew to the left. The plot below shows rating densities by sex for all players.

 

 

rating density plots across sex

 

The difference in mean ratings between male and female active players appears to be significant. A one-sided Welch two-sample t-test confirmed a statistically significant difference in mean ratings between the two groups. On average, male players are rated about 200 points higher than female players (1771 vs. 1575).

Welch Two Sample t-test

data:  female and male
t = -65.427, df = 15707, p-value < 2.2e-16
alternative hypothesis: true difference in means is less than 0
sample estimates:
mean of F mean of M 
 1574.693  1771.426
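For readers who want to see what the Welch test computes, here is a minimal standard-library Python sketch. It is an illustration rather than the code behind this post, which was written in R (the output above appears to correspond to `t.test(female, male, alternative = "less")`, whose default is the Welch test). The helper names are hypothetical, and the p-value uses a normal approximation that is only reasonable for very large degrees of freedom, such as the df = 15707 reported above.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(x, y):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Uses sample variances (n - 1 denominator), like R's t.test.
    """
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def one_sided_p_less(t):
    """Approximate P(T <= t) with the standard normal distribution.

    Only a large-df approximation; small samples need the t distribution.
    """
    return NormalDist().cdf(t)


if __name__ == "__main__":
    # Toy ratings for illustration only, not the FIDE data.
    female = [1510, 1620, 1580, 1490, 1650]
    male = [1750, 1820, 1700, 1790, 1760]
    t, df = welch_t_test(female, male)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

Plugging the reported t = -65.427 into `one_sided_p_less` gives a value that is effectively zero, consistent with R's "p-value < 2.2e-16".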

 

 

FIDE awards lifetime titles to approximately the top 7% of its players; the most coveted title is Grandmaster (GM).

 

masters

Similar titles (WGM, WIM, WFM and WCM) are awarded to women. The following chart shows the distribution of top FIDE masters across titles. You will notice some overlap between titles, especially between IMs and GMs around 2400-2500 and between IMs and FMs around 2300-2400.

 

 

Distribution of Masters

Russia, without any doubt, reigns supreme when it comes to the number of titled players in the world. Germany is another strong chess-playing nation, followed by Spain and the USA.

 

 

Rplot11
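Counting titled players per federation is a simple group-and-count. The sketch below assumes each player is a dict with hypothetical keys "federation" and "title"; the real FIDE download uses its own field names, and a player can hold both an open and a women's title, which this single-field version does not handle.

```python
from collections import Counter

# Open titles plus the women's titles mentioned in the post.
TITLES = {"GM", "IM", "FM", "CM", "WGM", "WIM", "WFM", "WCM"}


def titled_players_by_federation(players):
    """Count titled players per federation.

    `players` is an iterable of dicts with hypothetical keys "federation"
    and "title" (empty string or missing for untitled players).
    """
    return Counter(p["federation"] for p in players if p.get("title") in TITLES)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        {"federation": "RUS", "title": "GM"},
        {"federation": "RUS", "title": "IM"},
        {"federation": "GER", "title": "FM"},
        {"federation": "USA", "title": ""},
    ]
    for fed, count in titled_players_by_federation(sample).most_common():
        print(fed, count)
```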

Another interesting observation is the relationship between age and rating, and how players are distributed across the age-rating plane. The following plot shows that ratings are spread across all levels at all ages, which makes chess quite unusual among sports in this regard. It is very hard to estimate a player's rating from the player's age alone.

 

Universal across age

 

However, we do see a significant negative relationship between rating and age among Grandmasters. As a rough rule of thumb, a GM's rating can be estimated as 2676 minus 3.8 times the GM's age. At the highest level, there are no GMs participating in top tournaments after the age of 50.

 

 

Grandmaster ratings across age
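The rule of thumb above reads like the intercept and slope of a straight-line fit of rating on age. The post does not say how the line was fitted, so the ordinary least-squares sketch below is an assumption, and `fit_line` and `rough_gm_rating` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def rough_gm_rating(age):
    """Rule of thumb from the post: rating is roughly 2676 - 3.8 * age."""
    return 2676 - 3.8 * age


if __name__ == "__main__":
    # A 30-year-old GM would be estimated at roughly 2562.
    print(rough_gm_rating(30))
```

Given the GMs' ages and ratings from the list, `fit_line(ages, ratings)` would return the intercept and slope; a negative slope corresponds to the decline with age described above.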

 

 

A cause for concern is the stark difference between the age distributions of male and female players. The plot below indicates that there are hardly any women players above the age of 25. This raises interesting questions and may even help explain the rating difference between the sexes. It is possible that women retire early and do not pursue the sport competitively for as long as men do. Another possible explanation for the sharp drop in participation among women is societal: demands on time due to cultural and societal expectations may lead to this low participation rate. A third explanation, that women simply lose interest in the game after the age of 25, seems highly unlikely given that the pattern holds across populations in many different countries.

 

 

age distribution discrepancy

 

 

When it comes to the percentage of women in the playing population, it may come as a surprise that Asian countries such as Vietnam, Mongolia and China have the highest percentages among federations. Denmark and Switzerland are among the countries with the lowest female ratios.

 

percentage of females

 

femaleratiotop
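The female percentage per federation is the number of female players divided by all players in that federation. In the sketch below the dict keys "federation" and "sex" are hypothetical, and the `min_players` cutoff is an assumption (the post does not say whether very small federations were excluded, which matters because they can produce extreme ratios).

```python
from collections import defaultdict


def female_share_by_federation(players, min_players=100):
    """Percentage of female players per federation.

    `players` holds dicts with hypothetical keys "federation" and "sex"
    ("F" or "M"). Federations with fewer than `min_players` players are
    skipped.
    """
    totals = defaultdict(int)
    females = defaultdict(int)
    for p in players:
        totals[p["federation"]] += 1
        if p["sex"] == "F":
            females[p["federation"]] += 1
    return {
        fed: 100.0 * females[fed] / n
        for fed, n in totals.items()
        if n >= min_players
    }


def rank_by_share(shares):
    """Sort federations from highest to lowest female percentage."""
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
```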

 

 

The age distribution of players mirrors each country's demographics. Older players are found among the aging populations of Europe, whereas the youngest players come from countries with faster population growth, such as Sri Lanka, the UAE and Korea. Denmark and Switzerland again feature in the top list, this time for having the oldest active chess populations.

Average Player age across the World

 olderagelist


It was interesting to look at the number of GMs per capita in each country, or roughly, the chance that a random person you run into is a Grandmaster. Iceland and Armenia are near the top of the list. Armenia has long been known as one of the strongest chess-playing nations in Europe.


Probability of Running into a GM

 

GM_prob
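The per-capita figure is just the number of GMs divided by the population, and its inverse reads as "one GM in every N people". The sketch below assumes two hypothetical mappings from federation to GM count and to population; the post does not say which population figures were used.

```python
def gm_per_capita(gm_counts, populations):
    """Return (federation, probability, one_in_n) tuples, highest probability first.

    `gm_counts` maps federation -> number of GMs and `populations` maps
    federation -> population (both hypothetical inputs).
    """
    rows = []
    for fed, gms in gm_counts.items():
        pop = populations.get(fed)
        if not pop or gms == 0:
            continue  # skip federations with no population figure or no GMs
        rows.append((fed, gms / pop, pop / gms))
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

For example, a hypothetical country with 5 GMs and 100,000 inhabitants would give a probability of 5e-05, or one GM in every 20,000 people.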


Another observation was evidence of rating inflation among Grandmasters over the decades. The two plots below show rating densities for 1975, 1985, 1995, 2005 and the present. The density curve has been shifting to the right slowly but surely. The accompanying box plot shows more outliers over time and an upward shift of the whiskers, which extend up to 1.5 times the interquartile range beyond the quartiles.

 

density comparison of ratings data across years

box plot variance increase across Rating
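The whisker limits in a box plot come from Tukey's fences: points more than 1.5 interquartile ranges below the first quartile or above the third quartile are drawn as outliers. This is a minimal sketch; `tukey_fences` is a hypothetical helper, and it computes quartiles with `statistics.quantiles(method="inclusive")`, which should match R's default `quantile()`, whereas R's `boxplot()` uses Tukey's hinges, so results can differ slightly on small samples.

```python
from statistics import quantiles


def tukey_fences(ratings, k=1.5):
    """Return ((lower, upper), outliers) using Tukey's k * IQR rule."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [r for r in ratings if r < lower or r > upper]
    return (lower, upper), outliers
```

Applied to each year's GM ratings, rising fences and a growing outlier list would correspond to the upward shift visible in the box plot.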

 

Variance tests were conducted for the pairs (1985, 1995), (1995, 2005) and (2005, 2016), with the alternative hypothesis that the ratio of the variances (earlier year over later year) is less than one. The p-values were 0.04, 0.04 and 4.3 × 10^-5 respectively, suggesting that the spread of ratings has increased over the years. This suggests the need to consider deflationary factors when comparing the ratings of top players across eras.
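The statistic behind such a test is the ratio of the two sample variances, compared against an F distribution. The post does not name the function used; R's `var.test(earlier, later, alternative = "less")` is one plausible match, and `variance_ratio` below is a hypothetical standard-library helper that computes the statistic and degrees of freedom only, since the standard library has no F distribution.

```python
from statistics import variance


def variance_ratio(earlier, later):
    """F statistic for H1: var(earlier) / var(later) < 1.

    Returns the ratio of sample variances and its (numerator, denominator)
    degrees of freedom. The one-sided p-value is the F(df1, df2) CDF at the
    ratio, which needs a statistics package (e.g. SciPy) to evaluate.
    """
    ratio = variance(earlier) / variance(later)
    return ratio, len(earlier) - 1, len(later) - 1
```

A ratio well below one with a small p-value, as reported for all three pairs, indicates that the later year's ratings are more spread out.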


About Author

Aravind Kolumum Raja

Aravind obtained his Masters degree in Statistics from Columbia University in 2012 and is presently an Analyst with a global investment management firm based in New York. His primary interests are in Mathematics, Statistics & Machine learning. He...