Visualizing FIDE chess rating list
Contributed by Aravind Kolumum Raja. He is currently in the NYC Data Science Academy's 12-week full-time Data Science Bootcamp program, running from January 11th to April 1st, 2016. This post is based on his first class project, R visualization (due in the 2nd week).
FIDE (Fédération Internationale des Échecs), or the World Chess Federation, is an international organization and one of the largest sporting bodies in the world, connecting 158 federations. FIDE ranks players using the Elo system, devised by Arpad Elo, a physics professor and chess master. FIDE adopted the system in 1970, and it has remained the gold standard for rating players ever since.
The rating system is designed so that a player's performance is measured relative to the strength (rating) of the opponents faced, and each rating reflects the cumulative results against various opponents over a period of time.
A player's expected score (the probability of winning plus half the probability of drawing) is calculated from a logistic function of the rating difference. Roughly, between two players whose ratings differ by 400 points, the higher-rated player has an expected score of about 0.91, i.e. about 90%. FIDE publishes its Elo rating list every month for players from all countries, and this post analyses the January 2016 list, the most recent at the time of writing.
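The logistic curve behind the expected score can be written down directly. The sketch below uses the standard Elo formula with a 400-point scale factor; the function name `expected_score` is our own illustration, not part of any FIDE tool.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo logistic model.

    A win counts as 1 and a draw as 0.5, so the result is the win
    probability plus half the draw probability.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # A 400-point gap gives the stronger player about 0.91.
    print(round(expected_score(2400, 2000), 3))  # 0.909
    # Equal ratings give exactly 0.5.
    print(expected_score(2000, 2000))  # 0.5
```

The two players' expected scores always sum to 1, which is why the same formula serves both sides of the board.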
Women make up only 11% of the active player population, and ratings look roughly normally distributed with a skew to the left. Below is a rating density plot by sex for all players.
The difference in mean ratings between male and female active players appears significant. A one-sided Welch two-sample t-test confirmed the difference: on average, male players are rated about 200 points higher than female players (1771 vs 1575).
Welch Two Sample t-test
data: female and male
t = -65.427, df = 15707, p-value < 2.2e-16
alternative hypothesis: true difference in means is less than 0
sample estimates:
mean of F   mean of M
 1574.693    1771.426
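For readers who want to see what the Welch test computes, here is a minimal sketch using only Python's standard library. R's `t.test()` uses the Welch form by default (`var.equal = FALSE`). The function names are hypothetical, the toy lists are not FIDE data, and the p-value helper uses a normal approximation that is only reasonable because the degrees of freedom here are very large.

```python
import math
import statistics
from statistics import NormalDist


def welch_t_test(x, y):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx = statistics.variance(x) / nx  # squared standard error of mean(x)
    vy = statistics.variance(y) / ny  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def one_sided_p_less(t):
    """Approximate P(T <= t) with a standard normal distribution.

    Adequate only for large df (about 15,700 in the FIDE comparison above);
    for small samples use a true t distribution, e.g. scipy.stats.t.cdf.
    """
    return NormalDist().cdf(t)


if __name__ == "__main__":
    female = [1, 2, 3, 4]  # toy values, not FIDE ratings
    male = [2, 4, 6, 8]
    t, df = welch_t_test(female, male)
    print(round(t, 4), round(df, 2))  # -1.7321 4.41
```

Welch's version is the safer default here because the male and female groups differ greatly in size and need not share a variance.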
FIDE awards lifetime titles to approximately the top 7% of all its players, of which Grandmaster (GM) is the most coveted.
Parallel titles for women (WGM, WIM, WFM and WCM) are also awarded. The following chart shows the distribution of top FIDE chess masters across titles. Notice the overlaps between titles, especially between IMs and GMs around the 2400-2500 mark and between IMs and FMs (2300-2400).
Russia, without doubt, dominates when it comes to the number of titled players in the world. Germany is another strong chess-playing nation, followed by Spain and the USA.
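A tally like the one behind this ranking can be sketched with `collections.Counter`. This assumes each player on the list has already been parsed into a (federation, title) pair, with an empty string for untitled players; the rows below are hypothetical placeholders, not values from the January 2016 list.

```python
from collections import Counter

# Hypothetical records: (federation code, title); "" means untitled.
# Placeholder rows for illustration only, not FIDE data.
players = [
    ("RUS", "GM"), ("RUS", "IM"), ("RUS", ""),
    ("GER", "FM"), ("GER", "GM"),
    ("ESP", "IM"), ("USA", ""),
]


def titled_by_federation(rows):
    """Count players holding any title, grouped by federation."""
    return Counter(fed for fed, title in rows if title)


if __name__ == "__main__":
    for fed, n in titled_by_federation(players).most_common():
        print(fed, n)
```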
An interesting observation from the data is the relationship between age and rating and how it is distributed across the (age, rating) plane. The following plot shows that ratings are spread across all levels at all ages, which makes chess quite unusual among sports. It is very hard to guess a player's rating from their age alone.
However, among Grandmasters there is a significant negative relationship between rating and age. A GM's rating can be roughly estimated by subtracting 3.8 times the GM's age from 2676 (rating ≈ 2676 − 3.8 × age). At the highest level, there are no GMs competing in top tournaments after the age of 50.
A cause for concern is the stark difference in age densities between male and female players. The plot below shows that there are hardly any women players above the age of 25. This raises interesting questions and may even help explain the rating difference between the sexes. Women may be retiring early and not pursuing the sport competitively as men do. Another possible explanation for the sharp drop in female participation with age could be societal.
Demands on time due to cultural and societal expectations may result in this low participation rate. Women simply losing interest in the game after the age of 25 is another possible explanation, but it seems highly unlikely given that the pattern holds across a population spread over many countries.
As for the percentage of women among the playing population, it may come as a surprise that Asian countries such as Vietnam, Mongolia and China have the highest percentage of female players among federations. Denmark and Switzerland are among the countries with the lowest female ratios.
The age spread of players mirrors the demographics of each country. Older players are found among the aging populations of Europe, whereas the youngest players come from countries with faster population growth such as Sri Lanka, the UAE and Korea. Denmark and Switzerland again appear near the top of the list, this time for having the oldest active chess populations.
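The per-federation comparisons in the two paragraphs above (share of female players and typical player age) come down to a group-by. A minimal sketch, assuming each player has been reduced to a hypothetical (federation, sex, age) triple with sex coded "F" or "M" and age already derived, for example from birth year. The `min_players` filter is our addition: very small federations can top a percentage ranking by chance, and the post does not say whether such a filter was used.

```python
import statistics
from collections import defaultdict


def federation_profile(rows, min_players=1):
    """Per federation: (percentage of female players, median age).

    rows: iterable of (federation, sex, age) with sex "F" or "M".
    Federations with fewer than min_players players are dropped.
    """
    groups = defaultdict(list)
    for fed, sex, age in rows:
        groups[fed].append((sex, age))
    profile = {}
    for fed, members in groups.items():
        if len(members) < min_players:
            continue
        n_female = sum(1 for sex, _ in members if sex == "F")
        ages = [age for _, age in members]
        profile[fed] = (100.0 * n_female / len(members), statistics.median(ages))
    return profile
```

Sorting the result by its first element ranks federations by female share; sorting by the second ranks them by median age.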
It was interesting to look at the number of GMs per capita in each country, or roughly, the chance that a random person you run into is a Grandmaster. Iceland and Armenia are near the top of the list. Armenia has long been known as one of the strongest chess-playing nations in Europe.
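The per-capita figure is simply GM count divided by population. Population is not part of the FIDE list, so it has to be joined in from an external source that the post does not name. The sketch below assumes two hypothetical dictionaries keyed by federation and reports GMs per million inhabitants; the toy numbers in the test are not real counts.

```python
def gms_per_million(gm_counts, populations):
    """Grandmasters per million inhabitants, sorted from highest to lowest.

    gm_counts: federation -> number of GMs on the rating list
    populations: federation -> population (from an external source)
    Federations without a population figure are skipped.
    """
    rates = {
        fed: 1_000_000 * n / populations[fed]
        for fed, n in gm_counts.items()
        if populations.get(fed)
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Scaling to "per million" keeps the numbers readable; dividing a result by 1,000,000 gives the chance that a random resident is a GM.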
Another observation was confirmation of rating inflation among Grandmasters over the decades. Below are two plots showing GM rating densities for 1975, 1985, 1995, 2005 and the present. The density curve has been shifting slowly but surely to the right. The accompanying box plot shows more outlier points over time, and the whiskers (which extend up to 1.5 times the interquartile range beyond the box) shifting toward higher ratings.
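The 1.5 × IQR rule that decides which points a box plot draws as outliers can be sketched as follows. Quartiles here come from `statistics.quantiles` with its default "exclusive" method, which may differ slightly from the quartile convention used by the R plotting function in the post; the function names are illustrative.

```python
import statistics


def tukey_fences(values):
    """Return (lower, upper) fences at 1.5 * IQR beyond the quartiles.

    Points outside the fences are the ones a box plot draws as outliers.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values lying outside the Tukey fences."""
    lo, hi = tukey_fences(values)
    return [v for v in values if v < lo or v > hi]
```

Applied year by year to GM ratings, rising fences and a growing outlier list would correspond to the shift described above.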
Variance tests (F-tests) were conducted for the pairs (1985, 1995), (1995, 2005) and (2005, 2016), with the alternative hypothesis that the ratio of the earlier variance to the later variance is less than one. The p-values were 0.04, 0.04 and 4.3 × 10^-5 respectively, suggesting that the spread of GM ratings has widened across the years. Together with the rightward shift of the densities, this suggests the need to consider deflationary factors when comparing the ratings of top players across eras.
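For reference, the statistic behind such a variance test is just the ratio of the two sample variances, compared against an F distribution; R's `var.test()` reports the same ratio. The sketch below, with a hypothetical function name, computes the statistic and its degrees of freedom using the standard library only. The one-sided p-value for "ratio less than one" is the F CDF at that value, for example `scipy.stats.f.cdf(F, df1, df2)` if SciPy is available. The test assumes both samples are roughly normal.

```python
import statistics


def variance_ratio(earlier, later):
    """F statistic for H1: var(earlier) / var(later) < 1.

    Returns (F, df1, df2). A small F supports the alternative that the
    later sample is more spread out than the earlier one.
    """
    f_stat = statistics.variance(earlier) / statistics.variance(later)
    return f_stat, len(earlier) - 1, len(later) - 1
```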