Today I'm not presenting the next step in the evolution of rating systems. No, I'm jumping straight over the dark ages of TrueSkill and Glicko to the best of the best. La crème de la crème des classements.
Whole History Rating
Smurfs are quickly put in their appropriate skill group without negatively affecting the rank of people they play against at the beginning
An Elo reset would have less negative impact
Pluks are recognized
Ladders are more accurate and less affected by the outcome of single games
Skill development during inactivity is simulated
WHR keeps track of skill variations and assigns a region of confidence to every value, effectively showing how accurate the value is. This uncertainty increases when no games are played for some time.
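To give a feel for how the uncertainty grows during inactivity, here is a minimal sketch. The original paper models skill as a Wiener process, so with no new games the variance of the estimate grows linearly with elapsed time. The function name and the value of `w2` (variance added per day, in natural rating units) are assumptions picked for illustration, not the values used for these ladders.

```python
import math

def inflated_sd(sd_at_last_game, days_inactive, w2=0.01):
    """Standard deviation of a skill estimate after a gap without games.

    Assumes a Wiener-process prior: variance grows by w2 per day.
    w2 is a hypothetical illustrative value, not a tuned parameter.
    """
    return math.sqrt(sd_at_last_game ** 2 + w2 * days_inactive)

# The longer a player stays away, the wider the region of confidence.
for days in (0, 10, 30, 90):
    print(days, round(inflated_sd(0.3, days), 3))
```

This only covers extrapolation past a player's last game. Between two games the real algorithm interpolates, so the uncertainty there is bounded from both sides.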
A "rating" is always the whole history of a player's skill, from when he started playing zk to his latest game.
Ratings are always adjusted as a whole. This means that past skill values are changing all the time. For example naturally good players will end up with a high starting value.
So if you lost vs Godde in one of his first games, the rating system will discover that Godde is actually a very good player and make sure your rating wasn't negatively affected by losing to him while he wasn't properly rated yet.
Another example would be a group of friends who always play with each other. Under the current system, their ratings form a closed local pool that says nothing about how the group compares to outsiders. With WHR, if a single member of the group went and played vs an outsider, the Elo of all group members would be adjusted.
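The Godde example can be sketched in a few lines. This is not WHR itself: it is a much simpler static Bradley-Terry fit with one rating per player and a Gaussian prior, refit from scratch over all games. It only illustrates the "adjust everything as a whole" idea. The player names, the `fit_ratings` helper and all parameter values are made up for the illustration.

```python
import math

def fit_ratings(games, prior_sd=1.0, steps=5000, lr=0.05):
    """Jointly fit one static rating per player by gradient ascent.

    games: list of (winner, loser) pairs.
    Ratings are in natural units and pulled towards 0 by a Gaussian prior.
    """
    players = sorted({p for game in games for p in game})
    r = {p: 0.0 for p in players}
    for _ in range(steps):
        # Gradient of the log prior.
        grad = {p: -r[p] / prior_sd ** 2 for p in players}
        for winner, loser in games:
            # Probability the model gave to the opposite result.
            p_upset = 1.0 / (1.0 + math.exp(r[winner] - r[loser]))
            grad[winner] += p_upset
            grad[loser] -= p_upset
        for p in players:
            r[p] += lr * grad[p]
    return r

# You are even with a rival, then lose to a brand-new player.
early = [("you", "rival"), ("rival", "you"), ("Godde", "you")]
# Later the new player turns out to beat everyone else too.
later = early + [("Godde", "opp%d" % i) for i in range(5)]

before = fit_ratings(early)
after = fit_ratings(later)
print("you before:", round(before["you"], 3))
print("you after: ", round(after["you"], 3))
```

Once the later games are included, the loss to Godde is explained mostly by Godde being strong, so your own fitted rating comes out higher than before even though you played no new games.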
Enough talking, let's see some graphs!
(all time 1v1 ratings)
These ratings are centered around zero, so the average nub will have 0 rating.
Randy started higher than Firepluk ever went :/
And to demonstrate that the algorithm doesn't just boost everybody's ego by starting them above zero:
But he quickly made his way up ;)
If you're wondering what the x-axis means: it's the battleID divided by 200 ("Days"). The system assumes that skill stays constant within one day. The y-axis shows the rating: the band spans the minimum and maximum of the confidence interval, with the center marked as well.
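As a rough sketch of how the axes could be computed: battles are bucketed into "days" by battleID, and each point is drawn as a band around the estimate. Integer division, the function names and the width factor `z` are assumptions for the illustration. The post only says the battleID is divided by 200 and that the interval's minimum, center and maximum are shown.

```python
def battle_day(battle_id, battles_per_day=200):
    """Map a battleID to its "day" bucket (x-axis value).

    All battles in the same bucket share one skill value.
    """
    return battle_id // battles_per_day

def confidence_band(rating, sd, z=1.96):
    """Return (minimum, center, maximum) for the y-axis.

    z is a hypothetical width factor; 1.96 would be a 95% interval
    under a normal approximation.
    """
    return (rating - z * sd, rating, rating + z * sd)

print(battle_day(399), battle_day(400))
print(confidence_band(1.0, 0.5))
```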
Original publication: https://www.remi-coulom.fr/WHR/WHR.pdf
My implementation is based on an existing Ruby implementation