Page of 4 (76 records)


DeinFreund

8 years ago
(edited 8 years ago)

Dear ELOn addicts,

today I'm not presenting you the next step in the evolution of rating systems. No, I'll directly jump over the dark ages of TrueSkill and Glicko to the best of the best. La crème de la crème des classements.

Whole History Rating

Advantages

Smurfs are quickly put in their appropriate skill group without negatively affecting the rank of people they play against at the beginning
An Elo reset would have less negative impact
Pluks are recognized
Ladders are more accurate and less affected by the outcome of single games
Skill development during inactivity is simulated

Uncertainty

WHR keeps track of skill variations and assigns a region of confidence to every value, effectively showing how accurate the value is. This uncertainty increases when no games are played for some time.
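To give a feel for why uncertainty grows during inactivity: the WHR paper models skill as a random walk (Wiener process), so the variance of a rating grows linearly with the time since the last game. The snippet below is only a rough sketch of that one idea. The function name and the value of `w` are made up for illustration, and the real confidence region in WHR comes out of the full optimisation, not from this shortcut.

```python
import math


def inflated_sigma(sigma: float, idle_days: float, w: float = 14.0) -> float:
    """Standard deviation of a rating after `idle_days` without games.

    Assumes a Wiener-process prior: variance grows by w**2 per day.
    `w` (Elo points per sqrt(day)) is a hypothetical value, not a tuned one.
    """
    return math.sqrt(sigma ** 2 + (w ** 2) * idle_days)


# The longer the break, the wider the confidence region.
for days in (0, 10, 100):
    print(days, round(inflated_sigma(30.0, days), 1))
```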

Rating History

A "rating" is always the whole history of a player's skill, from when he started playing zk to his latest game.

Time travel

Ratings are always adjusted as a whole. This means that past skill values are changing all the time. For example, naturally good players will end up with a high starting value.

So if you lost vs Godde in one of his first games, the rating system will discover that Godde is actually a very good player and make sure your rating wasn't negatively affected by losing to him while he wasn't properly rated yet.

Another example would be a group of friends always playing with each other. With the current system their ratings would form a local system that is not affected by how it compares to outsiders. With WHR, if a single member of the group went and played vs an outsider, the ELO of all group members would be adjusted.

Examples

Enough talking, let's see some graphs!
(all time 1v1 ratings)

These ratings are centered around zero, so the average nub will have 0 rating.

Randy started higher than Firepluk ever went :/

And to demonstrate the algorithm doesn't just boost everybody's ego by starting above zero

But he quickly made his way up ;)

If you're wondering what the x-axis means: It's the battleID divided by 200 ("Days"). The system assumes that skill stays constant within one day. The Y-axis is just the minimum and maximum of the confidence interval with the center being marked as well.
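In code, the "day" bucketing described above could look roughly like this. Integer division is my assumption here (the post only says "divided by 200"), and the names are made up for the example.

```python
BATTLES_PER_DAY = 200  # from the post: battleID divided by 200


def battle_day(battle_id: int) -> int:
    """Map a battleID to its pseudo-"day"; skill is treated as constant within one day."""
    return battle_id // BATTLES_PER_DAY


# Battles 0..199 share day 0, battle 200 starts day 1.
print(battle_day(199), battle_day(200))
```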

Original publication: https://www.remi-coulom.fr/WHR/WHR.pdf
My implementation is based on an existing Ruby implementation.

+14 / -0

Sprung

8 years ago
(edited 8 years ago)

(nvm)

+2 / -0

gajop

8 years ago

As a player I definitely wouldn't want to see my rating change without me doing anything.
Balance isn't everything.

+1 / -0

Aquanim

8 years ago

The other way of dealing with that problem is to display an abstracted representation to players.

+0 / -0

Shadowfury333

8 years ago
(edited 8 years ago)

If your visible rating only changed due to playing games, I'm sure it would be fine. I mean, yes, it wouldn't be super accurate if you haven't played in a while, but that would be the case anyway. Otherwise, it would change same as always, just with a slight hidden factor for correcting to the true internal rating based on history changes.

+0 / -0

Llamadeus

8 years ago

More graphs (ო╹з╹)ო

+0 / -0

DeinFreund

8 years ago
(edited 8 years ago)

gajop, but that's already what Elo decay does, just badly. In the newest infra your malus will simply increase linearly over inactivity time, decreasing your visible Elo.

Licho, correct me if I'm wrong.

Meanwhile this system actually improves your rating accuracy using future game results.

For example I've just lost a lot of games vs llamadeus, lowering my Elo and upping his, slowly. This system would recognize that he didn't start at zero and instantly place him in a high rank. It also would reduce the impact of him always defeating the same person.

If you're a smurf this system will quickly place you where you belong.

If you have sudden changes in skill this system will use uncertainty to represent your plukiness.

If you're a top player playing regular games, ratings will get very precise, removing the random ladder juggling we currently have.

If you're doing Elo resets, this system will be able to cope with it much better.

Major changes to your Elo will only occur if you "cheated" the classic Elo system, for example by always fighting against a limited group of people or only playing on a few selected days. Somebody who regularly fights a broad range of people wouldn't notice any changes.

If the system is too complicated for players to understand, they can finally concentrate on playing without thinking how every single game will affect their Elo, choosing their games based on win chances or avoiding games against smurfs or with pluks on their team just because of how Elo is foiled by this.

+6 / -0

aeonios

8 years ago

This system definitely seems cool. The biggest disadvantage it carries is that it requires infra to update everyone's ratings in bulk once per day, which I suspect would be fairly expensive, and it also doesn't give immediate feedback after every game.

+0 / -0

DeinFreund

8 years ago

For testing against Elo I set it up so it'd update everyone after every game. If this is too slow, you can also selectively update players, i.e. after each game only update the players who took part in it, and do a full iteration every day.

A full iteration, calculating the whole history of all players over all 1v1s ever played, takes about a second on my laptop. So slower than Elo, but I don't think we can't handle it ;)
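The selective update idea can be sketched like this. Everything here is hypothetical scaffolding (the class, its method names and the `update_player` callback are not from my implementation or from infra); it only shows the schedule: cheap per-game updates for participants, plus one full pass per day.

```python
from typing import Callable, Iterable, Set


class UpdateScheduler:
    """Decides who gets re-rated when; the rating maths lives in `update_player`."""

    def __init__(self, update_player: Callable[[str], None]) -> None:
        self.update_player = update_player  # hypothetical per-player WHR update
        self.known_players: Set[str] = set()

    def on_game_finished(self, participants: Iterable[str]) -> None:
        # Cheap path: only the players of this game are re-optimised.
        for name in participants:
            self.known_players.add(name)
            self.update_player(name)

    def daily_full_iteration(self) -> None:
        # Expensive path: one pass over everybody, run once per day.
        for name in sorted(self.known_players):
            self.update_player(name)
```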

+1 / -0

GoogleFrog

8 years ago

Looks good. What does the rating number actually mean though? What claim does it make about the outcome of future battles?

+0 / -0

DeinFreund

8 years ago
(edited 8 years ago)

The number works the same as Elo. (So it's actually Elo, kinda)

Win chance = 1/(1 + 10^((rating2 - rating1)/400))

Thus we could keep using the existing balancer/ladders/rankicons just with something else calculating the values. We'd just have to add 1500 to all values to get the numbers we're so used to.

The only difference is that ratings are now time dependent, but for our use case we'd usually just want the current values, so that argument is fixed.
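As a quick sanity check of the formula, here is a minimal Python sketch. The function names and the `offset` parameter are made up for illustration; only the formula and the 1500 shift come from this post.

```python
def win_chance(rating1: float, rating2: float) -> float:
    """Chance that player 1 beats player 2, using the Elo formula from the post."""
    return 1.0 / (1.0 + 10.0 ** ((rating2 - rating1) / 400.0))


def to_ladder_scale(whr_rating: float, offset: float = 1500.0) -> float:
    """Shift a zero-centred WHR value to the familiar 1500-centred numbers."""
    return whr_rating + offset


# Only the rating difference matters, so the shift doesn't change win chances.
print(win_chance(100.0, 0.0))
print(win_chance(to_ladder_scale(100.0), to_ladder_scale(0.0)))
```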

+0 / -0

aeonios

8 years ago

Calculating every time a game finishes seems like it might be a bad idea esp if we get a lot of players and have a lot of games running, but if it only takes one second it could be calculated every 5-30 minutes and be fine.

+0 / -0

DeinFreund

8 years ago
(edited 8 years ago)

To make a few more interesting examples, I've started using the MM database where we only have very limited information about each player.

(Arbitrarily chosen 6 peoplers)
Licho is still 1900 on ladders, WHR doesn't think so. So it'll obviously never be approved :P

For example, hedgehogs has played a total of 3 games, but the system has already rated him away from zero, while Elo would just jitter around:

(The graph is only 2 data points)

Some other peoplers

As you can see, Llamadeus gets a big rating right from the beginning.

Sad.

X axis is now days since 1970.

+1 / -0

DeinFreund

8 years ago

aeonios I wouldn't recalculate everything after each game. Just updating the ratings for the involved players would be enough and should be nearly as fast as Elo.

+1 / -0

aeonios

8 years ago

I thought it had to account for everyone systematically. o.O If it can be done without having to account for every user on the system then updating after every game would be ideal.

+0 / -0

DeinFreund

8 years ago

It tries to optimize the rating function (time dependent) for each player independently, keeping the rating of other players constant. Doing this repeatedly for all players leads to the fancy behaviour. Thanks to this it doesn't need to update all players at once.
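Here is a heavily simplified sketch of that "one player at a time" loop. Big assumptions: it drops the time dimension completely (one number per player instead of a whole rating history), it uses a plain Gaussian prior around zero with a made-up `PRIOR_SIGMA`, and all names are invented for the example. So this is not WHR itself, just the same optimisation pattern on a toy model.

```python
import math

Q = math.log(10) / 400.0   # converts Elo points to natural-log units
PRIOR_SIGMA = 350.0        # assumed prior spread around 0 (hypothetical value)


def expected(r_a: float, r_b: float) -> float:
    """Elo win chance of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update_player(player, games, ratings):
    """One Newton step on `player`'s rating, all other ratings held constant."""
    grad = -ratings[player] / PRIOR_SIGMA ** 2   # pull towards the prior mean 0
    curvature = 1.0 / PRIOR_SIGMA ** 2
    for winner, loser in games:
        if player == winner:
            p = expected(ratings[player], ratings[loser])
            grad += Q * (1.0 - p)
        elif player == loser:
            p = expected(ratings[player], ratings[winner])
            grad -= Q * p
        else:
            continue  # games without this player don't enter its update
        curvature += Q * Q * p * (1.0 - p)
    ratings[player] += grad / curvature


def fit(games, sweeps=500):
    """Repeat the per-player update over all players until things settle."""
    players = sorted({name for game in games for name in game})
    ratings = {name: 0.0 for name in players}
    for _ in range(sweeps):
        for name in players:
            update_player(name, games, ratings)
    return ratings


# Friends A, B, C only play each other; games are (winner, loser) pairs.
friends = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C")]
before = fit(friends)
# Now C beats an outsider D. A and B never meet D.
after = fit(friends + [("C", "D")])
print(before)
print(after)
```

A never plays D in this toy example, yet A's fitted rating still shifts once C's does: B's optimum depends on C's rating, and A's depends on B's, so repeating the independent updates carries the information through the group.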

+1 / -0

aeonios

8 years ago

quote:
Another example would be a group of friends always playing with each other. With the current system their ratings would form a local system that is not affected by how it compares to outsiders. With WHR, if a single member of the group went and played vs an outsider, the ELO of all group members would be adjusted.

How does that work then?

Also how does it handle team games and FFA? I could see it being beneficial for those since it seems like it wouldn't need to water down rating changes artificially and could assign a meaningful rating more quickly.

I think this system would be "bad" for seasonal ladders since it can peg skill more quickly and also because it works better when you have more data, but that's more of a limitation of seasonal ratings than it is of the system itself. I've been thinking of designing a league system based on chess/shogi ranks, but I need to do more research to see how that works. What I do know though is that moving up in rank in pro chess/shogi is very competitive and basically requires having the highest win ratio within your current rank, which I think would translate well for bragging rights and possibly allow for more interesting tournaments. I think it sucks that tournaments only have one bracket because then it's always drone/godde/googlefrog that wins and newb/low elo players barely even get to compete and get basically nothing for it.

+0 / -0

Fealthas

8 years ago
(edited 8 years ago)

Don't see why this shouldn't be implemented. "ladder" should be based on something else, like points for winning games, so if a worse player wins games against good people they can still be #1.

+1 / -0

[ISP]Lauri

8 years ago

I'd be interested to see my graph as well.

+0 / -0

hedgehogs

8 years ago
(edited 8 years ago)

YAY I HAVE A COLORED ICON!

Will this work for teams Custom games? Or do we need a new Elo for that?

What if teams is all nubcom trolls?
Or team is full of nubs and wins against Godde team?
What if we has godde team but we lose to nubteam?

+1 / -0


Whole History Rating (improved ELO)
