Ladder point system seems broken

raaar

3 years ago

I went to play some ZK big teams after some weeks of low activity

had 2626 ladder points (probably, remember seeing it on 2620s)
win https://zero-k.info/Battles/Detail/1547362 (est win% 52%)
loss https://zero-k.info/Battles/Detail/1547396 (est win% 46%)
win https://zero-k.info/Battles/Detail/1547428 (est win% 52%)
loss https://zero-k.info/Battles/Detail/1547472 (est win% 43%)
win https://zero-k.info/Battles/Detail/1547488 (est win% 67%)

3 intercalated wins and 2 losses, all big teams games, the estimated win% was below 50% on the ones lost. Despite this, the ladder points ended up dropping to 2603. I think i only got 1 point for the last win.

This doesn't make sense. I had noticed weird variations before. Some games seem to influence a lot, then some others barely have any impact.

I've talked about this in discord but discussions there get lost in time so here's a forum thread. Has anyone paid attention to how their ladder points evolve with each game and notice weird behavior?

+0 / -0

Aquanim

3 years ago
(edited 3 years ago)

This is working as intended (or at least, as it is known to work). I may be wrong on a few minor details, but the general idea of the following should be correct:

Your "current rating" is the true Whole History Record rating. It can change quite a bit based on the outcomes of other people's games. (Possibly also by new players entering the system or players becoming inactive? Would not swear to this.)

Your "ladder rating" is determined by the following:

It only changes when you complete a game.
It must go up by at least 1 if you win, and it must go down by at least 1 if you lose.
Your ladder rating tries to converge to your current rating (edit: perhaps an average of it over a previous window of time?)

along with some other rules possibly? but these are the important ones.

So if your ladder rating is above your current rating and you win a game, your ladder rating will only increase by 1, but the win will drag your current rating up a bit. If you win enough games your current rating will catch up and surpass your ladder rating, at which point you will see more ladder rating gains.

Conversely, if you lose games but your ladder rating is below your current rating, your ladder rating will only decrease by 1.
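The rules above can be sketched roughly as follows. This is only an illustration of the described behaviour, not the site's actual code: the function name `update_ladder`, the `pull` factor of 0.1, and the use of a single `target` value are all assumptions made for the example.

```python
def update_ladder(ladder, target, won, pull=0.1):
    """Hypothetical ladder update: drift toward `target`, but always move
    at least 1 point in the direction of the game result."""
    # Assumed convergence step; the real formula is not known here.
    step = pull * (target - ladder)
    if won:
        # A win moves the ladder rating up by at least 1.
        return ladder + max(1.0, step)
    # A loss moves the ladder rating down by at least 1.
    return ladder - max(1.0, -step)


# Ladder above target: a win only gives the minimum +1.
print(update_ladder(2603.0, 2592.0, won=True))
# Ladder far below target: a win gives a larger gain.
print(update_ladder(2500.0, 2600.0, won=True))
```

With this shape, the asymmetry between gains and losses depends entirely on which side of the target the ladder rating currently sits.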

+2 / -0

raaar

3 years ago

Atm my "current rating" is 2611, which is higher than my ladder rating.

quote:

Your ladder rating tries to converge to your current rating.

Given the explanations i got, if i'm winning 2 points and losing 10 ladder points per battle, i'd expect a current rating significantly lower than my ladder rating, yet that's not the case.

People also told me the rating graph is retroactively changed so I can't get my past rating directly (it's confusing), and that the current rating adjusts itself automatically over time.

If the current rating changes over time based on players' history, the ladder ratings should change too, so people don't get surprise drops when they actually play some battles.

+0 / -0

Aquanim

3 years ago
(edited 3 years ago)

quote:
Given the explanations i got, if i'm winning 2 points and losing 10 ladder points per battle, i'd expect a current rating significantly lower than my ladder rating, yet that's not the case.

I expect your current rating was lower than your ladder rating up until the last game you played.

Also possible that the games played in the hours since your last game have shifted your current rating a little bit.

+0 / -0

GoogleFrog

3 years ago

I wouldn't even be 100% sure that current rating is real underlying rating. In any case, history for it doesn't exist.

The rating system doesn't put enough work into making sense. It isn't easy though, since the system just doesn't care about many of the things that a person watching it closely might. It wouldn't care if everyone's ratings suddenly went up by 100. It doesn't care when ratings change, just that they are accurate. So without any attempt to make sense you would find your rating go down after wins. It only barely cares about your "most recent" rating.

Ideally ratings would go up by more than 1 on most wins, with some smooth tweak to rating gain cap, rather than suddenly hitting a limit of 1. I forget whether this was implemented.

+0 / -0

malric

3 years ago

To improve "watching" the system, probably the easiest thing to change would be the win estimation shown on the battle page. It should at least be labelled "estimation based on current rating". It could also show "estimation based on rating at battle start", but that would be quite boring for large battles as it would be 50%.

+0 / -0

raaar

3 years ago

I go in to play some casual big teams, get a big points drop on the first few battles if I happen to lose, then if I try to get them back it's a trickle and requires grinding a lot of battles (almost feels like it's built that way on purpose). This time I caught the system red-handed. Casual and big teams means it should be a low stakes battle. Stakes of each battle should at least be consistent so people know what to expect. If the reason for the drop is the general adjustment of "current rating", then it should also affect the ladder rating automatically, not punish the player for playing.

That a system converges to match the players' skill over time is not a good defense. If you changed that rating system to multiply the point change of each battle by a dice roll, over time it'd still converge to people's relative skill level, but it'd feel much worse for people paying attention to the changes.

quote:

The exact change is calculated as a moving average that converges to the average WHR of the last 30 days

Does a day when a player played once or not at all weigh as much as a day when they played 30 battles?

Stuff that should make the system care less about the battle outcome (lower points changes):
- large number of players

- uncertainty because player hasn't played recently
(at least it shouldn't raise the stakes unless affecting accounts with few battles to make smurfs rapidly converge to their skill level)

- large skill variation across players

- unevenness in number of players relative to team sizes

- casual mode

Another suggestion is to put into the chart the top% the player's on instead of the rating number, which apparently isn't meaningful.

+1 / -0

raaar

3 years ago

I've tried a bunch of big team games today to see how the ratings evolve. This is what happened:

Current rating: 2608.721 ± 135.0551 (w2 = 0.00281461)
Ladder rating: 2602.885

won 1 battle https://zero-k.info/Battles/Detail/1548078 est win% 53%

Current rating: 2613.363 ± 135.0833 (w2 = 0.002814511)
Ladder rating: 2604.01

won 1 battle https://zero-k.info/Battles/Detail/1548117 est win% 52.5%

Current rating: 2616.958 ± 135.195 (w2 = 0.002814425)
Ladder rating: 2605.052

won 1 battle https://zero-k.info/Battles/Detail/1548146 est win% 56.1%

Current rating: 2621.996 ± 135.297 (w2 = 0.002814306)
Ladder rating: 2606.255

lost 1 battle https://zero-k.info/Battles/Detail/1548174 est win% 45%

Current rating: 2617.902 ± 135.4136 (w2 = 0.002814206)
Ladder rating: 2602.054

does this make sense to you?
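For reference, the per-game changes implied by the numbers above can be tallied with a few lines of Python. The values are copied from this post; the variable and function names are just for the example.

```python
# Ratings as posted above: before the session, then after each game.
current = [2608.721, 2613.363, 2616.958, 2621.996, 2617.902]
ladder = [2602.885, 2604.01, 2605.052, 2606.255, 2602.054]
results = ["win", "win", "win", "loss"]


def deltas(values):
    """Change between each consecutive pair of ratings, rounded to 3 decimals."""
    return [round(after - before, 3) for before, after in zip(values, values[1:])]


current_deltas = deltas(current)
ladder_deltas = deltas(ladder)

for result, dc, dl in zip(results, current_deltas, ladder_deltas):
    print(f"{result:4}  current {dc:+.3f}  ladder {dl:+.3f}")
```

Each win moved the ladder rating by a little over 1 point while the current rating moved by roughly 3.6 to 5 points; the single loss moved both by about 4.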

+0 / -0

dyth68

3 years ago

raaar : Would need much more info to say.

If you want to make an alternative rating system that's just for display to players who want to measure their progress then by all means make one! But if you want to replace WHR for the purposes of !predict and !balance then you'll need to show it gives fairer games.

+0 / -0

malric

3 years ago

quote:
That a system converges to match the players' skill over time is not a good defense. If you changed that rating system to multiply the point change of each battle by a dice roll, over time it'd still converge to people's relative skill level, but it'd feel much worse for people paying attention to the changes.

In a 16v16 I think you have a lot of "dice rolls" manifesting in the form of "what are some people going to try now". I can immediately think of "highly variable" players just from memory, with strategies that can work or fail spectacularly. The idea of casual, for me, is to accept that there is some randomness/fun in each game outcome.

quote:
Another suggestion is to put into the chart the top% the player's on instead of the rating number, which apparently isn't meaningful.

While I would personally like your proposal, keep in mind that top% should be of the active players (otherwise you have no way of comparing with someone who played 5 years ago, let's say). Then your "top%" will also change with the number of active players without you playing any game, which is again less than ideal.

I like ratings/ladders and thinking about the problem but do not believe there is a "simple/clear/consistent" way, unless it's 1v1 and you play all vs all. The ladder, the chart, the colors, the rating are all indications of skill and I prefer to have inexact indications than not.

+0 / -0

Aquanim

3 years ago
(edited 3 years ago)

quote:
I go in to play some casual big teams, get a big points drop on the first few battles if I happen to lose, then if I try to get them back it's a trickle and requires grinding a lot of battles (almost feels like it's built that way on purpose).

This happens to be the case for you at the current point in time based on your recent game history. At some other point in time you will experience the reverse, where you gain a lot on victories and don't lose much for losses. I have personally experienced both sides of the phenomenon quite a few times.

quote:
does this make sense to you?

It is entirely consistent with my understanding of how the system works.

My advice is that if you really care about your rating, track your current rating or your percentile progress to next rank.

+0 / -0

MSPR

3 years ago

There's a convergence of ladder WHR to the moving average of the (constantly recalculated) WHR of the past 30 days.

Your average of the past 30 days is around 2592.

Hence you converge to that. If your ladder rating is lower than the average of the past 30 days, your wins mean a higher increase and your losses mean a lower decrease, and vice versa.

If you waited 10 days (all else being equal), you would've started converging to 2616.

Thing is the WHR constantly recalculates. A RiposteR suddenly losing 400 elodiotes over testing a new Zero-K version is like a monstrous butterfly flapping its wings so hard he ripples space-time-fabric, restructuring the entire system. Conclusion: don't be too bothered about it all.
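As a rough illustration of why waiting changes the target, here is a minimal sketch of a 30-day window average. The function name `window_average`, the calendar dates, and the daily values are all invented for the example; they are not anyone's real rating history, and the real system also recalculates past WHR values, which this ignores.

```python
from datetime import date, timedelta


def window_average(history, today, days=30):
    """Mean of the daily ratings recorded in the last `days` days, including today."""
    start = today - timedelta(days=days - 1)
    values = [rating for day, rating in history.items() if start <= day <= today]
    return sum(values) / len(values)


# Invented history: 10 days at 2544, then 2616 on every later day.
first = date(2023, 1, 1)  # placeholder date
history = {
    first + timedelta(days=i): (2544.0 if i < 10 else 2616.0)
    for i in range(40)
}

day30 = first + timedelta(days=29)
day40 = first + timedelta(days=39)
print(window_average(history, day30))  # window still contains the 10 low days
print(window_average(history, day40))  # low days have dropped out of the window
```

In this made-up case the target sits at 2592 while the old low days are still inside the window and jumps to 2616 once they fall out of it, without a single game being played in between.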

+1 / -0

GoogleFrog

3 years ago
(edited 3 years ago)

Why average over 30 days? That seems like a lot and would seem to cause the problem raaar is having quite frequently. There was a meme about it being optimal to only play matchmaker once a month. I think it came from the 30 day average thing.

Also, going up by 1 point or down by 4 to fix a disconnect of 10 points seems far too extreme. What would even be wrong with your ladder rating being off by 50? Really, when you think about it, what would be wrong with the ladder rating just being cut loose completely, to go up and down like elo, without caring about WHR? The latter could have edge cases and end up being a bit ugly, but it would seem to take an extreme difference to get to that point. WHR should barely bias ladder rating updating until at least a 50 point difference, and not ramp up the bias until after 100.
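One way to picture that last suggestion is a weight that stays at zero for small gaps, grows slowly between 50 and 100 points, and only ramps up beyond that. This is a sketch of the idea only: the function name `whr_bias`, the `mild` level of 0.1, the `full` cutoff at 200 points, and treating "barely" as exactly zero are all assumptions, not anything implemented on the site.

```python
def whr_bias(gap, low=50.0, high=100.0, full=200.0, mild=0.1):
    """Hypothetical weight in [0, 1] for how strongly WHR pulls on a ladder update.

    `gap` is the difference between WHR and ladder rating (sign is ignored).
    """
    g = abs(gap)
    if g <= low:
        # Small gaps: ladder rating moves freely, like plain Elo.
        return 0.0
    if g <= high:
        # Between `low` and `high`: a gentle nudge, rising to `mild`.
        return mild * (g - low) / (high - low)
    if g >= full:
        # Very large gaps: WHR fully dominates.
        return 1.0
    # Beyond `high`: ramp from `mild` up to 1.
    return mild + (1.0 - mild) * (g - high) / (full - high)


for gap in (10, 75, 100, 150, 300):
    print(gap, round(whr_bias(gap), 3))
```

Under these assumed thresholds a 10 point disconnect would not distort the update at all, which is the case this thread started from.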

+0 / -0

malric

3 years ago

quote:
Really, when you think about it, what would be wrong with the ladder rating just being cut loose completely, to go up and down like elo, without caring about WHR?

The way I understand it is that the ladder rating "tracks" WHR because the agreement is that WHR provides the most accurate rating based on the available information (the games played by everybody).

The reason the ladder rating is not WHR is because people do not like their rating changing without playing (hence the averaging, tracking, etc.).

The biggest issue in interpreting the first post is the estimated win ratio at battle start. Let's assume the estimated win rating were at battle start the following (exaggerated for demonstration purposes):

win https://zero-k.info/Battles/Detail/1547362 (est win% 50%)
loss https://zero-k.info/Battles/Detail/1547396 (est win% 60%) - so this was expected a win but was a loss
win https://zero-k.info/Battles/Detail/1547428 (est win% 50%)
loss https://zero-k.info/Battles/Detail/1547472 (est win% 60%) - so this was expected a win but was a loss
win https://zero-k.info/Battles/Detail/1547488 (est win% 50%)

In this scenario it makes sense that overall the rating would go down as the player lost even when he had good chances...

Of course I have no clue how initial numbers were, but the current estimated win really does not help in understanding what happened.

+0 / -0

raaar

3 years ago

quote:

The biggest issue in interpreting the first post is the estimated win ratio at battle start.

I posted the win% immediately after the battles ended, so whatever the original estimated win% were, they should have been pretty close.

Apparently the explanation was this:

quote:
There's a convergence of ladder WHR to the moving average of the (constantly recalculated) WHR of the past 30 days.
Your average of the past 30 days is around 2592.

quote:

Why average over 30 days? That seems like a lot and would seem to cause the problem raaar is having quite frequently. There was a meme about it being optimal to only play matchmaker once a month. I think it came from the 30 day average thing.

Having some averaging to smoothen the curves sounds good, but it should be done across games played, not days.......... If you want people to play more, don't incentivize them to delay playing for weeks.

quote:

The reason the ladder rating is not WHR is because people do not like their rating changing without playing (hence the averaging, tracking, etc.).

the user playing and winning 6 and losing 3 roughly even games and losing points relative to people who haven't played seems like a worse experience to me............ If you want people to play more, don't incentivize them to avoid playing.

I don't have an opinion on whether the current system produces or not better balanced teams compared to earlier systems. My complaints are mostly about how it inconsistently affects the ladder rating and gives bad incentives and that can probably be fixed without affecting its other features.

+0 / -0

malric

3 years ago

quote:
the user playing and winning 6 and losing 3 roughly even games and losing points relative to people who haven't played seems like a worse experience to me............ If you want people to play more, don't incentivize them to avoid playing.

It all depends on the whole team. I had game sessions in which the team that got a specific player always lost. Best is indeed to stop playing in those cases if you care about your rating. But I don't think it's the system issue, but rather that most people actually play "casual" (as opposed to blue players that mostly try to play their best game).

quote:
I don't have an opinion on whether the current system produces or not better balanced teams compared to earlier systems. My complaints are mostly about how it inconsistently affects the ladder rating and gives bad incentives and that can probably be fixed without affecting its other features.

I am sure it can be improved. The problem is the trade-off between the inconsistencies. Not sure if you follow the topic much, but my impression is that some time ago there were more complaints about "moving too fast up and down the ladder". That also made people reluctant to play (so as not to lose, or only playing when they liked who was in the game, see the paragraph above). I for one would prefer accuracy (so no tracking, just current state). Someone else would prefer slow movement on the ladder. I doubt there is one system that makes both happy at the same time.

I do think the proposed idea to show in the rating graph the place in the ladder would help though, as well as not showing the est winning (because they are misleading and I think they can jump a lot).

+0 / -0

raaar

3 years ago

quote:

It all depends on the whole team. I had game sessions in which the team that got a specific player always lost.

Each player's effectiveness varies across battles, some are more consistent than others and it affects the team outcome, etc. That can be annoying, but it's expected and fine.

I stopped playing for a few weeks, then tried again. When i got back, my current rating had risen like 80 points somehow, almost 100 points above my ladder rating and I went on to play a bunch of games. The opposite happened: got like 10-20 points per win, lost 1-5 per loss (some where my team was favoured to win). When I created this thread my starting "current rating" was already above the ladder rating, but the margin was much smaller, which probably affects the way scores evolved.

Anyway, it seems that having a recent history with higher or lower rating than the current ladder rating drastically affects the way scores evolve if a player decides to play a lot across a few days. A dozen even games with 50% wins can lead to drastically different outcomes depending on when the player decides to play, and that's not fine.

- If people's current rating is rescaled/recalculated, the ladder ratings should also be adjusted immediately for everyone.
- If there's some averaging done, it should treat battles equally and average across N battles, not N days

playing to improve rating "now" shouldn't be less efficient than a week from now or a month from now
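The difference between the two kinds of averaging can be shown with a small sketch. How the site actually weights days is not known here; the function names and the sample ratings below are made up purely to show the effect.

```python
from collections import defaultdict


def average_per_battle(samples, n):
    """Mean rating over the last n battles, each battle weighted equally."""
    recent = [rating for _, rating in samples[-n:]]
    return sum(recent) / len(recent)


def average_per_day(samples):
    """Mean of daily means: a day with 1 battle weighs as much as a day with 30."""
    by_day = defaultdict(list)
    for day, rating in samples:
        by_day[day].append(rating)
    daily_means = [sum(v) / len(v) for v in by_day.values()]
    return sum(daily_means) / len(daily_means)


# Made-up samples as (day number, rating after the battle):
# one battle on day 1, three battles on day 2.
samples = [(1, 2600.0)] + [(2, 2620.0)] * 3

print(average_per_battle(samples, n=4))
print(average_per_day(samples))
```

With per-day averaging the single old battle counts for half of the result, while with per-battle averaging it counts for a quarter, so a busy day of play catches up faster.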

+1 / -0
