Ladder thought

40 posts, 1364 views
Page of 2 (40 records)
Last night after a few games with USrankRyMarq I made a passing comment that SErankGodde and DErankManu12's positions on the ladder were protected in some way, thanks to WHR inflation. Reflecting on my words and the games I had with USrankRyMarq, I figured out the depths of what I meant.

In my 4 games with USrankRyMarq, I won 3 and he won 1. This netted me a -50 Current rating. From a WHR perspective, it's not worth me playing against this guy. I'll never continue to progress up the ladder. This is in part due to my own WHR being inflated. But then I thought more about the situation. Because their WHR is so high, SErankGodde / DErankManu12 likely don't even often get matched up with players in the WHR bracket that can substantially diminish theirs. Do they have an equivalent of USrankRyMarq? Is that me, randy or izi? Has WHR inflation made it too much of a risk for them to play against us, thus leaving us forced to take matches from potential WHR assassins and inevitably never take their protected spot?

I watched my CR go up past 3300 this week. It was higher than SErankGodde's, especially after randy and I gave him a night of manhandling. But in the back of my mind, I knew it didn't mean anything, because my ladder rating was barely moving an inch and SErankGodde's ladder rating and CR were barely moving down while facing us. Though, that's a separate observation.

I wonder if CHrankAdminDeinFreund has anything to say on the nature of the issue. The greater the WHR inflation has become, the more it has undermined the matchmaking algorithm, preventing the #1/#2 spots (roughly 200 LR higher than #3) from fighting incredibly skilled up-and-comers from the top 20 who provide great risk to me, but none to them through omission.

Edit: Another way of wording it is that people in and around my position (randy got beaten by rymarq recently too) are prone to matches against very skilled players who haven't yet "benefitted" from inflation, and our ratings get toasted when this happens, while the top positions are not exposed to this "inverse opportunity" nearly as much.

Inb4 "just don't lose to these players". Unrealistic, people get better before their WHR catches up and with all the riot-raiders being questionably OP these days, it's really easy to get taken by surprise.
+0 / -0
Maybe this can be traced back to the non-transitivity of the win chance calculation: If players A and B play against each other many times, their win chance against each other will determine their rating difference. The same goes for players B vs C. From the resulting rating difference between A and C, the system calculates the win chance between A and C, which is not necessarily correct. To see the more general effect, we can replace A by the category of low rated players, B by medium rated players and C by highly rated players. The effect may be increased by the matchmaker enforcing a maximum rating difference and by the matchmaker using a ladder rating that is calculated from the actual rating in strange ways.

Simple ways to reduce the problem slightly are to increase the rating difference that the matchmaker allows, and either to reduce the deviation between ladder rating and actual rating or to let the matchmaker use the actual rating. In a more complicated way, the problem can be solved completely by using a neural network of multiple logistic function neurons trained on game data instead of the current single logistic function to calculate win chances.
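
To make the A/B/C argument concrete, here is a minimal sketch. It assumes the standard Elo convention (base 10, scale 400), which may not match the exact constants Zero-K uses, and the 75% win rates are made-up numbers for illustration only.

```python
import math

ELO_SCALE = 400.0  # assumption: standard Elo scale, not confirmed for Zero-K's WHR


def win_chance(rating_diff):
    """Single logistic function: chance that the higher-rated side wins."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / ELO_SCALE))


def rating_diff_from_win_chance(p):
    """Inverse of win_chance: the rating gap that a long-run win rate p implies."""
    return ELO_SCALE * math.log10(p / (1.0 - p))


# Hypothetical observed win rates from many games.
p_ab = 0.75  # A beats B 75% of the time
p_bc = 0.75  # B beats C 75% of the time

d_ab = rating_diff_from_win_chance(p_ab)
d_bc = rating_diff_from_win_chance(p_bc)

# The system never needs to see A play C: it just adds the two gaps.
p_ac_implied = win_chance(d_ab + d_bc)
print(f"implied A vs C win chance: {p_ac_implied:.2f}")
```

With a single logistic function the odds simply multiply, so the A vs C prediction is fixed by the A vs B and B vs C results. If the real A vs C win rate is anything else, the ratings cannot fit all three pairings at once.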
+1 / -0

3 years ago
its almost like the old ladder- using elo, split into 1v1 and teams- was superior in every way
+0 / -0

3 years ago
DErankBrackman brought up a good point here: The logistic distribution is not a good assumption for player performance with large skill differences. This is a problem that was already encountered in chess as well. In simpler terms: Elo doesn't correctly predict outcomes when there is a large rating difference between players. (Drone: This is exactly the same for Elo and WHR) By artificially limiting the rating difference in the matchmaker we can limit WHR to a smaller part of the logistic distribution, avoiding the erroneous tails. The downside is that it causes bigger errors when players with a large rating difference do play each other. The proper fix would be to adjust the expected win chance calculation, for example using Brackman's suggestion.

To give a more extreme example from recent discussions, you can have a look at FFA games. If the players in the game were always very close to each other's rating, good players could rise very high. If the players were randomly sampled in each game, the lesser rated players could always team up and make it nearly impossible for the highly skilled player to reach a high win percentage. The fix would be a predictor that incorporates that ability of players to team up and thus assigns lower rated players a higher chance to win even against strong opponents.

An interesting experiment to make in this direction would be to simply plot the win chance as a function of elo difference and see where it deviates from the logistic distribution. NZrankesainane's tool nearly does that already.
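
A rough sketch of how such a comparison could be tabulated before plotting. The input format (pairs of rating difference and whether the higher-rated side won), the bin width and the Elo scale of 400 are all assumptions for illustration, not how any existing tool does it.

```python
from collections import defaultdict

ELO_SCALE = 400.0  # assumption: standard Elo scale


def logistic_win_chance(rating_diff):
    """Win chance of the higher-rated side under the usual logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / ELO_SCALE))


def calibration_table(games, bin_width=100):
    """Bin games by rating difference and compare observed vs predicted win rate.

    games: iterable of (rating_diff, favourite_won), with rating_diff >= 0
    measured from the higher-rated side's point of view.
    Returns rows of (bin_start, game_count, observed_rate, predicted_rate),
    where predicted_rate is evaluated at the middle of the bin.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for rating_diff, favourite_won in games:
        bin_start = int(rating_diff // bin_width) * bin_width
        totals[bin_start] += 1
        if favourite_won:
            wins[bin_start] += 1
    rows = []
    for bin_start in sorted(totals):
        observed = wins[bin_start] / totals[bin_start]
        predicted = logistic_win_chance(bin_start + bin_width / 2)
        rows.append((bin_start, totals[bin_start], observed, predicted))
    return rows
```

Bins where the observed rate sits consistently below the predicted one at large differences would be the "erroneous tails". Bins with few games will be noisy, so the game count is kept in each row.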
+1 / -0
3 years ago
This plot would be interesting to see. If there are deviations, then my guess would be that the values are a bit closer to 50% for high absolute values of skill difference, but only the data can tell that. If so, the current win chance calculation can be replaced by a linear combination of about 2 logistic and/or Cauchy–Lorentz cumulative distributions with different broadening, which is equivalent to a neural network with 2 neurons in 1 layer. The number of free parameters can be reduced to 2*number of neurons minus 2 by making the coefficient sum = 1 and normalizing the broadening.

There are different levels with respect to how dynamic such a fit function or neural network could be: The parameters can be calculated once. They can be updated regularly. They can be included in the WHR calculation like players' ratings. With a good initial guess, no more Newton steps would be needed, but the single steps would become more expensive. I think it is not necessary to go further, like making different parameters for different players, adding many neurons or layers, or making the neuron structure dynamic.
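
As a minimal sketch of the 2-neuron idea: one logistic and one Cauchy–Lorentz cumulative distribution, with coefficients summing to 1 and the logistic broadening held fixed, which leaves 2*2 - 2 = 2 free parameters. The function names, the fixed logistic scale (chosen here to match the standard Elo curve) and the example values are assumptions, not fitted results.

```python
import math

# Assumption: fix the logistic broadening so it matches the usual Elo curve
# (10 ** (d / 400) == exp(d / LOGISTIC_SCALE)). This is the "normalized" scale.
LOGISTIC_SCALE = 400.0 / math.log(10.0)


def logistic_cdf(x, scale):
    return 1.0 / (1.0 + math.exp(-x / scale))


def cauchy_cdf(x, scale):
    # Cauchy-Lorentz cumulative distribution: heavier tails than the logistic.
    return 0.5 + math.atan(x / scale) / math.pi


def mixture_win_chance(rating_diff, weight, cauchy_scale):
    """Win chance as a weighted sum of both distributions.

    Free parameters: weight (share of the logistic part, between 0 and 1)
    and cauchy_scale (broadening of the Cauchy part, in rating points).
    """
    return (weight * logistic_cdf(rating_diff, LOGISTIC_SCALE)
            + (1.0 - weight) * cauchy_cdf(rating_diff, cauchy_scale))
```

Both components are symmetric around 0, so the mixture still gives 50% for equal ratings and p(d) + p(-d) = 1. With weight below 1, predictions for large differences move closer to 50% than the pure logistic, which is the direction guessed at above. The actual values would have to come from a fit to game data.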
+0 / -0

3 years ago
At least a single scale parameter per player would be interesting though, consistency is very different between different players. I'm wondering how to fit this though, maybe it could be held constant during the general optimization and then optimized per player.
+0 / -0

3 years ago
When Godde was rusty, he went from 3100 Elo to 3400 without any problems and beat all the players.
+0 / -0
3 years ago
It would be interesting to plot "best calculation, including current post-battle information, for player ratings in the battle" versus "best calculation, using only information available at the time that the battle started, for player ratings in the battle".

This would probably involve adding a new API to ZKI for bulk queries, and a slower timer to zkstats, since fetching >19,970*2 data points every two hours might be a bit much. For the front end, I can imagine this being a separate opt-in feature flag, say #fullwhr.
+0 / -0

3 years ago
NZrankesainane could you export the data of player rating difference vs victory (bool) you have now? I could make an API for you if you want. It costs absolutely nothing to get WHR data points, you can have 20k in a single request if you can fit the battle IDs.
+0 / -0
3 years ago
CHrankAdminDeinFreund

$ curl -s https://zkstats.antihype.space/data/live.json | js -e 'JSON.parse(require("fs").readFileSync(0).toString()).forEach(d=>!d.skip&&console.log(d.winner_elo_lead))'

+1 / -0


3 years ago
CHrankAdminDeinFreund and DErankBrackman how about a simple solution rather than an increasingly complex system? For example, we could just declare that players in the top 20 are always able to match against each other.
+0 / -0
There is a weakness in WHR here, which can't be easily fixed by just matching players differently. The difference a fix would make would be rather minute though, probably on the order of 1-5% win chance. No fix would let USrankDregs catch up to SErankGodde with the current standings.

Regarding matching everyone in the top 20: Personally, I don't want to be matched against players where I have a <10% win chance, even if @Dregs thinks that I have a higher win chance (which I haven't experienced). So now I'm supposed to be pitted against much stronger enemies just so that they might make a mistake and lose all their rating? I find the rating gap from me to Randy and Godde to be very real, and would like to have the choice whether I feel lucky today. Then I play against them in a custom host.
+1 / -0

3 years ago
NZrankesainane
curl -X POST -H "Content-Type: application/json" --data '{"battleIds": [953624, 953536]}' http://test.zero-k.info/api/whr/battles


Coming to live soon.
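
For anyone who would rather script this than use curl, a minimal Python sketch of the same request using only the standard library. The URL and the `battleIds` field are taken from the curl line above. The helper name is made up, and the response format isn't described in this thread, so the sketch only builds the request.

```python
import json
import urllib.request

# Test endpoint quoted in the curl example above; the live URL may differ.
WHR_BATTLES_URL = "http://test.zero-k.info/api/whr/battles"


def build_whr_request(battle_ids, url=WHR_BATTLES_URL):
    """Build (but do not send) a POST request asking for WHR data of the given battles."""
    body = json.dumps({"battleIds": list(battle_ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_whr_request([953624, 953536])
```

Sending it would be `urllib.request.urlopen(request)`, after which the response body can be read and decoded as needed.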
+1 / -0


3 years ago
If you can't reasonably fix the historical Fuckruement of the system, add another motivational vote for the new season / league refresh idea.
+2 / -0

3 years ago
If SErankGodde agrees, you two can have a rating reset and have at it. Then we can check in a month just how bad WHR's prediction was.
+1 / -0
Very funny. Although you know how unrealistic that is. For the record, this isn't about whose position is where on the ladder; I'm pretty sure SErankGodde has earned his place. But I do think the mechanics that I've mentioned make our positions on the ladder prone to different experiences and highlight a minor issue at least. On one side of the coin, you could say that matching with ANYONE is a bad investment for DErankManu12, and less so for SErankGodde, and that the league point system is a very healthy idea motivating them to make their positions more vulnerable/assailable.
+1 / -0


3 years ago
quote:
From a WHR perspective, it's not worth me playing against this guy. I'll never continue to progress up the ladder.


quote:
Because their WHR is so high, Godde / Manu12 likely don't even often get matched up with players in the WHR bracket that can substantially diminish theirs. Do they have an equivalent of RyMarq? Is that me, randy or izi? Has WHR inflation made it too much of a risk for them to play against us, thus leaving us forced to take matches from potential WHR assassins and inevitably never take their protected spot?


Rating is a measurement, not a competition. It's not a question of whether it's fair for one player to lose so many or so few rating points given the results of a match, it's only a question of whether it's accurate.

The WHR ranking list probably shouldn't be called a "ladder", since that normally connotes a competitive arrangement akin to an ongoing tournament, where one's place on the ladder can be modestly out of proportion to one's skill, and where placement is expected to fluctuate on a fairly regular basis simply due to variance. If we established an actual competitive ladder and de-emphasized the WHR ratings (I object to concealing them entirely, but I see no reason to object to making them non-obvious and not prominently displayed) then maybe we'll get fewer people obsessing over something they can't easily control and let them direct their competitive urges towards something that's much easier to grasp, i.e. their position on a tournament ladder.

WHR should still be used for matchmaking, of course, perhaps modified or informed by CHrankAdminDeinFreund's and DErankBrackman's ruminations on the limits of its accuracy (which, I suspect, are generally negligible from a practical standpoint).
+2 / -0


3 years ago
quote:
Rating is a measurement, not a competition. It's not a question of whether it's fair for one player to lose so many or so few rating points given the results of a match, it's only a question of whether it's accurate.

With this view you miss the reality of what some not-insignificant portion of people want from a rating system, and how they'll interact with one even if it doesn't seem fit for the purpose.
+2 / -0
quote:
Rating is a measurement, not a competition.

Isn't it a measurement of how far in the competition you got?

+0 / -0
quote:
With this view you miss the reality of what some not-insignificant portion of people want from a rating system, and how they'll interact with one even if it doesn't seem fit for the purpose.
I think the second paragraph of that post explains quite clearly how the reality can be taken into account (make one more "ladder", leave WHR rating available, but not as visible Edit: and to be clear: not hidden from players, would hate that!).

The view (that WHR targets accuracy) seems correct considering how the WHR rating is used: to improve matchmaking. Also, there are probably some people who are interested in the system as it is (one that encourages accuracy over perceived fairness). I think the best approach is just to accept that there can't be one rating/ladder that makes everybody happy, so just show multiple. I prefer the accurate one, but I would not mind being low/incorrect on some other "ladder" that encourages regular play...
+0 / -0