
math bork


8 years ago
Elo win chance between two values is
1 : (2^(difference/120))
The exact constants don't matter; the point is that it's exponential.

So a guy with:
1500 has 1:2 win chance against 1620
1380 has 1:2 win chance against 1500
1380 has 1:4 win chance against 1620

So we could assign the guys linear skill values:
1380 -> 1
1500 -> 2
1620 -> 4

In a 2v2 where the teams are
1500, 1500 VS 1380, 1620

the linear Elo average is equal, but since Elo is exponential we should average the linearized values instead, which gives us an unequal 2+2 vs 1+4.
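
For concreteness, here is a minimal Python sketch of that comparison. The 120 divisor is the constant from the formula above; the 1380 reference point is just the choice that maps 1380 to 1 (both are illustrative, not the actual lobby code):

```python
# Minimal sketch (not Zero-K code): linearize Elo as in the example above,
# with 1380 chosen as the reference so that 1380 -> 1, 1500 -> 2, 1620 -> 4.
def strength(elo, reference=1380, scale=120):
    return 2 ** ((elo - reference) / scale)

team_a = [1500, 1500]
team_b = [1380, 1620]
print(sum(strength(e) for e in team_a))  # 2 + 2 = 4
print(sum(strength(e) for e in team_b))  # 1 + 4 = 5
```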

So our team Elo is bork and should first convert to some sort of log function before the linear average. I'd like to implement that (someday), but I want math people ( DErankBrackman GBrankTheEloIsALie etc. ) to comment, because maybe the line of thought is wrong or there's some caveat.
+1 / -0
quote:

So our team Elo is bork and should first convert to some sort of log function before the linear average.


Not exactly sure what you mean by this, but if you want X + X vs (X - 120) + (X + 120) to result in 4:5, we'd just have to use your formula:
playerstrength=2^(ELO/120)

This is the same as comparing players to a theoretical 1 elo player. The linear averages would then be off by this factor of 1.25.
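
As a quick numeric check (the names are mine, not an existing API), the 4:5 ratio and the 1.25 factor fall out of that formula for any X:

```python
# Quick check: X + X vs (X - 120) + (X + 120) is always a 4:5 strength split
# under playerstrength = 2 ** (elo / 120), independent of X.
def playerstrength(elo):
    return 2 ** (elo / 120)

X = 1500  # any value gives the same ratio
even = playerstrength(X) + playerstrength(X)
uneven = playerstrength(X - 120) + playerstrength(X + 120)
print(uneven / even)  # 1.25
```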

One thing current balance theoretically allows is a <0 strength player. The exponential function wouldn't allow for this.

Forum broken, here's a closing bracket to fix it: >
+0 / -0
People with negative Elo aren't really any different from 1100 people, in that all they do is lower the team average. For all practical purposes only the difference is used, also under the current system.

Using 2^(elo/120) sounds ideal from a math standpoint but I'm somewhat worried about overflows and such.

There are also social issues. If this system gets used then everyone has a wrong value because the systems are self-reinforcing. In the long run it makes the ratings more accurate but for some time, before the system adjusts people's ratings, the games would be unbalanced and crappy.
+0 / -0
The values would be between 2^10 (ELO 1200) and 2^20 (ELO 2400). Floating points shouldn't have any problem with this. Make sure to add them up in ascending order, to prevent floating point inaccuracies.
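
For illustration, a small sketch of that suggestion (the elos are made up; in Python, math.fsum would also sidestep the ordering question entirely):

```python
import math

def playerstrength(elo):
    return 2 ** (elo / 120)  # 2**10 at elo 1200, 2**20 at elo 2400

strengths = [playerstrength(e) for e in (1200, 1450, 1980, 2400)]

naive = sum(strengths)            # summation in arbitrary order
ordered = sum(sorted(strengths))  # smallest first, limiting rounding error
exact = math.fsum(strengths)      # most accurate float summation
print(naive, ordered, exact)
```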

A higher adjustment rate for the first few games after the change should help with the recalibration.
+0 / -0


8 years ago
This comes up over and over.

We have a database of all games played, all players in each game, and all results of each game. If you think that the current balancing method does not result in balanced games, or if you think you have a better balancing method, you can demonstrate the problems with the current system and/or the benefits of a new system by running tests against the historical data.

Here are two threads that discuss these kinds of issues and how you can use the historical data to do testing and validation:

http://zero-k.info/Forum/Thread/20208

http://zero-k.info/Forum/Thread/7402

+0 / -0

8 years ago
The problem is that you can't compare it with that data since people's skill isn't constant, it changes over time.
Moreover, the balancing throughout that time was based on the flawed algorithm, which means that your sampling of battles is already heavily biased.
+0 / -0
ATrankhokomoko : If it's so biased, then your algorithm/system shouldn't have a problem with predicting the winner better than the current one.

Given those games, you're also supposed to "start over" with your own elo system. It might be a bit off at the start, but the battles were predicted by the current system while player skill was also varying, so that's not an excuse.

quote:
So we could assign the guys linear skill values:
1380 -> 1
1500 -> 2
1620 -> 4

In a 2v2 where the teams are
1500, 1500 VS 1380, 1620

the linear Elo average is equal, but since Elo is exponential we should average the linearized values instead, which gives us an unequal 2+2 vs 1+4.

I don't think this makes sense. You are essentially adding fractions by adding the numerators and denominators separately. It may make sense here on some level, but transforming the elo value into another measure gives no reason why team elo should be linear in that measure instead of the other one.
+0 / -0
There's two ways you can go about this.

Way One: You can assume that an individual's rating at any point in time - as calculated by the Zero-K infrastructure - is reasonably correlated with their skill at the game and the effectiveness of their contributions to their team's victory, i.e. you can assume that individual ratings are basically correct. You then suppose that the problem is the BALANCING method for team games, not the rating-determination method for individuals.

If that's your approach, then you can come up with your own algorithm which takes as its input the list of individuals who are playing a game and their respective Elo scores as of the time they play the game, and which produces an expected probability of one side winning. You can then do this for every game in the database. Then you bin the games and your resulting predictions into buckets of, say, 2% or 3% or 5% width. Then you see how well your predictions for each bucket match the actual results for that bucket.

I.E. you end up with, say, 1,000 games where one side is predicted (by your algorithm) to win at 55% to 57%. If that side turns out to have won 560 of those thousand games, then your algorithm is pretty good. If that side turns out to have won 400 of those thousand games, then your algorithm is pretty bad.
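
A rough sketch of that bucketing test in Python; `games` and `predict_win_prob` are placeholders for the historical data rows and whatever prediction algorithm is being evaluated (neither is an existing Zero-K API):

```python
from collections import defaultdict

def evaluate(games, predict_win_prob, bucket_width=0.02):
    """Bucket games by the predicted win probability of side A and compare
    each bucket's prediction with the observed win rate.

    games: iterable of (side_a_elos, side_b_elos, side_a_won) tuples.
    predict_win_prob: callable returning P(side A wins) in [0, 1].
    """
    buckets = defaultdict(lambda: [0, 0])  # bucket index -> [wins, games]
    for side_a, side_b, side_a_won in games:
        p = predict_win_prob(side_a, side_b)
        b = int(p / bucket_width)
        buckets[b][0] += 1 if side_a_won else 0
        buckets[b][1] += 1

    for b in sorted(buckets):
        wins, total = buckets[b]
        lo = b * bucket_width
        print(f"predicted {lo:.2f}-{lo + bucket_width:.2f}: "
              f"actual {wins / total:.3f} over {total} games")
```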

Way Two: You can assume that not only is the balancing method broken, but the RATING method is broken as well. Now your task is more complicated, but still possible.

You'll have to use the historical data to construct a new rating for each individual for each game they play, as of the time that they play that game, using your new rating method.

THEN you can test your new balancing method as above, using the new ratings that your new rating method has produced.
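
A minimal sketch of that replay step, assuming a naive team-average Elo update (this is illustrative only, not the rating code Zero-K actually uses):

```python
def replay_ratings(games, k=32, scale=400, initial=1500):
    """Recompute ratings by replaying games oldest-first.

    games: iterable of (team_a_ids, team_b_ids, team_a_won) tuples in
    chronological order. Ratings-as-of-game-time for the balance test
    above can be recorded inside the loop.
    """
    ratings = {}
    for team_a, team_b, team_a_won in games:
        for pid in (*team_a, *team_b):
            ratings.setdefault(pid, initial)
        avg_a = sum(ratings[p] for p in team_a) / len(team_a)
        avg_b = sum(ratings[p] for p in team_b) / len(team_b)
        expected_a = 1 / (1 + 10 ** ((avg_b - avg_a) / scale))
        delta = k * ((1 if team_a_won else 0) - expected_a)
        for p in team_a:
            ratings[p] += delta
        for p in team_b:
            ratings[p] -= delta
    return ratings
```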

---------

You should probably also study the literature before you undertake this journey. If you are not already intimately familiar with the details behind FIDE's Elo calculations, Glicko, and TrueSkill, then you are not yet ready to be offering useful suggestions here.
+0 / -0
USrankCrazyEddie, the balance and the rating system are not independent. In particular, the rating will "adapt" to the balance (and team compositions etc.), so method 1 is very unlikely to work out.

In the end, elo is just a model. It cannot reasonably predict the outcome of games that are influenced by so many factors unrelated to player elo: map choice, time of day, game balance, player morale... You name it. Any other model will suffer from the same. The model will more or less adapt to these things (i.e. a player that is playing more during his "bad" time of day will lose elo).

Enter team compositions: The balancer is, in the end, just another one of those factors. The elo model doesn't know about it, and it likely can't model it properly, but it will be influenced by it one way or another. This is why using the current elo with another prediction algorithm (i.e. balancer) will not be fair.

quote:
There are also social issues. If this system gets used then everyone has a wrong value because the systems are self-reinforcing

You could just run both systems in parallel, with the old elo and balance used until the new system has established reasonable values, based on how it would have predicted the games versus who actually ends up winning.


PS: I'm waiting for IvoryKing to show up and tout his method. He does it in every single one of those threads (hope he's not breaking the tradition).
+0 / -0
quote:
You name it. Any other model will suffer from the same

[Spoiler]
+1 / -0

8 years ago
Sure, you start with times of all previous games, add the IP address and... Oh wait, you don't even have remotely enough data to train your NN with :)
+1 / -0

8 years ago
I found the historic data files so I guess I'll present my results in the next week.

quote:
I don't think this makes sense. You are essentially adding fractions by adding the numerators and denominators separately. It may make sense here on some level, but transforming the elo value into another measure gives no reason why team elo should be linear in that measure instead of the other one.

I am not adding fractions. I assign values based on 2^(elo/120), and each win-chance ratio is individually and separately satisfied. The reason this measure should be averaged linearly, and not Elo itself, is that Elo uses an exponent to find a person's value.
+0 / -0


8 years ago
Seriously, guys. Go read up on TrueSkill. Then come back.
+0 / -0

8 years ago
TrueSkill would solve the problem, sure. But there must have been some design reason why we chose Elo instead of TrueSkill, and with that assumption in mind I'm interested in fixing our use of Elo.
+0 / -0


8 years ago
Only Licho knows for sure, but my guess is:

a) TrueSkill was not publicly documented at the time,
b) Elo was, and was fairly well understood by many people,
c) What we're doing now for teams is (relatively speaking) simple and yet good enough
d) What we're doing now for teams isn't radically different from TrueSkill anyway
+0 / -0
8 years ago
This is a really interesting idea!

Note that what you call 120 is actually 400/log_2(10), and thus 2^(1/120) is actually 10^(1/400) =: B.

quote:
The values would be between 2^10 (ELO 1200) and 2^20 (ELO 2400). Floating points shouldn't have any problem with this. Make sure to add them up in ascending order, to prevent floating point inaccuracies.
Yes. And the scale can be multiplied by any factor, so that we can get numbers more like PLrankAdminSprung proposed in his 1st post. [Spoiler]I will use a general factor B^-eloShift to get playerstrength=1 for elo=eloShift. eloShift=0 is easiest to calculate, eloShift=1500 is most elegant (because it compensates for the non-elegance of 1500), and eloShift=1380 is used in PLrankAdminSprung 's 1st example. Thus
playerstrength = B^(elo-eloShift).
teamstrength is the average of the team's players' playerstrength.
Actually the only thing that changes is that team elo is no longer the normal average, but
team elo = log_B(teamstrength)+eloShift.
My observation with concrete values, in comparison to the current system:
For teams with no elo deviation it gives the same result. The higher a team's elo deviation, the higher its teamstrength.
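
A small numeric check of that observation, with B and eloShift as defined above and the "most elegant" eloShift=1500 picked here:

```python
import math

B = 10 ** (1 / 400)   # the base B defined above
ELO_SHIFT = 1500      # the "most elegant" choice of eloShift

def playerstrength(elo):
    return B ** (elo - ELO_SHIFT)

def team_elo(elos):
    teamstrength = sum(playerstrength(e) for e in elos) / len(elos)
    return math.log(teamstrength, B) + ELO_SHIFT

print(team_elo([1500, 1500]))  # 1500.0: no deviation, same as the plain average
print(team_elo([1380, 1620]))  # ~1538.5: higher deviation, higher team elo
```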

I have proven that using this new kind of team elo in the current elo calculation for 2 teams of any size results in win probabilities proportional to teamstrengths. Because I have recently calculated the correct solution for ZK's team FFA elo (currently a wrong one is used), I wondered if teamstrengths are still proportional to win probabilities calculated by inserting this new team elo into the FFA solution. This would show that "playerstrength" is really fundamental. Unfortunately it didn't hold true, even though both alternatives seem valid.

GBrankTheEloIsALie indicated that the transformation could be arbitrary. So let's assume a general function f with inverse g, where
playerstrength = g(elo)
teamstrength = average of team's playerstrength
team elo = f(teamstrength).
Then we have the possibility to use either team elo for a probability calculation as usual, or probabilities proportional to teamstrength. I have shown that the equivalence of both, even for team FFA, is equivalent to
2/(N(N-1)) * sum from k=2 to N of 1/(1 + B^(f(teamstrength_k) - f(teamstrength_1))) = teamstrength_1 / (sum from k=1 to N of teamstrength_k).
What is f? Does such an f even exist?
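
In case it helps with experimenting, here is a small sketch that just evaluates both sides of that condition for a candidate f and a list of teamstrengths (B as defined above; the example f is the log_B team elo from earlier in this post, which, consistent with the observation above, does not satisfy the condition for these sample values):

```python
import math

B = 10 ** (1 / 400)

def lhs(strengths, f):
    """Left-hand side of the condition above, for team 1 among N teams."""
    n = len(strengths)
    s1 = strengths[0]
    total = sum(1 / (1 + B ** (f(sk) - f(s1))) for sk in strengths[1:])
    return 2 / (n * (n - 1)) * total

def rhs(strengths):
    """Right-hand side: team 1's share of the summed teamstrengths."""
    return strengths[0] / sum(strengths)

# Candidate f: the log_B team elo used earlier (the eloShift term cancels).
f_log = lambda s: math.log(s, B)

strengths = [1.0, 1.25, 2.0]
print(lhs(strengths, f_log), rhs(strengths))  # the two sides differ here
```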
[Spoiler]
+0 / -0