math bork 2nd try


Brackman

9 years ago
(edited 9 years ago)

Sorry Sprung that I annihilated your thread. Your idea is really interesting!

Note that what you call 120 is actually 400/log_2(10) ≈ 120.4, and thus 2^(1/120) is actually 10^(1/400) =: B.

quote:
The values would be between 2^10 (ELO 1200) and 2^20 (ELO 2400). Floating points shouldn't have any problem with this. Make sure to add them up in ascending order, to prevent floating point inaccuracies.

Yes. And the scale can be multiplied by any factor, so that we can get numbers rather like Sprung proposed in his 1st post. [Spoiler] I will use a general factor B^-eloShift to get playerstrength=1 for elo=eloShift. eloShift=0 is easiest to calculate, eloShift=1500 is most elegant (because it compensates for the non-elegance of 1500), eloShift=1380 is used in Sprung's 1st example. Thus

 playerstrength = B^(elo-eloShift).

teamstrength is the average of the team's players' playerstrength. Actually the only thing that changes is that team elo is no longer the normal average, but

 team elo = log_B(teamstrength)+eloShift.

My observation with concrete values in comparison to the current system: For teams with no elo deviation it results in the same. The higher a team's elo deviation, the better its teamstrength.
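The definitions above can be sketched in a few lines. This is only an illustration, assuming B = 10^(1/400) and eloShift = 1500; the function names and the example ratings are made up here and are not taken from the ZK code.

```python
import math

B = 10 ** (1 / 400)   # base from the post: 10^(1/400)
ELO_SHIFT = 1500      # assumed choice; any shift gives the same team elo


def player_strength(elo, elo_shift=ELO_SHIFT):
    """playerstrength = B^(elo - eloShift)."""
    return B ** (elo - elo_shift)


def team_strength(elos, elo_shift=ELO_SHIFT):
    """teamstrength = average of the players' playerstrengths."""
    return sum(player_strength(e, elo_shift) for e in elos) / len(elos)


def team_elo(elos, elo_shift=ELO_SHIFT):
    """team elo = log_B(teamstrength) + eloShift."""
    return math.log(team_strength(elos, elo_shift), B) + elo_shift


flat = team_elo([1500, 1500])      # no deviation: equals the plain average
spread = team_elo([1300, 1700])    # same plain average, higher deviation
print(round(flat, 1), round(spread, 1))
```

Both hypothetical teams have a plain average of 1500, but the team with the larger spread ends up with the higher team elo, which matches the observation above.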

I have proven that using this new kind of team elo in the current elo calculation for 2 teams of any size results in win probabilities proportional to teamstrengths. Because I have recently calculated the correct solution for ZK's team FFA elo (currently a wrong one is used), I wondered if teamstrengths are still proportional to win probabilities that were calculated by inserting this new team elo in the FFA solution. This would show that "playerstrength" is really fundamental. Unfortunately it didn't hold true, even though both alternatives seem valid.
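For two teams the proportionality can be checked directly. Write T_1, T_2 for the teamstrengths and E_k = log_B(T_k) + eloShift for the new team elos, and assume the usual Elo expectation in the form 1/(1+B^(E_2-E_1)) (the symbols T_k, E_k and P_1 are introduced here only for this sketch):

```latex
P_1 = \frac{1}{1 + B^{E_2 - E_1}}
    = \frac{1}{1 + B^{\log_B T_2 - \log_B T_1}}
    = \frac{1}{1 + T_2 / T_1}
    = \frac{T_1}{T_1 + T_2}
```

The eloShift cancels in the difference E_2 - E_1, so the result does not depend on it.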

TheEloIsALie indicated that the transformation could be arbitrary. So let's assume a general function f with inverse g, where

 playerstrength = g(elo)
 teamstrength = average of team's playerstrength
 team elo = f(teamstrength).

[Spoiler]
Then we have the possibility to use either team elo for a probability calculation as usual or probabilities proportional to teamstrength. I have shown that the equivalence of both even for team FFA with N teams is equivalent to

 2/(N(N-1)) sum from k=2 to N (1/(1+B^(f(teamstrength_k) - f(teamstrength_1)))) = teamstrength_1 / (sum from k=1 to N (teamstrength_k)).

What is f? Does such an f even exist?
[Spoiler]

+2 / -0

Qrow

9 years ago

I'll pretend I understand all of that

+4 / -0

Brackman

9 years ago

Well, you might need the context of the original thread, which you can read here :D.

+0 / -0

Brackman

9 years ago
(edited 9 years ago)

In the end we have the following types of systems (apart from probability system, TrueSkill and ANN):

Team Elo System given by a function g to transform to playerstrength and h for the team size dependency:

 playerstrength = g(elo)
 teamstrength = (sum of team's playerstrength)*h(n)
 team elo = f(teamstrength)

n is the number of players in a team (can be different for any team) and N the number of teams. Win probabilities are calculated with my team FFA generalization of the elo system.

The current system uses g(elo)=elo and h(n)=1/n (but a wrong FFA calculation). Systems that I developed earlier use h(n)=1/sqrt(n) or h(n)=0.5+0.5^n.

Sprung's system uses g(elo)=B^(elo-eloShift) and h(n)=1/n.

Teamstrength System given by the same but without the calculation of team elo. Win probabilities are distributed proportional to teamstrengths. Shifts in the elo scale should not change the result. From this invariance g(elo)=B^(elo-eloShift) can already be concluded and from this follows the equivalence to the team elo system for N=2 but not for more teams. This is still true for any h.

So g(elo)=B^(elo-eloShift) with h(n)=1/sqrt(n) should be an interesting system. I don't know which of both probability calculations should be used for N>2.
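A rough sketch of the Teamstrength System with pluggable g and h, to illustrate the shift invariance claimed above. The helper names and the 2v3 ratings are hypothetical; only the formulas come from the post.

```python
import math

B = 10 ** (1 / 400)


def team_strength(elos, g, h):
    """teamstrength = (sum of g(elo) over the team) * h(n)."""
    return sum(g(e) for e in elos) * h(len(elos))


def proportional_win_probs(teams, g, h):
    """Teamstrength System: win probabilities proportional to teamstrength."""
    strengths = [team_strength(t, g, h) for t in teams]
    total = sum(strengths)
    return [s / total for s in strengths]


def exp_g(elo_shift):
    """g(elo) = B^(elo - eloShift)."""
    return lambda elo: B ** (elo - elo_shift)


def h_average(n):
    """h(n) = 1/n, as in the current system."""
    return 1 / n


def h_sqrt(n):
    """h(n) = 1/sqrt(n), one of the alternatives mentioned in the post."""
    return 1 / math.sqrt(n)


teams = [[1400, 1700], [1500, 1550, 1600]]  # hypothetical 2v3
p_a = proportional_win_probs(teams, exp_g(0), h_sqrt)
p_b = proportional_win_probs(teams, exp_g(1500), h_sqrt)
print(p_a, p_b)  # the two lists agree up to rounding: the shift cancels
```

Swapping `h_sqrt` for `h_average` changes the numbers for the uneven teams but not the invariance, since the eloShift only contributes a common factor to every teamstrength.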

+0 / -0

Sprung

9 years ago
(edited 9 years ago)

quote:
I wondered if teamstrengths are still proportional to win probabilities that were calculated by inserting this new team elo in the FFA solution. This would show that "playerstrength" is really fundamental. Unfortunately it didn't hold true
(...)
FFA solution with team elo from teamstrength: 1: 31.9%, 2: 22.6%, 3: 45.5%
proportional to teamstrength: 1: 29.0%, 2: 19.5%, 3: 51.5%

Here's a line of reasoning that contradicts the above, but I'm not sure where the error is:
1) player strength can be summed linearly.
2) if two teams cooperate perfectly, they might as well be treated as one team with summed strength (this is how individual players are summed in a team).
3) under the two-team system, team 3 defeats a combined 1+2 team with 51.5% chance, since with two teams the win chance is proportional to strength.
4) if teams 1 and 2 don't cooperate perfectly, team 3's win chance gets higher, ergo the 51.5% is a minimum (with perfect cooperation between 1 and 2)
5) the Elo solution gives less than 51.5%, so Elo is wrong.
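The two rows of percentages in the quote can be reproduced numerically. This is a sketch under two assumptions: the teamstrengths are in the ratio 29.0 : 19.5 : 51.5 (read off the rounded "proportional" row, so the real inputs may differ slightly), and the "FFA solution" is the pairwise average from the opening post. Function names are made up for illustration.

```python
def pairwise_ffa_probs(strengths):
    """Pairwise approach: average each team's two-team win chances,
    i.e. (sum over opponents of T_i / (T_i + T_k)) * 2 / (N * (N - 1))."""
    n = len(strengths)
    probs = []
    for i, t_i in enumerate(strengths):
        pair_sum = sum(t_i / (t_i + t_k)
                       for k, t_k in enumerate(strengths) if k != i)
        probs.append(pair_sum * 2 / (n * (n - 1)))
    return probs


def proportional_probs(strengths):
    """Win chance proportional to teamstrength."""
    total = sum(strengths)
    return [t / total for t in strengths]


# Assumed strengths, read off the rounded "proportional" percentages.
strengths = [29.0, 19.5, 51.5]
print([round(100 * p, 1) for p in pairwise_ffa_probs(strengths)])
print([round(100 * p, 1) for p in proportional_probs(strengths)])
```

Under these assumptions the pairwise average gives team 3 about 45.5% while the proportional rule gives it 51.5%, which is the gap that step 5 refers to.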

+0 / -0

Brackman

9 years ago
(edited 9 years ago)

Your argument would be correct in a normal strategy game, where teamstrength is the sum of playerstrengths. In ZK, however, there are extra coms. Thus teamstrength is the average of playerstrengths. 51.5% for team 3 is for no cooperation between team 1 and 2. If they cooperate, team 3's win chance will be higher (68.0%) because of extra coms.
For h=1/(number of the teams' coms) all cases are considered here.
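The 51.5% and 68.0% figures can be checked with a short calculation. It assumes teamstrengths of 29.0, 19.5 and 51.5 (taken from the rounded percentages quoted earlier in the thread) and that teams 1 and 2 have the same number of players, so that the average over the merged team is simply the mean of the two teamstrengths. The function name is hypothetical.

```python
# Assumed teamstrengths, taken from the rounded percentages quoted earlier.
t1, t2, t3 = 29.0, 19.5, 51.5


def two_team_win_chance(own, other):
    """Two-team win chance proportional to teamstrength."""
    return own / (own + other)


# Teams 1 and 2 merged by summing strengths (no extra coms).
merged_sum = t1 + t2
# Merged by averaging (extra coms); assumes teams 1 and 2 have equal size,
# so the merged average is the mean of the two teamstrengths.
merged_avg = (t1 + t2) / 2

print(round(100 * two_team_win_chance(t3, merged_sum), 1))  # 51.5
print(round(100 * two_team_win_chance(t3, merged_avg), 1))  # 68.0
```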

[Spoiler]
Furthermore you could argue that probabilities should be proportional to squares of teamstrengths to better reflect team cooperations in FFA. But in order to leave at least 1v1 unaffected, you need to apply sqrt first:

 playerstrength = B^((elo-eloShift)/2)
 teamstrength = (sum of team's playerstrength)/n
 team elo = log_B(teamstrength^2)+eloShift

Instead of squares you can use a more general convex function phi (a strictly monotonically increasing diffeomorphism) with inverse theta, which is concave:

 playerstrength = theta(g(elo))
 teamstrength = (sum of team's playerstrength)*h(n)
 team elo = f(phi(teamstrength))

Then you can calculate probabilities either with team elo or proportional to phi(teamstrength). As g is convex and theta concave, the better rating of teams with higher elo deviation is partially compensated.

+0 / -0

Anarchid

9 years ago
(edited 9 years ago)

quote:
In ZK however there are extra coms. Thus teamstrength is the average of playerstrengths.

Extra coms are more complex than this. Basically this means that both teams start with the same amount of resources, but their strength is still not exactly equal.

Namely, once the game proceeds past five minutes, in case of a 2v3 the smaller team will (usually) have less APM per unit, so their forces will be controlled less perfectly.

+0 / -0

Brackman

9 years ago

Yes indeed. Still averages are better than sums here. So in a 2v3, the players in the smaller team would be weighted 1/2 and in the bigger one 1/3 times. Generally the players of a team with n players could be weighted v*(1/n-1)+1 times, where v=0 would be the sum and v=1 the average. So what value do you suggest for the "extra com advantage" v? My evaluation of game outcomes with current balancer (v=1) has shown that the bigger team wins exactly as often as the smaller one, but maybe that's only because of self-balancing prophecy.
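The interpolation between sum and average can be written as a one-line function. The name `player_weight` and the in-between value v = 0.5 are only illustrative.

```python
def player_weight(n, v):
    """Weight of each player in a team of n players.

    v is the "extra com advantage": v = 0 gives weight 1 (teamstrength is a
    sum), v = 1 gives weight 1/n (teamstrength is an average)."""
    return v * (1 / n - 1) + 1


# 2v3 with averages (v = 1): weights 1/2 and 1/3, as in the post.
print(player_weight(2, 1), player_weight(3, 1))
# Hypothetical in-between value v = 0.5.
print(player_weight(2, 0.5), player_weight(3, 0.5))
```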

In earlier theories I also considered that extra com players have a higher influence in their team with an "extra com weighting" w (from 0 to 1), so that a player's weighting is 1+w*(number of extra coms). But since coms' incomes are distributed equally, a smaller w is appropriate now. (Initial BP is still unequal.) Generally you could just multiply both weightings.

+0 / -0

Sprung

9 years ago
(edited 9 years ago)

TL;DR: teamstrength as a fundamental.

I spent like 5 minutes thinking about this so this is probably full of logic holes but:

Brackman's system of calculating FFA win chance of any given team is

(sum of probabilities of win against each enemy team) * 2 /(N * (N-1))

I claim this is wrong if we use the pairwise values Elo gives us. FFAs do not work that way - everyone interacts with everyone else simultaneously, it's not pairwise. You can construct a counterexample, too: check out the chance of a guy with X elo as X approaches infinity, against two people both with some constant C elo. The pairwise winchance vectors are

{-> 1, -> 1}
{0.5, -> 0}
{0.5, -> 0}

so sum * 2 / (N * (N-1)) approaches {2/3, 1/6, 1/6} respectively. This literally gives you a 1/3 chance to lose against two Null AIs!
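The limit can be checked numerically with a large but finite rating gap. This is a sketch assuming the standard Elo expectation with B = 10^(1/400); the ratings and function names are made up for illustration.

```python
B = 10 ** (1 / 400)


def pairwise_win_chance(elo_a, elo_b):
    """Standard two-player Elo expectation for a against b."""
    return 1 / (1 + B ** (elo_b - elo_a))


def pairwise_ffa_probs(elos):
    """(sum of win chances against each enemy) * 2 / (N * (N - 1))."""
    n = len(elos)
    return [
        sum(pairwise_win_chance(a, b) for j, b in enumerate(elos) if j != i)
        * 2 / (n * (n - 1))
        for i, a in enumerate(elos)
    ]


def proportional_probs(elos):
    """Win chance proportional to strength B^elo."""
    strengths = [B ** e for e in elos]
    total = sum(strengths)
    return [s / total for s in strengths]


# Hypothetical ratings: one player 4000 elo above two equal opponents.
elos = [5500, 1500, 1500]
print(pairwise_ffa_probs(elos))   # close to [2/3, 1/6, 1/6]
print(proportional_probs(elos))   # close to [1, 0, 0]
```

The pairwise average caps the strong player at 2/3 no matter how large the gap gets, while the proportional rule lets the win chance approach 1.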

Teamstrength - used as a fundamental, not just as a way of averaging Elo - doesn't have this problem. If teams' strengths are 1, 2 and 4 then their chances to win are in proportion 1:2:4 (ie. 1/7, 2/7 and 4/7).

Now, the basic Elo requirements. The Elo change is (1-winchance)*K for a win and -winchance*K for a loss (K is the Elo K-factor, ie. some scaling constant).
Here's an example of Elo gains for teams of teamstrength 1, 2 and 4 (sans K-factor):

team	win	lose
1	+6	-1
2	+5	-2
4	+3	-4

Win chances are 1/7, 2/7 and 4/7 -> sum = 1.
Expected change for team 1 is (1/7 * +6) - (6/7 * -1) == 0.
Expected change for team 2 is (2/7 * +5) - (5/7 * -2) == 0.
Expected change for team 4 is (4/7 * +3) - (3/7 * -4) == 0.
If team 1 wins, it gets +6 while teams 2/4 get -2/-4. Sum = 0.
If team 2 wins, it gets +5 while teams 1/4 get -1/-4. Sum = 0.
If team 4 wins, it gets +3 while teams 1/2 get -1/-2. Sum = 0.

It's probably not hard to generalize but hopefully this illustrates the idea.
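One possible generalization, as a sketch: for arbitrary strengths the win and lose columns are (total - strength) and -strength, scaled by the strength total so that 1, 2, 4 reproduces the integers in the table. The expectation is computed as winchance * gain + (1 - winchance) * loss, with the loss entering as a negative number and being added. Function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def elo_changes(strengths):
    """Per-team (win, lose) changes, sans K-factor, scaled by the strength
    total so that strengths 1, 2, 4 give the integers in the table."""
    total = sum(strengths)
    return [(total - s, -s) for s in strengths]


def expected_change(strength, total):
    """winchance * gain + (1 - winchance) * loss; the loss is negative."""
    p = Fraction(strength, total)
    return p * (total - strength) + (1 - p) * (-strength)


strengths = [1, 2, 4]
total = sum(strengths)
changes = elo_changes(strengths)

for s in strengths:
    assert expected_change(s, total) == 0  # no drift in expectation

for winner in range(len(strengths)):
    # The winner takes its "win" value, everyone else their "lose" value.
    outcome = [changes[i][0] if i == winner else changes[i][1]
               for i in range(len(strengths))]
    assert sum(outcome) == 0  # rating points are conserved

print(changes)
```

The same two checks pass for any list of positive integer strengths, since the winner's gain is by construction the sum of the losers' strengths.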

Effectively this reduces Elo to a wrapper around teamstrength meant to calculate change after game, and to show people some number that doesn't make them feel like crap (if you're a newb, Elo 1200 vs 2000 sounds okayish, but teamstrength 1 vs 100 doesn't -- of course in a teamstrength-based system the Elo values would become less extreme but e^x always looks more brutal than just x).

+3 / -0

TheEloIsALie

9 years ago
(edited 9 years ago)

quote:
This literally gives you a 1/3 chance to lose against two Null AIs!

That's a very good observation and quite solid proof that the proposed formula doesn't work.

The main problem is that any team balancing algorithm needs to make assumptions about how team composition affects team strength/elo. This is basically the choice of f and g in the OP, although it would be easier to define it as a function p(elo vector of team) that produces team strength/elo, which is essentially a norm, like the p-norm Brackman mentioned (see again the OP, or glimpse over this <- just look at pictures and formulas).

Your example with the teamstrength change looks correct (although you did mess up the signs for the expected outcomes), but the question still stands how to map/combine individual strength/elo to team strength/elo.

Some observations:
- Team strength (as used by you) must be in the interval (0; ∞). It cannot be < 0 (negative probability?), but it can be arbitrarily large, because for every team with strength p1, one can imagine a stronger team that will beat the first one 2 out of 3 times and thus must have a strength p2 = 2*p1. It could theoretically be zero, but not in practical applications.

- According to Brackman, p needs to (essentially) satisfy the triangle inequality to fit the known data: Adding a "variation" elo vector (like [-300; 100; 200]) to a team of equally strong players (like [1500; 1500; 1500]) increases their strength (so p([1200; 1600; 1700]) > p([1500; 1500; 1500])). (Note that this is not inherent to the math used! It would be possible that serious anti-synergies in teams with high elo variance would cause the opposite).

- If all elo values increased by the same flat amount, the ratios of the team strengths would need to stay the same. This pretty much implies an exponential approach (which conveniently also satisfies the first point).

- It is still unclear how different team sizes (= vector lengths) should affect p. Is one player with 5 coms stronger than 5 players with one com each (all of equal elo)? I reckon this largely depends on the players, and it's conceivable that for high numbers of extra coms, multiple players get an edge through micro advantage, but it's hard to judge how the force concentration (and increased coordination) balance that out in the general case.

- All of this has many parallels to the 1v1 probabilities, because that's essentially just a special case of the above.
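The step from shift invariance to an exponential can be sketched for single-player teams as follows. The derivation assumes that g is positive and continuous, which is an added regularity assumption not stated in the thread, and c(s) is a helper symbol introduced only here.

```latex
\begin{align*}
\frac{g(a+s)}{g(b+s)} = \frac{g(a)}{g(b)} \ \text{for all } a, b, s
  &\implies \frac{g(a+s)}{g(a)} = \frac{g(b+s)}{g(b)} =: c(s) \\
  &\implies g(a+s) = c(s)\, g(a), \quad c(s) = \frac{g(s)}{g(0)} \\
  &\implies \frac{g(a+s)}{g(0)} = \frac{g(a)}{g(0)} \cdot \frac{g(s)}{g(0)} \\
  &\implies g(a) = g(0)\, B^{a} \ \text{for some } B > 0 .
\end{align*}
```

The last step is the standard continuous solution of Cauchy's exponential functional equation. Writing g(0) = B^(-eloShift) gives the form g(elo) = B^(elo-eloShift) used elsewhere in the thread.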

+1 / -0

Brackman

9 years ago
(edited 9 years ago)

Indeed the teamstrength system is currently my favourite system as I said here recently. In my above posts I explained that there are 2 possibilities to generalize teamstrength for FFA (the matrix approach and the teamstrength proportional one) and that I don't know which of them is correct. It seems like Sprung has figured it out now. The elo->infinity argument makes it fairly convincing.

Now I'm happy that I didn't find the time to implement the matrix solution for the current system (even though this would have been much better than the current implementation and the teamstrength system needs further consideration, because it makes a difference even for 2 teams). The teamstrength proportional system will also be easier to implement.

 playerstrength = g(elo)
 teamstrength = (sum of team's playerstrength)*h(n)

Should teams with higher elo deviation really get a higher win probability prediction? Note that this is a consequence of g being convex. If we only assume teamstrength proportional probabilities with any g and the invariance to shifts in the elo scale, we already get g(elo)=B^(elo-eloShift), which is convex. From the condition "no change for 1v1" we can conclude B=10^(1/400). Btw the condition that all players' elo average must be 1500 is equivalent to the product of all playerstrengths being 1 only if eloShift=1500. [Spoiler]
The teamstrength function does not really fulfill the conditions of a norm. The effect of different team sizes is indeed the biggest remaining question now, though. This is described by the function h. In the easiest case we use h=1/teamsize. Additionally we can multiply factors to consider extra com advantage and extra com player weighting as described here, but I think this would only make it unnecessarily complicated, because those factors would be nearly 1 anyway. What I would really try is using h=sqrt(average size of all teams)/teamsize to make win chances more distinct for bigger teams like this. For example a 3 player team in a 2v3 would have h=sqrt(2.5)/3. [Spoiler]

Note that bigger teams are not rated better here, but only more distinct.

Furthermore I would delete the factor "Math.Sqrt(sumCount / 2.0)" in SpringBattle.cs, which is exactly the factor I would multiply to h instead as explained in the 3rd paragraph of this post (where D(n)/n=h(n)).

Additionally I would fix 1v1 XP in SpringBattle.cs to use 1v1 elo instead of team elo.

In order to fix balance the function "public static double GetTeamsDifference(List<BalanceTeam> t)" in line 111 of Balancer.cs would have to give the following result:

  E = sum from k=1 to N (teamsize_k*(1/N - p_k)² + c*Var_k)

, where N is the number of teams, p_k team k's teamstrength proportional win probability and Var_k its elo variance, maybe even with c=0.
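A rough Python sketch of this balance measure, not the actual C# GetTeamsDifference. It assumes h = 1/teamsize, eloShift = 0 and the population variance for Var_k; the function names and example teams are hypothetical.

```python
from statistics import pvariance

B = 10 ** (1 / 400)


def team_strength(elos):
    """Average of B^elo over the team (h = 1/teamsize, eloShift = 0)."""
    return sum(B ** e for e in elos) / len(elos)


def teams_difference(teams, c=0.0):
    """E = sum_k (teamsize_k * (1/N - p_k)^2 + c * Var_k)."""
    n_teams = len(teams)
    strengths = [team_strength(t) for t in teams]
    total = sum(strengths)
    energy = 0.0
    for elos, s in zip(teams, strengths):
        p_k = s / total                 # proportional win probability
        var_k = pvariance(elos)         # population variance: an assumption
        energy += len(elos) * (1 / n_teams - p_k) ** 2 + c * var_k
    return energy


even = [[1500, 1600], [1600, 1500]]
uneven = [[1600, 1600], [1500, 1500]]
print(teams_difference(even), teams_difference(uneven))
```

A balancer would pick the split with the smallest E. With c = 0 only the predicted win chances matter, while c > 0 additionally penalizes teams with a large elo spread.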

Finally I would give every player a small teams elo (maybe including 1v1) and a big teams elo and then always use a weighted average of both depending on team size.

  player's small team elo weighting = 2/number of all players in the game
 player's big team elo weighting = 1-player's small team elo weighting
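The weighting can be sketched as a single function; the name `blended_elo` and its parameters are made up for illustration.

```python
def blended_elo(small_team_elo, big_team_elo, players_in_game):
    """Weighted average of a player's two ratings, by game size."""
    w_small = 2 / players_in_game   # 1.0 in a 1v1, 0.25 in a 4v4
    w_big = 1 - w_small
    return w_small * small_team_elo + w_big * big_team_elo


print(blended_elo(1600, 1400, 2))   # 1v1: only the small teams elo counts
print(blended_elo(1600, 1400, 8))   # 4v4: mostly the big teams elo
```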

+0 / -0

Sprung

9 years ago

I'd handle the uneven teams case by not giving an extra comm and just equalizing the strength. Extra comm is a bad way of handling unequal teams (some person suddenly has a different start state without announcement, getting twice the BP but not income to compensate, and no ability to place them separately). It's not a large issue because the matchmaker won't be allowed to create such games anyway.

+0 / -0

Brackman

9 years ago

Probably you want to use h=1 then? I always found that the possibility of handling uneven teams is a big advantage of ZK. With your suggestion this is still true.
Let's look at an example 2v1 where an elo 1600 player plays against two players of equal elo. What elo must they have for balance according to teamstrength? With extra coms: 1600. Without extra coms: 1480 (generally the one player's elo - 400/log_2(10)). Maybe this is better.
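The 2v1 example can be checked with a few lines, assuming B = 10^(1/400); the function name is hypothetical.

```python
import math

B = 10 ** (1 / 400)


def balancing_elo(solo_elo, n, extra_coms):
    """Elo that n equal players need to match one player of solo_elo."""
    if extra_coms:
        # h = 1/n: teamstrength is the average, so equal elo balances.
        return solo_elo
    # h = 1: n * B^e = B^solo_elo, so e = solo_elo - log_B(n).
    return solo_elo - math.log(n, B)


print(balancing_elo(1600, 2, extra_coms=True))
print(round(balancing_elo(1600, 2, extra_coms=False), 1))  # 1479.6
```

For n = 2 the offset log_B(2) equals 400/log_2(10), about 120.4, which is where the rounded 1480 comes from.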

My distinctivity factor sqrt(average size of all teams) in h doesn't change anything, because it is the same for all teams. Maybe this was the reason why I made h only dependent on the local team size n in the 1st place. When I added the dependency on the average of all team sizes, I thought too much like in a normal elo system, where I have tested this factor successfully on a data set of games, if not with sufficient test conditions. But as g is exponential, it must be

 
 teamstrength = ((sum of team's playerstrength)*h(n))^sqrt(average size of all teams)

if anything, where h(n)=1/n with extra coms and h=1 without. But this sqrt exponent would have to be tested. I'm not sure that it will still be beneficial with teamstrength.

+0 / -0

TheEloIsALie

9 years ago
(edited 9 years ago)

I'm sorry, but I still don't quite get where the sqrt(average size of all teams) is coming from. I can see why one would want to use some adjustment to adapt to elo-imbalances in games with high player numbers more, but why this exact term?

Just to gain an understanding:
For two teams of 4 players, that would yield an exponent of 2. Using exponentiation laws, this effectively means that the "simple" team elo (I mean the compounded strength without the sqrt() exponent, transformed back into elo via the normal formula) would be doubled for each team before arriving at the actual team strength, so it magnifies the elo differences between the teams.

In effect, your formula suggests that (for non-1v1) a higher elo/strength team is underpredicted (and a lower one is overpredicted) by the "simple" team strength calculation. Did you base this on your data examinations?
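The doubling can be illustrated numerically. This sketch assumes a 4v4 whose "simple" team elos differ by 100 and compares the exponentiated teamstrength with the standard Elo expectation; all names and numbers are illustrative.

```python
import math

B = 10 ** (1 / 400)


def win_chance(strength_a, strength_b, exponent=1.0):
    """Two-team win chance proportional to teamstrength^exponent."""
    a, b = strength_a ** exponent, strength_b ** exponent
    return a / (a + b)


def elo_win_chance(elo_diff):
    """Standard Elo expectation for a rating lead of elo_diff."""
    return 1 / (1 + B ** (-elo_diff))


# Hypothetical 4v4 whose "simple" team elos differ by 100.
strong, weak = B ** 100, 1.0
exponent = math.sqrt((4 + 4) / 2)   # sqrt(average team size) = 2

print(win_chance(strong, weak))             # same as a 100 elo lead
print(win_chance(strong, weak, exponent))   # same as a 200 elo lead
```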

+0 / -0

Brackman

9 years ago
(edited 9 years ago)

Yes, exactly. It is based on the assumption that probability predictions should be more distinct with higher player numbers due to the law of large numbers. If no distinctivity factor is used, probabilities don't change if you replace every player by 2 clones of himself.

Unfortunately I don't have a really good mathematical derivation why exactly sqrt. I just tried different functions on game data (back then without teamstrength) and this scored best. Higher or lower distinctivity was not as good. A reason could be that standard deviation is proportional to sqrt(number of samples)*mean.

On the other hand it is against mathematical simplicity. I really don't know whether it is good to use it together with teamstrength. I did reproduce all players' elo progresses in my tests, but what I didn't do was:
- Considering ZK's weighting system
- Testing it for at least 2000 games (only ~300 with and ~1200 without elo progress)
- Considering teamstrength

+0 / -0

TheEloIsALie

9 years ago
(edited 9 years ago)

quote:
It is based on the assumption that probability predictions should be more distinct with higher player numbers due to the law of large numbers.

I'd like to point out that h(n) = 1 also does that :P

+0 / -0

Brackman

9 years ago

h(n)=1 does that in a normal elo system. A normal elo system that doesn't make probabilities more distinct for higher player numbers uses h(n)=1/n.

In a teamstrength system, however, any dependency of h on the average size of all teams doesn't change anything, because the average size of all teams is the same for all teams. Thus h is a constant factor and only proportionality matters, not absolute differences. In a teamstrength system the only purpose of h is to manage the effect of different team sizes and extra coms. h=1/n should be used with extra coms (teamstrength is the average of playerstrengths), h=1 without (teamstrength is the sum of playerstrengths), where n is the respective team size, not the average of all team sizes. But probabilities would still not become more distinct if you replace every player with two copies of himself. As teamstrength is exponential, it would have to be raised to the power of a function dependent on the average size of all teams like in the formula above, if probabilities were really to be made more distinct for higher player numbers.
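The cloning argument can be checked with a small experiment. This is a sketch assuming B = 10^(1/400), eloShift = 0 and a hypothetical 2v2; `extra_coms` switches between h = 1/n and h = 1, and `exponent` stands in for the sqrt(average team size) power from the earlier formula.

```python
import math

B = 10 ** (1 / 400)


def team_strength(elos, extra_coms=True, exponent=1.0):
    """((sum of B^elo) * h(n))^exponent with h = 1/n or h = 1."""
    h = 1 / len(elos) if extra_coms else 1.0
    return (sum(B ** e for e in elos) * h) ** exponent


def win_chance(team_a, team_b, **kwargs):
    """Two-team win chance proportional to teamstrength."""
    a = team_strength(team_a, **kwargs)
    b = team_strength(team_b, **kwargs)
    return a / (a + b)


team_a, team_b = [1600, 1500], [1500, 1450]   # hypothetical 2v2
clones_a, clones_b = team_a * 2, team_b * 2   # every player duplicated

for extra_coms in (True, False):
    before = win_chance(team_a, team_b, extra_coms=extra_coms)
    after = win_chance(clones_a, clones_b, extra_coms=extra_coms)
    print(extra_coms, round(before, 4), round(after, 4))   # unchanged

# With the sqrt(average team size) exponent the 4v4 becomes more lopsided.
print(win_chance(team_a, team_b, exponent=math.sqrt(2)),
      win_chance(clones_a, clones_b, exponent=math.sqrt(4)))
```

Without the exponent, cloning multiplies both teamstrengths by the same factor (1 for h = 1/n, 2 for h = 1), so the ratio stays the same.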

+0 / -0
