Balancer broken?

Page of 3 (41 records)

sort

TheEloIsALie

9 years ago
(edited 9 years ago)

So... are you saying each player should have an elo value for the pairing with each other player (in the extreme case)? So for 1000 players there would be 1 million values?

This system might have some benefits for a small player base like ours where the same people are playing together constantly. But if the only balance information you're using is "how much did player x win with player y" the system falls apart as soon as x and y don't see each other more than a handful of times.
In other words: "Training" the system (as in "finding everybody's elo") will take a massive amount of games (well, unless we keep at it with 12v12). Not to mention that you're giving up on some measure of "general competence" (what elo currently represents) in favor of measuring only the player synergy.

The idea is cool and might actually have potential in the current scenario, but is that what we want to settle on?

+0 / -0

Shaman

9 years ago

My idea was to incorporate (though I don't know the first thing of how to do it) synergy into the formula. Obviously strong synergy (such as found between sprung and brofessional) will have some significant impact on the outcome AND also the *enjoyability factor* (<-- MEGA ULTRA IMPORTANT) of a game while the inverse will also be true.

I don't think that general competence is enough to produce well-balanced and enjoyable games. Synergy has some part in the formula (perhaps as a favorable weight for two players?) but maybe using it as the entire formula is a bad idea for the reasons you described. I do think that general competence is a major part of it, but we should also find a way to incorperate synergy into the equation as well. Clans do work somewhat but there are some people who work really well together in two different clans.

Idea: maybe synergy could be some sort of weight that starts off at 1 and climbs/falls to some maximum influence and works as a modifier to the average general competency of two players?

Ideally the balancer should take synergy into consideration (and even reward it by placing players who work together well into the same team!) to not only produce better balances, but also more rewarding games for both sides. If one of the math wizards could find a way to do this, that'd be great.

+0 / -0

Brackman

9 years ago
(edited 9 years ago)

My 2^8-th post: If I wanted to consider synergy, I would not store values for every pair of players, but give every player an elo for his cooperation with low elo players and one for his cooperation with high elo players and then always use a weighted average of both, where the weigthings depend on the average elo of the rest of the team.[Spoiler]

 player's high elo cooperation elo weighting = chance that the rest of the team wins vs elo 1500 team = 1/(10^((1500-rest of team's elo avg)/400))
 player's low elo cooperation elo weighting = chance that the rest of the team looses vs elo 1500 team = 1/(10^((rest of team's elo avg-1500)/400))

But actually I think that this is too complicated for little use. What I would really do is giving every player a small teams elo (maybe including 1v1) and a big teams elo and then always use a weighted average of both depending on team size.[Spoiler]

 player's small team elo weighting = 2/number of all players in the game
 player's big team elo weighting = 1-player's small team elo weighting

TheEloIsALie is right that my rather new formulas in this thread don't consider elo deviation and don't make a difference as long as elo difference is 0, but also quoted my old formula that does consider it. Probably minimizing deviation is just too complicated. I guess the computational expensiveness is rather due to going through all player compositions than calculating the objective function. It can be drastically reduced by using a Markov chain (a powerfull method to calculate NP-problems efficiently, maybe sacrificing too much accuracy for this relatively simple problem. It only makes sense for very high player numbers, otherwise use bruteforce): [Spoiler]

n players have to be distributed to N teams. Go through all teams again and again until all players are distributed and always do the following: If there are more than N players left to distribute, assign the best and the worst of the players not distributed yet to the current team. If there are less players left, assign any of them to the current team. If there are no players left, use this as start configuration and calculate its objective function value "E" (my formula that

TheEloIsALie quoted). Always swap a random pair of players (="test configuration") between different teams and then calculate the new objective function by using this:

quote:
If you only ever swap out two players when mutating your team composition, then you only need to subtract their elo difference from one team and add it to the other. (...) With variance/standard deviation, the change in the mean means (no pun intended) that you definitely need to recalculate the entire function.

I tried to apply this principle to deviation calculation by using the average of simple differences to mean instead of variance. (Let's call this "1-deviation" and std deviation "2-deviation". p-deviation would be the p-norm of the difference of a team's elo vector with the vector that has the avg team elo in all its components). With 1-deviation you could use this principle by storing an additional vector with 1, -1 or in some cases more special components, but you could also calculate 1-deviation the normal way or just use 2-deviation. Anyway, if the test configuration is better than the start configuration (which means lower E), use it as new start configuration. Otherwise only do so with a probability of

 exp(constant*(old E - new E)).

(This "tunnel effect" is needed in thermodynamical annealing to get out of local minima and find global equilibrium.) Do that until E is smaller or = a certain limit. My formula for E is maybe not correctly generalized for team FFA, because I didn't know the correct team FFA solution at that time (which is sadly still not implemented!). I have no idea at which player number this will be profitable. For this it would be usefull to know if really all configurations are calculated currently or only a portion until the elo difference is small enough.

There is a much newer system that does consider deviation without an additional formalism, the "teamstrength system".

Sprung had the initial idea and I formalized it. It is currently my favourite system. But it gives teams with high elo deviation better win chances. Balancing with it does not increase nor decrease deviations! There are 2 ways to apply this system to team FFA. The easier, but more different from the current system, makes it more difficult to consider deviation. The principles of teamstrength, minimizing deviation, using Markov chains for high player numbers and making win chances more distinct in dependence of player number can all be combined to create an ultimative system.[Spoiler]

+2 / -0

Brackman

9 years ago

I have some further ideas:
1. Maybe there is only so high deviation in big teams currently, because the balancer goes through those constellations first and then stops, because elo difference is close enough to 0. Then using Markov chains with current system would already reduce computation time and deviation drastically.

2. When using Markov chains with deviation, the first steps could be done without deviation to make full use of

TheEloIsALie 's simple substitution principle and save computation time. The next steps could be done with 1-deviation and then with 2-deviation.

3. If practical tests reveal that Markov chains only become profitable at unrealistically high player numbers, but the player number is still high enough that brute force is too expensive, only bruteforce constellations that are similar to the above constructed start constellation. For low player numbers, bruteforce all. Among those who are best without deviation calculate E with deviation and choose the best. This would already reduce deviation and computation time a lot.[Spoiler]

+0 / -0

Brackman

9 years ago

Now I have found out that my 2 years old team FFA generalization of the term to be minimized for balancing that was quoted by

TheEloIsALie does not correctly correspond to my correct team FFA prediction. The formula does try to give every of the N teams a win chance of 1/N, but calculates the difference from it in a strange way, because it uses elo differences instead of probability differences, which makes a difference for team FFA. If p_k is team number k's win chance according to my team FFA prediction, Var_k team k's elo variance and teamsize_k the number of players in team k, then the term to be minimized for balance is

 E = sum from k=1 to N (teamsize_k*(1/N - p_k)² + c*Var_k)

[Spoiler]c is how high variance minimization is weighted against win chance equalization and should be around 1/10000 (or lower). Note that even with c=0 this formula is still better than current balancing, because it considers team FFA.
[Spoiler]
Things I still don't know:
- Are all team constellations tested currently or only until elo difference is small enough? Or more generally: Where is the code for !balance?
- Should the sum of team variances or the sum of variance differences of pairs of teams or no variance at all be minimized?

+0 / -0

Sprung

9 years ago

The balancer code is here. Notes:

it checks all combinations
disregard all the stdDev comments, it uses plain avgElo difference; for FFA, it only considers the two extreme teams
max team size is ceil(people/teams)
a "balance item" is a group of people which holds {sumElo, peopleCount}; normally it's {personalElo, 1} per person but a clan is treated as a singular group under clan balance

+1 / -0

TheEloIsALie

9 years ago
(edited 9 years ago)

Well, what do you do when you want to solve an NP problem, your objective function is slow and your search space too large? You throw heuristics at it.

Genetic algorithms:
[Spoiler]

I could point you to a GA implementation I contributed to a different open source project (it's technically under MIT license though), but in general genetic algorithms are pretty solid. You only need to provide a function that takes a bitstring and returns a score value.

The open question is how to best map bitstrings to matchups (assuming we agree on one of the suggested scoring functions for team matchups,

Brackman provided these already though). This is somewhat complicated by the fact that team sizes aren't necessarily fixed.

A naive solution would be to use bitstrings of length N*M for N players and M teams, where bit x being set means that player number x % N is placed in team floor(x / N). Obviously, this would generate lots and lots of invalid solutions (since a player can only be in one team at a time).

Given a way to enumerate all team combinations, it is alternatively also possible to interpret the entire bitstring as the number p of the team combination (e.g. 1100 = 12 -> 12th combination). Since team sizes are somewhat variable, the easiest mapping I can imagine here is the following:
1. Convert the number p to base M (let's call it p_M).
2. The kth digit of p_M is the team number of player k.

Consequently, you'd end up with M^N possible numbers and would therefore have to use bitstrings of length lg2(M) * N.

This would perform worst for high M (in relation to N), which is many-team FFA. It might be necessary to adopt a different mapping function (since leaving teams empty is not allowed*, which is harder to incorporate into the enumeration scheme above) for FFAs.
Further, this approach may be problematic because changing a single bit in the bitstring will end up producing completely different matchups when M is not a power of two, making the search space (from the perspective of the algorithm) a lot more pathological.

*I'm pretty sure that cases can be constructed where the best solution (from a mathematical standpoint) is one where teams are left empty. Is this actually to be forbidden?

Simulated annealing:
[Spoiler]

+3 / -0

Aquanim

9 years ago

Is the objective function slow?

Is the problem being exponential in difficulty even relevant for the values of N we see in practice?

+1 / -0

Sprung

9 years ago

Current brute force is good.

In larger games it starts to become slow enough for the hard cap to kick in but that's not a problem. Ratings are very vague at that density anyway and an unranked room doesn't need perfect balance.

+1 / -0

TheEloIsALie

9 years ago
(edited 9 years ago)

I did forget to mention that the brute force is going to be less of a hassle for smaller games where the amount of combinations is still manageable.

quote:
Is the objective function slow?

Slow enough that the current (much easier one) runs into its limits for brute forcing higher player numbers, as

Sprung said.

quote:
In larger games [the objective function] starts to become slow enough for the hard cap to kick in but that's not a problem.

Are you talking about the current one or the proposed one? There's a difference of about one to two orders of magnitude in flops between them... Again, keep in mind that currently you have a low fixed cost per combination tried, whereas the new formula takes longer the more players there are...

+0 / -0

Aquanim

9 years ago
(edited 9 years ago)

I am of the opinion that improving how the algorithm chooses permutations to check the balance of is a far more productive line of optimisation than improving how fast each permutation is checked, which is I suppose what you have said.

Something as sophisticated as GAs or simulated annealing seems like overkill but I guess it would get the job done.

EDIT - I believe the following algorithm will yield a good starting point:

1) Order the players by rating.
2) Pair the players off (#1 with #2, #3 with #4, #5 with #6, etc.)
3) Order the pairs by the difference in their rating.
4) Take the pair with the highest difference in rating and allocate them to separate teams.
5) Take the pair with the next highest difference in rating and allocate them to separate teams such that the difference in the current teams' average skill is minimised.
6) Return to (5) until all pairs are allocated.

This solution will typically not be too bad (since we use the highest-difference pairs to offset one another first, they cannot ambush us later) and has the desirable property that no team is overloaded with players of high or low skill (since each consecutive pair is allocated to a different team). Searching the neighbourhood of that point by swapping players according to some heuristic (which I think need not be complicated, preferring to make small swaps in the correct direction might be enough) ought to end up with a reasonable solution in any reasonable situation, assuming the above algorithm hasn't already found one.

Happily, (based on a bit of test code) I think this algorithm will tend to perform best for large numbers of players - the larger the game, the less swapping will be necessary to end up with an adequate solution.

Notes:
[Spoiler]

I ran my proof-of-concept code on the example game in the OP and ended up with the following teams:

1988 1868
1852 1774
1681 1673
1665 1623
1587 1588
1494 1580
1392 1492
1315 1332
1236 1293
1202 1184

The difference in the average ELOs of the two teams is 0.25. Swapping the 1492 with the 1494 (an insignificant change in terms of the distribution of elo within the teams) reduces this to 0.05. The latter solution has the same average-ELO difference as the one in the OP, but presumably the brute force algorithm found the "bad" one first (20-choose-10 is still a fair bit less than two million I believe).

(Another note: the problem in the OP was partially caused by Sprung and Brofessional being the two highest-rate players and in the same clan.)

+2 / -0

Rafalpluk

9 years ago

Aquanim your algorithm (or any algoritm that assumes pairs) wont work well in certain cases. For example:
1900 1600 1600 1600 1500 1500 1500 1500

+0 / -0

Aquanim

9 years ago
(edited 9 years ago)

Indeed it wouldn't. Cases like that are what the swapping afterwards if necessary is for. It particularly tends to happen when one pair has a large difference and all the others have a very small one.

(Also, most problems small enough to have a decent likelihood of being difficult in that way are also small enough to brute force. Problems large enough that a heuristic are necessary are very, very unlikely to be that bad. Furthermore, that looks like a pretty bad game to play no matter how you balance it.)

+0 / -0

aeonios

9 years ago

Why is cbalance still the default in public rooms? gtfo.

+1 / -0

CrazyEddie

9 years ago

This seems like a good place to point out that there are three different issues that come up in these balance discussions. Often a discussion starts about one but then turns to another, and the original point gets lost.

How to search for a balanced match
How to identify a balanced match
How to select a balanced match

Those sound similar, but are different. The first is about solving the NP-complete problem of iterating through the space of all possible matches. The second is about determining the win likelihood of any given match. The third is about choosing the best match out of all candidate matches given not only the win likelihood but also potentially other factors, such as perceived fairness.

The Elo calculation and modification methods are yet another discussion, although they're related to the second point above.

Aquanim,

Sprung, and

TheEloIsALie have just now been talking about the first problem. I don't think it's a particularly important problem to solve, but certainly don't object to anyone who wants to code a different approach. My view is that any search algorithm which probably maybe comes close to finding something that might be in the neighborhood of optimum is going to be good enough for our purposes.

Brackman and others usually focus on the second problem. Right now the balancer is simplistic, and it assumes the win likelihood is determined by the average Elo of the two teams (I think, right?). Brackman has proposed a few different ways to calculate the win likelihood. He says that they would produce a better-performing !predict result than the current method, and he says he has the numbers to back him up. I'll take his word for it, but unless it makes much better predictions than the current method (meaning that his methods identify a way in which the current balancer systematically creates teams where one side has an identifiable and substantial advantage that is greater than the current balancer's predicted advantage), I wouldn't spend a lot of time implementing it.

I think the real issue that is worthy of discussion - focused, on-topic discussion - is what

dahn raised in his original post, echoed by @Rafal[ZK],

DeinFreund, and many others elsewhere on the forum. That issue is this: even assuming that !predict is correct and that the teams as matched by the balancer both have close-to-equal chances to win, the balancer sometimes produces matches which seem unfair, where "unfair" here means not merely a lopsided chance to win, but perhaps more importantly a lopsided chance to enjoy the game due to one side having a much higher number of much lower-skilled players.

Players have complained about this time and time again in the forum. And not just the trolls who like to complain about playing with newbies the way that other people like to drink beer. Reasonable and respectful players have complained. And they're not just complaining about playing with newbies, they're complaining about the unfairness of the matches. We should be taking these complaints seriously. However, in the past, it's always been met with discussions about how the prediction algorithm shows that the teams are equally likely to win (which is not the issue!) and challenges to provide a better prediction algorithm to run against historical data (which is a correct response to claims that the prediction is off, but that's not the issue!).

Only on a few occasions has anyone suggested that perhaps some matches would be better off with a somewhat less balanced predicted win probability in exchange for an apparently-better distribution of low-skilled players so as to not constantly create unenjoyable games where the high-skill players are all on one side along with all the low-skilled players, effectively playing 1v2 against the medium-skilled players.

I think this point deserves a lot more discussion than it's gotten so far.

+1 / -0

Skasi

9 years ago

quote:
Why is cbalance still the default in public rooms? gtfo.

Why wouldn't it? There's currently no other way for players who want to play together in the public room to do so reliably, unless you count !forcestart which can easily lead to ruined games.

+1 / -0

TheEloIsALie

9 years ago
(edited 9 years ago)

quote:
Only on a few occasions has anyone suggested that perhaps some matches would be better off with a somewhat less balanced predicted win probability in exchange for an apparently-better distribution of low-skilled players so as to not constantly create unenjoyable games where the high-skill players are all on one side along with all the low-skilled players, effectively playing 1v2 against the medium-skilled players.

I think this point deserves a lot more discussion than it's gotten so far.

That's actually what I've been talking about in the entire first page of the thread... The need to introduce a tradeoff (= new formula) between elo difference between teams and intra-team-elo-variance.
Then we diverged on how

Brackman's formula (or any such formula, really) makes the brute forcing a lot slower, and then we got to common optimization techniques and

Aquanim's algorithm.

About that:
Yes, a decent starting guess and some sort of short hillclimbing (i.e. try swapping random players, accept it if the solution got better, stop if it didn't succeed x times in a row) should work pretty well. But the random swapping might increase elo variance again, potentially to the point where it's back to the old problematic cases.

PS: I decided to stop caring about the apostrophes breaking player links. Blame the one who broke it...

+1 / -0

Aquanim

9 years ago
(edited 9 years ago)

quote:
But the random swapping might increase elo variance again, potentially to the point where it's back to the old problematic cases.

I tried a very simple descent algorithm (take the smallest step in the right direction possible, stop if there is no such step) on random data (continuous distribution, might not represent real elo data I suppose) and found that for medium-to-large problems (where the original solution isn't good enough, which is itself rare for large problems) it converges to a good-enough solution quite rapidly, without disturbing the elo variance much.

In general this depends on how much random swapping one permits oneself.

quote:
That issue is this: even assuming that !predict is correct and that the teams as matched by the balancer both have close-to-equal chances to win, the balancer sometimes produces matches which seem unfair, where "unfair" here means not merely a lopsided chance to win, but perhaps more importantly a lopsided chance to enjoy the game due to one side having a much higher number of much lower-skilled players.

I don't know that there's even a discussion to be had here. I am not sure that one-team-has-high-elo-variance games are imbalanced (and if they were I don't know which way) but I am sure that they are unfun and to be avoided where possible. What else is there to say?

The difficult part is how to accomplish that without compromising anything else.

+0 / -0

Brackman

9 years ago
(edited 9 years ago)

Indeed there are those 3 issues

CrazyEddie explained.

Imo the question whether elo variance should also be minimized is not answered yet. It's a part of the 3rd question. The elo variance minimization weighting in the balance function to be minimized would be so low that it usually doesn't make a difference for small games and good balance is still possible in big games. The higher calculation time could be compensated by a better way to iterate through combinations. But I think other issues are more important now than variance consideration.

For small and medium games brute force should still be used. The 2 million limit will not even be reached for 10v10. Also note that the current recursion tests if all teams are smaller or = maxTeamSize. Thus it only really calculates less than n^N combinations, where n is the number of all players and N the number of all teams. Afaict the number of combinations checked by the current recursion is

(N choose X) * (product from k=0 to N-X-1 ((n-k*t) choose t)) * (product from k=0 to X-1 ((n-(N-X)*t-k*T) choose T)),
where
X := n mod N
t := floor(n/N)
T := ceil(n/N).

(!cbalance checks less if there are ppl from the same clan.) For N=2 this is n choose floor(n/2) for even teams and 2 times that for uneven teams. The theoretical minimum for all combinations to be checked is a factor N! less. I know an easy way how to reduce it at least by a factor N: The first player is always put in the first team.

Aquanim your calculation of a start combination seems to be better than mine. Your steps are not enough, though. There must be a small probability to make a bad step ("tunnel effect") which increases the function E that should be minimized. Otherwise you will probably get stuck in local minima and never find global minimum. Always propose a random step. Accept it if it's better. Otherwise only do so with a probability of

  exp(constant*(old E - new E)).

Stop if E is smaller than a certain limit. Or if you have found good combinations also check if they are also good in terms of variance. Your system can also be applied on playerstrength and teamstrength.

Imo the most important thing to do now is considering FFA, probably with a teamstrength system.

+0 / -0

Aquanim

9 years ago
(edited 9 years ago)

quote:
There must be a small probability to make a bad step ("tunnel effect") which increases the function E that should be minimized. Otherwise you will probably get stuck in local minima and never find global minimum.

Yeah, the descent algorithm was more for the purposes of proof-of-concept than for being a finished product.

A question to ask is whether a global minimum (in terms of the objective of elo balance) is even the most desirable outcome. In a 10v10 game a difference of 1 or 2 in average elo will make very little difference, but if the 2-difference game is closer to the original point (and therefore probably the elo distributions on each team are more similar) it might be a preferable game to play.

+1 / -0

Page of 3 (41 records)

Forum index > General discussion > Help and bugs >