
Does the balance in the lobpot work?

17 posts, 754 views

2 years ago
Most likely the answer is already in the question. But I still ask myself: what is wrong with the balance?
Is it an Elo calculation problem?
Here are some examples:
Multiplayer B1122995 26 on Highway 95 v5
Multiplayer B1122837 19 on FolsomDamDeluxeV4
In these games, besides the difference in rank within a team, there is also a numerical superiority.
Is the balance between the two teams just a statistical, mathematical calculation, without taking the number of players into account, as in the second game?
Or is the quality of the individual player considered instead, i.e. his relative strength based on his rank? Because those are not the same thing.
In my opinion the balancer should:
1) consider the number of players first (what is the point of starting a game with an obvious numerical disadvantage? Not all players are capable of handling two facs and two comms),
2) divide the strongest players evenly between the two teams,
3) divide the players of medium strength evenly between the two teams,
4) divide the weakest players evenly between the two teams.
Can this be done, or is all of this already considered?
Thank you

+2 / -0


2 years ago
What would such an implementation look like in detail?
E.g. trying to equalize the average ELOs of the two teams?
+0 / -0

2 years ago
Yes. I don't know how the program evaluates the balance, but given the errors I assume it takes a mathematical average of the Elo and tries to get as close as possible to 50% for both teams.
That is valid from the point of view of computers and AI, but not from the point of view of humans. I was in a game with Godde (Multiplayer B1122182 15 on FolsomDamDeluxeV4), against 4 blues and with one player fewer, and we clearly lost. So players have to be counted per head, that is, games should not start with an unequal number of players; furthermore, the skills of a purple or blue player must be distributed equally, matching the purple ranks against the blue ones when splitting up the teams, and so on for the intermediate ranks, as far as possible.
In summary, you cannot put a player like Godde, say a rank-5 orange, up against 4 blues and 2 reds and call it balanced; it cannot be the same thing.
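For reference, the standard Elo win expectancy from two average ratings looks like this (a small Python illustration; it assumes a team's strength is just the mean of its players' ratings, which is exactly the assumption being questioned here):

def elo_win_chance(avg_a, avg_b):
    # Standard Elo win expectancy from two (average) ratings. Assumes a
    # team's strength is just the mean of its players' ratings, which is
    # exactly the assumption being questioned in this thread.
    return 1 / (1 + 10 ** ((avg_b - avg_a) / 400))

# e.g. a 100-point average advantage gives roughly a 64% expected win chance
print(round(elo_win_chance(2100, 2000), 2))  # 0.64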
+0 / -0
2 years ago
I'm not sure how the algorithm works, but one fix might be adding variance as a parameter, i.e. picking, from a number of options with an acceptable win %, the one with the lowest variance of skill within each team. Or perhaps the one with the lowest difference in variance between the teams.
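Something like this, maybe (a rough Python sketch of the first variant; the win_chance function and the 5% tolerance are placeholders, not anything from the actual balancer):

from itertools import combinations
from statistics import pvariance

def variance_balance(ratings, win_chance, tolerance=0.05):
    # Among all splits whose predicted win chance is within `tolerance` of
    # 50%, pick the one whose worse team has the lowest skill variance.
    names = sorted(ratings)
    best = None
    for team_a in combinations(names, len(names) // 2):
        team_b = tuple(n for n in names if n not in team_a)
        if abs(win_chance(team_a, team_b) - 0.5) > tolerance:
            continue
        spread = max(pvariance([ratings[n] for n in team_a]),
                     pvariance([ratings[n] for n in team_b]))
        if best is None or spread < best[0]:
            best = (spread, team_a, team_b)
    return None if best is None else best[1:]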
+1 / -0
In the games in question, the shown win probabilities deviate strongly from 50%. Even though clan mates are put together, there should be enough team configurations with such high player numbers to balance the teams out. I think there are 2 possible reasons for the discrepancy:

1. After a game, the whole time-dependent history of player ratings changes. If a player loses a game, WHR thinks that the player was already worse before the game. The win probabilities shown on the website are based on player ratings at the time of the game, but with knowledge of the game outcome. Indeed, the losing teams of the games in question have a smaller win probability. Probably, the balancer estimated win probabilities closer to 50% before the game happened. This effect is especially strong if players with very uncertain ratings participate in the game.

2. IIRC, teams are not balanced according to the rating mean value, as they should be, but according to the "shown rating", which is a delayed version of (mean value - x * standard deviation) with x > 0, plus some elo limitation fluff. So players who don't play often are underestimated by the balancer.
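With made-up numbers (the multiplier x = 1.5 is only an assumption here, not the real configuration value), the effect on an inactive player looks like this:

x = 1.5  # assumed multiplier, not the real configuration value

active   = {"mean": 2000, "stdev": 50}   # plays often, rating well known
inactive = {"mean": 2000, "stdev": 200}  # rarely plays, uncertain rating

def shown(p):
    return p["mean"] - x * p["stdev"]

print(shown(active), shown(inactive))  # 1925.0 vs 1700.0
# Balancing on the shown value treats the inactive player as 225 points
# weaker, even though the best estimate of both players' strength is equal.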

Furthermore, there is the effect that one team tends to get the medium players while the other team gets the best and the worst players. For small teams, it is obvious to me that this is the ideal solution. For big teams, though, it might be a problem with the algorithm: if there are many players, there are so many possible team configurations that checking only some of them is enough to find good balance. Maybe the algorithm checks the configurations with the worst or best players on one team first and then stops because win chances are already nearly 50%.
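To illustrate what I mean by stopping early (a Python sketch of the suspected behaviour, not the actual code):

def early_stop_balance(constellations, win_chance, epsilon=0.02):
    # Hypothetical early-stopping search: take constellations in whatever
    # order they are generated and return the first one that is "good
    # enough". If the generation order starts with the best/worst players
    # stacked on one team, that lopsided composition can win by default.
    for team_a, team_b in constellations:
        if abs(win_chance(team_a, team_b) - 0.5) < epsilon:
            return team_a, team_b
    return None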

Considering unequal player numbers and skill deviations has been suggested before. AFAIK, the analysis so far showed that their effect on win probabilities is not that big, but more analysis could be done there. For example, my theory is that it actually does have an effect, but that it is compensated by distortions in the rating distribution. In any case, reducing deviation is a nice extra if player numbers are big enough to reach win probabilities close to 50% anyway.
+1 / -0


2 years ago
ITrankmanero: So specifically, what function would you use?
Like, mathematically?

E.g. should the team with an extra player have a 200/[number of players in smaller team] ELO handicap for the purpose of balancing?
Should the algorithm try and minimize the highest square distance from the average ELO of all players in the lobby?
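To make those two options concrete, here is one possible reading of them as a scoring function (a Python sketch; the interpretation of "highest square distance" and where the handicap is applied are my assumptions):

from itertools import combinations

def split_score(elos, team_a, team_b):
    # Lower is better. The larger team's effective average gets a
    # 200 / (size of smaller team) ELO bonus, and the score is the larger
    # of the two teams' squared distances from the lobby-wide average.
    lobby_avg = sum(elos.values()) / len(elos)
    avg_a = sum(elos[p] for p in team_a) / len(team_a)
    avg_b = sum(elos[p] for p in team_b) / len(team_b)
    if len(team_a) != len(team_b):
        bonus = 200 / min(len(team_a), len(team_b))
        if len(team_a) > len(team_b):
            avg_a += bonus  # the extra player counted as extra ELO
        else:
            avg_b += bonus
    return max((avg_a - lobby_avg) ** 2, (avg_b - lobby_avg) ** 2)

def best_split(elos):
    # Brute-force the split with the lowest score.
    names = sorted(elos)
    splits = ((a, tuple(p for p in names if p not in a))
              for a in combinations(names, len(names) // 2))
    return min(splits, key=lambda s: split_score(elos, *s))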
+1 / -0
When players do not play as effectively as their rank indicates they can, the game will be unbalanced unless the same thing happens equally on the other team. In the games in question, I do believe Godde, for one, was trying tactics that were new to him, and of course those are going to be less effective, because he had less experience with them than with his normal tactics. He still did well, just not as well as his rank indicated he could.

This happens all the time in the lobpot, and people complain about it all the time. When high-ranked players try cheese styles, it impacts game balance even more. However, it's part of the joy of the lobpot, so eh, whatever.
+0 / -0

2 years ago
GBrankdyth68: Yes, that could also be a solution, assigning a handicap to the team that has one more player. The fact remains that the problem cannot be solved mathematically when, even with the same number of players, the abilities of 4 blue players are not the same as those of 4 yellow or orange players, with all due respect to them.
+0 / -0


2 years ago
Ok, if you think you have a better algorithm, test it against the existing data and see if it better predicts who wins/loses in team games. If it does, the current algo can be replaced.

Do you know how to do this? If not, someone in the zkdev channel can probably help.
+1 / -0
Based on past discussions in the forum, significant work went into trying and testing several options for balancing and rating algorithms. Everybody has ideas, but trying them and checking that they (don't) work is very hard and time-consuming. I really do hope that someone (you!) has a great idea, but it has to be proven on existing data.

The way I see it, ideally we would have a data set on which ideas can be easily tested (or is there such a thing already?). You could of course scrape the replay list, but that is even more work.

As an anecdote, I think I remember as many games where people said "balance is shit, no chance of winning" and then won as games where they said that and lost.

quote:
Do you know how to do this?
One idea would be to use the first N games to predict the result of game N+1, and do that for all games. Then, for the games where the algorithm under test produced a different balance, check whether the algorithm's prediction was correct or not.
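A sketch of that evaluation loop (the rating_system interface with predict() and update() is hypothetical, just to show the shape of the test):

def evaluate(rating_system, games):
    # Walk through the games in chronological order: predict each game from
    # ratings built on all earlier games only, then feed the outcome back in.
    # Returns the fraction of games in which the favoured team actually won.
    correct = total = 0
    for game in games:
        p = rating_system.predict(game.team_a, game.team_b)  # P(team A wins)
        if p != 0.5:
            total += 1
            correct += (p > 0.5) == game.team_a_won
        rating_system.update(game)
    return correct / total if total else 0.0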
+0 / -0


2 years ago
quote:
The way I see it, ideally we would have a data set on which ideas can be easily tested (or is there such a thing already?).

I believe NZrankesainane has such a dataset. Ask him?
+0 / -0
There are already well-established mathematical methods to evaluate the quality of rating systems, which CHrankAdminDeinFreund and I have tested on game data. Here are some relevant threads, sorted chronologically from old to new. Note that the newer ones are closer to the current rating system (WHR):
Alternative balance
!predict is wrong! - New Prediction System for Teams
Evaluating rating systems
Predictiveness of the new ELO Split in 1v1-4v4
ELOs again, how about some ultracompetition?
Whole History Rating (improved ELO)
Slides about Whole History Rating

Here you can find a game data set for testing from 2014 by @KingRaptor / MYrankAdminHistidine and from 2015 by CZrankAdminLicho. Here you can find the paper about the basics of WHR.
+3 / -0
quote:
to evaluate the quality of rating systems
I think there are 2 components: the rating system and the balancing. They influence each other in a complex way, but you could, for example, adjust only the balancing without changing the rating system.

Also, as discussed in previous threads, the question is what "quality" means. Some people might care about "perceived balance" rather than "numerical exactness", or about many other points, such as having 2 separate ratings for casual and matchmaker even though results for a combined ladder were better.

Edit: without checking, I wonder whether the 2014 data set shows the same patterns/behaviour as 2021 - team sizes/people/styles/the game might have changed quite a bit in the last 7 years.
+0 / -0
I didn't say the balance is shit, but sometimes it creates problems. I have happened to lose even 7 consecutive games to the same problem.
Most likely the problem occurs at that given moment because the presence of players with different quality of play (rank) produces a division into two teams which apparently seems the most correct one, i.e. by arithmetic average of the Elo, but which is not correct for the overall strength of the players themselves.
In summary, it would already be fine if, at the moment a game starts, the player who queued last in the lobpot were put on the waiting list whenever including him would lead to a numerical imbalance of players between the two teams. (P.S. By player quality I mean his ability to solve a problem during the game in the shortest possible time with the appropriate counter-move or an adequate strategy; by non-quality I mean someone who builds a Merlin on a water map, as I have unfortunately seen from a player with yellow rank.)
+0 / -0
2 years ago
I guess nobody disagrees that the overall system can be improved; the hard part is improving one aspect without making the system worse elsewhere.

My current win/loss streaks (I talk about myself, as I have the most experience with my own games :-p) are affected by two factors that AFAIK are not taken into account: map selection and team-mates' play style. I generally prefer to support larger areas/different situations (hence I play air a lot and like larger maps) and to help team-mates. This means that on small/porcy maps where everybody must play "within his lane" I enjoy the game less/feel I can do less.

Not sure why you had a 7-game loss streak, but there are also uncontrollable factors like smurfs, or good players trying troll (or new) strategies. I, for example, always try to play "as well as I can", but I have seen some (e.g. Firepluk, not to give examples of active players) who can play great on some days and on other days just build singus/pala/ramp/etc. in the back...
+1 / -0

2 years ago
I agree that it is difficult to add a function without it interacting with another. Perhaps, without changing things, you could adjust the classes, i.e. the parameter ranges that delimit a rank, perhaps adding another 2, especially at the lower end, with other colours, to grade players more finely based on their improvement. This would perhaps lead to an improvement in balance, with a better division of players between the teams.
+0 / -0
I agree that it would be good to distinguish low ranks better than by different shades of orange. But this does not influence balance, because the balancer operates on the continuous numbers behind the discrete ranks.

I think there are two major improvements for balance to be made:

1. Reduce "the fluff" that makes the shown rating deviate from the actual WHR mean value.

2. Look into how the code iterates through player constellations and optimize it to also minimize rating deviation, if this is possible without worsening equal win chances. Maybe the code can also be made more efficient so that more constellations can be checked without more computing power.

Edit:
I took a look at the code and found two balance functions: LegacyBalance and PartitionBalance. If the player number < DynamicConfig.Instance.MinimumPlayersForStdevBalance, it does PartitionBalance; otherwise LegacyBalance.

LegacyBalance goes through up to 2 million constellations recursively while also minimizing deviation, weighted by DynamicConfig.Instance.StdevBalanceWeight (very elegant, but not the most efficient).

PartitionBalance goes through all constellations iteratively. Since going through all constellations would be too costly with very many players (more than 38, resulting in 1 million constellations), it defaults to LegacyBalance in that case. This one is more efficient, but I didn't find any deviation minimization in it.

Now the question is what the values for DynamicConfig.Instance.MinimumPlayersForStdevBalance and DynamicConfig.Instance.StdevBalanceWeight are.
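For anyone who doesn't want to dig through the C#, here is roughly what a combined "win chance plus deviation" objective like that could look like (a Python sketch, not the actual Zero-K code; the scoring is simplified and the weight is just a parameter):

from itertools import combinations
from statistics import pstdev

def best_constellation(elos, stdev_weight):
    # Score each split by the Elo difference between the teams plus
    # stdev_weight times the difference in within-team spread, keep the
    # lowest. The real code caps how many constellations it checks; this
    # sketch just enumerates all of them.
    names = sorted(elos)
    best = None
    for team_a in combinations(names, len(names) // 2):
        team_b = [n for n in names if n not in team_a]
        elo_diff = abs(sum(elos[n] for n in team_a) -
                       sum(elos[n] for n in team_b))
        spread_diff = abs(pstdev([elos[n] for n in team_a]) -
                          pstdev([elos[n] for n in team_b]))
        score = elo_diff + stdev_weight * spread_diff
        if best is None or score < best[0]:
            best = (score, tuple(team_a), tuple(team_b))
    return best[1:]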

Commands
CmdBalance
ServerBattle
Balancer
PartitionBalance
MatchMaker
+3 / -0