Matchmaker balancing for uneven team counts seems broken

Page of 2 (30 records)

Nobbester

2 years ago

It seems to me like the matchmaker heavily weights (maybe even double-counts) the elo of a player given multiple coms when teams have uneven player counts. See example battle I had today: https://zero-k.info/Battles/Detail/1724377

My team effectively has 1-2 competitive players going by elo while the other team has 3-4. So I get a combined penalty: fewer players on my team, and the players I do have are worse than the enemy's. This leaves me microing 2 commanders/factories worth of stuff and having to fight MORE than 2 players worth of enemies to win. To cap it off, I DON'T EVEN GET 2 PLAYERS WORTH OF INCOME DISTRIBUTED TO MYSELF!

If this is how the balancer must work, at least give the multiple commander player the resources to have a chance. As it is, my team had 1/6th-2/6th of our metal being used competently (again going off of elo) while the other team had 3/7-4/7. And more hands to micro that metal with. On a medium-large map, this difference is almost insurmountable regardless of how good the high-elo individual is.

I don't mind getting hard matchups (basically a given if you're at the higher end of the elo chart, esp when playing small teams), but the current situation feels blatantly unfair. The end result that I'm not a fan of is I either don't play small teams (which often means I don't play zero-k) or I condemn myself and my teammates to a likely loss. People even notice that the teams are unbalanced in these cases and that further kills morale/chances of pulling off a win.

I really feel like we should reconsider how the balancer works for uneven player counts and/or do resource distribution based on initial commander count. Or the brute force option: restrict the lobby to only allow even player counts under ~10v10. Above that player count it smooths out a bit better (more competent players on both teams, reclaim and such can make up for the income diff, etc.).

Either way I think the income distribution thing should be considered. Even in a 31v32 lobby if I want to play air the second fac/com is basically AFK unless I want to lose the early swift fights.

+3 / -0

malric

2 years ago

And numbers seem to support this opinion (that current situation is unbalanced), see the great analysis at http://zero-k.info/Forum/Thread/36555 , to quote "The results indicate that the larger team does have a moderate advantage depending on team sizes and the exact model"

+0 / -0

Kosynthary

2 years ago

https://zero-k.info/Battles/Detail/1711089
https://zero-k.info/Battles/Detail/1659966
https://zero-k.info/Battles/Detail/1625455
https://zero-k.info/Battles/Detail/1656801
https://zero-k.info/Battles/Detail/1653856
https://zero-k.info/Battles/Detail/1661527

I agree, give the person with 2 coms double the income. Why do I get 2 factories if I barely have the metal for 1?

+1 / -0

Helwor

2 years ago

I made a widget that shows the difference between the teams' average elo (no matter the player count), and every game it's the same.
That diff is always very low, often between 0 and 30 in large games.
So most likely the balancer tries every combination, calculates the average elo of the potential teams, and returns the combination with the smallest difference between those averages.

So to illustrate the problem, here is a simple example:

playerA 2500elo, playerB 2000elo, playerC 1800elo

B+C avg: 1900 -> 600 elo diff with playerA
A+B avg: 2250 -> 450 elo diff with playerC
A+C avg: 2150 -> 50 elo diff with playerB

=> balancer would put playerB with 2 coms vs playerA and playerC => best combination with 50 average elo diff
The problem is that playerB is counted as 2 players of 2000 elo while he is just slightly above playerC and totally behind playerA.

PlayerB in my view doesn't stand a chance, yet the balancer sees only a 50 elo diff and considers that a somewhat even match.
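For anyone who wants to check the arithmetic, here is a minimal Python sketch of the rule guessed at above: try every split and keep the one with the smallest gap between team averages. It only illustrates that guess and is not the actual balancer code; `best_split` is a made-up helper. Note that it reports 150 for A+C vs B (2150 - 2000) rather than 50, as corrected in the follow-up post, and that split is still the one picked.

```python
from itertools import combinations
from statistics import mean

# Ratings from the example above.
ratings = {"playerA": 2500, "playerB": 2000, "playerC": 1800}

def best_split(ratings, small_size):
    """Return (diff, small_team, large_team) with the smallest average-elo gap.

    Hypothetical helper: it assumes the balancer simply minimises the
    difference between the two teams' average ratings.
    """
    names = sorted(ratings)
    best = None
    for small in combinations(names, small_size):
        large = tuple(n for n in names if n not in small)
        diff = abs(mean(ratings[n] for n in small) - mean(ratings[n] for n in large))
        # Strict "<" keeps the first split found when several tie.
        if best is None or diff < best[0]:
            best = (diff, small, large)
    return best

diff, small, large = best_split(ratings, 1)
print(small, "vs", large, "-> avg diff", diff)
```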

My solution would be to count the smaller team's highest-elo player as himself plus another guy with 2/3 of his elo.

which makes:

B+C avg: 1900 vs playerA counted as (2500 + 1666)/2 = 2083 -> 183 elo diff
A+B avg: 2250 vs playerC counted as (1800 + 1200)/2 = 1500 -> 750 elo diff
A+C avg: 2150 vs playerB counted as (2000 + 1333)/2 = 1666 -> 484 elo diff

The balancer would put playerA (2500 elo) alone with 2 coms and assume a 183 elo diff, which is closer to reality.
I think that calculation would give a fairer balance for uneven teams.
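The same three numbers can be reproduced with a short Python sketch. It assumes, as proposed above, that the solo player counts as himself plus a phantom teammate worth 2/3 of his rating; `adjusted_diff` and `second_com_factor` are names made up for this illustration. The playerB case comes out as 483 rather than 484 only because the figures above round 1666.67 down before subtracting.

```python
# Ratings from the example above.
ratings = {"playerA": 2500, "playerB": 2000, "playerC": 1800}

def adjusted_diff(ratings, solo, second_com_factor=2 / 3):
    """Average-elo gap when `solo` plays alone with two commanders.

    Assumption from the post above: the solo player is counted as himself
    plus a phantom teammate worth `second_com_factor` of his rating.
    """
    solo_rating = ratings[solo]
    solo_avg = (solo_rating + solo_rating * second_com_factor) / 2
    others = [r for name, r in ratings.items() if name != solo]
    return abs(sum(others) / len(others) - solo_avg)

for name in ratings:
    print(name, "alone -> adjusted diff", round(adjusted_diff(ratings, name)))

# The player the balancer would leave alone under this rule.
solo_pick = min(ratings, key=lambda name: adjusted_diff(ratings, name))
print("double-com player:", solo_pick)
```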

+1 / -0

Helwor

2 years ago

Correction: the average diff between A+C and B is 150, not 50, but the reasoning is the same.

+0 / -0

Nobbester

2 years ago

Appreciate the extra data from malric & kos.

Helwor I agree it's likely doing something like that when balancing and that's not appropriate. I really don't think scaling com #2 weighting to 2/3 is enough as is. That says that giving me a second com/fac at the start (and still my normal share of team metal) is equivalent to adding a player at 2/3 my elo to the other team. Or more realistically 2 players at 5/6 my elo.

I don't see how a player of elo X w/ 2 coms, 2 facs, and 1+epsilon metal income against 2 players of elo 5/6*X w/ 1 com, 1 fac, and 1 metal income each is remotely balanced. Team of single player also gets a bit more metal each but that's a wash when they are bad. Maybe even a negative if they feed.

I can understand they may not want to mess with elo algo anyway which is why I think just giving the player their 2x share of income is a good first step. Follow that with just considering locking out uneven teams for smaller lobbies.

+1 / -0

Kosynthary

2 years ago

I believe it should be the other way around. Why are we even giving more chances of winning to the other team just because I have 2 coms in the first place? They are not the victim here. It's like giving a cane to the guy that can walk meanwhile I'm the one with a broken leg.

+0 / -0

skuggmodzer0

2 years ago

That is a fascinating observation Helwor.

Without having done any work to understand the algorithm or why it does that ... do you know why it tries to find best average? This is identical to the sum, with the minor difference of dividing by the total number of players. Stop dividing, and be happy?

+0 / -0

Helwor

2 years ago

Nobbester I agree about the income problem, but that is a separate feature from deciding who goes with whom, which is what the balancer does. I was only talking about that.

Even with the income fixed (as imo it should be), the problem would remain, because 1 guy with 2 players' income/com/fac doesn't have the apm and awareness of 2 guys of that same elo playing at the same time.
Hence he should be counted at a lesser elo, which in turn might change who ends up with 2 coms, as shown in my example.

On the side, I would add that the rating system is also wrong in not considering whether teams are small or big.
When teams are small, each player's actions count, while in big teams you can just sit in the back and still win doing nothing, and have decent elo.
This ends up unbalancing small-team games: some players with decent elo in big teams are totally outclassed in small teams, mainly because they never raid or expand...
To have a more decent balancer, imho, there should be 2 separate rating, one for
small and one for big (over 5v5).

+0 / -0

Brackman

2 years ago

Why do you think that a player with multiple coms would get a higher weighting in the balancer? AFAIK, this is not the case.

quote:
To have a more decent balancer, imho, there should be 2 separate rating, one for
small and one for big (over 5v5).

It has already been shown that this would actually worsen the balancer because each of the separate ratings would be based on less data.

quote:
Stop dividing, and be happy?

No! This would violate the invariance of the elo scale under additive shifting transformation. It would mess up the whole calculation arbitrarily and massively overcompensate the larger team advantage described by dunno.
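A quick numeric check of the invariance point, as a Python sketch with made-up ratings. It assumes "additive shifting" means adding the same constant to every player's rating, which should not change how balanced a match looks. For a 3v4 split, the average difference survives the shift while the sum difference does not.

```python
def avg_diff(a, b):
    """Difference between the two teams' average ratings."""
    return sum(a) / len(a) - sum(b) / len(b)

def sum_diff(a, b):
    """Difference between the two teams' rating sums."""
    return sum(a) - sum(b)

def shift(team, k):
    """Add the same constant k to every rating in the team."""
    return [r + k for r in team]

# Hypothetical 3v4 lobby.
small = [2500, 2000, 1500]
large = [2000, 1500, 1500, 1000]

for k in (0, 1000):
    s, l = shift(small, k), shift(large, k)
    print("shift", k, "avg diff", avg_diff(s, l), "sum diff", sum_diff(s, l))
```

With equal team sizes the sum difference would be unaffected too; the dependence on the arbitrary zero point only shows up when the player counts differ.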

Also, you can use my predicter widget.

+1 / -0

Nobbester

2 years ago
(edited 2 years ago)

Brackman I'm primarily going off of personal experience that the team balance is heavily shifted from typical in the case of small, uneven teams with multiple high-elo players. I suggest you look at the team composition in the battle I linked and some that Kosynthary shared. The more statistically valid sample malric shared seems to indicate this is a consistent issue. If you can point me at the exact balancing logic we use I would be happy to take a look - I'll probably go find it myself at some point.

I generally agree with you that we are likely better off not modifying the actual elo system given its desirable properties. I'm just perceiving that the way we are using that elo balancing seems questionable in this fairly common scenario. Hence the suggestions towards balancing in-game resources and restricting team sizes to those that seem to function more reasonably.

+0 / -0

malric

2 years ago

Note: the analysis I shared was performed by dunno!

quote:
in the case of small, uneven teams

It would probably be best to use numbers. When I read "small" I think fewer than 6 players. You seem to also consider 11 players small. Neither is wrong, but it can be confusing for anyone reading.

quote:
If you can point me at the exact balancing logic we use I would be happy to take a look

dunno's analysis already had a conclusion:

quote:
Summary: Uneven team games should probably be balanced such that the smaller team has about 50-200 higher average Elo than the larger team

If there would be agreement to try implementing that conclusion, I guess the code change will not be extremely hard. I looked through the analysis but I did not do a thorough review, someone working more with statistics could probably do it easier than me (@Brackman?..)

+1 / -0

Helwor

2 years ago

skuggmodzer0

I have no idea why; I just noticed it because I modified a player list widget to show me directly the elo of each player and the diff between the teams' average elo.
After many games, it became obvious to me that I had found out by luck how the balancer works.

If you stop dividing that would end up like this:

---
Consider uneven teams of 7 players, elos: 2500, 2000, 2000, 1500, 1500, 1500, 1000

with lowest sum diff:
team1 3 players: 2500, 2000, 1500
team2 4 players: 2000, 1500, 1500, 1000
sum diff = 0 (6000 vs 6000)

with lowest average:
team1 3 players: 2500, 1500, 1000
team2 4 players: 2000, 2000, 1500, 1500
avg diff = 84 (1666 vs 1750)
---

So, no, I think the balance would be even worse with minimum sum diff.
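The two splits above can be reproduced with a small brute-force sketch in Python. `best_by` is a made-up helper and says nothing about how the real balancer searches. It also makes the downside of the sum rule visible: the "sum diff = 0" split has team averages of 2000 vs 1500.

```python
from itertools import combinations

# The 7 ratings from the example above.
elos = [2500, 2000, 2000, 1500, 1500, 1500, 1000]

def best_by(elos, small_size, metric):
    """Brute-force the split of `elos` that minimises metric(small, large)."""
    best = None
    for idx in combinations(range(len(elos)), small_size):
        small = [elos[i] for i in idx]
        large = [e for i, e in enumerate(elos) if i not in idx]
        score = metric(small, large)
        # Strict "<" keeps the first split found when several tie.
        if best is None or score < best[0]:
            best = (score, small, large)
    return best

def sum_diff(a, b):
    return abs(sum(a) - sum(b))

def avg_diff(a, b):
    return abs(sum(a) / len(a) - sum(b) / len(b))

print("lowest sum diff:", best_by(elos, 3, sum_diff))
print("lowest avg diff:", best_by(elos, 3, avg_diff))
```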

+1 / -0

Brackman

2 years ago
(edited 2 years ago)

I'm aware of the problem as described by dunno (and linked by malric). There are multiple possible solutions. I just wanted to exclude those that would obviously not work.

quote:
If you can point me at the exact balancing logic we use I would be happy to take a look - I'll probably go find it myself at some point.

Essentially, it minimizes the difference between the teams' WHR averages. The balancer uses ladder rating, the predicter uses actual rating. For the details of the code, look at the links at the bottom of this post.

Helwor's 2/3 suggestion would also violate the invariance of the elo scale under additive shifting transformation. According to dunno's data, Helwor's suggestion would overcompensate the effect for some cases and undercompensate it for other cases.

I suggest that those who want to make useful suggestions for solutions should first understand dunno's analysis.

quote:
dunno's analysis already had a conclusion:
quote:
Summary: Uneven team games should probably be balanced such that the smaller team has about 50-200 higher average Elo than the larger team

If there would be agreement to try implementing that conclusion, I guess the code change will not be extremely hard. I looked through the analysis but I did not do a thorough review, someone working more with statistics could probably do it easier than me ( Brackman ?..)

Yes, this would be a decent solution and rather easy to implement. The question is only how the epad vector should be calculated exactly.
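As a rough illustration of what that could look like, here is a Python sketch that picks the split where the smaller team's average is closest to a fixed offset above the larger team's average. `balance_with_offset` and the constant `offset=100` are placeholders invented for this example; how the offset should depend on team sizes is exactly the open question, and this is not the real balancer code. The ratings are the 7 from Helwor's example.

```python
from itertools import combinations

elos = [2500, 2000, 2000, 1500, 1500, 1500, 1000]

def balance_with_offset(elos, small_size, offset=100):
    """Pick the split where the smaller team's average rating is closest
    to `offset` points above the larger team's average.

    `offset` is a placeholder from the 50-200 range quoted above.
    """
    best = None
    for idx in combinations(range(len(elos)), small_size):
        small = [elos[i] for i in idx]
        large = [e for i, e in enumerate(elos) if i not in idx]
        gap = sum(small) / len(small) - sum(large) / len(large)
        score = abs(gap - offset)
        # Strict "<" keeps the first split found when several tie.
        if best is None or score < best[0]:
            best = (score, small, large)
    return best

print("offset 0:  ", balance_with_offset(elos, 3, offset=0))
print("offset 100:", balance_with_offset(elos, 3, offset=100))
```

With `offset=0` this reduces to plain average balancing. Because the offset is subtracted from a difference of averages, shifting every rating by the same constant leaves the chosen split unchanged.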

+1 / -0

Aquanim

2 years ago
(edited 2 years ago)

The relevant code is somewhere around here.

The balancer does not count the highest-rated player's rating more than once. I think, under different circumstances, it can do either of the following:

-Minimise the difference in the average ratings of both teams, weighted a bit by the standard deviation of ratings in each team. Not sure what that weighting is presently. Could be zero for all I know.
-Add a 'dummy' player with rating equal to the average rating of all players in the room to achieve an even number of players, then minimise difference in average ratings of both teams.

If the teams are balanced then the dummy player will have more-or-less the same average rating as both teams, so (besides whatever accounting for standard deviation takes place) there is little practical difference.
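A minimal Python sketch of the dummy-player variant, ignoring any standard-deviation weighting. `balance_with_dummy` is a hypothetical helper written for this illustration and is not taken from the actual code. Run on the 7 ratings from Helwor's example, it picks the same split as plain average balancing.

```python
from itertools import combinations

def balance_with_dummy(elos):
    """Split an odd-sized lobby by padding it with an average-rated dummy.

    Hypothetical sketch: the dummy's rating is the mean of all real players,
    and the split minimises the gap between the two (now equal-sized) teams.
    Returns (gap in average rating, smaller real team, larger real team).
    """
    dummy = sum(elos) / len(elos)
    half = (len(elos) + 1) // 2
    total = sum(elos) + dummy
    best = None
    # Only build teams that contain the dummy, so each split is seen once.
    for rest in combinations(range(len(elos)), half - 1):
        team_sum = dummy + sum(elos[i] for i in rest)
        # Equal team sizes, so the gap in sums divided by size is the gap in averages.
        gap = abs(team_sum - (total - team_sum)) / half
        if best is None or gap < best[0]:
            small = [elos[i] for i in rest]
            large = [e for i, e in enumerate(elos) if i not in rest]
            best = (gap, small, large)
    return best

print(balance_with_dummy([2500, 2000, 2000, 1500, 1500, 1500, 1000]))
```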

I think that the present accounting for imbalanced teams is not very good, particularly for small teams, but it is nontrivial to find and implement a convincing improvement. (As in, I expect that improvements exist, but proving that they are improvements is unlikely to be easy.)

The impact of double-com on a game is influenced by many things such as the size of the map, the number of lanes on the map, whether both teams have at least one competent player for each lane, whether both teams have a competent air player other than the double-com player, etc, etc. Even more fundamentally, some players are considerably better at managing double-com than others.

+0 / -0

Helwor

2 years ago

Brackman
There are already 2 separate ratings for 1v1 and teams (well, actually for MM and casual, but ppl use MM for 1v1 mostly).
I don't see how more data is better if that data contradicts itself (unless it already takes team size into account??); that actually sounds worse to me.
I do mostly small teams or 1v1. My elo fluctuated from 2.9K to 2.6K and back from one day to the next several times because I was playing 1v1 to 5v5 on casual. The 'accuracy' of elo is very debatable imho, and I still think having a 'small teams rating' derived from the casual rating would be better for balance's sake.
We are at a point where ppl think twice before joining a game trying to guess how bad this will be for their elo if they lose.
That's how accurate it is.

+0 / -0

Aquanim

2 years ago

As I recall the evidence which showed that separating out "small teams" or "ffa" into their own rating category, *without interaction with the other categories*, would diminish rating accuracy was pretty convincing.

Not sure if anybody ever ran the numbers on a "small teams" rating (et cetera) which weighted similar-size games more strongly but did not altogether throw out other games.

+0 / -0

Nobbester

2 years ago

Ok I'll definitely further review everything shared so far before I comment much further. Definitely want to go back through and do the math on some of the games shared to see how it is working out in practice. Appreciate all the info shared.

+0 / -0

Helwor

2 years ago

Aquanim
Well, I would think basically and simply:
-keeping the casual rating as basis,
-noting the difference in success when it comes to small teams,
-and giving a bonus/malus to the elo according to that difference.
So no, in my view it would not be an isolated rating.

+0 / -0

skuggmodzer0

2 years ago
(edited 2 years ago)

quote:
So, no, I think the balance would be even worse with minimum sum diff.

Thanks for the clarification. I'm so unfamiliar with the system that I had wondered if the average was even the most suitable thing to seek to equalize; your original 2500-2000-1800 example showed it functioning quite badly. Just questions from the audience, you know. (BTW not asking for more clarification.)

+0 / -0
