Title: [A] Teams All Welcome (32p)
Host: Nobody
Game version: Zero-K 1.10.6.0
Engine version: 105.1.1-841-g099e9d0
Battle ID: 1403384
Started: 22 months ago
Duration: 40 minutes
Players: 23
Bots: False
Mission: False
Rating: Casual
22 months ago
Balance decides that 2 purple + 1 blue + noobs = 7 blue. Then a silver throws a hissy fit because his cerb died and goes pure storage. Really great game.


Balance should not be done numerically. It should pick the best player for one team, then the next two best players for the other team, then the next two for the first team, and so on. Not this bullshit where it puts all coppers on one team and all blues on the other.
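For concreteness, here is a minimal Python sketch of that snake-draft idea (purely illustrative, not the actual Zero-K balancer; the function name and the 1-2-2-2-... dealing order are just my reading of the suggestion):

```python
# Minimal sketch of the snake-draft suggestion above (illustrative Python,
# not the actual Zero-K balancer). Players are sorted by rating and dealt
# 1-2-2-2-..., alternating which team receives each pair.
def snake_draft(ratings):
    order = sorted(ratings, reverse=True)
    teams = ([], [])
    teams[0].append(order[0])      # best player to team 0
    receiver = 1                   # pairs then alternate, starting with team 1
    for i in range(1, len(order), 2):
        teams[receiver].extend(order[i:i + 2])
        receiver = 1 - receiver
    return teams

# The six ratings from the example a couple of posts below:
print(snake_draft([3300, 1803, 1802, 1801, 1800, 300]))
# -> ([3300, 1801, 1800], [1803, 1802, 300]), i.e. 6901 vs 3905 total rating
```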
+0 / -0
People suggest versions of this from time to time. It just never goes further because suggesting a system is just the first step. I don't think anyone is against trying stuff out. Someone needs to:
  • Pick a system.
  • Precisely specify it.
  • Check that the edge cases make sense.
  • Simulate it against past battles to check that it won't do anything too wrong.
  • Code it.
  • Make a pull request.
+2 / -0
22 months ago
Not numerically?!?
quote:
Check that the edge cases make sense.
So you would rather have

3300
1801
1800
vs
1803
1802
300

than

3300
1803
300
vs
1802
1801
1800

?
+1 / -0
Obviously that's an extreme example, but yes pure snake draft is going to produce some garbage games, even outside corner cases.

At the moment the balancer (as I understand it) searches a list of candidate solutions (based on elos and party/clan stacks) and selects the "best", measured by difference in rating average.

The lowest-effort improvement I can think of is to take the best N solutions from that list (in a large teams game, these should almost always be close to balanced anyway) and choose from them according to some cleverer criteria.
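Something along these lines, perhaps (a rough Python sketch of the "best N, then a cleverer tie-break" idea; hypothetical names, brute-force search, and not the actual C# balancer — the secondary criterion here, difference in rating standard deviation, is just one possible choice):

```python
# Sketch of "take the best N splits by average-rating difference, then pick
# among them by a second criterion". Illustrative only; the real balancer
# searches candidates differently and is written in C#.
from itertools import combinations
from statistics import mean, pstdev

def candidate_splits(ratings):
    n = len(ratings)
    idx = range(n)
    for team_a in combinations(idx, n // 2):
        team_b = [i for i in idx if i not in team_a]
        yield ([ratings[i] for i in team_a], [ratings[i] for i in team_b])

def pick_split(ratings, top_n=50):
    by_mean = sorted(candidate_splits(ratings),
                     key=lambda ab: abs(mean(ab[0]) - mean(ab[1])))
    shortlist = by_mean[:top_n]   # in a big room these are all nearly balanced
    # tie-break: prefer the split whose teams have similar rating spread
    return min(shortlist,
               key=lambda ab: abs(pstdev(ab[0]) - pstdev(ab[1])))

print(pick_split([3300, 1803, 1802, 1801, 1800, 300]))
```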
+2 / -0
Actually hang on. The spoiler says that team 1 had a 42% chance of victory. I suspect clans/parties were involved in creating this balance, as surely a more even win chance could have been created with 29 players.

See https://docs.google.com/spreadsheets/d/12juRYegGLi5y4JAbir_r8Hai0sylDIjFSFDwq6qf2Jo/edit?usp=sharing

(The replay may well be lying about ratings though).
+1 / -0
I do not trust the winrates listed on the site after the game.

EDIT: Searched Nightwatch logs for:

quote:
Zero-K 1.10.6.0 Comet Catcher Redux v3.1 (23+6/32), 8 hours ago:
23 players balanced ClanWise to 2 teams ( ( 1=50%) : 2=50%)). 299676 combinations checked, spent 201ms of CPU time
+4 / -0
22 months ago
Indeed, 50% chances before game start were to be expected. I think it would be worth looking into how the code iterates through player constellations that produce equal win chances. Here is a starting point.
+1 / -0
quote:
23 players balanced ClanWise to 2 teams ( ( 1=50%) : 2=50%))

I've never seen the pregame percentage say anything other than 50%. It must be some fallback value, because there's no way every single game is 50%.

I think the worst part of these shitty games is how the balancer assumes that the extra commander for the highest-ranking player is worth way more than it actually is. Having one less player should be considered a penalty and not a bonus.
+0 / -0
quote:
Zero-K 1.10.6.0 Terra 2 (5+1/32) 7 hours ago:
4 players balanced Normal to 2 teams ( ( 1=71%) : 2=29%)). 4 combinations checked, spent 0ms of CPU time

quote:
Zero-K 1.10.6.0 Into Battle v4 (9+5/32) 5 hours ago:
8 players balanced ClanWise to 2 teams ( ( 1=52%) : 2=48%)). 16 combinations checked, spent 0ms of CPU time

If the number of players in the room is large, the balancer will almost always manage to find teams with very nearly equal rating average, so you will almost always see 50% or occasionally perhaps 51% in large teams.

(Broadly speaking, this is because (a) the number of potential team assignments increases exponentially, and (b) even a randomly selected team assignment is increasingly likely to be pretty close to balanced as the number of players increases.)
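A quick brute-force illustration of point (b), using made-up ratings (nothing to do with the real balancer): as the room grows, the best achievable gap between team rating averages collapses towards zero, which is why the displayed prediction rounds to 50%.

```python
# For increasing room sizes, draw random ratings and report the smallest
# difference in team average rating over all equal splits. Illustrative only;
# random.gauss(1800, 300) is just an assumed rating distribution.
import random
from itertools import combinations

def best_average_gap(ratings):
    n = len(ratings)
    best = float("inf")
    for team_a in combinations(range(n), n // 2):
        a = [ratings[i] for i in team_a]
        b = [ratings[i] for i in range(n) if i not in team_a]
        best = min(best, abs(sum(a) / len(a) - sum(b) / len(b)))
    return best

random.seed(0)
for n in (4, 8, 12, 16):
    ratings = [random.gauss(1800, 300) for _ in range(n)]
    print(n, round(best_average_gap(ratings), 1))
```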

If I remember the code correctly, the "empty" player is assumed to have rating equal to the average rating of all players in the room. Perhaps it should count for less than that. I am not sure if anybody has run the numbers to figure that out.
+2 / -0
Actually it's not exponential but a Binomial coefficient (N choose floor(N/2))/(2 - N mod 2).

I also found another hint that standard deviation difference minimization is already applied.

quote:
If I remember the code correctly, the "empty" player is assumed to have rating equal to the average rating of all players in the room.
Rather the average of all players in their team.
quote:
Perhaps it should count for less than that. I am not sure if anybody has run the numbers to figure that out.
CHrankAdminDeinFreund has run the numbers and they show that the bigger team has an advantage. A better way to account for uneven teams has not been demonstrated yet, but more can be done on that. My theory is that the rating of players who tend to get 2nd coms is distorted to compensate, on average, for that mis-weighting. Maybe this is even the reason the highest casual ratings are lower than the highest competitive ratings.
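If someone wants to experiment, the knob being discussed could look something like this (purely hypothetical, not the current balancer): value the phantom player at the team average minus a configurable discount, so discount = 0 reproduces today's assumption and a positive discount makes the short-handed team look weaker.

```python
# Hypothetical illustration of the "count for less than the average" idea;
# not the current balancer code. discount = 0 is today's assumption.
def effective_team_rating(team_ratings, short_handed, discount=0.0):
    total = sum(team_ratings)
    if short_handed:
        avg = total / len(team_ratings)
        total += avg - discount   # phantom player worth a bit less than average
    return total

team = [2100, 1900, 1700]         # made-up ratings
for discount in (0, 100, 200):
    print(discount, effective_team_rating(team, True, discount))
```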
+0 / -0
quote:
Actually it's not exponential but a Binomial coefficient (N choose floor(N/2))/(2 - N mod 2).

According to Wikipedia this is ~ 2^(n)/(2*sqrt(pi*n/2)) for large even n; so there is a pesky sqrt(n) term in the denominator.
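For what it's worth, both formulas are easy to check numerically (illustrative Python; math.comb needs Python 3.8+):

```python
# Exact number of distinct two-team splits of N players,
# C(N, floor(N/2)) / (2 - N mod 2), versus the 2^N / (2*sqrt(pi*N/2))
# approximation quoted above (valid for large even N).
from math import comb, pi, sqrt

def exact_splits(n):
    return comb(n, n // 2) // (2 - n % 2)

for n in (8, 16, 24, 32):
    approx = 2 ** n / (2 * sqrt(pi * n / 2))
    print(n, exact_splits(n), round(approx))
```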

quote:
Rather the average of all players in their team.

I think this is (implicitly) correct for the LegacyBalancer code used for large teams. I was thinking of the PartitionBalancer which explicitly adds a dummy player before choosing teams. The practical difference is probably minimal.

quote:
My theory is that the rating of players who tend to get 2nd coms is distorted to compensate, on average, for that mis-weighting.

As a ballpark estimate, even the #1 player only gets an extra commander about 1/4 of the time.
According to DeinFreund's data, the winrate for the larger team in 4v5 and bigger uneven games is less than 53% (uneven games smaller than that are no longer played by default, though historical games do still affect rating).

By definition these games must be priced into the ratings of higher-rated players, but that seems hardly sufficient to explain the 500-600 rating gap between the top of the casual and competitive ladders. I think the simplest explanation is that a high-rated player influences much more of a 1v1 game than they do a teams game, even a 2v2, so they have more opportunities to turn superior skill into a W.
+0 / -0
21 months ago
You're right, the 2nd com rating distortion is nowhere near enough to explain the MM/casual rating difference; it can only be a small contribution. It now seems like the main contribution was FFA.

Now one can think about how the standard deviation and 2nd com handling could be improved, or whether the current solution is already fine. Do you know up to what player count the PartitionBalancer is used?
+0 / -0
21 months ago
Chance of victory: 37.1%

I knew it was really bad, it keeps getting worse.
+0 / -0