
!predict is wrong! - New Prediction System for Teams

34 posts, 2574 views
Page of 2 (34 records)
So here's the post. Normally in an elo system the expectation value of the elo change due to a game is zero (approximately true for 1v1). In team games, however, !predicted win chances are not distinct enough (too close to 50%). You can therefore increase your team elo systematically by only playing games with a !predicted win chance above 50%, because the real win chance is even higher.

quote:
I want you (...) to come up with something more complex to prove that springie balance is fundamentally wrong
RUrankYogzototh
quote:
We should make this into a mini competition.. someone provide a standard data set of eg. 1000 3v3+ games in an easy to use Excel format and we all apply krazy math to achieve the best prediction.
USrank[GBC]1v0ry_k1ng

How to evaluate prediction systems
In 2014 @KingRaptor published a file with 1195 team games (after 1v1, ffa, coop and uneven teams have been deleted).

I searched for a better evaluation method for prediction systems (one where unbalanced games are considered and the best strategy is not always guessing 100%). Fortunately I found ingenious scoring rules that punish false high-probability predictions hard, and I have proven that the expected score is maximized by guessing the true probability:
[Spoiler]
Let me know if I should prepare a file (with/without uneven teams, with/without my results) to test your own prediction systems (it includes not only logarithmic but also Brier scoring, transformed to a scale from 0 to 1).

Balance vs. prediction
Team balance is how players are distributed to the teams. Changing the prediction system doesn't change team balance directly, only indirectly by changing elo wins and losses. The systems presented here don't change balance directly; my more sophisticated but untested systems do. A term that also minimizes standard deviation can be added to any system like this and will only change balance directly, not predictions. Testing a prediction system only really makes sense on teams that were balanced with the corresponding balance system. Other systems will probably predict better on their own balance than a test on the current balance suggests (especially my more advanced ones, which change balance not only indirectly but also directly).

Concrete results
!predict has reached an average trans log score of 0.079 on 1195 games, "sqrt sm" has 0.098 and "smes" 0.100 (where 0 is always guessing 50%, 1 is guessing everything 100% right and minus infinity is guessing even one thing 100% wrong). The 2-normed average of the differences in probability predictions between !predict and "smes" is 9%, between "sqrt sm" and "smes" only 2%, even though "smes" is of another type. For example, in Multiplayer B245589 20 on Cooper_Hill_TNM02-V1 !predict gave the winning team 54.6%, "sqrt sm" 64.2% and "smes" 71.6%.
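The exact scoring formulas are hidden in the spoiler above, so the following is only a sketch under an assumption: a transformed logarithmic score of 1 + log2(p) and a transformed Brier score of 1 - 4(1-p)^2, where p is the probability given to the team that actually won. Both match the scale described here (0 for a 50% guess, 1 for a correct 100% guess), but the function names and the precise transformations are mine, not necessarily the ones used for the numbers in this post.

```python
import math

def trans_log_score(p_winner):
    """Assumed transformed log score: 0 at p=0.5, 1 at p=1, -inf at p=0."""
    if p_winner <= 0.0:
        return float("-inf")
    return 1.0 + math.log2(p_winner)

def trans_brier_score(p_winner):
    """Assumed transformed Brier score: 0 at p=0.5, 1 at p=1."""
    return 1.0 - 4.0 * (1.0 - p_winner) ** 2

def average_score(score, winner_probs):
    """Average a scoring rule over the probabilities given to the real winners."""
    return sum(score(p) for p in winner_probs) / len(winner_probs)

def expected_log_score(guess, true_p):
    """Expected log score when guessing `guess` while the true win chance is `true_p`."""
    return (true_p * trans_log_score(guess)
            + (1.0 - true_p) * trans_log_score(1.0 - guess))
```

With a true win chance of 70%, `expected_log_score(0.7, 0.7)` is higher than both `expected_log_score(0.5, 0.7)` and `expected_log_score(0.9, 0.7)`, which is the "guessing the true probability is best" property mentioned above.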

"smes" system
"smes" means size dependently modified elo sum system. It is based on elo sums and is the best system tested on those 1195 games. n is the average size of the 2 opposing teams (for example 2.5 in a 2v3). The probability p is the win probability according to the normal elo formula with elo sums instead of elo averages. It has been shown that calculating "elo sums" by n*(average elo) is better for uneven teams than calculating real elo sums. The finally predicted probability is then (p^D)/(p^D+(1-p)^D), where D is the distinctivity modifier that depends on n (D>1 increases prediction distinctivity, D<1 decreases it) with D(n=1)=1 (to leave 1v1 unchanged). Here we use D=0.5+0.5^n.
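A minimal sketch of the "smes" steps as described above. It assumes that "the normal elo formula" means the usual 1/(1+10^(difference/400)); the function names are made up for illustration and this is not the code that produced the results.

```python
def elo_win_probability(rating_a, rating_b):
    """Standard elo expectation for side A (assumed to be the 'normal elo formula')."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def apply_distinctivity(p, d):
    """(p^D) / (p^D + (1-p)^D): D > 1 pushes p away from 50%, D < 1 pulls it closer."""
    return p ** d / (p ** d + (1.0 - p) ** d)

def smes_probability(team_a, team_b):
    """Win probability of team_a, given two lists of player elos."""
    n = (len(team_a) + len(team_b)) / 2.0   # average team size, e.g. 2.5 in a 2v3
    sum_a = n * sum(team_a) / len(team_a)   # "elo sum" as n * (average elo)
    sum_b = n * sum(team_b) / len(team_b)
    p = elo_win_probability(sum_a, sum_b)
    d = 0.5 + 0.5 ** n                      # D(n=1) = 1 leaves 1v1 unchanged
    return apply_distinctivity(p, d)
```

For a 1v1 this returns exactly the plain elo probability, and for larger teams the elo-sum probability is pulled back towards 50% because D drops below 1.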

"sqrt sm" system
When adding uneven team games (1726 games in total), "sqrt sm" seems to do better, but "smes" and "sqrt sm" have similar results and both are better than !predict. "sqrt sm" simply takes the !predicted probability and modifies it with D=sqrt(n).
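As a sketch, "sqrt sm" can be written as a single function on top of whatever !predict returns. The function name and the argument layout are assumptions for illustration; the formula is the (p^D)/(p^D+(1-p)^D) modification with D=sqrt(n) from the text.

```python
import math

def sqrt_sm_probability(p_predict, size_a, size_b):
    """Modify a !predicted win probability with D = sqrt(n)."""
    n = (size_a + size_b) / 2.0   # average team size
    d = math.sqrt(n)              # equals 1 for 1v1, so 1v1 stays unchanged
    return p_predict ** d / (p_predict ** d + (1.0 - p_predict) ** d)
```

For the 20-player game quoted above, `sqrt_sm_probability(0.546, 10, 10)` comes out at roughly 0.642, in line with the 64.2% given for "sqrt sm" (n is 10 for any split of 20 players).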
+4 / -0
So, given that this data only matters as far as it can be used to set up teams that actually have a 50% win rate, do you have a fast algorithm that will achieve a 50% win rate for teams according to the smes or sqrt sm approach?
+0 / -0
9 years ago
The smes and sqrt sm algorithms are about as fast as the current one. But they don't change balance directly, only indirectly by changing elo wins and losses. Currently you will lose lots of elo even though you had a high probability to lose, which !predict doesn't know. Elos will be a little more accurate and thus balance a little bit better. If you really want to change balance, there are still my more advanced systems. But for now it will be good to have at least better predictions.
+1 / -0


9 years ago
I'm going to generate a slightly larger set for you to run on to solidify the glorious victory.

+3 / -0
Skasi
quote:
It has been shown that calculating "elo sums" by n*(average elo) is better for uneven teams than calculating real elo sums.

Did you take into account the fact that some weeks ago resource distribution was changed? If I remember correctly, (high elo) players with two commanders used to receive a two-player income (or at least personal metal from the second commander). This is no longer true (unless the playerlist happened to show numbers far off for other reasons like reclaim/...).

Also, have you tested counting players with two commanders twice? As in 2000+1000=3000 elo team vs 1500+1500+1500=4500, here the 2000 elo player counts twice, resulting in a 5000 elo team. Again, this might no longer be as accurate in newer ZK replays as it might've been with older replay data.
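The arithmetic of that suggestion, as a small sketch. The function name and the index argument are hypothetical; which player actually holds the second commander would have to come from the replay data.

```python
def team_elo_sum(elos, double_commander_index=None):
    """Sum of player elos, counting the player with two commanders twice."""
    total = sum(elos)
    if double_commander_index is not None:
        total += elos[double_commander_index]   # second commander counts again
    return total

small_team = team_elo_sum([2000, 1000], double_commander_index=0)
large_team = team_elo_sum([1500, 1500, 1500])
```

Here `small_team` is 5000 and `large_team` is 4500, the numbers from the example above.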
+1 / -0

9 years ago
quote:
Did you take into account the fact that some weeks ago resource distribution was changed?

nope
quote:
In 2014 KingRaptor published a file


quote:
I'm going to generate a slightly larger set for you to run on to solidify the glorious victory.

Just give him all battles (where applicable)?

+0 / -0
I am in favor of a new elo balancing system that encourages uneven teams.

Those games are more fun for both teams, as often the popular majority low elo players get bunched together and feel a sense of camaraderie, while the minority higher elo players get bunched together and, while outnumbered by two or three players, feel a sense of camaraderie too.
+1 / -0

9 years ago
quote:
I am in favor of a new elo balancing imbalancing system that encourages uneven teams.

ftfy
+2 / -0
ty sprung :D

i guess the better way to say it is i am in favor of a balancing system that balances and splits teams skill fairly and fun-ly regardless of the number of players being odd or even or significantly disproportionate.

Fun-fair balancing would look at team elo additively with a split-bell-curve elo arrangement of available players going on.
+1 / -0


9 years ago
  • Did you replicate the existing system first? That means calculating elo progress over time for each player, using ZK's weighted elo?
  • Did you compare ZK's weighted elo average to your predictor and see that your predictor is better than ZK's? That means it predicted the winner more reliably than weighted elo?

If the answer to these questions is yes, I will implement this and run it on the full history of hundreds of thousands of battles.
+0 / -0
quote:
Just give him all battles (where applicable)?

That was the intention.
+0 / -0
Here is a slightly larger CSV listing 51695 ZK teamgames of at least 6 players in size, with winner and loser player id lists for each.

Here is the exact methodology used to obtain this data:
quote:

Select distinct SBP2.SpringBattleID, SpringBattles.PlayerCount, SpringBattles.Title,
substring(
(
    Select ','+CAST(SBP1.AccountID AS VARCHAR) AS [text()]
    From SpringBattlePlayers SBP1
    Where SBP1.SpringBattleID = SBP2.SpringBattleID
    AND SBP1.IsInVictoryTeam = 1 AND SBP1.IsSpectator = 0
    For XML PATH ('')
), 2, 1000) [Winners],
substring(
(
    Select ','+CAST(SBP1.AccountID AS VARCHAR) AS [text()]
    From SpringBattlePlayers SBP1
    Where SBP1.SpringBattleID = SBP2.SpringBattleID
    AND SBP1.IsInVictoryTeam = 0 AND SBP1.IsSpectator = 0
    For XML PATH ('')
), 2, 1000) [Losers]
From SpringBattlePlayers SBP2 LEFT JOIN SpringBattles ON SBP2.SpringBattleID = SpringBattles.SpringBattleID
WHERE IsFfa = 0 AND isMission = 0 AND HasBots = 0 AND PlayerCount > 5


Verify that i didn't do anything silly, and you can have a conclusive run.
+0 / -0
[Spoiler]
quote:
* Did you replicate existing system first? That means calculating elo progress over time for each player over time, using ZK's weighted elo?
* Did you compare ZK's weighted elo average to your predictor and see that your predictor is better than ZK's? That means it predicted the winner more reliably than weighted elo?
@KingRaptor's data already included players' elos and teams' averages. I don't know whether it considered weighting and progress with the current system, but I doubt it. I don't even know how the weighted average is calculated currently nor how elo wins and losses are distributed within a team with weightings. If @KingRaptor's data already included it (unlikely), my systems will probably get even better when using their own elo progress. If not, all systems will probably become a bit better and we don't know by how much. In any case my systems could use the same weightings.

I only know that the total elo win (loss) of a team with winning (losing) probability p must be (1-p)*f(p), where f>0 is any function with a line of symmetry at 50% (e.g. f(p)=c*(p*(1-p))^z), to guarantee that the expectation value of the elo change is zero. Probably z=0, f=c=30 is used currently. I still don't know how weighted average and weighted elo change are calculated, though.
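A quick numeric check of that zero-expectation claim, as a sketch. The defaults c=30 and z=0 are the guess from the paragraph above, not confirmed ZK values, and the function names are made up.

```python
def f(p, c=30.0, z=0.0):
    """Example symmetric factor f(p) = c * (p * (1 - p))^z, so f(p) == f(1 - p)."""
    return c * (p * (1.0 - p)) ** z

def elo_gain_on_win(p_win, c=30.0, z=0.0):
    """Team elo won by a team that had winning probability p_win."""
    return (1.0 - p_win) * f(p_win, c, z)

def elo_loss_on_loss(p_win, c=30.0, z=0.0):
    """Team elo lost by the same team: its losing probability was 1 - p_win."""
    p_lose = 1.0 - p_win
    return (1.0 - p_lose) * f(p_lose, c, z)

def expected_elo_change(p_win, c=30.0, z=0.0):
    """p * gain - (1 - p) * loss, which cancels to zero when f is symmetric."""
    return (p_win * elo_gain_on_win(p_win, c, z)
            - (1.0 - p_win) * elo_loss_on_loss(p_win, c, z))
```

With the defaults, a 50% game moves 15 elo either way, a 70% favourite wins 9 and loses 21, and `expected_elo_change` stays at zero for any p as long as the predicted p is the true one.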

Thanks for the data, EErankAdminAnarchid. Should I assume that all players have elo 1500 in the beginning? If you can include the elos that the players had at the time of those games and their weighted team average, I don't have to write my own program for the test. Recalculating and storing all players' elos will be quite extensive. Ideally there would be 2 columns for the teams' weighted averages and 32 columns for the player elo slots, where the losing team always starts at the 17th of those columns like in @KingRaptor's data. If "smes" or "sqrt sm" turn out better than the current system even when using the current elo progress, they will probably be even better when using their own. But for testing them with their own progress it might be easier to use existing zk infrastructure than to code new programs.
+0 / -0
Then-elo is not available in the database as far as i can see, so it has to be recalculated or extracted from the replay files. The provided dataset does not contain any elo information, only lists of account ID's of winners and losers.

Extracting the information from replays is not currently possible because i can't find the replays from 2014. I think KR's data was obtained using the "Fixer" program, which basically recomputes elo from historic data.

It is possible, though, to obtain elo changes for each player in each battle.

On a practical/technical note, you can use a local database to store intermediate results, so it won't be that expensive to keep a running rating for everyone.
+0 / -0
Yes of course. It would only need some coding time. Can the "Fixer" or other existing programs be used for it?

What are the current formulas for weighted average and elo change?
+0 / -0


9 years ago
be very interested to see outcome of this!
+0 / -0
Can you order the games chronologically (by SpringBattleID)?

Not all FFAs have been deleted, for example Multiplayer B81537 7 on hotstepper. Maybe leave out games with more than 2 teams? There are also some silly maps, Dota and lots of PW games, but i think this will be ok for testing purposes. You don't have to demand PlayerCount > 5. PlayerCount > 2 will be enough, because the new systems are for any game size and will only leave 1v1 and COOP unchanged (and non-team FFA, but any FFA should still be excluded for testing).

What is the maximum playerID?
+0 / -0


9 years ago
DErankBrackman this is just WRONG

You need to do calculation that only takes into account information available at that point of time!

For battle 10, you can only consider information from battles 1-9, and not current elo generated by our system.

You need to rerun calculation, including elo, from battle 0, and do all ZK does, including weighting in the same way to really compare the two.

Doing backwards in time comparison using present time elo is completely meaningless..

It's like deciding who will win after knowing who won.
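A sketch of what such a forward-in-time replay could look like. It deliberately uses a plain unweighted team average, a start rating of 1500 and a fixed k of 30 as stand-ins, because ZK's actual weighting is not spelled out in this thread; all names are hypothetical.

```python
def elo_win_probability(rating_a, rating_b):
    """Standard elo expectation for side A."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def replay_history(battles, k=30.0, start=1500.0):
    """battles: chronological list of (winner_ids, loser_ids) tuples.

    Returns the probability given to each real winner, computed only from
    battles that came before, plus the final ratings.
    """
    ratings = {}
    winner_probs = []
    for winners, losers in battles:
        avg_w = sum(ratings.get(pid, start) for pid in winners) / len(winners)
        avg_l = sum(ratings.get(pid, start) for pid in losers) / len(losers)
        p = elo_win_probability(avg_w, avg_l)   # predict first ...
        winner_probs.append(p)
        delta = k * (1.0 - p)                   # ... update afterwards
        for pid in winners:
            ratings[pid] = ratings.get(pid, start) + delta
        for pid in losers:
            ratings[pid] = ratings.get(pid, start) - delta
    return winner_probs, ratings
```

The important part is the order inside the loop: the prediction for a battle is stored before that battle's result touches any rating. The returned probabilities can then be fed into whatever scoring rule is used to compare predictors.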
+1 / -0


9 years ago
I will provide the data.
+0 / -0


9 years ago
http://zero-k.info/temp/minibat.zip

JSON file with all the battles except missions and bot games.
Players is array of arrays by teams. First team is winners always.

It also contains map id and battle duration in seconds.

You should process this using ZK's weighted elo together with your new elo/predictor and compare your prediction with ZK's weighted elo average.
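A sketch for picking the usable games out of such a file. The key name "Players" is a guess based on the description above (the real field names in minibat.zip may differ), and the filter follows the earlier suggestion in this thread to keep only two-team games with more than 2 players.

```python
import json

def two_team_games(battles, min_players=3):
    """Yield (winners, losers) for battles with exactly two teams.

    Assumes each battle is a dict whose "Players" entry (hypothetical key name)
    is a list of teams, winners first, each team being a list of player ids.
    """
    for battle in battles:
        teams = battle["Players"]
        if len(teams) != 2:
            continue                      # skip FFA and other multi-team games
        if sum(len(team) for team in teams) < min_players:
            continue                      # skip 1v1, which stays unchanged anyway
        winners, losers = teams
        yield winners, losers

# Tiny inline sample instead of the real file, to keep the sketch self-contained.
sample = json.loads(
    '[{"Players": [[1, 2], [3, 4]]},'
    ' {"Players": [[1], [2], [3]]},'
    ' {"Players": [[5], [6]]}]'
)
games = list(two_team_games(sample))
```

On the sample only the 2v2 survives: the three-team game and the 1v1 are dropped.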
+0 / -0