
Slides about Whole History Rating

17 posts, 1452 views

3 years ago
As you may know, I'm studying computational science, and since seminars are part of any course of studies, I took the chance to talk a bit about WHR with some subtle Zero-K advertisement. It's a short presentation and doesn't go into much detail, but it contains the additional formulas needed for ZK that are not part of the original paper. So maybe DErankBrackman is interested (the likelihood calculation is also in the appendix).

https://drive.google.com/file/d/1sDqfKhThaQqDbFQ5Xb6rit-ZOZo303TI/view?usp=sharing

I also recorded the talk, but since the conference had been moved online and I never bothered to practice, it's a bit awkward. That recording is here:



Keep in mind all of this is about Zero-K's underlying rating system, not the rank system, which has some things bolted on to make it human and frog compatible.
+23 / -0
3 years ago
When the hundred-thousand-games-played question came up, you could read Firepluk on drdein's forehead.
+0 / -0
3 years ago
This is very interesting. I am sure that if a library were available under a permissive license, like MIT, it would get used by independent game developers. I know I would be interested in replacing Elo with a more advanced system. The effort involved in doing so is significantly higher than most game devs could muster.
+0 / -0


3 years ago
quote:
frog compatible

I admit I lolled :D
+1 / -0
3 years ago
why did you cut out your applause?

...also, ShyFreund is something new for me.
+3 / -0

3 years ago
USrankJasper
There currently is a Python package, but I think it's rather ugly. I'll probably do a C++ implementation at some point. The problem with both the Python package and my own implementations in Java and C# is that they're based on an existing Ruby WHR library, one that is as badly designed as the language it's written in. It stores entire matrices when it only needs to store the diagonal. I've fixed it up a bit for the Zero-K implementation, but I still don't like it.
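For context on the matrix point: in WHR each player's rating history is refined with Newton's method, and the Hessian over that history is tridiagonal, so two short vectors are enough instead of a full matrix. A minimal sketch of such a banded Newton step (my own illustration with made-up names, not the Zero-K or Ruby code):

def newton_step(ratings, gradient, diag, sub_diag):
    # ratings:  per-day natural ratings r_1..r_n of one player
    # gradient: dL/dr_i for each day
    # diag:     Hessian diagonal H[i][i]
    # sub_diag: Hessian sub-diagonal H[i][i-1], length n-1
    # Solves H * delta = gradient with a tridiagonal (Thomas) pass,
    # touching only the two stored bands.
    n = len(ratings)
    d = list(diag)
    g = list(gradient)
    for i in range(1, n):                      # forward elimination
        m = sub_diag[i - 1] / d[i - 1]
        d[i] -= m * sub_diag[i - 1]
        g[i] -= m * g[i - 1]
    delta = [0.0] * n
    delta[n - 1] = g[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):             # back substitution
        delta[i] = (g[i] - sub_diag[i] * delta[i + 1]) / d[i]
    return [r - x for r, x in zip(ratings, delta)]   # r <- r - H^-1 * grad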

Also, the Zero-K implementation is currently the only one supporting FFA and Teams, since those weren't part of the original paper.
+0 / -0


3 years ago
Regarding teams, it feels as if a 45% pre-game chance might as well be 20%, and a 40% chance is more like 5%.

How often does the team with a 40% chance actually win a team game?
+1 / -0

3 years ago
GBrankdyth68: this can effectively be tuned by adjusting omega^2. Higher values there give larger ratings (and rating changes) and thus more pronounced win chances. My increasing that constant is also why we have 3000+ Elo now. In the end I try to find a compromise that works well for most game modes.
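To make "more pronounced win chances" concrete, here is a toy sketch (my own illustration, not Zero-K code): the standard logistic win-probability curve over an Elo-scale rating difference. If a larger omega^2 lets the same pair of players drift further apart in rating, the predicted chance moves further from 50%.

def win_chance(rating_a, rating_b, scale=400.0):
    # Standard logistic/Elo curve: 50% at equal ratings,
    # about 64% at +100 and 76% at +200 with scale=400.
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

print(win_chance(1600, 1500))   # gap 100 -> ~0.64
print(win_chance(1700, 1500))   # gap 200 -> ~0.76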
+0 / -0
3 years ago
quote:
There currently is a Python package, but I think it's rather ugly. I'll probably do a C++ implementation at some point. The problem with both the Python package and my own implementations in Java and C# is that they're based on an existing Ruby WHR library, one that is as badly designed as the language it's written in. It stores entire matrices when it only needs to store the diagonal. I've fixed it up a bit for the Zero-K implementation, but I still don't like it.

Also, the Zero-K implementation is currently the only one supporting FFA and Teams, since those weren't part of the original paper.


A cleaned-up C# implementation would be mint. The only other skill rating system I am aware of that is similar is Microsoft TrueSkill, which as far as I know does not have source code available.
+0 / -0
3 years ago
Very good presentation! Directly using natural ratings makes the formulas much easier to follow. Only the logistic distribution is harder to see if you're not used to those formulas.
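For readers who want that connection spelled out, this is the Bradley-Terry win probability from the original WHR paper written with natural ratings r_i = ln(gamma_i); the last form is exactly the logistic curve, and the familiar Elo scale is just a rescaling:

% Bradley-Terry win probability in terms of natural ratings r_i = \ln\gamma_i;
% the logistic form and the Elo rescaling (400/\ln 10 \approx 173.7).
P(i \text{ beats } j)
  = \frac{\gamma_i}{\gamma_i + \gamma_j}
  = \frac{1}{1 + e^{-(r_i - r_j)}},
\qquad
R_i^{\mathrm{Elo}} = \frac{400}{\ln 10}\, r_i \approx 173.7\, r_i.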

I'm not so sure whether GBrankdyth68's concern is addressed by omega². If omega were sufficiently small that ratings don't overshoot, and enough playing time without skill changes passed, the rating deviation across players should converge independently of omega. Of course the assumptions of no overshoot and enough time without skill changes do not hold. Therefore omega has some influence, but it's not proportional.

It's good to see the formulas for omega_t and omega_g. What is g in the formula for omega_t²? The number of games the player played on that day or in total?

It seems that we are currently using very big omegas. It was interesting to see that this gives a better log score. But maybe there is a local maximum of the log score at smaller omega. To what extent is the better log score with bigger omega caused only by faster changes for new players, and to what extent by making it depend on game number rather than only time? And I'm wondering to what extent the human and frog changes worsen log score and balancing.
+1 / -0
3 years ago
g is the total sum of game weights. The weighting is 1/n in a team with n players, so a 1v1 counts as 1, a 2v2 as 1/2, and so on. omega_t^2 = 200000 / (g + 400). So it starts at 500 Elo^2/day and halves after 400 1v1 games played. This decay applies only to the omega_t term; the omega_g term depends solely on the number of games played on that day (g_2 - g_1).
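A small sketch of that decay as I read it (illustrative only, not the actual Zero-K source; the value of omega_g^2 isn't given here, so it is a placeholder, and the exact way the two terms combine is my assumption):

def omega_t_squared(game_weights):
    # game_weights holds 1/n for each game played in a team of n players,
    # so a 1v1 adds 1.0 and a 2v2 adds 0.5.
    g = sum(game_weights)              # total sum of game weights
    return 200000.0 / (g + 400.0)      # 500 Elo^2/day at g = 0, halved at g = 400

OMEGA_G_SQUARED = 100.0  # placeholder value, for illustration only

def added_variance(game_weights, days_elapsed, weighted_games_that_day):
    # Assumed combination of the two terms: the time term uses the decaying
    # omega_t^2, the per-game term depends only on g_2 - g_1 for that day.
    return (omega_t_squared(game_weights) * days_elapsed
            + OMEGA_G_SQUARED * weighted_games_that_day)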

Besides the better log scores, I like to think the large omegas make the rating system more forgiving. It loosens the coupling between old games and the current rating, which should allow anyone to quickly climb the ladders.

The balancer is based solely on raw WHR numbers and ignores all the fluff. The fluff is also designed in a way that the shown rating can go from anywhere to anywhere else within a month.
+1 / -0

3 years ago
Thanks for bumping, I overlooked this thread when it was created.
+0 / -0
3 years ago
Thanks for the answers. The formula for omega_t is better than I thought before the presentation. Still, I'm wondering whether omega_t should decay faster or differently with g.

Also, the log score must be calculated from the probability estimates that the rating system made before knowing the outcome of the game. Otherwise, it would be no surprise that infinite gamma yields a normalized log score of one (assuming a per-game recalculation instead of per day).
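A minimal sketch of that requirement (my own illustration, assuming the log score here is simply the mean log-probability assigned to the actual winners; the exact normalization Zero-K uses isn't stated in this thread):

import math

def log_score(games, predict, update):
    # games:   (game, winner) pairs in chronological order
    # predict: returns the probability the rating system currently
    #          assigns to `winner` winning `game`
    # update:  feeds the finished game back into the rating system
    total = 0.0
    for game, winner in games:
        p = predict(game, winner)   # pre-game estimate, outcome still unknown
        update(game, winner)        # only now may the ratings change
        total += math.log(p)
    return total / len(games)       # mean log-probability of actual outcomes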
+0 / -0

3 years ago
Yes, of course it is calculated before knowing the outcome.
+0 / -0
3 years ago
Fine, never mind then. :)
+0 / -0

3 years ago
One thing I didn't mention is that I tried tuning the w^2 not just for maximum predictiveness, but also for good modeling. That means giving WHR all battles in advance and then trying to predict all battles with the precalculated rating history. A high w^2 is not always optimal there, depending on the dataset. This was mostly to verify that the w^2 not only leads to good predictiveness for future games, but also for past games.
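A sketch of the two evaluations being contrasted (illustrative only; the whr object and its add_game / log_prob / refit methods are placeholders, not a real library API):

def predictiveness_score(games, whr):
    # Score each game with ratings built only from earlier games,
    # then feed the result in.
    total = 0.0
    for game in games:
        total += whr.log_prob(game)
        whr.add_game(game)
    return total / len(games)

def modeling_score(games, whr):
    # Give WHR all battles in advance, converge the full rating history,
    # then score every battle against that precalculated history.
    for game in games:
        whr.add_game(game)
    whr.refit()
    return sum(whr.log_prob(g) for g in games) / len(games)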
+0 / -0

3 years ago
This is amazing stuff!
+0 / -0