1 |
Very good!
|
1 |
Very good!
|
2 |
\n
|
2 |
\n
|
3 |
[q]The delta mu from mean peak is so tightly centered around the exact mean that it can't be a coincidence.[/q]I know what delta_mu is but what is the mean peak and what is the exact mean?
|
3 |
[q]The delta mu from mean peak is so tightly centered around the exact mean that it can't be a coincidence.[/q]I know what delta_mu is but what is the mean peak and what is the exact mean?
|
4 |
\n
|
4 |
\n
|
5 |
[q]None of these changes affect the success rate of the function.[/q]This is obvious from the underlying math. Multiplying delta_mu by a factor D is equivalent to dividing denom by D which has the same effect as applying a D mod of D. D mod means modifying the probability p to (p^D)/(p^D+(1-p)^D). All this does is bring the predictions further away from 50% for D > 1 and closer to 50% for D < 1. It is just the correct way of doing it in contrast to the 0.5 fudge which can never produce chances <= 25% or >= 75%. [url=https://zero-k.info/Forum/Post/250860#250860]By fitting the D mod to maximize the log score, it should be possible to eliminate data poisoning[/url].
|
5 |
[q]None of these changes affect the success rate of the function.[/q]This is obvious from the underlying math. Multiplying delta_mu by a factor D is equivalent to dividing denom by D which has the same effect as applying a D mod of D. [spoiler]D mod means modifying the probability p to (p^D)/(p^D+(1-p)^D). Same effect means exact equivalence for the cases of elo and WHR and at least similar for TrueSkill. But for TrueSkill I have not proven the equivalence.[/spoiler]All this does is bring the predictions further away from 50% for D > 1 and closer to 50% for D < 1. It is just the correct way of doing it in contrast to the 0.5 fudge which can never produce chances <= 25% or >= 75%. [url=https://zero-k.info/Forum/Post/250860#250860]By fitting the D mod to maximize the log score, it should be possible to eliminate data poisoning[/url].
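
For anyone who wants to check the equivalence numerically, here is a minimal sketch for the elo case only. It assumes the usual logistic win chance 1/(1+10^(-delta_mu/denom)) with denom = 400, which is my assumption and not necessarily what the function under discussion uses. The names elo_win_chance and d_mod are made up for illustration.

```python
import math

def elo_win_chance(delta_mu, denom=400.0):
    # Assumed logistic elo formula; denom=400 is the textbook value,
    # not necessarily the one used in the thread.
    return 1.0 / (1.0 + 10.0 ** (-delta_mu / denom))

def d_mod(p, d):
    # D mod: p -> p^D / (p^D + (1-p)^D)
    return p ** d / (p ** d + (1.0 - p) ** d)

# Scaling delta_mu by D gives the same prediction as applying the D mod.
for d in (0.5, 1.0, 2.0):
    for delta in (-300.0, 50.0, 200.0):
        scaled = elo_win_chance(d * delta)
        modded = d_mod(elo_win_chance(delta), d)
        assert math.isclose(scaled, modded, rel_tol=1e-9)

# D > 1 moves a 60% prediction away from 50%, D < 1 moves it closer.
assert d_mod(0.6, 2.0) > 0.6
assert 0.5 < d_mod(0.6, 0.5) < 0.6
```

This only covers the logistic case. It says nothing about TrueSkill, where the equivalence is not proven.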
|
6 |
\n
|
6 |
\n
|
7 |
Here I define "base" as using D = 1. [spoiler]By calculating team rating sums instead of averages and then having a denom proportional to sqrt(size), it effectively assumes that big team game outcomes should be more distinct proportional to sqrt(size) which is probably wrong. I'm alternating between == and = to avoid forum format breaking.[/spoiler]Calculating delta_mu from mean instead of sum uses D == 2/size which is indeed expected to perform better. [spoiler]By still having a denom proportional to sqrt(size), it effectively assumes that big team games become less distinct proportional to sqrt(size).[/spoiler]My suggestion uses D = 2/sqrt(size) which is a compromise of the two for size >= 4. [spoiler]It effectively uses team means and removes the sqrt(size) proportionality in denom and thereby assumes that big team outcomes do not become more or less distinct with size which is what traditional ZK elo does.[/spoiler]How can the compromise be worse than each of the extremes? Did you apply the 0.5 fudge on it but not on the others?
|
7 |
Here I define "base" as using D = 1. [spoiler]By calculating team rating sums instead of averages and then having a denom proportional to sqrt(size), it effectively assumes that big team game outcomes should be more distinct proportional to sqrt(size) which is probably wrong. I'm alternating between == and = to avoid forum format breaking.[/spoiler]Calculating delta_mu from mean instead of sum uses D == 2/size which is indeed expected to perform better. [spoiler]By still having a denom proportional to sqrt(size), it effectively assumes that big team games become less distinct proportional to sqrt(size).[/spoiler]My suggestion uses D = 2/sqrt(size) which is a compromise of the two for size >= 4. [spoiler]It effectively uses team means and removes the sqrt(size) proportionality in denom and thereby assumes that big team outcomes do not become more or less distinct with size which is what traditional ZK elo does.[/spoiler]How can the compromise be worse than each of the extremes? Did you apply the 0.5 fudge on it but not on the others?
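
To make the "compromise" statement concrete, here is a small sketch that only does the arithmetic on the three D values named above. The function names are hypothetical, and "size" is simply passed through as whatever size means in the formulas above.

```python
import math

def d_base(size):
    # "base": D = 1, independent of size
    return 1.0

def d_mean(size):
    # delta_mu from team mean instead of sum: D = 2/size
    return 2.0 / size

def d_compromise(size):
    # suggested variant: D = 2/sqrt(size)
    return 2.0 / math.sqrt(size)

# For size >= 4 the suggested D lies between the other two.
for size in (4, 6, 8, 16):
    assert d_mean(size) <= d_compromise(size) <= d_base(size)

# For size = 2 it does not: 2/sqrt(2) is larger than 1.
assert d_compromise(2) > d_base(2)
```

At size = 4 the suggestion coincides with base (D = 1), and it only separates from it for larger sizes.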
|
8 |
\n
|
8 |
\n
|
9 |
[q]Example, 2v2-4v4 games, ranking from all games:[/q]From the number 0.0297, I guess that this was only with ranking data from the class of 2v2-4v4 battles. If it was from all games, you would get 0.0408, right?
|
9 |
[q]Example, 2v2-4v4 games, ranking from all games:[/q]From the number 0.0297, I guess that this was only with ranking data from the class of 2v2-4v4 battles. If it was from all games, you would get 0.0408, right?
|