
Post edit history

Evaluating rating systems

Date | Editor
7/17/2022 5:57:40 AM | DErankBrackman
7/17/2022 5:15:22 AM | DErankBrackman
[q]also I'm not sure if I'm supposed to get the mean either - but I feel like it should be the mean[/q]Yes, it's best to use the mean. It's also not wrong to use the sum instead, or to use ln instead of 1+log_2: any affine transformation with non-zero slope works, since it only shifts and rescales the score and therefore keeps the ranking of prediction systems (reversed, if the slope is negative). But only with 1+log_2 do you get the nice property that always guessing 50% yields a score of 0, because 1+log_2(0.5) = 0. And only with the mean does always guessing the right outcome with 100% confidence yield a score of 1, because 1+log_2(1) = 1 for every game.
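A minimal Python sketch of this scoring rule, assuming each game is given as the predicted probability that one side (called "team A" here, a hypothetical input format) wins, plus whether it actually won. It takes the mean of 1 + log_2(p), where p is the probability the prediction assigned to the outcome that actually happened.

```python
import math


def log_score(predictions, outcomes):
    """Mean of 1 + log2(p) over all games.

    predictions: predicted probability that team A wins, one per game
                 (hypothetical input format chosen for this sketch)
    outcomes:    True if team A actually won, one per game
    p is the probability the prediction gave to what actually happened.
    """
    if len(predictions) != len(outcomes) or not predictions:
        raise ValueError("need the same, non-zero number of predictions and outcomes")
    total = 0.0
    for p_win, a_won in zip(predictions, outcomes):
        p = p_win if a_won else 1.0 - p_win
        if p == 0.0:
            # Something predicted at 0% happened: log2(0) is minus infinity.
            return -math.inf
        total += 1.0 + math.log2(p)
    # Mean; the plain sum ranks systems the same way when they are scored on the same games.
    return total / len(predictions)


print(log_score([0.5, 0.5, 0.5, 0.5], [True, False, True, False]))  # 0.0: always guessing 50%
print(log_score([1.0, 0.0], [True, False]))                         # 1.0: always 100% right
print(log_score([0.8], [False]))                                    # about -1.32: confident and wrong
```

Swapping math.log2 for math.log, or the mean for the sum, would not change which system scores better on the same data; only this exact form pins constant 50% guesses to 0 and perfect guesses to 1.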
Here's another way to see that guessing probabilities far away from 50% must be punished harder (using information theory, like @GoogleFrog): unexpected events carry a much higher amount of information. For example, if you win against @Godde, that's more interesting than if you win against an equal player. If you choose a complicated password, it becomes exponentially more unexpected: each extra random character multiplies the number of possible passwords, so its probability shrinks exponentially while its information grows only linearly.
{{{probability = exp(-information)
information = -log(probability)}}}
The log scoring rule is a direct consequence of the [url=https://en.wikipedia.org/wiki/Entropy_(information_theory)]entropy of information[/url]: up to an affine transformation, the score of a prediction is minus the information of the outcome that actually happened. If something very unexpected happens, a lot of the information in your prediction system might be wrong. If something 100% unexpected happens, all information in your prediction system is wrong. This is the logical principle [url=https://en.wikipedia.org/wiki/Principle_of_explosion]ex falso quodlibet[/url]: if you hold even one wrong statement as certain and reality contradicts it, you can use that contradiction to prove logically that every statement is both true and false. Hence this yields a score of minus infinity, because log(0) is minus infinity. So choose your religion carefully.
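A small Python sketch of the relation above, measured in bits (log base 2) instead of the natural log in the formula; the base only changes the unit. The password example assumes, hypothetically, that every character is drawn uniformly at random from an alphabet. The last function shows that the per-game score 1 + log_2(p) is simply 1 minus the information in bits.

```python
import math


def information_bits(probability):
    """Information (surprisal) in bits when an event with this probability happens."""
    if probability == 0.0:
        return math.inf  # a "100% unexpected" event carries infinite information
    return -math.log2(probability)


def probability_from_bits(bits):
    """Inverse relation: probability = 2 ** -information."""
    return 2.0 ** -bits


def random_password_bits(length, alphabet_size):
    # Hypothetical model: each character is chosen uniformly at random.
    probability = float(alphabet_size) ** -length  # shrinks exponentially with length
    return information_bits(probability)           # grows linearly: length * log2(alphabet_size)


def score(probability):
    # Per-game score 1 + log2(p), written as 1 minus the information in bits.
    return 1.0 - information_bits(probability)


print(information_bits(0.5))        # 1.0 bit: a win against an equal player
print(information_bits(1 / 64))     # 6.0 bits: an upset predicted at only 1/64
print(random_password_bits(8, 2))   # 8.0 bits
print(random_password_bits(4, 16))  # 16.0 bits
print(score(0.0))                   # -inf: a 0% prediction that happened anyway
```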
[q]I'll do a new post about my experiments in the future. Don't worry, I'm not going to argue it's better than WHR - I'm quite sure it isn't - just looking if I can see any patterns in the data :) [/q]Feel free to try. It would be a great achievement to find a better system than WHR. Like @Aquanim, I'd rather expect small improvements from details of the WHR implementation. Finding a better system that is very different from established systems like Glicko or WHR is very unlikely imo, but I'm not saying 0% because that might give me a very bad score ;).