Evaluating rating systems

The relation to the Kelly criterion is very interesting!

[q](adj) score = success_rate - 0.5 - mean([math.pow(success_rate - x, 2) for x in predictions])[/q]
Measuring the deviation between the success rate and the predicted probabilities doesn't make sense, because you are not looking at a series of equal games. The (actual and predicted) probabilities can vary between different games, and they are not very directly related to the total success rate. A system that captures this variation correctly is punished by this term in the adj score. For example, Trueskill only has a higher adj score than the constant p=0.5402115158636898 because Trueskill has a higher success rate; the last adj score term actually punishes Trueskill arbitrarily. About the success rate, I want to quote @GoogleFrog:
[q]It doesn't make sense to then take these individual events and aggregate the number of times the predictor "got it wrong" without taking into account the probabilities assigned.[/q]
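To make that concrete, here is a small Python sketch. The game distribution and the reading of "success" as "the >0.5 side won" are my own assumptions, not the actual data from this thread. A perfectly calibrated per-game predictor and a constant predictor at the overall base rate end up with the same success rate here, so the comparison is decided entirely by the deviation term, and that term only penalizes the predictor whose probabilities (correctly) vary between games:
[q]import random

random.seed(0)

# Illustrative data: each game has a true win probability for the first team
# drawn from [0.55, 0.95]; the result is sampled from that probability.
true_probs = [random.uniform(0.55, 0.95) for _ in range(100_000)]
outcomes = [1 if random.random() < p else 0 for p in true_probs]

def adj_score(predictions, outcomes):
    # The quoted score; "success" is read as "the >0.5 side won".
    successes = [(p > 0.5) == bool(w) for p, w in zip(predictions, outcomes)]
    success_rate = sum(successes) / len(successes)
    deviation = sum((success_rate - p) ** 2 for p in predictions) / len(predictions)
    return success_rate - 0.5 - deviation

base_rate = sum(outcomes) / len(outcomes)

# Calibrated predictor (knows each game's true probability) vs. constant base rate.
print("oracle  ", adj_score(true_probs, outcomes))
print("constant", adj_score([base_rate] * len(outcomes), outcomes))[/q]
The constant predictor comes out ahead even though the oracle is strictly better informed, which is exactly the arbitrary punishment described above.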
All you really have is a vector of predicted probabilities and a vector of actual results, and somehow you have to calculate their deviation. If you don't believe in the logarithmic relation between information and probability, there are other valid scoring rules. But I think it is worth thinking about the presented arguments for nonlinearity.
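For reference, here is a minimal sketch (the toy data is made up by me) of the logarithmic score and of the Brier score, a proper scoring rule without the logarithm. Both evaluate every game against its own predicted probability instead of against the aggregate success rate:
[q]import math

def log_loss(predictions, outcomes):
    # Logarithmic scoring rule: mean negative log-probability assigned
    # to the observed result (lower is better).
    return -sum(math.log(p if won else 1.0 - p)
                for p, won in zip(predictions, outcomes)) / len(predictions)

def brier_score(predictions, outcomes):
    # Brier score: mean squared error between the predicted probability
    # and the 0/1 outcome (lower is better).
    return sum((p - won) ** 2
               for p, won in zip(predictions, outcomes)) / len(predictions)

# Toy data: predicted win probabilities and actual results (1 = win).
preds = [0.9, 0.6, 0.75, 0.5, 0.8]
results = [1, 0, 1, 1, 1]

print("log loss:", round(log_loss(preds, results), 3))
print("Brier:   ", round(brier_score(preds, results), 3))[/q]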