[q](adj) score = success_rate - 0.5 - mean([math.pow(success_rate - x, 2) for x in predictions])[/q]
Measuring the deviation between the overall success rate and each predicted probability doesn't make sense, because you are not looking at a series of identical games. The (actual and predicted) probabilities vary between games and are not directly tied to the total success rate, so a system that models this variation correctly is punished by that term in the adj score. For example, Trueskill only has a higher adj score than the constant predictor p=0.5402115158636898 because Trueskill has a higher success rate; the last adj score term actually punishes Trueskill arbitrarily. About the success rate itself, I want to quote @GoogleFrog:
[q]It doesn't make sense to then take these individual events and aggregate the number of times the predictor "got it wrong" without taking into account the probabilities assigned.[/q]
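To make this concrete, here is a minimal Python sketch with made-up numbers (not the real game data): it simulates games whose true win probabilities vary, then scores a perfectly calibrated predictor and a constant predictor with the adj score above and with the Brier score, a standard proper scoring rule. How "success" is counted in adj score is an assumption on my part and is noted in the comments.
[code]
# A minimal sketch with made-up numbers, not the real game data.
import random
import statistics

random.seed(0)

# Simulate games whose true favourite-win probability varies from game to game.
true_probs = [random.uniform(0.5, 0.95) for _ in range(100_000)]
outcomes = [1 if random.random() < p else 0 for p in true_probs]  # 1 = favourite won


def adj_score(predictions, outcomes):
    # The criticised metric. "Success" is taken here to mean the predicted
    # favourite actually won (an assumption about how success_rate is counted).
    successes = [o if p >= 0.5 else 1 - o for p, o in zip(predictions, outcomes)]
    success_rate = statistics.mean(successes)
    penalty = statistics.mean((success_rate - p) ** 2 for p in predictions)
    return success_rate - 0.5 - penalty


def brier_score(predictions, outcomes):
    # A proper scoring rule: lower is better, and it rewards assigning the
    # right probability to each individual game.
    return statistics.mean((p - o) ** 2 for p, o in zip(predictions, outcomes))


# (a) A perfectly calibrated predictor that knows each game's true probability.
calibrated = true_probs
# (b) A constant predictor that always outputs the overall favourite win rate.
constant = [statistics.mean(outcomes)] * len(outcomes)

print("adj score  calibrated:", adj_score(calibrated, outcomes))
print("adj score  constant:  ", adj_score(constant, outcomes))
print("Brier      calibrated:", brier_score(calibrated, outcomes))
print("Brier      constant:  ", brier_score(constant, outcomes))
[/code]
On a run like this both predictors pick the same winner in every game, so they have identical success rates; the only difference in adj score is the penalty term, and it falls entirely on the predictor that actually knows the per-game probabilities, while the Brier score correctly ranks the calibrated predictor first.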