Model 14
There are two parts to frnt3: the website and the Elo/Prediction model. After building out the test app and setting up a maintainable backend, my last goal before the World Cup season started was to strengthen my data model.
What does that mean?
Stats are stats: as long as I find the right values and bring them together, it is just a matter of presentation. Athlete Elo, however, is a derived value. It needs to serve a purpose. For frnt3 it has two:
- Athlete Elo is a reference point for overall athlete performance over time, comparable with other athletes.
- A key metric to feed into future event predictions.
Point 1 is a relatively easy task, as long as I am consistent with my Elo definition, present a decent range of Elo spread, and tune the gains and losses well.
frnt3 Elo is a measure of an athlete's skill at getting the zone/top of a problem of a given Elo difficulty. E.g. Athlete Elo 1500 : Problem Zone Elo 1500 means that the athlete has a 50% chance of getting that zone on a given go.
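The 1500 vs 1500 example lines up with the usual logistic Elo expectation. Here is a minimal sketch, assuming the standard 400-point scale; the post does not say which curve or scale frnt3 actually uses, and the function name `zone_probability` is made up for illustration.

```python
def zone_probability(athlete_elo: float, problem_elo: float, scale: float = 400.0) -> float:
    """Chance the athlete gets the zone on a single go.

    Assumes the standard logistic Elo curve. The 400-point scale is an
    assumption, not something the frnt3 model is confirmed to use.
    """
    return 1.0 / (1.0 + 10 ** ((problem_elo - athlete_elo) / scale))


print(zone_probability(1500, 1500))            # 0.5
print(round(zone_probability(1700, 1500), 2))  # 0.76
```

Equal Elo gives exactly 50%, and under this assumed scale a 200-point advantage works out to roughly a 76% chance per go.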
"Relative", because predictions are really hard...
Model 3, subjectively, solved point 1 quite well (for senior athletes, anyway). It did not make a strong attempt at predictions, just ranking athletes in descending Elo order. Backtesting this method showed a Spearman's rank correlation of 0.25 ρ (effectively no correlation), although it at least showed signs of being able to predict the top 25% of the field.
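For reference, a backtest of this kind can be sketched as: rank the field by pre-event Elo, rank it by actual finishing place, and take the correlation between the two rankings. The code below is a dependency-free illustration, not the frnt3 backtest itself; the function names and the sample numbers are made up.

```python
def ranks(values, descending=True):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(pre_event_elo, finishing_place):
    """Spearman's rho between predicted rank (by Elo) and actual place."""
    predicted = ranks(pre_event_elo, descending=True)   # highest Elo -> rank 1
    actual = ranks(finishing_place, descending=False)   # place 1 -> rank 1
    return pearson(predicted, actual)


# Made-up numbers for illustration only.
elo = [1800, 1700, 1650, 1600, 1500]
place = [2, 1, 3, 5, 4]
print(round(spearman(elo, place), 2))  # 0.8
```

A value of 1.0 would mean the Elo order matched the results exactly, 0 means no relationship, and -1.0 means the order was exactly reversed.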
In initial testing, reducing the Elo gain from competitions showed some promise at reducing the error rate. Elo was tighter and Spearman's went up to about 0.65 ρ, but plateaued quickly. I was also unhappy, subjectively, that point 1 had become compromised by the tighter Elo: it was harder to differentiate the top climbers.
Maybe I should add point 1.5: "frnt3 Elo should be opinionated; I would rather it tried to swing for the fences than get lost in the muddy noise of reality".
And I think breaking point 1.5 is why the predictions improved: turning down the noise by compressing the pack until any errors would only ever be minor, because you weren't saying much in the first place.
Instead, I eventually went back to an Elo model with a much wider spread. I fine-tuned the wins and losses, problem difficulty seeding, athlete initial Elo, and off-season rebalancing until I got values I was happy told the story. This was a painful process, as each rebuild of the model took 12 hours to process all the competitions. This is, for now, a fixed metric.
Fighting with this metric also took a while.
Backtests were stuck around 0.3 to 0.5 ρ and appeared to be going nowhere. I was trying to find metrics I had available that actually affected athlete performance. Some metrics, like whether the athlete was "at home", produced minor improvements, but not anything I would actually care to publish, and still well below the desired 0.7 ρ.
The key to Model 14 was first factoring in athletes' abilities. Obviously, before the competition we don't know which abilities will show up. However, using an average of all their abilities at least makes athletes with big weaknesses stand out as outliers. Most athletes have pretty consistent coverage across all abilities, so they are more likely to be consistent across competitions; outliers are "punished" more. In the future I could fine-tune this by taking into account the style of setting at certain venues and weighting those abilities more. After this I compound the consistency factor further by multiplying the final result by their recent consistency value + overperforming value. This is a heavy factor in the prediction and can really shuffle up the pack; I was worried that it would be too much. I was ready to go back on this decision until I got the backtest results... 0.9 ρ... that is a ridiculous correlation.
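To make the shape of that calculation concrete, here is a rough sketch. Only the two steps described above come from the post: averaging the abilities, then multiplying by (recent consistency + overperforming). Everything else is an assumption for illustration: the `Athlete` fields, the 0 to 1 ability scale, and especially the way Elo and average ability are combined, which the post does not spell out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Athlete:
    name: str
    elo: float
    abilities: dict          # hypothetical per-style scores between 0 and 1
    consistency: float       # hypothetical recent consistency value
    overperformance: float   # hypothetical recent overperforming value


def prediction_score(athlete: Athlete) -> float:
    # Averaging abilities drags down athletes with one big weakness.
    average_ability = mean(athlete.abilities.values())
    # Assumption: Elo scaled by average ability. The post does not say
    # how the two are actually combined.
    base = athlete.elo * average_ability
    # From the post: multiply by (recent consistency + overperforming).
    return base * (athlete.consistency + athlete.overperformance)


def predict_order(athletes):
    """Predicted finishing order, best score first."""
    return sorted(athletes, key=prediction_score, reverse=True)


# Made-up athletes for illustration only.
specialist = Athlete("Specialist", 1800, {"slab": 0.9, "dyno": 0.3, "power": 0.9}, 0.9, 0.0)
all_rounder = Athlete("All-rounder", 1700, {"slab": 0.8, "dyno": 0.8, "power": 0.8}, 1.0, 0.05)

print([a.name for a in predict_order([specialist, all_rounder])])
# ['All-rounder', 'Specialist']
```

In this toy case the higher-Elo specialist drops below the all-rounder, which is the kind of pack shuffle the multiplier can cause.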
I don't quite believe that it can be this good, and it may just be a case of overfitting. For now I am going to roll with it and see how the predictions pan out for the first few competitions of the season. It should at least be better (or more interesting) than before anyway.
-vm