As I understand it, nonathlon uses a polynomial trend analysis, forcing a 4th-order solution, on a dataset that includes only the best time at each age for which data exists, broken down by event, weight class (heavyweight vs. lightweight), and gender. So each equation is based on roughly 70 data points (ages 15 to 85, say), each representing the best performance in that event at that age.
Is that correct?
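
For concreteness, here's a minimal sketch of what I'm picturing -- the numbers and the shape of the fake data are entirely my own invention, and the real nonathlon method may differ:

```python
import numpy as np

# Hypothetical dataset: one row per age, holding the single best time
# (in seconds) recorded at that age for one event/weight-class/gender cell.
ages = np.arange(15, 86)  # ~70 ages
rng = np.random.default_rng(0)
best_times = 360 + 0.02 * (ages - 30) ** 2 + rng.normal(0, 2, ages.size)

# Force a 4th-order polynomial fit, as described above.
coeffs = np.polyfit(ages, best_times, deg=4)
curve = np.poly1d(coeffs)

print(curve(40))  # fitted "best achievable" time at age 40
```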
If it is, it seems inherently weird to use only outlier data in any kind of statistical analysis. I've never heard of doing such a thing--if there are sources you can point me to, I'd be happy to look into them.
Wouldn't it make more sense to collect ALL available data and run a multiple regression on age, gender, and weight class for each event, then use that model to predict a person's expected time in a given event, and assign points according to how far above or below that prediction they finish?
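
Something like the following, say -- the synthetic data, column names, and quadratic age term are all my own assumptions, just to show the shape of the idea:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fake "all available results" table: every recorded time, not just bests.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age": rng.integers(15, 86, n),
    "gender": rng.choice(["M", "F"], n),
    "weight_class": rng.choice(["heavy", "light"], n),
})
df["time"] = (
    330
    + 0.02 * (df["age"] - 30) ** 2
    + np.where(df["gender"] == "F", 25.0, 0.0)
    + np.where(df["weight_class"] == "light", 10.0, 0.0)
    + rng.normal(0, 8, n)
)

# Multiple regression of time on age (with a quadratic term), gender,
# and weight class -- one such model per event.
model = smf.ols("time ~ age + I(age**2) + C(gender) + C(weight_class)",
                data=df).fit()

# Points could then reflect how far a rower beats their predicted time:
# positive margin = faster than predicted for that age/gender/weight cell.
df["margin"] = model.predict(df) - df["time"]
```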
It would serve the same purpose--rewarding strong performance relative to a person's own demographic rather than absolute numbers--but wouldn't rely entirely on data from the tip of the distribution's tail, which, by its very nature, is likely to be an unreliable measure of performance in the group that distribution represents.
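
A quick simulation (invented numbers again) illustrates the tail-instability point: draw repeated samples from the same population, and the group best bounces around far more than the group mean does:

```python
import numpy as np

rng = np.random.default_rng(2)
# 1000 independent draws of 50 times each from the SAME population.
samples = rng.normal(400, 20, size=(1000, 50))

print(samples.mean(axis=1).std())  # spread of group means: ~2.8 s
print(samples.min(axis=1).std())   # spread of group bests: several times larger
```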
Sorry to geek out on this, but it seems a geek sort of activity anyway, so I figured I'd ask.
