As I understand it, nonathlon uses a polynomial trend analysis, forcing a 4th-order solution, on a dataset that includes only the best time at each age for which data exists, broken down by event, weight class (heavyweight vs. lightweight), and gender. So each equation is based on roughly 70 data points (ages 15 to 85, say), each representing the best performance in that event at that age.
Is that correct?
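
For concreteness, here's a minimal sketch of what I'm picturing -- the numbers and the shape of the fake data are entirely my own invention, and the real nonathlon method may differ:

```python
import numpy as np

# Hypothetical dataset: one row per age, holding the single best time
# (in seconds) recorded at that age for one event/weight-class/gender cell.
ages = np.arange(15, 86)  # ~70 ages
rng = np.random.default_rng(0)
best_times = 360 + 0.02 * (ages - 30) ** 2 + rng.normal(0, 2, ages.size)

# Force a 4th-order polynomial fit, as described above.
coeffs = np.polyfit(ages, best_times, deg=4)
curve = np.poly1d(coeffs)

print(curve(40))  # fitted "best achievable" time at age 40
```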
If it is, it seems inherently weird to use only outlier data in any kind of statistical analysis. I've never heard of doing such a thing--if there are sources you can point me to, I'd be happy to look into them.
Wouldn't it make more sense to collect ALL available data and run a multiple regression on age, gender, and weight class for each event, then use that model to predict a person's expected time in a given event, and assign points according to how far above or below that prediction they finish?
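
Something like the following, say -- the synthetic data, column names, and quadratic age term are all my own assumptions, just to show the shape of the idea:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Fake "all available results" table: every recorded time, not just bests.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "age": rng.integers(15, 86, n),
    "gender": rng.choice(["M", "F"], n),
    "weight_class": rng.choice(["heavy", "light"], n),
})
df["time"] = (
    330
    + 0.02 * (df["age"] - 30) ** 2
    + np.where(df["gender"] == "F", 25.0, 0.0)
    + np.where(df["weight_class"] == "light", 10.0, 0.0)
    + rng.normal(0, 8, n)
)

# Multiple regression of time on age (with a quadratic term), gender,
# and weight class -- one such model per event.
model = smf.ols("time ~ age + I(age**2) + C(gender) + C(weight_class)",
                data=df).fit()

# Points could then reflect how far a rower beats their predicted time:
# positive margin = faster than predicted for that age/gender/weight cell.
df["margin"] = model.predict(df) - df["time"]
```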
It would serve the same purpose--rewarding strong performance relative to a person's own demographic rather than absolute numbers--but wouldn't rely entirely on data from the tip of the distribution's tail, which, by its very nature, is likely to be an unreliable measure of performance in the group that distribution represents.
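
A quick simulation (invented numbers again) illustrates the tail-instability point: draw repeated samples from the same population, and the group best bounces around far more than the group mean does:

```python
import numpy as np

rng = np.random.default_rng(2)
# 1000 independent draws of 50 times each from the SAME population.
samples = rng.normal(400, 20, size=(1000, 50))

print(samples.mean(axis=1).std())  # spread of group means: ~2.8 s
print(samples.min(axis=1).std())   # spread of group bests: several times larger
```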
Sorry to geek out on this, but it seems a geek sort of activity anyway, so I figured I'd ask.
