Are Rank-Based Statistics Bad For Incentives?

This post is part of a collection of posts responding to various aspects of the following video:

Quotation (Dustin Fife)
And I also think the costs are greater than the reward. So the benefit or the reward is that it is easier for the statistician but the cost is that you might have a misleading conclusion.

I think that rank-based statistics can be misleading in ways similar to other statistics. Perhaps the greatest demonstration of this is the widespread misunderstanding of what these rank-based statistics quantify, which in turn leads to dubious conclusions about the data.
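One frequently cited example of this misunderstanding is reading the Wilcoxon–Mann–Whitney rank-sum test as a comparison of medians. The U statistic divided by n1·n2 equals the proportion of (x, y) pairs in which the x value exceeds the y value, with ties counted as one half. That proportion estimates P(X > Y) + ½P(X = Y), not a difference in medians. The sketch below uses small made-up samples (an illustration of mine, not data from the video) that have identical medians but a probability of superiority of 0.58, and the rank-based statistic reflects that. The helper names are hypothetical plain-Python functions, not any library's API.

```python
from statistics import median


def average_ranks(values):
    """Return 1-based ranks for `values`; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the value at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x, computed from x's rank sum in the pooled sample."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def prob_superiority(x, y):
    """Proportion of (x_i, y_j) pairs with x_i > y_j, counting ties as 1/2."""
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return wins / (len(x) * len(y))


# Hypothetical samples chosen so that both medians equal 3.
x = [1, 2, 3, 10, 11]
y = [0, 2.5, 3, 3.5, 4]

print(median(x), median(y))                       # 3 3
print(mann_whitney_u(x, y) / (len(x) * len(y)))   # 0.58
print(prob_superiority(x, y))                     # 0.58
```

The two routes to the same number show that the rank-based statistic is a pairwise comparison of the whole distributions: equal medians do not imply U/(n1·n2) = 0.5, so a "significant" rank-sum result is not by itself evidence that the medians differ.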

Quotation (Dustin Fife)
And if it is a misleading conclusion that gets you like super excited, like “wow I discovered something amazing”, and the actual model isn’t so optimistic, you may not ever get to the point of following up with a “proper” analysis because, why would you want to? The original model was good for your career.

The problem Dustin highlights is that results go unreplicated, even by their original authors, because of incentives. It isn't really a point about rank-based statistics, or even about statistics per se. Whether by mistake or by deception, a result may be incorrect. This could be due to choices in the statistical analysis, laboratory procedures, the method of visualization or reporting, and so on.

Suppose the treatment and control samples are accidentally handled with different pipetting methods, and this goes unnoticed; then the difference between the treatment and the control could be biased. Mistakes like this can happen. If the result is important to a growing field, it should be replicated early in the development of that area of research to help ensure that enormous effort is not wasted on the assumption that the result is robust. This example is not a statistics problem, and yet its structure is not importantly different.

Sometimes statistical methodology issues are easier to correct than data collection problems. Once data collection is done, it may not be feasible to collect that data again. But in some cases good data is analyzed with poor statistical methods, and when that data is available it can be re-analyzed. That said, data is not always available, and it is not always good enough even for secondary analysis.

As an aside about incentives, I enjoyed Richard McElreath’s talk Science Is Like A Chicken Coop, which you can watch here:

I also think that this last lecture of Richard McElreath's Statistical Rethinking is worth watching for some practices that can help with science reform.

Incentives are important, but I don’t think abolishing rank-based nonparametric statistics is really getting at a root problem. Rank-based nonparametric statistics can be planned and executed in a non-janky and principled way with transparent methods that allow for further replication.


Further Reading

Wilcoxon's Heuristic

Just Plugging In Ranks

Rank-Based Statistics For Convergence Issues