ORDERPACK 2.0 -- Unconditional, Unique, and Partial Ranking, Sorting, and Permutation Downloadable Fortran 90 source code


Author: Michel Olagnon


Contents :

Introduction
Ranking versus sorting
Optimization choices
Programming style
Examples of use
A word of apology
Ranking
Unconditional ranking
Partial ranking
Unique ranking
Random permutation: an interesting use of ranking
Sorting
Full sorting
Partial sorting
Unique sorting
Download all at once


NEWS

Bugs were corrected as of fall 2010 in UNIRNK, UNIINV (The routine tried to access the 4th value when there are only 3) and in RNKPAR (The routine fails when the 3 first values are equal). Please download the corrected versions.
RAPKNR was added first of February 2011.

Similar bugs were corrected as of March 2011 in RNKPAR and RAPKNR (The routines may fail when ranking 3 values out of 4). Please download the corrected versions.

MRGREF was slightly modifed as of January 2012 to make the sort stable.

MEDIAN was added 22nd November 2012, to deal with the case of an even number of samples.


Introduction

The existing fortran code base provides many conventional ranking or sorting routines, but very few specialized ranking or sorting routines. Specifically, we know of no other fortran code which sorts or ranks only a small proportion of an array (partial ordering). Such partial ranking routines have applications in statistics for rapidly computing extreme order statistics, finding nearest neighbors, and other clustering operations. In addition, many applications need to work with only the unique values in an array (unique ordering). Such unique ranking routines allow users to isolate individual cases out of a mass of discrete data. Many times the frequency of the unique values proves interesting (e.g., empirical distributions). ORDERPACK handles all of these ordering needs. Also, ORDERPACK contains a partial unique ranking routine. Such a routine would prove useful in finding a limited number of unique values in an array. Inversion of orderings becomes difficult when duplicates exist (not a one-to-one relation). The ORDERPACK inverse ranking routine handles this difficult case. As an added bonus ORDERPACK provides an unusual routine which allows user controllable partial random permutation of arrays. ORDERPACK contains conventional or unconditional sorting routines as well.

Finally, many fortran sorting or ranking routines do not take advantage of available memory and cache to maximize performance. The routines in ORDERPACK have been designed to take advantage of modern machines.

To show the potential speed gains, we conducted an experiment involving 100,000 trials of simulating a random vector of length 500 with duplicates and ranking the 9 smallest unique elements (duplicates discarded). On a 600 Mhz PIII machine using the CVF 6.1a compiler it took under 2.7 seconds for the unique partial ranking. In fact, the time was dominated by the simultation of the vector, looping, and other overhead.

A similar experiment involved 100 trials of simulating a random vector of length 1,000,000 and ranking the 20 smallest elements (keeping duplicates). On a 460 Mhz AlphaStation with Compaq Fortran 90 V5.2, taking care to increase stacksize, partial ranking by itself took 2.3 seconds, i.e. 23 milliseconds per vector. In that case, the total overhead for random vector simulation was nearly 1 minute.

Users can freely download ORDERPACK 2.0 from this site.

As time goes by, we hope to extend ORDERPACK, and welcome your suggestions to this aim.

Ranking versus sorting

Ranking consists in finding, for each element of a set, its rank in the sorted set, without effectively changing the initial order (or disorder !) of the set. In many instances, it suits better the actual need of the user, who can then use the index array to order other related sets or to select some elements, than a sorting program would.

Ranking is especially needed when the sizes of the elements are large, and that moving them around is resource-consuming.

Optimization choices

We tried to take into account the recent trends in computing to make our compromise choices. Of course, no two problems are the same, and for some of them, the following decisions may happen to be wrong. We just hope that for most cases, they will be right.

Programming style

Programming style is personal, and difficult to modify when one has been programming for several decades. Perhaps the following should have been under the ``word of apology'' item: my programming style does not stick tightly to commonly established rules.

If you want to modify my programs, it might be useful to know:

Examples of use

In order to make use of Fortran 90 argument passing improvements, it is necessary to make the routine interface known to the calling program. There are three main ways to implement it:


A word of apology

When one looks at the description of a sorting algorithm, the process seems pretty simple, and can usually hold in 10 to 20 lines of pseudo-code. But if one wants an optimized program, one takes this simple implementation, and looks for redundant operations, investigates runs with sample data sets with a profiling tool, and is led to duplicate code with slight modifications rather than use tests in inner loops, to process differently the first and the last iterations, or to take into account some special cases that are only special in that they can be done faster.

In the end, the number of lines of source code may be multiplied tenfold, and the readability decreased in a similar proportion. Unfortunately, this is the price to pay for speed of execution. It was that way when I started programming more than 20 years ago, and I have forsaken any hope that it might become otherwise before I return to dust. So please accept my apologies that this code is often complex and difficult to read.


Ranking

In some instances, one is not actually interested in modifying the order of the elements in a set, but only in knowing how to access them in increasing -- or decreasing -- order. Ranking, as it is called, provides the index array I(:) such as the set S(I(:)) is ordered. One of the advantages of carrying out ranking rather than sorting is that the index array can be computed without the performance penalty of moving the elements around when they are of large sizes. A similar point is that the index array can be used to index other data.


Sorting


Download all at once


Last updated: 2011/02/01

Back to top


Michel Olagnon IFREMER Brest / [email protected]