Binary search faster than Sun's Arrays.binarySearch()
I have always been annoyed by the usual implementation of the binarySearch algorithm.
While optimizing a graph library of mine, I came to the point where the bottleneck of my application was the search for elements in sorted int vectors. I am using these vectors as a kind of sparse BitSet, which you’d expect to be cheap to look up.
What should you expect from this article? If your data is amenable to this optimization, let’s bet on a 4x speedup!
Binary Search?
The binary search algorithm consists of choosing a value in the list, called the pivot, which is compared against the key being searched for. If the key is equal to the pivot, the algorithm finishes, returning the index of the pivot. If the key is smaller than the pivot, the algorithm recurses on the sublist to the left of the pivot. If it is greater, it recurses on the sublist to the right of the pivot. The algorithm stops whenever the current sublist is empty.
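The classic midpoint variant described above can be sketched as follows (a minimal illustration of my own; the class and method names are not from the library discussed here, and it uses the same -(insertionIndex + 1) convention for misses as java.util.Arrays.binarySearch):

```java
// A minimal sketch of the classic midpoint binary search described above.
class ClassicSearch {
    static int search(int[] array, int key) {
        int min = 0, max = array.length - 1;
        while (min <= max) {
            int pivot = (min + max) >>> 1;            // midpoint pivot, overflow-safe
            int pivotVal = array[pivot];
            if (key > pivotVal) min = pivot + 1;      // continue in the right sublist
            else if (key < pivotVal) max = pivot - 1; // continue in the left sublist
            else return pivot;                        // key found
        }
        return -min - 1;                              // not found: min is the insertion index
    }
}
```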
Adaptive pivot
The problematic part is the choice of this pivot value. The usual strategy is to take the value at the middle of the list (or sublist). This guarantees that if the list is of size N, there will be no more than $latex \lceil \log_2 N \rceil$ steps before finding the key (or failing to find it). Nice, but… assuming the values are either “uniformly randomly” distributed between the minimal and maximal values, or completely sequential (or any case somewhere between the two), there may be a smarter way to choose a pivot, using a simple proportionality rule. This means that given a (sub)list, we need to access the first and last values and compute a pivot like:
pivotIndex = firstIndex + (key - firstValue) * (lastIndex - firstIndex) / (lastValue - firstValue)
For the pivot index to fall between the first and last indices, we need to ensure that key >= firstValue and key <= lastValue.
This lets us fail early whenever the key is outside the range of the current (sub)list, which also reduces the number of steps needed to return (the usual implementation only returns once min and max have crossed, requiring roughly $latex \lceil \log_2 n \rceil$ steps when the key is not in the list).
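As a quick sanity check of the proportionality rule (the array and helper below are my own illustration, not part of the library; float arithmetic is used so the subtraction cannot overflow in int precision):

```java
// Worked example of the proportional pivot rule described above.
class PivotDemo {
    static int proportionalPivot(int[] a, int key, int firstIndex, int lastIndex) {
        // key and bounds are widened to float before subtracting to avoid int overflow
        return firstIndex + (int)((key - (float)a[firstIndex])
                / (a[lastIndex] - (float)a[firstIndex]) * (lastIndex - firstIndex));
    }
}
```

On the evenly spaced array {10, 20, 30, 40, 50}, searching for 40 yields pivot index 3 on the first try, where a midpoint pivot would need several halving steps.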
The code
/**
 * Searches a sorted int array for a specified value,
 * using an optimized binary search algorithm (which tries to guess
 * smart pivots).
 * The result is unspecified if the array is not sorted.
 * The method returns an index where key was found in the array.
 * If the array contains duplicates, this might not be the first occurrence.
 * @see java.util.Arrays#sort(int[])
 * @see java.util.Arrays#binarySearch(int[])
 * @param array sorted array of integers
 * @param key value to search for in the array
 * @param offset index of the first valid value in the array
 * @param length number of valid values in the array
 * @return index of an occurrence of key in array,
 * or -(insertionIndex + 1) if key is not contained in array (<i>insertionIndex</i> is then the index at which key could be inserted).
 */
public static final int binarySearch(int[] array, int key, int offset, int length) {
    if (length == 0) {
        return -1 - offset;
    }
    int min = offset, max = offset + length - 1;
    int minVal = array[min], maxVal = array[max];
    int nPreviousSteps = 0;
    // Uncomment these two lines to get statistics about the average number of steps in the test report:
    //totalCalls++;
    for (;;) {
        //totalSteps++;
        // Be careful not to compute key - minVal, for there might be an integer overflow.
        if (key <= minVal) return key == minVal ? min : -1 - min;
        if (key >= maxVal) return key == maxVal ? max : -2 - max;
        assert min != max;
        int pivot;
        // A typical binarySearch algorithm uses pivot = (min + max) / 2.
        // The pivot we use here tries to be smarter and to choose a pivot close to the expectable location of the key.
        // This reduces dramatically the number of steps needed to get to the key.
        // However, it does not work well with a logarithmic distribution of values, for instance.
        // When the key is not found quickly the smart way, we switch to the standard pivot.
        if (nPreviousSteps > 2) {
            pivot = (min + max) >> 1;
            // stop increasing nPreviousSteps from now on
        } else {
            // NOTE: We cannot do the following operations in int precision, because there might be overflows.
            // long operations are slower than float operations with the hardware this was tested on (Intel Core 2 Duo, JVM 1.6.0).
            // Overall, using float proved to be the safest and fastest approach.
            pivot = min + (int)((key - (float)minVal) / (maxVal - (float)minVal) * (max - min));
            nPreviousSteps++;
        }
        int pivotVal = array[pivot];
        // NOTE: do not store key - pivotVal because of overflows
        if (key > pivotVal) {
            min = pivot + 1;
            max--;
        } else if (key == pivotVal) {
            return pivot;
        } else {
            min++;
            max = pivot - 1;
        }
        maxVal = array[max];
        minVal = array[min];
    }
}
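The negative return value encodes the insertion point, exactly as java.util.Arrays.binarySearch does. A small decoding example (demonstrated against the standard method so it runs without the library JAR; the helper name is my own):

```java
import java.util.Arrays;

// Decoding the -(insertionIndex + 1) return convention; java.util.Arrays.binarySearch
// uses the same encoding for keys that are not in the array.
class ReturnConvention {
    static int insertionIndex(int searchResult) {
        return searchResult >= 0 ? searchResult : -searchResult - 1;
    }
}
```

For instance, searching for 7 in {1, 6, 9, 12} returns -3, and insertionIndex(-3) recovers 2, the slot where 7 would be inserted.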
Benchmarking…
Benchmarking is no easy task. It is very easy to pick a set of arrays for which this algorithm is faster or slower than others. I tried to analyze the weaknesses and strengths of my approach and produced tests for many cases, doing my best to keep them unbiased. Feel free to submit your own tests / results.
Here are the four types of distributions of int arrays that I tested:

random array :
for (int i = size; i-- != 0;) array[i] = random.nextInt(); Arrays.sort(array);
This gives truly random values, expected to be evenly distributed across all possible integer values. There might be duplicates in the resulting array.

sequential array :
for (int i = size; i-- != 0;) array[i] = i;
This gives a dense array [0, 1, 2, 3, …, size - 1].

emptied sequence :
for (int i = 0; i < size; i++) list.add(i); for (int i = nElementsToRemove; i-- != 0;) { list.remove((int)(random.nextDouble() * (list.size() - 1))); }
This gives a sparse sequence with variable density: if nElementsToRemove == 0, the sequence is completely dense.
We define a load factor by
loadFactor = (size - nElementsToRemove) / (float)size;
For instance, with an initial array of sequential values from 0 to 29 and loadFactor == 1 / 3f:
[1, 6, 9, 12, 13, 15, 17, 21, 25, 29]

logarithmic sequence :
for (int i = size; i-- != 0;) array[i] = (int)Math.log(i);
This is meant to defeat my pivot choice strategy : choosing a pivot with the proportionality rule is never a good choice with this array.
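To see why (this 16-element array and helper are my own illustration; the generator mirrors the logarithmic sequence above, shifted by one to avoid log(0)): the values form a few long runs of duplicates, so the proportional guess often lands outside the run that actually holds the key.

```java
// Why a logarithmic distribution hurts the proportional pivot:
// long runs of equal values make the linear interpolation misleading.
class LogPivotDemo {
    static int[] logArray(int size) {
        int[] a = new int[size];
        for (int i = size; i-- != 0;) a[i] = (int)Math.log(i + 1); // shifted to avoid log(0)
        return a;
    }
    static int proportionalPivot(int[] a, int key, int first, int last) {
        return first + (int)((key - (float)a[first])
                / (a[last] - (float)a[first]) * (last - first));
    }
}
```

With size 16 the array is {0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2}; the value 1 occupies indices 2 to 6, yet the proportional rule guesses index 7, overshooting the whole run.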
Results
This implementation of binarySearch is a lot faster than Sun’s default in all cases but the logarithmic array. In fact, in this worst known case Sun’s implementation initially proved to be 5 to 8 times faster than mine. After analyzing how many steps are usually necessary to find the key when the data is “friendly”, I added a switch that makes the method use the regular pivot (= (min + max) / 2) after three unsuccessful attempts with the proportional one. Even with this change Sun’s implementation is faster, but not by much (something like 20% faster).
Here are the performance results as output by the test program (run with Sun’s Mac OS X Java 1.6.0dpb8834 on a MacBook with a Core 2 Duo @ 2.16 GHz, with the server JVM):
Size of data arrays = 100000
# Random elements, search of existing elements
zOlive 1.6 x faster
# Random elements, search of random elements
zOlive 1.6 x faster
# Sequential elements, search of existing elements
zOlive 8.8 x faster
# Sequential elements, search of random elements
zOlive 8.9 x faster
# Sequential duplicated elements, search of existing elements
zOlive 4.7 x faster
# Sequential duplicated elements, search of random elements
zOlive 9.7 x faster
# Logarithmic elements, search of existing elements
Java 1.1 x faster
# Logarithmic elements, search of random elements
Java 1.2 x faster
# Sparse sequential elements (loadFactor = 0.1), search of existing elements
zOlive 1.6 x faster
# Sparse sequential elements (loadFactor = 0.1), sequential keys
zOlive 4.4 x faster
# Sparse sequential elements (loadFactor = 0.3), search of existing elements
zOlive 1.7 x faster
# Sparse sequential elements (loadFactor = 0.3), sequential keys
zOlive 4.5 x faster
# Sparse sequential elements (loadFactor = 0.5), search of existing elements
zOlive 1.9 x faster
# Sparse sequential elements (loadFactor = 0.5), sequential keys
zOlive 4.8 x faster
# Sparse sequential elements (loadFactor = 0.75), search of existing elements
zOlive 2.2 x faster
# Sparse sequential elements (loadFactor = 0.75), sequential keys
zOlive 5.5 x faster
# Sparse sequential elements (loadFactor = 0.9), search of existing elements
zOlive 2.9 x faster
# Sparse sequential elements (loadFactor = 0.9), sequential keys
zOlive 6.4 x faster
I packaged the test in a JAR which also contains the source (com/ochafik/util/BinarySearchUtils.java).
I wrote simple validation tests to ensure the code behaves correctly before running the benchmarks. This is what helped me catch some integer overflow bugs.
I also took care to warm up the JVM properly before actually running the test.
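Warming up matters because HotSpot only JIT-compiles a method after it has run hot. A minimal harness sketch (my own, not the article’s test code; shown against java.util.Arrays so it is self-contained):

```java
import java.util.Arrays;

// Minimal benchmark harness sketch: warm the JIT before timing, then time
// many searches in bulk to amortize timer overhead.
class WarmupBench {
    static long timeSearches(int[] sorted, int[] keys, int rounds) {
        long sink = 0;                               // accumulate results so the JIT cannot drop the calls
        for (int w = 0; w < 20_000; w++)             // warm-up phase: trigger JIT compilation
            sink += Arrays.binarySearch(sorted, keys[w % keys.length]);
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++)
            for (int key : keys)
                sink += Arrays.binarySearch(sorted, key);
        long elapsed = System.nanoTime() - start;
        return sink == Long.MIN_VALUE ? -1 : elapsed; // use sink so it is not dead code
    }
}
```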
To run the tests, download this jar and type the following command line in the directory where the jar is located:
java -server -Xmx100m -jar binarysearch.jar
All comments are welcome…
Edit [August 20th 2007]: One can always dream… I filed an RFE with Sun, just in case they would be interested in this seemingly faster algorithm…
Edit [Feb. 16th 2009]: I updated the JAR with a new category of test elements: missing values. This makes the performance gain of my routine skyrocket: it reaches a 10x factor over Sun’s in most tests.