Range query (data structures)

For finding items that fall within a range, see range query (database).

In data structures, a range query consists of preprocessing some input data into a data structure so that any number of queries on any subset of the input can be answered efficiently. In particular, there is an extensively studied group of problems in which the input is an array of unsorted numbers and a query computes some function on a specific range of the array. This article describes some of these problems together with their solutions.

Problem statement

We may state the problem of range queries in the following way: a range query $q_f(A,i,j)$ on an array $A=[a_1,a_2,\ldots,a_n]$ of $n$ elements of some set $S$, denoted $A[1,n]$, takes two indices $1\leq i\leq j\leq n$ and a function $f$ defined over arrays of elements of $S$, and outputs $f(A[i,j])= f(a_i,\ldots,a_j)$. Queries should be answered efficiently in both time and space.

Consider, for instance, $f = sum$ with $A[1,n]$ an array of numbers. The range query $sum(A,i,j)$ computes $sum(A[i,j]) = (a_i+\ldots + a_j)$ for any $1 \leq i \leq j \leq n$. These queries can be answered in constant time using $O(n)$ extra space by storing prefix sums in an auxiliary array $B$, where $B[i]$ contains the sum of the first $i$ elements of $A$ for every $0\leq i\leq n$ (so $B[0]=0$). Any query can then be answered as $sum(A[i,j]) = B[j] - B[i-1]$.
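The prefix-sum idea can be sketched in a few lines of Python. This is a minimal illustration only; the function names `build_prefix_sums` and `range_sum` are chosen here and do not come from the cited sources. Indices are 1-based to match the notation above.

```python
from itertools import accumulate


def build_prefix_sums(a):
    """Return B with B[0] = 0 and B[i] = a_1 + ... + a_i."""
    return [0] + list(accumulate(a))


def range_sum(b, i, j):
    """Sum of a_i..a_j (1-based, inclusive) in constant time."""
    return b[j] - b[i - 1]


A = [3, 1, 4, 1, 5, 9, 2, 6]
B = build_prefix_sums(A)
print(range_sum(B, 3, 6))  # 4 + 1 + 5 + 9 = 19
```

Building $B$ takes one pass over $A$, and each query afterwards is a single subtraction.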

This strategy extends to every group operator $f$, for which the inverse $f^{-1}$ is well defined and easily computable.^[1] The solution also extends to two-dimensional arrays with a similar preprocessing.^[2]
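As a rough sketch of both extensions, the code below generalizes the prefix array to an arbitrary group operation (shown with XOR, which is its own inverse, and with addition) and then builds a two-dimensional prefix-sum table. All function names are illustrative assumptions, and the two-dimensional part uses the standard inclusion-exclusion formula rather than any specific construction from the cited papers.

```python
from operator import add, neg, xor


def build_prefix(a, op, identity):
    """B[0] = identity and B[i] = a_1 op a_2 op ... op a_i."""
    b = [identity]
    for x in a:
        b.append(op(b[-1], x))
    return b


def range_query(b, i, j, op, inverse):
    """f(a_i..a_j) for 1 <= i <= j <= n, computed as B[i-1]^{-1} op B[j]."""
    # The inverse goes on the left so this also holds for non-commutative groups.
    return op(inverse(b[i - 1]), b[j])


def build_prefix_2d(m):
    """P[r][c] = sum of the top-left r x c block of m (P has a zero border)."""
    rows = len(m)
    cols = len(m[0]) if m else 0
    p = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            p[r][c] = m[r - 1][c - 1] + p[r - 1][c] + p[r][c - 1] - p[r - 1][c - 1]
    return p


def rect_sum(p, r1, c1, r2, c2):
    """Sum over rows r1..r2 and columns c1..c2 (1-based, inclusive)."""
    return p[r2][c2] - p[r1 - 1][c2] - p[r2][c1 - 1] + p[r1 - 1][c1 - 1]


A = [5, 3, 6, 2, 7]
B_xor = build_prefix(A, xor, 0)
xor_2_4 = range_query(B_xor, 2, 4, xor, lambda v: v)  # 3 ^ 6 ^ 2 = 7
B_add = build_prefix(A, add, 0)
sum_2_4 = range_query(B_add, 2, 4, add, neg)  # 3 + 6 + 2 = 11

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
P = build_prefix_2d(M)
block = rect_sum(P, 2, 2, 3, 3)  # 5 + 6 + 8 + 9 = 28
```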

Examples

Semigroup operators

Figure: Constructing the corresponding Cartesian tree to solve a range minimum query.

Figure: Range minimum query reduced to the lowest common ancestor problem.

Main article: Range minimum query

When the function of interest in a range query is a semigroup operator, the inverse $f^{-1}$ is not always defined, so the strategy of the previous section cannot be used. Yao showed^[3] that there exists an efficient solution for range queries that involve semigroup operators. He proved that for any constant $c$, preprocessing in time and space $\Theta(c\cdot n)$ allows range queries on lists, where $f$ is a semigroup operator, to be answered in $\Theta(\alpha_c(n))$ time, where $\alpha_c$ is a certain functional inverse of the Ackermann function.
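Yao's construction is beyond the scope of a short example. As a simpler point of comparison, the sketch below uses a plain segment tree, a different and weaker structure that needs only associativity: it takes $O(n)$ space and answers a query in $O(\log n)$ time rather than $\alpha_c(n)$. The class name `SegmentTree` and its interface are assumptions made for this illustration.

```python
class SegmentTree:
    """Range queries for any associative operator (no inverse required)."""

    def __init__(self, a, op):
        self.n = len(a)
        self.op = op
        # Leaves live at positions n..2n-1; internal node k covers 2k and 2k+1.
        self.tree = [None] * self.n + list(a)
        for k in range(self.n - 1, 0, -1):
            self.tree[k] = op(self.tree[2 * k], self.tree[2 * k + 1])

    def query(self, i, j):
        """Return f(a_i, ..., a_j) for 1 <= i <= j <= n."""
        lo = i - 1 + self.n   # first leaf of the range
        hi = j + self.n       # one past the last leaf
        left = right = None   # partial results, kept in left-to-right order
        while lo < hi:
            if lo & 1:
                left = self.tree[lo] if left is None else self.op(left, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self.tree[hi] if right is None else self.op(self.tree[hi], right)
            lo >>= 1
            hi >>= 1
        if left is None:
            return right
        if right is None:
            return left
        return self.op(left, right)


min_tree = SegmentTree([5, 2, 7, 1, 9, 3], min)
print(min_tree.query(1, 3))  # min(5, 2, 7) = 2

# String concatenation is associative but neither commutative nor invertible.
cat_tree = SegmentTree(list("abcde"), lambda x, y: x + y)
print(cat_tree.query(2, 4))  # "bcd"
```

Keeping separate left and right partial results preserves the order of the operands, which matters for non-commutative semigroups such as concatenation.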

There are some semigroup operators that admit slightly better solutions, for instance when $f\in \{\max,\min\}$. Assume $f = \min$; then $\min(A[1,n])$ returns the index of the minimum element of $A[1,n]$, and $\min(A,i,j)$ denotes the corresponding range minimum query. Several data structures answer a range minimum query in $O(1)$ time after preprocessing in time and space $O(n)$. One of the simplest solutions to sketch is based on the equivalence between this problem and the lowest common ancestor problem. We briefly describe this solution.

The Cartesian tree $T_A$ of an array $A[1,n]$ has as root $a_k = \min\{a_1,a_2,\ldots,a_n\}$ and as left and right subtrees the Cartesian trees of $A[1,k-1]$ and $A[k+1,n]$, respectively. A range minimum query $min(A,i,j)$ is the lowest common ancestor in $T_A$ of $a_i$ and $a_j$. Because Cartesian trees can be constructed in linear time and the lowest common ancestor problem is solvable in constant time after preprocessing in time and space $O(n)$, the same bounds hold for the range minimum query problem. The solution when $f = \max$ is analogous.
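The reduction can be illustrated with the sketch below. It builds the Cartesian tree in linear time with the usual stack-based scan and then answers a range minimum query as a lowest common ancestor. The LCA step here is deliberately naive (it walks parent pointers, so it costs time proportional to the tree height) and only demonstrates the equivalence; it is not the constant-time LCA structure the text refers to. Function names are assumptions of this example.

```python
def build_cartesian_tree(a):
    """Return (root, parent) of the min-Cartesian tree of a, using 0-based indices."""
    parent = [-1] * len(a)
    stack = []  # indices on the rightmost path, values non-decreasing
    for k in range(len(a)):
        last = -1
        while stack and a[stack[-1]] > a[k]:
            last = stack.pop()
        if last != -1:
            parent[last] = k          # popped chain becomes the left subtree of k
        if stack:
            parent[k] = stack[-1]     # k becomes the right child of the stack top
        stack.append(k)
    root = stack[0] if stack else -1
    return root, parent


def range_min_index(parent, i, j):
    """1-based index of a minimum of a_i..a_j, found as LCA(a_i, a_j)."""
    ancestors = set()
    v = i - 1
    while v != -1:                    # collect a_i and all of its ancestors
        ancestors.add(v)
        v = parent[v]
    v = j - 1
    while v not in ancestors:         # climb from a_j until the paths meet
        v = parent[v]
    return v + 1


A = [3, 1, 4, 1, 5, 9, 2, 6]
ROOT, PARENT = build_cartesian_tree(A)
print(range_min_index(PARENT, 3, 6))  # 4, since a_4 = 1 is the minimum of 4, 1, 5, 9
```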

Mode

Main article: Range mode query

The mode of an array $A$ is the element that appears most often in $A$. For instance, the mode of $A=[4,5,6,7,4]$ is 4. In case of ties, any of the most frequent elements may be picked as the mode. A range mode query consists of preprocessing $A[1,n]$ so that the mode of any range of $A[1,n]$ can be found. Several data structures have been devised to solve this problem; some of the results are summarized in the following table.^[1]

Range Mode Queries
Space	Query Time	Restrictions
$O(n^{2-2\epsilon})$	$O(n^\epsilon \log n)$	$0\leq \epsilon\leq 1/2$
$O(n^2\log\log n/ \log n)$	$O(1)$	none

In 2010, Greve, Jørgensen, Larsen, and Truelsen proved a lower bound in the cell-probe model of $\Omega\left(\frac{\log n}{\log (S w/n)}\right)$ on the query time of any data structure that uses $S$ cells of $w$ bits each.^[4]
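For intuition about the constant-time end of the trade-off, the following sketch precomputes a mode for every range in a plain $O(n^2)$ table. This is a naive baseline written for illustration, not one of the structures summarized in the table above, and the name `build_mode_table` is an assumption.

```python
from collections import defaultdict


def build_mode_table(a):
    """mode[i][j] holds a mode of a_i..a_j (1-based, i <= j); O(n^2) time and space."""
    n = len(a)
    mode = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        counts = defaultdict(int)
        best, best_count = None, 0
        for j in range(i, n + 1):     # extend the range one element at a time
            x = a[j - 1]
            counts[x] += 1
            if counts[x] > best_count:
                best, best_count = x, counts[x]
            mode[i][j] = best         # ties keep the element that got there first
    return mode


A = [4, 5, 6, 7, 4]
MODE = build_mode_table(A)
print(MODE[1][5])  # 4, the only element that appears twice
```

After this preprocessing a query is a single table lookup, at the price of quadratic space.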

Median

This particular case is of special interest since finding the median has several applications.^[5] The median problem, a special case of the selection problem, is solvable in $O(n)$ time by the median of medians algorithm.^[6] However, its generalization to range median queries is more recent.^[7] A range median query $median(A,i,j)$, where $A$, $i$ and $j$ have the usual meanings, returns the median element of $A[i,j]$. Equivalently, $median(A,i,j)$ should return the element of $A[i,j]$ of rank $\left\lceil\frac{j-i+1}{2}\right\rceil$, counting ranks from 1. Range median queries cannot be solved by any of the methods discussed above, including Yao's approach for semigroup operators.^[8]

Two variants of this problem have been studied: the offline version, where all $k$ queries of interest are given in a batch and the goal is to reduce the total cost, and a version where all the preprocessing is done up front and the goal is to optimize the cost of any subsequent single query. The first variant can be solved in time $O(n\log k + k \log n)$ and space $O(n\log k)$. We describe such a solution.^[7]

The following pseudocode shows how to find the element of rank $r$ in $A[i,j]$, a range of an unsorted array of distinct elements. To find the range median, we set $r=\left\lceil\frac{j-i+1}{2}\right\rceil$.

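rangeMedian(A, i, j, r) {

  // A single remaining element is the answer.
  if A.length() == 1 return A[1]

  // Partition A around its median the first time A is visited.
  if A.low is undefined then
    m = median(A)
    A.low  = [e in A | e <= m]
    A.high = [e in A | e >  m]

  // i and j always refer to positions in the original array.
  t = number of elements of A[i,j] that belong to A.low

  if r <= t return rangeMedian(A.low, i, j, r)
  else return rangeMedian(A.high, i, j, r - t)
}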

Procedure rangeMedian partitions A, using A's median, into two arrays A.low and A.high, where the former contains the elements of A that are less than or equal to the median m and the latter contains the rest of the elements of A. If the number t of elements of $A[i,j]$ that end up in A.low is at least r, then we keep looking for the element of rank r in A.low; otherwise we look for the element of rank $(r-t)$ in A.high. To find $t$, it is enough to find the maximum index $p\leq i-1$ such that $a_p$ is in A.low and the maximum index $l\leq j$ such that $a_l$ is in A.low. Then $t$ is the difference between the positions of $a_l$ and $a_p$ within A.low (a missing element counts as position 0). The total cost of any query, without considering the partitioning part, is $O(\log n)$, since at most $\log n$ recursive calls are made and only a constant number of operations are performed in each of them (fractional cascading should be used to obtain the value of $t$ in constant time). If a linear-time algorithm is used to find the medians, the total cost of preprocessing for $k$ range median queries is $O(n\log k)$. This algorithm can easily be modified to solve the up-front version of the problem.^[7]
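A runnable approximation of the procedure is sketched below, under some simplifying assumptions that depart from the description above: each node stores the original positions of its elements in sorted order, $t$ is obtained with two binary searches instead of fractional cascading (so each level costs $O(\log n)$ rather than constant time), and the median is found with `statistics.median_low`, which sorts rather than running in linear time. The names `Node`, `range_select`, and `range_median` are chosen for this example.

```python
from bisect import bisect_left, bisect_right
from statistics import median_low


class Node:
    """A set of array elements, stored as parallel lists ordered by position."""

    def __init__(self, items):
        # items: (position, value) pairs, sorted by original 1-based position
        self.pos = [p for p, _ in items]
        self.val = [v for _, v in items]
        self.low = None
        self.high = None

    def split(self):
        """Partition around the median, as in 'if A.low is undefined'."""
        m = median_low(self.val)
        items = list(zip(self.pos, self.val))
        self.low = Node([pv for pv in items if pv[1] <= m])
        self.high = Node([pv for pv in items if pv[1] > m])


def range_select(node, i, j, r):
    """Element of rank r (1-based) among the node's elements at positions i..j."""
    if len(node.val) == 1:
        return node.val[0]
    if node.low is None:
        node.split()
    # t = number of elements of A.low whose original position lies in [i, j]
    t = bisect_right(node.low.pos, j) - bisect_left(node.low.pos, i)
    if r <= t:
        return range_select(node.low, i, j, r)
    return range_select(node.high, i, j, r - t)


def range_median(root, i, j):
    """Lower median of a_i..a_j, i.e. rank ceil((j - i + 1) / 2)."""
    return range_select(root, i, j, (j - i + 2) // 2)


A = [3, 1, 4, 15, 9, 2, 6]  # distinct elements, as the procedure assumes
ROOT = Node(list(enumerate(A, start=1)))
print(range_median(ROOT, 2, 6))  # median of 1, 4, 15, 9, 2 is 4
```

Because nodes are split lazily and kept afterwards, repeated queries reuse the partitions created by earlier ones, in the spirit of the batched algorithm.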

References

[1] Krizanc, Danny; Morin, Pat; Smid, Michiel H. M. (2003). "Range Mode and Range Median Queries on Lists and Trees". ISAAC: 517–526.
[2] He, Meng; Munro, J. Ian; Nicholson, Patrick K. (2011). "Dynamic Range Selection in Linear Space". ISAAC: 160–169.
[3] Yao, A. C. (1982). "Space-Time Tradeoff for Answering Range Queries". 14th Annual ACM Symposium on the Theory of Computing: 128–136.
[4] Greve, M.; Jørgensen, A.; Larsen, K.; Truelsen, J. (2010). "Cell probe lower bounds and approximations for range mode". Automata, Languages and Programming: 605–616.
[5] Har-Peled, Sariel; Muthukrishnan, S. (2008). "Range Medians". ESA: 503–514.
[6] Blum, M.; Floyd, R. W.; Pratt, V. R.; Rivest, R. L.; Tarjan, R. E. (August 1973). "Time bounds for selection". Journal of Computer and System Sciences 7 (4): 448–461. doi:10.1016/S0022-0000(73)80033-9.
[7] Gfeller, Beat; Sanders, Peter (2009). "Towards Optimal Range Medians". ICALP (1): 475–486.
[8] Bose, P.; Kranakis, E.; Morin, P.; Tang, Y. (2005). "Approximate range mode and range median queries". In Proceedings of the 22nd Symposium on Theoretical Aspects of Computer Science (STACS 2005), volume 3404 of Lecture Notes in Computer Science: 377–388.

This article is issued from Wikipedia - version of the Thursday, March 03, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.