Range mode query
In data structures, the range mode query problem asks to build a data structure on some input data to efficiently answer queries asking for the mode of any consecutive subset of the input.
Problem Statement
Given an array , we wish to answer queries of the form
, where
. The mode of an array
,
, is an element
such that the frequency of
is greater than the frequency of
. For example, if
,
. This definition can be extended to the mode of any subset of the array
.
Theorem 1
Let and
be any multisets. If
is a mode of
and
, then
is a mode of
.
Proof
Let be a mode of
and
be its frequency in
. Suppose that
is not a mode of
. Thus, there exists an element
with frequency
that is the mode of
. Since
is the mode of
and that
, then
. Thus,
should be the mode of
which is a contradiction.
Results
Space | Query Time | Restrictions | Source |
---|---|---|---|
![]() | ![]() | [1] | |
![]() | ![]() | ![]() | [1] |
![]() | ![]() | [2] | |
![]() | ![]() | ![]() | [2] |
Lower bound
Any data structure using cells of
bits each needs
time to answer a range mode query.[3]
This contrasts with other range mode problems, such as the range minimum query which have solutions offering constant time query time and linear space. This is due to the hardness of the mode problem, since if we know the mode of and the mode of
, there is no simple way of computing the mode of
. Any element of
or
could be the mode. For example, if
and its frequency is
, and
and its frequency is also
, there could be an element
with frequency
in
and frequency
in
.
, but its frequency in
is greater than the frequency of
and
, which makes
a better candidate for
than
or
.
Linear space data structure with square root query time
This method by Chan et al.[1] uses space and
query time. By setting
, we get
and
bounds for space and query time.
Preprocessing
Let be an array, and
be an array that contains the distinct values of A, where
is the number of distinct elements. We define
to be an array such that, for each
,
contains the rank (position) of
in
. Arrays
can be created by a linear scan of
.
Arrays are also created, such that, for each
,
. We then create an array
, such that, for all
,
contains the rank of
in
. Again, a linear scan of
suffices to create arrays
and
.
It is now possible to answer queries of the form "is the frequency of in
at least
" in constant time, by checking whether
.
The array is split B into blocks
, each of size
. Thus, a block
spans over
. The mode and the frequency of each block or set of consecutive blocks will be pre-computed in two tables
and
.
is the mode of
, or equivalently, the mode of
, and
stores the corresponding frequency. These two tables can be stored in
space, and can be populated in
by scanning
times, computing a row of
each time with the following algorithm:
algorithm computeS_Sprime is input: Array B=[0:n-1], Array D=[0:Delta-1], Integer s output: Tables S and Sprime let S ← Table(0:s-1,0:s-1) let Sprime ← Table(0:s-1,0:s-1) let firstOccurence ← Array(0:Delta-1) for all i in {0,...,Delta-1} do firstOccurence[i] ← -1 end for for i ← 0:s-1 do let j ← i*t let c ← 0 let fc ← 0 let noBlock ← i let block_start ← j let block_end ← min{(i+1)*t-1, n-1} while j < n do if firstOccurence[B[i]] = -1 then firstOccurence[B[i]] ← j end if if atLeastQInstances(firstOccurence[B[j]], block_end, fc+1) then c ← B[i] fc ← fc+1 end if if (j+1)%t = 0 or j = size-1 then S[i*s+noBlock] ← c Sprime[i*s+noBlock] ← fc noBlock ← noBlock+1 block_end ← min{block_end+t, size-1} end if end while for all i in {0,...,Delta-1} do firstOccurence[i] ← -1 end for end for
Query
We will define the query algorithm over array . This can be translated to an answer over
, since for any
,
is a mode for
if and only if
is a mode for
. We can convert an answer for
to an answer for
in constant time by looking in
or
at the corresponding index.
Given a query , the query is split in three parts: the prefix, the span and the suffix. Let
and
. These denote the indices of the first and last block that are completely contained in
. The range of these blocks is called the span. The prefix is then
(the set of indices before the span), and the suffix is
(the set of indices after the span). The prefix, suffix or span can be empty, the latter is if
.
For the span, the mode is already stored in
. Let
be the frequency of the mode, which is stored in
. If the span is empty, let
. Recall that, by Theorem 1, the mode of
is either an element of the prefix, span or suffix. A linear scan is performed over each element in the prefix and in the suffix to check if its frequency is greater than the current candidate
, in which case
and
are updated to the new value. At the end of the scan,
contains the mode of
and
its frequency.
Scanning procedure
The procedure is similar for both prefix and suffix, so it suffice to run this procedure for both:
Let be the index of the current element. There are three cases:
- If
, then it was present in
and its frequency has already been counted. Pass to the next element.
- Otherwise, check if the frequency of
in
is at least
(this can be done in constant time since it is the equivalent of checking it for
).
- If it is not, then pass to the next element.
- If it is, then compute the actual frequency
of
in
by a linear scan (starting at index
) or a binary search in
. Set
and
.
This linear scan (excluding the frequency computations) is bounded by the block size , since neither the prefix or the suffix can be greater than
. A further analysis of the linear scans done for frequency computations shows that it is also bounded by the block size.[1] Thus, the query time is
.
Subquadratic space data structure with constant query time
This method by [2] uses space for a constant time query. We can observe that, if a constant query time is desired, this is a better solution than the one proposed by Chan et al.,[1] as the latter gives a space of
for constant query time if
.
Preprocessing
Let be an array. The preprocessing is done in three steps:
- Split the array
in
blocks
, where the size of each block is
. Build a table
of size
where
is the mode of
. The total space for this step is
- For any query
, let
be the block that contains
and
be the block that contains
. Let the span be the set of blocks completely contained in
. The mode
of the block can be retrieved from
. By Theorem 1, the mode can be either the mode of the prefix (indices of
before the start of the span), the mode of the suffix (indices of
after the end of the span), or
. The size of the prefix plus the size of the suffix is bounded by
, thus the position of the mode isstored as an integer ranging from
to
, where
indicates a position in the prefix/suffix and
indicates that the mode is the mode of the span. There are
possible queries involving blocks
and
, so these values are stored in a table of size
. Furthermore, there are
such tables, so the total space required for this step is
. To access those tables, a pointer is added in addition to the mode in the table
for each pair of blocks.
- To handle queries
where
and
are in the same block, all such solutions are precomputed. There are
of them, they are stored in a three dimensional table
of this size.
The total space used by this data structure is , which reduces to
if we take
.
Query
Given a query , check if it is completely contained inside a block, in which case the answer is stored in table
. If the query spans exactly one or more blocks, then the answer is found in table
. Otherwise, use the pointer stored in table
at position
, where
are the indices of the blocks that contain respectively
and
, to find the table
that contains the positions of the mode for these blocks and use the position to find the mode in
. This can be done in constant time.
References
- 1 2 3 4 5 Chan, Timothy M.; Durocher, Stephane; Larsen, Kasper Green; Morrison, Jason; Wilkinson, Bryan T. (2013). "Linear-Space Data Structures for Range Mode Query in Arrays" (PDF). Theory of Computing Systems (Springer): 1–23.
- 1 2 3 Krizanc, Danny; Morin, Pat; Smid, Michiel H. M. (2003). "Range Mode and Range Median Queries on Lists and Trees" (PDF). ISAAC: 517–526.
- ↑ Greve, M; Jørgensen, A.; Larsen, K.; Truelsen, J. (2010). "Cell probe lower bounds and approximations for range mode". Automata, Languages and Programming: 605–616.