Disjunct matrix
Disjunct and separable matrices play a pivotal role in the mathematical area of non-adaptive group testing. This area investigates efficient designs and procedures to identify 'needles in haystacks' by conducting the tests on groups of items instead of each item alone. The main concept is that if there are very few special items (needles) and the groups are constructed according to certain combinatorial guidelines, then one can test the groups and find all the needles. This can reduce the cost and the labor associated with of large scale experiments.
The grouping pattern can be represented by a binary matrix, where each column represents an item and each row represents a pool. The symbol '1' denotes participation in the pool and '0' absence from a pool. The d-disjunctness and the d-separability of the matrix describe sufficient condition to identify d special items.
In a matrix that is d-separable, the Boolean sum of every d columns is unique. In a matrix that is d-disjunct the Boolean sum of every d columns does not contain any other column in the matrix. Theoretically, for the same number of columns (items), one can construct d-separable matrices with fewer rows (tests) than d-disjunct. However, designs that are based on d-separable are less applicable since the decoding time to identify the special items is exponential. In contrast, the decoding time for d-disjunct matrices is polynomial.
d-separable
Definition: A matrix
is
-separable if and only if
where
such that
Decoding algorithm
First we will describe another way to look at the problem of group testing and how to decode it from a different notation. We can give a new interpretation of how group testing works as follows:
Group testing: Given input and
such that
output
- Take
to be the
column of
- Define
so that
if and only if
- This gives that
This formalizes the relation between and the columns of
and
in a way more suitable to the thinking of
-separable and
-disjunct matrices. The algorithm to decode a
-separable matrix is as follows:
Given a matrix
such that
is
-separable:
- For each
such that
check if
This algorithm runs in time .
d-disjunct
In literature disjunct matrices are also called super-imposed codes and d-cover-free families.
Definition: A x
matrix
is d-disjunct if
such that
,
such that
but
.
Denoting
is the
column of
and
where
if and only if
gives that
is
-disjunct if and only if
Claim: is
-disjunct implies
is
-separable
Proof: (by contradiction) Let be a
x
-disjunct matrix. Assume for contradiction that
is not
-separable. Then there exists
and
with
such that
. This implies that
such that
. This contradicts the fact that
is
-disjunct. Therefore
is
-separable.
Decoding algorithm
The algorithm for -separable matrices was still a polynomial in
. The following will give a nicer algorithm for
-disjunct matrices which will be a
multiple instead of raised to the power of
given our bounds for
. The algorithm is as follows in the proof of the following lemma:
Lemma 1: There exists an time decoding for any
-disjunct
x
matrix.
- Observation 1: For any matrix
and given
if
it implies
such that
and
where
and
. The opposite is also true. If
it implies
if
then
. This is the case because
is generated by taking all of the logical or of the
's where
.
- Observation 2: For any
-disjunct matrix and every set
where
and for each
where
there exists some
where
such that
but
. Thus, if
then
.
Proof of Lemma 1: Given as input use the following algorithm:
- For each
set
- For
, if
then for all
, if
set
By Observation 1 we get that any position where the appropriate
's will be set to 0 by step 2 of the algorithm. By Observation 2 we have that there is at least one
such that if
is supposed to be 1 then
and, if
is supposed to be 1, it can only be the case that
as well. Therefore step 2 will never assign
the value 0 leaving it as a 1 and solving for
. This takes time
overall.
d^e-disjunct
Definition:A matrix is
-disjunct if for any
columns
,
,
,
of
there are at least
elements
in
.
Definition:Let be a
-disjunct matrix. The output
of
in
is the union of those columns indexed by
, where
is a subset of
with at most size
.
Proposition: Let be a
-disjunct matrix. Let
be two distinct subsets with each at most
elements. Then the Hamming distance of the outputs
and
is at least
.
Proof: Without loss of generality, we may assume s.t.
and
. We consider the
-th column of
and those columns of
indexed by
, then we can find
different
such that
and
for all
, because the definition of
-disjunct. Hence we complete the proof.
Then we have the following corollary.
Corollary We can detect errors and correct
errors on the outcome of
-disjunct matrix.
Upper bounds for non-adaptive group testing
The results for these upper bounds rely mostly on the properties of -disjunct matrices. Not only are the upper bounds nice, but from Lemma 1 we know that there is also a nice decoding algorithm for these bounds. First the following lemma will be proved since it is relied upon for both constructions:
Lemma 2: Given let
be a
matrix and:
for some integers then
is
-disjunct.
Note: these conditions are stronger than simply having a subset of size but rather applies to any pair of columns in a matrix. Therefore no matter what column
that is chosen in the matrix, that column will contain at least
1's and the total number of shared 1's by any two columns is
.
Proof of Lemma 2: Fix an arbitrary and a matrix
. There exists a match between
if column
has a 1 in the same row position as in column
. Then the total number of matches is
, i.e. a column
has a fewer number of matches than the number of ones in it. Therefore there must be a row with all 0s in
but a 1 in
.
We will now generate constructions for the bounds.
Constructions can be either randomized (brute-force), explicit or strongly explicit. This concerns the time in which the incidence matrix can be generated. An explicit construction for a matrix has a complexity
, whereas a randomized construction takes more than that. For a strongly explicit construction, one can find a single entry of the matrix in
time.
Randomized construction
This first construction will use a probabilistic argument to show the property wanted, in particular the Chernoff bound. Using this randomized construction gives that . The following lemma will give the result needed.
Theorem 1: There exists a random -disjunct matrix with
rows.
Proof of Theorem 1: Begin by building a random matrix
with
(where
will be picked later). It will be shown that
is
-disjunct. First note that
and let
independently with probability
for
and
. Now fix
. Denote the
column of
as
. Then the expectancy is
. Using the Chernoff bound, with
, gives
if
. Taking the union bound over all columns gives
,
. This gives
,
. Therefore
with probability
.
Now suppose and
then
. So
. Using the Chernoff bound on this gives
if
. By the union bound over
pairs
such that
. This gives that
and
with probability
. Note that by changing
the probability
can be made to be
. Thus
. By setting
to be
, the above argument shows that
is
-disjunct.
Note that in this proof thus giving the upper bound of
.
Strongly explicit construction
It is possible to prove a bound of using a strongly explicit code. Although this bound is worse by a
factor, it is preferable because this produces a strongly explicit construction instead of a randomized one.
Theorem 2: There exists a strongly explicit -disjunct matrix with
rows.
This proof will use the properties of concatenated codes along with the properties of disjunct matrices to construct a code that will satisfy the bound we are after.
Proof of Theorem 2:
Let such that
. Denote
as the matrix with its
column being
. If
can be found such that
-
-
,
then is
-disjunct. To complete the proof another concept must be introduced. This concept uses code concatenation to obtain the result we want.
Kautz-Singleton '64
Let . Let
be a
-Reed–Solomon code. Let
such that for
,
where the 1 is in the
position. Then
,
, and
.
---
Example: Let . Below,
denotes the matrix of codewords for
and
denotes the matrix of codewords for
, where each column is a codeword. The overall image shows the transition from the outer code to the concatenated code.
---
Divide the rows of into sets of size
and number them as
where
indexes the set of rows and
indexes the row in the set. If
then note that
where
. So that means
. Since
it gives that
so let
. Since
, the entries in each column of
can be looked at as
sets of
entries where only one of the entries is nonzero (by definition of
) which gives a total of
nonzero entries in each column. Therefore
and
(so
is
-disjunct).
Now pick and
such that
(so
). Since
we have
. Since
and
it gives that
.
Thus we have a strongly explicit construction for a code that can be used to form a group testing matrix and so .
For non-adaptive testing we have shown that and we have that (i)
(strongly explicit) and (ii)
(randomized). As of recent work by Porat and Rothscheld, they presented an explicit method construction (i.e. deterministic time but not strongly explicit) for
,[1] however it is not shown here. There is also a lower bound for disjunct matrices of
[2][3][4] which is not shown here either.
Examples
Here is the 2-disjunct matrix :
See also
References
- ↑ Porat, E., & Rothschild, A. (2008). Explicit Non-adaptive Combinatorial Group Testing Schemes. In Proceedings of the 35th International Colloquium on Automata, Languages and Programming (ICALP) (pp. 748–759).
- ↑ Dýachkov, A. G., & Rykov, V. V. (1982). Bounds on the length of disjunctive codes. Problemy Peredachi Informatsii [Problems of Information Transmission], 18(3), 7–13.
- ↑ Dýachkov, A. G., Rashad, A. M., & Rykov, V. V. (1989). Superimposed distance codes. Problemy Upravlenija i Teorii Informacii [Problems of Control and Information Theory], 18(4), 237–250.
- ↑ Zoltan Furedi, On r-Cover-free Families, Journal of Combinatorial Theory, Series A, Volume 73, Issue 1, January 1996, Pages 172–173, ISSN 0097-3165, doi:10.1006/jcta.1996.0012. (http://www.sciencedirect.com/science/article/B6WHS-45NJMVF-39/2/172ef8c5c4aee2d85d1ddd56b107eef3)
Notes
- Atri Rudra's book on Error Correcting Codes: Combinatorics, Algorithms, and Applications (Spring 201), Chapter 17