Fuzzy extractor
Fuzzy extractors convert biometric data into random strings, which makes it possible to apply cryptographic techniques for biometric security. They are used to encrypt and authenticate users records, with biometric inputs as a key. Historically, the first biometric system of this kind was designed by Juels and Wattenberg and was called "Fuzzy commitment", where the cryptographic key is decommitted using biometric data. "Fuzzy", in that context, implies that the value close to the original one can extract the committed value. Later, Juels and Sudan came up with Fuzzy vault schemes which are order invariant for the fuzzy commitment scheme but uses a Reed–Solomon code. Codeword is evaluated by polynomial and the secret message is inserted as the coefficients of the polynomial. The polynomial is evaluated for different values of a set of features of the biometric data. So Fuzzy commitment and Fuzzy Vault were per-cursor to Fuzzy extractors.
Fuzzy extractor is a biometric tool to authenticate a user using its own biometric template as a key. They extract uniform and random string from its input
that has tolerance for noise. If the input changes to
but is still close to
, the string
can still be re-constructed. When
is used first time to re-construct, it outputs a helper string
which can be made public without compromising the security of
(used for encryption and authentication key) and
(helper string) is stored to recover
. They remain secure even when the adversary modifies
(key agreement between a user and a server based only on a biometric input).
This article is based on the papers "Fuzzy Extractors: A Brief Survey of Results from 2004 to 2006" and "Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data" by Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin and Adam Smith
Motivation
As fuzzy extractors deal with how to generate strong keys from Biometrics and other Noisy Data, it applies cryptography paradigms to biometric data and that means (1) Make little assumptions about the biometric data (these data comes from variety of sources and don't want adversary to exploit that so it is best to assume the input is unpredictable) (2) Apply cryptographic application techniques to the input. (for that fuzzy extractor converts biometric data into secret, uniformly random and reliably reproducible random string). According to "Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data" paper by Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin and Adam Smith – these techniques also have other broader applications (when noisy inputs are used) such as human memory, images used as passwords, keys from quantum channel. Based on the Differential Privacy paper by Cynthia Dwork (ICALP 2006) – fuzzy extractors have application in the proof of impossibility of strong notions of privacy for statistical databases.
Basic definitions
Predictability
Predictability indicates probability that adversary can guess a secret key. Mathematically speaking, the predictability of a random variable is
.
For example, if pair of random variable
and
, if the adversary knows
of
, then predictability of
will be
. So, Adversary can predict
with
. Taking average over
as it is not under adversary control, but since knowing
makes
prediction adversarial, taking the worst case over
.
Min-entropy
Min-entropy indicates worst-case entropy. Mathematically speaking, it is defined as . Random variables with min-entropy at least
is called
-source.
Statistical distance
Statistical distance is measure of distinguishability. Mathematically speaking, it is between two probability distributions and
is,
=
. In any system if
is replaced by
, it will behave as original system with probability at least
.
Definition 1 (strong extractor)
Set is strong randomness extractor.
Randomized function Ext:
with randomness of length
is an
-strong extractor if for all
-sources (Random variables with min-entropy at least
is called
-source)
on
where
is independent of
.
Output of the extractor is a key generated from
with the seed
. It behaves independent of other parts of the system with the probability of
. Strong extractors can extract at most
bits from arbitrary
-source.
Secure sketch
Secure sketch makes it possible to reconstruct noisy input, so if the input is and sketch is
, given
and value
close to
, it is possible to recover
. But sketch
doesn't give much information about
, so it is secure.
If
is a metric space with distance function dis. Secure sketch recovers string
from any close string
without disclosing
.
Definition 2 (secure sketch)
An secure sketch is a pair of efficient randomized procedures (SS – Sketch, Rec – Recover) such that –
(1) The sketching procedure SS on input
returns a string
.
The recovery procedure Rec takes an element
and
.
(2) Correctness: If
then
.
(3) Security: For any
-source over
, the min-entropy of
given
is high: for any
, if
, then
.
Fuzzy extractor
Fuzzy extractors do not recover the original input but generate string (which is close to uniform) from
and its subsequent reproduction (using helper string
) given any
close to
. Strong extractors are a special case of fuzzy extractors when
= 0 and
.
Definition 3 (fuzzy extractor)
An fuzzy extractor is a pair of efficient randomized procedures (Gen – Generate and Rep – Reproduce) such that:
(1) Gen, given
, outputs an extracted string
and a helper string
.
(2) Correctness: If
and
, then
.
(3) Security: For all m-sources
over
, the string
is nearly uniform even given
, So
, then
.
So Fuzzy extractors output almost uniform random bits which is prerequisite for using cryptographic applications (in terms of secret keys). Since output bits are slightly non-uniform, it can decrease security, but not more than the distance from the uniform and as long as that distance is sufficiently small – security still remains robust.
Secure sketches and fuzzy extractors
Secure sketches can be used to construct fuzzy extractors. Like applying SS to to obtain
and strong extractor Ext with randomness
to
to get
.
can be stored as helper string
.
can be reproduced by
and
.
can recover
and
can reproduce
.
Following Lemma formalize this.
Lemma 1 (fuzzy extractors from sketches)
Assume (SS,Rec) is an secure sketch and let Ext be an average-case
strong extractor. Then the following (Gen, Rep) is an
fuzzy extractor:
(1) Gen
and output
.
(2) Rep
: recover
and output
.
Proof: From the definition of secure sketch (Definition 2),
. And since Ext is an average-case
-strong extractor.
Corollary 1
If (SS,Rec) is an – secure sketch and Ext is an
– strong extractor, then the above construction (Gen,Rep) is a
fuzzy extractor.
Reference paper "Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data" by Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin and Adam Smith (2008) includes many generic combinatorial bounds on secure sketches and fuzzy extractors
Basic constructions
Due to their error tolerant properties, a secure sketches can be treated, analyzed, and constructed like a general error correcting code or
for linear codes, where
is the length of codewords,
is the length of the message to be codded,
is the distance between codewords, and
is the alphabet. If
is the universe of possible words then it may be possible to find an error correcting code
that has a unique codeword
for every
and have a Hamming distance of
. The first step for constructing a secure sketch is determining the type of errors that will likely occur and then choosing a distance to measure.
Hamming distance constructions
When there is no chance of data being deleted and only being corrupted then the best measurement to use for error correction is Hamming distance. There are two common constructions for correcting Hamming errors depending on whether the code is linear or not. Both constructions start with an error correcting code that has a distance of where
is the number of tolerated errors.
Code-offset construction
When using a general code, assign a uniformly random codeword
to each
, then let
which is the shift needed to change
into
. To fix errors in
subtract
from
then correct the errors in the resulting incorrect codeword to get
and finally add
to
to get
. This means
. This construction can achieve the best possible tradeoff between error tolerance and entropy loss when
and a Reed–Solomon code is used resulting in an entropy loss of
, and the only way to improve upon this is to find a code better than Reed–Solomon.
Syndrome construction
When using a linear code let the
be the syndrome of
. To correct
find a vector
such that
, then
.
Set difference constructions
When working with a very large alphabet or very long strings resulting in a very large universe , it may be more efficient to treat
and
as sets and look at set differences to correct errors. To work with a large set
it is useful to look at its characteristic vector
, which is a binary vector of length
that has a value of 1 when an element
and
, or 0 when
. The best way to decrease the size of a secure sketch when
is large is make
large since the size is determined by
. A good code to base this construction on is a
BCH code where
and
so
, it is also useful that BCH codes can be decode in sub-linear time.
Pin sketch construction
Let . To correct
first find
, then find a set v where
, finally compute the symmetric difference to get
. While this is not the only construction to use set difference it is the easiest one to use.
Edit distance constructions
When data can be corrupted or deleted the best measurement to use is edit distance. To make a construction based on edit distance it is easiest to start with a construction for set difference or hamming distance as an intermediate correction step and then build the edit distance construction around that.
Other distance measure constructions
There are many other types of errors and distances that can be measured which can be used to model other situations. Most of these other possible constructions are like edit distance constructions where they build upon simpler constructions.
Improving error-tolerance via relaxed notions of correctness
It is possible to show that the error-tolerance of a secure sketch can be improved by applying a probabilistic method to error correction and only needing errors to be correctable with a high probability. This will show that it is possible to exceed the Plotkin bound which is limited to correcting errors, and approach Shannon’s bound allowing for nearly
corrections. To achieve this better error correction a less restrictive error distribution model must be used.
Random errors
For this most restrictive model use a BSC to create a
that a probability
at each position in
that the bit received is wrong. This model can show that entropy loss is limited to
, where
is the binary entropy function, and if min-entropy
then
errors can be tolerated, for some constant
.
Input-dependent errors
For this model errors do not have a known distribution and can be from an adversary, the only constraints are and that a corrupted word depends only on the input
and not on the secure sketch. It can be shown for this error model that there will never be more than
errors since this model can account for all complex noise processes, meaning that Shannon’s bound can be reached, to do this a random permutation is prepended to the secure sketch that will reduce entropy loss.
Computationally bounded errors
This differs from the input dependent model by having errors that depend on both the input and the secure sketch, and an adversary is limited to polynomial time algorithms for introducing errors. Since algorithms that can run in better than polynomial time are not currently feasible in the real world, then a positive result using this error model would guarantee that any errors can be fixed. This is the least restrictive model the only known way to approach Shannon’s bound is to use list-decodable codes although this may not always be useful in practice since returning a list instead of a single codeword may not always be acceptable.
Privacy guarantees
In general a secure system attempts to leak as little information as possible to an adversary. In the case of biometrics if information about the biometric reading is leaked the adversary may be able to learn personal information about a user. For example an adversary notices that there is a certain pattern in the helper strings that implies the ethnicity of the user. We can consider this additional information a function . If an adversary were to learn a helper string, it must be ensured that, from this data he can not infer any data about the person from which the biometric reading was taken.
Correlation between helper string and biometric input
Ideally the helper string would reveal no information about the biometric input
. This is only possible when every subsequent biometric reading
is identical to the original
. In this case there is actually no need for the helper string, so it is easy to generate a string that is in no way correlated to
.
Since it is desirable to accept biometric input similar to
the helper string
must be somehow correlated. The more different
and
are allowed to be, the more correlation there will be between
and
, the more correlated they are the more information
reveals about
. We can consider this information to be a function
. The best possible solution is to make sure the adversary can't learn anything useful from the helper string.
Gen(W) as a probabilistic map
A probabilistic map hides the results of functions with a small amount of leakage
. The leakage is the difference in probability two adversaries have of guessing some function when one knows the probabilistic map and one does not. Formally:
If the function is a probabilistic map, then even if an adversary knows both the helper string
and the secret string
they are only negligibly more likely figure something out about the subject as if they knew nothing. The string
is supposed to kept secret, so even if it is leaked (which should be very unlikely) the adversary can still figure out nothing useful about the subject, as long as
is small. We can consider
to be any correlation between the biometric input and some physical characteristic of the person. Setting
in the above equation changes it to:
This means that if one adversary has
and a second adversary
knows nothing, their best guesses at
are only
apart.
Uniform fuzzy extractors
Uniform fuzzy extractors are a special case of fuzzy extractors, where the output of
are negligibly different from strings picked from the uniform distribution, i.e.
Uniform secure sketches
Since secure sketches imply fuzzy extractors, constructing a uniform secure sketch allows for the easy construction of a uniform fuzzy extractor. In a uniform secure sketch the sketch procedure is a randomness extractor
. Where
is the biometric input and
is the random seed. Since randomness extractors output a string that appears to be from a uniform distribution they hide all the information about their input.
Applications
Extractor sketches can be used to construct -fuzzy perfectly one-way hash functions. When used as a hash function the input
is the object you want to hash. The
that
outputs is the hash value. If one wanted to verify that a
within
from the original
, they would verify that
.
-fuzzy perfectly one-way hash functions are special hash functions where they accept any input with at most
errors, compared to traditional hash functions which only accept when the input matches the original exactly. Traditional cryptographic hash functions attempt to guarantee that is it is computationally infeasible to find two different inputs that hash to the same value. Fuzzy perfectly one-way hash functions make an analogous claim. They make it computationally infeasible two find two inputs, that are more than
Hamming distance apart and hash to the same value.
Protection against active attacks
An active attack could be one where the adversary can modify the helper string . If the adversary is able to change
to another string that is also acceptable to the reproduce function
, it cause
to output an incorrect secret string
. Robust fuzzy extractors solve this problem by allowing the reproduce function to fail, if a modified helper string is provided as input.
Robust fuzzy extractors
One method of constructing robust fuzzy extractors is to use hash functions. This construction requires two hash functions and
. The
functions produces the helper string
by appending the output of a secure sketch
to the hash of both the reading
and secure sketch
. It generates the secret string
by applying the second hash function to
and
. Formally:

The reproduce function also makes use of the hash functions
and
. In addition to verifying the biometric input is similar enough to the one recovered using the
function, it also verifies that hash in the second part of
was actually derived from
and
. If both of those conditions are met it returns
which is itself the second hash function applied to
and
. Formally:
Get
and
from
If
and
then
else

If has been tampered with it will be obvious because,
will output fail with very high probability. To cause the algorithm accept a different
an adversary would have to find a
such that
. Since hash function are believed to be one way functions, it is computationally infeasible to find such a
.
Seeing
would provide the adversary with no useful information. Since, again, hash function are one way functions, it is computationally infeasible for the adversary to reverse the hash function and figure out
. Part of
is the secure sketch, but by definition the sketch reveals negligible information about its input. Similarly seeing
(even though it should never see it) would provide the adversary with no useful information as the adversary wouldn't be able to reverse the hash function and see the biometric input.
References
- Fuzzy Extractors: A Brief Survey of Results from 2004 to 2006
- Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data
- Biometric Fuzzy Extractor Scheme for Iris Templates
- A Fuzzy Vault Scheme