Statistical Analysis Framework for Biometric System Interoperability Testing
5th International Conference on Information Technology and Applications (ICITA 2008)
Shimon K. Modi, Stephen J. Elliott, Ph.D., H. Kim, Ph.D., and Eric P. Kukula

Abstract—Biometric systems are increasingly deployed in networked environments, and issues related to interoperability are bound to arise as single-vendor, monolithic architectures become less desirable. Interoperability issues affect every subsystem of a biometric system, and a statistical framework to evaluate interoperability is proposed. The framework was applied to the acquisition subsystem of a fingerprint recognition system and the results were evaluated using the framework. Fingerprints were collected from 100 subjects on 6 fingerprint sensors. The results show that the performance of interoperable fingerprint datasets is not easily predictable and that the proposed framework can help reduce this unpredictability to some degree.
Index Terms—fingerprint recognition, biometrics, statistics, fingerprint sensor interoperability, analysis framework.

S. K. Modi is a researcher and Ph.D. candidate in the Biometrics Standards, Performance, and Assurance Laboratory in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: shimon@purdue.edu).
S. J. Elliott is Director of the Biometrics Standards, Performance, and Assurance Laboratory and Associate Professor in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: elliott@purdue.edu).
H. Kim is Professor of School of Information & Communication Engineering at Inha University, and a member of Biometrics Engineering Research Center (BERC) at Yonsei University, Seoul, Korea (e-mail: hikim@inha.ac.kr).
E. P. Kukula is a researcher and Ph.D. candidate in the Biometrics Standards, Performance, and Assurance Laboratory in the Department of Industrial Technology, Purdue University, West Lafayette, IN 47907 USA (e-mail: kukula@purdue.edu).

ICITA2008 ISBN: 978-0-9803267-2-7

I. INTRODUCTION

Establishing and maintaining the identity of individuals is becoming ever more important in today's networked world. The complexity of these tasks has increased in proportion to advances in automated recognition. There are three main methods of authenticating an individual: 1) using something that only the authorized individual knows, e.g. passwords; 2) using something that only the authorized individual possesses, e.g. physical tokens; 3) using physiological or behavioral characteristics that only the authorized individual can reproduce, i.e. biometrics. The increasing use of information technology systems has created the concept of digital identities, which can be used in any of these authentication mechanisms. Digital identities and electronic credentialing have changed the way authentication architectures are designed. Instead of the stand-alone, monolithic authentication architectures of the past, today's networked world offers the advantage of distributed and federated authentication architectures. The development of distributed authentication architectures can be seen as an evolutionary step, but it raises an issue that always accompanies an attempt to mix disparate systems: interoperability. What is the effect on the performance of the authentication process if an individual establishes a credential on one system and then authenticates on a different system of the same modality? This issue is relevant to all kinds of authentication mechanisms, and particularly to biometric recognition systems.

The last decade has witnessed a huge increase in the deployment of biometric systems. While most of these have been single-vendor systems, the issue of interoperability is bound to arise as distributed architectures become the norm rather than the exception. Fingerprint recognition systems are the most widely deployed biometric systems, and most commercially deployed fingerprint systems are single-vendor systems [1]. The user's single point of interaction with a fingerprint system is the acquisition stage, and this stage has the highest probability of introducing inconsistencies in the biometric data. Fingerprint sensors are based on a variety of technologies such as electrical, optical, and thermal. The physics behind these technologies introduces distortions and variations in the feature set of the captured image that are not consistent across technologies. This makes the goal of interoperability even more challenging.

Performance analysis of biometric systems can be carried out using several techniques; one such technique involves analyzing DET curves. This methodology can also be applied to testing the performance rates of native and interoperable systems. Although this method allows a researcher to compare the different error rates visually, a statistical methodology for testing the interoperability of biometric systems is also of great importance. A formalized process for testing interoperability will be required as the issues related to interoperability become more prominent. This research proposes and tests a basic statistical framework for analyzing matching scores and error rates for fingerprints collected from 6 different fingerprint sensors.

II. REVIEW OF LITERATURE

The acquisition of fingerprint images is heavily affected by interaction and contact issues such as inconsistent contact, non-uniform contact, and irreproducible contact [2]. Placing a finger on a sensor maps a 3D shape onto a 2D surface. This mapping is not controlled, and the mapping for the same finger can differ from one sensor to another, thereby adding sensor-specific deformations. The area of contact between the finger surface and the sensor is not the same for different sensors. This non-uniform contact can influence both the amount and the consistency of the detail captured from the finger surface. Jain and Ross examined the issue of matching feature sets originating from an optical sensor and a capacitive sensor [3]. Their results showed that the minutiae count for the dataset collected from the optical sensor was higher than the minutiae count for the dataset collected from the capacitive sensor. The Equal Error Rates (EER) for the two native databases were 6.14% and 10.39%, while the EER for the interoperable database was 23.13%. Ko and Krishnan [4] present a methodology for measuring and monitoring the quality of the fingerprint database and the fingerprint match performance of the Department of Homeland Security's Biometric Identification System. One of the findings of their research was the importance of understanding the impact on performance if fingerprints captured by a new fingerprint sensor were integrated into an identification application with images captured from an existing fingerprint sensor.

Han et al. [5] performed a study examining the influence on matching performance of image resolution and distortion caused by differences in fingerprint sensor technologies. Their approach proposed a compensation algorithm which worked on raw images and templates so that all the fingerprint images and templates processed by the system are normalized against a pre-specified baseline. Their research performed statistical analysis on the basic fingerprint features of the original images and the transformed images to test for differences between the two.

The International Labour Organization (ILO) commissioned a biometric testing campaign in an attempt to understand the causes of the lack of interoperability [6]. Ten products were tested, where each product consisted of a sensor paired with an algorithm capable of enrollment and verification. Native and interoperable False Accept Rate (FAR) and False Reject Rate (FRR) were computed for all datasets. A mean FRR of 0.92% for genuine matching scores was observed at a FAR of 1%. The objectives of this test were twofold: to test conformance of the products in producing biometric information records complying with ILO SID-0002, and to test which products could interoperate at levels of less than 1% FRR at a fixed 1% FAR. The results showed that only 2 of the 10 products were able to interoperate at the mandated levels.

NIST conducted the minutiae template interoperability test (MINEX 2004) to assess the interoperability of fingerprint templates generated by different extractors and then matched with different matchers. Four different datasets were used, referred to as DOS, DHS, POE, and POEBVA. The performance evaluation framework calculated FNMR at a fixed FMR of 0.01% for all the matchers. Performance matrices were created which represented all FNMR of native and interoperable datasets and provided a means for quick comparison. The results showed that FNMR for native datasets was far lower than FNMR for interoperable datasets. In [7] the authors conduct an image quality and minutiae count analysis on fingerprints collected from three different sensors and assess the performance rates of the native and interoperable fingerprint datasets using different enrollment methodologies. They also describe an ANOVA test for differences in mean genuine matching scores between the three datasets. Their preliminary statistical analysis showed that the genuine matching scores of the native and interoperable datasets differed significantly.

The previous body of literature points to the growing importance of interoperability for biometric systems. Previous research has concentrated on interoperability error rate matrices and on comparison of EER between native and interoperable datasets. Although analysis of error rates serves as a good indicator of performance, an alternative approach based on statistical techniques would be beneficial. A formalized statistical analysis framework for testing interoperability is lacking, and this problem needs to be addressed. The researchers in this experiment build on previous work in this area and propose a statistical analysis framework for testing interoperability.

III. STATISTICAL ANALYSIS FRAMEWORK

Interoperability of biometric systems is going to become an important issue, and the need for an analysis framework will become imperative. Frameworks for testing biometric systems and biometric algorithms can be found in the literature, and this paper proposes a framework for testing the interoperability of biometric systems. For the purposes of this research the framework was adapted for testing the interoperability of fingerprint sensors. The framework is based on the concept that if two fingerprint sensors are interoperable, then the fingerprint datasets that combine them should have error rates similar to those of fingerprint datasets collected from either sensor alone. The framework for testing interoperability of fingerprint sensors is based on three steps:
1. Statistical analysis of basic fingerprint features.
2. Error rates analysis of native and interoperable fingerprint datasets.
3. Statistical analysis of matching scores of native and interoperable fingerprint datasets.

This framework was evaluated using a dataset of fingerprints collected from 6 fingerprint sensors. The methodology and results are discussed in the following sections.
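As a rough illustration of what steps 2 and 3 operate on, the sketch below enumerates the enrollment/verification sensor combinations for six sensors and labels each as native (same sensor on both sides) or interoperable (different sensors). The labels S1 to S6 and the function name are assumptions made for this example only; they are not part of the framework itself.

```python
from itertools import product

# Illustrative sensor labels; the study uses six sensors.
SENSORS = ["S1", "S2", "S3", "S4", "S5", "S6"]


def dataset_pairs(sensors):
    """Label every (enrollment, verification) sensor pair.

    A pair is 'native' when both sides use the same sensor and
    'interoperable' when they use different sensors.
    """
    pairs = {}
    for enroll, verify in product(sensors, repeat=2):
        kind = "native" if enroll == verify else "interoperable"
        pairs[(enroll, verify)] = kind
    return pairs


pairs = dataset_pairs(SENSORS)
native = [p for p, kind in pairs.items() if kind == "native"]
interoperable = [p for p, kind in pairs.items() if kind == "interoperable"]
print(len(native), "native pairs;", len(interoperable), "ordered interoperable pairs")
```

If matching is treated as symmetric, the 30 ordered interoperable pairs collapse to 15 unordered ones, which would correspond to the off-diagonal cells of one triangle of an error rate matrix.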
Fig. 1. Statistical Analysis Framework for Interoperability Testing.

IV. DATA COLLECTION

The dataset used in this research is a part of KFRIA-DB (Korea Fingerprint Recognition Interoperability Alliance Database). Fingerprints were collected from 100 subjects using 6 different fingerprint sensors. Each subject provided 6 fingerprint images from each of four fingers: the right index finger, right middle finger, left index finger, and left middle finger. In total, 2,400 fingerprint images were collected using each sensor. Table I gives an overview of the specifications of all the fingerprint sensors used in the study.

Table I. Sensor Specifications

Sensor     Technology Type   Resolution (DPI)   Interaction Type   Image Size
Sensor 1   Thermal           500                Swipe              360 x 500
Sensor 2   Optical           500                Touch              280 x 320
Sensor 3   Optical           500                Touch              248 x 292
Sensor 4   Polymer           620                Touch              480 x 640
Sensor 5   Capacitive        508                Touch              256 x 360
Sensor 6   Optical           500                Touch              224 x 256

All analysis for this study was performed on raw fingerprint images collected from the 6 sensors. Sensor characteristics such as capture area, capture technology, and aspect ratio influence the resulting image. Fig. 2 shows sample images collected from the different sensors.

Fig. 2. Example Fingerprint Images (one sample image each for Sensor 1 through Sensor 6).

V. FINGERPRINT FEATURE ANALYSIS

An important factor to consider when examining interoperability is the ability of different fingerprint sensors to capture similar fingerprint features from the same fingerprint. Human interaction with the sensor, levels of habituation, finger skin characteristics, and sensor characteristics each introduce their own source of variability. All of these factors affect the consistency of the fingerprint features of images acquired from different sensors. It is important to analyze the amount of variance in image quality and minutiae count of fingerprints captured from different sensors. This analysis was performed by computing image quality scores and minutiae counts for all fingerprint images using a commercially available software package. Table II shows the average image quality scores and minutiae counts for all the datasets.

Table II. Average Image Quality Scores & Minutiae Count

Fingerprint dataset   Quality Score (range 0-100)   Minutiae Count
Sensor 1              15.24                         94.13
Sensor 2              74.97                         45.89
Sensor 3              71.15                         38.77
Sensor 4              6.62                          52.44
Sensor 5              68.92                         39.21
Sensor 6              62.58                         31.25

The datasets for Sensor 1 and Sensor 4 showed very low quality scores. It should be noted that the images captured from Sensor 4 had a very dark background, which could have contributed to the very low quality score. In addition, Sensor 1 and Sensor 4 use different technologies from the other, more commonly available sensors. An analysis of variance (ANOVA) was performed on all the datasets to test for differences in the mean image quality score and in the mean minutiae count between the datasets. The hypothesis stated in (1) was tested using the ANOVA test.
\[ H1_0:\ \mu_1 = \mu_2 = \cdots = \mu_6 \]
\[ H1_A:\ \mu_1 \neq \mu_2 \neq \cdots \neq \mu_6 \qquad (1) \]

The p-value of this hypothesis test was less than 0.05, which indicated that the mean image quality scores differed significantly between the datasets. The same hypothesis test was conducted on minutiae count for all the datasets. The p-value of this test was also less than 0.05, which indicated that the mean minutiae counts differed significantly between the datasets. Table II shows that image quality scores for fingerprints collected from different sensors were markedly different. Previous research has shown that image quality has an impact on the performance of fingerprint matching systems [8]. The next step of the research involved evaluating the impact of these basic fingerprint feature inconsistencies on the performance of fingerprint datasets collected from different sensors.

VI. PERFORMANCE RATES ANALYSIS

A. Performance Rates: ROC and Error Rates Matrix

In order to evaluate the performance of the fingerprint datasets, False Non Match Rates (FNMR) were computed for all datasets. A commercially available fingerprint feature extractor and matcher was used to generate FNMR. FNMR were generated for native datasets and interoperable datasets: a native dataset compares enrollment and verification images collected from the same fingerprint sensor, and an interoperable dataset compares enrollment and verification images collected from different sensors. The first three fingerprint images provided by each subject were used to create the enrollment database, and the last three images were used to create the verification database. Enrollment and verification databases were created for each of the 6 sensors. Matching scores were generated by comparing every enrollment template from each enrollment database against every fingerprint image from each verification database, which resulted in a set of scores S for every combination of enrollment and verification databases, where

\[ S = \{ E_{ix},\ V_{jy},\ \mathrm{score}_{ijxy} \} \]
i = 1, ..., number of enrolled templates
j = 1, ..., number of verification images
x = 1, ..., number of enrollment datasets
y = 1, ..., number of verification datasets
\mathrm{score}_{ijxy} = match score between enrollment template and verification image

Using this set of scores, an FNMR matrix was generated. FNMR was calculated for each set of scores at a fixed False Match Rate (FMR) of 0.1% and of 1%. The results are shown in Tables III and IV. The diagonal cells of the FNMR matrix are rates for the native datasets, and the off-diagonal cells are rates for the interoperable datasets. Since matching of fingerprints is a symmetric process, the matrix can be viewed as a symmetric matrix as well.

The FNMR matrix for 0.1% fixed FMR showed a varying range of FNMR for native and interoperable datasets. All the native datasets had a markedly lower FNMR than the interoperable datasets. For example, S4 showed a native FNMR of 0.1% and lowest interoperable FNMR of 35%, which indicates a very low level of interoperability between the datasets. When the FNMR are analyzed in the context of image quality scores, it can be seen that Sensor 4 had the lowest image quality scores, which suggests that image quality was an important factor in the high FNMR of its interoperable datasets. The interesting observation is the relatively low FNMR for the native dataset of fingerprints captured with S4. It was also observed that the FNMR of interoperable datasets created from fingerprint sensors of the same acquisition technology and interaction type, for example S2 and S3, was comparable to the FNMR of their native datasets.

Table III. FNMR (%) at fixed 0.1% FMR

       S1     S2     S3     S4     S5     S6
S1     0.8    5.0    8.0    35.0   18.0   7.0
S2            0      0.6    38.0   2.0    0.12
S3                   0.1    38.0   1.9    0.12
S4                          0.1    58.0   32.0
S5                                 0.1    6.0
S6                                        0.1

Table IV. FNMR (%) at fixed 1% FMR

       S1     S2     S3     S4     S5     S6
S1     0      0      0      0      0      0
S2            0      0      0      0      0
S3                   0      0      0      0
S4                          0      0      0
S5                                 0      0
S6                                        0

Detection Error Tradeoff (DET) curves were also generated for all the datasets. DET curves are a modification of Receiver Operating Characteristic (ROC) curves. ROC curves are a means of representing the performance results of diagnostic, detection, and pattern matching systems [9]. A DET curve plots FMR on the x-axis and FNMR on the y-axis as a function of the decision threshold. DET curves for different combinations of enrollment/verification databases allow comparison of error rates at different thresholds.

\[ \mathrm{DET}(T) = (\mathrm{FMR}(T),\ \mathrm{FNMR}(T)) \qquad (2) \]
where T is the threshold.

Fig. 3 shows three superimposed DET curves. It can be observed that the DET curve for the interoperable dataset for S2 and S3 performs worse than the two native datasets at every possible threshold.
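To make the relationship between the score sets, the FNMR at a fixed FMR, and the DET points in (2) concrete, the following is a minimal sketch in plain Python. It assumes similarity scores where a comparison is accepted when the score is at or above the threshold; the commercial matcher used in the study is not described at this level, so the decision rule, the function names, and the toy scores are all assumptions for illustration.

```python
def fmr(impostor, threshold):
    """False match rate: fraction of impostor scores accepted (score >= threshold)."""
    return sum(s >= threshold for s in impostor) / len(impostor)


def fnmr(genuine, threshold):
    """False non-match rate: fraction of genuine scores rejected (score < threshold)."""
    return sum(s < threshold for s in genuine) / len(genuine)


def det_points(genuine, impostor):
    """Return (T, FMR(T), FNMR(T)) for each candidate threshold T, in ascending T."""
    thresholds = sorted(set(genuine) | set(impostor))
    # Add one threshold above every score so that FMR reaches 0.
    thresholds.append(thresholds[-1] + 1)
    return [(t, fmr(impostor, t), fnmr(genuine, t)) for t in thresholds]


def fnmr_at_fixed_fmr(genuine, impostor, target_fmr):
    """FNMR at the lowest threshold whose FMR does not exceed target_fmr."""
    for _, false_match, false_non_match in det_points(genuine, impostor):
        if false_match <= target_fmr:
            return false_non_match
    raise ValueError("no threshold meets the target FMR")


# Toy scores, invented for the example (not data from the study).
genuine_scores = [620, 700, 540, 810, 300, 760]
impostor_scores = [50, 120, 90, 310, 40, 75, 200, 15, 60, 110]

print(fnmr_at_fixed_fmr(genuine_scores, impostor_scores, 0.10))
print(fnmr_at_fixed_fmr(genuine_scores, impostor_scores, 0.0))
```

Running the same function on the score set of each enrollment/verification combination would fill one cell of an FNMR matrix, and plotting the FMR and FNMR columns of `det_points` against each other gives the corresponding DET curve.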
Fig. 4 also shows three superimposed DET curves. The DET curve for the interoperable dataset for S2 and S4 shows that its performance is much worse than that of the native datasets. Looking at the DET curves for the native datasets in Fig. 3 and Fig. 4, the performance of the native datasets is comparable, but the performance of the two interoperable datasets differs markedly. This indicates how unpredictable the performance of interoperable datasets is when judged entirely from the performance of native datasets.

Fig. 3. DET Curve for S2 and S3 datasets.

Fig. 4. DET Curve for S2 and S4 datasets.

B. Statistical Analysis of Matching Scores

The DET curves and FNMR matrix provide insight into any existing differences in FNMR between native and interoperable databases, but they do not provide a statistical basis for testing the differences. A statistical analysis of the results could help uncover underlying patterns which contribute to the unpredictability observed in the comparison of the DET curves. To assess interoperability at the matching score level, the matching scores from the genuine comparisons of the native dataset were compared to the matching scores from the genuine comparisons of the interoperable datasets. For true interoperability, the performance of interoperable datasets should be statistically similar to the performance of the native dataset. An ANOVA test was used to test for differences in the mean genuine matching scores between the native dataset and the interoperable datasets at a significance level of 0.05. This test was performed for each of the six native datasets, which resulted in 6 sets of hypotheses as stated in (3).

\[ H2_0:\ \mu_{native} = \mu_{interoperable1} = \cdots = \mu_{interoperable5} \]
\[ H2_A:\ \mu_{native} \neq \mu_{interoperable1} \neq \cdots \neq \mu_{interoperable5} \qquad (3) \]

The ANOVA test for all six hypotheses had a p-value of less than 0.05, which resulted in rejecting the null hypothesis and concluding that native genuine matching scores were significantly different from interoperable matching scores.

In many experiments such as this, one of the treatments is a control and the other treatments are comparison treatments. A statistical test can be devised which compares the different treatments to the control. Such a test can be performed using Dunnett's test, which is a modified form of a t-test [10]. For this particular experiment, the native dataset genuine match scores were used as the control and the interoperable dataset genuine match scores as the comparison treatments. For each native dataset, there were 5 comparison treatments, which corresponded to the interoperable datasets. The mean genuine match score for each interoperable database was tested against the control (i.e. the native database score). According to Dunnett's test, the null hypothesis H_0: \mu_{native} = \mu_{interoperable} is rejected at \alpha = 0.05 if

\[ |\bar{y}_{i.} - \bar{y}_{a.}| > d_\alpha(a-1, f) \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_a} \right)} \qquad (4) \]
where
d_\alpha(a-1, f) = Dunnett's constant
a-1 = number of interoperable datasets
f = number of error degrees of freedom
MSE = mean square of error terms
n_i = number of samples in control
n_a = number of samples in interoperable set a

Dunnett's test was performed on all the possible combinations of native and interoperable datasets. The test showed that the genuine matching scores of all the interoperable datasets were different from the genuine matching scores of the control dataset. Table V shows the average genuine matching score of the control dataset and the average genuine matching score of the interoperable dataset which had the least absolute difference from the control.

An evaluation of the results in Table V shows that S2 and S3 had the best interoperable genuine matching scores. When the interoperable matching rates are analyzed in the context of image quality scores and minutiae count, S2 and S3 had the least absolute difference between their image quality scores and minutiae counts. Combining these results provides a positive indicator for improving predictability of FNMR for interoperable datasets.
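The two tests used in this section can be sketched in a few lines of plain Python. The code below computes the one-way ANOVA F statistic for hypotheses of the form (3) and then applies the rejection criterion (4) to each comparison treatment. It is only an illustration under stated assumptions: the function names and toy scores are invented, no p-value is computed (that would require an F distribution from a statistics library), and the constant `d_crit` is a placeholder that must be replaced by Dunnett's constant d_alpha(a-1, f) looked up in a published table.

```python
from math import sqrt


def one_way_anova(groups):
    """Return (F, df_between, df_error, MSE) for a one-way ANOVA over the groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_error = n_total - len(groups)
    mse = ss_error / df_error
    f_stat = (ss_between / df_between) / mse
    return f_stat, df_between, df_error, mse


def dunnett_rejections(control, treatments, d_crit):
    """Apply criterion (4) to every treatment; True means H0 is rejected.

    d_crit stands for Dunnett's constant d_alpha(a-1, f) taken from a table.
    """
    _, _, _, mse = one_way_anova([control] + treatments)
    control_mean = sum(control) / len(control)
    decisions = []
    for scores in treatments:
        diff = abs(sum(scores) / len(scores) - control_mean)
        margin = d_crit * sqrt(mse * (1 / len(control) + 1 / len(scores)))
        decisions.append(diff > margin)
    return decisions


# Toy genuine scores, invented for the example (not data from the study).
native = [10, 12, 11, 13]
interoperable = [[11, 12, 10, 13], [20, 22, 21, 23]]

f_stat, df_between, df_error, mse = one_way_anova([native] + interoperable)
decisions = dunnett_rejections(native, interoperable, d_crit=2.6)  # placeholder constant
print(f_stat, df_error, decisions)
```

In the study's setting, `native` would hold the genuine scores of one native dataset and `interoperable` the genuine scores of its five interoperable datasets, so the procedure would be repeated once per sensor.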
Table V. Matching Scores for Control Dataset and Interoperable Dataset

Average Matching Score, Control Dataset   Interoperable Dataset with Least Difference
Sensor 1 Dataset - 319.3                  Sensor 3 - 294.2
Sensor 2 Dataset - 749.2                  Sensor 3 - 609.2
Sensor 3 Dataset - 789                    Sensor 2 - 609.2
Sensor 4 Dataset - 575.5                  Sensor 3 - 281.6
Sensor 5 Dataset - 631.9                  Sensor 3 - 390.5
Sensor 6 Dataset - 652.7                  Sensor 2 - 521.9

VII. CONCLUSIONS AND FUTURE WORK

Previous research has shown that image quality has a significant impact on the performance of native fingerprint datasets, and this research showed that image quality and minutiae count have an impact on the performance of interoperable fingerprint datasets. The type of capture technology did not have a consistent effect on the FNMR of interoperable fingerprint datasets, as seen in the difference in FNMR between the S2 and S3 datasets and the S2 and S4 datasets. It is important to understand the effect of these factors since they can be used to reduce the unpredictability of the performance of interoperable datasets. Interoperability depends on several factors, and this research uncovered important factors and illustrated their significance using statistical tests and analysis methodologies. These findings can be used in designing fingerprint matching algorithms which specifically take advantage of this new knowledge.

The results discussed in this paper indicate several avenues of research which could be followed to improve the statistical analysis framework. Along with the comparison of genuine matching scores using Dunnett's test, a comparison of proportions can also be applied to statistically test the FNMR between native and interoperable datasets. This would add one more test to the collection of interoperability tests. Application of this framework to a different modality would also be an interesting study. In this research the framework was applied exclusively to interoperability tests for fingerprint recognition, and it helped in synthesizing the results in a novel way. Other biometric modalities will face the same interoperability problems as fingerprint recognition, and it will become imperative to understand these issues and try to solve them. Application of this framework to other modalities could provide ideas for solving the problems of interoperability in a larger context. An investigative multivariate analysis which uses basic fingerprint features as predictor variables and matching scores as response variables is another avenue of future work. Understanding the effect of these predictor variables on interoperable matching scores could be used to create a model which is capable of describing the interactions and effects.

ACKNOWLEDGMENT

The authors would like to thank KFRIA (Korea Fingerprint Recognition Interoperability Alliance) for supporting this research and providing the fingerprint database for analysis.

REFERENCES

[1] IBG, Biometrics Market and Industry Report. 2007, IBG: NY. p. 224.
[2] Haas, N., S. Pankanti, and M. Yao, Fingerprint Quality Assessment. In Automatic Fingerprint Recognition Systems. 2004, NY: Springer-Verlag. 55-66.
[3] Jain, A. and A. Ross, eds. Biometric Sensor Interoperability. BioAW 2004, ed. A. Jain and D. Maltoni. Vol. 3067. 2004, Springer-Verlag: Berlin. 134-145.
[4] Ko, T. and R. Krishnan. Monitoring and Reporting of Fingerprint Image Quality and Match Accuracy for a Large User Application. in Applied Imagery Pattern Recognition Workshop. 2004. Washington, D.C.: IEEE Computer Society.
[5] Han, Y., et al. Resolution and Distortion Compensation based on Sensor Evaluation for Interoperable Fingerprint Recognition. in 2006 International Joint Conference on Neural Networks. 2006. Vancouver, Canada.
[6] Campbell, J. and M. Madden, ILO Seafarers' Identity Documents Biometric Interoperability Test Report. 2006, International Labour Organization: Geneva. p. 170.
[7] Modi, S., S. Elliott, and H. Kim. Performance Analysis for Multi Sensor Fingerprint Recognition System. in International Conference on Information Systems Security. 2007. Delhi, India: Springer Verlag.
[8] Elliott, S.J. and S.K. Modi. Impact of Image Quality on Performance: Comparison of Young and Elderly Fingerprints. in 6th International Conference on Recent Advances in Soft Computing (RASC). 2006. Canterbury, UK.
[9] Mansfield, A. and J. Wayman, Best Practices. 2002, National Physical Laboratory: Middlesex. p. 32.
[10] Montgomery, D.C., Design and Analysis of Experiments. 4th ed. 1997, New York: John Wiley & Sons. 704.