Hughes, V. and Foulkes, P. (2012) Effects of variation on the computation of numerical likelihood ratios for forensic voice comparison. Paper presented at International Association for Forensic Phonetics and Acoustics (IAFPA) Conference, Universidad Internacional Menéndez Pelayo, Santander. 5-8 August 2012.
2. 1. introduction
• Likelihood Ratio (LR) = “logically and legally correct
framework” for assessing forensic comparison
evidence (Rose & Morrison 2009: 143)
$$\mathrm{LR} = \frac{p(E \mid H_p)}{p(E \mid H_d)}$$
Hughes & Foulkes
IAFPA 2012
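As a rough illustration of the ratio above, the sketch below computes an LR and its log10 value from two densities. The univariate Gaussian models and all numbers are assumptions made purely for illustration; they are not the study's data, and the multivariate kernel density approach listed in the references (Morrison 2007) is not reproduced here. The function names `gaussian_pdf` and `likelihood_ratio` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """LR = p(E|Hp) / p(E|Hd)."""
    if p_e_given_hd <= 0:
        raise ValueError("p(E|Hd) must be positive")
    return p_e_given_hp / p_e_given_hd

# Hypothetical values (not from the study): one observed F2 measurement,
# a model of the known speaker (Hp) and a model of the reference data (Hd).
observed_f2 = 1500.0
p_hp = gaussian_pdf(observed_f2, mean=1520.0, sd=60.0)   # similarity
p_hd = gaussian_pdf(observed_f2, mean=1700.0, sd=200.0)  # typicality

lr = likelihood_ratio(p_hp, p_hd)
log10_lr = math.log10(lr)
print(f"LR = {lr:.2f}, log10 LR = {log10_lr:.2f}")
```

An LR above 1 (log10 LR above 0) supports Hp, and an LR below 1 supports Hd.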
3. • assessment of similarity of observed features in the
criminal and known samples, and their typicality
• typicality = dependent on patterns in the relevant
population (Aitken & Taroni 2004)
– defined by the defence hypothesis
– quantified relative to a sampled sub-section of
that population (reference data)
[Figure: Hp and Hd, from Berger (2012)]
4. • Rose (2004: 4) default Hd:
“same-sex speaker(s) of the language”
• ‘logical relevance’ (Kaye 2004, 2008)
Study | Feature | Reference data: speech style | N speakers | Age | Language
Rose et al (2003) | /ɕ/ /o/ /N/ | Read | 60 | 20-50 | Japanese
Rose et al (2006) | /aI/ | Read | 166 | 19-64 | Australian English
Morrison (2008) | /aI/ | Read | 27 | 19-64 | Australian English
Kinoshita et al (2009) | f0 | Controlled spontaneous | 201 | No control | Japanese
5. • collecting reference data
– bespoke case-by-case data
– ‘off-the-shelf’ data
• inevitable mismatch between the off-the-shelf
data and the facts of the case at trial
• LRs necessarily vary with different reference
data
6. 2. research questions
to what extent are LRs affected by…
i. varying N speakers in the reference data?
ii. varying N tokens per speaker in the
reference data?
iii. dialect mismatch between target voice and
reference data?
7. Verbal expressions of LRs, from Champod and Evett (2000)
Raw LR | Log10 LR | Verbal expression | Support for
>10000 | 5 | Very strong evidence | Hp
1000-10000 | 4 | Strong evidence | Hp
100-1000 | 3 | Moderately strong evidence | Hp
10-100 | 2 | Moderate evidence | Hp
1-10 | 1 | Limited evidence | Hp
1-0.1 | -1 | Limited evidence | Hd
0.1-0.01 | -2 | Moderate evidence | Hd
0.01-0.001 | -3 | Moderately strong evidence | Hd
0.001-0.0001 | -4 | Strong evidence | Hd
<0.0001 | -5 | Very strong evidence | Hd
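The scale above can be read as a lookup from a raw LR to a supported hypothesis and a verbal label. The following is a minimal sketch of that lookup. Two points are assumptions rather than part of the scale as presented: a value lying exactly on a band boundary is placed in the weaker band, and an LR of exactly 1 is reported as supporting neither hypothesis. The name `verbal_expression` is hypothetical.

```python
import math

# Upper bounds on |log10 LR| for each band of the verbal scale.
# Assumption: a value exactly on a boundary falls in the weaker band.
BANDS = [
    (1.0, "Limited evidence"),
    (2.0, "Moderate evidence"),
    (3.0, "Moderately strong evidence"),
    (4.0, "Strong evidence"),
]

def verbal_expression(lr):
    """Return (supported hypothesis, verbal expression) for a raw LR."""
    if lr <= 0:
        raise ValueError("LR must be positive")
    log10_lr = math.log10(lr)
    if log10_lr == 0:
        # Assumption: the scale does not cover LR = 1 explicitly.
        return ("neither", "No support for either hypothesis")
    supported = "Hp" if log10_lr > 0 else "Hd"
    for upper, label in BANDS:
        if abs(log10_lr) <= upper:
            return (supported, label)
    return (supported, "Very strong evidence")

for example_lr in (500, 0.05, 20000, 5):
    print(example_lr, verbal_expression(example_lr))
```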
9. • reference data:
– New Zealand English (NZE) from Canterbury
Corpus (ONZE)
– 120 male speakers (born 1932-1987)
– min 10 tokens per speaker (coded for context)
– auto-generated formant data
• test data:
– NZE / Manchester / Newcastle / York
– 8 male speakers per set (aged 16-31)
– 16 tokens per speaker (coded for context)
10. • why GOOSE /u:/?
– not a regional stereotype (Labov 1971) of any of the
test set dialects
[Figure: F1 (Hz) against F2 (Hz) by data set: Manchester, Newcastle, York, ONZE]
12. 4. results
i. number of reference speakers
– test data combined
• 32 same-speaker comparisons
• 992 different-speaker comparisons
– starting with 120 speakers
• 10 tokens per speaker
– ten speakers removed at a time
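The comparison counts above can be reproduced with a short enumeration. This is a sketch under one assumption: that each of the 32 test speakers (4 sets of 8) is compared once with himself and once with every other speaker in each direction, so that different-speaker comparisons are ordered pairs. That reading yields the 32 and 992 stated on the slide, but the slide itself does not spell out the pairing scheme.

```python
from itertools import permutations

n_sets = 4            # NZE, Manchester, Newcastle, York
speakers_per_set = 8
speakers = [(s, i) for s in range(n_sets) for i in range(speakers_per_set)]

# One same-speaker comparison per test speaker.
same_speaker = [(spk, spk) for spk in speakers]

# Assumption: different-speaker comparisons are ordered pairs (A vs B and B vs A).
different_speaker = list(permutations(speakers, 2))

print(len(same_speaker), len(different_speaker))
```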
13. N speakers
[Figure: mean Log10 LR and standard deviation for same-speaker pairs against number of speakers (0-120); Log10 LR axis from -6 to 4]
Key (Log10 LR): +/- 1 limited evidence; +/- 2 moderate evidence; +/- 3 moderately strong evidence; +/- 4 strong evidence; +/- 5 very strong evidence
14. N speakers
[Figure: mean Log10 LR and standard deviation for same-speaker pairs against number of speakers (0-120); Log10 LR axis from -6 to 4]
• stable mean > 20 speakers
• increased variance < 40 speakers
24. dialect mismatch: F1 and F2
[Figure: cumulative proportion (0-1) against Log10 Likelihood Ratio (-12 to 4) for same-speaker and different-speaker pairs; test sets: ONZE (match), Newcastle, Manchester, York]
25. dialect mismatch: F1 and F2
[Figure: cumulative proportion (0-1) against Log10 Likelihood Ratio (-12 to 4) for same-speaker and different-speaker pairs; test sets: ONZE (match), Newcastle, Manchester, York; annotated values: 71%, 58%]
26. 5. discussion
i. number of reference speakers
– evidence of “population size effect” (Ishihara and
Kinoshita 2008)
• strength of evidence is misestimated when N speakers
in the reference data is small
– mean LRs & variance stable > ca. 40 speakers
• different-speaker pairs more sensitive
28. iii. dialect mismatch
- same-speaker strength of evidence overestimated
• generally equivalent to one verbal category
- multitude of issues with different-speaker pairs
• overestimation of LRs for York (BUT issues of between-speaker variation)
• high levels of contrary-to-fact support for the
prosecution for Manchester and Newcastle
• potential miscarriages of justice
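Contrary-to-fact support can be quantified as the proportion of comparisons whose LR points to the wrong hypothesis: log10 LR above 0 for a different-speaker pair, or below 0 for a same-speaker pair. The sketch below shows that calculation. The log10 LR values are hypothetical and chosen only to make the arithmetic visible; they are not results from the study, and the function name `contrary_to_fact_rate` is an assumption.

```python
def contrary_to_fact_rate(log10_lrs, same_speaker):
    """Proportion of comparisons whose LR supports the wrong hypothesis."""
    if not log10_lrs:
        raise ValueError("at least one comparison is required")
    if same_speaker:
        # Same-speaker pairs should support Hp (log10 LR > 0).
        wrong = sum(1 for value in log10_lrs if value < 0)
    else:
        # Different-speaker pairs should support Hd (log10 LR < 0).
        wrong = sum(1 for value in log10_lrs if value > 0)
    return wrong / len(log10_lrs)

# Hypothetical different-speaker log10 LRs (not from the study).
example_ds = [-3.2, -0.5, 0.4, 1.1, -6.0]
rate = contrary_to_fact_rate(example_ds, same_speaker=False)
print(f"{rate:.0%} of different-speaker pairs support Hp")
```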
29. 6. conclusion
• positive practical implications
- mean and variance of LRs remain stable until N speakers
in the reference data becomes small
- good Cllr, even with relatively small N tokens per speaker
- but the more speakers and the more tokens the better
• predictably, dialect matters
- even for features which aren’t expected to display
considerable variation according to region
- default Hd needs to account for this
- how narrowly do we need to define dialect?
- what about other ‘logically relevant’ class factors?
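Cllr, mentioned in the conclusion, is the log-likelihood-ratio cost of Brümmer and du Preez (2006): half the sum of the mean of log2(1 + 1/LR) over same-speaker comparisons and the mean of log2(1 + LR) over different-speaker comparisons. Lower is better, and a system that always outputs LR = 1 scores exactly 1. The sketch below is a direct transcription of that definition; the function name `cllr` and the example LRs are illustrative assumptions, not values from the study.

```python
import math

def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost computed from raw (not log) LRs."""
    if not same_speaker_lrs or not different_speaker_lrs:
        raise ValueError("both sets of LRs must be non-empty")
    # Penalty grows as same-speaker LRs fall towards or below 1.
    ss_cost = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    ss_cost /= len(same_speaker_lrs)
    # Penalty grows as different-speaker LRs rise towards or above 1.
    ds_cost = sum(math.log2(1 + lr) for lr in different_speaker_lrs)
    ds_cost /= len(different_speaker_lrs)
    return 0.5 * (ss_cost + ds_cost)

# Hypothetical LRs (not from the study).
uninformative = cllr([1.0, 1.0], [1.0, 1.0])
well_separated = cllr([100.0, 1000.0], [0.01, 0.001])
print(uninformative, round(well_separated, 4))
```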
31. References
Aitken, C. G. G. and Taroni, F. (2004) Statistics and the evaluation of evidence for forensic
scientists (2nd edition). Chichester: John Wiley & Sons.
Berger, C. (2012) Modern evidential interpretation, reporting and fallacies. Lecture given at the
BBfor2 Summer School in Forensic Evidence Evaluation and Validation. Universidad
Autonoma de Madrid, Spain. 18-21 July 2012.
Brümmer, N. and du Preez, J. (2006) Application independent evaluation of speaker detection.
Computer Speech and Language 20: 230-275.
Champod, C. and Evett, I. W. (2000) Commentary on A. P. A. Broeders (1999) ‘Some
observations on the use of probability scales in forensic identification’. Forensic
Linguistics 7(2): 238-243.
Ishihara, S. and Kinoshita, Y. (2008) How many do we need? Exploration of the Population Size
Effect on the performance of forensic speaker classification. Paper presented at the 9th
Annual Conference of the International Speech Communication Association
(Interspeech). Brisbane, Australia. 1941-1944.
Kaye, D. H. (2004) Logical relevance: problems with the reference population and DNA mixtures
in People v. Pizarro. Law, Probability and Risk 3: 211-220.
Kaye, D. H. (2008) DNA probabilities in People v. Prince: When are racial and ethnic statistics
relevant? In Speed, T. and Nolan, D. (eds.) Probability and Statistics: Essays in Honour
of David A Freedman. Beachwood, OH: Institute of Mathematical Statistics. 289-301.
Kinoshita, Y., Ishihara, S. and Rose, P. (2009) Exploring the discriminatory potential of F0
distribution parameters in traditional speaker recognition. International Journal of
Speech, Language and the Law 16(1): 91-111.
Labov, W. (1971) The study of language in its social context. In Fishman, J. A. (ed.) Advances in
the Sociology of Language (vol. 1). The Hague: Mouton. 152-216.
Loakes, D. (2006) A forensic phonetic investigation into the speech patterns of identical and
non-identical twins. PhD Dissertation, University of Melbourne.
McDougall, K. (2004) Speaker-specific formant dynamics: An experiment on Australian English
/aɪ/. International Journal of Speech, Language and the Law 11(1): 103-130.
McDougall, K. (2006) Dynamic features of speech and the characterisation of speakers: towards
a new approach using formant frequencies. International Journal of Speech, Language
and the Law 13(1): 89-126.
Morrison, G. S. (2007) Matlab implementation of Aitken and Lucy’s (2004) Forensic
Likelihood-Ratio Software Using Multivariate-Kernel-Density Estimation [software].
Available: http://geoff-morrison.net.
Morrison, G. S. (2008) Forensic voice comparison using likelihood ratios based on polynomial
curves fitted to the formant trajectories of Australian English /aI/. International
Journal of Speech, Language and the Law 15(2): 249-266.