

SOLVING CRIME WITH MATHEMATICS

THE NUMBERS BEHIND

NUMB3RS
KEITH DEVLIN, NPR's "Math Guy," and
GARY LORDEN, the Math Consultant on
NUMB3RS, the hit CBS television series
A COMPANION TO THE HIT CBS
       CRIME SERIES NUMB3RS                                   PRESENTS
    THE FASCINATING WAYS MATHEMATICS
       IS USED TO FIGHT REAL-LIFE CRIME


Using the popular CBS prime-time TV crime series NUMB3RS as
a springboard, Keith Devlin (known to millions of NPR listeners
as "the Math Guy" on NPR's Weekend Edition with Scott Simon)
and Gary Lorden (the math consultant to NUMB3RS) explain
real-life mathematical techniques used by the FBI and other law
enforcement agencies to catch and convict criminals. From
forensics to counterterrorism, the Riemann hypothesis to image
enhancement, solving murders to beating casino odds, Devlin
and Lorden present compelling cases that illustrate how
advanced mathematics can be used in state-of-the-art criminal
investigations.
Praise for the television series:
"NUMB3RS LOOKS LIKE A WINN3R."
USA Today
A PLUME BOOK


              THE NUMBERS BEHIND NUMB3RS


DR. KEITH DEVLIN is executive director of Stanford University's Center for
the Study of Language and Information and a consulting professor of
mathematics at Stanford. Devlin has a B.Sc. degree in Mathematics from
King's College London (1968) and a Ph.D. in Mathematics from the
University of Bristol (1971). He is a fellow of the American Association for
the Advancement of Science, a World Economic Forum fellow, and a
former member of the Mathematical Sciences Education Board of the
U.S. National Academy of Sciences. The author of twenty-five books,
Devlin has been a regular contributor to National Public Radio's popular
program Weekend Edition, where he is known as "the Math Guy" in his
on-air conversations with host Scott Simon. His monthly column,
"Devlin's Angle," appears on the Mathematical Association of America's web
journal MAA Online.


DR. GARY LORDEN is a professor in the mathematics department of the
California Institute of Technology in Pasadena. He graduated from
Caltech with a B.S. in mathematics in 1962, received his Ph.D. in
mathematics from Cornell University in 1966, and taught at Northwestern
University before returning to Caltech in 1968. A fellow of the Institute
of Mathematical Statistics, Lorden has taught statistics, probability, and
other mathematics at all levels from freshman to doctoral. Lorden has
also been active as a consultant and expert witness in mathematics and
statistics for government agencies and laboratories, private companies,
and law firms. For many years he consulted for Caltech's Jet Propulsion
Laboratory for their space exploration programs. He has participated in
highly classified research projects aimed at enhancing the ability of
government agencies (such as the NSA) to protect national security. Lorden
is the chief mathematics consultant for the CBS TV series NUMB3RS.
THE
NUMBERS BEHIND

NUMB3RS
Solving Crime with Mathematics


       Keith Devlin, Ph.D.
               and

      Gary Lorden, Ph.D.




A PLUME BOOK
PLUME
Published by Penguin Group
Penguin Group (USA) Inc., 375 Hudson Street, New York, New York 10014, U.S.A.
Penguin Group (Canada), 90 Eglinton Avenue East, Suite 700, Toronto, Ontario, Canada M4P 2Y3 (a division of Pearson Penguin Canada Inc.)
Penguin Books Ltd., 80 Strand, London WC2R 0RL, England
Penguin Ireland, 25 St. Stephen's Green, Dublin 2, Ireland (a division of Penguin Books Ltd.)
Penguin Group (Australia), 250 Camberwell Road, Camberwell, Victoria 3124, Australia (a division of Pearson Australia Group Pty. Ltd.)
Penguin Books India Pvt. Ltd., 11 Community Centre, Panchsheel Park, New Delhi - 110 017, India
Penguin Books (NZ), 67 Apollo Drive, Rosedale, North Shore 0745, Auckland, New Zealand (a division of Pearson New Zealand Ltd.)
Penguin Books (South Africa) (Pty.) Ltd., 24 Sturdee Avenue, Rosebank, Johannesburg 2196, South Africa
Penguin Books Ltd., Registered Offices: 80 Strand, London WC2R 0RL, England
First published by Plume, a member of Penguin Group (USA) Inc.
First Printing, September 2007
10      9 8 7 6 5 4 3 2              1
Copyright © Keith Devlin and Gary Lorden, 2007
All rights reserved
Illustration credits appear on page 244.
    REGISTERED TRADEMARK—MARCA REGISTRADA

LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA

Devlin, Keith J.
   The numbers behind NUMB3RS: solving crime with mathematics/Keith Devlin,
Gary Lorden.
        p. cm.
   ISBN 978-0-452-28857-7
 1. Criminal investigation. 2. Mathematical statistics. 3. Criminal investigation--Data
processing. I. Title: Numbers behind numbers. II. Lorden, Gary. III. Title.
HV8073.5.D485 2007
363.2501'5195—dc22
                           2007018115

Printed in the United States of America
Set in Dante MT
Designed by Joseph Rutt
Without limiting the rights under copyright reserved above, no part of this publication may
be reproduced, stored in or introduced into a retrieval system, or transmitted, in any form, or
by any means (electronic, mechanical, photocopying, recording, or otherwise), without the
prior written permission of both the copyright owner and the above publisher of this book.
PUBLISHER'S NOTE
The scanning, uploading, and distribution of this book via the Internet or via any other
means without the permission of the publisher is illegal and punishable by law. Please
purchase only authorized electronic editions, and do not participate in or
encourage electronic piracy of copyrighted materials. Your support of the author's rights is
appreciated.
BOOKS ARE AVAILABLE AT QUANTITY DISCOUNTS WHEN USED TO PROMOTE PRODUCTS OR SERVICES.
FOR INFORMATION PLEASE WRITE TO PREMIUM MARKETING DIVISION, PENGUIN GROUP (USA) INC.,
375 HUDSON STREET, NEW YORK, NEW YORK 10014.
Acknowledgments




The authors want to thank NUMB3RS creators Cheryl Heuton and Nick
Falacci for creating Charlie Eppes, television's first mathematics
superhero, and succeeding brilliantly in putting math on television in prime
time. Their efforts have been joined by a stellar team of other writers,
actors, producers, directors, and specialists whose work has inspired us to
write this book. The gifted actor David Krumholtz has earned the
undying love of mathematicians everywhere for bringing Charlie to life in a
way that has led millions of people to see mathematics in a completely
new light. Thanks also to NUMB3RS researchers Andy Black and Matt
Kolokoff for being wonderful to work with in coming up with endless
applications of mathematics to make the writers' dreams come true.
   We wish to express our particular thanks to mathematician Dr.
Lenny Rudin of Cognitech, one of the world's foremost experts on
image enhancement, for considerable help with Chapter 5 and for
providing the images we show in that chapter.
   Finally, Ted Weinstein, our agent, found us an excellent publisher in
David Cashion of Plume, and both worked tirelessly to turn a manuscript
that we felt was as reader-friendly as possible, given that this is a math
book, into one that, we have to acknowledge, is now a lot more so!


                                             Keith Devlin, Palo Alto, CA
                                             Gary Lorden, Pasadena, CA
Contents



Introduction
The Hero Is a Mathematician?

1 Finding the Hot Zone
Criminal Geographic Profiling

2 Fighting Crime with Statistics 101

3 Data Mining
Finding Meaningful Patterns in Masses of Information

4 When Does the Writing First Appear on the Wall?
Changepoint Detection

5 Image Enhancement and Reconstruction

6 Predicting the Future
Bayesian Inference

7 DNA Profiling

8 Secrets: Making and Breaking Codes

9 How Reliable Is the Evidence?
Doubts about Fingerprints

10 Connecting the Dots
The Math of Networks

11 The Prisoner's Dilemma, Risk Analysis, and Counterterrorism

12 Mathematics in the Courtroom

13 Crime in the Casino
Using Math to Beat the System

Appendix
Mathematical Synopses of the Episodes in the First Three Seasons of NUMB3RS

Index
INTRODUCTION

The Hero Is a Mathematician?



On January 23, 2005, a new television crime series called NUMB3RS
debuted. Created by the husband-and-wife team Nick Falacci and Cheryl
Heuton, the series was produced by Paramount Network Television
and acclaimed Hollywood veterans Ridley and Tony Scott, whose movie
credits include Alien, Top Gun, and Gladiator. Throughout its run,
NUMB3RS has regularly beaten out the competition to be the most watched
series in its time slot on Friday nights.
   What has surprised many is that one of the show's two heroes is a
mathematician, and much of the action revolves around mathematics,
as professor Charlie Eppes uses his powerful skills to help his older
brother, Don, an FBI agent, identify and catch criminals. Many viewers,
and several critics, have commented that the stories are entertaining,
but the basic premise is far-fetched: You simply can't use math to solve
crimes, they say. As this book proves, they are wrong. You can use math
to solve crimes, and law enforcement agencies do: not in every instance,
to be sure, but often enough to make math a powerful weapon in the
never-ending fight against crime. In fact, the very first episode of the
series was closely based on a real-life case, as we will discuss in the next
chapter.
   Our book sets out to describe, in a nontechnical fashion, some of the
major mathematical techniques currently available to the police, CIA,
and FBI. Most of these methods have been mentioned during episodes
of NUMB3RS, and while we frequently link our explanations to what
was depicted on the air, our focus is on the mathematical techniques
and how they can be used in law enforcement. In addition, we describe
some real-life cases where mathematics played a role in solving a crime
that have not been used in the TV series, at least not directly.
    In many ways, NUMB3RS is similar to good science fiction, which is
based on correct physics or chemistry. Each week, NUMB3RS presents a
dramatic story in which realistic mathematics plays a key role in the
narrative. The producers of NUMB3RS go to great lengths to ensure that the
mathematics used in the scripts is correct and that the applications shown
are possible. Although some of the cases viewers see are fictional, they
certainly could have happened, and in some cases very well may. Though
the TV series takes some dramatic license, this book does not. In The
Numbers Behind NUMB3RS, you will discover the mathematics that can
be, and is, used in fighting real crime and catching actual criminals.
THE NUMBERS BEHIND NUMB3RS
CHAPTER 1

Finding the Hot Zone
Criminal Geographic Profiling




FBI Special Agent Don Eppes looks again at the large street map of Los
Angeles spread across the dining-room table of his father's house. The
crosses inked on the map show the locations where, over a period of
several months, a brutal serial killer has struck, raping and then murdering
a number of young women. Don's job is to catch the killer before he
strikes again. But the investigation has stalled. Don is out of clues, and
doesn't know what to do next.
    "Can I help?" The voice is that of Don's younger brother, Charlie, a
brilliant young professor of mathematics at the nearby university CalSci.
Don has always been in awe of his brother's incredible ability at math,
and frankly would welcome any help he can get. But . . . help from a
mathematician?
    "This case isn't about numbers, Charlie." The edge in Don's voice is
caused more by frustration than anger, but Charlie seems not to notice, and
his reply is totally matter-of-fact but insistent: "Everything is numbers."
    Don is not convinced. Sure, he has often heard Charlie say that
mathematics is all about patterns: identifying them, analyzing them,
making predictions about them. But it didn't take a math genius to see
that the crosses on the map were scattered haphazardly. There was no
pattern, no way anyone could predict where the next cross would go,
the exact location where the next young girl would be attacked. Maybe
it would occur that very evening. If only there were some regularity to
the arrangement of the crosses, a pattern that could be captured with a
mathematical equation, the way Don remembers from his schooldays
that the equation $x^2 + y^2 = 9$ describes a circle.
    Looking at the map, even Charlie has to agree there is no way to use
math to predict where the killer would strike next. He strolls over to the
window and stares out across the garden, the silence of the evening
broken only by the continual flick-flick-flick-flick of the automatic
sprinkler watering the lawn. Charlie's eyes see the sprinkler but his mind is
far away. He had to admit that Don was probably right. Mathematics
could be used to do lots of things, far more than most people realized.
But in order to use math, there had to be some sort of pattern.
    Flick-flick-flick-flick. The sprinkler continued to do its job. There was
the brilliant mathematician in New York who used mathematics to study
the way the heart works, helping doctors spot tiny irregularities in a
heartbeat before the person has a heart attack.
    Flick-flick-flick-flick. There were all those mathematics-based computer
programs the banks utilized to track credit card purchases, looking for a
sudden change in the pattern that might indicate identity theft or a stolen
card.
    Flick-flick-flick-flick. Without clever mathematical algorithms, the cell
phone in Charlie's pocket would have been twice as big and a lot
heavier.
    Flick-flick-flick-flick. In fact, there was scarcely any area of modern life
that did not depend, often in a crucial way, on mathematics. But there
had to be a pattern, otherwise the math couldn't get started.
    Flick-flick-flick-flick. For the first time, Charlie notices the sprinkler,
and suddenly he knows what to do. He has his answer. He could help
solve Don's case, and the solution has been staring him in the face all
along. He just had not realized it.
    He drags Don over to the window. "We've been asking the wrong
question," he says. "From what you know, there's no way you can
predict where the killer will strike next." He points to the sprinkler. "Just
like, no matter how much you study where each drop of water hits the
grass, there's no way you can predict where the next drop will land.
There's too much uncertainty." He glances at Don to make sure his
older brother is listening. "But suppose you could not see the sprinkler,
and all you had to go on was the pattern of where all the drops landed.
Then, using math, you could work out exactly where the sprinkler must
be. You can't use the pattern of drops to predict forward to the next
drop, but you can use it to work backward to the source. It's the same
with your killer."
    Don finds it difficult to accept what his brother seems to be suggesting.
"Charlie, are you telling me you can figure out where the killer lives?"
    Charlie's answer is simple: "Yes."
    Don is still skeptical that Charlie's idea can really work, but he's
impressed by his brother's confidence and passion, and so he agrees to
let him assist with the investigation.
    Charlie's first step is to learn some basic facts from the science of
criminology: First, how do serial killers behave? Here, his years of experience as
a mathematician have taught him how to recognize the key factors and
ignore all the others, so that a seemingly complex problem can be reduced
to one with just a few key variables. Talking with Don and the other agents
at the FBI office where his elder brother works, he learns, for instance, that
violent serial criminals exhibit certain tendencies in selecting locations.
They tend to strike close to their home, but not too close; they always set
a "buffer zone" around their residence where they will not strike, an area
that is too close for comfort; outside that comfort zone, the frequency of
crime locations decreases as the distance from home increases.
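The buffer-zone pattern just described can be turned into a distance-decay curve. The Python sketch below is a minimal illustration, not Rossmo's software: it scores how plausible it is that an offender lives a "city block" (Manhattan) distance d from a single crime site, using the buffer-zone shape of Rossmo's published formula. The function name buffer_decay and the values chosen for the buffer radius B and the exponents f and g are hypothetical, picked only for illustration.

```python
def buffer_decay(d: float, B: float = 1.0, f: float = 1.2, g: float = 1.2) -> float:
    """Relative plausibility that an offender lives at Manhattan distance d
    from one crime site (a Rossmo-style buffer-zone shape).

    B (buffer radius), f and g (decay exponents) are illustrative
    assumptions, not calibrated values from any real case.
    """
    if d > B:
        # Outside the buffer: the farther from the crime, the less likely.
        return 1.0 / d ** f
    # Inside the buffer ("too close for comfort"): plausibility drops toward
    # the crime site itself and meets the outer curve at d = B.
    return B ** (g - f) / (2 * B - d) ** g


for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d}: {buffer_decay(d):.3f}")
```

With these settings the score rises from d = 0 to a peak at d = B and then falls off, which is the "not too close, not too far" behavior the agents describe.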
    Then, back in his office in the CalSci mathematics department,
Charlie gets to work in earnest, feverishly covering his blackboards
with mathematical equations and formulas. His goal: to find the
mathematical key to determine a "hot zone," an area on the map, derived
from the crime locations, where the perpetrator is most likely to live.
   As always when he works on a difficult mathematical problem, the
hours fly by as Charlie tries out many unsuccessful approaches. Then,
finally, he has an idea he thinks should work. He erases his previous
chalk scribbles one more time and writes this complicated-looking
formula on the board:*



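$$p_{ij} = k \sum_{n=1}^{(\text{total crimes})} \left[ \frac{\phi}{\left(|x_i - x_n| + |y_j - y_n|\right)^{f}} + \frac{(1-\phi)\, B^{\,g-f}}{\left(2B - |x_i - x_n| - |y_j - y_n|\right)^{g}} \right]$$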

    *We'll take a closer look at this formula in a moment.

    "That should do the trick," he says to himself.
    The next step is to fine-tune his formula by checking it against
examples of past serial crimes Don provides him with. When he inputs the
crime locations from those previous cases into his formula, does it
accurately predict where the criminals lived? This is the moment of truth,
when Charlie will discover whether his mathematics reflects reality.
Sometimes it doesn't, and he learns that when he first decided which
factors to take into account and which to ignore, he must have got it
wrong. But this time, after Charlie makes a few minor adjustments, the
formula seems to work.
    The next day, bursting with energy and conviction, Charlie shows up
at the FBI offices with a printout of the crime-location map with the
"hot zone" prominently displayed. Just as the equation $x^2 + y^2 = 9$ that
Don remembered from his schooldays describes a circle, so that when
the equation is fed into a suitably programmed computer it will draw
the circle, so too when Charlie fed his new equation into his computer,
it also produced a picture. Not a circle this time; Charlie's equation is
much more complicated. What it gave was a series of concentric
colored regions drawn on Don's crime map of Los Angeles, regions that
homed in on the hot zone where the killer lives.
    Having this map will still leave a lot of work for Don and his
colleagues, but finding the killer is no longer like looking for a needle in a
haystack. Thanks to Charlie's mathematics, the haystack has suddenly
dwindled to a mere sackful of hay.
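In computational terms, Charlie's colored map amounts to evaluating the formula at every cell of a grid laid over the city and ranking the cells. The Python sketch below does that for a small integer grid, following the published form of Rossmo's formula (Manhattan distance, a buffer radius B, exponents f and g). It is an illustration under stated assumptions, not the Rigel program; the helper names rossmo_score and hot_zone, the parameter values, and the crime sites are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rossmo_score(cell: Point, crimes: List[Point], B: float = 1.0,
                 f: float = 1.2, g: float = 1.2, k: float = 1.0) -> float:
    """Rossmo-style score p_ij for one map cell: k times a sum, over all
    crime sites, of a buffer-zone distance-decay term."""
    x, y = cell
    total = 0.0
    for xn, yn in crimes:
        d = abs(x - xn) + abs(y - yn)  # Manhattan ("city block") distance
        if d > B:
            # phi = 1: outside the buffer, plausibility decays with distance
            total += 1.0 / d ** f
        else:
            # phi = 0: inside the buffer, plausibility drops toward the site
            total += B ** (g - f) / (2 * B - d) ** g
    return k * total


def hot_zone(crimes: List[Point], width: int, height: int,
             top: int = 3, **params: float) -> List[Tuple[int, int]]:
    """Score every cell of a width x height grid and return the `top`
    highest-scoring cells, hottest first."""
    scored = [((i, j), rossmo_score((i, j), crimes, **params))
              for i in range(width) for j in range(height)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [cell for cell, _ in scored[:top]]


# Four invented crime sites arranged around the middle of an 11 x 11 grid.
crimes = [(2, 5), (8, 5), (5, 2), (5, 8)]
print(hot_zone(crimes, 11, 11, top=1, B=3.0))  # -> [(5, 5)]
```

Because every crime site in this toy example sits exactly B = 3 blocks from the middle cell, that cell collects the largest possible contribution from each site and comes out hottest. Shading all cells by score, rather than keeping only the top few, would give bands like the ones on Don's map.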
    Charlie explains to Don and the other FBI agents working the case that
the serial criminal has tried not to reveal where he lives, picking victims in
what he thinks is a random pattern of locations, but that the mathematical
formula nevertheless reveals the truth: a hot zone in which the
criminal's residence is located, to a very high probability. Don and the team
decide to investigate men within a certain range of ages who live in the
hot zone, and use surveillance and stealth tactics to obtain DNA evidence
from the suspects' discarded cigarette butts, drinking straws, and the like,
which can be matched with DNA from the crime-scene investigations.
    Within a few days (and a few heart-stopping moments) they have
their man. The case is solved. Don tells his younger brother, "That's
some formula you've got there, Charlie."



FACT OR FICTION?
Leaving out a few dramatic twists, the above is what the TV audience saw
in the very first episode of NUMB3RS, broadcast on January 23, 2005.
Many viewers could not believe that mathematics could help capture a
criminal in this way. In fact, that entire first episode was based fairly closely
on a real case in which a single mathematical equation was used to identify
the hot zone where a criminal lived. It was the very equation, reproduced
above, that viewers saw Charlie write on his blackboard.
    The real-life mathematician who produced that formula is named
Kim Rossmo. The technique of using mathematics to predict where
a serial criminal lives, which Rossmo helped to establish, is called
geographic profiling.
    In the 1980s Rossmo was a young constable on the police force in
Vancouver, Canada. What made him unusual for a police officer was his
talent for mathematics. Throughout school he had been a "math whiz,"
the kind of student who makes fellow students, and often teachers, a
little nervous. The story is told that early in the twelfth grade, bored
with the slow pace of his mathematics course, he asked to take the final
exam in the second week of the semester. After scoring one hundred
percent, he was excused from the remainder of the course.
    Similarly bored with the typical slow progress of police investigations
involving violent serial criminals, Rossmo decided to go back to school,
ending up with a Ph.D. in criminology from Simon Fraser University, the
first cop in Canada to get one. His thesis advisers, Paul and Patricia
Brantingham, were pioneers in the development of mathematical models
(essentially sets of equations that describe a situation) of criminal
behavior, particularly those that describe where crimes are most likely to
occur based on where a criminal lives, works, and plays. (It was the
Brantinghams who noticed the location patterns of serial criminals
that TV viewers saw Charlie learning about from Don and his FBI
colleagues.)
    Rossmo's interest was a little different from the Brantinghams'. He
did not want to study patterns of criminal behavior. As a police officer,
he wanted to use actual data about the locations of crimes linked to a
single unknown perpetrator as an investigative tool to help the police find
the criminal.
    Rossmo had some initial successes in re-analyzing old cases, and after
receiving his Ph.D. and being promoted to detective inspector, he
pursued his interest in developing better mathematical methods to do what
he came to call criminal geographic targeting (CGT). Others called the
method "geographic profiling," since it complemented the well-known
technique of "psychological profiling" used by investigators to find
criminals based on their motivations and psychological characteristics.
Geographic profiling attempts to locate a likely base of operation for a
criminal by analyzing the locations of their crimes.
    Rossmo hit upon the key idea behind his seemingly magic formula
while riding on a bullet train in Japan one day in 1991. Finding himself
without a notepad to write on, he scribbled it on a napkin. With
later refinements, the formula became the principal element of a
computer program Rossmo wrote, called Rigel (pronounced RYE-gel,
and named after the star in the constellation Orion, the Hunter). Today,
Rossmo sells Rigel, along with training and consultancy, to police
and other investigative agencies around the world to help them find
criminals.
    When Rossmo describes how Rigel works to a law enforcement
agency interested in the program, he offers his favorite metaphor, that
of determining the location of a rotating lawn sprinkler by analyzing the
pattern of the water drops it sprays on the ground. When NUMB3RS
cocreators Cheryl Heuton and Nick Falacci were working on their pilot
episode, they took Rossmo's own metaphor as the way Charlie would hit
upon the formula and explain the idea to his brother.
    Rossmo had some early successes dealing with serial crime
investigations in Canada, but what really made him a household name among
law enforcement agencies all over North America was the case of the
South Side Rapist in Lafayette, Louisiana.
    For more than ten years, an unknown assailant, his face wrapped
bandit-style in a scarf, had been stalking women in the town and
assaulting them. In 1998 the local police, snowed under by thousands of tips
and a corresponding number of suspects, brought Rossmo in to help.
Using Rigel, Rossmo analyzed the crime-location data and produced a
map much like the one Charlie displayed in NUMB3RS, with bands of
color indicating the hot zone and its increasingly hot interior rings. The
map enabled police to narrow down the hunt to half a square mile and
about a dozen suspects. Undercover officers combed the hot zone using
the same techniques portrayed in NUMB3RS, to obtain DNA samples of
all males of the right age range in the area.
    Frustration set in when each of the suspects in the hot zone was
cleared by DNA evidence. But then they got lucky. The lead investigator,
McCullan "Mac" Gallien, received an anonymous tip pointing to a very
unlikely suspect: a sheriff's deputy from a nearby department. As just
one more tip on top of the mountain he already had, Mac was inclined
to just file it, but on a whim he decided to check the deputy's address.
Not even close to the hot zone. Still something niggled him, and he dug
a little deeper. And then he hit the jackpot. The deputy had previously
lived at another address, right in the hot zone! DNA evidence was
collected from a cigarette butt, and it matched that taken from the
crime scenes. The deputy was arrested, and Rossmo became an instant
celebrity in the crime-fighting world.
    Interestingly, when Heuton and Falacci were writing the pilot
episode of NUMB3RS, based on this real-life case, they could not resist
incorporating the same dramatic twist at the end. When Charlie first
applies his formula, no DNA matches are found among the suspects in
the hot zone, as happened with Rossmo's formula in Lafayette. Charlie's
belief in his mathematical analysis is so strong that when Don tells him
the search has drawn a blank, he initially refuses to accept this outcome.
"You must have missed him," he says.
    Frustrated and upset, Charlie huddles with Don at their father Alan's
house, and Alan says, "I know the problem can't be the math, Charlie. It
must be something else." This remark spurs Don to realize that finding
the killer's residence may be the wrong goal. "If you tried to find me
where I live, you would probably fail because I'm almost never there,"
he notes. "I'm usually at work." Charlie seizes on this notion to pursue
a different line of attack, modifying his calculations to look for two
hot zones, one that might contain the killer's residence and the other
his place of work. This time Charlie's math works. Don manages to
identify and catch the criminal just before he kills another victim.
    These days, Rossmo's company ECRI (Environmental Criminology
Research, Inc.) offers the patented computer package Rigel along with
training in how to use it effectively to solve crimes. Rossmo himself
travels around the world, to Asia, Africa, Europe, and the Middle East,
assisting in criminal investigations and giving lectures to police and
criminologists. Two years of training, by Rossmo or one of his assistants,
is required to learn to adapt the use of the program to the idiosyncrasies
of a particular criminal's behavior.
    Rigel does not score a big win every time. For example, Rossmo was
called in on the notorious Beltway Sniper case when, during a three-week
period in October 2002, ten people were killed and three others critically
injured by what turned out to be a pair of serial killers operating in and
around the Washington, D.C., area. Rossmo concluded that the sniper's
base was somewhere in the suburbs to the north of Washington, but it
turned out that the two killers did not live in the area and moved too
often to be located by geographic profiling.
    The fact that Rigel does not always work will not come as a surprise
to anyone familiar with what happens when you try to apply mathematics
to the messy real world of people. Many people come away from
their high school experience with mathematics thinking that there is a
right way and a wrong way to use math to solve a problem, in too
many cases with the teacher's way being the right one and their own
attempts being the wrong one. But this is rarely the case. Mathematics
will always give you the correct answer (if you do the math right) when
you apply it to very well-defined physical situations, such as calculating
how much fuel a jet needs to fly from Los Angeles to New York. (That
is, the math will give you the right answer provided you start with accurate
data about the total weight of the plane, passengers, and cargo, the
prevailing winds, and so forth. Missing a key piece of input data to
incorporate into the mathematical equations will almost always result
in an inaccurate answer.) But when you apply math to a social problem,
such as a crime, things are rarely so clear-cut.
    Setting up equations that capture elements of some real-life activity is
called constructing a "mathematical model." In constructing a physical
model of something, say an aircraft to study in a wind tunnel, the important
thing is to get everything right, apart from the size and the materials
used. In constructing a mathematical model, the idea is to get the appropriate
behavior right. For example, to be useful, a mathematical model of
the weather should predict rain for days when it rains and predict sunshine
on sunny days. Constructing the model in the first place is usually
the hard part. "Doing the math" with the model (i.e., solving the equations
that make up the model) is generally much easier, especially when
using computers. Mathematical models of the weather often fail because
the weather is simply far too complicated (in everyday language, it's "too
unpredictable") to be captured by mathematics with great accuracy.
    As we shall see in later chapters, there is usually no such thing as
"one correct way" to use mathematics to solve problems in the real
world, particularly problems involving people. To try to meet the challenges
that confront Charlie in NUMB3RS (locating criminals, tracing
the spread of a disease or of counterfeit money, predicting the target
selection of terrorists, and so on), a mathematician cannot merely write
down an equation and solve it. There is a considerable art to the process
of assembling information and data, selecting mathematical variables
that describe a situation, and then modeling it with a set of equations.
And once a mathematician has constructed a model, there is still the
matter of solving it in some way, by approximations or calculations or
computer simulations. Every step in the process requires judgment and
creativity. No two mathematicians working independently, however
brilliant, are likely to produce identical results, if indeed they can
produce useful results at all.
     It is not surprising, then, that in the field of geographic profiling,
Rossmo has competitors. Dr. Grover M. Godwin of the Justice Center at
the University of Alaska, author of the book Hunting Serial Predators, has
developed a computer package called Predator that uses a branch of
mathematical statistics called multivariate analysis to pinpoint a serial
killer's home base by analyzing the locations of crimes, where the
victims were last seen, and where the bodies were discovered. Ned
Levine, a Houston-based urban planner, developed a program called
Crimestat for the National Institute of Justice, a research branch of the
U.S. Justice Department. It uses something called spatial statistics to
analyze serial-crime data, and it can also be applied to help agents
understand such things as patterns of auto accidents or disease outbreaks.
And David Canter, a professor of psychology at the University of
Liverpool in England, and the director of the Centre for Investigative
Psychology there, has developed his own computer program, Dragnet,
which he has sometimes offered free to researchers. Canter has pointed
out that so far no one has performed a head-to-head comparison of the
various math/computer systems for locating serial criminals based on
applying them in the same cases, and he has claimed in interviews that
in the long run, his program and others will prove to be at least as
accurate as Rigel.



ROSSMO'S FORMULA
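Finally, let's take a closer look at the formulas Rossmo scribbled down
on that paper napkin on the bullet train in Japan back in 1991:

$$p_{ij} = k \sum_{n=1}^{\text{(total crimes)}} \left[ \frac{\phi}{\left(|x_i - x_n| + |y_j - y_n|\right)^{f}} + \frac{(1-\phi)\,B^{\,g-f}}{\left(2B - |x_i - x_n| - |y_j - y_n|\right)^{g}} \right]$$
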

     To understand what it means, imagine a grid of little squares superimposed
on the map, each square having two numbers that locate it:
what row it's in and what column it's in, "i" and "j". The probability, $p_{ij}$,
that the killer's residence is in that square is written on the left side of
the equation, and the right side shows how to calculate it. The crime
locations are represented by map coordinates, $(x_1, y_1)$ for the first crime,
$(x_2, y_2)$ for the second crime, and so on. What the formula says is this:

      To get the probability $p_{ij}$ for the square in row "i", column "j" of the
grid, first calculate how far you have to go to get from the center point
$(x_i, y_j)$ of that square to each crime location $(x_n, y_n)$. The little "n" here
stands for any one of the crime locations: n = 1 means "first crime,"
n = 2 means "second crime," and so on. The answer to the question of
how far you have to go is:


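$$|x_i - x_n| + |y_j - y_n|$$

and this distance, measured along the rows and columns of the grid (like
walking city blocks) rather than as the crow flies, is used in two ways.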
      Reading from left to right in the formula, the first way is to put that
distance in the denominator, with φ in the numerator. The distance is
raised to the power f. The choice of what number to use for this f will be
based on what works best when the formula is checked against data on
past crime patterns. (If you take f = 2, for example, then that part of the
formula will resemble the "inverse square law" that describes the force
of gravity.) This part of the formula expresses the idea that the probability
of crime locations decreases as the distance increases, once outside of
the buffer zone.
      The second way the formula uses the "traveling distance" of each
crime involves the buffer zone. In the second fraction, you subtract the
distance from 2B, where B is a number that will be chosen to describe
the size of the buffer zone, and raise the result to another power, g,
in the denominator. The subtraction produces smaller answers as the
distance increases, so the second fraction as a whole gets larger.
      Together, the first and second parts of the formula perform a sort of
"balancing act," expressing the fact that as you move away from the
criminal's base, the probability of crimes first increases (as you move
through the buffer zone) and then decreases. The two parts of the
formula are combined using a fancy mathematical notation, the Greek
letter Σ standing for "sum (add up) the contributions from each of the
crimes to the evaluation of the probability for the ij grid square." The
Greek letter φ is used in the two parts as a way of placing more "weight"
on one part or the other. A larger choice of φ puts more weight on the
phenomenon of "decreasing probability as distance increases," whereas
a smaller φ emphasizes the effect of the buffer zone.
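To see the "balancing act" in numbers, here is a minimal Python sketch of a single crime's contribution to the sum, as a function of the city-block distance d from a square to that crime. One assumption needs flagging: the text describes φ as an adjustable weight, but in Rossmo's published formula φ acts as a switch, equal to 1 when the square lies outside the buffer zone (d > B) and 0 when it lies inside. The sketch uses that switch form, which also keeps 2B - d positive. The function name and the values B = 1 and f = g = 1.2 are illustrative choices of ours, not Rossmo's calibrated settings.

```python
def crime_contribution(d: float, B: float = 1.0, f: float = 1.2, g: float = 1.2) -> float:
    """One crime's term in Rossmo's sum for a square at city-block distance d.

    phi is treated as a switch (assumed here, following Rossmo's published
    convention): 1 outside the buffer zone (d > B), 0 inside it.
    """
    phi = 1.0 if d > B else 0.0
    if phi == 1.0:
        # First fraction: distance decay outside the buffer zone.
        return phi / d ** f
    # Second fraction: inside the buffer zone 2B - d >= B > 0, so no division by zero.
    return (1.0 - phi) * B ** (g - f) / (2 * B - d) ** g


# Contributions rise through the buffer zone, peak at its edge (d = B), then fall.
for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:>3}: {crime_contribution(d):.3f}")
```

With the switch form, the two fractions agree at the buffer edge d = B (both equal 1/B^f), which is what the factor B^(g-f) in the second numerator achieves.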
     Once the formula is used to calculate the probabilities, $p_{ij}$, of all of
the little squares in the grid, it's easy to make a hot zone map. You just
color the squares, with the highest probabilities bright yellow, slightly
smaller probabilities orange, then red, and so on, leaving the squares
with low probability uncolored.
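The map-coloring step can be sketched in a few lines of Python. This is a minimal illustration under several assumptions of ours: the center of the square in row i, column j is taken to be the point (i, j); crime sites are given in the same grid units; φ is Rossmo's switch (1 outside the buffer zone, 0 inside); the overall scaling constant is chosen so that the probabilities over the whole grid add up to 1; and the color cut-offs (top 5, 15, and 30 percent of squares) are arbitrary. The function names and the crime locations in the usage example are hypothetical.

```python
def rossmo_grid(crimes, rows, cols, B=2.0, f=1.2, g=1.2):
    """Return a rows x cols grid of p_ij values from Rossmo's formula.

    crimes: list of (x_n, y_n) crime sites in grid units (hypothetical input).
    """
    raw = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for xn, yn in crimes:
                d = abs(i - xn) + abs(j - yn)      # city-block distance
                if d > B:                           # phi = 1: distance-decay term
                    total += 1.0 / d ** f
                else:                               # phi = 0: buffer-zone term
                    total += B ** (g - f) / (2 * B - d) ** g
            raw[i][j] = total
    scale = 1.0 / sum(sum(row) for row in raw)      # make the grid sum to 1
    return [[scale * v for v in row] for row in raw]


def color_squares(p, bands=(("yellow", 0.05), ("orange", 0.15), ("red", 0.30))):
    """Color squares by rank: top 5% yellow, up to 15% orange, up to 30% red.

    Squares ranked below the last cut-off are left uncolored (None).
    """
    ranked = sorted(
        ((v, i, j) for i, row in enumerate(p) for j, v in enumerate(row)),
        reverse=True,
    )
    n = len(ranked)
    colors = [[None] * len(p[0]) for _ in p]
    for rank, (_, i, j) in enumerate(ranked):
        for name, cutoff in bands:
            if rank < cutoff * n:
                colors[i][j] = name
                break
    return colors


crimes = [(3, 4), (6, 5), (4, 8)]                   # made-up crime sites
p = rossmo_grid(crimes, rows=12, cols=12)
colors = color_squares(p)
best = max((v, i, j) for i, row in enumerate(p) for j, v in enumerate(row))
print("hottest square (row, column):", best[1:], "color:", colors[best[1]][best[2]])
```

Ranking squares rather than using fixed probability thresholds keeps the colored area the same size whatever the number of crimes, which is one simple design choice among many.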
     Rossmo's formula is a good example of the art of using mathematics
to describe incomplete knowledge of real-world phenomena. Unlike
the law of gravity, which through careful measurements can be observed
to operate the same way every time, descriptions of the behavior of
individual human beings are at best approximate and uncertain. When
Rossmo checked out his formula on past crimes, he had to find the
best fit of his formula to those data by choosing different possible values
of f and g, and of B and φ. He then used those findings in analyzing
future crime patterns, still allowing for further fine-tuning in each new
investigation.
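The calibration step can also be sketched, with a caveat: the text does not say exactly how Rossmo measured "best fit," so the scoring rule below is our own assumption. For each solved past case with a known home location, rank the squares by score and record what fraction of the map would have to be searched, hottest square first, before reaching the home square; then keep the parameter values with the smallest average fraction. Because this sketch treats φ as Rossmo's on/off switch rather than a weight, only f, g, and B are searched. All function names, case data, and candidate values are hypothetical.

```python
from itertools import product


def score_grid(crimes, rows, cols, B, f, g):
    """Unnormalized Rossmo scores; an overall scaling constant would not change rankings."""
    grid = {}
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for xn, yn in crimes:
                d = abs(i - xn) + abs(j - yn)
                # phi = 1 outside the buffer zone, 0 inside (switch form).
                total += 1.0 / d ** f if d > B else B ** (g - f) / (2 * B - d) ** g
            grid[(i, j)] = total
    return grid


def search_fraction(crimes, home, rows, cols, B, f, g):
    """Fraction of squares scored at least as high as the true home square."""
    grid = score_grid(crimes, rows, cols, B, f, g)
    home_score = grid[home]
    return sum(1 for v in grid.values() if v >= home_score) / len(grid)


def calibrate(cases, rows, cols, Bs, fs, gs):
    """Grid-search (B, f, g), minimizing the average search fraction over cases."""
    best = None
    for B, f, g in product(Bs, fs, gs):
        avg = sum(search_fraction(c, h, rows, cols, B, f, g) for c, h in cases) / len(cases)
        if best is None or avg < best[0]:
            best = (avg, B, f, g)
    return best


# Hypothetical solved cases: (crime sites, known home square).
cases = [
    ([(2, 5), (8, 5), (5, 2), (5, 8)], (5, 5)),
    ([(1, 1), (4, 3), (2, 6)], (3, 3)),
]
avg, B, f, g = calibrate(cases, rows=10, cols=10,
                         Bs=(1.0, 2.0, 3.0), fs=(1.0, 1.2, 2.0), gs=(1.0, 1.2, 2.0))
print(f"best B={B}, f={f}, g={g}; average share of map searched: {avg:.2%}")
```

A real calibration would use many more cases and would check the chosen values on cases held out of the search, so that the parameters are not simply tuned to a handful of examples.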
     Rossmo's method is definitely not rocket science: space travel
depends crucially on always getting the right answer with great accuracy.
But it is nevertheless science. It does not work every time, and the
answers it gives are probabilities. But in crime detection and other
domains involving human behavior, knowing those probabilities can
sometimes make all the difference.
CHAPTER 2

Fighting Crime with Statistics 101




THE ANGEL OF DEATH
By 1996, Kristen Gilbert, a thirty-three-year-old divorced mother of two
sons, ages seven and ten, and a nurse in Ward C at the Veterans Affairs
Medical Center in Northampton, Massachusetts, had built up quite a
reputation among her colleagues at the hospital. On several occasions she
was the first one to notice that a patient was going into cardiac arrest and
to sound a "code blue" to bring the emergency resuscitation team. She
always stayed calm, and was competent and efficient in administering to
the patient. Sometimes she would give the patient an injection of the
heart-stimulant drug epinephrine to attempt to restart the heart before
the emergency team arrived, occasionally saving the patient's life in this
way. The other nurses had given her the nickname "Angel of Death."
   But that same year, three nurses approached the authorities to express
their growing suspicions that something was not quite right. There had
been just too many deaths from cardiac arrest in that particular ward, they
felt. There had also been several unexplained shortages of epinephrine. The
nurses were starting to fear that Gilbert was giving the patients large doses
of the drug to bring on the heart attacks in the first place, so that she could
play the heroic role of trying to save them. The "Angel of Death" nickname
was beginning to sound more apt than they had first intended.
   The hospital launched an investigation, but found nothing untoward. In
particular, the investigators said, the number of cardiac deaths at the unit
was broadly in line with the rates at other VA hospitals. Despite the findings of the initial
investigation, however, the staff at the hospital remained suspicious, and
eventually a second investigation was begun. This included bringing in a
professional statistician, Stephen Gehlbach of the University of Massachusetts,
to take a closer look at the unit's cardiac arrest and mortality figures.
Largely as a result of Gehlbach's analysis, in 1998 the U.S. Attorney's Office
decided to convene a grand jury to hear the evidence against Gilbert.
     Part of the evidence was her alleged motivation. In addition to seeking
the excitement of the code blue alarm and the resuscitation process,
plus the recognition for having struggled valiantly to save the patient, it
was suggested that she sought to impress her boyfriend, who also
worked at the hospital. Moreover, she had access to the epinephrine.
But since no one had seen her administer any fatal injections, the case
against her, while suggestive, was purely circumstantial. Although the
patients involved were mostly middle-aged men not regarded as potential
heart attack victims, it was possible that their attacks had occurred
naturally. What tipped the balance, and led to a decision to indict Gilbert
for multiple murder, was Gehlbach's statistical analysis.



THE SCIENCE OF STATE
Statistics is widely used in law enforcement in many ways and for many
purposes. In NUMB3RS, Charlie often carries out a statistical analysis,
and the use of statistical techniques will appear in many chapters in this
book, often without our making explicit mention of the fact. But what
exactly does statistics entail? And why was the word in the singular in
that last sentence?
     The word "statistics" comes from the Latin term statisticum collegium,
meaning "council of state" and the Italian word statista, meaning "statesman,"
which reflects the initial uses of the technique. The German
word Statistik likewise originally meant the analysis of data about the
state. Until the nineteenth century, the equivalent English term was
"political arithmetic," after which the word "statistics" was introduced
to refer to any collection and classification of data.
     Today, "statistics" really has two connected meanings. The first is the
collection and tabulation of data; the second is the use of mathematical
and other methods to draw meaningful and useful conclusions from
tabulated data. Some statisticians refer to the former activity as "little-s
statistics" and the latter activity as "big-S Statistics." Spelled with a
lower-case s, the word is treated as plural when it refers to a collection
of numbers. But it is singular when used to refer to the activity of
collecting and tabulating those numbers. "Statistics" (with a capital S)
refers to an activity, and hence is singular.
    Though many sports fans and other kinds of people enjoy collecting
and tabulating numerical data, the real value of little-s statistics is to
provide the data for big-S Statistics. Many of the mathematical techniques
used in big-S Statistics involve the branch of mathematics known
as probability theory, which began in the sixteenth and seventeenth
centuries as an attempt to understand the likely outcomes of games
of chance, in order to increase the likelihood of winning. But whereas
probability theory is a definite branch of mathematics, Statistics is
essentially an applied science that uses mathematical methods.
   While the law enforcement profession collects a large quantity of little-s
statistics, it is the use of big-S Statistics as a tool in fighting crime that we
shall focus on. (From now on we shall drop the "big S," "little s" terminology
and use the word "statistics" the way statisticians do, to mean both,
leaving the reader to determine the intended meaning from the context.)
   Although some applications of statistics in law enforcement use
sophisticated methods, the basic techniques covered in a first-semester
college statistics course are often enough to crack a case.
    This was certainly true for United States v. Kristen Gilbert. In that case,
a crucial question for the grand jury was whether there were significantly
more deaths in the unit when Kristen Gilbert was on duty than at other
times. The key word here is "significantly." One or two extra deaths on
her watch could be coincidence. How many deaths would it take to reach
the level of "significance" sufficient to indict Gilbert? This is a question
that only statistics can answer. Accordingly, Stephen Gehlbach was asked
to provide the grand jury with a summary of his findings.



HYPOTHESIS TESTING
Gehlbach's testimony was based on a fundamental statistical technique
known as hypothesis testing. This method uses probability theory to
determine whether an observed outcome is so unusual that it is highly
unlikely to have occurred naturally.
   One of the first things Gehlbach did was plot the annual number of
deaths at the hospital from 1988 through 1997, broken down by shifts—
midnight to 8:00 AM, 8:00 AM to 4:00 PM, and 4:00 PM to midnight. The
resulting graph is shown in Figure 1. Each vertical bar shows the total
number of deaths in the year during that particular shift.

     [Bar chart: deaths per year, 1988 through 1997, with one bar per year for
     each shift: night (12 A.M.-8 A.M.), day (8 A.M.-4 P.M.), and evening
     (4 P.M.-12 A.M.).]

     Figure 1. Total deaths at the hospital, by shift and year.


    The graph shows a definite pattern. For the first two years, there were
around ten deaths per year on each shift. Then, for each of the years 1990
through 1995, one of the three shifts shows between 25 and 35 deaths per
year. Finally, for the last two years, the figures drop back to roughly ten
deaths on each of the three shifts. When the investigators examined
Kristen Gilbert's work record, they discovered that she started work in
Ward C in March 1990 and stopped working at the hospital in February
1996. Moreover, for each of the years she worked at the VA, the shift that
showed the dramatically increased number of deaths was the one she
worked. To a layperson, this might suggest that Gilbert was clearly responsible
for the deaths, but on its own it would not be sufficient to secure a
conviction—indeed, it might not be enough to justify even an indictment.
The problem is that it may be just a coincidence. The job of the statistician
in this situation is to determine just how unlikely such a coincidence
would be. If the answer is that the likelihood of such a coincidence is, say,
1 in 100, then Gilbert might well be innocent; and even 1 in 1,000 leaves
some doubt as to her guilt; but with a likelihood of, say, 1 in 100,000, most
people would find the evidence against her to be pretty compelling.
   To see how hypothesis testing works, let's start with the simple
example of tossing a coin. If the coin is perfectly balanced (i.e., unbiased
or fair), then the probability of getting heads is 0.5.* Suppose we toss the
coin ten times in a row to see if it is biased in favor of heads. Then we
can get a range of different outcomes, and it is possible to compute the
likelihood of different results. For example, the probability of getting at
least six heads is about 0.38. (The calculation is straightforward but a bit
intricate, because there are many possible ways you can get six or more
heads in ten tosses, and you have to take account of all of them.) The
figure of 0.38 puts a precise numerical value on the fact that, on an
intuitive level, we would not be surprised if ten coin tosses gave six or
more heads. For at least seven heads, the probability works out at 0.17,
a figure that corresponds to our intuition that seven or more heads is
somewhat unusual but certainly not a cause for suspicion that the coin
was biased. What would surprise us is nine or ten heads, and for that the
probability works out at about 0.01, or 1 in 100. The probability of getting
ten heads is about 0.001, or 1 in 1,000, and if that happened we
would definitely suspect an unfair coin. Thus, by tossing the coin ten
times, we can form a reliable, precise judgment, based on mathematics,
of the hypothesis that the coin is unbiased.
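The coin-tossing figures can be checked directly. Under the hypothesis of a fair coin, the number of heads in ten independent tosses follows the binomial distribution, so the chance of at least k heads is the sum of C(10, r)/2^10 over r = k, ..., 10. The short Python sketch below (the function name is ours) carries out that sum.

```python
from math import comb


def prob_at_least(k: int, n: int = 10, p: float = 0.5) -> float:
    """P(at least k heads in n independent tosses, each heads with probability p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))


for k in (6, 7, 9, 10):
    print(f"P(at least {k:>2} heads in 10 tosses) = {prob_at_least(k):.4f}")
# Exact values: 386/1024, 176/1024, 11/1024, 1/1024 (about 0.38, 0.17, 0.01, 0.001).
```

The "many possible ways" mentioned above are exactly the binomial coefficients: for instance, there are C(10, 6) = 210 different sequences with exactly six heads.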
    In the case of the suspicious deaths at the Veterans Affairs Medical
Center, the investigators wanted to know if the number of deaths that
occurred when Kristen Gilbert was on duty was so unlikely that it could
not be merely happenstance. The math is a bit more complicated than
for the coin tossing, but the idea is the same. Table 1 gives the data the
investigators had at their disposal. It classifies the shifts by whether
Gilbert was on duty and whether a death occurred, and covers the
eighteen-month period ending in February 1996, the month when the
three nurses told their supervisor of their concerns, shortly after which
Gilbert took a medical leave.

    *Actually, this is not entirely accurate. Because of inertial properties of a physical
coin, there is a slight tendency for it to resist turning, with the result that, if a perfectly
balanced coin is given a random initial flip, the probability that it will land the same
way up as it started is about 0.51. But we will ignore this caveat in what follows.

                           DEATH ON SHIFT
     GILBERT PRESENT       YES        NO      TOTAL
     YES                    40       217        257
     NO                     34     1,350      1,384
     TOTAL                  74     1,567      1,641

     Table 1. The data for the statistical analysis in the Gilbert case.


    Altogether, there were 74 deaths, spread over a total of 1,641 shifts.
If the deaths are assumed to have occurred randomly, these figures
suggest that the probability of a death on any one shift is about 74
out of 1,641, or 0.045. Focusing now on the shifts when Gilbert was on
duty, there were 257 of them. If Gilbert was not killing any of the patients,
we would expect there to be around 0.045 x 257 = 11.6 deaths on her
shifts, i.e., around 11 or 12 deaths. In fact there were more: 40, to be
precise. How likely is this? Using mathematical methods similar to those for
the coin tosses, statistician Gehlbach calculated that the probability of
having 40 or more of the 74 deaths occur on Gilbert's shifts was less than
1 in 100 million. In other words, it is unlikely in the extreme that Gilbert's
shifts were merely "unlucky" for the patients.
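The same kind of calculation can be sketched for Table 1. The text does not say exactly which model Gehlbach used, so this sketch makes a simplifying assumption of its own: each of Gilbert's 257 shifts independently has the overall death rate of 74/1,641, which makes the number of shifts with a death during her duty binomially distributed. Under that assumption the expected count comes out at about 11.6, and the chance of 40 or more is far below 1 in 100 million, consistent with the figure quoted above. The constant and function names are ours.

```python
from math import comb

# Counts from Table 1.
DEATH_SHIFTS, ALL_SHIFTS = 74, 1641
GILBERT_SHIFTS, GILBERT_DEATH_SHIFTS = 257, 40


def upper_tail(k: int, n: int, p: float) -> float:
    """Binomial P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k, n + 1))


p_death = DEATH_SHIFTS / ALL_SHIFTS          # about 0.045 per shift
expected = p_death * GILBERT_SHIFTS          # about 11.6 deaths
tail = upper_tail(GILBERT_DEATH_SHIFTS, GILBERT_SHIFTS, p_death)

print(f"death rate per shift: {p_death:.3f}")
print(f"expected deaths on Gilbert's shifts: {expected:.1f}")
print(f"P(40 or more) under the binomial model: {tail:.1e}")
```

Summing the upper tail directly, rather than computing 1 minus the probability of 39 or fewer, keeps full relative precision for such a tiny probability.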
   The grand jury decided there was sufficient evidence to indict
Gilbert—presumably the statistical analysis was the most compelling
evidence, but we cannot know for sure, as a grand jury's deliberations
are not public knowledge. She was accused of four specific murders and
three attempted murders. Because the VA is a federal facility, the trial
would be in a federal court rather than a state court, and subject to federal
laws. A significant consequence of this fact for Gilbert was that
although Massachusetts does not have a death penalty, federal law does,
and that is what the prosecutor asked for.


STATISTICS IN THE COURTROOM?
An interesting feature of this case is that the federal trial judge ruled
in pretrial deliberations that the statistical evidence should not be
presented in court. In making his ruling, the judge took note of a
submission by a second statistician brought into the case, George Cobb
of Mount Holyoke College.
    Cobb and Gehlbach did not disagree on any of the statistical analysis.
(In fact, they ended up writing a joint article about the case.) Rather, their
roles were different, and they were addressing different issues. Gehlbach's
task was to use statistics to determine if there were reasonable grounds to
suspect Gilbert of multiple murder. More specifically, he carried out an
analysis that showed that the increased numbers of deaths at the hospital
during the shifts when Gilbert was on duty were extremely unlikely to have
arisen from chance variation alone. That was sufficient to cast suspicion on
Gilbert as the cause of the increase, but not at all enough to prove that she
did cause the increase. What Cobb argued was that the establishment of a
statistical relationship does not explain the cause of that relationship. The
judge in the case accepted this argument, since the purpose of the trial was
not to decide if there were grounds to make Gilbert a suspect; the grand jury
and the U.S. Attorney's Office had done that. Rather, the job before the
court was to determine whether or not Gilbert caused the deaths in question.
The judge's reason for excluding the statistical evidence was that, as
experiences in previous court cases had demonstrated, jurors not well versed
in statistical reasoning (and that would be almost all jurors) typically have
great difficulty appreciating why odds of 1 in 100 million against the
suspicious deaths occurring by chance do not imply that the odds that Gilbert
did not kill the patients are likewise 1 in 100 million. The unusual pattern
of deaths could have some other cause.
    Cobb illustrated the distinction by means of a famous example from the
long struggle physicians and scientists had in overcoming the powerful
tobacco lobby to convince governments and the public that cigarette smok­
ing causes lung cancer. Table 2 shows the mortality rates for three categories
of people: nonsmokers, cigarette smokers, and cigar and pipe smokers.


    Nonsmokers                                       20.2
    Cigarette smokers                                20.5
    Cigar and pipe smokers                           35.3


   Table 2. Mortality rates per 1,000 people per year.


     At first glance, the figures in Table 2 seem to indicate that cigarette
smoking is not dangerous but pipe and cigar smoking are. However, this
is not the case. There is a crucial variable lurking behind the data that the
numbers themselves do not indicate: age. The average age of the nonsmokers
was 54.9, the average age of the cigarette smokers was 50.5, and
the average age of the cigar and pipe smokers was 65.9. Using statistical
techniques to make allowance for the age differences, statisticians were
able to adjust the figures to produce Table 3.


     Nonsmokers                                         20.3
     Cigarette smokers                                  28.3
     Cigar and pipe smokers                             21.2


     Table 3. Mortality rates per 1,000 people per year, adjusted for age.


Now a very different pattern emerges, indicating that cigarette smoking
is highly dangerous.
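How can an "allowance for age" reverse a comparison? One standard technique, direct age standardization, applies each group's age-specific death rates to a single common age distribution, so that differences in age mix no longer drive the comparison. The text does not say this is exactly how Table 3 was produced, and the two age bands and all the numbers below are invented purely for illustration.

```python
def weighted_rate(rates: dict, weights: dict) -> float:
    """Average of age-specific rates (per 1,000 per year) under the given age weights."""
    return sum(rates[age] * weights[age] for age in rates)


# Hypothetical age-specific death rates per 1,000 per year.
rates_a = {"younger": 15.0, "older": 60.0}   # group A: higher rate at every age
rates_b = {"younger": 10.0, "older": 40.0}   # group B

# Hypothetical age mix of each group: A is much younger on average.
mix_a = {"younger": 0.9, "older": 0.1}
mix_b = {"younger": 0.6, "older": 0.4}

# A common "standard" age distribution applied to both groups.
standard = {"younger": 0.5, "older": 0.5}

crude_a, crude_b = weighted_rate(rates_a, mix_a), weighted_rate(rates_b, mix_b)
adj_a, adj_b = weighted_rate(rates_a, standard), weighted_rate(rates_b, standard)
print(f"crude:    A {crude_a:.1f}  B {crude_b:.1f}")   # A looks safer
print(f"adjusted: A {adj_a:.1f}  B {adj_b:.1f}")       # A is more dangerous
```

Group A has the higher death rate in every age band, yet its crude rate is lower simply because its members are younger, the same kind of effect that age had on the cigarette smokers in Table 2.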
     Whenever a calculation of probabilities is made based on observational
data, the most that can generally be concluded is that there is a
correlation between two or more factors. That can be enough to
spur further investigation, but on its own it does not establish causation.
There is always the possibility of a hidden variable that lies behind the
correlation.
     When a study is made of, say, the effectiveness or safety of a new
drug or medical procedure, statisticians handle the problem of hidden
variables not by relying on observational data, but by conducting a
randomized, double-blind trial. In such a study, the target population
is divided into two groups by an entirely random procedure, with the
group allocation unknown to both the experimental subjects and the
caregivers administering the drug or treatment (hence the term
"double-blind"). One group is given the new drug or treatment, the
other is given a placebo or dummy treatment. With such an experiment,
the random allocation into groups balances out the possible effects of
hidden variables across the two groups, so that in this case a low
probability that a positive result is simply chance variation can indeed
be taken as conclusive evidence that the drug or treatment is what
caused the result.
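The allocation step of such a trial is easy to sketch. The Python below is an illustration under our own assumptions, not a protocol from any particular trial: it shuffles the subjects with a seeded random-number generator, splits them in half, and gives each subject an opaque code. Only the separately held key links codes to groups, which is what keeps subjects and caregivers "blind." The function name and the code format are hypothetical.

```python
import random


def allocate(subjects, seed=None):
    """Randomly split subjects into treatment and placebo groups.

    Returns (labels, key): labels maps each subject to an opaque code seen by
    subjects and caregivers; key maps each code to its group and is held
    separately until the trial ends (the "double-blind" part).
    """
    subjects = list(subjects)
    rng = random.Random(seed)
    order = subjects[:]
    rng.shuffle(order)                        # the entirely random procedure
    half = len(order) // 2
    group = {s: ("treatment" if n < half else "placebo") for n, s in enumerate(order)}
    # Codes follow the original (unshuffled) listing, so they reveal nothing about group.
    labels = {s: f"S{n:03d}" for n, s in enumerate(subjects)}
    key = {labels[s]: group[s] for s in subjects}
    return labels, key


labels, key = allocate([f"patient-{i}" for i in range(10)], seed=2024)
groups = [key[labels[s]] for s in labels]
print(groups.count("treatment"), "treatment,", groups.count("placebo"), "placebo")
```

Fixing the seed makes the allocation reproducible for auditing; in practice the seed itself would be kept with the key, away from the people running the trial.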
    In trying to solve a crime, there is of course no choice but to
work with the data available. Hence, use of the hypothesis-testing
procedure, as in the Gilbert case, can be highly effective in the
identification of a suspect, but other means are generally required to
secure a conviction.
    In United States v. Kristen Gilbert, the jury was not presented with
Gehlbach's statistical analysis, but they did find sufficient evidence to
convict her on three counts of first-degree murder, one count of
second-degree murder, and two counts of attempted murder. Although the
prosecution asked for the death sentence, the jury split 8-4 on that issue,
and accordingly Gilbert was sentenced to life imprisonment with no
possibility of parole.



POLICING THE POLICE
Another use of basic statistical techniques in law enforcement concerns
the important matter of ensuring that the police themselves obey the law.
    Law enforcement officers are given a considerable amount of power over their fellow citizens, and one of the duties of society is to make certain that they do not abuse that power. In particular, police officers are supposed to treat everyone equally and fairly, free of any bias based on gender, race, ethnicity, economic status, age, dress, or religion.
    But determining bias is a tricky business and, as we saw in our previous discussion of cigarette smoking, a superficial glance at the statistics can sometimes lead to a completely false conclusion. This is illustrated in a particularly dramatic fashion by the following example, which, while not related to police activity, clearly indicates the need to approach statistics with some mathematical sophistication.
    In the 1970s, somebody noticed that 44 percent of male applicants to the graduate school of the University of California at Berkeley were accepted, but only 35 percent of female applicants were accepted. On the face of it, this looked like a clear case of gender discrimination, and, not surprisingly (particularly at Berkeley, long acknowledged as home to many leading advocates for gender equality), there was a lawsuit over gender bias in admissions policies.
22                   THE NUMBERS BEHIND    NUMB3RS


    It turns out that Berkeley applicants do not apply to the graduate school, but to individual programs of study (such as engineering, physics, or English), so if there is any admissions bias, it will occur within one or more particular programs. Table 4 gives the admission data program by program:


     Program    Male applicants   % admitted   Female applicants   % admitted

       A              825             62              108               82
       B              560             63               25               68
       C              325             37              593               34
       D              417             33              375               35
       E              191             28              393               24
       F              373              6              341                7

     Table 4. Admission figures from the University of California at Berkeley
     on a program-by-program basis.


    If you look at each program individually, however, there doesn't
appear to be an advantage in admission for male applicants. Indeed, the
percentage of female applicants admitted to heavily subscribed program
A is considerably higher than for males, and in all other programs the
percentages are fairly close. So how can there appear to be an advantage
for male applicants overall?
    To answer this question, you need to look at which programs males and females applied to. Males applied heavily to programs A and B; females applied primarily to programs C, D, E, and F. The programs that females applied to were more difficult to get into than those chosen by males (the percentages admitted are low for both genders), and this is why it appears that males had an admission advantage when looking at the aggregate data.
    There was indeed a gender factor at work here, but it had nothing to do with the university's admissions procedures. Rather, it was one of self-selection by the applying students, in which female applicants avoided programs A and B.
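The reversal can be checked directly from Table 4. The short Python sketch below pools the six programs, weighting each admission percentage by its number of applicants, and lists the programs in which women did better. The admitted counts it works with are approximate, since the table gives rounded percentages. Because Table 4 covers only these six programs, the pooled figures (about 44.5 percent for men and 30.3 percent for women) are not the same as the university-wide 44 and 35 percent, but the same pattern appears: men ahead overall, women ahead in four of the six programs.

```python
# Table 4: program -> (male applicants, male % admitted,
#                      female applicants, female % admitted)
TABLE_4 = {
    "A": (825, 62, 108, 82),
    "B": (560, 63, 25, 68),
    "C": (325, 37, 593, 34),
    "D": (417, 33, 375, 35),
    "E": (191, 28, 393, 24),
    "F": (373, 6, 341, 7),
}

def aggregate_rates(table):
    """Pool all programs; return (male_rate, female_rate) as percentages."""
    male_apps = male_admits = female_apps = female_admits = 0.0
    for m_apps, m_pct, f_apps, f_pct in table.values():
        male_apps += m_apps
        male_admits += m_apps * m_pct / 100      # approximate admitted count
        female_apps += f_apps
        female_admits += f_apps * f_pct / 100
    return 100 * male_admits / male_apps, 100 * female_admits / female_apps

def programs_favoring_women(table):
    """Programs in which the female admission rate exceeds the male rate."""
    return [p for p, (_, m_pct, _, f_pct) in table.items() if f_pct > m_pct]

male, female = aggregate_rates(TABLE_4)
print(f"pooled: men {male:.1f}%, women {female:.1f}%")  # pooled: men 44.5%, women 30.3%
print("women ahead in:", programs_favoring_women(TABLE_4))  # ['A', 'B', 'D', 'F']
```

The pooled rate for each gender is dominated by the programs that gender applied to most, which is exactly the self-selection effect described in the text.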


    The Berkeley case was an example of a phenomenon known as Simpson's paradox, named for E. H. Simpson, who studied this curious phenomenon in a famous 1951 paper.*



HOW DO YOU DETERMINE BIAS?
With the above cautionary example in mind, what should we make of the study carried out in Oakland, California, in 2003 (by the RAND Corporation, at the request of the Oakland Police Department's Racial Profiling Task Force), to determine if there was systematic racial bias in the way police stopped motorists?
    The RAND researchers analyzed 7,607 vehicle stops recorded by Oakland police officers between June and December 2003, using various statistical tools to examine a number of variables to uncover any evidence that suggested racial profiling. One figure they found was that blacks were involved in 56 percent of all traffic stops studied, although they make up just 35 percent of Oakland's residential population. Does this finding indicate racial profiling? Well, it might, but as soon as you look more closely at what other factors could be reflected in those numbers, the issue is by no means clear-cut.
    For instance, like many inner cities, Oakland has some areas with much higher crime rates than others, and the police patrol those higher crime areas at a much greater rate than they do areas having less crime. As a result, they make more traffic stops in those areas. Since the higher crime areas typically have greater concentrations of minority groups, the higher rate of traffic stops in those areas manifests itself as a higher rate of traffic stops of minority drivers.
    To overcome these uncertainties, the RAND researchers devised a particularly ingenious way to look for possible racial bias. If racial profiling was occurring, they reasoned, the proportion of minority drivers among those stopped would be higher when the officers could determine the driver's race prior to making the stop. Therefore, they compared the stops made during a period just before nightfall with those made after dark, when the officers would be less likely to be able to determine the driver's race. The figures showed that 50 percent of drivers stopped during the daylight period were black, compared with 54 percent when it was dark. Based on that finding, there does not appear to be systematic racial bias in traffic stops.

    * E. H. Simpson, "The Interpretation of Interaction in Contingency Tables," Journal of the Royal Statistical Society, Ser. B, 13 (1951) 238-241.
      But the researchers dug a little further and looked at the officers' own reports as to whether they could determine the driver's race prior to making the stop. When officers reported knowing the race in advance of the stop, 66 percent of drivers stopped were black, compared with only 44 percent when the police reported not knowing the driver's race in advance. This is a fairly strong indicator of racial bias.*




      *Sadly, despite many efforts to eliminate the problem, racial bias by police seems to be a persistent issue throughout the country. To cite just one recent report, An Analysis of Traffic Stop Data in Riverside, California, by Larry K. Gaines of the California State University in San Bernardino, published in Police Quarterly, 9, 2, June 2006, pp. 210-233: "The findings from racial profiling or traffic stop studies have been fairly consistent: Minorities, especially African Americans, are stopped, ticketed, and searched at a higher rate as compared to Whites. For example, Lamberth (cited in State v. Pedro Soto, 1996) found that the Maryland State Police stopped and searched African Americans at a higher rate as compared to their rate of speeding violations. Harris (1999) examined court records in Akron, Dayton, Toledo, and Columbus, Ohio, and found that African Americans were cited at a rate that surpassed their representation in the driving population. Cordner, Williams, and Zuniga (2000) and Cordner, Williams, and Velasco (2002) found similar trends in San Diego, California. Zingraff and his colleagues (2000) examined stops by the North Carolina Highway Patrol and found that African Americans were overrepresented in stops and searches."
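The RAND comparisons are reported here only as percentages. Whether a gap such as 66 versus 44 percent (or 50 versus 54 percent) could plausibly be chance variation depends on how many stops lie behind each percentage. One standard way to check, offered as an illustration and not as the method the RAND researchers are described as using, is a pooled two-proportion z-test, sketched below in Python. The function name and argument names are hypothetical.

```python
from math import erfc, sqrt

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z-test.

    hits_a / n_a: e.g. stops of black drivers / all stops, race known in advance
    hits_b / n_b: the same counts when race was not known in advance
    Returns (z, two_sided_p_value).
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)   # shared rate if there is no real difference
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("test undefined when all or none of the stops are in the category")
    z = (p_a - p_b) / se
    # Two-sided p-value for a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return z, erfc(abs(z) / sqrt(2))
```

With only the published percentages in hand, the function cannot be applied to the Oakland figures; it needs the number of stops in each group. The larger those groups, the smaller a gap needs to be before chance becomes an implausible explanation.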
CHAPTER 3

                     Data Mining
                     Finding Meaningful Patterns in Masses of Information




BRUTUS
Charlie Eppes is sitting in front of a bank of computers and television monitors. He is testing a computer program he is developing to help police monitor large crowds, looking for unusual behavior that could indicate a pending criminal or terrorist act. His idea is to use standard mathematical equations that describe the flow of fluids in rivers, lakes, oceans, tanks, pipes, even blood vessels.* He is trying out the new system at a fund-raising reception for one of the California state senators. Overhead cameras monitor the diners as they move around the room, and Charlie's computer program analyzes the "flow" of the people. Suddenly the test takes on an unexpected aspect. The FBI receives a telephone warning that a gunman is in the room, intending to kill the senator.

     * The idea is based on several real-life projects to use the equations that describe fluid flows in order to analyze various kinds of crowd activity, including freeway traffic flow, spectators entering and leaving a large sports stadium, and emergency exits from burning buildings.

     The software works, and Charlie is able to identify the gunman, but Don and his team are not able to get to the killer before he has shot the senator and then turned the gun on himself.
     The dead assassin turns out to be a Vietnamese immigrant, a former Vietcong member, who, despite having been in prison in California, somehow managed to obtain U.S. citizenship and to be the recipient of a regular pension from the U.S. Army. He had also taken the illegal drug speed on the evening of the assassination. When Don makes some inquiries to find out just what is going on, he is visited by a CIA agent who asks for help in trying to prevent too much information about the case leaking out. Apparently the dead killer had been part of a covert CIA behavior modification project carried out in California prisons during the 1960s to turn inmates into trained assassins who, when activated, would carry out their assigned task before killing themselves. (Sadly, this idea is no less fanciful than that of Charlie using fluid flow equations to study crowd behavior.)
     But why had this particular individual suddenly become active and murdered the state senator?
     The picture becomes much clearer when a second murder occurs. The victim this time is a prominent psychiatrist, the killer a Cuban immigrant. The killer had also spent time in a California prison, and he too was the recipient of regular Army pension checks. But on this occasion, when the assassin tries to shoot himself after killing the victim, the gun fails to go off and he has to flee the scene. A fingerprint identification from the gun soon leads to his arrest.
     When Don realizes that the dead senator had been urging a repeal of the statewide ban on the use of behavior modification techniques on prison inmates, and that the dead psychiatrist had been recommending the re-adoption of such techniques to overcome criminal tendencies, he quickly concludes that someone has started to turn the conditioned assassins on the very people who were pressing for the reuse of the techniques that had produced them. But who?
     Don thinks his best line of investigation is to find out who supplied the guns that the two killers had used. He knows that the weapons originated with a dealer in Nevada. Charlie is able to provide the next step, which leads to the identification of the individual behind the two assassinations. He obtains data on all gun sales involving that particular dealer and analyzes the relationships among all sales that originated there. He explains that he is employing mathematical techniques similar to those used to analyze calling patterns on the telephone network, an approach used frequently in real-life law enforcement.
Data     Mining                                        27


    This is what viewers saw in the third-season episode of NUMB3RS called "Brutus" (the code name for the fictitious CIA conditioned-assassin project), first aired on November 24, 2006. As usual, the mathematics Charlie uses in the show is based on real life.
    The method Charlie uses to track the gun distribution is generally referred to as "link analysis," and is one among many that go under the collective heading of "data mining." Data mining extracts useful information from the mass of data that is available, often publicly, in modern society.



FINDING MEANING IN INFORMATION
Data mining was initially developed by the retail industry to detect customer purchasing patterns. (Ever wonder why supermarkets offer customers those loyalty cards, sometimes called "club" cards, in exchange for discounts? In part it's to encourage customers to keep shopping at the same store, but low prices alone would do that. The significant factor for the company is that the cards enable it to track detailed purchase patterns that it can link to customers' home zip codes, information that it can then analyze using data-mining techniques.)
    Though much of the work in data mining is done by computers, for the most part those computers do not run autonomously. Human expertise also plays a significant role, and a typical data-mining investigation will involve a constant back-and-forth interplay between human expert and machine.
    Many of the computer applications used in data mining fall under the general area known as artificial intelligence, although that term can be misleading, being suggestive of computers that think and act like people. Although many people believed that was a possibility back in the 1950s when AI first began to be developed, it eventually became clear that this was not going to happen within the foreseeable future, and may well never be the case. But that realization did not prevent the development of many "automated reasoning" programs, some of which eventually found a powerful and important use in data mining, where the human expert often provides the "high-level intelligence" that guides the computer programs that do the bulk of the work. In this way, data mining provides an excellent example of the power that results when human brains team up with computers.
     Among the more prominent methods and tools used in data mining are:


     •   Link analysis: looking for associations and other forms of
         connection among, say, criminals or terrorists

     •   Geometric clustering: a specific form of link analysis

     •   Software agents: small, self-contained pieces of computer code
         that can monitor, retrieve, analyze, and act on information

     •   Machine learning: algorithms that can extract profiles of
         criminals and graphical maps of crimes

     •   Neural networks: special kinds of computer programs that can
         predict the probability of crimes and terrorist attacks.


We'll take a brief l o o k at each of these topics in t u r n .



LINK ANALYSIS
Newspapers often refer to link analysis as "connecting the dots." It's the process of tracking connections between people, events, locations, and organizations. Those connections could be family ties, business relationships, criminal associations, financial transactions, in-person meetings, e-mail exchanges, and a host of others. Link analysis can be particularly powerful in fighting terrorism, organized crime, money laundering ("follow the money"), and telephone fraud.
     Link analysis is primarily a human-expert-driven process. Mathematics and technology are used to provide a human expert with powerful, flexible computer tools to uncover, examine, and track possible connections. Those tools generally allow the analyst to represent linked data as a network, displayed and examined (in whole or in part) on the computer screen, with nodes representing the individuals or organizations or locations of interest and the links between those nodes representing relationships or transactions. The tools may also allow the analyst to investigate and record details about each link, and to discover new nodes that connect to existing ones or new links between existing nodes.
    For example, in an investigation into a suspected crime ring, an investigator might carry out a link analysis of telephone calls a suspect has made or received, using telephone company call-log data, looking at factors such as number called, time and duration of each call, or number called next. The investigator might then decide to proceed further along the call network, looking at calls made to or from one or more of the individuals who had had phone conversations with the initial suspect. This process can bring to the investigator's attention individuals not previously known. Some may turn out to be totally innocent, but others could prove to be criminal collaborators.
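Proceeding "further along the call network" is, in graph terms, a breadth-first expansion outward from the suspect. The Python sketch below assumes the call log has already been reduced to (caller, callee) pairs; the function names and the people in the usage line are hypothetical, and a real tool would also keep call times and durations on each link.

```python
from collections import defaultdict, deque

def build_call_graph(call_log):
    """Undirected call network from (caller, callee) pairs, with a call count per link."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee in call_log:
        graph[caller][callee] += 1
        graph[callee][caller] += 1
    return graph

def expand_from(graph, suspect, max_hops):
    """Everyone within max_hops calls of the suspect, mapped to their hop distance."""
    distance = {suspect: 0}
    queue = deque([suspect])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue                      # do not look beyond the chosen radius
        for contact in graph[person]:
            if contact not in distance:   # first time this person is reached
                distance[contact] = distance[person] + 1
                queue.append(contact)
    return distance

# Hypothetical log: S is the initial suspect.
calls = [("S", "A"), ("S", "B"), ("A", "C"), ("C", "D"), ("B", "S")]
print(expand_from(build_call_graph(calls), "S", 2))  # {'S': 0, 'A': 1, 'B': 1, 'C': 2}
```

Limiting the number of hops matters in practice: each extra step can multiply the number of people drawn in, most of whom, as the text notes, will be innocent.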
   Another line of investigation may be to track cash transactions to and from domestic and international bank accounts.
   Still another line may be to examine the network of places and people visited by the suspect, using such data as train and airline ticket purchases, points of entry or departure in a given country, car rental records, credit card records of purchases, websites visited, and the like.
   Given the difficulty nowadays of doing almost anything without leaving an electronic trace, the challenge in link analysis is usually not one of having insufficient data, but rather of deciding which of the megabytes of available data to select for further analysis. Link analysis works best when backed up by other kinds of information, such as tips from police informants or from neighbors of possible suspects.
   Once an initial link analysis has identified a possible criminal or terrorist network, it may be possible to determine who the key players are by examining which individuals have the most links to others in the network.
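Counting each person's links (what network analysts call the degree of a node) is the simplest version of the "most links" test just described. This Python sketch assumes the network is given as a list of pairs; repeated contacts between the same two people count as one link, and the function names are hypothetical.

```python
from collections import defaultdict

def link_counts(links):
    """Number of distinct people each person is linked to."""
    neighbors = defaultdict(set)
    for a, b in links:
        if a != b:                 # ignore self-links
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {person: len(contacts) for person, contacts in neighbors.items()}

def key_players(links, top=3):
    """People with the most distinct links, highest first (ties broken by name)."""
    counts = link_counts(links)
    return sorted(counts, key=lambda p: (-counts[p], p))[:top]
```

Degree is only a first cut: a person with few links who is the sole bridge between two groups can matter more than a well-connected member of one group, which is one reason the human analyst stays in the loop.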



GEOMETRIC CLUSTERING
Because of resource limitations, law enforcement agencies generally focus most of their attention on major crime, with the result that minor offenses such as shoplifting or house burglaries get little attention. If, however, a single person or an organized gang commits many such crimes on a regular basis, the aggregate can constitute significant criminal activity that deserves greater police attention. The problem facing the authorities, then, is to identify, within the large numbers of minor crimes that take place every day, clusters that are the work of a single individual or gang.
     One example of a "minor" crime that is often carried out on a regular basis by two (and occasionally three) individuals acting together is the so-called bogus official burglary (or distraction burglary). This is where two people turn up at the front door of a homeowner (elderly people are often the preferred targets) posing as some form of officials (perhaps telephone engineers, representatives of a utility company, or local government agents) and, while one person secures the attention of the homeowner, the other moves quickly through the house or apartment taking any cash or valuables that are easily accessible.
     Victims of bogus official burglaries often file a report to the police, who will send an officer to the victim's home to take a statement. Since the victim will have spent considerable time with one of the perpetrators (the distracter), the statement will often include a fairly detailed description (gender, race, height, body type, approximate age, general facial appearance, eyes, hair color, hair length, hair style, accent, identifying physical marks, mannerisms, shoes, clothing, unusual jewelry, etc.), together with the number of accomplices and their genders. In principle, this wealth of information makes crimes of this nature ideal for data mining, and in particular for the technique known as geometric clustering, to identify groups of crimes carried out by a single gang. Application of the method is, however, fraught with difficulties, and to date the method appears to have been restricted to one or two experimental studies. We'll look at one such study, both to show how the method works and to illustrate some of the problems often faced by the data-mining practitioner.
     The following study was carried out in England in 2000 and 2001 by researchers at the University of Wolverhampton, together with the West Midlands Police.* The study looked at victim statements from bogus official burglaries in the police region over a three-year period. During that period, there were 800 such burglaries recorded, involving 1,292 offenders. This proved to be too great a number for the resources available for the study, so the analysis was restricted to those cases where the distracter was female, a group comprising 89 crimes and 105 offender descriptions.

     * R. Adderley and P. B. Musgrove, "General Review of Police Crime Recording and Investigation Systems," Policing: An International Journal of Police Strategies and Management, 24 (1), 2001, pp. 110-114.
   The first problem encountered was that the descriptions of the perpetrators were for the most part in narrative form, as written by the investigating officer who took the statement from the victim. A data-mining technique known as text mining had to be used to put the descriptions into a structured form. Because of the limitations of the text-mining software available, human input was required to handle many of the entries; for instance, to cope with spelling mistakes, ad hoc or inconsistent abbreviations (e.g., "Bham" or "B'ham" for "Birmingham"), and the use of different ways of expressing the same thing (e.g., "Birmingham accent", "Bham accent", "local accent", "accent: local", etc.).
   After some initial analysis, the researchers decided to focus on eight variables: age, height, hair color, hair length, build, accent, race, and number of accomplices.
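The clean-up of free-text fields such as accent, described above, can be partly automated with a lookup table and a couple of regular expressions. The Python sketch below is illustrative only: the abbreviation table is hypothetical, and treating "local" as "Birmingham" is an assumption based on the examples in the text (a West Midlands study), not a rule taken from the study itself. Entries the rules do not recognize pass through unchanged, which is where the human input the text mentions would still be needed.

```python
import re

# Hypothetical lookup tables; a real study would build these from its own data.
ABBREVIATIONS = {"bham": "birmingham", "b'ham": "birmingham"}
LOCAL_CITY = "birmingham"   # assumption: "local" refers to the region's main city

def normalize_accent(raw):
    """Map a free-text accent description to one canonical value (or None)."""
    text = raw.lower().strip()
    text = re.sub(r"^accent\s*:\s*", "", text)   # "accent: local" -> "local"
    text = re.sub(r"\s*accent$", "", text)        # "Bham accent"   -> "bham"
    text = ABBREVIATIONS.get(text, text)          # expand known abbreviations
    if text == "local":
        text = LOCAL_CITY
    return text or None                           # bare "accent" says nothing
```

All five phrasings quoted in the text ("Birmingham accent", "Bham accent", "local accent", "accent: local", "B'ham") map to the same value, "birmingham", under these rules.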
   Once the data had been processed into the appropriate structured format, the next step was to use geometric clustering to group the 105 offender descriptions into collections that were likely to refer to the same individual. To understand how this was done, let's first consider a method that at first sight might appear to be feasible, but which soon proves to have significant weaknesses. Then, by seeing how those weaknesses may be overcome, we will arrive at the method used in the British study.
   First, you code each of the eight variables numerically. Age (often a guess) is likely to be recorded either as a single figure or a range; if it is a range, take the mean. Gender (not considered in the British Midlands study because all the cases examined had a female distracter) can be coded as 1 for male, 0 for female. Height may be given as a number (inches), a range, or a term such as "tall", "medium", or "short"; again, some method has to be chosen to convert each of these to a single figure. Likewise, schemes have to be devised to represent each of the other variables as a number.
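The first-attempt coding just described can be sketched as follows. The inch values assigned to "short", "medium", and "tall" are hypothetical placeholders (the text only says that some conversion has to be chosen), and the function name `to_number` is likewise illustrative.

```python
# Hypothetical conversions: the text does not say what figures were used.
HEIGHT_WORDS = {"short": 63.0, "medium": 67.0, "tall": 71.0}   # inches
GENDER_CODES = {"male": 1.0, "female": 0.0}

def to_number(value, word_map=None):
    """Reduce a recorded value to a single figure.

    Accepts a number, a (low, high) range (coded as its mean), or a word
    found in word_map.
    """
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, tuple):          # e.g. age recorded as (25, 35)
        low, high = value
        return (low + high) / 2
    if word_map is not None and value in word_map:
        return word_map[value]
    raise ValueError(f"no numeric coding for {value!r}")

# Age given as "25 to 35", height as "tall", gender as "male".
print(to_number((25, 35)), to_number("tall", HEIGHT_WORDS), GENDER_CODES["male"])  # 30.0 71.0 1.0
```

Raising an error for unrecognized entries, rather than silently guessing, keeps the missing or odd values visible for the human reviewer.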
   When the numerical coding has been completed, each perpetrator description is then represented by an eight-vector, the coordinates of a point in eight-dimensional geometric (Euclidean) space. The familiar distance measure of Euclidean geometry (the Pythagorean metric) can then be used to measure the geometric distance between each pair of points. This gives the distance between two vectors $(x_1, \ldots, x_8)$ and $(y_1, \ldots, y_8)$ as:

$$\sqrt{(x_1 - y_1)^2 + \cdots + (x_8 - y_8)^2}$$




Points that are close together under this metric are likely to correspond to perpetrator descriptions that have several features in common; and the closer the points, the more features the descriptions are likely to have in common. (Remember, there are problems with this approach, which we'll get to momentarily. For the time being, however, let's suppose that things work more or less as just described.)
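The Pythagorean distance and a first pass at "which descriptions are close" can be written in a few lines of Python. This is a sketch for illustration: `euclidean` and `closest_pairs` are hypothetical names (Python 3.8 and later also provide `math.dist` for the distance itself), and the coded vectors would come from whatever numerical coding scheme has been chosen.

```python
from itertools import combinations
from math import sqrt

def euclidean(x, y):
    """Pythagorean distance between two equal-length coded descriptions."""
    if len(x) != len(y):
        raise ValueError("descriptions must have the same number of variables")
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def closest_pairs(descriptions, k=3):
    """The k closest pairs of descriptions, as (distance, i, j) with i < j."""
    pairs = ((euclidean(descriptions[i], descriptions[j]), i, j)
             for i, j in combinations(range(len(descriptions)), 2))
    return sorted(pairs)[:k]
```

For 105 descriptions there are 105 × 104 / 2 = 5,460 pairs, a trivial amount of computation. The hard part, as the text goes on to explain, is seeing the structure in eight dimensions, not computing the distances.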
       The challenge now is to identify clusters of points that are close together. If there were only two variables, this would be easy. All the points could be plotted on a single x,y-graph and visual inspection would indicate possible clusters. But human beings are totally unable to visualize eight-dimensional space, no matter what assistance the software system designers provide by way of data visualization tools. The way around this difficulty is to reduce the eight-dimensional array of points (descriptions) to a two-dimensional array (i.e., a matrix or table). The idea is to arrange the data points (that is, the vector representations of the offender descriptions) in a two-dimensional grid in such a way that:


       1. pairs of points that are extremely close together in the eight-dimensional space are put into the same grid entry;

       2. pairs of points that are neighbors in the grid are close together in the eight-dimensional space; and

       3. points that are farther apart in the grid are farther apart in the space.


This can be done using a special kind of computer program known as a neural net, in particular, a Kohonen self-organizing map (or SOM). Neural nets (including SOMs) are described later in the chapter. For now, all we need to know is that these systems, which work iteratively, are extremely good at homing in (over the course of many iterations) on patterns, such as geometric clusters of the kind we are interested in, and thus can indeed take an eight-dimensional array of the kind described above and place the points appropriately in a two-dimensional grid. (Part of the skill required to use an SOM effectively in a case such as this is deciding in advance, or by some initial trial and error, what the optimal dimensions of the final grid are. The SOM needs that information in order to start work.)
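For readers who want to see the iteration, here is a minimal sketch of the Kohonen update in plain Python. It is not the software used in the West Midlands study: the Gaussian neighborhood, the starting learning rate of 0.5, and the linear shrinking of both rate and neighborhood are common textbook choices assumed here for illustration. Note that the grid size (`rows`, `cols`) must be supplied before training starts, as the text points out.

```python
import math
import random

def best_cell(weights, x):
    """Grid square whose weight vector is nearest (Euclidean) to x."""
    best, best_d2 = None, math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d2 = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d2 < best_d2:
                best, best_d2 = (r, c), d2
    return best

def train_som(data, rows, cols, iterations=2000, seed=0):
    """Train a rows x cols self-organizing map on equal-length numeric vectors."""
    rng = random.Random(seed)
    # Start each grid square at a randomly chosen data point.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    start_radius = max(rows, cols) / 2
    for t in range(iterations):
        frac = t / iterations
        rate = 0.5 * (1 - frac)                        # learning rate decays toward 0
        radius = max(start_radius * (1 - frac), 0.5)   # neighborhood shrinks over time
        x = rng.choice(data)
        br, bc = best_cell(weights, x)                 # the "winning" square
        for r in range(rows):
            for c in range(cols):
                # Squares near the winner on the grid are also pulled toward x,
                # which is what keeps grid neighbors close in the original space.
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                pull = rate * math.exp(-grid_d2 / (2 * radius ** 2))
                w = weights[r][c]
                for k in range(len(w)):
                    w[k] += pull * (x[k] - w[k])
    return weights
```

After training, each description is placed in the square returned by `best_cell(weights, description)`; descriptions that land in the same or adjacent squares are the candidates for a single offender or gang.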
    Once the data has been put into the grid, law enforcement officers can examine grid squares that contain several entries, which are highly likely to come from a single gang responsible for a series of crimes, and can visually identify clusters of neighboring occupied squares on the grid, which may also represent gang activity. In either case, the officers can examine the corresponding original crime statement entries, looking for indications that those crimes are indeed the work of a single gang.
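Picking out the grid squares that hold several entries is a simple grouping step once every description has been assigned a square. In this Python sketch the input is assumed to be a dictionary from a description id to its (row, column) square; the ids and the function name are hypothetical.

```python
from collections import defaultdict

def crowded_cells(cell_of, min_entries=2):
    """Group description ids by grid square; keep squares holding several.

    cell_of: dict mapping a description id to its (row, col) grid square.
    Returns {square: [ids, ...]} for squares with at least min_entries ids.
    """
    by_cell = defaultdict(list)
    for desc_id, cell in cell_of.items():
        by_cell[cell].append(desc_id)
    return {cell: ids for cell, ids in by_cell.items() if len(ids) >= min_entries}

print(crowded_cells({"d1": (0, 0), "d2": (0, 0), "d3": (1, 2)}))  # {(0, 0): ['d1', 'd2']}
```

Each returned list points the officers to the original victim statements worth reading side by side.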
    Now let's see what goes wrong with the method just described, and how to correct it.
    The first problem is that the original encoding of entries as numbers is not systematic. This can lead to one variable dominating others when the entries are clustered using geometric distance (the Pythagorean metric) in eight-dimensional space. For example, a dimension that measures height (which could be anything between 60 inches and 76 inches) would dominate the entry for gender (0 or 1). So the first step is to scale (in mathematical terminology, normalize) the eight numerical variables, so that each one varies between 0 and 1.
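The domination effect, and the simplest fix, can be seen with two variables. The sketch below rescales each column linearly so that its smallest value becomes 0 and its largest becomes 1 (often called min-max scaling). The three people are hypothetical (height in inches, gender coded 1 for male and 0 for female), and `min_max_scale` is an illustrative name.

```python
import math

def min_max_scale(rows):
    """Rescale each column (variable) linearly so it runs from 0 to 1.

    A column in which every value is the same is mapped to 0.0 throughout.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in rows
    ]

people = [(60, 0), (76, 0), (60, 1)]   # hypothetical (height, gender)
print(math.dist(people[0], people[1]), math.dist(people[0], people[2]))  # 16.0 1.0
scaled = min_max_scale(people)
print(math.dist(scaled[0], scaled[1]), math.dist(scaled[0], scaled[2]))  # 1.0 1.0
```

Before scaling, a 16-inch height difference counts sixteen times as much as a difference in gender; afterward, each variable can contribute at most 1 to any squared difference. (`math.dist` requires Python 3.8 or later.)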
    One way to do that would be to simply scale down each variable by a multiplicative scaling factor appropriate for that particular feature (height, age, etc.). But that will introduce further problems when the separation distances are calculated; for example, if gender and height are among the variables, then, all other variables being roughly the same, a very tall woman would come out close to a very short man (because female gives a 0 and male gives a 1, whereas tall comes out close to 1 and short close to 0). Thus, a more sophisticated normalization procedure has to be used.
     The approach finally adopted in the British Midlands study was to
make every numerical entry binary (just 0 or 1). This meant splitting the
continuous variables (age and height) into overlapping ranges (a few
years and a few inches, respectively), with a 1 denoting an entry in a given
range and a 0 meaning outside that range, and using pairs of binary
variables to encode each factor of hair color, hair length, build, accent, and
race. The exact coding chosen was fairly specific to the data being
studied, so there is little to be gained from providing all the details here. (The
age and height ranges were taken to be overlapping to account for entries
toward the edges of the chosen ranges.) The normalization process
resulted in a set of 46 binary variables. Thus, the geometric clustering
was done over a geometric space of 46 dimensions.
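The overlapping-range coding can be sketched as follows. The book gives only "a few years and a few inches" as range widths, so the ranges used in the usage note (six-year windows starting every three years, from age 15) and the helper names are hypothetical. Because the step is smaller than the width, a value near the edge of one range also falls inside the next, so it still shares a 1 with descriptions just across the boundary.

```python
def overlapping_bins(lo, hi, width, step):
    """Build ranges [start, start + width) whose starts run from lo up to hi.

    Choosing step < width makes neighboring ranges overlap.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    bins = []
    start = lo
    while start < hi:
        bins.append((start, start + width))
        start += step
    return bins


def encode(value, bins):
    """One binary variable per range: 1 if value lies in the range, else 0."""
    return [1 if start <= value < end else 0 for start, end in bins]
```

For example, `overlapping_bins(15, 45, width=6, step=3)` yields ten age ranges, (15, 21), (18, 24), and so on, and an age of 22 is encoded with a 1 in both the 18-24 and the 21-27 range and 0 everywhere else.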
     Another problem was how to handle missing data. For example,
what do you do if a victim's statement says nothing about the
perpetrator's accent? If you enter a 0, that would amount to assigning an accent.
But what will the clustering program do if you leave that entry blank?
(In the British Midlands study, the program would treat a missing entry
as 0.) Missing data points are in fact one of the major headaches for data
miners, and there really is no universally good solution. If there are only
a few such cases, you could either ignore them or else see what solutions
you get with different values entered.
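One way to follow the last suggestion, for a description with only a few missing binary entries, is to try every possible filling and see how much the result moves. The sketch below (hypothetical helper names; missing entries are marked `None`) reports the smallest and largest distance to another description over all fillings. Since there are 2^k fillings for k gaps, this is only practical when, as the text says, there are few such cases.

```python
import math
from itertools import product


def completions(record):
    """Yield every way to fill the missing (None) binary entries of a record."""
    gaps = [i for i, v in enumerate(record) if v is None]
    for fill in product((0, 1), repeat=len(gaps)):
        filled = list(record)
        for i, v in zip(gaps, fill):
            filled[i] = v
        yield filled


def distance_range(record, other):
    """Smallest and largest Euclidean distance over all fillings of the gaps."""
    dists = [math.dist(c, other) for c in completions(record)]
    return min(dists), max(dists)
```

For instance, comparing [1, None, 0, None] with [1, 1, 0, 1] gives a range from 0 (one filling makes the two descriptions identical) to sqrt(2), and the largest distance comes from filling both gaps with 0, which is what the study's program would have done.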
     As mentioned earlier, a key decision that has to be made before the
SOM can be run is the size of the resulting two-dimensional grid. It
needs to be small enough that the SOM is forced to put some data
points into the same grid squares, and that some non-empty grid squares
end up with non-empty neighbors. The investigators in the British
Midlands study eventually opted for a five-by-seven grid (35 squares).
With 105 offender descriptions, this forced the SOM to create
several multi-entry clusters.
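The arithmetic behind that choice is simple counting. A five-by-seven grid has 35 squares, so 105 descriptions average three per square, and by the pigeonhole principle at least one square must hold three or more. Counting also shows that at most 34 descriptions can sit alone in a square, so at least 71 of the 105 must share a square with another description. The helper below (our own, hypothetical) computes these figures for any grid size; note that counting alone does not guarantee several multi-entry squares, which depends on the way the SOM groups similar points.

```python
import math


def grid_occupancy(rows, cols, n_records):
    """Counting facts that hold for ANY placement of n_records on the grid.

    Returns (squares, mean records per square, minimum size of the fullest
    square, minimum number of records that must share a square with another).
    """
    squares = rows * cols
    mean = n_records / squares
    # Pigeonhole: some square holds at least ceil(n / squares) records.
    fullest_at_least = math.ceil(n_records / squares)
    # If n > squares, not every square can hold exactly one record, so at most
    # squares - 1 records sit alone and all the others share a square.
    sharing_at_least = n_records - (squares - 1) if n_records > squares else 0
    return squares, mean, fullest_at_least, sharing_at_least
```

For the study's figures, `grid_occupancy(5, 7, 105)` gives 35 squares, a mean of 3.0, a fullest square of at least 3, and at least 71 shared descriptions.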
     The study itself concluded with experienced police officers examining
the results and comparing them with the original victim statements
and other relevant information (such as geographic proximity of crimes
over a short timespan, which would be another indicator of gang
activity but was not used in the cluster analysis), to determine how well
the process performed. Though all parties involved in the study declared it to
be successful, the significant amount of person-hours required means
The numbers behind numb3 rs   solving crime with mathematics (malestrom)
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Kürzlich hochgeladen (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

The Numbers Behind NUMB3RS: Solving Crime with Mathematics (malestrom)

  • 1. SOLVING CRIME WITH MATHEMATICS THE NUMBERS BEHIND NUMB3RS KEITH DEVLIN, NPR's "Math Guy," and GARY LORDEN, the Math Consultant on NUMB3RS, the hit CBS television series
  • 2. A COMPANION TO THE HIT CBS CRIME SERIES NUMB3RS PRESENTS THE FASCINATING WAYS MATHEMATICS IS USED TO FIGHT REAL-LIFE CRIME. Using the popular CBS prime-time TV crime series NUMB3RS as a springboard, Keith Devlin (known to millions of NPR listeners as "the Math Guy" on NPR's Weekend Edition with Scott Simon) and Gary Lorden (the math consultant to NUMB3RS) explain real-life mathematical techniques used by the FBI and other law enforcement agencies to catch and convict criminals. From forensics to counterterrorism, the Riemann hypothesis to image enhancement, solving murders to beating casino odds, Devlin and Lorden present compelling cases that illustrate how advanced mathematics can be used in state-of-the-art criminal investigations. Praise for the television series: "NUMB3RS LOOKS LIKE A WINN3R." (USA Today)
  • 3. A PLUME BOOK THE NUMBERS BEHIND NUMB3RS DR. KEITH DEVLIN is executive director of Stanford University's Center for the Study of Language and Information and a consulting professor of mathematics at Stanford. Devlin has a B.Sc. degree in Mathematics from King's College London (1968) and a Ph.D. in Mathematics from the University of Bristol (1971). He is a fellow of the American Association for the Advancement of Science, a World Economic Forum fellow, and a former member of the Mathematical Sciences Education Board of the U.S. National Academy of Sciences. The author of twenty-five books, Devlin has been a regular contributor to National Public Radio's popular program Weekend Edition, where he is known as "the Math Guy" in his on-air conversations with host Scott Simon. His monthly column, "Devlin's Angle," appears on the Mathematical Association of America's web journal MAA Online. DR. GARY LORDEN is a professor in the mathematics department of the California Institute of Technology in Pasadena. He graduated from Caltech with a B.S. in mathematics in 1962, received his Ph.D. in mathematics from Cornell University in 1966, and taught at Northwestern University before returning to Caltech in 1968. A fellow of the Institute of Mathematical Statistics, Lorden has taught statistics, probability, and other mathematics at all levels from freshman to doctoral. Lorden has also been active as a consultant and expert witness in mathematics and statistics for government agencies and laboratories, private companies, and law firms. For many years he consulted for Caltech's Jet Propulsion Laboratory on its space exploration programs. He has participated in highly classified research projects aimed at enhancing the ability of government agencies (such as the NSA) to protect national security. Lorden is the chief mathematics consultant for the CBS TV series NUMB3RS.
  • 5. THE NUMBERS BEHIND NUMB3RS Solving Crime with Mathematics Keith Devlin, Ph.D. and Gary Lorden, Ph.D. A PLUME BOOK
  • 6. PLUME Published by Penguin Group Penguin Group (USA) Inc., 375 Hudson Street, New York, New York 10014, U.S.A. Penguin Group (Canada), 90 Eglinton Avenue East, Suite 700, Toronto, Ontario, Canada M4P 2Y3 (a division of Pearson Penguin Canada Inc.) Penguin Books Ltd., 80 Strand, London WC2R 0RL, England Penguin Ireland, 25 St. Stephen's Green, Dublin 2, Ireland (a division of Penguin Books Ltd.) Penguin Group (Australia), 250 Camberwell Road, Camberwell, Victoria 3124, Australia (a division of Pearson Australia Group Pty. Ltd.) Penguin Books India Pvt. Ltd., 11 Community Centre, Panchsheel Park, New Delhi - 110 017, India Penguin Books (NZ), 67 Apollo Drive, Rosedale, North Shore 0745, Auckland, New Zealand (a division of Pearson New Zealand Ltd.) Penguin Books (South Africa) (Pty.) Ltd., 24 Sturdee Avenue, Rosebank, Johannesburg 2196, South Africa Penguin Books Ltd., Registered Offices: 80 Strand, London WC2R 0RL, England First published by Plume, a member of Penguin Group (USA) Inc. First Printing, September 2007 10 9 8 7 6 5 4 3 2 1 Copyright © Keith Devlin and Gary Lorden, 2007 All rights reserved Illustration credits appear on page 244. REGISTERED TRADEMARK / MARCA REGISTRADA LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA Devlin, Keith J. The numbers behind NUMB3RS: solving crime with mathematics/Keith Devlin, Gary Lorden. p. cm. ISBN 978-0-452-28857-7 1. Criminal investigation. 2. Mathematical statistics. 3. Criminal investigation -- Data processing. I. Title: Numbers behind numbers. II. Lorden, Gary. III. Title.
HV8073.5.D485 2007 363.2501'5195--dc22 2007018115 Printed in the United States of America Set in Dante MT Designed by Joseph Rutt Without limiting the rights under copyright reserved above, no part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted, in any form, or by any means (electronic, mechanical, photocopying, recording, or otherwise), without the prior written permission of both the copyright owner and the above publisher of this book. PUBLISHER'S NOTE The scanning, uploading, and distribution of this book via the Internet or via any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions, and do not participate in or encourage electronic piracy of copyrighted materials. Your support of the author's rights is appreciated. BOOKS ARE AVAILABLE AT QUANTITY DISCOUNTS WHEN USED TO PROMOTE PRODUCTS OR SERVICES. FOR INFORMATION PLEASE WRITE TO PREMIUM MARKETING DIVISION, PENGUIN GROUP (USA) INC., 375 HUDSON STREET, NEW YORK, NEW YORK 10014.
  • 7. Acknowledgments The authors want to thank NUMB3RS creators Cheryl Heuton and Nick Falacci for creating Charlie Eppes, television's first mathematics superhero, and succeeding brilliantly in putting math on television in prime time. Their efforts have been joined by a stellar team of other writers, actors, producers, directors, and specialists whose work has inspired us to write this book. The gifted actor David Krumholtz has earned the undying love of mathematicians everywhere for bringing Charlie to life in a way that has led millions of people to see mathematics in a completely new light. Thanks also to NUMB3RS researchers Andy Black and Matt Kolokoff for being wonderful to work with in coming up with endless applications of mathematics to make the writers' dreams come true. We wish to express our particular thanks to mathematician Dr. Lenny Rudin of Cognitech, one of the world's foremost experts on image enhancement, for considerable help with Chapter 5 and for providing the images we show in that chapter. Finally, Ted Weinstein, our agent, found us an excellent publisher in David Cashion of Plume, and both worked tirelessly to turn a manuscript that we felt was as reader-friendly as possible, given that this is a math book, into one that, we have to acknowledge, is now a lot more so! Keith Devlin, Palo Alto, CA Gary Lorden, Pasadena, CA
  • 9. Contents: Introduction: The Hero Is a Mathematician? (ix); 1. Finding the Hot Zone: Criminal Geographic Profiling (1); 2. Fighting Crime with Statistics 101 (13); 3. Data Mining: Finding Meaningful Patterns in Masses of Information (25); 4. When Does the Writing First Appear on the Wall? Changepoint Detection (51); 5. Image Enhancement and Reconstruction (63); 6. Predicting the Future: Bayesian Inference (77); 7. DNA Profiling (89); 8. Secrets: Making and Breaking Codes (105); 9. How Reliable Is the Evidence? Doubts about Fingerprints (121); 10. Connecting the Dots: The Math of Networks (137)
  • 10. Contents (continued): 11. The Prisoner's Dilemma, Risk Analysis, and Counterterrorism (153); 12. Mathematics in the Courtroom (175); 13. Crime in the Casino: Using Math to Beat the System (193); Appendix: Mathematical Synopses of the Episodes in the First Three Seasons of NUMB3RS (207); Index (233)
  • 11. INTRODUCTION The Hero Is a Mathematician? On January 23, 2005, a new television crime series called NUMB3RS debuted. Created by the husband-and-wife team Nick Falacci and Cheryl Heuton, the series was produced by Paramount Network Television and acclaimed Hollywood veterans Ridley and Tony Scott, whose movie credits include Alien, Top Gun, and Gladiator. Throughout its run, NUMB3RS has regularly beaten the competition to be the most watched series in its time slot on Friday nights. What has surprised many is that one of the show's two heroes is a mathematician, and much of the action revolves around mathematics, as professor Charlie Eppes uses his powerful skills to help his older brother, Don, an FBI agent, identify and catch criminals. Many viewers, and several critics, have commented that the stories are entertaining, but the basic premise is far-fetched: You simply can't use math to solve crimes, they say. As this book proves, they are wrong. You can use math to solve crimes, and law enforcement agencies do (not in every instance, to be sure, but often enough to make math a powerful weapon in the never-ending fight against crime). In fact, the very first episode of the series was closely based on a real-life case, as we will discuss in the next chapter. Our book sets out to describe, in a nontechnical fashion, some of the major mathematical techniques currently available to the police, CIA, and FBI. Most of these methods have been mentioned during episodes of NUMB3RS, and while we frequently link our explanations to what was depicted on the air, our focus is on the mathematical techniques and how they can be used in law enforcement. In addition, we describe
  • 12. some real-life cases, not used in the TV series (at least not directly), in which mathematics played a role in solving a crime. In many ways, NUMB3RS is similar to good science fiction, which is based on correct physics or chemistry. Each week, NUMB3RS presents a dramatic story in which realistic mathematics plays a key role in the narrative. The producers of NUMB3RS go to great lengths to ensure that the mathematics used in the scripts is correct and that the applications shown are possible. Although some of the cases viewers see are fictional, they certainly could have happened, and in some cases very well may. Though the TV series takes some dramatic license, this book does not. In The Numbers Behind NUMB3RS, you will discover the mathematics that can be, and is, used in fighting real crime and catching actual criminals.
  • 15. CHAPTER 1 Finding the Hot Zone: Criminal Geographic Profiling FBI Special Agent Don Eppes looks again at the large street map of Los Angeles spread across the dining-room table of his father's house. The crosses inked on the map show the locations where, over a period of several months, a brutal serial killer has struck, raping and then murdering a number of young women. Don's job is to catch the killer before he strikes again. But the investigation has stalled. Don is out of clues, and doesn't know what to do next. "Can I help?" The voice is that of Don's younger brother, Charlie, a brilliant young professor of mathematics at the nearby university CalSci. Don has always been in awe of his brother's incredible ability at math, and frankly would welcome any help he can get. But . . . help from a mathematician? "This case isn't about numbers, Charlie." The edge in Don's voice is caused more by frustration than anger, but Charlie seems not to notice, and his reply is totally matter-of-fact but insistent: "Everything is numbers." Don is not convinced. Sure, he has often heard Charlie say that mathematics is all about patterns: identifying them, analyzing them, making predictions about them. But it didn't take a math genius to see that the crosses on the map were scattered haphazardly. There was no pattern, no way anyone could predict where the next cross would go, the exact location where the next young girl would be attacked. Maybe it would occur that very evening.
If only there were some regularity to the arrangement of the crosses, a pattern that could be captured with a mathematical equation, the way Don remembers from his schooldays that the equation $x^2 + y^2 = 9$ describes a circle.
  • 16. Looking at the map, even Charlie has to agree there is no way to use math to predict where the killer would strike next. He strolls over to the window and stares out across the garden, the silence of the evening broken only by the continual flick-flick-flick-flick of the automatic sprinkler watering the lawn. Charlie's eyes see the sprinkler but his mind is far away. He had to admit that Don was probably right. Mathematics could be used to do lots of things, far more than most people realized. But in order to use math, there had to be some sort of pattern. Flick-flick-flick-flick. The sprinkler continued to do its job. There was the brilliant mathematician in New York who used mathematics to study the way the heart works, helping doctors spot tiny irregularities in a heartbeat before the person has a heart attack. Flick-flick-flick-flick. There were all those mathematics-based computer programs the banks utilized to track credit card purchases, looking for a sudden change in the pattern that might indicate identity theft or a stolen card. Flick-flick-flick-flick. Without clever mathematical algorithms, the cell phone in Charlie's pocket would have been twice as big and a lot heavier. Flick-flick-flick-flick. In fact, there was scarcely any area of modern life that did not depend, often in a crucial way, on mathematics. But there had to be a pattern; otherwise the math can't get started. Flick-flick-flick-flick. For the first time, Charlie notices the sprinkler, and suddenly he knows what to do. He has his answer. He could help solve Don's case, and the solution has been staring him in the face all along.
He just had not realized it. He drags Don over to the window. "We've been asking the wrong question," he says. "From what you know, there's no way you can predict where the killer will strike next." He points to the sprinkler. "Just like, no matter how much you study where each drop of water hits the grass, there's no way you can predict where the next drop will land. There's too much uncertainty." He glances at Don to make sure his older brother is listening. "But suppose you could not see the sprinkler, and all you had to go on was the pattern of where all the drops landed. Then, using math, you could work out exactly where the sprinkler must be. You can't use the pattern of drops to predict forward to the next
  • 17. drop, but you can use it to work backward to the source. It's the same with your killer." Don finds it difficult to accept what his brother seems to be suggesting. "Charlie, are you telling me you can figure out where the killer lives?" Charlie's answer is simple: "Yes." Don is still skeptical that Charlie's idea can really work, but he's impressed by his brother's confidence and passion, and so he agrees to let him assist with the investigation. Charlie's first step is to learn some basic facts from the science of criminology. First, how do serial killers behave? Here, his years of experience as a mathematician have taught him how to recognize the key factors and ignore all the others, so that a seemingly complex problem can be reduced to one with just a few key variables. Talking with Don and the other agents at the FBI office where his elder brother works, he learns, for instance, that violent serial criminals exhibit certain tendencies in selecting locations. They tend to strike close to their homes, but not too close; they always set a "buffer zone" around their residence where they will not strike, an area that is too close for comfort; outside that comfort zone, the frequency of crime locations decreases as the distance from home increases. Then, back in his office in the CalSci mathematics department, Charlie gets to work in earnest, feverishly covering his blackboards with mathematical equations and formulas. His goal: to find the mathematical key to determining a "hot zone," an area on the map, derived from the crime locations, where the perpetrator is most likely to live. As always when he works on a difficult mathematical problem, the hours fly by as Charlie tries out many unsuccessful approaches.
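Then, finally, he has an idea he thinks should work. He erases his previous chalk scribbles one more time and writes this complicated-looking formula on the board:*

$$p_{ij} = k \sum_{n=1}^{(\text{total crimes})} \left[ \frac{\phi}{\left(|X_i - x_n| + |Y_j - y_n|\right)^{f}} + \frac{(1-\phi)\,B^{g-f}}{\left(2B - |X_i - x_n| - |Y_j - y_n|\right)^{g}} \right]$$

*We'll take a closer look at this formula in a moment.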
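To make the blackboard formula concrete, here is a minimal Python sketch that evaluates it cell by cell on a small integer grid. For each cell $(X_i, Y_j)$ it adds one term per crime site $(x_n, y_n)$, using the city-block distance $d = |X_i - x_n| + |Y_j - y_n|$. Here $\phi$ is taken to be 1 when $d > B$, meaning the cell lies outside the buffer zone and the contribution decays as $1/d^f$. It is 0 otherwise, meaning the cell is inside the buffer and the contribution shrinks toward the crime site. This matches the buffer-zone behavior described above. The sketch is an illustration, not Rossmo's Rigel software. The exponent values f = g = 1.2, the unit grid, and the choice of $k$ as a constant that makes all scores sum to 1 are assumptions for this example. The function names `rossmo_surface` and `hottest_cell` are hypothetical.

```python
"""Minimal sketch of Rossmo's formula evaluated on an integer grid.

Assumptions (not from the text): unit grid cells, illustrative
exponents f = g = 1.2, and k chosen so that all scores sum to 1.
"""


def rossmo_surface(crimes, width, height, B, f=1.2, g=1.2):
    """Return surface[y][x], the normalized score of grid cell (x, y).

    crimes: list of (x, y) crime locations on the same grid.
    B: buffer-zone radius in city-block distance (must be > 0).
    """
    if B <= 0:
        raise ValueError("buffer radius B must be positive")
    if not crimes:
        raise ValueError("need at least one crime location")
    raw = [[0.0] * width for _ in range(height)]
    for yj in range(height):
        for xi in range(width):
            total = 0.0
            for xn, yn in crimes:
                # City-block (Manhattan) distance from the cell to crime n.
                d = abs(xi - xn) + abs(yj - yn)
                if d > B:
                    # phi = 1: outside the buffer, decays with distance.
                    total += 1.0 / d ** f
                else:
                    # phi = 0: inside the buffer, shrinks toward the crime site.
                    total += B ** (g - f) / (2 * B - d) ** g
            raw[yj][xi] = total
    # k is a normalizing constant: rescale so all scores sum to 1.
    k = 1.0 / sum(sum(row) for row in raw)
    return [[k * v for v in row] for row in raw]


def hottest_cell(surface):
    """Return the (x, y) cell with the highest score."""
    _, x, y = max(
        (v, x, y) for y, row in enumerate(surface) for x, v in enumerate(row)
    )
    return x, y


if __name__ == "__main__":
    # Hypothetical example: four crimes on a ring around cell (10, 10).
    ring = [(14, 10), (6, 10), (10, 14), (10, 6)]
    surface = rossmo_surface(ring, width=21, height=21, B=3)
    print(hottest_cell(surface))  # prints (10, 10)
```

In this example, four crimes sit on a ring around cell (10, 10) with B = 3, and the sketch reports (10, 10) as the hottest cell. With a single crime, the score peaks at distance B from the crime site and is lower both at the site itself and beyond the buffer. That peak is the buffer zone at work.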
  • 18. "That should do the trick," he says to himself. The next step is to fine-tune his formula by checking it against examples of past serial crimes Don provides him with. When he inputs the crime locations from those previous cases into his formula, does it accurately predict where the criminals lived? This is the moment of truth, when Charlie will discover whether his mathematics reflects reality. When it doesn't, he learns that he must have got it wrong when he first decided which factors to take into account and which to ignore. But this time, after Charlie makes a few minor adjustments, the formula seems to work. The next day, bursting with energy and conviction, Charlie shows up at the FBI offices with a printout of the crime-location map with the "hot zone" prominently displayed. Just as a suitably programmed computer fed the equation $x^2 + y^2 = 9$ that Don remembered from his schooldays will draw a circle, so too Charlie's new equation, fed into his computer, produced a picture. Not a circle this time: Charlie's equation is much more complicated. What it gave was a series of concentric colored regions drawn on Don's crime map of Los Angeles, regions that homed in on the hot zone where the killer lives. Having this map will still leave a lot of work for Don and his colleagues, but finding the killer is no longer like looking for a needle in a haystack. Thanks to Charlie's mathematics, the haystack has suddenly dwindled to a mere sackful of hay.
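To test a formula against solved cases, Charlie needs a number that says how well a surface did. One measure used in the geographic-profiling literature is the hit score. You search the cells from highest score to lowest and report what fraction of the whole map has been covered by the time you reach the offender's actual home. Smaller is better. The self-contained Python sketch below assumes the surface is stored as a list of rows, `surface[y][x]`. The function names and the rule that cells tied with the home cell count as searched are assumptions made for this illustration.

```python
def hit_score(surface, home):
    """Fraction of grid cells searched, in descending score order, before
    reaching the offender's home cell (x, y). Lower is better.

    Cells whose score ties the home cell's score are counted as searched
    (a pessimistic convention chosen for this sketch).
    """
    hx, hy = home
    home_value = surface[hy][hx]
    cells = [v for row in surface for v in row]
    searched = sum(1 for v in cells if v >= home_value)
    return searched / len(cells)


def mean_hit_score(cases):
    """Average hit score over solved cases, given (surface, home) pairs."""
    scores = [hit_score(surface, home) for surface, home in cases]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2 x 2 surface; the values are made up for illustration.
    toy = [[0.1, 0.2],
           [0.3, 0.4]]
    print(hit_score(toy, (0, 1)))  # prints 0.5
```

`mean_hit_score` averages this figure over several solved cases. That gives a single number to compare before and after an adjustment, such as changing B, f, or g.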
  • 19. Charlie explains to Don and the other FBI agents working the case that the serial criminal has tried not to reveal where he lives, picking victims in what he thinks is a random pattern of locations, but that the mathematical formula nevertheless reveals the truth: a hot zone in which the criminal's residence is located, with very high probability. Don and the team decide to investigate men within a certain range of ages who live in the hot zone, and use surveillance and stealth tactics to obtain DNA evidence from the suspects' discarded cigarette butts, drinking straws, and the like, which can be matched with DNA from the crime-scene investigations. Within a few days, and a few heart-stopping moments, they have their man. The case is solved. Don tells his younger brother, "That's some formula you've got there, Charlie." FACT OR FICTION? Leaving out a few dramatic twists, the above is what the TV audience saw in the very first episode of NUMB3RS, broadcast on January 23, 2005. Many viewers could not believe that mathematics could help capture a criminal in this way. In fact, that entire first episode was based fairly closely on a real case in which a single mathematical equation was used to identify the hot zone where a criminal lived. It was the very equation, reproduced above, that viewers saw Charlie write on his blackboard. The real-life mathematician who produced that formula is named Kim Rossmo. The technique of using mathematics to predict where a serial criminal lives, which Rossmo helped to establish, is called geographic profiling. In the 1980s Rossmo was a young constable on the police force in Vancouver, Canada. What made him unusual for a police officer was his talent for mathematics.
Throughout school he had been a "math whiz," the kind of student who makes fellow students, and often teachers, a little nervous. The story is told that early in the twelfth grade, bored with the slow pace of his mathematics course, he asked to take the final exam in the second week of the semester. After scoring one hundred percent, he was excused from the remainder of the course. Similarly bored with the typical slow progress of police investigations involving violent serial criminals, Rossmo decided to go back to school,
  • 20. ending up with a Ph.D. in criminology from Simon Fraser University, the first cop in Canada to get one. His thesis advisers, Paul and Patricia Brantingham, were pioneers in the development of mathematical models (essentially sets of equations that describe a situation) of criminal behavior, particularly those that describe where crimes are most likely to occur based on where a criminal lives, works, and plays. (It was the Brantinghams who noticed the location patterns of serial criminals that TV viewers saw Charlie learning about from Don and his FBI colleagues.) Rossmo's interest was a little different from the Brantinghams'. He did not want to study patterns of criminal behavior. As a police officer, he wanted to use actual data about the locations of crimes linked to a single unknown perpetrator as an investigative tool to help the police find the criminal. Rossmo had some initial successes in re-analyzing old cases, and after receiving his Ph.D. and being promoted to detective inspector, he pursued his interest in developing better mathematical methods to do what he came to call criminal geographic targeting (CGT). Others called the method "geographic profiling," since it complemented the well-known technique of "psychological profiling" used by investigators to find criminals based on their motivations and psychological characteristics. Geographic profiling attempts to locate a likely base of operation for a criminal by analyzing the locations of their crimes. Rossmo hit upon the key idea behind his seemingly magic formula while riding on a bullet train in Japan one day in 1991. Finding himself without a notepad to write on, he scribbled it on a napkin.
With later refinements, the formula became the principal element of a computer program Rossmo wrote, called Rigel (pronounced RYE-gel, and named after the star in the constellation Orion, the Hunter). Today, Rossmo sells Rigel, along with training and consultancy, to police and other investigative agencies around the world to help them find criminals.

When Rossmo describes how Rigel works to a law enforcement agency interested in the program, he offers his favorite metaphor: that of determining the location of a rotating lawn sprinkler by analyzing the pattern of the water drops it sprays on the ground. When NUMB3RS
cocreators Cheryl Heuton and Nick Falacci were working on their pilot episode, they took Rossmo's own metaphor as the way Charlie would hit upon the formula and explain the idea to his brother.

Rossmo had some early successes dealing with serial crime investigations in Canada, but what really made him a household name among law enforcement agencies all over North America was the case of the South Side Rapist in Lafayette, Louisiana. For more than ten years, an unknown assailant, his face wrapped bandit-style in a scarf, had been stalking women in the town and assaulting them. In 1998 the local police, snowed under by thousands of tips and a corresponding number of suspects, brought Rossmo in to help. Using Rigel, Rossmo analyzed the crime-location data and produced a map much like the one Charlie displayed in NUMB3RS, with bands of color indicating the hot zone and its increasingly hot interior rings. The map enabled police to narrow down the hunt to half a square mile and about a dozen suspects. Undercover officers combed the hot zone using the same techniques portrayed in NUMB3RS, to obtain DNA samples of all males of the right age range in the area.

Frustration set in when each of the suspects in the hot zone was cleared by DNA evidence. But then they got lucky. The lead investigator, McCullan "Mac" Gallien, received an anonymous tip pointing to a very unlikely suspect: a sheriff's deputy from a nearby department. Since it was just one more tip on top of the mountain he already had, Mac was inclined to just file it, but on a whim he decided to check the deputy's address.
Not even close to the hot zone. Still, something niggled him, and he dug a little deeper. And then he hit the jackpot. The deputy had previously lived at another address, right in the hot zone! DNA evidence was collected from a cigarette butt, and it matched that taken from the crime scenes. The deputy was arrested, and Rossmo became an instant celebrity in the crime-fighting world.

Interestingly, when Heuton and Falacci were writing the pilot episode of NUMB3RS, based on this real-life case, they could not resist incorporating the same dramatic twist at the end. When Charlie first applies his formula, no DNA matches are found among the suspects in the hot zone, as happened with Rossmo's formula in Lafayette. Charlie's belief in his mathematical analysis is so strong that when Don tells him
the search has drawn a blank, he initially refuses to accept this outcome. "You must have missed him," he says. Frustrated and upset, Charlie huddles with Don at their father Alan's house, and Alan says, "I know the problem can't be the math, Charlie. It must be something else." This remark spurs Don to realize that finding the killer's residence may be the wrong goal. "If you tried to find me where I live, you would probably fail because I'm almost never there," he notes. "I'm usually at work." Charlie seizes on this notion to pursue a different line of attack, modifying his calculations to look for two hot zones, one that might contain the killer's residence and the other his place of work. This time Charlie's math works. Don manages to identify and catch the criminal just before he kills another victim.

These days, Rossmo's company ECRI (Environmental Criminology Research, Inc.) offers the patented computer package Rigel along with training in how to use it effectively to solve crimes. Rossmo himself travels around the world, to Asia, Africa, Europe, and the Middle East, assisting in criminal investigations and giving lectures to police and criminologists. Two years of training, by Rossmo or one of his assistants, is required to learn to adapt the use of the program to the idiosyncrasies of a particular criminal's behavior.

Rigel does not score a big win every time.
For example, Rossmo was called in on the notorious Beltway Sniper case when, during a three-week period in October 2002, ten people were killed and three others critically injured by what turned out to be a pair of serial killers operating in and around the Washington, D.C., area. Rossmo concluded that the sniper's base was somewhere in the suburbs to the north of Washington, but it turned out that the two killers did not live in the area and moved too often to be located by geographic profiling.

The fact that Rigel does not always work will not come as a surprise to anyone familiar with what happens when you try to apply mathematics to the messy real world of people. Many people come away from their high school experience with mathematics thinking that there is a right way and a wrong way to use math to solve a problem, in too many cases with the teacher's way being the right one and their own attempts being the wrong one. But this is rarely the case. Mathematics will always give you the correct answer (if you do the math right) when
you apply it to very well-defined physical situations, such as calculating how much fuel a jet needs to fly from Los Angeles to New York. (That is, the math will give you the right answer provided you start with accurate data about the total weight of the plane, passengers, and cargo, the prevailing winds, and so forth. Missing a key piece of input data to incorporate into the mathematical equations will almost always result in an inaccurate answer.) But when you apply math to a social problem, such as a crime, things are rarely so clear-cut.

Setting up equations that capture elements of some real-life activity is called constructing a "mathematical model." In constructing a physical model of something, say an aircraft to study in a wind tunnel, the important thing is to get everything right, apart from the size and the materials used. In constructing a mathematical model, the idea is to get the appropriate behavior right. For example, to be useful, a mathematical model of the weather should predict rain for days when it rains and predict sunshine on sunny days. Constructing the model in the first place is usually the hard part. "Doing the math" with the model (i.e., solving the equations that make up the model) is generally much easier, especially when using computers. Mathematical models of the weather often fail because the weather is simply far too complicated (in everyday language, it's "too unpredictable") to be captured by mathematics with great accuracy.

As we shall see in later chapters, there is usually no such thing as "one correct way" to use mathematics to solve problems in the real world, particularly problems involving people.
To try to meet the challenges that confront Charlie in NUMB3RS (locating criminals, tracing the spread of a disease or of counterfeit money, predicting the target selection of terrorists, and so on), a mathematician cannot merely write down an equation and solve it. There is a considerable art to the process of assembling information and data, selecting mathematical variables that describe a situation, and then modeling it with a set of equations. And once a mathematician has constructed a model, there is still the matter of solving it in some way, by approximations or calculations or computer simulations. Every step in the process requires judgment and creativity. No two mathematicians working independently, however brilliant, are likely to produce identical results, if indeed they can produce useful results at all.
It is not surprising, then, that in the field of geographic profiling, Rossmo has competitors. Dr. Grover M. Godwin of the Justice Center at the University of Alaska, author of the book Hunting Serial Predators, has developed a computer package called Predator that uses a branch of mathematical statistics called multivariate analysis to pinpoint a serial killer's home base by analyzing the locations of crimes, where the victims were last seen, and where the bodies were discovered. Ned Levine, a Houston-based urban planner, developed a program called CrimeStat for the National Institute of Justice, a research branch of the U.S. Justice Department. It uses something called spatial statistics to analyze serial-crime data, and it can also be applied to help agents understand such things as patterns of auto accidents or disease outbreaks. And David Canter, a professor of psychology at the University of Liverpool in England, and the director of the Centre for Investigative Psychology there, has developed his own computer program, Dragnet, which he has sometimes offered free to researchers. Canter has pointed out that so far no one has performed a head-to-head comparison of the various math/computer systems for locating serial criminals based on applying them in the same cases, and he has claimed in interviews that in the long run, his program and others will prove to be at least as accurate as Rigel.

ROSSMO'S FORMULA

Finally, let's take a closer look at the formula Rossmo scribbled down on that paper napkin on the bullet train in Japan back in 1991.
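$$
p_{ij} = k \sum_{n=1}^{c} \left[ \frac{\phi}{\left(|x_i - x_n| + |y_j - y_n|\right)^{f}} + \frac{(1-\phi)\,B^{g-f}}{\left(2B - |x_i - x_n| - |y_j - y_n|\right)^{g}} \right]
$$

(Here $c$ is the total number of crimes and $k$ is an empirically determined constant.)

To understand what it means, imagine a grid of little squares superimposed on the map, each square having two numbers that locate it: what row it's in and what column it's in, "i" and "j". The probability, $p_{ij}$, that the killer's residence is in that square is written on the left side of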
the equation, and the right side shows how to calculate it. The crime locations are represented by map coordinates, $(x_1, y_1)$ for the first crime, $(x_2, y_2)$ for the second crime, and so on. What the formula says is this:

To get the probability $p_{ij}$ for the square in row "i", column "j" of the grid, first calculate how far you have to go to get from the center point $(x_i, y_j)$ of that square to each crime location $(x_n, y_n)$. The little "n" here stands for any one of the crime locations: $n = 1$ means "first crime," $n = 2$ means "second crime," and so on. The answer to the question of how far you have to go is:

$$|x_i - x_n| + |y_j - y_n|$$

(the distance measured along a rectangular street grid, sometimes called the "Manhattan distance"), and this is used in two ways. Reading from left to right in the formula, the first way is to put that distance in the denominator, with $\phi$ in the numerator. The distance is raised to the power $f$. The choice of what number to use for this $f$ will be based on what works best when the formula is checked against data on past crime patterns. (If you take $f = 2$, for example, then that part of the formula will resemble the "inverse square law" that describes the force of gravity.) This part of the formula expresses the idea that the probability of crime locations decreases as the distance increases, once outside of the buffer zone.

The second way the formula uses the "traveling distance" of each crime involves the buffer zone. In the second fraction, you subtract the distance from $2B$, where $B$ is a number that will be chosen to describe the size of the buffer zone.
The subtraction produces smaller answers as the distance increases, so that after raising those answers to another power, $g$, in the denominator of the second part of the formula, you get larger results. Together, the first and second parts of the formula perform a sort of "balancing act," expressing the fact that as you move away from the criminal's base, the probability of crimes first increases (as you move through the buffer zone) and then decreases. The two parts of the formula are combined using a fancy mathematical notation, the Greek letter $\Sigma$ standing for "sum (add up) the contributions from each of the
crimes to the evaluation of the probability for the $ij$-th grid square." The Greek letter $\phi$ is used in the two parts as a way of placing more "weight" on one part or the other. A larger choice of $\phi$ puts more weight on the phenomenon of "decreasing probability as distance increases," whereas a smaller $\phi$ emphasizes the effect of the buffer zone.

Once the formula is used to calculate the probabilities, $p_{ij}$, of all of the little squares in the grid, it's easy to make a hot zone map. You just color the squares, with the highest probabilities bright yellow, slightly smaller probabilities orange, then red, and so on, leaving the squares with low probability uncolored.

Rossmo's formula is a good example of the art of using mathematics to describe incomplete knowledge of real-world phenomena. Unlike the law of gravity, which through careful measurements can be observed to operate the same way every time, descriptions of the behavior of individual human beings are at best approximate and uncertain. When Rossmo checked out his formula on past crimes, he had to find the best fit of his formula to those data by choosing different possible values of $f$ and $g$, and of $B$ and $\phi$. He then used those findings in analyzing future crime patterns, still allowing for further fine-tuning in each new investigation.

Rossmo's method is definitely not rocket science; space travel depends crucially on always getting the right answer with great accuracy. But it is nevertheless science. It does not work every time, and the answers it gives are probabilities. But in crime detection and other domains involving human behavior, knowing those probabilities can sometimes make all the difference.
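To make the formula and the fitting step concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Rigel's implementation. Rossmo's formula is usually published with φ acting as a switch rather than a tunable weight: φ = 1 when a grid cell lies outside the buffer zone around a crime (distance greater than B) and φ = 0 inside it. The sketch uses that switch form, which also keeps every denominator positive. The function names, the default k = 1 (it only rescales the scores), and the idea of judging a fit by the fraction of cells scoring at least as high as a known home cell are our own hypothetical choices.

```python
from itertools import product

def rossmo_scores(crimes, xs, ys, B, f, g, k=1.0):
    """Score every grid cell (i, j) whose center is (xs[i], ys[j])."""
    if B <= 0:
        raise ValueError("buffer size B must be positive")
    scores = {}
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            total = 0.0
            for xn, yn in crimes:
                d = abs(x - xn) + abs(y - yn)  # Manhattan distance to crime n
                if d > B:
                    # Outside the buffer zone (phi = 1): score decays as 1/d^f.
                    total += 1.0 / d ** f
                else:
                    # Inside the buffer zone (phi = 0): score rises toward its edge.
                    total += B ** (g - f) / (2 * B - d) ** g
            scores[(i, j)] = k * total
    return scores

def hottest_cells(scores, n):
    """The n highest-scoring cells: the core of the hot zone map."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def hit_fraction(scores, home_cell):
    """Fraction of cells scoring at least as high as the known home cell.
    Smaller is better: 0.1 means the home lies in the top tenth of the map."""
    home_score = scores[home_cell]
    return sum(s >= home_score for s in scores.values()) / len(scores)

def fit_parameters(solved_cases, xs, ys, Bs, fs, gs):
    """Hypothetical grid search for the (B, f, g) that minimizes the average
    hit fraction over past solved cases, each given as (crimes, home_cell)."""
    best = None
    for B, f, g in product(Bs, fs, gs):
        avg = sum(hit_fraction(rossmo_scores(c, xs, ys, B, f, g), home)
                  for c, home in solved_cases) / len(solved_cases)
        if best is None or avg < best[0]:
            best = (avg, (B, f, g))
    return best[1]
```

For example, `rossmo_scores([(2, 3), (5, 1), (4, 6)], range(8), range(8), B=1.5, f=1.2, g=1.2)` scores an 8-by-8 grid for three made-up crime sites, and `hottest_cells(scores, 5)` returns the five cells that would be colored bright yellow. The factor B^(g−f) makes the two branches agree at distance exactly B, so the scores peak at the edge of the buffer zone rather than jumping there.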
CHAPTER 2

Fighting Crime with Statistics 101

THE ANGEL OF DEATH

By 1996, Kristen Gilbert, a thirty-three-year-old divorced mother of two sons, ages seven and ten, and a nurse in Ward C at the Veterans Affairs Medical Center in Northampton, Massachusetts, had built up quite a reputation among her colleagues at the hospital. On several occasions she was the first one to notice that a patient was going into cardiac arrest and to sound a "code blue" to bring the emergency resuscitation team. She always stayed calm, and was competent and efficient in administering to the patient. Sometimes she would give the patient an injection of the heart-stimulant drug epinephrine to attempt to restart the heart before the emergency team arrived, occasionally saving the patient's life in this way. The other nurses had given her the nickname "Angel of Death."

But that same year, three nurses approached the authorities to express their growing suspicions that something was not quite right. There had been just too many deaths from cardiac arrest in that particular ward, they felt. There had also been several unexplained shortages of epinephrine. The nurses were starting to fear that Gilbert was giving the patients large doses of the drug to bring on the heart attacks in the first place, so that she could play the heroic role of trying to save them. The "Angel of Death" nickname was beginning to sound more apt than they had first intended.

The hospital launched an investigation, but found nothing untoward. In particular, according to the investigators, the number of cardiac deaths at the unit was broadly in line with the rates at other VA hospitals. Despite the findings of the initial
investigation, however, the staff at the hospital remained suspicious, and eventually a second investigation was begun. This included bringing in a professional statistician, Stephen Gehlbach of the University of Massachusetts, to take a closer look at the unit's cardiac arrest and mortality figures. Largely as a result of Gehlbach's analysis, in 1998 the U.S. Attorney's Office decided to convene a grand jury to hear the evidence against Gilbert.

Part of the evidence was her alleged motivation. In addition to seeking the excitement of the code blue alarm and the resuscitation process, plus the recognition for having struggled valiantly to save the patient, she was alleged to have sought to impress her boyfriend, who also worked at the hospital. Moreover, she had access to the epinephrine. But since no one had seen her administer any fatal injections, the case against her, while suggestive, was purely circumstantial. Although the patients involved were mostly middle-aged men not regarded as potential heart attack victims, it was possible that their attacks had occurred naturally. What tipped the balance, and led to a decision to indict Gilbert for multiple murder, was Gehlbach's statistical analysis.

THE SCIENCE OF STATE

Statistics is widely used in law enforcement in many ways and for many purposes. In NUMB3RS, Charlie often carries out a statistical analysis, and the use of statistical techniques will appear in many chapters in this book, often without our making explicit mention of the fact. But what exactly does statistics entail? And why was the word in the singular in that last sentence?
The word "statistics" comes from the Latin term statisticum collegium, meaning "council of state," and the Italian word statista, meaning "statesman," which reflect the initial uses of the technique. The German word Statistik likewise originally meant the analysis of data about the state. Until the nineteenth century, the equivalent English term was "political arithmetic," after which the word "statistics" was introduced to refer to any collection and classification of data.

Today, "statistics" really has two connected meanings. The first is the collection and tabulation of data; the second is the use of mathematical and other methods to draw meaningful and useful conclusions from
tabulated data. Some statisticians refer to the former activity as "little-s statistics" and the latter activity as "big-S Statistics." Spelled with a lower-case s, the word is treated as plural when it refers to a collection of numbers. But it is singular when used to refer to the activity of collecting and tabulating those numbers. "Statistics" (with a capital S) refers to an activity, and hence is singular.

Though many sports fans and other kinds of people enjoy collecting and tabulating numerical data, the real value of little-s statistics is to provide the data for big-S Statistics. Many of the mathematical techniques used in big-S Statistics involve the branch of mathematics known as probability theory, which began in the sixteenth and seventeenth centuries as an attempt to understand the likely outcomes of games of chance, in order to increase the likelihood of winning. But whereas probability theory is a definite branch of mathematics, Statistics is essentially an applied science that uses mathematical methods.

While the law enforcement profession collects a large quantity of little-s statistics, it is the use of big-S Statistics as a tool in fighting crime that we shall focus on. (From now on we shall drop the "big S," "little s" terminology and use the word "statistics" the way statisticians do, to mean both, leaving the reader to determine the intended meaning from the context.)

Although some applications of statistics in law enforcement use sophisticated methods, the basic techniques covered in a first-semester college statistics course are often enough to crack a case. This was certainly true for United States v. Kristen Gilbert.
In that case, a crucial question for the grand jury was whether there were significantly more deaths in the unit when Kristen Gilbert was on duty than at other times. The key word here is "significantly." One or two extra deaths on her watch could be coincidence. How many deaths would it take to reach the level of "significance" sufficient to indict Gilbert? This is a question that only statistics can answer. Accordingly, Stephen Gehlbach was asked to provide the grand jury with a summary of his findings.

HYPOTHESIS TESTING

Gehlbach's testimony was based on a fundamental statistical technique known as hypothesis testing. This method uses probability theory to
determine whether an observed outcome is so unusual that it is highly unlikely to have occurred naturally.

One of the first things Gehlbach did was plot the annual number of deaths at the hospital from 1988 through 1997, broken down by shifts: midnight to 8:00 AM, 8:00 AM to 4:00 PM, and 4:00 PM to midnight. The resulting graph is shown in Figure 1. Each vertical bar shows the total number of deaths in the year during that particular shift.

[Figure 1. Total deaths at the hospital, by shift and year, 1988 through 1997. Bars: Night (12 A.M.-8 A.M.), Day (8 A.M.-4 P.M.), Evening (4 P.M.-12 A.M.).]

The graph shows a definite pattern. For the first two years, there were around ten deaths per year on each shift. Then, for each of the years 1990 through 1995, one of the three shifts shows between 25 and 35 deaths per year. Finally, for the last two years, the figures drop back to roughly ten deaths on each of the three shifts. When the investigators examined Kristen Gilbert's work record, they discovered that she started work in Ward C in March 1990 and stopped working at the hospital in February 1996. Moreover, for each of the years she worked at the VA, the shift that showed the dramatically increased number of deaths was the one she worked.

To a layperson, this might suggest that Gilbert was clearly responsible for the deaths, but on its own it would not be sufficient to secure a conviction; indeed, it might not be enough to justify even an indictment. The problem is that it may be just a coincidence. The job of the statistician
in this situation is to determine just how unlikely such a coincidence would be. If the answer is that the likelihood of such a coincidence is, say, 1 in 100, then Gilbert might well be innocent; and even 1 in 1,000 leaves some doubt as to her guilt; but with a likelihood of, say, 1 in 100,000, most people would find the evidence against her to be pretty compelling.

To see how hypothesis testing works, let's start with the simple example of tossing a coin. If the coin is perfectly balanced (i.e., unbiased or fair), then the probability of getting heads is 0.5.* Suppose we toss the coin ten times in a row to see if it is biased in favor of heads. Then we can get a range of different outcomes, and it is possible to compute the likelihood of different results. For example, the probability of getting at least six heads is about 0.38. (The calculation is straightforward but a bit intricate, because there are many possible ways you can get six or more heads in ten tosses, and you have to take account of all of them.) The figure of 0.38 puts a precise numerical value on the fact that, on an intuitive level, we would not be surprised if ten coin tosses gave six or more heads. For at least seven heads, the probability works out at 0.17, a figure that corresponds to our intuition that seven or more heads is somewhat unusual but certainly not a cause for suspicion that the coin was biased. What would surprise us is nine or ten heads, and for that the probability works out at about 0.01, or 1 in 100. The probability of getting ten heads is about 0.001, or 1 in 1,000, and if that happened we would definitely suspect an unfair coin.
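The coin-tossing figures above come from the binomial distribution: each of the 2^10 = 1,024 sequences of ten tosses of a fair coin is equally likely, and comb(10, j) of them contain exactly j heads. The short Python sketch below reproduces the quoted probabilities; the function name `prob_at_least` is our own, chosen for illustration.

```python
from math import comb

def prob_at_least(k: int, n: int = 10, p: float = 0.5) -> float:
    """P(at least k successes in n independent trials, each with probability p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

for k in (6, 7, 9, 10):
    print(f"P(at least {k} heads in 10 tosses) = {prob_at_least(k):.4f}")
# P(at least 6 heads in 10 tosses) = 0.3770
# P(at least 7 heads in 10 tosses) = 0.1719
# P(at least 9 heads in 10 tosses) = 0.0107
# P(at least 10 heads in 10 tosses) = 0.0010
```

Summing the tail, rather than looking at a single outcome, is what hypothesis testing asks for: the question is how likely a result at least this extreme would be if the coin were fair.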
Thus, by tossing the coin ten times, we can form a reliable, precise judgment, based on mathematics, of the hypothesis that the coin is unbiased.

*Actually, this is not entirely accurate. Because of inertial properties of a physical coin, there is a slight tendency for it to resist turning, with the result that, if a perfectly balanced coin is given a random initial flip, the probability that it will land the same way up as it started is about 0.51. But we will ignore this caveat in what follows.

In the case of the suspicious deaths at the Veterans Affairs Medical Center, the investigators wanted to know if the number of deaths that occurred when Kristen Gilbert was on duty was so unlikely that it could not be merely happenstance. The math is a bit more complicated than for the coin tossing, but the idea is the same. Table 1 gives the data the investigators had at their disposal. It gives numbers of shifts, classified in different ways, and covers the eighteen-month period ending in February
1996, the month when the three nurses told their supervisor of their concerns, shortly after which Gilbert took a medical leave.

                          DEATH ON SHIFT
GILBERT PRESENT      YES        NO     TOTAL
YES                   40       217       257
NO                    34     1,350     1,384
TOTAL                 74     1,567     1,641

Table 1. The data for the statistical analysis in the Gilbert case.

Altogether, there were 74 deaths, spread over a total of 1,641 shifts. If the deaths are assumed to have occurred randomly, these figures suggest that the probability of a death on any one shift is about 74 out of 1,641, or 0.045. Focusing now on the shifts when Gilbert was on duty, there were 257 of them. If Gilbert was not killing any of the patients, we would expect there to be around 0.045 × 257 ≈ 11.6 deaths on her shifts, i.e., around 11 or 12 deaths. In fact there were more: 40, to be precise. How likely is this? Using mathematical methods similar to those for the coin tosses, statistician Gehlbach calculated that the probability of having 40 or more of the 74 deaths occur on Gilbert's shifts was less than 1 in 100 million. In other words, it is unlikely in the extreme that Gilbert's shifts were merely "unlucky" for the patients.

The grand jury decided there was sufficient evidence to indict Gilbert. Presumably the statistical analysis was the most compelling evidence, but we cannot know for sure, as a grand jury's deliberations are not public knowledge. She was accused of four specific murders and three attempted murders. Because the VA is a federal facility, the trial would be in a federal court rather than a state court, and subject to federal laws. A significant consequence of this fact for Gilbert was that although Massachusetts does not have a death penalty, federal law does, and that is what the prosecutor asked for.

STATISTICS IN THE COURTROOM?

An interesting feature of this case is that the federal trial judge ruled in pretrial deliberations that the statistical evidence should not be
presented in court. In making his ruling, the judge took note of a submission by a second statistician brought into the case, George Cobb of Mount Holyoke College. Cobb and Gehlbach did not disagree on any of the statistical analysis. (In fact, they ended up writing a joint article about the case.) Rather, their roles were different, and they were addressing different issues. Gehlbach's task was to use statistics to determine if there were reasonable grounds to suspect Gilbert of multiple murder. More specifically, he carried out an analysis showing that the increased numbers of deaths at the hospital during the shifts when Gilbert was on duty were extremely unlikely to have arisen from chance variation alone. That was sufficient to cast suspicion on Gilbert as the cause of the increase, but not at all enough to prove that she did cause the increase. What Cobb argued was that the establishment of a statistical relationship does not explain the cause of that relationship.

The judge in the case accepted this argument, since the purpose of the trial was not to decide if there were grounds to make Gilbert a suspect; the grand jury and the U.S. Attorney's Office had done that. Rather, the job before the court was to determine whether or not Gilbert caused the deaths in question. His reason for excluding the statistical evidence was that, as experiences in previous court cases had demonstrated, jurors not well versed in statistical reasoning (and that would be almost all jurors) typically have great difficulty appreciating why odds of 1 in 100 million against the suspicious deaths occurring by chance do not imply that the odds that Gilbert did not kill the patients are likewise 1 in 100 million. The unusual pattern of deaths could have been caused by something else.
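The "1 in 100 million" figure can be checked in outline from Table 1. The book does not spell out Gehlbach's exact calculation, so the Python sketch below tries two standard chance models, both assumptions on our part. In the binomial model, each of the 74 deaths independently falls on a Gilbert shift with probability 257/1,641. In the hypergeometric model, the 74 shifts with a death are treated as a random subset of all 1,641 shifts. The constant and function names are ours, for illustration only.

```python
from math import comb

# Figures from Table 1 (eighteen months ending February 1996).
TOTAL_SHIFTS = 1641
GILBERT_SHIFTS = 257
DEATHS = 74             # shifts on which a death occurred
DEATHS_ON_GILBERT = 40

p_death = DEATHS / TOTAL_SHIFTS
expected = p_death * GILBERT_SHIFTS  # about 11.6 deaths expected by chance

def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

def hypergeometric_tail(population, marked, draws, k):
    """P(X >= k) when `draws` items are chosen at random, without replacement,
    from `population` items of which `marked` are special."""
    favorable = sum(comb(marked, j) * comb(population - marked, draws - j)
                    for j in range(k, min(marked, draws) + 1))
    return favorable / comb(population, draws)

# Model A: each death lands on a Gilbert shift with chance 257/1641.
p_binom = binomial_tail(DEATHS, GILBERT_SHIFTS / TOTAL_SHIFTS, DEATHS_ON_GILBERT)
# Model B: the 74 death shifts are a random subset of all 1,641 shifts.
p_hyper = hypergeometric_tail(TOTAL_SHIFTS, GILBERT_SHIFTS, DEATHS, DEATHS_ON_GILBERT)

print(f"expected deaths on Gilbert's shifts: {expected:.1f}")
print(f"binomial tail:       {p_binom:.2e}")
print(f"hypergeometric tail: {p_hyper:.2e}")
```

Under either model the tail probability comes out far below 1 in 100 million, consistent with the figure quoted above. As Cobb's argument stresses, though, such a number only measures how poorly "chance alone" explains the data; it says nothing about what the real cause was.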
Cobb illustrated the distinction by means of a famous example from the long struggle physicians and scientists had in overcoming the powerful tobacco lobby to convince governments and the public that cigarette smoking causes lung cancer. Table 2 shows the mortality rates for three categories of people: nonsmokers, cigarette smokers, and cigar and pipe smokers.

Nonsmokers                 20.2
Cigarette smokers          20.5
Cigar and pipe smokers     35.3

Table 2. Mortality rates per 1,000 people per year.
At first glance, the figures in Table 2 seem to indicate that cigarette smoking is not dangerous but pipe and cigar smoking are. However, this is not the case. There is a crucial variable lurking behind the data that the numbers themselves do not indicate: age. The average age of the nonsmokers was 54.9, the average age of the cigarette smokers was 50.5, and the average age of the cigar and pipe smokers was 65.9. Using statistical techniques to make allowance for the age differences, statisticians were able to adjust the figures to produce Table 3.

Nonsmokers                 20.3
Cigarette smokers          28.3
Cigar and pipe smokers     21.2

Table 3. Mortality rates per 1,000 people per year, adjusted for age.

Now a very different pattern emerges, indicating that cigarette smoking is highly dangerous.

Whenever a calculation of probabilities is made based on observational data, the most that can generally be concluded is that there is a correlation between two or more factors. That can be enough to spur further investigation, but on its own it does not establish causation. There is always the possibility of a hidden variable that lies behind the correlation.

When a study is made of, say, the effectiveness or safety of a new drug or medical procedure, statisticians handle the problem of hidden parameters by relying not on observational data but on a randomized, double-blind trial. In such a study, the target population is divided into two groups by an entirely random procedure, with the group allocation unknown to both the experimental subjects and the caregivers administering the drug or treatment (hence the term "double-blind").
One group is given the new drug or treatment; the other is given a placebo or dummy treatment. With such an experiment, the random allocation into groups overrides the possible effect of hidden parameters, so that in this case a low probability that a positive result is simply chance variation can indeed be taken as strong evidence that the drug or treatment is what caused the result.
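The age adjustment that turned Table 2 into Table 3 can be illustrated with direct standardization, one common way of making allowance for age; the text does not say which method the statisticians actually used. Every number below is hypothetical, invented only to show the mechanics: each group's age-specific rates are averaged using one shared "standard" age distribution, so differences in age mix no longer drive the comparison.

```python
# HYPOTHETICAL numbers, invented only to show how age adjustment works.
# They are NOT the data behind Tables 2 and 3.

# Mortality per 1,000 people per year within each age band.
RATES = {
    "Nonsmokers": {"under 60": 10.0, "60 and over": 40.0},
    "Cigarette smokers": {"under 60": 14.0, "60 and over": 56.0},
    "Cigar and pipe smokers": {"under 60": 10.5, "60 and over": 42.0},
}
# Fraction of each group falling in each age band (each row sums to 1).
AGE_MIX = {
    "Nonsmokers": {"under 60": 0.60, "60 and over": 0.40},
    "Cigarette smokers": {"under 60": 0.85, "60 and over": 0.15},
    "Cigar and pipe smokers": {"under 60": 0.20, "60 and over": 0.80},
}
# A single "standard" age distribution applied to every group.
STANDARD = {"under 60": 0.60, "60 and over": 0.40}


def weighted_rate(rates: dict, weights: dict) -> float:
    """Average the band-specific rates using the given age weights."""
    return sum(rates[band] * weights[band] for band in weights)


def crude_rates() -> dict:
    """Rates as observed: each group weighted by its own age mix."""
    return {group: weighted_rate(RATES[group], AGE_MIX[group]) for group in RATES}


def age_adjusted_rates() -> dict:
    """Direct standardization: every group weighted by the same standard age mix."""
    return {group: weighted_rate(RATES[group], STANDARD) for group in RATES}


if __name__ == "__main__":
    crude, adjusted = crude_rates(), age_adjusted_rates()
    for group in RATES:
        print(f"{group:24} crude {crude[group]:5.1f}   age-adjusted {adjusted[group]:5.1f}")
```

In this toy data the young cigarette smokers look safest before adjustment and most at risk after it, the same kind of reversal seen between Tables 2 and 3.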
In trying to solve a crime, there is of course no choice but to work with the data available. Hence, use of the hypothesis-testing procedure, as in the Gilbert case, can be highly effective in the identification of a suspect, but other means are generally required to secure a conviction.

In United States v. Kristen Gilbert, the jury was not presented with Gehlbach's statistical analysis, but they did find sufficient evidence to convict her on three counts of first-degree murder, one count of second-degree murder, and two counts of attempted murder. Although the prosecution asked for the death sentence, the jury split 8–4 on that issue, and accordingly Gilbert was sentenced to life imprisonment with no possibility of parole.

POLICING THE POLICE

Another use of basic statistical techniques in law enforcement concerns the important matter of ensuring that the police themselves obey the law. Law enforcement officers are given a considerable amount of power over their fellow citizens, and one of the duties of society is to make certain that they do not abuse that power. In particular, police officers are supposed to treat everyone equally and fairly, free of any bias based on gender, race, ethnicity, economic status, age, dress, or religion.

But determining bias is a tricky business and, as we saw in our previous discussion of cigarette smoking, a superficial glance at the statistics can sometimes lead to a completely false conclusion. This is illustrated in a particularly dramatic fashion by the following example, which, while not related to police activity, clearly indicates the need to approach statistics with some mathematical sophistication.
In the 1970s, somebody noticed that 44 percent of male applicants to the graduate school of the University of California at Berkeley were accepted, but only 35 percent of female applicants were accepted. On the face of it, this looked like a clear case of gender discrimination, and, not surprisingly (particularly at Berkeley, long acknowledged as home to many leading advocates for gender equality), there was a lawsuit over gender bias in admissions policies.
It turns out that Berkeley applicants do not apply to the graduate school, but to individual programs of study—such as engineering, physics, or English—so if there is any admissions bias, it will occur within one or more particular programs. Table 4 gives the admission data program by program:

Program   Male applicants   % admitted   Female applicants   % admitted
A               825              62              108               82
B               560              63               25               68
C               325              37              593               34
D               417              33              375               35
E               191              28              393               24
F               373               6              341                7

Table 4. Admission figures from the University of California at Berkeley on a program-by-program basis.

If you look at each program individually, however, there doesn't appear to be an advantage in admission for male applicants. Indeed, the percentage of female applicants admitted to heavily subscribed program A is considerably higher than for males, and in all other programs the percentages are fairly close. So how can there appear to be an advantage for male applicants overall?

To answer this question, you need to look at what programs males and females applied to. Males applied heavily to programs A and B; females applied primarily to programs C, D, E, and F. The programs that females applied to were more difficult to get into than those most males applied to (the percentages admitted are low for both genders), and this is why it appears that males had an admission advantage when looking at the aggregate data. There was indeed a gender factor at work here, but it had nothing to do with the university's admissions procedures. Rather, it was one of self-selection by the applying students, where female applicants avoided programs A and B.
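Pooling the six programs of Table 4 reproduces the paradox in a few lines. Because the table's percentages are rounded, the admitted counts computed here are approximate. These six programs alone give roughly 44.5 percent for men and 30.3 percent for women, not exactly the 44 and 35 percent quoted for the graduate school as a whole, but the reversal is the same: women do better in most individual programs yet worse in the pooled figures.

```python
# Table 4: program -> (male applicants, % of men admitted,
#                      female applicants, % of women admitted)
TABLE_4 = {
    "A": (825, 62, 108, 82),
    "B": (560, 63, 25, 68),
    "C": (325, 37, 593, 34),
    "D": (417, 33, 375, 35),
    "E": (191, 28, 393, 24),
    "F": (373, 6, 341, 7),
}


def aggregate_rates(table: dict) -> tuple:
    """Pool all programs and return (male admission rate, female admission rate)."""
    male_apps = sum(m for m, _, _, _ in table.values())
    female_apps = sum(f for _, _, f, _ in table.values())
    male_admits = sum(m * pct / 100 for m, pct, _, _ in table.values())
    female_admits = sum(f * pct / 100 for _, _, f, pct in table.values())
    return male_admits / male_apps, female_admits / female_apps


def programs_favoring_women(table: dict) -> list:
    """Programs whose admission percentage is higher for women than for men."""
    return [p for p, (_, m_pct, _, f_pct) in table.items() if f_pct > m_pct]


if __name__ == "__main__":
    men, women = aggregate_rates(TABLE_4)
    print(f"pooled over six programs: men {men:.1%}, women {women:.1%}")
    print("programs where women did better:", programs_favoring_women(TABLE_4))
```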
The Berkeley case was an example of a phenomenon known as Simpson's paradox, named for E. H. Simpson, who studied this curious phenomenon in a famous 1951 paper.*

*E. H. Simpson, "The Interpretation of Interaction in Contingency Tables," Journal of the Royal Statistical Society, Ser. B, 13 (1951), 238–241.

HOW DO YOU DETERMINE BIAS?

With the above cautionary example in mind, what should we make of the study carried out in Oakland, California, in 2003 (by the RAND Corporation, at the request of the Oakland Police Department's Racial Profiling Task Force), to determine if there was systematic racial bias in the way police stopped motorists?

The RAND researchers analyzed 7,607 vehicle stops recorded by Oakland police officers between June and December 2003, using various statistical tools to examine a number of variables to uncover any evidence that suggested racial profiling. One figure they found was that blacks were involved in 56 percent of all traffic stops studied, although they make up just 35 percent of Oakland's residential population. Does this finding indicate racial profiling? Well, it might, but as soon as you look more closely at what other factors could be reflected in those numbers, the issue is by no means clear-cut. For instance, like many inner cities, Oakland has some areas with much higher crime rates than others, and the police patrol those higher crime areas at a much greater rate than they do areas having less crime. As a result, they make more traffic stops in those areas. Since the higher crime areas typically have greater concentrations of minority groups, the higher rate of traffic stops in those areas manifests itself as a higher rate of traffic stops of minority drivers.

To overcome these uncertainties, the RAND researchers devised a particularly ingenious way to look for possible racial bias. If racial profiling was occurring, they reasoned, the share of minority drivers among those stopped would be higher when the officers could determine the driver's race prior to making the stop. Therefore, they compared the stops made during a period just before nightfall with those made after dark—when the officers would be less likely to be able to determine the driver's race. The figures showed that 50 percent of drivers stopped during the daylight period were black, compared with 54 percent when it was dark. On that finding alone, there does not appear to be systematic racial bias in traffic stops. But the researchers dug a little further, and looked at the officers' own reports as to whether they could determine the driver's race prior to making the stop. When officers reported knowing the race in advance of the stop, 66 percent of drivers stopped were black, compared with only 44 percent when the police reported not knowing the driver's race in advance. This is a fairly strong indicator of racial bias.*

*Sadly, despite many efforts to eliminate the problem, racial bias by police seems to be a persistent issue throughout the country. To cite just one recent report, An Analysis of Traffic Stop Data in Riverside, California, by Larry K. Gaines of the California State University in San Bernardino, published in Police Quarterly, 9, 2, June 2006, pp. 210–233: "The findings from racial profiling or traffic stop studies have been fairly consistent: Minorities, especially African Americans, are stopped, ticketed, and searched at a higher rate as compared to Whites. For example, Lamberth (cited in State v. Pedro Soto, 1996) found that the Maryland State Police stopped and searched African Americans at a higher rate as compared to their rate of speeding violations. Harris (1999) examined court records in Akron, Dayton, Toledo, and Columbus, Ohio, and found that African Americans were cited at a rate that surpassed their representation in the driving population. Cordner, Williams, and Zuniga (2000) and Cordner, Williams, and Velasco (2002) found similar trends in San Diego, California. Zingraff and his colleagues (2000) examined stops by the North Carolina Highway Patrol and found that African Americans were overrepresented in stops and searches."
CHAPTER 3
Data Mining
Finding Meaningful Patterns in Masses of Information

BRUTUS

Charlie Eppes is sitting in front of a bank of computers and television monitors. He is testing a computer program he is developing to help police monitor large crowds, looking for unusual behavior that could indicate a pending criminal or terrorist act. His idea is to use standard mathematical equations that describe the flow of fluids—in rivers, lakes, oceans, tanks, pipes, even blood vessels.* He is trying out the new system at a fund-raising reception for one of the California state senators. Overhead cameras monitor the diners as they move around the room, and Charlie's computer program analyzes the "flow" of the people.

*The idea is based on several real-life projects to use the equations that describe fluid flows in order to analyze various kinds of crowd activity, including freeway traffic flow, spectators entering and leaving a large sports stadium, and emergency exits from burning buildings.

Suddenly the test takes on an unexpected aspect. The FBI receives a telephone warning that a gunman is in the room, intending to kill the senator. The software works, and Charlie is able to identify the gunman, but Don and his team are not able to get to the killer before he has shot the senator and then turned the gun on himself.

The dead assassin turns out to be a Vietnamese immigrant, a former Vietcong member, who, despite having been in prison in California, somehow managed to obtain U.S. citizenship and to receive a regular pension from the U.S. Army. He had also taken the illegal drug speed on the evening of the assassination.

When Don makes some enquiries to find out just what is going on, he is visited by a CIA agent who asks for help in trying to prevent too much information about the case leaking out. Apparently the dead killer had been part of a covert CIA behavior modification project carried out in California prisons during the 1960s to turn inmates into trained assassins who, when activated, would carry out their assigned task before killing themselves. (Sadly, this idea is no less fanciful than that of Charlie using fluid flow equations to study crowd behavior.) But why had this particular individual suddenly become active and murdered the state senator?

The picture becomes much clearer when a second murder occurs. The victim this time is a prominent psychiatrist, the killer a Cuban immigrant. The killer had also spent time in a California prison, and he too was the recipient of regular Army pension checks. But on this occasion, when the assassin tries to shoot himself after killing the victim, the gun fails to go off and he has to flee the scene. A fingerprint identification from the gun soon leads to his arrest.
When Don realizes that the dead senator had been urging a repeal of the statewide ban on the use of behavior modification techniques on prison inmates, and that the dead psychiatrist had been recommending the re-adoption of such techniques to overcome criminal tendencies, he quickly concludes that someone has started to turn the conditioned assassins on the very people who were pressing for the reuse of the techniques that had produced them. But who?

Don thinks his best line of investigation is to find out who supplied the guns that the two killers had used. He knows that the weapons originated with a dealer in Nevada. Charlie is able to provide the next step, which leads to the identification of the individual behind the two assassinations. He obtains data on all gun sales involving that particular dealer and analyzes the relationships among all sales that originated there. He explains that he is employing mathematical techniques similar to those used to analyze calling patterns on the telephone network—an approach used frequently in real-life law enforcement.
This is what viewers saw in the third-season episode of NUMB3RS called "Brutus" (the code name for the fictitious CIA conditioned-assassinator project), first aired on November 24, 2006. As usual, the mathematics Charlie uses in the show is based on real life. The method Charlie uses to track the gun distribution is generally referred to as "link analysis," and is one among many that go under the collective heading of "data mining." Data mining extracts useful information from the mass of data that is available—often publicly—in modern society.

FINDING MEANING IN INFORMATION

Data mining was initially developed by the retail industry to detect customer purchasing patterns. (Ever wonder why supermarkets offer customers those loyalty cards—sometimes called "club" cards—in exchange for discounts? In part it's to encourage customers to keep shopping at the same store, but low prices would do that. The significant factor for the company is that the cards enable it to track detailed purchase patterns that it can link to customers' home zip codes, information that it can then analyze using data-mining techniques.)

Though much of the work in data mining is done by computers, for the most part those computers do not run autonomously. Human expertise also plays a significant role, and a typical data-mining investigation will involve a constant back-and-forth interplay between human expert and machine.

Many of the computer applications used in data mining fall under the general area known as artificial intelligence, although that term can be misleading, being suggestive of computers that think and act like people.
Although many people believed that was a possibility back in the 1950s when AI first began to be developed, it eventually became clear that this was not going to happen within the foreseeable future, and may well never be the case. But that realization did not prevent the development of many "automated reasoning" programs, some of which eventually found a powerful and important use in data mining, where the human expert often provides the "high-level intelligence" that guides the computer programs that do the bulk of the work. In this way, data mining provides an excellent example of the power that results when human brains team up with computers.

Among the more prominent methods and tools used in data mining are:

• Link analysis—looking for associations and other forms of connection among, say, criminals or terrorists
• Geometric clustering—a specific form of link analysis
• Software agents—small, self-contained pieces of computer code that can monitor, retrieve, analyze, and act on information
• Machine learning—algorithms that can extract profiles of criminals and graphical maps of crimes
• Neural networks—special kinds of computer programs that can predict the probability of crimes and terrorist attacks

We'll take a brief look at each of these topics in turn.

LINK ANALYSIS

Newspapers often refer to link analysis as "connecting the dots." It's the process of tracking connections between people, events, locations, and organizations. Those connections could be family ties, business relationships, criminal associations, financial transactions, in-person meetings, e-mail exchanges, and a host of others. Link analysis can be particularly powerful in fighting terrorism, organized crime, money laundering ("follow the money"), and telephone fraud.

Link analysis is primarily a human-expert-driven process. Mathematics and technology are used to provide a human expert with powerful, flexible computer tools to uncover, examine, and track possible connections.
Those tools generally allow the analyst to represent linked data as a network, displayed and examined (in whole or in part) on the computer screen, with nodes representing the individuals or organizations or locations of interest and the links between those nodes representing relationships or transactions. The tools may also allow the analyst to investigate and record details about each link, and to discover new nodes that connect to existing ones or new links between existing nodes.

For example, in an investigation into a suspected crime ring, an investigator might carry out a link analysis of telephone calls a suspect has made or received, using telephone company call-log data, looking at factors such as the number called, the time and duration of each call, and the number called next. The investigator might then decide to proceed further along the call network, looking at calls made to or from one or more of the individuals who had had phone conversations with the initial suspect. This process can bring to the investigator's attention individuals not previously known. Some may turn out to be totally innocent, but others could prove to be criminal collaborators. Another line of investigation may be to track cash transactions to and from domestic and international bank accounts. Still another line may be to examine the network of places and people visited by the suspect, using such data as train and airline ticket purchases, points of entry or departure in a given country, car rental records, credit card records of purchases, websites visited, and the like.

Given the difficulty nowadays of doing almost anything without leaving an electronic trace, the challenge in link analysis is usually not one of having insufficient data, but rather of deciding which of the megabytes of available data to select for further analysis. Link analysis works best when backed up by other kinds of information, such as tips from police informants or from neighbors of possible suspects.
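The call-network step described above, working outward from a suspect through the people he or she phoned, is essentially a breadth-first search over a graph. The sketch below assumes a hypothetical call log (all names invented) and treats each call as an undirected link. It also counts how many direct links each person has within the resulting group, a simple version of looking for the individuals with the most links to others; real link-analysis tools offer far richer measures.

```python
from collections import Counter, deque

# HYPOTHETICAL call-log records (caller, callee); all names are invented.
CALLS = [
    ("suspect", "alice"), ("suspect", "carol"), ("suspect", "bob"),
    ("alice", "bob"), ("carol", "bob"), ("bob", "dave"),
    ("dave", "gina"), ("erin", "frank"),
]


def build_network(calls):
    """Undirected network: each person maps to the set of people they called or were called by."""
    network = {}
    for a, b in calls:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def expand(network, start, max_hops):
    """Breadth-first walk outward from start; returns {person: hops from start}."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not look beyond the chosen radius
        for contact in sorted(network.get(person, ())):
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    return hops


def most_connected(network, people):
    """Rank people by how many others in the same group they are directly linked to."""
    group = set(people)
    return Counter({p: len(network[p] & group) for p in group}).most_common()


if __name__ == "__main__":
    net = build_network(CALLS)
    ring = expand(net, "suspect", max_hops=2)
    print(ring)
    print(most_connected(net, ring)[:3])
```

Limiting the search to a couple of hops keeps the group manageable; each extra hop can pull in many more people, most of them, as the text notes, likely to be innocent.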
Once an initial link analysis has identified a possible criminal or terrorist network, it may be possible to determine who the key players are by examining which individuals have the most links to others in the network.

GEOMETRIC CLUSTERING

Because of resource limitations, law enforcement agencies generally focus most of their attention on major crime, with the result that minor offenses such as shoplifting or house burglaries get little attention. If, however, a single person or an organized gang commits many such crimes on a regular basis, the aggregate can constitute significant criminal activity that deserves greater police attention. The problem facing the authorities, then, is to identify, within the large numbers of minor crimes that take place every day, clusters that are the work of a single individual or gang.

One example of a "minor" crime that is often carried out on a regular basis by two (and occasionally three) individuals acting together is the so-called bogus official burglary (or distraction burglary). This is where two people turn up at the front door of a homeowner (elderly people are often the preferred targets) posing as some form of officials—perhaps telephone engineers, representatives of a utility company, or local government agents—and, while one person secures the attention of the homeowner, the other moves quickly through the house or apartment taking any cash or valuables that are easily accessible.

Victims of bogus official burglaries often file a report with the police, who will send an officer to the victim's home to take a statement. Since the victim will have spent considerable time with one of the perpetrators (the distracter), the statement will often include a fairly detailed description—gender, race, height, body type, approximate age, general facial appearance, eyes, hair color, hair length, hair style, accent, identifying physical marks, mannerisms, shoes, clothing, unusual jewelry, etc.—together with the number of accomplices and their genders.

In principle, this wealth of information makes crimes of this nature ideal for data mining, and in particular for the technique known as geometric clustering, to identify groups of crimes carried out by a single gang. Application of the method is, however, fraught with difficulties, and to date the method appears to have been restricted to one or two experimental studies.
We'll look at one such study, both to show how the method works and to illustrate some of the problems often faced by the data-mining practitioner.

The following study was carried out in England in 2000 and 2001 by researchers at the University of Wolverhampton, together with the West Midlands Police.* The study looked at victim statements from bogus official burglaries in the police region over a three-year period. During that period, there were 800 such burglaries recorded, involving 1,292 offenders. This proved to be too great a number for the resources available for the study, so the analysis was restricted to those cases where the distracter was female, a group comprising 89 crimes and 105 offender descriptions.

*R. Adderley and P. B. Musgrove, "General Review of Police Crime Recording and Investigation Systems," Policing: An International Journal of Police Strategies and Management, 24 (1), 2001, pp. 110–114.

The first problem encountered was that the descriptions of the perpetrators were for the most part in narrative form, as written by the investigating officer who took the statement from the victim. A data-mining technique known as text mining had to be used to put the descriptions into a structured form. Because of the limitations of the text-mining software available, human input was required to handle many of the entries; for instance, to cope with spelling mistakes, ad hoc or inconsistent abbreviations (e.g., "Bham" or "B'ham" for "Birmingham"), and the use of different ways of expressing the same thing (e.g., "Birmingham accent", "Bham accent", "local accent", "accent: local", etc.).

After some initial analysis, the researchers decided to focus on eight variables: age, height, hair color, hair length, build, accent, race, and number of accomplices.

Once the data had been processed into the appropriate structured format, the next step was to use geometric clustering to group the 105 offender descriptions into collections that were likely to refer to the same individual. To understand how this was done, let's first consider a method that at first sight might appear to be feasible, but which soon proves to have significant weaknesses. Then, by seeing how those weaknesses may be overcome, we will arrive at the method used in the British study.

First, you code each of the eight variables numerically. Age—often a guess—is likely to be recorded either as a single figure or a range; if it is a range, take the mean.
Gender (not considered in the British Midlands study because all the cases examined had a female distracter) can be coded as 1 for male, 0 for female. Height may be given as a number (inches), a range, or a term such as "tall", "medium", or "short"; again, some method has to be chosen to convert each of these to a single figure. Likewise, schemes have to be devised to represent each of the other variables as a number.

When the numerical coding has been completed, each perpetrator description is then represented by an eight-vector, the coordinates of a point in eight-dimensional geometric (Euclidean) space. The familiar distance measure of Euclidean geometry (the Pythagorean metric) can then be used to measure the geometric distance between each pair of points. This gives the distance between two vectors $(x_1, \ldots, x_8)$ and $(y_1, \ldots, y_8)$ as:

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_8 - y_8)^2}$$

Points that are close together under this metric are likely to correspond to perpetrator descriptions that have several features in common; and the closer the points, the more features the descriptions are likely to have in common. (Remember, there are problems with this approach, which we'll get to momentarily. For the time being, however, let's suppose that things work more or less as just described.)

The challenge now is to identify clusters of points that are close together. If there were only two variables, this would be easy. All the points could be plotted on a single x,y-graph and visual inspection would indicate possible clusters. But human beings are totally unable to visualize eight-dimensional space, no matter what assistance the software system designers provide by way of data visualization tools. The way around this difficulty is to reduce the eight-dimensional array of points (descriptions) to a two-dimensional array (i.e., a matrix or table). The idea is to arrange the data points (that is, the vector representatives of the offender descriptions) in a two-dimensional grid in such a way that:

1. pairs of points that are extremely close together in the eight-dimensional space are put into the same grid entry;
2. pairs of points that are neighbors in the grid are close together in the eight-dimensional space; and
3. points that are farther apart in the grid are farther apart in the space.

This can be done using a special kind of computer program known as a neural net, in particular, a Kohonen self-organizing map (or SOM).
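The Pythagorean distance and the way a Kohonen self-organizing map uses it can be sketched in pure Python. This is a minimal textbook-style SOM, not the software the researchers used: the Gaussian neighborhood, the linearly shrinking learning rate and radius, and starting each grid cell at a randomly chosen data vector are all our own illustrative choices. The grid dimensions must be supplied up front, as the text notes, and the coded descriptions in the usage example are hypothetical.

```python
import math
import random


def euclidean(x, y):
    """Pythagorean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def best_matching_cell(weights, x):
    """Grid cell whose weight vector is nearest to x."""
    return min(weights, key=lambda cell: euclidean(weights[cell], x))


def train_som(data, rows, cols, iterations=2000, seed=0):
    """Train a minimal Kohonen self-organizing map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Start each grid cell at a randomly chosen data vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    for t in range(iterations):
        progress = t / iterations
        rate = 0.5 * (1 - progress)                  # learning rate shrinks toward 0
        radius = max(radius0 * (1 - progress), 0.5)  # neighborhood shrinks, but not to 0
        x = rng.choice(data)
        winner = best_matching_cell(weights, x)
        for cell, w in weights.items():
            grid_d2 = (cell[0] - winner[0]) ** 2 + (cell[1] - winner[1]) ** 2
            pull = rate * math.exp(-grid_d2 / (2 * radius ** 2))
            # Move this cell's weights toward x; cells near the winner on the grid move most.
            for i, xi in enumerate(x):
                w[i] += pull * (xi - w[i])
    return weights


def map_to_grid(weights, data):
    """Place each data vector (by index) into its best-matching grid cell."""
    grid = {}
    for index, x in enumerate(data):
        grid.setdefault(best_matching_cell(weights, x), []).append(index)
    return grid


if __name__ == "__main__":
    # HYPOTHETICAL coded descriptions forming two rough groups of similar vectors.
    descriptions = [
        [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0],
    ]
    som = train_som(descriptions, rows=5, cols=7)
    print(map_to_grid(som, descriptions))
```

Identical descriptions always land in the same grid cell; whether merely similar ones do depends on training, which is one reason investigators still go back to the original statements.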
Neural nets (including SOMs) are described later in the chapter. For now, all we need to know is that these systems, which work iteratively, are extremely good at homing in (over the course of many iterations) on patterns, such as geometric clusters of the kind we are interested in, and thus can indeed take an eight-dimensional array of the kind described above and place the points appropriately in a two-dimensional grid. (Part of the skill required to use an SOM effectively in a case such as this is deciding in advance, or by some initial trial and error, what the optimal dimensions of the final grid are. The SOM needs that information in order to start work.)

Once the data has been put into the grid, law enforcement officers can examine grid squares that contain several entries, which are highly likely to come from a single gang responsible for a series of crimes, and can visually identify clusters of neighboring squares on the grid, which may also represent gang activity. In either case, the officers can examine the corresponding original crime statement entries, looking for indications that those crimes are indeed the work of a single gang.

Now let's see what goes wrong with the method just described, and how to correct it.

The first problem is that the original encoding of entries as numbers is not systematic. This can lead to one variable dominating others when the entries are clustered using geometric distance (the Pythagorean metric) in eight-dimensional space. For example, a dimension that measures height (which could be anything between 60 inches and 76 inches) would dominate the entry for gender (0 or 1).
So the first step is to scale (in mathematical terminology, normalize) the eight numerical variables, so that each one varies between 0 and 1. One way to do that would be to simply scale down each variable by a multiplicative scaling factor appropriate for that particular feature (height, age, etc.). But that will introduce further problems when the separation distances are calculated; for example, if gender and height are among the variables, then, all other variables being roughly the same, a very tall woman would come out close to a very short man (because female gives a 0 and male gives a 1, whereas tall comes out close to 1 and short close to 0). Thus, a more sophisticated normalization procedure has to be used.
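The two options can be contrasted in a short sketch: naive rescaling of a feature onto [0, 1] using its known range (min-max scaling, which shifts as well as multiplies), and the kind of overlapping binary coding the British Midlands study adopted, described next. The band boundaries below are invented for illustration, since the study's exact coding is not given, and the names `min_max_scale`, `encode_overlapping`, and `HEIGHT_BANDS` are hypothetical.

```python
def min_max_scale(value, lo, hi):
    """Naive rescaling of a raw value from [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


def encode_overlapping(value, bands):
    """Binary coding: one 0/1 flag per (low, high) band, 1 if low <= value <= high.

    Adjacent bands overlap, so a value near a band edge switches on two
    flags. A missing value (None) gives all zeros, matching the text's
    remark that the study's program treated a blank entry as 0.
    """
    if value is None:
        return [0] * len(bands)
    return [1 if low <= value <= high else 0 for low, high in bands]


# Hypothetical overlapping height bands in inches, covering the 60-76 range
# mentioned in the text; the study's actual bands are not given.
HEIGHT_BANDS = [(60, 65), (63, 68), (66, 71), (69, 74), (72, 76)]

print(min_max_scale(68, 60, 76))               # 0.5
print(encode_overlapping(64, HEIGHT_BANDS))    # [1, 1, 0, 0, 0]
print(encode_overlapping(None, HEIGHT_BANDS))  # [0, 0, 0, 0, 0]
```

Because the bands overlap, heights of 64 and 66 inches, which fall on either side of the 65-inch edge, still share the (63, 68) flag. That is the edge effect the overlapping ranges are meant to handle.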
The approach finally adopted in the British Midlands study was to make every numerical entry binary (just 0 or 1). This meant splitting the continuous variables (age and height) into overlapping ranges (a few years and a few inches, respectively), with a 1 denoting an entry in a given range and a 0 meaning outside that range, and using pairs of binary variables to encode each factor of hair color, hair length, build, accent, and race. The exact coding chosen was fairly specific to the data being studied, so there is little to be gained from providing all the details here. (The age and height ranges were taken to be overlapping to account for entries toward the edges of the chosen ranges.) The normalization process resulted in a set of 46 binary variables, so the geometric clustering was done in a 46-dimensional space.

Another problem was how to handle missing data. For example, what do you do if a victim's statement says nothing about the perpetrator's accent? If you enter a 0, that would amount to assigning an accent. But what will the clustering program do if you leave that entry blank? (In the British Midlands study, the program would treat a missing entry as 0.) Missing data points are in fact one of the major headaches for data miners, and there really is no universally good solution. If there are only a few such cases, you could either ignore them or else see what solutions you get with different values entered.

As mentioned earlier, a key decision that has to be made before the SOM can be run is the size of the resulting two-dimensional grid. It needs to be small enough that the SOM is forced to put some data points into the same grid squares and that some non-empty grid squares have non-empty neighbors. The investigators in the British Midlands study eventually opted for a five-by-seven grid.
With 105 offender descriptions and only 35 squares, the five-by-seven grid forced the SOM to create several multi-entry clusters.

The study itself concluded with experienced police officers examining the results and comparing them with the original victim statements and other relevant information (such as geographic proximity of crimes over a short timespan, which would be another indicator of gang activity, not used in the cluster analysis), to determine how well the process performed. Though all parties involved in the study declared it to be successful, the significant amount of person-hours required means