A Corporate Counsel headline from late last year asked, “Can Predictive Coding Save The World?” A better, albeit more modest question is, can it save you money? This panel addresses that loaded question and the related issues of:
• Deploying advanced technologies across enterprise data,
• Measuring the effectiveness of advanced technologies,
• Vetting and selecting appropriate service providers, and
• Validating the results of predictive coding.
In this panel, IT and legal experts survey the technology horizon, giving you insights and best practices for finding the solutions that work best for you.
5. “Technology-Assisted Review,” called by its nickname
“Predictive Coding,” describes a process whereby
computers are programmed to search a large amount of
data to find quickly and efficiently the data that meet a
particular requirement. Computer science and the
sciences of statistics and psychology inform its use. While
it bruises the human ego, scientists…determined that
…[i]t is now indubitable that technology-assisted review
is an appreciably better and more accurate means of
searching a set of data.”
THE GROSSMAN-CORMACK GLOSSARY OF
TECHNOLOGY-ASSISTED REVIEW
FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013)
Foreword by John M. Facciola, U.S. Magistrate Judge
Process: a series of actions that produce something or that lead to a particular result
7. “Now, the methodology of the use of technology-
assisted review may itself be in dispute, with the
parties controverted to each other’s use of a
particular method or tool. Those controversies have
already led to judicial decisions that have to
grapple with a wholly new way of searching and with
scientific principles derived from the science of
statistics or other disciplines.”
THE GROSSMAN-CORMACK GLOSSARY OF
TECHNOLOGY-ASSISTED REVIEW
FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013)
Foreword by John M. Facciola, U.S. Magistrate Judge
Methodology: a set of methods, rules, or ideas that are important in a science or art; a particular procedure or set of procedures
9. THE GROSSMAN-CORMACK GLOSSARY OF
TECHNOLOGY-ASSISTED REVIEW
FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013)
Predictive Coding:
An industry-specific term generally used to describe a
Technology-Assisted Review process involving the
use of a Machine Learning Algorithm to distinguish
Relevant from Non-Relevant Documents, based on
Subject Matter Expert(s)’ Coding of a Training Set of
Documents.
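To make the glossary definition concrete, here is a minimal sketch of one possible "Machine Learning Algorithm" (a small naive Bayes text classifier) trained on a reviewer-coded Training Set. It is an illustration only, not the method of any particular vendor or of the glossary authors; the sample documents, the function names, and the choice of naive Bayes are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical training set: (document text, coded relevant?) pairs,
# standing in for a Subject Matter Expert's coding decisions.
TRAINING_SET = [
    ("engine defect report for the hangar roof", True),
    ("roof collapse damage estimate", True),
    ("lunch menu for friday", False),
    ("holiday party schedule", False),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(training_set):
    """Count words per label. Assumes both labels occur at least once."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in training_set:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def relevance_score(model, text):
    """Return the log-odds that `text` is relevant (positive = leans relevant)."""
    word_counts, doc_counts, vocab = model
    score = math.log(doc_counts[True] / doc_counts[False])
    totals = {label: sum(word_counts[label].values()) for label in (True, False)}
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence here
        # Add-one smoothing so an unseen word/label pair never has zero probability.
        p_rel = (word_counts[True][word] + 1) / (totals[True] + len(vocab))
        p_non = (word_counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_rel / p_non)
    return score


model = train(TRAINING_SET)
for doc in ("roof damage report", "friday lunch schedule"):
    label = "Relevant" if relevance_score(model, doc) > 0 else "Non-Relevant"
    print(f"{doc!r}: {label}")
```

Note that the classifier can only generalize from what the Training Set contains, which is why the choice of training documents matters so much in the discussion that follows.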
11. “A word is not a crystal, transparent and unchanged, it
is the skin of a living thought and may vary greatly in
color and content according to the circumstances
and the time in which it is used.”
Justice Oliver Wendell Holmes Jr.,
Towne v. Eisner, 245 U.S. 418, 425 (1918)
THE GROSSMAN-CORMACK GLOSSARY OF
TECHNOLOGY-ASSISTED REVIEW
FEDERAL COURTS LAW REVIEW Volume 7, Issue 1 (2013)
Foreword by John M. Facciola, U.S. Magistrate Judge
15. “I think you should be more
explicit here in step two.”
16. Published as guest contributor to Ralph
Losey’s E-Discovery Team Blog Site:
http://e-discoveryteam.com/2013/04/28/predictive-codings-erroneous-zones-are-emerging-junk-science/?shareadraft=517d80048f827
“Predictive Coding’s Erroneous Zones
Are Emerging Junk Science”
17. “Predictive Coding’s Erroneous Zones
Are Emerging Junk Science”
• “PBS’ Frontline’s Forensic Tools: What’s Reliable and
What’s Not-So-Scientific dispelled the infallibility, and
in some instances, the validity, of analytical
techniques long relied upon by our legal profession.”
• “Even if those techniques were not botched or
biased, their validity ranges from bought-and-paid-
for infomercials to, at best, an approximation.”
• “Back then attorneys and judges (and experts and
vendors) did with those junk sciences just what we
are doing now with respect to predictive coding:
allowing claims, however unjustified and
erroneous, to form the basis of our practices, to
influence our precedent and to accrue authority.”
18. “[T]hose of us who trust the scientific and
adversarial process recognize that erroneous
claims don’t naturally defeat truth. They
suppress truth, distract from truth and
sometimes persist so long that we forget to
inquire into the truth. Oftentimes, weak
interests seek to dispel erroneous claims
which are promoted by strong commercial
interests. With respect to predictive coding
my sense is that we are neither deluded nor
deceptive — well, not too much anyway —
but we just have not yet thought it through.”
“Predictive Coding’s Erroneous Zones
Are Emerging Junk Science”
20. Erroneous Practice #1: Using a full-text search to identify prospectively responsive documents and then employing predictive coding to eliminate those that are not responsive.
Erroneous Practice #2: Pulling a random sample of documents to train the initial seed set.
Erroneous Practice #3: Identifying “magic numbers” of minimum:
• “Iterations”
• Responsive documents within a randomly accumulated set
Erroneous Practice #4: Asserting that Predictive Coding software is the “gold standard” for document retrieval in complex matters.
“Predictive Coding’s Erroneous Zones
Are Emerging Junk Science”
21. Erroneous Practice #4: Asserting that Predictive Coding software is the “gold standard” for document retrieval in complex matters.
Is Erroneous Because: It asserts that predictive coding is a standard, when its implementations:
• Share some commonly understood characteristics but no precise attributes
• Involve some general methodologies but no clear rules
• Are associated with general aspirations but no comprehensively defined operations
Example: All advertisements or orders for “predictive coding”
“Predictive Coding’s Erroneous Zones
Are Emerging Junk Science”
24. Erroneous Practice #2: Pulling a random sample of documents to train the initial seed set.
Is Erroneous Because:
A. It looks for relevance in all the wrong places: thoughtful researchers don’t try to learn about relevant docs by examining irrelevant ones.
B. It turns a blind eye to what is staring you in the eye: it denies that attorneys know what they are paid to know, namely where to look and what to find.
C. It measures the wrong stuff:
• Constrained and circular “like” definition
• Prevalence vs. Relevance vs. Probativeness
Example: Global Aerospace v. Landow Aviation (settled without court ruling re strategy)
“Predictive Coding’s Erroneous Zones
Are Emerging Junk Science”
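The prevalence point can be illustrated with simple arithmetic. The sketch below, using hypothetical numbers (a 1% prevalence and a 500-document random sample, neither taken from any cited matter), estimates how few relevant documents a random seed set is likely to contain. It assumes draws are independent, which is a reasonable approximation when the sample is small relative to the collection.

```python
def expected_relevant(sample_size, prevalence):
    """Expected number of relevant documents in a simple random sample."""
    return sample_size * prevalence


def prob_no_relevant(sample_size, prevalence):
    """Chance the sample contains zero relevant documents (independent draws)."""
    return (1 - prevalence) ** sample_size


# Hypothetical figures for illustration only.
SAMPLE_SIZE = 500
PREVALENCE = 0.01  # 1 relevant document per 100

print(f"Expected relevant docs in sample: {expected_relevant(SAMPLE_SIZE, PREVALENCE):.1f}")
print(f"Chance of zero relevant docs:     {prob_no_relevant(SAMPLE_SIZE, PREVALENCE):.4f}")
```

Under these assumed numbers the reviewer codes roughly 495 irrelevant documents to find about 5 relevant ones, which is the sense in which a random seed set "looks for relevance in all the wrong places."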
27. Erroneous Practice #1: Using a full-text search to identify prospectively responsive documents and then employing predictive coding to eliminate those that are not responsive.
Is Erroneous Because:
A. It over-relies and under-delivers: presumed arrogance or clairvoyance
B. It arbitrarily places documents out-of-sight and, therefore, out-of-mind: it reduces the likelihood that responsive documents will ever be produced, while dumbing down the predictive coding intelligence
Example: In re: Biomet M2a Magnum Hip Implant Prods. Liab. Litig. (endorsed by court)
“Predictive Coding’s Erroneous Zones
Are Emerging Junk Science”
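One way to see the "out-of-sight, out-of-mind" problem is that recall losses compound across stages. The following sketch uses hypothetical figures (10,000 responsive documents, a keyword cull that retains 60% of them, and a predictive coding pass that finds 80% of what remains); none of these numbers come from Biomet or any other matter.

```python
def end_to_end_recall(keyword_recall, tar_recall):
    """Overall recall when predictive coding runs only on the keyword hits.

    Documents missed by the keyword cull are never seen by the second
    stage, so the two recall rates multiply.
    """
    return keyword_recall * tar_recall


# Hypothetical figures for illustration only.
TOTAL_RESPONSIVE = 10_000
KEYWORD_RECALL = 0.60  # share of responsive docs that hit a keyword
TAR_RECALL = 0.80      # share of those hits that predictive coding finds

overall = end_to_end_recall(KEYWORD_RECALL, TAR_RECALL)
found = round(TOTAL_RESPONSIVE * overall)
print(f"Overall recall: {overall:.0%}")
print(f"Responsive docs produced: {found} of {TOTAL_RESPONSIVE}")
print(f"Never seen by predictive coding: {round(TOTAL_RESPONSIVE * (1 - KEYWORD_RECALL))}")
```

Under these assumptions, a second stage that looks respectable on its own (80%) still yields under half of the responsive documents overall, because the documents excluded by the keyword cull can neither be produced nor be used to train the model.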
30. Erroneous Practice #3: Identifying “magic numbers” of minimum:
• “Iterations”
• Responsive documents within a randomly accumulated set
Is Erroneous Because:
A. You may not be able to get there from here: don’t know starting point or ending point
B. You don’t know what isn’t yet known: cannot predict alternative paths
C. Consider low frequency, high probativeness
D. Who’s the witness?
Example:
• “This [iteration] process shall be repeated for a total of seven iterations… [Requesting party pays] costs and fees… [for] more [than] 40,000 documents.” (Da Silva Moore)
• Vendors’ affidavits in various matters
“Predictive Coding’s Erroneous Zones
Are Emerging Junk Science”
32. May not be able to get there even with a
“Magic” number of steps…
34. [Diagram: Search Mechanisms’ Inferences (risk) re: recall. Search mechanisms shown: databases; files and folders (in place); files and folders (per user); end-user tags; duplicates; key words; random sampling; sorting; similarity/clustering; “Technology Assisted Review” via machine learning.]