Getting to a Manageable Review Set

       Focus on finding, reviewing & using the “right” data,
       not just filtering data

       Intake data                  100%
       Duplicates                    25%
       Junk/Spam/Porn                20%
       NR/Priv                       20%
       Non-Responsive                20%
       Responsive & Priv             15%
       Produced                      12.25%

       These figures vary based upon the data set received

12/5/2011                                                                       1
Review risks
       Failure to collect the right data
       Failure to find responsive documents
       Failure to recognize responsive documents
       Failure to recognize privileged documents
       Inconsistent treatment of documents (e.g.,
       duplicates)
       Failure to complete project in a timely manner

       Sophisticated Tools
            – Understand What They Do and Don’t Do Well
            – Inform Yourself, Speak to References, Consultants
Search Methodologies

       Visualization, Measurement
       Context
            – Relationship Analysis: documents with causal or sequential relationship
            – Social Network Analysis: relationships among relevant people
       Concept
            – Clustering: similarity of salient features
            – Ontology: generalized words or phrases
       Content
            – Keyword: specific exact words, proximity searches, stemming
Myth
            Keyword Searching is the Way to Go

               If I agree to keyword terms, I am OK
               Missing in Action (Under-inclusive)
               Unwanted Extras (Over-inclusive)
               Multiple subject/persons (Disambiguate)

            Reality: Keyword Search is one tool among many!




"simple keyword searches end
 up being both over- and under-
 inclusive."
     Judge Paul Grimm, Victor Stanley, Inc. v. Creative Pipe, Inc., No. MJG-06-2662, 2008 U.S. Dist. LEXIS 42025
                                                                                          (D. Md. May 29, 2008).




Keyword culling
Keyword Accuracy Example
    Keyword search reduced the
    document set by only 47%

    And 88% of the documents
    returned by keyword
    search were not responsive
    (Over-inclusive)




     8,553 responsive documents
     missed by keyword search
     (Almost 8% of responsive
     documents missed by
     keyword search - Under-inclusive)
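
The two failure modes on this slide correspond to precision (how much of what the search returned was responsive) and recall (how much of the responsive material the search returned). A minimal sketch follows; the function name and the counts are hypothetical, chosen only so the results mirror the slide's 88% and 8% figures, and are not the actual case data.

```python
def precision_recall(hits_responsive, hits_total, responsive_total):
    """Return (precision, recall) for a keyword search.

    hits_responsive  -- responsive documents the search returned
    hits_total       -- all documents the search returned
    responsive_total -- all responsive documents in the collection
    """
    precision = hits_responsive / hits_total if hits_total else 0.0
    recall = hits_responsive / responsive_total if responsive_total else 0.0
    return precision, recall


# Hypothetical counts: 276 of 2,300 hits are responsive (88% are not),
# and 276 of 300 responsive documents were hit (8% were missed).
precision, recall = precision_recall(
    hits_responsive=276, hits_total=2300, responsive_total=300
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision is the over-inclusive problem (reviewers wade through non-responsive hits); low recall is the under-inclusive problem (responsive documents never reach review).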



Under Inclusive - Missing in Action
        Missing abbreviations / acronyms / clippings:
            – incentive stock option but not ISO

        Missing inflectional variants:
            – grant but not grants, granted, granting

        Missing spellings or common misspellings:
            – gray but not grey

            – privileged but not priviliged, priviledged, privilidged,
               priveliged, privelidged, priveledged, 


        Missing syntactic variants:
            – board of directors meeting but not meeting of the board of
              directors, BOD meeting, board meeting, BOD mtg

        Missing Synonyms/Paraphrases:
            – Hire date but not start date
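
The effect of missing variants can be seen in a small sketch that compares an exact-term search with one expanded by the misspellings listed above. The helper names and the four sample documents are hypothetical, made up for illustration.

```python
import re


def exact_hits(term, docs):
    """Documents containing the exact term as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [d for d in docs if pattern.search(d)]


def expanded_hits(variants, docs):
    """Documents containing any of the listed variants as a whole word."""
    alternation = "|".join(re.escape(v) for v in variants)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return [d for d in docs if pattern.search(d)]


# Hypothetical documents, two of them with common misspellings.
docs = [
    "This memo is privileged and confidential.",
    "Attorney-client priviledged communication.",
    "Please treat as priviliged.",
    "Lunch menu for Friday.",
]
variants = ["privileged", "priviliged", "priviledged", "privilidged",
            "priveliged", "privelidged", "priveledged"]

exact = exact_hits("privileged", docs)
expanded = expanded_hits(variants, docs)
print(len(exact), len(expanded))
```

The expanded list only catches variants someone thought to add, which is the slide's point: acronyms, paraphrases, and reordered phrases still slip through.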
Over-Inclusive - Unwanted Extras (a)
            Options

            Target: Sheila was granted 100,000 options at $10
             Match: What are our options for lunch?
             Match in a signature line:
                     Amanda Wacz
                     Acme Stock Options Administrator
            Destroy
             Target: destroy evidence
             Match in a disclaimer: The information in this email, and any
               attachments, may contain confidential and/or privileged
               information and is intended solely for the use of the named
               recipient(s). Any disclosure or dissemination in whatever form, by
               anyone other than the recipient is strictly prohibited. If you have
               received this transmission in error, please contact the sender
               and destroy this message and any attachments. Thank you.
Over-Inclusive - Unwanted Extras (b)
       alter*

       Target: alter, alters, altered, altering
  Matches: alternate, alternative, alternation, altercate,
       altercation, alterably, 



       grant

    Target: stock option grant
  Matches names: Grant Woods, Howard Grant
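
A short sketch of why the alter* wildcard over-matches, assuming the wildcard means "any word beginning with alter" (search tools differ in how they interpret it). The second pattern lists only the intended inflections. Both patterns are illustrative, not any particular product's syntax.

```python
import re

# Wildcard read as "any word that starts with alter".
wildcard = re.compile(r"\balter\w*", re.IGNORECASE)
# Only the intended inflections of the verb, bounded on both sides.
inflections = re.compile(r"\balter(?:s|ed|ing)?\b", re.IGNORECASE)

words = ["alter", "alters", "altered", "altering",
         "alternate", "alternative", "altercation"]

wildcard_matches = [w for w in words if wildcard.fullmatch(w)]
inflection_matches = [w for w in words if inflections.fullmatch(w)]

print(wildcard_matches)
print(inflection_matches)
```

Tightening the pattern helps with alter*, but it does nothing for the grant example: a person named Grant is spelled exactly like the target word, so that case needs context rather than a better pattern.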


Failure to Disambiguate
               Words that Relate to Multiple Subjects
            Example: refund is used to refer to:
             – FERC-ordered refunds owed by Enron for
               overcharging
             – Tax refunds (both corporate and personal)
             – Mundane business matters

            In a given matter, one might be of interest
            while the others are not




Technology Enhanced Review:
            Speed, Predictable Costs, and Accuracy
       Automate any portion of the review

       Example from a real case:

              Source Data                                      100%
              Eliminate Duplicates & System Files               30%
              Non-Responsive Isolation (ontologies)             30%
              NR by Technology Enhanced Review
                (removed another 18%)                           22%
              Responsive by Technology Enhanced Review
                (removed another 7%)                            15%
              Priv by High-Speed Manual Review                   3%
Example: “priv” ontology


                   Valuable, re-usable work product
                   Combines classifiers into concepts,
                   into bigger concepts




Disclaimer Detection

        Disclaimers can throw off attempts to detect privileged
        communications
        Prevalent throughout many companies, even on trivial
        communications
        Detect them automatically, and exclude them from searches
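
One way to picture "detect and exclude" is a sketch that cuts the message off at the first disclaimer marker before running the keyword search. The marker phrases, the function names, and the assumption that the disclaimer sits at the end of the message are all hypothetical; a real matter would need its own tested list.

```python
import re

# Hypothetical marker phrases that tend to open an email disclaimer.
DISCLAIMER_MARKERS = [
    r"the information in this email",
    r"intended solely for the use of",
    r"if you have received this (?:transmission|message|email) in error",
]
DISCLAIMER_START = re.compile("|".join(DISCLAIMER_MARKERS), re.IGNORECASE)


def strip_disclaimer(text):
    """Drop everything from the first disclaimer marker to the end of the text."""
    match = DISCLAIMER_START.search(text)
    return text[:match.start()] if match else text


def keyword_hit(term, text, ignore_disclaimer=True):
    """True if the term appears as a whole word, optionally skipping the disclaimer."""
    body = strip_disclaimer(text) if ignore_disclaimer else text
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, body, re.IGNORECASE) is not None


email = (
    "See you at lunch.\n\n"
    "The information in this email, and any attachments, may contain "
    "confidential and/or privileged information. If you have received this "
    "transmission in error, please contact the sender and destroy this message."
)

print(keyword_hit("privileged", email, ignore_disclaimer=False))
print(keyword_hit("privileged", email))
```

With the disclaimer left in, a trivial lunch email hits on both "privileged" and "destroy"; with it excluded, neither term matches.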



       [Figure: overlapping regions of the document set]
            Responsive
            Privileged by Actor Only
            Privileged by Actor and Term
            Privileged by Term Only
            Privileged by Disclaimer Only
            Domain of Disclaimer Detection
Priv Logs
       Expensive - But Do NOT Have to Be
       In re Vioxx Products Liability Litigation (E.D. La. 2007)
       Merck’s Priv Log had 30,000 items on it
            – How to Make a Judge Angry
            – How to Waste Client Money
            – How to Attract Sanctions




Transparency of Process
       Discussing Review Protocols
        – Provide transparent, defensible, sophisticated search
          based on document content
        – Clustering, Ontologies, Analytics, and yes, sometimes
          Keywords too
       Develop search methodologies for each case
        – Use technology experts in consultation with case / legal
          experts
       Results verifiable by Quality Control
        – Defensible sampling
       Sophisticated Tools
        – Understand What They Do and Don’t Do Well
        – Inform Yourself, Speak to References, Consultants
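
For the "defensible sampling" point above, a minimal sketch of one common approach: draw a simple random sample from a reviewed set and estimate an error rate with a confidence interval. The function name, the 95% z-value, the normal approximation, and the discard-pile numbers are all assumptions for illustration, not a protocol taken from these slides.

```python
import math
import random


def sample_estimate(labels, sample_size, z=1.96, seed=0):
    """Estimate the proportion of True labels from a simple random sample.

    Returns (estimate, lower, upper) using the normal approximation, which
    is only reasonable when the sample is not tiny and the proportion is
    not very close to 0 or 1.
    """
    rng = random.Random(seed)
    sample = rng.sample(labels, sample_size)
    p = sum(sample) / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical discard pile: 10,000 documents tagged non-responsive,
# 300 of which are actually responsive (True).
discard_pile = [True] * 300 + [False] * 9700
estimate, low, high = sample_estimate(discard_pile, sample_size=1500)
print(f"estimated miss rate {estimate:.3f} (interval {low:.3f} to {high:.3f})")
```

Reporting the interval along with the point estimate is what lets the other side, or the court, judge how much was likely left behind.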
Blair & Maron:
                          Keyword search is incomplete

       [Figure: bar chart of responsive documents (0% to 100%):
        "Predicted" (what the lawyers thought they were finding)
        vs. "Obtained" (what they actually found)]

                                                 Blair and Maron, Communications of the ACM, 28, 1985, 289-299
Blair and Maron
         “It is impossibly difficult for users to
             predict the exact words, word
        combinations, and phrases that are
              used by all (or most) relevant
        documents and only (or primarily) by
                   those documents.”




Blair & Maron Study: 20% recall
Lawyers picked 3 key terms,
B & M found 26 more
Defense: “Unfortunate incident”
Plaintiff: “Disaster”



                                     Blair and Maron, Communications of the ACM, 28, 1985, 289-299
Predictive Coding
Document categorization in Legal Discovery:
Computer Classification vs. Manual Review
Herbert L. Roitblat, Anne Kershaw, & Patrick Oot
       [Figure: bar chart of agreement with the original review (axis 0.5 to 1)
        for manual review (Team A, Team B) and computer classification
        (System C, System D)]

                                  Roitblat, Kershaw, & Oot, 2010, JASIST
Gold Standard
Turing test




Alan Turing, 1912-1954
Substantial disagreement between Team A & Team B

       [Figure: bar of responsive documents (axis 0 to 2000):
        629 identified by Team A only, 580 by both teams (28%),
        858 by Team B only]

                                 Roitblat, Kershaw, & Oot, 2010, JASIST
Conclusion
The computer systems yielded a comparable level of
performance relative to manual review
Fewer people, less time, less cost
Measure performance to evaluate
Will lawyers lose control?




Computer system amplifies the intelligence of the Expert
Will lawyers lose their jobs?
Tap into the mind of an expert
Technology-Enhanced or Automated Review




Setup
       Sample
       Expert judges sample (Responsive / Non-responsive)
       Model learns
       Model predicts
       Repeat as needed
       Model categorizes all remaining documents (Responsive / Non-responsive)
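
The loop on this slide (sample, expert judges, model learns, model predicts, repeat, then categorize the rest) can be sketched with a very small naive Bayes text classifier. This is a stand-in for illustration, not the system described in the talk; the function names, the toy documents, and the stand-in "expert" rule are all hypothetical. Batches are taken in order to keep the example deterministic, whereas a real workflow would draw each sample at random.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Fit a tiny naive Bayes model from (text, is_responsive) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in labeled:
        counts[label].update(tokenize(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def predict(model, text):
    """Return True if the model scores the text as responsive."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in (True, False):
        # Add-one smoothing so unseen words do not zero out a class.
        score = math.log((docs[label] + 1) / (total_docs + 2))
        denom = sum(counts[label].values()) + len(vocab) + 1
        for word in tokenize(text):
            score += math.log((counts[label][word] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]


def review(collection, expert, batch_size=2, rounds=2):
    """Sample, have the expert judge, retrain, repeat; then categorize the rest."""
    labeled, remaining = [], list(collection)
    model = train(labeled)
    for _ in range(rounds):                                  # repeat as needed
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        labeled += [(doc, expert(doc)) for doc in batch]     # expert judges sample
        model = train(labeled)                               # model learns
    return {doc: predict(model, doc) for doc in remaining}   # model predicts


# Hypothetical collection; the stand-in expert calls anything about a grant responsive.
collection = [
    "stock option grant approved by board",
    "lunch menu for friday",
    "board approved the option grant to sheila",
    "friday lunch order pizza",
    "option grant paperwork for the board",
    "pizza for lunch on friday",
]
result = review(collection, expert=lambda doc: "grant" in doc)
print(result)
```

The expert only judges the four sampled documents; the model carries those judgments over to the two it has never seen, which is the sense in which the system "amplifies the intelligence of the expert".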
Predictive coding achieves much higher accuracy (Jaccard)

  Responsive documents:

  Team A Only     Team A and Team B               Team B Only
    0.304              0.281                         0.415

  Humans Only     Humans and Predictive Coding    Predictive Coding Only
    0.186              0.688                         0.126

                       Data from Roitblat, et al. and an Internal OrcaTec Case Study
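
The Jaccard figures are the share of documents in the middle (agreed) group out of all documents either side called responsive. As a check, the Team A / Team B values can be reproduced from the counts on the earlier disagreement slide (629, 580, 858). The function name is illustrative.

```python
def jaccard(only_a, both, only_b):
    """Jaccard index: size of the intersection over size of the union."""
    union = only_a + both + only_b
    return both / union if union else 0.0


# Counts from the Team A / Team B comparison (Roitblat, Kershaw, & Oot, 2010).
team_a_only, both_teams, team_b_only = 629, 580, 858
total = team_a_only + both_teams + team_b_only

overlap = jaccard(team_a_only, both_teams, team_b_only)
print(round(team_a_only / total, 3), round(overlap, 3), round(team_b_only / total, 3))
```

The same calculation on the human versus predictive coding comparison is what yields the 0.688 agreement reported on the slide; the underlying counts for that comparison are not given here.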
Why doesn’t everyone use it?

•   Attorneys don’t understand the technology
•   May not be aware of the accuracy data
•   May not understand how to fit it into their work flow
•   Not in everyone’s economic interest
•   Acceptable to judges?
Defensible?


Measure     TREC 2008   Roitblat et al.   Roitblat et al.   Predictive
                        Team A            Team B            Coding*
Precision   0.210       0.197             0.183             0.899
Recall      0.555       0.488             0.539             0.873

                                           *OrcaTec internal result
Thank you!




               Herb Roitblat           Sonya Sigler
            770-650-7706x229           650-281-8325
            herb@orcatec.com        sonya@sigler.name





Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls đŸ„° 8617370543 Service Offer VIP Hot Model
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 

SF Women in eDiscovery Sept 2011

  • 1. Getting to a Manageable Review Set. Focus on finding, reviewing & using the “right” data, not just filtering data. Funnel: Intake Data 100%; Duplicates 25%; Junk/Spam/Porn 20%; NR/Priv 20%; Non-Responsive 20%; Responsive & Priv 15%; Produced 12.25%. These figures vary based upon the data set received. 12/5/2011
  • 2. Review risks: Failure to collect the right data; Failure to find responsive documents; Failure to recognize responsive documents; Failure to recognize privileged documents; Inconsistent treatment of documents (e.g., duplicates); Failure to complete project in a timely manner. Sophisticated Tools – Understand What They Do and Don’t Do Well – Inform Yourself, Speak to References, Consultants
  • 3. Search Methodologies (Visualization, Measurement). Context: Relationship Analysis (documents with causal or sequential relationship); Social Network Analysis (relationships among relevant people). Concept: Clustering (similarity of salient features); Ontology (generalized words or phrases). Content: Keyword (specific exact words, proximity searches, stemming)
  • 4. Myth: Keyword Searching is the Way to Go. If I agree to keyword terms, I am OK. Missing in Action (Under-inclusive); Unwanted Extras (Over-inclusive); Multiple subject/persons (Disambiguate). Reality: Keyword Search is one tool among many!
  • 5. "simple keyword searches end up being both over- and under- inclusive." Judge Paul Grimm, Victor Stanley, Inc. v. Creative Pipe, Inc., No. MJG-06-2662, 2008 U.S. Dist. LEXIS 42025 (D. Md. May 29, 2008). Keyword culling
  • 6. Keyword Accuracy Example: Keyword search reduced the document set by only 47%, and 88% of the documents returned by keyword search were not responsive (Over-inclusive). 8,553 responsive documents missed by keyword search (almost 8% of responsive documents missed by keyword search; Under-inclusive)
  • 7. Under Inclusive - Missing in Action. Missing abbreviations / acronyms / clippings: – incentive stock option but not ISO. Missing inflectional variants: – grant but not grants, granted, granting. Missing spellings or common misspellings: – gray but not grey – privileged but not priviliged, priviledged, privilidged, priveliged, privelidged, priveledged, … Missing syntactic variants: – board of directors meeting but not meeting of the board of directors, BOD meeting, board meeting, BOD mtg. Missing Synonyms/Paraphrases: – Hire date but not start date
  • 8. Over-Inclusive - Unwanted Extras (a). Options. Target: Sheila was granted 100,000 options at $10. Match: What are our options for lunch? Match in a signature line: Amanda Wacz, Acme Stock Options Administrator. Destroy. Target: destroy evidence. Match in a disclaimer: The information in this email, and any attachments, may contain confidential and/or privileged information and is intended solely for the use of the named recipient(s). Any disclosure or dissemination in whatever form, by anyone other than the recipient is strictly prohibited. If you have received this transmission in error, please contact the sender and destroy this message and any attachments. Thank you.
  • 9. Over-Inclusive - Unwanted Extras (b). alter* Target: alter, alters, altered, altering. Matches: alternate, alternative, alternation, altercate, altercation, alterably, … grant Target: stock option grant. Matches names: Grant Woods, Howard Grant
  • 10. Failure to Disambiguate Words that Relate to Multiple Subjects. Example: refund is used to refer to: – FERC-ordered refunds owed by Enron for overcharging – Tax refunds (both corporate and personal) – Mundane business matters. In a given matter, one might be of interest while the others are not
  • 11. Technology Enhanced Review: Speed, Predictable Costs, and Accuracy. Automate any portion of the review. [Funnel figure, example from a real case: Source Data 100%; Eliminate Duplicates & System Files 30%; Non-Responsive Isolation ontologies (NR by 30%); Technology Enhanced Review (removed another 18%); Responsive by Technology Enhanced Review; Priv (removed another 7%); High-Speed Manual Review. Remaining percentages shown: 22%, 3%, 15%]
  • 12. Example: “priv” ontology. Valuable, re-usable work product. Combines classifiers into concepts, into bigger concepts
  • 13. Disclaimer Detection. Disclaimers can throw off attempts to detect privileged communications. Prevalent throughout many companies, even on trivial communications. Detect them automatically, and exclude them from searches
  • 14. [Diagram: Domain of Disclaimer Detection. Regions: Privileged by Actor Only; Responsive; Privileged by Actor and Term; Privileged by Term Only; Privileged by Disclaimer Only]
  • 15. Priv Logs Expensive - But Do NOT Have to Be. In re Vioxx Products Liability Litigation (E.D. La. 2007): Merck’s Priv Log had 30,000 items on it – How to Make a Judge Angry – How to Waste Client Money – How to Attract Sanctions
  • 16. Transparency of Process. Discussing Review Protocols – Provide transparent, defensible, sophisticated search based on document content – Clustering, Ontologies, Analytics, and yes, sometimes Keywords too. Develop search methodologies for each case – Use technology experts in consultation with case / legal experts. Results verifiable by Quality Control – Defensible sampling. Sophisticated Tools – Understand What They Do and Don’t Do Well – Inform Yourself, Speak to References, Consultants
  • 17. Blair & Maron: Keyword search is incomplete. [Bar chart: percentage of responsive documents (0% to 100%), Predicted vs. Obtained. Predicted: what the lawyers thought they were finding. Obtained: what they actually found.] Blair and Maron, Communications of the ACM, 28, 1985, 289-299
  • 18. Blair and Maron: “It is impossibly difficult for users to predict the exact words, word combinations, and phrases that are used by all (or most) relevant documents and only (or primarily) by those documents.” Blair & Maron Study: 20% recall. Lawyers picked 3 key terms, B & M found 26 more. Defense: “Unfortunate incident”. Plaintiff: “Disaster”. Blair and Maron, Communications of the ACM, 28, 1985, 289-299
  • 20. Document categorization in Legal Discovery: Computer Classification vs. Manual Review Herbert L. Roitblat, Anne Kershaw, & Patrick Oot
  • 21. [Bar chart: Agreement with original (axis 0.5 to 1) for Team A and Team B (manual review) and System C and System D (computer classification).] Roitblat, Kershaw, & Oot, 2010, JASIST
  • 24. Substantial disagreement between Team A & Team B. Responsive documents: A 629; Both 580 (28%); B 858. Roitblat, Kershaw, & Oot, 2010, JASIST
  • 25. Conclusion: The computer systems yielded a comparable level of performance relative to manual review. Fewer people, less time, less cost. Measure performance to evaluate
  • 26. Will lawyers lose control? Computer system amplifies the intelligence of the Expert
  • 28. Tap into the mind of an expert
  • 29. Technology-Enhanced or Automated Review
  • 30. [Flow diagram: Setup; Sample; Expert judges sample (Responsive / Non-responsive); Model learns; Model predicts (Responsive / Non-responsive); Repeat as needed; Model categorizes all remaining documents]
  • 31. Predictive coding achieves much higher accuracy (Jaccard). Responsive documents: Team A Only 0.304; Team A and Team B 0.281; Team B 0.415. Humans 0.186; Humans and Predictive Coding 0.688; Predictive Coding 0.126. Data from Roitblat, et al. and an Internal OrcaTec Case Study
  • 32. Why doesn’t everyone use it? – Attorneys don’t understand the technology – May not be aware of the accuracy data – May not understand how to fit into their work flow – Not in everyone’s economic interest – Acceptable to judges?
  • 33. Defensible? Precision: TREC 2008 0.210; Roitblat et al. Team A 0.197; Roitblat et al. Team B 0.183; Predictive Coding* 0.899. Recall: TREC 2008 0.555; Roitblat et al. Team A 0.488; Roitblat et al. Team B 0.539; Predictive Coding* 0.873. *OrcaTec internal result
  • 34. Thank you! Herb Roitblat, 770-650-7706x229, herb@orcatec.com; Sonya Sigler, 650-281-8325, sonya@sigler.name
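
The wildcard over-matching described on slide 9 can be reproduced in a few lines. This is an illustrative sketch only: the word list comes from the slide, while the two regular expressions and the `hits` helper are assumptions made for the example, not the query syntax of any particular review tool.

```python
import re

# Words taken from slide 9: the four intended targets plus unwanted matches.
words = ["alter", "alters", "altered", "altering",
         "alternate", "alternative", "altercation"]

# Prefix wildcard, roughly what `alter*` means in many search tools (assumed).
wildcard = re.compile(r"^alter\w*$")

# Explicit inflectional variants of the verb only.
inflections = re.compile(r"^alter(s|ed|ing)?$")


def hits(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [w for w in candidates if pattern.match(w)]


wildcard_hits = hits(wildcard, words)        # every word in the list
inflection_hits = hits(inflections, words)   # only the four intended targets
print(wildcard_hits)
print(inflection_hits)
```

Listing the inflections explicitly avoids the unwanted extras, but it brings back the under-inclusion risk from slide 7 if a variant is forgotten, which is the trade-off the slides describe.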

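The Jaccard figures on slide 31 appear to follow from the document counts on slide 24 (629 found only by Team A, 580 by both teams, 858 only by Team B). A minimal sketch of that arithmetic, assuming the Jaccard index is computed as intersection over union; the function name `jaccard` is illustrative:

```python
def jaccard(only_a: int, both: int, only_b: int) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| from the three disjoint region counts."""
    union = only_a + both + only_b
    if union == 0:
        raise ValueError("both sets are empty")
    return both / union


# Counts read from slide 24 (Team A only, both teams, Team B only).
only_a, both, only_b = 629, 580, 858
total = only_a + both + only_b

team_overlap = jaccard(only_a, both, only_b)
print(round(team_overlap, 3))    # 0.281, the "Team A and Team B" value
print(round(only_a / total, 3))  # 0.304, the "Team A Only" value
print(round(only_b / total, 3))  # 0.415, the "Team B" value
```

The same function applied to the humans versus predictive coding comparison would need the underlying counts, which the slides do not give; only the resulting shares (0.186, 0.688, 0.126) are reported.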
Editor's notes

  1. Keyword and Boolean selection / searching yielded only 20% of the responsive documents.
  2. OrcaTec’s performance compares very favorably to similar measures observed using teams of human reviewers and other predictive coding systems. In the TREC 2008 ad hoc task, the highest recall achieved by a system was 0.555 (i.e., 55.5% of the documents identified as relevant were retrieved; Run “wat7fuse”). The precision corresponding to that level of recall was 0.210, meaning that 21% of the retrieved documents were determined to be relevant. Roitblat, Kershaw, and Oot measured precision and recall for two human teams. Team A yielded precision of 0.197 and recall of 0.488. Team B yielded precision of 0.183 and recall of 0.539.
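
Note 2 describes recall as the share of relevant documents that were retrieved, and precision as the share of retrieved documents that were relevant. A minimal sketch of those two ratios follows; the document IDs are made up for illustration and the `precision_recall` helper is an assumption, not code from any of the cited studies.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for a retrieved set against a relevant set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection: these document IDs are invented for the example.
retrieved = {1, 2, 3, 4, 5}   # what the search or review returned
relevant = {4, 5, 6, 7}       # what is actually responsive

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.4 0.5
```

Here two of the five retrieved documents are relevant (precision 0.4) and two of the four relevant documents were found (recall 0.5), which is the same reading as the 0.210 precision and 0.555 recall reported for TREC 2008.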