Mining Legal Text




          Information Mining and Visualization of a Large Volume of Legal Texts

           Flávio Codeço Coelho, Renato Rocha Souza and Pablo de
                              Camargo Cerdeira

                    Applied Mathematics School – Getulio Vargas Foundation


                                     August 22, 2011








     Outline
       1 Introduction
       2 Web-Scraping
            HTML Parsing
       3 Pattern Matching
            Regular expressions
       4 Database Interaction
            MySQLDb
            SQLAlchemy
            MongoDb
       5 Natural Language Processing
            NLTK
       6 Visualization
            Matplotlib
            Ubigraph
            Gource
            Visual Python
       7 Results
       8 Future Directions

 Introduction



     Conquering text


           Scraping and indexing the world’s web pages has changed the
           world...
           Should pagerank be our main measure of information
           relevance?
           What is possible if we go a little further?







     It’s documents all the way down...




     Luckily, we didn’t have to scan
     them...
     We have to conquer an
     information mountain...







     We had generous help...




 Web-Scraping



     Obtaining the Data




     No API for access; a few
     heuristics were necessary
     Scraping took more than 3
     months.
     1.3 million cases







     Example: Photos

     Navigating with Mechanize [1]
     import mechanize
     from mechanize import LinkNotFoundError

     br = mechanize.Browser()
     # The URL is truncated in the original slide.
     br.open("http://www.stf.jus.br/portal/ministro/ministro.asp?periodo=st...")
     i = 0
     link = br.find_link(url_regex=r'verMinistro.asp', nr=i)
     while True:
         br.follow_link(link)                       # open the minister's page
         il = br.find_link(url_regex='imagem.asp')  # link to the photo
         url = "http://www.stf.jus.br/portal" + il.url.strip('..')
         nome = il.text
         # download_photo is the authors' helper (not shown on the slide);
         # .decode assumes Python 2 byte strings.
         download_photo(url, nome.decode('latin1').split('[')[0])
         br.back()
         i += 1  # advance before looking up the next link
         try:
             link = br.find_link(url_regex=r'verMinistro.asp', nr=i)
         except LinkNotFoundError:
             break



         [1] http://wwwsearch.sourceforge.net/mechanize/
   HTML Parsing


     Parsing scraped HTML

              Beautiful Soup [2] to the rescue!
              Firebug helped analyze page structure.
              Parsing was done during the scraping, to clean data for
              insertion into MySQL
              Some parts of the page were stored in HTML for later parsing
     import re
     from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 API

     # d is a row holding the stored HTML of one decision
     sopa = BeautifulSoup(d['decisao'].strip('[]'), fromEncoding='ISO8859-1')
     # look for text starting with "Legisla"
     rs = sopa.findAll('strong', text=re.compile('^Legisla'))




         [2] http://www.crummy.com/software/BeautifulSoup/
 Pattern Matching



    Extracting Even More Information



             With the data in a local database, we started mining it:
             Tried to use the best that SQL and Python had to offer
             Pattern matching, aggregation, string matching [3], etc.

                    Read from Db   →   Process   →       Write to Db
                             SQL   →   Python    →       SQL
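
The read, process, write loop above can be sketched as follows. This is a minimal illustration, not the authors' code: sqlite3 stands in for MySQL so that the example is self-contained, and the table `decisions`, its columns, and the list of canonical names are hypothetical. difflib, the module named in the footnote, is used here to map misspelled names to a canonical form.

```python
import difflib
import sqlite3

# Hypothetical canonical spellings of rapporteur names.
CANONICAL = ["SYDNEY SANCHES", "CELSO DE MELLO", "MARCO AURELIO"]


def normalize_name(raw, choices=CANONICAL, cutoff=0.8):
    """Return the closest canonical name, or the cleaned input if none is close."""
    cleaned = raw.strip().upper()
    matches = difflib.get_close_matches(cleaned, choices, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


def run_pipeline(conn):
    """Read rows with SQL, process them in Python, write the results back with SQL."""
    rows = conn.execute("SELECT id, relator FROM decisions").fetchall()
    updates = [(normalize_name(relator), row_id) for row_id, relator in rows]
    conn.executemany("UPDATE decisions SET relator = ? WHERE id = ?", updates)
    conn.commit()


# sqlite3 is an assumption made for portability; the slides use MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE decisions (id INTEGER PRIMARY KEY, relator TEXT)")
conn.executemany(
    "INSERT INTO decisions (relator) VALUES (?)",
    [("Sydney Sanches ",), ("SIDNEY SANCHES",), ("Celso de Melo",)],
)
run_pipeline(conn)
result = [r[0] for r in conn.execute("SELECT relator FROM decisions ORDER BY id")]
print(result)
```

The `cutoff` value controls how aggressive the matching is; a value that is too low would merge distinct names, so it is worth checking on a sample before writing back to the database.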




        [3] difflib
   Regular expressions


       Regular Expressions
        re module: great, but tricky with
        different encodings.
        Kodos [a]: visual debugging is
        indispensable!

   [a] http://kodos.sourceforge.net/
        import re

        rawstr = r""">*\s*([A-Z]{2,3}\s*-\s*.[A-Z0-9]*)|(CF)|("CAPUT")\s+"""
        # re.LOCALE with a str pattern works on Python 2 only;
        # Python 3 accepts this flag only for bytes patterns.
        compile_obj = re.compile(rawstr, re.LOCALE)
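
The encoding pitfall mentioned above can be illustrated with a small sketch. It assumes the scraped pages arrive as ISO-8859-1 bytes; the sample string and the pattern are made up for illustration. Decoding to text first lets `\w` match accented letters, whereas a bytes pattern treats only ASCII characters as word characters.

```python
import re

# Hypothetical sample: a header as it might arrive from a Latin-1 encoded page.
raw = "Legislação: CF ART-00005".encode("latin-1")

# Matching on raw bytes: \w covers only ASCII, so the match stops before "ç".
bytes_match = re.match(rb"\w+", raw)

# Decoding first: \w is Unicode-aware for str patterns in Python 3.
text = raw.decode("latin-1")
text_match = re.match(r"\w+", text)

print(bytes_match.group(0))  # b'Legisla'
print(text_match.group(0))   # Legislação
```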




 Database Interaction



     Structuring the Data




     Goals
           Reflect the original structure of the data
           Store additional info coming from raw text
           Design data model with future analytical needs in mind




   MySQLDb


     Databases and Drivers




              MySQL (MariaDB [4]) was the relational database of choice
              MySQLdb's cursor.execute('select * from ...')

              Server-side cursors were essential.
              MongoDB + PyMongo
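
Why server-side cursors matter: a default MySQLdb cursor fetches the whole result set into client memory, which is impractical for a table with more than a million rows. The sketch below shows the streaming pattern with a generic DB-API helper; sqlite3 stands in for MySQL so that it runs anywhere, and the table name `cases` is hypothetical. With MySQLdb, the same loop would run on a cursor created with `cursorclass=MySQLdb.cursors.SSCursor`.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time while fetching them in batches of batch_size."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


# sqlite3 is an assumption made for portability; the slides use MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, classe TEXT)")
conn.executemany(
    "INSERT INTO cases (classe) VALUES (?)",
    [("INQUERITO",)] * 2500,
)

cursor = conn.cursor()
cursor.execute("SELECT id, classe FROM cases ORDER BY id")
total = sum(1 for _ in iter_rows(cursor, batch_size=1000))
print(total)
```

Only one batch is held in Python at any time, so memory use stays flat regardless of the size of the result set.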




         [4] http://mariadb.org
   SQLAlchemy


     What about ORMs?




              Object-relational mappers are great but...
              SQLAlchemy [5] was used mostly for table creation and data insertion.
              For analytical purposes, server-side raw SQL, stored procs and
              views can’t be beaten.
              We mostly used Elixir to design the tables.




         [5] http://www.sqlalchemy.org
   MongoDb


       Escaping from 2D data
       Benefits:
           Exploring MongoDB [a] as an alternative for analytics
           Auto-sharding + Map/reduce!
           Escape costly joins in MySQL
       Tips:
           db.cursor(cursorclass=SSDictCursor)
           Convert every string to UTF-8
           PyMongo's transparent conversion of dictionaries to BSON
   [a] www.mongodb.org
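
The tips above can be combined in a small helper. This is a sketch under assumptions: rows are taken to be dictionaries as returned by a dict cursor, byte strings are assumed to be Latin-1 encoded, and the field names are hypothetical. The resulting dictionary contains only text and numbers, so it could be handed to a PyMongo `insert_one` call (not executed here, to keep the example independent of a running server).

```python
def to_document(row, source_encoding="latin-1"):
    """Return a copy of row with every byte string decoded to text.

    PyMongo stores Python str values as UTF-8 BSON strings, while bytes
    values would be stored as binary data, so decoding happens up front.
    """
    doc = {}
    for key, value in row.items():
        if isinstance(value, bytes):
            value = value.decode(source_encoding)
        doc[key] = value
    return doc


# Hypothetical row as it might come from a MySQL dict cursor.
row = {"processo": 1606809, "tipo": "Presidência".encode("latin-1")}
doc = to_document(row)
print(doc)
```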




 Natural Language Processing



     Understanding Text




     Biggest challenge is extracting
     meaning from decisions
     Is a given decision for or
     against the defendant?
     What is the vote count on
     non-unanimous decisions?




   NLTK


     Natural Language Toolkit




           Lots of batteries
           included




 Visualization



     Visualizing the Data




           You can’t ask questions about what you don’t know...
           Data driven research




   Matplotlib


     Standard Charting and Plotting: Matplotlib




     Great for plotting summary
     statistics
     Together with NetworkX, it can
     help visualize some small
     graphs




   Ubigraph


       Large Graph Visualization: Ubigraph




        Ubigraph rocks! [a]
        Navigating huge graphs gave
        powerful insights
        Takes advantage of multiple
        cores and GPU
   [a] http://ubietylab.net/ubigraph/




   Gource


     Untangling Temporal Patterns:

              A bit of Python to create logs compatible with Gource [6]
     This:
     import time
     from matplotlib import cm
     from matplotlib.colors import Normalize, rgb2hex

     # dbdec is the authors' database handle (not shown);
     # the query is truncated in the original slide.
     Q = dbdec.execute("SELECT relator, processo, tipo, proc_classe, duracao, U...
     decs = Q.fetchall()
     durations = [d[4] for d in decs]
     cmap = cm.jet
     norm = Normalize(min(durations), max(durations))  # normalizing durations
     with open('decisoes_%s.log' % ano, 'w') as f:
         for d in decs:
             c = rgb2hex(cmap(norm(d[4]))[:3]).strip('#')  # hex color for this duration
             path = "/%s/%s/%s/%s" % (d[5], d[2], d[3], d[1])  # /State/tipo/proc...
             # The rest of this line is truncated in the original slide.
             l = "%s|%s|%s|%s|%s\n" % (int(time.mktime(d[6].timetuple())), d[...
             f.write(l)

     Generates this:
     885967200|MIN. SYDNEY SANCHES|A|/MG/Monocrática/INQUÉRITO/1606809|0000
     885967200|MIN. SYDNEY SANCHES|A|/MG/Presidência/INQUÉRITO/1606809|0000
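
The log lines above follow Gource's custom log format: `timestamp|user|type|file|colour`. The sketch below builds one such line using only the standard library. It is an illustration, not the authors' code: the record layout, the duration range, and the simple blue-to-red gradient (a stand-in for matplotlib's `cm.jet`) are all assumptions.

```python
import time
from datetime import datetime


def duration_to_hex(value, lo, hi):
    """Map value in [lo, hi] to a blue-to-red hex colour (a stand-in for cm.jet)."""
    frac = 0.0 if hi == lo else (value - lo) / (hi - lo)
    red = int(round(255 * frac))
    blue = 255 - red
    return "%02X%02X%02X" % (red, 0, blue)


def gource_line(when, user, path, colour, action="A"):
    """Format one entry of Gource's custom log: timestamp|user|type|file|colour."""
    timestamp = int(time.mktime(when.timetuple()))
    return "%d|%s|%s|%s|%s" % (timestamp, user, action, path, colour)


# Hypothetical record: (rapporteur, case id, type, class, duration, state, date)
record = ("MIN. SYDNEY SANCHES", 1606809, "Monocrática", "INQUÉRITO", 30, "MG",
          datetime(1998, 1, 28))
path = "/%s/%s/%s/%s" % (record[5], record[2], record[3], record[1])
line = gource_line(record[6], record[0], path, duration_to_hex(record[4], 0, 60))
print(line)
```

The timestamp depends on the local time zone of the machine, since `time.mktime` interprets the date as local time.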

         [6] http://code.google.com/p/gource/


     A snapshot of the Supreme Court activities: 1998






     The Dynamics




     Video



   Visual Python


       It’s a Jungle Out There...



        Division of labor in the Supreme
        Court
        VPython [a] is great for quickly
        creating complex animations.
        Here judges are trees, branches
        are subjects and leaves are legal
        decisions
   [a] vpython.org




 Results



     Results



     Detailed X-ray of the inner
     workings of the Supreme Court
     92% of the cases are appeals of
     a non-constitutional nature
     These results led to the proposal
     of an amendment to the
     constitution!
     More questions than answers!
     Python for data mining rocks!



 Future Directions



     To be continued...


            Further automate and optimize
            More explorations
            Scale up the pipeline
            Model the life history of a legal process







     Acknowledgements




            FGV - Direito Rio
            FGV - EMAp
            Brazilian Supreme Court
            Asla Sá (for kindly lending us her server)




 
Mark smolinski big data and public health
Mark smolinski   big data and public healthMark smolinski   big data and public health
Mark smolinski big data and public healthFlávio Codeço Coelho
 
Haroldo lopes datasus - Informações em Saúde: história, uso e desafios
Haroldo lopes   datasus - Informações em Saúde: história, uso e desafiosHaroldo lopes   datasus - Informações em Saúde: história, uso e desafios
Haroldo lopes datasus - Informações em Saúde: história, uso e desafiosFlávio Codeço Coelho
 
Wim de Grave: Big Data in life sciences
Wim de Grave:  Big Data in life sciencesWim de Grave:  Big Data in life sciences
Wim de Grave: Big Data in life sciencesFlávio Codeço Coelho
 
Marco Andreazzi: IBGE research and data collection on health related issues.
Marco Andreazzi: IBGE research and data collection on health related issues.Marco Andreazzi: IBGE research and data collection on health related issues.
Marco Andreazzi: IBGE research and data collection on health related issues.Flávio Codeço Coelho
 
Access to Information, privacy, and health research in Brazil
Access to Information, privacy, and health research in BrazilAccess to Information, privacy, and health research in Brazil
Access to Information, privacy, and health research in BrazilFlávio Codeço Coelho
 

Mehr von Flávio Codeço Coelho (20)

Big dengue
Big dengueBig dengue
Big dengue
 
Alerta_Dengue simplified english
Alerta_Dengue simplified englishAlerta_Dengue simplified english
Alerta_Dengue simplified english
 
dengueARS0
dengueARS0dengueARS0
dengueARS0
 
Alerta dengue expo epi out2014
Alerta dengue expo epi out2014Alerta dengue expo epi out2014
Alerta dengue expo epi out2014
 
Alerta dengue abrasco 2014
Alerta dengue   abrasco 2014Alerta dengue   abrasco 2014
Alerta dengue abrasco 2014
 
Sistema de Alerta de Dengue Utilizando Dados Hbridos de Redes Sociais, Moni...
Sistema de Alerta de Dengue Utilizando Dados Hbridos de Redes Sociais, Moni...Sistema de Alerta de Dengue Utilizando Dados Hbridos de Redes Sociais, Moni...
Sistema de Alerta de Dengue Utilizando Dados Hbridos de Redes Sociais, Moni...
 
Alerta dengue: Sistema de alertas de surtos usando dados híbridos
Alerta dengue: Sistema de alertas de surtos usando dados híbridosAlerta dengue: Sistema de alertas de surtos usando dados híbridos
Alerta dengue: Sistema de alertas de surtos usando dados híbridos
 
Mauricio barreto:Big data: how can it help to expand epidemiological investig...
Mauricio barreto:Big data: how can it help to expand epidemiological investig...Mauricio barreto:Big data: how can it help to expand epidemiological investig...
Mauricio barreto:Big data: how can it help to expand epidemiological investig...
 
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio  Silva: Cloud Computing Technologies for Genomic Big Data AnalysisFabricio  Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
 
Gabriela gomes: Mathematical Modeling and Data Needs
Gabriela gomes: Mathematical Modeling and Data NeedsGabriela gomes: Mathematical Modeling and Data Needs
Gabriela gomes: Mathematical Modeling and Data Needs
 
Carl koppeschaar: Disease Radar: Measuring and Forecasting the Spread of Infe...
Carl koppeschaar: Disease Radar: Measuring and Forecasting the Spread of Infe...Carl koppeschaar: Disease Radar: Measuring and Forecasting the Spread of Infe...
Carl koppeschaar: Disease Radar: Measuring and Forecasting the Spread of Infe...
 
Gabriel laporta: Biodiversity can help prevent malaria outbreaks in tropical ...
Gabriel laporta: Biodiversity can help prevent malaria outbreaks in tropical ...Gabriel laporta: Biodiversity can help prevent malaria outbreaks in tropical ...
Gabriel laporta: Biodiversity can help prevent malaria outbreaks in tropical ...
 
Sander van noort: Influenzanet: self-reporting of influenza-like illness in c...
Sander van noort: Influenzanet: self-reporting of influenza-like illness in c...Sander van noort: Influenzanet: self-reporting of influenza-like illness in c...
Sander van noort: Influenzanet: self-reporting of influenza-like illness in c...
 
Claudia medina: Linking Health Records for Population Health Research in Brazil.
Claudia medina: Linking Health Records for Population Health Research in Brazil.Claudia medina: Linking Health Records for Population Health Research in Brazil.
Claudia medina: Linking Health Records for Population Health Research in Brazil.
 
Mark smolinski big data and public health
Mark smolinski   big data and public healthMark smolinski   big data and public health
Mark smolinski big data and public health
 
Haroldo lopes datasus - Informações em Saúde: história, uso e desafios
Haroldo lopes   datasus - Informações em Saúde: história, uso e desafiosHaroldo lopes   datasus - Informações em Saúde: história, uso e desafios
Haroldo lopes datasus - Informações em Saúde: história, uso e desafios
 
Wim de Grave: Big Data in life sciences
Wim de Grave:  Big Data in life sciencesWim de Grave:  Big Data in life sciences
Wim de Grave: Big Data in life sciences
 
Marco Andreazzi: IBGE research and data collection on health related issues.
Marco Andreazzi: IBGE research and data collection on health related issues.Marco Andreazzi: IBGE research and data collection on health related issues.
Marco Andreazzi: IBGE research and data collection on health related issues.
 
Access to Information, privacy, and health research in Brazil
Access to Information, privacy, and health research in BrazilAccess to Information, privacy, and health research in Brazil
Access to Information, privacy, and health research in Brazil
 
Causal Bayesian Networks
Causal Bayesian NetworksCausal Bayesian Networks
Causal Bayesian Networks
 

Kürzlich hochgeladen

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 

Kürzlich hochgeladen (20)

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 

Mining legal texts with Python

  • 1. Mining Legal Text: Information Mining and Visualization of a Large Volume of Legal Texts. Flávio Codeço Coelho, Renato Rocha Souza and Pablo de Camargo Cerdeira. Applied Mathematics School – Getulio Vargas Foundation. August 22, 2011.
  • 2. Outline I: 1 Introduction; 2 Web-Scraping (HTML Parsing); 3 Pattern Matching (Regular expressions); 4 Database Interaction (MySQLDb, SQLAlchemy, MongoDb); 5 Natural Language Processing (NLTK); 6 Visualization (Matplotlib, Ubigraph, Gource)
  • 3. Outline II: 6 Visualization, continued (Visual Python); 7 Results; 8 Future Directions
  • 4. Introduction: Conquering text. Scraping and indexing the world’s web pages has changed the world... Should PageRank be our main measure of information relevance? What is possible if we go a little further?
  • 5. Introduction: It’s documents all the way down... Luckily, we didn’t have to scan them... We have to conquer an information mountain...
  • 6. Introduction: We had generous help...
  • 7. Web-Scraping: Obtaining the Data. There was no API for access, so a few heuristics were necessary. Scraping took more than 3 months. 1.3 million cases.
  • 8. Web-Scraping: Example: Photos. Navigating with Mechanize (http://wwwsearch.sourceforge.net/mechanize/):
      br = mechanize.Browser()
      br.open("http://www.stf.jus.br/portal/ministro/ministro.asp?periodo=st    # (line cut off on the slide)
      i = 0
      link = br.find_link(url_regex=r'verMinistro.asp', nr=i)
      while 1:
          br.follow_link(link)
          il = br.find_link(url_regex='imagem.asp')
          url = "http://www.stf.jus.br/portal" + il.url.strip('..')
          nome = il.text
          download_photo(url, nome.decode('latin1').split('[')[0])
          br.back()
          try:
              link = br.find_link(url_regex=r'verMinistro.asp', nr=i)
              i += 1
          except LinkNotFoundError:
              break
  • 9. Web-Scraping, HTML Parsing: Parsing scraped HTML. Beautiful Soup (http://www.crummy.com/software/BeautifulSoup/) to the rescue! Firebug helped analyze page structure. Parsing was done during the scraping, to clean data for insertion into MySQL. Some parts of the page were stored in HTML for later parsing:
      sopa = BeautifulSoup(d['decisao'].strip('[]'), fromEncoding='ISO8859-1')
      rs = sopa.findAll('strong', text=re.compile('^Legisla'))
  • 10. Pattern Matching: Extracting Even More Information. With the data in a local db, we started mining it. Tried to use the best SQL and Python had to offer: pattern matching, aggregation, string matching (difflib), etc. Read from Db → Process → Write to Db. SQL → Python → SQL.
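The string-matching step mentioned on this slide can be illustrated with the standard-library `difflib` module. This is only a minimal sketch, not the authors' code: the helper `canonical_name`, the reference list `KNOWN` and the 0.8 cutoff are assumptions made for illustration, and the second entry in `KNOWN` is a placeholder rather than a real name.

```python
import difflib

def canonical_name(raw, known_names, cutoff=0.8):
    """Return the known name closest to `raw`, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(raw.upper(), known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical reference list of canonical spellings (second entry is a placeholder).
KNOWN = ["MIN. SYDNEY SANCHES", "MIN. EXAMPLE NAME"]

print(canonical_name("Min. Sidney Sanches", KNOWN))  # misspelled variant
print(canonical_name("unrelated text", KNOWN))       # nothing close enough
```

In a Read → Process → Write loop, a function like this would sit in the "Process" step, normalizing free-text fields before they are written back to the database.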
  • 11. Pattern Matching, Regular expressions: Regular Expressions. The re module is great, but tricky for different encodings. Kodos (http://kodos.sourceforge.net/): visual debugging indispensable!
      rawstr = r""">*\s*([A-Z]{2,3}\s*-\s*.[A-Z0-9]*)|(CF)|("CAPUT")\s+"""
      compile_obj = re.compile(rawstr, re.LOCALE)
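One way to sidestep the encoding pitfalls noted on this slide is to decode the raw bytes to text before matching. The sketch below assumes Python 3 and Latin-1 input; the pattern `CITATION` is a simplified, hypothetical pattern for codes such as "LEI-8112" and is not the expression shown on the slide. The sample sentence is made up for illustration.

```python
import re

# Hypothetical pattern: 2-3 upper-case letters, a dash, then a number (e.g. "LEI-8112").
CITATION = re.compile(r"\b([A-Z]{2,3})\s*-\s*(\d+)\b")

def find_citations(raw_bytes, encoding="latin-1"):
    """Decode scraped bytes first, then match on text rather than on raw bytes."""
    text = raw_bytes.decode(encoding)
    return [(m.group(1), m.group(2)) for m in CITATION.finditer(text)]

# Made-up sample, encoded the way a Latin-1 page would deliver it.
sample = "Legislação: LEI-8112 e CPC - 5869".encode("latin-1")
print(find_citations(sample))
```

Decoding once at the boundary keeps accented characters intact and means the same pattern works regardless of how the page was originally encoded.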
  • 12. Database Interaction: Structuring the Data. Goals: Reflect the original structure of the data. Store additional info coming from raw text. Design data model with future analytical needs in mind.
  • 13. Database Interaction, MySQLDb: Databases and Drivers. MySQL (MariaDB, http://mariadb.org) was the relational database of choice. MySQLDb’s cursor.execute('select * from ...'). Server-side cursors were essential. MongoDb + PyMongo.
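Server-side cursors matter because they let the client stream a large result set instead of loading every row into memory at once. The sketch below shows only the consuming side of that pattern, using the standard-library `sqlite3` module as a stand-in so that it runs without a MySQL server; with MySQLdb one would request a server-side cursor class such as `SSCursor` when creating the cursor. The table `decisions`, its columns and the helper `iter_rows` are hypothetical.

```python
import sqlite3

def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time while pulling them from the cursor in batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for row in batch:
            yield row

# Stand-in database with a hypothetical table of decision durations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE decisions (id INTEGER PRIMARY KEY, duration INTEGER)")
conn.executemany("INSERT INTO decisions (duration) VALUES (?)",
                 [(d,) for d in range(2500)])

# Aggregate over the stream without materializing the full result set.
cur = conn.execute("SELECT id, duration FROM decisions")
total = sum(duration for _id, duration in iter_rows(cur, batch_size=1000))
print(total)
```

The same generator works with any DB-API cursor, since `fetchmany` is part of the standard cursor interface.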
  • 14. Database Interaction, SQLAlchemy: What about ORMs? Object-relational mappers are great but... SQLAlchemy (http://www.sqlalchemy.org) was used mostly for table creation and data insertion. For analytical purposes, server-side raw SQL, stored procedures and views can’t be beaten. We mostly used Elixir to design the tables.
  • 15. Database Interaction, MongoDb: Escaping from 2D data. Benefits: Exploring MongoDB (www.mongodb.org) as an alternative for analytics. Auto-sharding + Map/reduce! Escape costly joins in MySQL. Tips: db.cursor(cursorclass=SSDictCursor). Convert every string to UTF-8. PyMongo’s transparent conversion of dictionaries to BSON.
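"Escaping costly joins" usually means storing related rows nested inside a single document. The sketch below illustrates that reshaping with plain dictionaries, which is the form PyMongo converts to BSON; it does not talk to a MongoDB server. The function `rows_to_documents`, the field names and the sample rows are all hypothetical and do not reflect the authors' schema.

```python
import json
from collections import defaultdict

def rows_to_documents(cases, decisions):
    """Nest each case's decision rows inside one dictionary per case."""
    by_case = defaultdict(list)
    for case_id, kind, year in decisions:
        by_case[case_id].append({"kind": kind, "year": year})
    return [
        {"_id": case_id, "rapporteur": rapporteur, "decisions": by_case[case_id]}
        for case_id, rapporteur in cases
    ]

# Made-up rows, as they might come out of two relational tables.
cases = [(1, "Judge A"), (2, "Judge B")]
decisions = [(1, "monocratic", 1998), (1, "plenary", 1999), (2, "monocratic", 1998)]

docs = rows_to_documents(cases, decisions)
print(json.dumps(docs[0], ensure_ascii=False, indent=2))
```

Once the data is stored this way, reading a case together with all of its decisions is a single lookup rather than a join.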
  • 16. Natural Language Processing: Understanding Text. The biggest challenge is extracting meaning from decisions. Is a given decision for or against the defendant? What is the vote count on non-unanimous decisions?
  • 17. Natural Language Processing, NLTK: Natural Language Toolkit. Lots of batteries included.
  • 18. Visualization: Visualizing the Data. You can’t ask questions about what you don’t know... Data driven research.
  • 19. Visualization, Matplotlib: Standard Charting and Plotting: Matplotlib. Great for plotting summary statistics. Together with NetworkX it can help visualize some small graphs.
  • 20. Visualization, Ubigraph: Large Graph Visualization: Ubigraph. Ubigraph (http://ubietylab.net/ubigraph/) rocks! Navigating huge graphs gave powerful insights. Takes advantage of multiple cores and GPU.
  • 21. Visualization, Gource: Untangling Temporal Patterns. A bit of Python to create logs compatible with Gource (http://code.google.com/p/gource/). This (some lines are cut off on the slide):
      Q = dbdec.execute("SELECT relator, processo, tipo, proc_classe, duracao, U    # (cut off)
      decs = Q.fetchall()
      durations = [d[4] for d in decs]
      cmap = cm.jet
      norm = normalize(min(durations), max(durations))  # normalizing duration
      with open('decisoes_%s.log' % ano, 'w') as f:
          for d in decs:
              c = rgb2hex(cmap(norm(d[4]))[:3]).strip('#')
              path = "/%s/%s/%s/%s" % (d[5], d[2], d[3], d[1])  # /State/tipo/proc (cut off)
              l = "%s|%s|%s|%s|%s\n" % (int(time.mktime(d[6].timetuple())), d[    # (cut off)
              f.write(l)
    Generates this:
      885967200|MIN. SYDNEY SANCHES|A|/MG/Monocrática/INQUÉRITO/1606809|0000
      885967200|MIN. SYDNEY SANCHES|A|/MG/Presidência/INQUÉRITO/1606809|0000
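Since parts of the slide's listing are cut off, here is a self-contained sketch of the same idea: writing entries in Gource's custom log format, `timestamp|user|type|path|colour`, sorted by time. It is not the authors' code. The helpers `duration_colour` and `gource_line` are hypothetical, a simple blue-to-red ramp stands in for the matplotlib colormap, and the dates and durations in `records` are invented for illustration (only the names and paths echo the output shown on the slide).

```python
import time
from datetime import datetime

def duration_colour(value, lo, hi):
    """Map a duration to a hex colour, from blue (shortest) to red (longest)."""
    frac = 0.0 if hi == lo else (value - lo) / float(hi - lo)
    red = int(round(255 * frac))
    return "%02X00%02X" % (red, 255 - red)

def gource_line(when, user, path, colour, action="A"):
    """Format one entry as timestamp|user|type|path|colour."""
    ts = int(time.mktime(when.timetuple()))
    return "%d|%s|%s|%s|%s" % (ts, user, action, path, colour)

# Illustrative records: (date, judge, path, duration in days). Values are made up.
records = [
    (datetime(1998, 2, 3), "MIN. SYDNEY SANCHES", "/MG/Presidência/INQUÉRITO/1606809", 400),
    (datetime(1998, 1, 28), "MIN. SYDNEY SANCHES", "/MG/Monocrática/INQUÉRITO/1606809", 10),
]

durations = [r[3] for r in records]
lo, hi = min(durations), max(durations)

# Gource expects entries in chronological order, so sort by date first.
lines = [
    gource_line(when, user, path, duration_colour(dur, lo, hi))
    for when, user, path, dur in sorted(records, key=lambda r: r[0])
]
print("\n".join(lines))
```

Each judge plays the role of a "user" and each legal process the role of a "file", which is what lets a tool built for version-control history animate court activity.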
  • 22. Visualization, Gource: A snapshot of the Supreme Court activities: 1998.
  • 23. Visualization, Gource: The Dynamics. Video.
  • 24. Visualization, Visual Python: It’s a Jungle Out There... Division of labor in the Supreme Court. VPython (vpython.org) is great for quickly creating complex animations. Here judges are trees, branches are subjects and leaves are legal decisions.
  • 25. Results: Detailed X-ray of the inner workings of the Supreme Court. 92% of the cases are appeals of a non-constitutional nature. These results led to the proposal of an amendment to the constitution! More questions than answers! Python for data mining rocks!
  • 26. Future Directions: To be continued... Further automate and optimize. More explorations. Scale up the pipeline. Model the life history of a legal process.
  • 27. Acknowledgements: FGV - Direito Rio. FGV - EMAp. Brazilian Supreme Court. Asla Sá (for kindly lending us her server).